CN113450881B - Scoring method for three-dimensional similarity of molecules for virtual screening of drugs - Google Patents

Scoring method for three-dimensional similarity of molecules for virtual screening of drugs Download PDF

Info

Publication number
CN113450881B
CN113450881B CN202110829201.2A CN202110829201A CN113450881B CN 113450881 B CN113450881 B CN 113450881B CN 202110829201 A CN202110829201 A CN 202110829201A CN 113450881 B CN113450881 B CN 113450881B
Authority
CN
China
Prior art keywords
molecules
molecule
hydrogen bond
similarity
volume
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110829201.2A
Other languages
Chinese (zh)
Other versions
CN113450881A (en
Inventor
严鑫
李瑞麟
卢峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Weixin Zhiyao Shanghai Technology Co ltd
Original Assignee
Beijing Zhongda Weixin Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongda Weixin Technology Co ltd filed Critical Beijing Zhongda Weixin Technology Co ltd
Priority to CN202110829201.2A priority Critical patent/CN113450881B/en
Publication of CN113450881A publication Critical patent/CN113450881A/en
Application granted granted Critical
Publication of CN113450881B publication Critical patent/CN113450881B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16CCOMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/50Molecular design, e.g. of drugs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16CCOMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/20Identification of molecular entities, parts thereof or of chemical compositions
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16CCOMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/60In silico combinatorial chemistry
    • G16C20/64Screening of libraries
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16CCOMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70Machine learning, data mining or chemometrics

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medicinal Chemistry (AREA)
  • General Physics & Mathematics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Databases & Information Systems (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of computer-aided drug research and development, in particular to a scoring method for three-dimensional similarity of molecules for drug virtual screening, which comprises the following steps of firstly, acquiring characteristic parameters of two molecules for similarity comparison; training a deep learning model; and step three, external verification of the deep learning model. According to the invention, the deep neural network is trained, the multidimensional similarity is selected as a characteristic parameter, and the final comprehensive scoring, namely the AUC value, is given through transformation of the deep neural network. The scoring method has remarkable improvement on the drug screening accuracy, and can screen out as many molecules with potential biological activity from a database containing a large number of molecules as possible without missing any potential active molecules, thereby solving the problem of false positive or false negative, and simultaneously maintaining the calculation speed of screening high flux. Therefore, the invention has very wide application prospect in the aspect of drug virtual screening.

Description

Scoring method for three-dimensional similarity of molecules for virtual screening of drugs
Technical Field
The invention relates to the technical field of computer-aided drug research and development, in particular to a scoring method for three-dimensional similarity of molecules for drug virtual screening.
Background
The drug research and development has the characteristics of large investment, high risk and long period, and generally one drug research and development period is more than 10 years, the research and development investment is hundreds of millions of dollars, and the trend is rising year by year. Drug screening is a key link of drug discovery, and high-flux drug virtual screening can greatly reduce screening time and cost, and has important significance for accelerating drug research and development.
In the virtual screening of drugs, a scoring method based on three-dimensional similarity of molecules is commonly used at present to carry out screening and sorting of molecules. Such scoring methods typically involve similarity of molecular shape and similarity of pharmacophores (typically scoring of similarity of database molecules to template molecules or pharmacophore models) and constructing therefrom a comprehensive score of similarity by a simple weighting function, the effectiveness of which determines the effectiveness of the screening and the speed of calculation. However, a significant problem with scoring methods of the type described above is the lack of accuracy, which makes virtual screening of drugs highly false positive or false negative.
In recent years, the deep learning technology based on the artificial neural network has made a significant breakthrough in the fields of unmanned driving, voice recognition, image recognition and the like, and has also gradually made a significant progress in the biomedical field, and diagnostic applications of diseases such as skin cancer, congenital cataract, childhood autism and the like based on deep learning have been developed. With the development of technology, the pharmaceutical field is also focusing on accelerating the development of drugs by using deep learning technology to reduce the development cost. Research shows that the deep learning technology has more advantages in optimizing synthetic routes, predicting drug targets and virtual screening compared with the traditional machine learning method.
Therefore, the application of the deep learning technology in the aspect of drug virtual screening is urgently needed to be researched, so that the problem of high false positive or false negative in the existing drug virtual screening process is solved.
Disclosure of Invention
The invention aims to provide a scoring method for three-dimensional similarity of molecules for virtual screening of medicines, aiming at the defects in the prior art, so as to solve the problem of poor accuracy of the existing virtual screening of medicines, and simultaneously maintain high flux computing speed of screening.
The aim of the invention is achieved by the following technical scheme:
the method for scoring the three-dimensional similarity of molecules for virtual screening of drugs comprises the following steps: the method comprises the following steps:
step one, obtaining various characteristic parameters of two molecules for similarity comparison
Respectively reading topological structure and three-dimensional structure information of two molecules for similarity comparison, and calculating to obtain various characteristic parameters, wherein the characteristic parameters comprise: the atomic number difference (F1) between the two molecules; number of rotatable chemical bonds of two molecules (F2); volume difference of two molecules (F3); shape similarity of two molecules (F4); similarity of two molecular hydrogen bond acceptors (F5); similarity of two molecular hydrogen bond donors (F6); similarity of two aromatic rings (F7); hydrophobic center similarity of two molecules (F8); positively charged group similarity of two molecules (F9); and a negative group similarity (F10) of two molecules; wherein:
f1 is calculated by reading in the respective topological structure information of two molecules, and then taking the absolute value of the difference value of the total number of atoms of the two molecules;
the calculation mode of F2 is based on the calculation mode of F1, judging whether each chemical bond is a rotatable bond, obtaining the total number of the rotatable bonds of two molecules, and then taking the absolute value of the difference value of the total number of the rotatable bonds of two molecules;
f3 is calculated by obtaining Van der Waals radius of the atoms according to the types of the atoms in the two molecules on the basis of F1 calculation, wherein each atom is represented by a Gaussian sphere, the radius of the Gaussian sphere is the same as the Van der Waals radius of the atom, the position coordinate of the Gaussian sphere is the same as the coordinate of the atom, and the atomic coordinate is from the input molecular three-dimensional structure; calculating the superposition volume of Gaussian balls in each molecule, wherein the ij-th Gaussian ball group comprises Gaussian balls corresponding to the ith atom and Gaussian balls corresponding to the jth atom, and the superposition volume of the ij-th Gaussian ball group is v ij The method comprises the steps of carrying out a first treatment on the surface of the Calculating the superposition volume of each of the two molecules asN is the total number of atoms in the molecule; then taking the absolute value of the difference value of the superposition volumes of the two molecules;
the calculation mode of F4 is based on the calculation mode of F3, and the intermolecular superposition volume of two molecules under the condition of multiple superposition is calculatedWherein v is ij For the overlapping volume of the ith atom in the first molecule and the jth atom in the second molecule, N is the total number of atoms in the first molecule, M is the total number of atoms in the second molecule, the maximum of which is selectedAs the maximum intermolecular volume; calculate the shape similarity of two molecules +.>Wherein V is A Is the self-overlapping volume of the first molecule, V B Is the self-overlapping volume of the second molecule;
the calculation mode of F5 is based on the calculation mode of F1, and the positions of hydrogen bond acceptors in two molecules are found out; calculation of the overlap volume of the respective Hydrogen bond acceptors in the two moleculesWherein F is ij Is the superposition volume between the ith hydrogen bond acceptor and the jth hydrogen bond acceptor; calculating the superposition volume of intermolecular hydrogen bond acceptor under multiple superposition conditions>Wherein F is ij For the overlapping volume of the ith hydrogen bond acceptor in the first molecule and the jth hydrogen bond acceptor in the second molecule, N is the total number of hydrogen bond acceptors in the first molecule, M is the total number of hydrogen bond acceptors in the second molecule, the maximum value of which is selected->Superimposed volumes as the largest intermolecular hydrogen bond acceptors; calculation of Hydrogen bond acceptor similarity of two molecules +.>Wherein P is A Is the self-overlapping volume of the hydrogen bond acceptor in the first molecule, P B Is the self-overlapping volume of the hydrogen bond acceptor in the second molecule;
the calculation mode of F6 is the same as that of F5, and only the hydrogen bond acceptors in two molecules are replaced by hydrogen bond donors;
the calculation mode of F7 is the same as that of F5, and only the hydrogen bond acceptor is replaced by an aromatic ring;
the calculation mode of F8 is the same as that of F5, and only the hydrogen bond acceptor is replaced by a hydrophobic center;
the calculation mode of F9 is the same as that of F5, and only the hydrogen bond acceptor is replaced by a positive group;
the calculation mode of F10 is the same as that of F5, and only the hydrogen bond acceptor is replaced by a negative group;
training deep learning model
The DUD-E data set is adopted, 102 biological target point information is arranged in the data set, each target point is provided with a corresponding active component set and a corresponding Decoy component set, and the data of each target point is processed as follows:
selecting crystal structure molecules in the active molecule set of each target point as template molecules, and calculating F1-F10 characteristic parameters of every two molecules in the template molecules and the other molecules according to the calculation mode in the step one, so that each target point is calculated to obtain a set of characteristic parameter data;
modeling by using a deep learning method, taking the characteristic parameter data of each target obtained by calculation as input data, and taking whether the activity of molecules is taken as an objective function of two classifications, wherein the direction of model optimization is to minimize the error of the prediction of the activity of all the targets in the molecules, so that the average value of AUC values is maximized; after training is completed, a final deep learning model is obtained;
step three, external verification of deep learning model
Verifying generalization capability of a deep learning model by adopting a MUV data set, selecting 10 biological target point information in the MUV data set, wherein each target point has a corresponding active component set and a corresponding Decoy component set; selecting crystal structure molecules in the active molecule set of each target point as template molecules, and calculating F1-F10 characteristic parameters of every two molecules in the template molecules and other molecules according to the calculation mode in the first step with other molecules in the active molecule set of the target point and all molecules in the Decoy molecule set respectively; and inputting the characteristic parameters into a trained deep learning model, and calculating to obtain the AUC value of virtual screening of each target point.
In the above technical solution, in the first step, the three-dimensional structure information includes a total number of atoms in the molecule, a total number of chemical bonds, a type of each atom, and a coordinate value thereof.
In the above technical scheme, in the second step, an AUC value of virtual screening is calculated for each target point by adopting a 5-time cross validation mode.
The invention has the beneficial effects that:
the invention relates to a scoring method for three-dimensional similarity of molecules for virtual screening of drugs, which comprises the following steps of firstly, obtaining characteristic parameters of two molecules for similarity comparison, wherein the characteristic parameters mainly comprise atomic number difference of the two molecules, rotatable chemical bond number, volume difference, shape similarity, similarity of hydrogen bond acceptors, similarity of hydrogen bond donors, similarity of aromatic rings, hydrophobic center similarity, positive group similarity, negative group similarity and the like; training a deep learning model; and step three, external verification of the deep learning model, wherein the generalization capability of the model is verified by adopting the MUV data set. The multi-dimensional similarity selected in the step one is used as a characteristic parameter through training the deep neural network, and final comprehensive scoring, namely an AUC value, is given through transformation of the deep neural network. The evaluation index of the AUC value of the drug virtual screening is a common standard for evaluating the accuracy of the screening method, the AUC value ranges from 0 to 1, and the closer the value is 1, the more accurate the screening method is. Experiments prove that compared with the prior art, the invention has remarkable improvement on the drug screening accuracy, and simultaneously maintains the high flux calculation speed of screening. Therefore, the scoring method can screen as many molecules with potential biological activity as possible from a database containing a large number of molecules, and the more accurate the screening method is, the easier the molecules with potential activity can be found, so that any potential active molecules can be omitted as much as possible, and the problem of false positive or false negative is solved. Therefore, the invention has very wide application prospect in the aspect of drug virtual screening.
Detailed Description
The present invention will be described in further detail with reference to specific examples, but is not limited thereto.
In a specific implementation manner of the scoring method for three-dimensional similarity of molecules for virtual drug screening in this embodiment, a target ADA17 in a DUD-E dataset is taken as an example, there are 102 biological target information in the DUD-E dataset, and each target has a corresponding active fraction set and a Decoy fraction set. Wherein the dataset of ADA17 targets contained 1,341 active fraction and 35,900 Decoy fraction. The active molecules in the crystal structure are respectively selected as template molecules (hereinafter referred to as "molecule A") and the first molecule in the active molecule set (hereinafter referred to as "molecule B"), and the following operations are performed on the data:
step one, obtaining various characteristic parameters of two molecules for similarity comparison:
the topological structure and three-dimensional structure information (including the total number of atoms, the total number of chemical bonds, the type of each atom, the coordinate value thereof and the like) of the molecule A and the molecule B are read, and then the characteristic parameters for similarity comparison are respectively calculated according to the following steps:
and step 1, taking the absolute value of the difference value of the total number of atoms of the molecules A and B to obtain a first characteristic parameter F1.
And 2, judging whether each chemical bond in the molecules A and B is a rotatable bond, respectively obtaining the total number of the rotatable bonds of the molecules A and B, and obtaining a second characteristic parameter F2 by taking the absolute value of the difference value of the total number of the two.
Step 3, obtaining the van der Waals radius of the atoms according to the types of the atoms in the molecules A and B, wherein each atom is represented by a Gaussian sphere, the radius of the Gaussian sphere is the same as the van der Waals radius of the atom, the position coordinate of the Gaussian sphere is the same as the coordinate of the atom, and the atomic coordinate is from the input three-dimensional structure of the molecule; calculating the superposition volume of a group of Gaussian balls (hereinafter referred to as Gaussian ball group) corresponding to any two atoms in the molecule A, wherein the ij-th Gaussian ball group comprises the Gaussian balls corresponding to the ith atom and the Gaussian balls corresponding to the j-th atom in the molecule A, and the superposition volume of the ij-th Gaussian ball group is v ij The method comprises the steps of carrying out a first treatment on the surface of the Calculate the superimposed volume of molecule A itself asN is the total number of atoms in molecule A; the same method was used to calculate the superimposed volume of molecule B itself asM is the total number of atoms in molecule B; and then taking the absolute value of the difference value of the self-overlapped volumes of the molecules A and B to obtain F3.
Step 4, calculating intermolecular overlapping volume of the molecules A and B under various overlapping conditionsWherein v is ij For the volume of superposition of the ith atom in molecule A and the jth atom in molecule B, N is the total number of atoms in molecule A, M is the total number of atoms in molecule B, the maximum of which is selected>As the maximum intermolecular volume; calculating the shape similarity of two moleculesWherein V is A Is the self-overlapping volume of molecule A, V B Is the self-folded volume of molecule B (i.e., calculated in step 4).
Step 5, finding out the positions of all hydrogen bond acceptors in the molecules A and B; calculation of the overlap volume of Hydrogen bond acceptors in molecule AWherein F is ij N is the total number of hydrogen bond acceptors in the molecule A, which is the superposition volume between the ith hydrogen bond acceptor and the jth hydrogen bond acceptor in the molecule A; the overlapping volume of the hydrogen bond acceptors in molecule B was calculated in the same way +.>Wherein F is ij The volume of superposition between the ith hydrogen bond acceptor and the jth hydrogen bond acceptor in the molecule B is M is the total number of the hydrogen bond acceptors in the molecule B; calculating the superposition of molecules A and B in various waysThe overlapping volume of the intermolecular hydrogen bond acceptors in the case +.>Wherein F is ij For the overlapping volume of the ith hydrogen bond acceptor in molecule A and the jth hydrogen bond acceptor in molecule B, N is the total number of hydrogen bond acceptors in molecule A, M is the total number of hydrogen bond acceptors in molecule B, the maximum of which is selected->A volume of overlap that acts as a hydrogen bond acceptor between molecules a and B; calculating the hydrogen bond acceptor similarity of molecules A and B +.>Wherein P is A Is the self-overlapping volume of the hydrogen bond acceptor in the molecule A, P B Is the self-overlapping volume of the hydrogen bond acceptors in molecule B.
And step 6, the calculation mode is consistent with that of the step 5, and the hydrogen bond acceptor similarity F6 of the molecules A and B can be obtained only by replacing the hydrogen bond donor with the hydrogen bond acceptor.
And 7, the calculation mode is consistent with that of the step 5, and the aromatic ring similarity F7 of the molecules A and B can be obtained only by replacing the hydrogen bond donor with an aromatic ring.
And 8, the calculation mode is consistent with that of the step 5, and the hydrophobic center similarity F8 of the molecules A and B can be obtained only by replacing the hydrogen bond donor with the hydrophobic center.
And 9, the calculation mode is consistent with that of the step 5, and the similarity F9 of the positive groups of the molecules A and B can be obtained only by replacing the hydrogen bond donor with the positive group.
And step 10, the calculation mode is consistent with that of the step 5, and the similarity F10 of the negative groups of the molecules A and B can be obtained only by replacing the hydrogen bond donor with the negative group.
Thus, the characteristic parameters F1 to F10 of the first molecule in the set of the molecule A and the active molecule, namely the molecule B, are obtained, and 10 total.
Training a deep learning model:
taking the DUD-E data set target ADA17 as an example, the same calculation method as in the first step is adopted to calculate the characteristic parameters of the second molecule (hereinafter referred to as "molecule C") in the molecule a and the active molecule set, and 10 corresponding characteristic parameters are obtained.
And similarly, calculating the third, fourth and fifth of the set of molecules respectively, the nth molecule (N is a natural number) until all the active molecules in the set of active molecules have been calculated, to obtain a number of characteristic parameters of 10×1341= 13,410.
Next, the characteristic parameters of each of the molecules a and Decoy in the set were calculated in the same manner as in step one, resulting in 10×35900= 359,000 characteristic parameters. So far, all the characteristic parameters of the target ADA17 are calculated.
Then, the characteristic parameter sets of the other 101 targets in the DUD-E data set are calculated by adopting the same calculation mode. To this end, there are 102 targets in the DUD-E dataset, each with a set of feature parameter data.
Finally, the 102 sets of characteristic parameter data are used as input characteristic data of a deep learning model, whether the activity of molecules is used as an objective function of two classifications, and the model optimization direction is that the error of the prediction of the activity of all targets in the molecules is minimized, so that the average value of AUC values is maximized. In the training process, a 5-time cross validation mode is adopted, the AUC value of virtual screening is calculated for each target point (see table 1), the average AUCaver of all 102 target point AUC values is taken, and the model optimization direction is that the AUCaver value is the largest. After training is completed, a final deep learning model can be obtained.
TABLE 1 AUC values calculated for the DUD-E dataset using the conventional method and the method of this example
Step three, external verification of the deep learning model:
the generalization ability of the deep learning model was verified using the MUV dataset. Taking the example of the target 466 in the MUV dataset, the target 466 dataset contains 31 active fractions and 15000 Decoy fractions. Active molecules in the crystal structure are selected as template molecules (hereinafter referred to as molecule a). And (3) obtaining characteristic parameters of the molecule A and each molecule in the active ingredient set and the Decoy molecule set respectively by adopting a calculation mode in the step two, and obtaining a data set of 10 (31+15000) = 150310 characteristic parameters in total. The AUC value of the target virtual screening can be calculated by inputting the characteristic parameters into a trained deep learning model (see Table 2).
Table 2. AUC values calculated for muv data sets using conventional methods and the method of this example
The evaluation index of the AUC value of the drug virtual screening is a common standard for evaluating the accuracy of the screening method, the AUC value ranges from 0 to 1, and the closer the value is 1, the more accurate the screening method is. As shown in the experimental results in tables 1 and 2, compared with the prior art, the invention has obvious improvement on the drug screening accuracy, and simultaneously, the calculation speed of high flux of screening is maintained. The scoring method can screen out as many molecules with potential biological activity as possible from a database containing a large number of molecules, and the more accurate the screening method is, the easier the molecules with potential activity can be found, so that any potential active molecules can be omitted as much as possible, and the problem of false positive or false negative is solved.
The above examples are provided for convenience of description of the present invention and are not to be construed as limiting the invention in any way, and any person skilled in the art will make partial changes or modifications to the invention by using the disclosed technical content without departing from the technical features of the invention.

Claims (3)

1. A scoring method for three-dimensional similarity of molecules for virtual screening of drugs, which is characterized by comprising the following steps of: the method comprises the following steps:
step one, obtaining various characteristic parameters of two molecules for similarity comparison
Respectively reading topological structure and three-dimensional structure information of two molecules for similarity comparison, and calculating to obtain various characteristic parameters, wherein the characteristic parameters comprise: the atomic number difference F1 of two molecules; a number of rotatable chemical bonds of two molecules F2; volume difference F3 of two molecules; shape similarity F4 of two molecules; similarity F5 of two hydrogen bond acceptors; similarity F6 of two molecular hydrogen bond donors; similarity F7 of two aromatic rings; hydrophobic center similarity F8 of two molecules; positively charged group similarity F9 of two molecules; and a negative group similarity F10 of two molecules; wherein:
f1 is calculated by reading in the respective topological structure information of two molecules, and then taking the absolute value of the difference value of the total number of atoms of the two molecules;
the calculation mode of F2 is based on the calculation mode of F1, judging whether each chemical bond is a rotatable bond, obtaining the total number of the rotatable bonds of two molecules, and then taking the absolute value of the difference value of the total number of the rotatable bonds of two molecules;
f3 is calculated by obtaining Van der Waals radius of atoms according to the types of atoms in two molecules based on F1 calculation, wherein each atom is represented by a Gaussian sphere, the radius of the Gaussian sphere is the same as the Van der Waals radius of the atom, and the position coordinates of the Gaussian sphere are the same as the atomsThe coordinates of (2) are the same, and the atomic coordinates are from the input molecular three-dimensional structure; calculating the superposition volume of Gaussian balls in each molecule, wherein the ij-th Gaussian ball group comprises Gaussian balls corresponding to the ith atom and Gaussian balls corresponding to the jth atom, and the superposition volume of the ij-th Gaussian ball group is v ij The method comprises the steps of carrying out a first treatment on the surface of the Calculating the superposition volume of each of the two molecules asN is the total number of atoms in the molecule; then taking the absolute value of the difference value of the superposition volumes of the two molecules;
the calculation mode of F4 is based on the calculation mode of F3, and the intermolecular superposition volume of two molecules under the condition of multiple superposition is calculatedWherein v is ij For the overlapping volume of the ith atom in the first molecule and the jth atom in the second molecule, N is the total number of atoms in the first molecule, M is the total number of atoms in the second molecule, the maximum of which is selectedAs the maximum intermolecular volume; calculate the shape similarity of two molecules +.>Wherein V is A Is the self-overlapping volume of the first molecule, V B Is the self-overlapping volume of the second molecule;
the calculation mode of F5 is based on the calculation mode of F1, and the positions of hydrogen bond acceptors in two molecules are found out; calculation of the overlap volume of the respective Hydrogen bond acceptors in the two moleculesWherein F is ij Is the superposition volume between the ith hydrogen bond acceptor and the jth hydrogen bond acceptor; calculating the superposition volume of intermolecular hydrogen bond acceptor under multiple superposition conditions>Wherein F is ij For the overlapping volume of the ith hydrogen bond acceptor in the first molecule and the jth hydrogen bond acceptor in the second molecule, N is the total number of hydrogen bond acceptors in the first molecule, M is the total number of hydrogen bond acceptors in the second molecule, the maximum value of which is selected->Superimposed volumes as the largest intermolecular hydrogen bond acceptors; calculation of Hydrogen bond acceptor similarity of two molecules +.>Wherein P is A Is the self-overlapping volume of the hydrogen bond acceptor in the first molecule, P B Is the self-overlapping volume of the hydrogen bond acceptor in the second molecule;
the calculation mode of F6 is the same as that of F5, and only the hydrogen bond acceptors in two molecules are replaced by hydrogen bond donors;
the calculation mode of F7 is the same as that of F5, and only the hydrogen bond acceptor is replaced by an aromatic ring;
the calculation mode of F8 is the same as that of F5, and only the hydrogen bond acceptor is replaced by a hydrophobic center;
the calculation mode of F9 is the same as that of F5, and only the hydrogen bond acceptor is replaced by a positive group;
the calculation mode of F10 is the same as that of F5, and only the hydrogen bond acceptor is replaced by a negative group;
training deep learning model
The DUD-E data set is adopted, 102 biological target point information is arranged in the data set, each target point is provided with a corresponding active component set and a corresponding Decoy component set, and the data of each target point is processed as follows:
selecting crystal structure molecules in the active molecule set of each target point as template molecules, and calculating F1-F10 characteristic parameters of every two molecules in the template molecules and the other molecules according to the calculation mode in the step one, so that each target point is calculated to obtain a set of characteristic parameter data;
modeling by using a deep learning method, taking the characteristic parameter data of each target obtained by calculation as input data, and taking whether the activity of molecules is taken as an objective function of two classifications, wherein the direction of model optimization is to minimize the error of the prediction of the activity of all the targets in the molecules, so that the average value of AUC values is maximized; after training is completed, a final deep learning model is obtained;
step three, external verification of deep learning model
Verifying generalization capability of a deep learning model by adopting a MUV data set, selecting 10 biological target point information in the MUV data set, wherein each target point has a corresponding active component set and a corresponding Decoy component set; selecting crystal structure molecules in the active molecule set of each target point as template molecules, and calculating F1-F10 characteristic parameters of every two molecules in the template molecules and other molecules according to the calculation mode in the first step with other molecules in the active molecule set of the target point and all molecules in the Decoy molecule set respectively; and inputting the characteristic parameters into a trained deep learning model, and calculating to obtain the AUC value of virtual screening of each target point.
2. A method for scoring three-dimensional similarity of molecules for virtual drug screening according to claim 1, wherein: in the first step, the three-dimensional structure information includes the total number of atoms in the molecule, the total number of chemical bonds, the type of each atom and the coordinate value thereof.
3. A method for scoring three-dimensional similarity of molecules for virtual drug screening according to claim 1, wherein: in the second step, a 5-time cross validation mode is adopted to calculate the AUC value of the virtual screening for each target point.
CN202110829201.2A 2021-07-22 2021-07-22 Scoring method for three-dimensional similarity of molecules for virtual screening of drugs Active CN113450881B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110829201.2A CN113450881B (en) 2021-07-22 2021-07-22 Scoring method for three-dimensional similarity of molecules for virtual screening of drugs

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110829201.2A CN113450881B (en) 2021-07-22 2021-07-22 Scoring method for three-dimensional similarity of molecules for virtual screening of drugs

Publications (2)

Publication Number Publication Date
CN113450881A CN113450881A (en) 2021-09-28
CN113450881B true CN113450881B (en) 2024-02-13

Family

ID=77816998

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110829201.2A Active CN113450881B (en) 2021-07-22 2021-07-22 Scoring method for three-dimensional similarity of molecules for virtual screening of drugs

Country Status (1)

Country Link
CN (1) CN113450881B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115579050B (en) * 2022-12-08 2023-03-14 香港中文大学(深圳) Method and system for searching key original set in biomolecule functional dynamics

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106446607A (en) * 2016-09-26 2017-02-22 华东师范大学 Drug target virtual screening method based on interactive fingerprints and machine learning
CN110444250A (en) * 2019-03-26 2019-11-12 广东省微生物研究所(广东省微生物分析检测中心) High-throughput drug virtual screening system based on molecular fingerprint and deep learning
CN112086146A (en) * 2020-08-24 2020-12-15 南京邮电大学 Small molecule drug virtual screening method and device based on deep parameter transfer learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106446607A (en) * 2016-09-26 2017-02-22 华东师范大学 Drug target virtual screening method based on interactive fingerprints and machine learning
CN110444250A (en) * 2019-03-26 2019-11-12 广东省微生物研究所(广东省微生物分析检测中心) High-throughput drug virtual screening system based on molecular fingerprint and deep learning
CN112086146A (en) * 2020-08-24 2020-12-15 南京邮电大学 Small molecule drug virtual screening method and device based on deep parameter transfer learning

Also Published As

Publication number Publication date
CN113450881A (en) 2021-09-28

Similar Documents

Publication Publication Date Title
CN109964278A (en) Pass through the system and method for the error in evaluated in parallel classifier the first classifier of output calibration
Rustam et al. Random-forest (RF) and support vector machine (SVM) implementation for analysis of gene expression data in chronic kidney disease (CKD)
CN107731309A (en) A kind of Forecasting Methodology of pharmaceutical activity and its application
Shi et al. Protein complex detection with semi-supervised learning in protein interaction networks
CN105117618B (en) It is a kind of based on the drug targets of guilt by association principle and network topology structure feature interact recognition methods
US20140006447A1 (en) Generating epigenentic cohorts through clustering of epigenetic suprisal data based on parameters
CN113450881B (en) Scoring method for three-dimensional similarity of molecules for virtual screening of drugs
CN113593655B (en) Scoring method for three-dimensional similarity of molecules for virtual screening of drugs
CN108875298B (en) Based on the matched drug screening method of molecular shape
Liu et al. Artificial intelligence in psoriasis: Where we are and where we are going
Li et al. ACP-GBDT: An improved anticancer peptide identification method with gradient boosting decision tree
Wang et al. scGMAAE: Gaussian mixture adversarial autoencoders for diversification analysis of scRNA-seq data
da Silva et al. Parallel niche pareto AlineaGA–an evolutionary multiobjective approach on multiple sequence alignment
CN116403731A (en) Missense mutation effect prediction method and system for clinical drug effect based on deep learning
CN114999566B (en) Drug repositioning method and system based on word vector characterization and attention mechanism
Tian et al. Accurate admet prediction with xgboost
Tian et al. A longitudinal feature selection method identifies relevant genes to distinguish complicated injury and uncomplicated injury over time
CN111383708A (en) Small molecule target prediction algorithm based on chemical genomics and application thereof
Yang et al. scCross: A Deep Generative Model for Unifying Single-cell Multi-omics with Seamless Integration, Cross-modal Generation, and In-silico Exploration
Hasija All about Bioinformatics: From Beginner to Expert
Chen et al. PUTransGCN: identification of piRNA–disease associations based on attention encoding graph convolutional network and positive unlabelled learning
Roberts et al. Using deep Siamese neural networks to speed up natural products research
Mrukwa et al. Finding significantly enriched cells in single-cell RNA sequencing by single-sample approaches
He et al. A new hidden Markov model for protein quality assessment using compatibility between protein sequence and structure
Rahal et al. Secondary protein structure prediction combining protein structural class, relative surface accessibility, and contact number

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240301

Address after: Room 2FB203, Block B, Building 1, No. 800 Naxian Road, Pudong New Area Free Trade Zone, Shanghai, August 2012

Patentee after: Weixin Zhiyao (Shanghai) Technology Co.,Ltd.

Country or region after: China

Address before: 100084 office b-521-b008, floor 5, building 2, courtyard 1, Nongda South Road, Haidian District, Beijing

Patentee before: Beijing Zhongda Weixin Technology Co.,Ltd.

Country or region before: China

TR01 Transfer of patent right