CN115050428B - Drug property prediction method and system based on deep learning fusion molecular graph and fingerprint - Google Patents
- Publication number
- CN115050428B (application number CN202210654644.7A)
- Authority
- CN
- China
- Prior art keywords
- molecular
- model
- drug
- fingerprint
- module
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/30—Prediction of properties of chemical compounds, compositions or mixtures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/50—Molecular design, e.g. of drugs
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/70—Machine learning, data mining or chemometrics
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention discloses a drug property prediction method and system based on deep-learning fusion of molecular graphs and fingerprints. The prediction method comprises the following steps: obtaining a data set for each drug property to be predicted; constructing a deep learning model suitable for predicting drug properties; selecting a splitting mode according to the requirements of model construction, and splitting the data set into a training set, a validation set and a test set; inputting the data set into the network model, training and updating the network parameters according to the difference between the predictions on the training set and the true values of the training set, determining the optimal network parameters according to the best result on the validation set, and evaluating on the test set; determining the optimal hyperparameter combination of the model according to a hyperparameter optimization strategy; and, for each drug property, generating a targeted optimal model for subsequent application to property prediction of new small-molecule drugs. By incorporating classical molecular fingerprint features, the invention addresses the defect that deep learning cannot effectively extract important features from small-scale data sets.
Description
Technical Field
The invention relates to the technical field of drug property prediction by deep learning, and in particular to a drug property prediction method and system based on deep-learning fusion of molecular graphs and fingerprints.
Background
Cancer is one of the major diseases that currently endanger human health and life. According to the "2020 World Cancer Report" issued by the International Agency for Research on Cancer (IARC), an agency of the World Health Organization, there were 18.1 million new cancer cases and 9.6 million cancer deaths worldwide in 2018. Global cancer statistics for 2018 show that China ranks first in the world in both cancer incidence and cancer mortality: of the 18.1 million new cases, China accounts for 3.804 million, and of the 9.6 million deaths, China accounts for 2.296 million. Cancer prevention and treatment have become important public health problems in China. Accordingly, the urgent need for cancer drug development has been emphasized in successive national major new drug development programs.
From the perspective of traditional drug molecule design, accurate prediction of molecular properties, including physicochemical properties, biological activity, and ADME/T (absorption, distribution, metabolism, excretion and toxicity) properties, is a fundamental challenge. Since the concept of computer-aided drug design was proposed, quantitative structure-activity (property) relationship (QSAR/QSPR) modeling has become one of the most widely used and well-established computational methods for molecular property prediction. It fits known data with empirical, linear or nonlinear functions to estimate the activity or properties of unfamiliar chemical structures, and the resulting models are then used to predict and design new molecules with the desired properties. QSAR/QSPR models, the precursor of artificial intelligence in today's drug development field, could not be widely applied thirty years ago because computing hardware and experimental data were lacking. With the continual accumulation of experimental data (such as chemical, biological and pharmacological data) and improvements in hardware, artificial intelligence (AI) and machine learning (ML) algorithms have produced many successful cases in drug development and are regarded as indispensable tools for building QSAR/QSPR models, helping to predict and evaluate the physicochemical, biological and ADME/T characteristics of small molecules rapidly and reliably in drug development practice.
Generally, ML-based QSAR/QSPR modeling depends heavily on a suitable molecular representation. Commonly used molecular representations fall into three main categories: molecular descriptors, molecular fingerprints and molecular graphs. Molecular descriptors and fingerprints are derived from expert domain knowledge and describe the structural, physicochemical and topological characteristics of molecules. Molecular graph representations typically appear in deep learning (DL) based methods: the atoms and bonds of a molecule are treated as nodes and edges, and the combined node and edge information is fed into a deep neural network as the input from which the model learns. Both traditional ML-based approaches and the DL-based approaches proposed in recent years have produced many successful cases in drug development, but whether graph-based DL models are superior to traditional descriptor-based ML models remains controversial. Studies report that graph-based DL models remain limited when the data set is insufficient. During development, the present invention hypothesized and verified that the information captured by graph-based and fingerprint-based molecular representations is different and complementary.
The development of deep learning in pharmaceutical research draws its advantages from data and is also limited by data. Drug development data come from many sources whose experimental conditions and standards are difficult to unify, and the data are noisy, which strongly hinders data accumulation. Over decades of traditional drug development, high-throughput screening and combinatorial chemistry have brought the data in this field to the threshold of big data; however, because of the nature of the data, after standardization artificial intelligence in the biomedical industry is more often a small-data problem. Therefore, during this data transition period in the biomedical field, the invention combines the advantages of traditional machine learning and deep learning and the complementary information each captures. Research and verification show that this first-of-its-kind strategy achieves higher prediction accuracy than existing algorithms.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a drug property prediction method and system based on deep-learning fusion of a molecular graph and fingerprints.
The object of the invention can be achieved by the following technical scheme.
The drug property prediction method based on deep-learning fusion of molecular graphs and fingerprints enables rapid property prediction for small-molecule drugs and comprises the following steps:
1) For each drug property to be predicted, obtaining a targeted, specific data set containing a large amount of small-molecule drug data;
2) Constructing a deep learning model suitable for drug property prediction, in which molecular graph features and molecular fingerprint features are handled by separate modules and fully connected layers then assemble them into a fusion neural network framework;
3) Selecting a splitting mode according to the requirements of model construction, and splitting the data set into a training set, a validation set and a test set;
4) Inputting the data set into the network model, training and updating the network parameters according to the difference between the predictions on the training set and the true values of the training set, determining the optimal network parameters according to the best result on the validation set, and evaluating on the test set;
5) Determining the optimal hyperparameter combination of the model according to a hyperparameter optimization strategy;
6) Generating a targeted optimal model for each drug property, for subsequent application to property prediction of new small-molecule drugs;
7) Providing an interpretability analysis of the generated optimal model for reference in subsequent drug design.
In step 1), obtaining the data set for training specifically includes the following:
1-1) for a drug property field that has an industry-accepted classical data set, the classical data set is used for model construction;
1-2) for a drug property field that has no industry-accepted classical data set, targeted drug activity data are collected from experimental records kept in medicinal chemistry or biology laboratories, from compound activity data provided by medicinal chemistry databases published online, or from databases obtained by other routes, and are used for model construction after preprocessing.
In step 1-2), preprocessing the obtained raw drug activity data set specifically comprises the following steps:
1-2-1) obtaining targeted raw drug activity data from the various sources;
1-2-2) checking for duplicate small drug molecules and averaging the activity data of repeated molecules;
1-2-3) removing hydrogen ions, removing salt ions (desalting), performing force-field optimization of the structure, and the like, on the small drug molecules;
1-2-4) for regression tasks, retaining the specific activity values; for classification tasks, labeling small drug molecules as negative or positive according to a specified threshold;
1-2-5) presenting the data set as the Simplified Molecular Input Line Entry System (SMILES) strings of the small drug molecules together with the corresponding target values.
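The duplicate averaging and labeling steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `preprocess` is hypothetical, the SMILES strings are assumed to be already canonical and desalted (so identical molecules have identical strings), and the rule "activity at or above the threshold is positive" is an assumed labeling direction.

```python
from collections import defaultdict
from statistics import mean


def preprocess(records, threshold=None):
    """Average duplicate molecules and build (SMILES, target) pairs.

    records   : iterable of (smiles, activity) pairs; SMILES are assumed
                to be canonical already, so duplicates compare equal.
    threshold : None for regression (keep the averaged activity value);
                a number for classification (assumed rule: activity at or
                above the threshold is labeled 1, otherwise 0).
    """
    grouped = defaultdict(list)
    for smiles, activity in records:
        grouped[smiles].append(activity)

    dataset = []
    for smiles, values in grouped.items():
        averaged = mean(values)  # step 1-2-2: average repeated molecules
        target = averaged if threshold is None else int(averaged >= threshold)
        dataset.append((smiles, target))
    return dataset


# Toy data: ethanol measured twice, benzene once (values are made up).
raw = [("CCO", 6.0), ("CCO", 8.0), ("c1ccccc1", 4.5)]
regression_set = preprocess(raw)
classification_set = preprocess(raw, threshold=6.5)
```

In practice a cheminformatics toolkit would handle canonicalization, desalting and structure optimization before this step; those operations are left out here on purpose.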
In step 2), the constructed model specifically has the following features:
2-1) the feature extraction part of the model combines two modules, one extracting molecular graph features and one extracting molecular fingerprint features; each extracts features of the small drug molecule and generates a corresponding feature vector;
2-2) the module for extracting molecular graph features adopts a graph attention network structure. A molecular graph is generated from the input SMILES string: the atoms of the molecule are mapped to nodes of the graph, the chemical bonds are mapped to edges, and physicochemical properties of the atoms and bonds are calculated and used as the initial feature vectors of the nodes and edges. The attention mechanism in the network attends to the influence between adjacent atoms, that is, the attention between adjacent atoms, and uses it to iteratively update the feature vectors of the atoms in the molecule. After the iterative updating is finished, the feature vectors of all atoms are aggregated and output as the feature vector of the molecular graph;
2-3) the module for extracting molecular fingerprint features adopts several fully connected layers. Three different types of molecular fingerprints are generated from the input SMILES string: the substructure-based MACCS fingerprint (MACCS FP), the substructure-based PubChem fingerprint (PubChem FP), and the pharmacophore-based ErG fingerprint (Pharmacophore ErG FP). The concatenation of the three fingerprints is input into the fully connected network of the module to obtain the feature vector of the molecular fingerprint;
2-4) the model concatenates the feature vectors generated by the two modules and inputs them into several fully connected layers to predict the properties of the small drug molecule and generate the final prediction result.
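The fusion described in 2-3) and 2-4) can be illustrated with a small forward pass. This is a sketch under stated assumptions: the helper names (`dense`, `mlp`, `init_layer`, `fuse_and_predict`) are hypothetical, the weights are random rather than trained, ReLU is an assumed activation, the graph feature vector is a hand-written stand-in for the output of the graph module, and the three fingerprints use toy lengths instead of their real bit lengths.

```python
import random


def dense(x, weights, bias, activate=True):
    """One fully connected layer: W x + b, optionally followed by ReLU."""
    out = []
    for row, b in zip(weights, bias):
        z = sum(w * v for w, v in zip(row, x)) + b
        out.append(max(0.0, z) if activate else z)
    return out


def mlp(x, layers, final_activation=True):
    """Apply a stack of fully connected layers in order."""
    for k, (weights, bias) in enumerate(layers):
        is_last = k == len(layers) - 1
        x = dense(x, weights, bias, activate=final_activation or not is_last)
    return x


def init_layer(rng, n_in, n_out):
    """Random weights and zero bias (untrained, for illustration only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def fuse_and_predict(graph_vec, fingerprints, fp_layers, head_layers):
    # 2-3) concatenate the three fingerprints, then encode them
    fp_input = [bit for fp in fingerprints for bit in fp]
    fp_vec = mlp(fp_input, fp_layers)
    # 2-4) concatenate graph and fingerprint vectors, then predict
    fused = list(graph_vec) + fp_vec
    return mlp(fused, head_layers, final_activation=False)[0]


rng = random.Random(0)
graph_vec = [0.2, -0.1, 0.4, 0.0]                       # stand-in graph features
maccs, pubchem, erg = [1, 0, 1], [0, 1, 1, 0], [1, 1]   # toy-length fingerprints
fp_layers = [init_layer(rng, 9, 6), init_layer(rng, 6, 4)]
head_layers = [init_layer(rng, 8, 5), init_layer(rng, 5, 1)]
prediction = fuse_and_predict(graph_vec, [maccs, pubchem, erg], fp_layers, head_layers)
```

The final layer is left without an activation so the same head can serve regression directly; a classification task would typically add a sigmoid on top, which is an assumption rather than something the text specifies.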
In step 2-2), the molecular graph feature extraction module of the model specifically performs the following steps when extracting molecular graph features:
2-2-1) calculating the physicochemical properties of each atom as the initial feature vector of the corresponding node in the molecular graph; the physicochemical properties specifically include: the atom type (carbon, nitrogen, oxygen or another type), the number of attached chemical bonds, the charge, the chirality, the number of attached hydrogen atoms, the hybridization type, the atomic mass, whether the atom is aromatic, and so on; the atom types cover atoms with an atomic number within one hundred, such as carbon, nitrogen, oxygen and fluorine.
2-2-2) calculating the attention between adjacent atoms and iteratively updating the atom representations according to the attention, with the expression:
$$e_{ij} = \mathrm{LeakyReLU}\left(a \cdot \left[W_1 h_i \,\|\, W_1 h_j\right]\right)$$
where $h_i$ and $h_j$ are the feature vectors of adjacent atoms $i$ and $j$ at the current iteration, $W_1$ is a weight matrix, and $\alpha_{ij}$ is a weight. The attention value calculated between adjacent atoms $i$ and $j$ is $e_{ij}$; the attention values $e_{ik}$ between atom $i$ and each atom $k$ among its $K$ adjacent atoms are summed in order to calculate the attention effect of atom $j$ on atom $i$. Before atom $i$ is updated, the attention values of all its neighbors are normalized to obtain $\alpha_{ij}$. The number of attention heads is $K$; the multi-head attention mechanism repeats the attention calculation several times, and the average over the heads is used to update atom $i$, giving the updated feature vector $h_i'$;
2-2-3) computing the feature vector of the molecular graph, the expression being:
$$h_{\mathrm{mol}} = \frac{1}{N} \sum_{i=1}^{N} h_i'$$
where $N$ is the total number of atoms in the molecule and $h_i'$ is the feature vector of atom $i$ after the iterative update; the feature vectors of all atoms are averaged to give the feature vector of the molecule, denoted here as $h_{\mathrm{mol}}$.
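Steps 2-2-2) and 2-2-3) can be traced on a tiny molecule. The sketch below is an illustration under assumptions, not the patent's code: the text only says the attention values are "normalized", and a softmax over neighbors is assumed here; `a` is treated as an attention weight vector; any nonlinearity after aggregation is omitted; every atom is assumed to have at least one neighbor; and all function names are hypothetical.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, h):
    return [sum(w * v for w, v in zip(row, h)) for row in W]


def attention_head(h, neighbors, W, a):
    """One attention head; returns an updated feature vector per atom."""
    Wh = [matvec(W, h_i) for h_i in h]
    updated = []
    for i, nbrs in enumerate(neighbors):
        # e_ij = LeakyReLU(a . [W h_i || W h_j]) for each neighbor j of atom i
        e = [leaky_relu(sum(x * y for x, y in zip(a, Wh[i] + Wh[j]))) for j in nbrs]
        # Normalize over the neighbors of i (softmax is an assumed choice)
        shift = max(e)
        exp_e = [math.exp(v - shift) for v in e]
        total = sum(exp_e)
        alpha = [v / total for v in exp_e]
        # Attention-weighted sum of the transformed neighbor features
        dim = len(Wh[i])
        updated.append([sum(al * Wh[j][d] for al, j in zip(alpha, nbrs)) for d in range(dim)])
    return updated


def multi_head_update(h, neighbors, heads):
    """Average the outputs of several heads, as described in step 2-2-2)."""
    outputs = [attention_head(h, neighbors, W, a) for W, a in heads]
    n_heads, dim = len(outputs), len(outputs[0][0])
    return [[sum(out[i][d] for out in outputs) / n_heads for d in range(dim)]
            for i in range(len(h))]


def readout(h):
    """Step 2-2-3): average all atom vectors into one molecule vector."""
    n = len(h)
    return [sum(h_i[d] for h_i in h) / n for d in range(len(h[0]))]


# Three-atom chain (0-1-2) with 2-dimensional toy atom features.
neighbors = [[1], [0, 2], [1]]
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
heads = [(identity, [0.0, 0.0, 0.0, 0.0]), (identity, [0.0, 0.0, 0.0, 0.0])]
h_new = multi_head_update(h, neighbors, heads)
mol_vec = readout(h_new)
```

With a zero attention vector every neighbor receives equal weight, so the middle atom becomes the plain average of its two neighbors; a trained `a` would shift that weighting toward the more informative neighbor.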
In step 3), splitting the data set required for model construction specifically includes the following:
3-1) the splitting mode and splitting ratio can be defined by the user;
3-2) the built-in splitting modes of the model are random splitting and scaffold splitting. Random splitting shuffles the data set and splits it at random. Scaffold splitting first computes the scaffolds of the small drug molecules in the data set and the number of molecules belonging to each scaffold; scaffolds with few molecules, together with their molecules, are assigned in order to the validation set and the test set until those sets contain enough molecules, and all remaining molecules are assigned to the training set. Scaffold splitting ensures that the molecular scaffolds in the training, validation and test sets do not overlap, which places higher demands on the predictive ability of the model and helps the model find drug molecules with novel scaffolds.
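The scaffold splitting rule can be sketched as follows. This is a minimal illustration with assumptions: the scaffold of each molecule is taken as a precomputed string (in practice a cheminformatics toolkit would derive it from the SMILES), the function name `scaffold_split` and its size parameters are hypothetical, and ties between equally sized scaffold groups are broken by order of first appearance.

```python
from collections import defaultdict


def scaffold_split(scaffolds, n_valid, n_test):
    """Split molecule indices so that no scaffold spans two subsets.

    scaffolds : list with one scaffold identifier per molecule (assumed
                to be precomputed).
    n_valid   : target number of molecules in the validation set.
    n_test    : target number of molecules in the test set.
    """
    groups = defaultdict(list)
    for index, scaffold in enumerate(scaffolds):
        groups[scaffold].append(index)

    # Smallest scaffold groups first, so rare scaffolds go to validation/test.
    ordered = sorted(groups.values(), key=len)

    train, valid, test = [], [], []
    for members in ordered:
        if len(valid) + len(members) <= n_valid:
            valid.extend(members)
        elif len(test) + len(members) <= n_test:
            test.extend(members)
        else:
            train.extend(members)  # everything that remains is training data
    return train, valid, test


# Ten molecules: scaffold A is common, B less so, C and D are singletons.
scaffolds = ["A"] * 6 + ["B"] * 2 + ["C", "D"]
train_idx, valid_idx, test_idx = scaffold_split(scaffolds, n_valid=1, n_test=1)
```

Because whole scaffold groups are assigned at once, the validation and test sets may end up slightly smaller than requested when no remaining group fits.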
In step 5), optimizing the hyperparameters of the model specifically includes the following steps:
5-1) six hyperparameters are built into the model: the dropout rate of the molecular graph module, the number of attention heads of the molecular graph module, the number of attention iterations of the molecular graph module, the dropout rate of the molecular fingerprint module, the feature vector dimension of the molecular fingerprint module, and the ratio of the molecular graph vector to the molecular fingerprint vector at the input of the fully connected layers of the fusion module;
5-2) performing hyperparameter optimization on the model by Bayesian optimization for 20 rounds, and selecting the group of hyperparameters with the best evaluation score on the test set.
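The shape of the 20-round search over the six hyperparameters can be sketched as below. Note the substitution: plain random sampling stands in for Bayesian optimization so that the sketch needs no external optimizer library. The hyperparameter names, the candidate values and the toy objective are all hypothetical; a real objective would train the model with the sampled settings and return its evaluation score.

```python
import random

# Hypothetical search space for the six hyperparameters named in 5-1).
SEARCH_SPACE = {
    "graph_dropout": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
    "graph_attention_heads": [2, 3, 4, 5, 6],
    "graph_attention_iterations": [2, 3, 4, 5],
    "fingerprint_dropout": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
    "fingerprint_vector_dim": [300, 400, 500, 600],
    "graph_to_fingerprint_ratio": [0.2, 0.4, 0.5, 0.6, 0.8],
}


def search(objective, space, rounds=20, seed=0):
    """Random-search stand-in: sample `rounds` settings, keep the best score."""
    rng = random.Random(seed)
    history = []
    for _ in range(rounds):
        params = {name: rng.choice(options) for name, options in space.items()}
        history.append((objective(params), params))
    best_score, best_params = max(history, key=lambda item: item[0])
    return best_params, best_score, history


def toy_objective(params):
    """Made-up score (higher is better); replace with train-and-evaluate."""
    return -abs(params["graph_dropout"] - 0.2) - abs(params["graph_to_fingerprint_ratio"] - 0.5)


best_params, best_score, history = search(toy_objective, SEARCH_SPACE)
```

A Bayesian optimizer would replace the independent `rng.choice` sampling with proposals informed by the scores already in `history`; the surrounding loop and the best-of-20 selection stay the same.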
In step 6), the prediction application of the model specifically includes the following steps:
6-1) generating an optimal prediction model for the specific drug property from the optimal hyperparameter combination selected in step 5);
6-2) when predicting a drug molecule with unknown properties, loading the corresponding optimal model and inputting the SMILES string of the molecule into the model to obtain the prediction result for that molecule;
6-3) the model supports batch prediction of large numbers of drug molecules with unknown properties, enabling rapid and efficient assessment of molecular properties.
In step 7), the explanatory analysis of the model specifically comprises the following steps:
7-1) providing two model interpretation functions, fingerprint interpretation and molecular graph interpretation, according to the optimal prediction model for the specific drug property generated in step 6) and the input requirement of the user;
7-2) when the user requests fingerprint interpretation, calculating the importance index of each fingerprint bit in the model, wherein the higher the index, the greater the role the bit plays in the model generation process, and the intramolecular information represented by the bit plays an important role in designing drug molecules for the specific drug property;
7-3) when the user requests molecular graph interpretation, calculating the attention values of the molecular graph in the model and mapping the attention values of the atoms onto the molecular graph, wherein the higher the attention value of a group of atoms, the greater the role that substructure plays in the model generation process, and the more important it is for designing drug molecules for the specific drug property.
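As an illustration of step 7-3), per-atom attention values can be rescaled before being mapped onto the molecular graph as colors, and the most attended atoms can be listed. The sketch below assumes the attention values have already been extracted from the model as one number per atom; the function names and the min-max scaling are choices made for this example and are not specified in the text.

```python
def normalize_attention(values):
    """Min-max scale per-atom attention values to [0, 1] for coloring."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All atoms equally attended: nothing to highlight
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def top_atoms(values, k=3):
    """Indices of the k atoms with the highest attention, highest first."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return order[:k]

scaled = normalize_attention([1.0, 3.0, 2.0])
highlight = top_atoms([0.1, 0.9, 0.5, 0.7], k=2)
```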
A drug property prediction system based on deep learning fusion of molecular graphs and fingerprints comprises: a data preprocessing module, used for preprocessing the collected raw chemical molecule activity data set so that the model can be applied to newly constructed drug molecule property data sets; a model construction module, used for modeling the processed samples with a deep learning model based on molecular graphs and molecular fingerprints, wherein the deep learning model comprises a molecular-graph-based feature extraction module, a molecular-fingerprint-based feature extraction module and a fusion module, the molecular-graph-based feature extraction module adopts a graph attention network and focuses on the influence of the relationships between adjacent atoms on molecular properties, the molecular-fingerprint-based feature extraction module extracts the influence of molecular structures and pharmacophores on molecular properties from three different types of molecular fingerprints, and the fusion module concatenates the feature vectors obtained by the two feature extraction modules and inputs them into a multi-layer fully connected network; a prediction module, used for predicting new drug small molecules with the optimal model generated by the model construction module, so that the model is applied to the prediction of new drug molecules; and an explanatory module, used for performing explanatory analysis on drug small molecules with the optimal model generated by the model construction module, so that the model can provide users with drug design suggestions for specific drug properties.
A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the drug property prediction method and application based on deep learning fusion of molecular graphs and fingerprints.
A computer device comprising a storage medium, a processor and a computer program stored on the storage medium and executable on the processor, wherein the processor implements the drug property prediction method and application based on deep learning fusion of molecular graphs and fingerprints when executing the program.
Compared with the prior art, the invention has the beneficial effects that:
Compared with conventional deep-learning-based methods for predicting drug small-molecule properties, the method of the invention fuses classical molecular fingerprint features and thereby overcomes the inability of deep learning to extract important features effectively from small data sets. Compared with methods based on traditional machine learning and manually extracted molecular features, the method of the invention draws on the strengths of deep learning: the computer autonomously calculates and extracts structural features from the relationships between atoms in the molecular graph, remedying the shortcomings of manual feature extraction. The invention can be used for properties for which no recognized classical data set exists: it provides a data preprocessing function that collects and processes raw molecular activity data and constructs a sample set usable for modeling. Once the targeted optimal model has been established, the method can accurately and efficiently predict the specified property of molecules, thereby effectively improving the efficiency of drug research and development and speeding up the virtual screening of small-molecule drugs.
Drawings
FIG. 1 is a flow chart of the drug property prediction method based on deep learning fusion of molecular graphs and fingerprints;
FIG. 2 is a schematic diagram of the preset network structure of the drug property prediction deep learning model fusing molecular graphs and fingerprints;
FIG. 3 is a schematic diagram of the training process of the drug property prediction deep learning model fusing molecular graphs and fingerprints;
FIG. 4 is a comparison of the prediction accuracy of the present method and of other artificial-intelligence-based drug property prediction methods;
FIG. 5 shows the results of ablation experiments in which the neural network of the present method uses only the molecular graph part or only the molecular fingerprint part;
FIG. 6 is a comparison of the accuracy of the present method and of several currently leading drug property prediction methods on the classical unbiased data set LIT-PCBA;
FIG. 7 is a comparison of the model accuracy of the present method when different molecular fingerprints are used;
FIG. 8a shows, for an application of the present method, the protein experiment results for drug molecules predicted to be positive for CDK9 inhibitory activity: their effect on the expression intensity of proteins downstream of the CDK9 pathway;
FIG. 8b is the gray-scale statistical analysis of the downstream protein p-RNAP II CTD (Ser2) in the protein experiments of drug molecules predicted to be positive for CDK9 inhibitory activity;
FIG. 8c is the gray-scale statistical analysis of the downstream protein Mcl-1 in the protein experiments of drug molecules predicted to be positive for CDK9 inhibitory activity;
FIG. 8d is the gray-scale statistical analysis of the downstream protein cleaved PARP in the protein experiments of drug molecules predicted to be positive for CDK9 inhibitory activity;
FIG. 8e shows the apoptosis experiments of drug molecules predicted to be positive for CDK9 inhibitory activity on MOLM-13 cells containing the CDK9 target;
FIG. 8f shows the results, and their quantitative analysis, of the apoptosis experiments of pyrithione as a control group on MOLM-13 cells containing the CDK9 target;
FIG. 9a shows the explanatory analysis and verification for a small molecule predicted by the present method to be negative for blood-brain barrier permeability;
FIG. 9b shows the explanatory analysis and verification for a small molecule predicted by the present method to be positive for blood-brain barrier permeability;
FIG. 10a shows the explanatory analysis by the present method of small molecule 1, which has inhibitory activity against the target of rapamycin protein (mTOR); the morpholine ring and ureido pharmacophores it contains, which are known to play a key role, are given higher attention;
FIG. 10b shows the explanatory analysis by the present method of small molecule 2, which has inhibitory activity against mTOR; the bridged morpholine ring pharmacophore it contains, which is known to play a key role, is given higher attention;
FIG. 10c shows the explanatory analysis by the present method of small molecule 3, which has inhibitory activity against mTOR; the morpholine ring pharmacophore on its pyrazolopyrimidine scaffold, which is known to play a key role, is given higher attention.
Detailed Description
The following are specific examples of applying the invention in the development team's laboratory. They illustrate how the invention is used and the logic behind it, but do not limit its scope; equivalent modifications made by those skilled in the art to the various materials selected in the application fall within the scope defined by the appended claims.
Example 1
This embodiment provides a drug property prediction method based on deep learning fusion of molecular graphs and fingerprints. Taking as an example the inhibitory activity of the small molecules to be predicted against cyclin-dependent kinase family members (CDK1-9, 14, 19), the method comprises the following steps:
1) Obtaining inhibitory activity data of a plurality of drug small molecules against cyclin-dependent kinase family members (CDK1-9, 14, 19) for constructing a data set, comprising the following steps:
1-2) Since there is no recognized, established classical data set for these targets in the industry, it is necessary to collect experimental records kept by medicinal chemistry or biology laboratories, or compound activity data provided by medicinal chemistry databases published online, and then to perform data preprocessing.
1-2-1) In this embodiment, all activity data records for each target were collected from the medicinal chemistry database ChEMBL according to the target identifiers of CDK1-9, 14, 19.
1-2-2) Data screening. Only bioactivity records with assay type B and reported activity type IC50, EC50, Ki or Kd are retained; drug small molecules with multiple activity records are deduplicated, and the activity data of duplicated molecules are averaged.
1-2-3) Data normalization. The small-molecule structures are cleaned by removing hydrogen ions and salt ions and are then optimized with a force field.
1-2-4) Data annotation. In this embodiment the task is a classification task, so the molecules must be labeled systematically according to a specified threshold. For this example the threshold is 10 μM: small molecules with a tested activity value ≤ 10 μM are labeled as inhibitors, and small molecules with a tested activity value > 10 μM are labeled as non-inhibitors.
1-2-5) Obtaining standardized data. After standardization, a total of 12532 compounds and their enzyme inhibition activity data for 11 CDK subtype proteins were obtained. For example, 1871 compounds have test records for the CDK1 target, of which 883 are labeled non-inhibitor and 988 inhibitor; the usable test data for the CDK2 target contain 4305 compounds, of which 1598 are labeled non-inhibitor and 2707 inhibitor; and the activity test data for the CDK9 target contain 1330 compounds, of which 243 are labeled non-inhibitor and 1087 inhibitor. The resulting standardized data set consists of 12532 pairs of drug small-molecule SMILES strings and the corresponding CDK subtype protein inhibitory activity labels.
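The screening and annotation in steps 1-2-2) and 1-2-4) can be sketched as follows. This is only an illustration under stated assumptions: the record field names (`smiles`, `assay_type`, `activity_type`, `value_um`) are hypothetical, values are assumed to be already converted to μM, and duplicates are combined with a plain arithmetic mean, which the text does not specify.

```python
from collections import defaultdict

KEPT_ACTIVITY_TYPES = {"IC50", "EC50", "Ki", "Kd"}
THRESHOLD_UM = 10.0  # labeling threshold from step 1-2-4)

def build_labeled_set(records):
    """Filter records, average duplicates per molecule, label at 10 uM."""
    values = defaultdict(list)
    for rec in records:
        if rec["assay_type"] != "B":
            continue  # keep only assay type B
        if rec["activity_type"] not in KEPT_ACTIVITY_TYPES:
            continue  # keep only IC50, EC50, Ki, Kd
        values[rec["smiles"]].append(rec["value_um"])
    labeled = {}
    for smiles, vals in values.items():
        mean_value = sum(vals) / len(vals)  # arithmetic mean (assumption)
        labeled[smiles] = 1 if mean_value <= THRESHOLD_UM else 0  # 1 = inhibitor
    return labeled

records = [
    {"smiles": "CCO", "assay_type": "B", "activity_type": "IC50", "value_um": 2.0},
    {"smiles": "CCO", "assay_type": "B", "activity_type": "Ki", "value_um": 30.0},
    {"smiles": "CCN", "assay_type": "B", "activity_type": "IC50", "value_um": 10.0},
    {"smiles": "CCC", "assay_type": "F", "activity_type": "IC50", "value_um": 1.0},
]
labeled = build_labeled_set(records)
```

The toy SMILES strings are placeholders and carry no activity meaning.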
2) Constructing a deep learning model suitable for drug property prediction. The model processes two kinds of features, the molecular graph and the molecular fingerprint, in separate modules, and finally assembles them into a fused neural network architecture using fully connected layers. Specifically, in this embodiment the molecular-graph-based feature extraction module adopts a deep learning network based on the graph attention mechanism; in the molecular-fingerprint-based feature extraction module, the concatenated bit strings of three fingerprints, MACCS FP, PubChem FP and Pharmacophore ErG FP, are used as the molecular fingerprint input. The output vectors of the two feature extraction modules are concatenated and input into a fusion module consisting of several fully connected layers, which predicts the CDK subtype protein inhibitory activity of the small-molecule drug.
3) For data set construction, random splitting is selected, and the data are randomly divided into a training set, a validation set and a test set in a ratio of 8:1:1.
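A seeded 8:1:1 random split of this kind could look like the sketch below. The function name `random_split` and the use of Python's standard `random` module are choices made for illustration; the text does not say how the shuffle is implemented.

```python
import random

def random_split(n_items, seed, fractions=(0.8, 0.1, 0.1)):
    """Shuffle indices with a fixed seed and cut them into train/valid/test."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible
    n_train = int(round(fractions[0] * n_items))
    n_valid = int(round(fractions[1] * n_items))
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]    # remainder goes to the test set
    return train, valid, test

train, valid, test = random_split(100, seed=0)
```

Passing a different seed yields a different split of the same data, which is what the seed-averaged evaluation in the hyperparameter search relies on.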
4) Inputting the split data set into the network model, training and updating the network parameters according to the difference between the predictions on the training set and the true values of the training set, determining the optimal network parameters according to the best result on the validation set, and then evaluating on the test set.
Specifically, in this embodiment, when predicting the CDK subtype protein inhibitory activity of a drug small molecule, a binary cross-entropy (BCE) loss preceded by a sigmoid function is used to compute the loss between the prediction and the true value; gradients are then computed by backpropagation, and the network parameters are updated with the Adam optimizer. Training runs for 20 epochs, and, using ROC-AUC as the evaluation metric, the network parameters of the epoch that performs best on the validation set are selected as the final model.
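The loss and the model selection rule named above can be written out in a few lines. The sketch below is framework-free and purely illustrative: it shows a numerically stable BCE on a raw (pre-sigmoid) score, a pairwise definition of ROC-AUC, and the choice of the epoch with the best validation ROC-AUC. The function names are hypothetical, and a real training loop would use a deep learning framework's built-in loss and the Adam optimizer.

```python
import math

def bce_with_logits(logit, target):
    """Stable form of -[t*log(s) + (1-t)*log(1-s)] with s = sigmoid(logit)."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

def roc_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def best_epoch(valid_aucs):
    """Index of the epoch with the highest validation ROC-AUC."""
    return max(range(len(valid_aucs)), key=valid_aucs.__getitem__)

loss_at_zero = bce_with_logits(0.0, 1)
auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
chosen = best_epoch([0.70, 0.90, 0.85])
```

The pairwise ROC-AUC is quadratic in the number of samples and is meant only to make the definition explicit.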
5) Determining the optimal hyperparameter combination of the model according to the hyperparameter optimization strategy.
Specifically, in this embodiment, six hyperparameters and their search ranges are set for predicting the CDK subtype protein inhibitory activity of drug small molecules: the dropout rate of the molecular graph module ([0, 0.05, …, 0.6]), the attention number of the molecular graph module ([2, 3, …, 8]), the number of attention iterations of the molecular graph module ([40, 45, …, 80]), the dropout rate of the molecular fingerprint module ([0, 0.05, …, 0.6]), the feature vector dimension of the molecular fingerprint module ([300, 350, …, 600]), and the ratio of the molecular graph vector to the molecular fingerprint vector at the input of the fully connected layers of the fusion module ([0, 0.1, …, 1]).
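The six ranges above can be expanded into explicit grids, which also shows how large the search space is. The dictionary keys below are hypothetical names chosen for this sketch; only the ranges themselves come from the text.

```python
def grid(start, stop, step, ndigits=2):
    """Inclusive arithmetic grid, rounded to avoid floating-point drift."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, ndigits) for i in range(count)]

# Key names are illustrative; the ranges follow the embodiment
SEARCH_SPACE = {
    "graph_dropout": grid(0.0, 0.6, 0.05),
    "graph_attention_number": grid(2, 8, 1),
    "graph_attention_iterations": grid(40, 80, 5),
    "fingerprint_dropout": grid(0.0, 0.6, 0.05),
    "fingerprint_feature_dim": grid(300, 600, 50),
    "graph_to_fingerprint_ratio": grid(0.0, 1.0, 0.1),
}

n_combinations = 1
for values in SEARCH_SPACE.values():
    n_combinations *= len(values)
```

With these grids there are 819,819 possible combinations, which is why a guided search is preferable to exhaustive enumeration.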
To find a good hyperparameter combination as efficiently and accurately as possible, this embodiment adopts a Bayesian optimization strategy to explore combinations of the six hyperparameters within their ranges. From the hyperparameter combinations already tried and their results, the Bayesian optimization strategy computes the posterior distribution by Gaussian process regression, obtains the expected mean and variance at each possible value of the six hyperparameters, and decides which combination of values to evaluate in the next optimization step. Because the CDK subtype protein inhibitory activity data sets contain relatively few molecules and differ in chemical distribution, the influence of random splitting of the sample set is reduced as follows: for each hyperparameter combination, ten random seeds are used to produce ten splits of the data set, and the mean of the ten training results is used as the evaluation value of that optimization step. In this embodiment, Bayesian optimization is performed for 15 steps in total, and the hyperparameter combination with the best evaluation metric on the validation set is selected as the final combination.
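The seed-averaged evaluation used as the optimization objective can be sketched as below. This does not implement Bayesian optimization itself; it only shows the objective an optimizer would call. `toy_train_and_score` is a stand-in invented for this example, where a real run would split the data with the given seed, train the model and return the validation ROC-AUC.

```python
from statistics import mean

def seed_averaged_score(train_and_score, params, seeds=range(10)):
    """Average a validation metric over several random data splits."""
    return mean(train_and_score(params, seed) for seed in seeds)

def toy_train_and_score(params, seed):
    # Stand-in only: pretends dropout 0.2 is best and adds a small seed effect
    return 0.80 - abs(params["graph_dropout"] - 0.2) + 0.001 * seed

candidates = [{"graph_dropout": d} for d in (0.0, 0.2, 0.4)]
scores = [seed_averaged_score(toy_train_and_score, p) for p in candidates]
best = candidates[scores.index(max(scores))]
```

Averaging over ten splits makes each evaluation ten times more expensive, but it keeps a single lucky or unlucky split from steering the search.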
6) For each drug property to be predicted, generating a targeted optimal model for subsequent application to the property prediction of new small-molecule drugs.
Specifically, in this example, a total of 11 optimal models for CDK1-9, 14, 19 were constructed from the data set of small-molecule inhibitory activity against cyclin-dependent kinase family members (CDK1-9, 14, 19), and are provided to users for predicting the properties of new drug small molecules targeting the CDK family.
In this embodiment, the prediction application is performed using the 11 optimal models for the CDK family, specifically comprising the following steps:
6-1) Selecting an existing compound library, or a set of compounds, for which predicted target values are desired.
In this example the SPECS compound library (containing 208670 compounds, https://www.specs.net/) was selected for mining CDK9 inhibitors. A CDK9 inhibitor screening library (about 194916 compounds) was created by subjecting the SPECS library to the same standardization protocol as in step S103 and filtering by Lipinski's rule of five.
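For reference, Lipinski's rule of five checks four descriptors: molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors and at most 10 hydrogen-bond acceptors. The sketch below assumes these descriptors have already been computed with a cheminformatics toolkit; the function names and the `max_violations` parameter are illustrative, and the text does not state how many violations, if any, were tolerated.

```python
def lipinski_violations(mol_weight, logp, h_donors, h_acceptors):
    """Count how many of the four rule-of-five criteria are violated."""
    return sum([
        mol_weight > 500,   # molecular weight in Da
        logp > 5,           # octanol-water partition coefficient
        h_donors > 5,       # hydrogen-bond donors
        h_acceptors > 10,   # hydrogen-bond acceptors
    ])

def passes_rule_of_five(desc, max_violations=0):
    """True if the descriptor dict has at most max_violations violations."""
    return lipinski_violations(**desc) <= max_violations

ok = {"mol_weight": 350.0, "logp": 3.2, "h_donors": 2, "h_acceptors": 5}
bad = {"mol_weight": 620.0, "logp": 6.1, "h_donors": 1, "h_acceptors": 4}
kept = [d for d in (ok, bad) if passes_rule_of_five(d)]
```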
6-2) Inputting the compound library into the constructed optimal deep learning prediction model for CDK9. The SMILES string of each molecule in the compound library is input into the optimal model for CDK9 inhibitory activity prediction built with the drug property prediction algorithm based on deep learning fusion of molecular graphs and fingerprints; after computation through the network nodes, the model outputs the degree to which the molecule inhibits CDK9 kinase, and the closer the output value is to 1, the stronger the predicted inhibition of CDK9 kinase.
6-3) Ranking the compound library from high to low by the degree of CDK9 kinase inhibition calculated by the CDK9 optimal model, and selecting the top 1000 of the 194916 compounds for further analysis. Finally, 19 compounds were selected through molecular docking software and visual analysis of ligand-protein interactions and were purchased for biological experimental verification. In the biological experiments, the cell-level results show that 6 of the 19 compounds have significant anticancer activity at the cancer cell level, and the in vitro CDK9 kinase inhibition test results show that 5 compounds have significant inhibitory activity against the target.
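The ranking in step 6-3) amounts to a top-k selection over the predicted scores. A minimal sketch, assuming the predictions are available as (SMILES, score) pairs; the function name `top_k` and the toy scores are invented for illustration.

```python
import heapq

def top_k(scored, k=1000):
    """Return the k highest-scoring (smiles, score) pairs, best first."""
    return heapq.nlargest(k, scored, key=lambda item: item[1])

# Placeholder molecules and scores, not real predictions
scored = [("CCO", 0.20), ("CCN", 0.95), ("CCC", 0.70)]
shortlist = top_k(scored, k=2)
```

For a library of about 195,000 compounds a full sort would work equally well; `heapq.nlargest` simply avoids sorting entries that will not be kept.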
This embodiment shows that the drug property predictions made by the prediction model based on big data and a deep learning neural network are correct and agree with the experimental results. The drug property prediction algorithm based on deep learning fusion of molecular graphs and fingerprints provided by the invention is advanced and practical, and offers medicinal chemists and practitioners in related fields rapid and efficient prediction-based screening of drug molecular properties.
Corresponding to this embodiment, the invention also provides a computer device.
The computer device of this embodiment comprises a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor; when executing the program, the processor implements the drug property prediction method and application based on deep learning fusion of molecular graphs and fingerprints described in the embodiment. When running the computer program, the computer device of this example preprocesses the acquired data set of small-molecule inhibitory activity against cyclin-dependent kinase family members (CDK1-9, 14, 19), generates a total of 11 optimal models for the different kinases, and uses the optimal models to predict the properties of new drug small molecules. The computer device of this embodiment can rapidly and efficiently predict inhibitory activity against the 11 CDK kinase family members, improving efficiency in the research and development of drugs related to CDK kinase family members and speeding up virtual screening.
Corresponding to the above embodiment, the invention also provides a non-transitory computer-readable storage medium.
The non-transitory computer-readable storage medium of this embodiment stores a computer program which, when executed by a processor, performs the drug property prediction method based on deep learning fusion of molecular graphs and fingerprints. The storage medium of this example contains the acquired data set of small-molecule inhibitory activity against cyclin-dependent kinase family members (CDK1-9, 14, 19), the preprocessed sample set, and the 11 optimal models for the different kinases generated from the sample set. With this storage medium, the optimal models generated in this embodiment can be used directly, saving model generation time, so that a user can rapidly and efficiently predict inhibitory activity against the 11 CDK kinase family members; this improves efficiency in the research and development of drugs related to CDK kinase family members and speeds up virtual screening.
Example 2
This example provides a drug property prediction method based on deep learning fusion of molecular graphs and molecular fingerprints, together with an explanatory analysis of the molecular graph. Taking as an example the blood-brain barrier permeability of the small molecules to be predicted, the method comprises the following steps:
1) A data set of blood-brain barrier permeability activity for a large number of drug small molecules was obtained. The data set comes from a medicinal chemistry data set published online and had already been preprocessed, giving 2039 compounds in total, of which 1560 are positive and 479 are negative; the negative compounds account for 23.49% of the data set. The standardized data set consists of 2039 drug small-molecule SMILES strings and their corresponding positive or negative blood-brain barrier permeability labels.
2) Constructing a deep learning model suitable for drug property prediction. The model processes two kinds of features, the molecular graph and the molecular fingerprint, in separate modules, and finally assembles them into a fused neural network architecture using fully connected layers. Specifically, in this embodiment the molecular-graph-based feature extraction module adopts a deep learning network based on the graph attention mechanism; in the molecular-fingerprint-based feature extraction module, the concatenated bit strings of three fingerprints, MACCS FP, PubChem FP and Pharmacophore ErG FP, are used as the molecular fingerprint input. The output vectors of the two feature extraction modules are concatenated and input into a fusion module consisting of several fully connected layers, which predicts the blood-brain barrier permeability activity value of the small-molecule drug.
3) For data set construction, random splitting is selected, and the data are randomly divided into a training set, a validation set and a test set in a ratio of 8:1:1.
4) Inputting the split data set into the network model, training and updating the network parameters according to the difference between the predictions on the training set and the true values of the training set, determining the optimal network parameters according to the best result on the validation set, and then evaluating on the test set.
Specifically, in this embodiment, when predicting the blood-brain barrier permeability activity of a drug small molecule, a binary cross-entropy (BCE) loss preceded by a sigmoid function is used to compute the loss between the prediction and the true value; gradients are then computed by backpropagation, and the network parameters are updated with the Adam optimizer. Training runs for 20 epochs, and, using ROC-AUC as the evaluation metric, the network parameters of the epoch that performs best on the validation set are selected as the final model.
5) Determining the optimal hyperparameter combination of the model according to the hyperparameter optimization strategy.
In this embodiment, six hyperparameters and their search ranges are set for predicting the blood-brain barrier permeability activity of drug small molecules: the dropout rate of the molecular graph module ([0, 0.05, …, 0.6]), the attention number of the molecular graph module ([2, 3, …, 8]), the number of attention iterations of the molecular graph module ([40, 45, …, 80]), the dropout rate of the molecular fingerprint module ([0, 0.05, …, 0.6]), the feature vector dimension of the molecular fingerprint module ([300, 350, …, 600]), and the ratio of the molecular graph vector to the molecular fingerprint vector at the input of the fully connected layers of the fusion module ([0, 0.1, …, 1]).
To find a good hyperparameter combination as efficiently and accurately as possible, this embodiment adopts a Bayesian optimization strategy to explore combinations of the six hyperparameters within their ranges. From the hyperparameter combinations already tried and their results, the Bayesian optimization strategy computes the posterior distribution by Gaussian process regression, obtains the expected mean and variance at each possible value of the six hyperparameters, and decides which combination of values to evaluate in the next optimization step. Because the blood-brain barrier permeability data set contains relatively few molecules and data sets differ in chemical distribution, the influence of random splitting of the sample set is reduced as follows: for each hyperparameter combination, ten random seeds are used to produce ten splits of the data set, and the mean of the ten training results is used as the evaluation value of that optimization step. In this embodiment, Bayesian optimization is performed for 20 steps in total, and the hyperparameter combination with the best evaluation metric on the validation set is selected as the final combination.
6) For each drug property to be predicted, generating a targeted optimal model for subsequent application to the property prediction and explanatory analysis of new small-molecule drugs.
Specifically, in this embodiment, one optimal model for blood-brain barrier permeability is constructed from the small-molecule blood-brain barrier permeability activity data set and is provided to users for predicting the blood-brain barrier permeability of new drug small molecules.
7) In this embodiment, explanatory analysis is performed using the optimal model for blood-brain barrier permeability, specifically comprising the following steps:
7-1) generating the set of molecules to be analyzed, for which 2 drug small molecules are selected;
7-2) loading the previously generated optimal model for blood-brain barrier permeability, and performing property prediction and molecular graph module explanatory analysis on the drug small molecules;
7-3) generating the property predictions and the molecular graph module analysis results (see FIG. 9a and FIG. 9b);
7-4) The SMILES string of molecule 1 is: [ C@H ]1CN (C [ C @ H ] (C) N1) C2C (F) C (N) C3C (=o) C (=cn (C4 CC 4) C3C 2F) C (O) =o. The property prediction result is 0.134, so the molecule is predicted to be negative, which is consistent with its actual blood-brain barrier permeability. The molecular graph analysis is shown in FIG. 9a. A higher calculated attention value means that the model pays more attention to that part of the structure during prediction. The circled part has a relatively large calculated attention value, i.e. the model considers this substructure to play an important role in the molecule's inability to penetrate the blood-brain barrier. The substructure in the square frame is colored darker than that in the round frame, i.e. it plays an even more important role in the molecule's inability to penetrate the blood-brain barrier. The blood-brain barrier is a membrane structure, and whether a molecule can penetrate it is related to the lipophilicity and polarity of the molecule; for negative molecules, the greater the polarity and the lower the ClogP value, the less able the molecule is to penetrate the blood-brain barrier. The molecule was analyzed with the software ChemBioDraw: the calculated ClogP value of the square region is -0.905 and that of the round region is 0.934. Comparing the two regions, the square region has the lower value and the greater polarity and therefore contributes strongly to the molecule's inability to penetrate the blood-brain barrier; the model's high attention to the square region is consistent with its overall judgment that the molecule cannot penetrate the blood-brain barrier.
The SMILES string of molecule 2 is: c1CCN (CC 1) CC1cccc (C1) OCCCNC (=o) C. The property prediction result is 0.988, so the molecule is predicted to be positive, which is consistent with its actual blood-brain barrier permeability. The molecular graph analysis is shown in FIG. 9b. The circled part has a relatively large calculated attention value, meaning that the model pays more attention to this part of the structure during prediction, i.e. the model considers this substructure to play an important role in the molecule's ability to penetrate the blood-brain barrier. The substructure in the square frame is colored darker than that in the round frame, i.e. it plays an even more important role in enabling the molecule to penetrate the blood-brain barrier. For positive molecules, the lower the polarity and the higher the ClogP value, the more able the molecule is to penetrate the blood-brain barrier. Quantitative analysis with the software gives a ClogP value of 2.142 for the square region and 1.389 for the round region. Comparing the two regions, the square region has the higher value and the lower polarity and therefore contributes strongly to the molecule's ability to penetrate the blood-brain barrier; the model's high attention to the square region is consistent with its overall judgment that the molecule can penetrate the blood-brain barrier. Comparing the two molecules, the lower-attention round region of positive molecule 2 has a much higher ClogP value than the lower-attention round region of negative molecule 1, which indirectly indicates that molecule 2 is less polar overall and more likely to cross the blood-brain barrier.
The model's predictions and explanatory analyses for the two molecules are consistent with the actual situation, showing that the optimal model constructed by the algorithm can make correct property predictions and reasonable explanatory analyses of molecules, providing strong support for chemists designing drug molecules.
Corresponding to the embodiment, the invention also provides a computer device.
The computer device of this embodiment includes a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor; when executing the program, the processor implements the drug property prediction method based on deep learning fusion of a molecular graph and fingerprints, and its application, as described in the embodiments. When the computer device of this embodiment runs the computer program, the acquired data set of small-molecule blood-brain barrier permeability activity is used to generate one optimal model, which can then be used for prediction and explanatory analysis of the properties of new drug small molecules. The computer device of this embodiment can predict blood-brain barrier permeability quickly and efficiently, improving research and development efficiency for drugs that need to penetrate the blood-brain barrier and accelerating virtual screening.
The present invention also proposes a non-transitory computer-readable storage medium corresponding to the above-described embodiments.
The non-transitory computer-readable storage medium of this embodiment stores a computer program which, when executed by a processor, performs the drug property prediction method based on deep learning fusion of a molecular graph and fingerprints. The storage medium contains the acquired data set of small-molecule blood-brain barrier permeability activity, and one optimal model is generated from that data set. With this storage medium, the optimal model generated in this embodiment can be used directly, which saves model generation time. A user can quickly and efficiently predict blood-brain barrier permeability and analyze the molecular structure of small molecules able to penetrate the blood-brain barrier, which improves research and development efficiency for drugs that need to penetrate the barrier and accelerates virtual screening.
Example 3
This embodiment provides a drug property prediction method based on deep learning fusion of a molecular graph and fingerprints. Taking the prediction of the inhibitory activity of small molecules against the mammalian target of rapamycin (mTOR) as an example, the method comprises the following steps:
1) Obtain data on the inhibitory activity of a plurality of drug small molecules against mTOR for constructing a data set. The data set is constructed by the following steps:
1-2) Since there is no industry-accepted classical data set for this target, activity data must be collected from experimental records preserved in medicinal chemistry or biology laboratories, or from compound activity data provided by publicly available medicinal chemistry databases, and then preprocessed.
1-2-1) In this embodiment, protein-level information is obtained from the protein database UniProt, and all activity records for the mTOR kinase are collected from the medicinal chemistry database ChEMBL according to the kinase's UniProt ID.
1-2-2) Data screening. Only bioactivity records with assay type B (binding) and reported activity type IC50, EC50, Ki or Kd are retained. Drug small molecules with multiple activity records are deduplicated, and the activity data of repeated molecules are averaged.
1-2-3) Data standardization. The small-molecule structures are cleaned by removing hydrogen ions and salt ions, and are then optimized with a force field.
1-2-4) Data annotation. In this embodiment the task is a classification task, so the molecules must be labeled systematically according to a specified threshold. For this example the threshold is 1 μM: small molecules with measured activity less than or equal to 1 μM are labeled as inhibitors, and small molecules with measured activity greater than 1 μM are labeled as non-inhibitors.
1-2-5) Obtaining standardized data. After standardization, a total of 4104 compounds with their mTOR kinase inhibitory activity data were obtained, of which 565 compounds were labeled as non-inhibitors and 3539 as inhibitors, so positive samples account for 86.23% of the data set. The resulting standardized data set consists of 4104 pairs, each comprising the SMILES format of a drug small molecule and its corresponding mTOR kinase inhibitory activity label.
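The deduplication, averaging and threshold labeling of steps 1-2-2) and 1-2-4) can be illustrated with a minimal sketch. The input shape (a list of `(smiles, activity_in_uM)` pairs), the function name `build_dataset` and the choice to average raw concentrations rather than log-transformed values are assumptions made for illustration; the source does not specify them. A real pipeline would also canonicalize the SMILES strings before grouping.

```python
from collections import defaultdict
from statistics import mean

THRESHOLD_UM = 1.0  # labeling threshold from step 1-2-4), in micromolar


def build_dataset(records, threshold_um=THRESHOLD_UM):
    """Deduplicate by SMILES, average repeated activities, then label.

    `records` is an iterable of (smiles, activity_in_uM) pairs; this input
    shape is an assumption of the sketch, not taken from the source.
    """
    grouped = defaultdict(list)
    for smiles, activity_um in records:
        grouped[smiles].append(activity_um)

    dataset = []
    for smiles, values in grouped.items():
        avg = mean(values)                       # average repeated measurements
        label = 1 if avg <= threshold_um else 0  # 1 = inhibitor, 0 = non-inhibitor
        dataset.append((smiles, avg, label))
    return dataset


# Toy records (made-up values, for illustration only).
toy = [("CCO", 0.5), ("CCO", 2.5), ("c1ccccc1", 0.2), ("CCN", 1.0)]
for row in build_dataset(toy):
    print(row)
```

In this toy input the two records for the same SMILES are merged into one averaged entry, and a molecule sitting exactly on the threshold is labeled as an inhibitor, matching the "less than or equal to 1 μM" rule.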
2) Construct a deep learning model suitable for drug property prediction. The model processes two kinds of features, the molecular graph and the molecular fingerprint, in separate modules, and finally assembles them into a fusion neural network framework using fully connected layers. Specifically, in this embodiment, the molecular graph feature extraction module adopts a deep learning network model based on a graph attention mechanism; in the molecular fingerprint feature extraction module, the concatenated bit strings of three fingerprints, MACCS FP, PubChem FP and Pharmacophore ErG FP, are used as the input molecular fingerprint representation. The output vectors of the two feature extraction modules are concatenated and input into a fusion module consisting of a plurality of fully connected layers, which predicts the mTOR kinase inhibitory activity of the drug small molecule.
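The data flow through the fingerprint and fusion modules can be sketched in plain Python as follows. This is only an illustration of the concatenate-then-fully-connected pattern described above: the layer sizes, the ReLU activations, the random weights and all function names are assumptions, the fingerprint bit vectors are toy-sized, and the graph vector is taken as given rather than computed by a graph attention network.

```python
import math
import random


def concat(*vectors):
    """Concatenate vectors end to end (the 'series connection' in the text)."""
    out = []
    for v in vectors:
        out.extend(v)
    return out


def dense(x, weights, bias):
    """One fully connected layer without activation."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, x) for x in v]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(rng, n_in, n_out):
    """Random small weights and zero biases (untrained, illustration only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def predict(graph_vec, maccs, pubchem, erg, params):
    fp_bits = concat(maccs, pubchem, erg)             # three fingerprints in series
    fp_vec = relu(dense(fp_bits, *params["fp"]))      # fingerprint module
    fused = concat(graph_vec, fp_vec)                 # graph + fingerprint vectors
    hidden = relu(dense(fused, *params["fusion_hidden"]))
    logit = dense(hidden, *params["fusion_out"])[0]   # single output unit
    return sigmoid(logit)                             # probability-like score


rng = random.Random(0)
GRAPH_DIM, FP_BITS, FP_DIM, HIDDEN = 4, 3 + 5 + 2, 4, 6  # toy sizes (assumed)
params = {
    "fp": init_layer(rng, FP_BITS, FP_DIM),
    "fusion_hidden": init_layer(rng, GRAPH_DIM + FP_DIM, HIDDEN),
    "fusion_out": init_layer(rng, HIDDEN, 1),
}
graph_vec = [0.2, -0.1, 0.4, 0.0]  # stand-in for the graph module's output
prob = predict(graph_vec, [1, 0, 1], [0, 1, 1, 0, 1], [1, 0], params)
print(round(prob, 3))
```

Because the weights are untrained, the printed score carries no meaning; the point is only the order of operations from fingerprints and graph vector to a single sigmoid output.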
3) For data set splitting, random splitting is selected: the data set is randomly divided into a training set, a test set and a validation (verification) set in a ratio of 8:1:1.
4) Input the split data set into the network model; train the network and update its parameters according to the difference between the predictions on the training set and the true values; determine the optimal network parameters according to the best result on the validation set; and evaluate the model on the test set.
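A seeded random split in the stated 8:1:1 proportion can be sketched as below. The function name, the rounding rule (truncation, with the remainder going to the last subset) and the order of the returned subsets are assumptions of this sketch.

```python
import random


def random_split(items, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle with a fixed seed, then cut into three consecutive parts."""
    items = list(items)
    rng = random.Random(seed)      # a fixed seed makes the split reproducible
    rng.shuffle(items)
    n = len(items)
    n_first = int(ratios[0] * n)
    n_second = int(ratios[1] * n)
    first = items[:n_first]
    second = items[n_first:n_first + n_second]
    third = items[n_first + n_second:]   # remainder, so nothing is dropped
    return first, second, third


# 4104 is the size of the standardized mTOR data set in this embodiment.
train, held_out_a, held_out_b = random_split(range(4104), seed=0)
print(len(train), len(held_out_a), len(held_out_b))
```

Changing `seed` produces a different but equally sized split, which is what allows several split versions of the same data set to be compared.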
Specifically, in this embodiment, when predicting the mTOR kinase inhibitory activity of drug small molecules, a binary cross-entropy (BCE) loss preceded by a sigmoid function is used to calculate the loss between the predicted results and the true values. Gradients are then computed by backpropagation, and the Adam optimizer is used to update the network parameters. Training runs for 40 epochs, and the network parameters from the epoch with the best ROC-AUC on the validation set are selected as the final model.
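The two quantities used in this training procedure, the sigmoid-plus-BCE loss and the ROC-AUC used for epoch selection, can be written out in a few lines. This is a plain-Python sketch for illustration; the function names are assumptions, and the pairwise ROC-AUC shown here is quadratic in the number of samples, so a library routine would be preferable in practice.

```python
import math


def bce_with_logits(logit, target):
    """Binary cross-entropy on a raw logit, with the sigmoid folded in.

    Numerically stable form of -[y*log(s) + (1-y)*log(1-s)], s = sigmoid(logit).
    """
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def roc_auc(scores, labels):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def select_best_epoch(valid_auc_per_epoch):
    """Index of the epoch with the highest validation ROC-AUC."""
    return max(range(len(valid_auc_per_epoch)),
               key=valid_auc_per_epoch.__getitem__)


print(bce_with_logits(0.0, 1))                          # loss at an undecided logit
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))      # toy scores and labels
print(select_best_epoch([0.70, 0.90, 0.85]))            # toy validation curve
```

ROC-AUC is insensitive to the class ratio, which is a reasonable fit for a data set in which positives make up 86.23% of the samples.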
5) Determine the optimal hyperparameter combination of the model according to the hyperparameter optimization strategy.
Specifically, in this embodiment, six hyperparameters and their search ranges are set for predicting the mTOR kinase inhibitory activity of drug small molecules: the dropout rate of the molecular graph module ([0, 0.05, …, 0.6]); the number of attention heads of the molecular graph module ([2, 3, …, 8]); the number of attention iterations of the molecular graph module ([40, 45, …, 80]); the dropout rate of the molecular fingerprint module ([0, 0.05, …, 0.6]); the feature vector dimension of the molecular fingerprint module ([300, 350, …, 600]); and the ratio of the molecular graph vector to the molecular fingerprint vector at the input of the fusion module's fully connected layers ([0, 0.1, …, 1]).
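To give a sense of the size of this search space, the six ranges can be written out as a dictionary and counted. The key names are hypothetical labels chosen for this sketch, and the ranges are read as evenly spaced grids with the stated endpoints and step sizes.

```python
from math import prod


def frange(start, stop, step):
    """Inclusive float range, rounded to avoid accumulated float error."""
    n = int(round((stop - start) / step))
    return [round(start + i * step, 10) for i in range(n + 1)]


# Key names are illustrative; the value grids follow the ranges in the text.
SEARCH_SPACE = {
    "graph_dropout": frange(0.0, 0.6, 0.05),
    "attention_heads": list(range(2, 9)),
    "attention_iterations": list(range(40, 81, 5)),
    "fp_dropout": frange(0.0, 0.6, 0.05),
    "fp_feature_dim": list(range(300, 601, 50)),
    "graph_fp_ratio": frange(0.0, 1.0, 0.1),
}

sizes = {name: len(values) for name, values in SEARCH_SPACE.items()}
total = prod(sizes.values())
print(sizes)
print(total)
```

Under this reading the grid contains several hundred thousand combinations, so exhaustive evaluation is impractical and a guided search strategy is the natural choice.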
In order to find a good hyperparameter combination as efficiently and accurately as possible, this embodiment adopts a Bayesian optimization strategy to explore combinations of the six hyperparameters within their ranges. Based on the hyperparameter combinations already evaluated and their results, the Bayesian optimization strategy computes the posterior distribution through Gaussian process regression, obtains the expected mean and variance for each possible value of the six hyperparameters, and uses them to decide which combination of values to try in the next optimization step. Because the mTOR kinase inhibitory activity data set contains relatively few molecules and its chemical distribution is uneven, ten random seeds are used to produce ten different splits of the data set for each hyperparameter combination, and the average of the ten training results is used as the evaluation value for that optimization step; this reduces the influence of random splitting of the sample set. In this embodiment, Bayesian optimization is run for 20 steps in total, and the hyperparameter combination with the best evaluation index on the validation set is selected as the final hyperparameter combination.
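The seed-averaging part of this procedure can be sketched independently of the optimizer. In the sketch below, the proposal step is plain random sampling, used only as a stand-in: it is not the Gaussian-process-based Bayesian proposer the text describes. `train_and_validate` is a hypothetical placeholder that returns a deterministic toy score instead of training a model, and the small `SPACE` dictionary and all names are assumptions.

```python
import random
from statistics import mean

# Small illustrative space; names and values are placeholders.
SPACE = {
    "graph_dropout": [0.0, 0.3, 0.6],
    "attention_heads": [2, 4, 8],
    "graph_fp_ratio": [0.0, 0.5, 1.0],
}


def train_and_validate(params, seed):
    """Hypothetical stand-in: split with `seed`, train, return validation ROC-AUC.

    Here it only returns a deterministic toy number in [0.5, 1.0).
    """
    rng = random.Random(repr(sorted(params.items())) + str(seed))
    return 0.5 + 0.5 * rng.random()


def seed_averaged_score(params, seeds=range(10)):
    """Average the validation score over ten differently seeded splits."""
    return mean(train_and_validate(params, s) for s in seeds)


def optimise(space, n_steps=20, proposer_seed=0):
    """Run n_steps evaluations and keep the best seed-averaged combination."""
    rng = random.Random(proposer_seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_steps):
        # Stand-in proposer: a Bayesian optimizer would choose `params`
        # from a surrogate model fitted to the previous (params, score) pairs.
        params = {name: rng.choice(values) for name, values in space.items()}
        score = seed_averaged_score(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = optimise(SPACE)
print(best_params, round(best_score, 3))
```

Averaging over ten splits multiplies the cost of every optimization step by ten, which is one reason to keep the number of steps small (20 in this embodiment).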
6) For the prediction of different drug properties, a targeted optimal model is generated for subsequent application to new small molecule drug property prediction and interpretation analysis.
Specifically, in this example, one optimal model is constructed using the data set of small-molecule inhibitory activity against mTOR kinase, providing the user with property prediction and molecular graph module explanatory analysis for new drug small molecules targeting mTOR kinase.
7) In this example, the optimal model for mTOR kinase is used for prediction and analysis, specifically comprising the following steps:
7-1) generating the molecular data set that requires explanatory analysis, and selecting 3 drug small molecules;
7-2) loading the pre-generated optimal model for mTOR kinase, and carrying out property prediction and molecular graph module explanatory analysis on the drug small molecules;
7-3) generating the property prediction results and the molecular graph module analysis results (see Fig. 10a, Fig. 10b and Fig. 10c);
7-4) the predicted values of mTOR inhibitory activity for the three molecules are 0.984, 0.949 and 0.964, respectively. All three are predicted as positive molecules, consistent with their actual inhibition of mTOR kinase. The inhibitory activity of a small molecule against mTOR kinase depends strongly on the molecule's ability to form hydrogen bonds. In the boxed regions of the three molecules, the morpholine rings and ureido groups are hydrogen bond acceptors and can form hydrogen bonding interactions, giving high inhibitory activity against mTOR kinase, which is consistent with the molecules being predicted as positive. On the pre-generated optimal model, the predictions and molecular graph explanatory analyses for the three small molecules are consistent with the actual inhibitory activity of the molecules and the actual hydrogen bonding capability of the substructures, demonstrating that a model constructed according to the invention can predict mTOR kinase inhibitory activity and explain its predictions. The invention can effectively help chemists carry out large-scale drug screening and drug molecule design for mTOR kinase.
Corresponding to the embodiment, the invention also provides a computer device.
The computer device of this embodiment includes a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor; when executing the program, the processor implements the drug property prediction method based on deep learning fusion of a molecular graph and fingerprints, and its application, as described in the embodiments. When the computer program is run, the acquired data set of mTOR kinase inhibitory activity is preprocessed and used to generate one optimal model, which can then be used for prediction and explanatory analysis of the properties of new drug small molecules. The computer device of this embodiment can predict mTOR kinase inhibitory activity quickly and efficiently, improving research and development efficiency for drugs that target mTOR kinase and accelerating virtual screening.
Corresponding to the above embodiment, this embodiment further provides a drug property prediction system based on deep learning fusion of a molecular graph and fingerprints, including a data preprocessing module, a model construction module, a prediction module and an explanatory module. The data preprocessing module is used for preprocessing the collected raw data set of chemical molecule activity, so that the model can be applied to the construction of new drug molecule property data sets. The model construction module is used for modeling the processed samples through a deep learning model based on a molecular graph and a molecular fingerprint; this deep learning model comprises a feature extraction module based on the molecular graph, a feature extraction module based on the molecular fingerprint, and a fusion module; the feature extraction module based on the molecular graph adopts a graph attention network and focuses on the influence of the relationships between adjacent atoms on molecular properties; the feature extraction module based on the molecular fingerprint extracts the influence of molecular structures and pharmacophores on molecular properties from three different types of molecular fingerprints; the fusion module merges the feature vectors obtained by the two feature extraction modules and inputs them into a multi-layer fully connected network. The prediction module is used for predicting new drug small molecules according to the optimal model generated by the model construction module, so that the model is applied to the prediction of new drug molecules. The explanatory module is used for carrying out explanatory analysis on drug small molecules according to the optimal model generated by the model construction module, so that the model can provide users with drug design suggestions for specific drug properties.
The invention also proposes a non-transitory computer readable storage medium.
The non-transitory computer-readable storage medium of this embodiment stores a computer program which, when executed by a processor, performs the drug property prediction method based on deep learning fusion of a molecular graph and fingerprints. The storage medium contains the acquired data set of small-molecule inhibitory activity against mTOR kinase, and one optimal model is generated from that data set. With this storage medium, the optimal model generated in this embodiment can be used directly, which saves model generation time. A user can quickly and efficiently predict the activity and analyze the molecular structure of small molecules capable of inhibiting mTOR kinase, which improves research and development efficiency for drugs that target mTOR kinase and accelerates virtual screening.
Claims (5)
1. A drug property prediction method based on deep learning fusion of a molecular graph and fingerprints, characterized by comprising the following steps:
1) For the prediction of different drug properties, a targeted and specific data set containing a large amount of drug small molecule data is obtained;
2) Constructing a deep learning model suitable for drug property prediction, wherein the model fuses two characteristics of a molecular graph and a molecular fingerprint into different modules, and finally, a fusion type neural network framework is assembled by using a full-connection layer;
3) Selecting a specific mode according to the requirement of model construction, and splitting a data set into a training set, a testing set and a verification set;
4) Inputting the data set into a network model, training and updating parameters in the network according to the difference between the predicted result of the training set and the true value of the training set, determining optimal network parameters according to the optimal result on the verification set, and detecting the data of the test set;
5) Determining an optimal super-parameter combination of the model according to a super-parameter optimization strategy;
6) Generating a targeted optimal model for predicting different drug properties for subsequent application to new small molecule drug property predictions;
7) For the generated optimal model, providing an explanatory analysis for reference of subsequent drug design;
The step 2) specifically comprises the following steps:
2-1) the feature extraction part of the model combines two modules, one extracting molecular graph features and one extracting molecular fingerprint features, which each extract features of the drug small molecule and generate corresponding feature vectors;
2-2) the module for extracting molecular graph features adopts a graph attention network structure; a molecular graph is generated from the input SMILES format: the atoms of the molecule map to nodes of the molecular graph, the chemical bonds map to edges of the molecular graph, and the physicochemical properties of the atoms and chemical bonds are calculated and used as the initial feature vectors of the nodes and edges; the attention mechanism in the network structure attends to the influence between adjacent atoms, namely the attention between adjacent atoms, so as to iteratively update the feature vectors of the atoms in the molecule; after the iterative update is finished, the feature vectors of all atoms are integrated and output as the feature vector of the molecular graph;
2-3) the module for extracting molecular fingerprint features adopts a plurality of fully connected layers; three different types of molecular fingerprints are generated from the input SMILES format: the substructure-based molecular fingerprint MACCS FP, the substructure-based molecular fingerprint PubChem FP, and the pharmacophore-based molecular fingerprint Pharmacophore ErG FP; the concatenation of the three fingerprints is input into the fully connected network of the module to obtain the feature vector of the molecular fingerprint;
2-4) the model concatenates the feature vectors generated by the two modules and inputs them into a plurality of fully connected layers for predicting the property of the drug small molecule, generating the final prediction result;
the step 2-2) specifically comprises the following steps:
2-2-1) calculating the physicochemical properties of each atom as the initial feature vector of the corresponding node in the molecular graph; the physicochemical properties specifically include: atom type, number of attached chemical bonds, charge, chirality, number of attached hydrogens, hybridization, atomic mass, and whether the atom is aromatic; the atom type covers atoms with an atomic number within one hundred;
2-2-2) calculating the attention between adjacent atoms and iteratively updating the atom representations according to the attention, wherein $h_i$ and $h_j$ are the iterative feature vectors of adjacent atoms i and j, $W$ is a weight matrix, and $a$ is the weight; the calculated attention value between adjacent atoms i and j is $e_{ij}$; before updating atom i, the attention values corresponding to all neighbors of atom i are normalized to obtain $\alpha_{ij}$; a multi-head attention mechanism is used to calculate multiple attentions, and atom i is updated with the average of the multiple attentions to obtain the iterative feature vector $h_i'$;
2-2-3) computing the feature vector of the molecular graph, the expression being:
$$h_{mol} = \frac{1}{N}\sum_{i=1}^{N} h_i'$$
wherein $h_i'$ is the feature vector of atom i after the iterative update is finished and $N$ is the number of atoms in the molecule; the average of the feature vectors of all atoms is taken as the feature vector of the molecule;
the step 5) specifically comprises the following steps:
5-1) six hyperparameters are built into the model: the dropout rate of the molecular graph module, the number of attention heads of the molecular graph module, the number of attention iterations of the molecular graph module, the dropout rate of the molecular fingerprint module, the feature vector dimension of the molecular fingerprint module, and the ratio of the molecular graph vector to the molecular fingerprint vector at the input of the fusion module's fully connected layers;
5-2) performing hyperparameter optimization on the model by Bayesian optimization for 20 rounds, and selecting the group of hyperparameters with the optimal evaluation score on the test set;
Step 6) specifically comprises the following steps:
6-1) generating an optimal prediction model aiming at specific drug properties according to the optimal super-parameter combination screened in the step 5);
6-2) when predicting a drug molecule with unknown properties, loading a corresponding optimal model, and inputting the SMILES format of the molecule into the model to obtain a predicted result of the drug molecule;
6-3) the model supports mass prediction of drug molecules with unknown properties, so that quick and efficient molecular property judgment is realized;
The step 7) specifically comprises the following steps:
7-1) providing fingerprint interpretation and molecular graph interpretation functions according to the optimal prediction model for the specific drug property generated in step 6) and the input requirements of the user;
7-2) when the user requests fingerprint interpretation, calculating the importance index of each fingerprint site in the model, wherein the higher the index, the greater the role played by the site in the model generation process, and the intramolecular information represented by the site plays an important role in designing drug molecules for the specific drug property;
7-3) when the user requests molecular graph interpretation, calculating the attention values of the molecular graph in the model and mapping them onto the molecular graph, wherein the larger the attention value of a group of atoms, the greater the role played by that structure in the model generation process, and the structure plays an important role in designing drug molecules for the specific drug property.
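The attention update described in step 2-2-2) of claim 1 is consistent with a standard graph attention network layer. As a sketch under that assumption only, with $h_i$ and $h_j$ the feature vectors of adjacent atoms, $W$ the weight matrix, $a$ the weight vector, $\mathcal{N}(i)$ the neighbors of atom $i$ and $K$ the number of attention heads, the update could be written as follows. The LeakyReLU scoring function, the output activation $\sigma$, the concatenation operator $\Vert$ and the head index $m$ are not stated in the text and are assumed here.

```latex
\begin{align}
e_{ij} &= \mathrm{LeakyReLU}\!\left(a^{\top}\left[\,W h_i \,\Vert\, W h_j\,\right]\right) \\
\alpha_{ij} &= \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}(i)} \exp(e_{ik})} \\
h_i' &= \sigma\!\left(\frac{1}{K}\sum_{m=1}^{K}\;\sum_{j \in \mathcal{N}(i)} \alpha_{ij}^{(m)}\, W^{(m)} h_j\right)
\end{align}
```

The second line is the normalization over all neighbors of atom $i$, and the third line is the multi-head average used to update atom $i$.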
2. The drug property prediction method based on deep learning fusion of a molecular graph and fingerprints according to claim 1, wherein step 1) specifically comprises the following steps:
1-1) for drug property fields in which an industry-accepted classical data set exists, adopting the classical data set for model construction;
1-2) for drug property fields in which no industry-accepted classical data set exists, collecting targeted drug activity data from experimental records preserved in medicinal chemistry or biology laboratories, from compound activity data provided by publicly available medicinal chemistry databases, or from databases obtained through other routes, and carrying out model construction after preprocessing.
3. The drug property prediction method based on deep learning fusion of a molecular graph and fingerprints according to claim 2, wherein step 1-2) specifically comprises the following steps:
1-2-1) obtaining targeted raw drug activity data from various sources;
1-2-2) deduplicating the data by drug small molecule, and averaging the activity data of repeated molecules;
1-2-3) removing hydrogen ions and salt ions from the drug small molecules and optimizing their structures with a force field;
1-2-4) for regression tasks, retaining the specific activity values; for classification tasks, labeling drug small molecules as negative or positive according to a specified threshold;
1-2-5) presenting the data set as the simplified molecular-input line-entry system (SMILES) format of the drug small molecules together with the corresponding target values.
4. The drug property prediction method based on deep learning fusion of a molecular graph and fingerprints according to claim 1, wherein step 3) specifically comprises the following steps:
3-1) the model supports a user-defined splitting mode and splitting ratio;
3-2) the built-in splitting modes of the model are random splitting and scaffold splitting; wherein random splitting shuffles the data set and splits it at random; and scaffold splitting first calculates the scaffolds of the drug small molecules in the data set and the number of molecules corresponding to each scaffold, then assigns the scaffolds with few corresponding molecules, together with their molecules, in order to the verification set and the test set until the verification set and the test set contain enough molecules, after which the remaining molecules are all assigned to the training set.
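The scaffold splitting rule of claim 4 can be illustrated with a short sketch. It assumes the scaffold of each molecule has already been computed by a cheminformatics toolkit and is supplied as a mapping from molecule identifier to scaffold string. The function name, the 10% target sizes, the tie-breaking by scaffold name and the choice to fill the verification set before the test set are all assumptions of this sketch.

```python
from collections import defaultdict


def scaffold_split(scaffold_of, frac_valid=0.1, frac_test=0.1):
    """Assign whole scaffold groups, smallest first, to verification and test sets."""
    groups = defaultdict(list)
    for mol, scaffold in scaffold_of.items():
        groups[scaffold].append(mol)

    n = len(scaffold_of)
    n_valid, n_test = int(frac_valid * n), int(frac_test * n)
    train, valid, test = [], [], []

    # Visit scaffolds from the fewest molecules to the most.
    for scaffold in sorted(groups, key=lambda s: (len(groups[s]), s)):
        members = groups[scaffold]
        if len(valid) + len(members) <= n_valid:
            valid.extend(members)
        elif len(test) + len(members) <= n_test:
            test.extend(members)
        else:
            train.extend(members)   # everything else goes to training
    return train, valid, test


# Toy input: molecule id -> precomputed scaffold label (made up).
toy = {f"m{i}": "A" for i in range(6)}
toy.update({"m6": "B", "m7": "B", "m8": "C", "m9": "D"})
train, valid, test = scaffold_split(toy)
print(train, valid, test)
```

Because a scaffold group is never divided between subsets, the verification and test molecules have scaffolds unseen during training, which makes the evaluation harder than a random split.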
5. The system for realizing the drug property prediction method based on the deep learning fusion molecular graph and the fingerprint according to any one of claims 1-4 is characterized by comprising a data preprocessing module, a model construction module, a model prediction module and a model interpretation module;
The data preprocessing module is used for preprocessing the collected chemical molecular activity original data set so that the model can be applied to the construction of a new drug molecular property data set;
The model construction module is used for modeling the processed sample through a deep learning model based on a molecular graph and a molecular fingerprint; the deep learning model based on the molecular graph and the molecular fingerprint comprises a feature extraction module based on the molecular graph, a feature extraction module based on the molecular fingerprint and a fusion module; the feature extraction module based on the molecular graph adopts a graph attention mechanism network and focuses on judging the influence of the relationship between adjacent atoms on molecular properties; the characteristic extraction module based on the molecular fingerprints extracts the influence of molecular structures and pharmacophores on molecular properties from three different types of molecular fingerprints; the fusion module is used for merging the feature vectors obtained by the two feature extraction modules and inputting the feature vectors into a multi-layer full-connection layer network;
The prediction module is used for predicting the new drug small molecules according to the optimal model generated by the model construction module, so that the model is applied to the prediction of the new drug molecules;
the model interpretation module is used for performing interpretation analysis on the small drug molecules according to the optimal model generated by the model construction module, so that the model can provide drug design suggestions aiming at specific drug properties for users.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210654644.7A CN115050428B (en) | 2022-06-10 | 2022-06-10 | Drug property prediction method and system based on deep learning fusion molecular graph and fingerprint |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115050428A (en) | 2022-09-13 |
CN115050428B (en) | 2024-06-14 |
Family
ID=83161049
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210654644.7A Active CN115050428B (en) | 2022-06-10 | 2022-06-10 | Drug property prediction method and system based on deep learning fusion molecular graph and fingerprint |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115050428B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115691703A (en) * | 2022-10-15 | 2023-02-03 | 苏州创腾软件有限公司 | Drug property prediction method and system based on pharmacokinetic model |
CN116230109A (en) * | 2023-05-10 | 2023-06-06 | 北京大学 | Chiral separation prediction method based on deep learning |
CN116502130B (en) * | 2023-06-26 | 2023-09-15 | 湖南大学 | Method for identifying smell characteristics of algae source |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112164427A (en) * | 2020-09-23 | 2021-01-01 | 常州微亿智造科技有限公司 | Method and device for predicting activity of small drug molecule target based on deep learning |
CN112164428A (en) * | 2020-09-23 | 2021-01-01 | 常州微亿智造科技有限公司 | Method and device for predicting properties of small drug molecules based on deep learning |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112366001A (en) * | 2020-11-25 | 2021-02-12 | 苏州莱奥生物技术有限公司 | Comprehensive early patent medicine property evaluation method based on pharmacokinetics and application thereof |
Also Published As
Publication number | Publication date |
---|---|
CN115050428A (en) | 2022-09-13 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||