CN111916145B - Novel coronavirus target prediction and drug discovery method based on graph representation learning
- Publication number
- CN111916145B (application number CN202010725014.5A)
- Authority
- CN
- China
- Prior art keywords
- network
- drug
- target
- node
- interaction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
- G16B15/30—Drug targeting using structural data; Docking or binding prediction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
Abstract
The invention belongs to the field of bioinformatics and discloses a novel coronavirus target prediction and drug discovery method based on graph representation learning. Using the latent relationships, in the relocation network, of individual drugs that inhibit the novel coronavirus, the method predicts targets of the novel coronavirus and screens out drugs capable of inhibiting it. In this way, the method can effectively screen out drugs that may inhibit the novel coronavirus and accelerate drug research and development, and therefore has significant value for wider application.
Description
Technical Field
The invention relates to a novel coronavirus target prediction and drug discovery method based on graph representation learning, and belongs to the field of bioinformatics.
Background
For novel coronavirus pneumonia (COVID-19), caused by the novel coronavirus (2019-nCoV, SARS-CoV-2), there is an urgent need to discover or develop more drugs capable of inhibiting the virus. Drug relocation (repositioning) not only saves much of the cost of designing and screening drugs in the early stage of drug development, but also markedly reduces risk in the later stages, because the pharmacokinetic properties and toxicity of drugs already in use have been thoroughly studied. Studying drug-target associations with computational prediction methods narrows the search space of candidate drugs for experiments, provides a reference for drug discovery and relocation, and reduces the corresponding time and cost.
In recent years, much work has focused on machine-learning-based methods for drug-target association prediction, and many machine learning models with improved predictive performance have emerged. Most of this work follows the association principle and treats prediction as a binary classification task, that is, predicting whether a drug-target association exists. However, some machine learning methods constrain the model to a relatively simple form (for example, bilinear or log-bilinear functions), which may not be sufficient to capture the complex hidden features behind heterogeneous data. With the rise of deep learning, researchers can build deeper models with better performance, extract useful and complex information from large-scale association network data more effectively, and so predict drug-target associations more accurately. Researchers have therefore gradually moved from machine learning methods to deep learning methods for drug discovery and have proposed a number of useful deep learning models. Recently, more and more researchers have turned their attention to deep learning on graph data. To process complex graph data, many have borrowed deep learning ideas from image processing to design graph neural network architectures. Over the past few years, the concepts, operations and models of graph neural networks have continued to evolve, which has greatly advanced the field of drug-target association prediction.
The traditional graph neural network, however, learns the representation of a target node by propagating neighbor information iteratively through a recurrent neural architecture until a stable fixed point is reached, a process that is computationally expensive. Moreover, convergence to a fixed point causes more information to be shared among the hidden states of the nodes, so that node states become over-smoothed and lack the feature information specific to each node. The graph convolutional neural network abandons the recurrence-based approach in favor of a multilayer graph neural network and shows strong performance in extracting graph embeddings. However, when it aggregates information from different neighbor nodes, it treats them all as the same kind of entity, which is a limitation.
In addition, heterogeneous data sources can provide more information and different perspectives for drug-target association prediction. Using heterogeneous data sources, such as drug-disease association network data and drug-side effect association network data, can therefore improve the accuracy of drug-target association prediction to some extent. On a heterogeneous network, combining the graph convolutional neural network with an attention mechanism reflects, when neighborhood information is aggregated, how strongly information from different edge types influences the node embedding. The extracted node embeddings are then more interpretable and the model performs better, so that latent relationships in heterogeneous relationship networks such as the drug-target network can be found effectively, targets of the novel coronavirus can be predicted, drugs that may inhibit the novel coronavirus can be screened out, and drug research and development can be accelerated.
Disclosure of Invention
Previous research treats different neighbor information in a heterogeneous network equally, which limits its ability to represent latent relationships in heterogeneous relationship networks such as the drug-target network. To address this problem, the invention provides a novel coronavirus target prediction and drug discovery method based on graph representation learning.
In order to achieve the above object, the solution of the present invention is:
A novel coronavirus target prediction and drug discovery method based on graph representation learning comprises the following steps:
Step one, preparing a heterogeneous network data set: constructing a heterogeneous relationship network comprising the relationships among drugs, targets, side effects and diseases, wherein the drug-target interaction and drug-drug interaction networks are based on the DrugBank database, the target-target interaction network is based on the HPRD database, the drug-disease association and target-disease association networks are based on the CTD database, and the drug-side effect association network is based on the SIDER database; when the heterogeneous network data set is constructed, only objects common to the interaction networks are retained, and the processed interaction networks are regarded as one heterogeneous network and used as the data set of the model;
Step two, dividing the data set:
2.1. randomly sampling from all negative samples of the drug-target interaction network in the data set, so that the number of sampled negative samples is ten times the number of positive samples; the data set for training the model consists of the negative samples drawn from the drug-target interaction network, all of its positive samples, and all positive and negative samples of the other interaction networks, wherein the nodes of a negative sample have no interaction relationship and the nodes of a positive sample have an interaction relationship;
2.2. dividing the data set processed in 2.1 into 10 mutually exclusive subsets of equal size, each obtained by stratified random sampling; then, each time, taking 9 of the subsets as the training set, randomly drawing 5% of the training set as the validation set, and taking the remaining subset as the test set, that is, training and testing the model by ten-fold cross-validation;
2.3. repeating the ten-fold cross-validation 10 times;
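The negative sampling in 2.1 can be sketched as follows. This is a minimal illustration, assuming drugs and targets are identified by integer indices; the function name `sample_negatives` and the toy sizes are hypothetical and not taken from the patent.

```python
import random

def sample_negatives(num_drugs, num_targets, positives, ratio=10, seed=0):
    """Randomly pick ratio * len(positives) drug-target pairs with no known interaction."""
    positive_set = set(positives)
    # Every (drug, target) pair absent from the positive set is a negative candidate.
    candidates = [(d, t) for d in range(num_drugs) for t in range(num_targets)
                  if (d, t) not in positive_set]
    wanted = min(ratio * len(positive_set), len(candidates))
    return random.Random(seed).sample(candidates, wanted)

# Toy example (assumed data): 2 known interactions among 5 drugs and 6 targets.
positives = [(0, 1), (2, 3)]
negatives = sample_negatives(num_drugs=5, num_targets=6, positives=positives)
# Label 1 marks an interaction, label 0 marks a sampled non-interaction.
dataset = [(d, t, 1) for d, t in positives] + [(d, t, 0) for d, t in negatives]
```

Sampling without replacement keeps the negative pairs distinct; capping at the number of available candidates is a design choice of this sketch for very small networks.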
Step three, constructing the model:
3.1. constructing a 3-layer stacked neural network, in which the first and second layers use a graph convolutional neural network to integrate the neighborhood information of the complex heterogeneous relationship network in the divided data set; given the layer-$l$ embedding $h_i^{(l)}$ of a node and the embeddings $h_j^{(l)}$ of its neighbors, the feature of the node at the next layer, $h_i^{(l+1)}$, is expressed in terms of the following quantities:
$\mathcal{N}_i^r$ denotes the set of neighbors of node $i$ under relation $r$, $c_{i,r}$ denotes the sum of the edge values between node $i$ and its neighbors of edge type $r$, $s(e)$ denotes the edge value between node $i$ and node $j$, $W_r^{(l)}$ and $W^{(l)}$ denote weights of the graph convolutional neural network, and $\sigma$ denotes the ReLU activation function;
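One form consistent with the symbol definitions in 3.1, offered as an assumption rather than as the original formula, is the relational graph convolution below. The symbol $h$ for embeddings, the set $R$ of relation types and the exact placement of the terms are assumptions.

```latex
\[
h_i^{(l+1)} = \sigma\Big( \sum_{r \in R} \sum_{j \in \mathcal{N}_i^r}
  \frac{s(e)}{c_{i,r}}\, W_r^{(l)} h_j^{(l)} + W^{(l)} h_i^{(l)} \Big),
\qquad
c_{i,r} = \sum_{j \in \mathcal{N}_i^r} s(e)
\]
```

Under this reading, each neighbor's message is weighted by its share of the total edge value for relation $r$, and the node's own embedding enters through a separate weight matrix.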
3.2. the third layer of the neural network combines a graph convolutional neural network with an attention mechanism: after the graph convolutional neural network has produced the neighborhood information sum for each relation type, the attention mechanism reflects how strongly the neighborhood information sum of each relation type influences the node embedding;
in the attention mechanism, $h_{i,r}$ is the neighborhood information sum of node $i$ for relation type $r$, and it is passed through a neural network with shared weight $w$ and bias $b$ to obtain $s_{i,r}$; a softmax function is then applied to $s_{i,r}$ to obtain the attention coefficient $\alpha_{i,r}$ of the neighborhood information sum of each relation type; finally, the attention coefficient $\alpha_{i,r}$ is multiplied by the original embedding $h_{i,r}$ to obtain the attention-weighted neighborhood information sum $\tilde{h}_{i,r}$ of each relation type; following the graph convolutional neural network, the processed neighborhood information sum is multiplied by the number of relation types, the mapping of the node's previous-layer embedding is added, and the final node embedding is obtained after the activation function is applied;
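The steps of 3.2 can be written out as below. This is a hedged reconstruction from the prose only: it assumes the scoring network is a single linear layer without a further nonlinearity, and the symbols $h_{i,r}$, $\tilde{h}_{i,r}$, $h_i^{\text{prev}}$ and $R$ are notation chosen here.

```latex
\[
s_{i,r} = w^{\top} h_{i,r} + b, \qquad
\alpha_{i,r} = \frac{\exp(s_{i,r})}{\sum_{r' \in R} \exp(s_{i,r'})}, \qquad
\tilde{h}_{i,r} = \alpha_{i,r}\, h_{i,r}
\]
\[
h_i = \sigma\Big( |R| \sum_{r \in R} \tilde{h}_{i,r} + W h_i^{\text{prev}} \Big)
\]
```

With uniform attention, $\alpha_{i,r} = 1/|R|$, the factor $|R|$ makes the weighted sum equal to the plain sum over relation types, which would explain the multiplication by the number of relation types.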
3.3. after the final node embeddings are obtained in 3.2, a network topology reconstruction method is used to force the extraction of feature representations in the heterogeneous relationship network, yielding the final relocation network;
in the network topology reconstruction method, $G_r$ and $H_r$ are the projection matrices of a specific edge type $r$, and if edge type $r$ is symmetric, then $G_r = H_r$; the node representations $E_u$ and $E_v$ are given edge-specific projections through $G_r$ and $H_r$ respectively, and the inner product of the two projected vectors is made to reconstruct the original edge value $s(e)$, wherein the projection matrices are initialized from a Gaussian distribution;
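The reconstruction described in 3.3 corresponds to a bilinear form. The sketch below assumes the projections are written as $G_r^{\top} E_u$ and $H_r^{\top} E_v$ and that the reconstruction error is a squared error; the text only says "reconstruction error", so the loss is an assumption.

```latex
\[
s(e) \approx \big(G_r^{\top} E_u\big)^{\top} \big(H_r^{\top} E_v\big)
      = E_u^{\top} G_r H_r^{\top} E_v,
\qquad
\min \sum_{e = (u, v, r)} \big( s(e) - E_u^{\top} G_r H_r^{\top} E_v \big)^2
\]
```

When $G_r = H_r$, the predicted value is unchanged if $u$ and $v$ are swapped, which matches the requirement for symmetric edge types.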
Step four, training the model: feeding the training set of the divided data set into the drug-target relationship prediction model constructed in step three, which is based on the graph convolutional neural network and the attention mechanism; then using the Adam optimizer to compute the update step size and minimize the reconstruction error; ten-fold cross-validation is performed 10 times, so that 100 models are trained in total, and the last trained model is used;
Step five, performing target prediction and drug discovery for the novel coronavirus:
5.1. selecting a drug that inhibits the novel coronavirus and identifying novel coronavirus targets according to the strength of the relationship between that drug and every target in the relocation network, wherein a target may be one that already has an interaction with the drug in the original network or a potential target that has no such interaction in the original network;
5.2. screening out drugs that may inhibit the novel coronavirus according to the strength of the relationship, in the relocation network, between drugs and the targets obtained in step 5.1;
5.3. once a drug has been confirmed by wet-lab experiment to inhibit novel coronavirus pneumonia, identifying the targets associated with that drug as targets related to novel coronavirus pneumonia.
The invention has the beneficial effects that:
The invention provides a novel coronavirus target prediction and drug discovery method based on graph representation learning, which effectively discovers latent relationships in heterogeneous relationship networks such as the drug-target network, predicts targets of the novel coronavirus, screens out drugs that may inhibit the novel coronavirus, and accelerates drug research and development.
Drawings
FIG. 1 is a diagram of the overall basic steps of the present invention;
FIG. 2 is a view showing the overall model structure of the present invention.
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
As shown in FIG. 1, the steps of the method are described in detail below with reference to an example:
Step one, preparing a heterogeneous network data set: constructing a heterogeneous relationship network comprising the relationships among drugs, targets, side effects and diseases, wherein the drug-target interaction and drug-drug interaction networks are based on the DrugBank database, the target-target interaction network is based on the HPRD database, the drug-disease association and target-disease association networks are based on the CTD database, and the drug-side effect association network is based on the SIDER database. When the heterogeneous network data set is constructed, only objects common to the interaction networks are retained. For example, if drug x appears in the drug-drug interaction network but not in the drug-disease association network, drug x must be removed. The processed interaction networks can then be regarded as one heterogeneous network. Combining these six networks yields a heterogeneous network containing 708 drugs, 1512 targets, 5603 diseases and 4192 side effects in total, which serves as the data set of the model. This heterogeneous network represents drugs, targets and the other objects as nodes, and represents interactions or associations between nodes, such as drug-target interactions, as edges. There are therefore 4 object types, O = {drug, target, disease, side effect}, and 10 relation types, R = {drug-drug interaction, drug-disease association, drug-side effect association, disease-drug association, side effect-drug association, drug-protein interaction, protein-drug interaction, protein-protein interaction, protein-disease association, disease-protein association}.
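The rule of retaining only common objects can be sketched as follows. This is an illustration under assumptions: each network is stored as a list of edges tagged with the types of its two endpoints, and the function name `retain_common` and the toy identifiers are hypothetical.

```python
def retain_common(networks, node_type):
    """Keep only edges whose `node_type` endpoints occur in every network using that type."""
    # networks: name -> (source_type, target_type, list of (source, target) edges)
    node_sets = []
    for src_type, dst_type, edges in networks.values():
        if node_type in (src_type, dst_type):
            nodes = set()
            for s, t in edges:
                if src_type == node_type:
                    nodes.add(s)
                if dst_type == node_type:
                    nodes.add(t)
            node_sets.append(nodes)
    # Objects present in all networks that mention this type
    # (the sketch assumes at least one network uses node_type).
    common = set.intersection(*node_sets)
    filtered = {}
    for name, (src_type, dst_type, edges) in networks.items():
        kept = [(s, t) for s, t in edges
                if (src_type != node_type or s in common)
                and (dst_type != node_type or t in common)]
        filtered[name] = (src_type, dst_type, kept)
    return common, filtered

# Toy example (assumed data): drug_x has no disease association, so it is dropped.
networks = {
    "drug-drug": ("drug", "drug", [("drug_a", "drug_b"), ("drug_a", "drug_x")]),
    "drug-disease": ("drug", "disease", [("drug_a", "flu"), ("drug_b", "flu")]),
}
common_drugs, filtered = retain_common(networks, "drug")
```

In practice the same filter would be applied for each object type in turn; this sketch handles one type per call.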
Step two, dividing the data set:
2.1. randomly sampling from all negative samples of the drug-target interaction network in the data set, so that the number of sampled negative samples is ten times the number of positive samples; the data set for training the model consists of the negative samples drawn from the drug-target interaction network, all of its positive samples, and all positive and negative samples of the other interaction networks, wherein the nodes of a negative sample have no interaction relationship and the nodes of a positive sample have an interaction relationship;
2.2. dividing the data set processed in 2.1 into 10 mutually exclusive subsets of equal size, each obtained by stratified random sampling; then, each time, taking 9 of the subsets as the training set, randomly drawing 5% of the training set as the validation set, and taking the remaining subset as the test set, that is, training and testing the model by ten-fold cross-validation;
2.3. repeating the ten-fold cross-validation 10 times;
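The splitting scheme of 2.2 and 2.3 can be sketched with the standard library alone. The function names and the toy label vector are assumptions for illustration; a library routine such as a repeated stratified k-fold splitter could serve the same purpose.

```python
import random

def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k mutually exclusive folds, keeping the class ratio in each."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal indices of this class round-robin
    return folds

def cross_validation_splits(labels, k=10, repeats=10, val_fraction=0.05, seed=0):
    """Yield (train, validation, test) index lists for repeated k-fold cross-validation."""
    for rep in range(repeats):
        folds = stratified_folds(labels, k, seed + rep)
        for test_fold in range(k):
            rng = random.Random((seed + rep) * k + test_fold)
            train = [i for f, fold in enumerate(folds) if f != test_fold for i in fold]
            rng.shuffle(train)
            # Hold out 5% of the training indices for validation.
            n_val = max(1, round(val_fraction * len(train)))
            yield train[n_val:], train[:n_val], folds[test_fold]

# Toy labels (assumed): 20 positives and 200 negatives, a 1:10 ratio.
labels = [1] * 20 + [0] * 200
splits = list(cross_validation_splits(labels))
```

Ten repeats of ten folds give the 100 train/test rounds mentioned in the training step.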
Step three, constructing the model:
3.1. constructing a 3-layer stacked neural network, in which the first and second layers use a graph convolutional neural network to integrate the neighborhood information of the complex heterogeneous relationship network in the divided data set; given the layer-$l$ embedding $h_i^{(l)}$ of a node and the embeddings $h_j^{(l)}$ of its neighbors, the feature of the node at the next layer, $h_i^{(l+1)}$, is expressed in terms of the following quantities:
$\mathcal{N}_i^r$ denotes the set of neighbors of node $i$ under relation $r$, $c_{i,r}$ denotes the sum of the edge values between node $i$ and its neighbors of edge type $r$, $s(e)$ denotes the edge value between node $i$ and node $j$, $W_r^{(l)}$ and $W^{(l)}$ denote weights of the graph convolutional neural network, and $\sigma$ denotes the ReLU activation function;
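A single graph convolution layer of the kind described in 3.1 might look like the sketch below. It assumes a relational graph convolution in which each neighbor message is scaled by s(e)/c_{i,r} and the node's own embedding is mapped by a separate weight matrix; that form is inferred from the symbol definitions, and the function names and data layout are hypothetical.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def gcn_layer(h, edges, W_rel, W_self):
    """Assumed form: h_i' = ReLU(sum_r sum_j s(e)/c_{i,r} * W_r h_j + W h_i).

    h: node -> embedding; edges: relation -> list of (i, j, s_e), j a neighbor of i;
    W_rel: relation -> weight matrix; W_self: weight matrix for the node itself.
    """
    out = {i: matvec(W_self, h_i) for i, h_i in h.items()}
    for r, edge_list in edges.items():
        # c_{i,r}: sum of edge values between node i and its neighbors under relation r
        c = {}
        for i, _, s in edge_list:
            c[i] = c.get(i, 0.0) + s
        for i, j, s in edge_list:
            msg = matvec(W_rel[r], h[j])
            out[i] = [o + (s / c[i]) * m for o, m in zip(out[i], msg)]
    return {i: [max(0.0, v) for v in vec] for i, vec in out.items()}  # ReLU

# Toy example (assumed data): one drug with two equally weighted targets.
identity = [[1.0, 0.0], [0.0, 1.0]]
h = {"d1": [1.0, 0.0], "t1": [0.0, 2.0], "t2": [2.0, 0.0]}
edges = {"drug-target": [("d1", "t1", 1.0), ("d1", "t2", 1.0)]}
out = gcn_layer(h, edges, {"drug-target": identity}, identity)
```

With identity weights, the drug's new embedding is its own embedding plus the average of its two targets' embeddings; nodes without listed neighbors keep only their self term.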
3.2. the third layer of the neural network combines a graph convolutional neural network with an attention mechanism: after the graph convolutional neural network has produced the neighborhood information sum for each relation type, the attention mechanism is used, when the final node embedding is formed, to reflect how strongly the neighborhood information sum of each relation type influences the node embedding;
in the attention mechanism, $h_{i,r}$ is the neighborhood information sum of node $i$ for relation type $r$, and it is passed through a neural network with shared weight $w$ and bias $b$ to obtain $s_{i,r}$; a softmax function is then applied to $s_{i,r}$ to obtain the attention coefficient $\alpha_{i,r}$ of the neighborhood information sum of each relation type; finally, the attention coefficient $\alpha_{i,r}$ is multiplied by the original embedding $h_{i,r}$ to obtain the attention-weighted neighborhood information sum $\tilde{h}_{i,r}$ of each relation type; following the graph convolutional neural network, the processed neighborhood information sum is multiplied by the number of relation types, the mapping of the node's previous-layer embedding is added, and the final node embedding is obtained after the activation function is applied;
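The attention step of 3.2 can be illustrated for a single node as follows. The sketch assumes the shared scoring network is one linear layer (w, b) with no extra nonlinearity, and that `self_term` already holds the mapped previous-layer embedding; both are assumptions, as are the names used.

```python
import math

def attention_aggregate(neigh_sums, w, b, self_term):
    """neigh_sums: relation -> neighborhood information sum of one node."""
    relations = list(neigh_sums)
    # Shared scoring network: s_{i,r} = w . h_{i,r} + b
    scores = [sum(wk * hk for wk, hk in zip(w, neigh_sums[r])) + b for r in relations]
    # Softmax over relation types (shifted by the maximum for numerical stability).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    alphas = [e / sum(exps) for e in exps]
    n_rel = len(relations)
    total = list(self_term)
    for a, r in zip(alphas, relations):
        # Attention-weighted sum, multiplied by the number of relation types.
        total = [t + n_rel * a * hk for t, hk in zip(total, neigh_sums[r])]
    embedding = [max(0.0, t) for t in total]  # ReLU activation
    return dict(zip(relations, alphas)), embedding

# Toy example (assumed data): zero scoring weights give uniform attention.
neigh_sums = {"drug-target": [1.0, 0.0], "drug-disease": [0.0, 1.0]}
alphas, embedding = attention_aggregate(neigh_sums, w=[0.0, 0.0], b=0.0,
                                        self_term=[0.5, -2.0])
```

With uniform attention, multiplying by the number of relation types reproduces the plain sum of the neighborhood information, so the attention layer reduces to the earlier graph convolution when no relation type is favored.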
3.3. after the final node embeddings are obtained in 3.2, a network topology reconstruction method is used to force the extraction of feature representations in the heterogeneous relationship network, yielding the final relocation network;
in the network topology reconstruction method, $G_r$ and $H_r$ are the projection matrices of a specific edge type $r$, and if edge type $r$ is symmetric, then $G_r = H_r$; the node representations $E_u$ and $E_v$ are given edge-specific projections through $G_r$ and $H_r$ respectively, and the inner product of the two projected vectors is made to reconstruct the original edge value $s(e)$, wherein the projection matrices are initialized from a Gaussian distribution;
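The edge reconstruction of 3.3 can be sketched as a projection followed by an inner product. The matrix shapes, the standard deviation of the Gaussian initialization and the function names are assumptions made for this illustration.

```python
import random

def gaussian_matrix(rows, cols, rng, std=0.1):
    """Projection matrix with entries drawn from a Gaussian distribution."""
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]

def project(M, e):
    """Return M^T e for a (d x k) matrix M and a length-d embedding e."""
    return [sum(M[a][c] * e[a] for a in range(len(e))) for c in range(len(M[0]))]

def reconstruct_edge(E_u, E_v, G_r, H_r):
    """Predicted edge value: inner product of the two edge-specific projections."""
    pu, pv = project(G_r, E_u), project(H_r, E_v)
    return sum(a * b for a, b in zip(pu, pv))

rng = random.Random(0)
G = gaussian_matrix(4, 3, rng)
H_sym = G  # symmetric edge type: G_r = H_r
E_u = [0.5, -1.0, 2.0, 0.0]
E_v = [1.0, 0.0, -0.5, 3.0]
score_uv = reconstruct_edge(E_u, E_v, G, H_sym)
score_vu = reconstruct_edge(E_v, E_u, G, H_sym)  # same value for a symmetric type
```

Sharing one matrix for symmetric edge types guarantees that the predicted strength does not depend on the order of the two nodes.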
Step four, training the model: feeding the training set of the divided data set into the drug-target relationship prediction model constructed in step three, which is based on the graph convolutional neural network and the attention mechanism; then using the Adam optimizer to compute the update step size and minimize the reconstruction error; ten-fold cross-validation is performed 10 times, so that 100 models are trained in total, and the last trained model is used;
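How Adam minimizes a reconstruction error can be shown on a deliberately reduced problem. The sketch below is not the patent's model: it drops the graph convolution layers and projection matrices, learns node embeddings directly, and assumes a squared-error loss; the hyperparameters and toy edges are likewise assumptions.

```python
import math
import random

def adam_step(params, grads, state, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    """Apply one Adam update in place; state holds the step count and moment estimates."""
    state["t"] += 1
    t = state["t"]
    for k in range(len(params)):
        state["m"][k] = b1 * state["m"][k] + (1 - b1) * grads[k]
        state["v"][k] = b2 * state["v"][k] + (1 - b2) * grads[k] ** 2
        m_hat = state["m"][k] / (1 - b1 ** t)  # bias-corrected first moment
        v_hat = state["v"][k] / (1 - b2 ** t)  # bias-corrected second moment
        params[k] -= lr * m_hat / (math.sqrt(v_hat) + eps)

def loss_and_grads(params, edges, dim):
    """Squared reconstruction error sum (E_u . E_v - s)^2 and its gradient."""
    loss, grads = 0.0, [0.0] * len(params)
    for u, v, s in edges:
        eu = params[u * dim:(u + 1) * dim]
        ev = params[v * dim:(v + 1) * dim]
        err = sum(a * b for a, b in zip(eu, ev)) - s
        loss += err ** 2
        for k in range(dim):
            grads[u * dim + k] += 2 * err * ev[k]
            grads[v * dim + k] += 2 * err * eu[k]
    return loss, grads

rng = random.Random(0)
dim, n_nodes = 2, 3
params = [rng.gauss(0.0, 0.1) for _ in range(n_nodes * dim)]  # flattened embeddings
edges = [(0, 1, 1.0), (0, 2, 0.0), (1, 2, 0.0)]  # (u, v, edge value), toy data
state = {"t": 0, "m": [0.0] * len(params), "v": [0.0] * len(params)}
initial_loss, _ = loss_and_grads(params, edges, dim)
for _ in range(500):
    _, grads = loss_and_grads(params, edges, dim)
    adam_step(params, grads, state)
final_loss, _ = loss_and_grads(params, edges, dim)
```

In the full method the same update would be applied to all network weights and projection matrices, with the validation set used to monitor training.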
Step five, performing target prediction and drug discovery for the novel coronavirus:
5.1. selecting chloroquine, a drug capable of inhibiting the coronavirus, and, according to the strength of the relationship between chloroquine and every target in the relocation network, finding the 2 targets with the greatest relationship strength, namely Toll-like receptor 9 and Tumor necrosis factor;
5.2. according to the strength of the relationship between Toll-like receptor 9 and drugs in the relocation network, screening out drugs that may inhibit the novel coronavirus, such as Imiquimod, Hydroxychloroquine, Lovastatin and Ribavirin, of which Hydroxychloroquine and Ribavirin have been verified as drugs capable of inhibiting the novel coronavirus; according to the strength of the relationship between Tumor necrosis factor and drugs in the relocation network, screening out drugs that may inhibit the novel coronavirus, such as Thalidomide, Arsenic trioxide, Bortezomib and Zonisamide;
5.3. identifying the targets associated with a drug that has been verified to inhibit novel coronavirus pneumonia as targets related to novel coronavirus pneumonia; drugs that have not yet been verified are priority candidates for wet-lab experimental validation.
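The two ranking passes of step five can be sketched as follows. The score tables stand in for the reconstructed relocation network; every number and every placeholder name other than chloroquine is invented for illustration and does not come from the patent.

```python
def top_k(scores, k):
    """Return the k names with the largest predicted relationship strength."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical reconstructed scores; the values are illustrative only.
drug_to_target = {
    "chloroquine": {"target_A": 0.91, "target_B": 0.15, "target_C": 0.78},
}
target_to_drug = {
    "target_A": {"drug_1": 0.20, "drug_2": 0.88, "chloroquine": 0.91},
    "target_C": {"drug_3": 0.60, "drug_1": 0.70},
}

# 5.1: the strongest targets of the known inhibitor.
targets = top_k(drug_to_target["chloroquine"], 2)
# 5.2: the strongest drugs for each of those targets, excluding the seed drug.
candidates = {t: [d for d in top_k(target_to_drug[t], 2) if d != "chloroquine"]
              for t in targets}
```

The candidate lists produced this way are hypotheses for wet-lab validation, in line with 5.3, not confirmed inhibitors.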
Finally, it should be noted that the above embodiments only illustrate the technical solutions of the present invention and do not limit its scope of protection. With reference to the description of the embodiments, those skilled in the art will understand that modifications or substitutions may be made to the technical solutions of the present invention without departing from its spirit and scope.
Claims (1)
1. A novel coronavirus target prediction and drug discovery method based on graph representation learning, characterized by comprising the following steps:
Step one, preparing a heterogeneous network data set: constructing a heterogeneous relationship network comprising the relationships among drugs, targets, side effects and diseases, wherein the drug-target interaction and drug-drug interaction networks are based on the DrugBank database, the target-target interaction network is based on the HPRD database, the drug-disease association and target-disease association networks are based on the CTD database, and the drug-side effect association network is based on the SIDER database; when the heterogeneous network data set is constructed, only objects common to the interaction networks are retained, and the processed interaction networks are regarded as one heterogeneous network and used as the data set of the model;
secondly, dividing the data set:
2.1. randomly sampling from all negative samples of the drug-target interaction network in the data set, the number of sampled negative samples being ten times the number of positive samples, and forming the data set for training the model from the negative examples sampled from the drug-target interaction network, all positive examples of the drug-target interaction network, and all positive and negative examples of the other interaction networks, wherein the nodes of a negative sample have no interaction relationship and the nodes of a positive sample have an interaction relationship;
2.2. dividing the data set processed in 2.1 into 10 mutually exclusive subsets of equal size, each subset being obtained through random stratified sampling; then, each time, taking 9 of the mutually exclusive subsets as the training set, randomly extracting 5% of the training set as the validation set, and taking the remaining subset as the test set, that is, training and testing the model through ten-fold cross validation;
2.3. repeating the ten-fold cross validation 10 times;
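The data division in the second step can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: it covers only drug-target pairs (the other interaction networks are omitted), it stratifies by positive/negative label, and all function names and the toy identifiers `d0`, `t0`, ... are hypothetical.

```python
import random


def sample_negatives(positives, drugs, targets, ratio=10, seed=0):
    """Randomly sample `ratio` non-interacting pairs per positive pair."""
    rng = random.Random(seed)
    pos = set(positives)
    candidates = [(d, t) for d in drugs for t in targets if (d, t) not in pos]
    n = min(len(candidates), ratio * len(pos))
    return rng.sample(candidates, n)


def stratified_folds(positives, negatives, k=10, seed=0):
    """Split labeled pairs into k mutually exclusive folds, stratified by label."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label, group in ((1, list(positives)), (0, list(negatives))):
        rng.shuffle(group)
        for idx, pair in enumerate(group):
            folds[idx % k].append((pair, label))
    return folds


def train_val_test(folds, test_idx, val_frac=0.05, seed=0):
    """Use one fold for testing; hold out val_frac of the rest for validation."""
    rng = random.Random(seed)
    test = folds[test_idx]
    train = [x for i, fold in enumerate(folds) if i != test_idx for x in fold]
    rng.shuffle(train)
    n_val = max(1, int(len(train) * val_frac))
    return train[n_val:], train[:n_val], test


# Toy data: 20 drugs, 20 targets, one known interaction per drug.
drugs = [f"d{i}" for i in range(20)]
targets = [f"t{i}" for i in range(20)]
positives = [(f"d{i}", f"t{i}") for i in range(20)]
negatives = sample_negatives(positives, drugs, targets)
folds = stratified_folds(positives, negatives)
train, val, test = train_val_test(folds, test_idx=0)
print(len(negatives), len(train), len(val), len(test))
```

Repeating the procedure with ten different seeds and looping `test_idx` over the ten folds would give the 10 x 10 training runs described in the claim.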
thirdly, constructing a model:
3.1. constructing a 3-layer stacked neural network, wherein the first and second layers use a graph convolutional neural network to integrate the neighborhood information of the complex heterogeneous relationship network in the divided data set; given the l-th layer embedding $h_i^{(l)}$ of node i and the embeddings $h_j^{(l)}$ of its neighbors, the features of the node at the next layer are expressed as follows:

$$h_i^{(l+1)} = \sigma\left(\sum_{r}\sum_{j \in N_i^r} \frac{s(e)}{c_{i,r}}\, W_r^{(l)} h_j^{(l)} + W^{(l)} h_i^{(l)}\right)$$

wherein $N_i^r$ represents the set of neighbors of node i under relation r, $c_{i,r}$ represents the sum of the edge values between node i and its neighbors whose edge type is r, $s(e)$ represents the edge value between node i and node j, $W_r^{(l)}$ and $W^{(l)}$ represent the weights in the graph convolutional neural network, and $\sigma$ represents the activation function ReLU;
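A single layer of the kind described in 3.1 can be sketched in plain Python. The sketch assumes the common relational form in which each neighbor message is weighted by its edge value divided by $c_{i,r}$, transformed by a relation-specific weight, summed over relations, added to a self-mapping of the node and passed through ReLU. The data layout (dictionaries of vectors and edge triples) and all names are hypothetical.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def relu(x):
    return [max(0.0, v) for v in x]


def gcn_layer(h, edges, W_rel, W_self):
    """One relational graph-convolution layer.

    h:      dict node -> embedding vector at layer l
    edges:  dict relation -> list of (i, j, s): node i receives from neighbor j
            over an edge with value s (assumed positive)
    W_rel:  dict relation -> weight matrix for that relation
    W_self: weight matrix applied to the node's own embedding
    """
    out = {i: matvec(W_self, h_i) for i, h_i in h.items()}
    for r, triples in edges.items():
        # c[i] is the sum of edge values between i and its type-r neighbors.
        c = {}
        for i, _, s in triples:
            c[i] = c.get(i, 0.0) + s
        for i, j, s in triples:
            msg = matvec(W_rel[r], h[j])
            out[i] = [o + (s / c[i]) * m for o, m in zip(out[i], msg)]
    return {i: relu(v) for i, v in out.items()}


identity = [[1.0, 0.0], [0.0, 1.0]]
h0 = {"drug": [1.0, 0.0], "t1": [0.0, 2.0], "t2": [4.0, 0.0]}
edges = {"drug-target": [("drug", "t1", 1.0), ("drug", "t2", 3.0)]}
h1 = gcn_layer(h0, edges, {"drug-target": identity}, identity)
print(h1["drug"])
```

Edges are treated as directed here; for an undirected relation, both `(i, j, s)` and `(j, i, s)` would be listed.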
3.2. the third-layer neural network is composed of a graph convolutional neural network and an attention mechanism, that is, after the neighborhood information sums of the different relation types are obtained through the graph convolutional neural network, the attention mechanism is used to reflect the degree of influence of the neighborhood information sum of each relation type on the node embedding, wherein the attention mechanism is expressed as follows:

$$s_{i,r} = w^{\top} m_{i,r} + b, \qquad \alpha_{i,r} = \frac{\exp(s_{i,r})}{\sum_{r'} \exp(s_{i,r'})}, \qquad \tilde{m}_{i,r} = \alpha_{i,r}\, m_{i,r}$$

wherein $m_{i,r}$ is the neighborhood information sum of node i with relation type r, which is processed by a neural network with the same weight $w$ and offset value $b$ for all relation types to obtain $s_{i,r}$; further, $s_{i,r}$ is processed with a softmax function to obtain the attention coefficient $\alpha_{i,r}$ of the neighborhood information sum of each relation type; finally, the attention coefficient $\alpha_{i,r}$ is multiplied by the original neighborhood information sum $m_{i,r}$ of node i with relation type r to obtain the neighborhood information sum $\tilde{m}_{i,r}$ of each relation type after the attention mechanism is added; based on the graph convolutional neural network, the sum of the processed neighborhood information is multiplied by the number of different relation types, the mapping of the node's previous-layer embedding is added, and the final node embedding is obtained after processing by the activation function;
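The attention step in 3.2 can be illustrated with a short sketch. It assumes the scoring network is a single linear map (shared weight vector `w` and offset `b`) with no extra nonlinearity, which the text does not state explicitly; the function names and the toy vectors are hypothetical.

```python
import math


def relation_attention(sums, w, b):
    """Softmax attention over relation types for one node.

    sums: dict relation -> neighborhood information sum (vector) for the node
    w, b: shared weight vector and offset used to score every relation type
    """
    scores = {r: sum(wi * mi for wi, mi in zip(w, m)) + b for r, m in sums.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {r: math.exp(s - top) for r, s in scores.items()}
    z = sum(exps.values())
    alpha = {r: e / z for r, e in exps.items()}
    weighted = {r: [alpha[r] * v for v in sums[r]] for r in sums}
    return alpha, weighted


def attended_embedding(sums, h_prev_mapped, w, b):
    """Combine attention-weighted sums with the mapped previous-layer embedding."""
    _, weighted = relation_attention(sums, w, b)
    n_rel = len(sums)
    total = [sum(weighted[r][k] for r in weighted) for k in range(len(h_prev_mapped))]
    # Scale by the number of relation types, add the self term, apply ReLU.
    return [max(0.0, n_rel * t + p) for t, p in zip(total, h_prev_mapped)]


sums = {"drug-target": [1.0, 0.0], "drug-disease": [1.0, 0.0]}
alpha, _ = relation_attention(sums, w=[0.5, 0.5], b=0.0)
embedding = attended_embedding(sums, [0.5, -1.0], w=[0.5, 0.5], b=0.0)
print(alpha, embedding)
```

One plausible reading of the multiplication by the number of relation types is that the attention coefficients sum to 1, so rescaling keeps the aggregated neighborhood term on the same scale as an unweighted sum over relations.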
3.3. after the final node embeddings are obtained through 3.2, a network topology reconstruction method is used to force the extraction of the feature representations in the heterogeneous relationship network and to obtain the final relocation network, wherein the network topology reconstruction is expressed as follows:

$$s(e) \approx (G_r E_u)^{\top} (H_r E_v)$$

wherein $G_r$ and $H_r$ are the projection matrices of a specific edge type r, and if edge type r is symmetric, then $G_r = H_r$; the features $E_u$ and $E_v$ of the nodes undergo edge-specific projection through $G_r$ and $H_r$ respectively, and the inner product of the two projected vectors is taken so that the result reconstructs the original edge value $s(e)$, wherein the projection matrices are initialized from a Gaussian distribution;
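The edge reconstruction in 3.3 can be sketched as two edge-type-specific projections followed by an inner product. The orientation of the projection matrices (applied as matrix times vector), the standard deviation of the Gaussian initialization and all names below are assumptions made for illustration.

```python
import random


def gaussian_matrix(rows, cols, rng, std=0.1):
    """Projection matrix with entries drawn from a Gaussian distribution."""
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


def project(P, e):
    """Apply projection matrix P to embedding e."""
    return [sum(p * x for p, x in zip(row, e)) for row in P]


def reconstruct_edge(E_u, E_v, G_r, H_r):
    """Reconstructed edge value: inner product of the two projected embeddings."""
    pu = project(G_r, E_u)
    pv = project(H_r, E_v)
    return sum(a * b for a, b in zip(pu, pv))


rng = random.Random(0)
G = gaussian_matrix(3, 2, rng)
E_u, E_v = [1.0, 2.0], [3.0, 4.0]

# For a symmetric edge type the same matrix is used on both sides,
# so the reconstructed value does not depend on the order of u and v.
forward = reconstruct_edge(E_u, E_v, G, G)
backward = reconstruct_edge(E_v, E_u, G, G)
print(forward, backward)
```

With distinct `G_r` and `H_r`, the score for (u, v) generally differs from the score for (v, u), which is what allows asymmetric edge types such as drug-target to be modeled.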
fourthly, training the model: inputting the training set of the divided data set into the drug-target relationship prediction model constructed in the third step, which is based on the graph convolutional neural network and the attention mechanism; then computing the update step size with the Adam optimizer so as to minimize the reconstruction error; performing ten-fold cross validation 10 times, training 100 models in total, and using the last trained model;
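Minimizing a reconstruction error with Adam can be illustrated on a toy problem. The sketch below hand-writes the standard Adam update and fits two scalar "projections" so that their product with the node features reproduces given edge values. The squared-error loss, the hyperparameters and the toy edges are assumptions, not values from this patent.

```python
import math


def loss_and_grad(theta, edges):
    """Squared reconstruction error for scalar projections g and h, and its gradient."""
    g, h = theta
    loss, dg, dh = 0.0, 0.0, 0.0
    for e_u, e_v, s in edges:
        residual = s - (g * e_u) * (h * e_v)
        loss += residual ** 2
        dg += -2.0 * residual * e_u * h * e_v
        dh += -2.0 * residual * g * e_u * e_v
    return loss, [dg, dh]


def adam_minimize(grad_fn, theta, steps=1000, lr=0.01,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Plain Adam: bias-corrected first and second moment estimates."""
    theta = list(theta)
    m = [0.0] * len(theta)
    v = [0.0] * len(theta)
    for t in range(1, steps + 1):
        grad = grad_fn(theta)
        for k, gk in enumerate(grad):
            m[k] = beta1 * m[k] + (1 - beta1) * gk
            v[k] = beta2 * v[k] + (1 - beta2) * gk ** 2
            m_hat = m[k] / (1 - beta1 ** t)
            v_hat = v[k] / (1 - beta2 ** t)
            theta[k] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


# Toy edges (E_u, E_v, s): every edge is consistent with g * h = 0.5.
toy_edges = [(1.0, 2.0, 1.0), (2.0, 1.0, 1.0), (1.0, 1.0, 0.5)]
start = [1.0, 1.0]
initial_loss, _ = loss_and_grad(start, toy_edges)
fitted = adam_minimize(lambda th: loss_and_grad(th, toy_edges)[1], start)
final_loss, _ = loss_and_grad(fitted, toy_edges)
print(initial_loss, final_loss)
```

In a full model the same update would be applied to all graph-convolution weights, attention parameters and projection matrices, with gradients supplied by an automatic differentiation framework.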
fifthly, performing target prediction and drug discovery for the novel coronavirus:
5.1. selecting a drug that inhibits the novel coronavirus, and finding novel coronavirus targets according to the strength of the relationship between the drug and all targets in the relocation network, wherein the targets may be targets that already have an interaction relationship with the drug in the original network or potential targets that have no interaction relationship with the drug in the original network;
5.2. screening out drugs that inhibit the novel coronavirus according to the strength of the relationship between the targets obtained in step 5.1 and the drugs in the relocation network;
5.3. after a drug that inhibits novel coronavirus pneumonia is confirmed by a wet-lab experiment, determining the targets related to the drug to be targets related to novel coronavirus pneumonia.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010725014.5A CN111916145B (en) | 2020-07-24 | 2020-07-24 | Novel coronavirus target prediction and drug discovery method based on graph representation learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111916145A (en) | 2020-11-10 |
CN111916145B (en) | 2022-03-11 |
Family
ID=73280763
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010725014.5A Active CN111916145B (en) | 2020-07-24 | 2020-07-24 | Novel coronavirus target prediction and drug discovery method based on graph representation learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111916145B (en) |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112669990A (en) * | 2020-12-07 | 2021-04-16 | 三峡大学 | New drug use prediction method based on deep self-coding and self-adaptive fusion |
CN112863634B (en) * | 2021-01-12 | 2022-09-20 | 山东大学 | Traditional Chinese medicine prescription recommendation method and system based on new crown protein heterogeneous network clustering |
CN114765060B (en) * | 2021-01-13 | 2023-12-08 | 四川大学 | Multi-attention method for predicting drug target interactions |
CN113421658B (en) * | 2021-07-06 | 2023-06-16 | 西北工业大学 | Drug-target interaction prediction method based on neighbor attention network |
CN114023397B (en) * | 2021-09-16 | 2024-05-10 | 平安科技(深圳)有限公司 | Drug redirection model generation method and device, storage medium and computer equipment |
CN114023464B (en) * | 2021-11-08 | 2022-08-09 | 东北林业大学 | Drug-target interaction prediction method based on supervised synergy map contrast learning |
CN114049930B (en) * | 2021-11-12 | 2024-07-16 | 东南大学 | Traditional Chinese medicine prescription repositioning method based on heterogeneous network representation learning |
CN114121181B (en) * | 2021-11-12 | 2024-03-29 | 东南大学 | Heterogeneous graph neural network traditional Chinese medicine target prediction method based on attention mechanism |
CN114242186B (en) * | 2021-12-30 | 2022-08-12 | 湖南大学 | Chinese and western medicine relocation method and system fusing GHP and GCN and storage medium |
CN114613452B (en) * | 2022-03-08 | 2023-04-28 | 电子科技大学 | Drug repositioning method and system based on drug classification graph neural network |
CN114898879B (en) * | 2022-05-10 | 2023-04-21 | 电子科技大学 | Chronic disease risk prediction method based on graph representation learning |
CN114974406B (en) * | 2022-05-11 | 2023-04-14 | 中国人民解放军总医院 | Training method, system, device and product of antiviral drug repositioning model |
CN115620807B (en) * | 2022-12-19 | 2023-05-23 | 粤港澳大湾区数字经济研究院(福田) | Method for predicting interaction strength between target protein molecule and drug molecule |
CN117708679B (en) * | 2024-02-04 | 2024-04-26 | 西北工业大学 | Drug screening method and device based on neural network |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107194203A (en) * | 2017-06-09 | 2017-09-22 | 西安电子科技大学 | Medicine method for relocating based on miRNA data and tissue specificity network |
CN107506591B (en) * | 2017-08-28 | 2020-06-02 | 中南大学 | Medicine repositioning method based on multivariate information fusion and random walk model |
CN108520166B (en) * | 2018-03-26 | 2022-04-08 | 中山大学 | Drug target prediction method based on multiple similarity network migration |
US20190303535A1 (en) * | 2018-04-03 | 2019-10-03 | International Business Machines Corporation | Interpretable bio-medical link prediction using deep neural representation |
CN109887540A (en) * | 2019-01-15 | 2019-06-14 | 中南大学 | A kind of drug targets interaction prediction method based on heterogeneous network insertion |
CN111081316A (en) * | 2020-03-25 | 2020-04-28 | 元码基因科技(北京)股份有限公司 | Method and device for screening new coronary pneumonia candidate drugs |
Also Published As
Publication number | Publication date |
---|---|
CN111916145A (en) | 2020-11-10 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||