CN111916145B - Novel coronavirus target prediction and drug discovery method based on graph representation learning

Info

Publication number
CN111916145B
Authority
CN
China
Prior art keywords
network
drug
target
node
interaction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010725014.5A
Other languages
Chinese (zh)
Other versions
CN111916145A (en)
Inventor
彭绍亮
周德山
王小奇
徐志建
王力
李肯立
钟武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University
Priority to CN202010725014.5A
Publication of CN111916145A
Application granted
Publication of CN111916145B

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/30 Drug targeting using structural data; Docking or binding prediction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Medicinal Chemistry (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the field of bioinformatics and discloses a novel coronavirus target prediction and drug discovery method based on graph representation learning. Starting from individual drugs known to inhibit the novel coronavirus, the method uses the potential relationships of those drugs in the relocation network to predict targets of the novel coronavirus and then screens out further drugs that may inhibit it. In this way, the method can effectively screen out drugs that may inhibit the novel coronavirus and accelerate drug research and development, and it therefore has significant value for popularization and application.

Description

Novel coronavirus target prediction and drug discovery method based on graph representation learning
Technical Field
The invention relates to a new coronavirus target prediction and drug discovery method based on graph representation learning, belonging to the field of bioinformatics.
Background
Novel coronavirus pneumonia (COVID-19) is caused by a novel coronavirus (2019-nCoV, SARS-CoV-2), and there is an urgent need to discover or develop more drugs capable of inhibiting the virus. Drug relocation (repositioning) not only saves much of the cost of designing and screening drugs in the early stage of drug development, but also markedly reduces the risk in the later stages, because the pharmacokinetic properties and toxicity of drugs already in use have been thoroughly studied. Studying drug-target associations with computational prediction methods narrows the search space of candidate drugs for experiments, provides a reference for drug discovery and relocation, and reduces the corresponding time and cost.
In recent years, much effort has focused on machine-learning-based methods for drug-target association prediction, and many machine learning models with improved prediction performance have emerged. Most of this work is based on the association principle and treats the prediction task as a binary classification task, i.e. predicting whether a drug-target interaction exists. However, some machine learning methods constrain the model to a simple form (e.g., a bilinear or log-bilinear function), which may not be sufficient to capture the complex hidden features behind heterogeneous data. With the rise of deep learning, researchers can build deeper models with better performance, extract useful and complex information from large-scale association network data, and predict drug-target associations more accurately. Researchers have therefore gradually moved from machine learning methods to deep learning methods for drug discovery and have proposed a number of useful deep learning models. Recently, more and more researchers have focused on deep learning methods for graph data. To process complex graph data, many researchers have borrowed deep learning ideas from the image domain to design graph neural network architectures. Over the past few years, the concepts, operations, and models of graph neural networks have evolved continuously, which has greatly advanced the field of drug-target association prediction. The traditional graph neural network learns the representation of a target node by propagating neighbor information iteratively through a recurrent architecture until a stable fixed point is reached. This process is computationally expensive. Moreover, convergence to a fixed point causes more information to be shared among the hidden states of the nodes, so that node states become over-smoothed and the feature information specific to each node is lost. The graph convolutional neural network abandons the recurrence-based approach in favor of a multilayer graph neural network and shows strong performance in extracting graph embeddings. However, when aggregating the information of different neighbor nodes, it treats all neighbors alike, which is a limitation.
In addition, heterogeneous data sources provide more information and different perspectives for drug-target association prediction. Using heterogeneous data sources, such as drug-disease association network data and drug-side effect association network data, can therefore improve the accuracy of drug-target association prediction to a certain extent. On the heterogeneous network, the graph convolutional neural network is combined with an attention mechanism so that, when neighborhood information is aggregated, the degree to which information from different edge types influences the node embedding is reflected. The extracted node embeddings are thus more interpretable and the model performs better, so that potential relationships in heterogeneous relationship networks such as the drug-target network can be discovered effectively, targets of the novel coronavirus can be predicted, drugs that may inhibit the novel coronavirus can be screened out, and drug research and development is accelerated.
Disclosure of Invention
Previous research treats different neighbor information in a heterogeneous network equally, which limits its ability to represent potential relationships in heterogeneous relationship networks such as the drug-target network. To address this problem, the invention provides a novel coronavirus target prediction and drug discovery method based on graph representation learning.
In order to achieve the above object, the solution of the present invention is:
A novel coronavirus target prediction and drug discovery method based on graph representation learning comprises the following steps:
first, preparing a heterogeneous network data set: constructing a heterogeneous relationship network comprising the relationships among drugs, targets, side effects, and diseases, wherein the drug-target interaction and drug-drug interaction networks are based on the DrugBank database, the target-target interaction network is based on the HPRD database, the drug-disease association and target-disease association networks are based on the CTD database, and the drug-side effect association network is based on the SIDER database; in constructing the heterogeneous network data set, only the objects common to the interaction networks are retained, and the processed interaction networks are regarded as one heterogeneous network and used as the data set of the model;
second, dividing the data set:
2.1. randomly sampling from all negative samples of the drug-target interaction network in the data set, the number of sampled negative samples being ten times the number of positive samples, and forming the data set for training the model from the negative samples sampled from the drug-target interaction network, all of its positive samples, and all positive and negative samples of the other interaction networks, wherein the nodes of a negative sample have no interaction relationship and the nodes of a positive sample have an interaction relationship;
2.2. dividing the data set processed in 2.1 into 10 mutually exclusive subsets of equal size, each obtained by random stratified sampling; then, each time, taking 9 of the subsets as the training set, randomly extracting 5% of the training set as the validation set, and taking the remaining subset as the test set, i.e. training and testing the model by ten-fold cross-validation;
2.3. repeating the ten-fold cross-validation 10 times;
third, constructing the model:
3.1. constructing a 3-layer stacked neural network, in which the first and second layers use a graph convolutional neural network to integrate the neighborhood information of the complex heterogeneous relationship network in the divided data set; given the layer-$l$ embedding of a node [Figure BDA0002601348120000021] and the embeddings of its neighbors [Figure BDA0002601348120000022], the features of the node at the next layer are expressed as:
[Figure BDA0002601348120000031]
wherein [Figure BDA0002601348120000032] denotes the set of neighbors of node $i$ under relation $r$, $c_{i,r}$ denotes the sum of the edge values between node $i$ and its neighbors of edge type $r$, $s(e)$ denotes the edge value between node $i$ and node $j$, $W_r^{(l)}$ [Figure BDA0002601348120000033] and $W^{(l)}$ denote the weights of the graph convolutional neural network, and $\sigma$ denotes the ReLU activation function;
3.2. the third layer of the neural network combines a graph convolutional neural network with an attention mechanism: after the graph convolutional neural network produces the neighborhood information sum for each relation type, the attention mechanism reflects the degree to which the neighborhood information sum of each relation type influences the node embedding, the attention mechanism being expressed as:
[Figure BDA0002601348120000034]
[Figure BDA0002601348120000035]
[Figure BDA0002601348120000036]
wherein [Figure BDA0002601348120000037] is the neighborhood information sum of node $i$ for relation type $r$, which is processed by a neural network with shared weight $w$ and bias $b$ to obtain $s_{i,r}$; $s_{i,r}$ is then processed with a softmax function to obtain the attention coefficient $\alpha_{i,r}$ of the neighborhood information sum of each relation type; finally, the attention coefficient $\alpha_{i,r}$ is multiplied by the original embedding [Figure BDA0002601348120000038], i.e. the neighborhood information sum of node $i$ for relation type $r$, to obtain the attention-weighted neighborhood information sum of each relation type [Figure BDA0002601348120000039]; following the graph convolutional neural network, the sum of the attention-weighted neighborhood information is multiplied by the number of different relation types, the mapping of the node's embedding from the previous layer is added, and the final node embedding is obtained after the activation function;
3.3. after the final node embeddings are obtained in 3.2, a network topology reconstruction method is used to force the embeddings to capture the feature representation of the heterogeneous relationship network, yielding the final relocation network, the network topology reconstruction method being:
[Figure BDA00026013481200000310]
wherein $G_r$ and $H_r$ are the projection matrices of a specific edge type $r$, and $G_r = H_r$ if edge type $r$ is symmetric; the node features $E_u$ and $E_v$ are given edge-specific projections through $G_r$ and $H_r$ respectively, and the inner product of the two projected vectors is made to reconstruct the original edge value $s(e)$; the projection matrices are initialized from a Gaussian distribution;
fourth, training the model: the training set of the divided data set is fed into the drug-target relation prediction model constructed in the third step, which is based on the graph convolutional neural network and the attention mechanism; the Adam optimizer is then used to compute the update step and minimize the reconstruction error; ten-fold cross-validation is performed 10 times, so that 100 models are trained in total, and the last trained model is used;
fifth, performing target prediction and drug discovery for the novel coronavirus:
5.1. selecting a drug that inhibits the coronavirus, and finding novel coronavirus targets according to the relationship strength between that drug and all targets in the relocation network, wherein a target may either already have an interaction with the drug in the original network or be a potential target with no such interaction in the original network;
5.2. screening out drugs that inhibit the novel coronavirus according to the relationship strength, in the relocation network, between the drugs and the targets obtained in 5.1;
5.3. after a drug is confirmed by wet experiments to inhibit novel coronavirus pneumonia, determining the targets related to that drug to be targets related to novel coronavirus pneumonia.
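The equation images referenced in steps 3.1 to 3.3 are not reproduced in this text. As a reading aid only, the block below writes out one set of formulas that is consistent with the symbol definitions given in those steps. The symbol names $h_i^{(l)}$, $m_{i,r}$, $\tilde{m}_{i,r}$ and $|R|$, the linear form of the attention score, the squared-error form of the reconstruction objective, and the orientation of the projections are all assumptions, not the patent's own notation.

```latex
% Step 3.1 (assumed form): relation-aware graph convolution
\begin{align*}
h_i^{(l+1)} &= \sigma\!\left( \sum_{r \in R} \sum_{j \in N_i^r}
  \frac{s(e)}{c_{i,r}}\, W_r^{(l)} h_j^{(l)} + W^{(l)} h_i^{(l)} \right),
  & c_{i,r} &= \sum_{j \in N_i^r} s(e)
\end{align*}

% Step 3.2 (assumed form): attention over relation types
\begin{align*}
m_{i,r} &= \sum_{j \in N_i^r} \frac{s(e)}{c_{i,r}}\, W_r^{(l)} h_j^{(l)} \\
s_{i,r} &= w^{\top} m_{i,r} + b \\
\alpha_{i,r} &= \frac{\exp(s_{i,r})}{\sum_{r'} \exp(s_{i,r'})} \\
\tilde{m}_{i,r} &= \alpha_{i,r}\, m_{i,r} \\
h_i^{(l+1)} &= \sigma\!\left( |R| \sum_{r} \tilde{m}_{i,r} + W^{(l)} h_i^{(l)} \right)
\end{align*}

% Step 3.3 (assumed form): topology reconstruction, e = (u, v) of edge type r
\begin{align*}
\min \; \sum_{r} \sum_{e = (u,v)}
  \left( s(e) - \left(G_r^{\top} E_u\right)^{\top} \left(H_r^{\top} E_v\right) \right)^2
\end{align*}
```

Under this reading, multiplying by $|R|$ keeps the attention-weighted sum on the same scale as the unweighted sum used in the first two layers, since the attention coefficients sum to one.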
The beneficial effects of the invention are as follows:
The invention provides a novel coronavirus target prediction and drug discovery method based on graph representation learning that effectively discovers potential relationships in heterogeneous relationship networks such as the drug-target network, predicts targets of the novel coronavirus, screens out drugs that may inhibit it, and accelerates drug research and development.
Drawings
FIG. 1 is a diagram of the overall basic steps of the present invention;
FIG. 2 is a view showing the overall model structure of the present invention.
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
As shown in FIG. 1, the steps of the method are described in detail below with reference to an example:
First, preparing a heterogeneous network data set. A heterogeneous relationship network is constructed comprising the relationships among drugs, targets, side effects, and diseases, wherein the drug-target interaction and drug-drug interaction networks are based on the DrugBank database, the target-target interaction network is based on the HPRD database, the drug-disease association and target-disease association networks are based on the CTD database, and the drug-side effect association network is based on the SIDER database. In constructing the heterogeneous network data set, only the objects common to the interaction networks are retained. For example, if drug x appears in the drug-drug interaction network but not in the drug-disease association network, drug x is removed. The processed interaction networks are regarded as one heterogeneous network. Combining the six networks yields a heterogeneous network containing 708 drugs, 1512 targets, 5603 diseases, and 4192 side effects, which serves as the data set of the model. The heterogeneous network represents drugs, targets, and other objects as nodes, and interactions or associations between nodes (such as drug-target interactions) as edges. There are therefore 4 object types, O = {drug, target, disease, side effect}, and 10 relation types, R = {drug-drug interaction, drug-disease association, drug-side effect association, disease-drug association, side effect-drug association, drug-protein interaction, protein-drug interaction, protein-protein interaction, protein-disease association, disease-protein association}.
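The retention of common objects described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the `(source type, target type, edges)` layout, and the toy identifiers `d1`, `d2`, `dx`, `dis1` are hypothetical, and only a single filtering pass is shown.

```python
from typing import Dict, List, Set, Tuple

# (source object type, target object type, edge list); layout is an assumption.
Network = Tuple[str, str, List[Tuple[str, str]]]


def nodes_of_type(net: Network, obj_type: str) -> Set[str]:
    """Collect the nodes of one object type that appear in one network."""
    src_type, dst_type, edges = net
    found: Set[str] = set()
    for u, v in edges:
        if src_type == obj_type:
            found.add(u)
        if dst_type == obj_type:
            found.add(v)
    return found


def retain_common(networks: Dict[str, Network]) -> Dict[str, Network]:
    """Keep only objects present in every network that involves their type."""
    types = {t for s, d, _ in networks.values() for t in (s, d)}
    keep: Dict[str, Set[str]] = {}
    for t in types:
        per_network = [nodes_of_type(n, t) for n in networks.values()
                       if t in (n[0], n[1])]
        keep[t] = set.intersection(*per_network)
    filtered: Dict[str, Network] = {}
    for name, (s, d, edges) in networks.items():
        kept_edges = [(u, v) for u, v in edges if u in keep[s] and v in keep[d]]
        filtered[name] = (s, d, kept_edges)
    return filtered


# Toy data: drug "dx" is in the drug-drug network but not in drug-disease.
toy = {
    "drug-drug": ("drug", "drug", [("d1", "d2"), ("d2", "dx")]),
    "drug-disease": ("drug", "disease", [("d1", "dis1"), ("d2", "dis1")]),
}
result = retain_common(toy)
print(result["drug-drug"][2])
```

In the toy data, the edge involving `dx` is dropped because `dx` has no entry in the drug-disease network, mirroring the drug x example in the text.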
Second, dividing the data set:
2.1. Randomly sample from all negative samples of the drug-target interaction network in the data set, so that the number of sampled negative samples is ten times the number of positive samples. The data set for training the model consists of the negative samples sampled from the drug-target interaction network, all of its positive samples, and all positive and negative samples of the other interaction networks. The nodes of a negative sample have no interaction relationship, whereas the nodes of a positive sample do.
2.2. Divide the data set processed in 2.1 into 10 mutually exclusive subsets of equal size, each obtained by random stratified sampling. Each time, take 9 of the subsets as the training set, randomly extract 5% of the training set as the validation set, and take the remaining subset as the test set; that is, train and test the model by ten-fold cross-validation.
2.3. Repeat the ten-fold cross-validation 10 times.
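The data division in 2.1 to 2.3 can be sketched as below for the drug-target network alone. This is an illustrative sketch, not the patent's implementation: the function names, the integer indexing of drugs and targets, the seeds, and the choice to hold the validation samples out of the training set are all assumptions.

```python
import random
from typing import List, Tuple

Pair = Tuple[int, int]        # (drug index, target index)
Sample = Tuple[Pair, int]     # (pair, label); 1 = positive, 0 = negative


def sample_negatives(positives: List[Pair], n_drugs: int, n_targets: int,
                     ratio: int = 10, seed: int = 0) -> List[Pair]:
    """Draw ratio * len(positives) drug-target pairs with no known interaction."""
    rng = random.Random(seed)
    known = set(positives)
    candidates = [(d, t) for d in range(n_drugs) for t in range(n_targets)
                  if (d, t) not in known]
    return rng.sample(candidates, min(len(candidates), ratio * len(known)))


def stratified_folds(positives: List[Pair], negatives: List[Pair],
                     n_folds: int = 10, seed: int = 0) -> List[List[Sample]]:
    """Shuffle each class separately and deal it round-robin into the folds."""
    rng = random.Random(seed)
    folds: List[List[Sample]] = [[] for _ in range(n_folds)]
    for label, group in ((1, list(positives)), (0, list(negatives))):
        rng.shuffle(group)
        for idx, pair in enumerate(group):
            folds[idx % n_folds].append((pair, label))
    return folds


def split(folds: List[List[Sample]], test_idx: int,
          val_fraction: float = 0.05, seed: int = 0):
    """Use one fold for testing; hold out 5% of the other nine for validation."""
    rng = random.Random(seed)
    test = list(folds[test_idx])
    pool = [s for k, fold in enumerate(folds) if k != test_idx for s in fold]
    rng.shuffle(pool)
    n_val = int(len(pool) * val_fraction)
    return pool[n_val:], pool[:n_val], test


positives = [(i, i) for i in range(10)]          # toy positive pairs
negatives = sample_negatives(positives, n_drugs=10, n_targets=20)
n_models = 0
for repeat in range(10):                          # 2.3: repeat 10 times
    folds = stratified_folds(positives, negatives, seed=repeat)
    for test_idx in range(10):                    # 2.2: ten folds
        train, val, test = split(folds, test_idx, seed=repeat)
        n_models += 1                             # one model per (repeat, fold)
print(n_models, len(train), len(val), len(test))
```

Dealing each class round-robin keeps the 1:10 positive-to-negative ratio in every fold, which is one simple way to realize stratified sampling.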
Third, constructing the model:
3.1. Construct a 3-layer stacked neural network, in which the first and second layers use a graph convolutional neural network to integrate the neighborhood information of the complex heterogeneous relationship network in the divided data set. Given the layer-$l$ embedding of a node [Figure BDA0002601348120000051] and the embeddings of its neighbors [Figure BDA0002601348120000052], the features of the node at the next layer are expressed as:
[Figure BDA0002601348120000053]
where [Figure BDA0002601348120000054] denotes the set of neighbors of node $i$ under relation $r$, $c_{i,r}$ denotes the sum of the edge values between node $i$ and its neighbors of edge type $r$, $s(e)$ denotes the edge value between node $i$ and node $j$, $W_r^{(l)}$ [Figure BDA0002601348120000055] and $W^{(l)}$ denote the weights of the graph convolutional neural network, and $\sigma$ denotes the ReLU activation function.
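One way to read the layer described in 3.1 is sketched below in plain Python. The exact equation is an image that is not reproduced here, so the update rule is an assumption built from the stated definitions: each neighbor message is weighted by its edge value divided by $c_{i,r}$, transformed by a relation-specific weight, summed with a self term, and passed through ReLU. All names and the toy graph are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
# Per relation type: list of (node i, neighbor j, edge value s(e)).
Edges = Dict[str, List[Tuple[str, str, float]]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def gcn_layer(h: Dict[str, Vector], edges: Edges,
              w_rel: Dict[str, Matrix], w_self: Matrix) -> Dict[str, Vector]:
    """Assumed form: h_i' = ReLU(sum_r sum_j s(e)/c_ir * W_r h_j + W h_i)."""
    out: Dict[str, Vector] = {}
    for i, h_i in h.items():
        total = matvec(w_self, h_i)                      # self term W h_i
        for r, triples in edges.items():
            nbrs = [(j, s) for (u, j, s) in triples if u == i]
            c = sum(s for _, s in nbrs)                  # c_{i,r}
            if c == 0:
                continue                                 # no neighbors under r
            for j, s in nbrs:
                msg = matvec(w_rel[r], h[j])
                total = [t + (s / c) * m for t, m in zip(total, msg)]
        out[i] = [max(0.0, x) for x in total]            # ReLU
    return out


identity = [[1.0, 0.0], [0.0, 1.0]]
h0 = {"d1": [1.0, 0.0], "t1": [0.0, 1.0], "t2": [1.0, 1.0]}
toy_edges = {"drug-target": [("d1", "t1", 1.0), ("d1", "t2", 1.0)]}
h1 = gcn_layer(h0, toy_edges, {"drug-target": identity}, identity)
print(h1["d1"])
```

With identity weights, node `d1` receives the average of its two neighbors plus its own embedding, which makes the normalization by $c_{i,r}$ easy to check by hand.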
3.2. The third layer of the neural network combines a graph convolutional neural network with an attention mechanism. After the graph convolutional neural network produces the neighborhood information sum for each relation type, the attention mechanism is applied when computing the final node embedding, so that the degree to which the neighborhood information sum of each relation type influences the node embedding is reflected. The attention mechanism is expressed as:
[Figure BDA0002601348120000056]
[Figure BDA0002601348120000057]
[Figure BDA0002601348120000061]
where [Figure BDA0002601348120000062] is the neighborhood information sum of node $i$ for relation type $r$, which is processed by a neural network with shared weight $w$ and bias $b$ to obtain $s_{i,r}$. Then $s_{i,r}$ is processed with a softmax function to obtain the attention coefficient $\alpha_{i,r}$ of the neighborhood information sum of each relation type. Finally, the attention coefficient $\alpha_{i,r}$ is multiplied by the original embedding [Figure BDA0002601348120000063], i.e. the neighborhood information sum of node $i$ for relation type $r$, to obtain the attention-weighted neighborhood information sum of each relation type [Figure BDA0002601348120000064]. Following the graph convolutional neural network, the sum of the attention-weighted neighborhood information is multiplied by the number of different relation types, the mapping of the node's embedding from the previous layer is added, and the final node embedding is obtained after the activation function.
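The attention step in 3.2 can be sketched as follows. The sketch assumes a single linear scoring unit (`w`, `b`) shared across relation types and counts the relation types as those present for the node; both choices, and all names and toy values, are assumptions because the original equations are images.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def attention_combine(msgs: Dict[str, Vector], w: Vector, b: float,
                      self_term: Vector) -> Tuple[Dict[str, float], Vector]:
    """Weight per-relation neighborhood sums by softmax attention and combine."""
    # s_{i,r}: shared linear scoring of each relation's neighborhood sum.
    scores = {r: sum(a * x for a, x in zip(w, m)) + b for r, m in msgs.items()}
    # alpha_{i,r}: softmax over relation types (shifted by the max for stability).
    top = max(scores.values())
    exps = {r: math.exp(s - top) for r, s in scores.items()}
    z = sum(exps.values())
    alpha = {r: e / z for r, e in exps.items()}
    # Sum of attention-weighted neighborhood information.
    agg = [0.0] * len(self_term)
    for r, m in msgs.items():
        agg = [a + alpha[r] * x for a, x in zip(agg, m)]
    # Multiply by the number of relation types, add the self mapping, apply ReLU.
    n_rel = len(msgs)
    final = [max(0.0, n_rel * a + s) for a, s in zip(agg, self_term)]
    return alpha, final


toy_msgs = {"drug-target": [1.0, 0.0], "drug-disease": [1.0, 0.0]}
alpha, final = attention_combine(toy_msgs, w=[1.0, 1.0], b=0.0,
                                 self_term=[0.0, -5.0])
print(alpha, final)
```

When all relation types carry identical information, the coefficients are uniform and the result equals the plain unweighted sum, which is the behavior of the first two layers.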
3.3. After the final node embeddings are obtained in 3.2, a network topology reconstruction method is used to force the embeddings to capture the feature representation of the heterogeneous relationship network, yielding the final relocation network. The network topology reconstruction method is:
[Figure BDA0002601348120000065]
where $G_r$ and $H_r$ are the projection matrices of a specific edge type $r$, and $G_r = H_r$ if edge type $r$ is symmetric. The node features $E_u$ and $E_v$ are given edge-specific projections through $G_r$ and $H_r$ respectively, and the inner product of the two projected vectors is made to reconstruct the original edge value $s(e)$. The projection matrices are initialized from a Gaussian distribution.
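The reconstruction in 3.3 can be illustrated with a small sketch. The squared-error loss, the orientation of the projection matrices (rows map an embedding to the projected space), the standard deviation of the Gaussian initialization, and all names are assumptions made for illustration.

```python
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def gaussian_matrix(rows: int, cols: int, std: float = 0.1,
                    seed: int = 0) -> Matrix:
    """Initialize a projection matrix from a zero-mean Gaussian distribution."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


def project(p: Matrix, e: Vector) -> Vector:
    """Edge-type-specific projection of a node embedding."""
    return [sum(a * b for a, b in zip(row, e)) for row in p]


def edge_score(e_u: Vector, e_v: Vector, g_r: Matrix, h_r: Matrix) -> float:
    """Inner product of the two projected embeddings; reconstructs s(e)."""
    return sum(a * b for a, b in zip(project(g_r, e_u), project(h_r, e_v)))


def reconstruction_loss(edges: List[Tuple[str, str, float]],
                        emb: Dict[str, Vector],
                        g_r: Matrix, h_r: Matrix) -> float:
    """Sum of squared differences between edge values and their scores."""
    return sum((s - edge_score(emb[u], emb[v], g_r, h_r)) ** 2
               for u, v, s in edges)


identity = [[1.0, 0.0], [0.0, 1.0]]      # symmetric edge type: G_r = H_r
emb = {"u": [1.0, 2.0], "v": [3.0, 4.0]}
score = edge_score(emb["u"], emb["v"], identity, identity)
loss = reconstruction_loss([("u", "v", 1.0)], emb, identity, identity)
g_init = gaussian_matrix(3, 2)
print(score, loss)
```

Scoring every drug-target pair with `edge_score` after training is what produces the dense relocation network used in the fifth step.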
Fourth, training the model. The training set of the divided data set is fed into the drug-target relation prediction model constructed in the third step, which is based on the graph convolutional neural network and the attention mechanism. The Adam optimizer is then used to compute the update step and minimize the reconstruction error. Ten-fold cross-validation is performed 10 times, so that 100 models are trained in total, and the last trained model is used.
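To make the role of the Adam optimizer concrete, the sketch below applies a hand-written Adam update to a toy reconstruction error with one edge and two scalar projections. The hyperparameters are commonly used defaults rather than values stated in the text, and the toy problem and all names are assumptions.

```python
import math
from typing import Callable, List


def adam_minimize(grad_fn: Callable[[List[float]], List[float]],
                  params: List[float], lr: float = 0.01,
                  beta1: float = 0.9, beta2: float = 0.999,
                  eps: float = 1e-8, steps: int = 2000) -> List[float]:
    """Run Adam updates: bias-corrected first and second moment estimates."""
    params = list(params)
    m = [0.0] * len(params)
    v = [0.0] * len(params)
    for t in range(1, steps + 1):
        grads = grad_fn(params)
        for k, g in enumerate(grads):
            m[k] = beta1 * m[k] + (1.0 - beta1) * g
            v[k] = beta2 * v[k] + (1.0 - beta2) * g * g
            m_hat = m[k] / (1.0 - beta1 ** t)
            v_hat = v[k] / (1.0 - beta2 ** t)
            params[k] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params


# Toy reconstruction: one edge with value s(e) = 1, scalar embeddings E_u = E_v = 1,
# and scalar projections g, h to be learned.
S_E, E_U, E_V = 1.0, 1.0, 1.0


def loss(p: List[float]) -> float:
    g, h = p
    return (S_E - (g * E_U) * (h * E_V)) ** 2


def grad(p: List[float]) -> List[float]:
    g, h = p
    err = S_E - (g * E_U) * (h * E_V)
    return [-2.0 * err * E_U * h * E_V, -2.0 * err * g * E_U * E_V]


start = [0.5, 0.5]
fitted = adam_minimize(grad, start)
print(loss(start), loss(fitted))
```

In the full method the same kind of update would be applied to all network weights and projection matrices, with gradients obtained by backpropagation rather than by hand.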
Fifth, performing target prediction and drug discovery for the novel coronavirus:
5.1. The drug chloroquine, which can inhibit the coronavirus, is selected. According to the relationship strengths between chloroquine and all targets in the relocation network, the 2 targets with the greatest relationship strength are found, namely Toll-like receptor 9 and Tumor necrosis factor.
5.2. According to the relationship strength between Toll-like receptor 9 and the drugs in the relocation network, drugs that may inhibit the novel coronavirus, such as Imiquimod, Hydroxychloroquine, Lovastatin, and Ribavirin, are screened out, of which Hydroxychloroquine and Ribavirin have been shown to inhibit the novel coronavirus. According to the relationship strength between Tumor necrosis factor and the drugs in the relocation network, drugs that may inhibit the novel coronavirus, such as Thalidomide, Arsenic trioxide, Bortezomib, and Zonisamide, are screened out.
5.3. Targets associated with drugs that have been validated to inhibit novel coronavirus pneumonia are identified as targets related to novel coronavirus pneumonia; for drugs that have not yet been validated, the associated targets are priority targets for wet-experiment validation.
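The ranking in 5.1 and 5.2 amounts to sorting rows and columns of the relocation network. The sketch below shows this on a made-up score table; the names `drug_a`, `target_1`, and so on, and every score value, are hypothetical placeholders and not results from the patent.

```python
from typing import Dict, Iterable, List

Scores = Dict[str, Dict[str, float]]   # scores[drug][target] = relationship strength


def top_targets(scores: Scores, drug: str, k: int) -> List[str]:
    """Step 5.1: targets with the greatest relationship strength to one drug."""
    row = scores[drug]
    return sorted(row, key=row.get, reverse=True)[:k]


def top_drugs(scores: Scores, target: str, k: int,
              exclude: Iterable[str] = ()) -> List[str]:
    """Step 5.2: drugs with the greatest relationship strength to one target."""
    skip = set(exclude)
    column = {d: row[target] for d, row in scores.items()
              if target in row and d not in skip}
    return sorted(column, key=column.get, reverse=True)[:k]


# Made-up relocation-network scores for illustration only.
toy_scores: Scores = {
    "drug_a": {"target_1": 0.7, "target_2": 0.9, "target_3": 0.1},
    "drug_b": {"target_1": 0.2, "target_2": 0.8, "target_3": 0.3},
    "drug_c": {"target_1": 0.5, "target_2": 0.4, "target_3": 0.6},
}
targets = top_targets(toy_scores, "drug_a", k=2)
candidates = top_drugs(toy_scores, targets[0], k=2, exclude=["drug_a"])
print(targets, candidates)
```

Excluding the seed drug from the second ranking keeps the candidate list focused on drugs other than the one used to find the target.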
Finally, it should be noted that the above embodiments are only used for illustrating the technical solutions of the present invention, and do not limit the protection scope of the present invention. With reference to the description of the embodiment, those skilled in the art will understand and make modifications or substitutions related to the technical solution of the present invention without departing from the spirit and scope of the present invention.

Claims (1)

1. A novel coronavirus target prediction and drug discovery method based on graph representation learning, characterized by comprising the following steps:
first, preparing a heterogeneous network data set: constructing a heterogeneous relationship network comprising the relationships among drugs, targets, side effects, and diseases, wherein the drug-target interaction and drug-drug interaction networks are based on the DrugBank database, the target-target interaction network is based on the HPRD database, the drug-disease association and target-disease association networks are based on the CTD database, and the drug-side effect association network is based on the SIDER database; in constructing the heterogeneous network data set, only the objects common to the interaction networks are retained, and the processed interaction networks are regarded as one heterogeneous network and used as the data set of the model;
second, dividing the data set:
2.1. randomly sampling from all negative samples of the drug-target interaction network in the data set, the number of sampled negative samples being ten times the number of positive samples, and forming the data set for training the model from the negative samples sampled from the drug-target interaction network, all of its positive samples, and all positive and negative samples of the other interaction networks, wherein the nodes of a negative sample have no interaction relationship and the nodes of a positive sample have an interaction relationship;
2.2. dividing the data set processed in 2.1 into 10 mutually exclusive subsets of equal size, each obtained by random stratified sampling; then, each time, taking 9 of the subsets as the training set, randomly extracting 5% of the training set as the validation set, and taking the remaining subset as the test set, i.e. training and testing the model by ten-fold cross-validation;
2.3. repeating the ten-fold cross-validation 10 times;
third, constructing the model:
3.1. constructing a 3-layer stacked neural network, in which the first and second layers use a graph convolutional neural network to integrate the neighborhood information of the complex heterogeneous relationship network in the divided data set; given the layer-$l$ embedding of a node [Figure FDA0003467873700000011] and the embeddings of its neighbors [Figure FDA0003467873700000012], the features of the node at the next layer are expressed as:
[Figure FDA0003467873700000013]
wherein [Figure FDA0003467873700000014] denotes the set of neighbors of node $i$ under relation $r$, $c_{i,r}$ denotes the sum of the edge values between node $i$ and its neighbors of edge type $r$, $s(e)$ denotes the edge value between node $i$ and node $j$, $W_r^{(l)}$ and $W^{(l)}$ denote the weights of the graph convolutional neural network, and $\sigma$ denotes the ReLU activation function;
3.2. the third layer of the neural network combines a graph convolutional neural network with an attention mechanism: after the graph convolutional neural network produces the neighborhood information sum for each relation type, the attention mechanism reflects the degree to which the neighborhood information sum of each relation type influences the node embedding, the attention mechanism being expressed as:
[Figure FDA0003467873700000021]
[Figure FDA0003467873700000022]
[Figure FDA0003467873700000023]
wherein [Figure FDA0003467873700000024] is the neighborhood information sum of node $i$ for relation type $r$, which is processed by a neural network with shared weight $w$ and bias $b$ to obtain $s_{i,r}$; $s_{i,r}$ is then processed with a softmax function to obtain the attention coefficient $\alpha_{i,r}$ of the neighborhood information sum of each relation type; finally, the attention coefficient $\alpha_{i,r}$ is multiplied by the original neighborhood information sum of node $i$ for relation type $r$ [Figure FDA0003467873700000025] to obtain the attention-weighted neighborhood information sum of each relation type [Figure FDA0003467873700000026]; following the graph convolutional neural network, the sum of the attention-weighted neighborhood information is multiplied by the number of different relation types, the mapping of the node's embedding from the previous layer is added, and the final node embedding is obtained after the activation function;
3.3. after the final node embeddings are obtained in 3.2, a network topology reconstruction method is used to force the embeddings to capture the feature representation of the heterogeneous relationship network, yielding the final relocation network, the network topology reconstruction method being:
[Figure FDA0003467873700000027]
wherein $G_r$ and $H_r$ are the projection matrices of a specific edge type $r$, and $G_r = H_r$ if edge type $r$ is symmetric; the node features $E_u$ and $E_v$ are given edge-specific projections through $G_r$ and $H_r$ respectively, and the inner product of the two projected vectors is made to reconstruct the original edge value $s(e)$; the projection matrices are initialized from a Gaussian distribution;
fourth, training the model: inputting the training set of the divided data set into the drug-target relation prediction model constructed in the third step, which is based on the graph convolutional neural network and the attention mechanism; then using the Adam optimizer to compute the update step and minimize the reconstruction error; performing ten-fold cross-validation 10 times, so that 100 models are trained in total, and using the last trained model;
Step five, target prediction and drug discovery for the novel coronavirus:
5.1. A drug that inhibits coronaviruses is selected, and novel coronavirus targets are identified according to the strength of the relationship between that drug and all targets in the relocation network, wherein these targets may be targets that already have an interaction with the drug in the original network, or potential targets that have no interaction with the drug in the original network;
5.2. Drugs that inhibit the novel coronavirus are screened out according to the strength of the relationship, in the relocation network, between the targets obtained in step 5.1 and the drugs;
5.3. After a drug is confirmed by wet-lab experiment to inhibit novel coronavirus pneumonia (COVID-19), the targets associated with that drug are determined to be targets associated with novel coronavirus pneumonia.
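Steps 5.1 and 5.2 amount to ranking entries of the reconstructed drug-target scores. A minimal sketch, assuming the relocation network is available as a dictionary of predicted relationship strengths keyed by (drug, target), is given below. The identifiers are placeholders rather than real compounds or proteins, and scoring a drug by its strongest relationship to any selected target is an assumption, since the claims do not specify how strengths are combined.

```python
def rank_targets_for_drug(scores, known, drug, top_k=5):
    """Step 5.1: rank targets by predicted relationship strength to one drug.

    scores: dict mapping (drug, target) -> predicted strength
    known:  set of (drug, target) pairs present in the original network
    Returns (target, strength, already_known) tuples, strongest first.
    """
    rows = [
        (target, strength, (d, target) in known)
        for (d, target), strength in scores.items()
        if d == drug
    ]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top_k]


def rank_drugs_for_targets(scores, targets, top_k=5):
    """Step 5.2: rank drugs by their strongest relationship to the targets."""
    best = {}
    for (drug, target), strength in scores.items():
        if target in targets:
            best[drug] = max(best.get(drug, float("-inf")), strength)
    return sorted(best.items(), key=lambda item: item[1], reverse=True)[:top_k]


# Placeholder data for illustration only; not taken from the patent.
example_scores = {
    ("drug_A", "target_1"): 0.9,
    ("drug_A", "target_2"): 0.2,
    ("drug_A", "target_3"): 0.7,
    ("drug_B", "target_1"): 0.8,
    ("drug_B", "target_3"): 0.1,
    ("drug_C", "target_2"): 0.95,
}
example_known = {("drug_A", "target_1")}
```

The `already_known` flag separates targets that already interact with the drug in the original network from potential targets that the model newly proposes, which is the distinction drawn in step 5.1.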
CN202010725014.5A 2020-07-24 2020-07-24 Novel coronavirus target prediction and drug discovery method based on graph representation learning Active CN111916145B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010725014.5A CN111916145B (en) 2020-07-24 2020-07-24 Novel coronavirus target prediction and drug discovery method based on graph representation learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010725014.5A CN111916145B (en) 2020-07-24 2020-07-24 Novel coronavirus target prediction and drug discovery method based on graph representation learning

Publications (2)

Publication Number Publication Date
CN111916145A CN111916145A (en) 2020-11-10
CN111916145B true CN111916145B (en) 2022-03-11

Family

ID=73280763

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010725014.5A Active CN111916145B (en) 2020-07-24 2020-07-24 Novel coronavirus target prediction and drug discovery method based on graph representation learning

Country Status (1)

Country Link
CN (1) CN111916145B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112669990A (en) * 2020-12-07 2021-04-16 三峡大学 New drug use prediction method based on deep self-coding and self-adaptive fusion
CN112863634B (en) * 2021-01-12 2022-09-20 山东大学 Traditional Chinese medicine prescription recommendation method and system based on new crown protein heterogeneous network clustering
CN114765060B (en) * 2021-01-13 2023-12-08 四川大学 Multi-attention method for predicting drug target interactions
CN113421658B (en) * 2021-07-06 2023-06-16 西北工业大学 Drug-target interaction prediction method based on neighbor attention network
CN114023397B (en) * 2021-09-16 2024-05-10 平安科技(深圳)有限公司 Drug redirection model generation method and device, storage medium and computer equipment
CN114023464B (en) * 2021-11-08 2022-08-09 东北林业大学 Drug-target interaction prediction method based on supervised synergy map contrast learning
CN114049930B (en) * 2021-11-12 2024-07-16 东南大学 Traditional Chinese medicine prescription repositioning method based on heterogeneous network representation learning
CN114121181B (en) * 2021-11-12 2024-03-29 东南大学 Heterogeneous graph neural network traditional Chinese medicine target prediction method based on attention mechanism
CN114242186B (en) * 2021-12-30 2022-08-12 湖南大学 Chinese and western medicine relocation method and system fusing GHP and GCN and storage medium
CN114613452B (en) * 2022-03-08 2023-04-28 电子科技大学 Drug repositioning method and system based on drug classification graph neural network
CN114898879B (en) * 2022-05-10 2023-04-21 电子科技大学 Chronic disease risk prediction method based on graph representation learning
CN114974406B (en) * 2022-05-11 2023-04-14 中国人民解放军总医院 Training method, system, device and product of antiviral drug repositioning model
CN115620807B (en) * 2022-12-19 2023-05-23 粤港澳大湾区数字经济研究院(福田) Method for predicting interaction strength between target protein molecule and drug molecule
CN117708679B (en) * 2024-02-04 2024-04-26 西北工业大学 Drug screening method and device based on neural network

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107194203A (en) * 2017-06-09 2017-09-22 西安电子科技大学 Medicine method for relocating based on miRNA data and tissue specificity network
CN107506591B (en) * 2017-08-28 2020-06-02 中南大学 Medicine repositioning method based on multivariate information fusion and random walk model
CN108520166B (en) * 2018-03-26 2022-04-08 中山大学 Drug target prediction method based on multiple similarity network migration
US20190303535A1 (en) * 2018-04-03 2019-10-03 International Business Machines Corporation Interpretable bio-medical link prediction using deep neural representation
CN109887540A (en) * 2019-01-15 2019-06-14 中南大学 A kind of drug targets interaction prediction method based on heterogeneous network insertion
CN111081316A (en) * 2020-03-25 2020-04-28 元码基因科技(北京)股份有限公司 Method and device for screening new coronary pneumonia candidate drugs

Also Published As

Publication number Publication date
CN111916145A (en) 2020-11-10

Similar Documents

Publication Publication Date Title
CN111916145B (en) Novel coronavirus target prediction and drug discovery method based on graph representation learning
Ronoud et al. An evolutionary deep belief network extreme learning-based for breast cancer diagnosis
CN111860638B (en) Parallel intrusion detection method and system based on unbalanced data deep belief network
CN113327644A (en) Medicine-target interaction prediction method based on deep embedding learning of graph and sequence
Guha et al. Introducing clustering based population in binary gravitational search algorithm for feature selection
CN113299338B (en) Knowledge-graph-based synthetic lethal gene pair prediction method, system, terminal and medium
CN109712678A (en) Relationship Prediction method, apparatus and electronic equipment
Shi et al. Protein complex detection with semi-supervised learning in protein interaction networks
Rezaee et al. Deep learning‐based microarray cancer classification and ensemble gene selection approach
Li et al. Deep learning on high-throughput transcriptomics to predict drug-induced liver injury
Kumar et al. An upper approximation based community detection algorithm for complex networks
Ye et al. Molecular substructure graph attention network for molecular property identification in drug discovery
CN112652355A (en) Medicine-target relation prediction method based on deep forest and PU learning
Zhang et al. Large-scale community detection based on core node and layer-by-layer label propagation
CN114420201A (en) Method for predicting interaction of drug targets by efficient fusion of multi-source data
Khan et al. Cervical cancer diagnosis model using extreme gradient boosting and bioinspired firefly optimization
Lugo-Martinez et al. Classification in biological networks with hypergraphlet kernels
Omar et al. Improving the clustering performance of the k-means algorithm for non-linear clusters
Carissimo et al. Validation of community robustness
CN115019878A (en) Drug discovery method based on graph representation and deep learning
Pizzuti et al. Experimental evaluation of topological-based fitness functions to detect complexes in PPI networks
Mazaheri et al. Ranking loss and sequestering learning for reducing image search bias in histopathology
Almayyan Lymph diseases prediction using random forest and particle swarm optimization
Moosavi et al. Feature selection based on dataset variance optimization using hybrid sine cosine–firehawk algorithm (hscfha)
Wang et al. Predicting hepatoma-related genes based on representation learning of PPI network and gene ontology annotations

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant