CN115472305A - Method and system for predicting microorganism-drug association effect - Google Patents
- Publication number
- CN115472305A (application CN202210938454.8A)
- Authority
- CN
- China
- Prior art keywords
- drug
- microorganism
- attribute
- matrix
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H70/00—ICT specially adapted for the handling or processing of medical references
- G16H70/40—ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/70—Machine learning, data mining or chemometrics
Abstract
The invention discloses a method for predicting a microorganism-drug association effect, comprising the following steps: S1, constructing a microorganism-drug association network Net1 from a microorganism-drug association database; S2, constructing an interaction network Net2 from the microorganism-drug association database; S3, constructing a multi-modal attribute graph of the microorganisms and the drugs from the comprehensive similarity of the drugs, the topology of the drug network, the functional similarity of the microorganisms, and the genome sequences; S4, establishing a graph neural network model with regularization; S5, obtaining embedded representations Z1 and Z2, and inputting Z1 and Z2 into the graph neural network for training to obtain a trained graph neural network; and S6, acquiring a data set to be predicted, and predicting the association effect of the microorganisms and the drugs in that data set with the trained graph neural network. The invention addresses the inability of the prior art to construct interpretable node features of organisms and drugs, and it takes into account the sparsity of existing microorganism-drug association data sets.
Description
Technical Field
The invention relates to the technical field of bioinformatics, in particular to a method for predicting a microorganism-drug association effect.
Background
In recent years, research in the medical field has focused on the relationship between microbial community imbalance and drug efficacy and toxicity; however, a comprehensive understanding of the complex mechanisms by which microbial communities interact with drugs in the human body is still lacking. New drug development currently faces two major challenges. On the one hand, few new kinds of antibiotics are being discovered: most work has focused on optimizing or combining known compounds, target species are difficult to culture under laboratory conditions, and most drugs fail during experiments. On the other hand, the number of resistant bacteria is increasing at an alarming rate. A growing number of studies show that microorganisms and drugs interact closely, and some associations between microorganisms and drugs have been confirmed by culture experiments, but these are not enough to elucidate the complex interaction mechanisms between human microorganisms and drugs. There is therefore an urgent need for an efficient method to systematically explore possible associations between microorganisms and drugs.
Two types of computational methods currently exist for predicting the relationship of microorganisms to drugs.
The first category of methods focuses primarily on similarity measures, such as the HMDAKATZ method using KATZ measures, but such measures are too simple to adequately reflect similarity, resulting in inaccurate association identification.
The second category uses graph learning, which exploits the rich semantic information in graph-structured data and has better predictive power than the earlier similarity-metric methods. Two approaches to learning graph features are currently common: meta-paths and graph convolutional networks (GCN).
The meta-path approach mainly uses the edge information of microorganism-drug associations for prediction. It combines metapath2vec with neural-network-based recommendation to learn low-dimensional embedded representations of microorganisms and drugs. The approach does improve the predictive ability of the model, but it relies too heavily on edge information, so prediction fails when a new drug or a new microorganism is introduced and no edge information exists for it.
Compared with meta-paths, the GCN approach captures not only edge information but also node information, so the use of GCNs for predicting microorganism-drug associations has attracted great interest. Long et al. first applied a GCN encoder to microorganism-drug association prediction in GCNMDA and introduced a conditional random field into the GCN hidden layer. There is also a node-level GCN attention method, EGATMDA, which learns node (i.e., microorganism and drug) embeddings that preserve the target neighbors of the graph and only the relevant information. However, existing methods fail to construct node features that contain biological information.
In summary, existing methods for predicting microorganism-drug associations cannot construct rich, interpretable node features of organisms and drugs; how to devise a prediction method that can construct such features is therefore a technical problem urgently to be solved in the field.
Disclosure of Invention
The invention provides a method for predicting a microorganism-drug association effect. It aims to solve the problem that the prior art cannot construct interpretable node features of organisms and drugs, and it takes into account the sparsity of existing microorganism-drug association data sets.
In order to realize the purpose of the invention, the technical scheme is as follows:
A method for predicting a microorganism-drug association effect, comprising the following steps:
S1. constructing a microorganism-drug association network from a microorganism-drug association database, the association network being called Net1;
S2. retrieving microorganism-microorganism interactions from the microorganism database in the microorganism-drug association database, and retrieving drug-drug interactions from the drug database in the microorganism-drug association database; constructing an interaction network from the microorganism-microorganism interactions and the drug-drug interactions, the interaction network being called Net2;
S3. constructing a topological attribute network of the drugs from the drug database, and constructing microbial gene sequences from the microorganism database; constructing a microorganism-drug multi-modal attribute graph from the comprehensive similarity attribute and the drug network topology attribute of the drugs in the drug database, and the functional similarity attribute and the genome sequence attribute of the microorganisms in the microorganism database;
S4. establishing a graph neural network model with regularization according to Net1, Net2 and the microorganism-drug multi-modal attribute graph;
S5. inputting Net1 and Net2, in combination with the microorganism-drug multi-modal attribute graph, into the graph neural network model to obtain embedded representations Z1 and Z2; inputting the embedded representations Z1 and Z2 into the graph neural network for training to obtain a trained graph neural network;
S6. acquiring a data set to be predicted, and predicting the association effect of the microorganisms and the drugs in that data set with the trained graph neural network.
The invention constructs the microorganism-drug association network Net1 and the interaction network Net2 from a microorganism-drug association database; establishes a graph neural network model with regularization; inputs Net1 and Net2, in combination with the microorganism-drug multi-modal attribute graph, into the graph neural network model to obtain embedded representations Z1 and Z2; and inputs Z1 and Z2 into the graph neural network for training to obtain a trained graph neural network. Interpretable node features of organisms and drugs are thereby constructed, and the sparsity of existing microorganism-drug association data sets is taken into account.
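As an illustration of step S1, the following minimal sketch stores Net1 as a binary drug-by-microorganism matrix and measures how sparse it is. The function name `build_association_matrix`, the index-pair input format, and the toy data are assumptions made for illustration; the source does not specify how the database records are represented.

```python
import numpy as np

def build_association_matrix(pairs, n_drugs, n_microbes):
    """Return an n_drugs x n_microbes binary matrix with 1 for each known pair."""
    A = np.zeros((n_drugs, n_microbes), dtype=np.int8)
    for drug_idx, microbe_idx in pairs:
        A[drug_idx, microbe_idx] = 1
    return A

# Hypothetical toy data: 3 drugs, 4 microorganisms, 4 known associations.
known_pairs = [(0, 1), (0, 3), (1, 0), (2, 2)]
A = build_association_matrix(known_pairs, n_drugs=3, n_microbes=4)

# Fraction of drug-microorganism pairs with no known association.
sparsity = 1.0 - A.sum() / A.size
```

Real association data sets are typically far sparser than this toy matrix, which is the sparsity issue the method sets out to handle.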
Preferably, in step S3, the topological attribute network of the drugs is constructed from the drug database, and the microbial gene sequences are constructed from the microorganism database; the microorganism-drug multi-modal attribute graph is constructed from the comprehensive similarity attribute and the drug network topology attribute of the drugs in the drug database, and the functional similarity attribute and the genome sequence attribute of the microorganisms in the microorganism database, specifically as follows:
S301. constructing a similarity feature matrix of the drugs according to the drug similarity attributes in the drug database, and constructing the topological attribute network of the drugs from the drug database, thereby obtaining a second attribute feature matrix of the drugs;
S302. constructing a similarity feature matrix of the microorganisms according to the functional similarity attributes of the microorganisms in the microorganism database, and constructing the microbial gene sequences from the microorganism database, thereby obtaining a second attribute feature matrix of the microorganisms;
S303. constructing a microorganism-drug similarity feature network from the similarity feature matrix of the drugs and the similarity feature matrix of the microorganisms;
S304. constructing a microorganism-drug second attribute feature network from the second attribute feature matrix of the drugs and the second attribute feature matrix of the microorganisms;
S305. combining the microorganism-drug similarity feature network with the microorganism-drug second attribute feature network to obtain the microorganism-drug multi-modal attribute graph.
Further, in step S301, the similarity feature matrix of the drugs is constructed according to the drug similarity attributes in the drug database, and the topological attribute network of the drugs is constructed from the drug database to obtain the second attribute feature matrix of the drugs, specifically as follows:
A1. calculating the similarity attributes of the drugs in the drug database with the SIMCOMP2 tool to obtain the molecular structure similarity matrix of the drugs, $DS_{struct}(d_i, d_j)$;
A2. representing the drug-drug interaction profile in Net2 by the matrix DIP, and obtaining the normalized kernel bandwidth:
$$\mu = \mu' \Big/ \left( \frac{1}{nd} \sum_{i=1}^{nd} \lVert DIP(d_i) \rVert^2 \right)$$
where $\mu$ denotes the normalized kernel bandwidth; $\mu'$ is the original bandwidth, set to 1; $DIP(d_i)$ denotes the interaction profile of drug $d_i$ with the other drugs; and $nd$ denotes the number of drugs in Net1;
A3. expressing the similarity feature matrix of the drugs as $S_d(d_i, d_j)$ [formula missing in source];
A4. constructing the drug network topology attribute from the drug database by random walk with restart: the walk and restart are iterated on the drug network until convergence, yielding a probability distribution vector for each drug, from which the second attribute feature matrix of the drugs, $F_d \in R^{nd \times nd}$, is constructed.
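The bandwidth normalization in step A2 can be sketched as follows. The text only defines the bandwidth; applying it inside a Gaussian kernel over the interaction profiles is an assumption here (it is the usual companion of this normalization), and the function name `gip_kernel` and the toy matrix are hypothetical.

```python
import numpy as np

def gip_kernel(profiles, mu_prime=1.0):
    """Gaussian kernel over the rows of `profiles` with a normalized bandwidth."""
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = (profiles ** 2).sum(axis=1)
    mean_sq_norm = sq_norms.mean()
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    # Normalized bandwidth: mu = mu' / ((1/nd) * sum_i ||DIP(d_i)||^2).
    mu = mu_prime / mean_sq_norm
    # Pairwise squared Euclidean distances between profiles.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-mu * sq_dists)

# Hypothetical toy drug-drug interaction matrix (row i is DIP(d_i)).
DIP = np.array([[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]])
K = gip_kernel(DIP)
```

Drugs 1 and 2 have identical profiles in this toy matrix, so their kernel similarity is exactly 1.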
Further, in step A4, the random walk with restart is given by:
$$p_i^{(t+1)} = (1-\theta)\, T\, p_i^{(t)} + \theta\, p_i^{(0)}$$
where $p_i^{(t+1)} \in R^{n \times 1}$ represents the probability that the i-th node of the drug network moves to the other nodes at time $t+1$; $\theta$ is the restart probability; $T$ is the transition probability matrix; $p_i^{(0)} \in R^{n \times 1}$ is the starting probability vector of the i-th node of the drug network; and $p_i^{(t)} \in R^{n \times 1}$ represents the probability that the i-th node of the drug network moves to the other nodes at time $t$.
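A minimal sketch of random walk with restart on a drug network is given below. It assumes a column-normalized transition matrix, one-hot starting vectors, and a stopping tolerance, none of which the text specifies; the function name and the toy network are hypothetical.

```python
import numpy as np

def random_walk_with_restart(adjacency, theta=0.5, tol=1e-10, max_iter=1000):
    """Return a matrix whose row i is the converged distribution for node i."""
    adjacency = np.asarray(adjacency, dtype=float)
    n = adjacency.shape[0]
    col_sums = adjacency.sum(axis=0)
    col_sums[col_sums == 0] = 1.0       # isolated nodes keep an all-zero column
    T = adjacency / col_sums            # column-normalized transition matrix
    P0 = np.eye(n)                      # column i is the one-hot start vector p_i(0)
    P = P0.copy()
    for _ in range(max_iter):
        P_next = (1.0 - theta) * (T @ P) + theta * P0
        converged = np.abs(P_next - P).max() < tol
        P = P_next
        if converged:
            break
    return P.T

# Hypothetical toy drug network: a path d0 - d1 - d2.
drug_network = np.array([[0, 1, 0],
                         [1, 0, 1],
                         [0, 1, 0]])
F_d = random_walk_with_restart(drug_network, theta=0.5)
```

All starting vectors are iterated at once by stacking them as the columns of one matrix, so the result has the nd x nd shape given for the second attribute feature matrix of the drugs.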
Further, in step S302, the similarity feature matrix of the microorganisms is constructed according to the functional similarity attributes of the microorganisms in the microorganism database, and the microbial gene sequences are constructed from the microorganism database to obtain the second attribute feature matrix of the microorganisms, specifically as follows:
B1. calculating the functional similarity attributes of the microorganisms in the microorganism database with the Kamneva tool to obtain the similarity feature matrix of the microorganisms, $S_m \in R^{nm \times nm}$, where $nm$ denotes the number of microorganisms in Net1; the similarity between microorganism $m_i$ and microorganism $m_j$ is denoted $S_m(m_i, m_j)$;
B2. encoding the original gene sequences of the microbial data in the microorganism database to obtain the microbial gene sequences;
B3. padding all encoded microbial gene sequences with zeros so that all padded sequences have the same length;
B4. analyzing all padded microbial gene sequences by principal component analysis to obtain a k-dimensional matrix, which is taken as the second attribute feature matrix of the microorganisms, $F_m \in R^{nm \times k}$.
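Steps B2 to B4 can be sketched as follows. The integer nucleotide encoding, the SVD-based principal component analysis, and the toy sequences are assumptions for illustration; the text does not state which encoding or PCA implementation is used.

```python
import numpy as np

# Assumed integer encoding; 0 is reserved for zero-padding.
NUCLEOTIDE_CODES = {"A": 1, "C": 2, "G": 3, "T": 4}

def encode_and_pad(sequences):
    """Encode each sequence as integers and right-pad with zeros to equal length."""
    max_len = max(len(seq) for seq in sequences)
    encoded = np.zeros((len(sequences), max_len))
    for row, seq in enumerate(sequences):
        encoded[row, :len(seq)] = [NUCLEOTIDE_CODES[base] for base in seq]
    return encoded

def pca_reduce(matrix, k):
    """Project the rows of `matrix` onto its first k principal components."""
    centered = matrix - matrix.mean(axis=0)
    _, _, components = np.linalg.svd(centered, full_matrices=False)
    return centered @ components[:k].T

# Hypothetical toy gene sequences for four microorganisms.
sequences = ["ACGT", "AC", "GGTCA", "TTA"]
padded = encode_and_pad(sequences)
F_m = pca_reduce(padded, k=2)   # shape (nm, k)
```

Real gene sequences are far longer than these toy strings; the reduction to k dimensions is what keeps the feature matrix manageable.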
Further, in steps S303 to S305, the feature networks are constructed and combined, specifically as follows:
C1. constructing the microorganism-drug similarity feature network $X_{simility}$ from the similarity feature matrix of the drugs and the similarity feature matrix of the microorganisms [formula missing in source];
C2. constructing the microorganism-drug second attribute feature network $X_{secondary}$ from the second attribute feature matrix of the drugs and the second attribute feature matrix of the microorganisms [formula missing in source];
C3. combining the microorganism-drug similarity feature network and the microorganism-drug second attribute feature network to obtain the microorganism-drug multi-modal attribute graph X:
$$X = [X_{simility}, X_{secondary}].$$
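The combination in step C3 is a column-wise concatenation, sketched below. The block-diagonal layout used here for the two feature networks (drug rows first, then microorganism rows) is only an assumption, since the corresponding formulas are not available in the text; the helper `block_diagonal` and the placeholder matrices are hypothetical.

```python
import numpy as np

def block_diagonal(top_left, bottom_right):
    """Place two matrices on the diagonal of a zero matrix."""
    r1, c1 = top_left.shape
    r2, c2 = bottom_right.shape
    out = np.zeros((r1 + r2, c1 + c2))
    out[:r1, :c1] = top_left
    out[r1:, c1:] = bottom_right
    return out

# Hypothetical sizes; k == nm is chosen so that the feature dimension is 2 * (nd + nm).
nd, nm, k = 2, 3, 3
rng = np.random.default_rng(0)
S_d, S_m = np.eye(nd), np.eye(nm)                  # placeholder similarity matrices
F_d, F_m = rng.random((nd, nd)), rng.random((nm, k))  # placeholder second attributes

X_simility = block_diagonal(S_d, S_m)    # assumed layout, shape (nd + nm, nd + nm)
X_secondary = block_diagonal(F_d, F_m)   # assumed layout, shape (nd + nm, nd + k)
X = np.hstack([X_simility, X_secondary])  # X = [X_simility, X_secondary]
```

Under this assumed layout each of the nd + nm nodes gets one row of X, with its similarity features followed by its second attribute features.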
Furthermore, in step S4, the regularized graph neural network model is established according to Net1, Net2 and the microorganism-drug multi-modal attribute graph, specifically as follows:
S401. establishing a microorganism-drug feature matrix from the microorganism-drug multi-modal attribute graph and constructing a microorganism-drug heterogeneous matrix, in which $v_i$ denotes any node (a microorganism or a drug); the heterogeneous matrix is expressed as [formula missing in source], where Y is the microorganism-drug feature matrix and each node $v_i$ has a content feature vector;
S402. setting a learnable matrix $W \in R^{m \times f}$ and initializing its elements with random numbers, where $f$ is the dimension of the node embedding representation, set as a hyper-parameter, $n = nd + nm$ is the number of nodes, and $m = 2 \times (nd + nm)$ is the feature dimension of the nodes; a feature-transformed vector is generated from the content feature vector of each node and W;
S403. setting a scaling constant $s \in R$, which represents the norm of the propagated hidden features, and generating normalized feature-transformed vectors in the GNCN network of the regularized graph neural network model;
S404. applying the L2 regularization function $g(\cdot)$ [formula missing in source];
S405. encoding the microorganism-drug association network and the microorganism-drug multi-modal attribute graph with a GNCN encoder [formula missing in source], where $A \in R^{nd \times nm}$ is the adjacency matrix of the association network of step S1: if a known association exists between nodes i and j in the association network, the element $A_{ij}$ of A is set to 1, otherwise to 0; $\tilde{A} = A + I_N$, where $I_N$ is the identity matrix of order N, and $\tilde{D}$ is the degree matrix of $\tilde{A}$.
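One plausible reading of steps S402 to S405 is sketched below: features are transformed by W, each node's hidden vector is rescaled to L2 norm s, and the result is propagated over the symmetrically normalized adjacency with self-loops. The order of these operations, the expansion of the nd x nm association matrix into a square (nd + nm) x (nd + nm) adjacency, and all function names are assumptions; the encoder formula itself is not available in the text.

```python
import numpy as np

def normalized_adjacency(A_bipartite):
    """Build D^-1/2 (A + I) D^-1/2 from an nd x nm association matrix."""
    nd, nm = A_bipartite.shape
    n = nd + nm
    A_full = np.zeros((n, n))
    A_full[:nd, nd:] = A_bipartite        # drug -> microorganism edges
    A_full[nd:, :nd] = A_bipartite.T      # microorganism -> drug edges
    A_tilde = A_full + np.eye(n)          # add self-loops
    deg = A_tilde.sum(axis=1)             # diagonal of the degree matrix
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]

def l2_normalize_rows(H, s=1.0, eps=1e-12):
    """Rescale every row of H to have L2 norm s."""
    norms = np.linalg.norm(H, axis=1, keepdims=True)
    return s * H / np.maximum(norms, eps)

def gncn_layer(X, A_hat, W, s=1.0):
    """Feature transform, row normalization with scale s, then propagation."""
    H = l2_normalize_rows(X @ W, s)
    return A_hat @ H

# Hypothetical toy data: 3 drugs, 2 microorganisms, embedding size f = 4.
A_bipartite = np.array([[1, 0], [0, 1], [1, 1]])
rng = np.random.default_rng(0)
X = rng.random((5, 10))                    # n = 5 nodes, m = 2 * (nd + nm) = 10
W = 0.1 * rng.standard_normal((10, 4))     # randomly initialized learnable matrix
A_hat = normalized_adjacency(A_bipartite)
Z = gncn_layer(X, A_hat, W, s=1.8)
```

Fixing the norm of the hidden vectors before propagation keeps their scale independent of the raw feature magnitudes, which is the role the text assigns to the constant s.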
Furthermore, in step S5, Net1 and Net2 are input, in combination with the microorganism-drug multi-modal attribute graph, into the graph neural network model to obtain the embedded representations Z1 and Z2, and Z1 and Z2 are input into the graph neural network for training to obtain the trained graph neural network, specifically as follows:
S501. [step text and formula missing in source], where the two unit vectors correspond to nodes i and j in the matrix, $deg_i$ is the degree of node i, and $deg_j$ is the degree of node j;
S502. generating a node embedding matrix from the node embedding vectors, and generating the latent variable $Z \in R^{n \times f}$ of the GNCN encoder, thereby obtaining Z1 corresponding to Net1 and Z2 corresponding to Net2:
$$Z_i = GNCN(X, A, s);$$
S503. defining a loss function, which is the binary cross entropy between the multi-modal attribute graph and the reconstructed graph produced by the graph neural network during training:
$$L = -\frac{1}{N} \sum \left[ y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \right]$$
where L is the loss function; N is the total number of nodes; y is the value, 0 or 1, of an element of the adjacency matrix A; and $\hat{y}$ is the value, between 0 and 1, of the corresponding element of the reconstructed adjacency matrix $\hat{A}$;
S504. inputting Z1 and Z2 into the DNN classifier of the graph neural network model, setting the number of training epochs to k2, training by stochastic gradient descent, and stopping training when the loss function converges, to obtain the trained graph neural network.
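The loss of step S503 can be illustrated with a short binary cross-entropy sketch. Averaging over the compared matrix entries and clipping the predictions away from 0 and 1 are assumptions made for numerical convenience; the toy matrices are hypothetical.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross entropy between 0/1 targets and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    # Clip so that log(0) never occurs for predictions of exactly 0 or 1.
    y_pred = np.clip(np.asarray(y_pred, dtype=float).ravel(), eps, 1.0 - eps)
    losses = y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred)
    return float(-np.mean(losses))

# Hypothetical adjacency entries and two candidate reconstructions.
A = np.array([[1, 0], [0, 1]])
A_hat_good = np.array([[0.9, 0.1], [0.2, 0.8]])
A_hat_uninformative = np.full((2, 2), 0.5)

loss_good = binary_cross_entropy(A, A_hat_good)
loss_uninformative = binary_cross_entropy(A, A_hat_uninformative)
```

A reconstruction that predicts 0.5 everywhere has a loss of log 2, so any useful model should score below that value.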
Furthermore, in step S5, after the graph neural network model is trained, it is verified, specifically as follows:
D1. introducing a k-fold cross-validation framework, under which all known microorganism-drug associations in the existing microorganism-drug association database are randomly divided into k1 groups; in each of the k1 groups, a randomly sampled subset of unknown association pairs of the same size is selected for the test set, and the remaining known association pairs form the training set;
D2. inputting the test set into the trained graph neural network model to obtain a classification result;
D3. if the classification result is positive, predicting that the microorganism is associated with the drug; if the classification result is negative, predicting that the microorganism is not associated with the drug;
D4. obtaining the AUC value of the trained graph neural network model from the classification result, and verifying the accuracy of the graph neural network model according to the AUC value.
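The split in step D1 can be sketched as below. It assumes that each test fold consists of the held-out known pairs plus an equally sized random sample of unknown pairs, which is one reading of the text; the generator name, the seed handling, and the toy matrix are hypothetical.

```python
import numpy as np

def k_fold_splits(A, k1=5, seed=0):
    """Yield (train_pos, test_pos, test_neg) index arrays for each of k1 folds."""
    rng = np.random.default_rng(seed)
    positives = np.argwhere(A == 1)            # known association pairs
    negatives = np.argwhere(A == 0)            # unknown pairs
    positives = positives[rng.permutation(len(positives))]
    folds = np.array_split(positives, k1)
    for i, test_pos in enumerate(folds):
        train_pos = np.concatenate([f for j, f in enumerate(folds) if j != i])
        # Sample as many unknown pairs as there are held-out known pairs.
        picked = rng.choice(len(negatives), size=len(test_pos), replace=False)
        yield train_pos, test_pos, negatives[picked]

# Hypothetical toy association matrix with 10 known and 10 unknown pairs.
A = (np.arange(20).reshape(4, 5) % 2 == 0).astype(int)
splits = list(k_fold_splits(A, k1=5))
```

Sampling as many unknown pairs as held-out known pairs keeps each test fold balanced, which matters because unknown pairs vastly outnumber known ones in real data.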
Further, in step D4, the AUC value of the trained graph neural network model is obtained from the classification result, specifically as follows:
E1. inputting the training set into the model to obtain the model's reconstructed graph of the training set, and recording the scores of the edges between nodes in the reconstructed graph as association probabilities, each taking a value between 0 and 1;
E2. taking each association probability in turn as the classification threshold: samples whose association probability is greater than the threshold are regarded as positive samples, and samples whose association probability is smaller than the threshold are regarded as negative samples;
E3. obtaining the ground-truth label of each edge in the training set from the microorganism-drug associations in the training set; the label is 0 or 1, where 0 means that the edge does not exist (no association, an actual negative sample) and 1 means that the edge exists (an association, an actual positive sample);
E4. computing the true positive rate and the false positive rate under each classification threshold:
$$TPRate = \frac{TP}{TP + FN}, \qquad FPRate = \frac{FP}{FP + TN}$$
where TPRate is the true positive rate and FPRate is the false positive rate; TP (true positives) is the number of actual positive samples predicted as positive; FN (false negatives) is the number of actual positive samples predicted as negative; FP (false positives) is the number of actual negative samples predicted as positive; and TN (true negatives) is the number of actual negative samples predicted as negative;
E5. plotting the ROC curve with FPRate on the horizontal axis and TPRate on the vertical axis, and calculating the area under the ROC curve, i.e. the AUC value, by the infinitesimal-element (numerical integration) method.
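Steps E2 to E5 can be sketched with a threshold sweep followed by trapezoidal integration. Using the trapezoidal rule as the numerical integration and counting a sample whose score equals the threshold as positive are assumptions; the function name and the toy labels and scores are hypothetical.

```python
import numpy as np

def roc_auc(labels, scores):
    """Return (fpr, tpr, auc) from a sweep over every distinct score as threshold."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    n_pos = np.sum(labels == 1)
    n_neg = np.sum(labels == 0)
    fpr, tpr = [0.0], [0.0]
    for threshold in np.unique(scores)[::-1]:      # highest threshold first
        predicted_pos = scores >= threshold
        tp = np.sum(predicted_pos & (labels == 1))
        fp = np.sum(predicted_pos & (labels == 0))
        tpr.append(tp / n_pos)                     # TP / (TP + FN)
        fpr.append(fp / n_neg)                     # FP / (FP + TN)
    fpr, tpr = np.array(fpr), np.array(tpr)
    # Trapezoidal rule: sum of strip widths times mean strip heights.
    auc = float(np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2.0))
    return fpr, tpr, auc

# Hypothetical ground-truth labels and predicted association probabilities.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
fpr, tpr, auc = roc_auc(labels, scores)
```

In this toy case three of the four positive-negative pairs are ranked correctly, which matches the resulting AUC of 0.75.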
The invention has the following beneficial effects:
The invention constructs the microorganism-drug association network Net1 and the interaction network Net2 from a microorganism-drug association database; establishes a graph neural network model with regularization; inputs Net1 and Net2, in combination with the microorganism-drug multi-modal attribute graph, into the graph neural network model to obtain embedded representations Z1 and Z2; and inputs Z1 and Z2 into the graph neural network for training to obtain a trained graph neural network. Interpretable node features of organisms and drugs are thereby constructed, and the sparsity of existing microorganism-drug association data sets is taken into account.
Drawings
FIG. 1 is a schematic flow diagram of a method of predicting a microorganism-drug association of the present invention.
FIG. 2 is a schematic flow chart of the construction of the multi-modal attribute graph in the method for predicting a microorganism-drug association effect.
FIG. 3 is a schematic flow chart of the method for predicting the association probability of a microorganism-drug association according to the present invention.
Detailed Description
The invention is described in detail below with reference to the drawings and the detailed description.
Example 1
As shown in FIG. 1, a method for predicting a microorganism-drug association effect includes the following steps:
S1. constructing a microorganism-drug association network from a microorganism-drug association database, the association network being called Net1;
S2. retrieving microorganism-microorganism interactions from the microorganism database in the microorganism-drug association database, and retrieving drug-drug interactions from the drug database in the microorganism-drug association database; constructing an interaction network from the microorganism-microorganism interactions and the drug-drug interactions, the interaction network being called Net2;
S3. constructing a topological attribute network of the drugs from the drug database, and constructing microbial gene sequences from the microorganism database; constructing a microorganism-drug multi-modal attribute graph from the comprehensive similarity attribute and the drug network topology attribute of the drugs in the drug database, and the functional similarity attribute and the genome sequence attribute of the microorganisms in the microorganism database;
S4. establishing a graph neural network model with regularization according to Net1, Net2 and the microorganism-drug multi-modal attribute graph;
S5. inputting Net1 and Net2, in combination with the microorganism-drug multi-modal attribute graph, into the graph neural network model to obtain embedded representations Z1 and Z2; inputting the embedded representations Z1 and Z2 into the graph neural network for training to obtain a trained graph neural network;
S6. acquiring a data set to be predicted, and predicting the association effect of the microorganisms and the drugs in that data set with the trained graph neural network.
The invention constructs the microorganism-drug association network Net1 and the interaction network Net2 from a microorganism-drug association database; establishes a graph neural network model with regularization; inputs Net1 and Net2, in combination with the microorganism-drug multi-modal attribute graph, into the graph neural network model to obtain embedded representations Z1 and Z2; and inputs Z1 and Z2 into the graph neural network for training to obtain a trained graph neural network. Interpretable node features of organisms and drugs are thereby constructed, and the sparsity of existing microorganism-drug association data sets is taken into account.
Example 2
Specifically, as shown in FIG. 2, in a specific embodiment, in step S3, the topological attribute network of the drugs is constructed from the drug database, and the microbial gene sequences are constructed from the microorganism database; the microorganism-drug multi-modal attribute graph is constructed from the comprehensive similarity attribute and the drug network topology attribute of the drugs in the drug database, and the functional similarity attribute and the genome sequence attribute of the microorganisms in the microorganism database, specifically as follows:
S301. constructing a similarity feature matrix of the drugs according to the drug similarity attributes in the drug database, and constructing the topological attribute network of the drugs from the drug database, thereby obtaining a second attribute feature matrix of the drugs;
S302. constructing a similarity feature matrix of the microorganisms according to the functional similarity attributes of the microorganisms in the microorganism database, and constructing the microbial gene sequences from the microorganism database, thereby obtaining a second attribute feature matrix of the microorganisms;
S303. constructing a microorganism-drug similarity feature network from the similarity feature matrix of the drugs and the similarity feature matrix of the microorganisms;
S304. constructing a microorganism-drug second attribute feature network from the second attribute feature matrix of the drugs and the second attribute feature matrix of the microorganisms;
S305. combining the microorganism-drug similarity feature network with the microorganism-drug second attribute feature network to obtain the microorganism-drug multi-modal attribute graph.
In a specific embodiment, step S301 (constructing the similarity feature matrix of the drugs according to the drug similarity attributes in the drug database, and constructing the topological attribute network of the drugs through the drug database, thereby obtaining the second attribute feature matrix of the drugs) specifically comprises:
A1. Calculating the similarity attribute of the drugs in the drug database using the SIMCOMP2 tool to obtain the molecular structure similarity matrix $DS_{struct}(d_i, d_j)$ of the drugs;
A2. Representing the drug-drug interaction profile in Net2 by a matrix DIP, and obtaining the normalized kernel bandwidth:
where μ denotes the normalized kernel bandwidth; μ' is the original bandwidth, set to 1; $DIP(d_i)$ denotes the interaction profile of drug $d_i$ with the other drugs; and nd denotes the number of drugs in Net1;
A3. Expressing the similarity feature matrix of the drugs as $S_d(d_i, d_j)$:
A4. Constructing the topological attribute of the drug network in the drug database by the random walk with restart method: the random walk with restart is run on the drug network until convergence, which completes the construction of the drug network and yields the probability distribution vector of each drug, from which the second attribute feature matrix $F_d \in R^{nd \times nd}$ of the drugs is constructed.
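As a non-limiting illustration of steps A2 and A3, the following Python sketch computes a Gaussian interaction profile kernel from the matrix DIP. It assumes the commonly used form in which the normalized bandwidth μ equals μ' divided by the mean squared norm of the interaction profiles, and the kernel value is exp(-μ‖DIP(d_i) - DIP(d_j)‖²). The function names `gip_kernel` and `combine_similarity`, and the fusion rule that averages the structural and kernel similarities wherever a structural similarity is available, are assumptions made for illustration and are not taken from the formulas of this embodiment.

```python
import numpy as np

def gip_kernel(dip, mu_prime=1.0):
    """Gaussian interaction profile kernel over the rows of `dip` (assumed form)."""
    dip = np.asarray(dip, dtype=float)
    # Normalized bandwidth: mu' divided by the mean squared norm of the profiles.
    mean_sq_norm = np.mean(np.sum(dip ** 2, axis=1))
    mu = mu_prime / mean_sq_norm
    # Pairwise squared Euclidean distances between interaction profiles.
    diff = dip[:, None, :] - dip[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)
    return np.exp(-mu * sq_dist)

def combine_similarity(ds_struct, ds_gip):
    """Hypothetical fusion rule: average where a structural similarity exists, else use the kernel."""
    ds_struct = np.asarray(ds_struct, dtype=float)
    return np.where(ds_struct != 0, (ds_struct + ds_gip) / 2.0, ds_gip)

# Toy interaction profiles for three drugs (rows); drugs 0 and 1 are identical.
dip = np.array([[1, 0, 1], [1, 0, 1], [0, 1, 0]])
ds_gip = gip_kernel(dip)
ds_struct = np.array([[1.0, 0.4, 0.0], [0.4, 1.0, 0.0], [0.0, 0.0, 1.0]])
S_d = combine_similarity(ds_struct, ds_gip)
```

In this toy input, drugs with identical interaction profiles receive a kernel similarity of 1, and the value decays as the profiles diverge.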
In a specific embodiment, in step A4, the formula of the random walk with restart is:
wherein $p_i^{(t+1)} \in R^{n \times 1}$ denotes the probability that the i-th node of the drug network moves to the other nodes at time t+1, θ is the restart probability, T is the transition probability matrix, $p_i^{(0)} \in R^{n \times 1}$ denotes the starting probability vector of the i-th node of the drug network, and $p_i^{(t)} \in R^{n \times 1}$ denotes the probability that the i-th node of the drug network moves to the other nodes at time t.
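As a non-limiting illustration of step A4, the following Python sketch iterates a random walk with restart until convergence. It assumes the standard update p_i^(t+1) = (1 - θ)·T·p_i^(t) + θ·p_i^(0), a column-normalized transition matrix T, and a one-hot starting vector for each drug. The function name, the default θ = 0.5 and the convergence tolerance are hypothetical choices, not values stated in this embodiment.

```python
import numpy as np

def random_walk_with_restart(adj, theta=0.5, tol=1e-10, max_iter=1000):
    """Return a matrix whose row i is the converged distribution started from node i."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    # Column-normalize so each column of T sums to 1 (isolated nodes keep a zero column).
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0
    T = adj / col_sums
    P0 = np.eye(n)   # column i is the one-hot start vector p_i^(0)
    P = P0.copy()
    for _ in range(max_iter):
        # Assumed update: walk with probability 1 - theta, restart with probability theta.
        P_next = (1.0 - theta) * (T @ P) + theta * P0
        converged = np.abs(P_next - P).max() < tol
        P = P_next
        if converged:
            break
    return P.T

# Toy drug network: a path of three drugs, 0 - 1 - 2.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
F_d = random_walk_with_restart(adj, theta=0.5)
```

Stacking the converged vectors row by row gives an nd × nd matrix, which matches the stated shape of the second attribute feature matrix of the drugs.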
In a specific embodiment, step S302 (constructing the similarity feature matrix of the microorganisms according to the functional similarity attributes of the microorganisms in the microorganism database, and constructing the microbial gene sequences through the microorganism database, thereby obtaining the second attribute feature matrix of the microorganisms) specifically comprises:
B1. Calculating the functional similarity attribute of the microorganisms in the microorganism database using the Kamneva tool to obtain the similarity feature matrix $S_m \in R^{nm \times nm}$ of the microorganisms, wherein nm denotes the number of microorganisms in Net1, and the similarity between microorganism $m_i$ and microorganism $m_j$ is denoted $S_m(m_i, m_j)$;
B2. Encoding the original gene sequences of the microbial data in the microorganism database to obtain the microbial gene sequences;
B3. Padding all the encoded microbial gene sequences with zeros so that all the padded microbial gene sequences have the same length;
B4. Analyzing all the padded microbial gene sequences by principal component analysis to obtain a k-dimensional matrix, which serves as the second attribute feature matrix $F_m \in R^{nm \times k}$ of the microorganisms.
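As a non-limiting illustration of steps B2 to B4, the following Python sketch encodes gene sequences as integers, pads them with zeros to a common length, and reduces them to k dimensions by principal component analysis computed through a singular value decomposition. The integer encoding in `BASE_CODE`, the function names and the toy sequences are assumptions made for illustration; this embodiment does not specify the encoding scheme.

```python
import numpy as np

# Assumed nucleotide encoding; 0 is reserved for padding.
BASE_CODE = {"A": 1, "C": 2, "G": 3, "T": 4}

def encode_and_pad(sequences):
    """Encode each sequence as integers and right-pad with zeros to the longest length."""
    max_len = max(len(seq) for seq in sequences)
    padded = np.zeros((len(sequences), max_len))
    for row, seq in enumerate(sequences):
        padded[row, :len(seq)] = [BASE_CODE[base] for base in seq]
    return padded

def pca_features(matrix, k):
    """Project the rows of `matrix` onto its first k principal components."""
    centered = matrix - matrix.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

seqs = ["ACGT", "ACG", "TTGACA", "GG"]   # toy sequences of unequal length
padded = encode_and_pad(seqs)
F_m = pca_features(padded, k=2)          # one k-dimensional row per microorganism
```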
In a specific embodiment, in step S303, the microorganism-drug similarity feature network is constructed according to the similarity feature matrix of the drugs and the similarity feature matrix of the microorganisms, and the specific steps are as follows:
C1. Constructing the microorganism-drug similarity feature network $X_{simility}$ according to the similarity feature matrix of the drugs and the similarity feature matrix of the microorganisms:
C2. Constructing the microorganism-drug second attribute feature network $X_{secondary}$ according to the second attribute feature matrix of the drugs and the second attribute feature matrix of the microorganisms:
C3. Combining the microorganism-drug similarity feature network and the microorganism-drug second attribute feature network to obtain the microorganism-drug multi-modal attribute graph X:
$X = [X_{simility}, X_{secondary}]$.
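As a non-limiting illustration of steps C1 to C3, the following Python sketch assembles X by horizontal concatenation. The block-diagonal layout of each network (drug block first, microorganism block second, zeros elsewhere) and the choice k = nm are assumptions; they are chosen only so that X has n = nd + nm rows and 2 × (nd + nm) columns. The function names and the random toy matrices are hypothetical.

```python
import numpy as np

def block_diag2(top_left, bottom_right):
    """Place two matrices on the diagonal of a zero matrix (assumed layout)."""
    r1, c1 = top_left.shape
    r2, c2 = bottom_right.shape
    out = np.zeros((r1 + r2, c1 + c2))
    out[:r1, :c1] = top_left
    out[r1:, c1:] = bottom_right
    return out

def build_attribute_matrix(S_d, S_m, F_d, F_m):
    """Concatenate the similarity network and the second attribute network column-wise."""
    X_simility = block_diag2(S_d, S_m)
    X_secondary = block_diag2(F_d, F_m)
    return np.hstack([X_simility, X_secondary])

nd, nm = 3, 2                       # toy numbers of drugs and microorganisms
rng = np.random.default_rng(0)      # random placeholders instead of real features
X = build_attribute_matrix(rng.random((nd, nd)), rng.random((nm, nm)),
                           rng.random((nd, nd)), rng.random((nm, nm)))
```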
In a specific embodiment, in step S4, the regularized graph neural network model is established according to Net1, Net2 and the microorganism-drug multi-modal attribute graph, and the specific steps are as follows:
S401. Establishing a microorganism-drug feature matrix according to the microorganism-drug multi-modal attribute graph, and constructing a microorganism-drug heterogeneous matrix, wherein $v_i$ denotes the microorganism or drug at any node, and the heterogeneous matrix is represented as follows:
wherein Y is the microorganism-drug feature matrix, formed from the content feature vectors of the nodes $v_i$;
S402. Setting a learnable matrix $W \in R^{m \times f}$ and assigning initial values to its elements using random numbers, wherein f is the node embedding dimension set as a hyper-parameter, n = nd + nm is the number of nodes, and m = 2 × (nd + nm) is the feature dimension of the nodes; and generating a feature-transformed vector for each node from its content feature vector and W;
S403. Setting a scaling constant $s \in R$, which represents the norm of the propagated hidden features, and generating normalized feature-transformed vectors by the GNCN network of the regularized graph neural network model;
S404. Solving the L2 regularization formula g(·):
S405. Encoding the microorganism-drug association network and the microorganism-drug multi-modal attribute graph using a GNCN encoder:
wherein $A \in R^{nd \times nm}$ is the adjacency matrix of the association network of step S1: if a known association exists between nodes i and j in the association network, the element $A_{ij}$ of A is set to 1, and otherwise to 0; $\tilde{A} = A + I_N$, wherein $I_N$ is the identity matrix of order N, and $\tilde{D}$ is the degree matrix of $\tilde{A}$.
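As a non-limiting illustration of steps S402 to S405, the following Python sketch applies one propagation layer: a feature transformation by W, row-wise L2 normalization rescaled to norm s, and propagation over a symmetrically normalized adjacency matrix with self-loops. The exact GNCN formulas are not reproduced in the text, so this follows the usual graph convolution normalization as an assumption. The helper `bipartite_to_square`, which embeds the nd × nm association matrix into a symmetric n × n adjacency matrix, and all function names are likewise hypothetical.

```python
import numpy as np

def bipartite_to_square(B):
    """Embed an nd x nm association matrix into a symmetric (nd+nm) x (nd+nm) adjacency."""
    nd, nm = B.shape
    A = np.zeros((nd + nm, nd + nm))
    A[:nd, nd:] = B
    A[nd:, :nd] = B.T
    return A

def normalized_adjacency(A):
    """Symmetric normalization of the adjacency matrix after adding self-loops."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gncn_layer(X, A, W, s):
    """Transform features, L2-normalize each row to norm s, then propagate over the graph."""
    H = X @ W
    norms = np.linalg.norm(H, axis=1, keepdims=True)
    H_norm = s * H / np.maximum(norms, 1e-12)
    return normalized_adjacency(A) @ H_norm

B = np.array([[1, 0], [0, 1], [1, 1]])   # toy associations: 3 drugs x 2 microorganisms
A = bipartite_to_square(B)
rng = np.random.default_rng(0)
X = rng.random((5, 10))                  # n = 5 nodes, m = 10 features
W = rng.standard_normal((10, 4))         # randomly initialized, f = 4
Z = gncn_layer(X, A, W, s=1.8)
```

Fixing the row norm to s before propagation keeps the scale of the hidden features under explicit control, which is the role the text assigns to the scaling constant.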
In one embodiment, as shown in Fig. 3, step S5 (inputting Net1 and Net2, together with the microorganism-drug multi-modal attribute graph, into the graph neural network model to obtain the embedded representations Z1 and Z2, and inputting the embedded representations Z1 and Z2 into the graph neural network for training to obtain the trained graph neural network) specifically comprises:
wherein the two unit vectors correspond to nodes i and j in the matrix, $deg_i$ is the degree of node i, and $deg_j$ is the degree of node j;
S502. Generating a node embedding matrix from the node embedding vectors, generating the latent variable $Z \in R^{n \times f}$ of the GNCN encoder, and obtaining Z1 corresponding to Net1 and Z2 corresponding to Net2:
$Z_i = GNCN(X, A, s)$;
S503. Defining a loss function, wherein the loss function is the binary cross entropy between the multi-modal attribute graph and the reconstructed graph obtained by the graph neural network during training:
wherein L is the loss function, N is the total number of nodes, y is the value of an element of the adjacency matrix A and is 0 or 1, and $\hat{y}$ is the value of the corresponding element of the reconstructed adjacency matrix $\hat{A}$ and lies between 0 and 1;
S504. Inputting Z1 and Z2 into the DNN classifier of the graph neural network model, setting the number of training epochs to k2, using stochastic gradient descent during training, and stopping training when the loss function converges, to obtain the trained graph neural network.
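As a non-limiting illustration of the loss of step S503, the following Python sketch evaluates a binary cross entropy between adjacency values and reconstructed values. It assumes the usual form L = -(1/N)·Σ[y·log(ŷ) + (1 - y)·log(1 - ŷ)], averaged over the elements passed in; the function name and the clipping constant `eps` are hypothetical.

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)

# Flattened adjacency entries and the corresponding reconstructed scores.
loss = binary_cross_entropy([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.1])
```

The loss approaches zero as the reconstructed values approach the true adjacency values, which is the convergence condition monitored in step S504.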
Example 3
In a specific embodiment, in step S5, after the graph neural network model is trained, the graph neural network model is further verified, and the verification specifically comprises:
D1. Introducing a k-fold cross-validation framework: all known microorganism-drug associations in the existing microorganism-drug association database are randomly divided into k1 groups; each of the k1 groups in turn, together with a randomly sampled subset of unknown association pairs of the same size, serves as the test set, and the remaining known association pairs serve as the training set;
D2. Inputting the test set into the trained graph neural network model to obtain a classification result;
D3. If the classification result is positive, predicting that the microorganism is associated with the drug; if the classification result is negative, predicting that the microorganism is not associated with the drug;
D4. Obtaining the AUC value of the trained graph neural network model according to the classification result, and verifying the accuracy of the graph neural network model according to the AUC value.
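As a non-limiting illustration of step D1, the following Python sketch splits the known association pairs into k folds and pairs each test fold with an equally sized random sample of unknown pairs. The function name, the seed and the toy pair lists are hypothetical, and the sketch assumes there are at least as many unknown pairs as pairs in a fold.

```python
import random

def k_fold_splits(known_pairs, unknown_pairs, k, seed=0):
    """Yield (train_pos, test_pos, test_neg) for each of the k folds."""
    rng = random.Random(seed)
    shuffled = list(known_pairs)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]   # k nearly equal groups
    unknown = list(unknown_pairs)
    for i, test_pos in enumerate(folds):
        # Sample as many unknown pairs as there are known pairs in this fold.
        test_neg = rng.sample(unknown, len(test_pos))
        train_pos = [p for j, fold in enumerate(folds) if j != i for p in fold]
        yield train_pos, test_pos, test_neg

# Toy (microorganism, drug) index pairs.
known = [(m, d) for m in range(5) for d in range(2)]        # 10 known pairs
unknown = [(m, d) for m in range(5) for d in range(2, 6)]   # 20 unknown pairs
splits = list(k_fold_splits(known, unknown, k=5))
```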
In a specific embodiment, in step D4, the AUC value of the trained graph neural network model is obtained according to the classification result, and the specific steps are as follows:
E1. Inputting the training set into the model to obtain the reconstructed graph of the training set produced by the current model, and recording the scores of the edges between nodes in the reconstructed graph as association probabilities, each taking a value between 0 and 1;
E2. Using each association probability in turn as a classification threshold: samples whose association probability is greater than the classification threshold are regarded as positive samples, and samples whose association probability is smaller than the classification threshold are regarded as negative samples;
E3. Obtaining the ground-truth label of each edge in the training set according to the microorganism-drug associations in the training set, wherein the label is 0 or 1: 0 indicates that the edge, i.e. the association, does not exist, so the sample is an actual negative sample; 1 indicates that the edge, i.e. the association, exists, so the sample is an actual positive sample;
E4. Computing the true positive rate and the false positive rate under each classification threshold:
$TPRate = \frac{TP}{TP + FN}$, $FPRate = \frac{FP}{FP + TN}$,
wherein TPRate is the true positive rate, FPRate is the false positive rate, TP (true positives) is the number of actual positive samples predicted as positive, FN (false negatives) is the number of actual positive samples predicted as negative, FP (false positives) is the number of actual negative samples predicted as positive, and TN (true negatives) is the number of actual negative samples predicted as negative;
E5. Drawing the ROC curve with FPRate as the horizontal axis and TPRate as the vertical axis, and calculating the area under the ROC curve by numerical integration over small elements; this area is the AUC value.
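As a non-limiting illustration of steps E2 to E5, the following Python sketch sweeps every distinct score as a classification threshold, computes the false positive rate and true positive rate at each threshold, and integrates the resulting ROC curve with the trapezoid rule. The function name is hypothetical. As a simplifying assumption, a sample whose score equals the threshold is counted as positive, and the input must contain at least one positive and one negative label.

```python
def roc_auc(labels, scores):
    """Return (auc, roc_points) for 0/1 labels and association probabilities."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    # Each distinct score serves in turn as the classification threshold.
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))   # (FPRate, TPRate)
    # Trapezoid rule: sum the areas of the thin strips under the curve.
    auc = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        auc += (x1 - x0) * (y0 + y1) / 2.0
    return auc, points

auc, roc_points = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])  # auc == 0.75
```

In the toy call, three of the four positive-negative pairs are ranked correctly, which agrees with the area of 0.75.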
In this example, in order to verify the method for predicting the microorganism-drug association effect of the present invention, the method and 5 existing methods are run on the MDAD dataset with default parameter settings. The AUC value is used as the performance evaluation index: the greater the AUC value, the higher the accuracy of the method.
In this example, 5-fold cross validation and 10-fold cross validation were performed on all methods, including the present invention. The experimentally validated drug combinations were randomly divided into 5 or 10 subsets of the same size; each subset in turn was used as the test set, and the remaining subsets were used to train the model. In order to eliminate random sampling bias, the process was repeated 10 times, and the final AUC score, calculated as the average of the AUC values over the 10 repetitions, was used as the performance index to evaluate the accuracy of each method.
The validation results are shown in the table below, in which the method of the present invention is denoted G2 gnamda. In the 5-fold cross validation, the final AUC score of the present invention is the highest of all methods; it is also the highest of all methods in the 10-fold cross validation. The accuracy of the method is therefore superior to that of the existing methods:
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention and are not intended to limit its embodiments. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.
Claims (10)
1. A method for predicting a microorganism-drug association effect, characterized by comprising the following steps:
S1. Constructing a microorganism-drug association network through a microorganism-drug association database, the association network being called Net1;
S2. Retrieving microorganism-microorganism interactions through a microorganism database in the microorganism-drug association database, and retrieving drug-drug interactions through a drug database in the microorganism-drug association database; constructing an interaction network according to the microorganism-microorganism interactions and the drug-drug interactions, the interaction network being called Net2;
S3. Constructing a topological attribute network of the drugs through the drug database, and constructing microbial gene sequences through the microorganism database; constructing a microorganism-drug multi-modal attribute graph according to the comprehensive similarity attribute and network topology attribute of the drugs in the drug database and the functional similarity attribute and genome sequence attribute of the microorganisms in the microorganism database;
S4. Establishing a regularized graph neural network model according to Net1, Net2 and the microorganism-drug multi-modal attribute graph;
S5. Inputting Net1 and Net2, together with the microorganism-drug multi-modal attribute graph, into the graph neural network model to obtain embedded representations Z1 and Z2; inputting the embedded representations Z1 and Z2 into the graph neural network for training to obtain a trained graph neural network;
S6. Predicting the microorganism-drug association effect in a microorganism-drug data set through the trained graph neural network.
2. The method for predicting a microorganism-drug association effect according to claim 1, characterized in that step S3 (constructing the topological attribute network of the drugs through the drug database, constructing the microbial gene sequences through the microorganism database, and constructing the microorganism-drug multi-modal attribute graph according to the comprehensive similarity attribute and network topology attribute of the drugs in the drug database and the functional similarity attribute and genome sequence attribute of the microorganisms in the microorganism database) specifically comprises:
S301. Constructing a similarity feature matrix of the drugs according to the drug similarity attributes in the drug database, and constructing a topological attribute network of the drugs through the drug database, thereby obtaining a second attribute feature matrix of the drugs;
S302. Constructing a similarity feature matrix of the microorganisms according to the functional similarity attributes of the microorganisms in the microorganism database, and constructing microbial gene sequences through the microorganism database, thereby obtaining a second attribute feature matrix of the microorganisms;
S303. Constructing a microorganism-drug similarity feature network according to the similarity feature matrix of the drugs and the similarity feature matrix of the microorganisms;
S304. Constructing a microorganism-drug second attribute feature network according to the second attribute feature matrix of the drugs and the second attribute feature matrix of the microorganisms;
S305. Combining the microorganism-drug similarity feature network with the microorganism-drug second attribute feature network to obtain the microorganism-drug multi-modal attribute graph.
3. The method for predicting a microorganism-drug association effect according to claim 2, characterized in that step S301 (constructing the similarity feature matrix of the drugs according to the drug similarity attributes in the drug database, and constructing the topological attribute network of the drugs through the drug database, thereby obtaining the second attribute feature matrix of the drugs) specifically comprises:
A1. Calculating the similarity attribute of the drugs in the drug database using the SIMCOMP2 tool to obtain the molecular structure similarity matrix $DS_{struct}(d_i, d_j)$ of the drugs;
A2. Representing the drug-drug interaction profile in Net2 by a matrix DIP, and obtaining the normalized kernel bandwidth:
where μ denotes the normalized kernel bandwidth; μ' is the original bandwidth, set to 1; $DIP(d_i)$ denotes the interaction profile of drug $d_i$ with the other drugs; and nd denotes the number of drugs in Net1;
A3. Expressing the similarity feature matrix of the drugs as $S_d(d_i, d_j)$:
A4. Constructing the topological attribute of the drug network in the drug database by the random walk with restart method: the random walk with restart is run on the drug network until convergence, which completes the construction of the drug network and yields the probability distribution vector of each drug, from which the second attribute feature matrix $F_d \in R^{nd \times nd}$ of the drugs is constructed.
4. The method for predicting a microorganism-drug association effect according to claim 3, characterized in that in step A4, the formula of the random walk with restart is:
wherein $p_i^{(t+1)} \in R^{n \times 1}$ denotes the probability that the i-th node of the drug network moves to the other nodes at time t+1, θ is the restart probability, T is the transition probability matrix, $p_i^{(0)} \in R^{n \times 1}$ denotes the starting probability vector of the i-th node of the drug network, and $p_i^{(t)} \in R^{n \times 1}$ denotes the probability that the i-th node of the drug network moves to the other nodes at time t.
5. The method for predicting a microorganism-drug association effect according to claim 2, characterized in that step S302 (constructing the similarity feature matrix of the microorganisms according to the functional similarity attributes of the microorganisms in the microorganism database, and constructing the microbial gene sequences through the microorganism database, thereby obtaining the second attribute feature matrix of the microorganisms) specifically comprises:
B1. Calculating the functional similarity attribute of the microorganisms in the microorganism database using the Kamneva tool to obtain the similarity feature matrix $S_m \in R^{nm \times nm}$ of the microorganisms, wherein nm denotes the number of microorganisms in Net1, and the similarity between microorganism $m_i$ and microorganism $m_j$ is denoted $S_m(m_i, m_j)$;
B2. Encoding the original gene sequences of the microbial data in the microorganism database to obtain the microbial gene sequences;
B3. Padding all the encoded microbial gene sequences with zeros so that all the padded microbial gene sequences have the same length;
B4. Analyzing all the padded microbial gene sequences by principal component analysis to obtain a k-dimensional matrix, which serves as the second attribute feature matrix $F_m \in R^{nm \times k}$ of the microorganisms.
6. The method for predicting a microorganism-drug association effect according to claim 5, characterized in that in step S303, the microorganism-drug similarity feature network is constructed according to the similarity feature matrix of the drugs and the similarity feature matrix of the microorganisms, and the specific steps are as follows:
C1. Constructing the microorganism-drug similarity feature network $X_{simility}$ according to the similarity feature matrix of the drugs and the similarity feature matrix of the microorganisms:
C2. Constructing the microorganism-drug second attribute feature network $X_{secondary}$ according to the second attribute feature matrix of the drugs and the second attribute feature matrix of the microorganisms:
C3. Combining the microorganism-drug similarity feature network with the microorganism-drug second attribute feature network to obtain the microorganism-drug multi-modal attribute graph X:
$X = [X_{simility}, X_{secondary}]$.
7. The method for predicting a microorganism-drug association effect according to claim 6, characterized in that in step S4, the regularized graph neural network model is established according to Net1, Net2 and the microorganism-drug multi-modal attribute graph, and the specific steps are as follows:
S401. Establishing a microorganism-drug feature matrix according to the microorganism-drug multi-modal attribute graph, and constructing a microorganism-drug heterogeneous matrix, wherein $v_i$ denotes the microorganism or drug at any node, and the heterogeneous matrix is represented as follows:
wherein Y is the microorganism-drug feature matrix, formed from the content feature vectors of the nodes $v_i$;
S402. Setting a learnable matrix $W \in R^{m \times f}$ and assigning initial values to its elements using random numbers, wherein f is the node embedding dimension set as a hyper-parameter, n = nd + nm is the number of nodes, and m = 2 × (nd + nm) is the feature dimension of the nodes; and generating a feature-transformed vector for each node from its content feature vector and W;
S403. Setting a scaling constant $s \in R$, which represents the norm of the propagated hidden features, and generating normalized feature-transformed vectors by the GNCN network of the regularized graph neural network model;
S404. Solving the L2 regularization formula g(·):
S405. Encoding the microorganism-drug association network and the microorganism-drug multi-modal attribute graph using a GNCN encoder:
wherein $A \in R^{nd \times nm}$ is the adjacency matrix of the association network of step S1: if a known association exists between nodes i and j in the association network, the element $A_{ij}$ of A is set to 1, and otherwise to 0; $\tilde{A} = A + I_N$, wherein $I_N$ is the identity matrix of order N, and $\tilde{D}$ is the degree matrix of $\tilde{A}$.
8. The method for predicting a microorganism-drug association effect according to claim 7, characterized in that step S5 (inputting Net1 and Net2, together with the microorganism-drug multi-modal attribute graph, into the graph neural network model to obtain the embedded representations Z1 and Z2, and inputting the embedded representations Z1 and Z2 into the graph neural network for training to obtain the trained graph neural network) specifically comprises:
wherein the two unit vectors correspond to nodes i and j in the matrix, $deg_i$ is the degree of node i, and $deg_j$ is the degree of node j;
S502. Generating a node embedding matrix from the node embedding vectors, generating the latent variable $Z \in R^{n \times f}$ of the GNCN encoder, and obtaining Z1 corresponding to Net1 and Z2 corresponding to Net2:
$Z = GNCN(X, A, s)$;
S503. Defining a loss function, wherein the loss function is the binary cross entropy between the multi-modal attribute graph and the reconstructed graph obtained by the graph neural network during training:
wherein L is the loss function, N is the total number of nodes, y is the value of an element of the adjacency matrix A and is 0 or 1, and $\hat{y}$ is the value of the corresponding element of the reconstructed adjacency matrix $\hat{A}$ and lies between 0 and 1;
S504. Inputting Z1 and Z2 into the DNN classifier of the graph neural network model, setting the number of training epochs to k2, using stochastic gradient descent during training, and stopping training when the loss function converges, to obtain the trained graph neural network.
9. The method for predicting a microorganism-drug association effect according to claim 8, characterized in that in step S5, after the graph neural network model is trained, the accuracy of the graph neural network model is further verified, and the verification specifically comprises:
D1. Introducing a k-fold cross-validation framework: all known microorganism-drug associations in the existing microorganism-drug association database are randomly divided into k1 groups; each of the k1 groups in turn, together with a randomly sampled subset of unknown association pairs of the same size, serves as the test set, and the remaining known association pairs serve as the training set;
D2. Inputting the test set into the trained graph neural network model to obtain a classification result;
D3. If the classification result is positive, predicting that the microorganism is associated with the drug; if the classification result is negative, predicting that the microorganism is not associated with the drug;
D4. Obtaining the AUC value of the trained graph neural network model according to the classification result, and verifying the accuracy of the graph neural network model according to the AUC value.
10. The method for predicting a microorganism-drug association effect according to claim 9, characterized in that in step D4, the AUC value of the trained graph neural network model is obtained according to the classification result, and the specific steps are as follows:
E1. Inputting the training set into the model to obtain the reconstructed graph of the training set produced by the current model, and recording the scores of the edges between nodes in the reconstructed graph as association probabilities, each taking a value between 0 and 1;
E2. Using each association probability in turn as a classification threshold: samples whose association probability is greater than the classification threshold are regarded as positive samples, and samples whose association probability is smaller than the classification threshold are regarded as negative samples;
E3. Obtaining the ground-truth label of each edge in the training set according to the microorganism-drug associations in the training set, wherein the label is 0 or 1: 0 indicates that the edge, i.e. the association, does not exist, so the sample is an actual negative sample; 1 indicates that the edge, i.e. the association, exists, so the sample is an actual positive sample;
E4. Computing the true positive rate and the false positive rate under each classification threshold:
$TPRate = \frac{TP}{TP + FN}$, $FPRate = \frac{FP}{FP + TN}$,
wherein TPRate is the true positive rate, FPRate is the false positive rate, TP (true positives) is the number of actual positive samples predicted as positive, FN (false negatives) is the number of actual positive samples predicted as negative, FP (false positives) is the number of actual negative samples predicted as positive, and TN (true negatives) is the number of actual negative samples predicted as negative;
E5. Drawing the ROC curve with FPRate as the horizontal axis and TPRate as the vertical axis, and calculating the area under the ROC curve by numerical integration over small elements; this area is the AUC value.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210938454.8A CN115472305A (en) | 2022-08-05 | 2022-08-05 | Method and system for predicting microorganism-drug association effect |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115472305A true CN115472305A (en) | 2022-12-13 |
Family
ID=84366630
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210938454.8A Pending CN115472305A (en) | 2022-08-05 | 2022-08-05 | Method and system for predicting microorganism-drug association effect |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115472305A (en) |
-
2022
- 2022-08-05 CN CN202210938454.8A patent/CN115472305A/en active Pending
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117095741A (en) * | 2023-10-19 | 2023-11-21 | 华东交通大学 | Graph self-attention-based microorganism-drug association prediction method |
CN117095741B (en) * | 2023-10-19 | 2024-01-30 | 华东交通大学 | Graph self-attention-based microorganism-drug association prediction method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||