CN114420310A - Medicine ATC Code prediction method based on graph transformation network - Google Patents

Medicine ATC Code prediction method based on graph transformation network

Info

Publication number
CN114420310A
Authority
CN
China
Prior art keywords
drug
disease
layer
matrix
target protein
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210063363.4A
Other languages
Chinese (zh)
Inventor
罗慧敏
索志豪
阎朝坤
张戈
王建林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan University
Original Assignee
Henan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan University
Priority to CN202210063363.4A
Publication of CN114420310A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 ICT specially adapted for the handling or processing of medical references
    • G16H70/40 ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/30 Drug targeting using structural data; Docking or binding prediction
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/30 Data warehousing; Computing architectures
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70 Machine learning, data mining or chemometrics
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/90 Programming languages; Computing architectures; Database systems; Data warehousing

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Informatics (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Databases & Information Systems (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Software Systems (AREA)
  • Biotechnology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Epidemiology (AREA)
  • Public Health (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Bioethics (AREA)
  • Medicinal Chemistry (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Toxicology (AREA)
  • Primary Health Care (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention discloses a drug ATC Code prediction method based on a Graph Transformer Network, named DACPGTN. First, the target proteins and diseases related to each drug are obtained; seven kinds of drug similarity, derived from drug interaction information under different evaluation criteria, are collected; the similarity information of the related target proteins and diseases is retrieved or calculated; and these similarities are used as features to construct a composite feature matrix. Second, the known associations among the three entity types (drug, target protein and disease) are used to construct a heterogeneous graph with several different edge relations, and the Graph Transformer Layers of the Graph Transformer Network learn, from the resulting heterogeneous adjacency matrices, the latent graph structure linking drugs to their target proteins and diseases. Finally, the graph structure learned by the Graph Transformer Layers and the drug-target protein-disease composite feature matrix are input into an end-to-end prediction module, which makes the final drug ATC Code prediction. The method is simple and effective; comparison with other methods and tests on a data set show that it performs better in drug ATC Code prediction.

Description

Medicine ATC Code prediction method based on graph transformation network
Technical Field
The invention belongs to the field of bioinformatics, and particularly relates to a drug ATC Code prediction method based on a Graph Transformer Network, named DACPGTN, which uses the Graph Transformer Network to predict the ATC Codes of known drugs.
Background
Developing a drug is time-consuming and expensive: bringing a new drug from research to use takes decades and costs billions of dollars. How to find new indications for existing approved drugs, and so reduce development cost, is currently a research hotspot in bioinformatics. The Anatomical Therapeutic Chemical (ATC) classification system is the official drug classification system of the World Health Organization. The standard ATC Code introduced by the ATC system greatly facilitates the use of drugs in treatment. Predicting the ATC class of a given compound makes it possible to infer its active ingredients and its therapeutic, pharmacological and chemical properties. This helps in using the drug correctly or inferring new uses for the compound, makes its indications and potential toxic and side effects easier to understand, and accelerates drug development; it is a common approach in research on new uses for old drugs. The ATC Code classifies drugs at five levels: the first level is the organ or anatomical system on which the drug acts; the second level is the pharmacological effect; the third and fourth levels are chemical, pharmacological and therapeutic subgroups; the fifth level is the specific single drug or drug combination. The first level includes 14 categories: (1) Alimentary tract and metabolism, (2) Blood and blood forming organs, (3) Cardiovascular system, (4) Dermatologicals, (5) Genito-urinary system and sex hormones, (6) Systemic hormonal preparations, excluding sex hormones and insulins, (7) Antiinfectives for systemic use, (8) Antineoplastic and immunomodulating agents, (9) Musculo-skeletal system, (10) Nervous system, (11) Antiparasitic products, insecticides and repellents, (12) Respiratory system, (13) Sensory organs, (14) Various.
The drug information databases in wide use today contain a large number of drugs without ATC Codes, and assigning ATC Codes to new or existing drugs by traditional experimental methods is time-consuming and laborious. With the accumulation of drug-related data and the rapid development of pharmacoinformatics databases, predicting drug ATC Codes by existing technical means has become a widely adopted international research and development strategy with a high return on investment. How to design an effective drug ATC Code prediction method has attracted growing attention. Early studies defined ATC Code prediction as a single-label learning task; this was later considered inappropriate, because biological systems are multi-label in nature and a compound may belong to more than one ATC class.
In recent years, several multi-label classification methods for drug ATC classification have been proposed. Chen et al. first proposed a classification method that predicts drug ATC Codes by integrating the chemical-chemical interaction information and chemical-chemical similarity information of drugs, and constructed a benchmark dataset of first-level drug ATC Codes. On the basis of this benchmark dataset, classification methods integrating multiple kinds of drug-related information were proposed to predict the first-level ATC Codes of drugs. Cheng et al. proposed iATC-mISF, a multi-label Gaussian kernel regression classifier that assigns drugs to the 14 first-level ATC classes based on chemical-chemical interactions, structural similarity and fingerprint similarity. Cheng et al. then integrated the drug ontology-based predictor iATC-mDO and on this basis improved iATC-mISF to iATC-mHyb, raising the prediction performance of the classifier. Nanni and Brahnam developed EnsLIF, a multi-label classifier based on the histogram of gradients algorithm, which reshapes the one-dimensional feature vector of a drug compound into a two-dimensional matrix and improves classification performance to some extent. Zhou et al. constructed several drug interaction networks, extracted drug features from the networks with the network embedding algorithm Mashup, converted the original multi-label classification problem into several single-label classification problems with the RAndom k-labELsets (RAKEL) algorithm, and built the classifier iATC-NRAKEL using the classical machine learning algorithm Support Vector Machine (SVM) in the classification stage, obtaining a better prediction effect.
Building on that classifier, Zhou et al. simplified its input and proposed iATC-FRAKEL, a multi-label classifier that uses only drug fingerprint information (SMILES format) as feature input to identify the ATC Codes of drugs, and provided it as a web service. Wang et al. proposed ATC-NLSP, a method for predicting first-level drug ATC Codes. ATC-NLSP uses a machine learning framework, combines drug-drug interaction information, structural similarity and fingerprint similarity, and adopts the NLSP method to explore the correlation among labels, giving a better prediction result. With the successful application of deep learning in many fields, Nanni et al. proposed FUS3, an ensemble first-level ATC Code multi-label classifier system based on deep learning, which extracts features with a convolutional neural network (CNN) and a long short-term memory (LSTM) network, trains two general-purpose classifiers on them and obtains better results. In the most recent research, Zhao et al. proposed CGATCPred, a new end-to-end drug ATC Code prediction model, which uses CNN layers to extract composite features from seven drug association score matrices, builds an ATC label correlation graph, and learns label information through two GCN layers combined with word embedding information. A new feature is constructed from the dot product between the composite features and the generated label correlation matrix; the new feature and the composite features extracted by the CNN layers are concatenated and fed into a fully connected neural network layer to predict the ATC Codes of the drug.
In summary, most existing drug ATC Code prediction methods make predictions from the correlation between the properties of the drug itself and its ATC Code labels. To some extent they ignore the potential contribution of related information, such as the target proteins and diseases associated with the drug, to ATC Code prediction, and they do not make full use of the known associations among different types of data.
Disclosure of Invention
To solve the above problems, the invention provides a drug ATC Code prediction method based on a Graph Transformer Network, namely DACPGTN (Drug-ATC Code Prediction method on Graph Transformer Network). The method rests on the latent association information between a drug and its related target proteins and diseases, which can provide valuable information for drug ATC Code prediction. The underlying hypothesis is that two drugs may belong to the same ATC Code class when they act on the same target protein or disease, or when they have multiple associations with a target protein or disease. First, the features of the drugs and of their related target proteins and diseases are obtained, and a composite feature matrix is constructed. Second, a group of heterogeneous networks is constructed from the drug-target protein, drug-disease and target protein-disease association information, and the latent association information in this group of heterogeneous networks is learned with the Graph Transformer Layer of the Graph Transformer Network. Finally, the composite feature matrix and the latent association information matrix are input into an end-to-end prediction module to predict the drug ATC Code. Comparison with other methods and tests on a data set show that the method performs better in drug ATC Code prediction.
The technical scheme adopted by the invention is as follows:
(1) Construction of the drug-target protein-disease composite feature matrix
(2) Construction of heterogeneous networks among drugs, target proteins and diseases
(3) Acquisition of latent association information among drugs, target proteins and diseases
(4) Prediction of the drug ATC Code labels
The beneficial effects of the invention are as follows: the method integrates the composite feature information of drugs and their related entities and uses the Graph Transformer Layer of the Graph Transformer Network to obtain the latent associations between drugs and related entities, making full use of known biological information. Experimental results show that the method can effectively predict the ATC Code labels of drugs. The method is simple and effective; comparison with other methods and tests on a data set show that it performs better in drug ATC Code prediction.
Drawings
FIG. 1 is a flow chart of DACPGTN according to the present invention.
FIG. 2 is a schematic diagram of the construction of the drug-target protein-disease composite features of the present invention.
FIG. 3 is a schematic diagram of multi-source heterogeneous network construction and of latent association learning by the Graph Transformer Layer according to the present invention.
FIG. 4 is a schematic diagram of an end-to-end prediction module according to the present invention.
FIG. 5 is a diagram illustrating the effect of the number of output nodes of the GCN feature extractor on the result.
Detailed Description
As shown in FIGS. 1 to 5, a drug ATC Code prediction method based on a Graph Transformer Network includes the following steps:
1) Using the known drugs, obtain the target proteins and diseases related to them, and calculate the similarity between the target proteins and the similarity between the diseases. The specific process is as follows: first, the combined scores between drug-related target proteins are obtained from the STRING database and used as target protein similarity information; second, the association matrix between drugs and related diseases is obtained and the Pearson correlation coefficient between its columns is calculated, that is, the association profile of each disease over all drugs is used to derive disease similarity information; the drug similarity information obtained under the different known evaluation criteria is stacked along the same dimension and averaged to give the drug feature matrix; the target protein similarity and the disease similarity serve as the feature matrices of the target proteins and the diseases; the feature matrices of the three entity types are reduced to the same dimension by PCA (principal component analysis) and concatenated vertically to construct the composite feature matrix;
2) Construct the drug-target protein, drug-disease and target protein-disease heterogeneous networks and their transposes. The heterogeneous networks are constructed from the association information between the entities as follows: if drug i is associated with target protein j, the corresponding element $\text{Drug-Target}_{ij}$ of the heterogeneous network is 1, otherwise it is 0, which finally yields a sparse matrix Drug-Target with values 0 and 1; the Drug-Disease and Target-Disease heterogeneous networks are constructed in the same way; the heterogeneous networks built from the association information are then transposed, finally giving the set of heterogeneous networks between the different entities
$$\mathbb{A} = \{\text{Drug-Target},\ \text{Drug-Disease},\ \text{Target-Disease},\ \text{Drug-Target}^{T},\ \text{Drug-Disease}^{T},\ \text{Target-Disease}^{T}\}$$
namely the drug-target protein heterogeneous network (Drug-Target), the drug-disease heterogeneous network (Drug-Disease), the target protein-disease heterogeneous network (Target-Disease), the target protein-drug heterogeneous network ($\text{Drug-Target}^{T}$), the disease-drug heterogeneous network ($\text{Drug-Disease}^{T}$) and the disease-target protein heterogeneous network ($\text{Target-Disease}^{T}$);
3) Based on the heterogeneous network set $\mathbb{A}$ obtained in step 2), use the Graph Transformer Layer to acquire the latent association information among the three entity types (drug, target protein and disease) and construct a new latent association information matrix. The Graph Transformer Layer is implemented as follows:
$$Q = F(\mathbb{A}; W_{\phi}) = \phi(\mathbb{A}; \mathrm{softmax}(W_{\phi}))$$
where $\phi$ is the convolution layer and $W_{\phi} \in \mathbb{R}^{1\times 1\times K}$ is the parameter of the convolution layer $\phi$. The Graph Transformer Layer softly selects adjacency matrices (heterogeneous networks of different types) from the heterogeneous network set $\mathbb{A}$ and learns a new graph structure by multiplying the two selected adjacency matrices $Q_1$ and $Q_2$. The soft selection obtains non-negative weights from $\mathrm{softmax}(W_{\phi})$ and applies them in a 1 × 1 convolution, that is, a weighted sum of the candidate adjacency matrices;
4) Input the latent association information among drugs, target proteins and diseases acquired by the Graph Transformer Layer in step 3), together with the composite feature matrix constructed in step 1), into the end-to-end prediction module, and perform ATC Code prediction for the drug nodes.
In step 4), the end-to-end prediction module uses a GCN layer as the feature extractor and several linear layers for dimensionality reduction, with Dropout added between the linear layers. The GCN layer has 150 output nodes; linear layer 1 contains 150 neurons, linear layer 2 contains 128 neurons, linear layer 3 contains 64 neurons, and linear layer 4, the output layer, contains 14 neurons. In the training and prediction stages of the end-to-end prediction module, the multi-label classification problem is converted into pairwise comparisons between the predicted scores of target classes and non-target classes. A smooth generalization of the softmax activation function with the cross-entropy loss to multi-label classification is used: an extra class 0 with score $s_0$ is introduced, and the scores of all target classes are required to be greater than $s_0$ while the scores of all non-target classes are required to be less than $s_0$. This is implemented by the following formula:
Figure BDA0003476698770000061
$\Omega_{neg}$ and $\Omega_{pos}$ are the negative and positive sample sets, respectively. Setting the threshold $s_0 = 0$ gives the final loss, which is the generalization of the softmax activation function with the cross-entropy loss to the multi-label classification problem:
$$\mathrm{loss}(y_{true}, y_{pred}) = \mathrm{logsumexp}(y_{pred\text{-}neg}, 0) + \mathrm{logsumexp}(y_{pred\text{-}pos}, 0)$$
The properties of the logsumexp function balance the weights of the terms and so address the class imbalance problem. The end-to-end prediction module is trained with this loss, and in the final prediction stage the classes whose output in the last linear layer is greater than 0 are the prediction result.
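The loss above can be sketched in NumPy as follows. This is a minimal illustration, not the patent's implementation: it assumes that the threshold is 0, that `y_pred-neg` holds the raw scores of the non-target classes, and that `y_pred-pos` holds the negated scores of the target classes. The function names and the masking constant `big` are hypothetical.

```python
import numpy as np

def logsumexp_with_zero(x):
    """Return log(1 + sum(exp(x))) over the last axis, computed stably."""
    # Appending a 0 score adds the term exp(0) = 1, i.e. the extra class 0.
    x = np.concatenate([x, np.zeros(x.shape[:-1] + (1,))], axis=-1)
    m = x.max(axis=-1, keepdims=True)
    return (m + np.log(np.exp(x - m).sum(axis=-1, keepdims=True))).squeeze(-1)

def multilabel_loss(y_true, y_pred):
    """y_true: 0/1 labels, y_pred: raw scores, both of shape (batch, classes)."""
    big = 1e12                                  # masking constant (assumption)
    signed = (1 - 2 * y_true) * y_pred          # negate the scores of target classes
    neg = signed - y_true * big                 # mask out target classes
    pos = signed - (1 - y_true) * big           # mask out non-target classes
    return logsumexp_with_zero(neg) + logsumexp_with_zero(pos)

y_true = np.array([[1.0, 0.0, 1.0, 0.0]])
good_scores = np.array([[5.0, -5.0, 5.0, -5.0]])   # targets above 0, others below
bad_scores = -good_scores
loss_good = multilabel_loss(y_true, good_scores)
loss_bad = multilabel_loss(y_true, bad_scores)
predicted = (good_scores > 0).astype(float)         # classes with score > 0
```

Scores that place every target class above 0 and every non-target class below 0 give a loss close to zero, while reversed scores are penalized roughly in proportion to the largest violation.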
As shown in fig. 1, the specific implementation process of the present invention is as follows:
Firstly, construction of the drug-target protein-disease composite features
The data sets applied by the method comprise a medicine set, a target protein set and a disease set.
1. Drug and related target protein, disease data acquisition
In ATC Code research, Chen et al. constructed a benchmark dataset to facilitate model comparison at the first ATC Code level. The benchmark dataset contains 3883 compounds, each corresponding to one or more of the 14 first-level ATC Code categories. The experiments for this method were carried out on a further refined version of that dataset. Target and disease association data for the drugs were collected from the KEGG and DrugBank databases; 1749 of the 3883 drugs have target and disease association information, and these 1749 drugs serve as the benchmark dataset of this method.
TABLE 1. Entity statistics of the data set used in this method
Entity type       Count
Drug              1749
Target protein    982
Disease           355
2. Drug similarity information
First, the seven kinds of similarity information provided by Zhao et al. for all drugs in the data set of Chen et al. are used, see equation (1):
$$\{SM_{Sim}, SM_{Exp}, SM_{Dat}, SM_{Tex}, SM_{Com}, SM_{cp}, SM_{sub}\} \in \mathbb{R}^{3883\times 3883\times 7} \quad (1)$$
The first five matrices correspond to the "similarity", "experimental", "database", "text mining" and "combined score" scores, and the last two are similarities between pairs of compounds calculated with the similarity tools SIMCOMP and SUBCOMP. All information for the 1749 drugs required by the method is extracted from the seven similarity score matrices, giving the drug similarity score matrices shown in equation (2), which serve as the drug similarity information in this method:
$$\{SM_{Sim}, SM_{Exp}, SM_{Dat}, SM_{Tex}, SM_{Com}, SM_{cp}, SM_{sub}\} \in \mathbb{R}^{1749\times 1749\times 7} \quad (2)$$
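Extracting the sub-matrices for the retained drugs amounts to selecting the same rows and columns from every similarity matrix. A small sketch, assuming the seven matrices are stacked along the last axis and that `keep` (a hypothetical name) holds the indices of the retained drugs; the sizes are toy values standing in for 3883 and 1749:

```python
import numpy as np

# Toy stand-in for the 3883 x 3883 x 7 similarity tensor of equation (1).
full = np.arange(5 * 5 * 7, dtype=float).reshape(5, 5, 7)

# Hypothetical indices of the drugs that have target and disease associations.
keep = np.array([0, 2, 3])

# np.ix_ builds an open mesh so the same drugs are selected on rows and columns;
# the last axis (the seven similarity types) is kept whole.
sub = full[np.ix_(keep, keep)]
```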
3. Target protein similarity information
For the 982 target proteins used in the method, the file '9606.protein.info.v11.0' is downloaded from the STRING database. The 982 protein identifiers are traversed to obtain the combined score between each pair of proteins, and a protein relation score matrix $Target_{982\times 982}$ is constructed. The matrix is normalized by formula (3) to obtain the final protein combined score matrix:
Figure BDA0003476698770000071
4. Disease similarity information calculation
A drug-disease relation matrix is constructed from all drugs in the benchmark data set of Chen et al. and the 355 known diseases that meet the requirements of the method: if a drug and a disease are related, the corresponding entry of the matrix is 1, otherwise it is 0, giving the sparse drug-disease relation matrix $\text{Drug-Disease}_{3883\times 355}$. The Pearson correlation coefficient between the columns of this matrix is calculated to obtain the correlation matrix between diseases, using equation (4):
$$\rho_{A,B} = \frac{\sum_{i=1}^{n}(A_i-\bar{A})(B_i-\bar{B})}{\sqrt{\sum_{i=1}^{n}(A_i-\bar{A})^{2}}\,\sqrt{\sum_{i=1}^{n}(B_i-\bar{B})^{2}}} \quad (4)$$
where A and B are two different columns of the matrix, $A_i$ and $B_i$ are their entries in the i-th row, $\bar{A}$ and $\bar{B}$ are the column means, and n is 3883.
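The column-wise Pearson correlation can be sketched as below. The function name and the toy matrix are illustrative; the handling of constant columns (a disease linked to no drug or to every drug) is an assumption, since the text does not say how such columns are treated.

```python
import numpy as np

def disease_similarity(drug_disease):
    """Pearson correlation between every pair of columns (diseases)."""
    x = np.asarray(drug_disease, dtype=float)
    centered = x - x.mean(axis=0)                  # subtract each column mean
    norms = np.sqrt((centered ** 2).sum(axis=0))
    norms[norms == 0] = 1.0                        # assumption: constant columns get similarity 0
    unit = centered / norms
    return unit.T @ unit                           # (diseases x diseases)

# Toy 5 drugs x 3 diseases relation matrix standing in for the 3883 x 355 one.
toy = np.array([[1, 0, 1],
                [0, 1, 0],
                [1, 0, 1],
                [0, 1, 0],
                [1, 1, 0]])
sim = disease_similarity(toy)
```

For matrices without constant columns the result agrees with `np.corrcoef(toy, rowvar=False)`.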
5. Constructing a composite feature matrix
The seven drug similarity matrices obtained above are stacked along the same dimension; that is, the seven similarity scores of each drug are summed and averaged to give the drug similarity score matrix finally used in the method, which serves as the drug feature matrix. The protein combined score matrix serves as the target protein feature matrix, and the calculated inter-disease Pearson correlation coefficient matrix serves as the disease feature matrix. So that the model can learn sufficient features while avoiding problems such as vanishing gradients caused by excessively high dimensionality, the feature matrices of the three data types are each reduced in dimension by PCA (principal component analysis). This retains the features of the entities as far as possible, removes some of the noise that is detrimental to the experimental results, makes the features mutually independent, and so provides more useful information for ATC Code classification. Experiments showed that the optimal feature dimension is 300. After dimensionality reduction, the feature matrices of the three data types are concatenated to give the node composite feature matrix of the final DACPGTN model.
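The construction of the composite feature matrix can be sketched as follows. This is an illustration under stated assumptions: PCA is computed here with a plain SVD rather than any particular library, the input matrices are random stand-ins, and the sizes and the target dimension `dim` are toy values in place of 1749, 982, 355 and 300.

```python
import numpy as np

def pca_reduce(x, dim):
    """Project the rows of x onto their first `dim` principal components."""
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :dim] * s[:dim]                    # principal component scores

rng = np.random.default_rng(0)
n_drug, n_target, n_disease, dim = 12, 8, 6, 4     # toy sizes (assumptions)

drug_sims = rng.random((n_drug, n_drug, 7))        # seven drug similarity matrices
drug_feat = drug_sims.mean(axis=2)                 # sum over the 7 scores, then average
target_feat = rng.random((n_target, n_target))     # stand-in for the protein score matrix
disease_feat = rng.random((n_disease, n_disease))  # stand-in for the Pearson matrix

# Reduce each entity type to the same dimension, then stack vertically.
composite = np.vstack([pca_reduce(m, dim)
                       for m in (drug_feat, target_feat, disease_feat)])
```

Each row of `composite` is one node of the heterogeneous graph, ordered as drugs, then target proteins, then diseases.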
Secondly, construction of heterogeneous networks among drugs, target proteins and diseases
In constructing the experimental data, the two databases KEGG and DrugBank are first searched for the 1749 drugs and 982 target proteins selected from the experimental data set, and a Drug-Target adjacency matrix is constructed. If drug i and target protein j are associated, $\text{Drug-Target}_{ij}$ is 1, otherwise it is 0, which finally gives the sparse matrix $\text{Drug-Target}_{1749\times 982}$ with values 0 and 1.
By the same principle, a Drug-Disease adjacency matrix is constructed: if drug i and disease j are associated in the two databases KEGG and DrugBank, $\text{Drug-Disease}_{ij}$ is 1, otherwise it is 0, which finally gives the sparse matrix $\text{Drug-Disease}_{1749\times 355}$.
Meanwhile, the relation information between the 982 target proteins and the 355 diseases in the experiment is extracted from existing drug information databases, and a Target-Disease relation matrix is constructed. Its entries are defined in the same way as for the drug-target protein relation matrix, finally giving the sparse matrix $\text{Target-Disease}_{982\times 355}$.
To better learn the latent association information, the constructed heterogeneous matrices are transposed, finally giving six adjacency matrices: D_T denotes the adjacency matrix $\text{Drug-Target}_{1749\times 982}$, D_D denotes the adjacency matrix $\text{Drug-Disease}_{1749\times 355}$, T_D denotes the adjacency matrix $\text{Target-Disease}_{982\times 355}$, $D\_T^{T}$ denotes the transpose of D_T, $D\_D^{T}$ denotes the transpose of D_D, and $T\_D^{T}$ denotes the transpose of T_D.
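The six adjacency matrices can be built from association pairs as sketched below. The pair lists, the helper name `build_adjacency` and the toy sizes are hypothetical; in the method the pairs come from KEGG and DrugBank.

```python
import numpy as np

def build_adjacency(pairs, n_rows, n_cols):
    """Return a 0/1 matrix with a 1 at every (row, column) association pair."""
    adj = np.zeros((n_rows, n_cols), dtype=np.int8)
    for i, j in pairs:
        adj[i, j] = 1
    return adj

# Hypothetical toy associations: 3 drugs, 2 target proteins, 2 diseases.
drug_target_pairs = [(0, 1), (2, 0)]
drug_disease_pairs = [(0, 0), (1, 1)]
target_disease_pairs = [(1, 0)]

d_t = build_adjacency(drug_target_pairs, 3, 2)
d_d = build_adjacency(drug_disease_pairs, 3, 2)
t_d = build_adjacency(target_disease_pairs, 2, 2)

# The original matrices plus their transposes give the six edge types.
edge_set = {
    "D_T": d_t, "D_D": d_d, "T_D": t_d,
    "D_T^T": d_t.T, "D_D^T": d_d.T, "T_D^T": t_d.T,
}
```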
Thirdly, acquisition of latent association information among drugs, target proteins and diseases
The Graph Transformer Layer of the Graph Transformer Network is used to acquire the latent association information among drugs, target proteins and diseases. The Graph Transformer Layer performs a soft selection over edge types and composite relations: it searches for a new graph structure among several candidate adjacency matrices, so that graph convolution is more effective and more powerful node representations are learned. The Graph Transformer Layer is implemented by formula (5):
$$Q = F(\mathbb{A}; W_{\phi}) = \phi(\mathbb{A}; \mathrm{softmax}(W_{\phi})) \quad (5)$$
where $\phi$ is the convolution layer and $W_{\phi} \in \mathbb{R}^{1\times 1\times K}$ is the parameter of the convolution layer $\phi$. The Graph Transformer Layer selects adjacency matrices (heterogeneous networks of different types) from the set of adjacency matrices $\mathbb{A}$ and learns a new graph structure from the two selected adjacency matrices $Q_1$ and $Q_2$. The soft selection of adjacency matrices obtains non-negative weights from $\mathrm{softmax}(W_{\phi})$ and performs a 1 × 1 convolution, that is, a weighted summation of the candidate adjacency matrices. In the implementation, the Graph Transformer Layer operation is applied to the constructed adjacency matrices by formulas (6)-(8), and each $Q_i$ can be expressed as
$$Q_i = \sum_{t_l \in \mathcal{T}^{e}} \alpha_{t_l}^{(l)} A_{t_l}$$
where $\mathcal{T}^{e}$ is the set of edge types, l denotes the l-th Graph Transformer Layer, and $\alpha_{t_l}^{(l)}$ is the weight of the adjacency matrix of edge type $t_l$ at the l-th layer. Multiplying adjacency matrices of different types transfers connections between nodes and yields the connection relations between different nodes. When Graph Transformer Layers are used, a single layer, or the first layer of a multi-layer stack, has two convolution kernels, and every Graph Transformer Layer after the first has one convolution kernel. After the new graph structures are obtained according to the weights, the adjacency matrices are multiplied. For numerical stability, the graph structure of each layer is normalized by the inverse degree matrix $D^{-1}$ to obtain the graph structure output $A^{(l)}$ of the current Graph Transformer Layer:
Figure BDA0003476698770000101
Figure BDA0003476698770000102
$$A^{(l)} = D^{-1} Q_1 Q_2 \quad (8)$$
Based on the set of heterogeneous networks constructed in the preceding steps, the Graph Transformer Layer learns the association information in the different heterogeneous networks and finally yields a graph information matrix that represents the potential associations between different nodes.
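As an illustration only, the soft selection and the normalized product of formula (8) can be sketched in NumPy as follows. The sketch assumes that the non-negative weights come from a softmax over the kernel parameters; the function names, the three-node toy graph, and the hand-set kernel weights are hypothetical and are not taken from the original implementation.

```python
import numpy as np


def softmax(w):
    """Numerically stable softmax over a 1-D weight vector."""
    e = np.exp(w - np.max(w))
    return e / e.sum()


def soft_select(adjs, w):
    """1 x 1 convolution: weighted sum of K candidate adjacency matrices.

    adjs has shape (K, N, N); w has shape (K,) and plays the role of W_phi.
    """
    alpha = softmax(w)                        # non-negative weights summing to 1 (assumed softmax)
    return np.tensordot(alpha, adjs, axes=1)  # sum_k alpha[k] * adjs[k]


def graph_transformer_layer(adjs, w1, w2):
    """Return D^{-1} Q1 Q2 for one layer with two convolution kernels."""
    q1 = soft_select(adjs, w1)
    q2 = soft_select(adjs, w2)
    a = q1 @ q2                               # chains one edge type after another
    deg = a.sum(axis=1, keepdims=True)        # row sums, i.e. the diagonal of D
    deg[deg == 0] = 1.0                       # leave all-zero rows untouched
    return a / deg


# Toy graph: node 0 is a drug, node 1 a target protein, node 2 a disease.
drug_target = np.zeros((3, 3))
drug_target[0, 1] = 1.0
target_disease = np.zeros((3, 3))
target_disease[1, 2] = 1.0
candidates = np.stack([drug_target, target_disease])

# Hand-picked kernel weights (learned in practice): Q1 favours drug-target
# edges and Q2 favours target-disease edges.
a_new = graph_transformer_layer(candidates,
                                np.array([10.0, -10.0]),
                                np.array([-10.0, 10.0]))
print(a_new)  # the only non-zero entry links drug 0 to disease 2
```

In this toy case the product of the two selected matrices creates a drug-disease link that neither input network contains, which is the kind of potential association the layer is meant to surface.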
Fourth, predict the drug ATC code with an end-to-end prediction module
(1) The GCN layer extracts features from the composite features and the potential association information matrix
Once the new graph information matrix has been obtained, a graph convolutional network (GCN) is introduced as the feature extractor to perform convolution on the graph data. In a GCN, propagation from layer to layer follows formula (9):
Figure BDA0003476698770000103
Figure BDA0003476698770000104
is the graph structure input to the current layer, that is, the new graph structure generated by Graph Transformer Layer learning (the potential association information matrix),
Figure BDA0003476698770000105
is composed of
Figure BDA0003476698770000106
$H$ is the input feature matrix of the current GCN layer, namely the constructed node composite feature matrix; $W^{(l)} \in \mathbb{R}^{d \times d}$ is a trainable weight matrix; $H^{(l+1)}$ is the feature matrix output by the current GCN layer; and $\sigma$ denotes the ReLU activation function.
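A minimal sketch of one GCN propagation step is given below. It assumes the widely used symmetric normalization with added self-loops, $\sigma(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H W)$; whether formula (9) has exactly this form cannot be confirmed from the text alone, and the function name and toy inputs are hypothetical.

```python
import numpy as np


def gcn_layer(adj, h, w):
    """One propagation step: ReLU(D~^{-1/2} A~ D~^{-1/2} H W).

    adj: (N, N) non-negative adjacency matrix, h: (N, d_in), w: (d_in, d_out).
    """
    a_tilde = adj + np.eye(adj.shape[0])      # assumed self-loops
    deg = a_tilde.sum(axis=1)                 # >= 1 for non-negative adj
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    a_hat = d_inv_sqrt[:, None] * a_tilde * d_inv_sqrt[None, :]
    return np.maximum(a_hat @ h @ w, 0.0)     # ReLU activation


# Toy path graph with three nodes and one-hot input features.
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])
h0 = np.eye(3)
w0 = np.ones((3, 2))
h1 = gcn_layer(adj, h0, w0)
print(h1.shape)  # (3, 2)
```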
To learn the various connection relations among different node types, the 1 × 1 convolution of the Graph Transformer Layer can be given multiple output channels $C$, so that the weighted-sum adjacency matrices $Q_1$ and $Q_2$ become adjacency tensors
Figure BDA0003476698770000107
After $l$ Graph Transformer Layers are stacked, the tensor
Figure BDA0003476698770000108
is obtained. One GCN layer is applied to each channel of this tensor, and the channel outputs are combined by formula (10):
Figure BDA0003476698770000109
where $\|$ denotes the concatenation operator, $C$ is the number of output channels,
Figure BDA0003476698770000111
is the $i$-th adjacency matrix of the tensor
Figure BDA0003476698770000112
, $D_i$ is the degree matrix of
Figure BDA0003476698770000113
, $W \in \mathbb{R}^{d \times d}$ is a trainable weight matrix shared across channels, and $X \in \mathbb{R}^{N \times d}$ is the feature matrix. Because the graph is directed, $D^{-1}A$ is used in place of
Figure BDA0003476698770000114
to normalize the adjacency matrix.
The constructed node feature matrix and the adjacency tensor obtained from the Graph Transformer Layer are fed to the GCN layer, which produces an output of the specified dimension.
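The per-channel GCN with concatenation can be sketched as follows. This is a sketch under stated assumptions: each channel adds self-loops before the $D^{-1}A$ row normalization, and the shared weight matrix is allowed to be rectangular so that the output dimension can differ from the input dimension. The function name and the random toy inputs are hypothetical.

```python
import numpy as np


def multi_channel_gcn(adj_tensor, x, w):
    """Concatenate ReLU(D_i^{-1} A_i X W) over the C channels of adj_tensor.

    adj_tensor: (C, N, N) non-negative, x: (N, d), w: (d, d_out) shared by all channels.
    """
    outputs = []
    for a in adj_tensor:
        a_tilde = a + np.eye(a.shape[0])              # assumed self-loops
        deg = a_tilde.sum(axis=1, keepdims=True)      # row degrees, >= 1
        outputs.append(np.maximum((a_tilde / deg) @ x @ w, 0.0))
    return np.concatenate(outputs, axis=1)            # shape (N, C * d_out)


rng = np.random.default_rng(0)
adj_tensor = rng.random((2, 5, 5))    # C = 2 channels, N = 5 nodes
x = rng.normal(size=(5, 4))           # d = 4 input features
w = rng.normal(size=(4, 3))           # d_out = 3
z = multi_channel_gcn(adj_tensor, x, w)
print(z.shape)  # (5, 6)
```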
(2) Dimension reduction and prediction with multiple linear layers
Several linear layers reduce the dimension of the GCN layer output. The feature vectors extracted by the GCN module are the input to the first fully connected layer, and the output dimension of the last linear layer equals the dimension of the drug's ATC code label vector, so its output serves as the ATC classification prediction for the drug. To counter the over-fitting that comes with stacking many layers, a ReLU activation follows the first linear layer and a Dropout layer is inserted between each pair of subsequent linear layers. A Dropout layer removes neurons from the network with a given probability; under stochastic gradient descent, this random removal means that each iteration trains a different sub-network, which mitigates over-fitting and improves the generalization ability of the model.
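The forward pass of such a linear-layer stack can be sketched as below. The layer sizes follow the parameter settings reported for the method (150-dimensional GCN output, then 150, 128, 64, and 14 neurons) and the dropout probability is 0.2. The exact placement of Dropout (after the second and third linear layers here), the random initialization, and the function names are assumptions made for this sketch.

```python
import numpy as np

LAYER_SIZES = [150, 150, 128, 64, 14]   # GCN output -> FC1 -> FC2 -> FC3 -> FC4


def init_params(sizes, rng):
    """Small random weights and zero biases for each linear layer."""
    return [(rng.normal(0.0, 0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(x, params, drop_p=0.2, train=False, rng=None):
    """Linear stack: ReLU after the first layer, dropout after the middle layers."""
    h = x
    last = len(params) - 1
    for i, (w, b) in enumerate(params):
        h = h @ w + b
        if i == 0:
            h = np.maximum(h, 0.0)             # ReLU after the first linear layer
        elif i < last and train:
            mask = rng.random(h.shape) >= drop_p
            h = h * mask / (1.0 - drop_p)      # inverted dropout, training only
    return h                                   # raw scores, one per ATC class


rng = np.random.default_rng(0)
params = init_params(LAYER_SIZES, rng)
features = rng.normal(size=(5, 150))           # 5 drug nodes
logits = forward(features, params)             # evaluation mode, no dropout
train_logits = forward(features, params, train=True, rng=rng)
print(logits.shape)  # (5, 14)
```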
(3) Model optimization algorithm and loss function
The DACPGTN model is trained with the Adam optimizer, a stochastic optimization algorithm that performs well in deep learning and compares favorably with other stochastic optimization algorithms.
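For reference, a single Adam update can be sketched as follows. The learning rate 0.005 and weight decay 0.001 are the values reported for the method; treating weight decay as an L2 term added to the gradient, the default moment coefficients, and the quadratic toy objective are assumptions made for this sketch.

```python
import numpy as np


def adam_step(theta, grad, state, lr=0.005, weight_decay=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """Return the updated parameters after one Adam step; state is updated in place."""
    g = grad + weight_decay * theta                      # L2-style weight decay (assumed form)
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * g      # first-moment estimate
    state["v"] = beta2 * state["v"] + (1 - beta2) * g * g  # second-moment estimate
    m_hat = state["m"] / (1 - beta1 ** state["t"])       # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps)


def minimize_quadratic(steps=500):
    """Toy use: minimize ||theta - 1||^2 starting from zero."""
    theta = np.zeros(3)
    state = {"t": 0, "m": np.zeros(3), "v": np.zeros(3)}
    for _ in range(steps):
        grad = 2.0 * (theta - 1.0)
        theta = adam_step(theta, grad, state)
    return theta


theta_final = minimize_quadratic()
print(theta_final)
```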
The loss function follows Su's generalization, to the multi-label classification problem, of the softmax activation function combined with the cross-entropy loss used in single-label classification. In the original single-label setting, the cross-entropy loss is defined by formula (11):
Figure BDA0003476698770000115
where $n$ is the number of all possible classes and $S_i$ corresponds to a single class among them. This loss can be derived as an approximation of the max function, as shown in formula (12):
Figure BDA0003476698770000121
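The link between softmax cross-entropy and the max function can be checked numerically. The sketch below relies on the standard identity that the single-label softmax cross-entropy equals the logsumexp of zero and the score differences $s_i - s_t$, and on the bound $\max(x) \le \mathrm{logsumexp}(x) \le \max(x) + \log n$. The function names and the sample scores are hypothetical.

```python
import numpy as np


def logsumexp(x):
    """Stable log(sum(exp(x))) for a 1-D array."""
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))


def softmax_cross_entropy(scores, target):
    """-log softmax(scores)[target]."""
    return logsumexp(scores) - scores[target]


def margin_form(scores, target):
    """log(1 + sum_{i != t} exp(s_i - s_t)), a smooth max(0, s_i - s_t)."""
    diffs = np.delete(scores, target) - scores[target]
    return logsumexp(np.concatenate(([0.0], diffs)))


scores = np.array([2.0, -1.0, 0.5])
ce = softmax_cross_entropy(scores, 0)
mf = margin_form(scores, 0)
print(np.isclose(ce, mf))  # True: the two forms agree
```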
In the multi-label classification problem, the score of every target class is likewise expected to be no less than that of every non-target class; the same principle gives the generalized loss, formula (13):
Figure BDA0003476698770000122
where $\Omega_{neg}$ and $\Omega_{pos}$ are the negative and positive sample sets, respectively.
In multi-label prediction, the number $k$ of labels per sample is not fixed, so a threshold is needed to decide which classes to output. To this end, an additional class 0 is introduced; the scores of all target classes are expected to be greater than $S_0$ and the scores of all non-target classes less than $S_0$, which gives formula (14):
Figure BDA0003476698770000123
Setting the threshold $S_0 = 0$ simplifies formula (14) to formula (15):
Figure BDA0003476698770000124
Finally, the loss function of formula (16) is obtained, which is the generalization of the softmax activation function and the cross-entropy loss to the multi-label classification problem:
$$\mathrm{loss}(y_{true}, y_{pred}) = \mathrm{logsumexp}(y_{pred\text{-}neg}, 0) + \mathrm{logsumexp}(y_{pred\text{-}pos}, 0) = \mathrm{logsumexp}((y_{pred} - y_{true}), 0) + \mathrm{logsumexp}((y_{pred} - (1 - y_{true})), 0) \quad (16)$$
where $y_{true}$ is the true label vector of the drug, $y_{pred}$ is the predicted label vector, and $y_{pred\text{-}neg}$ and $y_{pred\text{-}pos}$ are the predicted scores for the negative and positive sample sets, respectively. In the prediction stage, the model outputs every class whose output from the last linear layer is greater than 0. Unlike earlier ATC code classification methods, this approach does not split the multi-label problem into multiple binary classification problems; instead it compares target-class scores with non-target-class scores, and relies on the properties of the logsumexp function to handle class imbalance and balance the weight of each term automatically.
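A minimal NumPy sketch of this loss for one sample is given below. It assumes that, with the threshold at 0, the loss takes the form $\log(1 + \sum_{i \in \Omega_{neg}} e^{s_i}) + \log(1 + \sum_{j \in \Omega_{pos}} e^{-s_j})$, which is how Su's generalization is commonly written; the masking constant and the function names are hypothetical choices for this sketch.

```python
import numpy as np


def logsumexp(x):
    """Stable log(sum(exp(x))) for a 1-D array."""
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))


def multilabel_loss(y_true, y_pred):
    """log(1 + sum_neg e^{s_i}) + log(1 + sum_pos e^{-s_j}) for one sample."""
    y_true = np.asarray(y_true, dtype=float)
    s = (1.0 - 2.0 * y_true) * np.asarray(y_pred, dtype=float)  # negate target-class scores
    big = 1e12                                                  # masks out the other set
    s_neg = np.concatenate((s - y_true * big, [0.0]))           # non-target classes plus class 0
    s_pos = np.concatenate((s - (1.0 - y_true) * big, [0.0]))   # target classes plus class 0
    return logsumexp(s_neg) + logsumexp(s_pos)


def predict_labels(y_pred):
    """Output every class whose score exceeds the threshold 0."""
    return (np.asarray(y_pred) > 0).astype(int)


loss_good = multilabel_loss([1, 0], [2.0, -3.0])   # scores agree with the labels
loss_bad = multilabel_loss([1, 0], [-2.0, 3.0])    # scores contradict the labels
print(loss_good < loss_bad)                        # True
print(predict_labels([2.0, -3.0]))                 # [1 0]
```

Because both sums sit inside a single logarithm, one term per sample is produced for the positive set and one for the negative set, regardless of how many classes each set contains.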
Fifth, experimental verification
1. Evaluation indicators
To verify the effectiveness of the method, experiments were run with ten-fold cross-validation to test the prediction performance of the DACPGTN model.
(1) Ten-fold cross-validation
K-fold cross-validation is a common validation method in deep learning that gives a more rigorous estimate of model performance; here 10-fold cross-validation is used to evaluate the model. For each fold, the drug samples in the data set are split in the ratio (training set : validation set) : test set = (9 : 1) : 1, and the results of the 10 folds are averaged for each run of 10-fold cross-validation. Finally, 10-fold cross-validation is repeated ten times and the results are averaged to evaluate the model and keep the error of the experimental results as small as possible.
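The repeated splitting can be sketched as an index generator. In this sketch each fold in turn is the test set and the remaining samples are divided 9 : 1 into training and validation sets, which is one reading of the ratio above; the function name, the seed, and the sample count are hypothetical.

```python
import numpy as np


def repeated_kfold(n_samples, n_folds=10, n_repeats=10, val_ratio=0.1, seed=0):
    """Yield (repeat, fold, train_idx, val_idx, test_idx) index splits."""
    rng = np.random.default_rng(seed)
    for r in range(n_repeats):
        # A fresh shuffle for every repetition of the 10-fold procedure.
        folds = np.array_split(rng.permutation(n_samples), n_folds)
        for k in range(n_folds):
            test_idx = folds[k]
            rest = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            n_val = int(round(val_ratio * len(rest)))   # 9 : 1 train/validation split
            yield r, k, rest[n_val:], rest[:n_val], test_idx


splits = list(repeated_kfold(100, n_repeats=2))
print(len(splits))  # 20 splits: 2 repetitions x 10 folds
```

Per-fold metrics would then be averaged within each repetition and again across repetitions.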
(2) Evaluation criteria
In multi-label classification a single sample carries one or more labels, so conventional single-label metrics are not meaningful here, and the evaluation criteria for multi-label problems are more complex and finer-grained. Chou et al. defined five criteria for evaluating multi-label classifiers, and earlier studies on ATC code classification have compared methods using them; for a fair comparison, the present method uses the same criteria. They are defined in formulas (17)-(21):
Figure BDA0003476698770000141
Figure BDA0003476698770000142
Figure BDA0003476698770000143
Figure BDA0003476698770000144
Figure BDA0003476698770000145
where $N$ is the total number of samples, $M$ is the number of labels, the operator $|\cdot|$ counts the elements of a set, $\cup$ and $\cap$ denote set union and intersection, $Y_i$ is the true label vector of sample $i$, $Y_i^*$ is the label vector that the model predicts for sample $i$, and K is the function that tests whether two vectors are identical, defined by formula (22):
Figure BDA0003476698770000146
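The five criteria can be computed from binary label matrices as sketched below. The sketch assumes the definitions commonly attributed to Chou: Aiming averages $|Y_i \cap Y_i^*| / |Y_i^*|$, Coverage averages $|Y_i \cap Y_i^*| / |Y_i|$, Accuracy averages $|Y_i \cap Y_i^*| / |Y_i \cup Y_i^*|$, Absolute true is the fraction of exact matches, and Absolute false averages $(|Y_i \cup Y_i^*| - |Y_i \cap Y_i^*|) / M$. Counting a sample with an empty denominator as 0 is an additional convention chosen for this sketch, and the function name is hypothetical.

```python
import numpy as np


def multilabel_metrics(y_true, y_pred):
    """Five multi-label criteria for (N, M) binary label matrices."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    n_labels = y_true.shape[1]
    inter = (y_true & y_pred).sum(axis=1).astype(float)
    union = (y_true | y_pred).sum(axis=1).astype(float)
    n_true = y_true.sum(axis=1)
    n_pred = y_pred.sum(axis=1)

    def safe_ratio(num, den):
        # Samples with an empty denominator contribute 0 (assumed convention).
        return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

    return {
        "aiming": safe_ratio(inter, n_pred).mean(),
        "coverage": safe_ratio(inter, n_true).mean(),
        "accuracy": safe_ratio(inter, union).mean(),
        "absolute_true": np.all(y_true == y_pred, axis=1).mean(),
        "absolute_false": ((union - inter) / n_labels).mean(),
    }


metrics = multilabel_metrics([[1, 0, 1], [0, 1, 0]],
                             [[1, 0, 0], [0, 1, 0]])
print(metrics)
```

Higher is better for the first four criteria and lower is better for Absolute false.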
2. Experimental results
To evaluate the effectiveness of DACPGTN, it was compared with five other methods (CGATCPred, iATC-NRAKEL, iATC_mISF, ML-KNN, and RandomForest). CGATCPred predicts drug ATC codes from drug similarity information and label correlation information; iATC-NRAKEL predicts them from a drug interaction network with the RAKEL algorithm; iATC_mISF predicts them from drug chemical-chemical interaction, structural similarity, and fingerprint similarity, with Gaussian kernel regression as the classifier; ML-KNN and RandomForest are general-purpose methods for multi-label classification. Among the five comparison methods, the dedicated drug ATC code prediction methods were run with the optimal parameters reported for them, and the general multi-label classification methods were run with default parameters. The parameters set for the DACPGTN method are shown in Table 2.
TABLE 2 DACPGTN method parameter settings
Parameter                           Value
Number of Graph Transformer Layers  1
Number of output channels           2
Training epochs                     250
Learning rate                       0.005
Weight decay                        0.001
Number of GCN layers                1
Input feature dimension             300
GCN layer output dimension          150
FC1 neuron number                   150
FC2 neuron number                   128
FC3 neuron number                   64
FC4 neuron number                   14
Dropout                             0.2
(1) Ten-fold cross validation analysis
Comparison experiments were run on the data set. Every experiment used 10-fold cross-validation repeated 10 times, and the results were averaged to keep the comparison fair. The results are listed in the following table:
TABLE 3 Results of the DACPGTN method versus other methods (10 × 10-fold CV)
Classifier    Aiming  Coverage  Accuracy  Absolute true  Absolute false
DACPGTN       0.8543  0.8517    0.8320    0.7902         0.0241
CGATCPred     0.7864  0.8022    0.7711    0.7290         0.0338
iATC-NRAKEL   0.7744  0.8020    0.7550    0.6947         0.0376
iATC_mISF     0.7094  0.7127    0.7036    0.6306         0.0244
ML-KNN        0.7293  0.7071    0.6861    0.6300         0.0433
RandomForest  0.6723  0.6533    0.6471    0.6187         0.0368
As Table 3 shows, the DACPGTN method of the present invention gives the best predictions on the current data set. Compared with CGATCPred, the best existing model for drug ATC code classification, DACPGTN improves by 6.8% on Aiming, 5% on Coverage, 5.9% on Accuracy, and 5.8% on Absolute true. Of the five evaluation criteria, Accuracy and Absolute true are the most important, and DACPGTN improves on both. These results indicate that, when association information linking a drug compound to target proteins and diseases is available, DACPGTN can use the Graph Transformer Layer of the graph transformation network to learn potential association information among drugs, target proteins, and diseases from multiple heterogeneous graphs. Integrating this association information with the composite features of the various nodes gives better performance in ATC code classification.
(2) Influence of output dimension of GCN layer on experimental result
In the experiments, the GCN layer supplies classification information to the end-to-end prediction stage by learning from the composite feature matrix and the potential association information matrix obtained by the Graph Transformer Layer. To check how the output dimension of the GCN layer's node features affects the results, and to make sure the model reaches its best performance, the following experiment was run; the results are shown in FIG. 5. The input dimension of the original nodes of the GCN layer is dim = 300. Four output dimensions were preset, and the performance of each on the 5 evaluation criteria was obtained by 10-fold cross-validation. As FIG. 5 shows, the model predicts best when the GCN layer output dimension is 150. The node output dimension of the GCN layer in the prediction module is therefore set to dim = 150, and all experiments use this setting.
(3) Ablation experiment
An ablation experiment was run to examine how the drug-target protein association information and the drug-disease association information, through the potential association information that the Graph Transformer Layer learns from them, each affect drug ATC code classification. The drug-target protein association information and the drug-disease association information were each used alone as the heterogeneous-graph input of the Graph Transformer Layer, and the node composite feature matrix was rebuilt accordingly as the input of the GCN end-to-end prediction module. The results of 10-fold cross-validation with the same parameters as in the experiments above are shown in Table 4.
TABLE 4 Ablation experiment results
Classifier       Aiming  Coverage  Accuracy  Absolute true  Absolute false
DACPGTN-Disease  0.8442  0.8437    0.8231    0.7782         0.02516
DACPGTN-Target   0.8327  0.8307    0.8051    0.7536         0.02875
As the table above shows, when only the drug-target protein association information or only the drug-disease association information is fed to the Graph Transformer Layer, performance drops somewhat, and using drug-target protein association information alone works better than using drug-disease association information alone. Because there is more drug-target protein association information than drug-disease association information, more potential associations among nodes can be captured and more useful information is available for classification. In the ATC code classification problem, DACPGTN therefore predicts better when it considers multi-source association information than when it considers a single source. This shows that DACPGTN can extract classification-relevant information from multi-source association information: it obtains a new graph structure by learning from different heterogeneous graphs and, after learning by the end-to-end prediction module, has a clear advantage on the ATC code classification problem.
The above-described embodiments are merely preferred examples of the present invention, and not intended to limit the scope of the invention, so that equivalent changes or modifications in the structure, features and principles described in the present invention should be included in the claims of the present invention.

Claims (2)

1. A medicine ATCCode prediction method based on a graph transformation network is characterized by comprising the following steps:
1) using known drugs, obtain the target proteins and diseases related to the drugs, and compute the similarity between the diseases and the similarity between the obtained target proteins, specifically: first, obtain the combined score between drug-related target proteins from the STRING database as the target protein similarity information; second, obtain the association matrix between the drugs and their related diseases and compute the Pearson correlation coefficient between its columns, so that the association profile of each disease across all drugs serves as the disease similarity information; stack the drug similarity information obtained under the different known evaluation criteria along the same dimension and take the mean as the drug feature matrix; use the target protein similarity and the disease similarity as the feature matrices of the target proteins and the diseases; reduce the feature matrices of the three entity types to the same dimension with principal component analysis (PCA) and concatenate them vertically to construct the composite feature matrix;
2) constructing a drug-target protein heterogeneous network, a drug-disease heterogeneous network, a target protein-disease heterogeneous network and transpositions of the heterogeneous networks: according to the association information between the entities, the specific construction process of the heterogeneous network is as follows:
Figure DEST_PATH_IMAGE001
if an association exists between the current drug i and target protein j, the corresponding element of the heterogeneous network
Figure DEST_PATH_IMAGE003
is set to 1; otherwise it is set to 0; this finally yields the sparse 0/1 matrix
Figure 180991DEST_PATH_IMAGE001
; in the same way, construct the
Figure DEST_PATH_IMAGE005
heterogeneous network and the
Figure 377617DEST_PATH_IMAGE007
heterogeneous network; transposing the heterogeneous networks constructed from the association information between entities finally gives the set of heterogeneous networks between different entities
Figure 551853DEST_PATH_IMAGE009
I.e. drug-target protein heterogeneous network (
Figure 820023DEST_PATH_IMAGE011
) Drug-disease heterogeneous network (
Figure 15512DEST_PATH_IMAGE013
) Target protein-disease heterogeneous network (
Figure 101149DEST_PATH_IMAGE007
) Target protein-drug heterogeneous network (
Figure 873933DEST_PATH_IMAGE015
) Disease-drug heterogeneous network (
Figure 821160DEST_PATH_IMAGE017
) Disease-target protein heterogeneous network (
Figure 887468DEST_PATH_IMAGE019
);
3) based on the heterogeneous network set obtained in step 2)
Figure 753792DEST_PATH_IMAGE009
use the Graph Transformer Layer to acquire the potential association information among the three entity types (drug, target protein, and disease) and construct a new potential association information matrix; the Graph Transformer Layer is implemented as follows:
Figure 889239DEST_PATH_IMAGE021
wherein
Figure DEST_PATH_IMAGE023
is a convolution layer and
Figure 420583DEST_PATH_IMAGE025
is the parameter of the convolution layer
Figure 590664DEST_PATH_IMAGE023
; from the heterogeneous network set
Figure 893470DEST_PATH_IMAGE009
the Graph Transformer Layer selects adjacency matrices (heterogeneous networks of different types), and from the two selected adjacency matrices
Figure DEST_PATH_IMAGE027
and
Figure 801033DEST_PATH_IMAGE029
it learns a new graph structure by matrix multiplication; the soft selection of an adjacency matrix uses
Figure DEST_PATH_IMAGE031
to obtain non-negative weights and applies a 1 × 1 convolution, that is, a weighted sum over the candidate adjacency matrices;
4) input the potential association information among drugs, target proteins, and diseases acquired by the Graph Transformer Layer in step 3), together with the composite feature matrix constructed in step 1), into the end-to-end prediction module and perform ATC code prediction on the drug nodes.
2. The graph transformation network-based drug ATCCode prediction method according to claim 1, wherein: in step 4), the end-to-end prediction module uses a GCN layer as the feature extractor and multiple linear layers for dimension reduction, with Dropout added between the linear layers; the GCN layer has 150 output nodes, linear layer 1 has 150 neurons, linear layer 2 has 128 neurons, linear layer 3 has 64 neurons, and linear layer 4, the output layer, has 14 neurons; in the training and prediction stages of the end-to-end prediction module, the target-class scores and non-target-class scores of the multi-label classification problem are compared, using the
Figure DEST_PATH_IMAGE033
activation function combined with the cross-entropy loss, smoothly generalized to multi-label classification; an additional class 0 is introduced so that the scores of all target classes are greater than
Figure DEST_PATH_IMAGE035
and the scores of all non-target classes are less than
Figure 41128DEST_PATH_IMAGE035
; the specific implementation is given by the following formulas:
Figure DEST_PATH_IMAGE037
Figure DEST_PATH_IMAGE039
Figure DEST_PATH_IMAGE041
with the threshold set, for the positive and negative sample sets respectively, to
Figure 691814DEST_PATH_IMAGE035
= 0, the loss is obtained as the generalization of the
Figure 165521DEST_PATH_IMAGE033
activation function and the cross-entropy loss to the multi-label classification problem:
Figure DEST_PATH_IMAGE043
by means of
Figure DEST_PATH_IMAGE045
and in the final prediction stage, every class whose output from the last linear layer is greater than 0 is taken as the prediction result.
CN202210063363.4A 2022-01-18 2022-01-18 Medicine ATCCode prediction method based on graph transformation network Pending CN114420310A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210063363.4A CN114420310A (en) 2022-01-18 2022-01-18 Medicine ATCCode prediction method based on graph transformation network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210063363.4A CN114420310A (en) 2022-01-18 2022-01-18 Medicine ATCCode prediction method based on graph transformation network

Publications (1)

Publication Number Publication Date
CN114420310A true CN114420310A (en) 2022-04-29

Family

ID=81274545

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210063363.4A Pending CN114420310A (en) 2022-01-18 2022-01-18 Medicine ATCCode prediction method based on graph transformation network

Country Status (1)

Country Link
CN (1) CN114420310A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115458046A (en) * 2022-10-09 2022-12-09 兰州大学 Method for predicting drug target binding property based on parallel deep fine-grained model
CN115458148A (en) * 2022-08-30 2022-12-09 中国人民解放军总医院第三医学中心 Intelligent selection method and intelligent selection device for triage method
CN115497555A (en) * 2022-08-16 2022-12-20 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Multi-species protein function prediction method, device, equipment and storage medium
CN116705194A (en) * 2023-06-06 2023-09-05 之江实验室 Method and device for predicting drug cancer suppression sensitivity based on graph neural network
CN116805513A (en) * 2023-08-23 2023-09-26 成都信息工程大学 Cancer driving gene prediction and analysis method based on isomerism map transducer framework
CN116705194B (en) * 2023-06-06 2024-06-04 之江实验室 Method and device for predicting drug cancer suppression sensitivity based on graph neural network

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115497555A (en) * 2022-08-16 2022-12-20 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Multi-species protein function prediction method, device, equipment and storage medium
CN115497555B (en) * 2022-08-16 2024-01-05 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Multi-species protein function prediction method, device, equipment and storage medium
CN115458148A (en) * 2022-08-30 2022-12-09 中国人民解放军总医院第三医学中心 Intelligent selection method and intelligent selection device for triage method
CN115458046A (en) * 2022-10-09 2022-12-09 兰州大学 Method for predicting drug target binding property based on parallel deep fine-grained model
CN115458046B (en) * 2022-10-09 2023-08-11 兰州大学 Method for predicting drug target binding property based on parallel deep fine granularity model
CN116705194A (en) * 2023-06-06 2023-09-05 之江实验室 Method and device for predicting drug cancer suppression sensitivity based on graph neural network
CN116705194B (en) * 2023-06-06 2024-06-04 之江实验室 Method and device for predicting drug cancer suppression sensitivity based on graph neural network
CN116805513A (en) * 2023-08-23 2023-09-26 成都信息工程大学 Cancer driving gene prediction and analysis method based on isomerism map transducer framework
CN116805513B (en) * 2023-08-23 2023-10-31 成都信息工程大学 Cancer driving gene prediction and analysis method based on isomerism map transducer framework
CN117976244B (en) * 2024-04-01 2024-06-07 天津理工大学 Medicine interaction prediction method and device based on multidimensional characteristics

Similar Documents

Publication Publication Date Title
CN114420310A (en) Medicine ATCCode prediction method based on graph transformation network
Nadif et al. Unsupervised and self-supervised deep learning approaches for biomedical text mining
CN110021341B (en) Heterogeneous network-based GPCR (GPCR-based drug and targeting pathway) prediction method
Zare et al. Scoring relevancy of features based on combinatorial analysis of Lasso with application to lymphoma diagnosis
CN113936735A (en) Method for predicting binding affinity of drug molecules and target protein
Dewi et al. Drug-drug interaction relation extraction with deep convolutional neural networks
CN112599187B (en) Method for predicting drug and target protein binding fraction based on double-flow neural network
CN111370073B (en) Medicine interaction rule prediction method based on deep learning
CN116386899A (en) Graph learning-based medicine disease association relation prediction method and related equipment
CN115376704A (en) Medicine-disease interaction prediction method fusing multi-neighborhood correlation information
CN115985520A (en) Medicine disease incidence relation prediction method based on graph regularization matrix decomposition
Singh et al. Predicting potential applicants for any private college using LightGBM
Galeano et al. Machine learning prediction of side effects for drugs in clinical trials
Ye et al. Drug-target interaction prediction via graph auto-encoder and multi-subspace deep neural networks
Xu et al. Dilated convolution capsule network for apple leaf disease identification
Niyaz et al. Augmenting knowledge distillation with peer-to-peer mutual learning for model compression
CN115691817A (en) LncRNA-disease association prediction method based on fusion neural network
CN116206775A (en) Multi-dimensional characteristic fusion medicine-target interaction prediction method
Iraji et al. Druggable protein prediction using a multi-canal deep convolutional neural network based on autocovariance method
Kumar et al. Deep learning in gene expression modeling
Sathe et al. Gene expression and protein function: A survey of deep learning methods
Phan et al. Deep learning based biomedical NER framework
Bao et al. Characterizing tissue composition through combined analysis of single-cell morphologies and transcriptional states
Sun et al. An enhanced LRMC method for drug repositioning via gcn-based HIN embedding
Agrawal et al. Implementation of Protein Sequence Classification for Globin family using Ensemble Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination