CN113571125A - Drug target interaction prediction method based on multilayer network and graph coding - Google Patents

Publication number
CN113571125A
Authority
CN
China
Prior art keywords
network
target
drug
similarity
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110865457.9A
Other languages
Chinese (zh)
Inventor
刘闯
王逸伟
詹秀秀
张子柯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Normal University
Original Assignee
Hangzhou Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Normal University
Priority to CN202110865457.9A
Publication of CN113571125A

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/30 Drug targeting using structural data; Docking or binding prediction
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biotechnology (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Medicinal Chemistry (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Artificial Intelligence (AREA)
  • Bioethics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a drug-target interaction prediction method based on a multilayer network and graph coding. The method comprises a data acquisition module, a data preprocessing module, a feature learning module, a model algorithm design module and a result evaluation module. The data preprocessing module constructs the drug and protein networks and processes the heterogeneous graphs. The feature learning module comprises self-supervised learning of structural graph encoders, vector encoding of the graphs, and processing of same-class vectors, so that the topology information of the graphs is represented in vector form. The model algorithm design module comprises constructing a cross-validation set and designing a prediction model. The result evaluation module verifies the prediction performance of the model using an ROC curve based on the confusion matrix and a PR curve based on the precision-recall sequence. The method studies drugs and targets from the perspectives of data mining and graphs, and predicts the interaction between a drug and a target from the generated graph structure information and a subsequent tree model.

Description

Drug target interaction prediction method based on multilayer network and graph coding
Technical Field
The invention belongs to the technical field of data mining, and particularly relates to a drug-target interaction prediction method based on a multilayer network and graph coding.
Background
With the rapid development of machine learning, the progress of biological detection technologies such as third-generation gene sequencing, and the arrival of the big-data era in this field brought about by the rapid growth of biological data, more and more researchers and companies are turning to AI-assisted drug development. The most intuitive advantage of using computer algorithms to assist target screening is that the computer screens candidate drugs and narrows the candidate range, which greatly shortens the new-drug discovery cycle and reduces the research consumables it requires. Practical application data indicate that AI technology can reduce drug development costs by about 35%. An analysis of the net income trends of leading international pharmaceutical companies in recent years shows that the net income of most of them increased to varying degrees after AI-assisted drug research and development was introduced. AI technology can also perform multi-target analysis of a drug to predict its multiple targets, thereby revealing the complex mechanisms of action of some diseases. In addition, AI technology can improve the accuracy and safety of drug prediction and explore the mechanisms of drug side effects. Overall, therefore, AI technology can greatly simplify the new-drug development process, save research and development costs, and help pharmaceutical companies develop new drugs quickly.
Disclosure of Invention
The invention aims to provide a drug-target interaction prediction method based on a multilayer network and graph coding, which can eliminate the randomness of clinical experiments, narrow the screening range and shorten the test cycle.
The invention constructs nine drug-related networks (the drug interaction network, the drug-disease related network, the drug-side-effect related network, the chemical similarity network of drugs, the therapeutic similarity network of drugs, the action-target sequence similarity network of drugs, the biological process similarity network of drugs, the molecular function similarity network of drugs, and the action cell component similarity network of drugs), six target-related networks (the target interaction network, the target-disease related network, the target sequence similarity network, the target biological process similarity network, the similarity network of the cell components where targets are located, and the target molecular function similarity network), and the drug-target interaction network, which is used as the label. A corresponding structural autoencoder is trained independently on each of these networks, the trained autoencoders encode the nodes into vectors, and finally the encoded vectors of a node in the different networks are concatenated to form its final feature vector. The drug-target pairs to be predicted are fed into a trained boosting tree model (a model obtained by linearly adding a series of decision trees built on the training set) to obtain the final evaluation score.
The method comprises a data acquisition module, a data preprocessing module, a feature learning module, a model algorithm design module and a result evaluation module.
(1) The data acquisition module comprises:
(1-1) for drugs, collecting drug-drug interaction data, drug-disease relation data, drug-side-effect relation data, and six different types of drug-pair similarity data, namely: chemical fingerprint data of the drugs, therapeutic data of the drugs, peptide chain data of the drugs' action targets, biological process data of the drugs, molecular function data of the drugs, and action cell component data of the drugs;
(1-2) for targets, i.e. proteins, collecting target-target interaction data, target-disease relation data, and four different types of target-pair similarity data, namely: peptide chain data of the targets, biological process data of the targets, cell component data of the targets, and molecular function data of the targets;
(1-3) collecting drug-target interaction data.
(2) The data preprocessing module comprises constructing the drug- and target-related networks and generating the multilayer networks;
(2-1) constructing the drug- and target-related networks comprises:
A. For interaction data among objects of a single class, constructing homogeneous interaction networks, including the drug interaction network $G_{1D}$ and the target interaction network $G_{1T}$;
B. For interaction data between objects of different classes, constructing heterogeneous interaction networks, including the drug-disease related network $G_{D\_DI}$, the drug-side-effect related network $G_{D\_SE}$, and the target-disease related network $G_{T\_DI}$;
C. Collecting drug information of different dimensions and constructing drug similarity networks, including the chemical similarity network of drugs $G_{2D}$, the therapeutic similarity network of drugs $G_{3D}$, the action-target sequence similarity network of drugs $G_{4D}$, the biological process similarity network of drugs $G_{5D}$, the molecular function similarity network of drugs $G_{6D}$, and the action cell component similarity network of drugs $G_{7D}$;
D. Collecting target information of different dimensions and constructing target similarity networks, including the target sequence similarity network $G_{2T}$, the target biological process similarity network $G_{3T}$, the similarity network of the cell components where targets are located $G_{4T}$, and the target molecular function similarity network $G_{5T}$;
E. Constructing the drug-target interaction network $G_{D\_T}$.
(2-2) generating the multilayer networks comprises generating a drug multilayer network and generating a target multilayer network, with the following specific steps:
(2-2-1) First, the drug-disease related network $G_{D\_DI}$ is decomposed and converted into the drug disease similarity network $G_{8D}=(V_{8D},E_{8D})$, where $V_{8D}$ and $E_{8D}$ denote, respectively, the set of drug nodes in the network and the set of edge weights of disease similarity between two drugs; the edge weight of drug disease similarity is [formula image BDA0003187370820000031], where $x_{D\_M}$ and $y_{D\_M}$ denote the row vectors of the two drugs in the adjacency matrix of $G_{D\_DI}$, and $|\cdot|$ denotes the vector modulus;
the drug-side-effect related network $G_{D\_SE}$ is decomposed and converted into the drug side-effect similarity network $G_{9D}=(V_{9D},E_{9D})$, where $V_{9D}$ and $E_{9D}$ denote, respectively, the set of drug nodes in the network and the set of edge weights of side-effect similarity between two drugs; the edge weight of drug side-effect similarity is [formula image BDA0003187370820000032], where $x_{D\_SE}$ and $y_{D\_SE}$ denote the row vectors of the two drugs in the adjacency matrix of $G_{D\_SE}$;
the target-disease related network $G_{T\_DI}$ is decomposed and converted into the target disease similarity network $G_{6T}=(V_{6T},E_{6T})$, where $V_{6T}$ and $E_{6T}$ denote, respectively, the set of target nodes in the network and the set of edge weights of disease similarity between two targets; the edge weight of target disease similarity is [formula image BDA0003187370820000033], where $x_{T\_DI}$ and $y_{T\_DI}$ denote the row vectors of the two targets in the adjacency matrix of $G_{T\_DI}$;
(2-2-2) Then the drug-related networks are combined into the drug multilayer network $G_D=\{G_{iD}=(V_{iD},E_{iD})\}$, where $i$ is the drug network index, $i\in[1,9]$; the target-related networks are combined into the target multilayer network $G_T=\{G_{jT}=(V_{jT},E_{jT})\}$, where $j$ is the target network index, $j\in[1,6]$.
(3) The feature learning module comprises training the structural autoencoders, encoding output, and processing of same-class feature vectors;
(3-1) training the structural autoencoders: a structural autoencoder is trained for each layer of the drug multilayer network $G_D$ and of the target multilayer network $G_T$;
(3-2) encoding output: the encoder side of each trained structural autoencoder encodes its corresponding network layer, giving the multilayer vectors of all drugs and targets;
(3-3) processing same-class feature vectors: the multilayer vectors of a drug are concatenated to obtain the final feature vector representation of the drug; the multilayer vectors of a target are concatenated to obtain the final feature vector representation of the target.
(4) The model algorithm design module comprises constructing training samples, training and evaluating the model, and predicting drug-target interactions;
(4-1) constructing training samples: training samples are constructed with a pairwise model, the data are randomly divided into M parts, and M-fold cross-validation is performed, i.e. each time one part is selected as the validation set and the rest as the training set; the model parameters are adjusted according to the overall cross-validation performance, where M is a positive integer greater than 3;
(4-2) training and evaluating the model: a lightweight gradient boosting decision tree is adopted, and a boosting tree is built with decision trees as weak learners, i.e. decision trees $T(x,\theta_l)$ are built iteratively, where $x$ and $\theta_l$ are, respectively, the input feature vector and the learnable parameters of the $l$-th decision tree;
(4-3) predicting drug-target interactions: using the optimal prediction model obtained by the result evaluation module, the interaction probabilities of all drug-target pairs are calculated, and the drug-target pairs with high probability are screened out as candidate interacting drug-target pairs, which form the prediction result.
(5) The result evaluation module verifies the prediction performance of the model using the ROC curve and the PR curve, as follows:
(5-1) plotting the ROC curve: the false positive rate FPR is taken as the horizontal axis and the true positive rate TPR as the vertical axis; the larger the area under the ROC curve (AUROC), the better the prediction performance of the model;
the true positive rate $TPR_\alpha$ and false positive rate $FPR_\alpha$ of the ROC curve are calculated from the confusion matrix as follows:
$$TPR_\alpha=\frac{TP_\alpha}{TP_\alpha+FN_\alpha},\qquad FPR_\alpha=\frac{FP_\alpha}{FP_\alpha+TN_\alpha}$$
a drug-target pair is a positive sample if an interaction exists and a negative sample if it does not; $TP_\alpha$ denotes the number of positive samples in the test set predicted as positive, $FP_\alpha$ the number of negative samples in the test set predicted as positive, $FN_\alpha$ the number of positive samples predicted as negative, and $TN_\alpha$ the number of negative samples in the test set predicted as negative; $\alpha$ denotes the prediction confidence;
(5-2) plotting the PR curve: the precision $precision_\alpha$ and recall $recall_\alpha$ at different prediction confidences $\alpha$ form the precision-recall sequence:
$$precision_\alpha=\frac{TP_\alpha}{TP_\alpha+FP_\alpha},\qquad recall_\alpha=\frac{TP_\alpha}{TP_\alpha+FN_\alpha}$$
the precision-recall curve, i.e. the PR curve, is plotted with recall on the horizontal axis and precision on the vertical axis; the area under the PR curve (AUPR) reflects the overall classification performance of the classifier, and the larger the AUPR, the better the prediction performance of the model;
(5-3) evaluating the model: from the prediction results of step (4-3), AUROC and AUPR are calculated using the plotted ROC and PR curves, and the model parameters that give the best prediction result are sought.
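The evaluation in step (5) can be illustrated with a short Python sketch. It computes the confusion-matrix counts at each confidence threshold α, derives the TPR/FPR and precision/recall sequences, and integrates both curves with the trapezoidal rule. The function names, the toy scores and labels, the choice of thresholds (every distinct score), the trapezoidal integration, and the convention of setting precision to 1 when nothing is predicted positive are all assumptions made for illustration; the patent text does not specify them.

```python
import numpy as np

def confusion_at(scores, labels, alpha):
    """Counts (TP, FP, FN, TN) when pairs with score >= alpha are predicted positive."""
    pred = scores >= alpha
    pos = labels == 1
    tp = int(np.sum(pred & pos))
    fp = int(np.sum(pred & ~pos))
    fn = int(np.sum(~pred & pos))
    tn = int(np.sum(~pred & ~pos))
    return tp, fp, fn, tn

def roc_pr_points(scores, labels):
    """Return FPR, TPR, recall and precision sequences over all thresholds.

    Assumes the label vector contains at least one positive and one negative.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Thresholds from strictest to loosest; +inf yields the (0, 0) ROC start point.
    thresholds = np.concatenate(([np.inf], np.unique(scores)[::-1]))
    fpr, tpr, rec, prec = [], [], [], []
    for alpha in thresholds:
        tp, fp, fn, tn = confusion_at(scores, labels, alpha)
        tpr.append(tp / (tp + fn))
        fpr.append(fp / (fp + tn))
        rec.append(tp / (tp + fn))
        # Precision is undefined with no positive predictions; 1.0 is a convention.
        prec.append(tp / (tp + fp) if tp + fp > 0 else 1.0)
    return np.array(fpr), np.array(tpr), np.array(rec), np.array(prec)

def trapezoid_area(x, y):
    """Area under a piecewise-linear curve given by points (x, y)."""
    return float(np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0))

# Hypothetical prediction scores for six drug-target pairs (1 = interacting).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
fpr, tpr, rec, prec = roc_pr_points(scores, labels)
auroc = trapezoid_area(fpr, tpr)
aupr = trapezoid_area(rec, prec)
print(f"AUROC={auroc:.4f}  AUPR={aupr:.4f}")
```

Other AUPR conventions (for example step-wise average precision) give slightly different values, so parameter comparisons should use one convention throughout.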
The method studies the interaction of drug-target pairs from the perspectives of data mining and multilayer networks. By constructing networks, it abstracts different types of data into the same data structure, and it achieves drug-target prediction by combining the decomposition of heterogeneous networks, automatic learning of network topology by structural autoencoders, and tree-based classifiers. The method can therefore analyze drug-target data effectively and predict the interactions between drugs and targets, thereby providing scientific guidance for the research and development of new drugs, improving its efficiency, and to some extent promoting independent innovation in medicine.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention.
Detailed Description
The following detailed description of the embodiments of the invention is provided in connection with the accompanying drawings.
The existing data cover 732 drugs and 1915 targets (proteins), together with the corresponding 12904 side effects and 440 diseases. They comprise interaction data between drugs, between drugs and diseases, between drugs and side effects, between targets, and between targets and diseases; MACCS fingerprint data of the drugs' chemical formulas; GO annotations of drugs and targets; protein sequence data of the targets; and half-inhibitory concentration data between drugs and targets.
As shown in FIG. 1, a drug-target interaction prediction method based on a multilayer network and graph coding comprises a data acquisition module, a data preprocessing module, a feature learning module, a model algorithm design module, and a result evaluation module, specifically as follows:
(1) a data acquisition module comprising:
(1-1) for drugs, collecting drug-drug interaction data, drug-disease relation data, drug-side-effect relation data, and six different types of drug-pair similarity data, namely: chemical fingerprint data of the drugs, therapeutic data of the drugs, peptide chain data of the drugs' action targets, biological process data of the drugs, molecular function data of the drugs, and action cell component data of the drugs;
(1-2) for targets, i.e. proteins, collecting target-target interaction data, target-disease relation data, and four different types of target-pair similarity data, namely: peptide chain data of the targets, biological process data of the targets, cell component data of the targets, and molecular function data of the targets;
(1-3) collecting drug-target interaction data;
the above data are downloaded from public websites.
(2) The data preprocessing module comprises constructing the drug- and target-related networks and generating the multilayer networks, which provides the data basis for drug-target prediction, specifically as follows:
(2-1) constructing the drug- and target-related networks, comprising:
(I) for the drug-drug interaction data, constructing the drug interaction network $G_{1D}=(V_{1D},E_{1D})$, where $V_{1D}$ denotes the set of drug nodes in the network and $E_{1D}$ denotes the set of edges between pairs of drugs that interact;
for the target-target interaction data, constructing the target interaction network $G_{1T}=(V_{1T},E_{1T})$, where $V_{1T}$ denotes the set of target nodes in the network and $E_{1T}$ denotes the set of edges between pairs of targets that interact;
(II) for the drug-disease relation data, constructing the drug-disease related network [formula image BDA0003187370820000061], where [formula image BDA0003187370820000062] and $E_{D\_DI}$ denote, respectively, the set of drug nodes, the set of disease nodes, and the set of edges relating drugs to diseases in the network;
for the drug-side-effect relation data, constructing the drug-side-effect related network [formula image BDA0003187370820000063], where [formula image BDA0003187370820000064] and $E_{D\_SE}$ denote, respectively, the set of drug nodes, the set of side-effect nodes, and the set of edges relating drugs to side effects in the network;
for the target-disease relation data, constructing the target-disease related network [formula image BDA0003187370820000065], where [formula image BDA0003187370820000066] and $E_{T\_DI}$ denote, respectively, the set of target nodes, the set of disease nodes, and the set of edges relating targets to diseases in the network;
(III) for the chemical fingerprint data of the drugs, constructing the chemical similarity network of drugs $G_{2D}=(V_{2D},E_{2D})$, where $V_{2D}$ and $E_{2D}$ denote, respectively, the set of drug nodes in the network and the set of edge weights of chemical similarity between two drugs; the edge weight of chemical similarity is [formula image BDA0003187370820000071], where $a_1$ and $b_1$ are the numbers of bits of the MACCS fingerprints of the two drugs, respectively, and $c_1$ is the number of bits the two drugs have in common;
for the therapeutic data of the drugs, constructing the therapeutic similarity network of drugs $G_{3D}=(V_{3D},E_{3D})$, where $V_{3D}$ and $E_{3D}$ denote, respectively, the set of drug nodes in the network and the set of edge weights of therapeutic similarity between two drugs; the edge weight of therapeutic similarity is [formula image BDA0003187370820000072], where $a_2$ and $b_2$ refer to the respective ATC codes of the two drugs, and $c_2$ is the number of ATC code digits the two drugs share;
for the peptide chain data of the drugs' action targets, constructing the action-target sequence similarity network of drugs $G_{4D}=(V_{4D},E_{4D})$, where $V_{4D}$ and $E_{4D}$ denote, respectively, the set of drug nodes in the network and the set of edge weights of action-target similarity between two drugs; the edge weight of drug action-target similarity is [formula image BDA0003187370820000073], where $a$ and $b$ denote the respective targets of the two drugs, $T_{T\_T}(a,b)$ denotes the sequence similarity of the respective targets of the two drugs, and $\mathrm{mean}(\cdot)$ denotes the mean;
for the biological process data of the drugs, constructing the biological process similarity network of drugs $G_{5D}=(V_{5D},E_{5D})$, where $V_{5D}$ and $E_{5D}$ denote, respectively, the set of drug nodes in the network and the set of edge weights of biological process similarity between two drugs; the edge weight of drug biological process similarity is [formula image BDA0003187370820000074], where $T_{T\_P}(a,b)$ denotes the biological process similarity of the respective targets of the two drugs;
for the molecular function data of the drugs, constructing the molecular function similarity network of drugs $G_{6D}=(V_{6D},E_{6D})$, where $V_{6D}$ and $E_{6D}$ denote, respectively, the set of drug nodes in the network and the set of edge weights of molecular function similarity between two drugs; the edge weight of drug molecular function similarity is [formula image BDA0003187370820000075], where $T_{T\_M}(a,b)$ denotes the molecular function similarity of the respective targets of the two drugs;
for the action cell component data of the drugs, constructing the action cell component similarity network of drugs $G_{7D}=(V_{7D},E_{7D})$, where $V_{7D}$ and $E_{7D}$ denote, respectively, the set of drug nodes in the network and the set of edge weights of action cell component similarity between two drugs; the edge weight of drug action cell component similarity is [formula image BDA0003187370820000081], where $T_{T\_C}(a,b)$ denotes the action cell component similarity of the respective targets of the two drugs;
(IV) for the peptide chain data of the targets, constructing the target sequence similarity network $G_{2T}=(V_{2T},E_{2T})$, where $V_{2T}$ and $E_{2T}$ denote, respectively, the set of target nodes in the network and the set of edge weights of sequence similarity between two targets; the edge weight of sequence similarity is [formula image BDA0003187370820000082], where $a_3$ and $b_3$ are the numbers of positions in the peptide chain sequences of the two targets, respectively, and $c_3$ is the number of peptide chain sequence positions the two targets have in common;
for the biological process data of the targets, constructing the target biological process similarity network $G_{3T}=(V_{3T},E_{3T})$, where $V_{3T}$ and $E_{3T}$ denote, respectively, the set of target nodes in the network and the set of edge weights of biological process similarity between two targets; the edge weight $T_{T\_P}(a,b)$ of target biological process similarity is derived from the GO semantic annotations of the biological processes of the two targets;
for the cell component data of the targets, constructing the similarity network of the cell components where targets are located $G_{4T}=(V_{4T},E_{4T})$, where $V_{4T}$ and $E_{4T}$ denote, respectively, the set of target nodes in the network and the set of edge weights of cell component similarity between two targets; the edge weight $T_{T\_C}(a,b)$ of target cell component similarity is derived from the GO semantic annotations of the cell components of the two targets;
for the molecular function data of the targets, constructing the target molecular function similarity network $G_{5T}=(V_{5T},E_{5T})$, where $V_{5T}$ and $E_{5T}$ denote, respectively, the set of target nodes in the network and the set of edge weights of molecular function similarity between two targets; the edge weight $T_{T\_M}(a,b)$ of target molecular function similarity is derived from the GO semantic annotations of the molecular functions of the two targets;
(V) for the drug-target interaction data, constructing the drug-target interaction network [formula image BDA0003187370820000083], where [formula image BDA0003187370820000084] and $E_{D\_T}$ denote, respectively, the set of drug nodes, the set of target nodes, and the set of edges relating drugs to targets in the network.
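As an illustration of how a similarity layer such as the chemical similarity network in (III) might be built from fingerprint data, the following Python sketch computes pairwise edge weights between binary fingerprints. The edge-weight formula itself is only available as an image in the source, so the Tanimoto form c/(a + b − c), with a and b read as the numbers of set bits, is an assumption that merely matches the shape of the definitions of a1, b1 and c1; the function names and the 8-bit toy fingerprints (real MACCS keys are much longer) are hypothetical.

```python
import numpy as np

def tanimoto(fp_x, fp_y):
    """Tanimoto coefficient of two binary fingerprints (ASSUMED edge-weight form)."""
    fp_x = np.asarray(fp_x, dtype=bool)
    fp_y = np.asarray(fp_y, dtype=bool)
    a = int(fp_x.sum())            # set bits in the first fingerprint
    b = int(fp_y.sum())            # set bits in the second fingerprint
    c = int((fp_x & fp_y).sum())   # set bits shared by both
    denom = a + b - c
    return c / denom if denom > 0 else 0.0

def similarity_network(fingerprints):
    """Symmetric weight matrix of a similarity layer; the diagonal is left at 0."""
    fps = np.asarray(fingerprints, dtype=bool)
    n = fps.shape[0]
    weights = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            weights[i, j] = weights[j, i] = tanimoto(fps[i], fps[j])
    return weights

# Three hypothetical drugs with toy 8-bit fingerprints.
toy_fingerprints = [
    [1, 1, 0, 0, 1, 0, 1, 0],
    [1, 0, 0, 0, 1, 0, 1, 1],
    [0, 0, 1, 1, 0, 1, 0, 0],
]
W = similarity_network(toy_fingerprints)
print(W)
```

The resulting matrix takes values in [0, 1], which is consistent with the statement later in the description that similarity-network edge weights lie between 0 and 1.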
(2-2) generating the multilayer networks, including generating a drug multilayer network and generating a target multilayer network:
(2-2-1) the drug-disease related network $G_{D\_DI}$ is decomposed and converted into the drug disease similarity network $G_{8D}=(V_{8D},E_{8D})$, where $V_{8D}$ and $E_{8D}$ denote, respectively, the set of drug nodes in the network and the set of edge weights of disease similarity between two drugs; the edge weight of drug disease similarity is [formula image BDA0003187370820000085], where $x_{D\_M}$ and $y_{D\_M}$ denote the row vectors of the two drugs in the adjacency matrix of $G_{D\_DI}$, and $|\cdot|$ denotes the vector modulus;
the drug-side-effect related network $G_{D\_SE}$ is decomposed and converted into the drug side-effect similarity network $G_{9D}=(V_{9D},E_{9D})$, where $V_{9D}$ and $E_{9D}$ denote, respectively, the set of drug nodes in the network and the set of edge weights of side-effect similarity between two drugs; the edge weight of drug side-effect similarity is [formula image BDA0003187370820000091], where $x_{D\_SE}$ and $y_{D\_SE}$ denote the row vectors of the two drugs in the adjacency matrix of $G_{D\_SE}$;
the target-disease related network $G_{T\_DI}$ is decomposed and converted into the target disease similarity network $G_{6T}=(V_{6T},E_{6T})$, where $V_{6T}$ and $E_{6T}$ denote, respectively, the set of target nodes in the network and the set of edge weights of disease similarity between two targets; the edge weight of target disease similarity is [formula image BDA0003187370820000092], where $x_{T\_DI}$ and $y_{T\_DI}$ denote the row vectors of the two targets in the adjacency matrix of $G_{T\_DI}$;
(2-2-2) the drug interaction network, the drug disease similarity network, the drug side-effect similarity network, the chemical similarity network of drugs, the therapeutic similarity network of drugs, the action-target sequence similarity network of drugs, the biological process similarity network of drugs, the molecular function similarity network of drugs, and the action cell component similarity network of drugs are combined into the drug multilayer network $G_D=\{G_{iD}=(V_{iD},E_{iD})\}$, where $i$ is the drug network index, $i\in[1,9]$;
the target interaction network, the target disease similarity network, the target sequence similarity network, the target biological process similarity network, the similarity network of the cell components where targets are located, and the target molecular function similarity network are combined into the target multilayer network $G_T=\{G_{jT}=(V_{jT},E_{jT})\}$, where $j$ is the target network index, $j\in[1,6]$.
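The decomposition in step (2-2-1) can be sketched in Python as follows. The edge-weight formulas appear only as images in the source; the cosine form x·y / (|x||y|) used here is an assumption inferred from the references to adjacency-matrix row vectors and the vector modulus. The function name `cosine_similarity_layer`, the toy drug-disease matrix and the dictionary used to hold the layers are hypothetical.

```python
import numpy as np

def cosine_similarity_layer(biadjacency):
    """Convert a two-mode network (rows: drugs or targets, columns: diseases or
    side effects) into a one-mode similarity layer over the row objects.

    ASSUMPTION: edge weight = cosine similarity of the two row vectors.
    """
    B = np.asarray(biadjacency, dtype=float)
    norms = np.linalg.norm(B, axis=1)
    # Rows with no associations get similarity 0 instead of a division by zero.
    safe_norms = np.where(norms > 0, norms, 1.0)
    unit_rows = B / safe_norms[:, None]
    weights = unit_rows @ unit_rows.T
    np.fill_diagonal(weights, 0.0)  # no self-loops in the similarity layer
    return weights

# Hypothetical drug-disease relation matrix: 4 drugs x 4 diseases.
drug_disease = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
])
G8D = cosine_similarity_layer(drug_disease)

# A multilayer network can then be held as a mapping from layer name to weight
# matrix, with all layers sharing the same node order.
drug_multilayer = {"G8D": G8D}
print(G8D)
```

Keeping one shared node order across all layers matters later, because the per-layer vectors of a node are concatenated row by row.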
(3) A feature learning module:
In machine learning problems, the data and features determine the upper limit of the prediction result, while models and algorithms only approximate that limit. The feature learning module of the invention addresses the first half of this statement, namely feature selection, so that the model algorithm learns better features and achieves the most accurate prediction result. Based on the drug multilayer network $G_D$ and the target multilayer network $G_T$, the module uses structural autoencoders to encode the network structure automatically, thereby ensuring the completeness of feature extraction.
(3-1) training the structural autoencoders: a structural autoencoder is trained for each layer of the drug multilayer network $G_D$ and of the target multilayer network $G_T$; the training process is as follows:
a. using the adjacency matrix of the single-layer network as the input of the encoder;
b. encoding to obtain the output of the encoder, which is used as the input of the decoder;
c. decoding to obtain the output of the decoder, and calculating the loss function from the adjacency matrix, the encoder output and the decoder output;
d. using the loss function to calculate the gradient of each parameter of the encoder and the decoder, and updating the parameters with a step that is a multiple of the negative gradient;
e. repeating steps b through d until the loss function converges.
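Steps a through e can be sketched as a plain gradient-descent loop in Python. This is a deliberately simplified illustration, not the architecture of the invention: it assumes a single tanh encoder layer and a linear decoder, and for brevity it minimises only the mean squared reconstruction error, leaving out the first-order term of the loss. The layer sizes, learning rate, tolerance, function name and toy network are all hypothetical.

```python
import numpy as np

def train_structural_autoencoder(adjacency, dim=2, lr=0.2,
                                 max_epochs=2000, tol=1e-9, seed=0):
    """Train a one-hidden-layer autoencoder on the rows of an adjacency matrix.

    SIMPLIFICATION: the loss here is reconstruction error only.
    Returns an `encode` function and the list of loss values per epoch.
    """
    A = np.asarray(adjacency, dtype=float)
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    W_enc = rng.normal(0.0, 0.1, size=(n, dim))
    b_enc = np.zeros(dim)
    W_dec = rng.normal(0.0, 0.1, size=(dim, n))
    b_dec = np.zeros(n)
    losses = []
    for _ in range(max_epochs):
        # Steps a and b: adjacency rows go through the encoder.
        Z = np.tanh(A @ W_enc + b_enc)
        # Step c: decode and evaluate the loss.
        A_hat = Z @ W_dec + b_dec
        diff = A_hat - A
        losses.append(float(np.mean(diff ** 2)))
        # Step e: stop once the loss no longer changes appreciably.
        if len(losses) > 1 and abs(losses[-2] - losses[-1]) < tol:
            break
        # Step d: back-propagate and step along the negative gradient.
        g_out = 2.0 * diff / diff.size
        g_W_dec = Z.T @ g_out
        g_b_dec = g_out.sum(axis=0)
        g_hidden = (g_out @ W_dec.T) * (1.0 - Z ** 2)
        g_W_enc = A.T @ g_hidden
        g_b_enc = g_hidden.sum(axis=0)
        W_enc -= lr * g_W_enc
        b_enc -= lr * g_b_enc
        W_dec -= lr * g_W_dec
        b_dec -= lr * g_b_dec

    def encode(rows):
        """Encoder side only: map adjacency rows to embedding vectors."""
        return np.tanh(np.asarray(rows, dtype=float) @ W_enc + b_enc)

    return encode, losses

# Hypothetical single-layer network: two disconnected triangles.
toy_layer = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
])
encode, losses = train_structural_autoencoder(toy_layer)
embeddings = encode(toy_layer)
print(losses[0], losses[-1], embeddings.shape)
```

In practice a deep-learning framework with automatic differentiation would replace the hand-written gradients; they are spelled out here only to make step d explicit.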
The loss function $L_m$ comprises two parts:
First-order similarity loss [formula image BDA0003187370820000101], where $N$ is the number of nodes, $z_p$ and $z_g$ denote the encoder output vectors of node $p$ and node $g$, respectively, and $T_{pg}$ denotes the weight of the edge connecting them; for an interaction network, $T_{pg}$ can only take the values 0 and 1, representing no edge and an edge, respectively; for a similarity network, $T_{pg}$ can take any value between 0 and 1, inclusive. This loss is defined so that drugs or targets with high similarity are encoded into feature vectors that are as similar as possible.
Second-order similarity loss [formula image BDA0003187370820000102], where $b_n$ and [formula image BDA0003187370820000103] denote, respectively, the encoder input vector and the decoder output vector of node $n$. This loss is defined so that the decoder can reconstruct the original input vector from the encoded vector as closely as possible, so that the encoded vector contains as much of the information in the original vector as possible.
The total loss function is $L_m=L_{2nd}+\lambda L_{1st}$, where $\lambda$ is the penalty coefficient, $0<\lambda<1$.
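The two loss terms can be illustrated numerically. Since the formulas are available only as images in the source, the forms below are assumptions chosen to match the stated purposes: the first-order term is taken as the sum over node pairs of T_pg·‖z_p − z_g‖², and the second-order term as the sum over nodes of the squared reconstruction error ‖b̂_n − b_n‖². Only the combination L_m = L_2nd + λ·L_1st is taken directly from the text. Function names and the toy inputs are hypothetical.

```python
import numpy as np

def first_order_loss(T, Z):
    """ASSUMED form: sum_{p,g} T[p, g] * ||z_p - z_g||^2."""
    T = np.asarray(T, dtype=float)
    Z = np.asarray(Z, dtype=float)
    sq_norms = np.sum(Z ** 2, axis=1)
    # Pairwise squared distances via ||u||^2 + ||v||^2 - 2 u.v, clipped at 0
    # to absorb tiny negative values from floating-point rounding.
    dist2 = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * Z @ Z.T, 0.0)
    return float(np.sum(T * dist2))

def second_order_loss(B, B_hat):
    """ASSUMED form: sum_n ||b_hat_n - b_n||^2 (plain reconstruction error)."""
    B = np.asarray(B, dtype=float)
    B_hat = np.asarray(B_hat, dtype=float)
    return float(np.sum((B_hat - B) ** 2))

def total_loss(T, Z, B_hat, lam=0.5):
    """L_m = L_2nd + lam * L_1st with 0 < lam < 1, as stated in the text.

    The encoder input vectors b_n are taken to be the rows of T.
    """
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must satisfy 0 < lam < 1")
    return second_order_loss(T, B_hat) + lam * first_order_loss(T, Z)

# Two connected nodes, embedded one unit apart, with perfect reconstruction.
T_toy = np.array([[0.0, 1.0], [1.0, 0.0]])
Z_toy = np.array([[0.0, 0.0], [1.0, 0.0]])
loss_value = total_loss(T_toy, Z_toy, B_hat=T_toy, lam=0.5)
print(loss_value)
```

With these assumed forms, a larger edge weight T_pg penalises distance between the two embeddings more strongly, which is the behaviour the description asks of the first-order term.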
(3-2) encoding output: each network layer is encoded with the encoder of its trained structural autoencoder, yielding multi-layer vectors for all drugs and targets.
(3-3) processing the same-class feature vectors:
concatenating the multi-layer vectors of a drug to obtain the final feature vector representation of the drug;
concatenating the multi-layer vectors of a target to obtain the final feature vector representation of the target.
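The two loss terms of step (3-1) and the concatenation of step (3-3) can be sketched as follows. The loss formulas themselves appear only as figures in this text, so the forms below are assumptions: the first-order term is taken as the edge-weighted squared distance between embeddings, and the second-order term as a plain squared reconstruction error. The function names and the toy data are hypothetical.

```python
import numpy as np

def first_order_loss(Z, T):
    """Assumed form: sum over node pairs (p, g) of T[p, g] * ||z_p - z_g||^2."""
    diff = Z[:, None, :] - Z[None, :, :]      # pairwise differences, shape (N, N, d)
    sq_dist = np.sum(diff ** 2, axis=-1)      # squared distances, shape (N, N)
    return float(np.sum(T * sq_dist))

def second_order_loss(B, B_hat):
    """Assumed form: sum over nodes n of ||b_hat_n - b_n||^2 (reconstruction error)."""
    return float(np.sum((B_hat - B) ** 2))

def total_loss(Z, T, B, B_hat, lam=0.5):
    """L_m = L_2nd + lambda * L_1st, with 0 < lambda < 1 as stated in the text."""
    return second_order_loss(B, B_hat) + lam * first_order_loss(Z, T)

def concatenate_layers(layer_vectors):
    """Step (3-3): join the per-layer embeddings of each node along the feature axis."""
    return np.concatenate(layer_vectors, axis=1)

# Toy example: 2 nodes joined by one edge, 2-dimensional embeddings.
T = np.array([[0.0, 1.0], [1.0, 0.0]])        # adjacency matrix of one network layer
Z = np.array([[0.0, 0.0], [1.0, 0.0]])        # encoder outputs for the two nodes
B_hat = np.zeros_like(T)                      # a deliberately poor reconstruction
loss = total_loss(Z, T, B=T, B_hat=B_hat, lam=0.5)
features = concatenate_layers([Z, np.ones((2, 3))])  # two layers -> 5 features per node
```

In this sketch the rows of the adjacency matrix serve as the encoder input vectors b_n, which matches step a of the training process; a real encoder and decoder would replace the hand-written Z and B_hat.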
(4) A model algorithm design module comprising:
(4-1) constructing training samples: drug target pairs are divided into verified pairs and unverified pairs, and the unverified pairs include pairs that do interact but have not yet been discovered. The invention aims to find these undiscovered interacting pairs among the unverified drug target pairs. It can therefore be assumed that the probability that an unverified drug target pair interacts is no greater than that of a verified interacting pair. Based on this assumption, a pairwise model is used to construct training samples: positive samples are drawn from the verified interacting drug target pairs, negative samples are drawn from the unverified drug target pairs, and the two are paired to give positive and negative training sample sets of equal size. The data are then randomly divided into M parts for M-fold cross-validation; that is, each time one part is used as the validation set and the rest as the training set, and the model parameters are tuned according to the overall cross-validation performance, where M is a positive integer greater than 3.
(4-2) training and evaluating the model: a boosting tree is built using a light gradient boosting decision tree with decision trees as weak learners; that is, decision trees T(x; θ_l) are built iteratively, where x and θ_l are the input feature vector and the learnable parameters of the l-th decision tree, respectively. The specific process is as follows:
(4-2-1) before each round of decision tree construction, screening the samples with the Gradient-based One-Side Sampling (GOSS) algorithm, namely retaining the small fraction of samples with large gradients and randomly selecting a portion of the samples with small gradients to calculate the total variance gain, which reduces the number of samples;
(4-2-2) before each round of decision tree construction, merging mutually exclusive features with the Exclusive Feature Bundling (EFB) algorithm, which reduces the feature dimension;
(4-2-3) based on the screened samples, constructing the fitting target of the l-th decision tree for a sample with input feature vector x and corresponding label y: when l = 1, the fitting target is the label of the sample, where positive samples are labeled 1 and negative samples 0; when l ≥ 2, the fitting target is
Figure BDA0003187370820000111
where the boosting tree obtained after iteration l-1 is
Figure BDA0003187370820000112
and L is the loss function; in the binary classification task, for a single sample (x, y) whose predicted value is
Figure BDA0003187370820000113
the loss function is defined as:
Figure BDA0003187370820000114
(4-2-4) based on the screened samples, constructing a binary decision tree that fits the target, where a leaf node is split as follows: a histogram is built for each screened feature over its value range; the variance gain of each split point is calculated from the histogram; the feature and split point with the largest variance gain are selected as the splitting feature and optimal split point of the current node; and the data of the leaf node are divided into two parts at the optimal split point. The recursion continues until the maximum depth of the tree is reached. The variance gain of feature f at split point d on dataset D is expressed as:
Figure BDA0003187370820000115
where x_i, x_{i,f} and g_i are the i-th sample vector, the f-th feature of the i-th sample vector, and its negative gradient, respectively, and
Figure BDA0003187370820000121
and
Figure BDA0003187370820000122
are the numbers of samples in dataset D whose feature f is smaller than and larger than the split point d, respectively.
(4-2-5) performing K rounds of iteration to generate K decision trees;
(4-2-6) summing the K decision trees to generate the final light gradient boosting decision tree
Figure BDA0003187370820000123
For the input feature vector x of a sample, the output H(x) ∈ [0, 1] can be interpreted as the probability that the input sample is a positive sample;
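The split search of step (4-2-4) can be illustrated with a small sketch. The variance gain formula appears only as a figure in this text, so the version below assumes the form commonly given for light gradient boosting: the squared sum of gradients on each side of the split divided by the sample count on that side, summed and divided by the total sample count. The function names, the use of "less than or equal" for the left side, and the equal-width histogram bins are all assumptions made for illustration.

```python
import numpy as np

def variance_gain(feature_values, gradients, d):
    """Assumed variance gain of one feature at split point d."""
    x = np.asarray(feature_values, dtype=float)
    g = np.asarray(gradients, dtype=float)
    left = x <= d                              # samples on the left of the split
    right = ~left                              # samples on the right of the split
    n_left, n_right = int(left.sum()), int(right.sum())
    if n_left == 0 or n_right == 0:
        return 0.0                             # a split that separates nothing has no gain
    gain = g[left].sum() ** 2 / n_left + g[right].sum() ** 2 / n_right
    return float(gain / len(x))

def best_split(feature_values, gradients, n_bins=4):
    """Try the interior histogram bin edges as candidate split points."""
    x = np.asarray(feature_values, dtype=float)
    candidates = np.histogram_bin_edges(x, bins=n_bins)[1:-1]
    gains = [variance_gain(x, gradients, d) for d in candidates]
    best = int(np.argmax(gains))
    return float(candidates[best]), float(gains[best])

# Toy example: the gradients change sign between feature values 2 and 3.
values = [1.0, 2.0, 3.0, 4.0]
grads = [-1.0, -1.0, 1.0, 1.0]
split_point, gain = best_split(values, grads, n_bins=4)
```

A full tree builder would repeat this search over every screened feature and recurse on the two resulting parts until the maximum depth is reached.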
(4-3) predicting drug target interactions: using the optimal prediction model obtained by the result evaluation module, the interaction probability of every drug target pair is calculated, and the pairs with high probability are selected as candidate interacting drug target pairs, which form the prediction result.
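The sample construction and M-fold split of step (4-1) can be sketched as below. This is a minimal illustration under stated assumptions: negatives are drawn uniformly at random from the unverified pairs, the unverified set is at least as large as the verified set, and folds are formed by striding over the shuffled list. The function names and the drug and target identifiers are hypothetical.

```python
import random

def build_pairwise_samples(verified_pairs, unverified_pairs, seed=0):
    """Pair each verified (positive) sample with one randomly drawn unverified (negative) sample."""
    rng = random.Random(seed)
    positives = list(verified_pairs)
    negatives = rng.sample(list(unverified_pairs), k=len(positives))
    samples = [(pair, 1) for pair in positives] + [(pair, 0) for pair in negatives]
    rng.shuffle(samples)                       # random order before splitting into folds
    return samples

def m_fold_splits(samples, m=5):
    """Yield (training, validation) lists; each fold is the validation set exactly once."""
    folds = [samples[i::m] for i in range(m)]
    for i in range(m):
        training = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield training, folds[i]

# Toy example with hypothetical identifiers.
verified = [("d1", "t1"), ("d2", "t2"), ("d3", "t3"), ("d4", "t4")]
unverified = [("d%d" % i, "t%d" % j) for i in range(1, 5) for j in range(5, 8)]
samples = build_pairwise_samples(verified, unverified)
splits = list(m_fold_splits(samples, m=4))     # the text requires M greater than 3
```

Each training list would then be converted to feature vectors, for example by joining the drug and target vectors of every pair, before fitting the boosting model.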
(5) The result evaluation module, which verifies the predictive performance of the model using the ROC curve and the PR curve, comprising:
(5-1) plotting the ROC curve: plotting the ROC curve requires a confusion matrix, which is itself an index for evaluating model results and part of model evaluation. The confusion matrix is a square matrix that summarizes the accuracy of the predictions: each column represents a predicted class, and the column total is the number of samples predicted as that class; each row represents the true class, and the row total is the number of samples that belong to that class.
The ROC curve is a method for evaluating classification model performance that was introduced from the field of medical analysis and is suited to binary classification problems. When it is plotted, the false positive rate (FPR) is the horizontal axis and the true positive rate (TPR) is the vertical axis. The larger the area under the ROC curve (AUROC), that is, the closer it is to 1, the better the predictive performance of the model.
The true positive rate TPR_α and the false positive rate FPR_α of the ROC curve are calculated from the confusion matrix as follows:
Figure BDA0003187370820000124
In drug target prediction, a drug target pair with an interaction is a positive sample and one without is a negative sample. TP_α is the number of positive samples in the test set predicted as positive, FP_α is the number of negative samples in the test set predicted as positive, FN_α is the number of positive samples predicted as negative, and TN_α is the number of negative samples in the test set predicted as negative; α is the prediction confidence;
(5-2) plotting the PR curve: plotting the PR curve requires a sequence of precision-recall pairs, formed by the precision precision_α and the recall recall_α at different prediction confidences α, calculated as follows:
Figure BDA0003187370820000131
Precision is the proportion of samples classified as positive at confidence α that are truly positive, and recall is the proportion of all positive samples that are correctly classified at confidence α; as α changes, the two tend to move in opposite directions. The precision-recall pairs generated at different α are therefore used to plot the precision-recall (PR) curve, with recall on the horizontal axis and precision on the vertical axis. The area under the PR curve (AUPR) reflects the overall classification performance of the classifier: the larger the AUPR, that is, the closer it is to 1, the better the predictive performance of the model;
(5-3) evaluating the model: based on the prediction results of step (4-3), AUROC and AUPR are calculated from the plotted ROC and PR curves, and the model parameters giving the best prediction result are selected.
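The quantities used in steps (5-1) and (5-2) can be computed as in the sketch below. It relies on the standard definitions TPR = TP / (TP + FN), FPR = FP / (FP + TN), precision = TP / (TP + FP) and recall = TP / (TP + FN). Treating a score greater than or equal to α as a positive prediction, using the observed scores as thresholds, and integrating with the trapezoidal rule are assumptions of this illustration, and the function names are hypothetical.

```python
import numpy as np

def metrics_at(y_true, scores, alpha):
    """Confusion-matrix rates when scores >= alpha are predicted positive."""
    y = np.asarray(y_true).astype(bool)
    pred = np.asarray(scores, dtype=float) >= alpha
    tp = int(np.sum(pred & y))
    fp = int(np.sum(pred & ~y))
    fn = int(np.sum(~pred & y))
    tn = int(np.sum(~pred & ~y))
    tpr = tp / (tp + fn) if tp + fn else 0.0          # also the recall
    fpr = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 1.0    # convention when nothing is predicted positive
    return {"tpr": tpr, "fpr": fpr, "precision": precision, "recall": tpr}

def auroc(y_true, scores):
    """Area under the ROC curve by the trapezoidal rule over all observed thresholds."""
    thresholds = sorted(set(float(s) for s in scores), reverse=True)
    points = [(0.0, 0.0)]
    for alpha in thresholds:
        m = metrics_at(y_true, scores, alpha)
        points.append((m["fpr"], m["tpr"]))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Toy example: one positive sample is ranked below one negative sample.
labels = [1, 0, 1, 0]
probs = [0.9, 0.8, 0.3, 0.1]
at_half = metrics_at(labels, probs, alpha=0.5)
area = auroc(labels, probs)
```

The AUPR can be obtained the same way by collecting (recall, precision) points instead of (FPR, TPR) points.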
Screening candidate drugs is a main way in which AI assists new drug development. Its two most critical steps are the computational modeling of drugs and targets (that is, which data structure is used to represent them) and the selection of the prediction model. The method uses two different computational representations, network nodes and feature vectors, for drugs and targets at different stages. The two data models are described below, using drugs as the example.
Drug networks reflect the relationships between drugs well, and a multilayer network composed of different types of drug networks reflects those relationships from different angles, which offers a new approach to drug screening. Specifically, a drug network represents each drug as a node and defines the relationships between drugs as edges between nodes. Edges are defined differently in different types of drug networks, so each network expresses the relationship between drug pairs from a different perspective. In the chemical similarity network of drugs, for example, the edge weight between a node pair is the chemical structure similarity of the corresponding drug pair, and the absence of an edge means the similarity is 0. When a drug network is constructed, the edge weights are usually normalized to the range 0 to 1.
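As a small illustration of the network data model, the sketch below stores one drug similarity layer as a weighted adjacency matrix and scales its edge weights into the range 0 to 1. The text does not say which normalization is used; dividing by the largest weight is an assumption chosen here because it keeps a weight of 0 (no edge) at 0. The function name and the raw scores are hypothetical.

```python
import numpy as np

def normalize_edge_weights(similarity):
    """Return a weighted adjacency matrix with no self-loops and weights in [0, 1]."""
    A = np.asarray(similarity, dtype=float).copy()
    np.fill_diagonal(A, 0.0)                  # a drug is not linked to itself
    max_weight = A.max()
    if max_weight > 0:
        A /= max_weight                       # assumed scaling: divide by the largest weight
    return A

# Three drugs with hypothetical raw similarity scores; 0 means no edge.
raw = [[5.0, 2.0, 0.0],
       [2.0, 5.0, 4.0],
       [0.0, 4.0, 5.0]]
adjacency = normalize_edge_weights(raw)
```

Each row of such a matrix is the input vector of one drug node when the layer is passed to an encoder.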
A feature vector is an array of real numbers, each of which is a feature value that carries specific information in the application. In the method, drug feature vectors are obtained by encoding the drug networks with a structural autoencoder, so the feature values contain the topological information of the network. The autoencoder is a self-supervised representation learning method: it converts nodes into feature vectors from the input alone (here, a drug network), and the dimensionality of the feature vectors is far smaller than the number of nodes. Compared with traditional one-hot encoding, this greatly reduces the complexity and sparsity of the data. The structural autoencoder used in the method considers both the first-order and second-order proximity of the network, and thus captures the overall structure of the network more completely.
Network representation, vector encoding, and prediction model training for drugs and targets are the core of a drug target prediction algorithm. The algorithmic model avoids the blindness of manual screening and greatly reduces time and capital costs. By integrating information on different aspects of drugs and targets into a uniform data form, and by using several relatively independent and clearly defined modules, it provides a feasible paradigm for future drug target prediction, improving prediction accuracy while keeping the algorithm efficient, flexible, and extensible.

Claims (8)

1. A drug target interaction prediction method based on a multilayer network and graph coding, comprising a data acquisition module, a data preprocessing module, a feature learning module, a model algorithm design module and a result evaluation module, characterized in that:
(1) the data acquisition module comprises:
(1-1) for drugs, collecting drug-drug interaction relation data, drug-disease relation data, drug-side effect relation data, and six different types of drug-pair similarity relation data, including: chemical fingerprint data of the drug, therapeutic data of the drug, peptide chain data of the action targets of the drug, biological process data of the drug, molecular function data of the drug, and action cellular component data of the drug;
(1-2) for targets, namely proteins, collecting target-target interaction relation data, target-disease relation data, and four different types of target-pair similarity relation data, including: peptide chain data of the target, biological process data of the target, cellular component data of the target, and molecular function data of the target;
(1-3) collecting drug-target interaction relation data;
(2) the data preprocessing module comprises constructing the drug and target related networks and generating the multilayer networks;
(2-1) the construction of the drug and target related network comprises:
A. for interaction relation data between objects of a single class, constructing homogeneous interaction networks, including the drug interaction network G_{1D} and the target interaction network G_{1T};
B. for interaction relation data between objects of different classes, constructing heterogeneous interaction networks, including the drug disease related network G_{D_DI}, the drug side effect related network G_{D_SE}, and the target disease related network G_{T_DI};
C. collecting drug information of different dimensions and constructing drug similarity networks, including the chemical similarity network G_{2D}, the therapeutic similarity network G_{3D}, the action target sequence similarity network G_{4D}, the biological process similarity network G_{5D}, the molecular function similarity network G_{6D}, and the action cellular component similarity network G_{7D} of drugs;
D. collecting target information of different dimensions and constructing target similarity networks, including the target sequence similarity network G_{2T}, the target biological process similarity network G_{3T}, the target cellular component similarity network G_{4T}, and the target molecular function similarity network G_{5T};
E. constructing the drug target interaction network G_{D_T};
(2-2) generating the multilayer networks comprises generating a drug multilayer network and generating a target multilayer network, with the following specific steps:
(2-2-1) first, the drug disease related network G_{D_DI} is decomposed and converted into the drug disease similarity network G_{8D} = (V_{8D}, E_{8D}), where V_{8D} and E_{8D} are the set of drug nodes in the network and the set of edge weights of disease similarity between two drugs, respectively; the edge weight of disease similarity between drugs is
Figure FDA0003187370810000021
where x_{D_M} and y_{D_M} are the row vectors of the two drugs in the adjacency matrix of G_{D_DI}, and |·| denotes the vector norm;
the drug side effect related network G_{D_SE} is decomposed and converted into the drug side effect similarity network G_{9D} = (V_{9D}, E_{9D}), where V_{9D} and E_{9D} are the set of drug nodes in the network and the set of edge weights of side effect similarity between two drugs, respectively; the edge weight of side effect similarity between drugs is
Figure FDA0003187370810000022
where x_{D_SE} and y_{D_SE} are the row vectors of the two drugs in the adjacency matrix of G_{D_SE};
the target disease related network G_{T_DI} is decomposed and converted into the target disease similarity network G_{6T} = (V_{6T}, E_{6T}), where V_{6T} and E_{6T} are the set of target nodes in the network and the set of edge weights of disease similarity between two targets, respectively; the edge weight of disease similarity between targets is
Figure FDA0003187370810000023
where x_{T_DI} and y_{T_DI} are the row vectors of the two targets in the adjacency matrix of G_{T_DI};
(2-2-2) then the drug related networks are combined into the drug multilayer network G_D = {G_{iD} = (V_{iD}, E_{iD})}, where i is the drug network index, i ∈ [1, 9]; the target related networks are combined into the target multilayer network G_T = {G_{jT} = (V_{jT}, E_{jT})}, where j is the target network index, j ∈ [1, 6];
(3) the feature learning module comprises training structural autoencoders, encoding output, and processing same-class feature vectors;
(3-1) training the structural autoencoders: one structural autoencoder is trained for each layer of the drug multilayer network G_D and the target multilayer network G_T;
(3-2) encoding output: each network layer is encoded with the encoder of its trained structural autoencoder, yielding multi-layer vectors for all drugs and targets;
(3-3) processing the same-class feature vectors: concatenating the multi-layer vectors of a drug to obtain the final feature vector representation of the drug; concatenating the multi-layer vectors of a target to obtain the final feature vector representation of the target;
(4) the model algorithm design module comprises:
(4-1) constructing training samples: training samples are constructed with a pairwise model, and the data are randomly divided into M parts for M-fold cross-validation; that is, each time one part is used as the validation set and the rest as the training set, and the model parameters are tuned according to the overall cross-validation performance, where M is a positive integer greater than 3;
(4-2) training and evaluating the model: a boosting tree is built using a light gradient boosting decision tree with decision trees as weak learners; that is, decision trees T(x; θ_l) are built iteratively, where x and θ_l are the input feature vector and the learnable parameters of the l-th decision tree, respectively;
(4-3) predicting drug target interactions: using the optimal prediction model obtained by the result evaluation module, the interaction probability of every drug target pair is calculated, and the pairs with high probability are selected as candidate interacting drug target pairs, which form the prediction result;
(5) the result evaluation module verifies the predictive performance of the model using the ROC curve and the PR curve, comprising:
(5-1) plotting the ROC curve: the false positive rate (FPR) is the horizontal axis and the true positive rate (TPR) is the vertical axis; the larger the area under the ROC curve (AUROC), the better the predictive performance of the model;
the true positive rate TPR_α and the false positive rate FPR_α of the ROC curve are calculated from the confusion matrix as follows:
Figure FDA0003187370810000031
a drug target pair with an interaction is a positive sample, and one without is a negative sample; TP_α is the number of positive samples in the test set predicted as positive, FP_α is the number of negative samples in the test set predicted as positive, FN_α is the number of positive samples predicted as negative, and TN_α is the number of negative samples in the test set predicted as negative; α is the prediction confidence;
(5-2) plotting the PR curve: the precision precision_α and the recall recall_α at different prediction confidences α form a sequence of precision-recall pairs:
Figure FDA0003187370810000032
the precision-recall (PR) curve is plotted with recall on the horizontal axis and precision on the vertical axis; the area under the PR curve (AUPR) reflects the overall classification performance of the classifier, and the larger the AUPR, the better the predictive performance of the model;
(5-3) evaluating the model: based on the prediction results of step (4-3), AUROC and AUPR are calculated from the plotted ROC and PR curves, and the model parameters giving the best prediction result are selected.
2. The drug target interaction prediction method based on a multilayer network and graph coding according to claim 1, characterized in that A in (2-1) is specifically:
for drug-drug interaction relation data, constructing the drug interaction network G_{1D} = (V_{1D}, E_{1D}), where V_{1D} is the set of drug nodes in the network and E_{1D} is the set of edges between pairs of drugs that interact;
for target-target interaction relation data, constructing the target interaction network G_{1T} = (V_{1T}, E_{1T}), where V_{1T} is the set of target nodes in the network and E_{1T} is the set of edges between pairs of targets that interact.
3. The drug target interaction prediction method based on a multilayer network and graph coding according to claim 1, characterized in that B in (2-1) is specifically:
for drug and disease relation data, constructing the drug disease related network
Figure FDA0003187370810000041
where
Figure FDA0003187370810000042
and E_{D_DI} are the set of drug nodes, the set of disease nodes, and the set of edges of drug-disease relations in the network, respectively;
for drug and side effect relation data, constructing the drug side effect related network
Figure FDA0003187370810000043
where
Figure FDA0003187370810000044
and E_{D_SE} are the set of drug nodes, the set of side effect nodes, and the set of edges of drug-side effect relations in the network, respectively;
for target and disease relation data, constructing the target disease related network
Figure FDA0003187370810000045
where
Figure FDA0003187370810000046
and E_{T_DI} are the set of target nodes, the set of disease nodes, and the set of edges of target-disease relations in the network, respectively.
4. The drug target interaction prediction method based on a multilayer network and graph coding according to claim 1, characterized in that C in (2-1) is specifically:
for chemical fingerprint data of drugs, constructing the chemical similarity network of drugs G_{2D} = (V_{2D}, E_{2D}), where V_{2D} and E_{2D} are the set of drug nodes in the network and the set of edge weights of chemical similarity between two drugs, respectively; the edge weight of chemical similarity is
Figure FDA0003187370810000047
where a_1 and b_1 are the numbers of bits in the MACCS fingerprints of the two drugs, respectively, and c_1 is the number of bits the two drugs have in common;
for therapeutic data of drugs, constructing the therapeutic similarity network of drugs G_{3D} = (V_{3D}, E_{3D}), where V_{3D} and E_{3D} are the set of drug nodes in the network and the set of edge weights of therapeutic similarity between two drugs, respectively; the edge weight of therapeutic similarity is
Figure FDA0003187370810000051
where a_2 and b_2 are the respective ATC codes of the two drugs, and c_2 is the number of ATC code digits that the two drugs have in common;
for peptide chain data of the action targets of drugs, constructing the action target sequence similarity network of drugs G_{4D} = (V_{4D}, E_{4D}), where V_{4D} and E_{4D} are the set of drug nodes in the network and the set of edge weights of action target similarity between two drugs, respectively; the edge weight of action target similarity between drugs is
Figure FDA0003187370810000052
where a and b denote the respective targets of the two drugs, T_{T_T}(a, b) is the sequence similarity of those targets, and mean(·) denotes the mean;
for biological process data of drugs, constructing the biological process similarity network of drugs G_{5D} = (V_{5D}, E_{5D}), where V_{5D} and E_{5D} are the set of drug nodes in the network and the set of edge weights of biological process similarity between two drugs, respectively; the edge weight of biological process similarity between drugs is
Figure FDA0003187370810000053
where T_{T_P}(a, b) is the biological process similarity of the respective targets of the two drugs;
for molecular function data of drugs, constructing the molecular function similarity network of drugs G_{6D} = (V_{6D}, E_{6D}), where V_{6D} and E_{6D} are the set of drug nodes in the network and the set of edge weights of molecular function similarity between two drugs, respectively; the edge weight of molecular function similarity between drugs is
Figure FDA0003187370810000054
where T_{T_M}(a, b) is the molecular function similarity of the respective targets of the two drugs;
for action cellular component data of drugs, constructing the action cellular component similarity network of drugs G_{7D} = (V_{7D}, E_{7D}), where V_{7D} and E_{7D} are the set of drug nodes in the network and the set of edge weights of action cellular component similarity between two drugs, respectively; the edge weight of action cellular component similarity between drugs is
Figure FDA0003187370810000055
where T_{T_C}(a, b) is the cellular component similarity of the respective targets of the two drugs.
5. The drug target interaction prediction method based on a multilayer network and graph coding according to claim 1, characterized in that D in (2-1) is specifically:
for peptide chain data of targets, constructing the target sequence similarity network G_{2T} = (V_{2T}, E_{2T}), where V_{2T} and E_{2T} are the set of target nodes in the network and the set of edge weights of sequence similarity between two targets, respectively; the edge weight of sequence similarity is
Figure FDA0003187370810000061
where a_3 and b_3 are the numbers of positions in the peptide chain sequences of the two targets, respectively, and c_3 is the number of peptide chain sequence positions that are identical in the two targets;
for biological process data of targets, constructing the target biological process similarity network G_{3T} = (V_{3T}, E_{3T}), where V_{3T} and E_{3T} are the set of target nodes in the network and the set of edge weights of biological process similarity between two targets, respectively; the edge weight T_{T_P}(a, b) of target biological process similarity is obtained from the GO semantic annotations of the biological processes of the two targets;
for cellular component data of targets, constructing the target cellular component similarity network G_{4T} = (V_{4T}, E_{4T}), where V_{4T} and E_{4T} are the set of target nodes in the network and the set of edge weights of cellular component similarity between two targets, respectively; the edge weight T_{T_C}(a, b) of target cellular component similarity is obtained from the GO semantic annotations of the cellular components of the two targets;
for molecular function data of targets, constructing the target molecular function similarity network G_{5T} = (V_{5T}, E_{5T}), where V_{5T} and E_{5T} are the set of target nodes in the network and the set of edge weights of molecular function similarity between two targets, respectively; the edge weight T_{T_M}(a, b) of target molecular function similarity is obtained from the GO semantic annotations of the molecular functions of the two targets.
6. The drug target interaction prediction method based on a multilayer network and graph coding according to claim 1, characterized in that E in (2-1) is specifically:
for drug and target interaction relation data, constructing the drug target interaction network
Figure FDA0003187370810000062
where
Figure FDA0003187370810000063
and E_{D_T} are the set of drug nodes, the set of target nodes, and the set of edges of drug-target relations in the network, respectively.
7. The drug target interaction prediction method based on a multilayer network and graph coding according to claim 1, characterized in that the training process in (3-1) is as follows:
a. the adjacency matrix of the single-layer network is used as the input of the encoder;
b. after encoding, the output of the encoder is obtained and is used as the input of the decoder;
c. obtaining the output of a decoder after decoding, and calculating a loss function by utilizing the adjacency matrix, the output of the encoder and the output of the decoder;
d. calculating the gradient of each parameter of the encoder and the decoder by using a loss function, updating the parameters, wherein the updating step length is a multiple of the negative gradient;
e. repeating steps b to d until the loss function converges;
the loss function L_m consists of two parts:
first-order similarity loss
Figure FDA0003187370810000071
where N is the number of nodes, z_p and z_g are the encoder output vectors of node p and node g, respectively, and T_{pg} is the weight of the edge connecting them;
second-order similarity loss
Figure FDA0003187370810000072
where b_n and
Figure FDA0003187370810000073
are the encoder input vector and the decoder output vector of node n, respectively;
total loss function L_m = L_{2nd} + λ·L_{1st}, where λ is a penalty coefficient, 0 < λ < 1.
8. The drug target interaction prediction method based on a multilayer network and graph coding according to claim 1, characterized in that the specific process of (4-2) is as follows:
(4-2-1) before each round of decision tree construction, screening the samples with the Gradient-based One-Side Sampling (GOSS) algorithm, namely retaining the small fraction of samples with large gradients and randomly selecting a portion of the samples with small gradients to calculate the total variance gain;
(4-2-2) before each round of decision tree construction, merging mutually exclusive features with the Exclusive Feature Bundling (EFB) algorithm;
(4-2-3) based on the screened samples, constructing the fitting target of the l-th decision tree for a sample with input feature vector x and corresponding label y: when l = 1, the fitting target is the label of the sample, where positive samples are labeled 1 and negative samples 0; when l ≥ 2, the fitting target is
Figure FDA0003187370810000074
where the boosting tree obtained after iteration l-1 is
Figure FDA0003187370810000075
and L is the loss function; in the binary classification task, for a single sample (x, y) whose predicted value is
Figure FDA0003187370810000076
the loss function is defined as:
Figure FDA0003187370810000077
(4-2-4) based on the screened samples, constructing a binary decision tree that fits the target, where a leaf node is split as follows: a histogram is built for each screened feature over its value range; the variance gain of each split point is calculated from the histogram; the feature and split point with the largest variance gain are selected as the splitting feature and optimal split point of the current node; and the data of the leaf node are divided into two parts at the optimal split point; the recursion continues until the maximum depth of the tree is reached; the variance gain of feature f at split point d on dataset D is expressed as:
Figure FDA0003187370810000081
where x_i, x_{i,f} and g_i are the i-th sample vector, the f-th feature of the i-th sample vector, and its negative gradient, respectively, and
Figure FDA0003187370810000082
and
Figure FDA0003187370810000083
are the numbers of samples in dataset D whose feature f is smaller than and larger than the split point d, respectively;
(4-2-5) performing K rounds of iteration to generate K decision trees;
(4-2-6) summing the K decision trees to generate the final light gradient boosting decision tree
Figure FDA0003187370810000084
CN202110865457.9A 2021-07-29 2021-07-29 Drug target interaction prediction method based on multilayer network and graph coding Pending CN113571125A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110865457.9A CN113571125A (en) 2021-07-29 2021-07-29 Drug target interaction prediction method based on multilayer network and graph coding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110865457.9A CN113571125A (en) 2021-07-29 2021-07-29 Drug target interaction prediction method based on multilayer network and graph coding

Publications (1)

Publication Number Publication Date
CN113571125A true CN113571125A (en) 2021-10-29

Family

ID=78169065

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110865457.9A Pending CN113571125A (en) 2021-07-29 2021-07-29 Drug target interaction prediction method based on multilayer network and graph coding

Country Status (1)

Country Link
CN (1) CN113571125A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114023464A (en) * 2021-11-08 2022-02-08 东北林业大学 Drug-target interaction prediction method based on supervised synergy map contrast learning
CN114038499A (en) * 2021-11-12 2022-02-11 东南大学 Traditional Chinese medicine prescription active ingredient group prediction method based on heterogeneous network embedding
CN114334038A (en) * 2021-12-31 2022-04-12 杭州师范大学 Disease drug prediction method based on heterogeneous network embedded model
CN114944191A (en) * 2022-06-21 2022-08-26 湖南中医药大学 Component-target interaction prediction method based on web crawler and multi-modal characteristics
CN114974408A (en) * 2022-05-26 2022-08-30 浙江大学 Construction method, prediction method and device of drug interaction prediction model
WO2023123168A1 (en) * 2021-12-30 2023-07-06 Boe Technology Group Co., Ltd. Method of generating negative sample set for predicting macromolecule-macromolecule interaction, method of predicting macromolecule-macromolecule interaction, method of training model

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114023464A (en) * 2021-11-08 2022-02-08 东北林业大学 Drug-target interaction prediction method based on supervised synergy map contrast learning
CN114023464B (en) * 2021-11-08 2022-08-09 东北林业大学 Drug-target interaction prediction method based on supervised synergy map contrast learning
CN114038499A (en) * 2021-11-12 2022-02-11 东南大学 Traditional Chinese medicine prescription active ingredient group prediction method based on heterogeneous network embedding
WO2023123168A1 (en) * 2021-12-30 2023-07-06 Boe Technology Group Co., Ltd. Method of generating negative sample set for predicting macromolecule-macromolecule interaction, method of predicting macromolecule-macromolecule interaction, method of training model
CN114334038A (en) * 2021-12-31 2022-04-12 杭州师范大学 Disease drug prediction method based on heterogeneous network embedded model
CN114334038B (en) * 2021-12-31 2024-05-14 杭州师范大学 Disease medicine prediction method based on heterogeneous network embedded model
CN114974408A (en) * 2022-05-26 2022-08-30 浙江大学 Construction method, prediction method and device of drug interaction prediction model
CN114944191A (en) * 2022-06-21 2022-08-26 湖南中医药大学 Component-target interaction prediction method based on web crawler and multi-modal characteristics

Similar Documents

Publication Publication Date Title
CN113571125A (en) Drug target interaction prediction method based on multilayer network and graph coding
CN111312329B (en) Transcription factor binding site prediction method based on deep convolution automatic encoder
CN113327644B (en) Drug-target interaction prediction method based on deep embedding learning of graph and sequence
CN110110324B (en) Biomedical entity linking method based on knowledge representation
CN113936735A (en) Method for predicting binding affinity of drug molecules and target protein
Yu Three principles of data science: predictability, computability, and stability (PCS)
CN113393911B (en) Ligand compound rapid pre-screening method based on deep learning
CN111681718B (en) Medicine relocation method based on deep learning multi-source heterogeneous network
CN111370073B (en) Medicine interaction rule prediction method based on deep learning
CN115526246A (en) Self-supervision molecular classification method based on deep learning model
CN116798652A (en) Anticancer drug response prediction method based on multitasking learning
CN115985520A (en) Medicine disease incidence relation prediction method based on graph regularization matrix decomposition
Ma et al. Heuristics and metaheuristics for biological network alignment: A review
CN114021584A (en) Knowledge representation learning method based on graph convolution network and translation model
CN116646001B (en) Method for predicting drug target binding based on combined cross-domain attention model
CN115458046B (en) Method for predicting drug target binding property based on parallel deep fine granularity model
CN114999566B (en) Drug repositioning method and system based on word vector characterization and attention mechanism
CN116978464A (en) Data processing method, device, equipment and medium
CN116312808A (en) TransGAT-based drug-target interaction prediction method
CN112735604B (en) Novel coronavirus classification method based on deep learning algorithm
CN114970684A (en) Community detection method for extracting network core structure by combining VAE
Abd Elaziz et al. Quantum artificial hummingbird algorithm for feature selection of social IoT
CN117976047B (en) Key protein prediction method based on deep learning
Halsana et al. DensePPI: A Novel Image-Based Deep Learning Method for Prediction of Protein–Protein Interactions
Zhang et al. Enhanced Gradient for Differentiable Architecture Search

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination