CN113571125A - Drug target interaction prediction method based on multilayer network and graph coding
- Publication number: CN113571125A
- Application number: CN202110865457.9A
- Authority: CN (China)
- Legal status: Pending (as listed by Google Patents; not a legal conclusion)
Classifications
- G16B15/30: Drug targeting using structural data; Docking or binding prediction (G: Physics; G16B: Bioinformatics; G16B15/00: ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment)
- G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
Abstract
The invention discloses a drug-target interaction prediction method based on a multilayer network and graph encoding. The method comprises a data acquisition module, a data preprocessing module, a feature learning module, a model algorithm design module and a result evaluation module. The data preprocessing module constructs drug and protein networks and processes the heterogeneous networks. The feature learning module performs self-supervised training of structural graph autoencoders, encodes the graph nodes as vectors and concatenates the vectors belonging to the same node, so that the topological information of each graph is represented in vector form. The model algorithm design module constructs the cross-validation sets and designs the prediction model. The result evaluation module verifies the predictive performance of the model with an ROC curve computed from the confusion matrix and a PR curve computed from the precision-recall sequence. The method studies drugs and targets from the perspectives of data mining and graphs, and predicts drug-target interactions from the learned graph-structure information with a subsequent tree model.
Description
Technical Field
The invention belongs to the technical field of data mining, and particularly relates to a drug-target interaction prediction method based on a multilayer network and graph encoding.
Background
With the rapid development of machine learning, the progress of biological detection technologies such as third-generation gene sequencing, and the rapid growth of biological data that has brought the field into the big-data era, more and more researchers and companies are turning to AI-assisted drug development. The most direct advantage of using computer algorithms to assist target screening is that candidate drugs can be screened by computer and the candidate range narrowed, which greatly shortens the new-drug discovery cycle and reduces the research materials it consumes. Practical application data indicate that AI technology can reduce drug development costs by about 35%. An analysis of the net-income trends of leading international pharmaceutical companies in recent years shows that most of them increased their net income to varying degrees after introducing AI-assisted drug research and development. AI technology can also perform multi-target analysis of a drug to predict its multiple targets, thereby revealing the complex mechanisms of action involved in some diseases. In addition, AI can improve the accuracy and safety of drug prediction and help investigate the mechanisms of drug side effects. Overall, AI can therefore greatly simplify the new-drug research and development process, save research and development expenses, and help pharmaceutical companies develop new drugs quickly.
Disclosure of Invention
The invention aims to provide a drug-target interaction prediction method based on a multilayer network and graph encoding that eliminates the blind, trial-and-error character of clinical experiments, narrows the screening range and shortens the testing cycle.
The invention constructs nine drug-related networks (drug interaction network, drug-disease association network, drug-side-effect association network, drug chemical similarity network, drug therapeutic similarity network, drug target-sequence similarity network, drug biological process similarity network, drug molecular function similarity network and drug cellular component similarity network), six target-related networks (target interaction network, target-disease association network, target sequence similarity network, target biological process similarity network, target cellular component similarity network and target molecular function similarity network), and a drug-target interaction network that serves as the label. A separate structural autoencoder is trained on each network, the trained autoencoders encode the nodes as vectors, and the vectors of each node from the different networks are concatenated into its final feature vector. Each drug-target pair to be predicted is then fed into a trained boosted tree model (a linear sum of a series of decision trees built from the training set) to obtain its final evaluation score.
The method comprises a data acquisition module, a data preprocessing module, a feature learning module, a model algorithm design module and a result evaluation module.
(1) The data acquisition module comprises:
(1-1) for drugs, collect drug-drug interaction data, drug-disease association data, drug-side-effect association data, and six types of drug-pair similarity data, namely: drug chemical fingerprint data, drug therapeutic data, peptide-chain data of the drugs' targets, drug biological process data, drug molecular function data and drug cellular component data;
(1-2) for targets, i.e. proteins, collect target-target interaction data, target-disease association data and four types of target-pair similarity data, namely: target peptide-chain data, target biological process data, target cellular component data and target molecular function data;
(1-3) collect drug-target interaction data.
(2) The data preprocessing module comprises construction of the drug- and target-related networks and generation of the multilayer networks;
(2-1) the construction of the drug- and target-related networks comprises:
A. for interaction data between objects of a single class, construct homogeneous interaction networks, namely the drug interaction network $G_1^D$ and the target interaction network $G_1^T$;
B. for interaction data between objects of different classes, construct heterogeneous interaction networks, namely the drug-disease association network $G_{D\_DI}$, the drug-side-effect association network $G_{D\_SE}$ and the target-disease association network $G_{T\_DI}$;
C. collect drug information of different dimensions and construct drug similarity networks, namely the drug chemical similarity network $G_2^D$, the drug therapeutic similarity network $G_3^D$, the drug target-sequence similarity network $G_4^D$, the drug biological process similarity network $G_5^D$, the drug molecular function similarity network $G_6^D$ and the drug cellular component similarity network $G_7^D$;
D. collect target information of different dimensions and construct target similarity networks, namely the target sequence similarity network $G_2^T$, the target biological process similarity network $G_3^T$, the target cellular component similarity network $G_4^T$ and the target molecular function similarity network $G_5^T$;
E. construct the drug-target interaction network $G_{D\_T}$.
(2-2) generating the multilayer networks comprises generating a drug multilayer network and a target multilayer network, as follows:
(2-2-1) first, the drug-disease association network $G_{D\_DI}$ is decomposed and converted into the drug disease similarity network $G_8^D = (V_8^D, E_8^D)$, where $V_8^D$ is the set of drug nodes and $E_8^D$ is the set of edge weights giving the disease similarity between two drugs; the edge weight is computed from $x_{D\_DI}$ and $y_{D\_DI}$, the row vectors of the two drugs in the adjacency matrix of $G_{D\_DI}$, with $|\cdot|$ denoting the vector modulus;
the drug-side-effect association network $G_{D\_SE}$ is decomposed and converted into the drug side-effect similarity network $G_9^D = (V_9^D, E_9^D)$, where $V_9^D$ is the set of drug nodes and $E_9^D$ is the set of edge weights giving the side-effect similarity between two drugs; the edge weight is computed from $x_{D\_SE}$ and $y_{D\_SE}$, the row vectors of the two drugs in the adjacency matrix of $G_{D\_SE}$;
the target-disease association network $G_{T\_DI}$ is decomposed and converted into the target disease similarity network $G_6^T = (V_6^T, E_6^T)$, where $V_6^T$ is the set of target nodes and $E_6^T$ is the set of edge weights giving the disease similarity between two targets; the edge weight is computed from $x_{T\_DI}$ and $y_{T\_DI}$, the row vectors of the two targets in the adjacency matrix of $G_{T\_DI}$;
(2-2-2) then the drug-related networks are combined into the drug multilayer network $G^D = \{G_i^D = (V_i^D, E_i^D)\}$, where $i \in [1, 9]$ is the drug network index, and the target-related networks are combined into the target multilayer network $G^T = \{G_j^T = (V_j^T, E_j^T)\}$, where $j \in [1, 6]$ is the target network index.
(3) The feature learning module comprises training the structural autoencoders, outputting the encodings and processing the feature vectors of each node;
(3-1) training the structural autoencoders: one structural autoencoder is trained for each layer of the drug multilayer network $G^D$ and of the target multilayer network $G^T$;
(3-2) encoding output: the encoder of each trained structural autoencoder encodes its network layer, giving the multilayer vectors of all drugs and targets;
(3-3) processing the feature vectors: the vectors of a drug from all layers are concatenated to give the drug's final feature vector, and the vectors of a target from all layers are concatenated to give the target's final feature vector.
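A minimal NumPy sketch of step (3-3), assuming each layer's encoder output is an array whose rows follow one shared node order. The function name and the array sizes are hypothetical and chosen only for illustration.

```python
import numpy as np

def concat_layer_embeddings(layer_embeddings):
    """Concatenate the per-layer vectors of the same nodes into one feature matrix.

    layer_embeddings: list of arrays of shape (num_nodes, d_i); row k of every
    array must describe the same node (an assumption of this sketch).
    """
    n = layer_embeddings[0].shape[0]
    if any(e.shape[0] != n for e in layer_embeddings):
        raise ValueError("every layer must embed the same, identically ordered nodes")
    return np.concatenate(layer_embeddings, axis=1)

# Hypothetical sizes: 5 drugs x 9 layers and 4 targets x 6 layers, 16 dims per layer.
rng = np.random.default_rng(0)
drug_features = concat_layer_embeddings([rng.normal(size=(5, 16)) for _ in range(9)])
target_features = concat_layer_embeddings([rng.normal(size=(4, 16)) for _ in range(6)])
print(drug_features.shape, target_features.shape)  # (5, 144) (4, 96)
```

Concatenation keeps every layer's information side by side; the final dimension is simply the sum of the per-layer dimensions.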
(4) The model algorithm design module comprises constructing the training samples, training and evaluating the model, and predicting drug-target interactions;
(4-1) constructing the training samples: training samples are constructed with a pairwise scheme; the data are randomly divided into M parts for M-fold cross-validation, i.e. each time one part serves as the validation set and the rest as the training set, and the model parameters are tuned according to the overall cross-validation performance, where M is a positive integer greater than 3;
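One way to realize step (4-1) is sketched below. The text does not say how a drug-target pair becomes a feature vector, so this sketch assumes the pair vector is the drug's final feature vector followed by the target's, labelled by $G_{D\_T}$. The fold split follows the M-fold description and enforces M > 3; all function names are illustrative.

```python
import numpy as np

def build_pair_samples(drug_feat, target_feat, interactions):
    """One sample per (drug, target) pair (assumed pairing scheme).

    Feature vector: drug vector followed by target vector.
    Label: 1 if the pair interacts in the (n_drugs, n_targets) matrix, else 0.
    """
    n_d, n_t = interactions.shape
    d_idx, t_idx = np.meshgrid(np.arange(n_d), np.arange(n_t), indexing="ij")
    d_idx, t_idx = d_idx.ravel(), t_idx.ravel()
    X = np.hstack([drug_feat[d_idx], target_feat[t_idx]])
    y = interactions[d_idx, t_idx].astype(int)
    return X, y

def m_fold_indices(n_samples, m, seed=0):
    """Randomly partition sample indices into m folds; yield (train, validation)."""
    if m <= 3:
        raise ValueError("the method requires M > 3")
    perm = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(perm, m)
    for k in range(m):
        train = np.concatenate([folds[j] for j in range(m) if j != k])
        yield train, folds[k]

rng = np.random.default_rng(1)
interactions = (rng.random((5, 4)) < 0.3).astype(int)   # toy G_{D_T} labels
X, y = build_pair_samples(rng.normal(size=(5, 3)), rng.normal(size=(4, 2)), interactions)
splits = list(m_fold_indices(len(y), m=5))
print(X.shape, len(splits))  # (20, 5) 5
```

With 732 drugs and 1915 targets the full pair grid has over 1.4 million pairs, so in practice the pair set may need to be subsampled; that choice is outside this sketch.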
(4-2) training and evaluating the model: a boosted tree model is built with a light gradient boosting decision tree (LightGBM), using decision trees as weak learners; that is, decision trees $T(x; \theta_l)$ are built iteratively, where $x$ is the input feature vector and $\theta_l$ are the learnable parameters of the $l$-th decision tree, and the model is their sum $f(x) = \sum_{l=1}^{L} T(x; \theta_l)$;
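LightGBM adds histogram-based split finding, leaf-wise tree growth and other refinements that do not fit a short example. The sketch below swaps in plain gradient boosting with depth-1 regression trees (stumps) on the logistic loss, only to show the additive structure $f(x) = \sum_l T(x; \theta_l)$, in which each new tree is fitted to the negative gradient of the loss. It is an illustration, not the patent's trained model; in practice the `lightgbm` package's `LGBMClassifier` exposes scikit-learn-style `fit` and `predict_proba` methods for this role.

```python
import numpy as np

def fit_stump(X, r):
    """Least-squares depth-1 regression tree: returns (feature, threshold, left, right)."""
    best = None
    for f in range(X.shape[1]):
        for thr in np.unique(X[:, f]):
            left = X[:, f] <= thr
            if left.all():                      # split must leave samples on both sides
                continue
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, f, thr, lv, rv)
    if best is None:                            # no valid split: predict the mean residual
        return 0, np.inf, r.mean(), r.mean()
    return best[1:]

def stump_output(stump, X):
    f, thr, lv, rv = stump
    return np.where(X[:, f] <= thr, lv, rv)

def fit_boosted_trees(X, y, n_trees=50, lr=0.3):
    """Gradient boosting on log-loss: f(x) = sum_l T(x; theta_l), leaves pre-scaled by lr."""
    F = np.zeros(len(y))
    trees = []
    for _ in range(n_trees):
        p = 1.0 / (1.0 + np.exp(-F))
        residual = y - p                        # negative gradient of log-loss w.r.t. F
        f, thr, lv, rv = fit_stump(X, residual)
        tree = (f, thr, lr * lv, lr * rv)       # theta_l: split plus shrunken leaf values
        F += stump_output(tree, X)
        trees.append(tree)
    return trees

def predict_proba(trees, X):
    """Interaction probability: sigmoid of the summed tree outputs."""
    F = np.zeros(X.shape[0])
    for tree in trees:
        F += stump_output(tree, X)
    return 1.0 / (1.0 + np.exp(-F))

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
trees = fit_boosted_trees(X, y)
print(predict_proba(trees, X).round(2))
```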
(4-3) predicting drug-target interactions: using the optimal prediction model selected by the result evaluation module, the interaction probability of every drug-target pair is computed, and the pairs with high probability are screened out as candidate interacting drug-target pairs, which form the prediction result.
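Screening high-probability pairs can be sketched as ranking a drug-by-target score matrix. Excluding pairs already present in $G_{D\_T}$ and the cut-off `k` are assumptions of this sketch, not steps stated in the text.

```python
import numpy as np

def top_candidate_pairs(scores, known, k):
    """k highest-scoring (drug, target) pairs that are not already known interactions.

    scores: (n_drugs, n_targets) predicted interaction probabilities.
    known:  (n_drugs, n_targets) 0/1 matrix of known interactions.
    """
    scores = np.asarray(scores, dtype=float)
    masked = np.where(np.asarray(known, dtype=bool), -np.inf, scores)
    order = np.argsort(masked, axis=None)[::-1]              # flat indices, best first
    order = order[np.isfinite(masked.ravel()[order])][:k]    # drop known pairs
    rows, cols = np.unravel_index(order, scores.shape)
    return [(int(r), int(c), float(scores[r, c])) for r, c in zip(rows, cols)]

scores = np.array([[0.9, 0.2], [0.7, 0.4]])
known = np.array([[1, 0], [0, 0]])
print(top_candidate_pairs(scores, known, k=2))  # [(1, 0, 0.7), (1, 1, 0.4)]
```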
(5) The result evaluation module verifies the predictive performance of the model with an ROC curve and a PR curve, as follows:
(5-1) plotting the ROC curve: the false positive rate FPR is the horizontal axis and the true positive rate TPR is the vertical axis; the larger the area under the ROC curve (AUROC), the better the model's predictive performance;
the true positive rate $TPR_\alpha$ and false positive rate $FPR_\alpha$ of the ROC curve are computed from the confusion matrix as:
$$TPR_\alpha = \frac{TP_\alpha}{TP_\alpha + FN_\alpha}, \qquad FPR_\alpha = \frac{FP_\alpha}{FP_\alpha + TN_\alpha}$$
a drug-target pair is a positive sample if the interaction exists and a negative sample otherwise; $TP_\alpha$ is the number of positive samples in the test set predicted as positive, $FP_\alpha$ the number of negative samples predicted as positive, $FN_\alpha$ the number of positive samples predicted as negative, and $TN_\alpha$ the number of negative samples predicted as negative; $\alpha$ is the prediction confidence threshold;
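With $TPR_\alpha = TP_\alpha/(TP_\alpha + FN_\alpha)$ and $FPR_\alpha = FP_\alpha/(FP_\alpha + TN_\alpha)$, the ROC curve can be traced by sweeping the threshold $\alpha$ over the observed scores, and AUROC is the trapezoidal area under the resulting points. A minimal NumPy sketch, assuming the test set contains both positive and negative pairs:

```python
import numpy as np

def confusion_at(y_true, scores, alpha):
    """TP, FP, FN, TN when every pair with score >= alpha is predicted positive."""
    pred = scores >= alpha
    pos = y_true == 1
    return (int(np.sum(pred & pos)), int(np.sum(pred & ~pos)),
            int(np.sum(~pred & pos)), int(np.sum(~pred & ~pos)))

def roc_points(y_true, scores):
    """FPR_alpha and TPR_alpha for alpha swept from the highest score down."""
    fpr, tpr = [0.0], [0.0]
    for alpha in np.unique(scores)[::-1]:
        tp, fp, fn, tn = confusion_at(y_true, scores, alpha)
        tpr.append(tp / (tp + fn))
        fpr.append(fp / (fp + tn))
    return np.array(fpr), np.array(tpr)

def auroc(y_true, scores):
    """Area under the ROC curve by the trapezoidal rule."""
    fpr, tpr = roc_points(y_true, scores)
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])
print(auroc(y_true, scores))  # 0.75
```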
(5-2) plotting the PR curve: the precision $\mathrm{precision}_\alpha$ and recall $\mathrm{recall}_\alpha$ at different prediction confidences $\alpha$ form the precision-recall sequence:
$$\mathrm{precision}_\alpha = \frac{TP_\alpha}{TP_\alpha + FP_\alpha}, \qquad \mathrm{recall}_\alpha = \frac{TP_\alpha}{TP_\alpha + FN_\alpha}$$
the precision-recall (PR) curve is drawn with recall on the horizontal axis and precision on the vertical axis; the area under the PR curve (AUPR) reflects the overall classification performance of the classifier, and the larger the AUPR, the better the model's predictive performance;
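The PR curve can be traced the same way, with $\mathrm{precision}_\alpha = TP_\alpha/(TP_\alpha + FP_\alpha)$ and $\mathrm{recall}_\alpha = TP_\alpha/(TP_\alpha + FN_\alpha)$. The integration rule for AUPR is not specified, so this sketch assumes a step-wise sum of precision times the increase in recall (the average-precision style), which avoids the optimism of linear interpolation between PR points.

```python
import numpy as np

def pr_points(y_true, scores):
    """recall_alpha and precision_alpha for alpha swept from the highest score down."""
    recall, precision = [], []
    pos = y_true == 1
    for alpha in np.unique(scores)[::-1]:
        pred = scores >= alpha
        tp = np.sum(pred & pos)
        precision.append(tp / np.sum(pred))     # TP / (TP + FP)
        recall.append(tp / np.sum(pos))         # TP / (TP + FN)
    return np.array(recall), np.array(precision)

def aupr(y_true, scores):
    """Step-wise area under the PR curve: sum of (recall increase) * precision."""
    recall, precision = pr_points(y_true, scores)
    prev = np.concatenate([[0.0], recall[:-1]])
    return float(np.sum((recall - prev) * precision))

y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])
print(round(aupr(y_true, scores), 4))  # 0.8333
```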
(5-3) evaluating the model: based on the prediction results of step (4-3), AUROC and AUPR are computed from the plotted ROC and PR curves, and the model parameters giving the best prediction results are selected.
The method studies drug-target interactions from the perspectives of data mining and multilayer networks. By constructing networks, it abstracts different types of data into the same data structure, and it predicts drug-target interactions by combining heterogeneous-network decomposition, automatic learning of network topology with structural autoencoders, and tree-based classifiers. The method can therefore analyze drug and target data effectively and predict their interactions, providing scientific guidance for new-drug research and development, improving its efficiency and, to a certain extent, promoting independent innovation in medicine.
Drawings
FIG. 1 is a schematic flow diagram of the method of the present invention.
Detailed Description
The following detailed description of the embodiments of the invention is provided in connection with the accompanying drawings.
The available data cover 732 drugs, 1915 targets (proteins), 12904 side effects and 440 diseases, and include drug-drug, drug-disease, drug-side-effect, target-target and target-disease interaction data, MACCS fingerprint data of the drugs' chemical structures, GO annotations of the drugs and targets, protein sequence data of the targets, and half-maximal inhibitory concentration (IC50) data between drugs and targets.
As shown in FIG. 1, the drug-target interaction prediction method based on a multilayer network and graph encoding comprises a data acquisition module, a data preprocessing module, a feature learning module, a model algorithm design module and a result evaluation module, and specifically comprises the following steps:
(1) a data acquisition module comprising:
(1-1) for drugs, collect drug-drug interaction data, drug-disease association data, drug-side-effect association data, and six types of drug-pair similarity data, namely: drug chemical fingerprint data, drug therapeutic data, peptide-chain data of the drugs' targets, drug biological process data, drug molecular function data and drug cellular component data;
(1-2) for targets, i.e. proteins, collect target-target interaction data, target-disease association data and four types of target-pair similarity data, namely: target peptide-chain data, target biological process data, target cellular component data and target molecular function data;
(1-3) collect drug-target interaction data;
the above data are downloaded from public websites.
(2) The data preprocessing module constructs the drug- and target-related networks and generates the multilayer networks, providing the data basis for drug-target prediction; specifically:
(2-1) constructing the drug- and target-related networks, comprising:
(I) for drug-drug interaction data, construct the drug interaction network $G_1^D = (V_1^D, E_1^D)$, where $V_1^D$ is the set of drug nodes in the network and $E_1^D$ is the set of edges between pairs of drugs that interact;
for target-target interaction data, construct the target interaction network $G_1^T = (V_1^T, E_1^T)$, where $V_1^T$ is the set of target nodes in the network and $E_1^T$ is the set of edges between pairs of targets that interact;
(II) for drug-disease association data, construct the drug-disease association network $G_{D\_DI}$, whose node sets are the drug nodes and the disease nodes and whose edge set $E_{D\_DI}$ contains the drug-disease associations;
for drug-side-effect association data, construct the drug-side-effect association network $G_{D\_SE}$, whose node sets are the drug nodes and the side-effect nodes and whose edge set $E_{D\_SE}$ contains the drug-side-effect associations;
for target-disease association data, construct the target-disease association network $G_{T\_DI}$, whose node sets are the target nodes and the disease nodes and whose edge set $E_{T\_DI}$ contains the target-disease associations;
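The heterogeneous networks in (II) can be stored as rectangular 0/1 adjacency matrices with one row per drug (or target) and one column per disease or side effect. A minimal sketch; the identifiers below are hypothetical and for illustration only.

```python
import numpy as np

def bipartite_adjacency(pairs, row_nodes, col_nodes):
    """0/1 adjacency matrix of a two-class network such as G_{D_DI}.

    pairs: iterable of (row_node, col_node) associations, e.g. (drug, disease).
    row_nodes, col_nodes: ordered node identifiers of the two node sets.
    """
    row_index = {node: i for i, node in enumerate(row_nodes)}
    col_index = {node: j for j, node in enumerate(col_nodes)}
    A = np.zeros((len(row_nodes), len(col_nodes)))
    for r, c in pairs:
        A[row_index[r], col_index[c]] = 1.0
    return A

# Hypothetical identifiers.
drugs = ["drug_1", "drug_2"]
diseases = ["disease_x", "disease_y", "disease_z"]
A = bipartite_adjacency([("drug_1", "disease_x"), ("drug_2", "disease_x"),
                         ("drug_2", "disease_z")], drugs, diseases)
print(A)
```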
(III) for drug chemical fingerprint data, construct the drug chemical similarity network $G_2^D = (V_2^D, E_2^D)$, where $V_2^D$ is the set of drug nodes and $E_2^D$ is the set of edge weights giving the chemical similarity between two drugs; the chemical similarity edge weight is computed from $a_1$ and $b_1$, the bit counts of the MACCS fingerprints of the two drugs, and $c_1$, the number of bits the two drugs have in common;
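The sketch below assumes the chemical similarity edge weight is the Tanimoto coefficient $c_1/(a_1 + b_1 - c_1)$, the usual similarity for binary fingerprints such as MACCS keys; treat the exact formula as an assumption. Real MACCS bit vectors would come from a cheminformatics toolkit such as RDKit, and the toy 4-bit vectors are illustrative only.

```python
import numpy as np

def tanimoto(fp_a, fp_b):
    """Assumed edge weight c / (a + b - c) for two binary fingerprints.

    a, b: number of set bits in each fingerprint; c: number of bits set in both.
    """
    fp_a = np.asarray(fp_a, dtype=bool)
    fp_b = np.asarray(fp_b, dtype=bool)
    a, b = int(fp_a.sum()), int(fp_b.sum())
    c = int(np.sum(fp_a & fp_b))
    denom = a + b - c
    return c / denom if denom else 0.0

def chemical_similarity_matrix(fingerprints):
    """Symmetric edge-weight matrix for G_2^D; self-similarity left at 0 (assumption)."""
    n = len(fingerprints)
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            S[i, j] = S[j, i] = tanimoto(fingerprints[i], fingerprints[j])
    return S

fps = np.array([[1, 1, 0, 1], [1, 0, 0, 1], [0, 0, 1, 0]])  # toy 4-bit fingerprints
print(chemical_similarity_matrix(fps).round(3))
```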
for drug therapeutic data, construct the drug therapeutic similarity network $G_3^D = (V_3^D, E_3^D)$, where $V_3^D$ is the set of drug nodes and $E_3^D$ is the set of edge weights giving the therapeutic similarity between two drugs; the therapeutic similarity edge weight is computed from $a_2$ and $b_2$, the ATC codes of the two drugs, and $c_2$, the number of identical digits in the two ATC codes;
for the peptide-chain data of the drugs' targets, construct the drug target-sequence similarity network $G_4^D = (V_4^D, E_4^D)$, where $V_4^D$ is the set of drug nodes and $E_4^D$ is the set of edge weights giving the target similarity between two drugs; the edge weight is the mean, $\mathrm{mean}(\cdot)$, of $T_{T\_T}(a, b)$, the sequence similarity between target $a$ of one drug and target $b$ of the other;
for drug biological process data, construct the drug biological process similarity network $G_5^D = (V_5^D, E_5^D)$, where $V_5^D$ is the set of drug nodes and $E_5^D$ is the set of edge weights giving the biological process similarity between two drugs; the edge weight is computed from $T_{T\_P}(a, b)$, the biological process similarity of the targets of the two drugs;
for drug molecular function data, construct the drug molecular function similarity network $G_6^D = (V_6^D, E_6^D)$, where $V_6^D$ is the set of drug nodes and $E_6^D$ is the set of edge weights giving the molecular function similarity between two drugs; the edge weight is computed from $T_{T\_M}(a, b)$, the molecular function similarity of the targets of the two drugs;
for drug cellular component data, construct the drug cellular component similarity network $G_7^D = (V_7^D, E_7^D)$, where $V_7^D$ is the set of drug nodes and $E_7^D$ is the set of edge weights giving the cellular component similarity between two drugs; the edge weight is computed from $T_{T\_C}(a, b)$, the cellular component similarity of the targets of the two drugs;
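The four target-derived drug similarities in (III) share one pattern: a drug-level weight computed from target-level similarities. The sketch below takes the mean over all target pairs of the two drugs, as the $\mathrm{mean}(\cdot)$ description for $G_4^D$ suggests; applying the same averaging to $G_5^D$ through $G_7^D$, and returning 0 for drugs without known targets, are assumptions.

```python
import numpy as np

def drug_similarity_from_targets(targets_a, targets_b, target_sim):
    """Mean of target_sim[a, b] over all targets a of drug A and b of drug B.

    target_sim: (n_targets, n_targets) matrix such as T_{T_T}, T_{T_P}, T_{T_M} or T_{T_C}.
    Returns 0.0 if either drug has no known target (an assumption of this sketch).
    """
    if len(targets_a) == 0 or len(targets_b) == 0:
        return 0.0
    return float(target_sim[np.ix_(targets_a, targets_b)].mean())

target_sim = np.array([[1.0, 0.2, 0.6],
                       [0.2, 1.0, 0.4],
                       [0.6, 0.4, 1.0]])
print(drug_similarity_from_targets([0], [1, 2], target_sim))  # mean of 0.2 and 0.6
```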
(IV) for target peptide-chain data, construct the target sequence similarity network $G_2^T = (V_2^T, E_2^T)$, where $V_2^T$ is the set of target nodes and $E_2^T$ is the set of edge weights giving the sequence similarity between two targets; the sequence similarity edge weight is computed from $a_3$ and $b_3$, the numbers of sequence positions in the peptide chains of the two targets, and $c_3$, the number of positions at which the two peptide chains are identical;
for target biological process data, construct the target biological process similarity network $G_3^T = (V_3^T, E_3^T)$, where $V_3^T$ is the set of target nodes and $E_3^T$ is the set of edge weights giving the biological process similarity between two targets; the edge weight $T_{T\_P}(a, b)$ is the semantic similarity of the GO biological process annotations of the two targets;
for target cellular component data, construct the target cellular component similarity network $G_4^T = (V_4^T, E_4^T)$, where $V_4^T$ is the set of target nodes and $E_4^T$ is the set of edge weights giving the cellular component similarity between two targets; the edge weight $T_{T\_C}(a, b)$ is the semantic similarity of the GO cellular component annotations of the two targets;
for target molecular function data, construct the target molecular function similarity network $G_5^T = (V_5^T, E_5^T)$, where $V_5^T$ is the set of target nodes and $E_5^T$ is the set of edge weights giving the molecular function similarity between two targets; the edge weight $T_{T\_M}(a, b)$ is the semantic similarity of the GO molecular function annotations of the two targets;
(V) for drug-target interaction data, construct the drug-target interaction network $G_{D\_T}$, whose node sets are the drug nodes and the target nodes and whose edge set $E_{D\_T}$ contains the drug-target interactions.
(2-2) generating the multilayer networks, comprising generating the drug multilayer network and the target multilayer network:
(2-2-1) the drug-disease association network $G_{D\_DI}$ is decomposed and converted into the drug disease similarity network $G_8^D = (V_8^D, E_8^D)$, where $V_8^D$ is the set of drug nodes and $E_8^D$ is the set of edge weights giving the disease similarity between two drugs; the edge weight is computed from $x_{D\_DI}$ and $y_{D\_DI}$, the row vectors of the two drugs in the adjacency matrix of $G_{D\_DI}$, with $|\cdot|$ denoting the vector modulus;
the drug-side-effect association network $G_{D\_SE}$ is decomposed and converted into the drug side-effect similarity network $G_9^D = (V_9^D, E_9^D)$, where $V_9^D$ is the set of drug nodes and $E_9^D$ is the set of edge weights giving the side-effect similarity between two drugs; the edge weight is computed from $x_{D\_SE}$ and $y_{D\_SE}$, the row vectors of the two drugs in the adjacency matrix of $G_{D\_SE}$;
the target-disease association network $G_{T\_DI}$ is decomposed and converted into the target disease similarity network $G_6^T = (V_6^T, E_6^T)$, where $V_6^T$ is the set of target nodes and $E_6^T$ is the set of edge weights giving the disease similarity between two targets; the edge weight is computed from $x_{T\_DI}$ and $y_{T\_DI}$, the row vectors of the two targets in the adjacency matrix of $G_{T\_DI}$;
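A sketch of the decomposition in (2-2-1), assuming the edge weight is the cosine similarity $x \cdot y / (|x|\,|y|)$ of the two row vectors. This is consistent with the mention of vector moduli but is an assumption, as are the choices to give rows with no associations similarity 0 and to drop self-loops.

```python
import numpy as np

def association_to_similarity(A):
    """Project a bipartite association matrix (e.g. G_{D_DI}) onto its row nodes.

    Assumed edge weight: cosine similarity x . y / (|x| |y|) of two rows;
    0 when either row is all zeros, and self-similarity left at 0.
    """
    A = np.asarray(A, dtype=float)
    norms = np.linalg.norm(A, axis=1)
    denom = np.outer(norms, norms)
    dots = A @ A.T
    S = np.divide(dots, denom, out=np.zeros_like(dots), where=denom > 0)
    np.fill_diagonal(S, 0.0)
    return S

drug_disease = np.array([[1, 0, 0],
                         [1, 0, 1],
                         [0, 0, 0]])   # toy rows: drugs, columns: diseases
print(association_to_similarity(drug_disease).round(3))
```

Because the result is square and indexed by drugs (or targets) only, it can join the other single-class layers of the multilayer network.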
(2-2-2) the drug interaction network, drug disease similarity network, drug side-effect similarity network, drug chemical similarity network, drug therapeutic similarity network, drug target-sequence similarity network, drug biological process similarity network, drug molecular function similarity network and drug cellular component similarity network are combined into the drug multilayer network $G^D = \{G_i^D = (V_i^D, E_i^D)\}$, where $i \in [1, 9]$ is the drug network index;
the target interaction network, target disease similarity network, target sequence similarity network, target biological process similarity network, target cellular component similarity network and target molecular function similarity network are combined into the target multilayer network $G^T = \{G_j^T = (V_j^T, E_j^T)\}$, where $j \in [1, 6]$ is the target network index.
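Because every layer of $G^D$ (or $G^T$) covers the same node set, a multilayer network can be held as numbered adjacency matrices that share one node order. The sketch below adds sanity checks that assume undirected layers with weights in [0, 1], matching the statement that edge weights lie between 0 and 1; the layer names are hypothetical.

```python
import numpy as np

def build_multilayer(layers, n_nodes):
    """Number single-layer weight matrices 1..K as in G^D = {G_i^D}, after sanity checks.

    layers: dict {layer_name: (n_nodes, n_nodes) matrix}, all sharing one node order.
    The symmetry and [0, 1] checks assume undirected networks with normalized weights.
    """
    multilayer = {}
    for i, (name, W) in enumerate(layers.items(), start=1):
        W = np.asarray(W, dtype=float)
        if W.shape != (n_nodes, n_nodes):
            raise ValueError(f"layer {name!r}: shape {W.shape}, expected {(n_nodes, n_nodes)}")
        if not np.allclose(W, W.T):
            raise ValueError(f"layer {name!r} is not symmetric")
        if W.min() < 0.0 or W.max() > 1.0:
            raise ValueError(f"layer {name!r} has weights outside [0, 1]")
        multilayer[i] = (name, W)
    return multilayer

# Hypothetical two-layer example for 3 drugs.
G_D = build_multilayer({
    "interaction": [[0, 1, 0], [1, 0, 0], [0, 0, 0]],
    "disease_similarity": [[0, 0.7, 0], [0.7, 0, 0], [0, 0, 0]],
}, n_nodes=3)
print({i: name for i, (name, _) in G_D.items()})  # {1: 'interaction', 2: 'disease_similarity'}
```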
(3) A feature learning module:
In machine learning problems, the data and features determine the upper limit of the prediction performance, and models and algorithms only approach that limit. The feature coding module of the invention addresses the first half of this statement, i.e., feature selection: it learns better features for the model algorithm so as to obtain the most accurate prediction results. The module works on the drug multilayer network $G^D$ and the target multilayer network $G^T$ and uses structural autoencoders to encode the network structure automatically, which ensures the completeness of feature extraction.
(3-1) Training the structural autoencoders: one structural autoencoder is trained for each layer of the drug multilayer network $G^D$ and of the target multilayer network $G^T$, as follows:
a. the adjacency matrix of the single-layer network is used as the encoder input;
b. the encoder output obtained after encoding is used as the decoder input;
c. the decoder output is obtained after decoding, and the loss function is computed from the adjacency matrix, the encoder output and the decoder output;
d. the gradient of each encoder and decoder parameter is computed from the loss function and the parameters are updated, each update step being a multiple of the negative gradient;
e. steps b to d are repeated until the loss function converges.
The loss function $L_m$ consists of two parts:
First-order similarity loss $L_{1st}=\sum_{p=1}^{N}\sum_{g=1}^{N}T_{pg}\,\|z_p-z_g\|_2^2$, where N is the number of nodes, $z_p$ and $z_g$ are the encoder output vectors of node p and node g, and $T_{pg}$ is the weight of the edge between them. For an interaction network, $T_{pg}$ takes only the values 0 (no edge) and 1 (edge); for a similarity network, $T_{pg}$ may take any value between 0 and 1 inclusive. This loss is defined so that drugs or targets with high similarity receive encoded feature vectors that are as similar as possible.
Second-order similarity loss $L_{2nd}=\sum_{n=1}^{N}\|\hat{b}_n-b_n\|_2^2$, where $b_n$ and $\hat{b}_n$ are the encoder input vector and the decoder output vector of node n. This loss is defined so that the decoder reconstructs the original input vector from the encoded vector as closely as possible, so that the encoded vector retains as much information of the original vector as possible.
Total loss function $L_m=L_{2nd}+\lambda L_{1st}$, where λ is a penalty coefficient with $0<\lambda<1$.
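Steps a to e and the loss $L_m$ can be sketched for one network layer as below. This is a minimal NumPy sketch, not the patent's implementation: the single tanh hidden layer, the linear decoder, the initialization, the learning rate and the convergence tolerance are all assumptions, and `train_structural_autoencoder` is a hypothetical name. The first-order term uses the identity $\sum_{p,g}T_{pg}\|z_p-z_g\|^2=\mathrm{tr}(Z^\top MZ)$ with $M=D_{row}+D_{col}-T-T^\top$, which makes its gradient $2MZ$.

```python
import numpy as np


def train_structural_autoencoder(adj, dim=2, lam=0.1, lr=0.01,
                                 max_iter=5000, tol=1e-8, seed=0):
    """Train one autoencoder on a single layer by plain gradient descent.

    Returns the encoder output Z (one row per node) and the loss history.
    """
    rng = np.random.default_rng(seed)
    B = np.asarray(adj, dtype=float)      # step a: adjacency matrix as input
    n = B.shape[0]
    W1 = rng.normal(scale=0.1, size=(n, dim))
    b1 = np.zeros(dim)
    W2 = rng.normal(scale=0.1, size=(dim, n))
    b2 = np.zeros(n)
    T = B                                  # edge weights T_pg
    M = np.diag(T.sum(axis=1) + T.sum(axis=0)) - T - T.T
    history = []
    for _ in range(max_iter):
        Z = np.tanh(B @ W1 + b1)           # step b: encoder output
        B_hat = Z @ W2 + b2                # step c: decoder output
        l2nd = np.sum((B_hat - B) ** 2)
        l1st = np.trace(Z.T @ M @ Z)
        loss = l2nd + lam * l1st           # L_m = L_2nd + lambda * L_1st
        history.append(loss)
        if len(history) > 1 and abs(history[-2] - loss) < tol:
            break                          # step e: loss has converged
        # step d: gradients of L_m, then a step along the negative gradient
        dB_hat = 2.0 * (B_hat - B)
        dZ = dB_hat @ W2.T + lam * 2.0 * (M @ Z)
        dH = dZ * (1.0 - Z ** 2)           # derivative of tanh
        W2 -= lr * (Z.T @ dB_hat)
        b2 -= lr * dB_hat.sum(axis=0)
        W1 -= lr * (B.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return np.tanh(B @ W1 + b1), history


ring = [[0, 1, 0, 1],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [1, 0, 1, 0]]
Z, hist = train_structural_autoencoder(ring)
print(Z.shape)             # (4, 2)
print(hist[-1] < hist[0])  # True
```

A real implementation would typically use deeper encoders and an automatic-differentiation framework; the hand-written gradients here only keep the example dependency-light.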
(3-2) Encoding output: the encoder part of each trained structural autoencoder encodes its corresponding network layer, yielding the multilayer vectors of all drugs and targets.
(3-3) Processing same-class feature vectors:
the multilayer vectors of a drug are concatenated to obtain the final feature vector representation of the drug;
the multilayer vectors of a target are concatenated to obtain the final feature vector representation of the target.
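The concatenation in (3-3) can be sketched as follows. Sorting by layer index is an assumption made only so that every drug (or every target) uses the same fixed layer order; `concat_layer_vectors` is a hypothetical name.

```python
from typing import Dict, List

Vector = List[float]


def concat_layer_vectors(layer_vectors: Dict[int, Vector]) -> Vector:
    """Concatenate the per-layer encodings of one node in layer-index order."""
    out: Vector = []
    for layer in sorted(layer_vectors):
        out.extend(layer_vectors[layer])
    return out


# One drug encoded by three layers with 2-dimensional encodings each.
drug_vec = concat_layer_vectors({2: [0.3, 0.4], 1: [0.1, 0.2], 3: [0.5, 0.6]})
print(drug_vec)  # [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
```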
(4) A model algorithm design module comprising:
(4-1) Constructing training samples: drug target pairs comprise verified pairs and unverified pairs, and the unverified pairs include pairs that objectively interact but have not yet been discovered. The invention aims to find these undiscovered interacting pairs among the unverified ones. It can therefore be assumed that the probability that an unverified drug target pair interacts is not greater than the probability for a verified interacting pair. Based on this assumption, training samples are constructed with a PairWise model: positive samples are drawn from the verified interacting drug target pairs, negative samples are drawn from the unverified pairs, and each positive sample is paired with a corresponding negative sample, giving paired positive and negative training sets of equal size. The data are then randomly divided into M parts for M-fold cross-validation: each time one part is used as the validation set and the rest as the training set, and the model parameters are tuned according to the overall cross-validation performance, where M is a positive integer greater than 3.
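The PairWise sample construction and the M-fold split can be sketched as below. The sketch assumes that each verified pair is matched with one unverified pair drawn uniformly at random without replacement, and that folds are formed by striding over a shuffled index list; both choices, and the function names, are hypothetical.

```python
import random
from typing import List, Set, Tuple

Pair = Tuple[int, int]  # (drug index, target index)


def build_pairwise_samples(known: Set[Pair], n_drugs: int, n_targets: int,
                           seed: int = 0) -> List[Tuple[Pair, Pair]]:
    """Pair each verified interaction (positive) with one unverified pair
    (negative), giving equally many positives and negatives."""
    rng = random.Random(seed)
    unverified = [(d, t) for d in range(n_drugs) for t in range(n_targets)
                  if (d, t) not in known]
    negatives = rng.sample(unverified, len(known))  # without replacement
    return list(zip(sorted(known), negatives))


def m_fold_splits(samples: list, m: int, seed: int = 0):
    """Yield (train, validation) lists for M-fold cross-validation."""
    if m <= 3:
        raise ValueError("M must be a positive integer greater than 3")
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::m] for k in range(m)]
    for k in range(m):
        val = [samples[i] for i in folds[k]]
        train = [samples[i] for j, f in enumerate(folds) if j != k for i in f]
        yield train, val


known = {(0, 0), (1, 1), (2, 0), (3, 2), (4, 1)}
pairs = build_pairwise_samples(known, n_drugs=5, n_targets=3)
print(len(pairs))  # 5
for train, val in m_fold_splits(pairs, m=5):
    print(len(train), len(val))  # 4 1 (printed five times)
```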
(4-2) Training and evaluating the model: a boosting tree is built with a lightweight gradient boosting decision tree (LightGBM), using decision trees as weak learners, i.e., decision trees $T(x,\theta_l)$ are built iteratively, where x and $\theta_l$ are, respectively, the input feature vector and the learnable parameters of the l-th decision tree. The specific process is as follows:
(4-2-1) Before each round of decision tree construction, samples are screened with the gradient-based one-side sampling (GOSS) algorithm: a small portion of the samples with the largest gradients is retained and a portion of the small-gradient samples is selected at random to compute the total variance gain, which reduces the number of samples;
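A minimal sketch of GOSS as described in the LightGBM paper is shown below. The text above does not mention it, but the LightGBM paper additionally multiplies the sampled small-gradient samples by $(1-a)/b$ to keep the gain estimate roughly unbiased; this sketch includes that reweighting as an assumption. The parameter names `a` and `b` and the function name are illustrative.

```python
import random
from typing import List, Tuple


def goss_sample(gradients: List[float], a: float, b: float,
                seed: int = 0) -> List[Tuple[int, float]]:
    """Gradient-based one-side sampling.

    Keeps the top a-fraction of samples by |gradient|, randomly samples a
    b-fraction (of all samples) from the remainder, and weights those
    small-gradient samples by (1 - a) / b. Returns (index, weight) pairs.
    """
    n = len(gradients)
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top_k = int(a * n)
    rand_k = int(b * n)
    large = order[:top_k]
    small = random.Random(seed).sample(order[top_k:], rand_k)
    weight = (1.0 - a) / b
    return [(i, 1.0) for i in large] + [(i, weight) for i in small]


grads = [0.9, -0.05, 0.02, -0.8, 0.01, 0.03, -0.04, 0.5, 0.0, 0.06]
chosen = goss_sample(grads, a=0.2, b=0.2)
print(chosen[:2])   # [(0, 1.0), (3, 1.0)]
print(len(chosen))  # 4
```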
(4-2-2) Before each round of decision tree construction, mutually exclusive features are merged with the Exclusive Feature Bundling (EFB) algorithm, which reduces the feature dimension;
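The idea behind EFB can be illustrated with a simplified greedy sketch. It visits features in order of decreasing non-zero count (LightGBM itself orders by degree in a conflict graph), puts each feature into the first bundle where the number of conflicting rows stays within `max_conflicts`, and then encodes a bundle as one column by shifting each feature's values into its own range. The function names, the ordering rule and the assumption of non-negative (binned) feature values are all simplifications.

```python
from typing import List


def bundle_exclusive_features(X: List[List[float]],
                              max_conflicts: int = 0) -> List[List[int]]:
    """Greedily group columns that are (almost) never non-zero together."""
    n_features = len(X[0])
    nonzero = [{r for r, row in enumerate(X) if row[f] != 0}
               for f in range(n_features)]
    order = sorted(range(n_features), key=lambda f: len(nonzero[f]), reverse=True)
    bundles: List[List[int]] = []
    used_rows: List[set] = []   # rows already occupied in each bundle
    conflicts: List[int] = []
    for f in order:
        for k in range(len(bundles)):
            c = len(used_rows[k] & nonzero[f])
            if conflicts[k] + c <= max_conflicts:
                bundles[k].append(f)
                used_rows[k] |= nonzero[f]
                conflicts[k] += c
                break
        else:  # no compatible bundle: open a new one
            bundles.append([f])
            used_rows.append(set(nonzero[f]))
            conflicts.append(0)
    return bundles


def merge_bundle(X: List[List[float]], bundle: List[int]) -> List[float]:
    """Encode a bundle as one column; each feature's non-zero values are
    shifted by the sum of the maxima of the features before it."""
    merged = [0.0] * len(X)
    offset = 0.0
    for f in bundle:
        for r, row in enumerate(X):
            if row[f] != 0 and merged[r] == 0.0:
                merged[r] = row[f] + offset
        offset += max(row[f] for row in X)
    return merged


X = [[1, 0, 0], [0, 2, 0], [0, 0, 3], [1, 0, 0]]
bundles = bundle_exclusive_features(X)
print(bundles)                    # [[0, 1, 2]]
print(merge_bundle(X, bundles[0]))  # [1.0, 3.0, 6.0, 1.0]
```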
(4-2-3) Based on the screened samples, when the input feature vector x of a sample and its label y are given, the fitting target of the l-th decision tree to be generated is constructed as follows: if l = 1, the fitting target is the label of the sample, where positive samples are labeled 1 and negative samples 0; when l ≥ 2, the fitting target is $-\left[\frac{\partial L(y,F(x))}{\partial F(x)}\right]_{F(x)=F_{l-1}(x)}$, where $F_{l-1}(x)=\sum_{k=1}^{l-1}T(x,\theta_k)$ is the boosting tree obtained after the (l-1)-th iteration and L is the loss function, which, for the binary classification task, is defined on a single sample (x, y) and its predicted value;
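The two cases of the fitting target can be sketched as follows. The loss formula itself is not reproduced in this text, so the sketch assumes the common binary log loss $L=-[y\log p+(1-y)\log(1-p)]$ with $p=\sigma(F)$; under that assumption the negative gradient with respect to F is simply $y-p$. The function names are hypothetical.

```python
import math
from typing import List


def sigmoid(f: float) -> float:
    return 1.0 / (1.0 + math.exp(-f))


def fitting_targets(labels: List[int], raw_scores: List[float], l: int) -> List[float]:
    """Targets for the l-th tree.

    l == 1: the labels themselves (1 = positive pair, 0 = negative pair).
    l >= 2: the negative gradient of the (assumed) log loss evaluated at the
    current raw score F_{l-1}(x), which equals y - sigmoid(F_{l-1}(x)).
    """
    if l == 1:
        return [float(y) for y in labels]
    return [y - sigmoid(f) for y, f in zip(labels, raw_scores)]


print(fitting_targets([1, 0], [0.0, 0.0], l=1))  # [1.0, 0.0]
print(fitting_targets([1, 0], [0.0, 0.0], l=2))  # [0.5, -0.5]
```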
(4-2-4) Based on the screened samples, a binary decision tree is constructed to fit the target. A leaf node of the binary decision tree is split as follows: a histogram is built for each screened feature according to its value range; the variance gain of each split point is computed from the histogram; the feature and split point with the maximum variance gain are selected as the splitting feature and the optimal split point of the current node, and the data of the leaf node are divided into two batches at the optimal split point. The recursion continues until the maximum depth of the tree is reached. The variance gain of feature f on data set D at split point d is expressed as:
where $x_l$, $x_{l,f}$ and $g_l$ denote, respectively, the l-th sample vector, the value of feature f in the l-th sample vector, and its negative gradient; the formula also uses the numbers of samples in D whose feature f is smaller than, and larger than, the split point d.
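Histogram-based split finding for one feature can be sketched as below. Because the variance-gain formula is not reproduced in this text, the sketch assumes the form given in the LightGBM paper, $V(d)=\frac{1}{n}\left(\frac{G_{left}(d)^2}{n_{left}(d)}+\frac{G_{right}(d)^2}{n_{right}(d)}\right)$, where $G$ are sums of negative gradients on each side of d. Equal-width bins and reporting the split point as a bin boundary are further assumptions; `best_split_histogram` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def best_split_histogram(values: List[float], grads: List[float],
                         n_bins: int = 4) -> Optional[Tuple[float, float]]:
    """Return (best split point d, variance gain) for one feature, or None.

    Values below a bin boundary go to the left child, the rest to the right.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        return None
    width = (hi - lo) / n_bins
    g_sum = [0.0] * n_bins   # histogram of negative-gradient sums
    count = [0] * n_bins     # histogram of sample counts
    for v, g in zip(values, grads):
        b = min(int((v - lo) / width), n_bins - 1)  # last bin includes hi
        g_sum[b] += g
        count[b] += 1
    n, total_g = len(values), sum(grads)
    best = None
    g_left, n_left = 0.0, 0
    for b in range(n_bins - 1):  # candidate split after bin b
        g_left += g_sum[b]
        n_left += count[b]
        n_right = n - n_left
        if n_left == 0 or n_right == 0:
            continue
        gain = (g_left ** 2 / n_left + (total_g - g_left) ** 2 / n_right) / n
        if best is None or gain > best[1]:
            best = (lo + (b + 1) * width, gain)
    return best


values = [1, 2, 3, 4, 5, 6, 7, 8]
grads = [-1, -1, -1, -1, 1, 1, 1, 1]
print(best_split_histogram(values, grads))  # (4.5, 1.0)
```

The histogram is accumulated once per feature, so each candidate split costs O(1) instead of a pass over the samples, which is the point of the histogram method.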
(4-2-5) performing K rounds of iteration to generate K decision trees;
(4-2-6) The K decision trees are added together to generate the final lightweight gradient boosting decision tree model H(x). For the input feature vector x of a sample, the output $H(x)\in[0,1]$ can be interpreted as the probability that the input sample is a positive sample;
(4-3) Predicting drug target interactions: using the optimal prediction model obtained by the result evaluation module, the interaction probabilities of all drug target pairs are computed, and the pairs with high probability are selected as candidate interacting drug target pairs, which form the prediction result.
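The screening step can be sketched as ranking pairs by the model probability H(x). The probability threshold, the `top_k` cut-off, and skipping pairs that are already verified are all assumptions for illustration; `score` stands in for the trained model and `screen_candidates` is a hypothetical name.

```python
from typing import Callable, List, Set, Tuple

Pair = Tuple[int, int]  # (drug index, target index)


def screen_candidates(score: Callable[[Pair], float], n_drugs: int,
                      n_targets: int, known: Set[Pair],
                      threshold: float = 0.9,
                      top_k: int = 10) -> List[Tuple[Pair, float]]:
    """Score every not-yet-verified pair and keep at most top_k pairs whose
    predicted interaction probability reaches the threshold."""
    scored = [((d, t), score((d, t)))
              for d in range(n_drugs) for t in range(n_targets)
              if (d, t) not in known]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [item for item in scored if item[1] >= threshold][:top_k]


toy_model = lambda p: 0.95 if p == (1, 2) else 0.1
print(screen_candidates(toy_model, 3, 3, known={(0, 0)}))  # [((1, 2), 0.95)]
```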
(5) The result evaluation module verifies the prediction performance of the model with an ROC curve and a PR curve, as follows:
(5-1) Plotting the ROC curve: plotting the ROC curve requires generating a confusion matrix. The confusion matrix is itself an index for evaluating model results and is part of model evaluation. It is a square matrix that shows how accurate the predictions are: each column represents a predicted class, and the column total is the number of data predicted as that class; each row represents the true class of the data, and the row total is the number of data instances of that class.
The ROC curve is a classifier performance evaluation method adopted from medical analysis and is suited to binary classification problems. The ROC curve is drawn with the false positive rate FPR on the horizontal axis and the true positive rate TPR on the vertical axis; the larger the area under the ROC curve (AUROC), i.e., the closer it is to 1, the better the prediction performance of the model.
The true positive rate $TPR_\alpha$ and false positive rate $FPR_\alpha$ of the ROC curve are computed from the confusion matrix as:
$$TPR_\alpha=\frac{TP_\alpha}{TP_\alpha+FN_\alpha},\qquad FPR_\alpha=\frac{FP_\alpha}{FP_\alpha+TN_\alpha}$$
In drug target prediction, a drug target pair that interacts is a positive sample and a pair that does not is a negative sample. $TP_\alpha$ is the number of positive samples in the test set predicted as positive, $FP_\alpha$ the number of negative samples predicted as positive, $FN_\alpha$ the number of positive samples predicted as negative, and $TN_\alpha$ the number of negative samples predicted as negative; α denotes the prediction confidence, i.e., the threshold at which a pair is predicted positive.
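The ROC computation above can be sketched in pure Python: each distinct score is used as the confidence threshold α, the confusion counts give one $(FPR_\alpha, TPR_\alpha)$ point, and AUROC is taken by the trapezoidal rule. Predicting positive when the score is at least α is an assumed convention, the function names are hypothetical, and the sketch assumes both classes occur in the test set.

```python
from typing import List, Tuple


def confusion_at(scores: List[float], labels: List[int], alpha: float):
    """TP, FP, FN, TN when pairs with score >= alpha are predicted positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= alpha and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= alpha and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < alpha and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < alpha and y == 0)
    return tp, fp, fn, tn


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """(FPR_alpha, TPR_alpha) for every distinct score as alpha, plus (0, 0)."""
    points = {(0.0, 0.0)}
    for alpha in set(scores):
        tp, fp, fn, tn = confusion_at(scores, labels, alpha)
        points.add((fp / (fp + tn), tp / (tp + fn)))
    return sorted(points)


def auroc(scores: List[float], labels: List[int]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    pts = roc_points(scores, labels)
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


print(auroc([0.9, 0.8, 0.7, 0.6], [1, 1, 0, 0]))  # 1.0
print(auroc([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))  # 0.75
```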
(5-2) Drawing the PR curve: drawing the PR curve requires generating a precision-recall sequence made up of the precision $precision_\alpha$ and the recall $recall_\alpha$ at different prediction confidences α, computed as:
$$precision_\alpha=\frac{TP_\alpha}{TP_\alpha+FP_\alpha},\qquad recall_\alpha=\frac{TP_\alpha}{TP_\alpha+FN_\alpha}$$
Precision describes the fraction of samples classified as positive at confidence α that are truly positive, and recall describes the fraction of all positive samples that are correctly classified as positive at confidence α; the two generally change in opposite directions as α varies. The sequence of precision-recall pairs generated by different values of α is therefore used to draw the precision-recall curve (PR curve), with recall on the horizontal axis and precision on the vertical axis. The area under the PR curve (AUPR) reflects the overall classification performance of the classifier: the larger the AUPR, i.e., the closer it is to 1, the better the prediction performance of the model;
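The PR curve and AUPR can be sketched in the same way. The curve is started at recall 0 with the precision of the highest-confidence threshold, and the area is taken by the trapezoidal rule; this is one of several conventions (step-wise average precision is another), chosen here as an assumption. The function names are hypothetical, and at least one positive sample is assumed.

```python
from typing import List, Tuple


def pr_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """(recall_alpha, precision_alpha) for each distinct score as alpha,
    from the highest alpha to the lowest (recall is then non-decreasing)."""
    points = []
    for alpha in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= alpha and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= alpha and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < alpha and y == 1)
        points.append((tp / (tp + fn), tp / (tp + fp)))
    return points


def aupr(scores: List[float], labels: List[int]) -> float:
    """Area under the PR curve by the trapezoidal rule."""
    pts = pr_points(scores, labels)
    pts = [(0.0, pts[0][1])] + pts
    return sum((r1 - r0) * (p0 + p1) / 2
               for (r0, p0), (r1, p1) in zip(pts, pts[1:]))


print(aupr([0.9, 0.8, 0.7, 0.6], [1, 1, 0, 0]))  # 1.0
```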
(5-3) Evaluating the model: based on the prediction results of step (4-3), AUROC and AUPR are computed from the plotted ROC and PR curves, and the model parameters that give the best prediction result are selected.
Screening candidate drugs is a main means by which AI assists new drug development, and the two most critical steps are the computer modeling of drugs and targets (i.e., which data structure is used to represent them) and the choice of prediction model. The method uses two different computer models for drugs and targets at different stages: network nodes and feature vectors. Both data models are described below, taking drugs as the example.
A drug network reflects the relationships between drugs well, and a multilayer network formed from different types of drug networks reflects these relationships from different perspectives, which offers a new approach to drug screening. Specifically, a drug network represents each drug as a node and defines relationships between drugs as edges between nodes. Different types of drug networks define edges differently and therefore express the relationships between drug pairs from different perspectives. In the drug chemical similarity network, for example, the edge weight between a pair of nodes is the chemical structure similarity of the corresponding drugs, and a missing edge means a similarity of 0. When a drug network is constructed, the edge weights are usually normalized to the range 0 to 1.
A feature vector is an array of real numbers, each element of which is a feature value carrying specific information in the application. In this method, the drug feature vector is obtained by encoding the drug network with a structural autoencoder, so the feature values contain the topological information of the network. An autoencoder is a self-supervised representation learning method that converts nodes into feature vectors using only the input (here, a drug network), and the dimension of the feature vectors is far smaller than the number of nodes. Compared with traditional one-hot encoding, this greatly reduces the complexity and sparsity of the data. The structural autoencoder adopted here accounts for both the first-order and the second-order proximity of the network and therefore captures the overall network structure more comprehensively.
Network representation, vector encoding and prediction model training for drugs and targets are the core components on which drug target prediction algorithms differ. The algorithmic model avoids the blindness of manual screening and greatly saves time and money. By integrating information about different aspects of drugs and targets into a uniform data form, and by organizing the pipeline into several relatively independent, clearly defined modules, it provides a feasible paradigm for future drug target prediction, improving prediction accuracy while keeping the algorithm efficient, flexible and extensible.
Claims (8)
1. A drug target interaction prediction method based on a multilayer network and graph coding, comprising a data acquisition module, a data preprocessing module, a feature learning module, a model algorithm design module and a result evaluation module, characterized in that:
(1) the data acquisition module comprises:
(1-1) for drugs, drug-drug interaction data, drug-disease relation data, drug-side-effect relation data and six different types of drug-pair similarity data are collected, the last comprising: chemical fingerprint data of the drugs, therapeutic data of the drugs, peptide chain data of the drugs' action targets, biological process data of the drugs, molecular function data of the drugs and acting cellular component data of the drugs;
(1-2) for targets, i.e., proteins, target-target interaction data, target-disease relation data and four different types of target-pair similarity data are collected, the last comprising: peptide chain data, biological process data, cellular component data and molecular function data of the targets;
(1-3) drug-target interaction data are collected;
(2) the data preprocessing module comprises construction of drug- and target-related networks and generation of multilayer networks;
(2-1) the construction of drug- and target-related networks comprises:
A. for interaction data between objects of a single class, homogeneous interaction networks are constructed: the drug interaction network $G_1^D$ and the target interaction network $G_1^T$;
B. for relation data between objects of different classes, heterogeneous networks are constructed: the drug-disease related network $G_{D\_DI}$, the drug-side-effect related network $G_{D\_SE}$ and the target-disease related network $G_{T\_DI}$;
C. drug information of different dimensions is collected and drug similarity networks are constructed: the chemical similarity network $G_2^D$, the therapeutic similarity network $G_3^D$, the drug action target sequence similarity network $G_4^D$, the biological process similarity network $G_5^D$, the molecular function similarity network $G_6^D$ and the acting cellular component similarity network $G_7^D$;
D. target information of different dimensions is collected and target similarity networks are constructed: the target sequence similarity network $G_2^T$, the target biological process similarity network $G_3^T$, the target cellular component similarity network $G_4^T$ and the target molecular function similarity network $G_5^T$;
E. the drug-target interaction network $G_{D\_T}$ is constructed;
(2-2) generating the multilayer networks comprises generating a drug multilayer network and a target multilayer network, specifically:
(2-2-1) first, the drug-disease related network $G_{D\_DI}$ is decomposed and converted into the drug disease similarity network $G_8^D=(V_8^D,E_8^D)$, where $V_8^D$ and $E_8^D$ are, respectively, the set of drug nodes in the network and the set of edge weights of disease similarity between two drugs; the edge weight of drug disease similarity is $\frac{x_{D\_M}\cdot y_{D\_M}}{\|x_{D\_M}\|\,\|y_{D\_M}\|}$, where $x_{D\_M}$ and $y_{D\_M}$ are the row vectors corresponding to the two drugs in the adjacency matrix of $G_{D\_DI}$ and $\|\cdot\|$ denotes the vector modulus;
the drug-side-effect related network $G_{D\_SE}$ is decomposed and converted into the drug side-effect similarity network $G_9^D=(V_9^D,E_9^D)$, where $V_9^D$ and $E_9^D$ are, respectively, the set of drug nodes in the network and the set of edge weights of side-effect similarity between two drugs; the edge weight of drug side-effect similarity is computed from $x_{D\_SE}$ and $y_{D\_SE}$, the row vectors corresponding to the two drugs in the adjacency matrix of $G_{D\_SE}$;
the target-disease related network $G_{T\_DI}$ is decomposed and converted into the target disease similarity network $G_6^T=(V_6^T,E_6^T)$, where $V_6^T$ and $E_6^T$ are, respectively, the set of target nodes in the network and the set of edge weights of disease similarity between two targets; the edge weight of target disease similarity is computed from $x_{T\_DI}$ and $y_{T\_DI}$, the row vectors corresponding to the two targets in the adjacency matrix of $G_{T\_DI}$;
(2-2-2) then the drug-related networks are combined into the drug multilayer network $G^D=\{G_i^D=(V_i^D,E_i^D)\}$, where i is the drug network index, $i\in[1,9]$; and the target-related networks are combined into the target multilayer network $G^T=\{G_j^T=(V_j^T,E_j^T)\}$, where j is the target network index, $j\in[1,6]$;
(3) the feature learning module comprises training the structural autoencoders, encoding output and processing same-class feature vectors;
(3-1) training the structural autoencoders: one structural autoencoder is trained for each layer of the drug multilayer network $G^D$ and of the target multilayer network $G^T$;
(3-2) encoding output: the encoder part of each trained structural autoencoder encodes its corresponding network layer, yielding the multilayer vectors of all drugs and targets;
(3-3) processing same-class feature vectors: the multilayer vectors of a drug are concatenated to obtain the final feature vector representation of the drug, and the multilayer vectors of a target are concatenated to obtain the final feature vector representation of the target;
(4) the model algorithm design module comprises:
(4-1) constructing training samples: training samples are constructed with a PairWise model, and the data are randomly divided into M parts for M-fold cross-validation, i.e., each time one part is used as the validation set and the rest as the training set, and the model parameters are tuned according to the overall cross-validation performance, where M is a positive integer greater than 3;
(4-2) training and evaluating the model: a boosting tree is built with a lightweight gradient boosting decision tree (LightGBM), using decision trees as weak learners, i.e., decision trees $T(x,\theta_l)$ are built iteratively, where x and $\theta_l$ are, respectively, the input feature vector and the learnable parameters of the l-th decision tree;
(4-3) predicting drug target interactions: using the optimal prediction model obtained by the result evaluation module, the interaction probabilities of all drug target pairs are computed, and the pairs with high probability are selected as candidate interacting drug target pairs, which form the prediction result;
(5) the result evaluation module verifies the prediction performance of the model with an ROC curve and a PR curve, as follows:
(5-1) plotting the ROC curve: the false positive rate FPR is taken as the horizontal axis and the true positive rate TPR as the vertical axis; the larger the area under the ROC curve (AUROC), the better the prediction performance of the model;
the true positive rate $TPR_\alpha$ and false positive rate $FPR_\alpha$ of the ROC curve are computed from the confusion matrix as:
$$TPR_\alpha=\frac{TP_\alpha}{TP_\alpha+FN_\alpha},\qquad FPR_\alpha=\frac{FP_\alpha}{FP_\alpha+TN_\alpha}$$
a drug target pair that interacts is a positive sample and a pair that does not is a negative sample; $TP_\alpha$ is the number of positive samples in the test set predicted as positive, $FP_\alpha$ the number of negative samples predicted as positive, $FN_\alpha$ the number of positive samples predicted as negative, and $TN_\alpha$ the number of negative samples predicted as negative; α denotes the prediction confidence;
(5-2) drawing the PR curve: the precision $precision_\alpha$ and recall $recall_\alpha$ at different prediction confidences α form the precision-recall sequence:
$$precision_\alpha=\frac{TP_\alpha}{TP_\alpha+FP_\alpha},\qquad recall_\alpha=\frac{TP_\alpha}{TP_\alpha+FN_\alpha}$$
the precision-recall curve (PR curve) is drawn with recall on the horizontal axis and precision on the vertical axis; the area under the PR curve (AUPR) reflects the overall classification performance of the classifier, and the larger the AUPR, the better the prediction performance of the model;
(5-3) evaluating the model: based on the prediction results of step (4-3), AUROC and AUPR are computed from the plotted ROC and PR curves, and the model parameters that give the best prediction result are selected.
2. The method for predicting the interaction of a drug target based on multilayer network and graph coding according to claim 1, wherein: in the (2-1), A is specifically:
constructing a drug interaction network G for drug and drug interaction relation data1D=(V1D,E1D),V1DRepresenting a set of drug nodes in the network, E1DA set of edges indicating the presence of interaction between two drugs in the network;
constructing a target interaction network G for the interaction relation data of the target and the target1T=(V1T,E1T),V1TRepresenting a set of target nodes in the network, E1TIndicating a set of edges in the network that have an interaction between two targets.
3. The method for predicting the interaction of a drug target based on multilayer network and graph coding according to claim 1, wherein: in the (2-1), B is specifically:
for the drug-disease relation data, a drug-disease related network $G_{D\_DI}$ is constructed, consisting of a set of drug nodes, a set of disease nodes and a set of edges $E_{D\_DI}$ representing the relations between drugs and diseases in the network;
for the drug-side-effect relation data, a drug-side-effect related network $G_{D\_SE}$ is constructed, consisting of a set of drug nodes, a set of side-effect nodes and a set of edges $E_{D\_SE}$ representing the relations between drugs and side effects in the network;
4. The method for predicting the interaction of a drug target based on multilayer network and graph coding according to claim 1, wherein: c in (2-1) is specifically:
for the chemical fingerprint data of drugs, a drug chemical similarity network $G_2^D=(V_2^D,E_2^D)$ is constructed, where $V_2^D$ and $E_2^D$ are, respectively, the set of drug nodes and the set of edge weights of chemical similarity between two drugs; the edge weight of chemical similarity is computed from $a_1$, $b_1$ and $c_1$, where $a_1$ and $b_1$ are the numbers of bits of the MACCS fingerprints of the two drugs and $c_1$ is the number of bits shared by the two drugs;
for the therapeutic data of drugs, a drug therapeutic similarity network $G_3^D=(V_3^D,E_3^D)$ is constructed, where $V_3^D$ and $E_3^D$ are, respectively, the set of drug nodes in the network and the set of edge weights of therapeutic similarity between two drugs; the edge weight of therapeutic similarity is computed from $a_2$, $b_2$ and $c_2$, where $a_2$ and $b_2$ are the ATC codes of the two drugs and $c_2$ is the number of identical digits in the ATC codes of the two drugs;
for the peptide chain data of drug action targets, a drug action target sequence similarity network $G_4^D=(V_4^D,E_4^D)$ is constructed, where $V_4^D$ and $E_4^D$ are, respectively, the set of drug nodes in the network and the set of edge weights of action target similarity between two drugs; the edge weight of drug action target similarity is $\operatorname{mean}(T_{T\_T}(a,b))$, where a and b denote the respective targets of the two drugs, $T_{T\_T}(a,b)$ is the sequence similarity of those targets and mean(·) denotes the mean;
for the biological process data of drugs, a drug biological process similarity network $G_5^D=(V_5^D,E_5^D)$ is constructed, where $V_5^D$ and $E_5^D$ are, respectively, the set of drug nodes in the network and the set of edge weights of biological process similarity between two drugs; the edge weight is computed from $T_{T\_P}(a,b)$, the biological process similarity of the respective targets of the two drugs;
for the molecular function data of drugs, a drug molecular function similarity network $G_6^D=(V_6^D,E_6^D)$ is constructed, where $V_6^D$ and $E_6^D$ are, respectively, the set of drug nodes in the network and the set of edge weights of molecular function similarity between two drugs; the edge weight is computed from $T_{T\_M}(a,b)$, the molecular function similarity of the respective targets of the two drugs;
for the acting cellular component data of drugs, a drug acting cellular component similarity network $G_7^D=(V_7^D,E_7^D)$ is constructed, where $V_7^D$ and $E_7^D$ are, respectively, the set of drug nodes in the network and the set of edge weights of acting cellular component similarity between two drugs; the edge weight is computed from $T_{T\_C}(a,b)$, the acting cellular component similarity of the respective targets of the two drugs.
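Two of the drug similarity computations in claim 4 can be sketched as below. The chemical similarity formula is not reproduced in this text; the sketch assumes the Tanimoto coefficient $c/(a+b-c)$, which is commonly used with MACCS fingerprints and matches the quantities $a_1$, $b_1$, $c_1$ if they are read as counts of "on" bits. For the target-based drug similarity, the sketch takes mean(T(a, b)) over all pairs of a target of one drug and a target of the other, which is an assumption about how the mean is formed. All names are hypothetical.

```python
from itertools import product
from statistics import mean
from typing import Callable, Iterable, Set


def tanimoto(bits_a: Set[int], bits_b: Set[int]) -> float:
    """Tanimoto coefficient c / (a + b - c) of two fingerprints given as the
    sets of their 'on' bit positions (e.g. MACCS keys)."""
    c = len(bits_a & bits_b)
    denom = len(bits_a) + len(bits_b) - c
    return c / denom if denom else 0.0


def target_based_similarity(targets_a: Iterable[str], targets_b: Iterable[str],
                            target_sim: Callable[[str, str], float]) -> float:
    """Drug-drug edge weight as the mean of target_sim(a, b) over all pairs
    of a target a of the first drug and a target b of the second drug."""
    pairs = list(product(targets_a, targets_b))
    return mean(target_sim(a, b) for a, b in pairs) if pairs else 0.0


print(tanimoto({1, 2, 3}, {2, 3, 4}))  # 0.5
sims = {("P1", "P3"): 0.75, ("P2", "P3"): 0.25}
print(target_based_similarity(["P1", "P2"], ["P3"], lambda a, b: sims[(a, b)]))  # 0.5
```

The same `target_based_similarity` helper would apply to the sequence, biological process, molecular function and cellular component variants by swapping in the corresponding target-level similarity function.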
5. The method for predicting the interaction of a drug target based on multilayer network and graph coding according to claim 1, wherein: in the (2-1), D is specifically:
for the peptide chain data of targets, a target sequence similarity network $G_2^T=(V_2^T,E_2^T)$ is constructed, where $V_2^T$ and $E_2^T$ are, respectively, the set of target nodes and the set of edge weights of sequence similarity between two targets in the network; the edge weight of sequence similarity is computed from $a_3$, $b_3$ and $c_3$, where $a_3$ and $b_3$ are the numbers of peptide chain sequence positions of the two targets and $c_3$ is the number of positions at which the peptide chain sequences of the two targets are identical;
for the biological process data of targets, a target biological process similarity network $G_3^T=(V_3^T,E_3^T)$ is constructed, where $V_3^T$ and $E_3^T$ are, respectively, the set of target nodes in the network and the set of edge weights of biological process similarity between two targets; the edge weight $T_{T\_P}(a,b)$ of target biological process similarity is obtained from the GO semantic annotations of the biological processes of the two targets;
for the cellular component data of targets, a target cellular component similarity network $G_4^T=(V_4^T,E_4^T)$ is constructed, where $V_4^T$ and $E_4^T$ are, respectively, the set of target nodes in the network and the set of edge weights of cellular component similarity between two targets; the edge weight $T_{T\_C}(a,b)$ of target cellular component similarity is obtained from the GO semantic annotations of the cellular components of the two targets;
for the molecular function data of targets, a target molecular function similarity network $G_5^T=(V_5^T,E_5^T)$ is constructed, where $V_5^T$ and $E_5^T$ are, respectively, the set of target nodes in the network and the set of edge weights of molecular function similarity between two targets; the edge weight $T_{T\_M}(a,b)$ of target molecular function similarity is obtained from the GO semantic annotations of the molecular functions of the two targets.
6. The method for predicting the interaction of a drug target based on multilayer network and graph coding according to claim 1, wherein: in the (2-1), E is specifically:
7. The method for predicting the interaction of a drug target based on multilayer network and graph coding according to claim 1, wherein: (3-1) the training process is as follows:
a. the adjacency matrix of the single-layer network is used as the encoder input;
b. the encoder output obtained after encoding is used as the decoder input;
c. the decoder output is obtained after decoding, and the loss function is computed from the adjacency matrix, the encoder output and the decoder output;
d. the gradient of each encoder and decoder parameter is computed from the loss function and the parameters are updated, each update step being a multiple of the negative gradient;
e. steps b to d are repeated until the loss function converges;
the loss function $L_m$ consists of two parts:
first-order similarity loss $L_{1st}=\sum_{p=1}^{N}\sum_{g=1}^{N}T_{pg}\,\|z_p-z_g\|_2^2$, where N is the number of nodes, $z_p$ and $z_g$ are the encoder output vectors of node p and node g, and $T_{pg}$ is the weight of the edge between them;
second-order similarity loss $L_{2nd}=\sum_{n=1}^{N}\|\hat{b}_n-b_n\|_2^2$, where $b_n$ and $\hat{b}_n$ are the encoder input vector and the decoder output vector of node n;
total loss function $L_m=L_{2nd}+\lambda L_{1st}$, where λ is a penalty coefficient with $0<\lambda<1$.
8. The method for predicting the interaction of a drug target based on multilayer network and graph coding according to claim 1, wherein: (4-2) the specific process is as follows:
(4-2-1) before each round of decision tree construction, samples are screened with the gradient-based one-side sampling (GOSS) algorithm, i.e., a small portion of the samples with the largest gradients is retained and a portion of the small-gradient samples is selected at random to compute the total variance gain;
(4-2-2) before each round of decision tree construction, mutually exclusive features are merged with the Exclusive Feature Bundling (EFB) algorithm;
(4-2-3) based on the screened samples, when the input feature vector x of a sample and its label y are given, the fitting target of the l-th decision tree to be generated is constructed as follows: if l = 1, the fitting target is the label of the sample, where positive samples are labeled 1 and negative samples 0; when l ≥ 2, the fitting target is $-\left[\frac{\partial L(y,F(x))}{\partial F(x)}\right]_{F(x)=F_{l-1}(x)}$, where $F_{l-1}(x)=\sum_{k=1}^{l-1}T(x,\theta_k)$ is the boosting tree obtained after the (l-1)-th iteration and L is the loss function, which, for the binary classification task, is defined on a single sample (x, y) and its predicted value;
(4-2-4) based on the screened samples, a binary decision tree is constructed to fit the target, and a leaf node of the binary decision tree is split as follows: a histogram is built for each screened feature according to its value range, the variance gain of each split point is computed from the histogram, the feature and split point with the maximum variance gain are selected as the splitting feature and the optimal split point of the current node, and the data of the leaf node are divided into two batches at the optimal split point; the recursion continues until the maximum depth of the tree is reached; the variance gain of feature f on data set D at split point d is expressed as:
where $x_l$, $x_{l,f}$ and $g_l$ denote, respectively, the l-th sample vector, the value of feature f in the l-th sample vector, and its negative gradient, and the two sample counts in the formula are, respectively, the numbers of samples in data set D whose feature f is smaller than and larger than the split point d;
(4-2-5) performing K rounds of iteration to generate K decision trees;
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110865457.9A CN113571125A (en) | 2021-07-29 | 2021-07-29 | Drug target interaction prediction method based on multilayer network and graph coding |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113571125A (en) | 2021-10-29 |
Family ID: 78169065
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110865457.9A (Pending), CN113571125A (en) | 2021-07-29 | 2021-07-29 | Drug target interaction prediction method based on multilayer network and graph coding |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113571125A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114023464A (en) * | 2021-11-08 | 2022-02-08 | 东北林业大学 | Drug-target interaction prediction method based on supervised synergy map contrast learning |
CN114038499A (en) * | 2021-11-12 | 2022-02-11 | 东南大学 | Traditional Chinese medicine prescription active ingredient group prediction method based on heterogeneous network embedding |
CN114334038A (en) * | 2021-12-31 | 2022-04-12 | 杭州师范大学 | Disease drug prediction method based on heterogeneous network embedded model |
CN114944191A (en) * | 2022-06-21 | 2022-08-26 | 湖南中医药大学 | Component-target interaction prediction method based on web crawler and multi-modal characteristics |
CN114974408A (en) * | 2022-05-26 | 2022-08-30 | 浙江大学 | Construction method, prediction method and device of drug interaction prediction model |
WO2023123168A1 (en) * | 2021-12-30 | 2023-07-06 | Boe Technology Group Co., Ltd. | Method of generating negative sample set for predicting macromolecule-macromolecule interaction, method of predicting macromolecule-macromolecule interaction, method of training model |
- 2021-07-29: application CN202110865457.9A filed in CN (publication CN113571125A; status: active, pending)
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114023464A (en) * | 2021-11-08 | 2022-02-08 | 东北林业大学 | Drug-target interaction prediction method based on supervised synergy map contrast learning |
CN114023464B (en) * | 2021-11-08 | 2022-08-09 | 东北林业大学 | Drug-target interaction prediction method based on supervised synergy map contrast learning |
CN114038499A (en) * | 2021-11-12 | 2022-02-11 | 东南大学 | Traditional Chinese medicine prescription active ingredient group prediction method based on heterogeneous network embedding |
WO2023123168A1 (en) * | 2021-12-30 | 2023-07-06 | Boe Technology Group Co., Ltd. | Method of generating negative sample set for predicting macromolecule-macromolecule interaction, method of predicting macromolecule-macromolecule interaction, method of training model |
CN114334038A (en) * | 2021-12-31 | 2022-04-12 | 杭州师范大学 | Disease drug prediction method based on heterogeneous network embedded model |
CN114334038B (en) * | 2021-12-31 | 2024-05-14 | 杭州师范大学 | Disease medicine prediction method based on heterogeneous network embedded model |
CN114974408A (en) * | 2022-05-26 | 2022-08-30 | 浙江大学 | Construction method, prediction method and device of drug interaction prediction model |
CN114944191A (en) * | 2022-06-21 | 2022-08-26 | 湖南中医药大学 | Component-target interaction prediction method based on web crawler and multi-modal characteristics |
Similar Documents
Publication | Title |
---|---|
CN113571125A (en) | Drug target interaction prediction method based on multilayer network and graph coding |
CN111312329B (en) | Transcription factor binding site prediction method based on deep convolution automatic encoder |
CN113327644B (en) | Drug-target interaction prediction method based on deep embedding learning of graph and sequence |
CN110110324B (en) | Biomedical entity linking method based on knowledge representation |
CN113936735A (en) | Method for predicting binding affinity of drug molecules and target protein |
Yu | Three principles of data science: predictability, computability, and stability (PCS) |
CN113393911B (en) | Ligand compound rapid pre-screening method based on deep learning |
CN111681718B (en) | Medicine relocation method based on deep learning multi-source heterogeneous network |
CN111370073B (en) | Medicine interaction rule prediction method based on deep learning |
CN115526246A (en) | Self-supervision molecular classification method based on deep learning model |
CN116798652A (en) | Anticancer drug response prediction method based on multitasking learning |
CN115985520A (en) | Medicine disease incidence relation prediction method based on graph regularization matrix decomposition |
Ma et al. | Heuristics and metaheuristics for biological network alignment: A review |
CN114021584A (en) | Knowledge representation learning method based on graph convolution network and translation model |
CN116646001B (en) | Method for predicting drug target binding based on combined cross-domain attention model |
CN115458046B (en) | Method for predicting drug target binding property based on parallel deep fine granularity model |
CN114999566B (en) | Drug repositioning method and system based on word vector characterization and attention mechanism |
CN116978464A (en) | Data processing method, device, equipment and medium |
CN116312808A (en) | TransGAT-based drug-target interaction prediction method |
CN112735604B (en) | Novel coronavirus classification method based on deep learning algorithm |
CN114970684A (en) | Community detection method for extracting network core structure by combining VAE |
Abd Elaziz et al. | Quantum artificial hummingbird algorithm for feature selection of social IoT |
CN117976047B (en) | Key protein prediction method based on deep learning |
Halsana et al. | DensePPI: A Novel Image-Based Deep Learning Method for Prediction of Protein–Protein Interactions |
Zhang et al. | Enhanced Gradient for Differentiable Architecture Search |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |