CN113223655A - Drug-disease association prediction method based on a variational autoencoder - Google Patents

Publication number
CN113223655A
Authority
CN
China
Prior art keywords
drug
disease
encoder
association
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110496613.9A
Other languages
Chinese (zh)
Other versions
CN113223655B (en)
Inventor
鱼亮
陈生建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN202110496613.9A
Publication of CN113223655A
Application granted
Publication of CN113223655B
Legal status: Active

Classifications

    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00 - ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/10 - ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 - ICT programming tools or database systems specially adapted for bioinformatics
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A - TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 - Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 - Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Physics & Mathematics (AREA)
  • Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Medicinal Chemistry (AREA)
  • Bioethics (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention provides a drug-disease association prediction method based on a variational autoencoder, which mainly addresses the low accuracy of drug-disease association prediction in the prior art. The method comprises the following steps: (1) constructing a drug-disease association matrix A and a disease-drug association matrix B; (2) constructing a drug feature matrix C and a disease feature matrix D; (3) constructing a drug-disease association prediction model H based on variational autoencoders; (4) iteratively training the model H; (5) obtaining the drug-disease association prediction result Y. The method reduces the influence of noise and missing data on the prediction result, fully extracts the implicit information in complex data, effectively improves the accuracy of drug-disease association prediction, and can be used to recommend drug candidates for drug repositioning.

Description

Drug-disease association prediction method based on a variational autoencoder
Technical Field
The invention belongs to the technical field of bioinformatics and relates to a drug-disease association prediction method, in particular to one based on a variational autoencoder, which can be used to recommend candidate new therapeutic uses of existing drugs in drug repositioning.
Background
The purpose of drug repositioning is to identify new uses for existing drugs. Compared with traditional drug development, it greatly reduces risk and saves cost and time, and has therefore attracted wide attention; new indications of existing drugs accounted for 20% of the 84 drugs brought to market in 2013. In recent years, non-profit organizations, academic institutions, and governments have placed increasing emphasis on drug repositioning and provided substantial financial support for it. For example, the National Center for Advancing Translational Sciences (NCATS) and the UK Medical Research Council have launched several major funding programs in the field, with the goal of extending drug molecules that have already undergone significant research and development by the pharmaceutical industry to new indications. In addition, the U.S. Food and Drug Administration (FDA) has created several public databases dedicated to computational drug repositioning, which greatly assists the field.
Identifying drug-disease associations can provide important information for drug discovery and drug repositioning. Because manual investigation is time-consuming, a large number of computational methods have been proposed as high-throughput techniques have developed and databases have been continuously updated.
In 2016, Luo et al. published the paper "Drug repositioning based on comprehensive similarity measures and Bi-Random walk algorithm" in Bioinformatics, disclosing MBiRW, a drug-disease association prediction method based on comprehensive similarity measures and a bi-random walk. Based on the assumption that similar drugs are usually associated with similar diseases and vice versa, the method identifies potential new indications for a given drug. By combining drug or disease feature information with known drug-disease association information, a comprehensive similarity measure is established to compute drug similarity and disease similarity. A drug similarity network and a disease similarity network are then constructed and integrated, together with the known drug-disease associations, into a heterogeneous network. On this drug-disease heterogeneous network, a bi-random walk algorithm is used to predict new potential drug-disease associations.
In 2018, Luo et al. published the paper "Computational drug repositioning using low-rank matrix approximation and randomized algorithms" in Bioinformatics, disclosing DRRS, a drug-disease association prediction method using low-rank matrix approximation and randomized algorithms, which predicts new drug indications by integrating data on drugs and diseases. First, a heterogeneous drug-disease interaction network is constructed by integrating the drug-drug, disease-disease, and drug-disease networks. The heterogeneous network is represented by a large drug-disease adjacency matrix whose entries include drug pairs, disease pairs, known drug-disease interaction pairs, and unknown drug-disease pairs. Then the fast singular value thresholding (SVT) algorithm is used to complete the drug-disease adjacency matrix with predicted scores for the unknown drug-disease pairs.
However, the above algorithms assume a noise-free environment by default and handle sparse data poorly, i.e., their robustness to interference is weak. They also have difficulty learning the deep information in complex data and cannot sufficiently extract its implicit information.
Disclosure of Invention
The object of the invention is to overcome the above defects of the prior art by providing a drug-disease association prediction method based on a variational autoencoder, so as to solve the problem of low drug-disease association prediction accuracy in the prior art.
To achieve this object, the technical solution adopted by the invention comprises the following steps:
(1) Constructing a drug-disease association matrix A and a disease-drug association matrix B:
(1a) Obtaining from a database $K$ drug-disease association records $E=\{E_1,E_2,\ldots,E_k,\ldots,E_K\}$ between $M$ drugs $S=\{S_1,S_2,\ldots,S_m,\ldots,S_M\}$ and $N$ diseases $T=\{T_1,T_2,\ldots,T_n,\ldots,T_N\}$, where each drug $S_m$ is associated with at least one disease and each disease $T_n$ is associated with at least one drug; $K\ge 1000$, $M\ge 100$, $N\ge 200$; $S_m$ denotes the $m$-th drug, $1\le m\le M$; $T_n$ denotes the $n$-th disease, $1\le n\le N$; and $E_k$ denotes the $k$-th drug-disease association;
(1b) Constructing a drug-disease association matrix $A$ of size $M\times N$ whose element $A_{mn}$ in row $m$ and column $n$ is 0 or 1, and transposing $A$ to obtain the disease-drug association matrix $B$, where $A_{mn}=0$ indicates that the association between the $m$-th drug and the $n$-th disease is not in the drug-disease association data $E$, and $A_{mn}=1$ indicates that this association is in $E$;
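As an illustration of step (1b), the following minimal sketch builds a binary association matrix from a list of index pairs and transposes it. The function name `build_association_matrices`, the 0-based indexing, and the toy pairs are assumptions made for this example and do not come from the patent.

```python
import numpy as np

def build_association_matrices(pairs, num_drugs, num_diseases):
    """Return (A, B) where A[m, n] = 1 for each known (drug m, disease n) pair and B = A^T."""
    A = np.zeros((num_drugs, num_diseases), dtype=np.int8)
    for m, n in pairs:
        A[m, n] = 1          # association present in the data E
    B = A.T.copy()           # disease-drug association matrix
    return A, B

# Hypothetical toy data: 3 drugs, 4 diseases, 0-based (drug, disease) indices.
E = [(0, 1), (1, 0), (1, 3), (2, 2)]
A, B = build_association_matrices(E, num_drugs=3, num_diseases=4)
print(A)
print(B.shape)
```

Entries that stay 0 only mean that the pair is absent from E, which is why the method later treats them as "non-positive" rather than as confirmed negatives.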
(2) Constructing a drug feature matrix C and a disease feature matrix D:
(2a) Obtaining from a database $Q$ drug-gene association records $R=\{R_1,R_2,\ldots,R_q,\ldots,R_Q\}$ between the $M$ drugs $S=\{S_1,S_2,\ldots,S_m,\ldots,S_M\}$ and $P$ genes $G=\{G_1,G_2,\ldots,G_p,\ldots,G_P\}$, where each drug $S_m$ is associated with at least one gene and each gene $G_p$ is associated with at least one drug; constructing a drug-gene association matrix $C'$ of size $M\times P$ whose element $C'_{mp}$ in row $m$ and column $p$ is 0 or 1, where $C'_{mp}=0$ indicates that the association between the $m$-th drug and the $p$-th gene is not in the drug-gene association data $R$, and $C'_{mp}=1$ indicates that this association is in $R$; $P\ge 200$, $Q\ge 1000$, $1\le m\le M$, $1\le p\le P$; $G_p$ denotes the $p$-th gene and $R_q$ denotes the $q$-th drug-gene association;
(2b) Obtaining from a database $J$ disease-gene association records $U=\{U_1,U_2,\ldots,U_j,\ldots,U_J\}$ between the $N$ diseases $T=\{T_1,T_2,\ldots,T_n,\ldots,T_N\}$ and $O$ genes $G=\{G_1,G_2,\ldots,G_o,\ldots,G_O\}$, where each disease $T_n$ is associated with at least one gene and each gene $G_o$ is associated with at least one disease; constructing a disease-gene association matrix $D'$ of size $N\times O$ whose element $D'_{no}$ in row $n$ and column $o$ is 0 or 1, where $D'_{no}=0$ indicates that the association between the $n$-th disease and the $o$-th gene is not in the disease-gene association data $U$, and $D'_{no}=1$ indicates that this association is in $U$; $O\ge 200$, $J\ge 1000$, $1\le n\le N$, $1\le o\le O$; $U_j$ denotes the $j$-th disease-gene association;
(2c) Reducing the dimensions of $C'$ (size $M\times P$) and $D'$ (size $N\times O$) to obtain a drug feature matrix $C$ of size $M\times V$ and a disease feature matrix $D$ of size $N\times W$, where each row of $C$ is the feature vector of the drug in that row, each row of $D$ is the feature vector of the disease in that row, $1\le V\le P$, and $1\le W\le O$;
(3) Constructing a drug-disease association prediction model H based on variational autoencoders:
(3a) Constructing the structure of the drug-disease association prediction model H:
Constructing a drug-disease association prediction model comprising a first variational autoencoder $f_1$ and a second variational autoencoder $f_2$ arranged in parallel. The first variational autoencoder $f_1$ is a neural network comprising, connected in series, a first encoder $f_e^1$, a first latent variable layer $f_z^1$, and a first decoder $f_d^1$; $f_e^1$ comprises a plurality of fully connected layers and a mean-variance layer; $f_z^1$ is connected to a first data fusion module; $f_d^1$ comprises a plurality of fully connected layers and an output layer with a sigmoid activation function; the weight parameter of $f_1$ is [Figure BDA0003054669440000032]. The second variational autoencoder $f_2$ comprises, connected in series, a second encoder $f_e^2$, a second latent variable layer $f_z^2$, and a second decoder $f_d^2$; $f_e^2$ comprises a plurality of fully connected layers and a mean-variance layer; $f_z^2$ is connected to a second data fusion module; $f_d^2$ comprises a plurality of fully connected layers and an output layer with a sigmoid activation function; the weight parameter of $f_2$ is [Figure BDA0003054669440000033];
(3b) Defining the loss function Loss1 of the first variational autoencoder $f_1$ and the loss function Loss2 of the second variational autoencoder $f_2$:
Figure BDA0003054669440000031
Figure BDA0003054669440000041
Figure BDA0003054669440000042
Figure BDA0003054669440000043
where $x$ denotes the input data of $f_1$; [Figure BDA0003054669440000044] denotes the prediction result of $f_1$; [Figure BDA0003054669440000045] $L_{re}$ denotes the reconstruction loss of $f_1$; $PO_x$ denotes the set of elements of $x$ whose value is 1, $PO_x=\{x_i \mid x_i=1,\ 1\le i\le N\}$; $NP_x$ denotes the set of elements of $x$ whose value is 0, $NP_x=\{x_j \mid x_j=0,\ 1\le j\le N\}$; $x_i$ and $x_j$ denote the $i$-th and $j$-th elements of $x$; $\beta\in[0,1]$ denotes the non-positive loss attenuation factor, where non-positive means that the association is not among the known associations; [Figure BDA0003054669440000046] denotes the normal distribution with mean $\mu_x$ and variance [Figure BDA0003054669440000047]; $N(0,1)$ denotes the standard normal distribution; [Figure BDA0003054669440000048] denotes the relative entropy between [Figure BDA0003054669440000049] and $N(0,1)$; [Figure BDA00030546694400000410] $\mu_x$ and $\delta_x$ denote the outputs of the mean-variance layer of $f_e^1$ when the input of $f_1$ is $x$; $\alpha\in[0,1]$ denotes the relative entropy loss attenuation factor; $y$ denotes the input data of $f_2$; [Figure BDA00030546694400000411] denotes the prediction result of $f_2$;
Figure BDA00030546694400000412
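The loss formulas themselves are only available as images here, so the following is a hedged sketch of one loss that is consistent with the symbol definitions above, not the patent's exact formula. It assumes a squared-error reconstruction term in which errors on non-positive entries are scaled by beta, plus alpha times the closed-form relative entropy between N(mu, var) and N(0, 1). The function names and the squared-error choice are assumptions.

```python
import numpy as np

def reconstruction_loss(x, x_hat, beta):
    """Squared error on known associations plus beta-weighted error on non-positive entries (assumed form)."""
    positive = x == 1
    err = (x - x_hat) ** 2
    return err[positive].sum() + beta * err[~positive].sum()

def kl_to_standard_normal(mu, var):
    """Closed-form KL( N(mu, var) || N(0, 1) ), summed over latent dimensions."""
    return 0.5 * np.sum(mu ** 2 + var - np.log(var) - 1.0)

def vae_loss(x, x_hat, mu, var, alpha, beta):
    """Reconstruction loss plus alpha-weighted relative entropy (assumed combination)."""
    return reconstruction_loss(x, x_hat, beta) + alpha * kl_to_standard_normal(mu, var)

# Hypothetical toy values.
x = np.array([1.0, 0.0, 0.0, 1.0])
x_hat = np.array([0.9, 0.2, 0.1, 0.6])
mu = np.zeros(3)
var = np.ones(3)
print(vae_loss(x, x_hat, mu, var, alpha=0.5, beta=0.5))
```

Setting beta below 1 down-weights errors on entries that are merely unobserved, which matches the stated role of the non-positive loss attenuation factor.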
(4) Iteratively training the drug-disease association prediction model H:
(4a) Initializing the iteration counter as $i$ and the maximum number of iterations as $I$, $I\ge 300$; at the $i$-th iteration the weight parameter of the first variational autoencoder $f_1$ is [Figure BDA00030546694400000413] and the weight parameter of the second variational autoencoder $f_2$ is [Figure BDA00030546694400000414]; and letting $i=0$, [Figure BDA00030546694400000415], [Figure BDA00030546694400000416];
(4b) Taking the drug-disease association matrix A and the drug feature matrix C as the input of the first variational autoencoder $f_1$ in the model H: the first encoder $f_e^1$ encodes A row by row; the first latent variable layer $f_z^1$ samples from the normal distribution [Figure BDA00030546694400000419] constructed from the mean [Figure BDA00030546694400000417] and the variance [Figure BDA00030546694400000418] encoded by $f_e^1$; the first data fusion module adds the $V$-dimensional latent variable [Figure BDA00030546694400000420] sampled by $f_z^1$ to the corresponding row of the drug feature matrix C; and the first decoder $f_d^1$ decodes the fusion result [Figure BDA00030546694400000421] of the first data fusion module to obtain the predicted drug-disease association matrix [Figure BDA00030546694400000422];
(4c) Taking the disease-drug association matrix B and the disease feature matrix D as the input of the second variational autoencoder $f_2$ in the model H: the second encoder $f_e^2$ encodes B row by row; the second latent variable layer $f_z^2$ samples from the normal distribution [Figure BDA00030546694400000425] constructed from the mean [Figure BDA00030546694400000423] and the variance [Figure BDA00030546694400000424] encoded by $f_e^2$; the second data fusion module adds the $W$-dimensional latent variable [Figure BDA00030546694400000426] sampled by $f_z^2$ to the corresponding row of the disease feature matrix D; and the second decoder $f_d^2$ decodes the fusion result [Figure BDA0003054669440000051] of the second data fusion module to obtain the predicted disease-drug association matrix [Figure BDA0003054669440000052];
(4d) Using the loss function Loss1 to compute, from [Figure BDA0003054669440000053], A, and [Figure BDA0003054669440000054], the loss value $L1_i$ of the first variational autoencoder $f_1$ in H, and using the loss function Loss2 to compute, from [Figure BDA0003054669440000055], B, and [Figure BDA0003054669440000056], the loss value $L2_i$ of the second variational autoencoder $f_2$ in H;
(4e) Using back-propagation to compute the parameter gradient of $f_1$ from $L1_i$, and then using a gradient descent algorithm to update the weight parameter [Figure BDA0003054669440000057] of $f_1$ with this gradient; likewise, using back-propagation to compute the parameter gradient of $f_2$ from $L2_i$, and then using a gradient descent algorithm to update the weight parameter [Figure BDA0003054669440000058] of $f_2$ with this gradient;
(4f) Determining whether $i\ge I$ holds; if so, the trained drug-disease association prediction model H' is obtained; otherwise, letting $i=i+1$ and returning to step (4b);
(5) Obtaining the drug-disease association prediction result Y:
Taking the drug-disease association matrix A and the drug feature matrix C as the input of the first variational autoencoder $f_1$ in the trained drug-disease association prediction model H' and propagating forward to obtain the drug-disease association set $Y_1$ predicted by $f_1$; at the same time, taking the disease-drug association matrix B and the disease feature matrix D as the input of the second variational autoencoder $f_2$ in H' and propagating forward to obtain the drug-disease association set $Y_2$ predicted by $f_2$; the intersection $Y=Y_1\cap Y_2$ is the drug-disease association prediction result, where $\cap$ denotes intersection.
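The intersection in step (5) can be sketched as follows. The text does not say how a score matrix is turned into a set of associations, so the thresholding rule, the `threshold` value of 0.5, and the function names are assumptions made for illustration only.

```python
import numpy as np

def predicted_pairs(scores, threshold):
    """Set of (drug, disease) index pairs whose score reaches the threshold (assumed rule)."""
    rows, cols = np.nonzero(scores >= threshold)
    return set(zip(rows.tolist(), cols.tolist()))

def combine_predictions(drug_disease_scores, disease_drug_scores, threshold=0.5):
    """Y = Y1 ∩ Y2, with the disease-drug scores transposed into drug-disease orientation."""
    Y1 = predicted_pairs(drug_disease_scores, threshold)
    Y2 = predicted_pairs(disease_drug_scores.T, threshold)
    return Y1 & Y2

# Hypothetical scores for 2 drugs and 2 diseases.
P1 = np.array([[0.9, 0.2],
               [0.6, 0.7]])   # output of f_1: rows are drugs
P2 = np.array([[0.8, 0.3],
               [0.1, 0.9]])   # output of f_2: rows are diseases
Y = combine_predictions(P1, P2)
print(sorted(Y))
```

Taking the intersection keeps only pairs that both autoencoders support, which trades some recall for fewer false positives.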
Compared with the prior art, the invention has the following advantages:
1. The drug-disease association prediction model constructed by the invention comprises two variational autoencoders arranged in parallel and two data fusion modules. During iterative training of the model and when obtaining the drug-disease association result, the two data fusion modules fuse multiple kinds of information related to drugs and diseases, so that the implicit information in complex data is fully extracted.
2. The drug-disease association prediction model constructed by the invention learns the distribution of the data rather than a single fixed feature representation of the data, so the influence of noise and missing data on the prediction result is reduced, and the drug-disease association prediction accuracy is further improved compared with the prior art.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and a specific example:
Referring to FIG. 1, this example includes the following steps:
Step 1) Constructing a drug-disease association matrix A and a disease-drug association matrix B:
Step 1a) Obtaining from a database $K$ drug-disease association records $E=\{E_1,E_2,\ldots,E_k,\ldots,E_K\}$ between $M$ drugs $S=\{S_1,S_2,\ldots,S_m,\ldots,S_M\}$ and $N$ diseases $T=\{T_1,T_2,\ldots,T_n,\ldots,T_N\}$, where each drug $S_m$ is associated with at least one disease and each disease $T_n$ is associated with at least one drug; in this example, $K=2352$, $M=663$, $N=409$; $S_m$ denotes the $m$-th drug, $1\le m\le M$; $T_n$ denotes the $n$-th disease, $1\le n\le N$; and $E_k$ denotes the $k$-th drug-disease association;
Step 1b) Constructing a drug-disease association matrix $A$ of size $M\times N$ whose element $A_{mn}$ in row $m$ and column $n$ is 0 or 1, and transposing $A$ to obtain the disease-drug association matrix $B$, where $A_{mn}=0$ indicates that the association between the $m$-th drug and the $n$-th disease is not in the drug-disease association data $E$, and $A_{mn}=1$ indicates that this association is in $E$.
Step 2) Constructing a drug feature matrix C and a disease feature matrix D:
The drug feature matrix C and the disease feature matrix D of this example are obtained from the drug similarity matrix C' and the disease similarity matrix D', which are taken directly from the paper "Drug repositioning based on comprehensive similarity measures and Bi-Random walk algorithm" published by Luo et al. in Bioinformatics in 2016; C' has size 663 × 663 and D' has size 409 × 409. This example uses principal component analysis to reduce C' and D' to sizes 663 × 10 and 409 × 10, respectively, with the following steps:
Step 2a) Subtracting from each column of the 663 × 663 drug similarity matrix C' the mean of that column, and from each column of the 409 × 409 disease similarity matrix D' the mean of that column, to obtain the centered drug similarity matrix $C'_1$ and the centered disease similarity matrix $D'_1$;
Step 2b) Computing the covariance matrices of $C'_1$ and $D'_1$ to obtain a 663 × 663 covariance matrix [Figure BDA0003054669440000061] and a 409 × 409 covariance matrix [Figure BDA0003054669440000062];
Step 2c) Performing eigenvalue decomposition on [Figure BDA0003054669440000063] and [Figure BDA0003054669440000064] respectively, to obtain the 663 eigenvalues and 663 eigenvectors of [Figure BDA0003054669440000065] and the 409 eigenvalues and 409 eigenvectors of [Figure BDA0003054669440000066];
Step 2d) Sorting the 663 eigenvalues of [Figure BDA0003054669440000071] in descending order and selecting the first 10; taking the 10 eigenvectors, among the 663 eigenvectors of [Figure BDA0003054669440000072], that correspond to these 10 eigenvalues as column vectors to form the eigenvector matrix [Figure BDA0003054669440000073]; the product of [Figure BDA0003054669440000074] and $C'_1$ is the drug feature matrix C of size 663 × 10. Likewise, sorting the 409 eigenvalues of [Figure BDA0003054669440000075] in descending order and selecting the first 10; taking the 10 eigenvectors, among the 409 eigenvectors of [Figure BDA0003054669440000076], that correspond to these 10 eigenvalues as column vectors to form the eigenvector matrix [Figure BDA0003054669440000077]; the product of [Figure BDA0003054669440000078] and $D'_1$ is the disease feature matrix D of size 409 × 10.
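Steps 2a) to 2d) can be sketched with NumPy as follows. This is an illustrative sketch only: the function name `pca_reduce` and the small random symmetric matrix standing in for a similarity matrix are assumptions, and the real matrices C' and D' are not reproduced here.

```python
import numpy as np

def pca_reduce(X, k):
    """Project the rows of X onto the top-k eigenvectors of its column covariance matrix."""
    Xc = X - X.mean(axis=0)                    # step 2a: centre each column
    cov = np.cov(Xc, rowvar=False)             # step 2b: d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # step 2c: eigendecomposition (symmetric input)
    order = np.argsort(eigvals)[::-1][:k]      # step 2d: indices of the k largest eigenvalues
    W = eigvecs[:, order]                      # d x k eigenvector matrix
    return Xc @ W                              # n x k feature matrix

# Hypothetical stand-in for a 20 x 20 similarity matrix.
rng = np.random.default_rng(0)
S = rng.random((20, 20))
S = (S + S.T) / 2.0
C = pca_reduce(S, k=5)
print(C.shape)
```

With the sizes of this example, `pca_reduce(C_prime, 10)` would map a 663 × 663 matrix to 663 × 10. `np.linalg.eigh` is used because a covariance matrix is symmetric, so its eigenvalues are real.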
Step 3) Constructing a drug-disease association prediction model H based on variational autoencoders:
Step 3a) Constructing the structure of the drug-disease association prediction model H:
Constructing a drug-disease association prediction model H comprising a first variational autoencoder $f_1$ and a second variational autoencoder $f_2$ arranged in parallel, together with a first data fusion module and a second data fusion module. The first variational autoencoder $f_1$ comprises, connected in series, a first encoder $f_e^1$, a first latent variable layer $f_z^1$, and a first decoder $f_d^1$; $f_e^1$ comprises a plurality of fully connected layers and a mean-variance layer; $f_d^1$ comprises a plurality of fully connected layers and an output layer with a sigmoid activation function; the weight parameter of $f_1$ is [Figure BDA0003054669440000079]. The second variational autoencoder $f_2$ comprises, connected in series, a second encoder $f_e^2$, a second latent variable layer $f_z^2$, and a second decoder $f_d^2$; $f_e^2$ comprises a plurality of fully connected layers and a mean-variance layer; $f_d^2$ comprises a plurality of fully connected layers and an output layer with a sigmoid activation function; the weight parameter of $f_2$ is [Figure BDA00030546694400000710]. The first data fusion module is connected to the output of $f_z^1$, and the second data fusion module is connected to the output of $f_z^2$;
The first encoder $f_e^1$ comprises one fully connected layer and a mean-variance layer. The fully connected layer has input dimension 663 and output dimension 50. The mean-variance layer is divided into two parallel branches: one branch takes the output of the preceding layer as input and passes it through a fully connected layer whose output is used as the mean, with input and output dimensions 50 and 10; the other branch also takes the output of the preceding layer as input and passes it through another fully connected layer whose output is used as the variance, with input and output dimensions 50 and 10. The second encoder $f_e^2$ has the same structure, except that its fully connected layer has input dimension 409 and output dimension 50; its mean branch and variance branch each have input and output dimensions 50 and 10;
The first decoder $f_d^1$ comprises one fully connected layer and an output layer with a sigmoid activation function; the fully connected layer has input dimension 10 and output dimension 50, and the output layer has input dimension 50 and output dimension 663. The second decoder $f_d^2$ comprises one fully connected layer and an output layer with a sigmoid activation function; the fully connected layer has input dimension 10 and output dimension 50, and the output layer has input dimension 50 and output dimension 409;
The drug-disease association prediction model constructed by the invention comprises two variational autoencoders arranged in parallel and two data fusion modules. During iterative training of the model and when obtaining the drug-disease association result, the two data fusion modules fuse multiple kinds of information related to drugs and diseases, so that the implicit information in complex data is fully extracted. In addition, the model learns the distribution of the data rather than a single fixed feature representation of the data, which reduces the influence of noise and missing data on the prediction result.
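The layer sizes above can be illustrated with a forward pass through one such autoencoder. This is a sketch under stated assumptions, not the patented implementation: the hidden-layer activation (ReLU), the weight initialization, treating the variance branch as a log-variance, and sampling as mean plus scaled standard normal noise are all choices made for this example, and the names `init_vae` and `forward` are hypothetical.

```python
import numpy as np

def init_vae(n_in, hidden=50, latent=10, seed=0):
    """Randomly initialise the weights of one encoder/decoder pair (sizes from the example)."""
    rng = np.random.default_rng(seed)
    def dense(a, b):
        return rng.normal(0.0, 0.05, size=(a, b)), np.zeros(b)
    return {
        "enc": dense(n_in, hidden),        # fully connected layer, n_in -> 50
        "mean": dense(hidden, latent),     # mean branch, 50 -> 10
        "logvar": dense(hidden, latent),   # variance branch, 50 -> 10 (log-variance assumed)
        "dec": dense(latent, hidden),      # decoder fully connected layer, 10 -> 50
        "out": dense(hidden, n_in),        # sigmoid output layer, 50 -> n_in
    }

def affine(x, layer):
    W, b = layer
    return x @ W + b

def forward(params, x, features, rng):
    """Encode x, sample a latent vector, add the side features, and decode."""
    h = np.maximum(affine(x, params["enc"]), 0.0)           # ReLU is an assumption
    mu = affine(h, params["mean"])
    logvar = affine(h, params["logvar"])
    eps = rng.standard_normal(mu.shape)                     # noise from N(0, 1)
    z = mu + np.exp(0.5 * logvar) * eps                     # sampled latent variable
    fused = z + features                                    # additive data fusion
    h_dec = np.maximum(affine(fused, params["dec"]), 0.0)
    x_hat = 1.0 / (1.0 + np.exp(-affine(h_dec, params["out"])))
    return x_hat, mu, logvar

# Hypothetical mini-batch of 8 binary input vectors of width 663 with 10-d features.
rng = np.random.default_rng(1)
params = init_vae(n_in=663)
x = (rng.random((8, 663)) < 0.01).astype(float)
features = rng.normal(size=(8, 10))
x_hat, mu, logvar = forward(params, x, features, rng)
print(x_hat.shape, mu.shape)
```

Because the fusion is additive, the latent dimension must equal the feature dimension, which is why both are 10 in this example.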
Step 3b) defining a first variational autocoder f1Loss function Loss1 and second variational self-encoder f2Loss function Loss 2:
Figure BDA0003054669440000081
Figure BDA0003054669440000082
Figure BDA0003054669440000083
Figure BDA0003054669440000084
where $x$ denotes the input data of $f_1$, and the symbol in Figure BDA0003054669440000085 denotes the prediction result of $f_1$ (see Figure BDA0003054669440000086);
$L_{re}$ denotes the reconstruction loss of $f_1$; $P$ denotes the set of elements of $x$ whose value is 1, $P = \{x_i \mid x_i = 1,\ 1 \le i \le N\}$; $NP$ denotes the set of elements of $x$ whose value is 0, $NP = \{x_j \mid x_j = 0,\ 1 \le j \le N\}$; $x_i$ and $x_j$ denote the $i$-th and $j$-th elements of $x$; $\beta \in [0,1]$ denotes the non-positive loss attenuation factor, where "non-positive" means that the association is not among the known associations;
the expression in Figure BDA0003054669440000087 denotes the normal distribution with mean $\mu_x$ and the variance shown in Figure BDA0003054669440000088; $N(0,1)$ denotes the standard normal distribution; the expression in Figure BDA0003054669440000089 denotes the relative entropy between the distribution in Figure BDA00030546694400000810 and $N(0,1)$ (see Figure BDA00030546694400000811); $\mu_x$ and $\delta_x$ denote the outputs of the mean-variance layer of $f_e^1$ when the input of $f_1$ is $x$; $\alpha \in [0,1]$ denotes the relative entropy loss attenuation factor;
$y$ denotes the input data of $f_2$, and the symbol in Figure BDA00030546694400000812 denotes the prediction result of $f_2$ (see Figure BDA00030546694400000813).
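The loss formulas themselves are only available as formula images, so the following is a minimal sketch of how such a loss could be assembled, not the patent's exact expression. It assumes a binary cross-entropy reconstruction term in which the entries of $NP$ are damped by $\beta$, and a total loss equal to the reconstruction term plus $\alpha$ times the relative entropy to $N(0,1)$. The function names and the `eps` clipping constant are hypothetical.

```python
import math

def reconstruction_loss(x, x_hat, beta, eps=1e-7):
    """Weighted reconstruction loss over one row x of the association matrix.

    ASSUMPTION: binary cross-entropy form; the patent's exact L_re is given
    only as a formula image.
    """
    loss = 0.0
    for xi, pi in zip(x, x_hat):
        pi = min(max(pi, eps), 1.0 - eps)   # keep log() finite
        if xi == 1:                         # element of P (known association)
            loss -= math.log(pi)
        else:                               # element of NP, damped by beta
            loss -= beta * math.log(1.0 - pi)
    return loss

def kl_to_standard_normal(mu, var):
    """Relative entropy KL(N(mu, var) || N(0, 1)), summed over dimensions."""
    return 0.5 * sum(m * m + v - math.log(v) - 1.0 for m, v in zip(mu, var))

def vae_loss(x, x_hat, mu, var, alpha, beta):
    """ASSUMPTION: total loss = reconstruction + alpha * relative entropy."""
    return reconstruction_loss(x, x_hat, beta) + alpha * kl_to_standard_normal(mu, var)
```

With $\beta = 0$ the unknown (zero) entries contribute nothing to the reconstruction term, and with $\beta = 1$ they count as much as the known associations.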
Step 4) Iteratively train the variational-autoencoder-based drug-disease association prediction model H:
Step 4a) Initialize the iteration counter as $i$ and the maximum number of iterations as $I$, with $I = 350$ in this example; at the $i$-th iteration the weight parameters of the first variational autoencoder $f_1$ are denoted by Figure BDA00030546694400000814 and those of the second variational autoencoder $f_2$ by Figure BDA00030546694400000815; let $i = 0$ (see Figure BDA00030546694400000816).
Step 4b) The drug-disease association matrix A and the drug feature matrix C are taken as the input of the first variational autoencoder $f_1$ in the drug-disease association prediction model H. The first encoder $f_e^1$ encodes A row by row; the first latent variable layer $f_z^1$ samples from the normal distribution (Figure BDA0003054669440000093) constructed from the mean (Figure BDA0003054669440000091) and variance (Figure BDA0003054669440000092) produced by $f_e^1$; the first data fusion module additively fuses the 10-dimensional latent variable (Figure BDA0003054669440000094) sampled by $f_z^1$ with the feature $c$ of the corresponding drug, i.e. the corresponding row of C; and the first decoder $f_d^1$ decodes the fusion result (Figure BDA0003054669440000095) of the first data fusion module to obtain the predicted drug-disease association matrix (Figure BDA0003054669440000096).
When the first encoder $f_e^1$ encodes A row by row, 8 drugs are selected for encoding at a time in this example, i.e. the mini-batch size is 8. When sampling from the normal distribution (Figure BDA0003054669440000097), this example does not draw a sample (Figure BDA0003054669440000099) directly from the distribution (Figure BDA0003054669440000098), because the gradient cannot be back-propagated through a direct sampling operation, which would make the model untrainable. Instead, $\varepsilon_1$ is first sampled from the standard normal distribution $N(0,1)$, and the latent variable (Figure BDA00030546694400000911) is then computed by the formula in Figure BDA00030546694400000910.
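The sampling workaround described in step 4b is commonly known as the reparameterization trick. The sketch below illustrates it together with the additive fusion, under stated assumptions: the formula image is not reproduced in the text, so the standard form $z = \mu + \sqrt{\text{variance}} \cdot \varepsilon$ is assumed, and the encoder is assumed to output a variance rather than a log-variance. The names `sample_latent` and `additive_fusion` are hypothetical.

```python
import math
import random

def sample_latent(mu, var, rng):
    """Reparameterized draw: z = mu + sqrt(var) * eps, with eps ~ N(0, 1).

    The randomness lives only in eps, so z stays a differentiable
    function of mu and var.
    """
    return [m + math.sqrt(v) * rng.gauss(0.0, 1.0) for m, v in zip(mu, var)]

def additive_fusion(z, feature_row):
    """Element-wise sum of the latent vector and the drug's feature row."""
    if len(z) != len(feature_row):
        raise ValueError("latent and feature dimensions must match")
    return [zi + ci for zi, ci in zip(z, feature_row)]

rng = random.Random(0)
mu = [0.0] * 10           # 10-dimensional latent variable, as in the example
var = [1.0] * 10
c = [0.1] * 10            # hypothetical feature row of one drug
z = sample_latent(mu, var, rng)
fused = additive_fusion(z, c)
```

Additive fusion requires the latent dimension to equal the feature dimension, which is consistent with the claims using the same dimension $V$ for both.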
Step 4c) The disease-drug association matrix B and the disease feature matrix D are taken as the input of the second variational autoencoder $f_2$ in the drug-disease association prediction model H. The second encoder $f_e^2$ encodes B row by row; the second latent variable layer $f_z^2$ samples from the normal distribution (Figure BDA00030546694400000914) constructed from the mean (Figure BDA00030546694400000912) and variance (Figure BDA00030546694400000913) produced by $f_e^2$; the second data fusion module additively fuses the 10-dimensional latent variable (Figure BDA00030546694400000915) sampled by $f_z^2$ with the feature $d$ of the corresponding disease, i.e. the corresponding row of D; and the second decoder $f_d^2$ decodes the fusion result (Figure BDA00030546694400000916) of the second data fusion module to obtain the predicted disease-drug association matrix (Figure BDA00030546694400000917).
When the second encoder $f_e^2$ encodes B row by row, 8 diseases are selected for encoding at a time in this example, i.e. the mini-batch size is 8. When sampling from the normal distribution (Figure BDA00030546694400000918), this example does not draw a sample (Figure BDA00030546694400000920) directly from the distribution (Figure BDA00030546694400000919), because the gradient cannot be back-propagated through a direct sampling operation, which would make the model untrainable. Instead, $\varepsilon_2$ is first sampled from the standard normal distribution $N(0,1)$, and the latent variable (Figure BDA00030546694400000922) is then computed by the formula in Figure BDA00030546694400000921.
Step 4d) Using the loss function Loss1, compute the loss value $L1_i$ of the first variational autoencoder $f_1$ in H from the quantity in Figure BDA00030546694400000923, A and the quantity in Figure BDA00030546694400000924; at the same time, using the loss function Loss2, compute the loss value $L2_i$ of the second variational autoencoder $f_2$ in H from the quantity in Figure BDA00030546694400000925, B and the quantity in Figure BDA00030546694400000926.
Step 4e) Using back-propagation, compute the parameter gradient of $f_1$ from $L1_i$, and then update the weight parameters of $f_1$ (Figure BDA0003054669440000101) by gradient descent using that gradient; at the same time, using back-propagation, compute the parameter gradient of $f_2$ from $L2_i$, and then update the weight parameters of $f_2$ (Figure BDA0003054669440000102) by gradient descent using that gradient. The update formulas for the weight parameters in Figure BDA0003054669440000103 and Figure BDA0003054669440000104 are:
Figure BDA0003054669440000105
Figure BDA0003054669440000106
where Figure BDA0003054669440000107 and Figure BDA0003054669440000108 denote the updated weight parameters of $f_1$ and $f_2$ respectively; Figure BDA0003054669440000109 and Figure BDA00030546694400001010 denote the weight parameters of $f_1$ and $f_2$ before the update; Figure BDA00030546694400001011 and Figure BDA00030546694400001012 denote the learning step sizes of $f_1$ and $f_2$; and Figure BDA00030546694400001013 and Figure BDA00030546694400001014 denote the gradients of the weight parameters of $f_1$ and $f_2$.
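The update formulas are given only as formula images, but the text names exactly four quantities: updated weights, previous weights, a learning step size, and a gradient. The sketch below assumes the plain gradient descent rule built from those four quantities, $w_{new} = w_{old} - \eta \cdot \nabla w$; it is an illustration, and the function name and example values are hypothetical.

```python
def gradient_descent_step(weights, grads, learning_rate):
    """Return updated weights: w_new = w_old - learning_rate * grad."""
    return [w - learning_rate * g for w, g in zip(weights, grads)]

# Hypothetical two-parameter example for one of the autoencoders.
theta = [0.5, -1.0]       # weights before the update
grad = [0.2, -0.4]        # gradient obtained by back-propagation
theta_next = gradient_descent_step(theta, grad, learning_rate=0.1)
```

The same function would be applied separately to $f_1$ and $f_2$, each with its own loss value, gradient, and learning step size.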
Step 4f) Determine whether $i \ge I$; if so, the trained drug-disease association prediction model H' is obtained; otherwise, let $i = i + 1$ and return to step 4b).
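The control flow of steps 4a to 4f can be sketched as follows. This is a literal reading of the steps, not the authors' code: `train_one_pass` is a hypothetical callback standing in for steps 4b to 4e, and `minibatches` is a hypothetical helper illustrating the mini-batch size of 8 used in the example.

```python
def run_training(max_iter, train_one_pass):
    """Follow steps 4a-4f literally and return how many passes were executed."""
    i = 0                      # step 4a: initialize the iteration counter
    passes = 0
    while True:
        train_one_pass(i)      # steps 4b-4e (hypothetical callback)
        passes += 1
        if i >= max_iter:      # step 4f: stop once i >= I
            return passes
        i += 1                 # otherwise i = i + 1 and go back to step 4b

def minibatches(rows, batch_size=8):
    """Yield consecutive groups of rows, 8 at a time as in the example."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]

passes = run_training(350, lambda i: None)
```

Read literally, the check in step 4f happens after the pass, so with $i$ starting at 0 and $I = 350$ the body runs for $i = 0, \ldots, 350$, that is $I + 1$ times.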
Step 5) Obtain the drug-disease association prediction result Y:
The drug-disease association matrix A and the drug feature matrix C are taken as the input of the first variational autoencoder $f_1$ in the trained drug-disease association prediction model H' and propagated forward to obtain the set $Y_1$ of drug-disease associations predicted by $f_1$; at the same time, the disease-drug association matrix B and the disease feature matrix D are taken as the input of the second variational autoencoder $f_2$ in H' and propagated forward to obtain the set $Y_2$ of drug-disease associations predicted by $f_2$. The intersection $Y = Y_1 \cap Y_2$ is the drug-disease association prediction result, where $\cap$ denotes set intersection.
Taking the intersection of the prediction results $Y_1$ and $Y_2$ of the first variational autoencoder $f_1$ and the second variational autoencoder $f_2$ can effectively reduce the proportion of false-positive drug-disease associations in Y.
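The intersection in step 5 can be illustrated with a small sketch. The text does not say how decoder outputs are turned into association sets, so the 0.5 threshold below is an assumption, as are the function name and the toy score matrices. Because $f_2$ works on the transposed matrix B, its (disease, drug) pairs are swapped before intersecting.

```python
def predicted_pairs(scores, threshold=0.5):
    """Return the set of (row, column) index pairs whose score exceeds the threshold.

    ASSUMPTION: the threshold value is illustrative; the patent text does not state one.
    """
    return {(r, c) for r, row in enumerate(scores)
            for c, s in enumerate(row) if s > threshold}

# Scores from f1 are indexed [drug][disease]; scores from f2 are [disease][drug].
scores_f1 = [[0.9, 0.2], [0.6, 0.7]]
scores_f2 = [[0.8, 0.3], [0.1, 0.9]]

y1 = predicted_pairs(scores_f1)
y2 = {(drug, disease) for disease, drug in predicted_pairs(scores_f2)}
y = y1 & y2   # only associations predicted by both autoencoders survive
```

In this toy example the pair (drug 1, disease 0) is predicted by $f_1$ alone and is therefore dropped from Y.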
The technical effects of the invention are further illustrated below by simulation experiments:
1. Simulation conditions and content:
The simulation experiments were run on an Intel(R) Core(TM) i5-7300HQ CPU (2.50 GHz) with 8 GB of memory, using Python 3.6.5 on the PyCharm platform together with TensorFlow 1.0. The data set is the Cdataset proposed by Luo et al. in the paper "Drug repositioning based on comprehensive similarity measures and Bi-Random walk algorithm", published in Bioinformatics in 2016.
The prediction accuracy of the present invention is compared by simulation with that of two prior-art methods, and the results are shown in Table 1. Prior art 1 in Table 1 is MBiRW, a drug repositioning method based on comprehensive similarity measures and bi-random walk, proposed by Luo et al. in the paper "Drug repositioning based on comprehensive similarity measures and Bi-Random walk algorithm" published in Bioinformatics in 2016. Prior art 2 in Table 1 is DRRS, a drug repositioning method using low-rank matrix approximation and randomized algorithms, proposed by Luo et al. in the paper "Computational drug repositioning using low-rank matrix approximation and randomized algorithms" published in Bioinformatics in 2018.
2. Analysis of simulation results:
The evaluation metrics used to characterize the accuracy of drug-disease association prediction are AUC and AUPR.
(1) AUC (area under curve) is the area under the ROC curve (receiver operating characteristic curve). The abscissa of the ROC curve is the false positive rate FPR and the ordinate is the true positive rate TPR, where FPR = FP/(FP + TN) and TPR = TP/(TP + FN). FP is the number of samples that are actually negative but incorrectly predicted as positive by the model; TN is the number of samples that are actually negative and correctly predicted as negative; TP is the number of samples that are actually positive and correctly predicted as positive; and FN is the number of samples that are actually positive but incorrectly predicted as negative.
(2) AUPR (area under precision-recall curve) is the area under the PR curve. The ordinate of the PR curve is the precision and the abscissa is the recall, where Precision = TP/(TP + FP) and Recall = TP/(TP + FN).
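As a worked illustration of the four rates defined above, the sketch below counts TP, FP, TN and FN for one fixed set of binary predictions and evaluates the formulas. The function names and the toy labels are hypothetical; a full AUC or AUPR computation would repeat this over many score thresholds and integrate the resulting curve.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def rates(tp, fp, tn, fn):
    """Evaluate FPR, TPR, Precision and Recall as defined in the text.

    Assumes every denominator is non-zero.
    """
    return {
        "FPR": fp / (fp + tn),
        "TPR": tp / (tp + fn),
        "Precision": tp / (tp + fp),
        "Recall": tp / (tp + fn),
    }

y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
metrics = rates(*counts)
```

Note that TPR and Recall are the same quantity; the two curves differ in what they pair it with (FPR for ROC, Precision for PR).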
The AUC and AUPR values of the present invention and of the two prior-art methods on the Cdataset are compared in Table 1.
TABLE 1 comparison of the prediction accuracy of the prior art and the present invention
Figure BDA0003054669440000111
Table 1 shows that both the AUC value and the AUPR value of the present method are higher than those of the prior art, which indicates that the method effectively improves the accuracy of drug-disease association prediction.
The foregoing description is only one example of the present invention and does not limit the invention in any way. It will be apparent to those skilled in the art that, once the content and principles of the invention are understood, various changes and modifications in form and detail may be made without departing from the principles and structure of the invention; such changes and modifications nevertheless remain within the scope of protection of the claims of the invention.

Claims (4)

1. A drug-disease association prediction method based on a variational autoencoder, characterized by comprising the following steps:
(1) constructing a drug-disease association matrix A and a disease-drug association matrix B:
(1a) obtaining from a database the $N$ diseases $T = \{T_1, T_2, \ldots, T_n, \ldots, T_N\}$ associated with $M$ drugs $S = \{S_1, S_2, \ldots, S_m, \ldots, S_M\}$, together with $K$ drug-disease association records $E = \{E_1, E_2, \ldots, E_k, \ldots, E_K\}$, wherein each drug $S_m$ is associated with at least one disease and each disease $T_n$ is associated with at least one drug; $K \ge 1000$, $M \ge 100$, $N \ge 200$; $S_m$ denotes the $m$-th drug and $T_n$ the $n$-th disease, $1 \le m \le M$, $1 \le n \le N$; and $E_k$ denotes the $k$-th drug-disease association;
(1b) constructing a drug-disease association matrix A of size $M \times N$ whose element $A_{mn}$ in row $m$ and column $n$ is 0 or 1, and transposing A to obtain the disease-drug association matrix B, wherein $A_{mn} = 0$ indicates that the association between the $m$-th drug and the $n$-th disease is not in the drug-disease association data E, and $A_{mn} = 1$ indicates that this association is in E;
(2) constructing a drug feature matrix C and a disease feature matrix D:
(2a) obtaining from a database the $P$ genes $G = \{G_1, G_2, \ldots, G_p, \ldots, G_P\}$ associated with the $M$ drugs $S = \{S_1, S_2, \ldots, S_m, \ldots, S_M\}$, together with $Q$ drug-gene association records $R = \{R_1, R_2, \ldots, R_q, \ldots, R_Q\}$, wherein each drug $S_m$ is associated with at least one gene and each gene $G_p$ is associated with at least one drug; constructing a drug-gene association matrix C' of size $M \times P$ whose element $C'_{mp}$ in row $m$ and column $p$ is 0 or 1, wherein $C'_{mp} = 0$ indicates that the association between the $m$-th drug and the $p$-th gene is not in the drug-gene association data R, and $C'_{mp} = 1$ indicates that this association is in R; $P \ge 200$, $Q \ge 1000$, $1 \le m \le M$, $1 \le p \le P$; $G_p$ denotes the $p$-th gene and $R_q$ the $q$-th drug-gene association;
(2b) obtaining from a database the $O$ genes $G = \{G_1, G_2, \ldots, G_o, \ldots, G_O\}$ associated with the $N$ diseases $T = \{T_1, T_2, \ldots, T_n, \ldots, T_N\}$, together with $J$ disease-gene association records $U = \{U_1, U_2, \ldots, U_j, \ldots, U_J\}$, wherein each disease $T_n$ is associated with at least one gene and each gene $G_o$ is associated with at least one disease; constructing a disease-gene association matrix D' of size $N \times O$ whose element $D'_{no}$ in row $n$ and column $o$ is 0 or 1, wherein $D'_{no} = 0$ indicates that the association between the $n$-th disease and the $o$-th gene is not in the disease-gene association data U, and $D'_{no} = 1$ indicates that this association is in U; $O \ge 200$, $J \ge 1000$, $1 \le n \le N$, $1 \le o \le O$; $U_j$ denotes the $j$-th disease-gene association;
(2c) reducing the dimensionality of C' of size $M \times P$ and of D' of size $N \times O$ respectively, to obtain a drug feature matrix C of size $M \times V$ and a disease feature matrix D of size $N \times W$, wherein each row of C is the feature of the drug in that row and each row of D is the feature of the disease in that row, $1 \le V \le P$, $1 \le W \le O$;
(3) constructing a variational-autoencoder-based drug-disease association prediction model H:
(3a) constructing the structure of the variational-autoencoder-based drug-disease association prediction model H:
constructing a drug-disease association prediction model comprising a first variational autoencoder $f_1$ and a second variational autoencoder $f_2$ arranged in parallel, wherein the first variational autoencoder $f_1$ is a neural network comprising, connected in series, a first encoder $f_e^1$, a first latent variable layer $f_z^1$ and a first decoder $f_d^1$; $f_e^1$ comprises a plurality of fully connected layers and a mean-variance layer; $f_z^1$ is connected to a first data fusion module; $f_d^1$ comprises a plurality of fully connected layers and an output layer with a sigmoid activation function; and the weight parameters of $f_1$ are denoted by Figure FDA0003054669430000029; the second variational autoencoder $f_2$ comprises, connected in series, a second encoder $f_e^2$, a second latent variable layer $f_z^2$ and a second decoder $f_d^2$; $f_e^2$ comprises a plurality of fully connected layers and a mean-variance layer; $f_z^2$ is connected to a second data fusion module; $f_d^2$ comprises a plurality of fully connected layers and an output layer with a sigmoid activation function; and the weight parameters of $f_2$ are denoted by Figure FDA00030546694300000210;
(3b) defining the loss function Loss1 of the first variational autoencoder $f_1$ and the loss function Loss2 of the second variational autoencoder $f_2$:
Figure FDA0003054669430000021
Figure FDA0003054669430000022
Figure FDA0003054669430000023
Figure FDA0003054669430000024
wherein $x$ denotes the input data of $f_1$, and the symbol in Figure FDA0003054669430000025 denotes the prediction result of $f_1$ (see Figure FDA0003054669430000026); $L_{re}$ denotes the reconstruction loss of $f_1$; $PO_x$ denotes the set of elements of $x$ whose value is 1, $PO_x = \{x_i \mid x_i = 1,\ 1 \le i \le N\}$; $NP_x$ denotes the set of elements of $x$ whose value is 0, $NP_x = \{x_j \mid x_j = 0,\ 1 \le j \le N\}$; $x_i$ and $x_j$ denote the $i$-th and $j$-th elements of $x$; $\beta \in [0,1]$ denotes the non-positive loss attenuation factor, where "non-positive" means that the association is not among the known associations; the expression in Figure FDA0003054669430000027 denotes the normal distribution with mean $\mu_x$ and the variance shown in Figure FDA0003054669430000028; $N(0,1)$ denotes the standard normal distribution; the expression in Figure FDA0003054669430000031 denotes the relative entropy between the distribution in Figure FDA0003054669430000032 and $N(0,1)$ (see Figure FDA0003054669430000033); $\mu_x$ and $\delta_x$ denote the outputs of the mean-variance layer of $f_e^1$ when the input of $f_1$ is $x$; $\alpha \in [0,1]$ denotes the relative entropy loss attenuation factor; $y$ denotes the input data of $f_2$, and the symbol in Figure FDA0003054669430000034 denotes the prediction result of $f_2$ (see Figure FDA0003054669430000035);
(4) iteratively training the variational-autoencoder-based drug-disease association prediction model H:
(4a) initializing the iteration counter as $i$ and the maximum number of iterations as $I$, $I \ge 300$; at the $i$-th iteration the weight parameters of the first variational autoencoder $f_1$ are denoted by Figure FDA0003054669430000036 and those of the second variational autoencoder $f_2$ by Figure FDA0003054669430000037; and letting $i = 0$ (see Figure FDA0003054669430000038);
(4b) taking the drug-disease association matrix A and the drug feature matrix C as the input of the first variational autoencoder $f_1$ in the drug-disease association prediction model H, wherein the first encoder $f_e^1$ encodes A row by row; the first latent variable layer $f_z^1$ samples from the normal distribution (Figure FDA00030546694300000310) constructed from the mean $\mu_{f1\_i}$ and variance (Figure FDA0003054669430000039) produced by $f_e^1$; the first data fusion module additively fuses the $V$-dimensional latent variable (Figure FDA00030546694300000311) sampled by $f_z^1$ with the feature $c$ of the corresponding drug, i.e. the corresponding row of C; and the first decoder $f_d^1$ decodes the fusion result (Figure FDA00030546694300000312) of the first data fusion module to obtain the predicted drug-disease association matrix (Figure FDA00030546694300000313);
(4c) taking the disease-drug association matrix B and the disease feature matrix D as the input of the second variational autoencoder $f_2$ in the drug-disease association prediction model H, wherein the second encoder $f_e^2$ encodes B row by row; the second latent variable layer $f_z^2$ samples from the normal distribution (Figure FDA00030546694300000316) constructed from the mean (Figure FDA00030546694300000314) and variance (Figure FDA00030546694300000315) produced by $f_e^2$; the second data fusion module additively fuses the $W$-dimensional latent variable (Figure FDA00030546694300000317) sampled by $f_z^2$ with the feature $d$ of the corresponding disease, i.e. the corresponding row of D; and the second decoder $f_d^2$ decodes the fusion result (Figure FDA00030546694300000318) of the second data fusion module to obtain the predicted disease-drug association matrix (Figure FDA00030546694300000319);
(4d) using the loss function Loss1, computing the loss value $L1_i$ of the first variational autoencoder $f_1$ in H from the quantity in Figure FDA00030546694300000320, A and the quantity in Figure FDA00030546694300000321; at the same time, using the loss function Loss2, computing the loss value $L2_i$ of the second variational autoencoder $f_2$ in H from the quantity in Figure FDA00030546694300000322, B and the quantity in Figure FDA00030546694300000323;
(4e) using back-propagation, computing the parameter gradient of $f_1$ from $L1_i$, and then updating the weight parameters of $f_1$ (Figure FDA00030546694300000324) by gradient descent using that gradient; at the same time, using back-propagation, computing the parameter gradient of $f_2$ from $L2_i$, and then updating the weight parameters of $f_2$ (Figure FDA0003054669430000041) by gradient descent using that gradient;
(4f) determining whether $i \ge I$; if so, the trained drug-disease association prediction model H' is obtained; otherwise, letting $i = i + 1$ and returning to step (4b);
(5) obtaining the drug-disease association prediction result Y:
taking the drug-disease association matrix A and the drug feature matrix C as the input of the first variational autoencoder $f_1$ in the trained drug-disease association prediction model H' and propagating forward to obtain the set $Y_1$ of drug-disease associations predicted by $f_1$; at the same time, taking the disease-drug association matrix B and the disease feature matrix D as the input of the second variational autoencoder $f_2$ in H' and propagating forward to obtain the set $Y_2$ of drug-disease associations predicted by $f_2$; the intersection $Y = Y_1 \cap Y_2$ is the drug-disease association prediction result, where $\cap$ denotes set intersection.
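For illustration only, and not as part of the claim language, step (1b) of claim 1 can be sketched as follows. The sketch assumes the association records are available as 0-based (drug index, disease index) pairs; the function name and the toy data are hypothetical.

```python
def build_association_matrices(num_drugs, num_diseases, associations):
    """Build the M x N matrix A from association pairs and its transpose B.

    ASSUMPTION: associations are 0-based (drug index, disease index) pairs.
    """
    a = [[0] * num_diseases for _ in range(num_drugs)]
    for m, n in associations:
        a[m][n] = 1                       # known drug-disease association
    b = [list(col) for col in zip(*a)]    # B is the transpose of A (N x M)
    return a, b

a, b = build_association_matrices(3, 2, [(0, 0), (1, 1), (2, 0)])
```

Row $m$ of A lists the diseases of drug $m$ and row $n$ of B lists the drugs of disease $n$, which is why the two autoencoders can each encode their matrix row by row.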
2. The drug-disease association prediction method based on a variational autoencoder according to claim 1, characterized in that in step (2c) the dimensionality of C' of size $M \times P$ and of D' of size $N \times O$ is reduced by principal component analysis, implemented by the following steps:
(2c1) subtracting from each column of the drug-gene association matrix C' of size $M \times P$ the mean of that column, and subtracting from each column of the disease-gene association matrix D' of size $N \times O$ the mean of that column, to obtain the centered drug-gene association matrix $C'_1$ and the centered disease-gene association matrix $D'_1$;
(2c2) computing the covariance matrices of $C'_1$ and $D'_1$ respectively, to obtain a covariance matrix of size $P \times P$ (Figure FDA0003054669430000042) and a covariance matrix of size $O \times O$ (Figure FDA0003054669430000043);
(2c3) performing eigenvalue decomposition on the matrices in Figure FDA0003054669430000044 and Figure FDA0003054669430000045 respectively, to obtain the $P$ eigenvalues and $P$ eigenvectors of the matrix in Figure FDA0003054669430000046 and the $O$ eigenvalues and $O$ eigenvectors of the matrix in Figure FDA0003054669430000047;
(2c4) sorting the $P$ eigenvalues of the matrix in Figure FDA0003054669430000048 in descending order and selecting the first $V$ eigenvalues; taking the eigenvectors corresponding to these $V$ eigenvalues among the $P$ eigenvectors of the matrix in Figure FDA0003054669430000049 as column vectors to form an eigenvector matrix (Figure FDA00030546694300000410); the product of the matrix in Figure FDA00030546694300000411 and $C'_1$ is the drug feature matrix C of size $M \times V$; likewise, sorting the $O$ eigenvalues of the matrix in Figure FDA00030546694300000412 in descending order and selecting the first $W$ eigenvalues; taking the eigenvectors corresponding to these $W$ eigenvalues among the $O$ eigenvectors of the matrix in Figure FDA00030546694300000413 as column vectors to form an eigenvector matrix (Figure FDA00030546694300000414); the product of the matrix in Figure FDA00030546694300000415 and $D'_1$ is the disease feature matrix D of size $N \times W$.
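For illustration only, and not as part of the claim language, steps (2c1) to (2c4) can be sketched with NumPy as below. The sketch makes two assumptions: the covariance is the sample covariance computed by `np.cov`, and the product is taken as centered matrix times eigenvector matrix so that the result has size $M \times V$. The function name `pca_features` and the toy matrix are hypothetical.

```python
import numpy as np

def pca_features(assoc, n_components):
    """Reduce an association matrix (rows = drugs or diseases) by PCA."""
    centered = assoc - assoc.mean(axis=0)              # (2c1) center each column
    cov = np.cov(centered, rowvar=False)               # (2c2) P x P covariance
    eigvals, eigvecs = np.linalg.eigh(cov)             # (2c3) symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1][:n_components]   # (2c4) largest eigenvalues first
    return centered @ eigvecs[:, order]                # M x V feature matrix

# Hypothetical 5-drug x 4-gene association matrix C'.
c_prime = np.array([[1, 0, 1, 0],
                    [0, 1, 1, 0],
                    [1, 1, 0, 1],
                    [0, 0, 1, 1],
                    [1, 0, 0, 1]], dtype=float)
c = pca_features(c_prime, n_components=2)
```

The same function would be applied to D' with $W$ components to obtain the disease feature matrix D.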
3. The drug-disease association prediction method based on a variational autoencoder according to claim 1, characterized in that, in the structure of the variational-autoencoder-based drug-disease association prediction model H constructed in step (3a), the mean-variance layer of the first encoder $f_e^1$ comprises two fully connected layers that have different weight parameters and are arranged in parallel, the outputs of which are used as the mean and the variance respectively; and the mean-variance layer of the second encoder $f_e^2$ comprises two fully connected layers that have different weight parameters and are arranged in parallel, the outputs of which are used as the mean and the variance respectively.
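For illustration only, and not as part of the claim language, a mean-variance layer of this kind can be sketched as two independent fully connected heads that read the same hidden vector. The sketch follows the claim literally and uses the second head's output directly as the variance, so it does not enforce positivity; many implementations instead let that head output a log-variance, which would be a different design. All names and weights are hypothetical.

```python
def dense(x, weights, bias):
    """Fully connected layer without activation: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def mean_variance_layer(h, w_mean, b_mean, w_var, b_var):
    """Two parallel heads with separate weights on the same hidden vector h."""
    mu = dense(h, w_mean, b_mean)    # first head: mean
    var = dense(h, w_var, b_var)     # second head: variance (positivity not enforced here)
    return mu, var

h = [1.0, 2.0]                                    # hypothetical encoder hidden vector
mu, var = mean_variance_layer(
    h,
    w_mean=[[1.0, 0.0], [0.0, 1.0]], b_mean=[0.0, 0.0],
    w_var=[[0.5, 0.0], [0.0, 0.25]], b_var=[0.0, 0.0],
)
```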
4. The drug-disease association prediction method based on a variational autoencoder according to claim 1, characterized in that in step (4e) the weight parameters of $f_1$ (Figure FDA0003054669430000051) are updated by gradient descent using the parameter gradient of $f_1$, and the weight parameters of $f_2$ (Figure FDA0003054669430000052) are updated by gradient descent using the parameter gradient of $f_2$, the update formulas being respectively:
Figure FDA0003054669430000053
Figure FDA0003054669430000054
wherein Figure FDA0003054669430000055 and Figure FDA0003054669430000056 denote the updated weight parameters of $f_1$ and $f_2$ respectively; Figure FDA0003054669430000057 and Figure FDA0003054669430000058 denote the weight parameters of $f_1$ and $f_2$ before the update; Figure FDA0003054669430000059 and Figure FDA00030546694300000510 denote the learning step sizes of $f_1$ and $f_2$; and Figure FDA00030546694300000511 and Figure FDA00030546694300000512 denote the gradients of the weight parameters of $f_1$ and $f_2$.
CN202110496613.9A 2021-05-07 2021-05-07 Drug-disease association prediction method based on variation self-encoder Active CN113223655B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110496613.9A CN113223655B (en) 2021-05-07 2021-05-07 Drug-disease association prediction method based on variation self-encoder

Publications (2)

Publication Number Publication Date
CN113223655A true CN113223655A (en) 2021-08-06
CN113223655B CN113223655B (en) 2023-05-12

Family

ID=77091888

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110496613.9A Active CN113223655B (en) 2021-05-07 2021-05-07 Drug-disease association prediction method based on variation self-encoder

Country Status (1)

Country Link
CN (1) CN113223655B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114613452A (en) * 2022-03-08 2022-06-10 电子科技大学 Drug relocation method and system based on drug classification map neural network

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190244680A1 (en) * 2018-02-07 2019-08-08 D-Wave Systems Inc. Systems and methods for generative machine learning
WO2019231624A2 (en) * 2018-05-30 2019-12-05 Quantum-Si Incorporated Methods and apparatus for multi-modal prediction using a trained statistical model
CN111681718A (en) * 2020-06-11 2020-09-18 湖南大学 Medicine relocation method based on deep learning multi-source heterogeneous network
CN112071373A (en) * 2020-09-02 2020-12-11 深圳晶泰科技有限公司 Drug molecule screening method and system
CN112308326A (en) * 2020-11-05 2021-02-02 湖南大学 Biological network link prediction method based on meta-path and bidirectional encoder

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
JARADA, TAMER N.1 等: "SNF–CVAE: Computational method to predict drug–disease interactions using similarity network fusion and collective variational autoencoder", 《KNOWLEDGE-BASED SYSTEMS》 *
刘佳琦;李阳;: "基于信息最大化变分自编码器的孪生神经主题模型", 《计算机应用与软件》 *
支尧: "基于概率关系自编码器的药靶关系预测研究", 《万方数据库-学位论文库》 *
李苗苗;: "基于XG-B00ST和多数据源的药物重定位预测", 《软件导刊》 *
鱼亮 等: "基于组织特异性和直接邻居相似度方法预测疾病–药物关系", 《中国科学:信息科学》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114613452A (en) * 2022-03-08 2022-06-10 电子科技大学 Drug relocation method and system based on drug classification map neural network
CN114613452B (en) * 2022-03-08 2023-04-28 电子科技大学 Drug repositioning method and system based on drug classification graph neural network

Also Published As

Publication number Publication date
CN113223655B (en) 2023-05-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant