CN115116580A - Virus-drug association prediction method based on matrix decomposition and heterogeneous graph reasoning - Google Patents

Info

Publication number
CN115116580A
Authority
CN
China
Prior art keywords
virus
matrix
drug
similarity
association
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210813511.XA
Other languages
Chinese (zh)
Inventor
程效龙
瞿佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changzhou University
Original Assignee
Changzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2022-07-11
Filing date
2022-07-11
Publication date
2022-09-27
Application filed by Changzhou University
Priority to CN202210813511.XA
Publication of CN115116580A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00: ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/10: ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00: Computing arrangements using knowledge-based models
    • G06N5/04: Inference or reasoning models
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C: COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00: Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/50: Molecular design, e.g. of drugs
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00: ICT specially adapted for the handling or processing of medical references
    • G16H70/40: ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Chemical & Material Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Medicinal Chemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Epidemiology (AREA)
  • Pharmacology & Pharmacy (AREA)
  • General Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Algebra (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Toxicology (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention provides a virus-drug association prediction method based on matrix decomposition and heterogeneous graph reasoning, which mainly addresses the problems of low prediction precision and of predicting rare virus-drug associations in the prior art. The method comprises the following steps: (1) obtaining a data set comprising known virus-drug associations; (2) integrating the similarity matrices of viruses and drugs; (3) performing matrix decomposition on the virus-drug adjacency matrix and constructing a new virus-drug association matrix; (4) constructing a heterogeneous graph to predict potential virus-drug associations. The invention predicts virus-drug associations by matrix decomposition and heterogeneous graph reasoning, and has good prediction performance and robustness.

Description

Virus-drug association prediction method based on matrix decomposition and heterogeneous graph reasoning
Technical Field
The invention relates to the technical field of machine learning and bioinformatics, in particular to a virus-drug association prediction method based on restricted Boltzmann machine matrix decomposition and heterogeneous graph reasoning.
Background
The lives of humans and other higher animals are closely tied to microbial communities, which include bacteria, archaea, viruses, fungi and protozoa. Viruses are one class of microorganism, and outbreaks of new viruses can pose a significant hazard to humans. For example, COVID-19, a severe acute respiratory syndrome, is caused by SARS-CoV-2, the "novel coronavirus" that has spread rapidly around the world, and no specific vaccine or antiviral drug against SARS-CoV-2 has been found at present. It is therefore imperative to find a specific antiviral drug as soon as possible to prevent the spread of SARS-CoV-2. In addition, Human Immunodeficiency Virus (HIV) infection can cause acquired immunodeficiency syndrome (AIDS) through stages that include viral replication and transmission, a long asymptomatic phase, and depletion of CD4+ T cells. Ebola virus (EBOV) enters the body through damaged skin or mucosal surfaces, resulting in EBOV infection, which can cause fever, mucosal bleeding, and even death. Zika virus (ZIKV) can infect cells through clathrin-mediated endocytosis followed by fusion in acidic endosomes. ZIKV is a flavivirus closely related to the dengue, yellow fever and West Nile viruses.
Generally, a person infected with a virus and suffering from the resulting disease is first treated with drugs, so effective antiviral drugs are needed. Drug discovery is one of the major goals of pharmaceutical science, an interdisciplinary field that draws on basic sciences including biology, chemistry, physics, and statistics. For thousands of years, nature has been the source of medicinal products, and many useful active substances have been developed from plant sources. In the 20th century, the discovery of penicillin was the starting point for drug discovery from microbial sources. Most drugs are developed from lead structures based on natural products synthesized by bacteria. Drugs derived from bacterial secondary metabolites have a variety of uses, for example in the diagnosis, alleviation, treatment or prevention of disease, or in the relief of discomfort. It is estimated that in the golden age of microbial natural product screening (1940-1970), tens of millions of soil microorganisms were screened, a tremendous effort that provided the vast majority of the microbial metabolites known today. These substances include widely used antibacterial agents such as erythromycin, streptomycin, tetracycline and vancomycin, and chemotherapeutic drugs such as doxorubicin. 90% of all antibiotics used in clinics today are derived from microorganisms. Currently, 23000 natural products with antibacterial activity are known to be produced by microorganisms, while only 25000 such natural products isolated from higher organisms such as plants and animals are known.
However, drug development currently faces two major challenges. On the one hand, the development cycle is long: it takes a long time for a drug to go from the start of development to the market. On the other hand, drug resistance has begun to emerge and constitutes a serious threat to human health. To overcome these problems, combinatorial chemistry has been developed as a key technology that can generate large screening libraries to meet the needs of high-throughput screening. Furthermore, drug repurposing, also known as drug repositioning, uses drugs that have already been approved for the market to treat other existing diseases. For both drug combination therapy and drug repositioning, determining drug-virus associations is crucial. Detecting the interactions between viruses and drugs is therefore of great importance for antiviral therapy and drug development. However, finding virus-drug associations with traditional wet-laboratory experiments (e.g., culture-based methods) is time-consuming, laborious, and expensive. Computational methods that efficiently and accurately predict virus-drug associations are therefore a useful complement to the limited experimental methods.
In summary, because of viral drug resistance and the long development cycle of new drugs, identifying virus-drug associations is of great significance for drug development and disease treatment. Traditional experimental methods are time-consuming and labor-intensive, so efficient computational methods for identifying potential virus-drug associations are urgently needed.
Disclosure of Invention
To solve the above problems, the invention provides a virus-drug association prediction method based on matrix decomposition and heterogeneous graph reasoning, which saves time and labor in predicting virus-drug associations.
In order to achieve the purpose, the technical scheme of the invention is as follows:
a virus-drug association prediction method based on matrix decomposition and heterogeneous graph reasoning comprises the following steps:
Step one: obtaining a known virus-drug association matrix;
The data set is built from known virus-drug association data and is represented in the following form:
Figure BDA0003740215440000031
where $A(i, p)$ indicates whether drug $d_i$ and virus $v_p$ are associated: $A(i, p)$ is 1 if they are associated and 0 otherwise.
Step two: integrating the similarity matrices of viruses and drugs;
For drug similarity, the chemical structure similarity, the side effect similarity and the Gaussian interaction profile kernel similarity of the drugs are integrated to obtain the integrated drug similarity. If two drugs have a chemical structure similarity or a side effect similarity, their integrated similarity is the average of the chemical structure similarity and the side effect similarity. Otherwise, it equals their Gaussian interaction profile kernel similarity. The calculation formula is as follows:
Figure BDA0003740215440000032
where SS1 is the chemical structure similarity of the drugs, SS2 is the side effect similarity of the drugs, GD is the Gaussian interaction profile kernel similarity of the drugs, and SD is the integrated drug similarity.
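The integration rule described above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the function name `integrate_drug_similarity` is hypothetical, and the test "the drugs have a chemical structure or side effect similarity" is assumed to mean that the corresponding entry of SS1 or SS2 is non-zero.

```python
import numpy as np

def integrate_drug_similarity(SS1, SS2, GD):
    """Entry-wise integration of drug similarities (sketch of formula (1)).

    Assumption: a drug pair "has" a chemical-structure or side-effect
    similarity when the corresponding entry of SS1 or SS2 is non-zero.
    """
    SS1 = np.asarray(SS1, dtype=float)
    SS2 = np.asarray(SS2, dtype=float)
    GD = np.asarray(GD, dtype=float)
    has_similarity = (SS1 != 0) | (SS2 != 0)
    # Average SS1 and SS2 where either is available, otherwise fall back to GD.
    return np.where(has_similarity, (SS1 + SS2) / 2.0, GD)

# Toy example with three drugs (values are illustrative only).
SS1 = np.array([[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]])
SS2 = np.array([[1.0, 0.25, 0.0], [0.25, 1.0, 0.0], [0.0, 0.0, 1.0]])
GD = np.array([[1.0, 0.75, 0.125], [0.75, 1.0, 0.5], [0.125, 0.5, 1.0]])
SD = integrate_drug_similarity(SS1, SS2, GD)
```

In the toy data, the pair (0, 1) takes the average of its two similarities, while the pairs involving drug 2 fall back to the Gaussian kernel value.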
For virus similarity, the virus sequence similarity and the Gaussian interaction profile kernel similarity of the viruses are integrated to obtain the integrated virus similarity. The formula is as follows:
Figure BDA0003740215440000033
where MV is the sequence similarity of the viruses, GV is the Gaussian interaction profile kernel similarity of the viruses, and SV is the integrated virus similarity.
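The text uses Gaussian interaction profile (GIP) kernel similarities GD and GV without defining them. The sketch below follows the commonly used definition of the GIP kernel, in which each entity's interaction profile is its row of the association matrix and the bandwidth is normalized by the mean squared profile norm. The function name `gip_kernel` and the parameter `gamma_prime` (set to 1 here) are assumptions, not values taken from the patent.

```python
import numpy as np

def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of `profiles`.

    K[i, j] = exp(-gamma * ||profile_i - profile_j||^2), where
    gamma = gamma_prime / mean_i(||profile_i||^2)  (assumed normalization).
    """
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = np.sum(profiles ** 2, axis=1)
    mean_sq = sq_norms.mean()
    if mean_sq == 0:
        # No interactions at all: every profile is identical.
        return np.ones((profiles.shape[0], profiles.shape[0]))
    gamma = gamma_prime / mean_sq
    # Pairwise squared distances from the Gram matrix.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against round-off
    return np.exp(-gamma * sq_dists)

# Toy association matrix: 3 drugs (rows) x 2 viruses (columns).
A = np.array([[1, 0], [1, 1], [0, 1]])
GD = gip_kernel(A)      # drug-drug kernel, rows of A are drug profiles
GV = gip_kernel(A.T)    # virus-virus kernel, columns of A are virus profiles
```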
Step three: performing matrix decomposition on the virus-drug adjacency matrix and constructing a new virus-drug association matrix;
Since some associations in the drug-virus adjacency matrix of the data set may be redundant or missing, the known drug-virus adjacency matrix is decomposed into two parts. The first part is the product of the original matrix and a low-rank matrix; the low-rank matrix retains the non-redundant information and is used to construct a new drug-virus association matrix. The second part is a sparse matrix whose elements are mostly zero. The decomposition formula is as follows:
$A = AX + E$ (3)
where A is the virus-drug association matrix, X is the low-rank matrix obtained by the decomposition, and E is the sparse matrix obtained by the decomposition;
Then the low-rank matrix X is obtained using the nuclear norm and the sparse matrix E is obtained using the sparse ($\ell_{2,1}$) norm, so that equation (3) can be converted into the following equation:
Figure BDA0003740215440000041
where $\|\cdot\|_*$ denotes the nuclear norm, $\|\cdot\|_{2,1}$ denotes the sparse ($\ell_{2,1}$) norm, and $\alpha$ is a parameter that controls the weight;
Equation (4) can also be equivalently expressed as:
Figure BDA0003740215440000042
In brief, equation (5) is a constrained convex optimization problem, which is solved with the inexact augmented Lagrange multiplier (IALM) method. First, equation (5) is converted into an unconstrained problem. Second, the unconstrained problem is minimized using the augmented Lagrangian function of equation (6):
Figure BDA0003740215440000043
where $Y_1$ and $Y_2$ are the Lagrange multipliers, $\mu$ is a penalty factor, and $\|\cdot\|_F$ denotes the Frobenius norm;
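The formula images for equations (4) to (6) are not reproduced in this text. The block below is a plausible reading, assuming the patent follows the standard low-rank representation formulation that matches the surrounding definitions (nuclear norm on the low-rank part, $\ell_{2,1}$ norm on the sparse part, two multipliers $Y_1$ and $Y_2$, penalty $\mu$). The auxiliary variable $J$ is an assumption: it does not appear in the text and is introduced only because two multipliers imply two constraints.

```latex
% Assumed form of equation (4): low-rank plus column-sparse decomposition
\min_{X,E}\; \|X\|_{*} + \alpha \|E\|_{2,1}
\quad \text{s.t.} \quad A = AX + E,
\qquad \|E\|_{2,1} = \sum_{j} \sqrt{\sum_{i} E_{ij}^{2}}

% Assumed form of equation (5): same problem with an auxiliary variable J
\min_{X,E,J}\; \|J\|_{*} + \alpha \|E\|_{2,1}
\quad \text{s.t.} \quad A = AX + E,\; X = J

% Assumed form of equation (6): augmented Lagrangian
\mathcal{L} = \|J\|_{*} + \alpha \|E\|_{2,1}
  + \operatorname{tr}\!\left(Y_1^{T}(A - AX - E)\right)
  + \operatorname{tr}\!\left(Y_2^{T}(X - J)\right)
  + \frac{\mu}{2}\left(\|A - AX - E\|_F^{2} + \|X - J\|_F^{2}\right)
```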
From equation (6), two solutions are obtained, denoted $X^*$ and $E^*$. A new drug-virus association matrix $A^*$ is then built from A and $X^*$:
$A^* = AX^*$ (7)
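As an illustration of how step three could be carried out, the sketch below implements an inexact augmented Lagrange multiplier loop for the decomposition $A = AX + E$. It assumes the standard low-rank representation updates (singular value thresholding for an auxiliary variable J, a closed-form least-squares update for X, and column-wise shrinkage for E). The function names and all default hyperparameters (`alpha`, `rho`, `mu`, `mu_max`, `tol`) are assumptions, not values stated in the patent.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def l21_shrink(Q, tau):
    """Column-wise shrinkage: proximal operator of tau * l2,1 norm."""
    norms = np.linalg.norm(Q, axis=0)
    scale = np.maximum(norms - tau, 0.0) / np.maximum(norms, 1e-12)
    return Q * scale

def lrr_ialm(A, alpha=0.1, rho=1.1, mu=1e-4, mu_max=1e10, tol=1e-6, max_iter=1000):
    """Inexact ALM sketch for min ||X||_* + alpha * ||E||_{2,1} s.t. A = AX + E."""
    A = np.asarray(A, dtype=float)
    n = A.shape[1]
    X = np.zeros((n, n))
    E = np.zeros_like(A)
    Y1 = np.zeros_like(A)      # multiplier for A = AX + E
    Y2 = np.zeros((n, n))      # multiplier for X = J
    inv = np.linalg.inv(np.eye(n) + A.T @ A)
    for _ in range(max_iter):
        J = svt(X + Y2 / mu, 1.0 / mu)
        X = inv @ (A.T @ (A - E) + J + (A.T @ Y1 - Y2) / mu)
        E = l21_shrink(A - A @ X + Y1 / mu, alpha / mu)
        R1 = A - A @ X - E     # residual of the first constraint
        R2 = X - J             # residual of the second constraint
        Y1 = Y1 + mu * R1
        Y2 = Y2 + mu * R2
        mu = min(rho * mu, mu_max)
        if max(np.abs(R1).max(), np.abs(R2).max()) < tol:
            break
    return X, E

# Toy drug-virus adjacency matrix (5 drugs x 4 viruses, illustrative only).
A = np.array([[1, 0, 0, 1],
              [1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1],
              [1, 0, 1, 0]], dtype=float)
X_star, E_star = lrr_ialm(A)
A_star = A @ X_star            # new association matrix, as in equation (7)
```

The returned `A_star` has the same shape as `A` but real-valued entries, which is what lets it carry graded evidence into the graph-reasoning step.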
step four: a differential map is constructed to predict potential virus-drug associations.
And (4) combining the new drug-virus association matrix constructed in the step three with drug integration similarity and virus integration similarity into a heteromorphic graph. And predicting the potential association probability of the medicine and the virus from the heterogeneous graph. If there is no known association between the drug and the virus, the potential association probability matrix is defined as follows:
Figure BDA0003740215440000051
where nv is the number of viruses, nd is the number of drugs, $v_l$ is any one of the nv viruses, and $d_m$ is any one of the nd drugs.
In addition, the weights of the edges in the heterogeneous graph, i.e., the integrated drug similarity and the integrated virus similarity, are normalized according to the degrees of their endpoints. The normalization equations are as follows:
Figure BDA0003740215440000052
Figure BDA0003740215440000053
where $v_l$ is any one of the nv viruses and $d_m$ is any one of the nd drugs; according to previous studies, this normalization helps convergence. An iterative method is then used to compute the potential association probability matrix between all drugs and all viruses. The iterative equation is as follows:
$P_{k+1} = a\,SV \times P_k \times SD + (1 - a)A^*$ (11)
where $P_{k+1}$ is the potential association probability matrix after k+1 iterations and a is a penalty factor.
After the iteration ends, the potential association probability matrix between all drugs and all viruses is the predicted virus-drug association score matrix, which is still a matrix of nd rows and nv columns. Finally, virus-drug associations are predicted from this score matrix.
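The iteration in step four can be sketched as follows. Several details are assumptions because the corresponding formula images are missing: the normalization is taken to be the symmetric degree normalization $S(i,j)/\sqrt{\deg(i)\deg(j)}$, the iteration starts from $A^*$, and the value of `a` is arbitrary. Note also that with a score matrix of nd rows and nv columns, the matrix product only has compatible dimensions when the drug similarity multiplies from the left and the virus similarity from the right, so the sketch uses that ordering; equation (11) as printed corresponds to the transposed (virus-by-drug) orientation.

```python
import numpy as np

def degree_normalize(S):
    """Symmetric degree normalization S[i, j] / sqrt(deg_i * deg_j) (assumed form)."""
    S = np.asarray(S, dtype=float)
    deg = S.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    return S * inv_sqrt[:, None] * inv_sqrt[None, :]

def propagate(A_star, SD, SV, a=0.4, tol=1e-6, max_iter=1000):
    """Iterate P <- a * SD_n @ P @ SV_n + (1 - a) * A_star until it stabilizes.

    A_star has shape (nd, nv); SD is (nd, nd); SV is (nv, nv).
    """
    A_star = np.asarray(A_star, dtype=float)
    SD_n = degree_normalize(SD)
    SV_n = degree_normalize(SV)
    P = A_star.copy()  # assumed starting point; the fixed point does not depend on it
    for _ in range(max_iter):
        P_next = a * SD_n @ P @ SV_n + (1.0 - a) * A_star
        if np.abs(P_next - P).max() < tol:
            return P_next
        P = P_next
    return P

# Toy inputs: 3 drugs, 2 viruses (illustrative values only).
SD = np.array([[1.0, 0.5, 0.0], [0.5, 1.0, 0.25], [0.0, 0.25, 1.0]])
SV = np.array([[1.0, 0.5], [0.5, 1.0]])
A_star = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
P = propagate(A_star, SD, SV)
```

Because the normalized similarity matrices have spectral norm at most 1 and `a` is below 1, each step is a contraction and the loop settles on a unique score matrix.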
Compared with the prior art, the technical scheme of the invention has the following beneficial effects:
The method treats virus-drug association prediction as a prediction task and uses multi-source biological data, namely the known drug-virus association matrix, the drug chemical structure similarity matrix, the drug side effect similarity matrix, the drug Gaussian similarity matrix, the virus sequence similarity matrix and the virus Gaussian similarity matrix. The richness of these data helps to predict potential drug-virus associations and yields more accurate virus-drug association prediction.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 compares the ROC curves and global leave-one-out cross-validation AUC values obtained on the data set by the virus-drug association prediction method based on matrix decomposition and heterogeneous graph reasoning and by the four existing methods used in the example.
FIG. 3 compares the ROC curves and local leave-one-out cross-validation AUC values obtained on the data set by the virus-drug association prediction method based on matrix decomposition and heterogeneous graph reasoning and by the four existing methods used in the example.
Detailed Description
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless otherwise indicated, it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
The present example provides a virus-drug association prediction method (MDHGIVDA) based on matrix decomposition and heterogeneous graph reasoning, as shown in FIG. 1, comprising the following steps:
Step one: obtaining a known virus-drug association matrix.
Numerous biological experiments have identified many virus-drug associations. The virus-drug association information used here comes from the DrugVirus dataset, which Long et al. constructed from the DrugVirus database. The dataset includes 933 known virus-drug associations involving 175 drugs and 95 viruses. An adjacency matrix A is built to store the virus-drug association information, where nd is the number of drugs and nv is the number of viruses. If a drug is associated with a virus, the corresponding entry is 1; otherwise it is 0.
Figure BDA0003740215440000071
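A minimal sketch of building such an adjacency matrix from a list of known pairs is shown below. The helper name `build_adjacency` and the placeholder drug and virus names are hypothetical; with the dataset described above the result would have 175 rows, 95 columns and 933 non-zero entries.

```python
import numpy as np

def build_adjacency(pairs, drugs, viruses):
    """Return an nd x nv 0/1 matrix with A[i, p] = 1 for each known (drug, virus) pair."""
    drug_index = {name: i for i, name in enumerate(drugs)}
    virus_index = {name: p for p, name in enumerate(viruses)}
    A = np.zeros((len(drugs), len(viruses)), dtype=int)
    for drug, virus in pairs:
        A[drug_index[drug], virus_index[virus]] = 1
    return A

# Placeholder identifiers, not real associations.
drugs = ["drug_a", "drug_b", "drug_c"]
viruses = ["virus_x", "virus_y"]
pairs = [("drug_a", "virus_x"), ("drug_b", "virus_x"), ("drug_c", "virus_y")]
A = build_adjacency(pairs, drugs, viruses)
```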
Step two: integrating the similarity matrices of viruses and drugs;
To obtain the integrated drug similarity, the chemical structure similarity, the side effect similarity and the Gaussian interaction profile kernel similarity of the drugs are integrated. If two drugs have a chemical structure similarity or a side effect similarity, their integrated similarity is the average of the chemical structure similarity and the side effect similarity. Otherwise, it equals their Gaussian interaction profile kernel similarity. The calculation formula is as follows:
Figure BDA0003740215440000072
where SS1 is the chemical structure similarity of the drugs, SS2 is the side effect similarity of the drugs, GD is the Gaussian interaction profile kernel similarity of the drugs, and SD is the integrated drug similarity.
For virus similarity, the virus sequence similarity and the Gaussian interaction profile kernel similarity of the viruses are integrated to obtain the integrated virus similarity. The formula is as follows:
Figure BDA0003740215440000073
where MV is the sequence similarity of the viruses, GV is the Gaussian interaction profile kernel similarity of the viruses, and SV is the integrated virus similarity.
Step three: performing matrix decomposition on the virus-drug adjacency matrix and constructing a new virus-drug association matrix;
Since some associations in the drug-virus adjacency matrix of the data set may be redundant or missing, the known drug-virus adjacency matrix is decomposed into two parts. The first part is the product of the original matrix and a low-rank matrix; the low-rank matrix retains the non-redundant information and is used to construct a new drug-virus association matrix. The second part is a sparse matrix whose elements are mostly zero. The decomposition formula is as follows:
$A = AX + E$
Then the low-rank matrix X is obtained using the nuclear norm and the sparse matrix E is obtained using the sparse ($\ell_{2,1}$) norm, so that the above equation can be converted into the following equation:
Figure BDA0003740215440000081
where $\|\cdot\|_*$ denotes the nuclear norm, $\|\cdot\|_{2,1}$ denotes the sparse ($\ell_{2,1}$) norm, and $\alpha$ is a parameter that controls the weight;
The above formula can also be equivalently expressed as:
Figure BDA0003740215440000082
Briefly, the above equation is a constrained convex optimization problem, which is solved with the inexact augmented Lagrange multiplier (IALM) method. First, the equation is converted into an unconstrained problem. Second, the unconstrained problem is minimized using the following augmented Lagrangian function:
Figure BDA0003740215440000083
Two solutions are obtained from the above equation, denoted $X^*$ and $E^*$. A new drug-virus association matrix $A^*$ is then built from A and $X^*$:
$A^* = AX^*$
Step four: constructing a heterogeneous graph to predict potential virus-drug associations.
The new drug-virus association matrix, the integrated drug similarity and the integrated virus similarity are combined into a heterogeneous graph, from which the potential association probability of each drug-virus pair can be predicted. If there is no known association between a drug and a virus, their potential association probability is defined by the following formula:
Figure BDA0003740215440000084
In addition, the weights of the edges (the integrated drug similarity and the integrated virus similarity) are normalized according to the degrees of their endpoints. The normalization equations are as follows:
Figure BDA0003740215440000091
Figure BDA0003740215440000092
According to previous studies, this normalization helps convergence. An iterative method is then used to calculate the potential association probability between drugs and viruses. The iterative equation is as follows:
$P_{k+1} = a\,SV \times P_k \times SD + (1 - a)A^*$
Finally, the predicted virus-drug association score matrix is a matrix of nd rows and nv columns, stored in P. Virus-drug pairs are ranked by predicted score, and a higher score indicates a greater likelihood of association. From the ranking, the likelihood of association between particular viruses and particular drugs can be ordered. These predictions have considerable reference value: they allow the interactions between specific viruses and drugs to be investigated in a targeted manner in the biomedical field, which facilitates drug development and disease treatment.
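The ranking step can be illustrated with a short helper that lists the highest-scoring drugs for one virus among the pairs with no known association. The function name `top_candidates`, the placeholder drug names and the toy scores are hypothetical.

```python
import numpy as np

def top_candidates(P, A, virus_index, drug_names, k=10):
    """Rank drugs with no known association to the given virus by predicted score."""
    scores = P[:, virus_index]
    unknown = np.flatnonzero(A[:, virus_index] == 0)
    # Sort the unknown drugs by descending score; the stable sort keeps input order on ties.
    order = unknown[np.argsort(-scores[unknown], kind="stable")]
    return [(drug_names[i], float(scores[i])) for i in order[:k]]

# Toy data: 4 drugs x 2 viruses (illustrative values only).
A = np.array([[1, 0], [0, 0], [0, 1], [0, 0]])
P = np.array([[0.9, 0.1], [0.7, 0.2], [0.3, 0.8], [0.5, 0.4]])
drug_names = ["drug_a", "drug_b", "drug_c", "drug_d"]
ranking = top_candidates(P, A, virus_index=0, drug_names=drug_names, k=2)
```

Here `drug_a` is excluded for virus 0 because that association is already known, so the list starts with the best-scoring unknown pair.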
Evaluation method: global LOOCV, local LOOCV and five-fold cross-validation were used to evaluate the predictive performance of the proposed method. In LOOCV, each known virus-drug association is selected in turn as the test sample, and the remaining known virus-drug associations are used as training samples. For global LOOCV, all unknown virus-drug pairs are used as candidate samples. The model is trained with the training samples and then used to predict the scores of the test sample and the candidate samples, and the test sample is ranked against the candidate samples according to the predicted scores. This yields a ranking for every test sample. In local LOOCV, by contrast, the score of the test sample is ranked only against the scores of the candidate samples that involve the drug under investigation in the test sample, which again yields a ranking for every test sample. In five-fold cross-validation, the known virus-drug associations are randomly divided into five subsets; each subset in turn serves as the test samples, and the other four subsets serve as training samples. All unknown virus-drug pairs are treated as candidate samples, and the score of each test sample is ranked against the scores of the candidate samples, giving a ranking for all test samples. To avoid bias from the random partitioning, the five-fold cross-validation was repeated 100 times. Finally, ROC curves were plotted and the area under the ROC curve (AUC) was calculated to evaluate the predictive performance of the method.
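The global LOOCV protocol described above can be sketched as a loop over the known associations. This is a simplified illustration, not the evaluation code behind the reported numbers: `predict` stands for any function that maps a training adjacency matrix to a score matrix (for example the full pipeline of steps two to four), ties are resolved in favor of the test sample, and the AUC is summarized as the average fraction of candidates ranked below each test sample.

```python
import numpy as np

def global_loocv_ranks(A, predict):
    """Hold out each known association in turn and rank it among all unknown pairs."""
    A = np.asarray(A)
    candidates = (A == 0)                    # unknown pairs serve as candidates
    ranks = []
    for i, p in zip(*np.nonzero(A)):
        train = A.copy()
        train[i, p] = 0                      # hide the test association
        scores = predict(train)
        # Rank 1 means no candidate scored strictly higher than the test sample.
        ranks.append(1 + int(np.sum(scores[candidates] > scores[i, p])))
    return np.array(ranks), int(candidates.sum())

def auc_from_ranks(ranks, n_candidates):
    """Average fraction of candidate samples ranked below each test sample."""
    return float(np.mean(1.0 - (ranks - 1) / n_candidates))

# Toy check with a fixed score matrix standing in for a trained model.
A = np.array([[1, 0], [0, 1], [1, 0]])
fixed_scores = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]])
ranks, n_candidates = global_loocv_ranks(A, lambda train: fixed_scores)
auc = auc_from_ranks(ranks, n_candidates)
```

For local LOOCV, the same loop would restrict `candidates` to the row of the drug in the held-out pair.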
Evaluation results: in five-fold cross-validation, the AUC and standard deviation obtained by our method (MDHGIVDA) were 0.8299 ± 0.0037, while the results for the comparison methods HGIMDA, IMCMDA, KATCMDA and RLSMDA were 0.6996 ± 0.0022, 0.6808 ± 0.0040, 0.8228 ± 0.0023 and 0.6513 ± 0.0229, respectively. In global leave-one-out cross-validation, the results are shown in FIG. 2: the AUC obtained by MDHGIVDA is 0.8528, which is higher than 0.7084 for HGIMDA, 0.6902 for IMCMDA, 0.8247 for KATCMDA and 0.6849 for RLSMDA. In local leave-one-out cross-validation, the results are shown in FIG. 3: the AUC obtained by MDHGIVDA is 0.8532, which is higher than 0.7537 for HGIMDA, 0.7436 for IMCMDA, 0.8247 for KATCMDA and 0.6815 for RLSMDA.
Case study: a case study was used to further evaluate the predictive performance of our method (MDHGIVDA). Three viruses were chosen as representatives: Zika virus, the novel coronavirus (SARS-CoV-2) and HIV type 1. MDHGIVDA was run to predict the drugs associated with each of the three viruses. The candidate drugs were then ranked by prediction score, and the top 10 potentially associated drugs for each virus were validated by searching the literature on PubMed. The results are shown in Tables one, two and three. For Zika virus, the novel coronavirus and HIV type 1, 10, 8 and 8 of the top ten predicted drugs were verified, respectively.
Table one: top ten predicted drugs associated with Zika virus
Figure BDA0003740215440000101
Figure BDA0003740215440000111
Table two: top ten predicted drugs associated with the novel coronavirus
Figure BDA0003740215440000112
Figure BDA0003740215440000121
Table three: top ten predicted drugs associated with HIV type 1
Figure BDA0003740215440000122
Figure BDA0003740215440000131
The same or similar reference numerals correspond to the same or similar parts;
the terms describing positional relationships in the drawings are for illustrative purposes only and are not to be construed as limiting the patent;
it should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to list all embodiments exhaustively here. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.

Claims (1)

1. A virus-drug association prediction method based on matrix decomposition and heterogeneous graph reasoning is characterized by comprising the following steps:
step one: acquiring a known virus-drug association matrix;
a data set is built from known virus-drug association data, in the following form:
Figure FDA0003740215430000011
wherein $A(i, p)$ indicates whether drug $d_i$ and virus $v_p$ are associated: the value of $A(i, p)$ is 1 if they are associated and 0 otherwise,
step two: integrating the similarity matrices of viruses and drugs;
for drug similarity, the chemical structure similarity, the side effect similarity and the Gaussian interaction profile kernel similarity of the drugs are integrated to obtain the integrated drug similarity, and the calculation formula is as follows:
Figure FDA0003740215430000012
wherein SS1 is the chemical structure similarity of the drugs, SS2 is the side effect similarity of the drugs, GD is the Gaussian interaction profile kernel similarity of the drugs, and SD is the integrated drug similarity,
for virus similarity, the virus sequence similarity and the Gaussian interaction profile kernel similarity of the viruses are integrated to obtain the integrated virus similarity, and the formula is as follows:
Figure FDA0003740215430000013
wherein MV is the sequence similarity of the viruses, GV is the Gaussian interaction profile kernel similarity of the viruses, and SV is the integrated virus similarity,
step three: performing matrix decomposition on the virus-drug adjacency matrix and constructing a new virus-drug association matrix;
the known drug-virus adjacency matrix is decomposed into two parts, wherein the first part is the product of the original matrix and a low-rank matrix and the second part is a sparse matrix, and the decomposition formula is as follows,
$A = AX + E$ (3)
wherein A is the virus-drug association matrix, X is the low-rank matrix obtained by the decomposition, and E is the sparse matrix obtained by the decomposition;
then the low-rank matrix X is obtained using the nuclear norm and the sparse matrix E is obtained using the sparse ($\ell_{2,1}$) norm, so that equation (3) can be converted into the following equation:
Figure FDA0003740215430000021
wherein $\|\cdot\|_*$ denotes the nuclear norm, $\|\cdot\|_{2,1}$ denotes the sparse ($\ell_{2,1}$) norm, and $\alpha$ is a parameter that controls the weight;
equation (4) can also be equivalently expressed as:
Figure FDA0003740215430000022
first, equation (5) is converted into an unconstrained problem, and second, the unconstrained problem is minimized using the augmented Lagrangian function of the following equation (6),
Figure FDA0003740215430000023
wherein $Y_1$ and $Y_2$ are the Lagrange multipliers, $\mu$ is a penalty factor, and $\|\cdot\|_F$ denotes the Frobenius norm;
from equation (6), two solutions are obtained, denoted $X^*$ and $E^*$, and a new drug-virus association matrix $A^*$ is then built from A and $X^*$ as follows:
$A^* = AX^*$ (7)
step four: constructing a heterogeneous graph to predict potential virus-drug associations,
the new drug-virus association matrix constructed in step three is combined with the integrated drug similarity and the integrated virus similarity into a heterogeneous graph, the potential association probability of each drug-virus pair is predicted from the heterogeneous graph, and if there is no known association between a drug and a virus, the potential association probability matrix is defined by the following formula:
Figure FDA0003740215430000031
wherein nv is the number of viruses, nd is the number of drugs, $v_l$ is any one of the nv viruses, and $d_m$ is any one of the nd drugs,
in addition, the weights of the edges in the heterogeneous graph, i.e., the integrated drug similarity and the integrated virus similarity, are normalized according to the degrees of their endpoints, and the normalization equations are as follows:
Figure FDA0003740215430000032
Figure FDA0003740215430000033
wherein $v_l$ represents any one of the nv viruses, and $d_m$ represents any one of the nd drugs; further, an iterative method is used to calculate the potential association probability matrix between all drugs and all viruses, and the iterative equation is as follows:
$P_{k+1} = a\,SV \times P_{k} \times SD + (1-a)A^{*}$ (11)
wherein $P_{k+1}$ denotes the potential association probability matrix after k+1 iterations, and a denotes a penalty factor,
after the iteration is finished, the potential association probability matrix between all drugs and all viruses is the predicted virus-drug association score matrix; the score matrix is still a matrix of nd rows and nv columns; finally, virus-drug association prediction is carried out using the score matrix.
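The iteration of equation (11) can be sketched as below. This is an illustration under stated assumptions, not the patent's implementation. The equation images (8) to (10) are not reproduced in this text, so the sketch assumes a symmetric degree normalization (each similarity divided by the square roots of the degrees of its two endpoints) and starts the iteration from $A^{*}$. For equation (11) to be conformable as written, the sketch stores the matrix with viruses as rows and drugs as columns; if the matrix is stored with nd rows and nv columns, the roles of `SV` and `SD` swap. The function names and the default `a=0.4` are hypothetical.

```python
import numpy as np

def degree_normalize(S):
    """Assumed normalization: S[i, j] / sqrt(deg_i * deg_j), deg = row sums."""
    S = np.asarray(S, dtype=float)
    deg = S.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    return S * inv_sqrt[:, None] * inv_sqrt[None, :]

def heterogeneous_graph_inference(A_star, SV, SD, a=0.4, tol=1e-8, max_iter=1000):
    """Iterate P_{k+1} = a * SV @ P_k @ SD + (1 - a) * A_star until stable.

    A_star: (nv, nd) matrix from equation (7), viruses as rows (assumption).
    SV: (nv, nv) integrated virus similarity; SD: (nd, nd) drug similarity.
    """
    A_star = np.asarray(A_star, dtype=float)
    SVn = degree_normalize(SV)
    SDn = degree_normalize(SD)
    P = A_star.copy()  # assumed starting point; equation (8) is not reproduced
    for _ in range(max_iter):
        P_next = a * SVn @ P @ SDn + (1.0 - a) * A_star
        if np.abs(P_next - P).max() < tol:
            return P_next
        P = P_next
    return P

if __name__ == "__main__":
    # Toy inputs (values are made up for illustration)
    SV = np.array([[1.0, 0.5, 0.2],
                   [0.5, 1.0, 0.3],
                   [0.2, 0.3, 1.0]])
    SD = np.array([[1.0, 0.4],
                   [0.4, 1.0]])
    A_star = np.array([[1.0, 0.0],
                       [0.0, 1.0],
                       [0.2, 0.1]])
    scores = heterogeneous_graph_inference(A_star, SV, SD)
    print(np.round(scores, 4))
```

With non-negative symmetric similarity matrices and 0 < a < 1, this update is a contraction, so the limit does not depend on the starting matrix. Ranking the unknown virus-drug pairs by their entries in the returned matrix gives the candidate associations.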
CN202210813511.XA 2022-07-11 2022-07-11 Virus-drug association prediction method based on matrix decomposition and heterogeneous graph reasoning Pending CN115116580A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210813511.XA CN115116580A (en) 2022-07-11 2022-07-11 Virus-drug association prediction method based on matrix decomposition and heterogeneous graph reasoning

Publications (1)

Publication Number Publication Date
CN115116580A true CN115116580A (en) 2022-09-27

Family

ID=83332284

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210813511.XA Pending CN115116580A (en) 2022-07-11 2022-07-11 Virus-drug association prediction method based on matrix decomposition and heterogeneous graph reasoning

Country Status (1)

Country Link
CN (1) CN115116580A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115966252A (en) * 2023-02-12 2023-04-14 汤永 Antiviral drug screening method based on L1norm diagram
CN115966252B (en) * 2023-02-12 2024-01-19 中国人民解放军总医院 Antiviral drug screening method based on L1norm diagram
CN116230077A (en) * 2023-02-20 2023-06-06 汤永 Antiviral drug screening method based on restarting hypergraph double random walk
CN116230077B (en) * 2023-02-20 2024-01-26 中国人民解放军总医院 Antiviral drug screening method based on restarting hypergraph double random walk
CN116453710B (en) * 2023-06-14 2023-09-22 中国地质大学(武汉) Drug side effect prediction method and device, electronic equipment and storage medium
CN116798545A (en) * 2023-08-21 2023-09-22 中国人民解放军总医院 Antiviral drug screening method, system and storage medium based on non-negative matrix
CN116798545B (en) * 2023-08-21 2023-11-14 中国人民解放军总医院 Antiviral drug screening method, system and storage medium based on non-negative matrix

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination