CN116759015A - Antiviral drug screening method and system based on hypergraph matrix tri-decomposition - Google Patents
- Publication number
- CN116759015A (application number CN202311050747.3A)
- Authority
- CN
- China
- Prior art keywords
- matrix
- virus
- drug
- similarity matrix
- similarity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/50—Molecular design, e.g. of drugs
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
Abstract
The application provides an antiviral drug screening method and system based on hypergraph matrix tri-decomposition, belonging to the technical field at the intersection of bioinformatics, computational biology and artificial intelligence. The method is implemented by the system and comprises the following steps: S1, constructing an adjacency matrix of virus-drug associations; S2, calculating a virus Gaussian distance similarity matrix and a drug Gaussian distance similarity matrix; S3, calculating a virus gene sequence similarity matrix and a drug chemical structure similarity matrix; S4, integrating these matrices with a fast kernel learning method to obtain a virus integration similarity matrix and a drug integration similarity matrix; S5, obtaining a virus kernel matrix and a drug kernel matrix with a spectrum shift method; S6, constructing a loss function using matrix tri-decomposition based on hypergraph learning, and solving it to obtain a virus-drug prediction score matrix; S7, screening and ranking based on the virus-drug prediction score matrix to obtain the final prediction result. The application can screen out effective therapeutic drugs for a virus quickly and efficiently.
Description
Technical Field
The application relates to the technical field at the intersection of bioinformatics, computational biology and artificial intelligence, and in particular to an antiviral drug screening method and system based on hypergraph matrix tri-decomposition.
Background
Developing new drugs from scratch faces many difficulties, such as long development time and high risk, so a new drug discovery strategy is needed. Using computational methods to discover, in batches, therapeutic effects of existing drugs outside their original medical indications can greatly reduce time and labor costs.
The reported computational screening methods for antiviral drugs fall into the following three types, each with its own disadvantages: (1) molecular docking methods simulate the interaction between a ligand and a target protein at the atomic level, but involve many parameters and demand considerable experience; (2) dynamics simulation methods confirm stable ligand-protein interactions by thermodynamic means, but require heavy computation and demanding hardware; (3) deep learning models predict antiviral drugs, but their predictions are poorly interpretable. With the gradual development of data mining technology, related machine learning methods can integrate and mine the varied information in biomedical databases and can complete drug screening tasks efficiently and accurately.
Disclosure of Invention
The application provides an antiviral drug screening method and system based on hypergraph matrix tri-decomposition, which can accurately and efficiently screen antiviral drugs according to virus-drug associations, virus genome sequences and drug chemical structure data.
The first aspect of the embodiments of the present specification discloses an antiviral drug screening method based on hypergraph matrix tri-decomposition, comprising the steps of:
S1, constructing an adjacency matrix of virus-drug associations;
S2, calculating a virus Gaussian distance similarity matrix and a drug Gaussian distance similarity matrix based on the adjacency matrix of virus-drug associations;
S3, calculating a virus gene sequence similarity matrix based on virus genome sequences, and calculating a drug chemical structure similarity matrix based on drug chemical structures;
S4, based on the virus Gaussian distance similarity matrix and the virus gene sequence similarity matrix, integrating with a fast kernel learning method to obtain a virus integration similarity matrix; based on the drug Gaussian distance similarity matrix and the drug chemical structure similarity matrix, integrating with the fast kernel learning method to obtain a drug integration similarity matrix;
S5, processing the virus integration similarity matrix and the drug integration similarity matrix with a spectrum shift method to obtain a virus kernel matrix and a drug kernel matrix;
S6, constructing a loss function using matrix tri-decomposition based on hypergraph learning, and solving it to obtain a virus-drug prediction score matrix;
S7, based on the virus-drug prediction score matrix, extracting the scores in the row of the target virus and ranking them to obtain the final prediction result.
In the embodiments disclosed in the present specification, in S1:
known virus-drug association pairs are input to construct the adjacency matrix A of virus-drug associations;
if a virus-drug pair is a known association, the corresponding position is 1; otherwise it is 0;
the number of rows of the adjacency matrix A is the number of viruses nv, and the number of columns is the number of drugs nd.
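By way of illustration only, the construction in S1 can be sketched in Python with NumPy. The function name `build_adjacency` and the toy virus and drug identifiers are hypothetical and do not come from the application's dataset.

```python
import numpy as np

def build_adjacency(pairs, viruses, drugs):
    """Return the nv x nd 0/1 matrix A with A[i, j] = 1 for each known pair."""
    v_index = {v: i for i, v in enumerate(viruses)}
    d_index = {d: j for j, d in enumerate(drugs)}
    A = np.zeros((len(viruses), len(drugs)), dtype=int)
    for virus, drug in pairs:
        A[v_index[virus], d_index[drug]] = 1
    return A

# Hypothetical toy data: 3 viruses, 4 drugs, 4 known associations.
viruses = ["V1", "V2", "V3"]
drugs = ["D1", "D2", "D3", "D4"]
pairs = [("V1", "D1"), ("V1", "D3"), ("V2", "D3"), ("V3", "D4")]
A = build_adjacency(pairs, viruses, drugs)
```

Rows index viruses and columns index drugs, matching the nv × nd layout described above.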
In the embodiments disclosed in the present specification, in S2:
if drug d(i) is associated with a given virus, the corresponding position is marked 1, otherwise 0, forming a 0/1 vector of size 1×nv, denoted the interaction profile IP(d(i)) of drug d(i); the Gaussian distance similarity between drug d(i) and drug d(j) is then calculated:
$$S_1^d\big(d(i), d(j)\big) = \exp\left(-\gamma_d \,\lVert IP(d(i)) - IP(d(j)) \rVert^2\right);$$
in the above formula, the parameter $\gamma_d$ controls the kernel bandwidth and is obtained by normalizing a new bandwidth parameter $\gamma'_d$:
$$\gamma_d = \gamma'_d \Big/ \left(\frac{1}{nd} \sum_{i=1}^{nd} \lVert IP(d(i)) \rVert^2\right);$$
the Gaussian distance similarity between viruses v(i) and v(j) is defined in a similar manner: a 0/1 vector of size 1×nd is obtained, denoted the interaction profile IP(v(i)) of virus v(i), and the Gaussian distance similarity between viruses v(i) and v(j) is calculated:
$$S_1^v\big(v(i), v(j)\big) = \exp\left(-\gamma_v \,\lVert IP(v(i)) - IP(v(j)) \rVert^2\right);$$
the parameter $\gamma_v$ controls the kernel bandwidth and is obtained by normalizing a new bandwidth parameter $\gamma'_v$:
$$\gamma_v = \gamma'_v \Big/ \left(\frac{1}{nv} \sum_{i=1}^{nv} \lVert IP(v(i)) \rVert^2\right);$$
$\gamma'_d$ and $\gamma'_v$ above are constants.
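The calculation in S2 can be sketched as follows. This is a minimal sketch that assumes the commonly used Gaussian interaction profile kernel, exp(-γ‖IP(i) - IP(j)‖²) with γ equal to γ' divided by the mean squared norm of the profiles; the function name and the toy matrix are hypothetical.

```python
import numpy as np

def gaussian_profile_similarity(profiles, gamma_prime=1.0):
    """Gaussian distance similarity between the rows of a 0/1 profile matrix.

    Assumed form: S[i, j] = exp(-gamma * ||p_i - p_j||^2), where
    gamma = gamma_prime / mean_i ||p_i||^2.
    """
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = np.sum(profiles ** 2, axis=1)
    mean_sq = sq_norms.mean()
    if mean_sq == 0:
        raise ValueError("all interaction profiles are zero")
    gamma = gamma_prime / mean_sq
    # ||p_i - p_j||^2 = ||p_i||^2 + ||p_j||^2 - 2 <p_i, p_j>
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against rounding below zero
    return np.exp(-gamma * sq_dists)

# Hypothetical 3-virus x 4-drug adjacency matrix.
A = np.array([[1, 0, 1, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1]])
S_virus = gaussian_profile_similarity(A)    # virus profiles are the rows of A
S_drug = gaussian_profile_similarity(A.T)   # drug profiles are the columns of A
```

Both results are symmetric with ones on the diagonal and all entries in (0, 1].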
In the embodiments disclosed in the present specification, in S3:
a virus gene sequence similarity matrix is calculated from the virus genome sequences using a multiple sequence alignment method;
based on the chemical structure of each drug, the MACCS fingerprint of the drug is obtained, and the drug chemical structure similarity matrix is calculated using the Tanimoto coefficient (i.e., Jaccard similarity).
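As an illustration of the Tanimoto (Jaccard) calculation, the sketch below operates on binary fingerprint vectors that are assumed to have been computed already. The 5-bit fingerprints are hypothetical stand-ins (real MACCS fingerprints are far longer), and treating two all-zero fingerprints as identical is a convention chosen here, not something stated in the application.

```python
import numpy as np

def tanimoto_matrix(fingerprints):
    """Pairwise Tanimoto similarity |a AND b| / |a OR b| for binary fingerprints."""
    fp = np.asarray(fingerprints, dtype=bool)
    inter = (fp[:, None, :] & fp[None, :, :]).sum(axis=2)
    counts = fp.sum(axis=1)
    union = counts[:, None] + counts[None, :] - inter
    # Convention (assumption): two all-zero fingerprints get similarity 1.
    return np.where(union > 0, inter / np.maximum(union, 1), 1.0)

# Hypothetical 5-bit fingerprints for three drugs.
fps = [[1, 1, 0, 0, 1],
       [1, 0, 0, 0, 1],
       [0, 0, 1, 1, 0]]
T = tanimoto_matrix(fps)
```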
In the embodiments disclosed in the present specification, in S4:
the semidefinite programming formulation of the fast kernel learning method is as follows:
$$\min_{\lambda_v} \left\lVert \sum_{j=1}^{2} \lambda_{v,j} S_j^v - A A^{T} \right\rVert_F^2 + \mu_v \lVert \lambda_v \rVert^2, \quad \text{s.t. } \sum_{j=1}^{2} \lambda_{v,j} = 1;$$
where the first term is a reconstruction loss norm term representing the magnitude of the integration error of the similarity matrices, and the second term is a regularization term used to avoid overfitting; A is the virus-drug association adjacency matrix, $S_j^v$ (j = 1, 2) denote the virus Gaussian distance similarity matrix and the virus gene sequence similarity matrix, respectively, $\mu_v$ is the regularization parameter, and $\lambda_v \in R^{1 \times 2}$ is the coefficient vector to be solved; the virus integration similarity matrix is obtained from $\lambda_v$:
$$S_v = \sum_{j=1}^{2} \lambda_{v,j} S_j^v;$$
similarly, the integration parameter $\lambda_d \in R^{1 \times 2}$ of the drug Gaussian distance similarity matrix and the drug chemical structure similarity matrix is obtained in the same way, and the drug integration similarity matrix is then calculated:
$$S_d = \sum_{j=1}^{2} \lambda_{d,j} S_j^d;$$
where $S_j^d$ (j = 1, 2) denote the drug Gaussian distance similarity matrix and the drug chemical structure similarity matrix, respectively.
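For two similarity matrices the weighting problem in S4 reduces to a one-dimensional quadratic, which the following sketch solves in closed form. It is not the application's own solver. It assumes the objective ‖λ₁S₁ + λ₂S₂ - T‖_F² + μ‖λ‖² with λ₁ + λ₂ = 1, adds a nonnegativity constraint on the weights, and takes the target T to be A Aᵀ on the virus side; the function name and toy matrices are hypothetical.

```python
import numpy as np

def fkl_two_kernels(S1, S2, target, mu=1.0):
    """Weights (t, 1 - t) minimizing ||t*S1 + (1-t)*S2 - target||_F^2 + mu*(t^2 + (1-t)^2).

    The weights are clipped to [0, 1]; because the objective is a convex
    quadratic in t, clipping the unconstrained minimizer is optimal.
    """
    D = S1 - S2
    E = S2 - target
    denom = np.sum(D * D) + 2.0 * mu
    # Stationary point of the quadratic in t; fall back to equal weights if degenerate.
    t = 0.5 if denom == 0 else (mu - np.sum(D * E)) / denom
    t = float(np.clip(t, 0.0, 1.0))
    lam = np.array([t, 1.0 - t])
    return lam, lam[0] * S1 + lam[1] * S2

# Hypothetical toy inputs for three viruses.
A = np.array([[1, 0, 1, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
target = A @ A.T                      # assumed "ideal" virus similarity
S1 = np.array([[1.0, 0.5, 0.1],
               [0.5, 1.0, 0.2],
               [0.1, 0.2, 1.0]])      # stand-in for the Gaussian distance similarity
S2 = np.eye(3)                        # stand-in for the sequence similarity
lam_v, S_v = fkl_two_kernels(S1, S2, target, mu=1.0)
```

The drug side would use the same function with the two drug similarity matrices and, under the same assumption, the target Aᵀ A.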
In the embodiments disclosed in the present specification, in S5:
the virus integration similarity matrix $S_v$ is processed with the spectrum shift method to obtain the virus kernel matrix $K^*_v$, and the drug integration similarity matrix $S_d$ is processed with the spectrum shift method to obtain the drug kernel matrix $K^*_d$. The specific calculation is $K^* = U(\Lambda + |\lambda_{\min}(S)|\, I)U^{T}$, where $S = U \Lambda U^{T}$ is the eigendecomposition of the virus or drug integration similarity matrix S, U is an orthogonal matrix, $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$ is the diagonal matrix of real eigenvalues, $\lambda_{\min}(S)$ is the minimum eigenvalue of the input matrix S, I is the identity matrix, and T denotes the matrix transpose;
the purpose of processing the $S_v$ and $S_d$ matrices with the spectrum shift method is to strengthen the self-similarity of $S_v$ and $S_d$ without changing the similarity between any two samples.
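The effect described above, larger self-similarity with unchanged pairwise similarity, can be checked numerically. The sketch below assumes the shift amount is the absolute value of the minimum eigenvalue; the function name and the indefinite toy matrix are hypothetical.

```python
import numpy as np

def spectrum_shift(S):
    """Shift every eigenvalue of a symmetric matrix by |lambda_min(S)| (assumed form)."""
    S = (S + S.T) / 2.0                    # enforce exact symmetry
    eigvals, U = np.linalg.eigh(S)         # S = U diag(eigvals) U^T
    shift = abs(eigvals.min())
    return U @ np.diag(eigvals + shift) @ U.T

# Hypothetical similarity matrix that is symmetric but not positive semidefinite.
S = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.9],
              [0.1, 0.9, 1.0]])
K = spectrum_shift(S)
```

Because the shift adds a multiple of the identity, only the diagonal of S changes, and the resulting matrix has no negative eigenvalues.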
In the embodiments disclosed in the present specification, in S6:
a loss function is constructed using matrix tri-decomposition based on hypergraph learning and solved to obtain the virus-drug prediction score matrix;
first, the objective function is constructed using matrix tri-decomposition based on hypergraph learning as follows:
;
where the input is the known virus-drug association matrix and F is the prediction score matrix to be solved; J denotes the value of the expression on the right-hand side; min denotes minimization; the objective involves a projection matrix, where $r_{nv}$ and $r_{nd}$ denote the dimension of the virus latent feature space and the dimension of the drug latent feature space, respectively; $\lVert \cdot \rVert_F$ denotes the Frobenius norm of a matrix; $tr(\cdot)$ denotes the trace of a matrix; and $\lambda_1$ and $\lambda_2$ are regularization coefficients;
based on the virus kernel matrix $K^*_v$, a low-rank approximation matrix W is obtained by singular value decomposition (SVD); the hypergraph normalized Laplacian matrix of the viruses is calculated as $I - D_{vv}^{-1/2} H_v A_v D_{ve}^{-1} H_v^{T} D_{vv}^{-1/2}$, where I is an identity matrix, $D_{vv}$ and $D_{ve}$ are the diagonal matrices of the vertex degrees and hyperedge degrees of the virus hypergraph, respectively, $H_v$ is the incidence matrix of the virus hypergraph, and $A_v$ is the diagonal matrix of virus hyperedge weights; based on the drug kernel matrix $K^*_d$, a low-rank approximation matrix H is obtained by singular value decomposition (SVD); the hypergraph normalized Laplacian matrix of the drugs is calculated as $I - D_{dv}^{-1/2} H_d A_d D_{de}^{-1} H_d^{T} D_{dv}^{-1/2}$, where I is an identity matrix, $D_{dv}$ and $D_{de}$ are the diagonal matrices of the vertex degrees and hyperedge degrees of the drug hypergraph, respectively, $H_d$ is the incidence matrix of the drug hypergraph, and $A_d$ is the diagonal matrix of drug hyperedge weights;
taking the derivative of the objective function and setting it to zero yields a Sylvester equation in the projection matrix, in which an identity matrix appears and the matrices P, Q and C are intermediate process variables without special meaning; the projection matrix is obtained from the general solution of the Sylvester equation, and the virus-drug prediction score matrix is then calculated.
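Two building blocks of S6 can be illustrated generically: the normalized hypergraph Laplacian and the solution of a Sylvester equation. This is a sketch under stated assumptions. The Laplacian uses the standard form I - Dv^(-1/2) H W De^(-1) Hᵀ Dv^(-1/2); the Sylvester equation is taken in the generic form P X + X Q = C and solved through its Kronecker-product linearization. How the application builds its hyperedges and how it defines P, Q and C are not recoverable from this text, so the incidence matrix and the matrices below are hypothetical inputs.

```python
import numpy as np

def hypergraph_laplacian(H, w=None):
    """Normalized hypergraph Laplacian from an incidence matrix.

    H[i, e] = 1 if vertex i belongs to hyperedge e; w holds hyperedge weights.
    Every vertex and every hyperedge must have nonzero degree.
    """
    H = np.asarray(H, dtype=float)
    n_vertices, n_edges = H.shape
    w = np.ones(n_edges) if w is None else np.asarray(w, dtype=float)
    dv = H @ w                 # vertex degrees
    de = H.sum(axis=0)         # hyperedge degrees
    dv_inv_sqrt = np.diag(1.0 / np.sqrt(dv))
    theta = dv_inv_sqrt @ H @ np.diag(w / de) @ H.T @ dv_inv_sqrt
    return np.eye(n_vertices) - theta

def solve_sylvester_kron(P, Q, C):
    """Solve P X + X Q = C using vec(PX + XQ) = (I (x) P + Q^T (x) I) vec(X)."""
    m, n = C.shape
    M = np.kron(np.eye(n), P) + np.kron(Q.T, np.eye(m))
    x = np.linalg.solve(M, C.reshape(-1, order="F"))   # column-major vec
    return x.reshape((m, n), order="F")

# Hypothetical hypergraph: 4 vertices, 3 hyperedges.
H_inc = np.array([[1, 0, 1],
                  [1, 1, 0],
                  [0, 1, 1],
                  [1, 0, 0]])
L = hypergraph_laplacian(H_inc)

# Hypothetical Sylvester problem with a 2 x 3 unknown.
P = np.array([[2.0, 1.0], [1.0, 3.0]])
Q = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 1.0], [0.0, 1.0, 2.0]])
C = np.arange(6, dtype=float).reshape(2, 3)
X = solve_sylvester_kron(P, Q, C)
```

The Kronecker approach is only practical for small latent dimensions, since the linear system has (m·n)² entries; a dedicated Sylvester solver would be preferable at scale.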
In the embodiments disclosed in the present specification, in S7:
according to the prediction scores of the virus-drug association pairs, the scores in the row of the target virus are extracted and ranked to obtain the final prediction result.
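The screening and ranking in S7 amounts to selecting one row of the score matrix and sorting it in descending order. A minimal sketch, with a hypothetical function name and toy scores:

```python
def rank_drugs(score_matrix, viruses, drugs, target_virus, top_k=None):
    """Return (drug, score) pairs for the target virus, highest score first."""
    row = score_matrix[viruses.index(target_virus)]
    ranked = sorted(zip(drugs, row), key=lambda item: item[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]

# Hypothetical 2-virus x 3-drug prediction score matrix.
F = [[0.9, 0.1, 0.4],
     [0.2, 0.8, 0.3]]
top = rank_drugs(F, ["V1", "V2"], ["D1", "D2", "D3"], "V2", top_k=2)
```

In practice, drugs already known to be associated with the target virus may be excluded before ranking so that only new candidates are reported; that filtering step is an optional addition, not described in the text above.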
The second aspect of the embodiments of the application discloses an antiviral drug screening system based on hypergraph matrix tri-decomposition, which comprises:
an adjacency matrix construction module, used for constructing an adjacency matrix of virus-drug associations;
a Gaussian distance similarity matrix calculation module, used for calculating a virus Gaussian distance similarity matrix and a drug Gaussian distance similarity matrix based on the adjacency matrix of virus-drug associations;
a virus gene sequence similarity matrix and drug chemical structure similarity matrix calculation module, used for calculating a virus gene sequence similarity matrix based on virus genome sequences and calculating a drug chemical structure similarity matrix based on drug chemical structures;
an integration similarity matrix calculation module, used for integrating with a fast kernel learning method, based on the virus Gaussian distance similarity matrix and the virus gene sequence similarity matrix, to obtain a virus integration similarity matrix, and for integrating with the fast kernel learning method, based on the drug Gaussian distance similarity matrix and the drug chemical structure similarity matrix, to obtain a drug integration similarity matrix;
a loss function construction module, used for processing the virus integration similarity matrix and the drug integration similarity matrix with a spectrum shift method and constructing a loss function using matrix tri-decomposition based on hypergraph learning;
a loss function solving module, used for solving the loss function to obtain a virus-drug prediction score matrix;
and a prediction module, used for extracting the scores in the row of the target virus based on the virus-drug prediction score matrix and ranking them to obtain the final prediction result.
In an embodiment disclosed in the present specification, the antiviral drug screening system based on hypergraph matrix tri-decomposition further includes:
a processor, connected respectively to the adjacency matrix construction module, the Gaussian distance similarity matrix calculation module, the virus gene sequence similarity matrix and drug chemical structure similarity matrix calculation module, the integration similarity matrix calculation module, the loss function construction module, the loss function solving module and the prediction module;
a memory coupled to the processor and storing a computer program executable on the processor;
when the processor executes the computer program, the processor controls the adjacency matrix construction module, the Gaussian distance similarity matrix calculation module, the virus gene sequence similarity matrix and drug chemical structure similarity matrix calculation module, the integration similarity matrix calculation module, the loss function construction module, the loss function solving module and the prediction module to work, so as to implement the antiviral drug screening method based on hypergraph matrix tri-decomposition.
In summary, the application has at least the following advantages:
The application constructs an adjacency matrix of virus-drug associations and calculates a virus Gaussian distance similarity matrix and a drug Gaussian distance similarity matrix from it; calculates a virus gene sequence similarity matrix from virus genome sequences and a drug chemical structure similarity matrix from the chemical structure information of the drugs; calculates a virus integration similarity matrix and a drug integration similarity matrix using a fast kernel learning method; and constructs a loss function by combining a spectrum shift method with matrix tri-decomposition based on hypergraph learning, solves it to obtain a virus-drug association prediction score matrix, and screens and ranks the scores to obtain the final result. The application can quickly and efficiently screen out effective therapeutic drugs for viruses, overcomes the long duration and high cost of biomedical experimental methods, and offers an approach for emergency solutions under specific conditions.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of the steps of the antiviral drug screening method based on hypergraph matrix tri-decomposition according to the present application.
FIG. 2 is a flow chart of the antiviral drug screening method based on hypergraph matrix tri-decomposition according to the present application.
FIG. 3 is a graph comparing the five-fold cross-validation results of the antiviral drug screening method based on hypergraph matrix tri-decomposition according to the present application with those of the baseline methods.
FIG. 4 is a schematic diagram of the antiviral drug screening system based on hypergraph matrix tri-decomposition according to the present application.
Detailed Description
Hereinafter, only certain exemplary embodiments are briefly described. As will be recognized by those of skill in the pertinent art, the described embodiments may be modified in numerous different ways without departing from the spirit or scope of the embodiments of the present application. Accordingly, the drawings and description are to be regarded as illustrative in nature and not as restrictive.
The following disclosure provides many different implementations, or examples, for implementing different configurations of embodiments of the application. In order to simplify the disclosure of embodiments of the present application, components and arrangements of specific examples are described below. Of course, they are merely examples and are not intended to limit embodiments of the present application. Furthermore, embodiments of the present application may repeat reference numerals and/or letters in the various examples, which are for the purpose of brevity and clarity, and which do not themselves indicate the relationship between the various embodiments and/or arrangements discussed.
Embodiments of the present application will be described in detail below with reference to the accompanying drawings.
It should be noted that the known human drug-virus association data used in the embodiments of this specification were collected from the relevant literature: experimentally verified drug-virus interaction pairs reported in the literature were first collated using text mining techniques, yielding 455 confirmed human virus-drug interactions involving 34 viruses and 219 drugs (literature DOI: 10.1016/j.asoc.2021.107135); the drug chemical structures were downloaded from the DrugBank database, and the virus genome nucleotide sequences were obtained from the database of the National Center for Biotechnology Information (NCBI).
As shown in fig. 1 and 2, a first aspect of an embodiment of the present specification discloses an antiviral drug screening method based on hypergraph matrix tri-decomposition, including the steps of:
s1, constructing an adjacency matrix of virus-drug association.
Known virus-drug association pairs are input to construct the adjacency matrix A of virus-drug associations:
$$A_{ij} = \begin{cases} 1, & \text{if virus } v(i) \text{ is known to be associated with drug } d(j) \\ 0, & \text{otherwise} \end{cases};$$
the elements of the resulting adjacency matrix A are 0 or 1, its size is 34 rows × 219 columns, and the indices satisfy 1 ≤ i ≤ 34 and 1 ≤ j ≤ 219.
S2, calculating a virus Gaussian distance similarity matrix and a drug Gaussian distance similarity matrix based on the adjacency matrix of virus-drug associations.
For a drug d(i), the position corresponding to each virus is marked as 1 if the drug is associated with that virus and 0 otherwise, forming a 1×34 vector of 0s and 1s, recorded as the vector spectrum IP(d(i)) of drug d(i). The Gaussian distance similarity between drugs d(i) and d(j) is then calculated:
$$S_1^d(d(i),d(j)) = \exp\left(-\gamma_d \lVert IP(d(i)) - IP(d(j)) \rVert^2\right);$$
in the above formula, IP(d(j)) is the vector spectrum of drug d(j); the parameter γ_d controls the kernel bandwidth and is obtained by normalizing a new bandwidth parameter γ'_d:
$$\gamma_d = \gamma'_d \Big/ \left(\frac{1}{nd}\sum_{i=1}^{nd} \lVert IP(d(i)) \rVert^2\right);$$
The Gaussian distance similarity between viruses v(i) and v(j) is defined in a similar manner. For a virus v(i), the position corresponding to each drug is marked as 1 if the virus is associated with that drug and 0 otherwise, forming a 1×219 vector of 0s and 1s, recorded as the vector spectrum IP(v(i)) of virus v(i). The Gaussian distance similarity between viruses v(i) and v(j) is then calculated:
$$S_1^v(v(i),v(j)) = \exp\left(-\gamma_v \lVert IP(v(i)) - IP(v(j)) \rVert^2\right);$$
in the above formula, IP(v(j)) is the vector spectrum of virus v(j); the parameter γ_v controls the kernel bandwidth and is obtained by normalizing a new bandwidth parameter γ'_v:
$$\gamma_v = \gamma'_v \Big/ \left(\frac{1}{nv}\sum_{i=1}^{nv} \lVert IP(v(i)) \rVert^2\right);$$
γ'_d and γ'_v above are both constants; here γ'_d = γ'_v = 1.
Here nv denotes the number of viruses (34 in this example) and nd denotes the number of drugs (219 in this example). The calculation yields a 34×34 symmetric matrix S_1^v (the virus Gaussian distance similarity matrix) and a 219×219 symmetric matrix S_1^d (the drug Gaussian distance similarity matrix); the elements of both matrices lie between 0 and 1.
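Step S2 can be sketched as follows. This is a minimal illustration that assumes the usual Gaussian kernel form exp(−γ‖IP(x) − IP(y)‖²), with γ obtained by dividing γ' by the mean squared norm of the profiles; the function names and the toy matrix are hypothetical.

```python
import math


def gaussian_similarity(profiles, gamma_prime=1.0):
    """Gaussian distance similarity between the rows of a 0/1 profile matrix."""
    n = len(profiles)
    # Bandwidth normalisation: divide gamma' by the mean squared profile norm.
    mean_sq_norm = sum(sum(x * x for x in row) for row in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all profiles are empty; the bandwidth is undefined")
    gamma = gamma_prime / mean_sq_norm
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            S[i][j] = math.exp(-gamma * sq_dist)
    return S


def transpose(M):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*M)]


# Hypothetical 3-virus x 4-drug adjacency matrix.
A_toy = [[1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
S1_v = gaussian_similarity(A_toy)             # rows of A are virus profiles
S1_d = gaussian_similarity(transpose(A_toy))  # columns of A are drug profiles
```

Two drugs with identical profiles (the first and third columns of the toy matrix) receive similarity 1, and every entry lies in (0, 1], which matches the value range stated above.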
S3, calculating a virus gene sequence similarity matrix based on the viral genome sequences, and calculating a drug chemical structure similarity matrix based on the drug chemical structures.
The viral genome sequences are input, and the virus gene sequence similarity matrix S_2^v is calculated using the multiple sequence alignment tool MAFFT. The chemical structures of the drugs, represented as SMILES codes, are input; the Molecular ACCess System (MACCS) fingerprint of each drug is obtained using the cheminformatics software RDKit or Open Babel, and the Tanimoto similarity is calculated using the R package RxnSim to obtain the drug chemical structure similarity matrix S_2^d. Specifically, for two drugs d(i) and d(j), the sets of strings in the binary representation of their MACCS fragments are denoted D(i) and D(j), respectively, and the similarity S_ij^d between them can be calculated using the following formula:
$$S^d_{ij} = \frac{\lvert D(i) \cap D(j) \rvert}{\lvert D(i) \cup D(j) \rvert};$$
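The Tanimoto calculation in S3 can be sketched without any cheminformatics library by representing each fingerprint as the set of its on-bit positions. The fingerprints below are hypothetical and are not real MACCS keys; in practice they would come from a tool such as RDKit or Open Babel, as stated above.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity of two fingerprints given as collections of on-bit positions."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def tanimoto_matrix(fingerprints):
    """Pairwise Tanimoto similarity matrix for a list of fingerprints."""
    n = len(fingerprints)
    return [[tanimoto(fingerprints[i], fingerprints[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical on-bit positions for three drugs.
fps = [{1, 5, 9, 20}, {1, 5, 20, 33}, {2, 7}]
S2_d = tanimoto_matrix(fps)
```

The first two toy fingerprints share 3 of 5 distinct on bits, so their similarity is 3/5.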
S4, integrating the virus Gaussian distance similarity matrix and the virus gene sequence similarity matrix using a fast kernel learning method to obtain a virus integrated similarity matrix; and integrating the drug Gaussian distance similarity matrix and the drug chemical structure similarity matrix using the fast kernel learning method to obtain a drug integrated similarity matrix.
The virus gene sequence similarity matrix and the virus Gaussian distance similarity matrix are integrated using the fast kernel learning method, specifically by solving the following semidefinite programming problem:
;
wherein the first term is a reconstruction loss norm term, representing the magnitude of the integration error of the similarity matrices; the second term is a regularization term, used to avoid overfitting; A is the virus-drug association adjacency matrix; S_j^v (j=1, 2) denote the virus Gaussian distance similarity matrix and the virus gene sequence similarity matrix, respectively; μ_v is a regularization parameter; and λ_v ∈ R^{1×2} is the coefficient vector to be solved. The problem is solved using the CVX toolbox in Matlab, giving the virus integrated similarity matrix:
$$S_v = \sum_{j=1}^{2} \lambda_{v,j}\, S_j^v;$$
similarly, the integration coefficients λ_d ∈ R^{1×2} for the drug chemical structure similarity matrix and the drug Gaussian distance similarity matrix are obtained as above, and the drug integrated similarity matrix is then calculated:
$$S_d = \sum_{j=1}^{2} \lambda_{d,j}\, S_j^d;$$
wherein S_j^d (j=1, 2) denote the drug Gaussian distance similarity matrix and the drug chemical structure similarity matrix, respectively.
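Once the coefficients have been solved, the integration step in S4 reduces to combining the two similarity matrices. The sketch below assumes that the integrated matrix is the coefficient-weighted sum of the inputs; the weights shown are hypothetical placeholders and are not values produced by the fast kernel learning problem, which is not solved here.

```python
def integrate(similarities, weights):
    """Weighted sum of equally sized square similarity matrices."""
    if len(similarities) != len(weights):
        raise ValueError("need exactly one weight per similarity matrix")
    n = len(similarities[0])
    return [[sum(w * S[i][j] for S, w in zip(similarities, weights))
             for j in range(n)]
            for i in range(n)]


# Hypothetical 2x2 inputs: a Gaussian distance similarity and a second similarity.
S1 = [[1.0, 0.2], [0.2, 1.0]]
S2 = [[1.0, 0.6], [0.6, 1.0]]
lam = [0.25, 0.75]  # hypothetical coefficients, assumed already solved
S_int = integrate([S1, S2], lam)
```

When the weights sum to 1, as in this toy example, the diagonal of the integrated matrix stays at 1.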
S5, processing the virus integrated similarity matrix and the drug integrated similarity matrix using the spectrum shift method to obtain the corresponding kernel matrices K*_v and K*_d.
The specific calculation is as follows: the virus or drug integrated similarity matrix S is eigendecomposed, where U is an orthogonal matrix, Λ is a diagonal matrix of real eigenvalues, i.e. Λ = diag(λ_1, λ_2, …, λ_n), λ_min(S) denotes the minimum eigenvalue of the input matrix S, I denotes the identity matrix, and T denotes the matrix transpose. The spectrum shift method is applied to the virus integrated similarity matrix S_v and the drug integrated similarity matrix S_d in order to strengthen the self-similarity within S_v and S_d without changing the similarity between any two different samples.
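The property described for S5 (only self-similarities change) can be checked with a short derivation. It assumes that the shift adds one constant c ≥ 0 to every eigenvalue, for example c = |λ_min(S)| when λ_min(S) is negative; the exact constant used by the method is not recoverable from the text here, so c is left symbolic.

```latex
\begin{aligned}
S &= U \Lambda U^{T}, \qquad U U^{T} = I, \\
K^{*} &= U(\Lambda + cI)U^{T} = U \Lambda U^{T} + c\,U U^{T} = S + cI, \\
K^{*}_{ij} &= S_{ij} \quad (i \neq j), \qquad K^{*}_{ii} = S_{ii} + c .
\end{aligned}
```

Under this assumption the eigenvalues of K* are λ_k + c, so choosing c ≥ −λ_min(S) makes K* positive semidefinite while leaving every off-diagonal similarity unchanged.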
S6, constructing a loss function using matrix tri-decomposition based on hypergraph learning, and solving it to obtain a virus-drug prediction score matrix.
First, an objective function is constructed using matrix tri-decomposition based on hypergraph learning, as follows:
;
wherein A is the known virus-drug association matrix and F is the prediction score matrix to be solved; the value of the right-hand side of the objective function is denoted J, and min denotes taking its minimum; Θ denotes the projection matrix, where r_nv and r_nd denote the dimension of the virus latent feature space and the dimension of the drug latent feature space, respectively; ‖·‖_F denotes the Frobenius norm of a matrix; tr(·) denotes the trace of a matrix; and λ_1 and λ_2 are regularization coefficients. Based on the virus kernel matrix K*_v, a low-rank approximation matrix W is obtained by singular value decomposition (SVD). The hypergraph normalized Laplacian matrix of the viruses is calculated from an identity matrix I, the diagonal matrices D_vv and D_ve of the vertex degrees and hyperedge degrees of the virus hypergraph, respectively, the incidence matrix H_v of the virus hypergraph, and the diagonal matrix A_v of virus hyperedge weights. Based on the drug kernel matrix K*_d, a low-rank approximation matrix H is obtained by SVD. The hypergraph normalized Laplacian matrix of the drugs is calculated from an identity matrix I, the diagonal matrices D_dv and D_de of the vertex degrees and hyperedge degrees of the drug hypergraph, respectively, the incidence matrix H_d of the drug hypergraph, and the diagonal matrix A_d of drug hyperedge weights;
setting the derivative of J with respect to Θ to zero yields a Sylvester equation in Θ, in which I is an identity matrix and the matrices P, Q and C are intermediate variables with no special meaning; the matrix Θ is obtained from the general solution of the Sylvester equation, and the virus-drug prediction score matrix F is then calculated.
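The hypergraph normalized Laplacian matrix used in S6 can be sketched as follows. The sketch assumes the standard normalized form L = I − D_v^{-1/2} · H_inc · W_e · D_e^{-1} · H_inc^T · D_v^{-1/2}, built from the incidence matrix, the hyperedge weights, and the vertex and hyperedge degrees listed above. How the hyperedges themselves are constructed is not stated in this passage, so the incidence matrix is taken as given; all names and data are hypothetical.

```python
import math


def hypergraph_laplacian(incidence, edge_weights):
    """Normalized hypergraph Laplacian for an n x m 0/1 incidence matrix."""
    n, m = len(incidence), len(edge_weights)
    # Hyperedge degree: number of vertices in each hyperedge.
    d_e = [sum(incidence[v][e] for v in range(n)) for e in range(m)]
    # Vertex degree: total weight of the hyperedges containing each vertex.
    d_v = [sum(edge_weights[e] * incidence[v][e] for e in range(m))
           for v in range(n)]
    if min(d_e) <= 0 or min(d_v) <= 0:
        raise ValueError("every vertex and hyperedge needs a positive degree")
    L = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            theta = sum(edge_weights[e] * incidence[u][e] * incidence[v][e] / d_e[e]
                        for e in range(m))
            theta /= math.sqrt(d_v[u] * d_v[v])
            L[u][v] = (1.0 if u == v else 0.0) - theta
    return L, d_v


# Hypothetical hypergraph: 4 vertices, 2 hyperedges {0, 1, 2} and {1, 2, 3}.
H_toy = [[1, 0], [1, 1], [1, 1], [0, 1]]
w_toy = [1.0, 2.0]
L_toy, d_v_toy = hypergraph_laplacian(H_toy, w_toy)
```

A useful sanity check for this form is that L is symmetric and maps the vector of square-rooted vertex degrees to zero.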
S7, extracting the scores in the row of the target virus from the virus-drug prediction score matrix, and sorting them to obtain the final prediction result.
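Step S7 amounts to selecting one row of the score matrix and sorting it in descending order. The sketch below uses a hypothetical function name, hypothetical drug names and a toy score matrix.

```python
def rank_drugs(score_matrix, virus_index, drug_names, top_k=20):
    """Return the top_k (drug name, score) pairs for one virus, highest score first."""
    scores = score_matrix[virus_index]
    if len(scores) != len(drug_names):
        raise ValueError("one drug name is required per column")
    ranked = sorted(zip(drug_names, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


# Hypothetical 2-virus x 3-drug prediction score matrix.
F_toy = [[0.1, 0.9, 0.4],
         [0.7, 0.2, 0.8]]
names = ["drug_a", "drug_b", "drug_c"]
top2 = rank_drugs(F_toy, virus_index=1, drug_names=names, top_k=2)
```

Whether drugs already known to be associated with the target virus are excluded before ranking is not stated in this step, so the sketch ranks all columns.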
In the implementation of the algorithm, the regularization parameters are set to λ_1 = λ_2 = 1. When implemented in Matlab, the SVD of the kernel matrices K*_v and K*_d uses the svds function, the inversion operations involving the matrices P, Q and C use the pinv function, and the Sylvester equation for the matrix Θ is solved with the sylvester function.
Verification of the effectiveness of the application:
The antiviral drug screening method based on hypergraph matrix tri-decomposition shown in FIG. 1 and FIG. 2 is evaluated by five-fold cross-validation, implemented as follows: all known drug-virus associations are randomly divided into 5 groups of equal size; each group in turn serves as the test samples while the other groups serve as the training samples (because the Gaussian distance similarity matrices depend on the known associations, they change with the choice of test samples). The training samples are used as input to the method to obtain prediction results, and the prediction score of each test sample is then compared with the scores of the candidate samples. To reduce the influence of the random partitioning on the results, the five-fold cross-validation is repeated 100 times.
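The partitioning used in the five-fold cross-validation can be sketched as follows. The function name, the seed handling and the toy pairs are hypothetical; repeating the procedure 100 times would correspond to calling the function with 100 different seeds.

```python
import random


def five_fold_splits(pairs, n_folds=5, seed=0):
    """Yield (train, test) lists after randomly dividing pairs into n_folds groups."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[k::n_folds] for k in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [p for idx, fold in enumerate(folds) if idx != k for p in fold]
        yield train, test


# Hypothetical known associations: 15 (virus, drug) pairs.
toy_pairs = [(v, d) for v in range(1, 4) for d in range(1, 6)]
splits = list(five_fold_splits(toy_pairs))
```

For each split, the adjacency matrix would be rebuilt from the training pairs only, so that the Gaussian distance similarity matrices are recomputed without the test associations, as noted above.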
The following results were obtained using a Matlab implementation. FIG. 3 compares the AUROC (area under the ROC curve) values of the present method, HGRTMFVDA, with those of several previously reported virus-drug screening models. The present method achieves an AUROC of 0.8315 ± 0.0084 in five-fold cross-validation, a better prediction performance than that of the several classical models compared.
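For reference, an AUROC value can be computed from labels and prediction scores as the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as one half. This pairwise formulation is an illustrative choice; the specification does not state how its AUROC values were computed, and the labels and scores below are hypothetical.

```python
def auroc(labels, scores):
    """Area under the ROC curve from 0/1 labels and real-valued scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both positive and negative samples are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical test fold: 2 held-out associations among 5 candidate pairs.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2]
value = auroc(labels, scores)
```

In the toy example, 5 of the 6 positive-negative pairs are ordered correctly.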
In addition, the method is used for prediction on a specific virus, for example the novel coronavirus (SARS-CoV-2). The row corresponding to SARS-CoV-2 in the score matrix is extracted to obtain the prediction scores of drugs associated with the novel coronavirus; after the scores are sorted in descending order, 17 of the top 20 drugs are supported by published literature.
The table below lists the top 20 predicted drugs together with the PMID or DOI of the supporting literature.
Rank | Drug name | Supporting evidence |
---|---|---|
1 | Nitazoxanide | PMID:36332361 |
2 | Mizoribine | PMID:17336519 |
3 | Chloroquine | PMID:33906514 |
4 | Hydroxychloroquine | PMID:32926573,33906514 |
5 | Ribavirin | PMID:33689451 |
6 | N4-Hydroxycytidine | PMID:35492218 |
7 | Amodiaquine | PMID:36332361 |
8 | Niclosamide | PMID:34664162 |
9 | Maribavir | DOI:10.1021/acsbiomedchemau.2c00039 |
10 | Memantine | PMID:32828269 |
11 | Quinacrine | PMID:33477376 |
12 | Mycophenolic Acid | PMID:32579258 |
13 | Disulfiram | PMID:33855277 |
14 | Dasatinib | PMID:36704839 |
15 | Mycophenolate Mofetil | Not yet found |
16 | Chlorphenoxamine | Not yet found |
17 | Camostat | PMID:35692220 |
18 | Umifenovir | PMID:36245851 |
19 | Vismodegib | Not yet found |
20 | Nelfinavir | DOI:10.26434/chemrxiv.12039888.v1 |
In summary, the advantage of the application is that, by introducing hypergraph regularization into matrix tri-decomposition, high-dimensional manifold structure information can be fully captured to improve prediction performance, making the virus-drug association predictions more accurate and more robust.
As shown in FIG. 4, a second aspect of the embodiment of the present application discloses an antiviral drug screening system based on hypergraph matrix tri-decomposition, comprising:
an adjacency matrix construction module, configured to construct an adjacency matrix of virus-drug associations;
a Gaussian distance similarity matrix calculation module, configured to calculate a virus Gaussian distance similarity matrix and a drug Gaussian distance similarity matrix based on the adjacency matrix of virus-drug associations;
a virus gene sequence similarity matrix and drug chemical structure similarity matrix calculation module, configured to calculate a virus gene sequence similarity matrix based on the viral genome sequences and to calculate a drug chemical structure similarity matrix based on the drug chemical structures;
an integrated similarity matrix calculation module, configured to integrate the virus Gaussian distance similarity matrix and the virus gene sequence similarity matrix using a fast kernel learning method to obtain a virus integrated similarity matrix, and to integrate the drug Gaussian distance similarity matrix and the drug chemical structure similarity matrix using the fast kernel learning method to obtain a drug integrated similarity matrix;
a loss function construction module, configured to construct a loss function using matrix tri-decomposition based on hypergraph learning, based on the adjacency matrix of virus-drug associations, the virus integrated similarity matrix and the drug integrated similarity matrix;
a loss function solving module, configured to solve the loss function to obtain a virus-drug prediction score matrix;
and a prediction module, configured to extract the scores in the row of the target virus from the virus-drug prediction score matrix and to obtain the final prediction result after sorting.
In embodiments disclosed herein, the antiviral drug screening system based on hypergraph matrix tri-decomposition further comprises:
a processor, connected to each of the adjacency matrix construction module, the Gaussian distance similarity matrix calculation module, the virus gene sequence similarity matrix and drug chemical structure similarity matrix calculation module, the integrated similarity matrix calculation module, the loss function construction module, the loss function solving module and the prediction module;
a memory, coupled to the processor and storing a computer program executable on the processor;
when the processor executes the computer program, the processor controls the adjacency matrix construction module, the Gaussian distance similarity matrix calculation module, the virus gene sequence similarity matrix and drug chemical structure similarity matrix calculation module, the integrated similarity matrix calculation module, the loss function construction module, the loss function solving module and the prediction module to operate, so as to implement the antiviral drug screening method based on hypergraph matrix tri-decomposition of any one of the above.
The above embodiments are provided to illustrate the present application and not to limit the present application, so that the modification of the exemplary values or the replacement of equivalent elements should still fall within the scope of the present application.
From the foregoing detailed description, it will be apparent to those skilled in the art that the present application can be practiced without these specific details, and that the present application meets the requirements of the patent statutes.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application. The foregoing description of the preferred embodiment of the application is not intended to be limiting, but rather to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the application.
It should be noted that the above description of the flow is only for the purpose of illustration and description, and does not limit the application scope of the present specification. Various modifications and changes to the flow may be made by those skilled in the art under the guidance of this specification. However, such modifications and variations are still within the scope of the present description.
While the basic concepts have been described above, it will be apparent to those of ordinary skill in the art after reading this application that the above disclosure is by way of example only and is not intended to be limiting. Although not explicitly described herein, various modifications, improvements, and adaptations of the application may occur to one of ordinary skill in the art. Such modifications, improvements, and adaptations are suggested by the present disclosure and therefore remain within the spirit and scope of the exemplary embodiments of the present disclosure.
Meanwhile, the present application uses specific words to describe embodiments of the present application. For example, "one embodiment," "an embodiment," and/or "some embodiments" mean that a particular feature, structure, or characteristic is related to at least one embodiment of the application. Thus, it should be emphasized and appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various positions in this specification do not necessarily refer to the same embodiment. Furthermore, certain features, structures, or characteristics of one or more embodiments of the application may be combined as suitable.
Furthermore, those of ordinary skill in the art will appreciate that aspects of the application may be illustrated and described in a number of patentable categories or contexts, including any novel and useful process, machine, product, or composition of matter, or any novel and useful improvement thereof. Accordingly, aspects of the present application may be implemented entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.), or in a combination of hardware and software. The above hardware or software may be referred to as a "unit," "module," or "system." Furthermore, aspects of the present application may take the form of a computer program product embodied in one or more computer-readable media, with computer-readable program code embodied therein.
The computer program code required for the operation of portions of the present application may be written in any one or more programming languages, including object-oriented programming languages such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET and Python, conventional procedural programming languages such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP and ABAP, dynamic programming languages such as Python, Ruby and Groovy, or other programming languages. The program code may execute entirely on the user's computer, as a stand-alone software package on the user's computer, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter case, the remote computer may be connected to the user's computer through any form of network, such as a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet), or the code may be used in a cloud computing environment or provided as a service such as software as a service (SaaS).
Furthermore, the order in which elements and sequences are presented, and the use of numbers, letters, or other designations in the application, are not intended to limit the order of the processes and methods unless specifically recited in the claims. While certain presently useful inventive embodiments have been discussed in the foregoing disclosure by way of example, it is to be understood that such details are merely illustrative and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements included within the spirit and scope of the embodiments of the application. For example, while the implementation of the various components described above may be embodied in a hardware device, it may also be implemented as a purely software solution, e.g., an installation on an existing server or mobile device.
Likewise, it should be noted that, in order to simplify the presentation of the disclosure and thereby aid in understanding one or more inventive embodiments, various features are sometimes grouped together in a single embodiment, figure, or description thereof. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed subject matter requires more features than are expressly recited in each claim. Rather, inventive subject matter may lie in fewer than all features of a single embodiment disclosed above.
Claims (2)
1. An antiviral drug screening method based on hypergraph matrix tri-decomposition, characterized by comprising the following steps:
S1, constructing an adjacency matrix of virus-drug associations;
S2, calculating a virus Gaussian distance similarity matrix and a drug Gaussian distance similarity matrix based on the adjacency matrix of virus-drug associations;
S3, calculating a virus gene sequence similarity matrix based on the viral genome sequences, and calculating a drug chemical structure similarity matrix based on the drug chemical structures;
S4, integrating the virus Gaussian distance similarity matrix and the virus gene sequence similarity matrix using a fast kernel learning method to obtain a virus integrated similarity matrix; and integrating the drug Gaussian distance similarity matrix and the drug chemical structure similarity matrix using the fast kernel learning method to obtain a drug integrated similarity matrix;
S5, processing the virus integrated similarity matrix and the drug integrated similarity matrix using the spectrum shift method to obtain a virus kernel matrix and a drug kernel matrix, respectively;
S6, constructing a loss function using matrix tri-decomposition based on hypergraph learning, and solving it to obtain a virus-drug prediction score matrix;
S7, extracting the scores in the row of the target virus from the virus-drug prediction score matrix, and sorting them to obtain the final prediction result;
S1 is specifically implemented as follows:
known virus-drug association pairs are input to construct the adjacency matrix A of virus-drug associations;
the position corresponding to a known association pair is 1, and 0 otherwise;
the number of rows of the adjacency matrix A is the number of viruses nv, and the number of columns is the number of drugs nd;
S2 is specifically implemented as follows:
for a drug d(i), the position corresponding to each virus is marked as 1 if the drug is associated with that virus and 0 otherwise, forming a 1×nv vector of 0s and 1s, recorded as the vector spectrum IP(d(i)) of drug d(i), where nv is the number of viruses; the Gaussian distance similarity between drugs d(i) and d(j) is then calculated:
$$S_1^d(d(i),d(j)) = \exp\left(-\gamma_d \lVert IP(d(i)) - IP(d(j)) \rVert^2\right);$$
in the above formula, IP(d(j)) is the vector spectrum of drug d(j); the parameter γ_d controls the kernel bandwidth and is obtained by normalizing a new bandwidth parameter γ'_d:
$$\gamma_d = \gamma'_d \Big/ \left(\frac{1}{nd}\sum_{i=1}^{nd} \lVert IP(d(i)) \rVert^2\right);$$
wherein nd is the number of drugs; the Gaussian distance similarity between viruses v(i) and v(j) is defined in a similar manner: a 1×nd vector of 0s and 1s is obtained, recorded as the vector spectrum IP(v(i)) of virus v(i), and the Gaussian distance similarity between viruses v(i) and v(j) is calculated:
$$S_1^v(v(i),v(j)) = \exp\left(-\gamma_v \lVert IP(v(i)) - IP(v(j)) \rVert^2\right);$$
wherein IP(v(j)) is the vector spectrum of virus v(j); the parameter γ_v controls the kernel bandwidth and is obtained by normalizing a new bandwidth parameter γ'_v:
$$\gamma_v = \gamma'_v \Big/ \left(\frac{1}{nv}\sum_{i=1}^{nv} \lVert IP(v(i)) \rVert^2\right);$$
γ'_d and γ'_v above are both constants;
S3 is specifically implemented as follows:
the virus gene sequence similarity matrix is calculated from the viral genome sequences using a multiple sequence alignment method;
the MACCS fingerprints of the drugs are obtained from the drug chemical structures, and the drug chemical structure similarity matrix is calculated using the Tanimoto coefficient;
S4 is specifically implemented as follows:
the semidefinite programming formulation of the fast kernel learning method is as follows:
;
wherein the first term is a reconstruction loss norm term, representing the magnitude of the integration error of the similarity matrices; the second term is a regularization term, used to avoid overfitting; A is the virus-drug association adjacency matrix; S_j^v (j=1, 2) denote the virus Gaussian distance similarity matrix and the virus gene sequence similarity matrix, respectively; μ_v is a regularization parameter; λ_v ∈ R^{1×2} is the coefficient vector to be solved; the virus integrated similarity matrix S_v is obtained from λ_v:
$$S_v = \sum_{j=1}^{2} \lambda_{v,j}\, S_j^v;$$
similarly, the integration coefficients λ_d ∈ R^{1×2} for the drug chemical structure similarity matrix and the drug Gaussian distance similarity matrix are obtained as above, and the drug integrated similarity matrix S_d is then calculated:
$$S_d = \sum_{j=1}^{2} \lambda_{d,j}\, S_j^d;$$
wherein S_j^d (j=1, 2) denote the drug Gaussian distance similarity matrix and the drug chemical structure similarity matrix, respectively;
S5 is specifically implemented as follows:
the virus integrated similarity matrix S_v is processed using the spectrum shift method to obtain the virus kernel matrix K*_v, and the drug integrated similarity matrix S_d is processed using the spectrum shift method to obtain the drug kernel matrix K*_d; the specific calculation is as follows: the virus or drug integrated similarity matrix S is eigendecomposed, where U is an orthogonal matrix, Λ is a diagonal matrix of real eigenvalues, λ_min(S) denotes the minimum eigenvalue of the input matrix S, I denotes the identity matrix, and T denotes the matrix transpose;
in S6:
the objective function is constructed using matrix tri-decomposition based on hypergraph learning, as follows:
;
wherein A is the known virus-drug association matrix, and F is the prediction score matrix to be solved; the value of the right-hand side of the objective function is denoted J, and min denotes taking its minimum; Θ denotes the projection matrix, where r_nv and r_nd denote the dimension of the virus latent feature space and the dimension of the drug latent feature space, respectively; ‖·‖_F denotes the Frobenius norm of a matrix; tr(·) denotes the trace of a matrix; λ_1 and λ_2 are regularization coefficients;
based on the virus kernel matrix K*_v, a low-rank approximation matrix W is obtained by singular value decomposition (SVD); the hypergraph normalized Laplacian matrix of the viruses is calculated from an identity matrix I, the diagonal matrices D_vv and D_ve of the vertex degrees and hyperedge degrees of the virus hypergraph, respectively, the incidence matrix H_v of the virus hypergraph, and the diagonal matrix A_v of virus hyperedge weights; based on the drug kernel matrix K*_d, a low-rank approximation matrix H is obtained by SVD; the hypergraph normalized Laplacian matrix of the drugs is calculated from an identity matrix I, the diagonal matrices D_dv and D_de of the vertex degrees and hyperedge degrees of the drug hypergraph, respectively, the incidence matrix H_d of the drug hypergraph, and the diagonal matrix A_d of drug hyperedge weights;
setting the derivative of J with respect to Θ to zero yields a Sylvester equation in Θ, in which I is an identity matrix and the matrices P, Q and C are intermediate variables with no special meaning; the matrix Θ is obtained from the general solution of the Sylvester equation; the virus-drug prediction score matrix F is then calculated.
2. An antiviral drug screening system based on hypergraph matrix tri-decomposition, comprising:
an adjacency matrix construction module, configured to construct an adjacency matrix of virus-drug associations;
a Gaussian distance similarity matrix calculation module, configured to calculate a virus Gaussian distance similarity matrix and a drug Gaussian distance similarity matrix based on the adjacency matrix of virus-drug associations;
a virus gene sequence similarity matrix and drug chemical structure similarity matrix calculation module, configured to calculate a virus gene sequence similarity matrix based on the viral genome sequences and to calculate a drug chemical structure similarity matrix based on the drug chemical structures;
an integrated similarity matrix calculation module, configured to integrate the virus Gaussian distance similarity matrix and the virus gene sequence similarity matrix using a fast kernel learning method to obtain a virus integrated similarity matrix, and to integrate the drug Gaussian distance similarity matrix and the drug chemical structure similarity matrix using the fast kernel learning method to obtain a drug integrated similarity matrix;
a loss function construction module, configured to construct a loss function using matrix tri-decomposition based on hypergraph learning, based on the adjacency matrix of virus-drug associations, the virus integrated similarity matrix and the drug integrated similarity matrix;
a loss function solving module, configured to solve the loss function to obtain a virus-drug prediction score matrix;
a prediction module, configured to extract the scores in the row of the target virus from the virus-drug prediction score matrix and to obtain the final prediction result after sorting;
a processor, connected to each of the adjacency matrix construction module, the Gaussian distance similarity matrix calculation module, the virus gene sequence similarity matrix and drug chemical structure similarity matrix calculation module, the integrated similarity matrix calculation module, the loss function construction module, the loss function solving module and the prediction module;
a memory, coupled to the processor and storing a computer program executable on the processor;
when the processor executes the computer program, the processor controls the adjacency matrix construction module, the Gaussian distance similarity matrix calculation module, the virus gene sequence similarity matrix and drug chemical structure similarity matrix calculation module, the integrated similarity matrix calculation module, the loss function construction module, the loss function solving module and the prediction module to operate, so as to implement the antiviral drug screening method based on hypergraph matrix tri-decomposition according to claim 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311050747.3A CN116759015B (en) | 2023-08-21 | 2023-08-21 | Antiviral drug screening method and system based on hypergraph matrix tri-decomposition |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311050747.3A CN116759015B (en) | 2023-08-21 | 2023-08-21 | Antiviral drug screening method and system based on hypergraph matrix tri-decomposition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116759015A (en) | 2023-09-15 |
CN116759015B (en) | 2023-11-24 |
Family
ID=87957634
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311050747.3A Active CN116759015B (en) | 2023-08-21 | 2023-08-21 | Antiviral drug screening method and system based on hypergraph matrix tri-decomposition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116759015B (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2016042359A (en) * | 2014-08-18 | 2016-03-31 | 株式会社デンソーアイティーラボラトリ | Recognition apparatus, real number matrix decomposition method, and recognition method |
KR20190000166A (en) * | 2017-06-22 | 2019-01-02 | 한국과학기술원 | Method and system for predicting drug repositioning candidate based on similarity between drug and metabolite |
US20190325343A1 (en) * | 2018-04-19 | 2019-10-24 | National University Of Singapore | Machine learning using partial order hypergraphs |
CN111597440A (en) * | 2020-05-06 | 2020-08-28 | 上海理工大学 | Recommendation system information estimation method based on internal weighting matrix three-decomposition low-rank approximation |
CN115346689A (en) * | 2022-08-16 | 2022-11-15 | 厦门理工学院 | Virus-drug association prediction method based on hypergraph adaptive induction matrix completion |
US20220406407A1 (en) * | 2021-06-15 | 2022-12-22 | The Regents Of The University Of Michigan | Deciphering Multi-Way Interactions In The Human Genome With Use Of Hypergraphs |
LU502421B1 (en) * | 2022-06-30 | 2023-01-10 | Univ Central South | A Method for Predicting Disease Association in Biological Association Network |
CN116092598A (en) * | 2023-01-31 | 2023-05-09 | 汤永 | Antiviral drug screening method based on manifold regularized non-negative matrix factorization |
CN116153391A (en) * | 2023-04-19 | 2023-05-23 | 中国人民解放军总医院 | Antiviral drug screening method, system and storage medium based on joint projection |
CN116189760A (en) * | 2023-04-19 | 2023-05-30 | 中国人民解放军总医院 | Matrix completion-based antiviral drug screening method, system and storage medium |
CN116230077A (en) * | 2023-02-20 | 2023-06-06 | 汤永 | Antiviral drug screening method based on restarting hypergraph double random walk |
CN116597186A (en) * | 2023-03-20 | 2023-08-15 | 广西大学 | Multi-view subspace clustering method, system, electronic equipment and storage medium |
Non-Patent Citations (4)
Title |
---|
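JIYANG YU ET AL: "Robust capped norm dual hyper-graph regularized non-negative matrix tri-factorization", 《MATHEMATICAL BIOSCIENCES AND ENGINEERING》, vol. 20, no. 7, pages 12486 - 12509 * |
常彩霞; 王永丽: "A tri-decomposition method for solving the matrix completion problem", Journal of Shandong University of Science and Technology (Natural Science Edition), no. 4, pages 82 - 87 * |
李金鑫: "Research on network-model-based methods for predicting associations between miRNAs and diseases", 《China Master's Theses Full-text Database, Medicine and Health Sciences》, no. 8, pages 1 - 68 * |
陈璐瑶 et al.: "Image clustering algorithm based on hypergraph-regularized non-negative Tucker decomposition", 《Computer Engineering》, vol. 48, no. 4, pages 197 - 205 * |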
Also Published As
Publication number | Publication date |
---|---|
CN116759015B (en) | 2023-11-24 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN116189760B (en) | Matrix completion-based antiviral drug screening method, system and storage medium | |
CN116153391B (en) | Antiviral drug screening method, system and storage medium based on joint projection | |
CN115966252B (en) | Antiviral drug screening method based on L1norm diagram | |
CN116092598B (en) | Antiviral drug screening method based on manifold regularized non-negative matrix factorization | |
Cui et al. | The computational prediction of drug-disease interactions using the dual-network L2,1-CMF method | |
CN116230077B (en) | Antiviral drug screening method based on restarting hypergraph double random walk | |
CN116631537B (en) | Antiviral drug screening method, system and storage medium based on fuzzy learning | |
CN114913916A (en) | Drug relocation method for predicting new coronavirus adaptive drugs | |
CN115240762A (en) | Multi-scale small molecule virtual screening method and system | |
Lin et al. | Machine learning in neural networks | |
CN116759015B (en) | Antiviral drug screening method and system based on hypergraph matrix tri-decomposition | |
Ren et al. | De novo prediction of Cell-Drug sensitivities using deep learning-based graph regularized matrix factorization | |
Nguyen et al. | A matrix completion method for drug response prediction in personalized medicine | |
CN116798545B (en) | Antiviral drug screening method, system and storage medium based on non-negative matrix | |
Alghamdi et al. | A prediction modelling and pattern detection approach for the first-episode psychosis associated to cannabis use | |
CN116705148B (en) | Antiviral drug screening method and system based on Laplace least square method | |
CN116759016A (en) | Antiviral drug screening method, system and storage medium based on least square method | |
CN116631502A (en) | Antiviral drug screening method, system and storage medium based on hypergraph learning | |
Jain et al. | Supervised Rank aggregation (SRA): A novel rank aggregation approach for ensemble-based feature selection | |
CN114842924A (en) | Optimized de novo drug design method | |
Testa et al. | A Non-Negative Matrix Tri-Factorization Based Method for Predicting Antitumor Drug Sensitivity | |
Choudhary et al. | Artificial intelligence in medicine discovery: AI in virtual screening | |
Ghorbanali et al. | DRP-VEM: Drug repositioning prediction using voting ensemble | |
Hsieh et al. | Molecular descriptors selection and machine learning approaches in protein-ligand binding affinity with applications to molecular docking | |
Sadeghi et al. | Rusdr: Class imbalance-aware ensemble learning for drug repurposing | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||