CN110838342A - Similarity-based virus-receptor interaction relation prediction method and device


Info

Publication number
CN110838342A
Authority
CN
China
Prior art keywords
virus
receptor
similarity
matrix
similarity matrix
Legal status
Granted
Application number
CN201911108401.8A
Other languages
Chinese (zh)
Other versions
CN110838342B (en)
Inventor
Wang Jianxin (王建新)
Yan Cheng (严承)
Li Hongdong (李洪东)
Current Assignee
Central South University
Original Assignee
Central South University
Application filed by Central South University
Priority to CN201911108401.8A
Publication of CN110838342A
Application granted
Publication of CN110838342B
Legal status: Active

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B35/00 ICT specially adapted for in silico combinatorial libraries of nucleic acids, proteins or peptides
    • G16B35/20 Screening of libraries
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention discloses a similarity-based method and device for predicting virus-receptor interactions. The method first constructs Gaussian kernel similarity matrices for the viruses and for the receptors from known virus-receptor interaction data, constructs a receptor sequence similarity matrix from the receptors' sequence information, and integrates the receptor sequence similarity matrix and the Gaussian kernel similarity matrix into a final receptor similarity. Next, for viruses and receptors with no known interactions, the interaction data are initialized based on neighborhood information. Finally, the interaction score of each virus-receptor pair is calculated with Laplacian regularized least squares. The invention can effectively predict virus-receptor interactions.

Description

Similarity-based virus-receptor interaction relation prediction method and device
Technical Field
The invention belongs to the field of systems biology and relates to a similarity-based method and device for predicting virus-receptor interactions.
Background
With recent advances in microbiology research, there is increasing evidence that microorganisms are closely linked to human diseases. Viruses, an important and widespread group of microorganisms, are directly associated with viral infectious diseases such as those caused by Ebola virus and influenza virus. Virus transmission begins when a virus contacts the host surface, and binding of the virus to its receptor is the first step in viral entry into the host cell. Specificity and affinity are the major factors determining whether a virus can attach to different types of receptor protein molecules and enter cells.
With the development of high-throughput technologies, many studies have shown that various molecules, including proteins, carbohydrates and lipids, can act as viral receptors. Moreover, virus-receptor binding is a dynamic process: receptor recognition can evolve during infection, giving rise to viral variants with different receptor-binding specificities. Given the importance of virus-receptor interaction research for the diagnosis and treatment of viral diseases, a research team has constructed viralReceptor, a database of mammalian virus-receptor interactions, which provides an important data foundation for understanding and studying the mechanisms of interaction between viruses and receptors. The database contains 128 viruses, 119 receptors and 268 interactions between them. Although it is by far the most comprehensive mammalian virus-receptor interaction database, the interaction data it provides are still very limited.
Computational research on predicting virus-receptor interactions is severely lacking, which is out of proportion to the importance of current viral disease research. In addition, inherent characteristics of viruses such as mutation pose greater challenges for the diagnosis and treatment of viral diseases. An efficient method for predicting virus-receptor interactions is therefore urgently needed to discover new (potential) virus-receptor interactions, guide subsequent biomedical experiments that identify such interactions, and improve the efficiency of those experiments.
Disclosure of Invention
The technical problem solved by the invention is, in view of the shortcomings of the prior art, to provide a similarity-based method, device, electronic equipment and storage medium for predicting virus-receptor interactions that predict these interactions more accurately.
The technical solution of the invention is as follows:
A similarity-based method for predicting virus-receptor interactions comprises the following steps:
Step 1: construct a virus similarity matrix S_v and a receptor similarity matrix S_p;
Step 2: based on the known virus-receptor interaction data, the virus similarity matrix S_v and the receptor similarity matrix S_p, predict the interaction score of each virus-receptor pair, i.e., the likelihood that the pair interacts.
All virus-receptor pairs are ranked by interaction score; higher-ranked pairs are more likely to interact.
The prediction method can estimate the likelihood of interaction for virus-receptor pairs with no known interaction.
Further, the Gaussian kernel similarity matrix G_v of the viruses and the Gaussian kernel similarity matrix G_p of the receptors are calculated from the known virus-receptor interaction data, and the sequence similarity matrix G_s of the receptors is calculated from the receptors' sequence information.
The Gaussian kernel similarity matrix G_v of the viruses is used as the final virus similarity matrix S_v.
The Gaussian kernel similarity matrix G_p and the sequence similarity matrix G_s of the receptors are integrated to obtain the final receptor similarity matrix S_p.
Further, in step 2, the interaction score of each virus-receptor pair is predicted with Laplacian regularized least squares (LapRLS) as follows:
First, normalized Laplacian matrices are constructed from the virus similarity matrix S_v and the receptor similarity matrix S_p, respectively:
$L_v = D_v^{-1/2} (D_v - S_v) D_v^{-1/2}$
$L_p = D_p^{-1/2} (D_p - S_p) D_p^{-1/2}$
where L_v and L_p are the normalized Laplacian matrices of the virus and receptor similarity matrices, respectively; D_v is an N_v×N_v diagonal matrix whose diagonal element D_v(i,i) is the sum of all elements in row i of S_v; and D_p is an N_p×N_p diagonal matrix whose diagonal element D_p(j,j) is the sum of all elements in row j of S_p.
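The normalized Laplacian can be computed directly from a similarity matrix. The following is a minimal NumPy sketch; the function name `normalized_laplacian` is hypothetical, and because the text does not say how a row with zero sum should be handled, such rows are assumed here to receive a zero scaling factor.

```python
import numpy as np


def normalized_laplacian(S):
    """Return L = D^{-1/2} (D - S) D^{-1/2}, where D = diag(row sums of S)."""
    S = np.asarray(S, dtype=float)
    d = S.sum(axis=1)  # diagonal of D: sum of each row of S
    # Zero-sum rows get a zero scaling factor (assumption, not covered by the text).
    with np.errstate(divide="ignore", invalid="ignore"):
        d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(d), 0.0)
    D = np.diag(d)
    D_inv_sqrt = np.diag(d_inv_sqrt)
    return D_inv_sqrt @ (D - S) @ D_inv_sqrt


# Usage: a two-virus similarity matrix S_v
S_v = np.array([[1.0, 0.5], [0.5, 1.0]])
L_v = normalized_laplacian(S_v)
```

A quick sanity check on any such sketch is that L multiplied by the vector of square-rooted row sums is zero, which holds for every normalized Laplacian.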
Then, optimization models for the prediction matrices F_v and F_p are constructed from the virus side and the receptor side, respectively:
[Equation images not reproduced: optimization models for F_v and F_p]
where F_v* and F_p* denote the optimal solutions for F_v and F_p, respectively; ||·||_F denotes the Frobenius norm of a matrix; and β_v and β_p are smoothing parameters set empirically.
Next, the optimization models are solved to obtain F_v* and F_p*.
Finally, F_v* and F_p* are combined to obtain the final virus-receptor interaction prediction matrix F* [equation image not reproduced].
LapRLS is a semi-supervised learning method that extends regularized least squares (RLS). Since known virus-receptor interaction data are scarce and the interaction status of most virus-receptor pairs is unknown, the invention adopts LapRLS and exploits its Laplacian regularization term to obtain better prediction performance.
Further, in the optimization model, the adjacency matrix Y is the initialized matrix Y. Initialization refers to initializing the interactions of viruses and receptors that have no known interactions, as follows:
For each virus v_i that has no known interaction with any receptor, i.e., Y(i,:) = [Y(i,1), Y(i,2), ..., Y(i,N_p)] is a zero vector, each element Y(i,j), i.e., the initial interaction between virus v_i and receptor p_j, is updated according to the virus similarity matrix S_v:
[Equation image not reproduced]
where S_v(i,n) is the element in row i and column n of S_v, i.e., the similarity between viruses v_i and v_n.
Similarly, for each receptor p_j that has no known interaction with any virus, i.e., Y(:,j) = [Y(1,j), Y(2,j), ..., Y(N_v,j)]^T is a zero vector, each element Y(i,j), i.e., the initial interaction between receptor p_j and virus v_i, is updated according to the receptor similarity matrix S_p:
[Equation image not reproduced]
where S_p(j,m) is the element in row j and column m of S_p, i.e., the similarity between receptors p_j and p_m.
The initialization process described above can further improve the prediction performance of the present invention.
Further, the optimization models are solved according to the following formulas:
[Equation images not reproduced: closed-form solutions for F_v* and F_p*]
The invention also provides a similarity-based device for predicting virus-receptor interactions, comprising a similarity calculation module and a prediction module.
The similarity calculation module performs step 1: obtaining the virus similarity matrix S_v and the receptor similarity matrix S_p.
The prediction module performs step 2: based on the known virus-receptor interactions, the virus similarity matrix S_v and the receptor similarity matrix S_p, predicting the interaction score of each virus-receptor pair, i.e., the likelihood that the pair interacts.
All virus-receptor pairs are ranked by interaction score; higher-ranked pairs are more likely to interact.
The prediction device can estimate the likelihood of interaction for virus-receptor pairs with no known interaction.
The invention also provides an electronic device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to implement the above virus-receptor interaction prediction method.
The invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above virus-receptor interaction prediction method.
Beneficial effects:
The invention first calculates the Gaussian kernel similarities of viruses and receptors from the known virus-receptor interactions, and then calculates the receptors' sequence similarity from their protein sequence information. The Gaussian kernel similarity of the viruses is used as the final virus similarity. The final receptor similarity is obtained by integrating the receptor sequence similarity and Gaussian kernel similarity; in this process, the invention explored different integration modes based on experimental validation results and selected the one giving better prediction performance. After the final virus and receptor similarities are obtained, an initialization step based on neighbors (all viruses and receptors) is applied to the interaction data of viruses and receptors with no known interactions. Potential virus-receptor interactions are finally predicted under the hypothesis that "similar viruses use similar receptors, and vice versa": potential interaction scores are calculated by Laplacian regularized least squares from the virus similarity, the receptor similarity and the initialized interaction data. Ten-fold cross-validation (10CV) and leave-one-out cross-validation (LOOCV) were used to evaluate the prediction performance of the method and compare it with other methods, with AUC as the performance metric. The experimental results show that the invention can effectively predict virus-receptor interactions, providing basic guidance for developing subsequent computational models, improving the efficiency of subsequent biomedical experiments and saving research costs.
The invention provides, for the first time in the field of virus-receptor interactions, an effective computational method for predicting these interactions, laying an important foundation for developing subsequent prediction methods. In addition, the invention can improve the efficiency of subsequent biomedical experiments on virus-receptor binding.
Drawings
FIG. 1 is a general flow diagram of the method for predicting virus-receptor interactions;
FIG. 2 compares the ten-fold cross-validation performance of the invention with that of other methods;
FIG. 3 compares the leave-one-out cross-validation performance of the invention with that of other methods;
FIG. 4 compares the ten-fold cross-validation performance of the invention under different modes of integrating receptor sequence similarity and Gaussian kernel similarity;
FIG. 5 compares the leave-one-out cross-validation performance of the invention under different modes of integrating receptor sequence similarity and Gaussian kernel similarity.
Detailed Description
The invention will be described in further detail below with reference to the following figures and specific examples:
Example 1:
This embodiment discloses a similarity-based method for predicting virus-receptor interactions, comprising the following steps:
Step 1: obtain a virus similarity matrix S_v and a receptor similarity matrix S_p;
Step 2: based on the known virus-receptor interactions, the virus similarity matrix S_v and the receptor similarity matrix S_p, predict the interaction score of each virus-receptor pair, i.e., the likelihood that the pair interacts; then rank all virus-receptor pairs by interaction score, with higher-ranked pairs being more likely to interact. The method can estimate the likelihood of interaction for virus-receptor pairs with no known interaction.
Example 2:
Based on Example 1, this example uses the Gaussian kernel similarity matrix G_v of the viruses as the final virus similarity matrix S_v, i.e., S_v = G_v. Because only one similarity (the Gaussian kernel similarity) is calculated for the viruses in this embodiment, it is used directly as the final similarity S_v.
The Gaussian kernel similarity matrix G_v of the viruses is calculated from the known virus-receptor interactions.
Example 3:
Based on Example 1, this example integrates the Gaussian kernel similarity matrix G_p and the sequence similarity matrix G_s of the receptors to obtain the final receptor similarity matrix S_p, where G_p is calculated from the known virus-receptor interactions and G_s is calculated from the receptors' sequence information.
Example 4:
Based on Examples 2 and 3, this example calculates the Gaussian kernel similarity matrix G_v of the viruses and the Gaussian kernel similarity matrix G_p of the receptors from the known virus-receptor interactions, as follows:
First, define {v_1, v_2, ..., v_{N_v}} as the set of all viruses, where N_v is the number of viruses;
{p_1, p_2, ..., p_{N_p}} as the set of all receptors, where N_p is the number of receptors;
and Y as the N_v×N_p adjacency matrix of virus-receptor interactions, whose element Y(i,j) in row i and column j is determined as follows: if virus v_i is known to interact with receptor p_j, then Y(i,j) = 1; otherwise Y(i,j) = 0.
Then, the Gaussian kernel similarity matrix G_v of the viruses is constructed. It has size N_v×N_v, and its element G_v(i,n) in row i and column n, the Gaussian kernel similarity between viruses v_i and v_n, is calculated as:
$G_v(i,n) = \exp\left(-\gamma_v \lVert Y(i,:) - Y(n,:) \rVert^2\right)$
[Equation image not reproduced: definition of γ_v]
where i, n = 1, 2, ..., N_v; Y(i,:) = [Y(i,1), Y(i,2), ..., Y(i,N_p)] is row i of Y, i.e., the vector of interactions between virus v_i and all receptors; γ_v is a parameter controlling the kernel bandwidth; and γ'_v is an empirical parameter, set to 1 in this embodiment based on common practice with Gaussian kernels. In an actual calculation, the Gaussian kernel similarity between the virus Feline Leukemia strain B/lambda-B1 and the virus Human alphaherpesvirus 3 is 0.2279.
The Gaussian kernel similarity matrix G_p of the receptors is constructed in the same way. It has size N_p×N_p, and its element G_p(j,m) in row j and column m, the Gaussian kernel similarity between receptors p_j and p_m, is calculated as:
$G_p(j,m) = \exp\left(-\gamma_p \lVert Y(:,j) - Y(:,m) \rVert^2\right)$
[Equation image not reproduced: definition of γ_p]
where j, m = 1, 2, ..., N_p; Y(:,j) = [Y(1,j), Y(2,j), ..., Y(N_v,j)]^T is column j of Y, i.e., the vector of interactions between receptor p_j and all viruses; γ_p is a parameter controlling the kernel bandwidth; and γ'_p is an empirical parameter, set to 1 in this embodiment based on common practice with Gaussian kernels. With this calculation, the Gaussian kernel similarity between the receptor MER proto-oncogene, tyrosine kinase and the receptor dipeptidyl peptidase 4 is 0.3492.
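The two Gaussian kernel similarities can be sketched in NumPy as follows. The bandwidth formulas for γ_v and γ_p are not reproduced in this text; the sketch assumes the usual Gaussian interaction profile (GIP) normalization, γ = γ' divided by the mean squared norm of the interaction profiles, with γ' = 1 as in this embodiment. The function names are hypothetical.

```python
import numpy as np


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian kernel between the rows of `profiles`.

    K(a, b) = exp(-gamma * ||x_a - x_b||^2), with gamma = gamma_prime divided by
    the mean squared norm of the profiles (assumed GIP bandwidth rule).
    """
    X = np.asarray(profiles, dtype=float)
    sq_norms = (X ** 2).sum(axis=1)
    mean_sq = sq_norms.mean()
    gamma = gamma_prime / mean_sq if mean_sq > 0 else 0.0
    # Pairwise squared distances: ||a||^2 + ||b||^2 - 2 a.b
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    d2 = np.maximum(d2, 0.0)  # clip tiny negatives caused by rounding
    return np.exp(-gamma * d2)


def virus_and_receptor_gip(Y, gamma_prime_v=1.0, gamma_prime_p=1.0):
    """G_v from the rows Y(i,:) and G_p from the columns Y(:,j) of Y."""
    Y = np.asarray(Y, dtype=float)
    G_v = gip_kernel(Y, gamma_prime_v)     # virus interaction profiles
    G_p = gip_kernel(Y.T, gamma_prime_p)   # receptor interaction profiles
    return G_v, G_p
```

For a toy 3×3 adjacency matrix Y, `virus_and_receptor_gip(Y)` returns two symmetric matrices with ones on the diagonal; viruses (or receptors) sharing more interaction partners get values closer to 1.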
Example 5:
Based on Example 3, this example calculates the sequence similarity matrix G_s of the receptors from their known sequence information, as follows:
First, the sequence data of all receptors are downloaded from the KEGG database, and the normalized Smith-Waterman score is used to calculate the receptors' sequence similarity.
The sequence similarity matrix G_s of the receptors is constructed with size N_p×N_p; its element G_s(j,m) in row j and column m, the sequence similarity between receptors p_j and p_m, is calculated as follows:
[Equation image not reproduced: normalized Smith-Waterman score]
where j, m = 1, 2, ..., N_p, and SW(j,m) is the raw Smith-Waterman score between receptors p_j and p_m, i.e., the maximum element (maximum score) of the alignment score matrix computed by the Smith-Waterman algorithm for their sequences. With this calculation, the sequence similarity between the receptor Human alphaherpesvirus 2 and the receptor Human coronavirus 229E is 0.1744.
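A minimal sketch of the receptor sequence similarity follows. The normalization formula is not reproduced in the text, so the sketch assumes the common normalized form G_s(j,m) = SW(j,m) / sqrt(SW(j,j)·SW(m,m)). The scoring scheme (match +2, mismatch -1, linear gap -2) is a toy assumption chosen for brevity; protein alignments normally use a substitution matrix such as BLOSUM62 with affine gaps. The KEGG download step is not shown, and the function names are hypothetical.

```python
import numpy as np


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Raw Smith-Waterman score: the maximum cell of the local-alignment matrix.

    Simple match/mismatch scores and a linear gap penalty are illustrative
    assumptions, not the scoring used in the patent.
    """
    H = np.zeros((len(a) + 1, len(b) + 1))
    best = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = H[i - 1, j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero
            H[i, j] = max(0.0, diag, H[i - 1, j] + gap, H[i, j - 1] + gap)
            best = max(best, H[i, j])
    return best


def sequence_similarity(seqs):
    """G_s(j,m) = SW(j,m) / sqrt(SW(j,j) * SW(m,m)) (assumed normalization).

    Sequences must be non-empty so that every self-score is positive.
    """
    n = len(seqs)
    SW = np.array([[smith_waterman_score(seqs[j], seqs[m]) for m in range(n)]
                   for j in range(n)])
    self_scores = np.sqrt(np.diag(SW))
    return SW / np.outer(self_scores, self_scores)
```

The normalization makes every self-similarity exactly 1 and keeps scores comparable between long and short receptor sequences.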
Example 6:
Based on Example 3, this example considers the following three modes of integrating the Gaussian kernel similarity matrix G_p and the sequence similarity matrix G_s of the receptors into the final receptor similarity matrix S_p:
$S_p = \begin{cases} G_p & \text{mode (1)} \\ G_s & \text{mode (2)} \\ (G_p + G_s)/2 & \text{mode (3)} \end{cases}$
That is, the three integration modes are: (1) receptor Gaussian kernel similarity alone; (2) receptor sequence similarity alone; (3) the mean of the receptor Gaussian kernel similarity and the receptor sequence similarity.
The effect of the three integration modes on prediction performance was examined by cross-validation, and the mode giving the best prediction results was selected to calculate the final receptor similarity matrix S_p; as the experimental verification shows, this is mode (2), receptor sequence similarity alone.
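The three integration modes amount to a simple selection or averaging of two equally sized matrices. A minimal sketch, in which the function name and the mode labels "gaussian", "sequence" and "mean" are hypothetical:

```python
import numpy as np


def integrate_receptor_similarity(G_p, G_s, mode):
    """Final receptor similarity S_p under one of the three integration modes.

    mode "gaussian" -> S_p = G_p               (mode 1)
    mode "sequence" -> S_p = G_s               (mode 2)
    mode "mean"     -> S_p = (G_p + G_s) / 2   (mode 3)
    """
    G_p = np.asarray(G_p, dtype=float)
    G_s = np.asarray(G_s, dtype=float)
    if G_p.shape != G_s.shape:
        raise ValueError("G_p and G_s must have the same shape")
    if mode == "gaussian":
        return G_p.copy()
    if mode == "sequence":
        return G_s.copy()
    if mode == "mean":
        return (G_p + G_s) / 2.0
    raise ValueError(f"unknown mode: {mode!r}")
```

Keeping the mode as an explicit argument makes it easy to rerun the same cross-validation for each mode, which is how the modes are compared.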
Example 7:
Based on Example 1, this example predicts the interaction score of each virus-receptor pair from the known virus-receptor interaction data, the virus similarity matrix S_v and the receptor similarity matrix S_p, using a method based on initialization processing and Laplacian regularized least squares (LapRLS), abbreviated IILLS. The prediction process is as follows:
First, normalized Laplacian matrices are constructed from the virus similarity matrix S_v and the receptor similarity matrix S_p, respectively:
$L_v = D_v^{-1/2} (D_v - S_v) D_v^{-1/2}$
$L_p = D_p^{-1/2} (D_p - S_p) D_p^{-1/2}$
where L_v and L_p are the normalized Laplacian matrices of the virus and receptor similarity matrices, respectively; D_v is an N_v×N_v diagonal matrix whose diagonal element D_v(i,i) is the sum of all elements in row i of S_v; and D_p is an N_p×N_p diagonal matrix whose diagonal element D_p(j,j) is the sum of all elements in row j of S_p.
Then, following the definition of LapRLS, optimization models for the prediction matrices F_v and F_p are constructed from the virus side and the receptor side, respectively:
[Equation images not reproduced: optimization models for F_v and F_p]
where F_v* and F_p* denote the optimal solutions for F_v and F_p, respectively; ||·||_F denotes the Frobenius norm of a matrix; and β_v and β_p are smoothing parameters, both set to 1 based on experience.
The optimization models are solved to obtain F_v* and F_p*.
Finally, F_v* and F_p* are combined to obtain the final virus-receptor interaction prediction matrix F* [equation image not reproduced].
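The optimization models and their solutions appear only as images in this text. Assuming they follow the LapRLS formulation widely used for drug-target interaction prediction, namely F_v* = S_v (S_v + β_v L_v S_v)^{-1} Y, F_p* = S_p (S_p + β_p L_p S_p)^{-1} Y^T and F* = (F_v* + (F_p*)^T)/2, the prediction step could be sketched as below. These formulas are an assumption, not a transcription of the patent's equations; the function names are hypothetical, and a pseudo-inverse is used so the sketch does not fail on singular similarity matrices.

```python
import numpy as np


def normalized_laplacian(S):
    """L = D^{-1/2} (D - S) D^{-1/2}, D = diag(row sums of S)."""
    d = S.sum(axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(d), 0.0)
    # Row and column scaling by D^{-1/2}, written with broadcasting
    return inv_sqrt[:, None] * (np.diag(d) - S) * inv_sqrt[None, :]


def laprls_side(S, Y, beta):
    """One side of LapRLS, assumed closed form F* = S (S + beta L S)^+ Y."""
    L = normalized_laplacian(S)
    A = S + beta * (L @ S)
    return S @ np.linalg.pinv(A) @ Y  # pseudo-inverse guards against singular A


def laprls_predict(Y, S_v, S_p, beta_v=1.0, beta_p=1.0):
    """Final N_v x N_p score matrix F*, assumed to average the two sides."""
    Y = np.asarray(Y, dtype=float)
    S_v = np.asarray(S_v, dtype=float)
    S_p = np.asarray(S_p, dtype=float)
    F_v = laprls_side(S_v, Y, beta_v)    # virus side, N_v x N_p
    F_p = laprls_side(S_p, Y.T, beta_p)  # receptor side, N_p x N_v
    return (F_v + F_p.T) / 2.0
```

With β_v = β_p = 1 (the empirical setting above), Y would be the initialized adjacency matrix, S_v = G_v and S_p the selected receptor similarity. Setting both β values to 0 removes the Laplacian term and, for invertible similarity matrices, simply reproduces Y, which is a useful check.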
Example 8:
Based on Example 7, this embodiment uses the initialized matrix Y as the adjacency matrix Y in the optimization model. First, the virus-receptor interaction matrix Y is initialized, i.e., the interaction data of viruses and receptors with no known interactions are initialized, as follows:
For each virus v_i that has no known interaction with any receptor, i.e., Y(i,:) = [Y(i,1), Y(i,2), ..., Y(i,N_p)] is a zero vector, each element Y(i,j), i.e., the initial interaction between virus v_i and receptor p_j, is updated according to the virus similarity matrix S_v:
[Equation image not reproduced]
where S_v(i,n) is the element in row i and column n of S_v, i.e., the similarity between viruses v_i and v_n.
Similarly, for each receptor p_j that has no known interaction with any virus, i.e., Y(:,j) = [Y(1,j), Y(2,j), ..., Y(N_v,j)]^T is a zero vector, each element Y(i,j), i.e., the initial interaction between receptor p_j and virus v_i, is updated according to the receptor similarity matrix S_p:
[Equation image not reproduced]
where S_p(j,m) is the element in row j and column m of S_p, i.e., the similarity between receptors p_j and p_m.
In the above steps, the order in which the vectors Y(i,:), i = 1, 2, ..., N_v, and Y(:,j), j = 1, 2, ..., N_p, are checked for being zero vectors and have their elements updated is not restricted. For example, the rows Y(i,:) may first be checked and updated in the order i = 1, 2, ..., N_v, and then the columns Y(:,j) in the order j = 1, 2, ..., N_p.
Experiments show that this initialization step further improves the prediction performance of the invention.
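The initialization formulas appear only as images here. The sketch below assumes a similarity-weighted average over all other viruses (for an empty row) or all other receptors (for an empty column), which matches the description of neighborhood-based initialization but is not a transcription of the patent's formulas; the function name is hypothetical. Zero rows and columns are detected on, and averages computed from, the original Y, so the result does not depend on the processing order, consistent with the statement that the order is unrestricted.

```python
import numpy as np


def initialize_interactions(Y, S_v, S_p):
    """Fill all-zero rows/columns of Y with similarity-weighted neighbor averages.

    Assumed rule:
      zero row i:    Y(i,j) = sum_{n!=i} S_v(i,n) Y(n,j) / sum_{n!=i} S_v(i,n)
      zero column j: Y(i,j) = sum_{m!=j} S_p(j,m) Y(i,m) / sum_{m!=j} S_p(j,m)
    """
    Y0 = np.asarray(Y, dtype=float)
    S_v = np.asarray(S_v, dtype=float)
    S_p = np.asarray(S_p, dtype=float)
    Y_init = Y0.copy()

    for i in np.flatnonzero(~Y0.any(axis=1)):  # viruses with no known receptor
        w = S_v[i].copy()
        w[i] = 0.0                              # exclude the virus itself
        if w.sum() > 0:
            Y_init[i, :] = w @ Y0 / w.sum()

    for j in np.flatnonzero(~Y0.any(axis=0)):  # receptors with no known virus
        w = S_p[j].copy()
        w[j] = 0.0                              # exclude the receptor itself
        if w.sum() > 0:
            Y_init[:, j] = Y0 @ w / w.sum()

    return Y_init
```

Known interactions are never overwritten: only entries in all-zero rows or columns change, and an entry lying in both an empty row and an empty column stays 0 under either update.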
Example 9:
Based on Example 8, this embodiment solves the optimization models according to the following formulas:
[Equation image not reproduced: closed-form solutions for F_v* and F_p*]
example 10:
Based on Example 9, this example sets:
[Equation image not reproduced: final prediction matrix F* obtained from F_v* and F_p*]
The overall scheme for predicting virus-receptor interactions in this example is shown in FIG. 1.
Following the above calculation, the interaction score between the virus Dengue virus and the receptor C-type lectin domain family 4 member M is 0.0669. The virus-receptor pairs are then ranked by interaction score; the higher a pair ranks, the more likely it is to interact.
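Ranking the scored pairs is a sort over F*. In this minimal sketch (hypothetical function and argument names), pairs already known to interact are excluded by default so that only candidate pairs remain; the text describes ranking all virus-receptor pairs, so the exclusion is a choice made here and can be switched off.

```python
import numpy as np


def rank_pairs(F, Y, virus_names, receptor_names, exclude_known=True, top_k=None):
    """Return (virus, receptor, score) tuples sorted by score, highest first."""
    F = np.asarray(F, dtype=float)
    Y = np.asarray(Y)
    if exclude_known:
        idx = np.argwhere(Y == 0)                       # pairs with no known interaction
    else:
        idx = np.argwhere(np.ones_like(Y, dtype=bool))  # all pairs
    scores = F[idx[:, 0], idx[:, 1]]
    order = np.argsort(-scores, kind="stable")          # descending; ties keep index order
    ranked = [(virus_names[idx[k, 0]], receptor_names[idx[k, 1]], float(scores[k]))
              for k in order]
    return ranked if top_k is None else ranked[:top_k]
```

For example, with F = [[0.9, 0.1], [0.4, 0.7]] and only the first pair known, the candidates come out as (v2, p2), (v2, p1), (v1, p2).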
Example 11:
This embodiment discloses a similarity-based device for predicting virus-receptor interactions, comprising a similarity calculation module and a prediction module.
The similarity calculation module performs step 1: obtaining the virus similarity matrix S_v and the receptor similarity matrix S_p.
The prediction module performs step 2: based on the known virus-receptor interactions, the virus similarity matrix S_v and the receptor similarity matrix S_p, predicting the interaction score of each virus-receptor pair, i.e., the likelihood that the pair interacts; then ranking all virus-receptor pairs by interaction score, with higher-ranked pairs being more likely to interact. The device can estimate the likelihood of interaction for virus-receptor pairs with no known interaction.
For the specific technical means used in steps 1 and 2, see the foregoing embodiments; they are not repeated here.
Example 12:
This embodiment discloses an electronic device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to implement the virus-receptor interaction prediction method of any of the above embodiments.
Example 13:
This embodiment discloses a computer-readable storage medium storing a computer program which, when executed by a processor, implements the virus-receptor interaction prediction method of any of the above embodiments.
When the above embodiments are used to predict potential human virus-receptor interactions, human virus-receptor interaction data are first extracted from the viralReceptor database: 104 human viruses, 74 human receptors and 211 known human virus-receptor interactions. These data are then analyzed and a corresponding computational model is built to predict potential human virus-receptor interactions. Similarly, to predict virus-receptor interactions for other mammals, the corresponding interaction data can be extracted from the viralReceptor database, analyzed, and used to build a corresponding prediction model.
Experimental verification
To verify the effectiveness of the method and evaluate its prediction performance, two validation schemes used for prediction methods in other fields were adopted: ten-fold cross-validation and leave-one-out cross-validation, with the AUC (area under the ROC curve) as the metric. In ten-fold cross-validation, the known virus-receptor interactions were divided into 10 parts; each part in turn was used as the test set and the remaining 9 as the training set, and the average over 100 runs was taken as the prediction result. In leave-one-out cross-validation, each known virus-receptor interaction in turn is used as the test set and all remaining known interactions as the training set.
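The validation protocol can be sketched as follows. Two details are assumptions not stated in the text: all pairs without a known interaction serve as negatives, and the per-fold AUCs are averaged rather than pooled. The function names are hypothetical, and `predict` stands for any scoring procedure (for example, initialization followed by LapRLS) that maps a training adjacency matrix to a score matrix of the same shape.

```python
import numpy as np


def roc_auc(pos_scores, neg_scores):
    """AUC = P(random positive scores above random negative); ties count 1/2."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float((pos > neg).mean() + 0.5 * (pos == neg).mean())


def kfold_cv_auc(Y, predict, k=10, seed=0):
    """One round of k-fold CV over the known interactions (Y == 1).

    Held-out pairs are positives; pairs unknown in Y are negatives (assumption).
    Returns the mean AUC over non-empty folds.
    """
    Y = np.asarray(Y, dtype=float)
    rng = np.random.default_rng(seed)
    known = np.argwhere(Y == 1)
    rng.shuffle(known)                         # shuffle the known pairs (rows)
    unknown = Y == 0
    aucs = []
    for fold in np.array_split(known, k):
        if len(fold) == 0:
            continue
        Y_train = Y.copy()
        Y_train[fold[:, 0], fold[:, 1]] = 0.0  # hide the test interactions
        F = np.asarray(predict(Y_train), dtype=float)
        aucs.append(roc_auc(F[fold[:, 0], fold[:, 1]], F[unknown]))
    return float(np.mean(aucs))
```

Setting k to the number of known interactions turns this into leave-one-out cross-validation, and averaging `kfold_cv_auc` over 100 different seeds mirrors the repeated ten-fold runs described above.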
FIG. 2 shows the AUC curves of the invention and other methods in ten-fold cross-validation. The AUC of the invention's IILLS is 0.8675, higher than those of the three other methods: BRWH (0.7959), LapRLS (0.7577) and CMF (0.7128). This shows that the invention predicts potential virus-receptor interactions more effectively than the other methods.
FIG. 3 shows the ROC curves of the present invention and the other methods under leave-one-out cross-validation. IILLS again achieves the highest AUC value (0.9061), compared with BRWH (0.8105), LapRLS (0.7713) and CMF (0.7496). This further shows that the invention is effective in predicting potential virus-receptor interactions.
FIG. 4 compares the ten-fold cross-validation performance of the present invention under different ways of integrating receptor sequence similarity and Gaussian kernel similarity. The variants are: (1) IILLS_Seqsim, which uses only the receptor sequence similarity; (2) IILLS_GipSeq, which uses the mean of the receptor sequence similarity and the receptor Gaussian kernel similarity; and (3) IILLS_Gipsim, which uses only the receptor Gaussian kernel similarity. The invention achieves the highest AUC value when only the receptor sequence similarity is used (0.8675). The mean of the sequence similarity and the Gaussian kernel similarity ranks second (0.8464), and using only the receptor Gaussian kernel similarity performs worst (0.8242).
FIG. 5 compares the leave-one-out cross-validation performance of the present invention under the same three integration schemes. IILLS again achieves the best prediction performance (AUC: 0.9061) when only the receptor sequence similarity is used. This outperforms the other two cases: the mean of the Gaussian kernel similarity and sequence similarity (0.8865) and the Gaussian kernel similarity alone (0.8724). Both experiments show that the invention obtains better prediction performance when only the receptor sequence similarity is used. The receptor sequence similarity is therefore selected as the final receptor similarity.
The performance of the invention in this application case shows that it can effectively predict potential virus-receptor interactions. It can therefore accelerate subsequent research on virus-receptor interactions and help improve the treatment and diagnosis of viral diseases.

Claims (10)

1. A similarity-based virus-receptor interaction relationship prediction method, characterized by comprising the following steps:
Step 1: construct a virus similarity matrix $S_v$ and a receptor similarity matrix $S_p$;
Step 2: based on the known virus-receptor interaction data, the virus similarity matrix $S_v$ and the receptor similarity matrix $S_p$, predict the interaction score of each virus-receptor pair, i.e., the probability that an interaction exists between each virus-receptor pair.
2. The similarity-based virus-receptor interaction relationship prediction method according to claim 1, wherein step 1 comprises:
calculating the Gaussian kernel similarity matrix $G_v$ of the viruses and the Gaussian kernel similarity matrix $G_p$ of the receptors from the known virus-receptor interaction data, and calculating the sequence similarity matrix $G_s$ of the receptors from the receptor sequence information;
taking the Gaussian kernel similarity matrix $G_v$ of the viruses as the final virus similarity matrix $S_v$;
integrating the Gaussian kernel similarity matrix $G_p$ and the sequence similarity matrix $G_s$ of the receptors to obtain the final receptor similarity matrix $S_p$.
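The integration step can be sketched as a small selector over the three variants compared in the experiments (IILLS_Seqsim, IILLS_GipSeq, IILLS_Gipsim). The function name and the `mode` strings are hypothetical. Matrices are plain nested lists of size $N_p \times N_p$, and the virus side is left unchanged ($S_v = G_v$).

```python
def integrate_receptor_similarity(G_p, G_s, mode="seq"):
    """Combine the receptor Gaussian kernel similarity G_p with the receptor
    sequence similarity G_s (both N_p x N_p nested lists).

    mode="seq"  -> sequence similarity only  (IILLS_Seqsim)
    mode="mean" -> element-wise mean of both (IILLS_GipSeq)
    mode="gip"  -> Gaussian kernel only      (IILLS_Gipsim)
    """
    if mode == "seq":
        return [row[:] for row in G_s]
    if mode == "gip":
        return [row[:] for row in G_p]
    if mode == "mean":
        return [[(a + b) / 2 for a, b in zip(rp, rs)] for rp, rs in zip(G_p, G_s)]
    raise ValueError(f"unknown mode: {mode!r}")


G_p = [[1.0, 0.2], [0.2, 1.0]]
G_s = [[1.0, 0.6], [0.6, 1.0]]
S_p = integrate_receptor_similarity(G_p, G_s, mode="mean")
```

Given the reported AUCs, `mode="seq"` corresponds to the configuration finally selected.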
3. The similarity-based virus-receptor interaction relationship prediction method according to claim 2, wherein the Gaussian kernel similarity matrix $G_v$ of the viruses and the Gaussian kernel similarity matrix $G_p$ of the receptors are calculated from the known virus-receptor interaction data as follows:
first, define $V = \{v_1, v_2, \ldots, v_{N_v}\}$ as the set of all viruses, where $N_v$ is the number of viruses, and $P = \{p_1, p_2, \ldots, p_{N_p}\}$ as the set of all receptors, where $N_p$ is the number of receptors;
define $Y$ as the adjacency matrix of virus-receptor interactions, of size $N_v \times N_p$. Its element $Y(i,j)$ in row $i$ and column $j$ is determined as follows: if virus $v_i$ is known to interact with receptor $p_j$, then $Y(i,j) = 1$; otherwise $Y(i,j) = 0$;
then, construct the Gaussian kernel similarity matrix $G_v$ of the viruses, of size $N_v \times N_v$. Its element $G_v(i,n)$ in row $i$ and column $n$ is the Gaussian kernel similarity between viruses $v_i$ and $v_n$, calculated as:
$$G_v(i,n) = \exp\left(-\gamma_v \lVert Y(i,:) - Y(n,:) \rVert^2\right)$$
$$\gamma_v = \gamma'_v \Big/ \left(\frac{1}{N_v}\sum_{i=1}^{N_v} \lVert Y(i,:) \rVert^2\right)$$
where $i, n = 1, 2, \ldots, N_v$; $Y(i,:) = [Y(i,1), Y(i,2), \ldots, Y(i,N_p)]$ is the $i$-th row of $Y$, i.e., the vector of interactions between virus $v_i$ and all receptors; $\gamma_v$ is the parameter controlling the kernel width, and $\gamma'_v$ is an empirical parameter;
construct the Gaussian kernel similarity matrix $G_p$ of the receptors, of size $N_p \times N_p$. Its element $G_p(j,m)$ in row $j$ and column $m$ is the Gaussian kernel similarity between receptors $p_j$ and $p_m$, calculated as:
$$G_p(j,m) = \exp\left(-\gamma_p \lVert Y(:,j) - Y(:,m) \rVert^2\right)$$
$$\gamma_p = \gamma'_p \Big/ \left(\frac{1}{N_p}\sum_{j=1}^{N_p} \lVert Y(:,j) \rVert^2\right)$$
where $j, m = 1, 2, \ldots, N_p$; $Y(:,j) = [Y(1,j), Y(2,j), \ldots, Y(N_v,j)]^{T}$ is the $j$-th column of $Y$, i.e., the vector of interactions between receptor $p_j$ and all viruses; $\gamma_p$ is the parameter controlling the kernel width, and $\gamma'_p$ is an empirical parameter.
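The Gaussian kernel construction can be sketched in pure Python. The sketch assumes the width normalization commonly used for Gaussian interaction profile kernels, where $\gamma$ equals $\gamma'$ divided by the mean squared norm of the profiles. It also uses a hypothetical default of `gamma_prime=1.0`, since the text only calls $\gamma'$ empirical. Virus similarity uses the rows $Y(i,:)$, and receptor similarity uses the columns $Y(:,j)$.

```python
import math


def adjacency(pairs, n_v, n_p):
    """Y[i][j] = 1 if virus i is known to interact with receptor j, else 0."""
    Y = [[0] * n_p for _ in range(n_v)]
    for i, j in pairs:
        Y[i][j] = 1
    return Y


def transpose(M):
    return [list(col) for col in zip(*M)]


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian kernel similarity between the rows of `profiles`.

    Assumed bandwidth: gamma = gamma_prime / mean(||profile||^2).
    If every profile is zero, gamma falls back to 0 (all similarities 1)."""
    n = len(profiles)
    mean_sq = sum(sum(x * x for x in r) for r in profiles) / n
    gamma = gamma_prime / mean_sq if mean_sq > 0 else 0.0
    K = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            d2 = sum((x - y) ** 2 for x, y in zip(profiles[a], profiles[b]))
            K[a][b] = math.exp(-gamma * d2)
    return K


Y = adjacency([(0, 0), (1, 0), (1, 1)], n_v=2, n_p=2)  # [[1, 0], [1, 1]]
G_v = gip_kernel(Y)              # virus similarity from rows Y(i,:)
G_p = gip_kernel(transpose(Y))   # receptor similarity from columns Y(:,j)
```

In this toy case both mean squared norms are 1.5, so $\gamma = 2/3$. Because the two profiles differ in exactly one position, the off-diagonal similarity is $e^{-2/3}$.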
4. The similarity-based virus-receptor interaction relationship prediction method according to claim 2, wherein the sequence similarity matrix $G_s$ of the receptors is calculated from the known receptor sequence information as follows:
construct the receptor sequence similarity matrix $G_s$ of size $N_p \times N_p$. Its element $G_s(j,m)$ in row $j$ and column $m$ is the sequence similarity between receptors $p_j$ and $p_m$, calculated as:
$$G_s(j,m) = \frac{SW(j,m)}{\sqrt{SW(j,j)\, SW(m,m)}}$$
where $j, m = 1, 2, \ldots, N_p$, and $SW(j,m)$ is the raw Smith-Waterman score between receptors $p_j$ and $p_m$, i.e., the maximum element of the match score matrix computed by the Smith-Waterman algorithm on the sequences of $p_j$ and $p_m$.
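The sequence similarity can be sketched with a minimal Smith-Waterman implementation that returns the maximum entry of the local-alignment score matrix. The scoring scheme (match +2, mismatch -1, linear gap -1) is a toy assumption. Real receptor protein sequences would normally be scored with a substitution matrix such as BLOSUM62 and affine gap penalties. The sketch also assumes non-empty sequences, so that $SW(j,j) > 0$.

```python
import math


def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Raw Smith-Waterman local alignment score: the maximum entry of the
    dynamic-programming score matrix (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def sequence_similarity(seqs, **scoring):
    """G_s(j, m) = SW(j, m) / sqrt(SW(j, j) * SW(m, m)) for non-empty sequences."""
    n = len(seqs)
    sw = [[smith_waterman(seqs[j], seqs[m], **scoring) for m in range(n)]
          for j in range(n)]
    return [[sw[j][m] / math.sqrt(sw[j][j] * sw[m][m]) for m in range(n)]
            for j in range(n)]


G_s = sequence_similarity(["ACGT", "ACGT", "TTTT"])
```

The normalization by the two self-alignment scores puts identical sequences at 1. In the toy call, "ACGT" and "TTTT" share only a single aligned T, which gives $2/\sqrt{8 \cdot 8} = 0.25$.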
5. The similarity-based virus-receptor interaction relationship prediction method according to any one of claims 1 to 4, wherein in step 2 the interaction score of each virus-receptor pair is predicted by Laplacian regularized least squares, as follows:
first, construct the normalized Laplacian matrices from the virus similarity matrix $S_v$ and the receptor similarity matrix $S_p$, respectively:
$$L_v = D_v^{-1/2}(D_v - S_v)D_v^{-1/2}$$
$$L_p = D_p^{-1/2}(D_p - S_p)D_p^{-1/2}$$
where $D_v$ is an $N_v \times N_v$ diagonal matrix whose diagonal element $D_v(i,i)$ is the sum of all elements in row $i$ of $S_v$, and $D_p$ is an $N_p \times N_p$ diagonal matrix whose diagonal element $D_p(j,j)$ is the sum of all elements in row $j$ of $S_p$;
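Entry by entry, the normalized Laplacian is $L(i,j) = (D(i,j) - S(i,j)) / \sqrt{d_i d_j}$, where $d_i$ is the $i$-th row sum of $S$. The minimal pure-Python sketch below computes it. The function name is hypothetical, and the sketch assumes every row sum is positive, which holds for similarity matrices with a unit diagonal.

```python
import math


def normalized_laplacian(S):
    """L = D^{-1/2} (D - S) D^{-1/2}, where D is the diagonal matrix of row sums of S.

    Entry-wise: L[i][j] = (D[i][j] - S[i][j]) / sqrt(d_i * d_j); assumes every d_i > 0."""
    n = len(S)
    d = [sum(row) for row in S]
    return [[((d[i] if i == j else 0.0) - S[i][j]) / math.sqrt(d[i] * d[j])
             for j in range(n)] for i in range(n)]


S_v = [[1.0, 0.5], [0.5, 1.0]]
L_v = normalized_laplacian(S_v)  # approximately [[1/3, -1/3], [-1/3, 1/3]]
```

The same function serves both sides: `normalized_laplacian(S_p)` gives $L_p$.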
then, construct the optimization models for the prediction matrices $F_v$ and $F_p$ from the virus side and the receptor side, respectively:
$$F_v^* = \arg\min_{F_v} \left\{ \lVert Y - F_v \rVert_F^2 + \beta_v \operatorname{tr}\left(F_v^{T} L_v F_v\right) \right\}$$
$$F_p^* = \arg\min_{F_p} \left\{ \lVert Y^{T} - F_p \rVert_F^2 + \beta_p \operatorname{tr}\left(F_p^{T} L_p F_p\right) \right\}$$
where $Y$ is the adjacency matrix of virus-receptor interactions, $F_v^*$ and $F_p^*$ are the optimal solutions for $F_v$ and $F_p$, $\lVert \cdot \rVert_F$ denotes the Frobenius norm of a matrix, and $\beta_v$ and $\beta_p$ are smoothing parameters set empirically;
then, solve the optimization models to obtain $F_v^*$ and $F_p^*$;
finally, take the average of $F_v^*$ and $(F_p^*)^{T}$ to obtain the final virus-receptor interaction prediction matrix:
$$F^* = \frac{F_v^* + (F_p^*)^{T}}{2}$$
6. The similarity-based virus-receptor interaction relationship prediction method according to claim 5, wherein the adjacency matrix $Y$ in the optimization models is initialized as follows:
for each virus $v_i$, if it has no known interaction with any receptor, i.e., $Y(i,:) = [Y(i,1), Y(i,2), \ldots, Y(i,N_p)]$ is a zero vector, each element $Y(i,j)$ of that row, i.e., the initial interaction score between virus $v_i$ and receptor $p_j$, is updated from the virus similarity matrix $S_v$ as:
$$Y(i,j) = \frac{\sum_{n \neq i} S_v(i,n)\, Y(n,j)}{\sum_{n \neq i} S_v(i,n)}$$
where $S_v(i,n)$ is the element in row $i$ and column $n$ of $S_v$, i.e., the similarity between viruses $v_i$ and $v_n$;
similarly, for each receptor $p_j$, if it has no known interaction with any virus, i.e., $Y(:,j) = [Y(1,j), Y(2,j), \ldots, Y(N_v,j)]^{T}$ is a zero vector, each element $Y(i,j)$ of that column, i.e., the initial interaction score between receptor $p_j$ and virus $v_i$, is updated from the receptor similarity matrix $S_p$ as:
$$Y(i,j) = \frac{\sum_{m \neq j} S_p(j,m)\, Y(i,m)}{\sum_{m \neq j} S_p(j,m)}$$
where $S_p(j,m)$ is the element in row $j$ and column $m$ of $S_p$, i.e., the similarity between receptors $p_j$ and $p_m$.
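The initialization of viruses and receptors without known interactions can be sketched as a similarity-weighted average of the other rows (or columns). This sketch makes two assumptions. First, it excludes the entity itself from the weights ($n \neq i$). Second, it fills virus rows first and then fills receptor columns on the row-updated matrix. The text does not fix either choice. The helper names are hypothetical.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def fill_zero_rows(Y, S):
    """Replace each all-zero row i of Y with the S-weighted average of the other rows:
    Y(i, j) = sum_{n != i} S(i, n) Y(n, j) / sum_{n != i} S(i, n)."""
    out = [list(map(float, row)) for row in Y]
    for i, row in enumerate(Y):
        if any(row):
            continue  # entity already has known interactions
        others = [n for n in range(len(Y)) if n != i]
        total = sum(S[i][n] for n in others)
        if total == 0:
            continue  # no similar neighbours: leave the row at zero
        for j in range(len(row)):
            out[i][j] = sum(S[i][n] * Y[n][j] for n in others) / total
    return out


def initialize_adjacency(Y, S_v, S_p):
    """Fill empty virus rows using S_v, then empty receptor columns using S_p."""
    Y_rows = fill_zero_rows(Y, S_v)
    return transpose(fill_zero_rows(transpose(Y_rows), S_p))


Y = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]  # virus 2 and receptor 2 have no known links
S_v = [[1.0, 0.2, 0.75], [0.2, 1.0, 0.25], [0.75, 0.25, 1.0]]
S_p = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
Y0 = initialize_adjacency(Y, S_v, S_p)
```

Here virus 2 inherits `[0.75, 0.25, ...]` from its neighbours. Receptor 2 then receives 0.5 in every row, which is the equal-weight average of receptors 0 and 1. Rows and columns that already contain known interactions are left untouched.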
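7. The similarity-based virus-receptor interaction relationship prediction method according to claim 5, wherein the optimization models are solved as follows:
$$F_v^* = S_v\left(S_v + \beta_v L_v S_v\right)^{-1} Y$$
$$F_p^* = S_p\left(S_p + \beta_p L_p S_p\right)^{-1} Y^{T}$$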
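The prediction step can be sketched end to end in pure Python. The closed form $F^* = S(S + \beta L S)^{-1} Y$ used here is the LapRLS solution of Xia et al. (listed among the non-patent citations), and applying it to this method is an assumption. When $S$ is invertible it equals $(I + \beta L)^{-1} Y$, the minimizer of $\lVert Y - F \rVert_F^2 + \beta \operatorname{tr}(F^T L F)$. The $\beta$ values and all helper names are hypothetical. A hand-written Gauss-Jordan solver keeps the block self-contained, while a real implementation would use a linear-algebra library.

```python
import math


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(M):
    return [list(col) for col in zip(*M)]


def solve(A, B):
    """Solve A X = B (B may have several columns) by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(A)
    M = [[float(x) for x in A[i]] + [float(x) for x in B[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("matrix is singular")
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]


def normalized_laplacian(S):
    """L = D^{-1/2} (D - S) D^{-1/2} with D the diagonal matrix of row sums."""
    n = len(S)
    d = [sum(row) for row in S]
    return [[((d[i] if i == j else 0.0) - S[i][j]) / math.sqrt(d[i] * d[j])
             for j in range(n)] for i in range(n)]


def lap_rls_side(S, Y, beta):
    """F* = S (S + beta * L S)^{-1} Y for one side (viruses or receptors)."""
    L = normalized_laplacian(S)
    LS = matmul(L, S)
    A = [[s + beta * ls for s, ls in zip(rs, rls)] for rs, rls in zip(S, LS)]
    return matmul(S, solve(A, Y))


def predict(Y, S_v, S_p, beta_v=1.0, beta_p=1.0):
    """Average the virus-side and receptor-side predictions: (F_v* + F_p*^T) / 2."""
    F_v = lap_rls_side(S_v, Y, beta_v)
    F_p = lap_rls_side(S_p, transpose(Y), beta_p)
    return [[(a + b) / 2 for a, b in zip(rv, rp)]
            for rv, rp in zip(F_v, transpose(F_p))]


Y = [[1, 0], [0, 1], [1, 0]]
S_v = [[1.0, 0.3, 0.8], [0.3, 1.0, 0.2], [0.8, 0.2, 1.0]]
S_p = [[1.0, 0.4], [0.4, 1.0]]
F = predict(Y, S_v, S_p, beta_v=0.5, beta_p=0.5)  # 3 x 2 matrix of scores
```

As a sanity check, identity similarity matrices give $L = 0$, so the prediction reproduces $Y$ exactly. Larger $\beta$ values pull each score further towards the scores of similar viruses or receptors.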
8. A similarity-based virus-receptor interaction relationship prediction device, characterized by comprising a similarity calculation module and a prediction module;
the similarity calculation module is configured to perform step 1: obtaining a virus similarity matrix $S_v$ and a receptor similarity matrix $S_p$;
the prediction module is configured to perform step 2: based on the known virus-receptor interactions, the virus similarity matrix $S_v$ and the receptor similarity matrix $S_p$, predicting the interaction score of each virus-receptor pair, i.e., the probability that an interaction exists between each virus-receptor pair.
9. An electronic device comprising a memory and a processor, the memory having stored therein a computer program, wherein the computer program, when executed by the processor, causes the processor to implement the method of any of claims 1-7.
10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the method according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911108401.8A CN110838342B (en) 2019-11-13 2019-11-13 Similarity-based virus-receptor interaction relation prediction method and device

Publications (2)

Publication Number Publication Date
CN110838342A 2020-02-25
CN110838342B 2022-08-16

Family

ID=69574899

Country Status (1)

Country Link
CN (1) CN110838342B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014071836A (en) * 2012-10-01 2014-04-21 Japan Science & Technology Agency Approval prediction device, approval prediction method, and program
CN107610784A (en) * 2017-09-15 2018-01-19 中南大学 A kind of method of predictive microbiology and disease relationship
CN107862179A (en) * 2017-11-06 2018-03-30 中南大学 A kind of miRNA disease association Relationship Prediction methods decomposed based on similitude and logic matrix

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHENG YAN ET AL.: "DWNN-RLS: regularized least squares method for predicting circRNA-disease associations", BMC Bioinformatics *
XIA Z ET AL.: "Semi-supervised drug-protein interaction prediction from heterogeneous biological spaces", BMC Syst Biol *
HUANG ZHI'AN: "Research on prediction models for associations between diseases and biomarkers", China Master's Theses Full-text Database (Electronic Journal), Medicine and Health Sciences *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113241114A (en) * 2021-03-24 2021-08-10 辽宁大学 LncRNA-protein interaction prediction method based on graph convolution neural network
CN116705148A (en) * 2023-07-24 2023-09-05 中国人民解放军总医院 Antiviral drug screening method and system based on Laplace least square method
CN116705148B (en) * 2023-07-24 2023-10-27 中国人民解放军总医院 Antiviral drug screening method and system based on Laplace least square method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant