CN110838342A

CN110838342A - Similarity-based virus-receptor interaction relation prediction method and device

Info

Publication number: CN110838342A
Application number: CN201911108401.8A
Authority: CN
Inventors: 王建新; 严承; 李洪东
Original assignee: Central South University
Current assignee: Central South University
Priority date: 2019-11-13
Filing date: 2019-11-13
Publication date: 2020-02-25
Anticipated expiration: 2039-11-13
Also published as: CN110838342B

Abstract

The invention discloses a similarity-based method and device for predicting virus-receptor interaction relationships. First, Gaussian kernel similarity matrices of the viruses and of the receptors are constructed from known virus-receptor interaction relationship data; a sequence similarity matrix of the receptors is constructed from the sequence information of the receptors; and the final receptor similarity is obtained by integrating the receptor sequence similarity matrix with the receptor Gaussian kernel similarity matrix. Then, the interaction relationship data of viruses and receptors that have no known interaction relationship are initialized based on neighborhood information. Finally, the interaction relationship score of each virus-receptor pair is calculated by the Laplacian regularized least squares method. The invention can effectively predict virus-receptor interaction relationships.

Description

Similarity-based virus-receptor interaction relation prediction method and device

Technical Field

The invention belongs to the field of system biology, and relates to a similarity-based virus-receptor interaction relation prediction method and device.

Background

With the recent development of microbiology research, there is increasing evidence that microorganisms are closely linked to human diseases. Viruses, an important and widespread group of microorganisms, are inherently associated with viral infectious diseases, such as those caused by Ebola virus and influenza virus. Virus transmission begins when the virus contacts the host cell surface, and binding of the virus to a receptor is the first step in the entry of the virus into the host cell. In addition, specificity and affinity are the major factors that determine whether a virus can attach to different types of receptor molecules and enter cells.

With the development of high-throughput technology, many studies have shown that various molecules, including proteins, carbohydrates and lipids, can act as receptors for viruses. In addition, the binding of a virus to its receptor is a dynamic process: receptor recognition can change during infection, giving rise to viral variants with different receptor-binding specificities. Given the importance of virus-receptor interaction research for the diagnosis and treatment of viral diseases, a research team has constructed a mammalian virus-receptor interaction database, viralReceptor, which provides an important data foundation for understanding and studying the interaction mechanisms between viruses and receptors. The database contains 128 viruses, 119 receptors and 268 interactions between them. It is by far the most comprehensive mammalian virus-receptor interaction database, yet the virus-receptor interaction data it provides are still very limited.

Computational methods for predicting virus-receptor interaction relationships are seriously lacking, which is out of proportion to the importance of current viral disease research. In addition, inherent characteristics of viruses, such as their tendency to mutate, pose further challenges for the diagnosis and treatment of viral diseases. An efficient method for predicting virus-receptor interaction relationships is therefore urgently needed to discover new (potential) virus-receptor interaction relationships, to guide subsequent biomedical experiments that identify such relationships, and to improve the efficiency of those experiments.

Disclosure of Invention

The technical problem to be solved by the invention is to provide, in view of the deficiencies of the prior art, a similarity-based virus-receptor interaction relationship prediction method and device, an electronic device and a storage medium that can predict virus-receptor interaction relationships more accurately.

The technical solution of the invention is as follows:

A similarity-based method for predicting virus-receptor interaction relationships, comprising the following steps:

Step 1: constructing a virus similarity matrix S_v and a receptor similarity matrix S_p;

Step 2: based on the known virus-receptor interaction relationship data, the virus similarity matrix S_v and the receptor similarity matrix S_p, predicting the interaction relationship score of each virus-receptor pair, i.e., the likelihood that an interaction relationship exists for each virus-receptor pair.

All virus-receptor pairs are ranked by interaction relationship score; the higher a virus-receptor pair is ranked, the more likely it is to have an interaction relationship.

The present prediction method can predict the possibility of an interaction relationship existing in a virus-receptor pair for which no known interaction relationship exists.

Further, the Gaussian kernel similarity matrix G_v of the viruses and the Gaussian kernel similarity matrix G_p of the receptors are each calculated from the known virus-receptor interaction relationship data; the sequence similarity matrix G_s of the receptors is calculated from the sequence information of the receptors;

the Gaussian kernel similarity matrix G_v of the viruses is used as the final virus similarity matrix S_v;

the Gaussian kernel similarity matrix G_p and the sequence similarity matrix G_s of the receptors are integrated to obtain the final receptor similarity matrix S_p.

Further, in step 2, the interaction relationship score of each virus-receptor pair is predicted by the Laplacian regularized least squares method (LapRLS), and the prediction process is as follows:

first, the corresponding normalized Laplacian matrices are constructed from the virus similarity matrix S_v and the receptor similarity matrix S_p, respectively, as follows:

L^v＝(D^v)^-1/2(D^v-S_v)(D^v)^-1/2

L^p＝(D^p)^-1/2(D^p-S_p)(D^p)^-1/2

where L^v and L^p are the normalized Laplacian matrices of the virus similarity matrix and the receptor similarity matrix, respectively; D^v is an N_v×N_v diagonal matrix whose diagonal element D^v(i, i) is the sum of all elements in row i of the virus similarity matrix S_v; D^p is an N_p×N_p diagonal matrix whose diagonal element D^p(j, j) is the sum of all elements in row j of the receptor similarity matrix S_p.

Then, optimization models for the prediction matrices F_v and F_p are constructed from the virus side and the receptor side, respectively:

where F_v^* and F_p^* denote the optimal solutions of F_v and F_p, respectively; ||·||_F denotes the Frobenius norm (F-norm) of a matrix; and β_v and β_p are smoothing parameters, set empirically;

then, the optimization models are solved to obtain F_v^* and F_p^*;

finally, F_v^* and F_p^* are combined to obtain the final virus-receptor interaction relationship prediction matrix F^*.

LapRLS, a semi-supervised learning method, is an extension of the regularized least squares (RLS) algorithm. In the present scenario, where known virus-receptor interaction relationship data are scarce and the interaction status of most virus-receptor pairs is unknown, adopting the LapRLS algorithm and exploiting its Laplacian regularization term yields better prediction performance.
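As an illustration only, the LapRLS prediction step can be sketched in Python with NumPy. The optimization formulas are not reproduced in this text, so the sketch assumes the closed form commonly used for LapRLS in interaction prediction, F = S (S + β L S)^(-1) Y on each side, and assumes that the two side predictions are averaged to give F^*. The function names `normalized_laplacian`, `laprls_side` and `predict_scores` are hypothetical, and the pseudo-inverse is a defensive choice rather than something stated in the source.

```python
import numpy as np


def normalized_laplacian(S):
    """Return D^(-1/2) (D - S) D^(-1/2), where D holds the row sums of S."""
    S = np.asarray(S, dtype=float)
    d = S.sum(axis=1)
    d_inv_sqrt = np.zeros_like(d)
    positive = d > 0
    d_inv_sqrt[positive] = 1.0 / np.sqrt(d[positive])
    # Scaling row i and column j by d_i^(-1/2) and d_j^(-1/2) equals the matrix product.
    return d_inv_sqrt[:, None] * (np.diag(d) - S) * d_inv_sqrt[None, :]


def laprls_side(S, Y, beta=1.0):
    """Assumed closed-form LapRLS solution for one side: S (S + beta L S)^(-1) Y."""
    S = np.asarray(S, dtype=float)
    Y = np.asarray(Y, dtype=float)
    L = normalized_laplacian(S)
    # pinv tolerates a singular system; this is an assumption of the sketch.
    return S @ np.linalg.pinv(S + beta * (L @ S)) @ Y


def predict_scores(S_v, S_p, Y, beta_v=1.0, beta_p=1.0):
    """Average the virus-side and receptor-side predictions (assumed combination)."""
    Y = np.asarray(Y, dtype=float)
    F_v = laprls_side(S_v, Y, beta_v)      # shape N_v x N_p
    F_p = laprls_side(S_p, Y.T, beta_p)    # shape N_p x N_v
    return 0.5 * (F_v + F_p.T)
```

With identity similarity matrices the Laplacians vanish and the sketch returns Y unchanged, which is a convenient sanity check before real similarity matrices are supplied.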

Further, the adjacency matrix Y in the optimization model is the matrix Y after initialization processing. The initialization processing initializes the interaction relationship data of viruses and receptors that have no known interaction relationship, as follows:

for each virus v_i, if it has no known interaction relationship with any receptor, i.e., Y(i,:) = [Y(i,1), Y(i,2), ..., Y(i,N_p)] is a zero vector, then each element Y(i, j) of that vector is updated according to the virus similarity matrix S_v, i.e., the interaction relationship between virus v_i and receptor p_j is initialized; the formula is as follows:

where S_v(i, n) is the element in row i and column n of the virus similarity matrix S_v, i.e., the similarity between virus v_i and virus v_n;

similarly, for each receptor p_j, if it has no known interaction relationship with any virus, i.e., Y(:,j) = [Y(1,j), Y(2,j), ..., Y(N_v,j)]^T is a zero vector, then each element Y(i, j) of that vector is updated according to the receptor similarity matrix S_p, i.e., the interaction relationship between receptor p_j and virus v_i is initialized; the formula is as follows:

where S_p(j, m) is the element in row j and column m of the receptor similarity matrix S_p, i.e., the similarity between receptor p_j and receptor p_m.

The initialization process described above can further improve the prediction performance of the present invention.
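The initialization can be sketched as follows. The update formulas themselves are not reproduced in this text, so the sketch assumes a similarity-weighted average over all viruses (or all receptors), normalized by the sum of the similarities; both the normalization and the function name `initialize_interactions` are assumptions made for illustration.

```python
import numpy as np


def initialize_interactions(Y, S_v, S_p):
    """Fill all-zero rows and columns of Y from similarity-weighted neighbors.

    Assumed update (not confirmed by the source):
        Y(i, j) = sum_n S_v(i, n) * Y(n, j) / sum_n S_v(i, n)   for zero rows
        Y(i, j) = sum_m S_p(j, m) * Y(i, m) / sum_m S_p(j, m)   for zero columns
    """
    Y = np.asarray(Y, dtype=float)
    S_v = np.asarray(S_v, dtype=float)
    S_p = np.asarray(S_p, dtype=float)
    Y_init = Y.copy()

    # Viruses with no known receptor: borrow the profiles of similar viruses.
    for i in np.where(~Y.any(axis=1))[0]:
        total = S_v[i].sum()
        if total > 0:
            Y_init[i, :] = S_v[i] @ Y / total

    # Receptors with no known virus: borrow the profiles of similar receptors.
    for j in np.where(~Y.any(axis=0))[0]:
        total = S_p[j].sum()
        if total > 0:
            Y_init[:, j] = Y @ S_p[j] / total

    return Y_init
```

Both passes read from the original matrix Y rather than from the partially updated one, so the result does not depend on the order in which rows and columns are processed.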

Further, the optimization model is solved according to the following formula:

The invention also provides a similarity-based virus-receptor interaction relationship prediction device, which comprises a similarity calculation module and a prediction module;

the similarity calculation module is configured to perform step 1: obtaining a virus similarity matrix S_v and a receptor similarity matrix S_p;

the prediction module is configured to perform step 2: based on the known virus-receptor interaction relationships, the virus similarity matrix S_v and the receptor similarity matrix S_p, predicting the interaction relationship score of each virus-receptor pair, i.e., the likelihood that an interaction relationship exists for each virus-receptor pair.

All virus-receptor pairs are ranked by interaction relationship score; the higher a pair is ranked, the more likely it is to have an interaction relationship.

The prediction device can predict the possibility of the existence of an interaction relationship between a virus-receptor pair for which no known interaction relationship exists.

The invention also provides an electronic device comprising a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, causes the processor to implement the above virus-receptor interaction relationship prediction method.

The invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above virus-receptor interaction relationship prediction method.

Advantageous effects:

The invention first calculates the Gaussian kernel similarity of the viruses and of the receptors from the known virus-receptor interaction relationships, and then calculates the sequence similarity of the receptors from their protein sequence information. The Gaussian kernel similarity of the viruses is taken as the final virus similarity. The final receptor similarity is obtained by integrating the receptor sequence similarity and the receptor Gaussian kernel similarity; the integration modes were compared experimentally, and the mode giving the better prediction performance was selected. After the final virus and receptor similarities are obtained, a neighbor-based initialization of the interaction relationship data (using all viruses and all receptors as neighbors) is applied to the viruses and receptors that have no known interaction relationship. Potential virus-receptor interaction relationships are finally predicted under the assumption that "similar viruses use similar receptors, and vice versa": the interaction relationship scores are calculated by the Laplacian regularized least squares method from the virus similarity, the receptor similarity and the initialized interaction relationship data. Ten-fold cross-validation (10CV) and leave-one-out cross-validation (LOOCV) were used to evaluate the prediction performance of the method and to compare it with other methods, with AUC as the performance metric. The experimental results show that the invention can effectively predict virus-receptor interaction relationships, which provides guidance for the development of subsequent computational models and can improve the efficiency and reduce the cost of subsequent biomedical experiments.

The invention provides, for the first time in the field of virus-receptor interaction research, an effective computational method for predicting virus-receptor interaction relationships, and lays an important foundation for the development of subsequent prediction methods. In addition, the invention can improve the efficiency of subsequent biomedical experiments on the binding between viruses and receptors.

Drawings

FIG. 1 is a general flow diagram of a method for predicting a virus-receptor interaction relationship;

FIG. 2 is a performance comparison of the present invention with other methods under ten-fold cross-validation;

FIG. 3 is a performance comparison of the present invention with other methods under leave-one-out cross-validation;

FIG. 4 is a ten-fold cross-validation comparison of the present invention under different modes of integrating receptor sequence similarity and Gaussian kernel similarity;

FIG. 5 is a leave-one-out cross-validation comparison of the present invention under different modes of integrating receptor sequence similarity and Gaussian kernel similarity.

Detailed Description

The invention is described in further detail below with reference to the drawings and specific examples:

Example 1:

This embodiment discloses a similarity-based virus-receptor interaction relationship prediction method, which comprises the following steps:

Step 1: obtaining a virus similarity matrix S_v and a receptor similarity matrix S_p;

Step 2: based on the known virus-receptor interaction relationships, the virus similarity matrix S_v and the receptor similarity matrix S_p, predicting the interaction relationship score of each virus-receptor pair, i.e., the likelihood that each virus-receptor pair has an interaction relationship; all virus-receptor pairs are then ranked by interaction relationship score, and the higher a pair is ranked, the more likely it is to have an interaction relationship. The method can thus estimate the likelihood of an interaction relationship for virus-receptor pairs that have no known interaction relationship.

Example 2:

This example is based on Example 1 and uses the Gaussian kernel similarity matrix G_v of the viruses as the final virus similarity matrix S_v, i.e., S_v = G_v. Since only one kind of similarity, the Gaussian kernel similarity, is calculated for the viruses in this embodiment, it is used directly as the final similarity S_v.

The Gaussian kernel similarity matrix G_v of the viruses is calculated from the known virus-receptor interaction relationships.

Example 3:

This example is based on Example 1. The Gaussian kernel similarity matrix G_p and the sequence similarity matrix G_s of the receptors are integrated to obtain the final receptor similarity matrix S_p, where G_p is calculated from the known virus-receptor interaction relationships and G_s is calculated from the sequence information of the receptors.

Example 4:

This example is based on Examples 2 and 3 and calculates the Gaussian kernel similarity matrix G_v of the viruses and the Gaussian kernel similarity matrix G_p of the receptors from the known virus-receptor interaction relationships in the following steps:

first, define the set of all viruses {v_1, v_2, ..., v_{N_v}}, where N_v is the number of viruses, and the set of all receptors {p_1, p_2, ..., p_{N_p}}, where N_p is the number of receptors;

Y is defined as the adjacency matrix of the virus-receptor interaction relationships, of size N_v×N_p, whose element Y(i, j) in row i and column j is determined as follows: if virus v_i and receptor p_j are known to interact, Y(i, j) = 1; otherwise Y(i, j) = 0;

then, the Gaussian kernel similarity matrix G_v of the viruses is constructed, of size N_v×N_v, whose element G_v(i, n) in row i and column n represents the Gaussian kernel similarity of viruses v_i and v_n and is calculated according to the following formula:

G_v(i,n)＝exp(-γ_v||Y(i,:)-Y(n,:)||²)

where i, n = 1, 2, …, N_v; Y(i,:) = [Y(i,1), Y(i,2), ..., Y(i,N_p)] is the i-th row of the matrix Y, i.e., the vector of interaction relationships between virus v_i and all receptors; γ_v is a parameter controlling the kernel width; γ'_v is an empirical parameter, set to 1 in this embodiment following common practice with the Gaussian kernel. In the specific calculation, the Gaussian kernel similarity between the virus Feline Leukemia strain B/lambda-B1 and the virus Human alphaherpesvirus 3 is 0.2279.

The Gaussian kernel similarity matrix G_p of the receptors is constructed, of size N_p×N_p, whose element G_p(j, m) in row j and column m represents the Gaussian kernel similarity of receptors p_j and p_m and is calculated according to the following formula:

G_p(j,m)＝exp(-γ_p||Y(:,j)-Y(:,m)||²)

where j, m = 1, 2, …, N_p; Y(:,j) = [Y(1,j), Y(2,j), ..., Y(N_v,j)]^T is the j-th column of the matrix Y, i.e., the vector of interaction relationships between receptor p_j and all viruses; γ_p is a parameter controlling the kernel width; γ'_p is an empirical parameter, set to 1 in this embodiment following common practice with the Gaussian kernel. According to this calculation, the Gaussian kernel similarity between the receptor MER proto-oncogene, tyrosine kinase and the receptor dipeptidyl peptidase 4 is 0.3492.
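A minimal NumPy sketch of the two Gaussian kernel similarity matrices is given below for illustration. The text does not reproduce how the kernel width γ is derived from the empirical parameter γ'; the sketch assumes the normalization often used for Gaussian interaction profile kernels, namely γ = γ' divided by the mean squared norm of the interaction profiles. That normalization and the function name `gip_kernel` are assumptions.

```python
import numpy as np


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian kernel similarity between the rows of `profiles`.

    G(a, b) = exp(-gamma * ||profiles[a] - profiles[b]||^2), with the assumed
    bandwidth gamma = gamma_prime / mean_a ||profiles[a]||^2.
    """
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = np.sum(profiles ** 2, axis=1)
    mean_sq_norm = sq_norms.mean()
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm > 0 else gamma_prime
    # Pairwise squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (profiles @ profiles.T)
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * sq_dist)


# Toy adjacency matrix: 3 viruses (rows) x 3 receptors (columns).
Y = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 0, 1]], dtype=float)

G_v = gip_kernel(Y)      # virus similarity from the rows of Y
G_p = gip_kernel(Y.T)    # receptor similarity from the columns of Y
```

Passing Y gives the virus-side matrix and passing its transpose gives the receptor-side matrix, so a single function covers both formulas.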

Example 5:

This example is based on Example 3 and calculates the sequence similarity matrix G_s of the receptors from the known receptor sequence information as follows:

First, the sequence data of all receptors are downloaded from the KEGG database, and the normalized Smith-Waterman score is then used to calculate the sequence similarity of the receptors;

the sequence similarity matrix G_s of the receptors is constructed, of size N_p×N_p, whose element G_s(j, m) in row j and column m represents the sequence similarity of receptors p_j and p_m and is calculated as follows:

where j, m = 1, 2, …, N_p, and SW(j, m) is the raw Smith-Waterman score between receptors p_j and p_m, i.e., the maximum element (maximum score) of the alignment score matrix computed by the Smith-Waterman algorithm for the sequences of p_j and p_m. By this calculation, the sequence similarity between the receptor Human alphaherpesvirus 2 and the receptor Human coronavirus 229E is 0.1744.
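For illustration, a small pure-Python sketch of a normalized Smith-Waterman similarity follows. The normalization formula is not reproduced in this text; the sketch assumes the common form SW(j, m) / sqrt(SW(j, j) · SW(m, m)). The match, mismatch and linear gap scores are placeholder values chosen for the example (a protein substitution matrix would normally be used), and the function names are hypothetical.

```python
import math


def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the maximum cell of the local-alignment score matrix of a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Local alignment: a cell never drops below zero.
            score = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


def sequence_similarity(seqs):
    """Assumed normalization: SW(j, m) / sqrt(SW(j, j) * SW(m, m))."""
    self_scores = [smith_waterman(s, s) for s in seqs]
    n = len(seqs)
    G = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for m in range(j, n):
            raw = self_scores[j] if j == m else smith_waterman(seqs[j], seqs[m])
            denom = math.sqrt(self_scores[j] * self_scores[m])
            G[j][m] = G[m][j] = raw / denom if denom > 0 else 0.0
    return G
```

Under this normalization every sequence has similarity 1 with itself, which keeps G_s on the same scale as the Gaussian kernel similarity when the two are later integrated.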

Example 6:

This example is based on Example 3 and considers the following three modes of integrating the Gaussian kernel similarity matrix G_p and the sequence similarity matrix G_s of the receptors into the final receptor similarity matrix S_p:

(1) using the receptor Gaussian kernel similarity alone; (2) using the receptor sequence similarity alone; (3) using the mean of the receptor Gaussian kernel similarity and the receptor sequence similarity.

The effect of the three integration modes on prediction performance is examined by cross-validation, and the mode with the better prediction results is selected; in this embodiment, the final receptor similarity matrix S_p is calculated by the third integration mode.
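The three integration modes can be expressed compactly as below. This is only a sketch; the function name `integrate_receptor_similarity` and the mode labels "gip", "seq" and "mean" are hypothetical names introduced for the example.

```python
import numpy as np


def integrate_receptor_similarity(G_p, G_s, mode="mean"):
    """Combine receptor Gaussian kernel (G_p) and sequence (G_s) similarity.

    mode="gip":  use G_p alone (mode 1)
    mode="seq":  use G_s alone (mode 2)
    mode="mean": element-wise mean of G_p and G_s (mode 3)
    """
    G_p = np.asarray(G_p, dtype=float)
    G_s = np.asarray(G_s, dtype=float)
    if G_p.shape != G_s.shape:
        raise ValueError("G_p and G_s must have the same shape")
    if mode == "gip":
        return G_p.copy()
    if mode == "seq":
        return G_s.copy()
    if mode == "mean":
        return (G_p + G_s) / 2.0
    raise ValueError(f"unknown integration mode: {mode!r}")
```

Keeping the mode as a parameter makes it straightforward to run the same cross-validation loop once per mode and compare the resulting AUC values.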

Example 7:

This example is based on Example 1. Based on the known virus-receptor interaction relationship data, the virus similarity matrix S_v and the receptor similarity matrix S_p, the interaction relationship score of each virus-receptor pair is predicted by a method combining initialization processing with Laplacian regularized least squares (LapRLS), abbreviated as the IILLS method. The prediction process is as follows. First, the normalized Laplacian matrices of the virus and receptor similarity matrices are constructed:

L^v＝(D^v)^-1/2(D^v-S_v)(D^v)^-1/2

L^p＝(D^p)^-1/2(D^p-S_p)(D^p)^-1/2

Then, following the definition of the Laplacian regularized least squares method, optimization models for the prediction matrices F_v and F_p are constructed from the virus side and the receptor side, respectively:

where F_v^* and F_p^* denote the optimal solutions of F_v and F_p, respectively; ||·||_F denotes the Frobenius norm (F-norm) of a matrix; and β_v and β_p are smoothing parameters, set to 1 based on experience;

the optimization models are solved to obtain F_v^* and F_p^*;

finally, F_v^* and F_p^* are combined to obtain the final virus-receptor interaction relationship prediction matrix F^*.

Example 8:

This example is based on Example 7. In the optimization model, the adjacency matrix Y is the matrix Y after initialization processing. The virus-receptor interaction relationship matrix Y is first initialized, i.e., the interaction relationship data of viruses and receptors that have no known interaction relationship are initialized, as follows:

for each virus v_i, if it has no known interaction relationship with any receptor, i.e., Y(i,:) = [Y(i,1), Y(i,2), ..., Y(i,N_p)] is a zero vector, then each element Y(i, j) of that vector is updated according to the virus similarity matrix S_v, i.e., the interaction relationship between virus v_i and receptor p_j is initialized; the formula is as follows:

similarly, for each receptor p_j, if it has no known interaction relationship with any virus, i.e., Y(:,j) = [Y(1,j), Y(2,j), ..., Y(N_v,j)]^T is a zero vector, then each element Y(i, j) of that vector is updated according to the receptor similarity matrix S_p, i.e., the interaction relationship between receptor p_j and virus v_i is initialized; the formula is as follows:

In the above steps, the zero-vector checks and element updates for Y(i,:), i = 1, 2, …, N_v, and Y(:,j), j = 1, 2, …, N_p, may be performed in any order; for example, each Y(i,:), i = 1, 2, …, N_v, may first be checked and updated in turn, followed by each Y(:,j), j = 1, 2, …, N_p.

Experiments show that the prediction performance of the invention can be further improved by the initialization processing process.

Example 9:

In this embodiment, on the basis of Example 8, the optimization model is solved according to the following formula:

Example 10:

This example is based on Example 9 and lets

The overall scheme for the prediction of the virus-receptor interaction relationship in this example is shown in FIG. 1.

According to the above calculation process, the interaction relationship score between the virus Dengue virus and the receptor C-type lectin domain family 4 member M is 0.0669. The virus-receptor pairs are then ranked by interaction relationship score; the higher a virus-receptor pair is ranked, the more likely it is to have an interaction relationship.
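The ranking step can be illustrated with a short sketch that lists the highest-scoring pairs among those with no known interaction relationship. The function name `rank_candidate_pairs`, the `top_k` parameter and the toy names and scores in the usage example are hypothetical and not taken from the experiments.

```python
import numpy as np


def rank_candidate_pairs(F, Y, virus_names, receptor_names, top_k=10):
    """Rank virus-receptor pairs without a known interaction by predicted score.

    F is the prediction matrix F^*; Y is the original 0/1 adjacency matrix.
    """
    F = np.asarray(F, dtype=float)
    Y = np.asarray(Y)
    candidates = np.argwhere(Y == 0)                   # (row, col) index pairs
    scores = F[candidates[:, 0], candidates[:, 1]]
    order = np.argsort(-scores, kind="stable")[:top_k]  # descending by score
    return [
        (virus_names[int(i)], receptor_names[int(j)], float(F[i, j]))
        for i, j in candidates[order]
    ]


# Toy usage with made-up names and scores.
F_demo = np.array([[0.9, 0.2],
                   [0.4, 0.7]])
Y_demo = np.array([[1, 0],
                   [0, 0]])
ranked = rank_candidate_pairs(F_demo, Y_demo, ["v1", "v2"], ["r1", "r2"])
```

Known interactions are excluded from the ranking here so that the list contains only new candidates; whether to exclude them is a design choice of this sketch.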

Example 11:

This embodiment discloses a similarity-based virus-receptor interaction relationship prediction device, which comprises a similarity calculation module and a prediction module;

the similarity calculation module is configured to perform step 1: obtaining a virus similarity matrix S_v and a receptor similarity matrix S_p;

the prediction module is configured to perform step 2: based on the known virus-receptor interaction relationships, the virus similarity matrix S_v and the receptor similarity matrix S_p, predicting the interaction relationship score of each virus-receptor pair, i.e., the likelihood that each virus-receptor pair has an interaction relationship; all virus-receptor pairs are then ranked by interaction relationship score, and the higher a pair is ranked, the more likely it is to have an interaction relationship. The device can thus estimate the likelihood of an interaction relationship for virus-receptor pairs that have no known interaction relationship.

The technical means specifically adopted in step 1 and step 2 refer to the foregoing embodiments, and are not described herein again.

Example 12:

This embodiment discloses an electronic device, which includes a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, causes the processor to implement the virus-receptor interaction relationship prediction method of any of the above embodiments.

Example 13:

This embodiment discloses a computer-readable storage medium storing a computer program which, when executed by a processor, implements the virus-receptor interaction relationship prediction method of any of the above embodiments.

When the above embodiments are used to predict potential human virus-receptor interaction relationships, the human virus-receptor interaction relationship data are first extracted from the viralReceptor database: 104 human viruses, 74 human receptors and 211 known human virus-receptor interaction relationships. The extracted data are then analyzed, and a corresponding computational model is established to predict potential human virus-receptor interaction relationships. Similarly, to predict virus-receptor interaction relationships for other mammals, the corresponding data can be extracted from the viralReceptor database and analyzed, and a corresponding computational model can then be established for prediction.

Experimental verification

To verify the effectiveness of the method and evaluate its prediction performance, two validation schemes commonly used for prediction methods in other fields are adopted: ten-fold cross-validation and leave-one-out cross-validation. The AUC (area under the ROC curve) is used as the evaluation metric. In ten-fold cross-validation, the known virus-receptor interaction relationships are divided into 10 parts; each part is used in turn as the test set while the remaining 9 parts serve as the training set, and the average over 100 runs is taken as the prediction result. In leave-one-out cross-validation, each known virus-receptor interaction relationship is selected in turn as the test set, and the rest are used as the training set.
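The two ingredients of this evaluation, splitting the known interaction relationships into folds and computing the AUC, can be sketched in plain Python as follows. The helper names `ten_fold_splits` and `auc_score` are hypothetical, and the AUC is computed through its rank-based definition (the fraction of positive/negative score pairs that are ordered correctly, with ties counted as one half) rather than by tracing the ROC curve.

```python
import random


def ten_fold_splits(known_pairs, n_folds=10, seed=0):
    """Shuffle the known (virus, receptor) pairs and deal them into n_folds parts."""
    pairs = list(known_pairs)
    random.Random(seed).shuffle(pairs)
    return [pairs[k::n_folds] for k in range(n_folds)]


def auc_score(labels, scores):
    """Rank-based AUC: probability that a positive outscores a negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

In each round, the entries of Y that belong to the held-out fold would be set to 0 before the similarities and scores are recomputed, and the held-out pairs would then be scored against the pairs with no known interaction relationship. Leave-one-out cross-validation corresponds to using as many folds as there are known pairs.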

FIG. 2 shows the AUC results of the present invention and other methods under ten-fold cross-validation. As can be seen from the figure, the AUC of the IILLS method of the present invention is 0.8675, which is higher than that of the other three methods: BRWH (0.7959), LapRLS (0.7577) and CMF (0.7128). This shows that the present invention predicts potential virus-receptor interaction relationships more effectively than the other methods.

FIG. 3 shows the AUC results of the present invention and other methods under leave-one-out cross-validation. IILLS achieves an AUC of 0.9061, while the other three methods achieve 0.8105 (BRWH), 0.7713 (LapRLS) and 0.7496 (CMF). This further shows that the present invention is effective in predicting potential virus-receptor interaction relationships.

FIG. 4 shows a ten-fold cross-validation performance comparison of the present invention under different modes of integrating receptor sequence similarity and Gaussian kernel similarity, specifically: (1) IILLS_Seqsim: only the sequence similarity is used; (2) IILLS_GipSeq: the mean of the receptor sequence similarity and Gaussian kernel similarity is used; (3) IILLS_Gipsim: only the Gaussian kernel similarity is used. The present invention achieves the highest AUC when only the receptor sequence similarity is used (0.8675), followed by the mean of the receptor sequence similarity and Gaussian kernel similarity (0.8464); using the receptor Gaussian kernel similarity alone gives the lowest AUC of the three (0.8242).

FIG. 5 shows the corresponding performance comparison under leave-one-out cross-validation. As can be seen from the figure, the IILLS method of the present invention again achieves the best prediction performance when only the receptor sequence similarity is used (AUC: 0.9061), ahead of the other two cases (mean of Gaussian kernel similarity and sequence similarity: 0.8865; Gaussian kernel similarity alone: 0.8724). These experiments show that the invention obtains better prediction performance when only the receptor sequence similarity is used, so the receptor sequence similarity is selected as the final receptor similarity.

The performance of the invention in the application case demonstrates that the invention can effectively predict potential virus-receptor interaction relationships, accelerate subsequent research on virus-receptor interactions, and help further improve the diagnosis and treatment of viral diseases.

Claims

1. A similarity-based virus-receptor interaction relationship prediction method is characterized by comprising the following steps:

2. The similarity-based virus-receptor interaction relationship prediction method according to claim 1, wherein

the Gaussian kernel similarity matrix G_v of the viruses and the Gaussian kernel similarity matrix G_p of the receptors are each calculated from the known virus-receptor interaction relationship data; the sequence similarity matrix G_s of the receptors is calculated from the sequence information of the receptors;

3. The similarity-based virus-receptor interaction relationship prediction method according to claim 2, wherein the process of calculating the Gaussian kernel similarity matrix G_v of the viruses and the Gaussian kernel similarity matrix G_p of the receptors from the known virus-receptor interaction relationship data is as follows:

first, define the set of all viruses {v_1, v_2, ..., v_{N_v}}, where N_v is the number of viruses, and the set of all receptors {p_1, p_2, ..., p_{N_p}}, where N_p is the number of receptors;

Y is defined as the adjacency matrix of the virus-receptor interaction relationships, of size N_v×N_p, whose element Y(i, j) in row i and column j is determined as follows: if virus v_i and receptor p_j are known to interact, Y(i, j) = 1; otherwise Y(i, j) = 0;

then, the Gaussian kernel similarity matrix G_v of the viruses is constructed, whose element G_v(i, n) is calculated according to the following formula:

G_v(i,n)＝exp(-γ_v||Y(i,:)-Y(n,:)||²)

where i, n = 1, 2, …, N_v; Y(i,:) = [Y(i,1), Y(i,2), ..., Y(i,N_p)] is the i-th row of the matrix Y, i.e., the vector of interaction relationships between virus v_i and all receptors; γ_v is a parameter controlling the kernel width, and γ'_v is an empirical parameter;

the Gaussian kernel similarity matrix G_p of the receptors is constructed, whose element G_p(j, m) is calculated according to the following formula:

G_p(j,m)＝exp(-γ_p||Y(:,j)-Y(:,m)||²)

where j, m = 1, 2, …, N_p; Y(:,j) = [Y(1,j), Y(2,j), ..., Y(N_v,j)]^T is the j-th column of the matrix Y, i.e., the vector of interaction relationships between receptor p_j and all viruses; γ_p is a parameter controlling the kernel width, and γ'_p is an empirical parameter.

4. The similarity-based virus-receptor interaction relationship prediction method according to claim 2, wherein the process of calculating the sequence similarity matrix G_s of the receptors from the known receptor sequence information is as follows:

a sequence similarity matrix G_s of the receptors of size N_p×N_p is constructed, whose element G_s(j, m) in row j and column m represents the sequence similarity of receptors p_j and p_m and is calculated as follows:

where j, m = 1, 2, …, N_p, and SW(j, m) is the raw Smith-Waterman score between receptors p_j and p_m, i.e., the maximum element of the alignment score matrix computed by the Smith-Waterman algorithm for the sequences of p_j and p_m.

5. The similarity-based virus-receptor interaction relationship prediction method according to any one of claims 1 to 4, wherein in step 2, the interaction relationship score of each virus-receptor pair is predicted by the Laplacian regularized least squares method, and the prediction process is as follows:

first, the corresponding normalized Laplacian matrices are constructed from the virus similarity matrix S_v and the receptor similarity matrix S_p, respectively:

L^v＝(D^v)^-1/2(D^v-S_v)(D^v)^-1/2

L^p＝(D^p)^-1/2(D^p-S_p)(D^p)^-1/2

where D^v is an N_v×N_v diagonal matrix whose diagonal element D^v(i, i) is the sum of all elements in row i of the virus similarity matrix S_v; D^p is an N_p×N_p diagonal matrix whose diagonal element D^p(j, j) is the sum of all elements in row j of the receptor similarity matrix S_p;

then, optimization models for the prediction matrices F_v and F_p are constructed from the virus side and the receptor side, respectively:

where Y is the adjacency matrix of the virus-receptor interaction relationships; F_v^* and F_p^* denote the optimal solutions of F_v and F_p, respectively; ||·||_F denotes the F-norm of a matrix; and β_v and β_p are smoothing parameters, set empirically;

then, the optimization models are solved to obtain F_v^* and F_p^*;

finally, F_v^* and F_p^* are combined to obtain the final virus-receptor interaction relationship prediction matrix F^*.

6. The similarity-based virus-receptor interaction relationship prediction method according to claim 5, wherein the adjacency matrix Y in the optimization model is initialized as follows:

for each virus v_i, if it has no known interaction relationship with any receptor, i.e., Y(i,:) = [Y(i,1), Y(i,2), ..., Y(i,N_p)] is a zero vector, then each element Y(i, j) of that vector is updated according to the virus similarity matrix S_v, i.e., the interaction relationship between virus v_i and receptor p_j is initialized; the formula is as follows:

similarly, for each receptor p_j, if it has no known interaction relationship with any virus, i.e., Y(:,j) = [Y(1,j), Y(2,j), ..., Y(N_v,j)]^T is a zero vector, then each element Y(i, j) of that vector is updated according to the receptor similarity matrix S_p, i.e., the interaction relationship between receptor p_j and virus v_i is initialized; the formula is as follows:

7. The method of claim 5, wherein the optimization model is solved according to the following formula:

8. A similarity-based virus-receptor interaction relationship prediction device, characterized by comprising a similarity calculation module and a prediction module;

9. An electronic device comprising a memory and a processor, the memory having stored therein a computer program, wherein the computer program, when executed by the processor, causes the processor to implement the method of any of claims 1-7.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.