CN115910212A

CN115910212A - Method for analyzing cell communication mediated by ligand-receptor interaction

Info

Publication number: CN115910212A
Application number: CN202211217739.9A
Authority: CN
Inventors: Peng Lihong (彭利红); Yang Long (阳龙); Tan Jingwei (谭经纬); Zhou Liqian (周立前)
Original assignee: Hunan University of Technology
Current assignee: Hunan University of Technology
Priority date: 2022-09-30
Filing date: 2022-09-30
Publication date: 2023-04-04

Abstract

The invention discloses a method for analyzing cell communication mediated by ligand-receptor interactions. First, several ligand-receptor interaction datasets are collected. Second, potential ligand-receptor interactions are screened and identified with a heterogeneous deep ensemble model that combines a heterogeneous Newton Boosting model and a deep neural network. Third, known and predicted ligand-receptor interactions in a tissue are filtered based on single-cell transcriptome data. Fourth, cell communication is predicted from the filtered ligand-receptor interactions using an expression threshold method, an expression product method and a joint scoring strategy. Finally, the cell communication results are visualized.

Description

Method for analyzing cell communication mediated by ligand-receptor interaction

Technical Field

The invention belongs to the field of systems bioinformatics, and particularly relates to a method for analyzing cell communication mediated by ligand-receptor interactions.

Background

In multicellular organisms, cell communication enables multiple cells to coordinate with one another, forming tissues, organs and systems and sustaining various vital activities. Many cancers rely heavily on communication between cancer cells and normal cells. Cell communication is therefore important for understanding tumor initiation and progression, tumor immunity and treatment resistance. Predicting cell communication helps elucidate the molecular mechanisms of tumor progression and metastasis, and further provides guidance for anticancer drug design and targeted tumor therapy.

Despite advances in biomedical experimental technology, our understanding of cell communication remains limited. With the continuous maturation of single-cell sequencing technology, identifying cell communication has become a research hotspot. Cell communication is typically mediated by ligand-receptor interactions (LRIs), in which the ligand is either secreted and binds the receptor in soluble form or binds the receptor in membrane-bound form, and which require physical proximity of the two communicating cell types. Once the significance of ligand-receptor interactions for patient prognosis is fully resolved, therapies targeting cell signaling pathways could become a powerful strategy in clinical practice. To date, several computational methods have been proposed to predict cell communication from single-cell transcriptome data and known ligand-receptor interactions.

These methods fall into four broad categories: approaches based on combinations of differential expression, network-based approaches, expression-perturbation-based approaches and tensor-based approaches. However, most approaches based on differential expression profiling predict cell communication solely from the expression intensity and/or specificity of the ligands and receptors in known ligand-receptor interaction data, and do not jointly consider ligand-receptor interactions across all cell types. Network-based approaches do not account for gene regulatory mechanisms, and tensor-based methods have difficulty interpreting cell communication strength from the perspective of tensor decomposition. More importantly, all four categories still have to address false positives and false negatives.

The patent with publication number CN112466403B discloses a cell communication analysis method and system comprising cell communication prediction and ligand-target gene regulation prediction. Cell communication prediction comprises analyzing the expression abundance of ligand-receptor pairs, analyzing the number of significantly enriched ligand-receptor pairs and constructing a cell interaction network diagram; ligand-target gene regulation prediction comprises ligand activity analysis and analysis of the ligand-target gene regulatory potential. That method mainly describes the correlations among cells: it uses the CellPhoneDB software to construct a cell communication network from a single-cell gene expression matrix, and uses the NicheNet software to analyze ligand activity and the regulatory potential of ligands on target genes based on ligand-target gene expression relationships.

Disclosure of Invention

The present invention aims to solve the above problems in the prior art by providing a method for analyzing cell communication mediated by ligand-receptor interactions.

The technical scheme adopted by the invention is as follows:

The invention provides a method for analyzing cell communication mediated by ligand-receptor interactions, comprising the following steps:

S1: collating ligand-receptor interaction data by collecting four ligand-receptor interaction datasets;

S2: predicting ligand-receptor interactions by screening and identifying potential ligand-receptor interactions with a heterogeneous deep ensemble model that combines a heterogeneous Newton Boosting model and a deep neural network;

S3: filtering ligand-receptor interactions by combining single-cell transcriptome data with the known and the identified ligand-receptor interactions;

S4: predicting cell communication;

S5: visualizing cell communication.

Further, in step S2 the heterogeneous deep ensemble model screens and identifies potential ligand-receptor interactions through the following steps:

S2.1: feature extraction;

S2.2: dimensionality reduction;

S2.3: ligand-receptor interaction classification. Let $\mathcal{D}=\{(x_i,y_i)\}_{i=1}^{n}$ denote a ligand-receptor interaction dataset with n samples (ligand-receptor pairs), where $(x_i,y_i)$ is a training sample, $x_i\in\mathcal{X}$ is a d-dimensional feature vector and $y_i\in\mathcal{Y}=\{0,1\}$ is its label. For the i-th sample $x_i$ (i = 1, 2, …, n), $y_i=1$ if the ligand-receptor pair interacts, and $y_i=0$ otherwise.

S2.3.1: calculating the interaction probability of each ligand-receptor pair with a heterogeneous Newton Boosting machine (HNBM).

To predict the label of $x_i$, consider the objective function

$$\min_{f}\ \sum_{i=1}^{n} l\big(y_i,f(x_i)\big)$$

where $y_i$ and $f(x_i)$ denote the true label and the predicted label of $x_i$, respectively. The loss function $l(y_i,f(x_i))$ is twice differentiable with respect to $f(x_i)$, and $l'(y_i,f(x_i))$ and $l''(y_i,f(x_i))$ denote its first and second derivatives.

Assume that the base hypothesis of each boosting iteration is drawn from one of K different subclasses, where $\mathcal{H}^{(k)}$ denotes the k-th subclass (k = 1, 2, …, K) and the overall hypothesis space is

$$\mathcal{H}=\bigcup_{k=1}^{K}\mathcal{H}^{(k)}$$

where each $\mathcal{H}^{(k)}$ is a class of functions $b:\mathbb{R}^{d}\to\mathbb{R}$.

For the hypothesis space defined above, a subclass is selected at random in each boosting iteration; each subclass consists of binary decision trees whose maximum depth is drawn uniformly at random between $D_{\min}$ and $D_{\max}$. This yields $K=N_D$ unique subclasses, where $N_D=D_{\max}-D_{\min}+1$. The corresponding probability mass function Φ is:

$$\Phi(k)=\frac{1}{N_D},\qquad k=1,2,\dots,K$$

Let $u_m\in\{1,2,\dots,K\}$ be the subclass index sampled according to Φ at the m-th boosting iteration. The base hypothesis added at the m-th iteration is determined by the Newton step

$$b_m=\arg\min_{b\in\mathcal{H}^{(u_m)}}\ \sum_{i=1}^{n} h_i\left(b(x_i)+\frac{g_i}{h_i}\right)^{2}$$

where $g_i=l'(y_i,f^{m-1}(x_i))$ and $h_i=l''(y_i,f^{m-1}(x_i))$.

To ensure global convergence, the model is updated with a learning rate $\epsilon>0$:

$$f^{m}(x_i)=f^{m-1}(x_i)+\epsilon\,b_m(x_i)$$

Finally, the optimization problem is solved by Algorithm 1, from which the interaction probability $p_i^{\mathrm{HNBM}}$ of a target ligand-receptor pair $x_i$ is computed.

Algorithm 1: Heterogeneous Newton Boosting Machine

1: Initialize $f^{0}(x_i)=0$

2: for $m=1,\dots,M$ do

3:   Compute $g_i=l'(y_i,f^{m-1}(x_i))$

4:   Compute $h_i=l''(y_i,f^{m-1}(x_i))$

5:   Sample the subclass index $u_m\sim\Phi$

6:   Fit the base hypothesis $b_m\in\mathcal{H}^{(u_m)}$ by the Newton step

7:   Update the model: $f^{m}(x_i)=f^{m-1}(x_i)+\epsilon\,b_m(x_i)$

8: end for

9: Output: $f^{M}(x)=\sum_{m=1}^{M}\epsilon\,b_m(x)$
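Algorithm 1 can be illustrated with a minimal NumPy sketch of Newton boosting with randomly chosen tree depths. It assumes a logistic loss (so $f^{M}$ is mapped to a probability with the sigmoid), greedy regression trees whose leaves take the regularized Newton value with an L2 penalty `lam`, and a uniform Φ over depths. The function names `hnbm_fit` and `hnbm_predict_proba`, the penalty and the default hyperparameters are hypothetical, not the settings used in the patent.

```python
import numpy as np

# Illustrative Newton-boosting sketch; logistic loss, L2 leaf penalty and uniform Phi are assumptions.

def _fit_tree(X, g, h, depth, lam=1.0, min_leaf=2):
    """Greedy Newton tree: leaf value -sum(g)/(sum(h)+lam), splits chosen by second-order gain."""
    G, H = g.sum(), h.sum()
    leaf = -G / (H + lam)
    if depth == 0 or len(g) < 2 * min_leaf:
        return leaf
    best = None
    for j in range(X.shape[1]):
        order = np.argsort(X[:, j])
        xs, g_cum, h_cum = X[order, j], np.cumsum(g[order]), np.cumsum(h[order])
        for s in range(min_leaf - 1, len(xs) - min_leaf):
            if xs[s] == xs[s + 1]:
                continue  # no threshold separates equal feature values
            gl, hl = g_cum[s], h_cum[s]
            gain = gl**2 / (hl + lam) + (G - gl)**2 / (H - hl + lam) - G**2 / (H + lam)
            if best is None or gain > best[0]:
                best = (gain, j, 0.5 * (xs[s] + xs[s + 1]))
    if best is None or best[0] <= 0:
        return leaf
    _, j, thr = best
    left = X[:, j] <= thr
    return (j, thr,
            _fit_tree(X[left], g[left], h[left], depth - 1, lam, min_leaf),
            _fit_tree(X[~left], g[~left], h[~left], depth - 1, lam, min_leaf))

def _predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        j, thr, left, right = node
        node = left if x[j] <= thr else right
    return node

def hnbm_fit(X, y, n_rounds=30, lr=0.3, d_min=1, d_max=3, seed=0):
    rng = np.random.default_rng(seed)
    depths = np.arange(d_min, d_max + 1)            # K = N_D = D_max - D_min + 1 subclasses
    phi = np.full(len(depths), 1.0 / len(depths))   # uniform pmf Phi(k) = 1 / N_D
    f = np.zeros(len(y))                            # f^0(x_i) = 0
    trees = []
    for _ in range(n_rounds):
        p = 1.0 / (1.0 + np.exp(-f))
        g = p - y                                   # l'  of the logistic loss
        h = p * (1.0 - p)                           # l'' of the logistic loss
        u_m = rng.choice(len(depths), p=phi)        # sample subclass index u_m ~ Phi
        tree = _fit_tree(X, g, h, int(depths[u_m]))
        f += lr * np.array([_predict_tree(tree, x) for x in X])  # f^m = f^{m-1} + eps * b_m
        trees.append(tree)
    return trees, lr

def hnbm_predict_proba(model, X):
    trees, lr = model
    f = np.array([sum(lr * _predict_tree(t, x) for t in trees) for x in X])
    return 1.0 / (1.0 + np.exp(-f))                 # sigmoid maps f^M to p_i^HNBM
```

The pure-Python split search keeps the example readable; a production implementation would use an optimized boosting library.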

S2.3.2: calculating the interaction probability of each ligand-receptor pair with a deep neural network (DNN).

As shown in FIG. 9, the DNN model consists of two long short-term memory (LSTM) network layers, three fully connected layers and one output layer. The first two layers are LSTM layers with d neurons that output the transformed d-dimensional feature vector b. The LSTM network is an important deep learning model trained by backpropagation that can propagate information effectively. An LSTM block consists of an input gate ι, a forget gate φ, an output gate ω and a memory cell. By adding gates, it can dynamically change the weights of its self-recurrent connection. In an LSTM network, the input and output of the forget gate are computed from the current inputs and the previous cell states. Suppose that $x_{j}^{t}$ denotes the j-th input (feature) of $x_i$ at time t, and c denotes one of the C memory cells. $w_{c\iota}$, $w_{c\phi}$ and $w_{c\omega}$ denote the peephole weights from cell c to the three gates, $s_{c}^{t}$ denotes the state of cell c at time t, and $b_{h}^{t}$ denotes the output of the h-th neuron at time t. f, g and h denote the activation functions of the gates, the cell input and the cell output, respectively.

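The input and output of the input gate are given by:

$$a_{\iota}^{t}=\sum_{j} w_{j\iota}x_{j}^{t}+\sum_{h} w_{h\iota}b_{h}^{t-1}+\sum_{c} w_{c\iota}s_{c}^{t-1},\qquad b_{\iota}^{t}=f\left(a_{\iota}^{t}\right)$$

The input and output of the forget gate are given by:

$$a_{\phi}^{t}=\sum_{j} w_{j\phi}x_{j}^{t}+\sum_{h} w_{h\phi}b_{h}^{t-1}+\sum_{c} w_{c\phi}s_{c}^{t-1},\qquad b_{\phi}^{t}=f\left(a_{\phi}^{t}\right)$$

The input and state of cell c at time t are given by:

$$a_{c}^{t}=\sum_{j} w_{jc}x_{j}^{t}+\sum_{h} w_{hc}b_{h}^{t-1},\qquad s_{c}^{t}=b_{\phi}^{t}s_{c}^{t-1}+b_{\iota}^{t}\,g\left(a_{c}^{t}\right)$$

The output of cell c at time t is given by:

$$b_{c}^{t}=b_{\omega}^{t}\,h\left(s_{c}^{t}\right)$$

The input and output of the output gate are given by:

$$a_{\omega}^{t}=\sum_{j} w_{j\omega}x_{j}^{t}+\sum_{h} w_{h\omega}b_{h}^{t-1}+\sum_{c} w_{c\omega}s_{c}^{t},\qquad b_{\omega}^{t}=f\left(a_{\omega}^{t}\right)$$

From these equations, the final output b, a d-dimensional feature vector transformed by the two LSTM layers, is obtained.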
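The gate computations of a peephole LSTM can be traced with a small NumPy forward pass. This sketch assumes one memory cell per block (so each peephole connects a cell only to its own gates), f = sigmoid and g = h = tanh. The text does not say how the d-dimensional pair vector is arranged into time steps, so a generic input sequence is used; the parameter layout and the names `init_lstm_params` and `peephole_lstm` are hypothetical, and training is not shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_lstm_params(n_in, n_cells, seed=0):
    """Hypothetical layout: input weights Wx, recurrent weights Wh, diagonal peephole vectors."""
    rng = np.random.default_rng(seed)
    return {
        "Wx": {k: rng.normal(scale=0.1, size=(n_cells, n_in)) for k in "ifco"},
        "Wh": {k: rng.normal(scale=0.1, size=(n_cells, n_cells)) for k in "ifco"},
        "peep": {k: rng.normal(scale=0.1, size=n_cells) for k in "ifo"},
    }

def peephole_lstm(x_seq, params):
    """Forward pass of one peephole LSTM layer over x_seq (T x n_in); returns T x n_cells outputs."""
    Wx, Wh, P = params["Wx"], params["Wh"], params["peep"]
    n_cells = Wh["c"].shape[0]
    b_prev = np.zeros(n_cells)   # b_h^{t-1}: cell outputs at the previous step
    s_prev = np.zeros(n_cells)   # s_c^{t-1}: cell states at the previous step
    outputs = []
    for x_t in x_seq:
        b_in = sigmoid(Wx["i"] @ x_t + Wh["i"] @ b_prev + P["i"] * s_prev)  # input gate
        b_fg = sigmoid(Wx["f"] @ x_t + Wh["f"] @ b_prev + P["f"] * s_prev)  # forget gate
        a_c = Wx["c"] @ x_t + Wh["c"] @ b_prev                               # cell input
        s_t = b_fg * s_prev + b_in * np.tanh(a_c)                            # cell state (g = tanh)
        b_out = sigmoid(Wx["o"] @ x_t + Wh["o"] @ b_prev + P["o"] * s_t)    # output gate sees s_c^t
        b_t = b_out * np.tanh(s_t)                                           # cell output (h = tanh)
        outputs.append(b_t)
        b_prev, s_prev = b_t, s_t
    return np.array(outputs)

# Two stacked layers; the last time step of the second layer plays the role of b.
x = np.random.default_rng(1).normal(size=(6, 4))   # hypothetical sequence: 6 steps, 4 inputs
layer1 = peephole_lstm(x, init_lstm_params(4, 8, seed=0))
layer2 = peephole_lstm(layer1, init_lstm_params(8, 8, seed=1))
b = layer2[-1]
```

Because every output is a sigmoid gate times a tanh, each component of b lies strictly between -1 and 1.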

The next three layers are fully connected layers with 256, 128 and 64 neurons, respectively. The input to the first of these layers is the transformed d-dimensional feature vector b representing a ligand-receptor pair. ReLU is used as the activation function of all three layers to mitigate vanishing gradients and further improve generalization. It is defined as follows:

where a is a parameter, set to a = 1.

The output layer consists of a single neuron, which outputs the interaction probability $p_i^{\mathrm{DNN}}$ of each target ligand-receptor pair $x_i$ through the sigmoid function:

$$p_i^{\mathrm{DNN}}=\sigma(z_i)=\frac{1}{1+e^{-z_i}}$$

where $z_i$ is the input to the output neuron for $x_i$.

At the output layer, the binary cross-entropy loss between the predicted and the true ligand-receptor interactions is minimized:

$$\mathcal{L}_{\mathrm{BCE}}=-\frac{1}{n}\sum_{i=1}^{n}\left[y_i\log p_i^{\mathrm{DNN}}+\left(1-y_i\right)\log\left(1-p_i^{\mathrm{DNN}}\right)\right]$$

where $y_i$ is the true label of the target ligand-receptor pair $x_i$ and $p_i^{\mathrm{DNN}}$ is its predicted interaction probability. The model was trained for 100 epochs with a mini-batch size of 128.
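The fully connected head, the single sigmoid output neuron and the binary cross-entropy can be sketched in NumPy as below. The He-style random initialization and the names `init_head`, `dense_head` and `binary_cross_entropy` are assumptions; only the forward pass and the loss are shown, since training for 100 epochs with mini-batches of 128 would normally be done in a deep learning framework.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_head(d, sizes=(256, 128, 64), seed=0):
    """(W, bias) pairs for the d -> 256 -> 128 -> 64 -> 1 layers (He-style init, an assumption)."""
    rng = np.random.default_rng(seed)
    dims = [d, *sizes, 1]
    return [(rng.normal(scale=np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(dims[:-1], dims[1:])]

def dense_head(b, layers):
    """b: batch x d LSTM features -> interaction probabilities p_i^DNN."""
    z = b
    for W, bias in layers[:-1]:
        z = relu(z @ W + bias)                     # three ReLU layers
    W_out, b_out = layers[-1]
    return sigmoid(z @ W_out + b_out).ravel()      # single sigmoid output neuron

def binary_cross_entropy(y, p, eps=1e-12):
    p = np.clip(p, eps, 1.0 - eps)                 # avoid log(0)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

probs = dense_head(np.random.default_rng(2).normal(size=(4, 16)), init_head(16))
```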

S2.4: ensemble learning: the heterogeneous Newton Boosting machine (HNBM) and the deep neural network (DNN) are integrated into a heterogeneous deep ensemble model to obtain the final ligand-receptor interactions. HNBM and DNN serve as base classifiers, and their outputs are combined to produce the final classification. For a ligand-receptor pair $x_i$, let $p_i^{\mathrm{HNBM}}$ and $p_i^{\mathrm{DNN}}$ denote the interaction probabilities computed by HNBM and DNN, respectively. The final ligand-receptor interaction probability $p_i$ is:

$$p_i=\alpha\,p_i^{\mathrm{HNBM}}+\left(1-\alpha\right)p_i^{\mathrm{DNN}}$$

where the parameter α represents the relative importance of HNBM and DNN for ligand-receptor interaction classification and is learned by cross-validation.

Further, in step S2.4, α is set in the range [0.5, 0.8], preferably 0.6: with α = 0.6, the proposed LRI-HDEnHD model achieves higher AUC and AUPR on most of the four ligand-receptor interaction datasets.
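The fusion step and the cross-validated choice of α can be sketched in plain Python. This assumes the fusion is a weighted average with α on the HNBM output and 1 - α on the DNN output, and selects α on held-out fold predictions by mean AUC, computed here as the Mann-Whitney statistic. The function names and the 0.1-step grid are illustrative.

```python
def auc(y_true, scores):
    """ROC AUC as the Mann-Whitney statistic: P(score_pos > score_neg), ties count 1/2."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))

def ensemble(p_hnbm, p_dnn, alpha):
    """Assumed form: p_i = alpha * p_i^HNBM + (1 - alpha) * p_i^DNN."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(p_hnbm, p_dnn)]

def select_alpha(folds, grid=None):
    """folds: list of (y, p_hnbm, p_dnn) held-out predictions; returns alpha with best mean AUC."""
    if grid is None:
        grid = [round(0.1 * k, 1) for k in range(11)]   # 0.0, 0.1, ..., 1.0

    def mean_auc(alpha):
        return sum(auc(y, ensemble(ph, pdn, alpha)) for y, ph, pdn in folds) / len(folds)

    return max(grid, key=mean_auc)  # first alpha reaching the maximum
```

Selecting α on held-out folds rather than on training predictions keeps the weight from simply favoring whichever base model overfits more.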

Further, the feature extraction in step S2.1 comprises the following steps:

S2.1.1: obtaining ligand and receptor sequence information from the UniProt database;

S2.1.2: extracting biological features of proteins with Kmer, the auto-covariance and cross-covariance combination (ACC), the distance-based top-n-gram (DT), distance-pair-based pseudo amino acid composition (PseAAC-DP), parallel-correlation pseudo amino acid composition (PC-PseAAC) and series-correlation pseudo amino acid composition (SC-PseAAC). Let $p_o$ denote a protein (ligand or receptor) sequence with L amino acid residues:

$$p_o=R_1R_2\cdots R_i\cdots R_L$$

where $R_i$ denotes the amino acid residue at position i of $p_o$.

Let t denote the distance between the two residues of a residue pair. The biological features Kmer, ACC, DT, PseAAC-DP, PC-PseAAC and SC-PseAAC are extracted to represent each ligand/receptor.

Kmer: based on the Kmer method proposed by Liu et al., a protein can be represented by the occurrence frequencies of k adjacent amino acids. The Kmer features of each protein are extracted, and each ligand/receptor is described as a 400-dimensional vector (20² = 400, i.e., k = 2).
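A sketch of the Kmer descriptor: with k = 2 over the 20 standard amino acids it yields a 400-dimensional frequency vector. Skipping windows that contain non-standard residues, and the function name `kmer_composition`, are assumptions for illustration.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def kmer_composition(seq, k=2):
    """Normalized frequency of every k-mer over the 20 standard amino acids."""
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    valid = [w for w in windows if w in counts]   # skip windows with non-standard residues
    for w in valid:
        counts[w] += 1
    total = max(len(valid), 1)
    return [counts[km] / total for km in kmers]
```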

ACC: ACC combines auto-covariance and cross-covariance. Auto-covariance measures the correlation of the same property between residues separated by distance t:

$$AC(U,t)=\frac{1}{L-t}\sum_{i=1}^{L-t}\left(P_U(R_i)-\bar{P}_U\right)\left(P_U(R_{i+t})-\bar{P}_U\right)$$

where $P_U(R_i)$ is the value of physicochemical property U for $R_i$, and $\bar{P}_U=\frac{1}{L}\sum_{i=1}^{L}P_U(R_i)$ is the average of that property over the sequence.

Cross-covariance evaluates the correlation between two different properties of residues separated by distance t:

$$CC(U_1,U_2,t)=\frac{1}{L-t}\sum_{i=1}^{L-t}\left(P_{U_1}(R_i)-\bar{P}_{U_1}\right)\left(P_{U_2}(R_{i+t})-\bar{P}_{U_2}\right)$$

where $U_1$ and $U_2$ denote two different physicochemical indices.

If auto-covariance and cross-covariance yield $N_1$ and $N_2$ features, respectively, the ACC feature of a ligand/receptor is an $(N_1+N_2)$-dimensional vector.
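The auto-covariance and cross-covariance terms can be computed directly from per-residue property values. In this sketch each physicochemical scale is a dict from residue to value; the function names are hypothetical, and real use would load published physicochemical indices for the 20 amino acids. AC is taken for every scale and CC for every ordered pair of distinct scales, for lags 1 to `max_lag`.

```python
def _mean(values):
    return sum(values) / len(values)

def auto_cov(seq, prop, t):
    """AC(U, t) for one physicochemical scale `prop` (residue -> value)."""
    v = [prop[r] for r in seq]
    m, L = _mean(v), len(v)
    return sum((v[i] - m) * (v[i + t] - m) for i in range(L - t)) / (L - t)

def cross_cov(seq, prop1, prop2, t):
    """CC(U1, U2, t) between two different scales."""
    v1 = [prop1[r] for r in seq]
    v2 = [prop2[r] for r in seq]
    m1, m2, L = _mean(v1), _mean(v2), len(seq)
    return sum((v1[i] - m1) * (v2[i + t] - m2) for i in range(L - t)) / (L - t)

def acc_features(seq, props, max_lag):
    """N1 = |props| * max_lag AC values followed by N2 = |props| * (|props| - 1) * max_lag CC values."""
    ac = [auto_cov(seq, p, t) for p in props for t in range(1, max_lag + 1)]
    cc = [cross_cov(seq, p1, p2, t)
          for p1 in props for p2 in props if p1 is not p2
          for t in range(1, max_lag + 1)]
    return ac + cc
```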

DT: DT measures the distance between pairs of top-n-grams. It builds a feature vector from the relative positions of each top-n-gram pair in the top-n-gram sequence of a protein. The invention constructs a 1220-dimensional distance-based top-1-gram feature vector to describe each ligand/receptor.

PseAAC-DP: PseAAC-DP describes a protein based on a reduced alphabet and the distance between each pair of residues. Using the amino acid cluster profile cp provided by Liu et al. as the reduced alphabet, the ligand/receptor is represented by a feature vector defined by the following formula:

where $f^{t}(u)=f(R_i,R_j\mid t)$ denotes the occurrence frequency of the residue pair $(R_i,R_j)$ separated by distance t.

The amino acid composition captures the main features of the 20 amino acids but loses other information, such as sequence order. PC-PseAAC integrates both local and global sequence-order information of the protein sequence.

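The sequence-order correlation factors are:

$$\theta_t=\frac{1}{L-t}\sum_{i=1}^{L-t}\Theta\left(R_i,R_{i+t}\right),\qquad t=1,2,\dots,\lambda$$

$$\Theta\left(R_i,R_j\right)=\frac{1}{3}\left\{\left[h^{1}(R_j)-h^{1}(R_i)\right]^{2}+\left[h^{2}(R_j)-h^{2}(R_i)\right]^{2}+\left[M(R_j)-M(R_i)\right]^{2}\right\}$$

where λ is the number of correlation tiers, and $h^{1}(R_i)$, $h^{2}(R_i)$ and $M(R_i)$ denote the standardized hydrophobicity, hydrophilicity and side-chain mass of $R_i$. They are obtained by standardizing the original values over the 20 amino acids $A_1,\dots,A_{20}$, for example:

$$h^{1}(R_i)=\frac{h_{0}^{1}(R_i)-\frac{1}{20}\sum_{k=1}^{20}h_{0}^{1}(A_k)}{\sqrt{\frac{1}{20}\sum_{u=1}^{20}\left[h_{0}^{1}(A_u)-\frac{1}{20}\sum_{k=1}^{20}h_{0}^{1}(A_k)\right]^{2}}}$$

and $h^{2}(R_i)$ and $M(R_i)$ are obtained in the same way, where $h_{0}^{1}(R_i)$, $h_{0}^{2}(R_i)$ and $M_{0}(R_i)$ denote the original hydrophobicity, hydrophilicity and side-chain mass of $R_i$, respectively.

Finally, the PC-PseAAC feature vector of a ligand/receptor is $[x_1,\dots,x_{20},x_{21},\dots,x_{20+\lambda}]$ with

$$x_u=\begin{cases}\dfrac{f_u}{\sum_{i=1}^{20}f_i+w\sum_{t=1}^{\lambda}\theta_t}, & 1\le u\le 20\\[2ex]\dfrac{w\,\theta_{u-20}}{\sum_{i=1}^{20}f_i+w\sum_{t=1}^{\lambda}\theta_t}, & 20<u\le 20+\lambda\end{cases}$$

where $f_i$ is the normalized occurrence frequency of the i-th amino acid and w is a weight factor.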
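A sketch of PC-PseAAC in the form of Chou's parallel-correlation pseudo amino acid composition: the three raw scales are standardized over the residue types supplied, the correlation factors θ_t are averaged over residue pairs at distance t, and these are normalized together with the residue frequencies. The caller provides the raw hydrophobicity, hydrophilicity and side-chain mass scales; the defaults λ = 2 and w = 0.05 and the function names are illustrative assumptions.

```python
import math

def standardize(raw):
    """Zero-mean, unit-variance transform over the residue types in `raw` (population SD)."""
    mu = sum(raw.values()) / len(raw)
    sd = math.sqrt(sum((x - mu) ** 2 for x in raw.values()) / len(raw))
    return {aa: (x - mu) / sd for aa, x in raw.items()}

def pc_pseaac(seq, h1_raw, h2_raw, mass_raw, lam=2, w=0.05):
    """Returns len(alphabet) + lam features; requires len(seq) > lam."""
    h1, h2, m = standardize(h1_raw), standardize(h2_raw), standardize(mass_raw)
    alphabet = sorted(h1_raw)
    L = len(seq)

    def theta(a, b):  # correlation function Theta(R_i, R_j)
        return ((h1[b] - h1[a]) ** 2 + (h2[b] - h2[a]) ** 2 + (m[b] - m[a]) ** 2) / 3

    thetas = [sum(theta(seq[i], seq[i + t]) for i in range(L - t)) / (L - t)
              for t in range(1, lam + 1)]
    freqs = [seq.count(aa) / L for aa in alphabet]        # normalized frequencies f_u
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * th / denom for th in thetas]
```

By construction the returned components sum to 1, so sequences of different lengths map to comparable vectors.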

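SC-PseAAC is a variant of PC-PseAAC. Considering that different types of proteins have different hydrophobicity and hydrophilicity profiles, Chou et al. define the sequence-order factors as:

$$\tau_{2t-1}=\frac{1}{L-t}\sum_{i=1}^{L-t}H_{i,i+t}^{1},\qquad \tau_{2t}=\frac{1}{L-t}\sum_{i=1}^{L-t}H_{i,i+t}^{2},\qquad t=1,2,\dots,\lambda$$

where $H_{i,j}^{1}$ and $H_{i,j}^{2}$ denote the hydrophobicity and hydrophilicity correlations between residues $R_i$ and $R_j$:

$$H_{i,j}^{1}=h^{1}(R_i)\,h^{1}(R_j),\qquad H_{i,j}^{2}=h^{2}(R_i)\,h^{2}(R_j)$$

By incorporating the 2λ amphiphilic correlation factors into the classical amino acid composition, an augmented discrete form is obtained:

$$x_u=\begin{cases}\dfrac{f_u}{\sum_{i=1}^{20}f_i+w\sum_{j=1}^{2\lambda}\tau_j}, & 1\le u\le 20\\[2ex]\dfrac{w\,\tau_{u-20}}{\sum_{i=1}^{20}f_i+w\sum_{j=1}^{2\lambda}\tau_j}, & 20<u\le 20+2\lambda\end{cases}$$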

S2.1.3: the features of each type are computed and fused by concatenation to describe the ligand-receptor pair, so that each ligand-receptor pair is represented as a 4576-dimensional vector.

Further, the dimensionality reduction in step S2.2 comprises: after feature extraction, each ligand-receptor pair is represented as a 4576-dimensional vector; principal component analysis (PCA) then reduces the dimensionality while retaining 99% of the information (variance), yielding a d-dimensional feature vector that describes each ligand-receptor pair.
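PCA that keeps the smallest number of components explaining at least 99% of the variance can be sketched with NumPy's SVD. The pair vector is assumed here to be the concatenation of the ligand and receptor descriptors, in that order; `pair_vector` and `pca_reduce` are hypothetical names.

```python
import numpy as np

def pair_vector(ligand_feats, receptor_feats):
    """Concatenate ligand and receptor descriptors into one pair vector (assumed ordering)."""
    return np.concatenate([ligand_feats, receptor_feats])

def pca_reduce(X, keep=0.99):
    """Project rows of X onto the fewest principal components explaining >= keep of the variance."""
    Xc = X - X.mean(axis=0)                          # center each feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = np.cumsum(S ** 2) / np.sum(S ** 2)       # cumulative explained-variance ratio
    d = min(int(np.searchsorted(ratio, keep)) + 1, len(S))
    return Xc @ Vt[:d].T, d
```

In a cross-validation setting the mean and components would be fitted on the training folds only and then applied to the test fold.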

Further, the ligand-receptor interaction filtering in step S3 comprises the following steps:

S3.1: downloading single-cell transcriptome data of the target tissue from the GEO database;

S3.2: updating the ligand-receptor interaction data by merging the predicted and the known ligand-receptor interactions;

S3.3: removing a ligand-receptor interaction when its ligand or receptor is not expressed in the cells of the single-cell transcriptome data of the corresponding tissue.
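Steps S3.2 and S3.3 can be sketched in plain Python. The sketch assumes that "expressed" means a nonzero count in at least one cell of the tissue's count matrix, given here as a cells x genes list of lists; the function names and the input formats are hypothetical.

```python
def expressed_gene_set(counts, genes):
    """Genes with a nonzero count in at least one cell; counts is cells x genes."""
    return {g for j, g in enumerate(genes) if any(row[j] > 0 for row in counts)}

def filter_lris(known, predicted, expressed_genes):
    """S3.2: merge known and predicted LRIs; S3.3: drop pairs with an unexpressed ligand or receptor."""
    merged = sorted(set(known) | set(predicted))     # union removes duplicates
    return [(lig, rec) for lig, rec in merged
            if lig in expressed_genes and rec in expressed_genes]
```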

Further, the joint scoring strategy in step S4 comprises the following steps:

S4.1: given p filtered LRIs $\{(l_1,r_1),(l_2,r_2),\dots,(l_p,r_p)\}$, let $l_i$/$r_j$ denote the expression vector of ligand i/receptor j in the single-cell transcriptome data covering m cell types $C_1,C_2,\dots,C_k,\dots,C_m$. First, compute the arithmetic mean expression $\bar{l}_i^{\,k_1}$ of ligand i in cell type $C_{k_1}$ and the arithmetic mean expression $\bar{r}_j^{\,k_2}$ of receptor j in cell type $C_{k_2}$.

The communication potential between $C_{k_1}$ and $C_{k_2}$ is then described by a joint scoring strategy that combines the expression threshold method and the expression product method.

First, the communication potential between $C_{k_1}$ and $C_{k_2}$ is described by the expression threshold method.

For two cell types $C_{k_1}$ and $C_{k_2}$, a potential signaling pathway mediated by a ligand-receptor interaction may exist when the expression values of both the ligand and the receptor of that interaction exceed a given threshold. The expression threshold method can therefore test whether $C_{k_1}$ and $C_{k_2}$ "talk" to each other. First, the arithmetic mean expression $\bar{l}_i$/$\bar{r}_j$ over all cells and the standard deviation $\sigma_i$/$\sigma_j$ are computed from the expression levels of ligand i/receptor j. Second, when $\bar{l}_i^{\,k_1}\ge\bar{l}_i+\sigma_i$, the ligand is considered highly expressed in $C_{k_1}$. Similarly, when $\bar{r}_j^{\,k_2}\ge\bar{r}_j+\sigma_j$, the receptor is considered highly expressed in $C_{k_2}$. When ligand i and receptor j of a ligand-receptor interaction are highly expressed in $C_{k_1}$ and $C_{k_2}$, respectively, they form a potential ligand-receptor interaction between $C_{k_1}$ and $C_{k_2}$, defined as:

$$\mathrm{LRI}^{1}_{k_1,k_2}(l_i,r_j)=\begin{cases}1, & \bar{l}_i^{\,k_1}\ge\bar{l}_i+\sigma_i\ \text{and}\ \bar{r}_j^{\,k_2}\ge\bar{r}_j+\sigma_j\\ 0, & \text{otherwise}\end{cases}$$

Each identified ligand-receptor interaction also indicates the direction of communication (sender → receiver). Finally, the communication score $g_1(k_1,k_2)$ between $C_{k_1}$ and $C_{k_2}$ is computed by summing $\mathrm{LRI}^{1}$ over all filtered LRIs:

$$g_1(k_1,k_2)=\sum_{(l_i,r_j)}\mathrm{LRI}^{1}_{k_1,k_2}(l_i,r_j)$$
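A NumPy sketch of the expression threshold score $g_1$. It assumes the high-expression threshold is the mean plus one (population) standard deviation over all cells, and that rows of the result index the sender type $C_{k_1}$ and columns the receiver type $C_{k_2}$. The function names, the gene-index mapping and the toy data are hypothetical.

```python
import numpy as np

def cell_type_means(expr, labels, types):
    """Rows: cell types; columns: genes; entries: arithmetic mean expression."""
    return np.vstack([expr[labels == t].mean(axis=0) for t in types])

def threshold_scores(expr, labels, types, lris, gene_index):
    """g1[k1, k2]: number of LRIs with the ligand high in C_k1 and the receptor high in C_k2."""
    means = cell_type_means(expr, labels, types)
    mu, sd = expr.mean(axis=0), expr.std(axis=0)     # over all cells (population SD)
    high = means >= mu + sd                          # assumed threshold: mean + 1 SD
    g1 = np.zeros((len(types), len(types)))
    for lig, rec in lris:
        i, j = gene_index[lig], gene_index[rec]
        g1 += np.outer(high[:, i], high[:, j]).astype(float)  # LRI^1 for every (k1, k2)
    return g1

# Hypothetical toy data: 4 cells x 2 genes, two cell types.
genes = {"LIG": 0, "REC": 1}
expr = np.array([[5.0, 0.0], [5.0, 0.0], [0.0, 5.0], [0.0, 5.0]])
labels = np.array(["A", "A", "B", "B"])
g1 = threshold_scores(expr, labels, ["A", "B"], [("LIG", "REC")], genes)
```

Here only the A → B entry is 1, because the ligand is high only in type A and the receptor only in type B.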

Then, the communication potential between $C_{k_1}$ and $C_{k_2}$ is described by the expression product method.

The expression product method scores cell communication intensity based on single-cell transcriptome data and the filtered ligand-receptor interactions. First, the expression product of a ligand-receptor pair $(l_i,r_j)$ between cell types $C_{k_1}$ and $C_{k_2}$ is defined as:

$$\mathrm{LRI}^{2}_{k_1,k_2}(l_i,r_j)=\bar{l}_i^{\,k_1}\times\bar{r}_j^{\,k_2}$$

The communication score $g_2(k_1,k_2)$ between $C_{k_1}$ and $C_{k_2}$ is then computed by summing $\mathrm{LRI}^{2}$ over all filtered LRIs:

$$g_2(k_1,k_2)=\sum_{(l_i,r_j)}\mathrm{LRI}^{2}_{k_1,k_2}(l_i,r_j)$$
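The expression product score $g_2$ can be sketched the same way: for each LRI, the outer product of the per-type mean ligand expression and the per-type mean receptor expression gives its contribution to every (sender, receiver) pair. Function names and toy data are hypothetical.

```python
import numpy as np

def product_scores(expr, labels, types, lris, gene_index):
    """g2[k1, k2] = sum over LRIs of mean ligand expr in C_k1 times mean receptor expr in C_k2."""
    means = np.vstack([expr[labels == t].mean(axis=0) for t in types])  # types x genes
    g2 = np.zeros((len(types), len(types)))
    for lig, rec in lris:
        g2 += np.outer(means[:, gene_index[lig]], means[:, gene_index[rec]])  # LRI^2 for all (k1, k2)
    return g2

# Hypothetical toy data: 4 cells x 2 genes, two cell types.
genes = {"LIG": 0, "REC": 1}
expr = np.array([[5.0, 0.0], [5.0, 0.0], [0.0, 5.0], [0.0, 5.0]])
labels = np.array(["A", "A", "B", "B"])
g2 = product_scores(expr, labels, ["A", "B"], [("LIG", "REC")], genes)
```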

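S4.2: the two scores are normalized by the min-max method:

$$\tilde{g}_1(k_1,k_2)=\frac{g_1(k_1,k_2)-\min g_1}{\max g_1-\min g_1}$$

where $\min g_1$ and $\max g_1$ denote the minimum and maximum of $g_1(k_3,k_4)$ over $k_3,k_4=1,2,\dots,m$;

$$\tilde{g}_2(k_1,k_2)=\frac{g_2(k_1,k_2)-\min g_2}{\max g_2-\min g_2}$$

where $\min g_2$ and $\max g_2$ denote the minimum and maximum of $g_2(k_3,k_4)$ over $k_3,k_4=1,2,\dots,m$.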

S4.3: finally, the joint communication score between $C_{k_1}$ and $C_{k_2}$ is obtained by adding the two normalized scores:

$$g(k_1,k_2)=\tilde{g}_1(k_1,k_2)+\tilde{g}_2(k_1,k_2)$$

which gives the cell communication score $g(k_1,k_2)$ of the joint scoring strategy.
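The min-max normalization and the joint score can be sketched in NumPy. Returning all zeros when a score matrix is constant (so the normalization would divide by zero) is an assumption the text does not address; the function names and toy matrices are hypothetical.

```python
import numpy as np

def min_max(g):
    """Min-max normalize over all (k3, k4) cell-type pairs; constant input maps to zeros (assumption)."""
    g = np.asarray(g, dtype=float)
    lo, hi = g.min(), g.max()
    return np.zeros_like(g) if hi == lo else (g - lo) / (hi - lo)

def joint_scores(g1, g2):
    """Joint score g = normalized g1 + normalized g2, so each entry lies in [0, 2]."""
    return min_max(g1) + min_max(g2)

g1 = np.array([[0, 2], [1, 0]])     # hypothetical threshold-method counts
g2 = np.array([[0, 50], [100, 0]])  # hypothetical expression-product sums
g = joint_scores(g1, g2)
```

Because both scores are rescaled to [0, 1] before they are added, neither the count-based $g_1$ nor the magnitude-based $g_2$ dominates the joint score.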

Further, in step S5, cell communication in human melanoma tissue and head and neck squamous cell carcinoma (HNSCC) tissue is analyzed and visualized: for each cell communication, the top 3 ligand-receptor interactions ranked by communication intensity are shown, and cell communication is visualized with heat maps and network views.

Further, the analysis and visualization of cell communication in human melanoma tissue showed that cancer-associated fibroblasts (CAFs), macrophages and endothelial cells interact with melanoma cancer cells.

Further, the analysis and visualization of cell communication in head and neck squamous cell carcinoma (HNSCC) tissue showed that endothelial cells, macrophages and fibroblasts interact with HNSCC cells.

Compared with the prior art, the invention has the following beneficial effects:

The heterogeneous deep ensemble model achieves the best performance in most cases, demonstrating strong ligand-receptor interaction classification ability and improving ligand-receptor interaction identification. The joint scoring strategy can capture heterogeneity in gene expression, which further benefits differential gene expression analysis, and it mitigates the false positives and false negatives caused by the expression threshold method. The invention also addresses a weakness of the expression product method: when the transcription levels of the ligand and the receptor of a ligand-receptor interaction differ markedly, one of them dominates the interaction signal. These improvements benefit the prediction of cell communication in the tumor microenvironment and can further assist anticancer drug design and targeted tumor therapy.

Drawings

FIG. 1: flow chart of the method for analyzing cell communication mediated by ligand-receptor interactions;

FIG. 2: process of identifying ligand-receptor interactions in step S2 of the method;

FIG. 3: an LSTM network;

FIG. 4: AUCs of the five methods on dataset 1;

FIG. 5: AUCs of the five methods on dataset 2;

FIG. 6: AUCs of the five methods on dataset 3;

FIG. 7: AUCs of the five methods on dataset 4;

FIG. 8: AUPRs of the five methods on dataset 1;

FIG. 9: AUPRs of the five methods on dataset 2;

FIG. 10: AUPRs of the five methods on dataset 3;

FIG. 11: AUPRs of the five methods on dataset 4;

FIG. 12: heat map of cell communication in human melanoma tissue based on the filtered ligand-receptor interactions and the expression threshold method;

FIG. 13: network view of cell communication in human melanoma tissue based on the filtered ligand-receptor interactions and the expression threshold method;

FIG. 14: top 3 ligand-receptor interactions of cell communication in human melanoma tissue based on the filtered ligand-receptor interactions and the expression product method;

FIG. 15: heat map of cell communication in human melanoma tissue based on the filtered ligand-receptor interactions and the expression product method;

FIG. 16: network view of cell communication in human melanoma tissue based on the filtered ligand-receptor interactions and the expression product method;

FIG. 17: top 3 ligand-receptor interactions of cell communication analysis in human melanoma tissue based on the filtered ligand-receptor interactions and the joint scoring strategy;

FIG. 18: heat map of cell communication analysis in human melanoma tissue based on the filtered ligand-receptor interactions and the joint scoring strategy;

FIG. 19: network view of cell communication analysis in human melanoma tissue based on the filtered ligand-receptor interactions and the joint scoring strategy;

FIG. 20: heat map of cell communication analysis in HNSCC tissue based on the filtered ligand-receptor interactions and the expression threshold method;

FIG. 21: network view of cell communication analysis in HNSCC tissue based on the filtered ligand-receptor interactions and the expression threshold method;

FIG. 22: top 3 ligand-receptor interactions of cell communication analysis in HNSCC tissue based on the filtered ligand-receptor interactions and the expression product method;

FIG. 23: heat map of cell communication analysis in HNSCC tissue based on the filtered ligand-receptor interactions and the expression product method;

FIG. 24: network view of cell communication analysis in HNSCC tissue based on the filtered ligand-receptor interactions and the expression product method;

FIG. 25: top 3 ligand-receptor interactions of cell communication analysis in HNSCC tissue based on the filtered ligand-receptor interactions and the joint scoring strategy;

FIG. 26: heat map of cell communication analysis in HNSCC tissue based on the filtered ligand-receptor interactions and the joint scoring strategy;

FIG. 27: network view of cell communication analysis in HNSCC tissue based on the filtered ligand-receptor interactions and the joint scoring strategy;

FIG. 28: comparison of the cell communication analysis results of LRI-HDEnHD-join with CellPhoneDB, LIANA and CellChat in human melanoma tissue;

FIG. 29: comparison of the cell communication analysis results of LRI-HDEnHD-join with CellPhoneDB, LIANA and CellChat in human HNSCC tissue.

Detailed Description

In order to clearly explain the technical features of the present invention, the following detailed description of the present invention is provided with reference to the accompanying drawings. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, however, the present application may be practiced in other ways than those described herein, and therefore the scope of the present application is not limited by the specific embodiments disclosed below. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the scope of the present invention.

Example 1

This embodiment provides a method for analyzing cell communication mediated by ligand-receptor interactions, comprising the following steps:

S1: ligand-receptor interaction data are collated, and four ligand-receptor interaction datasets are collected, as shown in Table 1.

Datasets 1 and 2 were obtained by Shao et al., who manually curated protein-protein interactions from the STRING database using text mining; these two datasets contain 3398 human and 2033 mouse ligand-receptor interactions, respectively. For both datasets, the present invention removes duplicate ligand-receptor interactions and those whose ligand or receptor lacks sequence information in the UniProt database. After preprocessing, dataset 1 contains 3390 ligand-receptor interactions between 812 ligands and 780 receptors in humans, and dataset 2 contains 2031 ligand-receptor interactions between 650 ligands and 588 receptors in mice.

Dataset 3 was constructed by Skelly et al. from the human ligand-receptor interaction dataset provided by Ramilowski et al. After preprocessing, dataset 3 contains 2006 ligand-receptor interactions between 574 human ligands and 559 receptors.

Dataset 4 was constructed by Ximerakis et al., who extracted 6638 ligand-receptor interactions from iRefIndex, Pathway Commons and BioGRID. After preprocessing similar to that of datasets 1 and 2, the present invention obtained 6585 ligand-receptor interactions between 1129 ligands and 1335 receptors in mice.

In addition, the two human datasets share 351 ligands, 335 receptors and 352 ligand-receptor interactions, while the two mouse datasets share 423 ligands, 402 receptors and 914 ligand-receptor interactions.

S2: ligand-receptor interaction prediction: potential ligand-receptor interactions are screened and identified by a heterogeneous deep ensemble model (LRI-HDEnHD) that combines a heterogeneous Newton Boosting model and a deep neural network. LRI-HDEnHD comprises the following four steps:

S2.1: feature extraction, comprising the following steps:

S2.1.1: ligand and receptor sequence information is obtained from the UniProt database.

S2.1.2: biological features of the proteins are extracted with Kmer, the auto-covariance and cross-covariance combination (ACC), the distance-based top-n-gram (DT), distance-pair-based pseudo amino acid composition (PseAAC-DP), parallel-correlation pseudo amino acid composition (PC-PseAAC) and series-correlation pseudo amino acid composition (SC-PseAAC).

S2.1.3: the features of each type are computed and concatenated to describe each ligand-receptor pair, so that one ligand-receptor pair is represented as a 4576-dimensional vector, as shown in Table 2.

S2.2: dimensionality reduction: after feature extraction, each ligand-receptor pair is represented as a 4576-dimensional vector. To reduce the computational cost, the invention reduces the dimensionality with principal component analysis while retaining 99% of the information. Finally, a d-dimensional feature vector is obtained to describe each ligand-receptor pair.

S2.3: ligand-receptor interaction classification: the heterogeneous Newton Boosting model (HNBM) and the deep neural network (DNN) compute the interaction probabilities $p_i^{\mathrm{HNBM}}$ and $p_i^{\mathrm{DNN}}$ of each ligand-receptor pair $x_i$, respectively.

S2.4: ensemble learning: the heterogeneous Newton Boosting model and the deep neural network are fused into the heterogeneous deep ensemble model, which computes the final ligand-receptor interaction probability

$$p_i=\alpha\,p_i^{\mathrm{HNBM}}+\left(1-\alpha\right)p_i^{\mathrm{DNN}}$$

to obtain the final ligand-receptor interactions, where α represents the relative importance of the heterogeneous Newton Boosting model and the deep neural network model for ligand-receptor interaction classification and is learned by cross-validation.

S3: ligand-receptor interaction filtering, comprising the following steps:

S3.1: downloading single-cell transcriptome data of the target tissue from the GEO database;

S3.2: updating the ligand-receptor interaction data by combining the predicted and the known ligand-receptor interactions;

S3.3: removing a ligand-receptor interaction when its ligand or receptor is not expressed in the cells of the single-cell transcriptome data of the corresponding tissue;

S4: predicting cell communication based on the filtered ligand-receptor interactions with the expression threshold method, the expression product method and the joint scoring strategy;

S5: visualizing cell communication.

The LRI-HDEnHD model was compared with the performance of four recent protein interaction prediction methods (PIPR, XGboost, DNNXGB, and OR-RCNN):

To evaluate the performance of the proposed LRI-HDEnHD model, it was compared with four recent protein interaction prediction methods (PIPR, XGBoost, DNNXGB, and OR-RCNN). The invention ran 5-fold cross validation 20 times. In each run, 80% of the ligand-receptor pairs were randomly selected as training data and the remaining 20% were used as test data. After dimensionality reduction, the dimension d of the ligand-receptor interaction feature vectors was 327, 295, 280, and 346 on the four ligand-receptor interaction datasets, respectively.

Parameter settings for each model


The invention uses AUC and AUPR to evaluate the performance of the ligand-receptor interaction prediction methods. Both metrics are widely used to evaluate classification models. AUC is the area under the ROC curve, which plots the true positive rate against the false positive rate across classification thresholds. AUPR is the area under the precision-recall curve. This curve plots precision against recall, where precision is the fraction of predicted ligand-receptor interactions that are true interactions. Higher AUC and AUPR indicate better performance. A grid search was performed to choose the value of α, and the LRI-HDEnHD model achieved the highest AUC and AUPR on the four ligand-receptor interaction datasets when α = 0.6.
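The two metrics can be computed directly from labels and predicted probabilities, as the following sketch shows. It is a minimal sketch, not the evaluation code used in the invention. AUC is computed as the probability that a randomly chosen true interaction is scored above a randomly chosen non-interaction, with ties counting one half. AUPR is estimated as average precision. Average precision is one common estimator; trapezoidal interpolation of the precision-recall curve is another, and the text does not say which was used.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) form of AUC; O(P * N), fine for illustration."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """AUPR estimate: mean of the precision at the rank of each true interaction."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])  # highest score first
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank          # precision at this recall level
    return total / hits if hits else 0.0
```

For labels `[1, 0, 1, 0]` and scores `[0.9, 0.8, 0.7, 0.1]`, AUC is 0.75 and average precision is (1 + 2/3) / 2 ≈ 0.833.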

Performance comparison of five ligand-receptor interaction prediction models

FIGS. 4-7 show that LRI-HDEnHD achieved the highest AUC on datasets 1 and 4. On datasets 2 and 3, its AUC was slightly lower than that of DNNXGB, but the differences were very small: 0.12% and 0.38%, respectively. More importantly, FIGS. 8-11 show that LRI-HDEnHD obtained the highest AUPR on all four datasets. Its values were 0.8537, 0.8112, 0.8052 and 0.8399, which are 5.29%, 3.55%, 3.60% and 2.66% higher than those of the second-best method, respectively. LRI-HDEnHD thus gave the best performance in most cases, demonstrating its strong ligand-receptor interaction classification capability. Therefore, the invention uses the predicted ligand-receptor interactions to improve cell communication analysis.

Performance comparison of Individual and Integrated models

The invention compares the performance of LRI-HDEnHD with that of HNBM and DNN alone for classifying ligand-receptor interactions. The results in Table 4 show that LRI-HDEnHD obtains the best AUC and AUPR on all four datasets, clearly outperforming HNBM and DNN.

Sensitivity analysis of parameter α

To investigate the effect of parameter α on ligand-receptor interaction classification, the invention evaluated the prediction accuracy with α set in the range [0.5, 0.8]. The results in Table 5 show that the LRI-HDEnHD model achieves better AUC and AUPR on most of the four ligand-receptor interaction datasets when α = 0.6. The invention therefore sets α to 0.6 to balance the contributions of HNBM and DNN to ligand-receptor interaction classification.
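The grid search over α can be sketched as follows. This is a simplified sketch: it scores each candidate α on a single held-out set of labels, whereas the text describes cross validation. It assumes the same weighted-average fusion α·p_HNBM + (1 − α)·p_DNN, which is not confirmed by the text. All names are hypothetical.

```python
def roc_auc(labels, scores):
    # Pairwise AUC: fraction of (positive, negative) pairs ranked correctly, ties = 0.5
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def grid_search_alpha(labels, p_hnbm, p_dnn, alphas=(0.5, 0.6, 0.7, 0.8)):
    """Return (alpha, auc) for the first alpha with the highest AUC of the fused scores."""
    best = None
    for a in alphas:
        fused = [a * x + (1 - a) * y for x, y in zip(p_hnbm, p_dnn)]  # assumed fusion rule
        auc = roc_auc(labels, fused)
        if best is None or auc > best[1]:
            best = (a, auc)
    return best
```

In the toy case below, HNBM ranks the pairs perfectly and the DNN ranks them in reverse. α = 0.5 produces all-tied scores (AUC 0.5), so the search returns α = 0.6 as the smallest value that ranks every pair correctly.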


Performance comparison conclusion: LRI-HDEnHD achieves high ligand-receptor interaction classification accuracy owing to the following features:
(i) diverse biological features of the ligands and receptors are extracted;
(ii) HNBM has strong generalization capability and good global convergence without compromising classification performance;
(iii) the DNN can process datasets of different scales and improves classification accuracy, and its LSTM network can adjust the weights of the self-recurrent model by dynamically adding gates;
(iv) the heterogeneous deep integration model accounts for the diversity of a complex heterogeneous fusion structure and shows good flexibility and strong generalization performance.

Verification of ligand-receptor interaction data

The LRI-HDEnHD model enables the screening of new ligand-receptor interactions to improve cell communication analysis. The invention validates the predicted ligand-receptor interactions in three ways: with three representative cell communication analysis tools, with existing PPI databases and the PubMed database, and with molecular docking.

Validation of predicted ligand-receptor interaction data by three representative cellular communication analysis tools

The ligand-receptor interactions predicted by LRI-HDEnHD were first compared with those in CellTalkDB, CytoTalk and CellPhoneDB to identify consensus interactions. CellTalkDB is a manually curated comprehensive database comprising 3398 human ligand-receptor interactions and 2033 mouse ligand-receptor interactions. CytoTalk constructs cell-type-specific signaling networks from single-cell transcriptome data and can effectively identify potential signal transduction pathways. CellPhoneDB is a recent ligand-receptor interaction database that takes into account the subunit structure of ligands and receptors and can predict cell communication from single-cell transcriptome data. These three tools can therefore be used to screen the predicted ligand-receptor interactions. Table 6 lists, for the four datasets, the numbers of ligand-receptor interactions predicted by LRI-HDEnHD that coincide with CellTalkDB, CytoTalk and CellPhoneDB.

On dataset 1, LRI-HDEnHD and CellTalkDB predicted 202 coincident ligand-receptor interactions, and LRI-HDEnHD and CytoTalk predicted 13. After removing the duplicated interaction (DEFB4A-CCR6), LRI-HDEnHD and the three cell communication analysis tools shared 214 coincident ligand-receptor interactions. Similarly, the other three datasets had 384, 322 and 9 coincident ligand-receptor interactions.
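The count of 214 can be checked with inclusion-exclusion under a few assumptions. Let A be the set of interactions shared by LRI-HDEnHD and CellTalkDB, and B the set shared by LRI-HDEnHD and CytoTalk. The check assumes that DEFB4A-CCR6 is the only interaction in both sets. It also assumes that CellPhoneDB contributes no further coincident interactions on dataset 1, since the text does not state its count:

```latex
|A \cup B| = |A| + |B| - |A \cap B| = 202 + 13 - 1 = 214
```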

Ligand-receptor interaction validation with five existing PPI databases and the PubMed database

The invention validates the predicted ligand-receptor interactions by manually searching five PPI databases (BioGRID, STRING, PICKLE, IntAct, and Pathway Commons) and the PubMed database. The results show that 313, 57, 30 and 827 of the ligand-receptor interactions identified by LRI-HDEnHD on the four datasets, respectively, are supported by one or more of these six databases.

Ligand-receptor interaction validation by molecular docking

Thirty predicted ligand-receptor interactions were randomly selected from each dataset for validation by molecular docking. Docking of each ligand-receptor pair was performed with the online tool ZDOCK, and the resulting complexes were analyzed with PDBePISA using default parameters. For each dataset, Table 7 lists the ligand-receptor interaction with the highest binding energy among the 30 randomly selected interactions. For that interaction it gives the ligand, receptor, binding energy (BE, kcal/mol), number of hydrogen bonds (HB), and interface area (IA).

Example 2

This embodiment discloses an application of the method for analyzing ligand-receptor interaction-mediated cell communication to cell communication analysis in human melanoma tissue.

Studies have shown that all cells can communicate with each other. One way to assess the strength of communication between two cell types is to count the ligand-receptor interactions linking them while taking into account the expression levels of the ligands and receptors involved. The invention uses data on seven cell types from melanoma-derived single-cell suspensions provided by Zhou et al. to analyze cell communication in melanoma. The seven cell types are melanoma cancer cells, cancer-associated fibroblasts (CAF), macrophages, endothelial cells, T cells, B cells and natural killer (NK) cells. The analysis comprises the following steps:

Steps S1 and S2 are performed as in Example 1.

S3: ligand-receptor interaction filtering: ligand-receptor interactions are filtered by combining single-cell transcriptome data, known ligand-receptor interactions and predicted ligand-receptor interactions, comprising the steps of:

S3.1: single-cell transcriptome data associated with melanoma were downloaded from the GEO database.

S3.2: the ligand-receptor interactions were filtered by integrating known ligand-receptor interactions, predicted ligand-receptor interactions and single cell transcriptome data in melanoma tissues. In data set 1, 67595 ligand-receptor interactions were identified in addition to 3390 known ligand-receptor interactions.

S3.3: when a ligand or receptor in a certain ligand-receptor interaction is not expressed in cells of the melanoma tissue single cell transcriptome data, the ligand-receptor interaction is removed. Thus, 24881 ligand-receptor interactions associated with melanoma tissue were obtained in dataset 1.

S4: cell communication prediction: based on the filtered ligand-receptor interactions, communication scores among the seven cell types in melanoma tissue are calculated with three methods. These are the expression threshold method, the expression product method, and the joint scoring strategy that combines expression thresholds and expression products.

S4.1 Cell communication analysis based on filtered ligand-receptor interactions and expression thresholding

Based on the known and screened ligand-receptor interactions, expression thresholding was used to calculate cell communication between melanoma cancer cells and the other six cell types (CAF, macrophages, endothelial cells, T cells, B cells, and NK cells). Table 8 shows three results:
- the communication specificity ranking between melanoma cancer cells and the six other cell types;
- the number of ligand-receptor interactions mediating communication from melanoma cancer cells to each of the six other cell types;
- the number of ligand-receptor interactions mediating communication from each of the other cell types back to melanoma cancer cells.

Conclusion: CAF communicates most strongly with human melanoma cancer cells, followed by endothelial cells and macrophages. Notably, among the six cell types, T cells communicate with melanoma cancer cells at relatively low intensity.

S5.1 Cell communication visualization: cell communication is visualized with a heat map and a network view.

In the heat map shown in FIG. 12, rows and columns represent the cell types expressing the ligand and the receptor, respectively. The communication score between two cell types is encoded by color, with darker colors indicating stronger communication.

In the network view shown in FIG. 13, each edge represents communication between two cell types. It starts at the cell type expressing the ligand and ends at the cell type expressing the receptor. The thickness of each edge is proportional to the cell communication score.

Conclusion: Ren et al. found that human melanoma cancer cells communicate with T cells at low intensity. Using the known and screened ligand-receptor interactions with the expression threshold method, the invention obtains the same result as Ren et al. It also constructs a cell-cell communication network in melanoma tissue using heat maps and network views.

S4.2 Cell communication analysis based on filtered ligand-receptor interactions and the expression product method

The expression product method was used to measure the intensity of cell communication in human melanoma tissue based on the filtered ligand-receptor interactions. Table 9 shows two results:
- the top three ligand-receptor interactions mediating communication from cancer cells to each of the other six cell types in melanoma tissue;
- the top three interactions mediating communication from each of the other six cell types to cancer cells.

S5.2 Cell communication visualization: FIG. 14 shows the top three ligand-receptor interactions for each cell communication in human melanoma tissue. Darker colors indicate ligand-receptor interactions that are more likely to mediate the corresponding communication. The heat map in FIG. 15 and the network view in FIG. 16 visualize the cell communication.

Conclusion: CAF communicates most strongly with human melanoma cancer cells, followed by macrophages and endothelial cells. In addition, melanoma cancer cells communicate with CAF through B2M-RPSA, HLA-A-RPSA and HLA-C-RPSA, whereas CAF communicates with melanoma cancer cells through B2M-RPSA, HLA-C-RPSA and HLA-A-RPSA.

S4.3 Cell communication analysis based on filtered ligand-receptor interactions and the joint scoring strategy

Cell communication in human melanoma tissue was studied with the joint scoring strategy. Table 9 shows that the joint scoring strategy yields the same cell communication ranking as the expression product method. Tables 8 and 9 together indicate that CAF, macrophages and endothelial cells may communicate with melanoma cancer cells.

S5.3 Cell communication visualization: FIG. 17 shows the top three ligand-receptor interactions mediating each cell communication in human melanoma tissue. The heat map in FIG. 18 and the network view in FIG. 19 evaluate cell communication between melanoma cancer cells and the other six cell types. Several studies support these predictions. CAF is closely associated with tumor growth, metastasis and therapeutic resistance. Melanoma cells communicate with tumor-associated macrophages by releasing soluble factors that enhance or prevent tumor growth. Metastatic B16 melanoma cells produce factors cytotoxic to macrophages, which allow the tumor cells to escape host immune surveillance and prevent metastasis. Notch signaling can mediate communication between melanoma cells and endothelial cells, and superficial spreading melanoma cells can transform into endothelial cells at metastatic sites.

Comparison of LRI-HDEnHD-join with CellPhoneDB, LIANA and CellChat for cell communication analysis in melanoma tissue

To further evaluate the proposed cell communication prediction framework (LRI-HDEnHD-join), the invention compares it with three cell communication analysis tools: CellPhoneDB, LIANA and CellChat.
- CellPhoneDB is a recent ligand-receptor interaction database that can be used to predict communication between different cell types from single-cell transcriptome data. It has been used to evaluate communication among 10000 cells of 19 cell types.
- LIANA is an open-source ligand-receptor analysis framework. It systematically compared 16 cell communication resources and 7 methods. It also evaluated the consistency of cell communication predictions using spatial co-localization, cytokine activity and receptor protein abundance.
- CellChat quantitatively infers and analyzes cell communication networks from single-cell RNA sequencing data based on network analysis and pattern recognition.

CellPhoneDB, LIANA and CellChat show good cell communication prediction performance and are among the most advanced cell communication analysis tools. Table 10 shows the cell communication results in human melanoma tissue obtained with these four tools.

Table 10 shows that LRI-HDEnHD-join predicts that CAF is more likely to communicate with melanoma cancer cells, consistent with the results of LIANA and CellChat. LRI-HDEnHD-join also predicts that macrophages are likely to communicate with melanoma cancer cells, whereas CellPhoneDB predicts that macrophages are the most likely to do so. FIG. 28 shows the cell communication heat maps predicted by LRI-HDEnHD-join, CellPhoneDB, LIANA and CellChat in human melanoma tissue. According to FIG. 28, all four tools predict that, among the seven cell types, CAF is more likely to communicate with melanoma cancer cells.

Comparison of LRI-HDEnHD and NicheNet

The coincident ligand-receptor interactions predicted by LRI-HDEnHD and NicheNet in melanoma tissue were further compared. NicheNet predicts the ligand-receptor interactions that mediate cell communication by combining expression data with prior knowledge of gene regulatory and signaling networks. Unlike most current cell communication prediction methods, NicheNet studies cell communication from single-cell expression data while taking into account the gene regulatory information of ligands, integrating intracellular signaling and transcriptional regulation. NicheNet is mainly used for ligand-receptor interaction identification and is a very useful prediction tool. The invention therefore analyzes the ligand-receptor interactions in human melanoma tissue that meet two conditions: LRI-HDEnHD predicted them from pairs not observed in the known data, and NicheNet also identified them. In melanoma tissue, LRI-HDEnHD yielded 23768 ligand-receptor interactions from ligand-receptor pairs not observed in dataset 1, and NicheNet identified 10555. In total, 1981 ligand-receptor interactions were predicted by both LRI-HDEnHD and NicheNet in human melanoma tissue.

Conclusion:

In this study, expression thresholding was used to analyze the specificity of cell communication between cell types in human melanoma tissue. This was done by setting a threshold on the expression levels of the ligand and receptor of each ligand-receptor pair. When the expression levels of both genes exceed the threshold, the ligand-receptor pair is considered "active" and assigned 1; otherwise, it is considered "inactive" and assigned 0. In this way, every ligand-receptor pair is classified as active or inactive according to its thresholds.
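The expression threshold method can be sketched as follows. This is a minimal sketch under stated assumptions. The exact threshold is not preserved here. Claim 7 defines, for each ligand and receptor, its mean expression over all cells and a corresponding deviation. The sketch therefore assumes a gene is "on" in a cell type when its mean expression in that type exceeds its all-cell mean plus one standard deviation. The data layout, the function name and the gene names in the usage note are hypothetical.

```python
from statistics import mean, pstdev


def threshold_score(expr, labels, lris, sender, receiver):
    """Count the LRIs that are 'active' (1) from `sender` to `receiver`.

    expr:   hypothetical layout, gene -> list of per-cell expression values
    labels: cell-type label of each cell, aligned with the per-cell values
    Assumed rule: active when the ligand's mean in `sender` and the receptor's mean
    in `receiver` both exceed their all-cell mean plus one standard deviation."""
    def type_mean(gene, cell_type):
        return mean(v for v, t in zip(expr[gene], labels) if t == cell_type)

    score = 0
    for lig, rec in lris:
        lig_on = type_mean(lig, sender) > mean(expr[lig]) + pstdev(expr[lig])
        rec_on = type_mean(rec, receiver) > mean(expr[rec]) + pstdev(expr[rec])
        score += int(lig_on and rec_on)   # binary 1/0 contribution per pair
    return score
```

With a ligand expressed only in CAF and its receptor expressed only in T cells, the pair counts as active for CAF → T but not for T → CAF.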

However, the expression threshold method assumes that ligand-receptor interactions require high gene expression, and it requires choosing a gene expression threshold, which may lead to false positives and false negatives. More importantly, many proteins have different biological activities depending on their concentration, and the correlation between mRNA and protein levels varies from gene to gene.

The expression product method computes continuous scores from the product of ligand and receptor expression, which allows differences in a given ligand-receptor pair between cell types to be identified. However, the expression product method can be misleading when the transcription levels of the ligand and receptor in an interaction differ markedly, because one of them then dominates the interaction signal.
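The expression product score, and the weakness just described, can be sketched as follows. The sketch assumes the score is the sum, over the filtered pairs, of the ligand's mean expression in the sender cell type times the receptor's mean expression in the receiver cell type. The data layout, the function name and the toy gene names are hypothetical.

```python
from statistics import mean


def product_score(expr, labels, lris, sender, receiver):
    """Sum over LRIs of mean ligand expression in `sender` times mean receptor
    expression in `receiver` (assumed form of the expression product method).
    expr: hypothetical layout, gene -> per-cell values; labels: per-cell cell types."""
    def type_mean(gene, cell_type):
        return mean(v for v, t in zip(expr[gene], labels) if t == cell_type)

    return sum(type_mean(lig, sender) * type_mean(rec, receiver) for lig, rec in lris)
```

In the toy data used to check this block, one pair has a very high ligand (100) and a barely expressed receptor (0.01). Another pair has both genes at 1. Both pairs receive the same product score of 1.0, which illustrates how one side can dominate the signal.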

Therefore, the invention designs a joint scoring strategy for analyzing ligand-receptor interaction-mediated cell communication, based on the heterogeneous deep integration model and single-cell transcriptome data. The strategy aims to capture the heterogeneity of gene expression across cells and thereby support differential expression analysis.

Example 3

This embodiment discloses an application of the method for analyzing ligand-receptor interaction-mediated cell communication to cell communication analysis in human HNSCC tissue.

Human HNSCC, derived from mucosal epithelial cells, is one of the most aggressive and recurrent cancers. Puram et al. systematically studied the ligand-receptor expression patterns between HNSCC cancer cells and eight other cell types in HNSCC-derived single-cell suspensions. The eight cell types are fibroblasts, B cells, myocytes, macrophages, endothelial cells, T cells, dendritic cells and mast cells. Using these data, the invention measures the intensity of cell communication in human HNSCC tissue, comprising the following steps:

Steps S1 and S2 are performed as in Example 1.

S3.1: single-cell transcriptome data associated with HNSCC were downloaded from the GEO database (GSE103322).

S3.2: ligand-receptor interactions were filtered by integrating known ligand-receptor interactions, predicted ligand-receptor interactions and single-cell transcriptome data of HNSCC tissue. This yielded 68400 ligand-receptor interactions associated with HNSCC tissue.

S4: cell communication prediction: based on the filtered ligand-receptor interactions, communication scores among the nine cell types in HNSCC tissue are calculated with three methods. These are the expression threshold method, the expression product method, and the joint scoring strategy that combines expression thresholds and expression products.

The intensity of cellular communication between HNSCC cancer cells and the other eight cell types was calculated using expression thresholding. Table 11 shows the communication between HNSCC cancer cells and 8 other cell types.

Conclusion: dendritic cells communicate most strongly with HNSCC cancer cells, followed by endothelial cells and mast cells.

S5.1 Cell communication visualization: the heat map in FIG. 20 and the network view in FIG. 21 show the strength of communication between HNSCC cancer cells and the other eight cell types.

S4.2 Cell communication analysis based on filtered ligand-receptor interactions and the expression product method

The present invention further uses the expression product method to identify cellular communication specificity in HNSCC tissue with filtered ligand-receptor interactions, and Table 12 gives the information on the communication between HNSCC cancer cells and other 8 cell types.

Conclusion: macrophages show the highest communication specificity with HNSCC cancer cells, followed by endothelial cells and fibroblasts. Furthermore, HNSCC cancer cells communicate with macrophages via B2M-CD74, MIF-CD74 and CALM2-CD74, while macrophages communicate with HNSCC cancer cells via B2M-CD9, B2M-IFITM1 and B2M-RPSA.

S5.2 Cell communication visualization: FIG. 22 shows the top three ligand-receptor interactions mediating each cell communication. The heat map in FIG. 23 and the network view in FIG. 24 characterize the communication between HNSCC cancer cells and the other eight cell types.

The cellular communications between HNSCC cancer cells and the other eight cell types were scored using a joint scoring strategy, and table 13 shows the ranking of the specificity of the communications between HNSCC cancer cells and the other 8 cell types, the first 3 pairs of ligand-receptor interactions that mediate communications from HNSCC cancer cells to the other eight cell types, and the first 3 pairs of ligand-receptor interactions that mediate communications from the other cell types to HNSCC cancer cells.

S5.3 Cell communication visualization: FIG. 25 shows the top three ligand-receptor interactions for each pair of communicating cell types. The heat map in FIG. 26 and the network view in FIG. 27 characterize cell communication between HNSCC cancer cells and the other eight cell types.

Conclusion: unlike the expression product method, the joint scoring strategy predicts that endothelial cells have the strongest communication with human HNSCC cancer cells, followed by macrophages and fibroblasts. In addition, based on the expression threshold method, the proposed framework predicts that dendritic cells are likely to communicate with HNSCC cancer cells. Hypoxia is one of the most prominent features of HNSCC. It promotes tumor progression and is closely associated with tumor recurrence, chemotherapy resistance and low survival rates. Up-regulated secretion of pro-inflammatory cytokines by dendritic cells can improve the plasticity of immune cells.

Comparison of LRI-HDEnHD-join with CellPhoneDB, LIANA and CellChat for cell communication analysis in HNSCC tissue

To further evaluate the proposed LRI-HDEnHD-join framework, the invention compared it with three cell communication analysis tools, CellPhoneDB, LIANA and CellChat, in HNSCC tissue. Table 14 shows the cell communication results obtained by the four tools on human HNSCC tissue.


The results in Table 14 show that LRI-HDEnHD-join predicts that endothelial cells have the strongest communication specificity with HNSCC cancer cells, followed by macrophages and fibroblasts. CellPhoneDB, LIANA and CellChat also indicate strong communication between endothelial cells and HNSCC cancer cells. FIG. 29 shows the cell communication heat maps identified by LRI-HDEnHD-join, CellPhoneDB, LIANA, and CellChat in human HNSCC tissue.

FIG. 29 shows that, among the eight cell types that may communicate with HNSCC cancer cells, LRI-HDEnHD-join predicts the strongest communication between endothelial cells and HNSCC cancer cells. CellPhoneDB, LIANA and CellChat reach the same conclusion.

Comparison of LRI-HDEnHD and NicheNet

The invention further compares the ligand-receptor interaction predictions of LRI-HDEnHD and NicheNet in human HNSCC tissue. From the ligand-receptor pairs not observed in dataset 1, LRI-HDEnHD-join obtained 65145 ligand-receptor interactions in human HNSCC tissue, and NicheNet obtained 29060. In total, 4969 ligand-receptor interactions were predicted by both LRI-HDEnHD and NicheNet in HNSCC.

Computation time and memory analysis

Finally, the invention compares the computation time and memory usage of LRI-HDEnHD with those of four PPI prediction models: XGBoost, DNNXGB, OR-RCNN and PIPR. The experimental server is configured with an AMD EPYC 7302 CPU, a GeForce RTX 2080 Ti GPU, 256 GB of memory and the Ubuntu 20.04.4 LTS operating system. Table 15 shows the computation time (min) and memory (MB) required by each of the five ligand-receptor interaction prediction models for one 5-fold cross-validation experiment on each of the four ligand-receptor interaction datasets.

The results show that, on the four ligand-receptor interaction datasets, DNNXGB requires the least runtime and XGBoost the least memory. Although LRI-HDEnHD requires more runtime and memory than XGBoost and DNNXGB, it costs much less than OR-RCNN and PIPR.
- Runtime: LRI-HDEnHD runs for 134.94 min, 91.66 min, 86.17 min and 258.89 min on the four datasets. OR-RCNN requires 250.40%, 258.12%, 306.97% and 210.55% more runtime, and PIPR requires 52.62%, 43.44%, 52.77% and 16.42% more, respectively.
- Memory: LRI-HDEnHD requires 2880.02 MB, 2225.61 MB, 2303.09 MB and 3957.66 MB on the four datasets. OR-RCNN requires 690.66%, 981.24%, 946.12% and 456.55% more memory, and PIPR requires 992.99%, 1938.04%, 1922.03% and 647.40% more, respectively.

Given the rapid development of computer hardware, the computation time of LRI-HDEnHD matters less than its memory footprint and its prediction performance. Therefore, LRI-HDEnHD is more suitable for screening new ligand-receptor interactions than the other four state-of-the-art ligand-receptor interaction prediction models.


It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein, and any reference signs in the claims are not intended to be construed as limiting the claim concerned.

Claims

1. A method for analyzing a cell communication mediated by a ligand-receptor interaction, comprising the steps of:

S1: ligand-receptor interaction data collation: collecting a plurality of ligand-receptor interaction datasets;

S2: ligand-receptor interaction prediction: identifying potential ligand-receptor interactions with a heterogeneous deep integration model combining a heterogeneous Newton Boosting model and a deep neural network;

S3: ligand-receptor interaction filtering: filtering the known and predicted ligand-receptor interactions based on single-cell transcriptome data;

S4: predicting cell communication based on the filtered ligand-receptor interactions and a joint scoring strategy;

S5: visualizing cell communication.

2. The method for analyzing ligand-receptor interaction mediated cell communication according to claim 1, wherein the heterogeneous deep integration model screening of step S2 for identifying potential ligand-receptor interactions comprises the following steps:

S2.1: extracting features;

S2.2: reducing dimensionality;

S2.3: classifying ligand-receptor interactions with a heterogeneous Newton Boosting model and a deep neural network, respectively.
- In the heterogeneous Newton Boosting model, $x_i$ denotes the $i$th sample and $f^{M}(x_i)$ denotes the classification probability computed by the model for $x_i$ after $M$ iterations. The model uses the additive update $f^{m}(x_i) = f^{m-1}(x_i) + \epsilon\, b_m(x_i)$, with $\epsilon > 0$, where $b_m$ is the base learner added at iteration $m$.
- In the deep neural network, $a$ denotes parameter 1 and $b$ denotes the d-dimensional feature vector.

Each model calculates the interaction probability of every ligand-receptor pair;
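The additive update $f^{m} = f^{m-1} + \epsilon\, b_m$ can be illustrated with a generic Newton boosting sketch for binary classification. This is a minimal NumPy sketch, not the patented heterogeneous model. The "heterogeneous" model presumably combines different kinds of base learners, whereas this sketch uses only depth-1 regression stumps. Following standard Newton boosting, the update is applied to the raw log-odds score, and a sigmoid maps $f^{M}$ to a probability. The step size, regularization `lam` and all function names are assumptions.

```python
import numpy as np


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def _fit_stump(X, g, h, lam=1.0):
    """Fit a depth-1 regression tree using second-order (Newton) statistics.
    Leaf value = -sum(g) / (sum(h) + lam); split chosen by the largest structure score."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:        # skip the max value: it would leave the right leaf empty
            left = X[:, j] <= t
            gl, hl = g[left].sum(), h[left].sum()
            gr, hr = g[~left].sum(), h[~left].sum()
            gain = gl**2 / (hl + lam) + gr**2 / (hr + lam)
            if best is None or gain > best[0]:
                best = (gain, j, t, -gl / (hl + lam), -gr / (hr + lam))
    if best is None:                             # no valid split: contribute nothing
        return lambda Z: np.zeros(len(Z))
    _, j, t, wl, wr = best
    return lambda Z: np.where(Z[:, j] <= t, wl, wr)


def newton_boost(X, y, M=50, eps=0.3):
    """Return a predictor of P(interaction) = sigmoid(f^M), with f^m = f^{m-1} + eps * b_m."""
    F = np.zeros(len(y))
    learners = []
    for _ in range(M):
        p = _sigmoid(F)
        g, h = p - y, p * (1.0 - p)              # gradient and Hessian of the logistic loss
        b_m = _fit_stump(X, g, h)
        F = F + eps * b_m(X)                     # additive Newton update on the raw score
        learners.append(b_m)
    return lambda Z: _sigmoid(eps * sum(learner(Z) for learner in learners))
```

On a toy one-feature dataset where the two largest values are positives, the fitted predictor returns probabilities below 0.5 for the negatives and above 0.5 for the positives.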

S2.4: integrating the classification results: the heterogeneous Newton Boosting model and the deep neural network are fused into a heterogeneous deep integration model, which calculates the final probability of each ligand-receptor interaction. Here α represents the relative importance of the heterogeneous Newton Boosting model and the deep neural network model for ligand-receptor interaction classification, and it is learned through cross validation.

3. The method for analyzing ligand-receptor interaction mediated cell communication according to claim 2, wherein α in step S2.4 is set in the range of [0.5,0.8], preferably 0.6.

4. The method for analyzing ligand-receptor interaction mediated cell communication according to claim 2, wherein the feature extraction in step S2.1 comprises the following steps:

S2.1.1: obtaining the sequence information of ligands and receptors from the UniProt database;

S2.1.2: extracting biological features of the proteins using the following methods:
- k-mer;
- auto-covariance and cross-covariance;
- distance-based Top-n-gram;
- distance-pair-based pseudo-amino acid composition;
- parallel-correlation pseudo-amino acid composition;
- series-correlation pseudo-amino acid composition;

S2.1.3: fusing these features to describe each ligand or receptor, so that a ligand-receptor pair can be represented as a 4576-dimensional vector by concatenation.
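The k-mer feature of S2.1.2 and the concatenation of S2.1.3 can be sketched as follows. This is a hypothetical sketch of those two steps only. The text does not state which k values were used or how the 4576 dimensions split among the feature types, so k = 2 (400 features) is just an example. Both function names are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def kmer_composition(seq, k=2):
    """Normalized frequency of each of the 20**k k-mers in a protein sequence."""
    index = {"".join(p): i for i, p in enumerate(product(AMINO_ACIDS, repeat=k))}
    counts = [0] * len(index)
    total = 0
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if kmer in index:              # skip k-mers containing non-standard residues (e.g. X, U)
            counts[index[kmer]] += 1
            total += 1
    return [c / total for c in counts] if total else counts


def pair_vector(ligand_features, receptor_features):
    # Tandem (concatenation) of the ligand and receptor descriptors into one pair vector
    return ligand_features + receptor_features
```

For the sequence "ACAC", the 2-mer AC occurs twice and CA once. The frequencies are therefore 2/3 and 1/3, and the remaining 398 entries are zero.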

5. The method of claim 2, wherein the step S2.2 dimensionality reduction comprises the steps of: after feature extraction, a ligand-receptor pair is described as a sample, dimension reduction is performed on the features of the sample while 99% of information is retained based on principal component analysis, and finally, a d-dimensional feature vector is obtained to describe each ligand-receptor pair.

6. The method for analyzing ligand-receptor interaction mediated cell communication according to claim 1, wherein said step S3 of ligand-receptor interaction filtering comprises the steps of:

S3.2: merging the predicted ligand-receptor interactions with the known ligand-receptor interactions to obtain an updated ligand-receptor interaction data set;

S3.3: if the ligand or the receptor of a given ligand-receptor interaction is not expressed in the cells of the single-cell transcriptome data of the target tissue, the interaction is considered not to mediate cell communication in that tissue and is removed.
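Steps S3.2 and S3.3 can be sketched as a set union followed by an expression filter. This assumes "expressed" means a positive value in at least one cell, and that genes absent from the single-cell data count as not expressed; the function name and data layout are hypothetical.

```python
from typing import Iterable, Mapping, Sequence

Pair = tuple[str, str]  # (ligand gene, receptor gene)


def update_and_filter(known: Iterable[Pair], predicted: Iterable[Pair],
                      expression: Mapping[str, Sequence[float]]) -> list[Pair]:
    """S3.2: merge known and predicted LRIs; S3.3: drop pairs whose ligand or
    receptor is not expressed in the tissue's single-cell data."""
    merged = set(known) | set(predicted)

    def expressed(gene: str) -> bool:
        # Assumption: "expressed" means a positive value in at least one cell;
        # genes missing from the data set count as not expressed.
        return any(v > 0 for v in expression.get(gene, ()))

    return sorted((lig, rec) for lig, rec in merged if expressed(lig) and expressed(rec))
```

Using a set for the merge also removes duplicates when a predicted interaction is already in the known data.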

7. The method of claim 1, wherein the joint scoring in step S4 comprises the following steps:

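S4.1: calculating the cell communication score $g_1(k_1,k_2)$ of the expression threshold method and the cell communication score $g_2(k_1,k_2)$ of the expression product method, respectively, which are expressed as

$$g_1(k_1,k_2)=\sum_{(i,j)}\mathbb{1}\left[\bar{E}_i^{k_1}>\bar{E}_i+\sigma_i\right]\cdot\mathbb{1}\left[\bar{E}_j^{k_2}>\bar{E}_j+\sigma_j\right]$$

and

$$g_2(k_1,k_2)=\sum_{(i,j)}\bar{E}_i^{k_1}\,\bar{E}_j^{k_2}$$

wherein both summations run over the potential ligand-receptor interactions $(i,j)$ between cell types $k_1$ and $k_2$; $k_1$ and $k_2$ represent the two cell types whose communication is mediated by the ligand-receptor interaction; $i$ represents the ligand and $j$ the receptor; $\bar{E}_i$ and $\bar{E}_j$ represent the mean expression values of ligand $i$ and receptor $j$ over all cells; $\bar{E}_i^{k_1}$ and $\bar{E}_j^{k_2}$ represent the average expression values of ligand $i$ in cell type $k_1$ and of receptor $j$ in cell type $k_2$; $\sigma_i$ and $\sigma_j$ represent the corresponding standard deviations; and $\mathbb{1}[\cdot]$ is the indicator function.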
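A minimal NumPy sketch of the two scores in step S4.1, assuming the threshold is the mean over all cells plus one population standard deviation (NumPy's default `ddof=0`) and that every cell carries exactly one cell-type label. The function name and input layout are hypothetical.

```python
import numpy as np


def communication_scores(expr, cell_types, lr_pairs):
    """expr: dict gene -> 1-D array of expression values across all cells.
    cell_types: 1-D array of cell-type labels, one per cell.
    lr_pairs: (ligand, receptor) pairs retained after filtering.
    Returns (types, g1, g2); g1[a, b] and g2[a, b] score sender a -> receiver b."""
    cell_types = np.asarray(cell_types)
    types = sorted(set(cell_types.tolist()))
    m = len(types)
    g1 = np.zeros((m, m))
    g2 = np.zeros((m, m))
    for lig, rec in lr_pairs:
        e_i = np.asarray(expr[lig], dtype=float)
        e_j = np.asarray(expr[rec], dtype=float)
        # Average expression of the ligand and the receptor within each cell type
        mean_i = np.array([e_i[cell_types == t].mean() for t in types])
        mean_j = np.array([e_j[cell_types == t].mean() for t in types])
        # Expression threshold method: cell-type mean must exceed overall mean + SD
        hi_i = (mean_i > e_i.mean() + e_i.std()).astype(float)
        hi_j = (mean_j > e_j.mean() + e_j.std()).astype(float)
        g1 += np.outer(hi_i, hi_j)      # count pairs passing both thresholds
        g2 += np.outer(mean_i, mean_j)  # expression product method
    return types, g1, g2
```

Each score is an m x m matrix over ordered cell-type pairs, so communication from k₁ to k₂ and from k₂ to k₁ are scored separately.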

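S4.2: normalizing the two scores by min-max scaling, which is expressed as

$$\tilde{g}_1(k_1,k_2)=\frac{g_1(k_1,k_2)-\min g_1}{\max g_1-\min g_1},\qquad \tilde{g}_2(k_1,k_2)=\frac{g_2(k_1,k_2)-\min g_2}{\max g_2-\min g_2}$$

wherein $\min g_1$ and $\max g_1$ denote the minimum and maximum of $g_1(k_3,k_4)$ over $k_3,k_4=1,2,\ldots,m$; $\min g_2$ and $\max g_2$ denote the minimum and maximum of $g_2(k_3,k_4)$ over $k_3,k_4=1,2,\ldots,m$; and $m$ is the number of cell types.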
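The min-max scaling of step S4.2 applied to one m x m score matrix can be sketched as follows. Returning zeros when all cell-type pairs have the same score is an assumption made here to avoid division by zero; the claim does not address that case.

```python
import numpy as np


def min_max(g) -> np.ndarray:
    """Scale all m x m scores of one method into [0, 1]."""
    g = np.asarray(g, dtype=float)
    lo, hi = g.min(), g.max()
    if hi == lo:  # assumption: constant scores map to zero
        return np.zeros_like(g)
    return (g - lo) / (hi - lo)
```

Scaling both score matrices to the same [0, 1] range puts the count-based threshold score and the expression product score on a comparable scale before they are combined.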

S4.3: combining the two normalized scores to obtain the joint cell communication score $g(k_1,k_2)$.
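The combination rule for the joint score is not spelled out in this text. A hypothetical sketch, assuming a weighted average of the two normalized score matrices with equal weights by default; both the averaging rule and the `weight` parameter are illustrative assumptions.

```python
import numpy as np


def joint_score(g1_norm, g2_norm, weight: float = 0.5) -> np.ndarray:
    """Hypothetical joint score: weight * normalized threshold score
    + (1 - weight) * normalized product score."""
    a = np.asarray(g1_norm, dtype=float)
    b = np.asarray(g2_norm, dtype=float)
    if a.shape != b.shape:
        raise ValueError("both score matrices must cover the same cell-type pairs")
    return weight * a + (1.0 - weight) * b
```

Because both inputs lie in [0, 1], this weighted average also stays in [0, 1], which keeps joint scores comparable across cell-type pairs.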

8. The method for analyzing ligand-receptor interaction mediated cell communication according to claim 1, wherein step S5 visualizes cell communication in human melanoma tissue and head and neck squamous cell carcinoma tissue by the following steps: the top three ligand-receptor interactions ranked by cell communication intensity in human melanoma tissue and head and neck squamous cell carcinoma tissue are shown, and cell communication is visualized with heat maps and network diagrams.
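The ranking step in claim 8 can be sketched as selecting the three highest-intensity ligand-receptor interactions for each sender/receiver cell-type pair, plus assembling the matrix a heat map would display. This assumes intensities are already computed per interaction and per cell-type pair; the data layout and function names are hypothetical, and the plotting itself is left to any charting library.

```python
import heapq
from typing import Mapping, Sequence

CellPair = tuple[str, str]  # (sender cell type, receiver cell type)
LRI = tuple[str, str]       # (ligand, receptor)


def top_lris(pair_scores: Mapping[CellPair, Mapping[LRI, float]],
             k: int = 3) -> dict[CellPair, list[tuple[LRI, float]]]:
    """Return the k highest-intensity LRIs for each sender/receiver pair."""
    return {cells: heapq.nlargest(k, lris.items(), key=lambda kv: kv[1])
            for cells, lris in pair_scores.items()}


def heatmap_matrix(scores: Mapping[CellPair, float],
                   types: Sequence[str]) -> list[list[float]]:
    # Rows are sender cell types, columns receiver cell types; missing pairs become 0.0
    return [[scores.get((s, r), 0.0) for r in types] for s in types]
```

The same per-pair scores could also serve as edge weights for the network view, with cell types as nodes.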

9. The method of claim 8, wherein analysis and visualization of cell communication in human melanoma tissue show that cancer-associated fibroblasts (CAFs), macrophages and endothelial cells communicate strongly with melanoma cancer cells.

10. The method of claim 8, wherein analysis and visualization of cell communication in head and neck squamous cell carcinoma tissue show that endothelial cells, macrophages and fibroblasts communicate strongly with head and neck squamous cell carcinoma cells.