CN115083524A - Method for detecting phase change critical point of complex biological system based on single cell diagram entropy - Google Patents

Method for detecting phase change critical point of complex biological system based on single cell diagram entropy

Info

Publication number
CN115083524A
Authority
CN
China
Prior art keywords
cell
entropy
gene
local
critical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210627839.2A
Other languages
Chinese (zh)
Inventor
刘锐
钟佳元
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT
Priority to CN202210627839.2A
Publication of CN115083524A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioethics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a method for detecting the phase transition critical point of a complex biological system based on single-cell graph entropy. From the perspective of the single-cell specific network, the method converts a sparse gene expression matrix into a non-sparse graph entropy matrix and, based on that matrix, quantifies the different dynamic characteristics of the pre-transition stage and the critical stage, thereby detecting an early warning signal of the critical state or phase transition. To verify its effectiveness, the method is applied to five real single-cell transcriptome datasets of early embryonic development: mouse embryonic fibroblasts induced to differentiate into neural cells, neural progenitor cells differentiating into neural cells, human embryonic stem cells differentiating into endoderm cells, mouse hepatoblasts differentiating into hepatocytes and cholangiocytes, and mouse embryonic stem cells differentiating into mesoderm progenitor cells.

Description

Method for detecting the phase transition critical point of a complex biological system based on single-cell graph entropy
Technical Field
The invention relates to the technical field of biological systems, and in particular to a method for detecting the phase transition critical point of a complex biological system based on single-cell graph entropy.
Background
The dynamic development of a biological system can generally be regarded as the evolution of a nonlinear dynamical system with three stages: a pre-transition stage, a critical stage and a post-transition stage, where the critical stage is the tipping point at which the system passes from the pre-transition stage into the post-transition stage. Conventional biomarkers distinguish the pre-transition stage from the post-transition stage by the expression level of a particular molecule or the abundance of a molecular product, but they may fail to detect the critical stage of a complex biological system, because there is usually no significant difference between the pre-transition stage and the critical stage. Detecting early warning signals of the critical stage, which in practice means predicting the critical point of the phase transition of a complex biological system, is therefore a challenge. The theoretical basis is as follows:
The dynamic evolution of a complex biological system is characterized by a nonlinear discrete-time dynamical system $Z(t) = f(Z(t-1); P)$, where $Z(t) = (z_1(t), z_2(t), \ldots, z_n(t))$ is a vector in $R^n$ representing the values of the internal variables of the system at time point $t$, and $P = (p_1, p_2, \ldots, p_s)$ is a parameter vector or driving factor representing slowly varying factors, such as genetic factors (SNP, CNV, etc.), epigenetic factors (methylation, etc.) or environmental factors. $f: R^n \times R^s \to R^n$ is a nonlinear function. The dynamical system is assumed to satisfy the following three conditions:
(i) $\bar{Z}$ is a fixed point of equation (1-2), i.e. $\bar{Z} = f(\bar{Z}; P)$;
(ii) there exists a parameter value $P_0$ such that the Jacobian matrix $\partial f(Z; P_0)/\partial Z$ evaluated at the fixed point $\bar{Z}$ has one eigenvalue, or a pair of complex-conjugate eigenvalues, of modulus 1;
(iii) when $P \neq P_0$, the eigenvalues of equation (1-2) generally do not have modulus 1.
For such a nonlinear system, the system at $\bar{Z}$ undergoes a critical phase transition, or bifurcation, when the parameter $P$ reaches the threshold value $P_c$ (Gilmore, 1993). Before $P$ reaches $P_c$, the system maintains a stable equilibrium, so the moduli of all eigenvalues lie within $(0, 1)$. The parameter value $P_c$ at which the system state changes is called the bifurcation parameter value or critical threshold, and the stage preceding the bifurcation is called the pre-transition stage. In the ideal case of small noise, when a complex biological system approaches the critical stage, there exists among all observed variables a dominant group of molecules, defined as the dynamic network biomarkers of the system, which satisfies the following three conditions based on the observed data (Chen et al, 2012; Liu et al, 2012):
1. The variance of each molecule in the group increases rapidly;
2. The Pearson correlation coefficient between molecules within the group increases rapidly;
3. The Pearson correlation coefficient between each molecule in the group and the molecules outside the group decreases rapidly.
From these properties of dynamic network biomarkers, the critical state transition of a system can be represented at the network level by a group of highly correlated and strongly fluctuating molecules. In particular, dynamic network biomarkers exhibit significant collective fluctuations as the system approaches the critical state, so their correlations at the critical stage differ significantly from those at the pre-transition stage. For the sub-network formed by the dynamic network biomarkers, the network structure changes significantly as the system approaches the critical state, indicating an upcoming critical stage. Thus, by exploring the dynamic information of this group of dominant molecules at the network level, the imminent qualitative state change can be predicted.
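The three conditions on the dominant group can be illustrated with a short sketch. It is only a minimal illustration under stated assumptions, not the procedure of the invention: the function names `pearson` and `dnb_statistics`, the input layout (a dictionary of expression profiles) and the use of population variance and mean absolute correlation are all hypothetical choices made for this example.

```python
import math
from itertools import combinations
from statistics import fmean, pvariance


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def dnb_statistics(expr, group):
    """Return (mean variance, mean |PCC| inside, mean |PCC| inside-vs-outside).

    expr  -- dict mapping a molecule name to its values across samples
    group -- hypothetical candidate set of dominant molecules
    """
    inside = [g for g in expr if g in group]
    outside = [g for g in expr if g not in group]
    mean_var = fmean([pvariance(expr[g]) for g in inside])
    pcc_in = fmean([abs(pearson(expr[a], expr[b]))
                    for a, b in combinations(inside, 2)])
    pcc_out = fmean([abs(pearson(expr[a], expr[b]))
                     for a in inside for b in outside])
    return mean_var, pcc_in, pcc_out


# Toy data (made up): "a" and "b" fluctuate together, "c" barely moves.
expr = {"a": [1, 5, 2, 6], "b": [2, 10, 4, 12], "c": [3.0, 3.1, 2.9, 3.0]}
mean_var, pcc_in, pcc_out = dnb_statistics(expr, {"a", "b"})
```

Near the critical stage one would expect the first two statistics to rise and the third to fall when the same group is compared across successive time points.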
Most biomolecules perform their function by interacting with other biomolecules within and between functional modules. This inter-modular and intra-modular connectivity implies that a particular genetic abnormality not only affects the activity of the gene product that carries it, but can also spread along the links of the biomolecular network and alter the activity of other gene products. Understanding the interaction network context of biomolecules is therefore crucial for determining the phenotypic impact of defects that affect them.
Disclosure of Invention
The invention aims to provide a method for detecting the phase transition critical point of a complex biological system based on single-cell graph entropy, by exploring the dynamic difference information among different cell populations at the single-cell level. The method quantitatively characterizes the stability and criticality of the gene regulatory network across cell populations. It is a novel method for analyzing single-cell transcriptome data, helps to track the dynamic development of a biological system from the perspective of network entropy, and can identify the critical stage even when the single-cell transcriptome data are sparse.
The purpose of the invention can be achieved by adopting the following technical scheme:
A method for detecting the phase transition critical point of a complex biological system based on single-cell graph entropy, the method comprising the following steps:
S1, constructing a statistical relevance index $\rho^{(k)}_{g_i g_j}$, as follows:
S11, for the normalized gene expression matrix, take an arbitrary gene pair $(g_i, g_j)$ and draw a scatter plot in a planar rectangular coordinate system whose horizontal and vertical axes represent the expression values of the two genes. Each point in the plot represents a cell; for cell $C_k$, the horizontal coordinate $E_i^{(k)}$ is the expression value of gene $g_i$ in cell $C_k$ and the vertical coordinate $E_j^{(k)}$ is the expression value of gene $g_j$ in cell $C_k$;
S12, in the scatter plot of gene pair $(g_i, g_j)$, for cell $C_k$, based on the two preset parameters $n^{(k)}(E_i) = 0.1N$ and $n^{(k)}(E_j) = 0.1N$ (where $N$ is the number of cells in the data matrix), draw a box around each of the gene expression values $E_i^{(k)}$ and $E_j^{(k)}$, where $n^{(k)}(E_i)$ is the number of cells near $E_i^{(k)}$ and $n^{(k)}(E_j)$ is the number of cells near $E_j^{(k)}$;
S13, denote the number of cells in the overlap of the two boxes by $n^{(k)}(E_i, E_j)$;
S14, based on the three statistics $n^{(k)}(E_i)$, $n^{(k)}(E_j)$ and $n^{(k)}(E_i, E_j)$, construct the statistical relevance index $\rho^{(k)}_{g_i g_j}$, defined as follows:
$$\rho^{(k)}_{g_i g_j} = \frac{n^{(k)}(E_i, E_j)}{N} - \frac{n^{(k)}(E_i)}{N} \cdot \frac{n^{(k)}(E_j)}{N} \qquad (A1)$$
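Step S1 can be sketched in a few lines. This is a hedged illustration, not the reference implementation: it assumes that the index takes the form n(E_i, E_j)/N minus the product n(E_i)/N times n(E_j)/N, and that the "box" around an expression value contains the 0.1N cells whose values are closest to that of cell C_k (cell C_k included). The names `nearest_cells`, `relevance_index` and `box_fraction` are hypothetical.

```python
def nearest_cells(values, k, m):
    """Indices of the m cells whose values are closest to that of cell k.

    Cell k itself always ranks first; remaining ties are broken by index.
    """
    order = sorted(range(len(values)),
                   key=lambda c: (abs(values[c] - values[k]), c != k, c))
    return set(order[:m])


def relevance_index(x, y, k, box_fraction=0.1):
    """Statistical relevance index of one gene pair in cell k.

    x, y -- expression values of gene g_i and gene g_j across all N cells
    k    -- index of the cell of interest
    """
    n_cells = len(x)
    m = max(1, round(box_fraction * n_cells))  # preset box size, 0.1 * N
    box_i = nearest_cells(x, k, m)             # cells near E_i of cell k
    box_j = nearest_cells(y, k, m)             # cells near E_j of cell k
    n_ij = len(box_i & box_j)                  # cells in the overlap
    # Assumed form of equation (A1): joint frequency minus product of marginals.
    return n_ij / n_cells - (len(box_i) / n_cells) * (len(box_j) / n_cells)


# Toy data: 20 cells; gene j either tracks gene i or is scrambled.
x = list(range(20))
dependent = relevance_index(x, x, k=10)
scrambled = relevance_index(x, [(7 * c) % 20 for c in range(20)], k=10)
```

With this construction the index is large when the cells that resemble cell C_k in gene g_i also resemble it in gene g_j, and it shrinks towards zero when the two neighborhoods barely overlap.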
S2, constructing a specific network for each cell. The specific network of cell $C_k$ is constructed from the statistical relevance index $\rho^{(k)}_{g_i g_j}$: if $\rho^{(k)}_{g_i g_j}$ is greater than 0, i.e. equation (A1) is greater than 0, there is a connecting edge between genes $g_i$ and $g_j$ in cell $C_k$; otherwise there is no connecting edge. The index thus determines, for any two genes, whether they are connected in cell $C_k$, and after all gene pairs have been traversed the specific network $N^{(k)}$ of cell $C_k$ is obtained. Each local network (subnetwork) is then extracted from the cell-specific network: every gene has a corresponding local network, formed by that gene as the central node together with its first-order neighbors, so that the specific network $N^{(k)}$ of cell $C_k$ is partitioned into $M$ local networks, where $M$ is the number of genes in the data matrix;
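The edge rule and the partition into local networks described in step S2 can be sketched as follows. The sketch assumes that the relevance indices of one cell are already available as a symmetric M x M nested list; the function names and the data layout are hypothetical.

```python
def cell_specific_network(rho):
    """Adjacency sets of one cell-specific network.

    rho -- symmetric M x M nested list of relevance indices for one cell;
           an edge is drawn between genes i and j whenever rho[i][j] > 0.
    """
    m = len(rho)
    neighbours = {i: set() for i in range(m)}
    for i in range(m):
        for j in range(i + 1, m):
            if rho[i][j] > 0:
                neighbours[i].add(j)
                neighbours[j].add(i)
    return neighbours


def local_networks(neighbours):
    """One local network per gene: the central gene plus its first-order neighbours."""
    return {centre: {centre} | adjacent for centre, adjacent in neighbours.items()}


# Toy relevance indices for three genes in one cell (values made up).
rho = [
    [0.00, 0.05, -0.01],
    [0.05, 0.00, 0.02],
    [-0.01, 0.02, 0.00],
]
network = cell_specific_network(rho)
local = local_networks(network)
```

A gene without any positive index ends up with a local network that contains only itself, so the partition always yields exactly M local networks.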
S3, calculating a local graph entropy value for each local network. After the specific network $N^{(k)}$ of cell $C_k$ has been partitioned into local networks, the local graph entropy of the local network $N_i^{(k)}$ centered on gene $g_i$ is defined as follows:
$$H^{(k)}(g_i) = -\frac{1}{\log S} \sum_{j=1}^{S} p_j^{(k)} \log p_j^{(k)} \qquad (A2)$$
wherein
$$p_j^{(k)} = \frac{\rho^{(k)}_{g_i g_j} E_j^{(k)}}{\sum_{l=1}^{S} \rho^{(k)}_{g_i g_l} E_l^{(k)}} \qquad (A3)$$
In these formulas, the statistical relevance index $\rho^{(k)}_{g_i g_j}$ is the weight coefficient between the central node gene $g_i$ and its first-order neighbor gene $g_j$, $E_j^{(k)}$ is the expression value of the first-order neighbor gene $g_j$ of the central node gene $g_i$ in cell $C_k$, and the constant $S$ is the number of first-order neighbors in the local network $N_i^{(k)}$. According to formula (A2), the expression value of each gene $g_i$ in cell $C_k$ is converted into a local graph entropy value, so that the sparse gene expression matrix of the single-cell transcriptome data is converted, one-to-one, into a non-sparse local graph entropy matrix.
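The local graph entropy of step S3 can be sketched as a Shannon entropy over the first-order neighbors of a central gene. The exact formulas are assumptions here: the sketch weights each neighbor by the product of its relevance index and its expression value, normalizes the weights to a probability distribution, and divides the entropy by log S so that the value lies between 0 and 1. The function name `local_graph_entropy` is hypothetical.

```python
import math


def local_graph_entropy(weights, expression):
    """Local graph entropy of one central gene in one cell.

    weights    -- relevance indices between the central gene and each of
                  its S first-order neighbours (all positive by construction)
    expression -- expression values of those S neighbours in the same cell
    """
    products = [w * e for w, e in zip(weights, expression)]
    total = sum(products)
    s = len(products)
    if s < 2 or total <= 0:
        return 0.0  # entropy is undefined or trivially zero in these cases
    probs = [p / total for p in products]
    entropy = sum(-p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(s)  # assumed normalisation by log S


# Four equally weighted, equally expressed neighbours: maximal entropy.
uniform = local_graph_entropy([0.1] * 4, [2.0] * 4)
# Only one neighbour is expressed: the distribution is degenerate.
degenerate = local_graph_entropy([0.1, 0.1], [3.0, 0.0])
```

Because the value depends on the neighbors rather than on the central gene's own reading, a gene whose expression is recorded as zero in a cell can still receive a non-zero local graph entropy, which is one way to read the sparse-to-non-sparse conversion described above.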
S4, calculating the graph entropy of a single cell based on the group of genes with the largest local graph entropy values. For cell $C_k$, the graph entropy is defined over the $T$ genes with the largest local graph entropy as follows:
$$H^{(k)} = \sum_{i=1}^{T} H^{(k)}(g_i) \qquad (A4)$$
where the constant $T$ is a tunable parameter, set to the number of genes in the top 5% by local graph entropy, and $H^{(k)}$ in equation (A4) is the graph entropy of cell $C_k$.
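Step S4 can be sketched as a selection of the top 5% of genes followed by an aggregation. Whether the aggregation is a sum or an average is not recoverable from the text, so the sum used below is an assumption, as are the function name `cell_graph_entropy` and the rounding rule for T.

```python
def cell_graph_entropy(local_entropies, top_fraction=0.05):
    """Graph entropy of one cell from the local graph entropies of its genes.

    local_entropies -- one local graph entropy value per gene
    top_fraction    -- share of genes kept (the text sets T to the top 5%)
    """
    t = max(1, round(top_fraction * len(local_entropies)))
    top = sorted(local_entropies, reverse=True)[:t]
    return sum(top)  # assumed aggregation: sum over the T selected genes


# 40 genes with local entropies 0/40, 1/40, ..., 39/40, so T = 2.
values = [i / 40 for i in range(40)]
h_cell = cell_graph_entropy(values)
```

Restricting the calculation to the T most informative genes keeps the per-cell score from being diluted by the many genes whose local networks carry little signal.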
S5, calculating the mean graph entropy $H$ of the cell population as follows:
$$H = \frac{1}{Q} \sum_{k=1}^{Q} H^{(k)} \qquad (A5)$$
where $Q$ is the number of cells in the cell population. The early warning signal of the phase transition critical point of the complex biological system is detected based on the mean graph entropy $H$.
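Step S5 averages the per-cell graph entropies within each cell population, and a sudden rise of that average is read as the early warning signal. The sketch below computes the averages per time point and reports the time point with the largest increase over its predecessor. The "largest increase" rule is only an illustrative stand-in, since the text does not specify here how the significance of a rise is assessed; the function names and the toy values are hypothetical.

```python
from statistics import fmean


def mean_graph_entropy(entropies_by_time):
    """Mean graph entropy H of the cell population at each time point.

    entropies_by_time -- dict mapping a time label to the graph entropies
                         of the cells sampled at that time (in time order)
    """
    return {t: fmean(values) for t, values in entropies_by_time.items()}


def largest_increase(mean_by_time):
    """Time label at which H rises most sharply relative to the previous one."""
    times = list(mean_by_time)
    best = max(range(1, len(times)),
               key=lambda i: mean_by_time[times[i]] - mean_by_time[times[i - 1]])
    return times[best]


# Toy per-cell graph entropies at four time points (values made up).
cells = {
    "d0": [1.0, 1.2],
    "d2": [1.1, 1.1],
    "d5": [3.0, 3.2],
    "d20": [3.1, 3.0],
}
means = mean_graph_entropy(cells)
warning_time = largest_increase(means)
```

In practice the jump would be checked with a statistical test between adjacent time points before it is treated as a warning signal.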
Further, step S1 judges whether the statistical relevance index $\rho^{(k)}_{g_i g_j}$ is greater than the threshold 0: if $\rho^{(k)}_{g_i g_j} > 0$, there is a connecting edge between genes $g_i$ and $g_j$; otherwise there is no connecting edge.
Further, the step S2 is to construct a cell-specific network, which is beneficial for analyzing network characteristics of different cells.
Further, in step S3, based on equation (A2), the sparse gene expression matrix (which has strong noise) is converted into the non-sparse local graph entropy matrix (which has weak noise), thereby achieving a noise reduction effect.
Further, in step S4 the tunable parameter T is set to the number of genes in the top 5% by local graph entropy value, which improves the accuracy of the result and reduces the complexity of calculation and analysis.
Further, the sudden and rapid increase of the mean map entropy H of the cell population in said step S5 is indicative of an upcoming critical transition, or the occurrence of a critical point of phase transition of a complex biological system.
Compared with the prior art, the invention has the following advantages and effects:
The invention provides a calculation method based on single-cell graph entropy for identifying critical transitions of complex biological systems, and the method is validated on real datasets. Notably, the invention aims to detect the early warning signals generated at the critical stage, rather than to find evidence that the qualitative change of the post-transition stage has already occurred.
1. Traditional differential expression analysis can only judge whether a biological system is in the pre-transition stage or in the post-transition stage and cannot effectively perceive the critical state at the end of the pre-transition stage, i.e. the critical period of the transition, whereas the method of the invention can accurately reflect the critical stage in the development of a complex biological system;
2. The single-cell transcriptome data used in existing single-cell analysis techniques are sparse, noisy and heterogeneous, so the critical point signal is not obvious; the method of the invention overcomes this limitation;
3. Analysis at the level of the cell-specific network characterizes the critical period of a biological system more reliably than analysis based on the gene expression levels of single cells;
4. The method of the invention is model-free, meaning that it involves neither feature selection nor a model or parameter training process. It therefore differs from traditional machine learning or classification methods, which generate a model through a learning process and require a large number of samples to avoid overfitting;
5. The method opens a new way of predicting the critical period of critical transitions of complex biological systems at the single-cell level, and helps to track the dynamic development of a biological system and its key molecular mechanisms at the single-cell level.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a schematic diagram of the detection of critical phases based on graph entropy algorithm disclosed in the present invention;
FIG. 2(A) is a schematic diagram showing the identification of the critical point at which mouse embryonic fibroblasts are induced to differentiate into neural cells in the present invention;
FIG. 2(B) is a schematic diagram showing the identification of the critical point of differentiation of neural progenitor cells into neural cells in the present invention;
FIG. 2(C) is a schematic diagram showing the identification of the critical point of differentiation of human embryonic stem cells into endoderm cells in the present invention;
FIG. 2(D) is a schematic diagram showing the identification of the critical points of the differentiation of mouse hepatoblasts into hepatocytes and cholangiocytes in the present invention;
FIG. 2(E) is a schematic diagram showing the identification of the critical point of the differentiation of mouse embryonic stem cells into mesodermal progenitors in the present invention;
FIG. 2(F) is a schematic diagram showing the clustering of signal genes in data on the induction of differentiation of mouse embryonic fibroblasts into neural cells according to the present invention;
FIG. 2(G) is a schematic diagram showing the clustering of signal genes in the data on the differentiation of neural progenitor cells into neural cells according to the present invention;
FIG. 2(H) is a schematic diagram showing the clustering of signal genes in the data of the differentiation of human embryonic stem cells into endoderm cells according to the present invention;
FIG. 2(I) is a schematic diagram showing the clustering of signal genes in the data on the differentiation of mouse hepatoblasts into hepatocytes and cholangiocytes in the present invention;
FIG. 2(J) is a schematic diagram showing the clustering of signal genes in the data on the differentiation of mouse embryonic stem cells into mesodermal progenitor cells in the present invention;
FIG. 3(A) is a schematic diagram of the "dark genes" in the data on the induced differentiation of mouse embryonic fibroblasts into neural cells in the present invention;
FIG. 3(B) is a schematic diagram of the "dark genes" in the data on the differentiation of mouse hepatoblasts into hepatocytes and cholangiocytes in the present invention;
FIG. 3(C) is a schematic diagram of the "dark genes" in the data on the differentiation of human embryonic stem cells into endoderm cells in the present invention;
FIG. 4(A) is a schematic diagram of the signaling pathways in which the differential first-order neighbor genes are mainly enriched in the data on the differentiation of human embryonic stem cells into endoderm cells in the present invention;
FIG. 4(B) is a schematic diagram of the signaling pathways in which the "dark genes" are mainly enriched in the data on the differentiation of human embryonic stem cells into endoderm cells in the present invention;
FIG. 4(C) is a schematic diagram showing the regulatory relationship between the "dark genes" and their differential first-order neighbor genes in the MAPK and PI3K/Akt signaling pathways in the data on the differentiation of human embryonic stem cells into endoderm cells.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Examples
As shown in FIG. 1, this embodiment of the present invention discloses a method for detecting the phase transition critical point of a complex biological system based on single-cell graph entropy. Following the schematic flow shown in FIG. 1, the results obtained in this example are as follows:
1. Early warning of the "cell fate decision" critical period during early embryonic development based on the graph entropy algorithm
The graph entropy algorithm is applied to five single-cell transcriptome datasets of early embryonic development to detect the early warning signal of the "cell fate decision" critical period during early embryonic development. Specifically, the graph entropy of each cell is calculated following the steps of the graph entropy algorithm. The mean graph entropy H of the cell population at each time point is then calculated, and the early warning signal of the "cell fate decision" critical period is detected from the mean graph entropy H at each time point. For the data on mouse embryonic fibroblasts induced to differentiate into neural cells, as shown in FIG. 2(A), the mean graph entropy H increased rapidly from day 5 to day 20, and the change was statistically significant (P = 0.0168). This significant change in H warns of a "cell fate decision" critical period after day 20; indeed, the induced differentiation of mouse embryonic fibroblasts into neurons occurs at day 22. For the data on neural progenitor cells differentiating into neural cells, as shown in FIG. 2(B), a significant change in the mean graph entropy H (P = 0.0362) occurred on day 1, indicating that a "cell fate decision" critical period is about to occur after day 1. This early warning signal is consistent with the observations of the original experiment, in which cell heterogeneity was lowest at day 1, began to increase after day 1, and neuronal heterogeneity peaked at day 30. For the data on human embryonic stem cells differentiating into endoderm cells, FIG. 2(C) shows that a significant change in the mean graph entropy H (P = 0.0196) occurred at 36 hours, warning of a "cell fate decision" critical period after 36 hours; the differentiation of human embryonic stem cells into endoderm cells occurs at 72 hours. As shown in FIG. 2(D), for the data on mouse hepatoblasts differentiating into hepatocytes and cholangiocytes, a significant change in the mean graph entropy H (P = 7.3076E-05) occurred at embryonic day E12.5, and hepatoblasts differentiate into hepatocytes and cholangiocytes after E12.5. For the data on mouse embryonic stem cells differentiating into mesodermal progenitors, FIG. 2(E) shows that the mean graph entropy H changes significantly at 24 hours (P = 0.0288), warning of a "cell fate decision" critical period after 24 hours. In fact, pluripotent stem cells differentiate into endoderm at around 48 hours. Therefore, the graph entropy algorithm successfully detects the early warning signal of the "cell fate decision" critical period during early embryonic development.
In addition, to test how well the local graph entropy of the signal genes performs in cell clustering, for each single-cell dataset the top 5% of genes with the largest local graph entropy at the critical point are selected as signal genes, and t-distributed stochastic neighbor embedding (t-SNE) is then applied to the local graph entropy values of the selected signal genes for dimension reduction and visualization. The cell clustering results are shown in FIGS. 2(F)-2(J). For all five single-cell datasets of early embryonic development, clustering based on the local graph entropy of the signal genes distinguishes the states of cells at different stages or time points: cells from different time points fall into different clusters, while cells from the same time point are grouped together. The local graph entropy of the signal genes therefore performs well in cell clustering, i.e. cluster analysis based on it accurately resolves the heterogeneity of cells over time at single-cell resolution. By converting the sparse original gene expression matrix into a non-sparse local graph entropy matrix, the graph entropy algorithm thus not only detects the "cell fate decision" critical period during early embryonic development, but also provides an entropy matrix for clustering cells by time point and exploring the dynamic information of cell populations.
2. Mining "dark genes" based on the graph entropy algorithm
In the biomedical field, differential expression analysis plays a very important role in the search for novel biomarkers, regulatory factors, drug targets and the like, but traditional differential expression analysis often ignores non-differentially expressed genes. In fact, some non-differentially expressed genes are also involved in important biological processes: they are enriched in important functional pathways and play important roles in embryonic development, so they should not be ignored. While analyzing the single-cell transcriptome datasets, we found that some non-differentially expressed genes are highly sensitive in terms of their local graph entropy values even though their expression values show no significant difference. We call such genes "dark genes"; they may play an important role in embryonic development. Following the earlier definition of "dark genes", a gene is judged to be a "dark gene" if it meets two conditions: (i) its expression level shows no statistically significant difference; (ii) its local graph entropy value differs significantly between the critical point and non-critical points. To find "dark genes" closely related to embryonic development, for each single-cell transcriptome dataset we selected the top 5% of genes with the largest local graph entropy at the critical point and then analyzed the selected genes against these two conditions. FIGS. 3(A)-3(C) show some "dark genes" in the data on mouse embryonic fibroblasts induced to differentiate into neural cells, on mouse hepatoblasts differentiating into hepatocytes and cholangiocytes, and on human embryonic stem cells differentiating into endoderm cells, respectively. As shown in FIGS. 3(A)-3(C), each "dark gene" does not change significantly at the gene expression level but changes significantly at the local graph entropy (network entropy) level. Although these genes do not change significantly in expression, they likely play an important role in the critical transitions of embryonic development.
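The two judgment conditions for a "dark gene" can be sketched as a pair of significance checks. The text does not name the statistical test used, so the two-sided permutation test on the difference of group means below is an assumption chosen because it needs only the standard library; the function names, the significance level `alpha` and the toy values are likewise hypothetical.

```python
import random
from statistics import fmean


def permutation_p_value(a, b, n_perm=999, seed=0):
    """Two-sided permutation test on the difference of group means."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = abs(fmean(a) - fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:len(a)]) - fmean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def is_dark_gene(expr_crit, expr_other, ent_crit, ent_other, alpha=0.05):
    """Condition (i): expression does not differ; condition (ii): entropy does.

    *_crit  -- values of one gene in cells at the critical point
    *_other -- values of the same gene in cells at non-critical points
    """
    expression_differs = permutation_p_value(expr_crit, expr_other) < alpha
    entropy_differs = permutation_p_value(ent_crit, ent_other) < alpha
    return (not expression_differs) and entropy_differs


# Toy gene (values made up): identical expression, clearly shifted entropy.
expression = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
entropy_crit = [0.90, 0.91, 0.92, 0.93, 0.94, 0.95]
entropy_other = [0.10, 0.11, 0.12, 0.13, 0.14, 0.15]
dark = is_dark_gene(expression, expression, entropy_crit, entropy_other)
# A gene whose expression also shifts is differentially expressed, not "dark".
not_dark = is_dark_gene([10.0, 11.0, 12.0, 13.0, 14.0, 15.0], expression,
                        entropy_crit, entropy_other)
```

Applied to the top 5% of genes by local graph entropy at the critical point, a filter of this kind would keep only the genes that satisfy both conditions.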
As shown in fig. 4(A)-4(B), KEGG pathway enrichment analysis was performed on the "dark genes" and their differential first-order neighbor genes in the data on the differentiation of human embryonic stem cells into endoderm cells. These two groups of genes are mainly enriched in pathways closely related to embryonic development. For example, the MAPK signaling pathway plays a key role in cell proliferation and differentiation, the PI3K/Akt signaling pathway regulates the proliferation and differentiation of various cell types, and the p53 signaling pathway is a regulatory pathway for embryonic stem cell differentiation. To illustrate the potential regulatory relationship between the "dark genes" and their differential first-order neighbor genes on the PPI network, we show the regulatory relationships of these two groups of genes on the MAPK and PI3K/Akt signaling pathways (fig. 4(C)). As can be seen from fig. 4(C), "dark genes" such as the growth factor (GF) gene IGF1 and the extracellular matrix (ECM) genes LAMC2 and COL4A1 are upstream regulators that promote cell proliferation and differentiation by activating downstream molecules (differential first-order neighbor genes), and they act as driving factors during cell differentiation. Although the expression level of the "dark genes" changes little during the critical transition, the expression of some of their downstream molecules can change significantly, triggering cell proliferation and differentiation. The MAPK and PI3K/Akt signaling pathways contain signaling chains that play an important role in cell proliferation and differentiation. For example, in the MAPK signaling pathway, the dark gene MAPK9 is a key gene of the c-Jun N-terminal kinase (JNK) signaling pathway and can induce cell proliferation and differentiation in response to various signals.
As shown in fig. 4(C), up-regulation of molecules such as IGF1R, SOS1, and SOS2 activates Ras and further activates RPS6KA3, which may lead to mitotic effects and thereby promote cell proliferation and division. In addition, in the PI3K/Akt signaling pathway, extracellular matrix genes such as LAMC2, COL4A1, and TNC are found to activate the downstream molecule ITGB1 and, together with the down-regulated gene PPP2R5C, to participate in activating the PI3K and AKT signaling molecules. As a result, the expression of the downstream molecules BRCA1 and CHUK is down-regulated, which may promote G2/M mitosis and thus further promote cell proliferation and differentiation. The signals by which the growth factor (GF) gene IGF1 and the extracellular matrix (ECM) genes LAMC2 and COL4A1 initiate cell proliferation and differentiation on the MAPK and PI3K/Akt signaling pathways occurred at approximately 36 hours, consistent with the time point reported in the original study (Chu et al., 2016) for the critical "cell fate decision" stage in the differentiation of pluripotent embryonic stem cells into endoderm. Therefore, the graph entropy algorithm helps to find "dark genes" that do not differ in gene expression level but are sensitive to SGE and that play a key role in early embryonic development.
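The selection rule described above, a gene that is stable in expression but shifts in local graph entropy, can be illustrated with a small sketch. The text does not state which statistical test or cutoff was used, so the Welch t-statistic, the threshold `t_cut`, and the function names `welch_t` and `find_dark_genes` are assumptions chosen only for illustration.

```python
import numpy as np

def welch_t(a, b):
    """Welch t-statistic per gene; a and b are (genes x cells) arrays."""
    var_a = a.var(axis=1, ddof=1) / a.shape[1]
    var_b = b.var(axis=1, ddof=1) / b.shape[1]
    denom = np.sqrt(var_a + var_b)
    diff = a.mean(axis=1) - b.mean(axis=1)
    # Genes with zero variance in both groups get t = 0 instead of NaN.
    return np.divide(diff, denom, out=np.zeros_like(diff), where=denom > 0)

def find_dark_genes(expr_before, expr_critical,
                    ent_before, ent_critical, t_cut=3.0):
    """Indices of genes that are non-differential in expression but
    differential in local graph entropy (hypothetical cutoff t_cut)."""
    t_expr = np.abs(welch_t(expr_before, expr_critical))
    t_ent = np.abs(welch_t(ent_before, ent_critical))
    return np.flatnonzero((t_expr < t_cut) & (t_ent >= t_cut))
```

Here the first pair of arguments holds expression values before and at the candidate critical stage, and the second pair holds the corresponding local graph entropy values for the same genes.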
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (4)

1. A method for detecting a phase transition critical point of a complex biological system based on single-cell graph entropy, characterized by comprising the following steps:
S1, constructing a statistical relevance index for each gene pair in each cell, as follows:
S11, for the normalized gene expression matrix, arbitrarily selecting a gene pair $(g_i, g_j)$ and drawing a scatter plot in a planar rectangular coordinate system, wherein the horizontal axis and the vertical axis of the scatter plot represent the expression values of the two genes and each point represents a cell; for a cell $C_k$, the horizontal coordinate is the expression value of gene $g_i$ in cell $C_k$, and the vertical coordinate is the expression value of gene $g_j$ in cell $C_k$;
S12, in the scatter plot of the gene pair $(g_i, g_j)$, for cell $C_k$, drawing one box near each of the two expression values based on two preset parameters $n^{(k)}(E_i) = 0.1N$ and $n^{(k)}(E_j) = 0.1N$, wherein $n^{(k)}(E_i)$ represents the number of cells near the expression value of gene $g_i$ in cell $C_k$, $n^{(k)}(E_j)$ represents the number of cells near the expression value of gene $g_j$ in cell $C_k$, and $N$ represents the number of cells in the data matrix;
S13, denoting the number of cells in the overlapping part of the two boxes as $n^{(k)}(E_i, E_j)$;
S14, constructing the statistical relevance index based on the three statistics $n^{(k)}(E_i)$, $n^{(k)}(E_j)$, and $n^{(k)}(E_i, E_j)$, defined as follows:
[Equation (A1): formula image not reproduced]
S2, constructing a specific network for each cell: the specific network of cell $C_k$ is constructed based on the statistical relevance index; if the statistical relevance index of the gene pair $(g_i, g_j)$ is greater than 0, i.e., equation (A1) is greater than 0, there is an edge between genes $g_i$ and $g_j$ in cell $C_k$, and otherwise there is no edge; the statistical relevance index thus determines whether any two genes are connected in cell $C_k$, and after all gene pairs have been traversed, the specific network of cell $C_k$ is constructed; each local network (subnetwork) is then extracted from the cell-specific network, wherein each gene has a corresponding local network formed by a central node gene and the first-order neighbors of the central node gene, so that the specific network of cell $C_k$ is partitioned into $M$ local networks, $M$ representing the number of genes in the data matrix;
S3, calculating a local graph entropy value for each local network: after the specific network of cell $C_k$ has been partitioned into local networks, the local graph entropy of the local network centered on gene $g_i$ is defined as follows:
[Equation (A2): formula image not reproduced]
wherein
[formula image not reproduced]
In these formulas, the statistical relevance index represents the weight coefficient between the central node gene $g_i$ and each of its first-order neighbor genes, the expression term represents the expression value in cell $C_k$ of a first-order neighbor gene of the central node gene $g_i$, and the constant $S$ represents the size of the local network centered on gene $g_i$. According to equation (A2), the expression value of each gene $g_i$ in cell $C_k$ can be converted into a local graph entropy value, so that the sparse gene expression matrix of the single-cell transcriptome data is converted one-to-one into a non-sparse local graph entropy matrix.
S4, calculating the graph entropy of a single cell based on the group of genes with the largest local graph entropy; for cell $C_k$, the graph entropy is defined as follows:
[Equation (A4): formula image not reproduced]
where the constant $T$ is an adjustable parameter set to the number of genes in the top 5% by local graph entropy value, and in equation (A4), $H^{(k)}$ represents the graph entropy of cell $C_k$.
S5, calculating the average graph entropy $H$ of the cell population as follows:
$$H = \frac{1}{Q}\sum_{k=1}^{Q} H^{(k)}$$
where $Q$ represents the number of cells in the cell population and $H^{(k)}$ is the graph entropy of cell $C_k$; the early-warning signal of the phase transition critical point of the complex biological system is detected based on the graph entropy $H$.
2. The method for detecting a phase transition critical point of a complex biological system based on single-cell graph entropy according to claim 1, wherein whether an edge exists is determined by whether the statistical relevance index is greater than the threshold 0: if the index is greater than 0, there is an edge between genes $g_i$ and $g_j$; otherwise there is no edge.
3. The method for detecting a phase transition critical point of a complex biological system based on single-cell graph entropy according to claim 1, wherein a sudden and rapid increase in the average graph entropy H of the cell population indicates an upcoming critical transition or the occurrence of the phase transition critical point of the complex biological system.
4. The method for detecting a phase transition critical point of a complex biological system based on single-cell graph entropy according to claim 1, wherein in step S4 the adjustable parameter T is taken as the number of genes in the top 5% by local graph entropy value, which improves the accuracy of the calculation result and reduces the complexity of the computational analysis.
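Steps S1 to S5 of claim 1 can be sketched in a few lines of NumPy. The formula images for equations (A1) to (A4) are not reproduced in this text, so the sketch rests on assumed forms: a relevance index of n_ij/N - (n_i/N)(n_j/N), a local entropy given by the normalized Shannon entropy of neighbor weights (relevance index times neighbor expression), and a cell-level entropy given by the sum of the top T local values. The function names and the rank-based way of drawing the boxes are hypothetical and are not taken from the patent.

```python
import numpy as np

def relevance_index(expr, k, frac=0.1):
    """S1: relevance index of every gene pair in cell k.

    expr is a (genes x cells) normalized expression matrix.
    Assumed form of (A1): n_ij/N - (n_i/N) * (n_j/N).
    """
    n_cells = expr.shape[1]
    half = max(1, int(round(frac * n_cells / 2)))
    # Rank cells per gene; the box of a gene holds the cells whose rank
    # is within `half` of cell k, i.e. roughly frac * N cells.
    ranks = expr.argsort(axis=1).argsort(axis=1)
    box = (np.abs(ranks - ranks[:, [k]]) <= half).astype(float)
    n_i = box.sum(axis=1)        # n^(k)(E_i) for every gene
    n_ij = box @ box.T           # n^(k)(E_i, E_j) for every gene pair
    rho = n_ij / n_cells - np.outer(n_i, n_i) / n_cells**2
    np.fill_diagonal(rho, 0.0)   # no self-loops
    return rho

def local_graph_entropy(expr, rho, k):
    """S2 and S3: edges where rho > 0, then one entropy value per gene."""
    n_genes = expr.shape[0]
    adjacency = rho > 0          # cell-specific network of cell k
    entropy = np.zeros(n_genes)
    for i in range(n_genes):
        nbrs = np.flatnonzero(adjacency[i])   # first-order neighbors
        if nbrs.size < 2:
            continue             # log(S) would be zero or undefined
        weights = rho[i, nbrs] * expr[nbrs, k]
        total = weights.sum()
        if total <= 0:
            continue             # all neighbors unexpressed in this cell
        p = weights / total
        p = p[p > 0]
        entropy[i] = -(p * np.log(p)).sum() / np.log(nbrs.size)
    return entropy

def cell_graph_entropy(expr, k, top_frac=0.05):
    """S4: combine the top 5% local entropies (assumed here to be a sum)."""
    local = local_graph_entropy(expr, relevance_index(expr, k), k)
    top = max(1, int(np.ceil(top_frac * local.size)))
    return float(np.sort(local)[-top:].sum())

def population_graph_entropy(expr):
    """S5: average graph entropy H over all Q cells of the population."""
    n_cells = expr.shape[1]
    return float(np.mean([cell_graph_entropy(expr, k) for k in range(n_cells)]))
```

Calling `population_graph_entropy` once per time point or stage gives the series of H values in which, per claim 3, a sudden and rapid increase is read as the early-warning signal. With very few cells the assumed index is positive for almost every gene pair, because cell k always lies in both boxes, so the sketch is only meaningful for populations of at least a few hundred cells.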
CN202210627839.2A 2022-06-06 2022-06-06 Method for detecting phase change critical point of complex biological system based on single cell diagram entropy Pending CN115083524A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210627839.2A CN115083524A (en) 2022-06-06 2022-06-06 Method for detecting phase change critical point of complex biological system based on single cell diagram entropy

Publications (1)

Publication Number Publication Date
CN115083524A (en) 2022-09-20

Family

ID=83248975

Country Status (1)

Country Link
CN (1) CN115083524A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111009292A (en) * 2019-11-20 2020-04-14 South China University of Technology Method for detecting phase change critical point of complex biological system based on single sample sKLD index
CN111261243A (en) * 2020-01-10 2020-06-09 South China University of Technology Method for detecting phase change critical point of complex biological system based on relative entropy index

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JIAYUAN ZHONG et al.: "scGET: Predicting Cell Fate Transition During Early Embryonic Development by Single-cell Graph Entropy", GENOMICS PROTEOMICS BIOINFORMATICS, 24 December 2021 (2021-12-24), pages 461 - 474 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination