CN115083524A - Method for detecting phase change critical point of complex biological system based on single cell diagram entropy - Google Patents
- Publication number
- CN115083524A (application CN202210627839.2A)
- Authority
- CN
- China
- Prior art keywords
- cell
- entropy
- gene
- local
- critical
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
Abstract
The invention discloses a method for detecting the phase-transition critical point of a complex biological system based on single-cell graph entropy. From the perspective of single-cell-specific networks, a sparse gene expression matrix is converted into a non-sparse graph entropy matrix, and the different dynamic characteristics of the pre-transition stage and the critical stage are quantified from that matrix, so that an early warning signal of the critical state, or phase transition, is detected. To verify its effectiveness, the method is applied to five real single-cell transcriptome datasets of early embryonic development: mouse embryonic fibroblasts induced to differentiate into neural cells, neural progenitor cells differentiating into neural cells, human embryonic stem cells differentiating into endoderm cells, mouse hepatoblasts differentiating into hepatocytes and cholangiocytes, and mouse embryonic stem cells differentiating into mesoderm progenitor cells.
Description
Technical Field
The invention relates to the technical field of biological systems, in particular to a method for detecting the phase-transition critical point of a complex biological system based on single-cell graph entropy.
Background
The dynamic development of a biological system can generally be regarded as the evolution of a nonlinear dynamical system with three stages: a pre-transition stage, a critical stage and a post-transition stage, where the critical stage is the tipping point at which the system passes from the pre-transition stage into the post-transition stage. Conventional biomarkers aim to distinguish the pre-transition stage from the post-transition stage by the expression level of a particular molecule or the abundance of a molecular product, but they may fail to detect the criticality of a phase transition in a complex biological system, because there is usually no significant difference between the pre-transition stage and the critical stage. Detecting early warning signals of the critical stage, which in practice means predicting the critical point of a phase transition in a complex biological system, is therefore a challenge. The theoretical derivation is as follows:
A nonlinear discrete-time dynamical system $Z(t) = f(Z(t-1); P)$ is used to characterize the dynamic evolution of a complex biological system, where $Z(t) = (z_1(t), z_2(t), \ldots, z_n(t))$ is a vector in $R^n$ representing the values of the internal variables of the system at time point $t$, and $P = (p_1, p_2, \ldots, p_s)$ is a parameter vector, or driving factor, representing slowly varying factors such as genetic factors (SNP, CNV, etc.), epigenetic factors (methylation, etc.) or environmental factors. $f: R^n \times R^s \to R^n$ is a nonlinear function. The dynamical system is assumed to satisfy the following three conditions:
(i) $\bar{Z}$ is a fixed point of the system, i.e., $\bar{Z} = f(\bar{Z}; P)$;
(ii) there exists a parameter value $P_0$ such that the Jacobian matrix $\partial f(Z; P_0)/\partial Z$ evaluated at the fixed point $\bar{Z}$ has one eigenvalue, or a pair of complex-conjugate eigenvalues, with modulus 1;
(iii) when $P \neq P_0$, equation (1-2) does not always have an eigenvalue with modulus 1.
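The eigenvalue condition can be illustrated on a toy system. The sketch below is not from the patent: it assumes a one-dimensional logistic-type map as a stand-in for $f$, with hypothetical function names, and simply checks that the modulus of the single eigenvalue at the fixed point stays below 1 before the parameter reaches its threshold and reaches 1 at the threshold.

```python
def toy_map(z: float, p: float) -> float:
    """Toy one-dimensional system Z(t) = f(Z(t-1); P); an illustrative assumption."""
    return p * z * (1.0 - z)


def fixed_point(p: float) -> float:
    """Non-trivial fixed point z* = 1 - 1/p, which satisfies z* = f(z*; p)."""
    return 1.0 - 1.0 / p


def eigenvalue_at_fixed_point(p: float) -> float:
    """Derivative df/dz at z*, the single eigenvalue of this 1-D system (equals 2 - p)."""
    return p * (1.0 - 2.0 * fixed_point(p))


def is_stable(p: float) -> bool:
    """Stable equilibrium: the eigenvalue modulus lies strictly below 1."""
    return abs(eigenvalue_at_fixed_point(p)) < 1.0


if __name__ == "__main__":
    # The modulus approaches 1 as p approaches the threshold p = 3 of this toy map.
    for p in (2.5, 2.9, 3.2):
        print(p, round(abs(eigenvalue_at_fixed_point(p)), 3), is_stable(p))
```

For this toy map the threshold is $p = 3$; in a real biological system the map $f$ is unknown, which is why the method relies on data-driven indicators instead.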
For such a nonlinear system, the system undergoes a critical phase transition, or bifurcation, at the fixed point $\bar{Z}$ when the parameter $P$ reaches the threshold $P_c$ (Gilmore, 1993). Before $P$ reaches $P_c$, the system should remain at a stable equilibrium, so that the moduli of all eigenvalues lie within $(0, 1)$. The parameter value $P_c$ at which the system changes state is called the bifurcation parameter value or threshold, and the stage preceding the bifurcation is called the pre-transition stage. In the ideal case of small noise, when a complex biological system approaches the critical stage, there exists among all observed variables a dominant group, defined as the dynamic network biomarkers, whose molecules satisfy the following three conditions based on the observed data (Chen et al., 2012; Liu et al., 2012):
1. the variance of each molecule in the group increases rapidly;
2. the Pearson correlation coefficient between molecules within the group increases rapidly;
3. the Pearson correlation coefficient between each molecule in the group and the molecules outside the group decreases rapidly.
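The three conditions above can be sketched as three summary statistics for a candidate group of molecules. This is a minimal illustration, not the patent's procedure: the function names, the use of population variance, and the averaging of absolute correlations are all assumptions, and the group must contain at least two genes with at least one gene left outside it.

```python
import math
from itertools import combinations
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def dnb_statistics(expr, group):
    """expr maps gene name -> expression values across samples; group is the candidate set.

    Returns (mean variance inside the group,
             mean |PCC| between pairs inside the group,
             mean |PCC| between group members and outside genes).
    """
    inside = [g for g in expr if g in group]
    outside = [g for g in expr if g not in group]
    variance_in = mean([pvariance(expr[g]) for g in inside])
    pcc_in = mean([abs(pearson(expr[a], expr[b])) for a, b in combinations(inside, 2)])
    pcc_out = mean([abs(pearson(expr[a], expr[b])) for a in inside for b in outside])
    return variance_in, pcc_in, pcc_out
```

Near a critical stage the first two values would be expected to rise and the third to fall when compared with an earlier time point.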
From the properties of dynamic network biomarkers, the critical state transition of a system can be represented at the network level by a group of highly correlated and strongly fluctuating molecules. In particular, dynamic network biomarkers exhibit significant collective fluctuations as the system approaches the critical state, so their correlations at the critical stage differ significantly from those at the pre-transition stage. For the subnetwork formed by the dynamic network biomarkers, the network structure changes significantly when the system approaches the critical state, indicating an upcoming critical stage. Thus, by exploring the dynamic information of this group of dominant molecules at the network level, the state change can be predicted.
Most biomolecules perform their functions by interacting with other biomolecules within or between functional modules. This inter-modular and intra-modular connectivity suggests that the effect of a particular genetic abnormality is not limited to the activity of the gene product that carries it, but can spread along the links of the biomolecular network and alter the activity of other gene products. Understanding the interaction network in which biomolecules operate is therefore crucial for determining the phenotypic consequences of defects that affect them.
Disclosure of Invention
The invention aims to provide a method for detecting the phase-transition critical point of a complex biological system based on single-cell graph entropy by exploring the dynamic differences among groups of cells at the single-cell level. The method quantitatively characterizes the stability and criticality of the gene regulatory network across cell populations. It is a novel way of analyzing single-cell transcriptome data that helps track the dynamic development of a biological system from the perspective of network entropy, and it can identify the critical stage even when the single-cell transcriptome data are sparse.
The purpose of the invention can be achieved by adopting the following technical scheme:
A method for detecting the phase-transition critical point of a complex biological system based on single-cell graph entropy, comprising the following steps (step S1 consists of sub-steps S11 to S14):
S11. For the normalized gene expression matrix, take an arbitrary gene pair $(g_i, g_j)$ and draw a scatter plot in a planar rectangular coordinate system whose horizontal and vertical axes represent the expression values of the two genes. Each point in the plot represents a cell; for cell $C_k$, the horizontal coordinate $E_i^{(k)}$ is the expression value of gene $g_i$ in cell $C_k$, and the vertical coordinate $E_j^{(k)}$ is the expression value of gene $g_j$ in cell $C_k$;
S12. In the scatter plot of gene pair $(g_i, g_j)$, for cell $C_k$, draw a box around each of the gene expression values $E_i^{(k)}$ and $E_j^{(k)}$ based on the two preset parameters $n^{(k)}(E_i) = 0.1N$ and $n^{(k)}(E_j) = 0.1N$ ($N$ is the number of cells in the data matrix), where $n^{(k)}(E_i)$ is the number of cells near $E_i^{(k)}$ and $n^{(k)}(E_j)$ is the number of cells near $E_j^{(k)}$;
S13. Denote the number of cells in the overlap of the two boxes by $n^{(k)}(E_i, E_j)$;
S14. Based on the three statistics $n^{(k)}(E_i)$, $n^{(k)}(E_j)$ and $n^{(k)}(E_i, E_j)$, construct the statistical relevance index $\rho_{ij}^{(k)}$, defined as follows:
$$\rho_{ij}^{(k)} = \frac{n^{(k)}(E_i, E_j)}{N} - \frac{n^{(k)}(E_i)}{N} \cdot \frac{n^{(k)}(E_j)}{N} \qquad \text{(A1)}$$
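Steps S11 to S14 can be sketched for one gene pair and one cell as follows. This is a hedged illustration rather than the patent's implementation: it assumes the index takes the form $n^{(k)}(E_i, E_j)/N - n^{(k)}(E_i)\,n^{(k)}(E_j)/N^2$, and it assumes that each "box" contains the $0.1N$ cells whose expression of the gene is closest to that of cell $C_k$ (cell $C_k$ included). The function names and the tie-breaking rule are hypothetical.

```python
def neighborhood(values, k, size):
    """Indices of the `size` cells whose expression is closest to that of cell k.

    Cell k itself always ranks first; remaining ties are broken by cell index.
    """
    order = sorted(range(len(values)),
                   key=lambda c: (abs(values[c] - values[k]), c != k, c))
    return set(order[:size])


def relevance_index(expr_i, expr_j, k, fraction=0.1):
    """Statistical relevance index of a gene pair in cell k (assumed form, see lead-in).

    expr_i, expr_j: expression of genes g_i and g_j across all N cells.
    """
    n_cells = len(expr_i)
    size = max(1, round(fraction * n_cells))      # preset box size, 0.1 * N
    box_i = neighborhood(expr_i, k, size)         # cells near E_i^(k)
    box_j = neighborhood(expr_j, k, size)         # cells near E_j^(k)
    n_i, n_j = len(box_i), len(box_j)
    n_ij = len(box_i & box_j)                     # cells in the overlap of the two boxes
    return n_ij / n_cells - (n_i / n_cells) * (n_j / n_cells)
```

A positive value means the two boxes overlap more than would be expected if the two genes were independent in the neighborhood of cell $C_k$.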
S2. Construct a specific network for each cell. The specific network of cell $C_k$ is built from the statistical relevance index $\rho_{ij}^{(k)}$: if $\rho_{ij}^{(k)}$ is greater than 0, i.e., formula (A1) is greater than 0, there is an edge between genes $g_i$ and $g_j$ in cell $C_k$; otherwise there is no edge. The index thus determines whether any two genes are connected in cell $C_k$, and once all gene pairs have been traversed, the specific network of cell $C_k$ is obtained. Each local network, or subnetwork, is then extracted from the cell-specific network. Every gene has a corresponding local network formed by a central node gene and the first-order neighbors of that gene, so the specific network of cell $C_k$ is partitioned into $M$ local networks, where $M$ is the number of genes in the data matrix;
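Step S2 can be sketched as thresholding a matrix of relevance indices at 0 and reading off, for each gene, its first-order neighbors. The sketch assumes the indices for one cell are already available as a symmetric $M \times M$ list of lists; the function names are hypothetical.

```python
def cell_specific_network(rho):
    """Adjacency sets of one cell's network: an edge i-j exists when rho[i][j] > 0."""
    m = len(rho)
    return {i: {j for j in range(m) if j != i and rho[i][j] > 0} for i in range(m)}


def local_networks(adjacency):
    """One local network per gene: central node -> sorted first-order neighbors."""
    return {center: sorted(neighbors) for center, neighbors in adjacency.items()}


if __name__ == "__main__":
    # Toy 3-gene example with made-up relevance indices.
    rho = [[0.0, 0.05, -0.01],
           [0.05, 0.0, 0.02],
           [-0.01, 0.02, 0.0]]
    print(local_networks(cell_specific_network(rho)))
```

The number of local networks always equals the number of genes $M$, although a gene with no positive index has an empty neighbor list.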
S3. Calculate a local graph entropy value for each local network. After the specific network of cell $C_k$ has been partitioned into local networks, take the local network whose central node is gene $g_i$; its local graph entropy is computed according to formulas (A2) and (A3). In these formulas, the statistical relevance index $\rho_{ij}^{(k)}$ represents the weight coefficient between the central node gene $g_i$ and its first-order neighbor gene $g_j$, $E_j^{(k)}$ represents the expression value of the first-order neighbor gene $g_j$ in cell $C_k$, and the constant $S$ represents the number of first-order neighbor genes in the local network. According to formula (A2), the expression value of each gene $g_i$ in cell $C_k$ can be converted into a local graph entropy value, so that the sparse gene expression matrix of the single-cell transcriptome data is converted, one entry to one entry, into a non-sparse local graph entropy matrix.
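The formula images for the local graph entropy did not survive in this text, so the following is only one plausible reading of the description. It assumes that each first-order neighbor receives a probability proportional to its relevance index times its expression value, and that the Shannon entropy of these probabilities is normalized by $\log S$. The normalizing constant and the function name are assumptions.

```python
import math


def local_graph_entropy(weights, neighbor_expr):
    """Local graph entropy of one local network (assumed form, see lead-in).

    weights[j]: relevance index between the central gene and its j-th neighbor (> 0).
    neighbor_expr[j]: expression value of the j-th neighbor in the cell.
    """
    s = len(weights)                              # number of first-order neighbors
    mass = [w * e for w, e in zip(weights, neighbor_expr)]
    total = sum(mass)
    if s < 2 or total <= 0:
        return 0.0                                # entropy undefined or trivially zero
    probs = [m / total for m in mass if m > 0]    # zero-mass neighbors contribute 0
    return -sum(p * math.log(p) for p in probs) / math.log(s)
```

Under this reading the value lies in $[0, 1]$: it is 1 when all neighbors carry equal weighted expression and 0 when a single neighbor carries all of it. This is consistent with the claim that a gene with zero expression can still receive a non-zero entropy value from its neighbors.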
S4. Calculate the graph entropy of a single cell from the group of genes with the largest local graph entropy values. For cell $C_k$, the graph entropy $H^{(k)}$ is defined by formula (A4), in which the constant $T$ is a tunable parameter set to the number of genes in the top 5% by local graph entropy value.
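Step S4 can be sketched as selecting the top 5% of local graph entropy values in a cell and aggregating them. Because formula (A4) is not reproduced in this text, the aggregation used here, a plain sum over the $T$ selected genes, is an assumption; an average would differ only by the constant factor $T$. The function name and the rounding of $T$ are also hypothetical.

```python
def cell_graph_entropy(local_entropies, top_fraction=0.05):
    """Graph entropy of one cell from its per-gene local graph entropies.

    Returns (aggregated entropy, T), where T is the number of top-ranked genes used.
    """
    t = max(1, round(top_fraction * len(local_entropies)))  # T = top 5% of genes
    top = sorted(local_entropies, reverse=True)[:t]
    return sum(top), t                                       # assumed aggregation: sum
```

For example, with 40 genes the sketch keeps the two genes with the largest local graph entropy and returns their sum together with `T = 2`.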
S5. Calculate the mean graph entropy $H$ of the cell population as follows:
$$H = \frac{1}{Q} \sum_{k=1}^{Q} H^{(k)}$$
where $Q$ represents the number of cells in the cell population. The early warning signal of the phase-transition critical point of the complex biological system is detected based on the mean graph entropy $H$.
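Step S5 can be sketched as averaging the per-cell graph entropies at each time point and looking for the sharpest rise between consecutive time points. The rule used to flag a time point (largest increase of the mean) is a simplifying assumption for illustration; the source reports statistical significance of the change but does not state which test is used. Function names and the toy data are hypothetical.

```python
from statistics import mean


def mean_graph_entropy(cell_entropies):
    """Mean graph entropy H of a cell population: (1/Q) * sum of per-cell values."""
    return mean(cell_entropies)


def largest_increase(series):
    """series: list of (time label, per-cell graph entropies) in temporal order.

    Returns (label, increase) for the time point whose mean graph entropy rises
    most relative to the preceding time point.
    """
    means = [(label, mean_graph_entropy(values)) for label, values in series]
    best_label, best_jump = None, float("-inf")
    for (_, previous), (label, current) in zip(means, means[1:]):
        if current - previous > best_jump:
            best_label, best_jump = label, current - previous
    return best_label, best_jump


if __name__ == "__main__":
    toy = [("t0", [1.0, 1.2]), ("t1", [1.1, 1.1]), ("t2", [2.0, 2.4]), ("t3", [2.1, 2.3])]
    print(largest_increase(toy))
```

In practice the flagged increase would still need a significance test against the preceding time point before being treated as an early warning signal.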
Further, step S1 checks whether the statistical relevance index $\rho_{ij}^{(k)}$ is greater than the threshold 0: if $\rho_{ij}^{(k)} > 0$, there is an edge between genes $g_i$ and $g_j$; otherwise there is no edge.
Further, constructing a cell-specific network in step S2 makes it possible to analyze the network characteristics of different cells.
Further, in step S3, formula (A2) converts the sparse gene expression matrix (which has high noise) into the non-sparse local graph entropy matrix (which has low noise), thereby achieving a noise reduction effect.
Further, in step S4 the tunable parameter $T$ is set to the number of genes in the top 5% by local graph entropy value, which can improve the accuracy of the result and reduce the complexity of calculation and analysis.
Further, in step S5 a sudden and rapid increase of the mean graph entropy $H$ of the cell population indicates an upcoming critical transition, i.e., the occurrence of the phase-transition critical point of the complex biological system.
Compared with the prior art, the invention has the following advantages and effects:
The invention provides a calculation method based on single-cell graph entropy for identifying critical transitions of complex biological systems, and the method is validated on real datasets. Notably, the invention aims to detect the early warning signals produced at the critical stage, rather than to find evidence that the qualitatively changed post-transition stage has already been reached.
1. Traditional differential expression analysis can only determine whether a biological system is in the pre-transition stage or the post-transition stage; it cannot effectively detect the critical state at the end of the pre-transition stage, i.e., the critical period of the transition. The method of the invention, by contrast, can accurately reflect the critical stage in the development of a complex biological system;
2. Single-cell transcriptome data in existing single-cell analysis are sparse, noisy and heterogeneous, so critical-point signals are not obvious; the method of the invention overcomes this defect;
3. Analysis at the level of cell networks characterizes the key period of a critical transition in a biological system more reliably than analysis based on the gene expression levels of single cells;
4. The method of the invention is model-free, which means that there is neither feature selection nor a model or parameter training process. It therefore differs from traditional machine learning or classification methods, which build a model through a learning process and require a large number of samples to avoid overfitting;
5. The method opens a new way to predict the critical period of a critical transition in a complex biological system at the single-cell level, and helps track the dynamic development of a biological system and its key molecular mechanisms at the single-cell level.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a schematic diagram of the detection of critical phases based on graph entropy algorithm disclosed in the present invention;
FIG. 2(A) is a schematic diagram showing the identification of the critical point at which mouse embryonic fibroblasts are induced to differentiate into neural cells in the present invention;
FIG. 2(B) is a schematic diagram showing the identification of the critical point of differentiation of neural progenitor cells into neural cells in the present invention;
FIG. 2(C) is a schematic diagram showing the identification of the critical point of differentiation of human embryonic stem cells into endoderm cells in the present invention;
FIG. 2(D) is a schematic diagram showing the identification of the critical points of the differentiation of mouse hepatoblasts into hepatocytes and cholangiocytes in the present invention;
FIG. 2(E) is a schematic diagram showing the identification of the critical point of the differentiation of mouse embryonic stem cells into mesodermal progenitors in the present invention;
FIG. 2(F) is a schematic diagram of signal-gene clustering for the data on mouse embryonic fibroblasts induced to differentiate into neural cells in the present invention;
FIG. 2(G) is a schematic diagram of signal-gene clustering for the data on differentiation of neural progenitor cells into neural cells in the present invention;
FIG. 2(H) is a schematic diagram of signal-gene clustering for the data on differentiation of human embryonic stem cells into endoderm cells in the present invention;
FIG. 2(I) is a schematic diagram of signal-gene clustering for the data on differentiation of mouse hepatoblasts into hepatocytes and cholangiocytes in the present invention;
FIG. 2(J) is a schematic diagram of signal-gene clustering for the data on differentiation of mouse embryonic stem cells into mesodermal progenitor cells in the present invention;
FIG. 3(A) is a schematic diagram of the "dark genes" in the data on mouse embryonic fibroblasts induced to differentiate into neural cells in the present invention;
FIG. 3(B) is a schematic diagram of the "dark genes" in the data on differentiation of mouse hepatoblasts into hepatocytes and cholangiocytes in the present invention;
FIG. 3(C) is a schematic diagram of the "dark genes" in the data on differentiation of human embryonic stem cells into endoderm cells in the present invention;
FIG. 4(A) is a schematic diagram of the signaling pathways in which the first-order differential genes are mainly enriched, for the data on differentiation of human embryonic stem cells into endoderm cells in the present invention;
FIG. 4(B) is a schematic diagram of the signaling pathways in which the "dark genes" are mainly enriched, for the data on differentiation of human embryonic stem cells into endoderm cells in the present invention;
FIG. 4(C) is a schematic diagram of the regulatory relationship between the "dark genes" and the differential first-order neighbor genes in the MAPK and PI3K/Akt signaling pathways, for the data on differentiation of human embryonic stem cells into endoderm cells in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Examples
As shown in FIG. 1, this embodiment of the present invention discloses a method for detecting the phase-transition critical point of a complex biological system based on single-cell graph entropy. Following the schematic flow disclosed in FIG. 1, the results obtained in this example are as follows:
1. Early warning of the "cell fate decision" key period in early embryonic development based on the graph entropy algorithm
The graph entropy algorithm was applied to five single-cell transcriptome datasets of early embryonic development to detect early-warning signals of the critical period of "cell fate decision" during early embryonic development. Specifically, the graph entropy of each cell is calculated according to the steps of the graph entropy algorithm. The mean graph entropy H of the cell population at each time point is then calculated, and the early-warning signal of the critical period of "cell fate decision" is detected from the mean graph entropy H at each time point. For the data on mouse embryonic fibroblasts induced to differentiate into neural cells, as shown in Fig. 2(A), the mean graph entropy H increased rapidly from day 5 to day 20, and the change was statistically significant (P = 0.0168). This significant change in H gives early warning of a critical period of "cell fate decision" after day 20; indeed, the induced differentiation of mouse embryonic fibroblasts into neurons occurs at day 22. For the data on the differentiation of neural progenitor cells into neural cells, as shown in Fig. 2(B), a significant change in the mean graph entropy H (P = 0.0362) occurred on day 1, indicating that a critical period of "cell fate decision" is imminent after day 1. This early-warning signal is consistent with observations in the original experiment, which showed that cell heterogeneity was lowest at day 1, began to increase after day 1, and that neuronal heterogeneity was highest at day 30. For the data on the differentiation of human embryonic stem cells into endoderm cells, Fig. 2(C) shows that a significant change in the mean graph entropy H (P = 0.0196) occurred at 36 hours, giving early warning that the critical period of "cell fate decision" occurs after 36 hours; indeed, the differentiation of human embryonic stem cells into endoderm cells occurs at 72 hours. As shown in Fig. 2(D), for the data on the differentiation of mouse hepatoblasts into hepatocytes and cholangiocytes, a significant change in the mean graph entropy H (P = 7.3076E-05) occurred at embryonic day E12.5, and hepatoblasts differentiated into hepatocytes and cholangiocytes after E12.5. For the data on the differentiation of mouse embryonic stem cells into mesoderm progenitors, Fig. 2(E) shows that the mean graph entropy H changed significantly at 24 hours (P = 0.0288), giving early warning that the critical period of "cell fate decision" occurs after 24 hours. In fact, pluripotent stem cells differentiate into endoderm at around 48 hours. Therefore, the graph entropy algorithm can successfully detect the early-warning signal of the critical period of "cell fate decision" during early embryonic development.
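The per-time-point averaging described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names and toy values are hypothetical, and flagging the largest consecutive increase is a simplification standing in for the significance test (the text reports P values but does not name the test used).

```python
from statistics import mean

def mean_entropy_by_time(cell_entropy, cell_time):
    """Average the per-cell graph entropy within each time point.

    cell_entropy: per-cell graph entropy values (hypothetical input)
    cell_time:    time-point label of each cell, same order
    Returns a dict {time_point: mean graph entropy H}, sorted by time.
    """
    groups = {}
    for h, t in zip(cell_entropy, cell_time):
        groups.setdefault(t, []).append(h)
    return {t: mean(values) for t, values in sorted(groups.items())}

def largest_increase(series):
    """Return the time point reached by the largest consecutive rise in H.

    Assumption: a plain difference of means replaces the (unnamed)
    statistical test used in the source.
    """
    times = list(series)
    jumps = {
        times[i + 1]: series[times[i + 1]] - series[times[i]]
        for i in range(len(times) - 1)
    }
    return max(jumps, key=jumps.get)

# Toy usage with made-up entropies for six cells at three time points.
series = mean_entropy_by_time([1.0, 1.2, 1.1, 1.3, 3.0, 3.2],
                              [0, 0, 12, 12, 24, 24])
print(series, largest_increase(series))
```

In practice the difference of means would be replaced by whichever two-sample test the analysis calls for.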
In addition, to test how well the local graph entropy of signal genes performs in cell clustering, for each single-cell dataset the top 5% of genes with the largest local graph entropy at the critical point were selected as signal genes, and t-distributed stochastic neighbor embedding (t-SNE) dimension-reduction analysis and visualization were then performed on the selected signal genes based on their local graph entropy. The clustering results are shown in Figs. 2(F)-2(J): for all five single-cell datasets of early embryonic development, clustering based on the local graph entropy of signal genes distinguishes the states of cells at different stages or time points, i.e., cells from different time points are grouped into different clusters, while cells from the same time point are grouped into one cluster. The local graph entropy of signal genes therefore performs well in cell clustering; that is, cluster analysis based on it can accurately resolve the heterogeneity of cells over time at single-cell resolution. The graph entropy algorithm thus converts the sparse original gene expression matrix into a non-sparse local graph entropy matrix, which can be used not only to detect the critical period of "cell fate decision" during early embryonic development, but also as an entropy matrix for time-point clustering analysis of cells and for exploring the dynamic information of cell populations.
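The signal-gene selection that precedes the t-SNE step can be sketched as below. The function names are hypothetical, and ranking genes by their mean local graph entropy over the critical-point cells is an assumption; the text only says that the top 5% of genes with the largest local graph entropy at the critical point are kept.

```python
def top_signal_genes(local_entropy, critical_cells, frac=0.05):
    """Pick the genes with the largest local graph entropy at the critical point.

    local_entropy[g][c]: local graph entropy of gene g in cell c
    critical_cells:      indices of the cells sampled at the critical point
    frac:                fraction of genes to keep (5% in the text)
    Assumption: genes are ranked by their mean over the critical-point cells.
    """
    score = {
        g: sum(row[c] for c in critical_cells) / len(critical_cells)
        for g, row in enumerate(local_entropy)
    }
    n_keep = max(1, round(frac * len(score)))
    return sorted(score, key=score.get, reverse=True)[:n_keep]

def signal_matrix(local_entropy, genes):
    """Build a cells x signal-genes matrix of local graph entropy values."""
    n_cells = len(local_entropy[0])
    return [[local_entropy[g][c] for g in genes] for c in range(n_cells)]

# Toy usage: 20 genes, 3 cells, cell 2 taken as the critical point.
toy = [[0.1, 0.1, 0.1] for _ in range(20)]
toy[7] = [0.1, 0.2, 0.9]
genes = top_signal_genes(toy, critical_cells=[2])
features = signal_matrix(toy, genes)
print(genes, features)
```

The resulting cells-by-genes matrix is the kind of input that could then be passed to a t-SNE implementation such as scikit-learn's `sklearn.manifold.TSNE` for visualization.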
2. Mining "dark genes" based on the graph entropy algorithm
In the biomedical field, differential expression analysis plays a very important role in the search for novel biomarkers, regulatory factors, drug targets and the like, but traditional differential expression analysis methods often ignore non-differentially expressed genes. In fact, some non-differentially expressed genes are also involved in important biological processes: they are concentrated in important functional pathways and play important roles in embryonic development, so they should not be ignored. While analyzing the single-cell transcriptome datasets, we found that some non-differentially expressed genes are highly sensitive in terms of their local graph entropy values, although their expression values show no significant difference. We call such genes "dark genes"; they may play an important role in embryonic development. Following the earlier definition of "dark genes", we define the conditions under which a gene is judged to be a "dark gene" as follows: (i) its gene expression level shows no statistically significant difference; (ii) its local graph entropy value shows a statistically significant difference between the critical point and non-critical points. To find "dark genes" closely related to the embryonic development process, for each single-cell transcriptome dataset we selected the top 5% of genes with the largest local graph entropy at the critical point and then screened the selected genes against the "dark gene" conditions. Figs. 3(A)-3(C) show some "dark genes" from the data on mouse embryonic fibroblasts induced to differentiate into neural cells, the data on mouse hepatoblasts differentiating into hepatocytes and cholangiocytes, and the data on human embryonic stem cells differentiating into endoderm cells, respectively. As shown in Figs. 3(A)-3(C), each "dark gene" did not change significantly at the gene expression level but did change significantly at the local graph entropy (network entropy) level. Although these genes do not change significantly at the gene expression level, they are likely to play an important role in the critical transitions of embryonic development.
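The two judgment conditions (i) and (ii) can be illustrated with the sketch below. It is not the authors' procedure: the function names, the 0.05 significance level, and the use of a two-sample permutation test on the difference of means are all assumptions, since the text does not state which statistical test was used.

```python
import random
from statistics import mean

def perm_pvalue(a, b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in means.

    Assumption: a permutation test stands in for the unnamed test in the text.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)

def is_dark_gene(expr_crit, expr_non, ent_crit, ent_non, alpha=0.05):
    """Condition (i): expression not significantly different;
    condition (ii): local graph entropy significantly different,
    each comparing critical-point cells with non-critical-point cells."""
    expression_unchanged = perm_pvalue(expr_crit, expr_non) >= alpha
    entropy_changed = perm_pvalue(ent_crit, ent_non) < alpha
    return expression_unchanged and entropy_changed

# Toy usage with made-up values for one gene.
expr_crit = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95]
expr_non = [1.0, 0.9, 1.1, 1.05, 0.95, 1.0]
ent_crit = [0.90, 0.92, 0.88, 0.91, 0.90, 0.93]
ent_non = [0.10, 0.12, 0.08, 0.11, 0.10, 0.13]
print(is_dark_gene(expr_crit, expr_non, ent_crit, ent_non))
```

A gene passes only when both conditions hold, which is what separates "dark genes" from ordinary differentially expressed genes.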
As shown in Figs. 4(A)-4(B), KEGG pathway enrichment analysis was performed on the "dark genes" and their differentially expressed first-order neighbor genes in the data on human embryonic stem cell differentiation into endoderm cells. These two groups of genes are mainly enriched in pathways closely related to embryonic development. For example, the MAPK signaling pathway plays a key role in cell proliferation and differentiation, the PI3K/Akt signaling pathway regulates the proliferation and differentiation of various cell types, and the p53 signaling pathway regulates embryonic stem cell differentiation. To illustrate the potential regulatory relationships between the "dark genes" and their differential first-order neighbor genes on the PPI network, we show the regulatory relationships of these two groups of genes on the MAPK and PI3K/Akt signaling pathways (Fig. 4(C)). As can be seen from Fig. 4(C), "dark genes" such as IGF1 (a growth factor, GF) and LAMC2 and COL4A1 (extracellular matrix, ECM) are upstream regulators that promote cell proliferation and differentiation by activating downstream molecules (differential first-order neighbor genes), and they act as driving factors during cell differentiation. Although the expression levels of the "dark genes" change little during the critical transition, the expression of some of their downstream molecules can change significantly, triggering cell proliferation and differentiation. The MAPK and PI3K/Akt signaling pathways contain signaling chains that play an important role in cell proliferation and differentiation. For example, in the MAPK signaling pathway, the dark gene MAPK9 is a key gene of the c-Jun N-terminal kinase (JNK) signaling pathway and can induce cell proliferation and differentiation in response to various signals. As shown in Fig. 4(C), up-regulation of molecules such as IGF1R, SOS1, and SOS2 activates Ras and further activates RPS6KA3, which may lead to mitotic effects and thereby promote cell proliferation and division. In addition, in the PI3K/Akt signaling pathway, extracellular matrix genes such as LAMC2, COL4A1 and TNC were found to activate the downstream molecule ITGB1 and, together with the down-regulated gene PPP2R5C, to participate in activating the PI3K and AKT signaling molecules, so that the expression of the downstream molecules BRCA1 and CHUK is down-regulated; this may promote G2/M mitosis and thus further promote cell proliferation and differentiation. The signals by which the growth factor (GF) gene IGF1 and the extracellular matrix (ECM) genes LAMC2 and COL4A1 initiate cell proliferation and differentiation on the MAPK and PI3K/Akt signaling pathways occurred at approximately 36 hours, consistent with the time point reported in the original literature (Chu et al., 2016) for the critical period of "cell fate decision" in the differentiation of pluripotent embryonic stem cells into endoderm. Therefore, the graph entropy algorithm helps to find "dark genes" that show no difference at the gene expression level but are sensitive to SGE and that play a key role in early embryonic development.
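Pathway enrichment of the kind reported above is commonly scored with a one-sided hypergeometric test. The sketch below shows that calculation only; it is an assumption that a test of this form underlies the KEGG analysis, since the text does not name the tool or statistic, and the function name and gene counts are hypothetical.

```python
from math import comb

def enrichment_pvalue(n_background, n_pathway, n_selected, n_overlap):
    """Probability of seeing at least n_overlap pathway genes among
    n_selected genes drawn without replacement from n_background genes,
    of which n_pathway belong to the pathway (hypergeometric upper tail)."""
    total = comb(n_background, n_selected)
    upper = min(n_pathway, n_selected)
    tail = sum(
        comb(n_pathway, x) * comb(n_background - n_pathway, n_selected - x)
        for x in range(n_overlap, upper + 1)
    )
    return tail / total

# Toy usage: 10 background genes, 5 in the pathway, 5 selected, all 5 overlap.
print(enrichment_pvalue(10, 5, 5, 5))
```

A small p-value indicates that the selected gene set overlaps the pathway more than chance would suggest; multiple-testing correction across pathways would normally follow.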
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.
Claims (4)
1. A method for detecting a phase transition critical point of a complex biological system based on single-cell graph entropy, characterized by comprising the following steps:
S11, for the normalized gene expression matrix, arbitrarily selecting a gene pair $(g_i, g_j)$ and drawing a scatter diagram in a planar rectangular coordinate system, wherein the horizontal axis and the vertical axis of the scatter diagram represent the expression values of the two genes respectively, and each point in the scatter diagram represents a cell; for a cell $C_k$, the horizontal coordinate $E_i^{(k)}$ is the expression value of gene $g_i$ in cell $C_k$, and the vertical coordinate $E_j^{(k)}$ is the expression value of gene $g_j$ in cell $C_k$;
S12, in the scatter diagram of the gene pair $(g_i, g_j)$, for cell $C_k$, based on two preset parameters $n^{(k)}(E_i) = 0.1N$ and $n^{(k)}(E_j) = 0.1N$, drawing a strip-shaped box near each of the gene expression values $E_i^{(k)}$ and $E_j^{(k)}$, wherein $n^{(k)}(E_i)$ represents the number of cells near $E_i^{(k)}$, $n^{(k)}(E_j)$ represents the number of cells near $E_j^{(k)}$, and $N$ represents the number of cells in the data matrix;
S13, denoting the number of cells in the overlapping part of the two boxes as $n^{(k)}(E_i, E_j)$;
S14, based on the three statistics $n^{(k)}(E_i)$, $n^{(k)}(E_j)$ and $n^{(k)}(E_i, E_j)$, constructing a statistical relevance index $\rho_{ij}^{(k)}$, defined by formula (A1);
S2, constructing a specific network for each cell: the specific network of cell $C_k$ is constructed based on the statistical relevance index $\rho_{ij}^{(k)}$; if the statistical relevance index $\rho_{ij}^{(k)}$ is greater than 0, i.e., formula (A1) is greater than 0, there is a connecting edge between genes $g_i$ and $g_j$ in cell $C_k$, and otherwise there is no connecting edge; whether a connecting edge exists between any two genes in cell $C_k$ is determined by the statistical relevance index $\rho_{ij}^{(k)}$, and after all gene pairs have been traversed, the specific network $N^{(k)}$ of cell $C_k$ is constructed; each local network (subnetwork) is then extracted from the cell-specific network, each gene having a corresponding local network formed by a center-node gene and the first-order neighbors of the center-node gene, so that the specific network $N^{(k)}$ of cell $C_k$ is divided into $M$ local networks, where $M$ represents the number of genes in the data matrix;
S3, calculating a local graph entropy value for each local network: after the specific network $N^{(k)}$ of cell $C_k$ has been divided into local networks, for the local network $N_i^{(k)}$ whose center node is gene $g_i$, the local graph entropy of the local network $N_i^{(k)}$ is calculated according to formula (A2) and its accompanying definition; in the formula, the statistical relevance index $\rho_{ij}^{(k)}$ represents the weight coefficient between the center-node gene $g_i$ and its first-order neighbor gene $g_j$, $E_j^{(k)}$ represents the expression value of the first-order neighbor gene $g_j$ of the center-node gene $g_i$ in cell $C_k$, and the constant $S$ is determined by the local network $N_i^{(k)}$; according to formula (A2), the expression value of each gene $g_i$ in cell $C_k$ can be converted into a local graph entropy value, so that the sparse gene expression matrix of the single-cell transcriptome data is converted, in a one-to-one manner, into a non-sparse local graph entropy matrix;
S4, calculating the graph entropy of a single cell based on the group of genes with the largest local graph entropy: for cell $C_k$, the graph entropy $H^{(k)}$ of cell $C_k$ is defined by formula (A4), where the constant $T$ is a tunable parameter set to the number of genes in the top 5% with the largest local graph entropy, and $H^{(k)}$ in formula (A4) represents the graph entropy of cell $C_k$;
S5, calculating the mean graph entropy $H$ of the cell population as follows:
$$H = \frac{1}{Q}\sum_{k=1}^{Q} H^{(k)}$$
wherein $Q$ represents the number of cells in the cell population, and the early-warning signal of the phase transition critical point of the complex biological system is detected based on the mean graph entropy $H$.
2. The method for detecting a phase transition critical point of a complex biological system based on single-cell graph entropy as claimed in claim 1, wherein the existence of a connecting edge is determined by whether the statistical relevance index $\rho_{ij}^{(k)}$ is greater than the threshold 0: if $\rho_{ij}^{(k)} > 0$, there is a connecting edge between genes $g_i$ and $g_j$; otherwise there is no connecting edge.
3. The method for detecting a phase transition critical point of a complex biological system based on single-cell graph entropy as claimed in claim 1, wherein a sudden and rapid increase of the mean graph entropy $H$ of the cell population indicates an upcoming critical transition or the occurrence of the phase transition critical point of the complex biological system.
4. The method for detecting a phase transition critical point of a complex biological system based on single-cell graph entropy as claimed in claim 1, wherein the number of genes in the top 5% with the largest local graph entropy value is taken as the tunable parameter $T$ in step S4, so that the accuracy of the calculation result can be improved and the complexity of the computational analysis can be reduced.
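Steps S11 to S5 of claim 1 can be sketched end to end as below. The formula images (A1) to (A4) are not reproduced in this text, so the sketch assumes the following forms, none of which is confirmed by the source: a cell-specific-network index ρ = n(E_i,E_j)/N − n(E_i)·n(E_j)/N², a normalized Shannon entropy over neighbor weights p_j ∝ ρ_ij·E_j with S taken as the number of first-order neighbors, and a cell entropy equal to the sum of the top-T local entropies ranked within each cell. All function names are hypothetical.

```python
import math

def box(values, k, frac=0.1):
    """S12: the frac*N cells whose expression is closest to that of cell k."""
    n = max(1, round(frac * len(values)))
    order = sorted(range(len(values)),
                   key=lambda c: (abs(values[c] - values[k]), c != k))
    return set(order[:n])

def relevance_index(expr, k, frac=0.1):
    """S13-S14 (assumed form of A1): rho[i][j] for cell k.

    expr[g][c] is the normalized expression of gene g in cell c.
    """
    m, n_cells = len(expr), len(expr[0])
    boxes = [box(expr[g], k, frac) for g in range(m)]
    rho = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            n_ij = len(boxes[i] & boxes[j])
            r = n_ij / n_cells - len(boxes[i]) * len(boxes[j]) / n_cells ** 2
            rho[i][j] = rho[j][i] = r
    return rho

def local_entropy(rho, expr, k, i):
    """S2-S3 (assumed form of A2/A3): entropy of the local network of gene i."""
    neighbors = [j for j in range(len(rho)) if j != i and rho[i][j] > 0]
    weights = [rho[i][j] * expr[j][k] for j in neighbors]
    total = sum(weights)
    if len(neighbors) < 2 or total <= 0:
        return 0.0
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log(p) for p in probs) / math.log(len(neighbors))

def cell_entropy(expr, k, top_frac=0.05, frac=0.1):
    """S4 (assumed form of A4): sum of the T largest local entropies."""
    rho = relevance_index(expr, k, frac)
    local = sorted((local_entropy(rho, expr, k, i) for i in range(len(expr))),
                   reverse=True)
    t = max(1, round(top_frac * len(local)))
    return sum(local[:t])

def mean_entropy(expr, cells, **kwargs):
    """S5: mean graph entropy H over the Q cells of a population."""
    cells = list(cells)
    return sum(cell_entropy(expr, k, **kwargs) for k in cells) / len(cells)

# Toy usage: 4 genes x 10 cells with constant expression.
toy = [[1.0] * 10 for _ in range(4)]
print(mean_entropy(toy, range(10)))
```

The pairwise loop is quadratic in the number of genes for every cell, so a real analysis would need vectorized or sparse computation; the sketch favors readability over speed.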
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210627839.2A CN115083524A (en) | 2022-06-06 | 2022-06-06 | Method for detecting phase change critical point of complex biological system based on single cell diagram entropy |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210627839.2A CN115083524A (en) | 2022-06-06 | 2022-06-06 | Method for detecting phase change critical point of complex biological system based on single cell diagram entropy |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115083524A (en) | 2022-09-20 |
Family
ID=83248975
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210627839.2A Pending CN115083524A (en) | 2022-06-06 | 2022-06-06 | Method for detecting phase change critical point of complex biological system based on single cell diagram entropy |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115083524A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111009292A (en) * | 2019-11-20 | 2020-04-14 | 华南理工大学 | Method for detecting phase change critical point of complex biological system based on single sample sKLD index |
CN111261243A (en) * | 2020-01-10 | 2020-06-09 | 华南理工大学 | Method for detecting phase change critical point of complex biological system based on relative entropy index |
Non-Patent Citations (1)
Title |
---|
JIAYUAN ZHONG et al.: "scGET: Predicting Cell Fate Transition During Early Embryonic Development by Single-cell Graph Entropy", GENOMICS PROTEOMICS BIOINFORMATICS, 24 December 2021 (2021-12-24), pages 461 - 474 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |