CN116564419A - Spatial transcriptome feature enrichment difference analysis method and application thereof
- Publication number: CN116564419A
- Application number: CN202310833965.8A
- Authority: CN (China)
- Prior art keywords: transcriptome, enrichment, sample, space, data
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention discloses a spatial transcriptome feature enrichment difference analysis method and its application. The method comprises the following steps: integrating multi-sample spatial transcriptome data, extracting cell type-specific signature gene sets from single-cell transcriptome data, scoring the signature gene sets on the spatial transcriptome, screening spatially enriched spots, and analyzing differences in the proportion of spatially enriched spots. The method can analyze multi-sample, multi-group spatial transcriptome data, perform statistically sound differential analysis at the sample level, improve single-cell resolution, and annotate spatial transcriptome spots with high consistency and reproducibility. It is therefore valuable for studying the spatial distribution of cell subpopulations and for quantifying changes in cell types across disease states or developmental stages.
Description
Technical Field
The invention belongs to the field of biotechnology and relates to a spatial transcriptome feature enrichment difference analysis method and its application.
Background
Spatial transcriptome RNA sequencing is a new technique that combines spatial information with RNA transcript information: the position of each captured transcript on an in situ tissue section is recorded, yielding high-throughput transcriptome data with spatial distribution information. The technique has improved resolution within tissues and is now widely used. However, owing to technical limitations, many commercial spatial transcriptome products cannot achieve single-cell resolution, and each spatial transcriptome spot typically contains a mixture of several cells. Consequently, spots cannot be annotated with cell types in the way cells are annotated in single-cell RNA sequencing, which has long been a major problem in spatial transcriptome data analysis.
Currently, two approaches are commonly used to resolve cell types in spatial transcriptome data. The first is deconvolution, which estimates the proportion of each cell type in every spot, using software such as Seurat and SPOTlight. The second scores each spot with cell type signature genes, using methods such as ssGSEA and AddModuleScore. Both approaches have drawbacks. Deconvolution has low consistency and reproducibility and performs poorly on cell subpopulations with highly similar transcriptomes, so cell types are often missing from the result, which hinders fine-grained study of cell subpopulations with spatial transcriptomes. Signature gene scoring analyzes the gene set of each cell type independently and can therefore be used to study cell subpopulations. However, gene set scoring algorithms lack a threshold criterion, and because integrating multiple samples yields a very large number of spots, differential analysis of gene set enrichment scores at the spot level typically reports strong statistical significance for almost every feature regardless of the effect size at the sample level, which can be misinterpreted as biological significance.
In summary, existing methods for spatial transcriptome cell type analysis suffer from low single-cell resolution, poor performance on cell subpopulations with highly similar transcriptomes, the lack of a threshold criterion, and statistical significance that is easily misinterpreted. Providing a spatial transcriptome feature enrichment difference analysis method that improves single-cell resolution and effectively analyzes cell subpopulations with highly similar transcriptomes has therefore become one of the problems to be solved in the field of biotechnology.
Disclosure of Invention
In view of the shortcomings of the prior art and practical needs, the invention provides a spatial transcriptome feature enrichment difference analysis method and its application. The method performs cell type feature enrichment scoring on multi-sample, multi-group spatial transcriptome data, evaluates the proportion of spatially enriched spots for each sample and feature, and reasonably assesses the sample-level differences of each feature.
To achieve this aim, the invention adopts the following technical solution:
In a first aspect, the invention provides a spatial transcriptome feature enrichment difference analysis method, the method comprising: integrating sample spatial transcriptome data, extracting cell type-specific signature gene sets from single-cell transcriptome data, scoring the signature gene sets on the spatial transcriptome, screening spatially enriched spots, and analyzing differences in the proportion of spatially enriched spots.
The method can analyze multi-sample, multi-group spatial transcriptome data, perform statistically sound differential analysis at the sample level, improve single-cell resolution, and annotate spatial transcriptome spots with high consistency and reproducibility. It is therefore valuable for studying the spatial distribution of cell subpopulations and for quantifying changes in cell types across disease states or developmental stages.
Preferably, integrating the sample spatial transcriptome data comprises:
adding the sample name as a prefix to each spot label, entering the corresponding sample name in the slice image information of the data, entering the provided sample and group names in the sample and group columns of the data's meta.data, and, once this information has been added, integrating the multi-sample data with the built-in merge function of Seurat.
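The labelling step above can be sketched in plain Python. This is only an illustration of the idea, not the Seurat `merge` call itself: the function names `prefix_spots` and `merge_samples`, the underscore separator, and the toy barcodes are all assumptions made for the example.

```python
from typing import Dict, List


def prefix_spots(sample: str, barcodes: List[str]) -> List[str]:
    """Prepend the sample name so spot labels stay unique after merging."""
    return [f"{sample}_{barcode}" for barcode in barcodes]


def merge_samples(samples: Dict[str, List[str]],
                  groups: Dict[str, str]) -> List[dict]:
    """Build one metadata row per spot with 'sample' and 'group' columns."""
    rows = []
    for sample, barcodes in samples.items():
        for spot in prefix_spots(sample, barcodes):
            rows.append({"spot": spot,
                         "sample": sample,
                         "group": groups[sample]})
    return rows


# Toy input: two slices that share a barcode, assigned to groups G1 and G2.
samples = {"S1": ["AAAC-1", "AAAG-1"], "S2": ["AAAC-1"]}
groups = {"S1": "G1", "S2": "G2"}
meta = merge_samples(samples, groups)
```

Without the prefix, the barcode `AAAC-1` would appear twice in the merged table; with it, every spot keeps a unique label and carries the sample and group it came from.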
Preferably, the sample comprises a normal tissue slice and/or a pathological tissue slice.
Preferably, extracting the cell type-specific signature gene set from single-cell transcriptome data comprises:
calculating the differentially expressed genes of each cell subtype with the FindMarkers function, and selecting the significantly highly expressed genes, ranked by avg_log2FC, as the signature gene set of the cell type, wherein a gene is considered significantly highly expressed when avg_log2FC > 0.25 in the target subpopulation and the Wilcoxon rank-sum test comparing its expression in cells of the target subpopulation with that in cells of the other subpopulations gives a p value below 0.05.
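As a minimal sketch of this selection rule, the following Python function filters a marker table by the two stated criteria and ranks the survivors by avg_log2FC. The dictionary layout, the function name `select_signature`, the `top_n` cut-off, and the toy values are assumptions for illustration; the text does not say how many top genes are kept.

```python
def select_signature(markers, top_n=50, min_log2fc=0.25, max_p=0.05):
    """Keep genes passing both criteria, ranked by avg_log2FC (descending).

    `markers` is a list of dicts with keys 'gene', 'avg_log2FC' and 'p_val',
    mimicking the columns of a differential expression table.
    `top_n` is a hypothetical cut-off, not a value given in the text.
    """
    kept = [m for m in markers
            if m["avg_log2FC"] > min_log2fc and m["p_val"] < max_p]
    kept.sort(key=lambda m: m["avg_log2FC"], reverse=True)
    return [m["gene"] for m in kept[:top_n]]


# Toy marker table (made-up numbers, for illustration only).
markers = [
    {"gene": "CD2", "avg_log2FC": 1.4, "p_val": 1e-12},
    {"gene": "CD3E", "avg_log2FC": 2.1, "p_val": 1e-30},
    {"gene": "ACTB", "avg_log2FC": 0.1, "p_val": 1e-5},   # fold change too small
    {"gene": "KRT5", "avg_log2FC": 0.9, "p_val": 0.2},    # not significant
]
signature = select_signature(markers, top_n=2)
```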
Preferably, scoring the signature gene sets on the spatial transcriptome comprises:
scoring using four algorithms: ssGSEA, AUCell, UCell and singscore.
ssGSEA ranks the genes of each spatial transcriptome spot by expression value and computes the enrichment score from the difference between the cumulative distributions, within the spot, of genes that belong to the signature gene set and genes that do not. AUCell ranks the genes of each spot by expression value and uses the area under the curve to evaluate the enrichment score of the signature gene set. UCell is likewise based on the gene expression ranking of each individual spot and uses the Mann-Whitney U statistic to measure the enrichment score of the signature gene set in the spot. singscore, like the three methods above, is based on the per-spot gene expression ranking; it evaluates the enrichment score by how far the signature gene set lies from the center of the ranking.
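To make the rank-based idea concrete, here is a simplified score built on the Mann-Whitney U statistic for a single spot. It is a sketch in the spirit of the U-statistic approach described above and is not the UCell implementation (which, among other things, caps ranks); the function name `rank_score`, the normalization to the range 0 to 1, and the toy expression values are assumptions.

```python
def rank_score(expression: dict, signature: set) -> float:
    """Normalized Mann-Whitney U score of a gene set within one spot.

    Genes are ranked by expression (rank 1 = highest). The score is 1.0
    when all signature genes sit at the top of the ranking and 0.0 when
    they all sit at the bottom. Ties are broken by dictionary order,
    which is a simplification.
    """
    ranked = sorted(expression, key=expression.get, reverse=True)
    rank = {gene: i + 1 for i, gene in enumerate(ranked)}
    present = [g for g in signature if g in rank]
    n, total = len(present), len(ranked)
    if n == 0 or n == total:
        return 0.0  # score is undefined; return a neutral value
    u = sum(rank[g] for g in present) - n * (n + 1) / 2
    return 1.0 - u / (n * (total - n))


# Toy spot with four genes (made-up expression values).
spot = {"CD3E": 9.0, "CD2": 7.0, "KRT5": 3.0, "ACTB": 1.0}
t_cell_score = rank_score(spot, {"CD3E", "CD2"})     # both genes ranked on top
mixed_score = rank_score(spot, {"CD3E", "ACTB"})     # one top, one bottom
```

Because the score depends only on ranks within the spot, it is insensitive to differences in sequencing depth between spots, which is one reason rank-based scores are popular for this kind of data.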
Preferably, the sources of the signature gene sets for the spatial transcriptome comprise known cell markers and/or single-cell transcriptome data of the corresponding tissue.
Preferably, screening the spatially enriched spots comprises:
determining an enrichment threshold according to formula (1): for each feature j, the spots are sorted by their signature gene set score S_j; the spots in the top α fraction with the highest scores are excluded from the threshold determination; the highest score remaining after this exclusion is used to set the enrichment threshold; and spots whose enrichment score is higher than the enrichment threshold are spatially enriched spots,
$\mathrm{thr}_j = K \cdot \max\left(S_j^{(1-\alpha)}\right)$, formula (1);
wherein j is a feature, $\mathrm{thr}_j$ is the enrichment threshold of feature j, $S_j$ is the signature gene set score, α is the fraction of top-scoring spots that is excluded, $S_j^{(1-\alpha)}$ denotes the scores remaining after that exclusion, and K is a defined percentage factor with a value of 0.6 to 0.8.
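The thresholding rule can be sketched as follows, reading the description as: drop the top α fraction of scores, take the highest remaining score, and scale it by K. The function names, the use of floor rounding for the number of excluded spots, and the toy scores are assumptions; the sketch also assumes non-negative scores, so that multiplying by K lowers the threshold.

```python
import math


def enrichment_threshold(scores, alpha, k=0.7):
    """K times the highest score left after dropping the top `alpha` fraction.

    `k` defaults to 0.7, an illustrative value inside the stated
    0.6 to 0.8 range. Rounding the excluded count down is an assumption.
    """
    ordered = sorted(scores, reverse=True)
    n_drop = math.floor(alpha * len(ordered))
    remaining = ordered[n_drop:]
    if not remaining:
        raise ValueError("alpha excludes every spot")
    return k * remaining[0]


def enriched_spots(scores, alpha, k=0.7):
    """Indices of spots whose score is strictly above the threshold."""
    thr = enrichment_threshold(scores, alpha, k)
    return [i for i, s in enumerate(scores) if s > thr]


# Toy scores for one feature: one extreme outlier followed by a smooth tail.
scores = [10.0, 1.0, 0.9, 0.8, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
thr = enrichment_threshold(scores, alpha=0.1)
enriched = enriched_spots(scores, alpha=0.1)
```

In this toy case the outlier score of 10.0 is excluded before the maximum is taken, so the threshold is set relative to 1.0 rather than 10.0. Without the exclusion, a single extreme spot would push the threshold so high that almost nothing else would count as enriched.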
Preferably, analyzing differences in the proportion of spatially enriched spots comprises:
calculating, according to formula (2), the proportion of spatially enriched spots for each feature j in each spatial transcriptome slice sample i, comparing the spatial transcriptome sample groups at the sample level with a t-test or a Wilcoxon test, and taking a p value below 0.05 as statistically significant,
$\mathrm{fraction}_{ij} = N_{ij} / M_i$, formula (2);
wherein j is a feature, i is a spatial transcriptome slice sample, $\mathrm{fraction}_{ij}$ is the proportion of spatially enriched spots for feature j in sample i, $N_{ij}$ is the number of spots in sample i whose score exceeds the threshold $\mathrm{thr}_j$, and $M_i$ is the total number of spots in sample i.
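A small sketch of the sample-level comparison: each slice contributes one fraction, and the groups are then compared on those per-sample values rather than on individual spots. The exact rank-sum test below enumerates all group assignments, which is only practical for small sample counts; in practice a statistics library would normally be used. Function names and the toy fractions are assumptions for illustration.

```python
from itertools import combinations


def enriched_fraction(scores, thr):
    """Fraction of spots in one sample whose score exceeds the threshold."""
    return sum(s > thr for s in scores) / len(scores)


def exact_rank_sum_p(x, y):
    """Two-sided exact p-value of the Wilcoxon rank-sum test (midranks for ties)."""
    pooled = list(x) + list(y)
    ordered = sorted(pooled)
    # Midrank of a value: average of the positions it occupies in sorted order.
    ranks = [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in pooled]
    n = len(x)
    expected = n * (len(pooled) + 1) / 2
    observed_dev = abs(sum(ranks[:n]) - expected)
    total = extreme = 0
    for combo in combinations(range(len(pooled)), n):
        total += 1
        if abs(sum(ranks[i] for i in combo) - expected) >= observed_dev - 1e-12:
            extreme += 1
    return extreme / total


# Toy per-sample fractions for one feature (made-up values, four slices per group).
g1 = [0.30, 0.35, 0.40, 0.45]
g2 = [0.05, 0.10, 0.12, 0.15]
p_value = exact_rank_sum_p(g1, g2)
```

With four samples per group, complete separation is the most extreme outcome possible and gives p = 2/70, just under 0.05. The number of observations entering the test is the number of slices, not the number of spots, which is what keeps the significance from being inflated by large spot counts.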
In a second aspect, the invention provides a spatial transcriptome feature enrichment difference analysis device, which comprises a module for integrating sample spatial transcriptome data, a module for extracting cell type-specific signature gene sets from single-cell transcriptome data, a module for scoring the signature gene sets on the spatial transcriptome, a module for screening spatially enriched spots, and a module for analyzing differences in the proportion of spatially enriched spots;
the module for integrating sample spatial transcriptome data is configured to perform operations comprising:
adding the sample name as a prefix to each spot label, entering the corresponding sample name in the slice image information of the data, entering the provided sample and group names in the sample and group columns of the data's meta.data, and, once this information has been added, integrating the multi-sample data with the built-in merge function of Seurat;
the module for extracting cell type-specific signature gene sets from single-cell transcriptome data is configured to perform operations comprising:
calculating the differentially expressed genes of each cell subtype with the FindMarkers function, and selecting the significantly highly expressed genes, ranked by avg_log2FC, as the signature gene set of the cell type;
a gene is considered significantly highly expressed when avg_log2FC > 0.25 in the target subpopulation and the Wilcoxon rank-sum test comparing its expression in cells of the target subpopulation with that in cells of the other subpopulations gives a p value below 0.05;
the module for scoring the signature gene sets on the spatial transcriptome is configured to perform operations comprising:
scoring using four algorithms: ssGSEA, AUCell, UCell and singscore;
the module for screening spatially enriched spots is configured to perform operations comprising:
determining an enrichment threshold according to formula (1): for each feature j, the spots are sorted by their signature gene set score S_j; the spots in the top α fraction with the highest scores are excluded from the threshold determination; the highest score remaining after this exclusion is used to set the enrichment threshold; and spots whose enrichment score is higher than the enrichment threshold are spatially enriched spots;
$\mathrm{thr}_j = K \cdot \max\left(S_j^{(1-\alpha)}\right)$, formula (1);
wherein j is a feature, $\mathrm{thr}_j$ is the enrichment threshold of feature j, $S_j$ is the signature gene set score, α is the fraction of top-scoring spots that is excluded, $S_j^{(1-\alpha)}$ denotes the scores remaining after that exclusion, and K is a defined percentage factor with a value of 0.6 to 0.8;
the module for analyzing differences in the proportion of spatially enriched spots is configured to perform operations comprising: calculating, according to formula (2), the proportion of spatially enriched spots for each feature j in each spatial transcriptome slice sample i, comparing the spatial transcriptome sample groups at the sample level with a t-test or a Wilcoxon test, and taking a p value below 0.05 as statistically significant;
$\mathrm{fraction}_{ij} = N_{ij} / M_i$, formula (2);
wherein j is a feature, i is a spatial transcriptome slice sample, $\mathrm{fraction}_{ij}$ is the proportion of spatially enriched spots for feature j in sample i, $N_{ij}$ is the number of spots in sample i whose score exceeds the threshold $\mathrm{thr}_j$, and $M_i$ is the total number of spots in sample i.
In a third aspect, the invention provides use of the spatial transcriptome feature enrichment difference analysis device of the second aspect in analyzing spatial transcriptome data.
Preferably, analyzing the spatial transcriptome data comprises analyzing spatial transcriptome feature enrichment differences and/or spatial transcriptome functional enrichment differences.
Compared with the prior art, the invention has the following beneficial effects:
The method can analyze multi-sample, multi-group spatial transcriptome data, perform statistically sound differential analysis at the sample level, improve single-cell resolution, and annotate spatial transcriptome spots with high consistency and reproducibility. It is therefore valuable for studying the spatial distribution of cell subpopulations and for quantifying changes in cell types across disease states or developmental stages.
Drawings
FIG. 1 is a heat map of benchmark correlations among the four scoring methods ssGSEA, AUCell, UCell and singscore;
FIG. 2 shows the cell type signature gene set scores and the spatially enriched spots for inflammatory skin and normal skin spatial transcriptome sections;
FIG. 3 is a plot of the differences in deconvolved cell proportions;
FIG. 4 is a plot of the differences in spatially enriched spot proportions.
Detailed Description
The technical means adopted by the invention and the effects thereof are further described below with reference to the examples and the attached drawings. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof.
Where specific techniques or conditions are not indicated in the examples, they follow those described in the literature of the field or the product specifications. Reagents and instruments for which no manufacturer is indicated are conventional products available through regular commercial channels.
Example 1
The four methods ssGSEA, AUCell, UCell and singscore were benchmarked.
Several mainstream gene set scoring methods were tested in parallel, and the correlations between features and between methods were calculated. As shown in FIG. 1, the four methods ssGSEA, AUCell, UCell and singscore are all very highly correlated (Spearman correlation). ssGSEA performed best on spatial transcriptome data, so the ssGSEA algorithm was chosen as the recommended default method for scoring signature gene sets.
Example 2
Spatial transcriptome feature enrichment difference analysis was performed on spatial transcriptome data from normal skin samples and inflammatory skin samples.
The gene set scores were compared with the distribution of spatially enriched spots. As shown in FIG. 2, G1 is spatial transcriptome data from skin sections in the inflammatory state and G2 is spatial transcriptome data from normal skin sections. The spatially enriched spots of each cell type (black spots, lower row) show a distinct spatial distribution that coincides with the high-scoring spots in the gene set scoring result (upper row), indicating that the algorithm is reasonable. Comparing the G1 and G2 sections for each cell type, the enriched spots of immune cells (including T cells, myeloid immune cells, neutrophils, mast cells and plasmacytoid dendritic cells) are clearly more numerous in the inflammatory state than in normal skin tissue, consistent with the biological expectation that immune cells increase during inflammation. The method can therefore effectively analyze multi-sample, multi-group spatial transcriptome data.
Example 3
The results of the spatial transcriptome feature enrichment difference analysis were compared with those of Seurat deconvolution analysis.
Comparing the differences in deconvolved cell proportions (FIG. 3) with the differences in spatially enriched spot proportions (FIG. 4), the between-group trends agree for endothelial cells (ECs), fibroblasts, keratinocytes, mast cells, melanocytes, myeloid immune cells (MPs), mural cells, neutrophils, plasmacytoid dendritic cells (pDCs), sweat gland cells and T cells. In both analyses the fibroblast proportion is higher in G2 (normal skin), whereas immune cells are more enriched in G1 (inflammatory skin), consistent with the biology of inflammation; this indicates that both analyses can be used to resolve cell types in spatial transcriptome data. In the deconvolution result, however, because a single T cell contains little RNA, the T cell proportion is very low and the enrichment of T cells in the G1 inflammatory state is not reflected, whereas the spatially enriched spot proportion analysis shows that T cells are significantly enriched in G1. In addition, deconvolution places high demands on the quality of spatial transcriptome data: if an unsuitable permeabilization time causes RNA from a particular cell type (such as keratinocytes in skin) to diffuse, cell types present at low proportions are strongly affected and their proportions may not be recovered by deconvolution. The spatially enriched spot proportion analysis treats each cell type independently and is therefore less affected. These results show that the method improves on the original gene set scoring approach for spatial transcriptome cell type analysis, gives results superior to deconvolution analysis, and allows sample-level statistical difference analysis and between-group difference analysis to be performed effectively and reasonably.
In conclusion, the method can analyze multi-sample, multi-group spatial transcriptome data, perform statistically sound differential analysis at the sample level, improve single-cell resolution, and annotate spatial transcriptome spots with high consistency and reproducibility. It is therefore valuable for studying the spatial distribution of cell subpopulations and changes in cell types across disease states or developmental stages.
The applicant states that the detailed method of the invention is illustrated by the above examples, but the invention is not limited to that detailed method; that is, the invention need not rely on the detailed method above to be practiced. It should be apparent to those skilled in the art that any improvement of the invention, equivalent substitution of the raw materials of the product of the invention, addition of auxiliary components, selection of specific modes, and the like, fall within the scope of protection and disclosure of the invention.
Claims (10)
1. A method of spatial transcriptome feature enrichment differential analysis, the method comprising:
integrating the sample space transcriptome data, extracting a cell type specific characteristic gene set from the single cell transcriptome data, scoring the space transcriptome characteristic gene set, screening space enrichment points, and analyzing the space enrichment point duty ratio difference.
2. The method of claim 1, wherein integrating sample spatial transcriptome data comprises:
and adding a sample name before each data spot label, filling a corresponding sample name in a data slice image information frame, filling provided sample and group names in sample and group information columns of data meta.data, and carrying out multi-sample data integration by using a built-in merge function in a SEurat after adding information.
3. The method of claim 1, wherein the method of extracting a set of cell type-specific signature genes from single cell transcriptome data comprises:
calculating differential genes of cell subtypes by using FindMarkers function, and selecting significant high-expression genes ordered by avg_log2FC as a characteristic gene set of the cell type;
the judgment standard of the remarkable high expression is as follows: avg_log2fc > 0.25 in target subpopulations and expression values with Wilcoxon rank sum test significance p value of less than 0.05 in target subpopulation cells and other subpopulations cells.
4. The method of claim 1, wherein scoring the set of spatial transcriptome signature genes comprises:
scoring was performed using four algorithms, ssGSEA, AUCell, UCell and singscore.
5. The method of claim 1, wherein the source of the set of spatial transcriptome signature genes comprises cell marker and/or single cell transcriptome data of the corresponding tissue.
6. The method of claim 1, wherein the screening for spatial enrichment points comprises:
determining an enrichment threshold according to formula (1), in particular a score S scored according to the set of characteristic genes j Sorting each feature j, wherein the spots with the front alpha duty ratio with the highest score are not included in the determination of the threshold value, and selecting the highest score after the spots with the front alpha duty ratio with the highest score are excluded as an enrichment threshold value, and the spots with the enrichment score higher than the enrichment threshold value are space enrichment points;
formula (1);
wherein j is a feature thr j An enrichment threshold for feature j, S j Scoring the characteristic gene set, wherein alpha is the front alpha ratio with highest score, K is a defined percentage factor, and K is 0.6-0.8.
7. The method of claim 1, wherein the spatial transcriptome feature enrichment differential analysis comprises:
calculating the space enrichment point duty ratio of each feature j of each space transcriptome slice sample i according to the formula (2), performing sample-level statistical analysis comparison between space transcriptome data sample groups by using a t-test or a wilcox test, and setting pvalue < 0.05 to estimate statistical significance;
formula (2);
where j is a feature and i is a spatial transcriptome slice sample, fraction ij The spatial enrichment point duty cycle, N, for each feature j of each spatial transcriptome slice sample i ij For samples exceeding the threshold thr j The number of spots, M i The total number of shots of the sample.
8. The device is characterized by comprising an integrated sample space transcriptome data module, a cell type specific characteristic gene set module extracted from single cell transcriptome data, a scoring module for the space transcriptome characteristic gene set, a space enrichment point screening module and a space enrichment point duty ratio difference analysis module;
the integrated sample space transcriptome data module is to perform operations comprising:
adding a sample name before each data spot label, filling a corresponding sample name in a data slice image information frame, filling provided sample and group names in sample and group information columns of data meta.data, and carrying out multi-sample data integration by using a built-in merge function in a semoat after adding information;
the extraction of cell type specific signature gene sets from single cell transcriptome data module is for performing a method comprising:
calculating differential genes of cell subtypes by using FindMarkers function, and selecting significant high-expression genes ordered by avg_log2FC as a characteristic gene set of the cell type;
the judgment standard of the remarkable high expression is as follows: avg_log2fc > 0.25 in the target subpopulation and expression values with Wilcoxon rank sum test significance p value of less than 0.05 for target subpopulation cells and other subpopulations cells;
the scoring module for scoring the set of spatial transcriptome signature genes is configured to perform the steps comprising:
scoring was performed using four algorithms, ssGSEA, AUCell, UCell and singscore;
the screening space enrichment point module is configured to perform operations including:
determining an enrichment threshold according to formula (1), in particular a score S scored according to the set of characteristic genes j Sorting each feature j, wherein the spots with the front alpha duty ratio with the highest score are not included in the determination of the threshold value, and selecting the highest score after the spots with the front alpha duty ratio with the highest score are excluded as an enrichment threshold value, and the spots with the enrichment score higher than the enrichment threshold value are space enrichment points;
formula (1);
where j is a feature, thr_j is the enrichment threshold for feature j, S_j is the signature gene set score, α is the proportion of top-scoring spots that is excluded, and K is a defined percentage factor with a value of 0.6 to 0.8;
the module for analyzing differences in the proportion of spatial enrichment spots is configured to perform operations comprising: calculating, according to formula (2), the spatial enrichment spot proportion of each feature j in each spatial transcriptome slice sample i, performing a sample-level statistical comparison between groups of spatial transcriptome samples using a t-test or a Wilcoxon test, and taking a p value < 0.05 as the criterion for statistical significance;
fraction_ij = N_ij / M_i    formula (2);
where j is a feature, i is a spatial transcriptome slice sample, fraction_ij is the spatial enrichment spot proportion of feature j in spatial transcriptome slice sample i, N_ij is the number of spots in sample i whose score exceeds the threshold thr_j, and M_i is the total number of spots in sample i.
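The screening and proportion steps of claim 8 can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in the text: formula (1) is not reproduced here, so the sketch assumes it takes the form thr_j = K × (highest score remaining after the top α proportion of spots is excluded); the threshold is computed over the pooled spots of all samples; the value α = 0.1, the toy scores, and the function names `enrichment_threshold` and `enriched_fraction` are all hypothetical.

```python
import math


def enrichment_threshold(scores, alpha, k):
    """Assumed form of formula (1): K times the highest score that remains
    after the top-alpha proportion of spots has been excluded."""
    ranked = sorted(scores, reverse=True)
    n_excluded = math.floor(alpha * len(ranked))  # top-scoring spots left out
    if n_excluded >= len(ranked):
        raise ValueError("alpha excludes every spot")
    return k * ranked[n_excluded]


def enriched_fraction(scores, threshold):
    """Proportion described for formula (2): spots above the threshold (N_ij)
    divided by all spots of the sample (M_i)."""
    n_enriched = sum(1 for s in scores if s > threshold)
    return n_enriched / len(scores)


# Toy scores of one feature j in two slice samples (hypothetical values).
scores_by_sample = {
    "ctrl_1": [0.1, 0.2, 0.3, 0.4, 0.5],
    "case_1": [0.6, 0.7, 0.8, 0.9, 5.0],
}

# Assumption: one threshold per feature, taken over the spots of all samples.
pooled = [s for sample in scores_by_sample.values() for s in sample]
thr = enrichment_threshold(pooled, alpha=0.1, k=0.7)  # K within 0.6 to 0.8

fractions = {
    name: enriched_fraction(sample, thr)
    for name, sample in scores_by_sample.items()
}
print(thr, fractions)
```

In this toy input the single extreme score 5.0 falls in the excluded top 10%, so it does not inflate the threshold, which is the stated purpose of leaving the top α spots out of the threshold determination.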
9. Use of the spatial transcriptome feature enrichment differential analysis device of claim 8 for analyzing spatial transcriptome data.
10. The use of claim 9, wherein analyzing the spatial transcriptome data comprises analyzing spatial transcriptome feature enrichment differences and/or spatial transcriptome function enrichment differences.
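The sample-level group comparison named in claim 8 can be sketched as follows. This is not the device's implementation: it is an exact permutation version of the Wilcoxon rank-sum test written with the Python standard library only, whereas in practice an existing routine such as R's `wilcox.test` or SciPy's `mannwhitneyu` would normally be used. The per-sample proportions and the function names `midranks` and `exact_rank_sum_p` are hypothetical.

```python
from itertools import combinations


def midranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # average rank of the tied block
        for idx in order[pos:end + 1]:
            ranks[idx] = shared
        pos = end + 1
    return ranks


def exact_rank_sum_p(group_a, group_b):
    """Two-sided exact p value: share of all group assignments whose rank sum
    deviates from its null mean at least as much as the observed one."""
    ranks = midranks(list(group_a) + list(group_b))
    n_a = len(group_a)
    expected = n_a * (len(ranks) + 1) / 2  # mean rank sum under the null
    observed = abs(sum(ranks[:n_a]) - expected)
    sums = [sum(c) for c in combinations(ranks, n_a)]
    extreme = sum(1 for s in sums if abs(s - expected) >= observed - 1e-12)
    return extreme / len(sums)


# Hypothetical enrichment spot proportions, one value per slice sample.
control = [0.05, 0.08, 0.10, 0.12]
disease = [0.30, 0.35, 0.41, 0.44]

p_value = exact_rank_sum_p(control, disease)
significant = p_value < 0.05  # significance criterion from claim 8
print(p_value, significant)
```

Enumerating every assignment is only practical for small groups, which matches the sample-level setting here, where each slice contributes a single proportion.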
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310833965.8A CN116564419B (en) | 2023-07-10 | 2023-07-10 | Space transcriptome characteristic enrichment difference analysis method and application thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116564419A | 2023-08-08 |
CN116564419B | 2023-09-15 |
Family
ID=87491918
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310833965.8A Active CN116564419B (en) | 2023-07-10 | 2023-07-10 | Space transcriptome characteristic enrichment difference analysis method and application thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116564419B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108664769A (en) * | 2017-03-31 | 2018-10-16 | 中国科学院上海生命科学研究院 | Large-scale medicine method for relocating based on cancer gene group and non-specific gene label |
WO2019084046A1 (en) * | 2017-10-23 | 2019-05-02 | The Broad Institute, Inc. | Single cell cellular component enrichment from barcoded sequencing libraries |
CN112522371A (en) * | 2020-12-21 | 2021-03-19 | 广州基迪奥生物科技有限公司 | Analysis method of spatial transcriptome sequencing data |
CN114708910A (en) * | 2022-02-24 | 2022-07-05 | 上海市第一人民医院 | Method for calculating cell subset enrichment fraction in cell sequencing by using single cell sequencing data |
CN114944193A (en) * | 2022-05-20 | 2022-08-26 | 南开大学 | Analysis method and system for integrating single-cell transcriptome and spatial transcriptome data |
Non-Patent Citations (1)
Title |
---|
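Zhao Mei'e: "Transcriptional characteristics of peripheral blood mononuclear cells from hepatitis B vaccine non-responders based on single-cell sequencing", China Master's Theses Full-text Database, Medicine and Health Sciences, no. 1, pages 12 - 27 * |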
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117671676A (en) * | 2024-01-30 | 2024-03-08 | 中山大学附属口腔医院 | Method for evaluating abnormal immune cells based on space transcriptome visual image |
CN117671676B (en) * | 2024-01-30 | 2024-04-09 | 中山大学附属口腔医院 | Method for evaluating abnormal immune cells based on space transcriptome visual image |
Also Published As
Publication number | Publication date |
---|---|
CN116564419B (en) | 2023-09-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN116564419B (en) | Space transcriptome characteristic enrichment difference analysis method and application thereof | |
CN114708910B (en) | Method for calculating enrichment score of cell subpopulations in cell sequencing by using single cell sequencing data | |
US20090226916A1 (en) | Automated Analysis of DNA Samples | |
KR20150107718A (en) | Visualization tools for digital pcr data | |
CN105506111B (en) | Method for detecting CNV (CNV) marker of MAPK10 gene of Nanyang cattle and application of CNV marker | |
CN112048560A (en) | Kit for analyzing HER2 gene copy number variation by combining multiple internal references with sequential probability ratio test and use method | |
CN111292807A (en) | Method for analyzing double cells in single-cell transcriptome data | |
CN112102944A (en) | NGS-based brain tumor molecular diagnosis analysis method | |
CN117079717A (en) | Cell subtype identification method, device, equipment and medium | |
KR102397822B1 (en) | Apparatus and method for analyzing cells using chromosome structure and state information | |
CN114634988B (en) | SNP (Single nucleotide polymorphism) sites and method for identifying and researching biological geographic sources of east Asia population | |
CN115948521A (en) | Method for detecting aneuploid missing chromosome information | |
US20080241848A1 (en) | Methods for prenatal diagnosis of aneuploidy | |
Baak et al. | DNA cytometric features in biopsies of TaT1 urothelial cell cancer predict recurrence and stage progression more accurately than stage, grade, or treatment modality | |
US20220221460A1 (en) | Method of quantifying her2 in breast cancer sample by mass spectrometry and scoring her2 status using the same | |
JP2022136465A (en) | Mechanical detection of breakpoint candidate of copy number variant on genome sequence | |
CN113584148A (en) | Specific marker screening method for azoospermia and severe oligospermia detection | |
CN113380318A (en) | Artificial intelligence assisted flow cytometry 40CD immunophenotyping detection method and system | |
EP3533883A1 (en) | Predicting cancer recurrence using a prognostic model that combines immunohistochemical staining and gene expression profiling | |
US6994965B2 (en) | Method for displaying results of hybridization experiment | |
CN111833297A (en) | Disease association method of marrow cell morphology automatic detection system | |
Schumann et al. | flowCyBar-Analyze flow cytometric data using gate information | |
CN112760383B (en) | qRT-PCR internal reference gene applied to lung adenocarcinoma cell subgroup and application thereof | |
WO2023246808A1 (en) | Use of cancer-associated short exons to assist cancer diagnosis and prognosis | |
CN110872618A (en) | Method for judging sex of detected sample based on Illumina human whole genome SNP chip data and application |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||