CN112735523A - System and detection method for identifying arabidopsis thaliana cotyledon cell type - Google Patents
System and detection method for identifying Arabidopsis thaliana cotyledon cell type
- Publication number
- CN112735523A (application number CN202011379750.6A)
- Authority
- CN
- China
- Prior art keywords
- cell
- cells
- cotyledon
- arabidopsis thaliana
- type
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6881—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for tissue or cell typing, e.g. human leukocyte antigen [HLA] probes
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B45/00—ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/158—Expression markers
Abstract
The invention relates to a system and a detection method for identifying Arabidopsis thaliana cotyledon cell types based on single-cell sequencing. About 10,000 cells can be identified in about 10 minutes, which greatly reduces labor cost while ensuring annotation accuracy.
Description
Technical Field
The invention belongs to the technical field of transcriptome sequencing, and particularly relates to a system and a detection method for identifying Arabidopsis thaliana cotyledon cell types based on single-cell transcriptome sequencing data.
Background
In high-throughput single-cell transcriptome sequencing analysis, cell type identification is a crucial step: it reveals the heterogeneity of complex cell populations and allows a cell atlas to be constructed. At present there are two approaches to identifying cell types: manual identification based on specific marker genes (marker-based), and identification based on a single-cell reference data set (reference-based). With marker-based manual identification, researchers must consult a large body of literature to collect markers, which is time-consuming and labor-intensive, and many cell types or subtypes cannot be distinguished well by a few markers. For example, in "Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage", the CD27 gene alone could not accurately separate naive B cells from memory B cells, and among T cell subtypes the marker genes often differ only in expression level, so the cell type cannot be determined from the expression of a small number of markers. By contrast, reference-based identification with SingleR distinguishes cell subtypes well.
For Arabidopsis thaliana cotyledon samples, no directly usable reference data set currently exists for automatically and rapidly matching and identifying cell types. Manual identification by marker genes alone is time-consuming and labor-intensive, has a low degree of automation, and is not accurate for similar cell types. It is therefore highly desirable to construct a single-cell reference data set suitable for identifying Arabidopsis thaliana cotyledon cell types, and to establish a computer program for automated cell type identification.
Disclosure of Invention
In view of the above problems, the present invention aims to overcome the disadvantages of the prior art and to provide an analysis method for rapidly and objectively identifying Arabidopsis thaliana cotyledon cell types based on single-cell transcriptome sequencing data.
The invention provides a system for identifying Arabidopsis thaliana cotyledon cell types based on single-cell sequencing, characterized by comprising: a cell sequencing platform, a database platform of cell types, and a data analysis and processing platform.
The cell sequencing platform is a single-cell transcriptome sequencing platform, and gene data of the cells are obtained by single-cell transcriptome sequencing (scRNA-seq).
The database platform of cell types is an Arabidopsis thaliana reference data platform constructed from the marker genes of mesophyll cells (MPC), meristemoid mother cells (MMC), early meristemoids (EM), late meristemoids (LM), guard mother cells (GMC), young guard cells (YGC), guard cells (GC), and pavement cells (PC). The marker genes of each cell type are as follows:
Mesophyll cells (MPC): RBCS, LHCB
Meristemoid mother cells (MMC): HDG2, POLAR, SPCH, TMM, MUTE, EPF2
Early meristemoids (EM): MUTE, BASL, SPCH, EPF2
Late meristemoids (LM): BASL, MUTE, EPF1
Guard mother cells (GMC): EPF1, HIC, FAMA, SCRM
Young guard cells (YGC): RBCS, FAMA, EPF1
Guard cells (GC): low expression of RBCS, FAMA, SCRM, and TMM genes
Pavement cells (PC): IQD5, RBCS.
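The marker table above can be held in a plain R list, which also makes the overlap between neighbouring cell types easy to inspect. This is a minimal sketch in base R; the names `marker_genes` and `types_with` are hypothetical, and the GC entry simply lists the four genes named for guard cells without encoding the "low expression" qualifier.

```r
# Marker genes per cell type, transcribed from the list above.
# GC: the source describes "low expression" of these genes; only the names are stored here.
marker_genes <- list(
  MPC = c("RBCS", "LHCB"),
  MMC = c("HDG2", "POLAR", "SPCH", "TMM", "MUTE", "EPF2"),
  EM  = c("MUTE", "BASL", "SPCH", "EPF2"),
  LM  = c("BASL", "MUTE", "EPF1"),
  GMC = c("EPF1", "HIC", "FAMA", "SCRM"),
  YGC = c("RBCS", "FAMA", "EPF1"),
  GC  = c("RBCS", "FAMA", "SCRM", "TMM"),
  PC  = c("IQD5", "RBCS")
)

# All distinct marker genes across the eight cell types.
all_markers <- sort(unique(unlist(marker_genes)))

# Hypothetical helper: which cell types list a given gene as a marker?
types_with <- function(gene) {
  hits <- vapply(marker_genes, function(genes) gene %in% genes, logical(1))
  names(marker_genes)[hits]
}

print(types_with("SCRM"))
```

Because several genes appear under more than one cell type, a lookup like this shows why a handful of markers may not separate similar populations.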
The database platform of cell types is established as follows:
Using single-cell transcriptome sequencing (scRNA-seq), a number of marker genes are collected to identify cell types representing different stages of stomatal development. The specific cell types and marker genes are as follows:
Mesophyll cells (MPC): RBCS, LHCB
Meristemoid mother cells (MMC): HDG2, POLAR, SPCH, TMM, MUTE, EPF2
Early meristemoids (EM): MUTE, BASL, SPCH, EPF2
Late meristemoids (LM): BASL, MUTE, EPF1
Guard mother cells (GMC): EPF1, HIC, FAMA, SCRM
Young guard cells (YGC): RBCS, FAMA, EPF1
Guard cells (GC): low expression of RBCS, FAMA, SCRM, and TMM genes
Pavement cells (PC): IQD5, RBCS;
a database platform (single-cell reference data set) suitable for identifying Arabidopsis thaliana cotyledon cell types is thereby constructed.
The steps for constructing the database platform (single-cell reference data set) suitable for identifying Arabidopsis thaliana cotyledon cell types are as follows:
plotting the expression level of the relevant markers in single cells using the FeaturePlot() and VlnPlot() functions;
plotting a clustered heat map of marker gene expression in single cells using the pheatmap() function;
and determining the cell type composition of the Arabidopsis thaliana cotyledon based on the expression plots and the clustered gene expression heat map, obtaining the single-cell expression profile corresponding to each cell type, and constructing the cell type identification reference data set.
The data analysis and processing platform identifies cell types using the SingleR() function, plots the cell type identification correlation heat map, counts the most abundant cell type in each cluster, and outputs the results and plots.
Preferably, the data analysis and processing steps are as follows:
based on the reference data set constructed above, the SingleR package matches each group of cells to be tested to the corresponding cell type by comparing the ranking of significantly up-regulated genes between the data to be tested and the reference data set; this is used to rapidly determine Arabidopsis thaliana cotyledon cell types in subsequent high-throughput single-cell transcriptome sequencing. The specific operation steps are as follows:
importing the data to be tested;
loading the constructed database platform (single-cell reference data set) suitable for identifying Arabidopsis thaliana cotyledon cell types;
identifying cell types using the SingleR() function;
plotting the cell type identification correlation heat map;
counting the most abundant cell type in each cluster;
and outputting the results and plots.
The invention also provides a detection method for identifying Arabidopsis thaliana cotyledon cell types based on single-cell sequencing, characterized by comprising the following:
the method is based on a system for identifying Arabidopsis thaliana cotyledon cell types based on single-cell sequencing, the system comprising: a cell sequencing platform, a database platform of cell types, and a data analysis and processing platform;
the cell sequencing platform is a single-cell transcriptome sequencing platform, and gene data of the cells are obtained by single-cell transcriptome sequencing (scRNA-seq);
the database platform of cell types is an Arabidopsis thaliana reference data platform constructed from the marker genes of mesophyll cells (MPC), meristemoid mother cells (MMC), early meristemoids (EM), late meristemoids (LM), guard mother cells (GMC), young guard cells (YGC), guard cells (GC), and pavement cells (PC), wherein the marker genes of each cell type are as follows:
Mesophyll cells (MPC): RBCS, LHCB
Meristemoid mother cells (MMC): HDG2, POLAR, SPCH, TMM, MUTE, EPF2
Early meristemoids (EM): MUTE, BASL, SPCH, EPF2
Late meristemoids (LM): BASL, MUTE, EPF1
Guard mother cells (GMC): EPF1, HIC, FAMA, SCRM
Young guard cells (YGC): RBCS, FAMA, EPF1
Guard cells (GC): low expression of RBCS, FAMA, SCRM, and TMM genes
Pavement cells (PC): IQD5, RBCS;
the data analysis and processing platform identifies cell types using the SingleR() function, plots the cell type identification correlation heat map, counts the most abundant cell type in each cluster, and outputs the results and plots.
The database platform of cell types is established as follows:
using single-cell transcriptome sequencing (scRNA-seq), a number of marker genes are collected to identify cell types representing different stages of stomatal development, and a database platform (single-cell reference data set) suitable for identifying Arabidopsis thaliana cotyledon cell types is constructed.
The steps for constructing the database platform (single-cell reference data set) suitable for identifying Arabidopsis thaliana cotyledon cell types are as follows:
plotting the expression level of the relevant markers in single cells using the FeaturePlot() and VlnPlot() functions;
plotting a clustered heat map of marker gene expression in single cells using the pheatmap() function;
and determining the cell type composition of the Arabidopsis thaliana cotyledon based on the expression plots and the clustered gene expression heat map, obtaining the single-cell expression profile corresponding to each cell type, and constructing the cell type identification reference data set.
The technical scheme of the invention is further elaborated as follows:
the method collects marker genes from the existing literature, uses single-cell transcriptome sequencing (scRNA-seq) to identify cell types representing different stages of stomatal development, constructs a single-cell reference data set suitable for identifying Arabidopsis thaliana cotyledon cell types, and establishes a computer program for automated identification. The method specifically comprises the following steps:
1. The expression level of the relevant markers in single cells is plotted using the FeaturePlot() and VlnPlot() functions in the Seurat package (v3.0.0).
Mesophyll cells (MPC): RBCS, LHCB
Meristemoid mother cells (MMC): HDG2, POLAR, SPCH, TMM, MUTE, EPF2
Early meristemoids (EM): MUTE, BASL, SPCH, EPF2
Late meristemoids (LM): BASL, MUTE, EPF1
Guard mother cells (GMC): EPF1, HIC, FAMA, SCRM
Young guard cells (YGC): RBCS, FAMA, EPF1
Guard cells (GC): low expression of RBCS, FAMA, SCRM, and TMM genes
Pavement cells (PC): IQD5, RBCS
2. A clustered heat map of marker gene expression in single cells is plotted using the pheatmap() function in the pheatmap package.
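library(pheatmap)
# topn_markers2vis holds the marker expression values to visualise; it must be prepared beforehand
pdf("heatmap.pdf")
# Cluster both rows and columns and keep the row names visible
pheatmap(topn_markers2vis, cluster_rows = TRUE, cluster_cols = TRUE, show_rownames = TRUE)
dev.off()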
3. The cell type composition of the Arabidopsis thaliana cotyledon is determined from the expression plots, the clustered gene expression heat map, and the like; the single-cell expression profile corresponding to each cell type is obtained; and the cell type identification reference data set is constructed.
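library(SingleR)
library(Seurat)
library(scater)
library(dplyr)
# Load the annotated Seurat object and extract its raw count matrix
ref_ob = readRDS("celltype.rds")
ref.m = GetAssayData(ref_ob, assay = "RNA", slot = "counts")
# Keep the per-cell "celltype" annotation as column metadata
cell_metadata = ref_ob@meta.data %>% select("celltype")
# Build a SingleCellExperiment, log-normalise it, and save it as the reference data set
ref.sce = SingleCellExperiment(assays = list(counts = ref.m), colData = cell_metadata)
ref.sce = logNormCounts(ref.sce)
saveRDS(ref.sce, "reference.rds")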
4. Based on the reference data set constructed above, the SingleR package matches each group of cells to be tested to the corresponding cell type by comparing the ranking of significantly up-regulated genes between the data to be tested and the reference data set; this is used to rapidly determine Arabidopsis thaliana cotyledon cell types in subsequent high-throughput single-cell transcriptome sequencing.
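The idea of rank-based matching can be illustrated with a toy example in base R. This is a simplified sketch under stated assumptions, not SingleR's actual algorithm, which additionally selects marker genes per label and fine-tunes the assignment; all gene names, type names, and values below are invented for illustration.

```r
# Toy reference: rows are genes, columns are mean expression profiles of three cell types.
ref <- matrix(
  c(9, 1, 2,
    8, 2, 1,
    1, 9, 3,
    2, 8, 2,
    1, 2, 9),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("gene", 1:5), c("typeA", "typeB", "typeC"))
)

# Toy query cell measured on the same genes.
query <- c(gene1 = 1, gene2 = 3, gene3 = 10, gene4 = 7, gene5 = 2)

# Spearman correlation compares gene rankings rather than absolute expression values.
scores <- apply(ref, 2, function(profile) {
  cor(query, profile[names(query)], method = "spearman")
})

# Assign the label whose reference profile ranks the genes most similarly.
best <- names(which.max(scores))
print(round(scores, 2))
print(best)
```

Working on ranks means the comparison is less sensitive to differences in sequencing depth or normalisation between the query and the reference than a comparison of raw values would be.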
In conclusion, the beneficial effects of the invention are as follows: based on single-cell transcriptome sequencing data, annotation of Arabidopsis thaliana cotyledon cell types can be completed quickly with the reference data set and the automated identification workflow; about 10,000 cells can be identified in about 10 minutes. The innovation lies not in the use of SingleR itself, but in using it to construct, for the first time, a reference data set of Arabidopsis thaliana cotyledon cell types, so that subsequent researchers can quickly identify these cell types in single-cell sequencing results.
Drawings
FIG. 1 is a violin plot showing the expression level of marker genes; the abscissa is the cell cluster number and the ordinate is the normalized gene expression value;
FIG. 2 is a FeaturePlot showing the expression level of marker genes;
FIG. 3 is a schematic diagram of the workflow for automated identification of Arabidopsis thaliana cotyledon cell types in single-cell sequencing;
FIG. 4 shows the cell type identification results obtained with the automated procedure.
Detailed Description
The invention is described in further detail with reference to the following specific examples and the accompanying drawings. Except for the contents specifically mentioned below, the procedures, conditions, experimental methods and the like for carrying out the invention are common general knowledge in the art, and the invention is not particularly limited in these respects.
Example 1, Manual identification
First, a large body of literature is consulted to collect marker genes, and a clustered expression heat map of these genes and plots of their expression in single cells (FeaturePlot) are drawn, so that the cell types representing different stages of stomatal development in the Arabidopsis thaliana cotyledon are identified manually. The specific marker genes used are as follows:
Mesophyll cells (MPC): RBCS, LHCB
Meristemoid mother cells (MMC): HDG2, POLAR, SPCH, TMM, MUTE, EPF2
Early meristemoids (EM): MUTE, BASL, SPCH, EPF2
Late meristemoids (LM): BASL, MUTE, EPF1
Guard mother cells (GMC): EPF1, HIC, FAMA, SCRM
Young guard cells (YGC): RBCS, FAMA, EPF1
Guard cells (GC): low expression of RBCS, FAMA, SCRM, and TMM genes
Pavement cells (PC): IQD5, RBCS
The expression level of the gene in a single cell was plotted using the following code:
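A minimal sketch of such plotting code, using Seurat's FeaturePlot() and VlnPlot() as named earlier in the description. The helper name `plot_marker_expression`, the output file names, and the assumptions that the object has a "tsne" reduction, a "clusters" metadata column, and feature names that match the gene symbols are illustrative and not taken from the source.

```r
# Hypothetical helper: plots marker expression on the t-SNE embedding and as violin plots.
# Requires the Seurat and ggplot2 packages to be installed.
plot_marker_expression <- function(seurat_ob, genes, prefix = "markers") {
  # Keep only genes present in the object (assumes feature names are gene symbols).
  genes <- intersect(genes, rownames(seurat_ob))
  if (length(genes) == 0) {
    stop("none of the requested genes are present in the object")
  }
  feature_plot <- Seurat::FeaturePlot(seurat_ob, features = genes, reduction = "tsne")
  violin_plot <- Seurat::VlnPlot(seurat_ob, features = genes,
                                 group.by = "clusters", pt.size = 0)
  ggplot2::ggsave(paste0(prefix, "_featureplot.pdf"), plot = feature_plot)
  ggplot2::ggsave(paste0(prefix, "_vlnplot.pdf"), plot = violin_plot)
  invisible(genes)
}

# Marker genes of guard mother cells (GMC), as listed above.
gmc_markers <- c("EPF1", "HIC", "FAMA", "SCRM")

# Example call (assumes a clustered Seurat object saved as an .rds file):
# seurat_ob <- readRDS("seurat_ob.rds")
# plot_marker_expression(seurat_ob, gmc_markers, prefix = "GMC")
```

Filtering the gene list first avoids a failed plot when a data set labels its features with locus identifiers rather than symbols; in that case the symbols would need to be mapped beforehand.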
Example 2, Identification based on the SingleR reference data set
Based on the Arabidopsis thaliana cotyledon cell types identified above, a reference data set of each cell type is constructed from its expression profile and is used to rapidly determine Arabidopsis thaliana cotyledon cell types in high-throughput single-cell transcriptome sequencing. The specific operation steps are as follows:
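library(SingleR)
library(Seurat)
library(scater)
library(dplyr)
library(BiocParallel)
# Step 1: import the data to be identified
seurat_ob = readRDS("seurat_ob.rds")
query.m = GetAssayData(seurat_ob, assay = "RNA", slot = "counts")
query.sce = SingleCellExperiment(assays = list(counts = query.m))
query.sce = logNormCounts(query.sce)
# Step 2: load the constructed reference data set
ref.sce = readRDS("reference.rds")
# Step 3: identify cell types with SingleR(), using 10 parallel workers
pred = SingleR(query.sce, ref.sce, labels = factor(ref.sce$celltype), BPPARAM = MulticoreParam(workers = 10))
saveRDS(pred, "singleR.rds")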
Step 4: plot the cell type identification correlation heat map;
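No code accompanies this step in the text. One plausible way to draw such a heat map, sketched here as an assumption, is SingleR's plotScoreHeatmap() applied to the result `pred` of step 3; the helper name `save_score_heatmap` and the output file name are hypothetical.

```r
# Hypothetical helper: writes the SingleR score heat map for a prediction result to a PDF.
# Requires the SingleR package; `pred` is the object returned by SingleR().
save_score_heatmap <- function(pred, file = "celltyping_heatmap.pdf") {
  pdf(file)
  # Close the PDF device even if plotting fails.
  on.exit(dev.off(), add = TRUE)
  SingleR::plotScoreHeatmap(pred)
  invisible(file)
}

# Example call, using the result saved in step 3:
# pred <- readRDS("singleR.rds")
# save_score_heatmap(pred)
```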
Step 5: count the cell type with the largest proportion in each cluster;
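# main_celltyping_stat: per-cluster cell counts with columns clusters, raw_celltype, cell_num
seurat_ob = SetIdent(seurat_ob, value = "clusters")
# Keep, for each cluster, the cell type with the highest cell count
top_celltype = main_celltyping_stat %>% group_by(clusters) %>% top_n(1, cell_num)
write.table(top_celltype, file = "top_celltyping_statistics.xls", quote = FALSE, sep = "\t", row.names = FALSE)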
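The construction of main_celltyping_stat is not shown in the text. As an illustration only, the per-cluster majority vote can be reproduced in base R on toy data; the cluster numbers and labels below are invented, and only the column names (clusters, raw_celltype, cell_num) follow the code above.

```r
# Toy per-cell table: one row per cell, with its cluster and its predicted label.
cells <- data.frame(
  clusters = c(6, 6, 6, 11, 11, 11, 11),
  raw_celltype = c("GMC", "GMC", "YGC", "YGC", "YGC", "YGC", "GMC"),
  stringsAsFactors = FALSE
)

# Cross-tabulate clusters (rows) against predicted cell types (columns).
counts <- table(cells$clusters, cells$raw_celltype)

# For each cluster, keep the label with the highest cell count.
top_celltype <- data.frame(
  clusters = rownames(counts),
  raw_celltype = colnames(counts)[apply(counts, 1, which.max)],
  cell_num = apply(counts, 1, max),
  stringsAsFactors = FALSE
)
print(top_celltype)
```

Note that which.max() keeps the first label when two labels tie, whereas top_n() in the code above would keep both rows; ties may need explicit handling in either approach.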
Step 6: output the annotation result for each cluster and plot it.
# Map each cluster ID to its most abundant cell type
from.id = as.vector(top_celltype$clusters)
to.id = as.vector(top_celltype$raw_celltype)
seurat_ob = SetIdent(seurat_ob, value = plyr::mapvalues(x = Idents(seurat_ob), from = from.id, to = to.id))
seurat_ob = StashIdent(seurat_ob, save.name = "celltype")
# Plot the annotated cells on the t-SNE embedding and save the figure
ggtsne2 = DimPlot(object = seurat_ob, reduction = "tsne", pt.size = 1) + theme(plot.title = element_text(hjust = 0.5))
ggsave("celltyping.pdf", plot = ggtsne2)
Results and analysis:
The SCRM gene is one of the marker genes of guard mother cells (GMC). The violin plot and FeaturePlot of its expression in single cells (FIG. 1 and FIG. 2) show that the gene is expressed in both cluster 6 and cluster 11, differing only in expression level. Judging similar cell populations from the expression of a few markers is therefore not accurate, and manual identification requires consulting a large body of literature to find more marker genes, which is time-consuming and labor-intensive.
With the reference data set and automated program constructed by the invention, the cell types representing different stages of stomatal development in the Arabidopsis thaliana cotyledon are obtained quickly (FIG. 4) simply by inputting the data to be identified (the workflow is shown in FIG. 3). About 10,000 cells can be identified in about 10 minutes, which greatly reduces labor cost, and two similar cell types, guard mother cells (GMC) and young guard cells (YGC), i.e. the cells of cluster 6 and cluster 11, are well distinguished, ensuring annotation accuracy.
The foregoing description is a general description of the invention. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation, as form changes and equivalents may be employed. Various changes or modifications may be effected therein by one skilled in the art and equivalents may be made thereto without departing from the scope of the invention as defined in the claims appended hereto.
Claims (10)
1. A system for identifying Arabidopsis thaliana cotyledon cell types based on single-cell sequencing, the system comprising: a cell sequencing platform, a database platform of cell types, and a data analysis and processing platform.
2. The system of claim 1, wherein the cell sequencing platform is a single-cell transcriptome sequencing platform, and the gene data of the cells are obtained by single-cell transcriptome sequencing (scRNA-seq).
3. The system of claim 1, wherein the database platform of cell types is an Arabidopsis thaliana reference data platform constructed from the marker genes of mesophyll cells (MPC), meristemoid mother cells (MMC), early meristemoids (EM), late meristemoids (LM), guard mother cells (GMC), young guard cells (YGC), guard cells (GC), and pavement cells (PC), wherein the marker genes of each cell type are as follows:
Mesophyll cells (MPC): RBCS, LHCB
Meristemoid mother cells (MMC): HDG2, POLAR, SPCH, TMM, MUTE, EPF2
Early meristemoids (EM): MUTE, BASL, SPCH, EPF2
Late meristemoids (LM): BASL, MUTE, EPF1
Guard mother cells (GMC): EPF1, HIC, FAMA, SCRM
Young guard cells (YGC): RBCS, FAMA, EPF1
Guard cells (GC): low expression of RBCS, FAMA, SCRM, and TMM genes
Pavement cells (PC): IQD5, RBCS.
4. The system for identifying Arabidopsis thaliana cotyledon cell types based on single-cell sequencing as claimed in claim 3, wherein the database platform of cell types is established by the following method:
using single-cell transcriptome sequencing (scRNA-seq), a number of marker genes are collected to identify cell types representing different stages of stomatal development, the specific cell types and marker genes being as follows:
Mesophyll cells (MPC): RBCS, LHCB
Meristemoid mother cells (MMC): HDG2, POLAR, SPCH, TMM, MUTE, EPF2
Early meristemoids (EM): MUTE, BASL, SPCH, EPF2
Late meristemoids (LM): BASL, MUTE, EPF1
Guard mother cells (GMC): EPF1, HIC, FAMA, SCRM
Young guard cells (YGC): RBCS, FAMA, EPF1
Guard cells (GC): low expression of RBCS, FAMA, SCRM, and TMM genes
Pavement cells (PC): IQD5, RBCS;
a database platform (single-cell reference data set) suitable for identifying Arabidopsis thaliana cotyledon cell types is thereby constructed.
5. The system for identifying Arabidopsis thaliana cotyledon cell types based on single-cell sequencing as claimed in claim 4, wherein the steps for constructing the database platform (single-cell reference data set) suitable for identifying Arabidopsis thaliana cotyledon cell types are as follows:
plotting the expression level of the relevant markers in single cells using the FeaturePlot() and VlnPlot() functions;
plotting a clustered heat map of marker gene expression in single cells using the pheatmap() function;
and determining the cell type composition of the Arabidopsis thaliana cotyledon based on the expression plots and the clustered gene expression heat map, obtaining the single-cell expression profile corresponding to each cell type, and constructing the cell type identification reference data set.
6. The system of claim 1, wherein the data analysis and processing platform identifies cell types using the SingleR() function, plots the cell type identification correlation heat map, counts the most abundant cell type in each cluster, and outputs the results and plots.
7. The system for identifying Arabidopsis thaliana cotyledon cell types based on single-cell sequencing as claimed in claim 6, wherein the data analysis and processing steps are as follows:
based on the reference data set constructed above, the SingleR package matches each group of cells to be tested to the corresponding cell type by comparing the ranking of significantly up-regulated genes between the data to be tested and the reference data set; this is used to rapidly determine Arabidopsis thaliana cotyledon cell types in subsequent high-throughput single-cell transcriptome sequencing, and the specific operation steps are as follows:
importing the data to be tested;
loading the constructed database platform (single-cell reference data set) suitable for identifying Arabidopsis thaliana cotyledon cell types;
identifying cell types using the SingleR() function;
plotting the cell type identification correlation heat map;
counting the most abundant cell type in each cluster;
and outputting the results and plots.
8. A detection method for identifying Arabidopsis thaliana cotyledon cell types based on single-cell sequencing, characterized by comprising the following:
the method is based on a system for identifying Arabidopsis thaliana cotyledon cell types based on single-cell sequencing, the system comprising: a cell sequencing platform, a database platform of cell types, and a data analysis and processing platform;
the cell sequencing platform is a single-cell transcriptome sequencing platform, and gene data of the cells are obtained by single-cell transcriptome sequencing (scRNA-seq);
the database platform of cell types is an Arabidopsis thaliana reference data platform constructed from the marker genes of mesophyll cells (MPC), meristemoid mother cells (MMC), early meristemoids (EM), late meristemoids (LM), guard mother cells (GMC), young guard cells (YGC), guard cells (GC), and pavement cells (PC), wherein the marker genes of each cell type are as follows:
Mesophyll cells (MPC): RBCS, LHCB
Meristemoid mother cells (MMC): HDG2, POLAR, SPCH, TMM, MUTE, EPF2
Early meristemoids (EM): MUTE, BASL, SPCH, EPF2
Late meristemoids (LM): BASL, MUTE, EPF1
Guard mother cells (GMC): EPF1, HIC, FAMA, SCRM
Young guard cells (YGC): RBCS, FAMA, EPF1
Guard cells (GC): low expression of RBCS, FAMA, SCRM, and TMM genes
Pavement cells (PC): IQD5, RBCS;
the data analysis and processing platform identifies cell types using the SingleR() function, plots the cell type identification correlation heat map, counts the most abundant cell type in each cluster, and outputs the results and plots.
9. The detection method for identifying the type of the cotyledon cell of Arabidopsis thaliana based on single-cell sequencing according to claim 8, wherein: the database platform of the cell types is established as follows:
through single cell transcriptome sequencing technology (scRNA-seq), a plurality of Marker genes are collected to identify the cell types representing different stages of stomatal development,
and a database platform (single cell reference data set) suitable for identifying the type of the arabidopsis thaliana cotyledon cells is constructed.
10. The system for identifying the type of the arabidopsis thaliana cotyledon cell based on single cell sequencing as claimed in claim 8, wherein the steps of constructing the database platform (single cell reference data set) suitable for identifying the type of the arabidopsis thaliana cotyledon cell are as follows:
plotting the expression levels of the relevant markers in single cells using the FeaturePlot() and VlnPlot() functions;
plotting gene expression clustering heat maps of the relevant markers in single cells using the pheatmap() function;
and determining the cell type composition of the arabidopsis thaliana cotyledon based on the expression level plots and the gene expression clustering heat maps, obtaining the single-cell expression profile corresponding to each cell type of the arabidopsis thaliana cotyledon, and constructing the cell type identification reference data set.
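The last step, turning annotated cells into one expression profile per cell type, can be sketched as follows. This is a minimal Python illustration, not the method of the application: the claim does not state how per-cell profiles are aggregated, so averaging is an assumption here, and the function name `build_reference`, the gene order and the expression values are hypothetical.

```python
def build_reference(expression, labels):
    """Average per-gene expression over all cells sharing a label.

    expression: cell_id -> list of per-gene values (same gene order).
    labels:     cell_id -> annotated cell type.
    Returns:    cell type -> mean expression profile.
    """
    sums = {}
    counts = {}
    for cell_id, profile in expression.items():
        cell_type = labels[cell_id]
        if cell_type not in sums:
            sums[cell_type] = [0.0] * len(profile)
            counts[cell_type] = 0
        sums[cell_type] = [s + v for s, v in zip(sums[cell_type], profile)]
        counts[cell_type] += 1
    return {
        cell_type: [s / counts[cell_type] for s in total]
        for cell_type, total in sums.items()
    }


# Toy data (invented values); columns follow GENES.
GENES = ["RBCS", "FAMA", "IQD5"]
EXPRESSION = {
    "cell_a": [8.0, 0.0, 1.0],
    "cell_b": [6.0, 2.0, 1.0],
    "cell_c": [1.0, 9.0, 0.0],
}
ANNOTATION = {"cell_a": "MPC", "cell_b": "MPC", "cell_c": "GMC"}

reference = build_reference(EXPRESSION, ANNOTATION)
```

The resulting mapping from cell type to profile plays the role of the reference data set against which new cells are later compared.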
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011379750.6A CN112735523A (en) | 2020-12-01 | 2020-12-01 | System and detection method for identifying arabidopsis thaliana cotyledon cell type |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112735523A (en) | 2021-04-30 |
Family
ID=75597119
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011379750.6A Pending CN112735523A (en) | 2020-12-01 | 2020-12-01 | System and detection method for identifying arabidopsis thaliana cotyledon cell type |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112735523A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114295444A (en) * | 2021-12-30 | 2022-04-08 | 河南大学 | Frozen section method for peach fruit tissue space transcriptomics analysis |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040014215A1 (en) * | 2000-07-27 | 2004-01-22 | Margit Menges | Synchronised arabidopsis cell suspensions and uses thereof |
US20140297194A1 (en) * | 2013-04-02 | 2014-10-02 | Yih-Sheng Yang | Gene signatures for detection of potential human diseases |
CN110060729A (en) * | 2019-03-28 | 2019-07-26 | 广州序科码生物技术有限责任公司 | A method of cell identity is annotated based on unicellular transcript profile cluster result |
CN111243675A (en) * | 2020-01-07 | 2020-06-05 | 广州基迪奥生物科技有限公司 | Interactive cell heterogeneity analysis visualization platform and implementation method thereof |
CN111951892A (en) * | 2020-08-04 | 2020-11-17 | 荣联科技集团股份有限公司 | Method for analyzing cell trajectory based on single cell sequencing data and electronic equipment |
Non-Patent Citations (2)
Title |
---|
ZHIXIN LIU ET AL: "Global Dynamic Molecular Profiling of Stomatal Lineage Cell Development by Single-Cell RNA Sequencing", 《MOLECULAR PLANT》 * |
郑光敏等: "单细胞测序数据的智能解析与数据库", 《发育医学电子杂志》 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210430 |