CN108090325B - Method for analyzing single cell sequencing data by applying beta-stability - Google Patents

Publication number
CN108090325B
CN108090325B CN201611126940.0A CN201611126940A CN108090325B CN 108090325 B CN108090325 B CN 108090325B CN 201611126940 A CN201611126940 A CN 201611126940A CN 108090325 B CN108090325 B CN 108090325B
Authority
CN
China
Prior art keywords
cell
cells
gene expression
zygotic
stability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611126940.0A
Other languages
Chinese (zh)
Other versions
CN108090325A (en)
Inventor
马占山
李连伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming Institute of Zoology of CAS
Original Assignee
Kunming Institute of Zoology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming Institute of Zoology of CAS
Priority to CN201611126940.0A
Publication of CN108090325A
Application granted
Publication of CN108090325B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression

Landscapes

  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to a bioinformatics method for analyzing sequencing data, in particular to a method for analyzing single-cell sequencing data by applying beta-stability. The invention applies the beta-stability method to single-cell sequencing data in order to analyze the stability and variability of gene expression levels among single cells on temporal and spatial scales. Taking a dynamic model of single-cell gene expression levels during embryonic development as an example, the gene expression level of the single cells at each stage is analyzed by the beta-stability method, yielding a more accurate dynamic model of gene expression levels in early embryonic development.

Description

Method for analyzing single cell sequencing data by applying beta-stability
Technical Field
The invention relates to bioinformatics techniques for analyzing sequencing data, in particular to a method that applies beta-stability to analyze the stability and variability of single-cell gene expression levels on temporal and spatial scales, taking a dynamic model of single-cell gene expression levels in early embryonic development as an example.
Background
In 2001, the Human Genome Project completed a first draft of the human genome, and the functions of genes began to be revealed. In the decade and more since, gene sequencing technology has developed rapidly and has been widely used in many fields. In the life sciences in particular, sequencing has become a powerful tool for exploring the mysteries of life, and many methods that apply it have been derived. Single-cell sequencing is one such application that has emerged in recent years. Compared with conventional sequencing, its main advantage is that it sequences the genetic material of a single cell only, so that the heterogeneity of the genetic material of individual cells can be detected to the greatest extent. This advantage gives single-cell sequencing great application value in life science and medicine. For example, researchers have used single-cell sequencing to study the differences between tumor cells and normal cells, to detect variation in the genetic material of tumor cells, and to examine how the gene expression levels of tumor cells change over different periods. Single-cell sequencing of some pathogenic bacteria has also been used to develop specific vaccines. In addition, researchers perform single-cell sequencing on germ cells and embryonic cells to study the dynamic change of gene expression levels in the early development of organisms.
When single-cell sequencing is used to study changes in gene expression levels during early development, the gene expression levels of single cells at different stages are measured, mainly sperm cells, egg cells, zygotic cells, and cells of the two-cell, four-cell, and eight-cell stages (see FIG. 1). To measure the gene expression level of each cell in early embryonic development, single-cell sequencing is first performed on sperm cells and egg cells, then on the zygotic cell formed from them, and then on the cells of the two-cell, four-cell, and eight-cell stages that develop from the zygotic cell, so that the dynamic change of gene expression levels can be studied. However, under current experimental conditions the complete development of a single zygotic cell cannot be tracked. For example, once the genetic material of a zygotic cell has been extracted, the cell is destroyed and cannot develop further into a two-cell-stage embryo, so the sequenced two-cell-stage samples come from other zygotic cells. Likewise, in FIG. 1, once the genetic material of cell A has been extracted, cell A is destroyed, so the four-cell-stage samples cell a and cell b are derived from other two-cell-stage cells. It is therefore impossible to follow and study all the cells of the two-cell, four-cell, and eight-cell stages that are generated by the same zygotic cell.
Based on single-cell sequencing data, and taking single-cell gene expression data from early embryonic development as an example, the invention analyzes the gene expression level of the single cells at each stage by the beta-stability method, thereby obtaining a more accurate dynamic model of gene expression levels.
Disclosure of Invention
The invention aims to:
provide a method that uses beta-stability to analyze the stability and variability of gene expression levels among single cells on temporal and spatial scales. Taking single-cell sequencing data from early embryonic development as an example, the gene expression level of the single cells at each stage of embryonic development is analyzed by the beta-stability method to obtain a more accurate dynamic model of gene expression levels.
To achieve this aim, the invention adopts the following technical scheme:
Starting from sperm cells and egg cells, all individual cells in the course of embryonic development through the eight-cell stage are extracted, and the transcriptome of each cell is subjected to single-cell sequencing to determine the expression level of the genes in each cell. As shown in FIG. 1, single-cell sequencing yields gene expression data for 17 cells in total: the sperm cell, the egg cell, the zygotic cell, cells A and B (two-cell stage), cells a to d (four-cell stage), and cells 1 to 8 (eight-cell stage). Owing to the limitations of experimental technique, the zygotic cell sample does not develop from the collected sperm cell and egg cell samples but from other sperm cells and egg cells of the same individual. To address this limitation, the inventors construct a dynamic model of gene expression levels in early embryonic development by enumerating, by permutation and combination, all "developmental pathways" that start from the zygotic cell.
The invention has the following effects:
A method is provided for analyzing the stability and variability of gene expression levels among single cells on temporal and spatial scales. When applied to the dynamic change of gene expression levels in early embryonic development, the method yields a more accurate dynamic model of gene expression levels within the limits of current experimental technique.
Drawings
FIG. 1 is a diagram showing the distribution of cells at each stage of early embryonic development.
Detailed Description
As described above, owing to the limitations of the experimental technique, it cannot be determined in this experiment whether cell a is derived from cell A or cell B (cells b, c, and d face the same problem), nor from which of cells a to d cell 1 is derived (cells 2 to 8 face the same problem). Given these limitations, the inventors enumerate the developmental pathways for all combinations, 2 × 4 × 8 = 64 possible pathways in total, where A and B are the two-cell-stage cells, a to d are the four-cell-stage cells, and 1 to 8 are the eight-cell-stage cells:
(1) Zygotic cell → cell A → cell a → cell 1
(2) Zygotic cell → cell A → cell a → cell 2
(3) Zygotic cell → cell A → cell a → cell 3
(4) Zygotic cell → cell A → cell a → cell 4
(5) Zygotic cell → cell A → cell a → cell 5
(6) Zygotic cell → cell A → cell a → cell 6
(7) Zygotic cell → cell A → cell a → cell 7
(8) Zygotic cell → cell A → cell a → cell 8
(9) Zygotic cell → cell A → cell b → cell 1
(10) Zygotic cell → cell A → cell b → cell 2
(11) Zygotic cell → cell A → cell b → cell 3
(12) Zygotic cell → cell A → cell b → cell 4
(13) Zygotic cell → cell A → cell b → cell 5
(14) Zygotic cell → cell A → cell b → cell 6
(15) Zygotic cell → cell A → cell b → cell 7
(16) Zygotic cell → cell A → cell b → cell 8
(17) Zygotic cell → cell A → cell c → cell 1
(18) Zygotic cell → cell A → cell c → cell 2
(19) Zygotic cell → cell A → cell c → cell 3
(20) Zygotic cell → cell A → cell c → cell 4
(21) Zygotic cell → cell A → cell c → cell 5
(22) Zygotic cell → cell A → cell c → cell 6
(23) Zygotic cell → cell A → cell c → cell 7
(24) Zygotic cell → cell A → cell c → cell 8
(25) Zygotic cell → cell A → cell d → cell 1
(26) Zygotic cell → cell A → cell d → cell 2
(27) Zygotic cell → cell A → cell d → cell 3
(28) Zygotic cell → cell A → cell d → cell 4
(29) Zygotic cell → cell A → cell d → cell 5
(30) Zygotic cell → cell A → cell d → cell 6
(31) Zygotic cell → cell A → cell d → cell 7
(32) Zygotic cell → cell A → cell d → cell 8
(33) Zygotic cell → cell B → cell a → cell 1
(34) Zygotic cell → cell B → cell a → cell 2
(35) Zygotic cell → cell B → cell a → cell 3
(36) Zygotic cell → cell B → cell a → cell 4
(37) Zygotic cell → cell B → cell a → cell 5
(38) Zygotic cell → cell B → cell a → cell 6
(39) Zygotic cell → cell B → cell a → cell 7
(40) Zygotic cell → cell B → cell a → cell 8
(41) Zygotic cell → cell B → cell b → cell 1
(42) Zygotic cell → cell B → cell b → cell 2
(43) Zygotic cell → cell B → cell b → cell 3
(44) Zygotic cell → cell B → cell b → cell 4
(45) Zygotic cell → cell B → cell b → cell 5
(46) Zygotic cell → cell B → cell b → cell 6
(47) Zygotic cell → cell B → cell b → cell 7
(48) Zygotic cell → cell B → cell b → cell 8
(49) Zygotic cell → cell B → cell c → cell 1
(50) Zygotic cell → cell B → cell c → cell 2
(51) Zygotic cell → cell B → cell c → cell 3
(52) Zygotic cell → cell B → cell c → cell 4
(53) Zygotic cell → cell B → cell c → cell 5
(54) Zygotic cell → cell B → cell c → cell 6
(55) Zygotic cell → cell B → cell c → cell 7
(56) Zygotic cell → cell B → cell c → cell 8
(57) Zygotic cell → cell B → cell d → cell 1
(58) Zygotic cell → cell B → cell d → cell 2
(59) Zygotic cell → cell B → cell d → cell 3
(60) Zygotic cell → cell B → cell d → cell 4
(61) Zygotic cell → cell B → cell d → cell 5
(62) Zygotic cell → cell B → cell d → cell 6
(63) Zygotic cell → cell B → cell d → cell 7
(64) Zygotic cell → cell B → cell d → cell 8
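The enumeration above can be reproduced programmatically. The following is a minimal sketch, not the inventors' implementation: it assumes one cell is chosen per stage, and the labels are illustrative assumptions (A and B for the two-cell stage, lowercase a to d for the four-cell stage so that they stay distinct from A and B, and 1 to 8 for the eight-cell stage). The function name `enumerate_paths` is hypothetical.

```python
from itertools import product

# Stage labels are assumptions chosen for illustration.
TWO_CELL = ["A", "B"]                        # two-cell-stage cells
FOUR_CELL = ["a", "b", "c", "d"]             # four-cell-stage cells
EIGHT_CELL = [str(i) for i in range(1, 9)]   # eight-cell-stage cells


def enumerate_paths():
    """Return every zygote-to-eight-cell path, one cell per stage.

    product() varies the last factor fastest, so the order matches the
    numbering of the list: index 0 is path (1), index 63 is path (64).
    """
    return [
        ("zygotic cell", two, four, eight)
        for two, four, eight in product(TWO_CELL, FOUR_CELL, EIGHT_CELL)
    ]


paths = enumerate_paths()
for number, path in enumerate(paths[:3], start=1):
    print(f"({number}) " + " -> ".join(path))
```

Because no lineage information is available, every combination is kept; the count is simply 2 × 4 × 8.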
For each of the 64 pathways above, a dynamic model of the gene expression level of the cells is calculated, together with the amount of change in that model. The inventors use the beta-stability indices proposed by Wang and Loreau to measure the dynamic change of single-cell gene expression levels during embryonic development. The indices comprise alpha-variability, beta-variability, and gamma-variability. The calculation proceeds as follows:
(1) Calculating the coefficient of variation at the gene level in each cell:
Figure BSA0000137337330000041
where μ denotes the mean of the expression levels of all genes in a single cell and σ² denotes the variance of the expression levels of all genes in a single cell.
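The formula image for step (1) is not reproduced in the text. As a minimal sketch, assuming the standard definition of the coefficient of variation (standard deviation divided by mean) and assuming the population variance is meant, the per-cell value could be computed as follows. The function name `gene_level_cv` and the input layout (one expression value per gene) are hypothetical.

```python
import numpy as np


def gene_level_cv(expression):
    """Coefficient of variation of gene expression within one cell.

    expression: 1-D sequence with one expression value per gene.
    Assumes CV = sigma / mu with the population standard deviation
    (ddof=0); the patent text does not say which estimator is used.
    """
    x = np.asarray(expression, dtype=float)
    mu = x.mean()
    if mu == 0:
        raise ValueError("mean expression is zero; CV is undefined")
    return float(x.std(ddof=0) / mu)


# Toy values, not real data: mean 5, population standard deviation 2.
cv = gene_level_cv([2, 4, 4, 4, 5, 5, 7, 9])
print(cv)
```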
(2) Calculating the synchrony of the changes in gene expression levels among cells at different stages of each developmental path:
Figure BSA0000137337330000042
In the formula, S denotes the number of genes expressed in a single cell and ρ_S denotes the correlation of the changes in gene expression levels between different cells. One coefficient is calculated for each path:
Figure BSA0000137337330000043
This coefficient allows the alpha-variability calculated for the gene expression levels of the same cell to differ between pathways.
(3) Calculating the spatial synchrony of the gene expression levels of each developmental path:
Figure BSA0000137337330000044
where m denotes the number of cells in each path and ρ_P denotes the correlation between the gene expression levels of the cells in each developmental pathway.
(4) Calculating the alpha-variability, beta-variability and gamma-variability of the gene expression level change of each cell in the development path.
Figure BSA0000137337330000051
Figure BSA0000137337330000052
Figure BSA0000137337330000053
Figure BSA0000137337330000054
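The formula images for step (4) are not reproduced in the text, so the exact expressions cannot be confirmed here. For orientation only, the sketch below shows one common covariance-based formulation of an alpha/beta/gamma variability partition from the ecological literature, written on the coefficient-of-variation scale (some presentations use squared CVs instead). It is an assumption that the patented formulas resemble this form. The function name `variability_partition`, the input layout, and the choice of what counts as a repeated observation are all hypothetical.

```python
import numpy as np


def variability_partition(values):
    """Partition variability of a summed signal across units.

    values: 2-D array, rows = repeated observations, columns = units
    (for example, the cells on one developmental path). This layout is
    an assumption made for illustration.
    """
    x = np.asarray(values, dtype=float)
    cov = np.atleast_2d(np.cov(x, rowvar=False))   # covariance between units
    total_mean = x.sum(axis=1).mean()              # mean of the summed signal
    sd_sum = np.sqrt(np.diag(cov)).sum()           # sum of per-unit std devs

    alpha = sd_sum / total_mean                    # unit-level variability
    gamma = np.sqrt(cov.sum()) / total_mean        # variability of the sum
    synchrony = cov.sum() / sd_sum ** 2            # 1 means fully synchronous
    beta = alpha / gamma                           # undefined when gamma is 0
    return {"alpha": alpha, "beta": beta, "gamma": gamma, "synchrony": synchrony}


# Toy values, not real data.
result = variability_partition([[1, 2], [2, 1], [3, 3]])
print(result)
```

In this formulation beta equals one over the square root of the synchrony, so low synchrony between units shows up as high beta and a gamma that is smaller than alpha.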
The alpha-variability of gene expression can be calculated for each cell. For each developmental pathway, the beta-variability and gamma-variability are calculated from the alpha-variability of the cells at its four stages: the zygotic cell, the two-cell-stage cell, the four-cell-stage cell, and the eight-cell-stage cell. These values describe the dynamics of gene expression along each developmental pathway. The mean value v_1 of the beta-stability over the eight developmental pathways that lead to cell 1 (pathways 1, 9, 17, 25, 33, 41, 49, and 57) is calculated and used as the parameter of the dynamic model for cell 1. Similarly, the values v_2 to v_8 are calculated for cells 2 to 8 and, together with v_1, are used to construct a dynamic model of single-cell gene expression levels during embryonic development from the zygotic cell to the eight-cell stage.
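The averaging step can be sketched as follows, assuming the per-pathway values are already available in a dictionary keyed by pathway number 1 to 64 in the order of the enumerated list. The function name `cell_parameters` and the toy input are hypothetical; only the grouping of pathways by final cell (for cell 1: pathways 1, 9, 17, 25, 33, 41, 49, 57) comes from the text.

```python
from statistics import mean


def cell_parameters(path_values):
    """Average per-pathway values over the eight pathways ending in each cell.

    path_values: dict mapping pathway number (1..64) to that pathway's value.
    Pathway n ends in eight-cell-stage cell ((n - 1) % 8) + 1, so the
    pathways for cell k are k, k + 8, ..., k + 56.
    """
    return {
        cell: mean(path_values[p] for p in range(cell, 65, 8))
        for cell in range(1, 9)
    }


# Placeholder input: each pathway's value is just its number (not real data).
toy_values = {n: float(n) for n in range(1, 65)}
v = cell_parameters(toy_values)
print(v[1], v[8])
```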

Claims (6)

1. A method for analyzing the stability and variability of gene expression levels in single-cell sequencing data by using a beta-stability method, characterized in that: all single cells in the course of embryonic development, from the sperm cell and egg cell through the eight-cell stage, are extracted; the transcriptome of each cell is subjected to single-cell sequencing to determine the expression level of the genes in each cell; after single-cell sequencing, gene expression data are obtained for the sperm cell, the egg cell, the zygotic cell, cells A and B, cells a to d, and cells 1 to 8; the obtained zygotic cell sample is generated by the development of other sperm cells and egg cells of the same individual; all embryonic development paths starting from the zygotic cell are enumerated by permutation and combination; and an accurate dynamic model of gene expression levels is obtained by integrating the dynamic models calculated for all paths.
2. The method of claim 1, wherein the beta-stability method includes alpha-variability, beta-variability, and gamma-variability.
3. The method of claim 1, wherein the stability and variability of gene expression levels among single cells on temporal and spatial scales are analyzed for single-cell sequencing data.
4. The method of claim 1, wherein the dynamic change of gene expression levels is analyzed by the beta-stability method for the single-cell sequencing data of each stage of early embryonic development.
5. The method of claim 4, wherein all embryonic development paths are enumerated and the dynamic change of gene expression levels is analyzed by the beta-stability method for each development path.
6. The method of any of claims 1-5, wherein the algorithms and functions are implemented, in any form of software, firmware or hardware, as products that provide services.
CN201611126940.0A 2016-11-23 2016-11-23 Method for analyzing single cell sequencing data by applying beta-stability Active CN108090325B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611126940.0A CN108090325B (en) 2016-11-23 2016-11-23 Method for analyzing single cell sequencing data by applying beta-stability

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611126940.0A CN108090325B (en) 2016-11-23 2016-11-23 Method for analyzing single cell sequencing data by applying beta-stability

Publications (2)

Publication Number Publication Date
CN108090325A (en) 2018-05-29
CN108090325B (en) 2022-01-25

Family

ID=62170487

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611126940.0A Active CN108090325B (en) 2016-11-23 2016-11-23 Method for analyzing single cell sequencing data by applying beta-stability

Country Status (1)

Country Link
CN (1) CN108090325B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109033743B (en) * 2018-07-25 2021-01-01 上海交通大学 Method for reducing technical noise in single-cell transcriptome data

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105392894A (en) * 2012-01-20 2016-03-09 深圳华大基因医学有限公司 Method and system for determining whether copy number variation exists in sample genome, and computer readable medium
CN105603062A (en) * 2006-05-03 2016-05-25 人口诊断股份有限公司 Method of evaluating genetic disorders
CN105989249A (en) * 2014-09-26 2016-10-05 叶承羲 Method, system and device for assembling genomic sequence

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20070058440A (en) * 2004-07-02 2007-06-08 헨리 엘 니만 Copy choice recombination and uses thereof
RS64230B1 (en) * 2011-05-24 2023-06-30 BioNTech SE Individualized vaccines for cancer

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Biodiversity and ecosystem stability across scales in metacommunities;Shaopeng Wang等;《Ecology Letters》;20160531;第1-4页 *
Detection of high variability in gene expression from single-cell RNA-seq profiling;Hung-I Harry Chen等;《The Author(s) BMC Genomics》;20160822;第1-3页 *

Also Published As

Publication number Publication date
CN108090325A (en) 2018-05-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant