CN115394359A - Method for identifying human embryonic cell chromosome variation and application - Google Patents

Info

Publication number
CN115394359A
Authority
CN
China
Prior art keywords
expression
matrix
genes
gene
chromosome
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211322202.9A
Other languages
Chinese (zh)
Other versions
CN115394359B (en)
Inventor
乔杰
李烨
严智强
王玉倩
闫丽盈
王楠
朱小辉
关硕
阔瀛
孔思明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University Third Hospital Peking University Third Clinical Medical College
Original Assignee
Peking University Third Hospital Peking University Third Clinical Medical College
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University Third Hospital Peking University Third Clinical Medical College
Priority to CN202211322202.9A
Publication of CN115394359A
Application granted
Publication of CN115394359B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10 Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/10 Ploidy or copy number detection
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G16B20/50 Mutagenesis

Landscapes

  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Genetics & Genomics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to a method for identifying chromosomal variation in human embryonic cells. A gene expression matrix of normal diploid embryos is established to obtain a gene-level reference system that serves as a reference for expression levels. Calculating the relative chromosome expression levels of the embryo under test against this reference indicates the chromosome ploidy of the embryo. Because an embryo biopsy captures information representative of the whole embryo, transcriptome sequencing can serve as an effective tool for pre-implantation screening. The method can generate chromosome karyotypes from changes in RNA expression, and its results are essentially consistent with CNV results calculated by existing whole-genome sequencing.

Description

Method for identifying human embryo cell chromosome variation and application
Technical Field
The invention relates to the field of medical detection, in particular to a method for identifying human embryonic cell chromosome variation and application thereof.
Background
Less than half of human zygotes survive to birth, and some fetuses are born with genetic disease, primarily because of chromosomal deletions or duplications of meiotic or mitotic origin. Currently, embryos are selected for uterine transfer using a combination of morphological criteria, developmental dynamics, and aneuploidy testing. However, no single criterion ensures that a viable embryo is selected. Transcriptome data can help identify embryos with high developmental potential, but the chromosomal copy number variation (CNV) of the embryo must be known at the same time. Although chromosomal CNV can be obtained by bulk DNA assays or by comparing multiple biopsies of a few embryonic cells, these methods rely on genome sequencing and cannot obtain transcriptome information simultaneously.
At present, there are two existing techniques for identifying chromosomal variation in human embryonic cells prior to implantation by using the single-cell transcriptome technique:
(1) RNA-seq libraries are generated from a trophectoderm biopsy and from the remaining whole embryo. Based on the RNA expression values of each sample, the method standardizes with z-scores, builds an RNA digital karyotype for each autosome across a batch of samples, and reports chromosomes with a z-score greater than 2 or less than -2 as abnormal.
(2) Embryo karyotypes are classified from transcriptome data, but deeper RNA-seq sequencing is required; aneuploidy is inferred from SNP genotyping by integrating allelic imbalance with the detection of dosage-related gene expression changes.
The z-score approach removes the noisiest genes (those expressed at <1 RPKM in all samples), then treats each entire chromosome as a transcription unit and converts the total gene expression on each chromosome to a z-score. Chromosomes with z-scores greater than 2 or less than -2 are considered outliers, and the karyotype of the embryo is judged accordingly. However, this method has two main disadvantages:
First, unstable gene expression can affect the determination of chromosome copy number. Some genes are expressed at very high levels, up to ten thousand RPKM, while others are expressed at single-digit levels. A highly expressed gene therefore strongly influences the total transcript amount of the chromosome on which it lies. In particular, when highly expressed genes are themselves unstably expressed, they introduce excessive intrinsic noise into karyotype determination. The method, however, filters out only unexpressed genes and does not treat highly expressed genes at all. As a result, for chromosomes with few genes, such as chromosome 21, the karyotype calculation is easily distorted by highly expressed genes.
Second, there is a systematic error. In diploid human embryonic trophoblast cells, gene expression is highly heterogeneous among individuals, so measuring chromosome ploidy directly from gene expression levels introduces large errors. The method normalizes only at the chromosome level and applies no correction or normalization at the sample level, so differences in total expression among samples cause bias: chromosomes of samples with low total expression (or few cells) are more likely to be judged as deleted, and vice versa.
In addition, the chromosome copy number is a relative value established after a batch of samples is normalized together. The method therefore requires a certain number of samples to be compared at the same time, on the premise that most samples are normal diploids, so that outliers can be found after normalization. This places high demands on the samples and is sometimes difficult to meet clinically.
Therefore, there is currently no effective method for detecting single-cell chromosomal copy number variation from the transcriptome.
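For reference, the chromosome-level z-score approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names and the toy totals are hypothetical, and z-scores are assumed to be computed for each chromosome across the batch using the population standard deviation, which the text does not specify.

```python
from statistics import mean, pstdev


def chromosome_zscores(totals):
    """Standardise summed chromosome expression across a batch of samples.

    totals: {sample: {chromosome: summed expression}} (hypothetical layout).
    Assumption: each chromosome is standardised across the samples of the batch.
    """
    chromosomes = list(next(iter(totals.values())))
    z = {sample: {} for sample in totals}
    for chrom in chromosomes:
        values = [totals[sample][chrom] for sample in totals]
        mu, sd = mean(values), pstdev(values)
        for sample in totals:
            # A chromosome with no spread in the batch gets z = 0 everywhere.
            z[sample][chrom] = 0.0 if sd == 0 else (totals[sample][chrom] - mu) / sd
    return z


def flag_outliers(z, cutoff=2.0):
    """Return, per sample, the chromosomes whose |z| exceeds the cutoff."""
    return {s: [c for c, v in row.items() if abs(v) > cutoff] for s, row in z.items()}


# Toy batch: five samples look alike, the sixth has elevated chr21 expression.
toy_totals = {f"s{i}": {"chr1": 1000.0, "chr21": 100.0} for i in range(1, 6)}
toy_totals["s6"] = {"chr1": 1000.0, "chr21": 150.0}

flags = flag_outliers(chromosome_zscores(toy_totals))
print(flags)
```

Because each z-score is relative to the rest of the batch, the flag for a sample depends on the other samples being mostly diploid, which is the batch requirement criticised above.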
Disclosure of Invention
To overcome the deficiencies of the prior art, we developed a transcriptome analysis method. The method uses single-cell transcriptome sequencing data to determine whether the gene expression level on each chromosome of a human embryonic cell matches that of a normal diploid embryo, thereby inferring aneuploidy and helping to evaluate the developmental potential of the embryo. The invention mainly solves two problems. First, unstably expressed genes are screened out by their coefficient of variation, and a gene expression reference system of normal human diploid embryos is established, which eliminates intrinsic noise. Second, chromosome expression levels are corrected at the sample level using the diploid gene expression reference system: the chromosome expression levels of each sample are multiplied by a common coefficient so that their median equals 2, which eliminates the systematic bias in karyotype determination caused by differences between samples.
Specifically, we establish a gene expression matrix of normal diploid embryos to obtain a gene-level reference system that serves as a reference for expression levels. Calculating the relative chromosome expression levels of the embryo under test then indicates its chromosome ploidy. Because an embryo biopsy captures information representative of the whole embryo, transcriptome sequencing can serve as an effective tool for pre-implantation screening. The results indicate that this technique can generate chromosome karyotypes from changes in RNA expression, and that the results are essentially consistent with CNVs calculated by current whole-genome sequencing.
To achieve the above technical effects, the following technical solutions are provided:
In a first aspect of the present invention, there is provided a method for detecting single-cell chromosomal copy number variation (CNV) from the transcriptome, the method comprising the following steps:
(1) Screening for stably expressed genes.
After deleting genes whose average expression level across all diploid samples is less than 1, the coefficient of variation (CV) of the expression level of each remaining gene is calculated as follows:
$$CV = \frac{SD}{Mean}$$
where SD is the standard deviation of the gene's expression level across the samples and Mean is the mean expression level of the gene.
The CV values are ranked from high to low according to their distribution. Genes whose CV values fall in the top 25% are regarded as unstably expressed and are screened out; the remaining genes are stably expressed and are retained for the next calculation;
(2) Calculating the average expression level of each stably expressed gene in the diploid standard samples and combining these values with the gene names to form a new matrix, which is the gene expression reference system of normal human diploid embryos:
Figure 329747DEST_PATH_IMAGE002
(3) Preparation of relative expression quantity matrix
After the diploid gene expression reference system is obtained, the CNV of a clinical sample can be calculated from its transcriptome. First, a relative expression matrix is prepared. Specifically, the genes that overlap with the reference system are selected from the generated expression matrix to form a new matrix, and each gene in the new matrix is divided by the average expression level of the corresponding gene in the reference system to generate the relative expression matrix. Values higher than 4 in the relative expression matrix are capped at 4, so that excessive fluctuation of a single gene does not distort the overall result. Assume that the expression level of a gene X in the reference system is
Figure 296566DEST_PATH_IMAGE003
and that the expression matrix of gene X in all samples to be tested, denoted
Figure 802634DEST_PATH_IMAGE004
is:
Figure 222114DEST_PATH_IMAGE005
The relative expression matrix of gene X is then:
Figure 573461DEST_PATH_IMAGE007
Similarly, the relative expression matrix of all genes is:
Figure 660366DEST_PATH_IMAGE009
(4) Generation and correction of the chromosome-level relative expression matrix, and judgment of CNV
After the relative expression matrix is obtained, the average relative expression level of the genes on each chromosome is calculated. Specifically, using the alignment file downloaded from the UCSC genome database, each gene in the relative expression matrix is assigned to the chromosome on which it is located, and for each chromosome the average relative expression level of its genes is calculated:
Figure 570291DEST_PATH_IMAGE010
where n is the number of genes belonging to chromosome i in the diploid reference system, and
Figure 477067DEST_PATH_IMAGE011
are the relative expression levels of these genes. Performing this calculation once for every chromosome gives a chromosome-level relative expression matrix in which each row holds the relative expression levels of the chromosomes of one sample. The relative expression matrix
Figure 366525DEST_PATH_IMAGE012
is:
Figure 573516DEST_PATH_IMAGE013
Because of individual differences between samples, the expression level of a sample may deviate from the reference system, while most chromosomes of most samples are normal diploid. Therefore, sample by sample, the expression levels of the 22 chromosomes of each sample are multiplied by a coefficient
Figure 155807DEST_PATH_IMAGE014
so that the median chromosome expression level of each sample equals 2, the value that represents a normal diploid. After the chromosome relative expression matrix has been normalized in this way, CNV is judged: a copy number greater than 2.7 is called trisomy, and a copy number less than 1.3 is called monosomy.
In one embodiment, the step of determining the CNV is:
The expression levels of the 22 autosomes of each sample are
Figure 549879DEST_PATH_IMAGE015
The median of these chromosome expression levels is denoted
Figure 243029DEST_PATH_IMAGE016
and the chromosome expression coefficient of the sample, denoted
Figure 038946DEST_PATH_IMAGE017
is then:
Figure 792139DEST_PATH_IMAGE018
After obtaining the value of
Figure 172042DEST_PATH_IMAGE019
the corrected expression levels of the 22 chromosomes of the sample can be calculated as follows:
Figure 934462DEST_PATH_IMAGE020
that is:
Figure 584886DEST_PATH_IMAGE021
The values in the final chromosome relative expression matrix represent the chromosomal copy number variation, namely CNV (Copy Number Variation).
Compared with the prior art, the invention has the following remarkable advantages:
(1) Unstable gene expression is addressed. After genes that are not expressed in the samples (mean RPKM < 1) are screened out, the coefficient of variation (CV) of each remaining gene across all samples, i.e. the standard deviation divided by the mean, is calculated. The more unstable the expression of a gene, the greater its coefficient of variation. After the 25% of genes with the highest CV values are screened out, the remaining genes are stably expressed and are used to generate the diploid expression reference system. This largely eliminates the intrinsic noise caused by unstable gene expression and by differences in baseline expression levels, so that the calculated CNV reflects changes in chromosome copy number. Whereas the prior art screens out only low-expression genes, the method of the invention takes the influence of gene expression stability on CNV into account and uses stably expressed genes as the basis for judging chromosome copy number, which makes the result far more accurate than that of the prior art;
(2) To address systematic error, the invention calculates the actual expression levels of genes and establishes a gene expression reference system of human diploid embryos. The reference system is used to calibrate the actual expression levels, eliminating the influence of differences in baseline expression among genes. Specifically, after the transcriptome matrix of diploid human embryonic trophoblast cells is obtained, the method calculates the gene expression values (RPKM) of a sample and corrects them with the human embryo diploid gene expression reference system to obtain a relative expression matrix. The relative expression matrix removes the differences between genes and brings all genes to the same scale for statistics. The average expression level of the genes on each chromosome is then calculated, and finally all chromosome expression levels of each sample are uniformly scaled up or down so that their median is 2; the resulting matrix gives the chromosome ploidy.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 shows the distribution of the coefficient of variation of each gene;
FIG. 2 shows the CNV results determined by the method.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it should be understood that they are presented herein only to illustrate and explain the present invention and not to limit the present invention.
EXAMPLE 1 construction of the model
Human embryo diploid gene expression level reference system: the average expression level of the stably expressed gene of the normal human diploid embryo was used as a standard control.
1. Single cell transcriptome sequencing
Blastocyst trophectoderm cells were obtained by biopsy; 1, 3, or 5 cells were taken for single-cell transcriptome sequencing.
2. Sequencing data cleaning, alignment, and post-alignment processing
First, data quality control is performed with trim_galore (version 0.6.6): next-generation sequencing adapter sequences and low-quality bases are removed with default parameters, and only reads longer than 36 bp after trimming are retained. Next, alignment is performed using RSEM (version 1.3.3) with hg38 as the reference genome. The expression level of each gene in each sample is calculated using RSEM.
3. Screening of samples and Generation of Gene expression matrices
After the gene expression levels of each sample are obtained, the samples are quality filtered. Samples with more than 5000 genes at RPKM > 1 are taken as qualified samples.
After qualified samples are selected and the expression level (RPKM) of each gene in each sample is obtained, a matrix is prepared whose column names are the samples and whose row names are the genes.
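The sample filter and matrix layout described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the dictionary layout, and the toy values are hypothetical, and the toy call lowers the 5000-gene cutoff to 2 only so that the example stays small.

```python
def count_expressed(profile, min_rpkm=1.0):
    """Number of genes in one sample whose RPKM exceeds min_rpkm."""
    return sum(1 for value in profile.values() if value > min_rpkm)


def qualified_samples(expression, min_genes=5000, min_rpkm=1.0):
    """Keep samples with more than min_genes genes at RPKM > min_rpkm.

    expression: {sample: {gene: RPKM}} (hypothetical layout).
    """
    return [s for s, profile in expression.items()
            if count_expressed(profile, min_rpkm) > min_genes]


def to_matrix(expression, samples):
    """Build a matrix with genes as rows and samples as columns."""
    genes = sorted(set().union(*(expression[s] for s in samples)))
    rows = [[expression[s].get(g, 0.0) for s in samples] for g in genes]
    return genes, rows


# Toy data with three genes; the real cutoff of 5000 genes is replaced by 2 here.
toy = {
    "A": {"G1": 12.0, "G2": 3.5, "G3": 8.0},
    "B": {"G1": 0.4, "G2": 0.0, "G3": 2.0},
    "C": {"G1": 5.0, "G2": 1.5, "G3": 9.0},
}
kept = qualified_samples(toy, min_genes=2)
genes, rows = to_matrix(toy, kept)
print(kept, genes, rows)
```

Sample B is dropped because only one of its genes exceeds RPKM 1; the remaining samples become the columns of the matrix.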
4. Production of human normal diploid embryo gene expression reference system
After the gene expression matrix is obtained, suitable samples and genes are selected to establish the gene expression reference system. First, the whole-genome PGD results of the samples are used to determine whether the trophoblast cells of each embryo sample are normal diploid (gold standard). Second, the transcriptome data of the samples confirmed as normal diploid are used for gene screening and for building the reference system.
Because the number of cells differs between samples and the samples come from different individuals, the total gene expression of each sample, denoted
Figure 243400DEST_PATH_IMAGE022
differs between samples. To make the samples comparable, the average of the total expression of the normal diploid samples is calculated, denoted
Figure 877644DEST_PATH_IMAGE023
Then the expression level of each gene in each sample, denoted
Figure 912596DEST_PATH_IMAGE024
is scaled up or down by a common factor so that the sample's total gene expression
Figure 683106DEST_PATH_IMAGE025
equals the average total gene expression
Figure 778101DEST_PATH_IMAGE026
as follows:
Figure 634062DEST_PATH_IMAGE027
where n is the number of samples;
Figure 738284DEST_PATH_IMAGE028
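The total-expression correction described above can be illustrated with a short sketch. It assumes, as the surrounding text suggests, that every gene in a sample is multiplied by (average total expression) / (that sample's total expression); the function name and toy values are hypothetical.

```python
def equalise_totals(expression):
    """Scale each sample so its total expression equals the mean total.

    expression: {sample: {gene: expression level}} (hypothetical layout).
    """
    totals = {s: sum(profile.values()) for s, profile in expression.items()}
    target = sum(totals.values()) / len(totals)  # average total across samples
    return {
        s: {g: v * target / totals[s] for g, v in profile.items()}
        for s, profile in expression.items()
    }


# Toy data: sample B has three times the total expression of sample A.
toy = {
    "A": {"G1": 10.0, "G2": 30.0},   # total 40
    "B": {"G1": 30.0, "G2": 90.0},   # total 120
}
scaled = equalise_totals(toy)
print(scaled)
```

After scaling, both toy samples have a total of 80 (the mean of 40 and 120), so they differ only in how expression is distributed among genes.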
After the gene expression levels are corrected, genes whose average expression level across all diploid samples is less than 1 are deleted; these genes are regarded as not expressed and do not affect the judgment of the chromosome karyotype. The coefficient of variation (CV) of the expression level of each remaining gene is then calculated as follows:
$$CV = \frac{SD}{Mean}$$
where SD is the standard deviation of the gene's expression level across the samples and Mean is the mean expression level of the gene.
The greater the coefficient of variation, the less stable the expression of the gene is considered to be.
The CV values are ranked from high to low according to their distribution. Genes whose CV values fall in the top 25% are regarded as unstably expressed and are screened out; the remaining genes are stably expressed and are retained for the next calculation.
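The gene screening step can be sketched as below. This is an illustration under assumptions: the population standard deviation is used (the text does not say whether SD is the sample or population form), genes with a mean of exactly 1 are kept, and the function names and toy values are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """CV = SD / Mean, using the population SD (an assumption)."""
    return pstdev(values) / mean(values)


def stable_genes(matrix, min_mean=1.0, drop_fraction=0.25):
    """Return the genes kept after the expression and CV filters.

    matrix: {gene: [expression in each diploid sample]} (hypothetical layout).
    """
    # Step 1: drop genes whose mean expression is below min_mean.
    expressed = {g: v for g, v in matrix.items() if mean(v) >= min_mean}
    # Step 2: rank by CV from high to low and drop the top fraction.
    ranked = sorted(expressed,
                    key=lambda g: coefficient_of_variation(expressed[g]),
                    reverse=True)
    n_drop = int(len(ranked) * drop_fraction)
    return sorted(ranked[n_drop:])


# Toy data: LOW is unexpressed, G4 is the most variable of the expressed genes.
toy = {
    "LOW": [0.2, 0.8, 0.5],
    "G1": [10.0, 10.0, 10.0],
    "G2": [9.0, 10.0, 11.0],
    "G3": [5.0, 10.0, 15.0],
    "G4": [1.0, 10.0, 40.0],
}
kept = stable_genes(toy)
print(kept)
```

With four expressed genes, the top 25% by CV is a single gene (G4), so three stable genes remain.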
The average expression level of each gene in the diploid standard samples is then calculated and combined with the gene names to form a new matrix, which is the gene expression reference system of normal human diploid embryos:
Figure 862152DEST_PATH_IMAGE002
5. Preparation of the relative expression matrix
After the diploid gene expression reference system is obtained, the CNV of a clinical sample can be calculated from its transcriptome. First, a relative expression matrix is prepared. Specifically, the genes that overlap with the reference system are selected from the matrix generated in step 3 to form a new matrix. Each gene in the new matrix is divided by the average expression level of the corresponding gene in the reference system to generate the relative expression matrix.
Specifically, assume that the expression level of a gene X in the reference system is
Figure 939830DEST_PATH_IMAGE029
and that the expression matrix of gene X in all samples to be tested, denoted
Figure 847743DEST_PATH_IMAGE030
is:
Figure 327266DEST_PATH_IMAGE031
The relative expression matrix of gene X is then:
Figure 764063DEST_PATH_IMAGE032
Similarly, the relative expression matrix of all genes is:
Figure 329037DEST_PATH_IMAGE034
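The construction of the reference system and the relative expression matrix can be sketched as follows. The function names, data layout, and toy values are hypothetical; the cap of 4 on relative values is taken from the Disclosure section rather than from this example.

```python
def build_reference(diploid_matrix):
    """Mean expression of each stable gene across the diploid samples.

    diploid_matrix: {gene: [expression in each diploid sample]} (hypothetical).
    """
    return {g: sum(v) / len(v) for g, v in diploid_matrix.items()}


def relative_expression(test_matrix, reference, cap=4.0):
    """Divide each gene by its reference mean, keeping only shared genes.

    test_matrix: {gene: {sample: expression}} (hypothetical layout).
    Values above `cap` are limited to `cap`, as stated in the Disclosure.
    """
    return {
        g: {s: min(v / reference[g], cap) for s, v in row.items()}
        for g, row in test_matrix.items()
        if g in reference
    }


reference = build_reference({"G1": [10.0, 12.0, 8.0], "G2": [100.0, 100.0, 100.0]})
toy_test = {
    "G1": {"T1": 15.0, "T2": 50.0},
    "G2": {"T1": 100.0, "T2": 90.0},
    "G9": {"T1": 7.0, "T2": 7.0},   # not in the reference system, so ignored
}
relative = relative_expression(toy_test, reference)
print(relative)
```

In the toy data, gene G1 in sample T2 is five times the reference mean and is therefore capped at 4, while G9 is discarded because it does not overlap with the reference system.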
6. Generation and correction of the chromosome-level relative expression matrix, and judgment of CNV
After the relative expression matrix is obtained, the average relative expression level of the genes on each chromosome is calculated. Specifically, using the alignment file downloaded from the UCSC genome database, each gene in the relative expression matrix is assigned to the chromosome on which it is located. The average relative expression level of the genes on each chromosome is calculated:
Figure 509483DEST_PATH_IMAGE010
where n is the number of genes belonging to chromosome i in the diploid reference system, and
Figure 109091DEST_PATH_IMAGE011
are the relative expression levels of these genes. Performing this calculation once for every chromosome gives a chromosome-level relative expression matrix in which each row holds the relative expression levels of the chromosomes of one sample. The relative expression matrix
Figure 215325DEST_PATH_IMAGE012
is:
Figure 267595DEST_PATH_IMAGE035
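The per-chromosome averaging can be sketched as follows. The gene-to-chromosome dictionary stands in for the mapping derived from the UCSC file and is hypothetical, as are the function name and toy values; the sketch also assumes every gene has a value for every sample.

```python
from collections import defaultdict


def chromosome_means(relative, gene_to_chrom):
    """Average the relative expression of the genes on each chromosome.

    relative: {gene: {sample: relative expression}} (hypothetical layout).
    gene_to_chrom: {gene: chromosome}, a stand-in for the UCSC-derived mapping.
    Returns {sample: {chromosome: mean relative expression}}.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for gene, row in relative.items():
        chrom = gene_to_chrom.get(gene)
        if chrom is None:
            continue  # gene without a chromosome assignment is skipped
        counts[chrom] += 1
        for sample, value in row.items():
            sums[sample][chrom] += value
    return {
        sample: {chrom: total / counts[chrom] for chrom, total in per_chrom.items()}
        for sample, per_chrom in sums.items()
    }


toy_relative = {
    "G1": {"A": 1.0, "B": 1.2},
    "G2": {"A": 1.0, "B": 0.8},
    "G3": {"A": 1.5, "B": 1.0},
}
toy_map = {"G1": "chr1", "G2": "chr1", "G3": "chr21"}
by_chrom = chromosome_means(toy_relative, toy_map)
print(by_chrom)
```

Each sample ends up with one value per chromosome, matching the row-per-sample layout described for the chromosome-level matrix.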
Because of individual differences between samples, the expression level of a sample may deviate from the reference system, while most chromosomes of most samples are normal diploid. Therefore, sample by sample, the expression levels of the 22 chromosomes of each sample are multiplied by a coefficient
Figure 251731DEST_PATH_IMAGE014
so that the median chromosome expression level of each sample equals 2, the value that represents a normal diploid. After the chromosome relative expression matrix has been normalized in this way, CNV is judged.
For example, for sample A, the expression levels of the 22 autosomes are
Figure 705846DEST_PATH_IMAGE015
The median of these chromosome expression levels is denoted
Figure 750026DEST_PATH_IMAGE016
and the chromosome expression coefficient of the sample, denoted
Figure 289592DEST_PATH_IMAGE017
is then:
Figure 077419DEST_PATH_IMAGE018
After obtaining the value of
Figure 386041DEST_PATH_IMAGE019
the corrected expression levels of the 22 chromosomes of the sample can be calculated as follows:
Figure 335542DEST_PATH_IMAGE036
namely:
Figure 362404DEST_PATH_IMAGE021
The values in the final chromosome relative expression matrix represent the chromosomal copy number variation, generally referred to as CNV (Copy Number Variation).
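The sample-level correction can be illustrated with a short sketch. Since the coefficient must bring the median chromosome value to 2, it is taken here as 2 divided by the sample's median; the function name and the five-chromosome toy sample are hypothetical (a real sample would have 22 autosomes).

```python
from statistics import median


def correct_to_diploid(chrom_expr, target=2.0):
    """Scale one sample's chromosome values so that their median equals target.

    chrom_expr: {chromosome: relative expression} for a single sample.
    """
    coefficient = target / median(chrom_expr.values())
    return {chrom: value * coefficient for chrom, value in chrom_expr.items()}


# Toy sample with five chromosomes; chr21 is elevated relative to the rest.
toy_sample = {"chr1": 0.9, "chr2": 1.0, "chr3": 1.1, "chr4": 1.0, "chr21": 1.5}
corrected = correct_to_diploid(toy_sample)
print(corrected)
```

The median of the toy sample is 1.0, so every value is doubled: the typical chromosome lands at 2 and the elevated chr21 at 3.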
7. Thresholding and single cell chromosome copy number visualization
The chromosome copy numbers are classified using thresholds for clinical judgment. Clinically, a copy number of 1 represents a chromosomal deletion and a copy number of 3 represents a chromosomal duplication. However, mosaic (chimeric) embryos exist, in which some cells are normal diploid and others are monosomic or trisomic, and current sequencing pools both kinds of cells together, so the measured CNV is often not an integer. Therefore, following the thresholds used for DNA-based detection of chromosome copy number, 0-1.3 is classified as deletion (monosomy), 1.3-1.7 as mosaic deletion, 1.7-2.3 as normal diploid, 2.3-2.7 as mosaic duplication, and greater than 2.7 as duplication (trisomy).
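The thresholds above translate into a simple classifier. This is a sketch only: the function name and labels are hypothetical, and the treatment of values that fall exactly on a boundary (1.3, 1.7, 2.3, 2.7) is an assumption, because the text does not say which interval includes them.

```python
def classify_copy_number(copy_number):
    """Map a corrected chromosome value to a clinical category.

    Boundary handling is an assumption: exactly 1.3 counts as mosaic deletion,
    exactly 1.7 and 2.3 count as normal diploid, and exactly 2.7 counts as
    mosaic duplication.
    """
    if copy_number < 1.3:
        return "deletion (monosomy)"
    if copy_number < 1.7:
        return "mosaic (chimeric) deletion"
    if copy_number <= 2.3:
        return "normal diploid"
    if copy_number <= 2.7:
        return "mosaic (chimeric) duplication"
    return "duplication (trisomy)"


calls = {value: classify_copy_number(value) for value in (1.0, 1.5, 2.0, 2.5, 3.0)}
for value, label in calls.items():
    print(value, label)
```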
Example 2 Identification of single-cell chromosome copy number using single-cell RNA sequencing data from blastocyst biopsies
1. Screening of samples and Generation of Gene expression matrices
After the gene expression levels of each sample are obtained, the samples are quality filtered. At the gene level, RPKM > 1 is defined as expressed; at the sample level, samples with more than 5000 genes at RPKM > 1 are qualified.
After screening, a total of 39 samples were obtained for the next calculation. Of these, 16 samples were shown by DNA sequencing (gold standard) to be diploid with a normal chromosome number; these 16 were used to build the reference system and the other 23 were used for validation. After the expression level (RPKM) of each gene in each sample was obtained, a matrix whose column names are the samples and whose row names are the gene names was prepared separately for the reference samples and for the validation samples.
2. Production of human normal diploid embryo gene expression reference system
The reference sample matrix is used for gene screening and for building the reference system.
First, the genes used to build the gene expression reference system are selected. The total gene expression of the samples is averaged, and each sample is then scaled up or down so that its total equals this average.
Second, genes with an average expression value of RPKM < 1 are deleted; these genes are regarded as not expressed and therefore do not affect the judgment of the chromosome karyotype. The coefficient of variation (CV) of the expression level of each remaining gene is then calculated. The larger the coefficient of variation, the more unstable the expression of the gene is considered to be.
The distribution of the coefficient of variation for each gene in the present 16 samples is shown in FIG. 1.
In the figure, the vertical axis represents the number of genes. The coefficients of variation of 75% of the genes are concentrated between 0 and 1, so genes with CV > 1 are regarded as unstably expressed and are screened out, leaving 7390 stably expressed genes.
After the unstable genes are screened out, the remaining genes are used to build the reference system and for subsequent calculations, as follows:
Figure 186878DEST_PATH_IMAGE037
The average expression level of each gene in the 16 standard samples is calculated and combined with the gene names to form a new matrix, which is the gene expression reference system of normal human diploid embryos, as shown below:
Figure 615586DEST_PATH_IMAGE038
3. Preparation of the relative expression matrix
After the diploid gene expression reference system is obtained, it can be used to detect the chromosome copy number of the 23 validation samples.
First, a relative expression matrix with respect to the reference system is prepared. Specifically, the 7390 genes that overlap with the reference system are selected from the matrix generated in step 1 to form a new matrix. Each gene in the new matrix is divided by the average expression level of the corresponding gene in the diploid embryo gene expression reference system to generate the relative expression matrix, as shown below:
Figure 735988DEST_PATH_IMAGE039
4. Generation and correction of the chromosome-level relative expression matrix, and judgment of CNV
After the relative expression matrix is obtained, the average relative expression level of the genes on each chromosome is calculated. Specifically, using the alignment file downloaded from the UCSC genome database, each gene in the relative expression matrix is assigned to the chromosome on which it is located. For each chromosome, the average relative expression level of its genes is calculated, giving the chromosome relative expression matrix:
Figure 515725DEST_PATH_IMAGE040
Because of individual differences between samples, the expression level of a sample may deviate from the reference system, while most chromosomes of most samples are normal diploid. Therefore, sample by sample, the expression levels of the 22 chromosomes of each sample are multiplied by a coefficient
Figure 379776DEST_PATH_IMAGE014
so that the median chromosome expression level of each sample equals 2, the value that represents a normal diploid. After the chromosome relative expression matrix has been normalized in this way, CNV is judged.
The coefficients for the 23 samples to be tested are:
[Image not reproduced: coefficients of the 23 samples]
After the coefficient values are obtained, the chromosome relative expression levels are calibrated by multiplying them by the coefficient of the corresponding sample, giving the final relative expression levels of the 22 chromosomes in each sample:
[Image not reproduced: calibrated chromosome relative expression matrix]
The values in this matrix are the calculated chromosome CNV (Copy Number Variation) values.
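As an illustration of the calibration step, the sketch below scales one sample so that its median chromosome value equals 2. The function name `normalize_to_diploid` and the dictionary input are assumptions made for this example; only the rule (multiply by a per-sample coefficient so that the median becomes 2) comes from the text.

```python
from statistics import median


def normalize_to_diploid(chrom_expr):
    """Scale one sample so that its median chromosome expression equals 2.

    chrom_expr : dict chromosome -> mean relative expression for one sample
    Returns (coefficient, calibrated dict chromosome -> copy number value).
    """
    m = median(chrom_expr.values())
    k = 2.0 / m  # per-sample coefficient: median * k == 2
    return k, {chrom: k * value for chrom, value in chrom_expr.items()}
```

Using the median rather than the mean keeps a single abnormal chromosome from shifting the coefficient, which is consistent with the stated assumption that most chromosomes in a sample are normal diploid.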
5. Thresholding and single cell chromosome copy number visualization
The chromosome copy number values are then classified by thresholds for clinical interpretation: values of 0-1.3 indicate a deletion (monosomy), values of 1.3-2.7 indicate a normal diploid or mosaic, and values greater than 2.7 indicate a duplication (trisomy). The CNV results determined by the present method according to these thresholds are shown in FIG. 2.
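The thresholding can be sketched in a few lines. The function name `classify_copy_number` is hypothetical, and the handling of the boundary at exactly 1.3 is an assumption, since the text gives the ranges 0-1.3 and 1.3-2.7 without saying which range that boundary belongs to.

```python
def classify_copy_number(value, low=1.3, high=2.7):
    """Map a calibrated chromosome value to a copy number call.

    Thresholds follow the text: 0-1.3 deletion, 1.3-2.7 normal diploid
    or mosaic (chimeric), above 2.7 duplication.
    Assumption: a value of exactly 1.3 is treated as normal.
    """
    if value < low:
        return "deletion (monosomy)"
    if value <= high:
        return "normal diploid or mosaic"
    return "duplication (trisomy)"
```

For example, calibrated values of 1.0, 2.0 and 3.0 fall into the deletion, normal and duplication classes respectively.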
6. Verification of the results
The results of identifying chromosomal variation in human embryonic cells using single-cell transcriptome sequencing data and the gene expression reference system are shown in FIG. 2 above. The following table compares, for this batch of samples, the gold-standard results (copy number variation determined by DNA whole-genome sequencing) with the results obtained using the method of the invention (identification of single-cell chromosome copy number from RNA whole-transcriptome sequencing):
[Table images not reproduced: comparison of the whole-genome sequencing results and the results of the present method for the 23 samples]
As can be seen from the above table, for all 23 samples the single-cell chromosome copy numbers identified by the whole-transcriptome sequencing method are completely consistent with the embryo results based on whole-genome sequencing. The diagnostic accuracy obtained with the conventional method is only 43.4%, whereas the diagnostic accuracy of the present invention reaches 100%.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (2)

1. A method for detecting chromosomal copy number variation in a single cell by transcriptome, said method comprising the steps of:
after the gene expression levels are normalized by RPKM, deleting the genes whose average expression level across all diploid samples is less than 1; then calculating the coefficient of variation (CV) of the expression level of each remaining gene as follows:
$$CV = \frac{SD}{Mean}$$
where SD is the standard deviation of the expression level of the gene across the samples, and Mean is the average expression level of the gene;
ranking the coefficient of variation values from high to low according to their distribution, selecting the genes whose coefficient of variation values fall in the top 25 percent, regarding these as unstably expressed genes and screening them out, the remaining genes being stably expressed genes that are retained and used for the next calculation;
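The gene screening described above (mean filter, coefficient of variation, removal of the top 25 percent) can be sketched as below. This is an illustrative sketch only: the function name `select_stable_genes` and the input layout are assumptions, the sample standard deviation is used because the text does not say whether the sample or population form is meant, and the number of genes to drop is rounded down.

```python
from statistics import mean, stdev


def select_stable_genes(expr, min_mean=1.0, unstable_fraction=0.25):
    """Return the genes regarded as stably expressed.

    expr : dict gene -> list of RPKM values across the diploid standard samples
    """
    # Delete genes whose average expression level is less than 1.
    kept = {gene: values for gene, values in expr.items() if mean(values) >= min_mean}

    # CV = SD / Mean for each remaining gene (sample SD: an assumption).
    cv = {gene: stdev(values) / mean(values) for gene, values in kept.items()}

    # Rank from high to low and screen out the top 25 percent as unstable.
    ranked = sorted(cv, key=cv.get, reverse=True)
    n_drop = int(len(ranked) * unstable_fraction)  # rounding down is an assumption
    unstable = set(ranked[:n_drop])

    return [gene for gene in kept if gene not in unstable]
```

Because genes with a mean below 1 are removed first, the division by the mean in the CV step cannot hit zero.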
calculating the average expression level of each gene in the diploid standard samples and combining it with the genes into a new matrix, the matrix being the gene expression reference system of the normal human diploid embryo:
[Image not reproduced: gene expression reference system matrix]
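Building the reference system amounts to one mean per retained gene. The sketch below assumes the same hypothetical dictionary layout as a gene-to-values table; the name `build_reference` is introduced here for illustration only.

```python
from statistics import mean


def build_reference(expr, stable_genes):
    """Gene expression reference system: gene -> mean expression.

    expr         : dict gene -> list of RPKM values across the diploid standard samples
    stable_genes : genes retained after the stability screening
    """
    return {gene: mean(expr[gene]) for gene in stable_genes}
```

The result pairs each stably expressed gene with its average expression level in the diploid standard samples, which is the quantity later used as the divisor for relative expression.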
(3) Preparation of the relative expression matrix
After the diploid gene expression reference system is obtained, the copy number variation of clinical samples is calculated from the transcriptome. First, a relative expression matrix is prepared: the genes that overlap the reference system are selected from the generated matrix to form a new matrix, and the expression level of each gene in the new matrix is divided by the average expression level of the corresponding gene in the reference system to generate the relative expression matrix; relative expression levels exceeding 4 are limited to 4, so that the fluctuation of a single gene does not have an excessive influence on the whole. Suppose that gene X has the expression level $\bar{x}$ in the reference system and that its expression levels in the $m$ samples to be tested form the matrix
$$X = (x_1, x_2, \dots, x_m)$$
The relative expression matrix of gene X is then:
$$R_X = \left(\frac{x_1}{\bar{x}}, \frac{x_2}{\bar{x}}, \dots, \frac{x_m}{\bar{x}}\right)$$
Similarly, with $x_{g,s}$ the expression level of gene $g$ in sample $s$ and $\bar{x}_g$ the expression level of gene $g$ in the reference system, the relative expression matrix of all $N$ genes is:
$$R = \left(\frac{x_{g,s}}{\bar{x}_g}\right)_{g = 1, \dots, N;\ s = 1, \dots, m}$$
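The division by the reference and the limit of 4 can be illustrated as follows. The function name `relative_expression` and the dictionary inputs are assumptions of this sketch, and the limit is applied to the ratio after division, which is one reading of the claim wording.

```python
def relative_expression(sample_expr, reference, cap=4.0):
    """Relative expression of each overlapping gene in each test sample.

    sample_expr : dict gene -> list of expression values, one per test sample
    reference   : dict gene -> mean expression in the diploid reference system
    cap         : upper limit for the relative expression (4 in the text)
    """
    rel = {}
    for gene, ref_mean in reference.items():
        if gene not in sample_expr:
            continue  # only genes that overlap the reference system are used
        # Divide by the reference mean, then limit the ratio to the cap.
        rel[gene] = [min(value / ref_mean, cap) for value in sample_expr[gene]]
    return rel
```

A value of 1 then means the gene is expressed at the same level as in the diploid reference, and a single strongly over-expressed gene contributes at most 4 to any later average.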
(4) Generation and correction of the per-chromosome relative expression matrix, and determination of copy number variation
After the relative expression matrix is obtained, the average relative expression level of the genes on each chromosome is calculated, taking the chromosome as the unit; specifically, using the alignment file downloaded from the UCSC genome database, each gene in the relative expression matrix is assigned to the chromosome on which it resides, and for each chromosome the average relative expression level of the genes it contains is calculated:
$$E_i = \frac{1}{n} \sum_{g=1}^{n} r_g$$
where $n$ is the number of genes belonging to chromosome $i$ in the diploid reference system and $r_g$ is the relative expression level of the $g$-th of these genes;
this calculation is performed once for each chromosome to obtain a relative expression matrix in units of chromosomes, in which each row contains the relative expression levels of the chromosomes of one sample; for $m$ samples the relative expression matrix $E$ is:
$$E = \begin{pmatrix} E_{1,1} & E_{1,2} & \cdots & E_{1,22} \\ \vdots & \vdots & & \vdots \\ E_{m,1} & E_{m,2} & \cdots & E_{m,22} \end{pmatrix}$$
because individual differences between samples cause the expression level of a sample to deviate from the reference system, and because most chromosomes of most samples are normal diploid, the expression levels of the 22 chromosomes of each sample are multiplied by a per-sample coefficient $k$; after the chromosome relative expression matrix has been normalized in this step, the copy number variation is determined.
2. The method of claim 1, wherein the determining of copy number variation comprises:
the expression levels of the 22 autosomes in each sample are $E_1, E_2, \dots, E_{22}$, and the median of these chromosome expression levels is denoted $M$; the chromosome expression coefficient $k$ of the sample is then:
$$k = \frac{2}{M}$$
after the value of $k$ is obtained, the expression levels of the 22 chromosomes of the sample are calculated as follows:
$$E_i' = k \cdot E_i, \quad i = 1, \dots, 22$$
the values in the final chromosome relative expression matrix represent the chromosome copy numbers, i.e. the copy number variation.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211322202.9A CN115394359B (en) 2022-10-27 2022-10-27 Method for detecting single cell chromosome copy number variation through transcriptome

Publications (2)

Publication Number Publication Date
CN115394359A (en) 2022-11-25
CN115394359B (en) 2023-03-24

Family

ID=84128872

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211322202.9A Active CN115394359B (en) 2022-10-27 2022-10-27 Method for detecting single cell chromosome copy number variation through transcriptome

Country Status (1)

Country Link
CN (1) CN115394359B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130184999A1 (en) * 2012-01-05 2013-07-18 Yan Ding Systems and methods for cancer-specific drug targets and biomarkers discovery
US20130325360A1 (en) * 2011-10-06 2013-12-05 Sequenom, Inc. Methods and processes for non-invasive assessment of genetic variations
CN105722994A (en) * 2013-06-17 2016-06-29 维里纳塔健康公司 Method for determining copy number variations in sex chromosomes
CN111363831A (en) * 2020-04-15 2020-07-03 兰州大学 Method for detecting sheep PRAMEY gene copy number variation and application thereof
CN113192555A (en) * 2021-04-21 2021-07-30 杭州博圣医学检验实验室有限公司 Method for detecting copy number of second-generation sequencing data SMN gene by calculating sequencing depth of differential allele

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MICHALIS KONSTANTINIDIS 等: "Aneuploidy and recombination in the human preimplantation embryo. Copy number variation analysis and genome-wide polymorphism genotyping", 《REPRODUCTIVE BIOMEDICINE ONLINE》 *
赵红翠 等: "卵母细胞及胚胎mtDNA 拷贝数在辅助生殖技术研究中的重要价值", 《现代妇产科进展》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117721222A (en) * 2024-02-07 2024-03-19 北京大学第三医院(北京大学第三临床医学院) Method for predicting embryo implantation by single cell transcriptome and application
CN117721222B (en) * 2024-02-07 2024-05-10 北京大学第三医院(北京大学第三临床医学院) Method for predicting embryo implantation by single cell transcriptome and application

Also Published As

Publication number Publication date
CN115394359B (en) 2023-03-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant