CN115394359A - Method for identifying human embryonic cell chromosome variation and application - Google Patents
- Publication number
- CN115394359A (application number CN202211322202.9A)
- Authority
- CN
- China
- Prior art keywords
- expression
- matrix
- genes
- gene
- chromosome
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
- G16B25/10—Gene or protein expression profiling; Expression-ratio estimation or normalisation
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/10—Ploidy or copy number detection
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
- G16B20/50—Mutagenesis
Landscapes
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Genetics & Genomics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Biotechnology (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Theoretical Computer Science (AREA)
- Chemical & Material Sciences (AREA)
- Analytical Chemistry (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention relates to a method for identifying chromosomal variation in human embryonic cells. By establishing a normal diploid gene expression matrix, the method obtains a gene-level reference system against which expression levels can be compared. The relative chromosome expression levels calculated for an embryo under test indicate the embryo's chromosome ploidy. Because an embryo biopsy captures information representative of the whole embryo, transcriptome sequencing can serve as an effective tool for pre-implantation screening. The method can generate chromosome karyotypes from changes in RNA expression, and its results are essentially consistent with CNVs calculated by existing whole-genome sequencing.
Description
Technical Field
The invention relates to the field of medical detection, in particular to a method for identifying human embryonic cell chromosome variation and application thereof.
Background
Less than half of human zygotes survive to birth, and some fetuses are born with genetic disease, mainly because of chromosomal deletions or duplications of meiotic or mitotic origin. Embryos are currently selected for transfer to the uterus using a combination of morphological criteria, developmental kinetics and genetic testing for aneuploidy. However, no single criterion guarantees that a viable embryo is selected. Transcriptome profiling can help identify embryos with high developmental potential, but the chromosomal copy number variation (CNV) of the embryos must be known at the same time. Chromosomal CNV can be obtained by bulk DNA assays or by comparing multiple biopsies of a few embryonic cells, but these methods are based on genome sequencing and cannot provide transcriptome information simultaneously.
At present, two techniques use single-cell transcriptome sequencing to identify chromosomal variation in human embryonic cells before implantation:
(1) RNA-seq libraries are generated from a trophectoderm biopsy and from the remaining whole embryo. Based on the RNA expression values of each sample, the method uses the z-score for normalization, builds an RNA digital karyotype for each autosome across a batch of samples, and applies a threshold: chromosomes with a z-score greater than 2 or less than -2 are treated as outliers and reported as chromosomal variation.
(2) The embryo karyotype is classified from transcriptome data, but deeper RNA-seq sequencing is required: aneuploidy is inferred by combining SNP-based detection of allelic imbalance with detection of dosage-related changes in gene expression.
The first of these approaches removes the noisiest genes (those expressed at < 1 RPKM in all samples), then treats each whole chromosome as a transcription unit and computes a z-score for the total gene expression on each chromosome. Chromosomes with z-scores greater than 2 or less than -2 are considered outliers, and the karyotype of the embryo is judged on this basis. However, this method has the following disadvantages:
First, unstable gene expression can distort the determination of chromosome copy number. Some genes are expressed very highly, up to ten thousand RPKM, while others have single-digit expression levels. A highly expressed gene therefore has a large influence on the total transcript amount of the chromosome on which it is located. In particular, when such highly expressed genes are not themselves stably expressed, they introduce excessive intrinsic noise into the karyotype determination. The method filters out only non-expressed genes and does nothing about highly expressed ones, so for chromosomes with few genes, such as chromosome 21, the karyotype calculation is easily skewed by highly expressed genes.
Second, there is a systematic error. In diploid human embryonic trophoblast cells, gene expression is highly heterogeneous between individuals, so measuring chromosome ploidy directly from gene expression levels introduces large errors. The method normalizes only at the chromosome level and applies no correction or normalization at the sample level, so differences in total expression between samples cause bias: chromosomes in samples with low total expression (or few cells) are more likely to be called as deletions, and vice versa.
Finally, the relative chromosome copy number is established only after a whole batch of samples has been normalized together. The method therefore requires a certain number of samples to be compared simultaneously and assumes that most of them are normal diploids, so that outliers can be found after normalization. This places high demands on the samples and is sometimes difficult to achieve clinically.
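For comparison, the prior-art z-score procedure described above can be sketched in Python as follows. This is a minimal illustration, not the published implementation: the function name `zscore_karyotype`, the (genes x samples) NumPy layout, integer chromosome labels and the use of the sample standard deviation are all assumptions. Because each chromosome is scored only against the other samples in the same batch, the same aberrant sample can be flagged in a batch of ten samples yet missed in a batch of five.

```python
import numpy as np

def zscore_karyotype(expr, gene_chrom, n_chrom=22, z_cut=2.0):
    """Prior-art style batch z-score call (illustrative sketch).

    expr:       (genes x samples) RPKM matrix for ONE batch of samples.
    gene_chrom: chromosome number (1..n_chrom) of each row of expr.
    Returns (z, abnormal), both shaped (samples x chromosomes).
    """
    expr = np.asarray(expr, dtype=float)
    gene_chrom = np.asarray(gene_chrom)
    # Remove genes below 1 RPKM in every sample (the only gene filter used).
    keep = (expr >= 1).any(axis=1)
    expr, gene_chrom = expr[keep], gene_chrom[keep]
    # Total expression of each chromosome in each sample.
    totals = np.stack(
        [expr[gene_chrom == c].sum(axis=0) for c in range(1, n_chrom + 1)],
        axis=1,
    )
    # z-score each chromosome across the batch; the result depends on
    # which other samples happen to be in the batch.
    mu = totals.mean(axis=0)
    sd = totals.std(axis=0, ddof=1)
    sd = np.where(sd > 0, sd, np.inf)  # constant chromosome -> z = 0
    z = (totals - mu) / sd
    return z, np.abs(z) > z_cut
```

With n samples and `ddof=1`, a single outlier can reach at most |z| = (n - 1) / sqrt(n), which is below 2 for n = 5; this is one concrete way the batch-size requirement shows up.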
Therefore, there is currently no effective method for detecting single-cell chromosomal copy number variation from the transcriptome.
Disclosure of Invention
To overcome these deficiencies of the prior art, we developed a transcriptome analysis method. It infers aneuploidy, and thereby helps evaluate the developmental potential of embryos, by using single-cell transcriptome sequencing data to determine whether the gene expression on each chromosome of a human embryonic cell matches that of a normal diploid embryo. The invention mainly solves two problems. First, genes with unstable expression are screened out using the coefficient of variation, and a gene expression reference system for normal human diploid embryos is established, eliminating intrinsic noise. Second, chromosome expression levels are corrected at the sample level using the diploid gene expression reference system: all chromosome expression levels of a sample are multiplied by a common coefficient so that the median chromosome expression of each sample equals 2, eliminating systematic bias in karyotype calls caused by differences between samples.
Specifically, we establish a normal diploid gene expression matrix to obtain a gene-level reference system for expression levels. The relative chromosome expression levels calculated for an embryo under test indicate its chromosome ploidy. Because an embryo biopsy captures information representative of the whole embryo, transcriptome sequencing can serve as an effective tool for pre-implantation screening. The results show that this technique can generate chromosome karyotypes from changes in RNA expression, and that its results are essentially consistent with CNVs calculated by current whole-genome sequencing.
In order to achieve the above technical effects, the following technical solutions are specifically provided:
In a first aspect of the present invention, there is provided a method for detecting single-cell chromosomal copy number variation (CNV) from the transcriptome, the method comprising the following steps:
(1) Screening for stably expressed genes.
After deleting the genes whose mean expression level across all diploid samples is less than 1, the coefficient of variation (CV) of the expression level of each remaining gene is calculated as follows:
$$\mathrm{CV} = \frac{\mathrm{SD}}{\mathrm{Mean}}$$
where SD is the standard deviation of the gene's expression across the samples and Mean is the mean expression level of the gene.
Based on the distribution of the coefficients of variation, the CV values are sorted from high to low; the genes whose CV values fall in the top 25% are regarded as unstably expressed and removed, and the remaining, stably expressed genes are retained for the next calculation;
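The two filters of step (1) can be sketched as follows. This is an illustrative sketch, not the patent's code: the function name `stable_gene_mask`, the NumPy (genes x samples) layout and the use of the sample standard deviation (`ddof=1`) are assumptions. Because every gene is measured over the same samples, the choice of `ddof` rescales all CVs by the same constant and does not change which genes fall in the top 25%.

```python
import numpy as np

def stable_gene_mask(expr, min_mean=1.0, drop_frac=0.25):
    """Return a boolean mask (one entry per gene) of stably expressed genes.

    expr: (genes x samples) expression matrix of the normal diploid samples.
    """
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=1)
    # Genes with mean expression < min_mean are treated as not expressed.
    expressed = np.flatnonzero(mean >= min_mean)
    # CV = SD / Mean for each expressed gene, across samples.
    cv = expr[expressed].std(axis=1, ddof=1) / mean[expressed]
    # Sort CV from high to low and drop the top drop_frac as unstable.
    n_drop = int(round(drop_frac * len(expressed)))
    order = np.argsort(-cv, kind="stable")
    keep = np.ones(len(expressed), dtype=bool)
    keep[order[:n_drop]] = False
    mask = np.zeros(expr.shape[0], dtype=bool)
    mask[expressed[keep]] = True
    return mask
```

Example 2 below uses a fixed cut-off (CV > 1) instead of a rank-based top 25%; that variant would replace the sorting with `cv <= 1`.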
(2) Calculate the average expression level of each stably expressed gene across the diploid standard samples; together with the gene names, these averages form a new matrix, which is the gene expression reference system of the normal human diploid embryo.
(3) Preparation of the relative expression matrix
After the diploid gene expression reference system is obtained, the CNV of a clinical sample can be calculated from its transcriptome. First, a relative expression matrix is prepared: genes overlapping the reference system are selected from the generated matrix to form a new matrix, and the expression of each gene in the new matrix is divided by the average expression of the corresponding gene in the reference system. In the relative expression matrix, values higher than 4 are capped at 4, so that excessive fluctuation of a single gene does not dominate the result. Assume that the expression level of gene X in the reference system is $\bar{E}_X$ and that the expression vector of gene X in all m samples to be tested is:
$$E_X = (E_{X,1}, E_{X,2}, \ldots, E_{X,m})$$
The relative expression of gene X in sample j is then:
$$r_{X,j} = \min\left(\frac{E_{X,j}}{\bar{E}_X},\ 4\right), \qquad j = 1, \ldots, m$$
Similarly, for all G genes shared with the reference system, the relative expression matrix is:
$$R = \begin{pmatrix} r_{1,1} & \cdots & r_{1,m} \\ \vdots & \ddots & \vdots \\ r_{G,1} & \cdots & r_{G,m} \end{pmatrix}$$
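Step (3) amounts to selecting the shared genes, dividing row-wise by the reference means, and capping. A minimal NumPy sketch, assuming gene identifiers are plain strings and that the reference system is supplied as two parallel sequences (`ref_genes`, `ref_mean`); the function and argument names are hypothetical:

```python
import numpy as np

def relative_expression(test_expr, test_genes, ref_mean, ref_genes, cap=4.0):
    """Relative expression of the genes shared with the reference system.

    test_expr:  (genes x samples) expression of the samples to be tested.
    test_genes: gene identifier for each row of test_expr.
    ref_mean:   mean diploid expression for each gene in ref_genes.
    Returns (shared_genes, rel) with rel = min(expr / ref_mean, cap).
    """
    ref_index = {g: i for i, g in enumerate(ref_genes)}
    # Keep only genes that also appear in the reference system.
    rows = [i for i, g in enumerate(test_genes) if g in ref_index]
    genes = [test_genes[i] for i in rows]
    ref = np.array([ref_mean[ref_index[g]] for g in genes], dtype=float)
    # Divide each gene (row) by its reference mean, then cap at `cap`.
    rel = np.asarray(test_expr, dtype=float)[rows] / ref[:, None]
    return genes, np.minimum(rel, cap)
```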
(4) Generation and correction of the chromosome-level relative expression matrix and determination of CNV
After the relative expression matrix is obtained, the average relative expression of the genes on each chromosome is calculated. Specifically, using a gene annotation file downloaded from the UCSC genome database, each gene in the relative expression matrix is mapped to the chromosome on which it is located, and for each chromosome the average relative expression of the genes it contains is calculated:
$$c_{j,i} = \frac{1}{n_i} \sum_{g \in G_i} r_{g,j}$$
where $G_i$ is the set of genes belonging to chromosome i in the diploid reference system, $n_i$ is the number of these genes, and $r_{g,j}$ are their relative expression levels in sample j. Computing this once for every chromosome gives the chromosome-level relative expression matrix, in which each row contains the relative expression levels of the chromosomes of one sample:
$$C = \begin{pmatrix} c_{1,1} & \cdots & c_{1,22} \\ \vdots & \ddots & \vdots \\ c_{m,1} & \cdots & c_{m,22} \end{pmatrix}$$
Because individual differences between samples shift each sample's expression relative to the reference system, while most chromosomes in most samples are normal diploids, the expression levels of the 22 autosomes of each sample are multiplied by a per-sample coefficient $k_j$ so that the median chromosome expression of each sample equals 2, the value of a normal diploid. CNV is then determined from the chromosome relative expression matrix normalized in this way: chromosomes with a copy number greater than 2.7 are called trisomic, and those with a copy number less than 1.3 are called monosomic.
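The chromosome-level averaging of step (4) can be sketched as below. The sketch assumes each gene has already been assigned an integer chromosome number (1 to 22) from a gene annotation such as the one downloaded from UCSC; the function name and the use of NaN for a chromosome without reference genes are illustrative choices, not part of the original method. The output has one row per sample and one column per chromosome, matching the layout described above.

```python
import numpy as np

def chromosome_means(rel, gene_chrom, n_chrom=22):
    """Average relative expression of the genes on each chromosome.

    rel:        (genes x samples) relative expression matrix.
    gene_chrom: integer chromosome number of each row of rel.
    Returns a (samples x chromosomes) matrix; NaN where a chromosome
    has no genes in the reference system.
    """
    rel = np.asarray(rel, dtype=float)
    gene_chrom = np.asarray(gene_chrom)
    out = np.full((rel.shape[1], n_chrom), np.nan)
    for c in range(1, n_chrom + 1):
        rows = gene_chrom == c
        if rows.any():
            # Mean over the n_i genes of chromosome c, for every sample.
            out[:, c - 1] = rel[rows].mean(axis=0)
    return out
```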
In one embodiment, the step of determining the CNV is:
the expression levels of the 22 autosomes of sample j are $c_{j,1}, c_{j,2}, \ldots, c_{j,22}$, and their median is denoted $M_j$. The chromosome expression coefficient $k_j$ is then:
$$k_j = \frac{2}{M_j}$$
After $k_j$ is obtained, the corrected expression levels of the 22 chromosomes of the sample are calculated as follows:
$$c'_{j,i} = k_j \, c_{j,i}, \qquad i = 1, \ldots, 22$$
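The per-sample correction, multiplying each sample's chromosome values by a coefficient k = 2 / M where M is that sample's median, reduces to a few lines. In this hypothetical helper, `chrom_rel` is a (samples x 22) matrix such as the one produced by chromosome-level averaging; `np.nanmedian` is used so that a chromosome without reference genes (stored as NaN) does not affect the median, which is an assumption beyond the original text.

```python
import numpy as np

def normalize_to_diploid(chrom_rel):
    """Scale each sample (row) so that its median chromosome value is 2.

    Returns (k, copy_number): the per-sample coefficients and the
    corrected (samples x chromosomes) matrix.
    """
    chrom_rel = np.asarray(chrom_rel, dtype=float)
    median = np.nanmedian(chrom_rel, axis=1)  # M for each sample
    k = 2.0 / median                          # per-sample coefficient
    return k, chrom_rel * k[:, None]
```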
The values in the resulting final chromosome relative expression matrix represent the chromosome copy number variation (CNV) values.
Compared with the prior art, the invention has the following remarkable advantages:
(1) Handling unstable gene expression. After genes that are not expressed in the samples (mean RPKM < 1) are removed, the coefficient of variation (CV) of each remaining gene across all samples is calculated, i.e. the standard deviation divided by the mean; the less stable a gene's expression, the larger its CV. After the 25% of genes with the highest CV values are screened out, the remaining stably expressed genes are used to build the diploid expression reference system. This essentially eliminates the intrinsic noise caused by unstable gene expression and by differences in baseline expression between genes, so that the calculated CNV reflects the copy number of the chromosome. Whereas the prior art removes only low-expression genes, the method of the invention addresses the effect of expression stability on CNV and uses stably expressed genes as the basis for judging chromosome copy number, so its results are far more accurate than those of the prior art;
(2) To address systematic error, the invention uses the actual gene expression levels to build a human diploid embryo gene expression reference system. The reference system calibrates the measured expression levels and removes the effect of differing expression baselines between genes. Specifically, after the diploid human embryo trophoblast transcriptome matrix is obtained, the gene expression values (RPKM) of a sample are calculated and corrected against the human embryo diploid gene expression reference system to obtain a relative expression matrix. This matrix removes differences between genes and brings all genes to a common scale for statistics. The mean gene expression of each chromosome is then calculated, and finally all chromosome expression levels of each sample are uniformly scaled up or down so that their median is 2; the resulting matrix gives the chromosome ploidy.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 shows the distribution of the coefficient of variation of each gene;
FIG. 2 shows the CNV results determined by the method.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it should be understood that they are presented herein only to illustrate and explain the present invention and not to limit the present invention.
EXAMPLE 1 Construction of the model
Human embryo diploid gene expression reference system: the average expression levels of the stably expressed genes of normal human diploid embryos are used as the standard control.
1. Single cell transcriptome sequencing
Trophoblast (trophectoderm) cells were obtained by biopsy, and 1, 3 or 5 cells were taken for single-cell transcriptome sequencing.
2. Sequencing data cleaning, alignment and post-alignment processing
First, the reads are quality-cleaned with trim_galore (version 0.6.6): sequencing adapter sequences and low-quality bases are removed with default parameters, and only reads longer than 36 bp after trimming are retained. Next, the reads are aligned with RSEM (version 1.3.3) using hg38 as the reference genome, and RSEM is used to calculate the expression level of each gene in each sample.
3. Sample screening and generation of the gene expression matrix
After the gene expression levels of each sample were obtained, the samples were quality-filtered: samples with more than 5000 genes at RPKM > 1 were taken as qualified.
After suitable samples were selected and the expression level (RPKM) of each gene in each sample was obtained, a matrix was prepared with samples as columns and genes as rows.
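The sample quality filter of step 3 can be expressed directly on the (genes x samples) RPKM matrix. A minimal sketch; the function name and default arguments are illustrative, and the strict inequalities (RPKM > 1, more than 5000 genes) follow the wording of the text:

```python
import numpy as np

def qualified_samples(expr, rpkm_cut=1.0, min_genes=5000):
    """Boolean mask of samples (columns) with more than `min_genes`
    genes whose RPKM exceeds `rpkm_cut`."""
    expr = np.asarray(expr, dtype=float)
    n_expressed = (expr > rpkm_cut).sum(axis=0)  # expressed genes per sample
    return n_expressed > min_genes
```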
4. Construction of the normal human diploid embryo gene expression reference system
After the gene expression matrix is obtained, suitable samples and genes are selected to establish the gene expression reference system. First, the whole-genome PGD results of the candidate samples are used to determine whether the trophoblast cells of each embryo sample are normal diploid (gold standard). Second, the normal diploid samples are selected, and their transcriptome sequencing data are used for gene screening and construction of the reference system.
Because the number of cells differs between samples and the samples themselves differ, the total gene expression $T_j = \sum_{g} E_{g,j}$ varies from sample to sample. To make the samples comparable, the mean total expression of the N normal samples, $\bar{T} = \frac{1}{N}\sum_{j=1}^{N} T_j$, is calculated, and all gene expression values of each sample are then scaled up or down together so that the sample's total equals $\bar{T}$:
$$E'_{g,j} = E_{g,j} \cdot \frac{\bar{T}}{T_j}, \qquad \sum_{g} E'_{g,j} = \bar{T}$$
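The total-expression alignment in this step can be sketched as follows, assuming the normal diploid samples are stored as a (genes x samples) NumPy array; the function name is hypothetical. Each column is multiplied by the ratio of the mean total to that column's total, so after scaling every sample has the same total expression while the relative proportions of genes within a sample are unchanged.

```python
import numpy as np

def scale_to_mean_total(expr):
    """Scale each sample (column) so its total expression equals the
    mean total over all samples."""
    expr = np.asarray(expr, dtype=float)
    totals = expr.sum(axis=0)        # T_j for each sample
    target = totals.mean()           # mean total over the samples
    return expr * (target / totals)  # broadcasts one factor per column
```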
After this correction, genes whose mean expression level across all diploid samples is less than 1 are deleted; they are regarded as not expressed and do not affect the karyotype judgment. The coefficient of variation (CV) of the expression level of each remaining gene is then calculated as follows:
$$\mathrm{CV} = \frac{\mathrm{SD}}{\mathrm{Mean}}$$
where SD is the standard deviation of the gene's expression across the samples and Mean is the mean expression level of the gene.
The larger the coefficient of variation, the less stable the gene's expression is considered to be.
Based on the distribution of the coefficients of variation, the CV values are sorted from high to low; the genes whose CV values fall in the top 25% are regarded as unstably expressed and removed, and the remaining, stably expressed genes are retained for the next calculation.
Then, the average expression level of each retained gene across the diploid standard samples is calculated; together with the gene names, these averages form a new matrix, which is the gene expression reference system of the normal human diploid embryo.
5. Preparation of the relative expression matrix
Once the diploid gene expression reference system is available, CNVs of clinical samples can be calculated from their transcriptomes. First, a relative expression matrix is prepared: genes that overlap the reference system are selected from the matrix generated in step 3 to form a new matrix, and the expression level of each gene in the new matrix is divided by the average expression level of the corresponding gene in the reference system.
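Specifically, assume that the expression level of gene X in the reference system is $\bar{E}_X$, and that the expression vector of gene X in all m samples to be tested is:
$$E_X = (E_{X,1}, E_{X,2}, \ldots, E_{X,m})$$
The relative expression of gene X is then:
$$r_X = \left(\frac{E_{X,1}}{\bar{E}_X}, \frac{E_{X,2}}{\bar{E}_X}, \ldots, \frac{E_{X,m}}{\bar{E}_X}\right)$$
Similarly, for all G genes shared with the reference system, the relative expression matrix is:
$$R = \begin{pmatrix} r_{1,1} & \cdots & r_{1,m} \\ \vdots & \ddots & \vdots \\ r_{G,1} & \cdots & r_{G,m} \end{pmatrix}, \qquad r_{g,j} = \frac{E_{g,j}}{\bar{E}_g}$$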
6. Generation and correction of the chromosome-level relative expression matrix and determination of CNV
After the relative expression matrix is obtained, the average relative expression of the genes on each chromosome is calculated. Specifically, using the gene annotation file downloaded from the UCSC genome database, each gene in the relative expression matrix is mapped to the chromosome on which it is located, and the average relative expression of the genes on each chromosome is calculated:
$$c_{j,i} = \frac{1}{n_i} \sum_{g \in G_i} r_{g,j}$$
where $G_i$ is the set of genes belonging to chromosome i in the diploid reference system, $n_i$ is the number of these genes, and $r_{g,j}$ are their relative expression levels in sample j. Computing this once for every chromosome gives the chromosome-level relative expression matrix, in which each row contains the relative expression levels of the chromosomes of one sample:
$$C = \begin{pmatrix} c_{1,1} & \cdots & c_{1,22} \\ \vdots & \ddots & \vdots \\ c_{m,1} & \cdots & c_{m,22} \end{pmatrix}$$
Because of individual differences between samples, a sample's expression levels may be shifted relative to the reference system, while most chromosomes in most samples are normal diploids. Therefore, the expression levels of the 22 chromosomes of each sample are multiplied by a per-sample coefficient k so that the median chromosome expression of each sample equals 2, the value of a normal diploid. CNV is determined after the chromosome relative expression matrix has been normalized in this way.
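For example, for sample A, let the expression levels of the 22 autosomes be $c_{A,1}, c_{A,2}, \ldots, c_{A,22}$ and denote their median by $M_A$. The chromosome expression coefficient $k_A$ is then:
$$k_A = \frac{2}{M_A}$$
After $k_A$ is obtained, the corrected expression levels of the 22 chromosomes of the sample are:
$$c'_{A,i} = k_A \, c_{A,i}, \qquad i = 1, \ldots, 22$$
that is, every row of the chromosome relative expression matrix is multiplied by its own sample's coefficient, so that the median of each corrected row equals 2.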
The values in the resulting final chromosome relative expression matrix represent the chromosome copy numbers, i.e. the copy number variation (CNV) values.
7. Thresholding and single cell chromosome copy number visualization
The chromosome copy numbers are classified with thresholds for clinical interpretation. Clinically, a copy number of 1 represents a chromosome deletion and a copy number of 3 a chromosome duplication. However, in mosaic (chimeric) embryos, some cells are normal diploid while others are monosomic or trisomic, and current omics sequencing pools these cells together, so the measured CNV is often not an integer. Therefore, following the thresholds used for DNA-based copy number detection, 0-1.3 is classified as deletion (monosomy), 1.3-1.7 as mosaic deletion, 1.7-2.3 as normal diploid, 2.3-2.7 as mosaic duplication, and above 2.7 as duplication (trisomy).
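The five-way classification above can be sketched with `np.digitize`. The original does not say which class a value lying exactly on a boundary belongs to; this sketch assumes each interval includes its upper bound (`right=True`), so 1.3 counts as a deletion and 2.7 as a mosaic duplication. The names `BINS`, `LABELS` and `classify_copy_number` are hypothetical.

```python
import numpy as np

# Upper bounds of the first four classes (copy-number units), Example 1.
BINS = [1.3, 1.7, 2.3, 2.7]
LABELS = [
    "deletion (monosomy)",
    "mosaic deletion",
    "normal diploid",
    "mosaic duplication",
    "duplication (trisomy)",
]

def classify_copy_number(cn):
    """Map copy-number values to the five classes; with right=True each
    interval is (lower, upper], and values above 2.7 map to the last class."""
    idx = np.digitize(np.asarray(cn, dtype=float), BINS, right=True)
    return [LABELS[i] for i in np.ravel(idx)]
```

For the coarser three-class scheme used in Example 2 (deletion, normal diploid or mosaic, duplication), `BINS` would become `[1.3, 2.7]` with three labels.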
EXAMPLE 2 Identification of single-cell chromosome copy number using single-cell RNA sequencing data from blastocyst biopsies
1. Sample screening and generation of the gene expression matrix
After the gene expression levels of each sample were obtained, the samples were quality-filtered. At the gene level, RPKM > 1 was defined as expressed; at the sample level, samples with more than 5000 expressed genes (RPKM > 1) were considered qualified.
After screening, 39 samples remained for further calculation. Of these, 16 were shown by DNA sequencing (gold standard) to be diploid with a normal chromosome number; these 16 were used to build the reference system and the other 23 for validation. After the expression level (RPKM) of each gene in each sample was obtained, a matrix with samples as columns and gene names as rows was prepared for the reference samples and for the validation samples.
2. Construction of the normal human diploid embryo gene expression reference system
The reference sample matrix is used for gene screening and construction of the reference system.
First, to select the genes used for the reference system, the total gene expression of each reference sample is calculated and the totals are averaged; all gene expression values of each sample are then scaled up or down together so that the sample's total equals this average.
Second, genes with a mean expression value of RPKM < 1 are deleted; they are regarded as not expressed and therefore do not affect the karyotype judgment. The coefficient of variation (CV) is then calculated from the expression levels of the remaining genes; the larger the coefficient of variation, the less stable the gene's expression is considered to be.
The distribution of the coefficient of variation for each gene in the present 16 samples is shown in FIG. 1.
In FIG. 1 the vertical axis is the number of genes. About 75% of the genes have coefficients of variation between 0 and 1, so genes with CV > 1 were regarded as unstably expressed and screened out, leaving 7390 stably expressed genes.
After the unstable genes are screened out, the remaining genes are those used to build the reference system and for all subsequent calculations.
The average expression level of each of these genes across the 16 standard samples was calculated; together with the gene names, these averages form a new matrix, which is the gene expression reference system of the normal human diploid embryo.
3. Preparation of the relative expression matrix
With the diploid gene expression reference system in place, it was used to detect the chromosome copy numbers of the 23 validation samples.
First, a relative expression matrix with respect to the reference system is prepared. Specifically, the 7390 genes that overlap the reference system are selected from the matrix generated in step 1 to form a new matrix, and the expression of each gene in the new matrix is divided by the average expression level of the corresponding gene in the diploid embryo gene expression reference system to generate the relative expression matrix.
4. Generation and correction of the chromosome-level relative expression matrix and determination of CNV
After the relative expression matrix is obtained, the average relative expression of the genes on each chromosome is calculated. Specifically, using the gene annotation file downloaded from the UCSC genome database, each gene in the relative expression matrix is mapped to the chromosome on which it is located, and the average relative expression of the genes on each chromosome is calculated in turn, giving the chromosome relative expression matrix.
Because of individual differences between samples, a sample's expression levels may be shifted relative to the reference system, while most chromosomes in most samples are normal diploids. Therefore, the expression levels of the 22 chromosomes of each sample are multiplied by a per-sample coefficient k so that the median chromosome expression of each sample equals 2, the value of a normal diploid. After the chromosome relative expression matrix has been normalized in this way, CNV is determined.
The coefficient k, equal to 2 divided by the sample's median chromosome expression, was calculated in this way for each of the 23 samples to be tested. Each sample's chromosome relative expression levels were then multiplied by its coefficient to obtain the final relative expression levels of the 22 chromosomes in each sample. The values in this matrix are the calculated chromosome CNV (copy number variation) values.
5. Thresholding and single cell chromosome copy number visualization
The chromosome copy numbers were then classified with thresholds for clinical interpretation: 0-1.3 as deletion (monosomy), 1.3-2.7 as normal diploid or mosaic, and above 2.7 as duplication (trisomy). The CNV results determined by this method with these thresholds are shown in FIG. 2.
6. Verification of the results
The results of identifying human embryonic cell chromosomal variation from single-cell transcriptome sequencing data and the gene expression reference system are shown in FIG. 2. The gold-standard results for this batch of samples (copy number variation from whole-genome DNA sequencing) and the results of the method of the invention (single-cell chromosome copy number identified from whole-transcriptome RNA sequencing) are compared in the following table:
As the table shows, for all 23 samples the single-cell chromosome copy numbers identified by the whole-transcriptome sequencing method are fully consistent with the embryo results from whole-genome sequencing. The diagnostic accuracy of the conventional method is only 43.4%, whereas that of the invention is 100%.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (2)
1. A method for detecting chromosomal copy number variation in a single cell by transcriptome, said method comprising the steps of:
(1) after the gene expression levels are normalized as RPKM, deleting the genes whose mean expression level is less than 1 in all diploid samples; then calculating the coefficient of variation of the expression level of each remaining gene as follows:
$$\mathrm{CV} = \frac{\mathrm{SD}}{\mathrm{Mean}}$$
where SD is the standard deviation of the gene's expression across the samples and Mean is the mean expression level of the gene;
arranging the coefficient of variation values from high to low according to the distribution condition of the coefficient of variation, selecting genes of which the coefficient of variation values are positioned at the first 25 percent, regarding the genes as genes with unstable expression, and screening out the genes, wherein the remaining genes are genes with stable expression and can be reserved and used for the next calculation;
calculating the average expression level of each retained gene over the diploid standard samples; the genes, together with their average expression levels, form a new matrix, which serves as the gene expression reference system for normal human diploid embryos;
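Building the reference system then amounts to averaging each retained gene over the diploid standard samples. The sketch below uses the same dict-of-lists assumption; the name `build_reference` is hypothetical.

```python
import statistics


def build_reference(stable_expr):
    """Diploid reference system: each stable gene paired with its mean expression.

    stable_expr maps gene name -> list of expression values over the
    diploid standard samples (e.g. the output of a stability filter).
    """
    return {gene: statistics.mean(values) for gene, values in stable_expr.items()}
```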
(3) Construction of the relative expression matrix
Once the diploid gene expression reference system has been obtained, the copy number variation of clinical samples is calculated from their transcripts. The first step is to build a relative expression matrix. Genes that overlap with the reference system are selected from the expression matrix generated for the samples to form a new matrix. The expression of each gene in this matrix is then divided by the average expression of the same gene in the reference system. Relative expression values greater than 4 are capped at 4, so that fluctuation in a single gene cannot unduly influence the overall result.

For a gene X, let its expression in the reference system be $\bar{E}_X$, and let its expression across the $s$ samples to be examined be

$$E_X = \left(e_{X,1},\ e_{X,2},\ \dots,\ e_{X,s}\right).$$

The relative expression of gene X is then

$$R_X = \left(\min\!\left(\frac{e_{X,1}}{\bar{E}_X}, 4\right),\ \min\!\left(\frac{e_{X,2}}{\bar{E}_X}, 4\right),\ \dots,\ \min\!\left(\frac{e_{X,s}}{\bar{E}_X}, 4\right)\right).$$

Similarly, stacking the relative expression vectors of all $m$ retained genes $X_1, \dots, X_m$ gives the relative expression matrix, in which each row is a gene and each column is a sample:

$$R = \begin{pmatrix} R_{X_1} \\ R_{X_2} \\ \vdots \\ R_{X_m} \end{pmatrix}$$
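A minimal Python sketch of the relative expression step follows. It assumes that the clinical expression table and the reference are plain dicts keyed by gene name; the function name is hypothetical. Genes absent from the reference are dropped, which mirrors the overlap selection described above. Because the reference keeps only genes with a mean of at least 1, the division is safe.

```python
def relative_expression(sample_expr, reference, cap=4.0):
    """Relative expression of clinical samples against the diploid reference.

    sample_expr maps gene name -> list of expression values, one per sample
    to be examined; reference maps gene name -> mean diploid expression.
    """
    rel = {}
    for gene, values in sample_expr.items():
        # Keep only genes that overlap with the reference system.
        if gene not in reference:
            continue
        ref = reference[gene]
        # Divide by the reference mean and cap at 4 so that one strongly
        # fluctuating gene cannot dominate the chromosome average.
        rel[gene] = [min(v / ref, cap) for v in values]
    return rel
```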
(4) Generation and correction of the chromosome-level relative expression matrix, and determination of copy number variation
After the relative expression matrix has been obtained, the average relative expression of the genes on each chromosome is calculated. Specifically, each gene in the relative expression matrix is assigned to the chromosome on which it lies, using a gene-to-chromosome mapping file downloaded from the UCSC genome database. For each chromosome, the average relative expression of the genes it contains is then calculated:

$$c_{j,i} = \frac{1}{n}\sum_{g \in \mathrm{chr}\,i} r_{g,j}$$

Here $r_{g,j}$ is the relative expression of gene $g$ in sample $j$, and $n$ is the number of genes belonging to chromosome $i$ in the diploid reference system. Repeating this calculation for every chromosome gives a relative expression matrix in units of chromosomes. For $s$ samples, each row holds the relative expression of every chromosome in one sample:

$$C = \begin{pmatrix} c_{1,1} & c_{1,2} & \cdots \\ c_{2,1} & c_{2,2} & \cdots \\ \vdots & \vdots & \ddots \\ c_{s,1} & c_{s,2} & \cdots \end{pmatrix}$$
Individual differences between samples cause each sample's overall expression to deviate from the reference system. Most chromosomes in most samples are normal diploids, so the relative expression values of the 22 chromosomes of each sample are multiplied by a per-sample coefficient $k_j$. This step normalizes the chromosome relative expression matrix, after which copy number variation is determined.
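The per-chromosome averaging of step (4) can be sketched in Python as shown below. This is a sketch under stated assumptions, not the patent's implementation:

- `gene_to_chrom` is a hypothetical dict, assumed to have been parsed beforehand from a UCSC annotation file.
- The averaging count $n$ is taken as the number of genes that are both in the relative expression table and mapped to that chromosome. This equals the reference-system count only when every reference gene is present in the samples.

```python
from collections import defaultdict


def chromosome_matrix(rel, gene_to_chrom, n_samples):
    """Average relative expression per chromosome, one row per sample.

    rel maps gene name -> list of relative expression values (one per sample);
    gene_to_chrom maps gene name -> chromosome label, e.g. parsed from a UCSC
    annotation file. Genes without a chromosome assignment are skipped.
    """
    sums = defaultdict(lambda: [0.0] * n_samples)
    counts = defaultdict(int)
    for gene, values in rel.items():
        chrom = gene_to_chrom.get(gene)
        if chrom is None:
            continue
        counts[chrom] += 1
        for j, value in enumerate(values):
            sums[chrom][j] += value
    # Row j holds the mean relative expression of every chromosome in sample j.
    return [{chrom: sums[chrom][j] / counts[chrom] for chrom in counts}
            for j in range(n_samples)]
```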
2. The method of claim 1, wherein determining the copy number variation comprises:
recording, for each sample $j$, the median of the relative expression levels of its 22 autosomes as $m_j$, and deriving the sample's chromosome expression coefficient $k_j$ from $m_j$;
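after $k_j$ has been obtained, calculating the normalized expression level of each of the 22 chromosomes of the sample as

$$c'_{j,i} = k_j \cdot c_{j,i}, \quad i = 1, \dots, 22,$$

where $c_{j,i}$ is the relative expression of chromosome $i$ in sample $j$;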
the values in the final chromosome relative expression matrix then represent the copy number variation of each chromosome.
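The claim text does not reproduce the formula for the coefficient $k_j$. The sketch below therefore assumes $k_j = 1/m_j$, which rescales the median autosome (taken to be diploid) to a relative expression of 1. Passing `diploid_level=2.0` would instead place the values on an absolute copy-number scale. Both choices are assumptions rather than the patent's stated formula. The `chr1` to `chr22` labels and the function name are also hypothetical.

```python
import statistics

# Autosome labels; the "chr1".."chr22" naming is an assumption (UCSC style).
AUTOSOMES = [f"chr{i}" for i in range(1, 23)]


def normalize_sample(row, diploid_level=1.0):
    """Rescale one sample's chromosome relative expression by its median.

    row maps autosome label -> mean relative expression for one sample.
    ASSUMPTION: the coefficient is k = diploid_level / median, so that the
    median autosome (taken to be diploid) lands on diploid_level.
    """
    m = statistics.median([row[c] for c in AUTOSOMES])
    k = diploid_level / m
    return {c: row[c] * k for c in AUTOSOMES}
```

In this example, a sample whose autosomes sit at 1.2 except chromosome 21 at 1.8 normalizes to 1.0 for the typical autosomes and 1.5 for chromosome 21, a pattern consistent with a gain of one copy. The thresholds for calling a gain or loss are not given in this text.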
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211322202.9A CN115394359B (en) | 2022-10-27 | 2022-10-27 | Method for detecting single cell chromosome copy number variation through transcriptome |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115394359A (en) | 2022-11-25 |
CN115394359B (en) | 2023-03-24 |
Family
ID=84128872
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211322202.9A Active CN115394359B (en) | 2022-10-27 | 2022-10-27 | Method for detecting single cell chromosome copy number variation through transcriptome |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115394359B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130184999A1 (en) * | 2012-01-05 | 2013-07-18 | Yan Ding | Systems and methods for cancer-specific drug targets and biomarkers discovery |
US20130325360A1 (en) * | 2011-10-06 | 2013-12-05 | Sequenom, Inc. | Methods and processes for non-invasive assessment of genetic variations |
CN105722994A (en) * | 2013-06-17 | 2016-06-29 | 维里纳塔健康公司 | Method for determining copy number variations in sex chromosomes |
CN111363831A (en) * | 2020-04-15 | 2020-07-03 | 兰州大学 | Method for detecting sheep PRAMEY gene copy number variation and application thereof |
CN113192555A (en) * | 2021-04-21 | 2021-07-30 | 杭州博圣医学检验实验室有限公司 | Method for detecting copy number of second-generation sequencing data SMN gene by calculating sequencing depth of differential allele |
Non-Patent Citations (2)
Title |
---|
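MICHALIS KONSTANTINIDIS et al.: "Aneuploidy and recombination in the human preimplantation embryo. Copy number variation analysis and genome-wide polymorphism genotyping", REPRODUCTIVE BIOMEDICINE ONLINE * |
赵红翠 et al.: "The important value of mtDNA copy number in oocytes and embryos in assisted reproductive technology research" (卵母细胞及胚胎mtDNA 拷贝数在辅助生殖技术研究中的重要价值), 现代妇产科进展 (Progress in Obstetrics and Gynecology) * |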
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117721222A (en) * | 2024-02-07 | 2024-03-19 | 北京大学第三医院(北京大学第三临床医学院) | Method for predicting embryo implantation by single cell transcriptome and application |
CN117721222B (en) * | 2024-02-07 | 2024-05-10 | 北京大学第三医院(北京大学第三临床医学院) | Method for predicting embryo implantation by single cell transcriptome and application |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
KR101795124B1 (en) | Method and system for detecting copy number variation | |
CN106715711B (en) | Method for determining probe sequence and method for detecting genome structure variation | |
US20220101944A1 (en) | Methods for detecting copy-number variations in next-generation sequencing | |
CN108350498B (en) | Parting method and device | |
CN115394359B (en) | Method for detecting single cell chromosome copy number variation through transcriptome | |
CN113674803A (en) | Detection method of copy number variation and application thereof | |
CN111755068A (en) | Method and device for identifying tumor purity and absolute copy number based on sequencing data | |
CN106795551B (en) | CNV analysis method and detection device for single cell chromosome | |
CN109461473B (en) | Method and device for acquiring concentration of free DNA of fetus | |
WO2024140881A1 (en) | Method and device for determining fetal dna concentration | |
JP7333838B2 (en) | Systems, computer programs and methods for determining genetic patterns in embryos | |
CN107208152B (en) | Method and apparatus for detecting mutant clusters | |
CN112823391A (en) | Quality control metrics based on detection limits | |
WO2019132010A1 (en) | Method, apparatus and program for estimating base type in base sequence | |
WO2016112539A1 (en) | Method and device for determining fetal nucleic acid content | |
CN113793637B (en) | Whole genome association analysis method based on parental genotype and progeny phenotype | |
EP4435791A1 (en) | Sequence variation analysis method and system, and storage medium | |
CN114974415A (en) | Method and device for detecting chromosome copy number abnormality | |
CN110993024B (en) | Method and device for establishing fetal concentration correction model and method and device for quantifying fetal concentration | |
CN108229099A (en) | Data processing method, device, storage medium and processor | |
Rhie et al. | Chasing perfection: validation and polishing strategies for telomere-to-telomere genome assemblies | |
CN117721222B (en) | Method for predicting embryo implantation by single cell transcriptome and application | |
Kannan et al. | Transcriptomic entropy quantifies cardiomyocyte maturation at single cell level | |
CN114067909B (en) | Method, device and storage medium for correcting homologous recombination defect score | |
CN114708905A (en) | Chromosome aneuploidy detection method, device, medium and equipment based on NGS |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |