CN108363903B - Chromosome aneuploidy detection system suitable for single cell and application - Google Patents

Publication number
CN108363903B
Authority
CN
China
Prior art keywords
window
relative data
data volume
windows
product
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810078283.XA
Other languages
Chinese (zh)
Other versions
CN108363903A (en
Inventor
李欣娱
张斯敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Basetra Medical Technology Co ltd
Original Assignee
Basetra Medical Technology Co ltd
Application filed by Basetra Medical Technology Co ltd
Priority to CN201810078283.XA
Publication of CN108363903A
Application granted
Publication of CN108363903B

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Abstract

The invention belongs to the technical field of genetic engineering and relates in particular to a chromosome aneuploidy detection system suitable for single cells, and to its application. The detection system comprises a window division module, a segment copy rate calculation module and a segment aneuploidy judgment module. Using at most 700k reads per sample, the detection system can detect chromosomal deletions/duplications larger than 5 Mb with an accuracy of up to 99.9%, which lowers the required sequencing depth and the cost.

Description

Chromosome aneuploidy detection system suitable for single cell and application
Technical Field
The invention belongs to the technical field of genetic engineering, and particularly relates to a chromosome aneuploidy detection system suitable for single cells and application thereof.
Background
Aneuploidy refers to the loss or gain of one or several chromosomes relative to the normal chromosome complement. Typically, gametes with an abnormal chromosome number form because a pair of homologous chromosomes fails to segregate, or segregates prematurely, during meiosis; when such gametes fuse with each other or with normal gametes, various aneuploid cells result. Aneuploid cells can also arise during somatic cell division, for example in tumor cells, which have a very high mutation rate.
Aneuploidy is closely associated with several human genetic diseases. The most common is Down syndrome, caused by an extra chromosome 21, with an incidence of about 1/800; trisomy 13 and trisomy 18 syndromes, caused by an extra chromosome 13 or 18 respectively, can lead to miscarriage and other outcomes. Autosomal aneuploidy is also a major underlying cause of pregnancy failure and miscarriage. An abnormal number of sex chromosomes can cause abnormal sexual development. Males with one extra X chromosome (47,XXY) have congenital testicular dysplasia (Klinefelter syndrome). Turner syndrome, also called congenital ovarian hypoplasia syndrome, results from the loss of one X chromosome, giving the karyotype 45,X.
In sexual reproduction, an embryo is the early body that forms after a male germ cell and a female germ cell fuse into a zygote, and that can develop into an adult organism through many rounds of cell division and differentiation. The embryonic stage is the initial stage of sexual reproductive development, running from the first division of the fertilized egg to the end of the first developmental stage, and is the earliest stage considered in developmental biology.
Cells are the basic unit of life and also the basic unit carrying a complete set of chromosomes. Genetic testing of embryos prior to implantation currently has to be performed on a single cell or a few cells. Analyzing chromosome composition at the single-cell level to determine whether the chromosomes are normal is also a common research approach.
Traditional methods for detecting embryonic aneuploidy include fluorescence in situ hybridization (FISH), real-time PCR, MLPA, biochips and the like. Biochips, which comprise comparative genomic hybridization chips and SNP chips, have become a main means of detecting aneuploidy, but they have low throughput, can test only a limited number of embryos at a time, are costly and are relatively complex to operate. FISH and real-time PCR, as faster molecular detection methods, have been applied to detect more than 80% of aneuploidies, but they are limited by the number of probes, cannot examine all 23 pairs of chromosomes at once, and have very low throughput.
Next-generation sequencing technology is developing rapidly and is increasingly applied to chromosome detection. Dennis Lo et al. developed a method based on Illumina GA high-throughput sequencing for detecting cell-free nucleic acid in maternal plasma, with which trisomy 21 can be detected with an accuracy above 99%. For embryo aneuploidy detection, some methods based on next-generation sequencing have also been developed, but they remain unsatisfactory in terms of detection speed, operational complexity, cost and the like.
Disclosure of Invention
In view of the above-mentioned shortcomings of the prior art, the present invention provides a single cell chromosome aneuploidy detection system and application thereof for rapidly detecting chromosome aneuploidy of embryonic cells, germ cells and other cells.
In order to achieve the above objects and other related objects, the present invention adopts the following technical solutions:
In a first aspect of the present invention, there is provided a chromosome aneuploidy detection system suitable for single cells, comprising: a window dividing module for dividing the sequence of a reference genome into windows; a segment dividing module for determining breakpoint positions from the relative data volume ratio of each window, calculated from the sample to be detected and a normal sample, and for dividing segments according to the breakpoint positions, wherein the normal sample does not have a chromosomal aneuploidy; a segment copy rate calculation module for calculating the segment copy rate $CR_j$; and a segment aneuploidy judging module for judging, according to the segment copy rate, whether the segment has a chromosomal aneuploidy.
In a possible implementation manner, in the segment dividing module, the relative data volume ratio is calculated using the following formulas:
$r_i = W_i / N$
$V_i = r_i^{\mathrm{test}} / r_i^{\mathrm{normal}}$
where $W_i$ represents the number of effective sequences falling into the i-th window, $1 \le i \le S$, and $S$ is the total number of windows;
$N$ represents the total number of effective sequences obtained by sequencing the sample to be detected, an effective sequence being a sequence, among those obtained by whole genome sequencing, that aligns to the reference genome;
$r_i$ represents the relative data volume of the i-th window;
$r_i^{\mathrm{test}}$ is the relative data volume of the i-th window of the sample to be detected;
$r_i^{\mathrm{normal}}$ is the relative data volume of the i-th window of the normal sample;
$V_i$ is the relative data volume ratio of the i-th window.
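The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and it assumes that N equals the sum of the per-window counts (that is, every effective sequence falls into exactly one window) and that the test and normal samples share the same window layout.

```python
def relative_data_volume(window_counts):
    """r_i = W_i / N, with N taken as the sum of all window counts (assumption)."""
    total = sum(window_counts)
    if total == 0:
        raise ValueError("no effective sequences")
    return [w / total for w in window_counts]


def relative_data_volume_ratio(test_counts, normal_counts):
    """V_i = r_i(test) / r_i(normal) for each window i."""
    if len(test_counts) != len(normal_counts):
        raise ValueError("both samples must use the same windows")
    r_test = relative_data_volume(test_counts)
    r_normal = relative_data_volume(normal_counts)
    # A window with no reads in the normal sample has no defined ratio.
    return [t / n if n > 0 else float("nan") for t, n in zip(r_test, r_normal)]


# Hypothetical counts: the third window is over-represented, the fourth under-represented.
ratios = relative_data_volume_ratio([100, 100, 150, 50], [200, 200, 200, 200])
print(ratios)
```

Because each sample is normalised by its own read total, the two samples do not need to be sequenced to the same depth.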
In one possible implementation, GC content correction is applied to the relative data volume ratio $V_i$ to obtain the corrected relative data volume ratio $V_i'$ of each window, calculated using the following formulas:
$V_i' = V_i / C_e$
$C_e = K_e / \bar{V}$
$K_e = \frac{1}{g} \sum_{i \in \text{interval } e} V_i$
$\bar{V} = \frac{1}{S} \sum_{i=1}^{S} V_i$
where $C_e$ is the weight of the windows falling within the e-th GC content interval;
$K_e$ is the mean of the $V_i$ values of all windows falling within the e-th GC content interval;
$g$ is the total number of windows falling within the e-th GC content interval;
$\bar{V}$ is the mean of the relative data volume ratios $V_i$ of all windows of the sample to be detected;
$V_i'$ is the corrected relative data volume ratio of the i-th window.
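The correction can be illustrated with a short Python sketch. It assumes that the weight is the interval mean divided by the overall mean, C_e = K_e / mean(V), which is the reading most consistent with the symbol definitions above; the function name and the way GC intervals are passed in (one interval index per window) are hypothetical.

```python
from collections import defaultdict


def gc_correct(ratios, gc_intervals):
    """Return V_i' = V_i / C_e, assuming C_e = K_e / mean(V)."""
    if not ratios or len(ratios) != len(gc_intervals):
        raise ValueError("need one GC interval index per window")
    overall_mean = sum(ratios) / len(ratios)
    by_interval = defaultdict(list)
    for v, e in zip(ratios, gc_intervals):
        by_interval[e].append(v)
    # K_e is the mean ratio within interval e; C_e rescales it to the overall mean.
    weights = {e: (sum(vs) / len(vs)) / overall_mean for e, vs in by_interval.items()}
    return [v / weights[e] for v, e in zip(ratios, gc_intervals)]


# Hypothetical data: windows in interval 0 read low and windows in interval 1 read high.
corrected = gc_correct([0.8, 0.8, 1.2, 1.2], [0, 0, 1, 1])
print(corrected)
```

After correction, every GC interval has the same mean ratio, so a systematic GC-dependent bias is removed while deviations within an interval are kept.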
In a possible implementation manner, in the segment dividing module, a breakpoint position refers to the boundary between two adjacent windows whose relative data volume ratios differ significantly.
In a possible implementation manner, in the segment dividing module, the breakpoint positions are determined as follows: every n adjacent windows are taken as a window group, so that the S windows yield m window groups, where m = S - n + 1 is the total number of window groups, S is the total number of windows into which the sample is divided, and n is the number of windows in a window group. A nonparametric runs test is then performed on the relative data volume ratios $V_i$ of the windows in each window group, and a P value (the significance of the difference) is calculated for each pair of adjacent window groups. If the P value is smaller than a threshold, the boundary between the corresponding adjacent window groups is judged to be a breakpoint position of a copy number variation (CNV).
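The count m = S - n + 1 implies that window groups overlap and are shifted by one window at a time. A minimal sketch of that grouping, with a hypothetical function name, is:

```python
def window_groups(ratios, n):
    """Slide a group of n adjacent windows over S windows, giving m = S - n + 1 groups."""
    s = len(ratios)
    if not 1 <= n <= s:
        raise ValueError("group size n must satisfy 1 <= n <= S")
    return [ratios[k:k + n] for k in range(s - n + 1)]


groups = window_groups(list(range(10)), 3)
print(len(groups))  # m = 10 - 3 + 1
```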
In one possible implementation, the segment copy rate is the average of the relative data volume ratios of the windows in the segment.
In one possible implementation, the segment copy rate is calculated using the following formula:
$CR_j = \frac{1}{h} \sum_{i \in \text{segment } j} V_i$
where $h$ is the number of windows in the j-th segment;
$b$ is the number of breakpoints;
$CR_j$ is the copy rate of the j-th segment, obtained by averaging the relative data volume ratios of the windows in the j-th segment.
In a possible implementation manner, in the segment aneuploidy judging module, if $0.75 < CR_j < 1.25$, the j-th segment is judged to be normal; if $CR_j \le 0.75$ or $CR_j \ge 1.25$, the j-th segment is judged to be aneuploid (abnormal).
In one possible implementation, if the j-th segment corresponds to a whole chromosome, it can be determined whether the whole chromosome is aneuploid.
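The averaging and thresholding steps can be sketched as follows. The function names are hypothetical, and the representation of breakpoints as the window index at which a new segment starts is an assumption made for the example.

```python
def segment_copy_rates(ratios, breakpoints):
    """CR_j = mean of V_i over the windows of segment j.

    `breakpoints` holds the window indices at which a new segment starts
    (an assumed representation); b breakpoints give b + 1 segments here.
    """
    edges = [0] + sorted(set(breakpoints)) + [len(ratios)]
    if any(a >= b for a, b in zip(edges, edges[1:])):
        raise ValueError("breakpoints must lie strictly inside the window range")
    return [sum(ratios[a:b]) / (b - a) for a, b in zip(edges, edges[1:])]


def call_segment(cr, low=0.75, high=1.25):
    """Normal if low < CR_j < high, otherwise abnormal (thresholds from the text)."""
    return "normal" if low < cr < high else "abnormal"


rates = segment_copy_rates([1.0, 1.0, 1.5, 1.5, 1.0, 1.0], [2, 4])
print([(cr, call_segment(cr)) for cr in rates])
```

Both boundary values, 0.75 and 1.25, are classed as abnormal, matching the inequalities in the text.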
In one possible implementation, the detection system further includes: the system comprises a cell separation module, a cell whole genome amplification module, a cell amplification product quality control module, a library construction module and a high-throughput sequencing module.
In a possible implementation manner, the cell amplification product quality control module is configured to perform quality control on a whole genome amplification product of a cell by using a housekeeping gene of the cell, and determine whether a quality control result meets a preset condition, and if the quality control result meets the preset condition, the whole genome amplification product is a qualified amplification product.
In one possible implementation, the library construction module is used for constructing a sequencing library from the qualified amplification product, and comprises a fragmentation submodule, an adapter ligation submodule, a mixing submodule, a mixed product purification submodule, a PCR enrichment submodule and an enriched product purification submodule. The fragmentation submodule fragments the qualified amplification product to obtain a fragmented product; the adapter ligation submodule adds adapters to the fragmented product to obtain an adapter-ligated product; the mixing submodule mixes the adapter-ligated products of a plurality of samples in equal amounts to obtain a mixed product; the mixed product purification submodule purifies the mixed product; the PCR enrichment submodule enriches the mixed product purified by the mixed product purification submodule to obtain an enriched product; and the enriched product purification submodule purifies the enriched product to obtain the sequencing library.
In a second aspect, the invention provides a chromosome aneuploidy detection method suitable for single cells, comprising the following steps: S1, dividing the sequence of a reference genome into windows to obtain S windows; S2, determining breakpoint positions from the relative data volume ratio of each window, calculated from the sample to be detected and a normal sample, and dividing segments according to the breakpoint positions, wherein the normal sample does not have a chromosomal aneuploidy; S3, calculating the segment copy rate $CR_j$; S4, judging, according to the segment copy rate, whether the segment has a chromosomal aneuploidy.
In one possible implementation, the relative data volume ratio is calculated using the following formulas:
$r_i = W_i / N$
$V_i = r_i^{\mathrm{test}} / r_i^{\mathrm{normal}}$
where $W_i$ represents the number of effective sequences falling into the i-th window, $1 \le i \le S$;
$N$ represents the total number of effective sequences obtained by sequencing the sample to be detected, an effective sequence being a sequence, among those obtained by whole genome sequencing, that aligns to the reference genome;
$r_i$ represents the relative data volume of the i-th window;
$r_i^{\mathrm{test}}$ is the relative data volume of the i-th window of the sample to be detected;
$r_i^{\mathrm{normal}}$ is the relative data volume of the i-th window of the normal sample;
$V_i$ is the relative data volume ratio of the i-th window.
In one possible implementation, GC content correction is applied to the relative data volume ratio $V_i$ to obtain the corrected relative data volume ratio $V_i'$ of each window, calculated using the following formulas:
$V_i' = V_i / C_e$
$C_e = K_e / \bar{V}$
$K_e = \frac{1}{g} \sum_{i \in \text{interval } e} V_i$
$\bar{V} = \frac{1}{S} \sum_{i=1}^{S} V_i$
where $C_e$ is the weight of the windows falling within the e-th GC content interval;
$K_e$ is the mean of the $V_i$ values of all windows falling within the e-th GC content interval;
$g$ is the total number of windows falling within the e-th GC content interval;
$\bar{V}$ is the mean of the relative data volume ratios $V_i$ of all windows of the sample;
$V_i'$ is the corrected relative data volume ratio of the i-th window.
In one possible implementation, a breakpoint position refers to the boundary between two adjacent windows whose relative data volume ratios differ significantly.
In one possible implementation, the breakpoint positions are determined as follows: every n adjacent windows among the S windows are taken as a window group, giving m window groups, where m = S - n + 1 is the total number of window groups, S is the total number of windows produced by the window division, and n is the number of windows in a window group. A nonparametric runs test is then performed on the relative data volume ratios $V_i$ of the windows in each window group, and a P value (the significance of the difference) is calculated for each pair of adjacent window groups. If the P value is smaller than a threshold, the boundary between the corresponding adjacent window groups is judged to be a breakpoint position of a copy number variation (CNV).
In one possible implementation, the segment copy rate is the average of the relative data volume ratios of the windows in the segment.
In one possible implementation, the segment copy rate $CR_j$ is calculated using the following formula:
$CR_j = \frac{1}{h} \sum_{i \in \text{segment } j} V_i$
where $h$ is the number of windows in the j-th segment;
$b$ is the number of breakpoints;
$CR_j$ is the copy rate of the j-th segment, obtained by averaging the relative data volume ratios of the windows in the j-th segment.
In one possible implementation, in step S4, if $0.75 < CR_j < 1.25$, the j-th segment is judged to be normal; if $CR_j \le 0.75$ or $CR_j \ge 1.25$, the j-th segment is judged to be aneuploid (abnormal).
In one possible implementation, if the j-th segment corresponds to a whole chromosome, it can be determined whether the whole chromosome is aneuploid.
In one possible implementation, the chromosome aneuploidy detection method suitable for single cells further comprises the following steps: cell separation; whole genome amplification of the cell; quality control of the cell amplification product; library construction; and high-throughput sequencing.
In a possible implementation manner, the quality control of the cell amplification product comprises performing quality control on the whole genome amplification product of the cell using housekeeping genes of the cell and determining whether the quality control result meets a preset condition; if it does, the whole genome amplification product is a qualified amplification product.
In one possible implementation manner, the library construction comprises DNA fragmentation, adapter ligation, mixing, mixed product purification, PCR enrichment and enriched product purification. DNA fragmentation refers to fragmenting the qualified amplification product to obtain a fragmented product; adapter ligation refers to adding adapters to the fragmented product to obtain an adapter-ligated product; mixing refers to mixing the adapter-ligated products of a plurality of samples in equal amounts to obtain a mixed product; mixed product purification refers to purifying the mixed product; PCR enrichment refers to enriching the purified mixed product to obtain an enriched product; and enriched product purification refers to purifying the enriched product to obtain the sequencing library.
Compared with the prior art, the system and the method for detecting the chromosome aneuploidy of the single cell have the following beneficial effects:
(1) The required starting sample amount is low, and chromosome aneuploidy can be detected in single cells.
(2) The window size can be set as required, and the detection resolution can be improved by controlling the window size.
(3) Quality control of the whole genome amplification product
After single-cell whole genome amplification of the embryonic cell is complete, quality control is performed on the amplification product to evaluate whether the amplification succeeded. If it did not, the experiment is stopped, which avoids further cost.
(4) Samples are mixed first and then enriched
During library construction, the adapter-ligated DNA of all samples is mixed in equal amounts, and PCR enrichment is then performed on the mixture.
(5) Library construction from 1 ng of starting DNA
A cell sample is a trace sample containing very little DNA. The whole detection process can be completed with 1 ng of DNA, which reduces sample waste.
(6) Variations larger than 5 Mb can be detected with 700k reads
Using at most 700k reads per sample, chromosomal deletions/duplications larger than 5 Mb can be detected, with an accuracy of up to 99.9%. This lowers the required sequencing depth and the cost.
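To give a feel for these numbers, the following back-of-the-envelope calculation estimates how many reads land in each window. It is only an illustration under stated assumptions: a genome length of about 3.1 Gb is assumed (it is not given in the text), reads are assumed to be spread evenly, and the 500 kb window is the example size mentioned in the detailed description.

```python
GENOME_SIZE_BP = 3_100_000_000  # assumption: approximate human genome length
WINDOW_SIZE_BP = 500_000        # example window size from the detailed description
READS_PER_SAMPLE = 700_000      # upper bound stated in the text
MIN_CNV_BP = 5_000_000          # smallest deletion/duplication stated in the text

windows = GENOME_SIZE_BP // WINDOW_SIZE_BP
reads_per_window = READS_PER_SAMPLE / windows
windows_per_cnv = MIN_CNV_BP // WINDOW_SIZE_BP

print(f"{windows} windows, about {reads_per_window:.0f} reads per window, "
      f"{windows_per_cnv} windows per 5 Mb event")
```

Under these assumptions a 5 Mb event spans about ten windows, each supported by roughly a hundred reads.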
(7) Sample data correction
Because whole genome amplification of a cell sample introduces a number of amplification biases, the method applies sample data correction to reduce the influence of these biases on the results.
Drawings
FIG. 1: Schematic diagram of the workflow for detecting preimplantation embryo chromosome aneuploidy using the single cell chromosome aneuploidy detection system of the present invention.
FIG. 2: Schematic diagram of the amplification principle of MDA.
FIG. 3: Distribution of the relative data volume ratio before GC content correction.
FIG. 4: Distribution of the corrected relative data volume ratio after GC content correction.
FIG. 5: Chromosome detection result for T22 abnormal embryonic cells.
FIG. 6: Chromosome detection result for embryonic cells with a segmental deletion/duplication.
FIG. 7: Chromosome detection result for normal embryonic cells.
Detailed Description
Before the present embodiments are further described, it is to be understood that the scope of the invention is not limited to the particular embodiments described below; it is also to be understood that the terminology used in the examples is for the purpose of describing particular embodiments, and is not intended to limit the scope of the present invention; in the description and claims of the present application, the singular forms "a", "an" and "the" include plural referents unless the context clearly dictates otherwise.
When numerical ranges are given in the examples, it is understood that both endpoints of each range and any value between them can be selected, unless otherwise indicated herein. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. In addition to the specific methods, devices, and materials used in the examples, any methods, devices, and materials similar or equivalent to those described in the examples may be used in the practice of the invention, in keeping with the knowledge of one skilled in the art and with the description of the invention.
Unless otherwise indicated, the experimental methods, detection methods, and preparation methods disclosed herein all employ techniques conventional in the art of molecular biology, biochemistry, chromatin structure and analysis, analytical chemistry, cell culture, recombinant DNA technology, and related arts. These techniques are well described in the literature; see in particular Sambrook et al., MOLECULAR CLONING: A LABORATORY MANUAL, Second edition, Cold Spring Harbor Laboratory Press, 1989 and Third edition, 2001; Ausubel et al., CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, John Wiley & Sons, New York, 1987 and periodic updates; the series METHODS IN ENZYMOLOGY, Academic Press, San Diego; Wolffe, CHROMATIN STRUCTURE AND FUNCTION, Third edition, Academic Press, San Diego, 1998; METHODS IN ENZYMOLOGY, Vol. 304, Chromatin (P.M. Wassarman and A.P. Wolffe, eds.), Academic Press, San Diego, 1999; and METHODS IN MOLECULAR BIOLOGY, Vol. 119, Chromatin Protocols (P.B. Becker, ed.), Humana Press, Totowa, 1999.
The embodiment of the invention provides a chromosome aneuploidy detection system suitable for single cells, comprising a window division module, a segment copy rate calculation module and a segment aneuploidy judgment module.
The window dividing module is used for dividing the sequence of the reference genome into windows. Windows can be divided in several ways. In one embodiment, the windows are divided so that each window contains an equal number of bases of the reference genome; in another embodiment, so that an equal number of effective sequences from a normal sample falls within each window; in another embodiment, so that an equal number of effective sequences from simulated reads of the reference genome falls within each window. The number of bases in a single window, or the number of effective sequences falling into a single window, can be set as required; for example, any value in the range 50-1000 kb can be used, such as 500 kb. In general, the smaller the value, the higher the resolution of the result. The normal sample is a sample with normal chromosomes and without chromosome aneuploidy.
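The first option, windows with an equal number of reference bases, can be sketched as below. The function name, the tuple layout and the treatment of the shorter final window on each chromosome are assumptions made for the illustration; the chromosome name and length in the example are made up.

```python
def fixed_size_windows(chrom_lengths, window_bp=500_000):
    """Tile each chromosome with windows of window_bp bases.

    Returns (chromosome, start, end) tuples with 0-based, half-open coordinates.
    The last window of a chromosome may be shorter (assumed handling).
    """
    if window_bp <= 0:
        raise ValueError("window size must be positive")
    windows = []
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, window_bp):
            windows.append((chrom, start, min(start + window_bp, length)))
    return windows


# Hypothetical 1.2 Mb chromosome split into 500 kb windows.
print(fixed_size_windows({"chrA": 1_200_000}))
```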
The segment dividing module is used for determining breakpoint positions from the relative data volume ratio of each window, calculated from the sample to be detected and the normal sample, and for dividing segments according to the breakpoint positions; the normal sample does not have a chromosomal aneuploidy.
The segment copy rate calculation module is used for calculating the segment copy rate $CR_j$.
The segment aneuploidy judging module is used for judging, according to the segment copy rate, whether the segment has a chromosomal aneuploidy. According to genetic theory, $0.75 < CR_j < 1.25$ is considered normal; if the copy rate of the j-th segment falls outside this range, the j-th segment is judged to be aneuploid. If the j-th segment corresponds to a whole chromosome, it can thus be determined whether the whole chromosome is aneuploid.
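One way to read these thresholds, offered here as an interpretation rather than something the text states: if the normal sample carries two copies of a segment and the copy rate scales with the copy number $c$ in the test cell, then the expected copy rates and the midpoints between them are

```latex
\[
CR_j \approx \frac{c}{2}:\qquad
c = 1 \Rightarrow 0.5,\qquad
c = 2 \Rightarrow 1.0,\qquad
c = 3 \Rightarrow 1.5
\]
\[
\frac{0.5 + 1.0}{2} = 0.75,\qquad
\frac{1.0 + 1.5}{2} = 1.25
\]
```

Under this assumption, 0.75 and 1.25 lie halfway between the normal value and the values expected for a single-copy loss or gain.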
In one example, in the segment dividing module, the relative data volume ratio is calculated using the following formulas:
$r_i = W_i / N$
$V_i = r_i^{\mathrm{test}} / r_i^{\mathrm{normal}}$
where $W_i$ represents the number of effective sequences falling into the i-th window, $1 \le i \le S$, and $S$ is the total number of windows;
$N$ represents the total number of effective sequences obtained by sequencing the sample to be detected, an effective sequence being a sequence, among those obtained by whole genome sequencing, that aligns to the reference genome;
$r_i$ represents the relative data volume of the i-th window;
$r_i^{\mathrm{test}}$ is the relative data volume of the i-th window of the sample to be detected;
$r_i^{\mathrm{normal}}$ is the relative data volume of the i-th window of the normal sample;
$V_i$ is the relative data volume ratio of the i-th window.
In one example, GC content correction is applied to the relative data volume ratio $V_i$ to obtain the corrected relative data volume ratio $V_i'$ of each window, calculated using the following formulas:
$V_i' = V_i / C_e$
$C_e = K_e / \bar{V}$
$K_e = \frac{1}{g} \sum_{i \in \text{interval } e} V_i$
$\bar{V} = \frac{1}{S} \sum_{i=1}^{S} V_i$
where $C_e$ is the weight of the windows falling within the e-th GC content interval, $e \in [1, E]$, and $E$ is the total number of GC content intervals;
$K_e$ is the mean of the relative data volume ratios $V_i$ of all windows falling within the e-th GC content interval;
$g$ is the total number of windows falling within the e-th GC content interval;
$\bar{V}$ is the mean of the relative data volume ratios $V_i$ of all windows of the sample to be detected;
$V_i'$ is the corrected relative data volume ratio of the i-th window.
The E GC content intervals are obtained from the GC content of the effective sequences in each of the S windows. Specifically, the average GC content of the effective sequences falling in a window is taken as the window GC content; the GC content range is divided into E equally spaced intervals; and the interval into which each window falls is determined from its window GC content.
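The assignment of windows to GC content intervals can be sketched as follows. The function names are hypothetical, and two details are assumptions: the intervals are taken to span GC fractions from 0 to 1, and bases other than A, C, G and T (such as N) are ignored when computing a read's GC fraction.

```python
def gc_fraction(sequence):
    """Fraction of G and C among the A/C/G/T bases of one read."""
    seq = sequence.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        raise ValueError("read contains no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / called


def window_gc_content(read_sequences):
    """Window GC content: the average GC fraction of the reads in the window."""
    if not read_sequences:
        raise ValueError("window contains no effective sequences")
    return sum(gc_fraction(s) for s in read_sequences) / len(read_sequences)


def gc_interval(gc, num_intervals):
    """Index e in 1..E of the equally spaced interval on [0, 1] that contains gc."""
    if not 0.0 <= gc <= 1.0:
        raise ValueError("GC content must be between 0 and 1")
    return min(int(gc * num_intervals) + 1, num_intervals)


gc = window_gc_content(["ATGC", "GGCC"])
print(gc, gc_interval(gc, 10))
```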
In one example, in the segment dividing module, a breakpoint position refers to the boundary between two adjacent windows whose relative data volume ratios differ significantly.
In one example, in the segment dividing module, the breakpoint positions are determined as follows: every n adjacent windows are taken as a window group, so that the S windows yield m window groups, where m = S - n + 1 is the total number of window groups, S is the total number of windows into which the sample is divided, and n is the number of windows in a window group. A nonparametric runs test is then performed on the relative data volume ratios $V_i$ of the windows in each window group, and a P value (the significance of the difference) is calculated for each pair of adjacent window groups. If the P value is smaller than a threshold, the boundary between the corresponding adjacent window groups is judged to be a breakpoint position of a copy number variation (CNV).
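The text names a nonparametric runs test but does not say which variant is used, how "adjacent" window groups are paired, or what threshold applies. The sketch below therefore makes several labelled assumptions: it uses the two-sample Wald-Wolfowitz runs test with a normal approximation, it compares each group of n windows with the non-overlapping group that immediately follows it, and it uses a threshold of 0.01. All function names are hypothetical.

```python
import math


def runs_test_p_value(group_a, group_b):
    """One-sided two-sample Wald-Wolfowitz runs test (normal approximation).

    Assumes no tied values; ties would make the run count order-dependent.
    """
    n1, n2 = len(group_a), len(group_b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two windows")
    # Pool both groups, sort by value and read off the sequence of group labels.
    pooled = sorted([(v, "a") for v in group_a] + [(v, "b") for v in group_b])
    labels = [label for _, label in pooled]
    runs = 1 + sum(1 for x, y in zip(labels, labels[1:]) if x != y)
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1))
    z = (runs - mean) / math.sqrt(variance)
    # Few runs means the two groups separate cleanly, so use the lower tail.
    return 0.5 * math.erfc(-z / math.sqrt(2))


def find_breakpoints(ratios, n, threshold=0.01):
    """Return (boundary_index, p) wherever two neighbouring groups differ.

    boundary_index is the index of the first window of the right-hand group.
    """
    hits = []
    for k in range(len(ratios) - 2 * n + 1):
        p = runs_test_p_value(ratios[k:k + n], ratios[k + n:k + 2 * n])
        if p < threshold:
            hits.append((k + n, p))
    return hits


# Hypothetical ratios: eight windows near 1.0 followed by eight near 1.5.
example = [1.0, 1.01, 0.99, 1.02, 0.98, 1.03, 0.97, 1.04,
           1.5, 1.51, 1.49, 1.52, 1.48, 1.53, 1.47, 1.54]
print(find_breakpoints(example, n=8))
```

The normal approximation is rough for small groups, and on real data several neighbouring boundaries around a true breakpoint may fall below the threshold, so a practical implementation would need an additional step to merge them.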
In one example, the segment copy rate is the average of the relative data volume ratios of the windows in a segment.
In one example, the segment copy rate is calculated using the following formula:
$CR_j = \frac{1}{h} \sum_{i \in \text{segment } j} V_i$
where $h$ is the number of windows in the j-th segment;
$b$ is the number of breakpoints;
$CR_j$ is the copy rate of the j-th segment, obtained by averaging the relative data volume ratios of the windows in the j-th segment.
In one example, in the segment aneuploidy judging module, if $0.75 < CR_j < 1.25$, the j-th segment is judged to be normal; if $CR_j \le 0.75$ or $CR_j \ge 1.25$, the j-th segment is judged to be aneuploid (abnormal).
In one example, if the j-th segment corresponds to a whole chromosome, it can be determined whether the whole chromosome is aneuploid.
In one example, the detection system further comprises: the system comprises a cell separation module, a cell whole genome amplification module, a cell amplification product quality control module, a library construction module and a high-throughput sequencing module.
The cell separation module can be a single cell separation module, and the single cell separation module can separate single cells by adopting a dilution method, an oral pipette separation method, micromanipulation, microdissection, flow cytometry separation, microfluidics and other methods. The embryo cell separation adopts a separation method of micromanipulation and microdissection.
The cell whole genome amplification module can adopt methods such as PEP-PCR, DOP-PCR, OmniPlex WGA and MDA (multiple displacement amplification), and can also use commercial kits such as GenomePlex from Sigma-Aldrich, PicoPlex from Rubicon Genomics, REPLI-g from Qiagen and illustra GenomiPhi from GE Healthcare to amplify the whole genome of the cells. In one embodiment, the cell whole genome amplification module employs the MDA amplification method to obtain a more complete, high-coverage amplification product. The amplification principle of MDA is shown in FIG. 2.
In one example, the cell amplification product quality control module is configured to perform quality control on the whole genome amplification product of a cell using housekeeping genes of the cell and to determine whether the quality control result meets a preset condition; if it does, the whole genome amplification product is a qualified amplification product. The housekeeping genes comprise 8 genes: CYB5A, PRPH, GABARAPL2, ACTG1, NDUFA7, UQCRC1, MYC and MIF. If 6 or more of the 8 housekeeping genes are amplified, the product is rated A; if 4 or 5 are amplified, it is rated B; if 2 or 3 are amplified, it is rated C; if fewer than 2 are amplified, it is rated D. Grade D alone may be defined as an unqualified amplification product, or grades C and D may both be defined as unqualified amplification products.
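The grading rule can be sketched as follows. The function names `qc_grade` and `is_qualified` and the `failing` parameter are hypothetical and chosen only for this illustration; the thresholds are those stated above.

```python
def qc_grade(num_amplified: int, total_genes: int = 8) -> str:
    """Grade a whole genome amplification product by the number of housekeeping genes amplified."""
    if not 0 <= num_amplified <= total_genes:
        raise ValueError("num_amplified must be between 0 and total_genes")
    if num_amplified >= 6:
        return "A"
    if num_amplified >= 4:
        return "B"
    if num_amplified >= 2:
        return "C"
    return "D"


def is_qualified(grade: str, failing: tuple[str, ...] = ("D",)) -> bool:
    """Return True when the grade is not among the grades defined as unqualified."""
    return grade not in failing


print(qc_grade(7), is_qualified(qc_grade(7)))
# Stricter policy in which both C and D are unqualified:
print(qc_grade(3), is_qualified(qc_grade(3), failing=("C", "D")))
```

The `failing` argument reflects the two alternative policies described in the text (only D unqualified, or both C and D unqualified).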
In one example, the library construction module is used to construct a sequencing library from the qualified amplification products, and comprises: a fragmentation submodule, an adaptor ligation submodule, a mixing submodule, a mixed product purification submodule, a PCR enrichment submodule and an enriched product purification submodule.
The fragmentation submodule is used to fragment the qualified amplification product to obtain a fragmented product. In one embodiment, the qualified amplification product is diluted 15-25 fold, preferably 20 fold, and 1 ng of the amplification product is then fragmented using the fragmentation submodule.
The adaptor ligation submodule is used to add adaptors to the fragmented product to obtain an adaptor-ligated fragmented product;
the mixing submodule is used to mix the adaptor-ligated fragmented products of a plurality of samples in equal amounts to obtain a mixed product;
the mixed product purification submodule is used to purify the mixed product;
the PCR enrichment submodule is used to enrich the mixed product purified by the mixed product purification submodule to obtain an enriched product;
and the enriched product purification submodule is used to purify the enriched product to obtain the sequencing library.
When the library is prepared using the library construction module, the whole process from DNA fragmentation through adaptor ligation and enrichment takes only 3 hours.
The high-throughput sequencing module is used for sequencing according to the sequencing library to obtain a sequencing result. The high-throughput sequencing can adopt Illumina Hiseq series, Illumina NextSeq series, Miseq sequencing system, Illumina MiniSeq sequencing system, Ion Torrent sequencing system and the like.
An embodiment of the invention also provides a chromosome aneuploidy detection method suitable for single cells which, as shown in FIG. 1, comprises the following steps: S1, dividing windows along the sequence of the reference genome to obtain S windows; S2, determining breakpoint positions according to the relative data volume ratio of each window, calculated from the sample to be detected and a normal sample, and dividing segments according to the breakpoint positions, wherein the normal sample does not have chromosomal aneuploidy; S3, calculating the segment copy rate CR_j; S4, judging whether each segment has chromosome aneuploidy according to the segment copy rate.
In one possible implementation, the relative data volume ratio is calculated using the following formulas:
$$r_i = W_i / N$$
$$V_i = r_i^{case} / r_i^{control}$$
W_i represents the number of effective sequences falling in the ith window, 1 ≤ i ≤ S;
N represents the total number of effective sequences obtained by sequencing the sample to be detected, wherein an effective sequence is a sequence, among those obtained by whole genome sequencing, that aligns to the reference genome;
r_i represents the relative data volume of the ith window;
r_i^{case} is the relative data volume of the ith window of the sample to be detected;
r_i^{control} is the relative data volume of the ith window of the normal sample;
V_i is the relative data volume ratio of the ith window.
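The two formulas above can be sketched in a few lines. This is a minimal illustration that assumes every effective sequence falls in exactly one window, so that N equals the sum of the window counts; the function names and the toy read counts are hypothetical.

```python
def relative_data_volume(window_counts: list[int]) -> list[float]:
    """r_i = W_i / N, assuming N is the sum of all window counts."""
    total = sum(window_counts)
    if total <= 0:
        raise ValueError("no effective sequences")
    return [count / total for count in window_counts]


def relative_data_volume_ratio(case_counts: list[int], control_counts: list[int]) -> list[float]:
    """V_i = r_i(case) / r_i(control); windows with no control reads give NaN."""
    if len(case_counts) != len(control_counts):
        raise ValueError("case and control must use the same windows")
    r_case = relative_data_volume(case_counts)
    r_control = relative_data_volume(control_counts)
    return [rc / rn if rn > 0 else float("nan")
            for rc, rn in zip(r_case, r_control)]


# Toy counts: the third window holds twice as many case reads as the others.
print(relative_data_volume_ratio([100, 100, 200], [100, 100, 100]))
```

Because both samples are normalized by their own totals, the ratio is insensitive to differences in overall sequencing depth between the sample to be detected and the normal sample.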
In one example, GC content correction is performed on the relative data volume ratio V_i to obtain the corrected relative data volume ratio V_i' of each window, and the corrected relative data volume ratio V_i' is calculated using the following formulas:
$$V_i' = V_i / C_e$$
$$C_e = K_e / \bar{V}$$
$$K_e = \frac{1}{g}\sum_{i \in \text{interval } e} V_i$$
$$\bar{V} = \frac{1}{S}\sum_{i=1}^{S} V_i$$
C_e is the weight of the windows falling within the e-th GC content interval;
K_e is the mean of the V_i values of all windows falling within the e-th GC content interval;
g is the total number of windows falling within the e-th GC content interval;
\bar{V} is the mean of the relative data volume ratios V_i of all windows of the sample;
V_i' is the corrected relative data volume ratio of the ith window.
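A minimal sketch of this GC content correction is given below. It assumes that the weight C_e is the mean ratio K_e of the interval divided by the mean ratio of all windows, and that intervals are formed by flooring the window GC content to a multiple of the interval width; the function name and the toy values are hypothetical.

```python
import math
from collections import defaultdict


def gc_correct(ratios: list[float], gc_contents: list[float], interval: float = 0.01) -> list[float]:
    """Return V_i' = V_i / C_e, with C_e = K_e / mean(V) (assumed form of the weight)."""
    if not ratios or len(ratios) != len(gc_contents):
        raise ValueError("ratios and gc_contents must be non-empty and equally long")
    overall_mean = sum(ratios) / len(ratios)
    if overall_mean <= 0:
        raise ValueError("mean ratio must be positive")

    # Assign each window to an equally spaced GC content interval.
    keys = [math.floor(round(gc / interval, 9)) for gc in gc_contents]
    by_interval = defaultdict(list)
    for key, value in zip(keys, ratios):
        by_interval[key].append(value)

    # C_e = K_e / overall mean, where K_e is the mean ratio within interval e.
    weights = {key: (sum(vals) / len(vals)) / overall_mean
               for key, vals in by_interval.items()}
    return [value / weights[key] if weights[key] > 0 else float("nan")
            for value, key in zip(ratios, keys)]


# Two GC intervals with opposite bias; the correction removes the bias.
print(gc_correct([0.8, 0.8, 1.2, 1.2], [0.405, 0.405, 0.415, 0.415]))
```

With this form of the weight, the mean corrected ratio within every GC content interval equals the overall mean ratio, which is the intended effect of removing GC-dependent bias.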
In one example, the breakpoint location refers to the boundary of two adjacent windows having significant differences in relative data volume ratios.
In one example, the breakpoint position is determined as follows: every n adjacent windows among the S windows are taken as a window group to obtain m window groups, where m = S - n + 1; m is the total number of window groups; S is the total number of windows produced by the window division; n is the number of windows contained in a window group. A nonparametric runs test is then performed using the relative data volume ratio V_i of each window in each window group, and a P value is calculated for each pair of adjacent window groups; if the P value is smaller than a threshold, the boundary between the corresponding adjacent window groups is judged to be a breakpoint position of copy number variation (CNV). The P value is the P value for a significant difference.
In one example, the segment copy rate is the average of the relative data volume ratios of the windows in a segment.
In one example, the segment copy rate CR_j is calculated using the following formula:
$$CR_j = \frac{1}{h}\sum_{i \in \text{segment } j} V_i, \quad 1 \le j \le b+1$$
h is the number of windows in the jth segment;
b is the number of breakpoints;
CR_j is the copy rate of the jth segment, obtained by calculating the average of the relative data volume ratios of the windows in the jth segment.
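The segment copy rate can be sketched as a per-segment mean. In this illustration a breakpoint is represented by the zero-based index of the first window to its right; that representation and the function name are assumptions, not part of the original text.

```python
def segment_copy_rates(ratios: list[float], breakpoints: list[int]) -> list[float]:
    """Split the windows at b breakpoints into b + 1 segments and average V_i in each."""
    cuts = sorted(set(breakpoints))
    if any(not 0 < cut < len(ratios) for cut in cuts):
        raise ValueError("breakpoints must lie strictly inside the window range")
    bounds = [0, *cuts, len(ratios)]
    rates = []
    for start, end in zip(bounds, bounds[1:]):
        segment = ratios[start:end]          # the h windows of this segment
        rates.append(sum(segment) / len(segment))
    return rates


# One breakpoint between window 2 and window 3 gives two segments.
print(segment_copy_rates([1.0, 1.0, 1.0, 1.5, 1.5, 1.5], [3]))  # [1.0, 1.5]
```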
In one example, in step S4, if 0.75 < CR_j < 1.25, the jth segment is judged to be normal; if CR_j ≤ 0.75 or CR_j ≥ 1.25, the jth segment is judged to have a non-integer multiple abnormality.
In one example, if the jth segment corresponds to a whole chromosome, it can be determined whether the whole chromosome is a non-integer multiple abnormality.
In one example, the method for detecting chromosomal aneuploidy applicable to single cells further comprises the steps of: separating cells; amplifying a whole genome of the cell; controlling the quality of cell amplification products; constructing a library; high throughput sequencing.
In one example, the quality control of the cell amplification product refers to performing quality control on the whole genome amplification product of the cell using housekeeping genes of the cell and judging whether the quality control result meets a preset condition; if it does, the whole genome amplification product is a qualified amplification product.
In one example, the library construction comprises DNA fragmentation, adaptor ligation, mixing, mixed product purification, PCR enrichment and enriched product purification, wherein the DNA fragmentation refers to fragmenting the qualified amplification product to obtain a fragmented product; the adaptor ligation refers to adding adaptors to the fragmented product to obtain an adaptor-ligated fragmented product; the mixing refers to mixing the adaptor-ligated fragmented products of a plurality of samples in equal amounts to obtain a mixed product; the mixed product purification refers to purifying the mixed product; the PCR enrichment refers to enriching the purified mixed product to obtain an enriched product; and the enriched product purification refers to purifying the enriched product to obtain the sequencing library.
The chromosome aneuploidy detection system and method suitable for the single cell provided by the embodiment of the invention have the following beneficial effects:
(1) the required sample initial amount is low, and the chromosome aneuploidy of single cells can be detected.
(2) The size of the window can be set according to requirements, and the detection resolution is improved by controlling the size of the window.
(3) Quality control of whole genome amplification product
After the embryo cell completes the single cell whole genome amplification, the quality control is carried out on the amplification product to evaluate whether the whole genome amplification process is successful or not, if the whole genome amplification process is unsuccessful, the experiment is stopped, and the cost consumption is reduced.
(4) First mixing the sample and then enriching
In the library building process, all sample DNAs connected with the joints are mixed in equal amount, and then PCR enrichment is carried out.
(5) Library construction from 1 ng of starting DNA
The cell sample is a trace sample with a small amount of DNA; the whole detection process can be completed with 1 ng of DNA, which reduces sample waste.
(6) 700k reads suffice to detect variations larger than 5 Mb
With at most 700k reads per sample, chromosomal deletions/duplications larger than 5 Mb can be detected, and the accuracy can reach 99.9%. The reduced sequencing depth lowers the cost.
(7) Sample data correction
Because a cell sample is subjected to whole genome amplification, a plurality of amplification deviations are introduced, and the method introduces sample data correction and reduces the influence of the deviations on results.
The following provides an exemplary description of the method for detecting chromosome aneuploidy of single cells according to the embodiments of the present invention.
Example 1
Samples were selected as follows:
Normal male white blood cells, denoted male; normal female white blood cells, denoted female; normal embryonic cells, denoted EC-1; embryonic cells with a T22 abnormality (T22 is the abbreviation for trisomy 22), denoted EC-2; embryonic cells with a segmental deletion/duplication (duplication of q25.1-q26), denoted EC-3.
In the present embodiment, "normal" means that the cell does not have chromosome aneuploidy, that is, the chromosomes in the cell are all integer multiples.
The experimental operation flow is as follows:
1. embryo cell isolation
Under a microscope, a laser cutting technique is used to cut 3-10 cells from the embryonic trophoblast; all of the cut cells are taken up with a capillary pipette, washed three times in PBS, transferred into a PCR tube containing 3.5 μl of PBS, and centrifuged for later use.
2. Single cell whole genome amplification
The REPLI-g single cell kit of Qiagen is selected for single cell whole genome amplification, and the specific operation is as follows:
21. Reagent preparation: dissolve a new tube of DLB lyophilized powder in 500 μl of ddH2O; prepare the D2 Buffer as follows:
Figure BDA0001560229050000171
Mix well and store at -20 °C for no more than one month.
22. On an ice box, add 3 μl of D2 Buffer to the 4 μl of PBS containing the cell sample, mix gently, and incubate at 65 °C for 10 min.
23. On an ice box, add 3 μl of Stop Solution to the mixture, mix gently, and cool in a refrigerator at 4 °C.
24. Prepare the MDA mixed solution and mix well by shaking; the MDA mixed solution is prepared as follows:
Figure BDA0001560229050000172
25. Add 40 μl of the MDA mixed solution prepared in step 24 to the mixture from step 23, and mix well by shaking.
26. Set up the hot lid on the thermocycler and perform the following procedure:
30℃ 8h
65℃ 3min
4℃ ∞
27. the product of step 26 was centrifuged rapidly and placed at 4 ℃ until use (long-term storage at-20 ℃).
3. Quality control of single cell amplification product
8 pairs of housekeeping genes (CYB5A, PRPH, GABARAPL2, ACTG1, NDUFA7, UQCRC1, MYC and MIF) on the genome are selected, and the product of the step 26 is subjected to quality control to determine whether the amplification process is qualified. The operation flow is as follows:
the product of step 26 is a single cell amplification product, which may also be referred to as a WGA product.
31. The WGA product was diluted 30-fold and the PCR mix was prepared as follows:
Figure BDA0001560229050000181
32. Add 1 μl of the diluted WGA product to 19 μl of the PCR mixed solution for each reaction, and mix by pipetting up and down 10 times or by shaking on an oscillator for 10 s;
33. briefly centrifuged, placed on a thermocycler, set hot lid, and run the following program:
Figure BDA0001560229050000182
34. An aliquot of the PCR product was taken and analyzed by electrophoresis on a 2% agarose gel with a 1 kb marker.
35. Quality control standard:
Figure BDA0001560229050000183
4. library construction:
41. WGA product disruption
411. The diluted WGA product in the step 31 is diluted by 20 times, and 1ng of the diluted WGA product is taken for the subsequent library construction process.
412. A hot lid was placed on the thermocycler and the following program was set up, the program being named Frag:
Figure BDA0001560229050000184
the program was run and paused when run to 4 ℃.
413. Mix the required reagents thoroughly in a 0.2 ml PCR tube according to the following system; if the volume of the DNA sample is insufficient, make up the difference with sterile water.
Figure BDA0001560229050000185
Step 413 is performed on an ice box, and the 5× WGS Fragmentation Mix must be thoroughly mixed by shaking for more than 30 s before use.
414. Mix by pipetting up and down 10 times, or by shaking on an oscillator for 10 s (note: make sure the reaction liquid in each tube is fully thawed);
415. centrifuge briefly, place on the thermocycler, and run the Frag program.
42. Joint connection
421. Adding the following reagents directly into the crushed sample mixed solution:
Figure BDA0001560229050000191
422. mix by pipetting up and down 10 times, or by shaking on an oscillator for 10 s;
423. centrifuge briefly and incubate on a thermocycler at 25 °C for 10 min;
424. centrifuge quickly, then take 2.5 μl of product from each reaction tube and combine them in a new PCR tube.
425. Mix well by shaking and centrifuge briefly; the volume of the mixture is 12.5 μl (3 embryo samples + 2 control samples). Note that it is at this point that the adaptor-ligated DNA of the samples is mixed in equal amounts.
43. Purification of the mixed product
431. Resuspend the AMPure XP Beads half an hour in advance and leave them at room temperature; prepare 80% alcohol (1000 μl per reaction) for later use;
432. add 8 μl of resuspended AMPure XP Beads to the 12.5 μl of mixed ligation product, and mix by pipetting up and down 10 times or by shaking on an oscillator for 10 s;
433. let stand at room temperature for 5 min;
434. after a quick centrifugation, place the tube on a magnetic stand for 5 min until the solution is clear, and carefully remove the supernatant without discarding the magnetic beads;
435. keeping the tube on the magnetic stand, add 200 μl of freshly prepared 80% alcohol, let stand at room temperature for 30 s, and carefully remove the supernatant;
436. repeat step 435;
437. keep the tube on the magnetic stand, open the lid, and air-dry for about 5 min (note: do not overdry the beads);
438. remove the tube from the magnetic stand, add 52 μl of double distilled water, pipette up and down to resuspend the magnetic beads, and let stand at room temperature for 5 min;
439. centrifuge quickly, place the tube on the magnetic stand at room temperature for about 5 min until the solution is clear, and transfer 50 μl of supernatant to a new PCR tube;
4310. add 40 μl of resuspended AMPure XP Beads to the 50 μl of purified product, and mix by pipetting up and down 10 times or by shaking on an oscillator for 10 s;
4311. repeat steps 433 to 437;
4312. remove the tube from the magnetic stand, add 22 μl of double distilled water, pipette up and down 10 times until the magnetic beads are resuspended, and let stand at room temperature for 5 min;
4313. centrifuge quickly, place the tube on the magnetic stand at room temperature for about 5 min until the solution is clear, and transfer 20 μl of supernatant to a new PCR tube to obtain the adaptor-ligated DNA.
44. Enrichment by PCR
441. The following reagents were added to the linker DNA obtained in step 4313:
Figure BDA0001560229050000201
442. mix by pipetting up and down 10 times, or by shaking on an oscillator for 10 s;
443. briefly centrifuged, placed on a thermocycler, set hot lid, and run the following program:
Figure BDA0001560229050000202
45. Purification of the enriched product
451. Resuspend the AMPure XP Beads half an hour in advance and leave them at room temperature; prepare 80% alcohol (500 μl per reaction) for later use;
452. add 45 μl of resuspended AMPure XP Beads to the 50 μl of PCR product, and mix by pipetting up and down 10 times or by shaking on an oscillator for 10 s;
453. let stand at room temperature for 5 min;
454. after a quick centrifugation, place the tube on a magnetic stand for 5 min until the solution is clear, and carefully remove the supernatant without discarding the magnetic beads;
455. keeping the tube on the magnetic stand, add 200 μl of freshly prepared 80% alcohol, let stand at room temperature for 30 s, and carefully remove the supernatant;
456. repeat step 455;
457. keep the tube on the magnetic stand, open the lid, and air-dry for about 5 min (note: do not overdry the beads);
458. remove the tube from the magnetic stand, add 22 μl of 1× TE Buffer, pipette up and down 10 times until the magnetic beads are resuspended, and let stand at room temperature for 5 min;
459. centrifuge quickly, place the tube on the magnetic stand at room temperature for about 5 min until the solution is clear, and transfer 20 μl of the supernatant to a new tube.
This yields library A.
4510. Quantification and fragment analysis were performed using the Qubit 3.0 and the Agilent 2100 Bioanalyzer (see the related product manuals).
5. High throughput sequencing
In this embodiment of the invention, an Illumina MiSeq sequencing system is used to perform high-throughput sequencing on library A. The prepared library was run on the MiSeq sequencer with a read length of 2 × 75 bp.
It should be noted that the inventors also compared, by sequencing, the stage at which sample DNA is mixed during library construction. As described above, library A was formed by mixing the adaptor-ligated DNA of the samples in equal amounts and then carrying the mixture through the remaining steps. In the alternative method, the adaptor-ligated DNA of each sample is not mixed directly; instead, a library is built separately for each sample, and the sample libraries are then mixed to obtain library B. When library A and library B were sequenced on the same sequencer, the coefficient of variation (CV) of the per-sample data yield was 15.5% and 32.6%, respectively. Mixing the sample DNA in equal amounts after adaptor ligation to form library A therefore makes the data yield of the samples more uniform, avoids unnecessary data waste, reduces the cost of the enrichment and subsequent experiments, and reduces the number of operating steps and the operational complexity.
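For readers unfamiliar with the metric, the coefficient of variation of per-sample data yield can be sketched as the standard deviation divided by the mean. The text does not state whether a population or sample standard deviation was used; the population form is assumed here, and the read counts below are made-up values for illustration, not the data behind the 15.5% and 32.6% figures.

```python
import statistics


def coefficient_of_variation(values: list[float]) -> float:
    """CV = population standard deviation / mean (population form is an assumption)."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean is zero, CV is undefined")
    return statistics.pstdev(values) / mean


# Hypothetical per-sample read counts for a pooled run of five samples.
reads_per_sample = [690_000, 710_000, 650_000, 730_000, 720_000]
print(f"CV = {coefficient_of_variation(reads_per_sample):.1%}")
```

A lower CV means the reads are shared more evenly among the pooled samples, so fewer total reads are needed for every sample to reach its target depth.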
6. Alignment of data to reference genome
The reads obtained by sequencing library A were aligned to the reference genome; human hg19 was selected as the reference genome sequence, and the alignment results were tallied. The statistics of the data alignment are shown in Table 1.
TABLE 1
Figure BDA0001560229050000211
Here, Total read represents the total number of reads per sample; Mappable read represents the number of reads that can be aligned to the reference genome; Map rate represents the alignment rate; Unique read represents the number of reads that align uniquely to the reference genome; Unique rate represents the unique alignment rate, i.e., the ratio of reads that align uniquely to the reference genome to reads that can be aligned.
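The two rates in Table 1 can be computed as in the sketch below. It assumes that the alignment rate is the number of mappable reads divided by the total number of reads; the unique rate follows the definition given above. The function name, dictionary keys and read counts are hypothetical.

```python
def alignment_stats(total_reads: int, mappable_reads: int, unique_reads: int) -> dict[str, float]:
    """Return the alignment rate and the unique alignment rate for one sample."""
    if not 0 <= unique_reads <= mappable_reads <= total_reads or mappable_reads == 0:
        raise ValueError("expected 0 <= unique <= mappable <= total and mappable > 0")
    return {
        "map_rate": mappable_reads / total_reads,       # assumed: mappable / total
        "unique_rate": unique_reads / mappable_reads,   # unique / mappable, as defined in the text
    }


print(alignment_stats(total_reads=1000, mappable_reads=900, unique_reads=810))
```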
7. Computing statistics
Sequencing results for library a:
s1: dividing windows
Windows are divided along the sequence of the reference genome, resulting in S windows. There are several alternative ways of dividing the windows. In one embodiment, the windows are divided so that each window contains an equal number of bases of the reference genome; in another embodiment, the windows are divided so that an equal number of effective sequences from a normal sample falls in each window; in another embodiment, the windows are divided so that an equal number of effective sequences simulated from the reference genome falls in each window. The number of bases in a single window, or the number of effective sequences falling in a single window, can be set as required; for example, any value from 50 to 1000 kb can be used, specifically 500 kb. In general, the smaller the value, the higher the resolution of the result.
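The first of these options, windows with an equal number of reference bases, can be sketched as follows. The function name, the half-open coordinate convention and the toy chromosome lengths are assumptions for illustration; real chromosome lengths would come from the reference genome index.

```python
def fixed_size_windows(chrom_lengths: dict[str, int], window_size: int = 500_000) -> list[tuple[str, int, int]]:
    """Tile each chromosome with half-open windows [start, end) of window_size bases."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    windows = []
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, window_size):
            # The last window of a chromosome may be shorter than window_size.
            windows.append((chrom, start, min(start + window_size, length)))
    return windows


# Toy lengths, not real hg19 values.
toy_genome = {"chrA": 1_200_000, "chrB": 500_000}
for window in fixed_size_windows(toy_genome):
    print(window)
```

Windows never span two chromosomes in this sketch, so a whole-chromosome segment corresponds to a contiguous run of windows.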
s2: calculating the relative data volume r_i of each window
The relative data volume r_i of the effective sequences of each sample falling in each window is counted:
$$r_i = W_i / N$$
W_i represents the number of effective sequences falling in the ith window, 1 ≤ i ≤ S;
N represents the total number of effective sequences obtained by sequencing the sample;
r_i represents the relative data volume of the ith window.
s3: calculating the relative data volume ratio V_i of each window
Taking the normal sample as the normal control, the relative data volume ratio V_i between the sample to be detected and the normal control sample is calculated for each window:
$$V_i = r_i^{case} / r_i^{control}$$
r_i^{case} is the relative data volume of the ith window of the sample to be detected;
r_i^{control} is the relative data volume of the ith window of the normal sample;
V_i is the relative data volume ratio of the ith window.
s4: determining breakpoint positions to divide the segments
Taking n adjacent windows as a window group, the S windows are divided into m window groups, where m = S - n + 1;
m is the total number of window groups;
S is the total number of windows;
n is the number of windows contained in a window group.
A nonparametric runs test is performed using the relative data volume ratio V_i of each window in each window group, and a P value (the P value for a significant difference) is calculated for each pair of adjacent window groups. If the P value is smaller than a threshold (the threshold may be an empirical value, such as 0.001, or may be obtained by analyzing a plurality of normal samples), the boundary between the corresponding adjacent window groups is judged to be a breakpoint position of copy number variation (CNV). This yields b breakpoints, which divide the reference genome sequence into b + 1 segments. Adjacent window groups are window groups that adjoin head to tail, and each pair of adjacent window groups corresponds to one P value.
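The text does not specify which form of the runs test is used. The sketch below assumes a two-sample Wald-Wolfowitz runs test with a normal approximation and a one-sided P value (few runs indicate that the two groups differ); the function names are hypothetical, and this is an illustration rather than the patented implementation.

```python
import math


def runs_test_p_value(group_a: list[float], group_b: list[float]) -> float:
    """One-sided two-sample runs test (normal approximation); small P means the groups differ."""
    n1, n2 = len(group_a), len(group_b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least 2 windows")
    # Pool and sort; ties across groups are ordered with group_a first (a simplification).
    pooled = sorted([(v, 0) for v in group_a] + [(v, 1) for v in group_b])
    labels = [label for _, label in pooled]
    runs = 1 + sum(1 for x, y in zip(labels, labels[1:]) if x != y)
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1))
    z = (runs - mean) / math.sqrt(variance)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))   # P(Z <= z)


def find_breakpoints(ratios: list[float], group_size: int, threshold: float = 0.001) -> list[int]:
    """Return indices k where the groups [k - n, k) and [k, k + n) differ significantly."""
    found = []
    for k in range(group_size, len(ratios) - group_size + 1):
        left, right = ratios[k - group_size:k], ratios[k:k + group_size]
        if runs_test_p_value(left, right) < threshold:
            found.append(k)
    return found


# Toy ratios: 20 windows near 1.0 followed by 20 windows near 1.5.
noise = [((i * 7) % 10) * 0.001 for i in range(40)]
ratios = [(1.0 if i < 20 else 1.5) + noise[i] for i in range(40)]
print(find_breakpoints(ratios, group_size=10))
```

Because windows close to a true breakpoint can also fall below the threshold, a practical implementation would likely merge neighbouring candidate boundaries and keep the one with the smallest P value; that merging step is not shown here.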
s5: calculating the copy rate CR_j of each segment
$$CR_j = \frac{1}{h}\sum_{i \in \text{segment } j} V_i, \quad 1 \le j \le b+1$$
h is the number of windows in the jth segment;
b is the number of breakpoints;
CR_j is the copy rate of the jth segment;
CR_j is obtained by calculating the average of the relative data volume ratios V_i of the windows in the jth segment.
s6: judging whether each segment has a non-integral multiple abnormality
According to the theory of genetics, 0.75 < CR_j < 1.25 is considered normal; if CR_j falls outside this range, the jth segment is judged to have a non-integral multiple abnormality. If the jth segment corresponds to a whole chromosome, it can be determined whether the whole chromosome has a non-integral multiple abnormality.
More preferably, in step s3, in order to improve the accuracy of the data, GC content correction may be performed on the relative data volume ratio V_i of each window to obtain the corrected relative data volume ratio V_i' of each window.
$$V_i' = V_i / C_e$$
$$C_e = K_e / \bar{V}$$
$$K_e = \frac{1}{g}\sum_{i \in \text{interval } e} V_i$$
$$\bar{V} = \frac{1}{S}\sum_{i=1}^{S} V_i$$
C_e is the weight of the windows falling within the e-th GC content interval;
K_e is the average of the relative data volume ratios V_i of all windows falling within the e-th GC content interval (the number of such windows is denoted g);
\bar{V} is the average of the relative data volume ratios V_i of all windows of the sample;
V_i' is the corrected relative data volume ratio of the ith window.
The average GC content of the effective sequences falling in each window is calculated from their GC content data and taken as the GC content of the window; the GC content range is divided into a number of equally spaced intervals, for example at a spacing of 0.01; and the GC content interval into which each window falls is determined from the GC content of the window.
In one embodiment, as shown in FIG. 3, the GC content of each window is plotted against its relative data volume ratio V_i, the GC content is divided into intervals at a spacing of 0.01, and the windows falling into each GC content interval are identified; from the relative data volume ratios V_i of the windows in each GC content interval, the corrected relative data volume ratio V_i' of each window can be calculated using the formulas above.
The result after GC content correction of the relative data volume ratio V_i of each window is shown in FIG. 4. In step s4, the nonparametric runs test is then performed using the corrected relative data volume ratio V_i' of each window in each window group.
In step s5, the copy rate is calculated as:
$$CR_j' = \frac{1}{h}\sum_{i \in \text{segment } j} V_i', \quad 1 \le j \le b+1$$
h is the number of windows in the jth segment;
b is the number of breakpoints;
CR_j' is the copy rate of the jth segment, obtained by averaging the corrected relative data volume ratios V_i' of the windows in the jth segment.
In step s6, according to the theory of genetics, 0.75 < CR_j' < 1.25 is considered normal; if CR_j' falls outside this range, the jth segment is judged to have a non-integral multiple abnormality. If the jth segment corresponds to a whole chromosome, it can be determined whether the whole chromosome has a non-integral multiple abnormality.
The chromosomes of T22 abnormal embryonic cells are shown in FIG. 5 according to the above determination method; chromosomes of the fragment deleted/repeated embryonic cells are shown in FIG. 6; chromosomes of normal embryonic cells are shown in FIG. 7.
The results of fig. 5, 6 and 7 are fully illustrative: the detection system can correctly distinguish normal embryos from abnormal embryos.
By adopting the scheme of the invention, only a small amount of starting sample is required: 1 ng of DNA is sufficient to complete the detection, so the scheme can be used for scarce samples. In addition, after whole genome amplification of the cells, quality control is performed on the amplified product to evaluate whether the whole genome amplification result is qualified; if it is not, the experiment is terminated, thereby reducing unnecessary detection cost.
Summarizing, the technical scheme of the invention has the following advantages:
(1) The required starting amount of sample is low, and chromosome aneuploidy of single cells can be detected. (2) The window size can be set as required, and the detection resolution is improved by controlling the window size. (3) Quality control of the whole genome amplification product: after single-cell whole genome amplification of the embryonic cells is completed, quality control is performed on the amplification product to evaluate whether the whole genome amplification succeeded; if not, the experiment is terminated, which reduces cost. (4) Mixing the samples first and enriching afterwards: during library construction, the adaptor-ligated DNA of all samples is mixed in equal amounts, and PCR enrichment is then performed. (5) Library construction from 1 ng of starting DNA: the cell sample is a trace sample with a small amount of DNA, and the whole detection process can be completed with 1 ng of DNA, which reduces sample waste. (6) 700k reads suffice to detect variations larger than 5 Mb: with at most 700k reads per sample, chromosomal deletions/duplications larger than 5 Mb can be detected, and the accuracy can reach 99.9%; the reduced sequencing depth lowers the cost. (7) Sample data correction: because whole genome amplification of a cell sample introduces many amplification biases, the method introduces sample data correction to reduce the influence of these biases on the result.
In conclusion, the present invention effectively overcomes various disadvantages of the prior art and has high industrial utilization value.
The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical spirit of the present invention be covered by the claims of the present invention.

Claims (20)

1. A system for detecting chromosomal aneuploidy for a single cell, the system comprising:
a window dividing module for dividing the sequence of a reference genome into windows;
a segment dividing module for determining breakpoint positions according to the relative data volume ratio of each window, calculated from the sample to be detected and a normal sample, and dividing segments according to the breakpoint positions; wherein the normal sample does not have chromosome aneuploidy, and the relative data volume ratio is calculated using the following formulas:
r_i = W_i / N
V_i = r_i^case / r_i^control
W_i represents the number of effective sequences falling into the i-th window, where 1 ≤ i ≤ S, and S is the total number of windows;
N represents the total number of effective sequences obtained by sequencing the sample to be detected; wherein an effective sequence is a sequence, among the sequences obtained by whole genome sequencing, that is aligned to the reference genome;
r_i represents the relative data volume of the i-th window;
r_i^case is the relative data volume of the i-th window of the sample to be detected;
r_i^control is the relative data volume of the i-th window of the normal sample;
V_i is the relative data volume ratio of the i-th window;
GC content correction is performed on the relative data volume ratio V_i to obtain the corrected relative data volume ratio V_i' of each window, the corrected relative data volume ratio V_i' being calculated using the following formulas:
V_i' = V_i / C_e
[Formula image FDA0003445921610000011, not reproduced in the text]
[Formula image FDA0003445921610000021, not reproduced in the text]
[Formula image FDA0003445921610000022, not reproduced in the text]
C_e is the weight of the windows falling within the e-th GC content interval,
K_e is the mean of the relative data volume ratios V_i of all windows falling within the e-th GC content interval,
g is the total number of windows falling within the e-th GC content interval,
[Symbol image FDA0003445921610000023, not reproduced in the text] is the mean of the relative data volume ratios V_i of all windows of the sample to be detected,
V_i' is the corrected relative data volume ratio of the i-th window;
a segment copy rate calculation module for calculating a segment copy rate CR_j;
and a segment aneuploidy judging module for judging whether a segment has chromosome aneuploidy according to the segment copy rate.
2. The system for detecting chromosomal aneuploidy according to claim 1, wherein the breakpoint position in the segment dividing module is the boundary between two adjacent windows with a significant difference in relative data volume ratio.
3. The system for detecting chromosomal aneuploidy according to claim 1, wherein the breakpoint position in the segment dividing module is determined by:
taking n adjacent windows as a window group, so that the S windows are divided into m window groups, wherein m = S - n + 1,
m is the total number of window groups;
S is the total number of windows;
n is the number of windows in a window group;
and then performing a nonparametric runs test on the relative data volume ratio V_i of each window in each window group, calculating the P value of each adjacent window group, and if the P value is smaller than a threshold value, judging that the boundary of the corresponding adjacent window group is a breakpoint position of copy number variation; wherein the P value is a significance P value.
4. The system for detecting the chromosomal aneuploidy of a single cell according to claim 1, wherein the segment copy rate is the average value of the relative data volume ratios V_i of the windows in a segment.
5. The system for detecting chromosomal aneuploidy according to claim 4, wherein the segment copy rate is calculated using the following formula:
[Formula image FDA0003445921610000031, not reproduced in the text]
h is the number of windows in the j-th segment;
b is the number of breakpoints;
CR_j is the copy rate of the j-th segment, obtained by calculating the average value of the relative data volume ratios of the windows in the j-th segment.
6. The system for detecting chromosomal aneuploidy according to claim 1, wherein the segment aneuploidy judging module judges the j-th segment to be normal if 0.75 < CR_j < 1.25, and judges the j-th segment to have an aneuploidy abnormality if CR_j ≤ 0.75 or CR_j ≥ 1.25.
7. The system for detecting chromosomal aneuploidy for a single cell according to claim 6, wherein if the j-th segment corresponds to a whole chromosome, it can be determined whether the whole chromosome is an aneuploidy abnormality.
8. The system for detecting chromosomal aneuploidy for a single cell according to claim 1, further comprising: a cell separation module, a cell whole genome amplification module, a cell amplification product quality control module, a library construction module and a high-throughput sequencing module.
9. The system for detecting the chromosomal aneuploidy according to claim 8, wherein the cell amplification product quality control module is configured to perform quality control on a whole genome amplification product of a cell by using a housekeeping gene of the cell, and determine whether a quality control result meets a preset condition, and if the quality control result meets the preset condition, the whole genome amplification product is a qualified amplification product.
10. The system for detecting chromosomal aneuploidy according to claim 9, wherein the library construction module is used for constructing a sequencing library from the qualified amplification product and comprises:
a fragmentation submodule, an adaptor ligation submodule, a mixing submodule, a mixed product purification submodule, a PCR enrichment submodule and an enriched product purification submodule, wherein the fragmentation submodule is used for fragmenting the qualified amplification product to obtain a fragmented product; the adaptor ligation submodule is used for adding adaptors to the fragmented product to obtain an adaptor-ligated fragmented product; the mixing submodule is used for mixing the adaptor-ligated fragmented products of a plurality of samples in equal amounts to obtain a mixed product; the mixed product purification submodule is used for purifying the mixed product; the PCR enrichment submodule is used for enriching the mixed product purified by the mixed product purification submodule to obtain an enriched product; and the enriched product purification submodule is used for purifying the enriched product to obtain the sequencing library.
11. A method for detecting chromosome aneuploidy suitable for a single cell, comprising the steps of:
S1, dividing the sequence of the reference genome into windows to obtain S windows;
S2, determining breakpoint positions according to the relative data volume ratio of each window, calculated from the sample to be detected and a normal sample, and dividing segments according to the breakpoint positions; wherein the normal sample does not have a chromosomal aneuploidy, and the relative data volume ratio is calculated using the following formulas:
r_i = W_i / N
V_i = r_i^case / r_i^control
W_i represents the number of effective sequences falling into the i-th window, where 1 ≤ i ≤ S,
N represents the total number of effective sequences obtained by sequencing the sample to be detected; wherein an effective sequence is a sequence, among the sequences obtained by whole genome sequencing, that is aligned to the reference genome;
r_i represents the relative data volume of the i-th window;
r_i^case is the relative data volume of the i-th window of the sample to be detected;
r_i^control is the relative data volume of the i-th window of the normal sample;
V_i is the relative data volume ratio of the i-th window;
GC content correction is performed on the relative data volume ratio V_i to obtain the corrected relative data volume ratio V_i' of each window, the corrected relative data volume ratio V_i' being calculated using the following formulas:
V_i' = V_i / C_e
[Formula image FDA0003445921610000051, not reproduced in the text]
[Formula image FDA0003445921610000052, not reproduced in the text]
[Formula image FDA0003445921610000053, not reproduced in the text]
C_e is the weight of the windows falling within the e-th GC content interval, e ∈ [1, E], where E is the total number of GC content intervals,
K_e is the mean of the relative data volume ratios V_i of all windows falling within the e-th GC content interval,
g is the total number of windows falling within the e-th GC content interval,
[Symbol image FDA0003445921610000054, not reproduced in the text] is the mean of the relative data volume ratios V_i of all windows of the sample,
V_i' is the corrected relative data volume ratio of the i-th window;
S3, calculating the segment copy rate CR_j;
S4, judging whether the segment has chromosome aneuploidy according to the segment copy rate.
12. The method of claim 11, wherein the breakpoint position is the boundary between two adjacent windows with a significant difference in relative data volume ratio.
13. The method for detecting chromosomal aneuploidy for a single cell according to claim 12, wherein the breakpoint position is determined by:
taking n adjacent windows among the S windows as a window group to obtain m window groups, wherein m = S - n + 1; m is the total number of window groups; S is the total number of windows obtained by the window division; n is the number of windows contained in a window group;
then performing a nonparametric runs test on the relative data volume ratio V_i of each window in each window group, calculating the P value of each adjacent window group, and if the P value is smaller than a threshold value, judging that the boundary of the corresponding adjacent window group is a breakpoint position of copy number variation; wherein the P value is a significance P value.
14. The method for detecting the chromosomal aneuploidy of a single cell according to claim 11, wherein the segment copy rate is an average of the relative data volume ratio of each window in a segment.
15. The method for detecting chromosomal aneuploidy applicable to a single cell according to claim 14,
wherein the segment copy rate CR_j is calculated using the following formula:
[Formula image FDA0003445921610000061, not reproduced in the text]
h is the number of windows in the j-th segment;
b is the number of breakpoints;
CR_j is the copy rate of the j-th segment, obtained by calculating the average value of the relative data volume ratios V_i of the windows in the j-th segment.
16. The method for detecting chromosomal aneuploidy of a single cell according to claim 11, wherein in step S4, the j-th segment is judged to be normal if 0.75 < CR_j < 1.25, and the j-th segment is judged to have an aneuploidy abnormality if CR_j ≤ 0.75 or CR_j ≥ 1.25.
17. The method of claim 16, wherein if the j-th segment corresponds to a whole chromosome, determining whether the whole chromosome is an aneuploidy abnormality.
18. The method for detecting the chromosomal aneuploidy applicable to a single cell according to claim 11, further comprising the steps of: separating cells; amplifying a whole genome of the cell; controlling the quality of cell amplification products; constructing a library; high throughput sequencing.
19. The method for detecting the chromosomal aneuploidy according to claim 18, wherein the quality control of the cell amplification product comprises performing quality control on the whole genome amplification product of the cell by using a housekeeping gene of the cell, and determining whether the quality control result meets a preset condition; if the quality control result meets the preset condition, the whole genome amplification product is a qualified amplification product.
20. The method for detecting the chromosomal aneuploidy of a single cell according to claim 19, wherein the library construction comprises DNA fragmentation, adaptor ligation, mixing, mixed product purification, PCR enrichment and enriched product purification, wherein the DNA fragmentation refers to fragmenting the qualified amplification product to obtain a fragmented product; the adaptor ligation refers to adding adaptors to the fragmented product to obtain an adaptor-ligated fragmented product; the mixing refers to mixing the adaptor-ligated fragmented products of a plurality of samples in equal amounts to obtain a mixed product; the mixed product purification refers to purifying the mixed product; and the enriched product purification refers to purifying the enriched product to obtain a sequencing library.
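The calculation recited in claims 11 to 16 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented implementation. The formula images for C_e are not reproduced in the text, so the sketch assumes C_e = K_e / (mean of all V_i), which is consistent with the listed definitions but is not confirmed by them. It also assumes that N equals the sum of the per-window counts, that every control window has a nonzero count, that breakpoints are supplied as window indices at which a new segment starts, and that the segment copy rate is averaged over the corrected ratios. All function names are hypothetical.

```python
from statistics import mean


def relative_ratio(case_counts, control_counts):
    """V_i = r_i^case / r_i^control, with r_i = W_i / N for each sample."""
    # Assumption: every effective read falls in some window, so N = sum(W_i).
    n_case, n_control = sum(case_counts), sum(control_counts)
    return [(w_case / n_case) / (w_ctrl / n_control)
            for w_case, w_ctrl in zip(case_counts, control_counts)]


def gc_correct(ratios, gc_bins):
    """V_i' = V_i / C_e, ASSUMING C_e = K_e / mean(all V_i)."""
    overall = mean(ratios)
    by_bin = {}
    for v, e in zip(ratios, gc_bins):
        by_bin.setdefault(e, []).append(v)
    # K_e is the mean ratio of the g windows in GC interval e.
    weight = {e: mean(vs) / overall for e, vs in by_bin.items()}
    return [v / weight[e] for v, e in zip(ratios, gc_bins)]


def segment_copy_rates(values, breakpoints):
    """CR_j = mean of the window values inside segment j."""
    # b breakpoints split the windows into b + 1 segments.
    edges = [0] + sorted(breakpoints) + [len(values)]
    return [mean(values[a:b]) for a, b in zip(edges, edges[1:])]


def classify(cr):
    """Thresholds from claim 16: normal only if 0.75 < CR_j < 1.25."""
    return "normal" if 0.75 < cr < 1.25 else "abnormal"


ratios = relative_ratio([10, 10, 10, 10], [10, 10, 10, 10])
corrected = gc_correct([2.0, 2.0, 1.0, 1.0], [0, 0, 1, 1])
rates = segment_copy_rates([1.0, 1.0, 1.5, 1.5], [2])
calls = [classify(cr) for cr in rates]
```

Breakpoint detection itself (the nonparametric runs test of claim 13) is left out of the sketch and the breakpoints are passed in as input, because the text does not specify how the test is applied to adjacent window groups.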
CN201810078283.XA 2018-01-23 2018-01-23 Chromosome aneuploidy detection system suitable for single cell and application Active CN108363903B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810078283.XA CN108363903B (en) 2018-01-23 2018-01-23 Chromosome aneuploidy detection system suitable for single cell and application

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810078283.XA CN108363903B (en) 2018-01-23 2018-01-23 Chromosome aneuploidy detection system suitable for single cell and application

Publications (2)

Publication Number Publication Date
CN108363903A CN108363903A (en) 2018-08-03
CN108363903B true CN108363903B (en) 2022-03-04

Family

ID=63006985

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810078283.XA Active CN108363903B (en) 2018-01-23 2018-01-23 Chromosome aneuploidy detection system suitable for single cell and application

Country Status (1)

Country Link
CN (1) CN108363903B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4086356A4 (en) * 2019-12-31 2023-09-27 BGI Clinical Laboratories (Shenzhen) Co., Ltd. Methods for determining chromosome aneuploidy and constructing classification model, and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013059967A1 (en) * 2011-10-28 2013-05-02 深圳华大基因科技有限公司 Method for detecting micro-deletion and micro-repetition of chromosome
CN104520437A (en) * 2013-07-17 2015-04-15 深圳华大基因科技有限公司 Method and device for detecting chromosomal aneuploidy
CN106520940A (en) * 2016-11-04 2017-03-22 深圳华大基因研究院 Chromosomal aneuploid and copy number variation detecting method and application thereof

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI485254B (en) * 2013-09-03 2015-05-21 Ming Chen Non-invasive prenatal detection method on the basis of the whole genome trend score

Also Published As

Publication number Publication date
CN108363903A (en) 2018-08-03

Legal Events

Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right
    Denomination of invention: A chromosomal aneuploidy detection system suitable for single cells and its application
    Effective date of registration: 20231201
    Granted publication date: 20220304
    Pledgee: Industrial Bank Co.,Ltd. Shanghai Changning sub branch
    Pledgor: BASETRA MEDICAL TECHNOLOGY CO.,LTD.
    Registration number: Y2023310000796