TW201421273A - Non-invasive detection of fetus genetic abnormality - Google Patents

Info

Publication number
TW201421273A
Authority
TW
Taiwan
Prior art keywords
chromosome
coverage
depth
content
sequence
Prior art date
Application number
TW101143513A
Other languages
Chinese (zh)
Other versions
TWI489305B (en)
Inventor
Fu-Man Jiang
Hui-Fei Chen
xiang-hua Chai
yu-ying Yuan
xiu-qing Zhang
Fang Chen
Original Assignee
Bgi Health Service Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bgi Health Service Co Ltd
Priority to TW101143513A
Publication of TW201421273A
Application granted
Publication of TWI489305B

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to a non-invasive method of detecting fetal genetic abnormalities through large-scale sequencing of nucleotides from a maternal biological sample. The invention also provides a method of removing the GC bias in sequencing results that is caused by differences in chromosome GC content. The invention not only makes detection more accurate, but also provides an integrated approach to detecting fetal aneuploidy, including sex chromosome disorders such as XO, XXX, XXY and XYY.

Description

Non-invasive detection of fetal genetic abnormalities

The present invention relates to a non-invasive method for detecting fetal genetic abnormalities by DNA sequencing of samples from pregnant women. More specifically, the invention relates to data analysis that removes the GC bias introduced by amplification and sequencing of DNA samples. The invention also relates to statistical analysis for detecting fetal genetic abnormalities, such as chromosomal abnormalities including aneuploidy.

Conventional prenatal diagnostic methods that involve invasive procedures, such as chorionic villus sampling and amniocentesis, carry potential risks for both the fetus and the mother. Non-invasive screening for fetal aneuploidy using maternal serum markers and ultrasound is available, but its sensitivity and specificity are limited (Kagan, et al., Human Reproduction (2008) 23:1968-1975; Malone, et al., N Engl J Med (2005) 353:2001-2011).

Recent studies have demonstrated that non-invasive detection of fetal aneuploidy by massively parallel sequencing of DNA molecules in maternal plasma is feasible. Fetal DNA has been detected and quantified in maternal plasma and serum (Lo, et al., Lancet (1997) 350:485-487; Lo, et al., Am. J. Hum. Genet. (1998) 62:768-775). Multiple fetal cell types occur in the maternal circulation, including fetal granulocytes, lymphocytes, nucleated red blood cells, blood cells and trophoblasts (Pertl and Bianchi, Obstetrics and Gynecology (2001) 98:483-490). Fetal DNA can be detected in serum at the seventh week of gestation and increases as the pregnancy progresses. The concentration of fetal DNA present in maternal serum and plasma is comparable to that of DNA obtained by fetal cell isolation methods.

Circulating fetal DNA has been used to determine the sex of the fetus (Lo, et al., Am. J. Hum. Genet. (1998) 62:768-775). The fetal Rhesus D genotype has also been determined using fetal DNA. However, the diagnostic and clinical applications of circulating fetal DNA are limited to genes that are present in the fetus but absent from the mother (Pertl and Bianchi, Obstetrics and Gynecology (2001) 98:483-490). There therefore remains a need for a non-invasive method that can determine fetal DNA sequences and provide a definitive diagnosis of fetal chromosomal abnormalities.

Over the past few decades, the discovery of fetal cells and cell-free fetal nucleic acids in maternal blood, together with the application of high-throughput shotgun sequencing to maternal plasma cell-free DNA, has made it feasible to detect the small changes in chromosome representation in a maternal plasma sample that are caused by an aneuploid fetus. Non-invasive detection of trisomy 13, 18 and 21 pregnancies has been achieved.

However, as some studies have shown, the GC bias introduced by amplification and sequencing places a practical limit on the sensitivity of aneuploidy detection. GC bias may be introduced during sample preparation and sequencing under different conditions, such as reagent composition, cluster density and temperature. It causes DNA molecules of different GC composition to be sampled unevenly and produces significant deviations in the sequencing data for GC-rich or GC-poor chromosomes.

To improve sensitivity, methods for removing the effect of GC bias have been developed. Fan and Quake developed a computational method for removing GC bias that assigns a weight to each GC density based on local genomic GC content and then adjusts the number of reads mapped to each bin by multiplying it by the corresponding weight (Fan and Quake, PLoS ONE (2010) 5:e10439). However, the method has difficulty handling sex chromosome disorders, particularly disorders related to the Y chromosome, because it may slightly distort the data, which interferes with the accuracy of detection.

Herein, the inventors describe a computational method for removing GC bias that achieves higher sensitivity in detecting fetal genetic abnormalities while avoiding data distortion. The method defines the parameters used for statistical testing according to GC content. In addition, the inventors introduce the estimated fetal fraction into the diagnosis through a binary hypothesis, which shows higher sensitivity and specificity. The inventors' method also shows that, for maternal samples containing a low fetal DNA fraction, sequencing more polynucleotide fragments can raise the sensitivity of non-invasive detection of fetal genetic abnormalities to a preset level of accuracy. Resampling maternal plasma at a later gestational week can also increase the sensitivity of the diagnosis.

The present invention relates to a method for non-invasive detection of fetal genetic abnormalities by large-scale sequencing of nucleotides from a maternal biological sample. Also provided is a method of removing the GC bias in sequencing results that is caused by differences in chromosome GC content.

Accordingly, in one aspect, provided herein is a method for establishing a relationship between the coverage depth and the GC content of a chromosome, the method comprising: obtaining, from more than one sample, sequence information for a plurality of polynucleotide fragments covering the chromosome; assigning the fragments to chromosomes based on the sequence information; calculating, for each sample, the coverage depth and GC content of the chromosome based on the sequence information; and determining the relationship between the coverage depth and the GC content of the chromosome.

In one embodiment, the polynucleotide fragments range in length from about 10 to about 1000 bp. In another embodiment, the polynucleotide fragments range in length from about 15 to about 500 bp. In yet another embodiment, the polynucleotide fragments range in length from about 20 to about 200 bp. In still another embodiment, the polynucleotide fragments range in length from about 25 to about 100 bp. In another embodiment, the polynucleotide fragments are about 35 bp in length.

In one embodiment, the sequence information is obtained by parallel genomic sequencing. In another embodiment, the fragments are assigned to chromosomes by comparing the fragment sequences with a reference human genomic sequence. The reference human genomic sequence may be any suitable and/or published build of the human genome, such as hg18 or hg19. Fragments that are assigned to more than one chromosome, or that cannot be assigned to any chromosome, may be disregarded.
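The filtering rule in this paragraph can be illustrated with a minimal sketch. It assumes that an aligner has already produced, for each fragment, the list of chromosomes it matched; the function name `assign_fragments` and the input layout are hypothetical and are not taken from the patent.

```python
def assign_fragments(alignments):
    """Keep only fragments that match exactly one chromosome.

    alignments: dict mapping a fragment id to the list of chromosomes
    the fragment aligned to (hypothetical input layout).
    Returns a dict mapping each retained fragment id to its chromosome.
    """
    assigned = {}
    for frag_id, chromosomes in alignments.items():
        distinct = set(chromosomes)
        # Fragments with no hit, or hits on several chromosomes, are ignored.
        if len(distinct) == 1:
            assigned[frag_id] = distinct.pop()
    return assigned


if __name__ == "__main__":
    hits = {"r1": ["chr1"], "r2": ["chr1", "chr7"], "r3": []}
    print(assign_fragments(hits))
```

In this toy input only `r1` is retained, because `r2` matches two chromosomes and `r3` matches none.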

In one embodiment, the coverage depth of a chromosome is the ratio of the number of fragments assigned to the chromosome to the number of reference unique reads of the chromosome. In another embodiment, the coverage depth is normalized. In yet another embodiment, the normalization is calculated relative to the coverage of all other autosomes. In still another embodiment, the normalization is calculated relative to the coverage of all other chromosomes.
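As a rough illustration of this definition, the sketch below computes the coverage depth as the ratio of assigned fragments to reference unique reads, and then normalizes each chromosome against the other autosomes. The text does not spell out the exact normalization, so dividing by the mean depth of the other autosomes is an assumption, as are the function names and the toy numbers.

```python
from collections import Counter

SEX_CHROMOSOMES = {"X", "Y"}


def coverage_depth(assigned, reference_unique_reads):
    """Depth per chromosome: assigned fragments / reference unique reads."""
    counts = Counter(assigned)
    return {c: counts[c] / n for c, n in reference_unique_reads.items()}


def normalize(depth):
    """Relative coverage of each chromosome.

    Assumption: each depth is divided by the mean depth of all *other*
    autosomes; the source only says normalization is relative to them.
    """
    relative = {}
    for chrom, d in depth.items():
        others = [v for c, v in depth.items()
                  if c != chrom and c not in SEX_CHROMOSOMES]
        relative[chrom] = d / (sum(others) / len(others))
    return relative


if __name__ == "__main__":
    assigned = ["1"] * 10 + ["2"] * 20 + ["X"] * 5
    reference = {"1": 100, "2": 100, "X": 50}
    print(normalize(coverage_depth(assigned, reference)))
```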

In one embodiment, the relationship is given by the following formula:

$$cr_{i,j} = f(GC_{i,j}) + \varepsilon_{i,j}, \quad j = 1, 2, \ldots, 22, X, Y$$

where $f(GC_{i,j})$ represents the relationship between the normalized coverage depth of chromosome $j$ in sample $i$ and the corresponding GC content, and $\varepsilon_{i,j}$ is the residual for chromosome $j$ in sample $i$. In some embodiments, the relationship between coverage depth and GC content is calculated by local polynomial regression. In some embodiments, the relationship may be a weak linear relationship. In some embodiments, the relationship is determined by the loess algorithm.
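To make the idea of local polynomial regression concrete, here is a minimal pure-Python sketch of a loess-style fit. It is a sketch under stated assumptions: it uses local linear fits with tricube weights, and the `span` parameter, function name and toy data are illustrative choices, not settings taken from the patent.

```python
import math


def loess_fit(x, y, x_new, span=0.75):
    """Locally weighted linear regression (a simple loess variant).

    x, y  : GC content and normalized coverage depth of reference samples
    x_new : GC values at which to evaluate the fitted curve f(GC)
    span  : fraction of the points used in each local fit (assumed value)
    """
    k = max(2, math.ceil(span * len(x)))
    fitted = []
    for x0 in x_new:
        dist = [abs(xi - x0) for xi in x]
        h = sorted(dist)[k - 1] or 1e-12          # local bandwidth
        w = [max(0.0, 1 - (d / h) ** 3) ** 3 for d in dist]  # tricube weights
        u = [xi - x0 for xi in x]                 # centre the predictor at x0
        s0 = sum(w)
        s1 = sum(wi * ui for wi, ui in zip(w, u))
        s2 = sum(wi * ui * ui for wi, ui in zip(w, u))
        t0 = sum(wi * yi for wi, yi in zip(w, y))
        t1 = sum(wi * ui * yi for wi, ui, yi in zip(w, u, y))
        det = s0 * s2 - s1 * s1
        # The intercept of the weighted line is the fitted value at x0;
        # fall back to the weighted mean if the system is degenerate.
        fitted.append((s2 * t0 - s1 * t1) / det if det > 1e-18 else t0 / s0)
    return fitted


if __name__ == "__main__":
    gc = [0.38 + 0.005 * i for i in range(15)]
    depth = [1.5 - g for g in gc]   # toy, exactly linear relationship
    print(loess_fit(gc, depth, [0.40, 0.42]))
```

A library routine such as a LOWESS implementation could be substituted; the hand-written version is used here only to keep the example self-contained.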

In some embodiments, the method further comprises calculating the fitted coverage depth according to the following formula:

$$\hat{cr}_{i,j} = f(GC_{i,j}), \quad j = 1, 2, \ldots, 22, X, Y$$

In some embodiments, the method further comprises calculating the standard deviation according to the following formula:

$$std_j = \sqrt{\frac{\sum_{i=1}^{ns} \left(cr_{i,j} - \hat{cr}_{i,j}\right)^2}{ns - 1}}$$

where $ns$ represents the number of reference samples.

In some embodiments, the method further comprises calculating the Student's t statistic according to the following formula:

$$t_{i,j} = \frac{cr_{i,j} - \hat{cr}_{i,j}}{std_j}$$
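A minimal sketch of these three quantities follows. It assumes that the fitted depth is the value of the coverage/GC relationship at the sample's GC content, that the standard deviation is the sample standard deviation of the reference residuals (divisor ns - 1), and that the t statistic is the residual divided by that standard deviation. The function names and numbers are hypothetical.

```python
import math


def residual_std(observed, fitted):
    """Standard deviation of residuals over ns reference samples.

    Assumption: the divisor is ns - 1 (sample standard deviation).
    """
    ns = len(observed)
    ss = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    return math.sqrt(ss / (ns - 1))


def t_statistic(cr, cr_fit, std):
    """Residual of one test sample, scaled by the reference std."""
    return (cr - cr_fit) / std


if __name__ == "__main__":
    reference_depth = [1.00, 1.02, 0.98]   # toy normalized depths
    reference_fit = [1.00, 1.00, 1.00]     # toy fitted values f(GC)
    std = residual_std(reference_depth, reference_fit)
    print(std, t_statistic(1.06, 1.00, std))
```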

In one embodiment, the GC content of a chromosome is the average GC content of all fragments assigned to the chromosome. The GC content of a fragment can be calculated by dividing the number of G/C nucleotides in the fragment by the total number of nucleotides in the fragment. In another embodiment, the GC content of a chromosome is the aggregate GC content of the reference unique reads of the chromosome.
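The fragment-level and chromosome-level definitions above translate directly into code. The sketch below is illustrative only; the function names are hypothetical and the input is assumed to be a list of read sequences already assigned to one chromosome.

```python
def fragment_gc(seq):
    """GC content of one fragment: G/C count divided by fragment length."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def chromosome_gc(fragments):
    """GC content of a chromosome: mean GC content of its assigned fragments."""
    return sum(fragment_gc(s) for s in fragments) / len(fragments)


if __name__ == "__main__":
    reads_on_chr21 = ["GGCC", "AATT", "GCAT"]   # toy reads
    print(fragment_gc("GGCCAATT"), chromosome_gc(reads_on_chr21))
```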

In some embodiments, at least 2, 5, 10, 20, 50, 100, 200, 500 or 1000 samples are used. In some embodiments, the chromosome is chromosome 1, 2, ..., 22, X or Y.

In one embodiment, the samples are from pregnant female subjects. In another embodiment, the samples are from male subjects. In yet another embodiment, the samples are from both pregnant female subjects and male subjects.

In some embodiments, the sample is a biological sample. In some embodiments, the sample is a peripheral blood sample.

Also provided herein is a method of detecting a fetal genetic abnormality, the method comprising: a) obtaining sequence information for a plurality of polynucleotide fragments from a sample; b) assigning the fragments to chromosomes based on the sequence information; c) calculating the coverage depth and GC content of a chromosome based on the sequence information; d) calculating the fitted coverage depth of the chromosome using the GC content of the chromosome and an established relationship between the coverage depth and GC content of the chromosome; and e) comparing the fitted coverage depth of the chromosome with the coverage depth, wherein a difference between them indicates a fetal genetic abnormality.

In some embodiments, the method further comprises step f) determining the fetal sex. The fetal sex may be determined according to a formula (not reproduced in this text) in which $cr.a_{i,x}$ and $cr.a_{i,y}$ are the normalized relative coverage of the X and Y chromosomes, respectively.

In some embodiments, the method further comprises step g) estimating the fetal fraction. The fetal fraction may be calculated from the chromosome Y coverage depth according to a formula (not reproduced in this text) whose terms are the fitted coverage depth calculated from the relationship between chromosome Y coverage depth and the corresponding GC content in samples from pregnant women carrying a female fetus, and the fitted coverage depth calculated from the relationship between chromosome Y coverage depth and the corresponding GC content in male subjects. Alternatively, the fetal fraction may be calculated from the chromosome X coverage depth according to a formula (not reproduced in this text) whose terms are the fitted coverage depth calculated from the relationship between chromosome X coverage depth and the corresponding GC content in samples from pregnant women carrying a female fetus, and the fitted coverage depth calculated from the relationship between chromosome X coverage depth and the corresponding GC content in male subjects. Furthermore, the fetal fraction may be calculated according to a formula (not reproduced in this text) that combines all four of these fitted coverage depths: the chromosome X and chromosome Y fitted coverage depths for samples from pregnant women carrying a female fetus, and the chromosome X and chromosome Y fitted coverage depths for male subjects.
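Because the fetal-fraction formulas themselves are not legible in this text, the following is only a sketch of one plausible reading: the observed sex-chromosome depth is placed by linear interpolation between the fitted depth expected for a pregnancy with a female fetus (fraction 0) and the fitted depth expected for a male subject (fraction 1). The interpolation, the function name and the numbers are all assumptions rather than the patent's stated formula.

```python
def fetal_fraction(cr_observed, fit_female_fetus, fit_male):
    """Assumed linear interpolation between two fitted reference depths.

    cr_observed      : observed normalized depth of chromosome X or Y
    fit_female_fetus : fitted depth from pregnancies with a female fetus
    fit_male         : fitted depth from male subjects
    """
    return (cr_observed - fit_female_fetus) / (fit_male - fit_female_fetus)


if __name__ == "__main__":
    # Chromosome Y: depth rises from the female-fetus baseline toward the male level.
    print(fetal_fraction(0.06, 0.01, 0.51))
    # Chromosome X: depth falls from the female-fetus level toward the male level.
    print(fetal_fraction(0.95, 1.00, 0.50))
```

The same function covers both chromosomes because the sign of the numerator and denominator flips together for chromosome X.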

In one embodiment, the genetic abnormality is a chromosomal abnormality. In another embodiment, the genetic abnormality is aneuploidy. In yet another embodiment, the fetal aneuploidy is an autosomal disorder selected from trisomy 13, 18 and 21. In still another embodiment, the fetal aneuploidy is a sex chromosome disorder selected from XO, XXX, XXY and XYY.

In some embodiments, the fitted coverage depth of the chromosome is compared with the coverage depth by statistical hypothesis testing, in which one hypothesis (H0) is that the fetus is euploid and the other hypothesis (H1) is that the fetus is aneuploid. A statistic can be calculated for each of the two hypotheses. In some embodiments, Student's t statistics $t1$ and $t2$ are calculated for H0 and H1, respectively, according to formulas (not reproduced in this text) in which $fxy$ is the fetal fraction. In some embodiments, the log-likelihood ratio of $t1$ and $t2$ is calculated according to the following formula:

$$L_{i,j} = \log\big(p(t1_{i,j}, degree \mid D)\big) / \log\big(p(t2_{i,j}, degree \mid T)\big)$$

where $degree$ is the degrees of freedom of the t distribution, $D$ denotes diploidy, $T$ denotes trisomy, and $p(t1_{i,j}, degree \mid *)$, with $* = D, T$, is the conditional probability density given the degrees of freedom of the t distribution.
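The binary hypothesis can be sketched as follows. Two points are assumptions, since the t1 and t2 formulas are not legible here: under H1 (trisomy) the expected depth is taken to be the fitted depth scaled by (1 + fetal fraction / 2), and the two hypotheses are compared by the difference of their log densities, which is a common formulation and not necessarily the exact expression in the text. The Student's t log-density is computed from its standard closed form.

```python
import math


def t_logpdf(t, df):
    """Log density of Student's t distribution with df degrees of freedom."""
    return (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
            - 0.5 * math.log(df * math.pi)
            - (df + 1) / 2 * math.log1p(t * t / df))


def hypothesis_stats(cr, cr_fit, std, fetal_fraction, df):
    """Return (t1, t2, llr) for the euploid (H0) and trisomy (H1) hypotheses.

    Assumption: under trisomy the expected depth is cr_fit * (1 + f / 2).
    llr > 0 favours H0 (euploid); llr < 0 favours H1 (trisomy).
    """
    t1 = (cr - cr_fit) / std
    t2 = (cr - cr_fit * (1 + fetal_fraction / 2)) / std
    llr = t_logpdf(t1, df) - t_logpdf(t2, df)
    return t1, t2, llr


if __name__ == "__main__":
    # Toy sample whose depth matches the trisomy expectation at 10% fetal DNA.
    print(hypothesis_stats(cr=1.05, cr_fit=1.0, std=0.01,
                           fetal_fraction=0.1, df=10))
```

Using the fetal fraction in H1 means that a sample with little fetal DNA is not automatically called normal just because its deviation from the euploid fit is small.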

In one embodiment, the fetal sex is female and Student's t statistic is calculated according to a formula (not reproduced in this text) that uses the fitted coverage depth calculated from the relationship between chromosome X coverage depth and the corresponding GC content in samples from pregnant women carrying a female fetus. In some embodiments, |t1| > 3.13 indicates that the fetus may be XXX or XO. In some embodiments, |t1| > 5 indicates that the fetus is XXX or XO.

In another embodiment, the fetal sex is male and Student's t statistic is calculated according to a formula (not reproduced in this text) that uses the fitted coverage depth calculated from the relationship between chromosome X coverage depth and the corresponding GC content in samples from pregnant women carrying a female fetus. In some embodiments, |t2| > 3.13 indicates that the fetus may be XXY or XYY. In some embodiments, |t2| > 5 indicates that the fetus is XXY or XYY.
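The two cut-offs quoted for the sex chromosome statistics (3.13 and 5) can be applied as a simple three-way decision. The sketch below assumes the t statistic has already been computed for the fetal sex in question; the function name and the wording of the returned labels are hypothetical.

```python
def classify_sex_chromosome(t, fetal_sex):
    """Map a sex-chromosome t statistic onto the thresholds 3.13 and 5.

    fetal_sex: "female" (candidates XXX or XO) or "male" (XXY or XYY).
    """
    candidates = "XXX or XO" if fetal_sex == "female" else "XXY or XYY"
    magnitude = abs(t)
    if magnitude > 5:
        return f"indicates {candidates}"
    if magnitude > 3.13:
        return f"possible {candidates}"
    return "no sex-chromosome aneuploidy flagged"


if __name__ == "__main__":
    for t, sex in [(5.2, "female"), (-3.5, "male"), (1.0, "female")]:
        print(t, sex, "->", classify_sex_chromosome(t, sex))
```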

Also provided herein is a method of determining a fetal genetic abnormality, the method comprising: a) obtaining, from more than one normal sample, sequence information for a plurality of polynucleotide fragments covering a chromosome of interest; b) assigning the fragments to chromosomes based on the sequence information; c) calculating the coverage depth and GC content of the chromosome based on the sequence information from the normal samples; d) determining the relationship between the coverage depth and GC content of the chromosome; e) obtaining sequence information for a plurality of polynucleotide fragments from a biological sample; f) assigning the fragments to chromosomes based on the sequence information from the biological sample; g) calculating the coverage depth and GC content of the chromosome based on the sequence information from the biological sample; h) calculating the fitted coverage depth of the chromosome using the GC content of the chromosome and the relationship between the coverage depth and GC content of the chromosome; and i) comparing the fitted coverage depth of the chromosome with the coverage depth, wherein a difference between them indicates a fetal genetic abnormality.

In another aspect, provided herein is a computer-readable medium comprising a plurality of instructions for performing prenatal diagnosis of a fetal genetic abnormality, which operates through the following steps: a) receiving sequence information for a plurality of polynucleotide fragments from a sample; b) assigning the polynucleotide fragments to chromosomes based on the sequence information; c) calculating the coverage depth and GC content of a chromosome based on the sequence information; d) calculating the fitted coverage depth of the chromosome using the GC content of the chromosome and an established relationship between the coverage depth and GC content of the chromosome; and e) comparing the fitted coverage depth of the chromosome with the coverage depth, wherein a difference between them indicates a fetal genetic abnormality.

In yet another aspect, provided herein is a system for detecting a fetal genetic abnormality, comprising: a) means for obtaining sequence information for a plurality of polynucleotide fragments from a sample; and b) a computer-readable medium comprising a plurality of instructions for performing prenatal diagnosis of a fetal genetic abnormality. In some embodiments, the system further comprises a biological sample obtained from a pregnant female subject, wherein the biological sample comprises a plurality of polynucleotide fragments.

The present invention relates to a method for non-invasive detection of fetal genetic abnormalities by large-scale sequencing of polynucleotide fragments from a maternal biological sample. Also provided is a method of removing the GC bias in sequencing results that is caused by differences in chromosome GC content, based on the relationship between the coverage depth of a chromosome and the corresponding GC content. Accordingly, provided herein is a method that fits, for each sample, the coverage depth of a chromosome against the GC content of the polynucleotide fragments by locally weighted polynomial regression, thereby computationally correcting, with respect to GC content, the reference parameters used in the Student's t calculation.

Also provided herein is a method of determining fetal genetic abnormalities by statistical analysis using statistical hypothesis testing. In addition, methods are provided for calculating data quality control (DQC) standards that can be used to determine the amount of clinical sample required for a given level of statistical significance.

I. Definitions

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as is commonly understood by one of ordinary skill in the art to which this invention belongs. All patents, patent applications, published patent applications and other publications referred to herein are incorporated by reference in their entirety. If a definition set forth in this section is contrary to or otherwise inconsistent with a definition set forth in the patents, patent applications, published patent applications and other publications that are incorporated herein by reference, the definition set forth in this section prevails over the definition that is incorporated herein by reference.

As used herein, the singular forms "a", "an" and "the" include plural referents unless indicated otherwise. For example, "a" dimer includes one or more dimers.

The term "chromosomal abnormality" refers to a deviation between the structure of the subject's chromosome and that of a normal homologous chromosome. The term "normal" refers to the predominant karyotype or banding pattern found in normal individuals of a particular species. A chromosomal abnormality can be numerical or structural, and includes, but is not limited to, aneuploidy, polyploidy, inversion, trisomy, monosomy, duplication, deletion, deletion of a part of a chromosome, addition, addition of a part of a chromosome, insertion, a fragment of a chromosome, a region of a chromosome, chromosomal rearrangement, and translocation. A chromosomal abnormality may be correlated with the presence of a pathological condition or with a predisposition to develop a pathological condition. A single nucleotide polymorphism ("SNP"), as defined herein, is not a chromosomal abnormality.

Monosomy X (XO, absence of an entire X chromosome) is the most common type of Turner syndrome, occurring in 1 of every 2,500 to 3,000 newborn girls (Sybert and McCauley, N Engl J Med (2004) 351:1227-1238). XXY syndrome is a condition in which males have an extra X chromosome, occurring in approximately 1 of every 1,000 males (Bock, Understanding Klinefelter Syndrome: A Guide for XXY Males and Their Families. NIH Pub. No. 93-3202 (1993)). XYY syndrome is a sex chromosome aneuploidy in which a male has an extra Y chromosome, giving a total of 47 chromosomes instead of the normal 46. It affects 1 in 1,000 male births and may lead to male infertility (Aksglaede, et al., J Clin Endocrinol Metab (2008) 93:169-176).

Turner syndrome encompasses several conditions, of which monosomy X (XO, absence of an entire sex chromosome, the Barr body) is the most common. Females typically have two X chromosomes, but in Turner syndrome one of these sex chromosomes is missing. Occurring in 1 of every 2,000 to 5,000 phenotypic females, the syndrome manifests in a number of ways. Klinefelter syndrome is a condition in which males have an extra X chromosome. In humans, Klinefelter syndrome is the most common sex chromosome disorder and the second most common condition caused by the presence of extra chromosomes. The condition occurs in approximately 1 of every 1,000 males. XYY syndrome is a sex chromosome aneuploidy in which a male has an extra Y chromosome, giving a total of 47 chromosomes instead of the normal 46. This produces a 47,XYY karyotype. The condition is usually asymptomatic, affects 1 in 1,000 male births, and may lead to male infertility.

Trisomy 13 (Patau syndrome), trisomy 18 (Edwards syndrome) and trisomy 21 (Down syndrome) are the clinically most important autosomal trisomies, and their detection has long been a topic of intense interest. Detecting the above fetal chromosomal aberrations is of great importance in prenatal diagnosis (Ostler, Diseases of the eye and skin: a color atlas. Lippincott Williams & Wilkins. pp. 72. ISBN 9780781749992 (2004); Driscoll and Gross, N Engl J Med (2009) 360:2556-2562; Kagan, et al., Human Reproduction (2008) 23:1968-1975).

The term "reference unique reads" refers to chromosome fragments that have a unique sequence. Such fragments can therefore be unambiguously assigned to a single chromosomal location. The reference unique reads of a chromosome can be constructed from a published reference genome sequence, such as hg18 or hg19.

The terms "polynucleotide", "oligonucleotide", "nucleic acid" and "nucleic acid molecule" are used interchangeably herein to refer to a polymeric form of nucleotides of any length, and may comprise ribonucleotides, deoxyribonucleotides, analogs thereof, or mixtures thereof. The terms refer only to the primary structure of the molecule. Thus, the terms include triple-, double- and single-stranded deoxyribonucleic acid ("DNA"), as well as triple-, double- and single-stranded ribonucleic acid ("RNA"). They also include modified (for example, by alkylation and/or by capping) and unmodified forms of the polynucleotide. More specifically, the terms "polynucleotide", "oligonucleotide", "nucleic acid" and "nucleic acid molecule" include polydeoxyribonucleotides (containing 2-deoxy-D-ribose); polyribonucleotides (containing D-ribose), including tRNA, rRNA, hRNA and mRNA, whether spliced or unspliced; any other type of polynucleotide that is an N- or C-glycoside of a purine or pyrimidine base; and other polymers containing nonnucleotidic backbones, for example polyamide (e.g., peptide nucleic acid ("PNA")) and polymorpholino (commercially available from Anti-Virals, Inc., Corvallis, OR., as NeuGene®) polymers, and other synthetic sequence-specific nucleic acid polymers, provided that the polymers contain nucleobases in a configuration that allows for base pairing and base stacking, such as is found in DNA and RNA. Thus, these terms include, for example, 3'-deoxy-2',5'-DNA, oligodeoxyribonucleotide N3'-P5' phosphoramidates, 2'-O-alkyl-substituted RNA, and hybrids between DNA and RNA or between PNAs and DNA or RNA. They also include known types of modifications, for example: labels; alkylation; "caps"; substitution of one or more nucleotides with an analog; internucleotide modifications, such as those with uncharged linkages (e.g., methyl phosphonates, phosphotriesters, phosphoramidates, carbamates, etc.), those with negatively charged linkages (e.g., phosphorothioates, phosphorodithioates, etc.) and those with positively charged linkages (e.g., aminoalkylphosphoramidates, aminoalkylphosphotriesters); those containing pendant moieties, such as proteins (including enzymes (e.g., nucleases), toxins, antibodies, signal peptides, poly-L-lysine, etc.); those with intercalators (e.g., acridine, psoralen, etc.); those containing chelates (of, e.g., metals, radioactive metals, boron, metal oxides, etc.); those containing alkylators; those with modified linkages (e.g., alpha anomeric nucleic acids); and unmodified forms of the polynucleotide or oligonucleotide.

"Massively parallel sequencing" refers to techniques for sequencing millions of nucleic acid fragments, for example by attaching randomly fragmented genomic DNA to a planar, optically transparent surface and performing solid-phase amplification to create a high-density sequencing flow cell with millions of clusters, each containing about 1,000 copies of template per square centimeter. These templates are sequenced using four-color DNA sequencing-by-synthesis technology. See the products offered by Illumina, Inc., San Diego, Calif. The sequencing used in the present invention is preferably carried out without a pre-amplification or cloning step, but it may be combined with amplification-based methods in a microfluidic chip having reaction chambers for both PCR and microscopic template-based sequencing. Only about 30 bp of random sequence information is needed to identify a sequence as belonging to a specific human chromosome. Longer sequences can uniquely identify more particular targets. In the present case, a large number of 35 bp reads were obtained. Further description of massively parallel sequencing methods is found in Rogers and Venter, Nature (2005) 437:326-327.

As used herein, "biological sample" refers to any sample obtained from a living or viral source or other source of macromolecules and biomolecules, and includes any cell type or tissue of a subject from which nucleic acid, protein, or other macromolecules can be obtained. The biological sample can be a sample obtained directly from a biological source or a sample that has been processed. For example, isolated nucleic acids that are amplified constitute a biological sample. Biological samples include, but are not limited to, body fluids such as blood, plasma, serum, cerebrospinal fluid, synovial fluid, urine, and sweat; tissue and organ samples from animals and plants; and processed samples derived therefrom.

As used herein, "covering the chromosome" means that the sequence information of the multiple nucleic acid fragments covers at least part of the chromosome. For example, the sequence information may cover: part of the chromosome; part of the chromosome and part of at least one other chromosome; part of the chromosome and all of at least one other chromosome; all of the chromosome; all of the chromosome and part of at least one other chromosome; or all of the chromosome and all of at least one other chromosome.

It is understood that the aspects and embodiments of the invention described herein include "consisting of" and/or "consisting essentially of" aspects and embodiments.

Other objects, advantages, and features of the present invention will become apparent from the following detailed description taken in conjunction with the drawings.

II. Establishing the relationship between coverage depth and GC content

Provided herein is a method for establishing a relationship between the coverage depth and the GC content of a chromosome, comprising: obtaining, from more than one sample, sequence information of multiple polynucleotide fragments covering said chromosome; assigning said fragments to chromosomes based on said sequence information; calculating, for each sample, the coverage depth and GC content of said chromosome based on said sequence information; and determining the relationship between the coverage depth and the GC content of said chromosome. These steps may be performed in no particular order. In some embodiments, the method is performed in the following order: a) obtaining, from more than one sample, sequence information of multiple polynucleotide fragments covering said chromosome; b) assigning said fragments to chromosomes based on said sequence information; c) calculating, for each sample, the coverage depth and GC content of said chromosome based on said sequence information; and d) determining the relationship between the coverage depth and the GC content of said chromosome.

To calculate the coverage depth and GC content of a chromosomal region, sequence information of polynucleotide fragments is obtained by sequencing template DNA obtained from a sample. In one embodiment, the template DNA contains both maternal DNA and fetal DNA. In another embodiment, the template DNA is obtained from the blood of a pregnant woman. Blood may be collected using any standard technique for blood drawing, including but not limited to venipuncture. For example, blood can be drawn from a vein on the inside of the elbow or the back of the hand. Blood samples can be collected from a pregnant woman at any time during pregnancy, for example at 1-4, 4-8, 8-12, 12-16, 16-20, 20-24, 24-28, 28-32, 32-36, 36-40, or 40-44 weeks of fetal gestation, and preferably at 8-28 weeks of fetal gestation.

The polynucleotide fragments are assigned to chromosomal locations based on the sequence information. A reference genomic sequence is used to obtain the reference unique reads. As used herein, "reference unique reads" refers to all unique polynucleotide fragments that are assigned to a specific genomic location based on a reference genomic sequence. In some embodiments, the reference unique reads have the same length, for example about 10, 12, 15, 20, 25, 30, 35, 40, 50, 100, 200, 300, 500, or 1000 bp. In some other embodiments, human genome build hg18 or hg19 may be used as the reference genomic sequence. A chromosomal location is a contiguous window on a chromosome with a length of about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10,000, or more KB. A chromosomal location may also be a single chromosome.

As used herein, "coverage depth" refers to the ratio between the number of fragments assigned to a chromosomal location and the number of reference unique reads of that chromosomal location, calculated with the following formula:

$$C_{i,j} = n_{i,j} / N_j, \quad j = 1, 2, \ldots, 22, X, Y \tag{1}$$

where $n_{i,j}$ is the number of unique sequence reads mapped to chromosome $j$ in sample $i$; $C_{i,j}$ is the coverage depth of chromosome $j$ in sample $i$; and $N_j$ is the number of reference unique reads in chromosome $j$.

In some embodiments, polynucleotide fragments that are not assigned to a single chromosomal location, or that are assigned to multiple chromosomal locations, are discarded. In some embodiments, the coverage depth is normalized based on the coverage depth of another chromosomal location, the coverage depth of another chromosome, the average coverage depth of all other autosomes, the average coverage depth of all other chromosomes, or the average coverage depth of all chromosomes. In some embodiments, the average coverage depth of the 22 autosomes is used as a normalization constant to account for differences in the total number of sequence reads obtained for different samples:

$$cr_{i,j} = \frac{C_{i,j}}{\frac{1}{22}\sum_{k=1}^{22} C_{i,k}}, \quad j = 1, 2, \ldots, 22, X, Y \tag{2}$$

where $cr_{i,j}$ represents the relative coverage depth of chromosome $j$ in sample $i$. From this point on, the "relative coverage depth" of each chromosome refers to this normalized value, which is used for comparing different samples and for subsequent analysis.
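The coverage depth of Equation 1 and the normalization by the mean autosomal depth described above can be sketched as follows. This is a minimal illustration rather than the inventors' implementation; the function names and the toy read counts are assumptions introduced here.

```python
AUTOSOMES = [str(k) for k in range(1, 23)]

def coverage_depth(mapped_reads, reference_unique_reads):
    """Equation 1 for one sample: C[j] = n[j] / N[j]."""
    return {chrom: mapped_reads[chrom] / reference_unique_reads[chrom]
            for chrom in mapped_reads}

def relative_coverage(depth):
    """Divide each chromosome's depth by the mean depth of the 22 autosomes."""
    autosome_mean = sum(depth[c] for c in AUTOSOMES) / len(AUTOSOMES)
    return {chrom: value / autosome_mean for chrom, value in depth.items()}

# Toy counts (hypothetical): every autosome has 1,000 mapped reads out of
# 10,000 reference unique reads; chromosome X has half as many mapped reads.
n = {c: 1000 for c in AUTOSOMES}
n["X"] = 500
N = {c: 10000 for c in AUTOSOMES + ["X"]}

cr = relative_coverage(coverage_depth(n, N))
```

Normalizing by the autosomal mean removes the dependence on the total number of reads per sample, so values from different sequencing runs become comparable.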

The GC content of a chromosomal location can be calculated as the average GC percentage of that location, based either on the reference unique reads in the location or on the sequenced polynucleotide fragments assigned to it. The GC content of a chromosome can be calculated with the following formula:

$$GC_{i,j} = NGC_{i,j} / BASE_{i,j} \tag{3}$$

where $i$ denotes sample $i$, $j$ denotes chromosome $j$, $NGC_{i,j}$ is the number of G and C DNA bases on chromosome $j$ in sample $i$, and $BASE_{i,j}$ is the number of DNA bases on chromosome $j$ in sample $i$.
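Equation 3 amounts to counting G and C bases over all bases of the reads assigned to a chromosome. A minimal sketch, assuming the reads are plain strings; the function name `gc_content` is introduced here for illustration:

```python
def gc_content(reads):
    """Equation 3: number of G and C bases divided by the total number of bases."""
    gc_bases = 0
    total_bases = 0
    for read in reads:
        read = read.upper()
        gc_bases += read.count("G") + read.count("C")
        total_bases += len(read)
    # An empty input has no defined GC content.
    return gc_bases / total_bases if total_bases else float("nan")
```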

The coverage depth and GC content may be based on sequence information of polynucleotide fragments obtained from a single sample or from multiple samples. To establish the relationship between the coverage depth and the GC content of a chromosomal location, the calculation may be based on sequence information of polynucleotide fragments obtained from at least 1, 2, 5, 10, 20, 50, 100, 200, 500, or 1000 samples.

In some embodiments, the relationship between coverage depth and GC content is not a strongly linear relationship.

The loess algorithm, or locally weighted polynomial regression, can be used to assess a nonlinear relationship (correlation) between pairs of values, such as coverage depth and GC content.
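The idea behind locally weighted regression can be sketched as below: for each query GC value, a straight line is fitted to the nearest observations with tricube weights and evaluated at the query point. This is a simplified illustration in plain Python, not the specific loess implementation or parameters used in the source; the span `frac`, the local degree (linear), and the function names are assumptions.

```python
import math

def loess_point(xs, ys, x0, frac=0.5):
    """Evaluate a locally weighted linear fit at x0 using tricube weights."""
    k = min(len(xs), max(3, math.ceil(frac * len(xs))))
    # Keep the k observations closest to x0.
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x0))[:k]
    # Bandwidth slightly beyond the farthest neighbour so its weight stays positive.
    h = max(abs(x - x0) for x, _ in nearest) * (1 + 1e-9) or 1.0
    w = [(1 - (abs(x - x0) / h) ** 3) ** 3 for x, _ in nearest]
    sw = sum(w)
    mx = sum(wi * x for wi, (x, _) in zip(w, nearest)) / sw
    my = sum(wi * y for wi, (_, y) in zip(w, nearest)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, (x, _) in zip(w, nearest))
    sxy = sum(wi * (x - mx) * (y - my) for wi, (x, y) in zip(w, nearest))
    # Fall back to the weighted mean when all neighbours share one x value.
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)

def loess_fit(xs, ys, frac=0.5):
    """Fitted value at every observed x (e.g. fitted coverage depth per sample)."""
    return [loess_point(xs, ys, x, frac) for x in xs]
```

Here `xs` would hold the GC content of one chromosome across samples and `ys` the corresponding relative coverage depths; the fitted values play the role of the expected coverage at a given GC content.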

III. Determining fetal genetic abnormalities

Also provided herein is a method of determining a fetal genetic abnormality, comprising: a) obtaining sequence information of multiple polynucleotide fragments from a sample; b) assigning said fragments to chromosomes based on said sequence information; c) calculating the coverage depth and GC content of a chromosome based on said sequence information; d) calculating the fitted coverage depth of said chromosome using the GC content of said chromosome and an established relationship between the coverage depth and the GC content of said chromosome; and e) comparing the fitted coverage depth with the coverage depth of said chromosome, wherein a difference between them indicates a fetal genetic abnormality.

The method can be used to detect fetal chromosomal abnormalities, and is particularly useful for detecting aneuploidy, polyploidy, monosomy, trisomy, trisomy 21, trisomy 13, trisomy 14, trisomy 15, trisomy 16, trisomy 18, trisomy 22, triploidy, tetraploidy, and sex chromosome abnormalities including XO, XXY, XYY, and XXX. The method can also focus on certain regions of the human genome in order to identify partial monosomies and partial trisomies. For example, the method may involve analyzing sequence data in a defined chromosomal sliding "window", such as contiguous, non-overlapping 50 Kb regions spread across a chromosome. Partial trisomies of 13q, 8p (8p23.1), 7q, distal 6p, 5p, 3q (3q25.1), 2q, 1q (1q42.1 and 1q21-qter), partial Xp, and monosomy 4q35.1 have been reported, among others. For example, partial duplication of the long arm of chromosome 18 can result in Edwards syndrome in the case of a duplication of 18q21.1-qter (Mewar, et al., Am J Hum Genet. (1993) 53:1269-78).
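The window-based analysis mentioned above can be illustrated by counting mapped reads in contiguous, non-overlapping 50 Kb windows. A minimal sketch, assuming 0-based read start positions on a single chromosome; the names `WINDOW` and `window_counts` are introduced here:

```python
from collections import Counter

WINDOW = 50_000  # window length in bases (50 Kb, as in the text)

def window_counts(positions, window=WINDOW):
    """Count reads per non-overlapping window, keyed by window index."""
    return Counter(pos // window for pos in positions)
```

Each window count could then be turned into a coverage depth and compared with its GC-based expectation in the same way as a whole chromosome.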

In some embodiments, the fetal fraction is estimated based on the sequence information obtained for the polynucleotide fragments from a sample. The coverage depth and GC content of chromosomes X and Y may be used to estimate the fetal fraction. In some embodiments, the fetal sex is determined based on the sequence information obtained for the polynucleotide fragments from a sample. The coverage depth and GC content of chromosomes X and Y may be used to determine the fetal sex.

In some embodiments, the fitted coverage depth of the chromosome is compared with its coverage depth by a statistical hypothesis test, in which one hypothesis is that the fetus is euploid (H0) and the other is that the fetus is aneuploid (H1). In some embodiments, the Student's t statistic is calculated for each of the two hypotheses, as t1 and t2 respectively. In some embodiments, the log-likelihood ratio of t1 and t2 is calculated. In some embodiments, a log-likelihood ratio greater than 1 indicates fetal trisomy.

IV. Computer-readable medium and system for diagnosing fetal genetic abnormalities

In another aspect, provided herein is a computer-readable medium comprising a plurality of instructions for performing prenatal diagnosis of a fetal genetic abnormality, the instructions comprising the steps of: a) receiving the sequence information; b) assigning the polynucleotide fragments to chromosomes based on said sequence information; c) calculating the coverage depth and GC content of a chromosome based on said sequence information; d) calculating the fitted coverage depth of said chromosome using the GC content of said chromosome and an established relationship between the coverage depth and the GC content of said chromosome; and e) comparing the fitted coverage depth with the coverage depth of said chromosome, wherein a difference between them indicates a genetic abnormality.

In yet another aspect, provided herein is a system for detecting fetal aneuploidy, comprising: a) means for obtaining the sequence information of the multiple polynucleotide fragments; and b) a computer-readable medium comprising a plurality of instructions for performing prenatal diagnosis of a fetal genetic abnormality. In some embodiments, the system further comprises a biological sample obtained from a pregnant female subject, wherein the biological sample contains multiple polynucleotide fragments.

It will be apparent to those of ordinary skill in the art that a number of different sequencing methods and variations can be used. In one embodiment, the sequencing is performed using massively parallel sequencing. Massively parallel sequencing, such as that achievable on the 454 platform (Roche) (Margulies, et al., Nature (2005) 437:376-380), the Illumina Genome Analyzer (or Solexa™ platform), or the SOLiD System (Applied Biosystems), or with Helicos True Single Molecule DNA sequencing technology (Harris, et al., Science (2008) 320:106-109), the single-molecule, real-time (SMRT™) technology of Pacific Biosciences, or nanopore sequencing (Soni and Meller, Clin Chem (2007) 53:1996-2001), allows many nucleic acid molecules isolated from a specimen to be sequenced in parallel at high orders of multiplexing (Dear, Brief Funct Genomic Proteomic (2003) 1:397-416). Each of these platforms sequences clonally expanded or even non-amplified single molecules of nucleic acid fragments. Commercially available sequencing equipment may be used to obtain the sequence information of the polynucleotide fragments.

V. Examples

The following examples are provided to illustrate, but not to limit, the invention.

Example 1 Analysis of factors affecting detection sensitivity: GC bias and fetal sex

Figure 1 shows the framework of the principal steps used to calculate coverage depth and GC content. First, the inventors used software to cut the hg18 reference sequence into l-mers (an l-mer here is a read artificially decomposed from the human reference sequence with the same length l as the sample sequencing reads) and collected the "unique" l-mers as the reference unique reads. Second, the sequenced sample reads were mapped to the reference unique reads of each chromosome. Third, outliers were removed by applying a quintile (one-fifth) outlier cutoff method to obtain a clean data set. Finally, for each sample, the inventors calculated the coverage depth of each chromosome and the GC content of the sequenced unique reads mapped to each chromosome.
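The first step above, collecting l-mers that occur only once in the reference, can be sketched on toy sequences as follows. This is an illustration under stated assumptions: l-mers are taken at every position (the source does not state the stride), only the forward strand is considered, and uniqueness means exactly one exact occurrence across all chromosomes. The function name and the toy sequences are hypothetical, and a real genome would need a memory-efficient index rather than an in-memory counter.

```python
from collections import Counter

def reference_unique_reads(chromosomes, l):
    """Return {chromosome: set of l-mers occurring exactly once in the whole reference}."""
    counts = Counter()
    for seq in chromosomes.values():
        for i in range(len(seq) - l + 1):
            counts[seq[i:i + l]] += 1
    unique = {}
    for name, seq in chromosomes.items():
        unique[name] = {seq[i:i + l]
                        for i in range(len(seq) - l + 1)
                        if counts[seq[i:i + l]] == 1}
    return unique

# Toy reference (hypothetical sequences) with l = 3.
toy_reference = {"chrA": "ACGTAC", "chrB": "TTACGG"}
toy_unique = reference_unique_reads(toy_reference, 3)
```

In this toy case "ACG" and "TAC" occur on both sequences and are therefore excluded from the unique set of either one.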

To investigate how GC content affects the data, the inventors selected 300 euploid cases with karyotype results and plotted the coverage depth of their sequenced reads against the associated GC content. The plot shows a strong correlation between the two, a phenomenon not reported previously (Figure 2). In Figure 2, coverage depth is strongly correlated with GC content, with a clear downward trend in some chromosomes, such as 4 and 13, and an upward trend in others, such as 19 and 22. When all chromosomes are sorted in ascending order of their intrinsic GC content, as shown in Figure 3, the downward trend appears in the lower GC-content group of chromosomes and the upward trend in the higher GC-content group. This can be interpreted as follows: if the polynucleotide fragments sequenced for one sample have a higher GC content than those of other samples, then, relative to the other samples, that sample's coverage depth will fall in the lower GC-content chromosomes and rise in the higher GC-content chromosomes.

A possible explanation for these different trends in chromosomes of different GC content is the difference in GC-content composition among chromosomes, shown in Figure 4, combined with the GC bias introduced during sequencing. The GC content of every 35-mer reference unique read of each chromosome was used to classify GC content into 36 levels. The percentage of each level in the GC composition of each chromosome was calculated and then used to draw a heat map with the Heatmap2 software. Taking chromosome 13 as an example, most of it consists of lower GC-content segments, while a small part consists of higher GC-content segments. If the conditions during sequencing or PCR favor the higher GC-content segments, the large low-GC portion of chromosome 13 will be harder to sequence, so the coverage depth of chromosome 13 in that sample becomes lower. By contrast, in the higher GC-content group, for example chromosome 19, the coverage depth in that sample becomes higher, because most of chromosome 19 has the higher GC content that the sequencer favors. In any chromosome, both GC-poor and GC-rich segments are difficult to sequence, but the effect of GC bias differs among chromosomes with different GC-content compositions. Each reference chromosome was divided into 1 KB segments, and the GC content of the reference unique reads in each segment was calculated. The segment GC contents falling within the interval [0.3, 0.6] were binned with a step size of 0.001, and the relative coverage of each bin was then calculated. Figure 5 plots relative coverage against GC content for each chromosome.
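The 36 GC levels follow from the read length: a 35-mer can contain between 0 and 35 G or C bases. A sketch of the per-chromosome level percentages that feed the heat map, with a hypothetical function name and a generic `read_length` parameter so that it can be checked on short toy reads:

```python
def gc_level_percentages(reads, read_length=35):
    """Percentage of reads at each GC level (0..read_length G/C bases per read)."""
    levels = [0] * (read_length + 1)
    for read in reads:
        gc_bases = sum(base in "GC" for base in read.upper())
        levels[gc_bases] += 1
    total = len(reads)
    if total == 0:
        return [0.0] * (read_length + 1)
    return [100.0 * count / total for count in levels]
```

One such vector per chromosome gives one row of the heat map; chromosomes dominated by low levels correspond to the lower GC-content group described above.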

The effect of fetal sex on the data was analyzed with a two-independent-sample t-test. For GC content, essentially no significant difference was found in the autosomes, as opposed to the sex chromosomes, but UR% differed markedly between females and males (Chiu et al., (2008) Proc Natl Acad Sci USA 105:20458-20463). This implies that fetal sex need not be distinguished when detecting autosomal aneuploidy, but must be determined first when detecting sex chromosome aneuploidies such as XO and XYY.

Example 2 Statistical model

Building on the phenomenon discussed above, the inventors used a local polynomial to fit the relationship between coverage depth and the corresponding GC content. The coverage depth consists of a function of GC and a normally distributed residual:

$$cr_{i,j} = f(GC_{i,j}) + \varepsilon_{i,j}, \quad j = 1, 2, \ldots, 22, X, Y \tag{4}$$

where $f(GC_{i,j})$ represents the relationship between the coverage depth and the corresponding GC content of chromosome $j$ in sample $i$, and $\varepsilon_{i,j}$ is the residual of chromosome $j$ in sample $i$. The relationship between coverage depth and the corresponding GC content is not strongly linear, so the inventors applied the loess algorithm to fit the coverage depth to the corresponding GC content, from which they calculated a value important to their model, the fitted coverage depth:

$$\hat{cr}_{i,j} = f(GC_{i,j}), \quad j = 1, 2, \ldots, 22, X, Y \tag{5}$$

Using the fitted coverage depth, the standard deviation and the Student's t statistic are calculated according to Equations 6 and 7 below:
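Equations 6 and 7 are not reproduced in this text. One plausible reading, offered only as an assumption, is that the standard deviation is the sample standard deviation of the residuals between observed and fitted relative coverage over the reference samples, and that t is the residual of a test sample divided by that standard deviation. The function names and the n - 1 denominator below are assumptions, not the patent's stated formulas.

```python
import math

def residual_std(observed, fitted):
    """Sample standard deviation of residuals (observed - fitted); assumes n - 1 in the denominator."""
    residuals = [o - f for o, f in zip(observed, fitted)]
    return math.sqrt(sum(r * r for r in residuals) / (len(residuals) - 1))

def t_statistic(cr, fitted_cr, std):
    """Standardized residual of one chromosome in one test sample."""
    return (cr - fitted_cr) / std

# Toy reference data (hypothetical): fitted depth 1.0 for three reference samples.
std_j = residual_std([1.0, 1.02, 0.98], [1.0, 1.0, 1.0])
t_value = t_statistic(1.06, 1.0, std_j)
```

Under this reading, a chromosome whose observed relative coverage lies well above its GC-based expectation yields a large positive t.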

Example 3 Fetal fraction estimation

Because the fetal fraction is very important to the inventors' detection, they estimated the fetal fraction before the testing procedure. As noted earlier, the inventors sequenced 19 adult males. Comparing their coverage depth with that of cases carrying a female fetus, the inventors found that the chromosome X coverage depth of males is nearly half that of females, and that the chromosome Y coverage depth of males is nearly 0.5 greater than that of females. The inventors can therefore estimate the fetal fraction from the coverage depth of chromosomes X and Y, taking the GC correlation into account, as in Equations 8, 9, and 10:

where [...] is the fitted coverage depth obtained by regressing the chromosome X coverage depth of cases carrying a female fetus on the corresponding GC content; [...] is the fitted coverage depth obtained by regressing the chromosome Y coverage depth of cases carrying a female fetus on the corresponding GC content; [...] is the fitted coverage depth obtained by regressing the chromosome X coverage depth of adult males on the corresponding GC content; and [...] is the fitted coverage depth obtained by regressing the chromosome Y coverage depth of adult males on the corresponding GC content. To simplify the calculation, [...] and [...] are set equal, and [...] and [...] are set equal.

Example 4 Calculating the residual of each chromosome

Figure 6 shows that, for a given total number of unique reads, the standard deviation of each chromosome (see Equation 3) is affected by the number of reference cases involved. With 1.7 million total unique reads sequenced per case, the standard deviation barely increases once the number of selected cases exceeds 150. The standard deviation does, however, differ between chromosomes. After accounting for GC bias, the inventors' method gives moderate standard deviations for chromosome 13 (0.0063), chromosome 18 (0.0066), and chromosome 21 (0.0072). The standard deviation of chromosome X is higher than those of the chromosomes above, so more strategies are needed for accurate abnormality detection on chromosome X.

Figure 7 shows a Q-Q plot in which the residuals conform to a normal distribution, indicating that the Student's t calculation is reasonable.

Example 5 Distinguishing fetal sex

To detect sex chromosome disorders, it is preferable to determine fetal sex first. When the inventors examined the frequency distribution of chromosome Y coverage depth in the 300 cases, two distinct peaks appeared, suggesting that sex can be distinguished by the coverage depth of chromosome Y. Cases with a coverage depth below 0.04 can be regarded as carrying a female fetus, those above 0.051 as carrying a male fetus, and those between 0.04 and 0.051 as sex-uncertain, as shown in Figure 8. For these sex-uncertain cases and for aneuploid cases, logistic regression was used to predict sex, as in Equation 11 (Fan, et al., Proc Natl Acad Sci USA (2008) 105:16266-16271):

where $cr.a_{i,x}$ and $cr.a_{i,y}$ are the normalized relative coverage of X and Y, respectively.
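The threshold step of Example 5 can be sketched as a three-way call on the chromosome Y coverage depth. The cutoffs 0.04 and 0.051 come from the text; the function name and labels are introduced here, and the logistic regression applied to uncertain cases is not reproduced because Equation 11 is not available in this text.

```python
def call_fetal_sex(chr_y_depth, female_max=0.04, male_min=0.051):
    """Classify a case from its chromosome Y coverage depth."""
    if chr_y_depth < female_max:
        return "female"
    if chr_y_depth > male_min:
        return "male"
    # Between the two cutoffs the call is left open for a follow-up model.
    return "uncertain"
```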

Compared with the karyotype results, the inventors' method of distinguishing fetal sex performed very well, with 100% accuracy in the 300 reference cases and only one misclassified case in the 901-case set; the chromosome Y coverage depth of that misclassified case fell between 0.04 and 0.051.

Example 6 Diagnostic performance of the GC-correlation t-test method

Sample recruitment

A total of 903 participants with karyotype results were prospectively recruited from Shenzhen People's Hospital and Shenzhen Maternal and Child Health Center. Approval was obtained from the institutional review board of each recruitment unit, and all participants gave written informed consent. Maternal age and gestational age were recorded at the time of blood sampling. The 903 cases included 2 trisomy 13 cases, 15 trisomy 18 cases, 16 trisomy 21 cases, 3 XO cases, 2 XXY cases, and 1 XYY case. The distribution of karyotype results is shown in Figure 9.

Maternal plasma DNA sequencing

Peripheral venous blood (5 ml) was collected from each participating pregnant woman into EDTA tubes and centrifuged at 1,600 g for 10 minutes within 4 hours. The plasma was transferred to microcentrifuge tubes and centrifuged at 16,000 g for 10 minutes to remove residual cells. Cell-free plasma was stored at -80 °C until DNA extraction. Each plasma sample was frozen and thawed only once.

For massively parallel genomic sequencing, DNA extracted from 600 μl of maternal plasma was used for library construction according to a modified Illumina protocol. Briefly, the maternal plasma DNA fragments were end-repaired using T4 DNA polymerase, Klenow™ polymerase and T4 polynucleotide kinase. A terminal A residue was added, and commercially available adapters (Illumina) were then ligated to the DNA fragments. The adapter-ligated DNA was further amplified by 17 cycles of PCR with standard multiplex primers. PCR products were purified using the Agencourt AMPure™ 60 ml Kit (Beckman). The size distribution of the sequencing libraries was analyzed with a DNA 1000 kit on the 2100 Bioanalyzer™ (Agilent), and the libraries were quantified by real-time PCR. Libraries carrying different indexes were then pooled in equal amounts, processed on the cluster station and sequenced (single-end) on the Illumina GA II™.

Nineteen male euploid samples were sequenced for the subsequent analysis of fetal DNA fraction estimation. The inventors developed a new GC-correlated t-test method for diagnosing trisomy 13, trisomy 18, trisomy 21 and sex chromosome abnormalities, and compared its diagnostic performance with that of the two other methods mentioned below.

Example 7 Detection of fetal aneuploidy such as trisomy 13, 18 and 21

To determine whether the chromosome copy number in a patient case deviates from normal, the coverage depth of the chromosome is compared with that of all the reference cases. All previous studies used only one null hypothesis. The inventors introduce a binary hypothesis for the first time by using two null hypotheses. The first null hypothesis (H0: the fetus is euploid) is that the mean coverage depth of the patient case distribution equals the mean coverage depth of the distribution of all normal reference cases, which means that the patient case is euploid if this null hypothesis is accepted. Using Student's t-test, t1 can be calculated as in Equation 12:

The other null hypothesis (H1: the fetus is aneuploid) is that the mean coverage depth of the patient case distribution, given a rough estimate of its fetal fraction, equals the mean coverage depth of the distribution of aneuploid cases with the same fetal fraction, which means that the patient case is aneuploid if this null hypothesis is accepted. The Student's t statistic t2 is calculated as in Equation 13:

|t1| > 3 together with |t2| < 3 indicates an aneuploid case in most situations, especially when the distributions of euploid and aneuploid cases are completely separated. Under other conditions, such as insufficient precision or an insufficient fetal fraction, |t1| may be less than 3 even though the fetus is abnormal. Combining t1 and t2 helps make a more accurate decision, so the inventors apply the log-likelihood ratio of t1 and t2 in Equation 14:

$$L_{i,j} = \frac{\log\left(p(t1_{i,j}, \mathrm{degree} \mid D)\right)}{\log\left(p(t2_{i,j}, \mathrm{degree} \mid T)\right)} \qquad (14)$$

where $L_{i,j}$ is the log-likelihood ratio. If the ratio is greater than 1, the inventors infer that the fetus may be trisomic.
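As a rough illustration of how the two statistics interact, the sketch below encodes only the |t1| > 3 and |t2| < 3 screening rule stated in the text. The function name, the labels, and the reading of the opposite pattern as euploid are assumptions made for the example; cases that match neither pattern are simply flagged for the log-likelihood ratio of Equation 14, which is not computed here.

```python
def screen_with_two_hypotheses(t1: float, t2: float, cutoff: float = 3.0) -> str:
    """Screen a case from its two t statistics (illustrative sketch only)."""
    euploid_rejected = abs(t1) > cutoff      # H0 (euploid) rejected
    aneuploid_accepted = abs(t2) < cutoff    # H1 (aneuploid) not rejected
    if euploid_rejected and aneuploid_accepted:
        return "aneuploid"
    if not euploid_rejected and not aneuploid_accepted:
        # Assumption: the mirror-image pattern is read as euploid.
        return "euploid"
    # Both or neither hypothesis rejected: defer to the ratio of Equation 14.
    return "ambiguous"
```

For instance, `screen_with_two_hypotheses(2.5, 1.0)` falls in the ambiguous band, which corresponds to the low-precision or low-fetal-fraction situation the text describes.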

For cases carrying a female fetus, however, the fetal fraction is difficult to estimate, so the calculation cannot be carried out directly. Instead, based on the empirical distribution of fetal fractions, the inventors use a fraction reference value (RV) of 7%.

A total of 903 cases were studied, of which 866 carried euploid fetuses; 300 of these were randomly selected to develop the GC-correlated Student's t method. In addition, 2 trisomy 13, 12 trisomy 18, 16 trisomy 21, 4 XO (3 XO cases and 1 mosaic 45,XO/46,XX (27:23) case), 2 XXY and 1 XYY cases were included in the study. After alignment, the inventors obtained an average of 1.7 million (SD = 306,185) uniquely mapped reads (no mismatches) per case. Using the newly developed GC-correlated Student's t-test, all T13 cases (2 of 2) were successfully identified, and 901 of the 901 non-trisomy 13 cases were correctly classified (Figure 10A). The sensitivity and specificity of the method were both 100% (Table 1).

For trisomy 18, 12 of the 12 trisomy 18 cases and 888 of the 891 non-trisomy 18 cases were correctly identified (Figure 10A). The sensitivity and specificity of the method were 100% and 99.66%, respectively. For trisomy 21, 16 of the 16 trisomy 21 cases and 887 of the 887 non-trisomy 21 cases were also correctly detected (Figure 10A). The sensitivity and specificity of the method were both 100%.
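The reported percentages follow directly from the case counts. The short Python check below reproduces the trisomy 18 figures (12 of 12 affected cases detected, 888 of 891 unaffected cases correctly classified); the helper names are illustrative.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of affected cases that the test detects."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of unaffected cases that the test correctly clears."""
    return true_neg / (true_neg + false_pos)


# Trisomy 18 counts from the text: 12/12 detected, 888/891 correctly negative.
t18_sensitivity = round(100 * sensitivity(12, 0), 2)   # 100.0
t18_specificity = round(100 * specificity(888, 3), 2)  # 99.66
```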

Example 8 Detection of XO, XXX, XXY and XYY

The detection of autosomal trisomies was considered above; sex chromosome disorders such as XO, XXX, XXY and XYY can also be detected by the inventors' method.

First, fetal sex is determined by the sex classification step. If the test case is confirmed to carry a female fetus, the Student's t value t1 is calculated for XXX or XO detection, where $std_{Xf}$ is the same as in Equation 10. If t1 is greater than 3.13 or less than -3.13, the case may be XXX or XO. However, because precision is limited by the large variation in the coverage depth of chromosome X, the inventors resample the plasma and repeat the experiment to reach a more reliable decision when |t1| < 5 (even if |t1| > 3.13). A case with |t1| > 5 is confirmed as aneuploid. All of the inventors' detection methods presuppose that the data pass standard quality control.
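The three-way decision for female fetuses can be written as a small function. The cut-offs 3.13 and 5 are taken from the text; the function name and the returned labels are assumptions made for illustration, and the computation of t1 itself is not shown.

```python
def call_female_x_aneuploidy(t1: float,
                             suspect_cutoff: float = 3.13,
                             confirm_cutoff: float = 5.0) -> str:
    """Decide on XXX/XO for a female fetus from the t1 statistic (sketch)."""
    magnitude = abs(t1)
    if magnitude > confirm_cutoff:
        return "aneuploid (XXX or XO)"
    if magnitude > suspect_cutoff:
        # Suspicious but below the confirmation cut-off: resample and repeat.
        return "resample"
    return "not detected"
```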

If the test sample is confirmed to carry a male fetus, the fetal DNA fraction is first estimated from Y and X together. The inventors can also extrapolate the fitted coverage depth of chromosome X from the fetal DNA fraction estimated from the chromosome Y coverage depth alone, and then calculate t2. If t2 is too large (greater than 5) or too small (less than -5), the fetus may be XXY or XYY. In addition, the difference between the fetal fractions estimated independently from X and from Y provides information for detecting sex chromosome disorders.

In XO detection, 3 of the 4 XO cases were detected; the case that could not be identified was the mosaic case (Figure 10B). The sensitivity and specificity of the method were 75% (100% if the mosaic case is excluded) and 99.55%, respectively. For XXY, both cases were successfully identified and 901 of the 901 non-XXY cases were correctly classified (Figure 10B), giving 100% sensitivity and 100% specificity. The XYY case was also correctly identified (Figure 10B), with sensitivity and specificity both 100%.

To evaluate whether the new method has any advantage over two other reported methods, the z-score method and the GC-corrected z-score method, the inventors analyzed their 900 cases with all three methods, using the same 300 cases as the reference set for each. The precision of measurement is expressed as the confidence value (CV). In the inventors' study, the CV of the standard z-score method was larger than that of the other methods for the clinically relevant chromosomes 18 and 21 (Figure 11), resulting in lower sensitivity for trisomy 18 and 21 (Table 1).

Table 1 Comparison of the sensitivity and specificity of different methods

For the GC-corrected z-score method, the CV for chromosome 13 was 0.0066, with 100% sensitivity and 100% specificity. For the new GC-correlated Student's t method discussed here, the CV for chromosome 13 was 0.0063, with 100% sensitivity and 100% specificity. For chromosome 18, the CVs of the two methods were 0.0062 and 0.0066, respectively; both reached 100% sensitivity, with specificities of 99.89% and 99.96%, respectively. For chromosome 21, the two methods performed similarly, with CVs of 0.0088 and 0.0072, respectively; both reached 100% sensitivity and 100% specificity in the inventors' small case set. Both methods outperformed the standard z-score method. The newly developed GC-correlated method not only performs well compared with the GC-correction method but also has a further advantage in detecting sex chromosome abnormalities such as XO, XXY and XYY. The inventors' data show that, with the GC-correction method, the deviation that multiplying sequence tag counts by a weighting factor introduces into the sex chromosome data makes it difficult to distinguish fetal sex, so detecting sex chromosome disorders is difficult.

Example 9 Theoretical performance of the GC-correlated t-test method considering data size, gestational week and fetal DNA fraction

Measuring aneuploidy remains difficult: to date, the high background of maternal DNA (Fan, et al., Proc Natl Acad Sci USA (2008) 105(42):16266-16271) and the arbitrarily small fetal DNA fraction are the most important factors limiting aneuploidy detection by massively parallel genomic sequencing (MPGS). There has been no major clinical breakthrough in determining the minimum fetal DNA fraction before MPGS testing, particularly for female fetuses, and the only clinical clue related to fetal DNA fraction is gestational week. A statistically significant correlation between fetal DNA fraction and gestational age has been reported previously (Lo, et al., Am. J. Hum. Genet. (1998) 62:768-775). To study the relationship between estimated fetal DNA fraction and gestational age, the inventors plotted in Figure 12 the fetal DNA fractions, estimated with Equation 10, of all participating cases carrying a male fetus (427 cases in total). The fetal DNA fraction estimated for each sample correlated with gestational week (P < 0.0001). The data also show that even at 20 weeks of gestation, 4 of 65 cases had a fetal DNA fraction below 5%, which adversely affects detection accuracy. To evaluate the fetal fraction estimation method, the inventors selected cases stratified across the range of estimated fetal fractions and used Q-PCR to calculate an independent fetal fraction for each. The resulting standard curve showed a strong correlation between the two estimates, indicating that the fetal fraction estimated by the inventors' method is reliable.

Sequencing depth (the total number of unique reads) is another important factor affecting the precision of aneuploidy detection, as reflected in the standard deviation. Once the number of reference cases reaches 150, the standard deviation of each chromosome used in the GC-correlated method becomes fixed for a given sequencing depth (Figure 13). To study how sequencing depth affects the standard deviation of each chromosome, the inventors sequenced 150 cases not only at the 1.7 million level but also at a second sequencing depth of 5 million total unique reads (SD = 1.7 million). From these two sets, the inventors found that the standard deviation is linearly related to the reciprocal of the square root of the total number of unique reads, as shown in Figure 6.
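The linear relationship between the standard deviation and the reciprocal square root of the read count can be fitted by ordinary least squares. The sketch below is a minimal, standard-library version; the function name is hypothetical, and the example numbers are synthetic values generated from an assumed line, not data from the study.

```python
import math


def fit_std_vs_depth(read_counts, std_devs):
    """Fit std = slope * (1 / sqrt(reads)) + intercept by least squares."""
    xs = [1.0 / math.sqrt(n) for n in read_counts]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(std_devs) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, std_devs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic illustration: points placed exactly on std = 8/sqrt(n) + 0.001.
depths = [1.7e6, 3.0e6, 5.0e6]
stds = [8.0 / math.sqrt(n) + 0.001 for n in depths]
slope, intercept = fit_std_vs_depth(depths, stds)
```

With real per-chromosome standard deviations, a separate slope and intercept would be fitted for each chromosome.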

For a given fetal DNA fraction, the inventors can estimate the total number of unique reads that the method needs to detect a deviation from the normal chromosome copy number at t1 equal to 3 (Figure 14). The smaller the fetal DNA fraction, the greater the sequencing depth required. With the 1.7 million unique-read set, the method can detect aneuploidy of chromosomes 13 and X in fetuses with a fetal DNA fraction above 4.5%, and aneuploidy of chromosomes 21 and 18 at fractions above 4%; with the 5 million reference set, the method can detect trisomy 18 and trisomy 21 even at a fetal DNA fraction of about 3%. To identify chromosome X abnormalities such as XXX or XO at a fetal fraction of about 4%, the total number of unique reads in both the test cases and the corresponding reference cases should reach 5 million. If the fetal DNA fraction is below 3.5%, the required sequencing depth exceeds 20 million. At still lower fetal DNA fractions, detection becomes unreliable and difficult, so the inventors propose another strategy: resample the maternal plasma at a later gestational age, repeat the experiment and reanalyze the data, because the fetal DNA fraction is more likely to be higher as gestational age increases. This strategy can also be applied to samples suspected of having a small fetal DNA fraction.
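One way to see why the required depth rises so quickly as the fetal fraction falls is the back-of-the-envelope sketch below. It rests on two assumptions that are not spelled out in this passage: that a trisomy raises the normalized coverage of the affected chromosome by roughly half the fetal fraction, and that the standard deviation follows std = slope / sqrt(reads) + intercept. The function name and the slope value used in the example are hypothetical.

```python
import math


def min_unique_reads(fetal_fraction, slope, intercept=0.0, t_cutoff=3.0):
    """Smallest read count at which a trisomy reaches t = t_cutoff.

    Assumes the coverage shift is fetal_fraction / 2 and that
    std = slope / sqrt(reads) + intercept. Returns None when the
    intercept alone already exceeds the tolerable standard deviation.
    """
    max_std = (fetal_fraction / 2.0) / t_cutoff
    if max_std <= intercept:
        return None  # no sequencing depth is sufficient under this model
    return math.ceil((slope / (max_std - intercept)) ** 2)


# Hypothetical slope of 10: halving the fetal fraction quadruples the reads.
reads_at_4_percent = min_unique_reads(0.04, slope=10.0)
reads_at_2_percent = min_unique_reads(0.02, slope=10.0)
```

Under these assumptions the required depth grows with the inverse square of the fetal fraction, which is consistent with the qualitative trend described in the text.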

Even though the method of the invention works well, the result is not convincing without a large set of abnormal cases. To estimate the sensitivity of the GC-correlated Student's t method, the inventors derived theoretical sensitivities for different gestational ages and sequencing depths.

The inventors calculated the theoretical sensitivity of aneuploidy detection in the following steps. First, the inventors applied regression analysis to fit the fetal DNA fraction against gestational age, obtaining the fitted mean fetal DNA fraction for the i-th gestational age $gsa_i$. The distribution of fetal DNA fractions was estimated by Gaussian kernel density estimation (Birke, (2008) Journal of Statistical Planning and Inference 139:2851-2862), mainly from the estimated fetal DNA fractions at gestational weeks 19 and 20, and the distributions for the other weeks were then extrapolated from the relationship between fetal DNA fraction and gestational age, giving a fitted probability density of the fetal DNA fraction for the i-th gestational age, where X is the data from gestational weeks 19 and 20 (Figure 12). Second, the inventors estimated the standard deviation from the total number of unique reads, as described above, where tuqn is the total number of unique reads. Finally, to calculate the sensitivity for each gestational age at a given sequencing depth from the estimated fetal DNA fraction distribution and the standard deviation at that depth, the inventors calculated the probability density of a false negative at each fetal DNA fraction (here the inventors assume that fluctuations in fetal DNA fraction are normally distributed) and then integrated over all fetal DNA fraction levels to obtain the false negative rate (FNR) for that gestational age, where j denotes chromosome j. The theoretical sensitivity for that gestational age at that sequencing depth is then 1 - FNR. Figures 15 to 21 show the plots calculated by the inventors. A Student's t value greater than 3 is used to identify aneuploidy in female fetuses; for male fetuses, a log-likelihood ratio greater than 1, the critical value of the binary hypothesis described above, is used when calculating the false-negative probability density at each fraction, which gives higher sensitivity than for female fetuses.
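The final integration step can be illustrated numerically. The sketch below is a simplified stand-in, not the inventors' calculation: it assumes a normal fetal-fraction distribution truncated at zero, a coverage shift of half the fetal fraction for a trisomy, and a normal approximation to the t statistic with unit variance. All names and the parameter values used in the example are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def theoretical_sensitivity(frac_mean, frac_sd, coverage_std,
                            t_cutoff=3.0, steps=2000):
    """Return 1 - FNR, integrating the miss probability over fetal fraction."""
    low = max(0.0, frac_mean - 6.0 * frac_sd)
    high = frac_mean + 6.0 * frac_sd
    width = (high - low) / steps
    missed = 0.0
    mass = 0.0
    for k in range(steps):
        fraction = low + (k + 0.5) * width          # midpoint rule
        weight = normal_pdf(fraction, frac_mean, frac_sd) * width
        expected_t = fraction / (2.0 * coverage_std)
        # Probability that the observed t stays below the cut-off (a miss).
        missed += weight * normal_cdf(t_cutoff, expected_t, 1.0)
        mass += weight
    return 1.0 - missed / mass


# Hypothetical inputs: a lower coverage std (deeper sequencing) helps.
deep = theoretical_sensitivity(0.10, 0.03, coverage_std=0.005)
shallow = theoretical_sensitivity(0.10, 0.03, coverage_std=0.010)
```

Repeating the calculation over a grid of gestational ages and read counts would give the kind of surface that the sensitivity figures depict.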

However, the inventors' inference is relatively conservative, because it is difficult to obtain a distribution that closely approaches the true distribution of fetal DNA fraction by gestational age, particularly for early gestational ages in a small sample.

References

1. Sybert VP, McCauley E (2004) Turner's syndrome. N Engl J Med 351:1227-1238.

2. Bock R (1993) Understanding Klinefelter Syndrome: A Guide for XXY Males and Their Families. NIH Pub. No. 93-3202, August 1993.

3. Aksglaede L, Skakkebaek NE, Juul A (2008) Abnormal sex chromosome constitution and longitudinal growth: serum levels of insulin-like growth factor (IGF)-I, IGF binding protein-3, luteinizing hormone, and testosterone in 109 males with 47,XXY, 47,XYY, or sex-determining region of the Y chromosome (SRY)-positive 46,XX karyotypes. J Clin Endocrinol Metab 93(1):169-176. doi:10.1210/jc.2007-1426. PMID 17940117.

4. Ostler HB (2004) Diseases of the Eye and Skin: A Color Atlas. Lippincott Williams & Wilkins. p. 72. ISBN 9780781749992.

5. Driscoll DA, Gross S (2009) Clinical practice. Prenatal screening for aneuploidy. N Engl J Med 360:2556-2562.

6. Kagan KO, Wright D, Valencia C, et al. (2008) Screening for trisomies 21, 18 and 13 by maternal age, fetal nuchal translucency, fetal heart rate, free β-hCG and pregnancy-associated plasma protein-A. Hum Reprod 23(9):1968-1975. doi:10.1093/humrep/den224.

7. Malone FD, et al. (2005) First-trimester or second-trimester screening, or both, for Down's syndrome. N Engl J Med 353:2001-2011.

8. Fan HC, Quake SR (2010) Sensitivity of noninvasive prenatal detection of fetal aneuploidy from maternal plasma using shotgun sequencing is limited only by counting statistics. PLoS ONE 5(5):e10439. doi:10.1371/journal.pone.0010439.

9. Chiu RW, Chan KC, Gao Y, Lau VY, Zheng W, et al. (2008) Noninvasive prenatal diagnosis of fetal chromosomal aneuploidy by massively parallel genomic sequencing of DNA in maternal plasma. Proc Natl Acad Sci USA 105:20458-20463.

10. McCullagh P, Nelder JA (1989) Generalized Linear Models. London, UK: Chapman & Hall/CRC.

11. Fan HC, Blumenfeld YJ, et al. (2008) Noninvasive diagnosis of fetal aneuploidy by shotgun sequencing DNA from maternal blood. Proc Natl Acad Sci USA 105(42):16266-16271.

12. Birke M (2008) Shape constrained kernel density estimation. Journal of Statistical Planning and Inference 139(8), 1 August 2009, 2851-2862.

13. Lo et al. (1997) Lancet 350:485-487.

14. Lo et al. (1998) Am J Hum Genet 62:768-775.

15. Pertl and Bianchi (2001) Obstetrics and Gynecology 98:483-490.

16. Rogers and Venter (2005) Genomics: Massively parallel sequencing. Nature 437:326-327.

17. Mewar et al. (1993) Clinical and molecular evaluation of four patients with partial duplications of the long arm of chromosome 18. Am J Hum Genet 53(6):1269-1278.

18. Margulies et al. (2005) Nature 437:376-380.

19. Harris et al. (2008) Science 320:106-109.

20. Soni and Meller (2007) Clin Chem 53:1996-2001.

21. Dear (2003) Brief Funct Genomic Proteomic 1:397-416.

The above are merely preferred embodiments of the invention; all equivalent changes and modifications made within the scope of the claims of the invention fall within the scope of the invention.

Figure 1 shows the principle of calculating coverage depth and GC content from the sequence information of polynucleotide fragments.

Figure 2 shows the correlation between normalized coverage depth and GC content established using data from 300 reference cases. The normalized coverage depth of each case is plotted against the GC content of the sequences. Crosses indicate cases carrying euploid female fetuses, and squares indicate cases carrying euploid male fetuses. The solid line is the fitted line of coverage depth against GC content. Figure 2 comprises Figures 2A, 2B, 2C and 2D.

Figure 3 shows the trend between normalized coverage depth and the corresponding GC content, with the chromosomes arranged in ascending order of intrinsic GC content. Here the intrinsic GC content of each chromosome is the mean GC content of the sequence tags of that chromosome in the 300 reference cases.

Figure 4 shows the differing GC-class composition of each chromosome. For each chromosome, the GC content of every 35-bp read among the reference unique reads was calculated and graded into 36 levels, and the percentage at each level was taken as the GC composition of that chromosome. The chromosomes were then plotted as a heat map and hierarchically clustered.

Figure 5 demonstrates, by manually simulating sequencer preference, that sequencing bias can introduce the correlation shown in Figure 2. Figure 5 comprises Figures 5A, 5B, 5C and 5D.

Figure 6 plots the standard deviation against the total number of sequenced polynucleotide fragments. Across 150 samples, the corrected standard deviation of each chromosome shows a linear relationship with the reciprocal of the square root of the number of unique reads. Figure 6 comprises Figures 6A, 6B, 6C and 6D.

Figure 7 shows Q-Q plots of the residuals of each chromosome calculated by Equation 3, which show a linear relationship consistent with a normal distribution. Figure 7 comprises Figures 7A and 7B.

Figure 8 shows a histogram of chromosome Y coverage depth. The two peaks indicate that the sex of a case can be distinguished by the coverage depth of chromosome Y. The curve is the distribution of the relative coverage depth of chromosome Y obtained by kernel density estimation with a Gaussian function.

Figure 9 shows a schematic of the procedure used to diagnose fetal chromosomal abnormalities in the 903 test samples.

Figure 10 shows the results for the aneuploid cases (trisomy 13, 18 and 21, and XO, XXY and XYY) and the normal cases. Figure 10A plots normalized coverage depth against GC content for chromosomes 13, 18 and 21. Figure 10B shows the plots for chromosomes X and Y. Circles represent the relative coverage depth and GC content of normal female fetuses, and dots represent normal male fetuses. The solid line is the fitted line of relative coverage against GC content; the dashed line marks an absolute t value of 1, the dotted line an absolute t value of 2, and the dash-dotted line an absolute t value of 3.

Figure 11 compares the confidence values of the different diagnostic methods.

Figure 12 shows the relationship between fetal DNA fraction and gestational age. The fraction of fetal DNA in maternal plasma correlates with gestational age. The fetal DNA fraction was estimated from X and Y together. There is a statistically significant correlation between mean fetal DNA fraction and gestational age (P < 0.001). Note that the R² value, the square of the correlation coefficient, is small. The minimum fraction is 3.49%.

Figure 13 shows the relationship between the standard deviation and the number of cases required for detection. The standard deviation of each chromosome, calculated by Equation 5, varies with the number of samples and becomes stable once the number of samples exceeds 100.

Figure 14 shows the estimated number of unique reads needed to detect fetal aneuploidy in cell-free plasma, as a function of the fetal DNA fraction. The estimates cover aneuploidy of chromosomes 13, 18, 21 and X, and even Y (via the relationship between X and Y), each of which has a different length, and are based on a confidence t value of no less than 3. As the fetal DNA fraction decreases, the total number of shotgun sequences required increases. At a sequencing throughput of 4 million sequence reads per flow cell channel, trisomy 21 can be detected if 3.5% of the cell-free DNA is fetal. Aneuploidy of chromosome X is not easily detected when the fraction and the number of unique reads are small, for example 4% and 5 million reads. Different chromosomes require different fetal DNA fractions and different numbers of unique reads, which may be due to the GC structure of the chromosomes.
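The trend described for Figure 14 (fewer fetal fragments means more reads are needed) can be illustrated with a back-of-the-envelope calculation. The sketch below is not the estimate used in the patent. It assumes a plain Poisson counting model in which a fetal trisomy raises the expected read count on the affected chromosome by a factor of (1 + f/2), where f is the fetal DNA fraction, and it ignores the GC-related variance that the description attributes to chromosome structure. The function name `reads_needed` and the 1.5% read share used for chromosome 21 are illustrative assumptions, so the numbers will not reproduce the figure.

```python
def reads_needed(fetal_fraction, chrom_share, t_threshold=3.0):
    """Total unique reads N at which the expected t value reaches t_threshold.

    Assumed model (not taken from the patent): reads on the chromosome are
    Poisson with mean N * chrom_share, and a fetal trisomy raises that mean
    by the factor (1 + fetal_fraction / 2).
    """
    if not 0.0 < fetal_fraction <= 1.0:
        raise ValueError("fetal_fraction must be in (0, 1]")
    if not 0.0 < chrom_share <= 1.0:
        raise ValueError("chrom_share must be in (0, 1]")
    # Relative excess coverage is f/2; relative Poisson noise is
    # 1/sqrt(N * share). Solving (f/2) * sqrt(N * share) >= T for N:
    return (2.0 * t_threshold / fetal_fraction) ** 2 / chrom_share


# Illustrative assumption: roughly 1.5% of unique reads map to chromosome 21.
CHR21_SHARE = 0.015

if __name__ == "__main__":
    for f in (0.035, 0.05, 0.10):
        print(f"fetal fraction {f:.3f}: about {reads_needed(f, CHR21_SHARE):,.0f} reads")
```

Under this simplified model the required read count scales with 1/f², so halving the fetal fraction quadruples the sequencing depth needed for the same t threshold.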

Figure 15 shows a contour plot of the sensitivity for detecting trisomy of chromosome 13 in female fetuses, evaluated at each point of gestational age (weeks) and data volume.

Figure 16 shows a contour plot of the sensitivity for detecting trisomy of chromosome 18 in female fetuses, evaluated at each point of gestational age (weeks) and data volume.

Figure 17 shows a contour plot of the sensitivity for detecting trisomy of chromosome 21 in female fetuses, evaluated at each point of gestational age (weeks) and data volume.

Figure 18 shows a contour plot of the sensitivity for detecting trisomy of chromosome X in female fetuses, evaluated at each point of gestational age (weeks) and data volume.

Figure 19 shows a contour plot of the sensitivity for detecting trisomy of chromosome 13 in male fetuses, as a function of data volume and gestational age (weeks). For each point of gestational age and data volume, the inventors first calculated the empirical distribution of the fetal DNA fraction and the standard variance for the given data volume, comparing the fractions estimated from X and Y together or from Y alone, and then calculated the sensitivity for each type of aneuploidy.

Figure 20 shows a contour plot of the sensitivity for detecting trisomy of chromosome 18 in male fetuses, as a function of data volume and gestational age (weeks).

Figure 21 shows a contour plot of the sensitivity for detecting trisomy of chromosome 21 in male fetuses, as a function of data volume and gestational age (weeks).

Claims (28)

1. A method for establishing a relationship between the coverage depth and the GC content of a chromosome, the method comprising: obtaining, from more than one sample, sequence information of a plurality of polynucleotide fragments covering the chromosome; assigning the fragments to chromosomes based on the sequence information; calculating, for each sample, the coverage depth and GC content of the chromosome based on the sequence information; and determining the relationship between the coverage depth and the GC content of the chromosome.

2. The method of claim 1, wherein the assigning is performed by comparing the sequences of the fragments with a reference human genome sequence.

3. The method of claim 1, wherein the coverage depth of the chromosome is the ratio between the number of fragments assigned to the chromosome and the number of reference unique reads of the chromosome.

4. The method of claim 3, wherein the coverage depth is normalized.

5. The method of claim 4, wherein the normalization is calculated relative to the coverage of another chromosome, all other autosomes, or all other chromosomes.

6. The method of claim 1, wherein the GC content of the chromosome is the average GC content of all fragments assigned to the chromosome.

7. The method of claim 1, wherein the chromosome is chromosome 1, 2, ..., 22, X or Y.

8. The method of claim 4, wherein the relationship is given by the following formula: $cr_{i,j} = f(GC_{i,j}) + \varepsilon_{i,j}, \quad j = 1, 2, \ldots, 22, X, Y$, where $f(GC_{i,j})$ is the function describing the relationship between the coverage depth of chromosome j in sample i and the corresponding GC content, and $\varepsilon_{i,j}$ is the residual for chromosome j in sample i.

9. The method of claim 1, wherein the relationship between the coverage depth and the GC content is calculated by local polynomial regression.

10. The method of claim 9, wherein the relationship is a non-strong (weak) linear relationship.

11. The method of claim 10, wherein the relationship is determined by the loess algorithm.

12. The method of claim 8, further comprising calculating the fitted coverage depth according to the following formula: [formula omitted in source].

13. The method of claim 12, further comprising calculating the standard deviation according to the following formula: [formula omitted in source], where ns is the number of reference samples.

14. The method of claim 13, further comprising calculating the Student's t statistic according to the following formula: [formula omitted in source].

15. A method for detecting a fetal genetic abnormality, the method comprising: a) obtaining sequence information of a plurality of polynucleotide fragments from a sample; b) assigning the fragments to chromosomes based on the sequence information; c) calculating the coverage depth and GC content of a chromosome based on the sequence information; d) calculating the fitted coverage depth of the chromosome using the GC content of the chromosome and an established relationship between the coverage depth and the GC content of the chromosome; and e) comparing the fitted coverage depth of the chromosome with its coverage depth, wherein a difference between them indicates a fetal genetic abnormality.

16. The method of claim 15, further comprising: f) determining the sex of the fetus.

17. The method of claim 16, wherein the sex of the fetus is determined according to the following formula: [formula omitted in source], where $cr.a_{i,x}$ and $cr.a_{i,y}$ are the normalized relative coverages of chromosomes X and Y, respectively.

18. The method of claim 16, further comprising: g) estimating the fetal fraction.

19. The method of claim 15, wherein the comparison of the fitted coverage depth of the chromosome with its coverage depth is performed by statistical hypothesis testing.

20. The method of claim 19, wherein one hypothesis is that the fetus is euploid (H0) and the other hypothesis is that the fetus is aneuploid (H1).

21. The method of claim 20, wherein the Student's t statistic is calculated for both hypotheses.

22. The method of claim 21, wherein the Student's t statistics for H0 and H1 are calculated according to the following formulas, respectively: [formulas omitted in source], where fxy is the fetal fraction.

23. The method of claim 22, wherein the log-likelihood ratio of t1 and t2 is calculated according to the following formula: $L_{i,j} = \log(p(t1_{i,j}, \mathrm{degree} \mid D)) / \log(p(t2_{i,j}, \mathrm{degree} \mid T))$, where $L_{i,j}$ is the log-likelihood ratio, degree is the degrees of freedom of the t distribution, D denotes diploidy, T denotes trisomy, and $p(t1_{i,j}, \mathrm{degree} \mid *)$, $* = D, T$, is the conditional probability density given the degrees of freedom of the t distribution.

24. A method for determining a fetal genetic abnormality, the method comprising: a) obtaining, from more than one normal sample, sequence information of a plurality of polynucleotide fragments covering a chromosome of interest; b) assigning the fragments to the chromosome based on the sequence information; c) calculating the coverage depth and GC content of the chromosome based on the sequence information from the normal samples; d) determining the relationship between the coverage depth and the GC content of the chromosome; e) obtaining sequence information of a plurality of polynucleotide fragments from a biological sample; f) assigning the fragments to the chromosome based on the sequence information from the biological sample; g) calculating the coverage depth and GC content of the chromosome based on the sequence information from the biological sample; h) calculating the fitted coverage depth of the chromosome using the GC content of the chromosome and the relationship between the coverage depth and the GC content of the chromosome; and i) comparing the fitted coverage depth of the chromosome with its coverage depth, wherein a difference between them indicates a fetal genetic abnormality.

25. A computer-readable medium comprising a plurality of instructions for performing prenatal diagnosis of a fetal genetic abnormality, the operation of which comprises the steps of: a) receiving sequence information of a plurality of polynucleotide fragments from a sample; b) assigning the polynucleotide fragments to chromosomes based on the sequence information; c) calculating the coverage depth and GC content of a chromosome based on the sequence information; d) calculating the fitted coverage depth of the chromosome using the GC content of the chromosome and an established relationship between the coverage depth and the GC content of the chromosome; and e) comparing the fitted coverage depth of the chromosome with its coverage depth, wherein a difference between them indicates a fetal genetic abnormality.

26. The computer-readable medium of claim 25, the operation of which further comprises: f) determining the sex of the fetus.

27. The computer-readable medium of claim 26, the operation of which further comprises: g) estimating the fetal fraction.

28. A system for detecting a fetal genetic abnormality, comprising: a) means for obtaining sequence information of a plurality of polynucleotide fragments from a sample; and b) the computer-readable medium of claim 25.
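The workflow recited in the claims (fit coverage depth against GC content on reference samples, predict the fitted coverage depth for a test sample, then compare the two with a t statistic) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The formulas for the fitted coverage depth, the standard deviation and the t statistic are not reproduced in this text, so the sketch assumes a residual standard deviation with an ns - 1 denominator and t = (observed - fitted) / std. A hand-written tricube-weighted local linear fit stands in for the loess algorithm named in the claims. All function names are hypothetical and the reference data are synthetic.

```python
import numpy as np


def local_linear_fit(x, y, x0, span=0.75):
    """Loess-style estimate of y at x0 using a tricube-weighted line fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k = min(len(x), max(3, int(np.ceil(span * len(x)))))  # points in the window
    dist = np.abs(x - x0)
    h = max(np.sort(dist)[k - 1], 1e-12)                   # window half-width
    w = np.clip(1.0 - (dist / h) ** 3, 0.0, None) ** 3     # tricube weights
    sw = np.sqrt(w)
    design = np.column_stack([np.ones_like(x), x - x0])
    coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return float(coef[0])  # intercept of the local line = fitted value at x0


def residual_std(ref_gc, ref_cov, span=0.75):
    """Spread of reference coverage around the fitted curve (assumed ns - 1)."""
    ref_gc = np.asarray(ref_gc, dtype=float)
    ref_cov = np.asarray(ref_cov, dtype=float)
    fitted = np.array([local_linear_fit(ref_gc, ref_cov, g, span) for g in ref_gc])
    resid = ref_cov - fitted
    return float(np.sqrt(np.sum(resid ** 2) / (len(ref_gc) - 1)))


def t_statistic(ref_gc, ref_cov, gc, cov, span=0.75):
    """Assumed form: (observed coverage - fitted coverage) / residual std."""
    fitted = local_linear_fit(ref_gc, ref_cov, gc, span)
    return (cov - fitted) / residual_std(ref_gc, ref_cov, span)


# Synthetic reference set for one chromosome: coverage falls slightly with GC,
# plus a small alternating perturbation standing in for sample-to-sample noise.
ref_gc = np.linspace(0.38, 0.44, 40)
ref_cov = 1.0 - 0.5 * (ref_gc - 0.41) + 0.002 * (-1.0) ** np.arange(40)

t_on_curve = t_statistic(ref_gc, ref_cov, gc=0.41, cov=1.00)  # lies on the trend
t_raised = t_statistic(ref_gc, ref_cov, gc=0.41, cov=1.03)    # 3% excess coverage
```

With these synthetic inputs, the sample lying on the fitted trend gives a t value well inside the |t| < 3 band, while the sample with 3% excess coverage falls outside it; 3 is the confidence level quoted for Figure 14. In practice the reference set would be the normal samples described in the claims, with one fit per chromosome.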
TW101143513A 2012-11-21 2012-11-21 Non-invasive detection of fetus genetic abnormality TWI489305B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW101143513A TWI489305B (en) 2012-11-21 2012-11-21 Non-invasive detection of fetus genetic abnormality

Publications (2)

Publication Number Publication Date
TW201421273A (en) 2014-06-01
TWI489305B (en) 2015-06-21

Family

ID=51393429

Family Applications (1)

Application Number Title Priority Date Filing Date
TW101143513A TWI489305B (en) 2012-11-21 2012-11-21 Non-invasive detection of fetus genetic abnormality

Country Status (1)

Country Link
TW (1) TWI489305B (en)

Also Published As

Publication number Publication date
TWI489305B (en) 2015-06-21
