HK1207124B - Non-invasive determination of methylome of fetus or tumor from plasma

Publication number
HK1207124B
Authority
HK
Hong Kong
Prior art keywords
methylation
regions
threshold
region
dna molecules
Application number
HK15107703.6A
Other languages
Chinese (zh)
Other versions
HK1207124A1 (en)
Inventor
赵慧君
陈君赐
卢煜明
伦妙芬
江培勇
陈渭雯
Original Assignee
香港中文大学
Priority claimed from US13/842,209 (US9732390B2)
Application filed by 香港中文大学
Priority claimed from PCT/AU2013/001088 (WO2014043763A1)
Publication of HK1207124A1
Publication of HK1207124B

Description

Non-invasive determination of methylome of fetus or tumor from plasma

Cross-Reference to Related Applications

This application is a PCT application claiming priority to: U.S. Provisional Patent Application No. 61/830,571, entitled "Tumor Detection In Plasma Using Methylation Status And Copy Number," filed June 3, 2013; and U.S. Application No. 13/842,209, entitled "Non-Invasive Determination Of Methylome Of Fetus Or Tumor From Plasma," filed March 15, 2013, which is a non-provisional of, and claims the benefit of, U.S. Provisional Patent Application No. 61/703,512, entitled "Method Of Determining The Whole Genome DNA Methylation Status Of The Placenta By Massively Parallel Sequencing Of Maternal Plasma," filed September 20, 2012. These applications are incorporated herein by reference in their entirety for all purposes.

Technical Field

The present invention relates generally to the determination of a methylation pattern (methylome) of DNA, and more particularly to analyzing a biological sample (e.g., plasma) that includes a mixture of DNA from different genomes (e.g., from a fetus and the mother, or from tumor and normal cells) to determine the methylation pattern (methylome) of the minority genome. Uses of the determined methylome are also described.

Background

Embryonic and fetal development is a complex process that involves a series of highly coordinated genetic and epigenetic events. Cancer development is also a complex process, typically involving multiple genetic and epigenetic steps. Abnormalities in the epigenetic control of developmental processes are associated with infertility, spontaneous abortion, intrauterine growth abnormalities, and postnatal outcomes. DNA methylation is one of the most frequently studied epigenetic mechanisms. Methylation of DNA mostly occurs through the addition of a methyl group to the 5' carbon of cytosine residues in CpG dinucleotides. Cytosine methylation adds a layer of control over gene transcription and DNA function. For example, hypermethylation of gene promoters enriched with CpG dinucleotides, termed CpG islands, is typically associated with repression of gene function.

Despite the important role of epigenetic mechanisms in regulating developmental processes, human embryonic and fetal tissues are not readily accessible for analysis (tumors may similarly be inaccessible). Studying the dynamic changes of such epigenetic processes in health and disease during the prenatal period in humans is practically impossible. Extraembryonic tissues, particularly the placenta, which can be obtained as part of prenatal diagnostic procedures or after birth, have provided one of the main avenues for such investigations. However, obtaining such tissues requires invasive procedures.

The DNA methylation profile of the human placenta has intrigued researchers for decades. The human placenta exhibits a number of unusual physiological features involving DNA methylation. At the global level, placental tissues are hypomethylated compared with most somatic tissues. At the gene level, the methylation status of selected genomic loci is a specific signature of placental tissue. Both the global and the locus-specific methylation profiles show gestational-age-dependent changes. Imprinted genes, namely genes whose expression depends on the parental origin of the alleles, play key roles in the placenta. The placenta has been described as pseudomalignant, and hypermethylation of several tumor suppressor genes has been observed.

Studies of the DNA methylation profile of placental tissues have provided insights into the pathophysiology of pregnancy-associated or developmentally related diseases, such as preeclampsia and intrauterine growth restriction. Disorders of genomic imprinting are associated with developmental disorders such as Prader-Willi syndrome and Angelman syndrome. Altered profiles of genomic imprinting and global DNA methylation in placental and fetal tissues have been observed in pregnancies resulting from assisted reproductive techniques (H Hiura et al. 2012 Hum Reprod; 27:2541-2548). A number of environmental factors, such as maternal smoking (KE Haworth et al. 2013 Epigenomics; 5:37-49), maternal dietary factors (X Jiang et al. 2012 FASEB J; 26:3563-3574), and maternal metabolic status such as diabetes (N Hajj et al., Diabetes. doi:10.2337/db12-0289), have been associated with epigenetic aberrations in the offspring.

Despite decades of effort, no practical means has been available to study the fetal or tumor methylome and to monitor its dynamic changes throughout pregnancy or during disease processes such as malignancies. It is therefore desirable to provide methods for analyzing all or portions of a fetal methylome and a tumor methylome non-invasively.

Summary of the Invention

Embodiments provide systems, methods, and apparatuses for determining and using methylation profiles of various tissues and samples. Illustrative examples are provided. A methylation profile of fetal/tumor tissue can be deduced from a comparison of plasma methylation (or methylation in another sample containing cell-free DNA, e.g., urine, saliva, or genital washings) with the methylation profile of the mother/patient. When the sample contains a mixture of DNA, tissue-specific alleles can be used to identify DNA from the fetus/tumor and thereby determine the methylation profile of the fetal/tumor tissue. A methylation profile can be used to determine copy number variations in the genome of a fetus/tumor. Methylation markers of the fetus are identified by various techniques. A methylation profile can also be determined by measuring a size parameter of the size distribution of DNA fragments, where reference values for the size parameter can be used to determine methylation levels.

In addition, a methylation level can be used to determine a level of cancer. In the context of cancer, measuring methylome changes in plasma can allow for detection of cancer (e.g., for screening purposes), monitoring (e.g., detecting the response following anticancer treatment and detecting cancer relapse), and prognostication (e.g., measuring the load of cancer cells in the body, staging the cancer, or assessing the chance of death from the disease, from disease progression, or from metastatic processes).

A better understanding of the nature and advantages of embodiments of the present invention may be gained with reference to the following detailed description and the accompanying drawings.

Brief Description of the Drawings

Figure 1A shows Table 100 of sequencing results for maternal blood, placenta, and maternal plasma according to embodiments of the present invention.

Figure 1B shows methylation density in 1-Mb windows of the sequenced samples according to embodiments of the present invention.

Figures 2A-2C show plots of β values against methylation indices: (A) maternal blood cells, (B) chorionic villus sample, (C) term placental tissue.

Figures 3A and 3B show bar charts of the percentage of methylated CpG sites in plasma and blood cells collected from an adult male and a non-pregnant adult female: (A) autosomes, (B) chromosome X.

Figures 4A and 4B show plots of methylation densities of corresponding loci in blood cell DNA and plasma DNA: (A) non-pregnant adult female, (B) adult male.

Figures 5A and 5B show bar charts of the percentage of methylated CpG sites among samples collected from pregnant women: (A) autosomes, (B) chromosome X.

Figure 6 shows a bar chart of the methylation levels of different repeat classes of the human genome for maternal blood, placenta, and maternal plasma.

Figure 7A shows Circos plot 700 for a first-trimester sample. Figure 7B shows Circos plot 750 for a third-trimester sample.

Figures 8A-8D show plots comparing the methylation densities of genomic tissue DNA against maternal plasma DNA for CpG sites surrounding informative single nucleotide polymorphisms.

Figure 9 is a flowchart illustrating method 900 for determining a first methylation profile from a biological sample of an organism according to embodiments of the present invention.

Figure 10 is a flowchart illustrating method 1000 of determining a first methylation profile from a biological sample of an organism according to embodiments of the present invention.

Figures 11A and 11B show graphs of the performance of the prediction algorithm using maternal plasma data and the fractional fetal DNA concentration according to embodiments of the present invention.

Figure 12A is Table 1200 showing details of 15 selected genomic loci for methylation prediction according to embodiments of the present invention. Figure 12B is Graph 1250 showing the deduced categories of the 15 selected genomic loci and their corresponding methylation levels in the placenta.

Figure 13 is a flowchart of method 1300 for detecting a fetal chromosomal abnormality from a biological sample of a female subject pregnant with at least one fetus.

Figure 14 is a flowchart of method 1400 for identifying methylation markers by comparing a placental methylation profile with a maternal methylation profile according to embodiments of the present invention.

Figure 15A is Table 1500 showing the performance of the DMR identification algorithm using first-trimester data with reference to 33 previously reported first-trimester methylation markers. Figure 15B is Table 1550 showing the performance of the DMR identification algorithm using third-trimester data, compared with the placental sample obtained at delivery.

Figure 16 is Table 1600 showing the numbers of loci predicted to be hypermethylated or hypomethylated based on direct analysis of the maternal plasma bisulfite-sequencing data.

Figure 17A is plot 1700 showing the size distributions of DNA from maternal plasma, non-pregnant female control plasma, placenta, and peripheral blood. Figure 17B is plot 1750 of the size distributions and methylation profiles of maternal plasma, adult female control plasma, placental tissue, and adult female control blood.

Figures 18A and 18B are plots of the methylation densities and sizes of plasma DNA molecules according to embodiments of the present invention.

Figure 19A shows plot 1900 of the methylation densities and sizes of sequenced reads for an adult non-pregnant female. Figure 19B is plot 1950 showing the size distributions and methylation profiles of fetal-specific and maternal-specific DNA molecules in maternal plasma.

Figure 20 is a flowchart of method 2000 for estimating a methylation level of DNA in a biological sample of an organism according to embodiments of the present invention.

Figure 21A is Table 2100 showing the methylation densities of the pre-operative plasma and tissue samples of a hepatocellular carcinoma (HCC) patient. Figure 21B is Table 2150 showing the number of sequence reads and the sequencing depth achieved per sample.

Figure 22 is Table 220 showing that the methylation densities in the autosomes of the plasma samples of the healthy controls ranged from 71.2% to 72.5%.

Figures 23A and 23B show the methylation densities of the buffy coat, tumor tissue, non-tumoral liver tissue, pre-operative plasma, and post-operative plasma of the HCC patient.

Figure 24A is plot 2400 showing the methylation densities of the pre-operative plasma from the HCC patient. Figure 24B is plot 2450 showing the methylation densities of the post-operative plasma from the HCC patient.

Figures 25A and 25B show, for chromosome 1, z-scores of the plasma DNA methylation densities of the pre-operative (plot 2500) and post-operative (plot 2550) plasma samples of the HCC patient, using the plasma methylome data of four healthy control subjects as the reference.

Figure 26A is Table 2600 showing z-score data for the pre-operative and post-operative plasma. Figure 26B is Circos plot 2620 showing z-scores of the plasma DNA methylation densities of the pre-operative and post-operative plasma samples of the HCC patient for 1-Mb regions analyzed from all autosomes, using four healthy control subjects as the reference. Figure 26C is Table 2640 showing the distribution of z-scores of 1-Mb regions across the whole genome in the pre-operative and post-operative plasma samples of the HCC patient. Figure 26D is Table 2660 showing that, when the CHH and CHG contexts are used, the methylation levels of the tumor tissue and the pre-operative plasma sample overlap with those of some of the control plasma samples.

Figures 27A-H show Circos plots of the methylation densities of 8 cancer patients according to embodiments of the present invention. Figure 27I is Table 2780 showing the number of sequence reads and the sequencing depth achieved per sample. Figure 27J is Table 2790 showing the distribution of z-scores of 1-Mb regions across the whole genome in the plasma of patients with different malignancies. CL = adenocarcinoma of the lung; NPC = nasopharyngeal carcinoma; CRC = colorectal carcinoma; NE = neuroendocrine carcinoma; SMS = smooth muscle sarcoma.

Figure 28 is a flowchart of method 2800 of analyzing a biological sample of an organism to determine a classification of a level of cancer according to embodiments of the present invention.

Figure 29A is plot 2900 showing the distribution of methylation densities in reference subjects, assuming that this distribution follows a normal distribution. Figure 29B is plot 2950 showing the distribution of methylation densities in cancer subjects, assuming that this distribution follows a normal distribution and that the mean methylation level is 2 standard deviations below the cutoff.

Figure 30 is plot 3000 showing the distributions of methylation densities of the plasma DNA of healthy subjects and cancer patients.

Figure 31 is Graph 3100 showing the distribution of the differences in methylation densities between the mean of the plasma DNA of the healthy subjects and the tumor tissue of the HCC patient.

Figure 32A is Table 3200 showing the effect of reducing the sequencing depth when the plasma sample contains 5% or 2% tumor DNA.

Figure 32B is Graph 3250 showing the methylation densities of repeat elements and non-repeat regions in the plasma of the four healthy control subjects and in the buffy coat, normal liver tissue, tumor tissue, pre-operative plasma, and post-operative plasma samples of the HCC patient.

Figure 33 shows a block diagram of an example computer system 3300 usable with systems and methods according to embodiments of the present invention.

Figure 34A shows the size distribution of plasma DNA in systemic lupus erythematosus (SLE) patient SLE04. Figures 34B and 34C show methylation analyses of plasma DNA from SLE patient SLE04 (Figure 34B) and HCC patient TBR36 (Figure 34C).

Figure 35 is a flowchart of method 3500 for determining a classification of a level of cancer based on hypermethylation of CpG islands according to embodiments of the present invention.

Figure 36 is a flowchart of method 3600 of analyzing a biological sample of an organism using a plurality of chromosomal regions according to embodiments of the present invention.

Figure 37A shows CNA analysis of the tumor tissue, non-bisulfite (BS)-treated plasma DNA, and bisulfite (BS)-treated plasma DNA (from inside to outside) for patient TBR36. Figure 37B is a scatter plot showing the relationship between the z-scores for detecting CNA in 1-Mb regions using bisulfite-treated and non-bisulfite-treated plasma for patient TBR36.

Figure 38A shows CNA analysis of the tumor tissue, non-bisulfite (BS)-treated plasma DNA, and bisulfite (BS)-treated plasma DNA (from inside to outside) for patient TBR34. Figure 38B is a scatter plot showing the relationship between the z-scores for detecting CNA in 1-Mb regions using bisulfite-treated and non-bisulfite-treated plasma for patient TBR34.

Figure 39A is a Circos plot showing the CNA (inner ring) and methylation analysis (outer ring) of the bisulfite-treated plasma of HCC patient TBR240. Figure 39B is a Circos plot showing the CNA (inner ring) and methylation analysis (outer ring) of the bisulfite-treated plasma of HCC patient TBR164.

Figure 40A shows CNA analysis of the pre-treatment and post-treatment samples of patient TBR36. Figure 40B shows methylation analysis of the pre-treatment and post-treatment samples of patient TBR36. Figure 41A shows CNA analysis of the pre-treatment and post-treatment samples of patient TBR34. Figure 41B shows methylation analysis of the pre-treatment and post-treatment samples of patient TBR34.

Figure 42 shows a graph of the diagnostic performance of genome-wide hypomethylation analysis with different numbers of sequenced reads.

Figure 43 is a graph showing ROC curves for the detection of cancer based on genome-wide hypomethylation analysis with different region sizes (50 kb, 100 kb, 200 kb, and 1 Mb).

Figure 44A shows the diagnostic performance of the cumulative probability (CP) and of the percentage of regions with aberrations. Figure 44B shows the diagnostic performance of plasma analysis for global hypomethylation, CpG island hypermethylation, and CNA.

Figure 45 shows a table of results for global hypomethylation, CpG island hypermethylation, and CNA in patients with hepatocellular carcinoma.

Figure 46 shows a table of results for global hypomethylation, CpG island hypermethylation, and CNA in patients with cancers other than hepatocellular carcinoma.

Figure 47 shows a serial analysis of plasma methylation for case TBR34.

Figure 48A shows a Circos plot demonstrating the CNA (inner ring) and methylation changes (outer ring) in the bisulfite-treated plasma DNA of HCC patient TBR36. Figure 48B is a box plot of the methylation z-scores for regions with chromosomal gains and losses and for regions without copy number change for HCC patient TBR36.

Figure 49A shows a Circos plot demonstrating the CNA (inner ring) and methylation changes (outer ring) in the bisulfite-treated plasma DNA of HCC patient TBR34. Figure 49B is a box plot of the methylation z-scores for regions with chromosomal gains and losses and for regions without copy number change for HCC patient TBR34.

Figures 50A and 50B show the results of plasma hypomethylation and CNA analysis for SLE patients SLE04 and SLE10.

Figures 51A and 51B show methylation z-score analysis for regions with and without CNA in the plasma of two HCC patients (TBR34 and TBR36). Figures 51C and 51D show methylation z-score analysis for regions with and without CNA in the plasma of two SLE patients (SLE04 and SLE10).

Figure 52A shows hierarchical clustering analysis of plasma samples from HCC patients, non-HCC cancer patients, and healthy control subjects using the group A features of CNA, global methylation, and CpG island methylation. Figure 52B shows hierarchical clustering using the group B features of CNA, global methylation, and CpG island methylation.

Figure 53A shows hierarchical clustering analysis of plasma samples from HCC patients, non-HCC cancer patients, and healthy control subjects using the group A CpG island methylation features. Figure 53B shows hierarchical clustering analysis of plasma samples from HCC patients, non-HCC cancer patients, and healthy control subjects using the group A global methylation densities.

Figure 54A shows hierarchical clustering analysis of plasma samples from HCC patients, non-HCC cancer patients, and healthy control subjects using the group A global CNAs. Figure 54B shows hierarchical clustering analysis of plasma samples from HCC patients, non-HCC cancer patients, and healthy control subjects using the group B CpG island methylation densities.

Figure 55A shows hierarchical clustering analysis of plasma samples from HCC patients, non-HCC cancer patients, and healthy control subjects using the group B global methylation densities. Figure 55B shows hierarchical clustering analysis of plasma samples from HCC patients, non-HCC cancer patients, and healthy control subjects using the group B global methylation densities.

Figure 56 shows the mean methylation densities (red dots) of 1-Mb regions among 32 healthy subjects.

Detailed Description

Definitions

A "methylome" provides a measure of the amount of DNA methylation at a plurality of sites or loci in a genome. The methylome may correspond to all of the genome, a substantial part of the genome, or a relatively small portion of the genome. A "fetal methylome" corresponds to the methylome of the fetus of a pregnant female. The fetal methylome can be determined using a variety of fetal tissues or sources of fetal DNA, including placental tissue and cell-free fetal DNA in maternal plasma. A "tumor methylome" corresponds to the methylome of a tumor of an organism (e.g., a human). The tumor methylome can be determined using tumor tissue or cell-free tumor DNA in maternal plasma. The fetal methylome and the tumor methylome are examples of a methylome of interest. Other examples of methylomes of interest are the methylomes of organs (e.g., methylomes of brain cells, bone, lung, heart, muscle, kidney, etc.) that can contribute DNA into a bodily fluid (e.g., plasma, serum, sweat, saliva, urine, genital secretions, semen, stool fluid, diarrheal fluid, cerebrospinal fluid, secretions of the gastrointestinal tract, pancreatic secretions, intestinal secretions, sputum, tears, aspiration fluids from the breast and thyroid, etc.). The organs may be transplanted organs.

A "plasma methylome" is the methylome determined from the plasma or serum of an animal (e.g., a human). Because plasma and serum include cell-free DNA (DNA present outside cells, also called extracellular DNA), the plasma methylome is an example of a cell-free methylome. The plasma methylome is also an example of a mixed methylome, since it is a mixture of fetal/maternal methylomes or of tumor/patient methylomes. The "placental methylome" can be determined from a chorionic villus sample (CVS) or a placental tissue sample (e.g., obtained after delivery). The "cellular methylome" corresponds to the methylome determined from cells (e.g., blood cells) of the patient. The methylome of blood cells is called the blood cell methylome (or blood methylome).

A "site" corresponds to a single site, which may be a single base position or a group of correlated base positions, e.g., a CpG site. A "locus" may correspond to a region that includes multiple sites. A locus can also include just one site, which makes the locus equivalent to a site in that context.

The "methylation index" for each genomic site (e.g., a CpG site) refers to the proportion of sequence reads showing methylation at the site over the total number of reads covering that site. The "methylation density" of a region is the number of reads at sites within the region showing methylation divided by the total number of reads covering the sites in the region. The sites may have specific characteristics, e.g., being CpG sites. Thus, the "CpG methylation density" of a region is the number of reads showing CpG methylation divided by the total number of reads covering CpG sites in the region (e.g., a particular CpG site, CpG sites within a CpG island, or a larger region). For example, the methylation density for each 100-kb region in the human genome can be determined from the total number of cytosines not converted after bisulfite treatment (which correspond to methylated cytosines) at CpG sites as a proportion of all CpG sites covered by sequence reads mapped to the 100-kb region. This analysis can also be performed for other region sizes, e.g., 50 kb or 1 Mb. A region can be the entire genome, a chromosome, or part of a chromosome (e.g., a chromosomal arm). When a region includes only a single CpG site, the methylation density of the region is the same as the methylation index of that CpG site. The "proportion of methylated cytosines" refers to the number of cytosine sites, "C's", shown to be methylated (e.g., unconverted after bisulfite conversion) over the total number of analyzed cytosine residues in the region, i.e., including cytosines outside of the CpG context. The methylation index, methylation density, and proportion of methylated cytosines are examples of "methylation levels."
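The definitions above can be illustrated with a minimal sketch. The function names and the per-site count layout (methylated reads, total reads) are assumptions chosen for illustration and do not come from the specification.

```python
def methylation_index(methylated_reads, total_reads):
    """Methylation index of one site: methylated reads / reads covering the site."""
    if total_reads == 0:
        return None  # site not covered, index undefined
    return methylated_reads / total_reads


def methylation_density(site_counts):
    """Methylation density of a region.

    site_counts is a list of (methylated_reads, total_reads) tuples,
    one tuple per CpG site in the region (hypothetical layout).
    """
    methylated = sum(m for m, _ in site_counts)
    total = sum(t for _, t in site_counts)
    return methylated / total if total else None


# Three CpG sites in one hypothetical region.
region = [(8, 10), (1, 10), (5, 5)]
indices = [methylation_index(m, t) for m, t in region]
density = methylation_density(region)

# A region holding a single CpG site has a density equal to that site's index.
single_site_density = methylation_density([(8, 10)])
```

Note that the density pools reads across sites, so it is weighted by coverage and generally differs from the plain average of the per-site indices.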

A "methylation profile" (also called methylation status) includes information related to DNA methylation for a region. Information related to DNA methylation can include, but is not limited to, a methylation index of a CpG site, a methylation density of CpG sites in a region, a distribution of CpG sites over a contiguous region, a pattern or level of methylation for each individual CpG site within a region that contains more than one CpG site, and non-CpG methylation. A methylation profile of a substantial part of the genome can be considered equivalent to the methylome. "DNA methylation" in mammalian genomes typically refers to the addition of a methyl group to the 5' carbon of cytosine residues (i.e., 5-methylcytosine) in CpG dinucleotides. DNA methylation may also occur at cytosines in other contexts, for example CHG and CHH, where H is adenine, cytosine, or thymine. Cytosine methylation may also be in the form of 5-hydroxymethylcytosine. Non-cytosine methylation, such as N6-methyladenine, has also been reported.

A "tissue" corresponds to any cells. Different types of tissue may correspond to different types of cells (e.g., liver, lung, or blood), but may also correspond to tissue from different organisms (mother vs. fetus) or to healthy cells vs. tumor cells. A "biological sample" refers to any sample that is taken from a subject (e.g., a human, such as a pregnant woman, a person with cancer or suspected of having cancer, an organ transplant recipient, or a subject suspected of having a disease process involving an organ (e.g., the heart in myocardial infarction or the brain in stroke)) and that contains one or more nucleic acid molecules of interest. The biological sample can be a bodily fluid, such as blood, plasma, serum, urine, vaginal fluid, uterine or vaginal flushing fluid, pleural fluid, ascitic fluid, cerebrospinal fluid, saliva, sweat, tears, sputum, bronchoalveolar lavage fluid, etc. Stool samples can also be used.

The term "level of cancer" can refer to whether cancer exists, a stage of a cancer, a size of a tumor, whether there is metastasis, the total tumor burden of the body, and/or another measure of the severity of a cancer. The level of cancer can be a number or other characters. The level can be zero. The level of cancer also includes premalignant or precancerous conditions (states) associated with a mutation or a number of mutations. The level of cancer can be used in various ways. For example, screening can check whether cancer is present in someone who is not previously known to have cancer. Assessment can investigate someone who has been diagnosed with cancer in order to monitor the progress of the cancer over time, study the effectiveness of therapies, or determine the prognosis. In one embodiment, the prognosis can be expressed as the chance of a patient dying of cancer, the chance of the cancer progressing after a specific duration or time, or the chance of the cancer metastasizing. Detection can mean "screening" or can mean checking whether someone with suggestive features of cancer (e.g., symptoms or other positive tests) has cancer.

Detailed Description

Epigenetic mechanisms play an important role in embryonic and fetal development. However, human embryonic and fetal tissues (including placental tissues) are not readily accessible (U.S. Patent 6,927,028). Certain embodiments address this problem by analyzing a sample that contains the cell-free fetal DNA molecules present in the maternal circulation. The fetal methylome can be deduced in several ways. For example, the maternal plasma methylome can be compared to a cellular methylome (from blood cells of the mother), and the difference is shown to be correlated with the fetal methylome. As another example, fetal-specific alleles can be used to determine the methylation of the fetal methylome at specific loci. Additionally, the size of a fragment can be used as an indicator of the methylation percentage, as a correlation between size and methylation percentage is shown.

In one embodiment, genome-wide bisulfite sequencing is used to analyze the methylation profile (part or all of a methylome) of maternal plasma DNA at single-nucleotide resolution. By exploiting the polymorphic differences between the mother and the fetus, the fetal methylome can be assembled from maternal blood samples. In another embodiment, polymorphic differences are not used; instead, the difference between the plasma methylome and the blood cell methylome can be used.

In another embodiment, by exploiting single-nucleotide variations and/or copy number aberrations between a tumor genome and a non-tumor genome, together with sequencing data from plasma (or another sample), methylation profiling of a tumor can be performed in a sample from a patient suspected or known to have cancer. A difference in the methylation level in a plasma sample of a test individual, when compared with the plasma methylation level of a healthy control or a group of healthy controls, can allow the test individual to be identified as harboring cancer. Additionally, the methylation profile can act as a signature that reveals the type of cancer, for example, the organ from which it arose, that the person has developed, and whether metastasis has occurred.

Because this approach is non-invasive, the fetal and maternal plasma methylomes could be assessed serially from maternal blood samples collected in the first trimester, in the third trimester, and after delivery. Gestation-related changes were observed. The approach can also be applied to samples obtained during the second trimester. The fetal methylome deduced from maternal plasma during pregnancy resembled the placental methylome. Imprinted genes and differentially methylated regions were identified from the maternal plasma data.

Thus, an approach has been developed to study the fetal methylome non-invasively, serially, and comprehensively, offering the possibility of identifying biomarkers or directly testing for pregnancy-associated pathologies. Embodiments can also be used to study the tumor methylome non-invasively, serially, and comprehensively, for screening or detecting whether a subject has cancer, for monitoring malignant disease in a cancer patient, and for prognostication. Embodiments can be applied to any cancer type, including, but not limited to, lung cancer, breast cancer, colorectal cancer, prostate cancer, nasopharyngeal cancer, gastric cancer, testicular cancer, skin cancer (e.g., melanoma), cancers affecting the nervous system, bone cancer, ovarian cancer, liver cancer (e.g., hepatocellular carcinoma), hematologic malignancies, pancreatic cancer, endometrial cancer, kidney cancer, cervical cancer, bladder cancer, etc.

A description of how to determine a methylome or methylation profile is discussed first, followed by a description of different methylomes (such as fetal methylomes, a tumor methylome, methylomes of the mother or a patient, and a mixed methylome, e.g., from plasma). The determination of a fetal methylation profile is then described, using fetal-specific markers or by comparing a mixed methylation profile to a cellular methylation profile. Fetal methylation markers are determined by comparing methylation profiles. A relationship between size and methylation is discussed. Uses of methylation profiles to detect cancer are also provided.

I. Determination of a methylome

A myriad of approaches have been used to investigate the placental methylome, but each approach has its limitations. For example, sodium bisulfite, a chemical that modifies unmethylated cytosine residues to uracil and leaves methylated cytosines unchanged, converts differences in cytosine methylation into genetic sequence differences for further interrogation. The gold-standard method for studying cytosine methylation is based on treating tissue DNA with sodium bisulfite, followed by direct sequencing of individual clones of bisulfite-converted DNA molecules. After multiple clones of DNA molecules are analyzed, the cytosine methylation pattern and a quantitative profile for each CpG site can be obtained. However, cloned bisulfite sequencing is a low-throughput and labor-intensive procedure that cannot readily be applied on a genome-wide scale.

Methylation-sensitive restriction enzymes, which typically digest unmethylated DNA, provide a low-cost approach to studying DNA methylation. However, data generated from such studies are limited to loci that contain the enzyme recognition motifs, and the results are not quantitative. Immunoprecipitation of DNA bound by anti-methylated cytosine antibodies can be used to survey large segments of the genome, but it tends to be biased toward loci with dense methylation because the antibodies bind more strongly to such regions. Microarray-based approaches depend on the a priori design of the interrogation probes and on the hybridization efficiency between the probes and the target DNA.

To interrogate a methylome comprehensively, some embodiments use massively parallel sequencing (MPS) to provide genome-wide information and a quantitative assessment of the level of methylation on a per-nucleotide and per-allele basis. Recently, bisulfite conversion followed by genome-wide MPS has become feasible (R Lister et al. 2008 Cell; 133: 523-536).

Among the small number of published studies that applied genome-wide bisulfite sequencing to the investigation of human methylomes (R Lister et al. 2009 Nature; 462: 315-322; Laurent et al. 2010 Genome Res; 20: 320-331; Y Li et al. 2010 PLoS Biol; 8: e1000533; and M Kulis et al. 2012 Nat Genet; 44: 1236-1242), two studies focused on embryonic stem cells and fetal fibroblasts (R Lister et al. 2009 Nature; 462: 315-322; Laurent et al. 2010 Genome Res; 20: 320-331). Both studies analyzed cell line-derived DNA.

A. Genome-wide bisulfite sequencing

Certain embodiments can overcome the aforementioned challenges and enable the fetal methylome to be interrogated comprehensively, non-invasively, and serially. In one embodiment, genome-wide bisulfite sequencing was used to analyze the cell-free fetal DNA molecules found in the circulation of pregnant women. Despite the low abundance and fragmented nature of plasma DNA molecules, we were able to assemble a high-resolution fetal methylome from maternal plasma and to observe serial changes as the pregnancy progressed. Given the intense interest in non-invasive prenatal testing (NIPT), embodiments can provide a powerful new tool for fetal biomarker discovery or serve as a direct platform for achieving NIPT of fetal or pregnancy-associated diseases. Data from the genome-wide bisulfite sequencing of various samples, from which the fetal methylome can be derived, are now provided. In one embodiment, this technology can be applied to methylation profiling in pregnancies complicated by preeclampsia, intrauterine growth retardation, or preterm labor. For such complicated pregnancies, the technology can be used serially because of its non-invasive nature, allowing for monitoring and/or prognostication and/or assessment of the response to treatment.

FIG. 1A shows a table 100 of sequencing results for maternal blood, placenta, and maternal plasma according to embodiments of the present invention. In one embodiment, whole-genome sequencing was performed on bisulfite-converted DNA libraries, prepared using methylated DNA library adaptors (Illumina) (R Lister et al. 2008 Cell; 133: 523-536), of the following: blood cells of the blood sample collected in the first trimester, the chorionic villus sample (CVS), the placental tissue collected at delivery, and maternal plasma samples collected during the first and third trimesters and the postpartum period. Blood cell and plasma DNA samples obtained from one adult male and one adult non-pregnant female were also analyzed. A total of 9.5 billion pairs of raw sequence reads were generated in this study. The sequencing coverage of each sample is shown in table 100.

The sequence reads that were uniquely mappable to the human reference genome reached average haploid genomic coverages of 50-fold, 34-fold, and 28-fold for the first trimester, third trimester, and postpartum maternal plasma samples, respectively. The coverage of CpG sites in the genome ranged from 81% to 92% for the samples obtained from the pregnancy. The sequence reads that spanned CpG sites amounted to average haploid coverages of 33-fold per strand, 23-fold per strand, and 19-fold per strand for the first trimester, third trimester, and postpartum maternal plasma samples, respectively. The bisulfite conversion efficiencies for all samples were >99.9% (table 100).

In table 100, the ambiguous rate (marked "a") refers to the proportion of reads mapped to both the Watson and Crick strands of the reference human genome. The lambda conversion rate refers to the proportion of unmethylated cytosines in the internal lambda DNA control that were converted to "thymine" residues by bisulfite modification. H generically equates to A, C, or T. "a" refers to reads that could be mapped to a specific genomic locus but could not be assigned to the Watson or Crick strand. "b" refers to paired reads with identical start and end coordinates. For "c", lambda DNA was spiked into each sample before bisulfite conversion. The lambda conversion rate refers to the proportion of cytosine nucleotides that were converted to "thymine" after bisulfite conversion and is used as an indication of the rate of successful bisulfite conversion. "d" refers to the number of cytosine nucleotides present in the reference human genome that remained as cytosine in the sequence after bisulfite conversion.
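The lambda conversion rate described above can be computed with a short sketch. Because the spiked-in lambda DNA is unmethylated, every cytosine in it is expected to read as thymine after conversion. The function name, the example counts, and the use of 99.9% as a pass mark are illustrative assumptions; the text reports >99.9% as an observed result, not as a required cutoff.

```python
def lambda_conversion_rate(converted_c, unconverted_c):
    """Fraction of lambda-DNA cytosines read as thymine after bisulfite conversion."""
    total = converted_c + unconverted_c
    if total == 0:
        raise ValueError("no lambda cytosines were covered")
    return converted_c / total


# Hypothetical counts of cytosine positions observed in lambda reads.
rate = lambda_conversion_rate(converted_c=999_500, unconverted_c=500)

# Assumed quality check, mirroring the >99.9% efficiency reported for all samples.
passes_check = rate > 0.999
```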

During bisulfite modification, unmethylated cytosines are converted to uracils and subsequently to thymines after PCR amplification, while methylated cytosines remain intact (M Frommer et al. 1992 Proc Natl Acad Sci USA; 89: 1827-31). After sequencing and alignment, the methylation status of an individual CpG site can thus be inferred from the count of methylated sequence reads "M" (methylated) and the count of unmethylated sequence reads "U" (unmethylated) at the cytosine residue in the CpG context. Using the bisulfite sequencing data, the entire methylomes of maternal blood, placenta, and maternal plasma were constructed. The mean methylated CpG density (also called methylation density, MD) of a specific locus in the maternal plasma can be calculated using the following equation:

$$\mathrm{MD} = \frac{M}{M + U}$$

where M is the count of methylated reads and U is the count of unmethylated reads at the CpG sites within the locus. If there is more than one CpG site within a locus, then M and U correspond to the counts across those sites.
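As a minimal sketch of how M and U might be tallied, the snippet below counts the bases that reads report at the cytosine position of the CpG sites in one locus. It assumes reads aligned to the Watson strand, where an unconverted "C" indicates methylation and a "T" indicates an unmethylated cytosine; any other base call is ignored. The function names and the input layout are hypothetical.

```python
def count_m_u(observed_bases):
    """Tally methylated (C) and unmethylated (T) calls at CpG cytosines.

    observed_bases: bases seen at CpG cytosine positions in Watson-strand
    reads after bisulfite conversion and PCR (hypothetical input layout).
    """
    m = sum(1 for base in observed_bases if base == "C")
    u = sum(1 for base in observed_bases if base == "T")
    return m, u


def md_from_counts(m, u):
    """Methylation density MD = M / (M + U); None when no informative reads exist."""
    return m / (m + u) if (m + u) else None


# Eight calls pooled over the CpG sites of one locus; "N" is uninformative.
calls = list("CCTCTNCC")
m, u = count_m_u(calls)
md = md_from_counts(m, u)
```

Reads aligned to the Crick strand would need the complementary rule (G versus A at the paired position), which this sketch leaves out.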

B. Various techniques

As described above, methylation profiling can be performed using massively parallel sequencing (MPS) of bisulfite-converted plasma DNA. The MPS of the bisulfite-converted plasma DNA can be performed in a random or shotgun fashion. The depth of sequencing can be varied according to the size of the region of interest.

In another embodiment, the regions of interest in the bisulfite-converted plasma DNA can first be captured using a solution-phase or solid-phase hybridization-based process, followed by MPS. The massively parallel sequencing can be performed using a sequencing-by-synthesis platform such as Illumina, a sequencing-by-ligation platform such as the SOLiD platform from Life Technologies, a semiconductor-based sequencing system such as the Ion Torrent or Ion Proton platforms from Life Technologies, a single-molecule sequencing system such as the Helicos system or the Pacific Biosciences system, or a nanopore-based sequencing system. Nanopore-based sequencing includes the use of nanopores constructed from, for example, lipid bilayers and protein nanopores, as well as solid-state nanopores (such as graphene-based nanopores). Because selected single-molecule sequencing platforms allow the methylation status of DNA molecules (including N6-methyladenine, 5-methylcytosine, and 5-hydroxymethylcytosine) to be detected directly without bisulfite conversion (BA Flusberg et al. 2010 Nat Methods; 7: 461-465; J Shim et al. 2013 Sci Rep; 3: 1389. doi: 10.1038/srep01389), the use of such platforms would allow the methylation status of non-bisulfite-converted sample DNA (e.g., plasma DNA) to be analyzed.

Besides sequencing, other techniques can be used. In one embodiment, methylation profiling can be performed by methylation-specific PCR, by methylation-sensitive restriction enzyme digestion followed by PCR, or by ligase chain reaction followed by PCR. In other embodiments, the PCR is a form of single-molecule or digital PCR (B Vogelstein et al. 1999 Proc Natl Acad Sci USA; 96: 9236-9241). In other embodiments, the PCR can be real-time PCR. In other embodiments, the PCR can be multiplex PCR.

II. Analysis of methylomes

Some embodiments can determine the methylation profile of plasma DNA using whole-genome bisulfite sequencing. The methylation profile of a fetus can be determined by sequencing maternal plasma DNA samples, as described below. Thus, the fetal DNA molecules (and the fetal methylome) were accessed non-invasively during the pregnancy, and changes were monitored serially as the pregnancy progressed. Because the sequencing data are comprehensive, the maternal plasma methylomes could be studied on a genome-wide scale at single-nucleotide resolution.

Because the genomic coordinates of the sequenced reads were known, these data enabled the overall methylation levels of the methylome, or of any region of interest in the genome, to be studied and compared between different genetic elements. In addition, multiple sequence reads covered each CpG site or locus. A description of some of the metrics used to measure the methylome is now provided.

A. Methylation of plasma DNA molecules

DNA molecules are present in human plasma at low concentrations and in fragmented form, typically with lengths resembling mononucleosomal units (YMD Lo et al. 2010 Sci Transl Med; 2: 61ra91; and YW Zheng et al. 2012 Clin Chem; 58: 549-558). Despite these limitations, a genome-wide bisulfite sequencing pipeline was able to analyze the methylation of plasma DNA molecules. In other embodiments, because selected single-molecule sequencing platforms allow the methylation status of DNA molecules to be elucidated directly without bisulfite conversion (BA Flusberg et al. 2010 Nat Methods; 7: 461-465; J Shim et al. 2013 Sci Rep; 3: 1389. doi: 10.1038/srep01389), the use of such platforms would allow non-bisulfite-converted sample DNA to be used to determine the methylation levels of plasma DNA or to determine the plasma methylome. Such platforms can detect N6-methyladenine, 5-methylcytosine, and 5-hydroxymethylcytosine, which can provide improved results (e.g., improved sensitivity or specificity) in relation to the different biological functions of the different forms of methylation. Such improved results can be useful when embodiments are applied to the detection or monitoring of specific disorders, e.g., preeclampsia or a particular type of cancer.

Bisulfite sequencing can also discriminate between different forms of methylation. In one embodiment, additional steps that can distinguish 5-methylcytosine from 5-hydroxymethylcytosine can be included. One such approach is oxidative bisulfite sequencing (oxBS-seq), which can elucidate the locations of 5-methylcytosine and 5-hydroxymethylcytosine at single-base resolution (MJ Booth et al. 2012 Science; 336: 934-937; MJ Booth et al. 2013 Nature Protocols; 8: 1841-1851). In bisulfite sequencing, both 5-methylcytosine and 5-hydroxymethylcytosine are read as cytosines and thus cannot be discriminated. In oxBS-seq, on the other hand, specific oxidation of 5-hydroxymethylcytosine to 5-formylcytosine by treatment with potassium perruthenate (KRuO4), followed by conversion of the newly formed 5-formylcytosine to uracil by bisulfite conversion, allows 5-hydroxymethylcytosine to be distinguished from 5-methylcytosine. Hence, a readout of 5-methylcytosine can be obtained from a single oxBS-seq run, and 5-hydroxymethylcytosine levels are deduced by comparison with the bisulfite sequencing results. In another embodiment, 5-methylcytosine can be distinguished from 5-hydroxymethylcytosine using Tet-assisted bisulfite sequencing (TAB-seq) (M Yu et al. 2012 Nat Protoc; 7: 2159-2170). TAB-seq can identify 5-hydroxymethylcytosine at single-base resolution and determine its abundance at each modification site. This method involves β-glucosyltransferase-mediated protection (glucosylation) of 5-hydroxymethylcytosine and recombinant mouse Tet1 (mTet1)-mediated oxidation of 5-methylcytosine to 5-carboxylcytosine. After the subsequent bisulfite treatment and PCR amplification, both cytosine and 5-carboxylcytosine (derived from 5-methylcytosine) are converted to thymine (T), whereas 5-hydroxymethylcytosine is read as C.
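The comparison between bisulfite sequencing and oxBS-seq described above amounts to a subtraction at each site, sketched below. Standard bisulfite sequencing reports the combined fraction of 5-methylcytosine and 5-hydroxymethylcytosine, while oxBS-seq reports 5-methylcytosine alone. The function name is hypothetical, and clamping negative differences to zero is an assumption for handling sampling noise, not a step stated in the text.

```python
def split_5mc_5hmc(bs_level, oxbs_level):
    """Estimate 5mC and 5hmC levels at one site from paired measurements.

    bs_level:   fraction of reads read as C after bisulfite sequencing (5mC + 5hmC)
    oxbs_level: fraction of reads read as C after oxBS-seq (5mC only)
    """
    level_5mc = oxbs_level
    # Assumed handling: noise can make the difference slightly negative.
    level_5hmc = max(bs_level - oxbs_level, 0.0)
    return level_5mc, level_5hmc


# Hypothetical site: 80% C calls by bisulfite sequencing, 65% by oxBS-seq.
level_5mc, level_5hmc = split_5mc_5hmc(0.80, 0.65)
```

With TAB-seq no subtraction is needed for 5-hydroxymethylcytosine, because only that base is still read as C.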

FIG. 1B shows the methylation density in 1-Mb windows of the sequenced samples according to embodiments of the present invention. Plot 150 is a Circos plot depicting the methylation density in maternal plasma and genomic DNA in 1-Mb windows across the genome. From outside to inside: chromosome ideograms oriented pter-qter in a clockwise direction (centromeres shown in red), maternal blood (red), placenta (yellow), maternal plasma (green), shared reads in maternal plasma (blue), and fetal-specific reads in maternal plasma (purple). The overall CpG methylation levels (i.e., density levels) of maternal blood cells, placenta, and maternal plasma can be found in table 100. Across the whole genome, the methylation level of maternal blood cells is in general higher than that of the placenta.
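The per-window densities plotted in FIG. 1B can be approximated by pooling the methylated and unmethylated read counts of all CpG sites that fall in the same 1-Mb window. The following is a sketch only: the tuple layout, the 0-based coordinates, and the function name are assumptions made for illustration.

```python
from collections import defaultdict

WINDOW_SIZE = 1_000_000  # 1-Mb windows, as in FIG. 1B


def window_densities(sites, window=WINDOW_SIZE):
    """Pool per-site counts into fixed windows and return MD = M / (M + U) per window.

    sites: iterable of (chrom, position, methylated_reads, unmethylated_reads),
    with 0-based positions (hypothetical layout).
    """
    totals = defaultdict(lambda: [0, 0])
    for chrom, pos, m, u in sites:
        key = (chrom, pos // window)  # window index along the chromosome
        totals[key][0] += m
        totals[key][1] += u
    # Windows without any informative read are left out.
    return {key: m / (m + u) for key, (m, u) in totals.items() if m + u > 0}


sites = [
    ("chr1", 10_500, 8, 2),
    ("chr1", 900_000, 2, 8),
    ("chr1", 1_200_000, 9, 1),
    ("chr2", 50, 0, 0),  # covered by no informative read
]
densities = window_densities(sites)
```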

B. Comparison of bisulfite sequencing to other techniques

The placental methylome was studied using massively parallel bisulfite sequencing. In addition, the placental methylome was studied using an oligonucleotide array platform (Illumina) covering about 480,000 CpG sites in the human genome (M Kulis et al. 2012 Nat Genet; 44: 1236-1242; and C Clark et al. 2012 PLoS One; 7: e50233). In one embodiment using beadchip-based genotyping and methylation analysis, genotyping was performed using the Illumina HumanOmni2.5-8 genotyping array according to the manufacturer's protocol. Genotypes were called using the GenCall algorithm of the Genome Studio Software (Illumina). The call rates were over 99%. For the microarray-based methylation analysis, genomic DNA (500-800 ng) was treated with sodium bisulfite using the Zymo EZ DNA Methylation Kit (Zymo Research, Orange, CA, USA) according to the manufacturer's recommendations for the Illumina Infinium Methylation Assay.

The methylation assay was performed on 4 μl of bisulfite-converted genomic DNA at 50 ng/μl according to the Infinium HD Methylation Assay protocol. The hybridized beadchip was scanned on an Illumina iScan instrument. DNA methylation data were analyzed with the GenomeStudio (v2011.1) Methylation Module (v1.9.0) software, with normalization to internal controls and background subtraction. The methylation index of an individual CpG site is represented by a beta value (β), which is calculated from the ratio of fluorescence intensities between the methylated and unmethylated alleles:

$$\beta = \frac{I_{\text{methylated}}}{I_{\text{methylated}} + I_{\text{unmethylated}}}$$

where $I_{\text{methylated}}$ and $I_{\text{unmethylated}}$ are the fluorescence intensities of the methylated and unmethylated probes covering the same CpG site.

For CpG sites that were represented on the array and sequenced to a coverage of at least 10-fold, the beta value obtained by the array was compared with the methylation index determined by sequencing of the same site. The beta value represents the intensity of the methylated probe as a proportion of the combined intensity of the methylated and unmethylated probes covering the same CpG site. The methylation index for each CpG site refers to the proportion of methylated reads over the total number of reads covering that CpG.
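The site selection and the two quantities being compared can be sketched as follows. The beta value follows the definition in the preceding paragraph; the optional offset parameter is an assumption (array software may add a small constant to the denominator to stabilize low-intensity probes) and defaults to zero so that the function matches the text. All names and example numbers are hypothetical.

```python
MIN_COVERAGE = 10  # sites must be sequenced to at least 10-fold coverage


def beta_value(meth_intensity, unmeth_intensity, offset=0.0):
    """Methylated probe intensity as a share of the combined probe intensity."""
    denominator = meth_intensity + unmeth_intensity + offset
    if denominator <= 0:
        raise ValueError("combined probe intensity must be positive")
    return meth_intensity / denominator


def paired_values(array_sites, sequenced_sites, min_coverage=MIN_COVERAGE):
    """Pair array beta values with sequencing methylation indices per shared site.

    array_sites:     {site_id: (meth_intensity, unmeth_intensity)}
    sequenced_sites: {site_id: (methylated_reads, total_reads)}
    """
    pairs = {}
    for site_id, (meth_i, unmeth_i) in array_sites.items():
        methylated, total = sequenced_sites.get(site_id, (0, 0))
        if total >= min_coverage:
            pairs[site_id] = (beta_value(meth_i, unmeth_i), methylated / total)
    return pairs


array_sites = {"cg_a": (3000.0, 1000.0), "cg_b": (500.0, 4500.0)}
sequenced_sites = {"cg_a": (15, 20), "cg_b": (1, 8)}  # cg_b is below 10-fold
pairs = paired_values(array_sites, sequenced_sites)
```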

Figures 2A-2C show plots of the β values determined by the Illumina Infinium HumanMethylation 450K beadchip array against the methylation indices determined by genome-wide bisulfite sequencing of the corresponding CpG sites interrogated by both platforms: (A) maternal blood cells, (B) chorionic villus sample (CVS), (C) term placental tissue. The data from the two platforms were highly concordant: the Pearson correlation coefficients were 0.972, 0.939 and 0.954, and the R² values were 0.945, 0.882 and 0.910, for the maternal blood cells, CVS and term placental tissue, respectively.

The sequencing data were further compared with those reported by Chu et al., who studied the methylation profiles of 12 pairs of CVS and maternal blood cell DNA samples using an oligonucleotide array covering about 27,000 CpG sites (T Chu et al. 2011 PLoS One; 6:e14723). Correlation analysis between the sequencing results for the CVS and maternal blood cell DNA and each of the 12 pairs of samples in the previous study gave an average Pearson coefficient of 0.967 and an average R² of 0.935 for maternal blood, and an average Pearson coefficient of 0.943 and an average R² of 0.888 for the CVS. Among the CpG sites represented on both arrays, the data correlated highly with the published data. The rates of non-CpG methylation were <1% for the maternal blood cells, CVS and placental tissues (Table 100). These results are consistent with current reports that substantial non-CpG methylation is largely restricted to pluripotent cells (Lister et al. 2009 Nature; 462:315-322; Laurent et al. 2010 Genome Res; 20:320-331).

C. Comparison of plasma and blood methylomes of non-pregnant individuals

Figures 3A and 3B show bar charts of the percentage of methylated CpG sites in plasma and blood cells collected from an adult male and a non-pregnant adult female: (A) autosomes, (B) chromosome X. The charts show the similarity between the plasma and blood methylomes of the male and the non-pregnant female. The overall proportions of methylated CpG sites in the male and non-pregnant female plasma samples were almost the same as in the corresponding blood cell DNA (Table 100 and Figures 2A and 2B).

The correlation between the methylation profiles of the plasma and blood cell samples was then studied in a locus-specific manner. The methylation density of each 100-kb bin in the human genome was determined as the total number of unconverted cytosines at CpG sites, expressed as a proportion of all CpG sites covered by sequence reads mapped to that 100-kb region. The methylation densities were highly concordant between each plasma sample and the corresponding blood cell DNA, for both the male and the female samples.
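The per-bin methylation density described above can be sketched as follows, assuming the bisulfite reads have already been reduced to per-CpG counts of unconverted (methylated) and converted (unmethylated) cytosines. The tuple layout and function name are hypothetical.

```python
from collections import defaultdict

BIN_SIZE = 100_000  # 100-kb bins, as in the text


def methylation_density_by_bin(cpg_calls, bin_size=BIN_SIZE):
    """Return {(chrom, bin_index): methylation density}.

    cpg_calls: iterable of (chrom, position, methylated_reads, unmethylated_reads),
    one entry per CpG site (hypothetical input layout).
    """
    sums = defaultdict(lambda: [0, 0])  # bin -> [unconverted C count, all CpG observations]
    for chrom, pos, meth, unmeth in cpg_calls:
        key = (chrom, pos // bin_size)
        sums[key][0] += meth
        sums[key][1] += meth + unmeth
    # Bins without any covered CpG site are omitted rather than reported as 0.
    return {key: m / t for key, (m, t) in sums.items() if t > 0}
```

The same function can be reused with `bin_size=1_000_000` for analyses that use 1-Mb bins.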

Figures 4A and 4B show plots of the methylation densities of corresponding loci in blood cell DNA and plasma DNA: (A) non-pregnant adult female, (B) adult male. The Pearson correlation coefficient and R² value were 0.963 and 0.927, respectively, for the non-pregnant female samples, and 0.953 and 0.908, respectively, for the male samples. These data are consistent with previous findings based on genotyping of plasma DNA molecules from recipients of allogeneic hematopoietic stem cell transplantation, which showed that hematopoietic cells are the predominant source of DNA in human plasma (YW Zheng et al. 2012 Clin Chem; 58:549-558).
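For a simple two-variable comparison such as the one above, R² is the square of the Pearson coefficient; the reported pairs are consistent with this (0.963² ≈ 0.927 and 0.953² ≈ 0.908). A minimal, dependency-free sketch of the calculation follows. The function names are illustrative, and the inputs would be the per-bin densities of the two samples in matching order.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def r_squared(xs, ys):
    """Coefficient of determination for a simple linear fit of ys on xs."""
    return pearson_r(xs, ys) ** 2
```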

D. Methylation levels across the methylomes

The DNA methylation levels of maternal plasma DNA, maternal blood cells and placental tissue were studied next. The levels were determined for repeat regions, for non-repeat regions, and overall.

Figures 5A and 5B show bar charts of the percentage of methylated CpG sites among the samples collected from the pregnancy: (A) autosomes, (B) chromosome X. The overall proportions of methylated CpGs were 67.0% and 68.2% for the early-pregnancy and late-pregnancy maternal plasma samples, respectively. Unlike the results obtained from the non-pregnant individuals, these proportions were lower than that of the early-pregnancy maternal blood cell sample but higher than those of the CVS and term placental tissue samples (Table 100). Notably, the percentage of methylated CpGs in the postpartum maternal plasma sample was 73.1%, similar to the blood cell data (Table 100). These trends were observed for CpGs distributed over all autosomes as well as chromosome X, and across both the non-repeat regions and multiple classes of repeat elements of the human genome.

Both the repeat and the non-repeat elements in the placenta were found to be hypomethylated relative to the maternal blood cells. The results are consistent with findings in the literature that the placenta is hypomethylated relative to other tissues, including peripheral blood cells.

In the blood cell DNA from the pregnant woman, the non-pregnant woman and the adult male, 71% to 72% of the sequenced CpG sites were methylated (Table 100 of Figure 1). These data are comparable with the 68.4% of CpG sites reported for blood mononuclear cells (Li et al. 2010 PLoS Biol; 8:e1000533). Consistent with previous reports on the hypomethylated nature of placental tissues, 55% and 59% of the CpG sites were methylated in the CVS and term placental tissue, respectively (Table 100).

Figure 6 shows a bar chart of the methylation levels of different repeat classes of the human genome for maternal blood, placenta and maternal plasma. The repeat classes are as defined by the UCSC genome browser. The data shown are from the early-pregnancy samples. Earlier reports observed the hypomethylated nature of placental tissues mainly in certain repeat classes of the genome (B Novakovic et al. 2012 Placenta; 33:959-970). In contrast, the data here show that the placenta is in fact hypomethylated in most classes of repeat elements relative to the genome of blood cells.

E. Similarity of the methylomes

Embodiments can determine the methylomes of placental tissues, blood cells and plasma using the same platform. Hence, the methylomes of those biological sample types can be compared directly. The high level of resemblance between the methylomes of the blood cells and plasma of the male and the non-pregnant female, as well as between the maternal blood cells and the postpartum maternal plasma sample, further affirms that hematopoietic cells are the main source of DNA in human plasma (YW Zheng et al. 2012 Clin Chem; 58:549-558).

The resemblance is evident both from the overall proportion of methylated CpGs in the genome and from the high correlation of methylation densities between corresponding loci in the blood cell DNA and plasma DNA. However, the overall proportions of methylated CpGs in the early-pregnancy and late-pregnancy maternal plasma samples were lower than in the maternal blood cell data or the postpartum maternal plasma sample. The reduced methylation levels during pregnancy are due to the hypomethylated nature of the fetal DNA molecules present in maternal plasma.

The reversal of the methylation profile in the postpartum maternal plasma sample, which became more similar to that of the maternal blood cells, suggests that the fetal DNA molecules had been removed from the maternal circulation. Calculation of the fetal DNA concentration based on SNP markers of the fetus indeed showed that the concentration changed from 33.9% before delivery to just 4.5% in the postpartum sample.

F. Other applications

Embodiments have successfully assembled DNA methylomes through the MPS analysis of plasma DNA. The ability to determine the placental or fetal methylome from maternal plasma provides a noninvasive method to determine, detect and monitor aberrant methylation profiles associated with pregnancy-associated conditions such as preeclampsia, intrauterine growth restriction and preterm labor. For example, the detection of a disease-specific aberrant methylation signature allows the screening, diagnosis and monitoring of such pregnancy-associated conditions. The measurement of the maternal plasma methylation level likewise allows the screening, diagnosis and monitoring of such conditions. Besides the direct application to the investigation of pregnancy-associated conditions, the approach can be applied to other areas of medicine where plasma DNA analysis is of interest. For example, the methylomes of cancers can be determined from the plasma DNA of cancer patients. As described herein, cancer methylomic analysis from plasma is potentially a technology synergistic with cancer genomic analysis from plasma (KCA Chan et al. 2013 Clin Chem; 59:211-224; and RJ Leary et al. 2012 Sci Transl Med; 4:162ra154).

For example, the determination of the methylation level of a plasma sample can be used to screen for cancer. When the methylation level of the plasma sample is aberrant compared with healthy controls, cancer may be suspected. The type of cancer or its tissue of origin can then be further confirmed and assessed by determining the plasma methylation profile at different genomic loci, or by plasma genomic analysis to detect tumor-associated copy number aberrations, chromosomal translocations and single nucleotide variants. Indeed, in one embodiment of this invention, the plasma cancer methylomic and genomic profiling can be carried out simultaneously. Alternatively, radiological and imaging investigations (e.g. computed tomography, magnetic resonance imaging, positron emission tomography) or endoscopy (e.g. upper gastrointestinal endoscopy or colonoscopy) can be used to further investigate individuals who are suspected of having cancer on the basis of the plasma methylation level analysis.
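One simple way to express "aberrant compared with healthy controls" is a z-score against a reference group. The sketch below is illustrative only: the passage above does not specify a statistic, and the function names and the cutoff of 3 standard deviations are assumptions chosen for the example.

```python
import statistics


def methylation_z_score(test_level, control_levels):
    """Number of control standard deviations by which a test sample deviates."""
    mean = statistics.mean(control_levels)
    sd = statistics.stdev(control_levels)  # sample standard deviation
    if sd == 0:
        raise ValueError("controls show no variation; z-score undefined")
    return (test_level - mean) / sd


def is_aberrant(test_level, control_levels, cutoff=3.0):
    """Flag a sample whose level deviates by more than `cutoff` SDs (assumed cutoff)."""
    return abs(methylation_z_score(test_level, control_levels)) > cutoff
```

With hypothetical control levels of 70% to 74%, a plasma sample at 64% would be flagged for follow-up, whereas one at 72% would not.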

For cancer screening or detection, the determination of the methylation level of a plasma (or other biological) sample can be used in conjunction with other modalities for cancer screening or detection, such as prostate-specific antigen measurement (e.g. for prostate cancer), carcinoembryonic antigen (e.g. for colorectal carcinoma, gastric carcinoma, pancreatic carcinoma, lung carcinoma, breast carcinoma, medullary thyroid carcinoma), alpha-fetoprotein (e.g. for liver cancer or germ cell tumors), CA125 (e.g. for ovarian and breast cancer) and CA19-9 (e.g. for pancreatic carcinoma).

Additionally, other tissues may be sequenced to obtain a cellular methylome. For example, liver tissue can be analyzed to determine a methylation pattern specific to the liver, which may be used to identify liver pathologies. Other tissues that can also be analyzed include brain cells, bone, lung, heart, muscle and kidney. The methylation profiles of various tissues may change over time, e.g. as a result of development, aging, disease processes (e.g. inflammation, cirrhosis, or autoimmune processes such as in systemic lupus erythematosus) or treatment (e.g. treatment with demethylating agents such as 5-azacytidine and 5-azadeoxycytidine). The dynamic nature of DNA methylation makes such analysis potentially very valuable for monitoring physiological and pathological processes. For example, if a change is detected in the plasma methylome of an individual compared with a baseline value obtained when the individual was healthy, disease processes in organs that contribute DNA to plasma can then be detected.

Also, the methylome of a transplanted organ can be determined from the plasma DNA of the organ transplant recipient. As described in this invention, transplant methylomic analysis from plasma is potentially a technology synergistic with transplant genomic analysis from plasma (YW Zheng et al. 2012; YMD Lo et al. 1998 Lancet; 351:1329-1330; and TM Snyder et al. 2011 Proc Natl Acad Sci USA; 108:6229-6234). Because plasma DNA is generally regarded as a marker of cell death, an increase in the plasma level of DNA released from a transplanted organ can be used as a marker of increased cell death in that organ, such as a rejection episode or another pathological process involving the organ (e.g. infection or abscess). When anti-rejection therapy is successfully instituted, the plasma level of DNA released by the transplanted organ would be expected to decrease.

III. Determining the fetal or tumor methylome using SNPs

As described above, the plasma methylome corresponds to the blood methylome for a non-pregnant normal person. For a pregnant female, however, the two methylomes differ. Fetal DNA molecules circulate in maternal plasma among a majority background of maternal DNA (YMD Lo et al. 1998 Am J Hum Genet; 62:768-775). Thus, for a pregnant female, the plasma methylome is largely a composite of the placental methylome and the blood methylome. Accordingly, the placental methylome can be extracted from plasma.

In one embodiment, single nucleotide polymorphism (SNP) differences between the mother and the fetus are used to identify the fetal DNA molecules in maternal plasma. The aim is to identify SNP loci at which the mother is homozygous but the fetus is heterozygous; the fetal-specific allele can then be used to determine which DNA fragments are from the fetus. Genomic DNA from the maternal blood cells was analyzed using a SNP genotyping array, the Illumina HumanOmni2.5-8. Conversely, for SNP loci at which the mother is heterozygous and the fetus is homozygous, the SNP allele specific to the mother can be used to determine which plasma DNA fragments are from the mother. The methylation level of such DNA fragments would reflect the methylation level of the related genomic regions in the mother.

A. Correlation of the methylation of fetal-specific reads with the placental methylome

Loci having two different alleles, where the amount of one allele (B) is significantly less than that of the other allele (A), are identified from the sequencing results of a biological sample. Reads covering the B allele are regarded as fetal-specific (fetal-specific reads). The mother is determined to be homozygous for A and the fetus heterozygous A/B, so reads covering the A allele are shared by the mother and the fetus (shared reads).

In one pregnant case analyzed to illustrate several of the concepts in this invention, the pregnant mother was found to be homozygous at 1,945,516 loci on the autosomes. The maternal plasma DNA sequencing reads covering these SNPs were inspected. Reads carrying a non-maternal allele were detected at 107,750 loci, and these loci were considered informative. At each informative SNP, the allele that was not from the mother was termed the fetal-specific allele, and the other allele was termed the shared allele.
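The selection of informative loci described above can be sketched as follows, assuming maternal genotypes from the array and per-locus allele counts from the plasma reads. The data layout, the function name, the `min_reads` parameter and the requirement of exactly one non-maternal allele (a simple guard against sequencing errors) are assumptions for illustration.

```python
def find_informative_loci(maternal_genotypes, plasma_allele_counts, min_reads=1):
    """Return {locus: {"shared": allele, "fetal_specific": allele}}.

    maternal_genotypes: {locus: (allele1, allele2)} from maternal blood cells.
    plasma_allele_counts: {locus: {allele: read_count}} from maternal plasma.
    """
    informative = {}
    for locus, (a1, a2) in maternal_genotypes.items():
        if a1 != a2:
            continue  # the mother must be homozygous at the locus
        counts = plasma_allele_counts.get(locus, {})
        non_maternal = [al for al, n in counts.items() if al != a1 and n >= min_reads]
        if len(non_maternal) == 1:  # assumption: exactly one non-maternal allele
            informative[locus] = {"shared": a1, "fetal_specific": non_maternal[0]}
    return informative
```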

The fractional fetal/tumor DNA concentration (also called the fetal DNA percentage) in the maternal plasma can be determined. In one embodiment, the fractional fetal DNA concentration in the maternal plasma, f, is determined by the equation:

\[ f = \frac{2p}{p + q} \times 100\% \]

where p is the number of sequenced reads of the fetal-specific allele and q is the number of sequenced reads of the allele shared between the mother and the fetus (YMD Lo et al. 2010 Sci Transl Med; 2:61ra91). The fetal DNA proportions in the early-pregnancy, late-pregnancy and postpartum maternal plasma samples were found to be 14.4%, 33.9% and 4.5%, respectively. The fetal DNA proportions were also calculated using the number of reads aligned to chromosome Y. Based on the chromosome Y data, the results were 14.2%, 34.9% and 3.7% in the early-pregnancy, late-pregnancy and postpartum maternal plasma samples, respectively.
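A minimal sketch of this calculation follows, assuming the formula of the cited Lo et al. 2010 paper, f = 2p / (p + q). The factor of 2 reflects that the fetus is heterozygous at an informative locus, so only about half of the fetal molecules carry the fetal-specific allele. The function name is hypothetical, and p and q are read counts pooled over all informative loci.

```python
def fetal_fraction(p, q):
    """Fractional fetal DNA concentration from pooled allele read counts.

    p: reads carrying the fetal-specific allele
    q: reads carrying the allele shared by mother and fetus
    """
    if p < 0 or q < 0 or p + q == 0:
        raise ValueError("read counts must be non-negative and not both zero")
    return 2 * p / (p + q)
```

For example, 10 fetal-specific reads and 90 shared reads give a fraction of 0.2, i.e. 20%.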

By analyzing the fetal-specific and the shared sequence reads separately, embodiments demonstrate that the circulating fetal DNA molecules are much more hypomethylated than the background DNA molecules. For both early and late pregnancy, comparison of the methylation densities of corresponding loci in the fetal-specific maternal plasma reads with the placental tissue data revealed a high level of correlation. These data provide genome-level evidence that the placenta is the predominant source of fetal-derived DNA molecules in maternal plasma, and represent a major step forward compared with previous evidence based on information derived from selected individual loci.

The methylation density of each 1-Mb region in the genome was determined using either the fetal-specific or the shared reads that covered CpG sites adjacent to the informative SNPs. The fetal and non-fetal-specific methylomes assembled from the maternal plasma sequence reads can be displayed, for example, in Circos plots (M Krzywinski et al. 2009 Genome Res; 19:1639-1645). The methylation densities per 1-Mb bin were also determined for the maternal blood cell and placental tissue samples.

Figure 7A shows a Circos plot 700 for the early-pregnancy samples. Figure 7B shows a Circos plot 750 for the late-pregnancy samples. Plots 700 and 750 show the methylation density per 1-Mb bin. The chromosome ideograms (outermost ring) are oriented pter-qter in a clockwise direction (centromeres are shown in red). The second outermost track shows the number of CpG sites in the corresponding 1-Mb regions. The scale of the red bars shown is up to 20,000 sites per 1-Mb bin. The methylation densities of the corresponding 1-Mb regions are shown in the other tracks according to the color scheme shown in the center.

For the early-pregnancy samples (Figure 7A), from inside to outside, the tracks correspond to: the chorionic villus sample, fetal-specific reads in maternal plasma, maternal-specific reads in maternal plasma, combined fetal and non-fetal reads in maternal plasma, and maternal blood cells. For the late-pregnancy samples (Figure 7B), the tracks correspond to: term placental tissue, fetal-specific reads in maternal plasma, maternal-specific reads in maternal plasma, combined fetal and non-fetal reads in maternal plasma, postpartum maternal plasma, and maternal blood cells (from the early-pregnancy blood sample). It can be appreciated that, for both the early-pregnancy and late-pregnancy plasma samples, the fetal methylomes were more hypomethylated than the non-fetal-specific methylomes.

The overall methylation profile of the fetal methylomes more closely resembled that of the CVS or placental tissue samples. In contrast, the DNA methylation profile of the shared reads in plasma, which were predominantly maternal DNA, more closely resembled that of the maternal blood cells. A systematic locus-by-locus comparison of the methylation densities of the maternal plasma DNA reads and the maternal or fetal tissues was then performed. We determined the methylation densities of CpG sites that were present on the same sequence read as an informative SNP and were covered by at least five maternal plasma DNA sequence reads.

Figures 8A-8D show plots comparing the methylation densities of genomic tissue DNA against maternal plasma DNA for CpG sites surrounding the informative single nucleotide polymorphisms. Figure 8A shows the methylation densities of fetal-specific reads in the early-pregnancy maternal plasma sample relative to the methylation densities of reads in the CVS sample. As can be seen, the fetal-specific values correspond well to the CVS values.

Figure 8B shows the methylation densities of fetal-specific reads in the late-pregnancy maternal plasma sample relative to the methylation densities of reads in the term placental tissue. Again, the two sets of methylation densities correspond well, indicating that the fetal methylation profile can be obtained by analyzing reads carrying fetal-specific alleles.

Figure 8C shows the methylation densities of shared reads in the early-pregnancy maternal plasma sample relative to the methylation densities of reads in the maternal blood cells. Given that most of the shared reads are from the mother, the two sets of values correspond well. Figure 8D shows the methylation densities of shared reads in the late-pregnancy maternal plasma sample relative to the methylation densities of reads in the maternal blood cells.

For the fetal-specific reads in maternal plasma, the Spearman correlation coefficient between the early-pregnancy maternal plasma and the CVS was 0.705 (P < 2.2 × 10^-16), and that between the late-pregnancy maternal plasma and the term placental tissue was 0.796 (P < 2.2 × 10^-16) (Figures 8A and 8B). A similar comparison was performed between the shared reads in maternal plasma and the maternal blood cell data. The Pearson correlation coefficient was 0.653 for the early-pregnancy plasma sample (P < 2.2 × 10^-16) and 0.638 for the late-pregnancy plasma sample (P < 2.2 × 10^-16) (Figures 8C and 8D).
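The Spearman coefficient used for the fetal-specific comparison is the Pearson coefficient computed on ranks, with tied values sharing their average rank. A minimal dependency-free sketch follows; the function names are illustrative, and no P value is computed here.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Pearson correlation coefficient (assumes non-constant inputs)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(xs, ys):
    """Spearman rank correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson_r(average_ranks(xs), average_ranks(ys))
```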

B. Fetal methylome

In one embodiment, to assemble the fetal methylome from maternal plasma, sequence reads that spanned at least one informative fetal SNP site and contained at least one CpG site within the same read were sorted. Reads that showed the fetal-specific allele were included in the assembly of the fetal methylome. Reads that showed the shared allele, i.e. the non-fetal-specific allele, were included in the assembly of the non-fetal-specific methylome, which consisted predominantly of maternally derived DNA molecules.
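The sorting step above can be sketched as follows. The read representation (a dict holding the alleles observed at SNP loci and the per-CpG methylation calls on the same read), the function name, and the rule that a read carrying any fetal-specific allele is treated as fetal are assumptions for illustration.

```python
from collections import defaultdict


def assemble_methylomes(reads, informative):
    """Split per-CpG counts into fetal-specific and shared (non-fetal-specific) sets.

    reads: iterable of {"snp_alleles": {locus: allele},
                        "cpg_calls": {(chrom, pos): is_methylated}}
    informative: {locus: {"shared": allele, "fetal_specific": allele}}
    Returns two dicts mapping CpG site -> [methylated_reads, total_reads].
    """
    fetal = defaultdict(lambda: [0, 0])
    shared = defaultdict(lambda: [0, 0])
    for read in reads:
        if not read["cpg_calls"]:
            continue  # the read must contain at least one CpG site
        labels = set()
        for locus, allele in read["snp_alleles"].items():
            info = informative.get(locus)
            if info is None:
                continue  # not an informative SNP
            if allele == info["fetal_specific"]:
                labels.add("fetal")
            elif allele == info["shared"]:
                labels.add("shared")
        if not labels:
            continue  # the read spans no informative SNP
        target = fetal if "fetal" in labels else shared
        for site, is_methylated in read["cpg_calls"].items():
            target[site][0] += int(is_methylated)
            target[site][1] += 1
    return dict(fetal), dict(shared)
```

The per-site counts can then be filtered by coverage and converted to methylation densities.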

For the early-pregnancy maternal plasma sample, the fetal-specific reads covered 218,010 CpG sites on the autosomes. The corresponding figures for the late-pregnancy and postpartum maternal plasma samples were 263,611 and 74,020, respectively. On average, the shared reads covered those CpG sites 33.3, 21.7 and 26.3 times, respectively. The fetal-specific reads covered those CpG sites 3.0, 4.4 and 1.8 times, respectively, for the early-pregnancy, late-pregnancy and postpartum maternal plasma samples.

Fetal DNA represents a minor population in maternal plasma, and the coverage of those CpG sites by fetal-specific reads was therefore proportional to the fetal DNA percentage of the sample. For the early-pregnancy maternal plasma sample, the overall percentage of methylated CpGs was 47.0% among the fetal reads and 68.1% among the shared reads. For the late-pregnancy maternal plasma sample, the percentage of methylated CpGs was 53.3% among the fetal reads and 68.8% among the shared reads. These data show that the fetal-specific reads in maternal plasma were more hypomethylated than the shared reads in maternal plasma.

C. Method

The techniques described above can also be used to determine a tumor methylation profile. Methods for determining fetal and tumor methylation profiles are now described.

Figure 9 is a flowchart illustrating a method 900 for determining a first methylation profile from a biological sample of an organism according to embodiments of the present invention. Method 900 can construct an epigenetic map of the fetus from the methylation profile of maternal plasma. The biological sample includes cell-free DNA comprising a mixture of cell-free DNA originating from a first tissue and from a second tissue. As examples, the first tissue can be from a fetus, a tumor, or a transplanted organ.

At block 910, a plurality of DNA molecules from the biological sample are analyzed. The analysis of a DNA molecule can include determining a location of the DNA molecule in a genome of the organism, determining a genotype of the DNA molecule, and determining whether the DNA molecule is methylated at one or more sites.

In one embodiment, the DNA molecules are analyzed using sequence reads of the DNA molecules, where the sequencing is methylation-aware. Thus, the sequence reads include the methylation status of the DNA molecules from the biological sample. The methylation status can include whether a particular cytosine residue is 5-methylcytosine or 5-hydroxymethylcytosine. The sequence reads can be obtained from various sequencing techniques, PCR techniques, arrays, and other techniques suitable for identifying the sequences of fragments. The methylation status of sites of a sequence read can be obtained as described herein.

At block 920, a plurality of first loci are identified at which a first genome of the first tissue is heterozygous for a respective first allele and a respective second allele, and a second genome of the second tissue is homozygous for the respective first allele. For example, fetal-specific reads may be identified at the plurality of first loci. Or, tumor-specific reads may be identified at the plurality of first loci. The tissue-specific reads can be identified from sequencing reads where the percentage of sequence reads of the second allele falls within a particular range, e.g. about 3%-25%, thereby indicating that a minority population of DNA fragments is from a genome heterozygous at the locus and a majority population is from a genome homozygous at the locus.
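The allele-fraction screen of block 920 can be sketched as follows. The bounds use the example range of about 3%-25% given above; the function name, the inclusive bounds and the handling of zero coverage are assumptions.

```python
def is_first_locus_candidate(first_allele_reads, second_allele_reads,
                             low=0.03, high=0.25):
    """True if the second allele's read fraction falls within [low, high].

    A fraction in this range suggests a minority population of molecules from
    a heterozygous genome mixed with a majority from a homozygous genome.
    """
    total = first_allele_reads + second_allele_reads
    if total == 0:
        return False  # no coverage at the locus, so no call is made
    fraction = second_allele_reads / total
    return low <= fraction <= high
```

A locus at about 50% would be consistent with heterozygosity in the majority genome, and a locus well below the lower bound may reflect sequencing error rather than a minority genome.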

At block 930, DNA molecules located at one or more sites of each of the first loci are analyzed. A number of DNA molecules that are methylated at a site and that correspond to the respective second allele of the locus is determined. There may be more than one site per locus. For example, a SNP might indicate that a fragment is fetal-specific, and that fragment may have multiple sites whose methylation status is determined. The number of reads that are methylated at each site can be determined, and the total number of methylated reads for the locus can be determined.

A locus can be defined by a specific number of sites, a specific set of sites, or a particular size of a region around a variation that comprises the tissue-specific allele. A locus can have just one site. The sites can have specific properties, e.g., being CpG sites. Determining the number of unmethylated reads is equivalent and is encompassed within determining the methylation status.

At block 940, for each of the first loci, a methylation density is calculated based on the number of DNA molecules that are methylated at the one or more sites of the locus and correspond to the respective second allele of the locus. For example, a methylation density can be determined for the CpG sites corresponding to a locus.
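One way to compute the per-locus density in block 940 is sketched below. It assumes, for illustration only, that the density is the pooled fraction of methylated calls over all calls at the sites of the locus, counting only reads that carry the second allele; the function name and input layout are hypothetical.

```python
def locus_methylation_density(site_counts):
    """Pooled methylation density for one locus.

    site_counts: list of (methylated, unmethylated) read counts, one pair per
    site, restricted to reads carrying the tissue-specific (second) allele.
    Returns None when no read covers any site.
    """
    methylated = sum(m for m, _ in site_counts)
    total = sum(m + u for m, u in site_counts)
    return methylated / total if total else None
```

With two sites covered by (3 methylated, 1 unmethylated) and (1 methylated, 3 unmethylated) reads, the pooled density is 4/8 = 0.5.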

At block 950, the first methylation profile of the first tissue is created from the methylation densities for the first loci. The first methylation profile can correspond to particular sites, e.g., CpG sites. The methylation profile can be for all loci having a fetal-specific allele, or for just some of those loci.

IV. Using the difference between plasma and blood methylomes

It has been shown above that fetal-specific reads from plasma correlate with the placental methylome. Because the maternal component of the maternal plasma methylome is contributed primarily by blood cells, the difference between the plasma methylome and the blood methylome can be used to determine the placental methylome at all loci, not just at the locations of fetal-specific alleles. The difference between the plasma methylome and the blood methylome can also be used to determine the methylome of a tumor.

A. Method

FIG. 10 is a flowchart illustrating a method 1000 of determining a first methylation profile from a biological sample of an organism according to embodiments of the present invention. The biological sample (e.g., plasma) includes a mixture of cell-free DNA originating from a first tissue and from a second tissue. The first methylation profile corresponds to a methylation profile of the first tissue (e.g., fetal tissue or tumor tissue). Method 1000 can deduce differentially methylated regions from maternal plasma.

At block 1010, a biological sample is received. The biological sample can simply be received at a machine (e.g., a sequencing machine). The biological sample may be in the form taken from the organism or in a processed form; e.g., the sample may be plasma extracted from a blood sample.

At block 1020, a second methylation profile corresponding to DNA of the second tissue is obtained. The second methylation profile may be read from memory, as it may have been determined previously. The second methylation profile can be determined from the second tissue, e.g., from a different sample that contains only or predominantly cells of the second tissue. The second methylation profile can correspond to a cellular methylation profile and be obtained from cellular DNA. As another example, the second profile can be determined from a plasma sample collected before pregnancy or before the development of cancer, because the plasma methylome of a non-pregnant person without cancer is very similar to the methylome of blood cells.

The second methylation profile can provide a methylation density at each of a plurality of loci in the genome of the organism. The methylation density at a particular locus corresponds to the proportion of DNA of the second tissue that is methylated. In one embodiment, the methylation density is a CpG methylation density, where the CpG sites associated with the locus are used to determine the methylation density. If a locus has a single site, the methylation density can equal the methylation index. The methylation density also corresponds to an unmethylation density, since the two values are complementary.

In one embodiment, the second methylation profile is obtained by performing methylation-aware sequencing of cellular DNA from a sample of the organism. One example of methylation-aware sequencing includes treating DNA with sodium bisulfite and then performing DNA sequencing. In another example, methylation-aware sequencing can be performed without sodium bisulfite, using a single-molecule sequencing platform that allows the methylation status of DNA molecules (including N6-methyladenine, 5-methylcytosine, and 5-hydroxymethylcytosine) to be detected directly without bisulfite conversion (BA Flusberg et al. 2010 Nat Methods; 7: 461-465; J Shim et al. 2013 Sci Rep; 3: 1389. doi: 10.1038/srep01389); or through immunoprecipitation of methylated cytosine (e.g., by using an antibody against methylcytosine or by using a methylated DNA binding protein or peptide (LG Acevedo et al. 2011 Epigenomics; 3: 93-101)) followed by sequencing; or through the use of methylation-sensitive restriction enzymes followed by sequencing. In another embodiment, non-sequencing techniques are used, such as arrays, digital PCR, and mass spectrometry.

In another embodiment, the second methylation density of the second tissue can be obtained beforehand from a control sample of the subject or from other subjects. The methylation density from another subject can serve as a reference methylation profile having reference methylation densities. The reference methylation densities can be determined from multiple samples, where the mean (or another statistical value) of the different methylation densities at a locus can be used as the reference methylation density at that locus.

At block 1030, a cell-free methylation profile is determined from the cell-free DNA of the mixture. The cell-free methylation profile provides a methylation density at each of the plurality of loci. It can be determined by receiving sequence reads from sequencing of the cell-free DNA, where the methylation information is obtained with the sequence reads. The cell-free methylation profile can be determined in the same manner as the cellular methylome.

At block 1040, a percentage of the cell-free DNA in the biological sample that is from the first tissue is determined. In one embodiment, the first tissue is fetal tissue, and the corresponding DNA is fetal DNA. In another embodiment, the first tissue is tumor tissue, and the corresponding DNA is tumor DNA. The percentage can be determined in a variety of ways, e.g., using a fetal-specific allele or a tumor-specific allele. Copy number can also be used to determine the percentage, e.g., as described in U.S. patent application Ser. No. 13/801,748, entitled "Mutational Analysis Of Plasma DNA For Cancer Detection", filed March 13, 2013, which is incorporated by reference.

At block 1050, a plurality of loci for determining the first methylome are identified. These loci may correspond to each of the loci used to determine the cell-free methylation profile and the second methylation profile, so that the sets of loci coincide. It is also possible that more loci are used to determine the cell-free methylation profile and the second methylation profile.

In some embodiments, loci that are hypermethylated or hypomethylated in the second methylation profile can be identified, e.g., using maternal blood cells. To identify loci that are hypermethylated in maternal blood cells, one can scan from one end of a chromosome for a CpG site with a methylation index ≥X% (e.g., 80%). One can then search for the next CpG site within a downstream region (e.g., within 200 bp downstream). If the immediately downstream CpG site also has a methylation index ≥X% (or another specified amount), the first and second CpG sites can be grouped. The grouping can continue until either there is no other CpG site within the next downstream region, or the immediately downstream CpG site has a methylation index <X%. The region of the grouped CpG sites can be reported as hypermethylated in maternal blood cells if it contains at least five immediately adjacent hypermethylated CpG sites. A similar analysis can be performed to search for loci that are hypomethylated in maternal blood cells, using CpG sites with methylation indices ≤20%. The methylation densities of the second methylation profile can be calculated for the short-listed loci and used to deduce the first methylation profile (e.g., placental tissue methylation density) of the corresponding loci, e.g., from maternal plasma bisulfite sequencing data.
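The scan-and-group procedure for hypermethylated loci can be sketched as follows. This is an illustrative reading of the steps above, assuming the CpG sites of one chromosome are supplied as position-sorted (position, methylation index) pairs; the function name, argument names, and defaults (80%, 200 bp, five sites) are taken from the examples in the text or are hypothetical.

```python
def find_hypermethylated_regions(sites, min_index=0.80, max_gap=200, min_sites=5):
    """Group adjacent hypermethylated CpG sites into candidate regions.

    sites: list of (position, methylation_index) sorted by position.
    Returns a list of (start, end, n_sites) for runs of at least min_sites
    consecutive CpG sites with index >= min_index, each within max_gap bp
    of the previous site in the run.
    """
    regions, run = [], []

    def close_run():
        # Report the current run if it is long enough, then start afresh.
        if len(run) >= min_sites:
            regions.append((run[0], run[-1], len(run)))
        run.clear()

    for position, index in sites:
        if index < min_index:
            close_run()  # a site below the cutoff ends the grouping
            continue
        if run and position - run[-1] > max_gap:
            close_run()  # no CpG site within the downstream window
        run.append(position)
    close_run()
    return regions
```

A search for hypomethylated loci would follow the same pattern with the comparison reversed (e.g., index ≤ 0.20).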

At block 1060, the first methylation profile of the first tissue is determined by calculating, for each of the plurality of loci, a differential parameter that includes a difference between the methylation density of the second methylation profile and the methylation density of the cell-free methylation profile. The difference is scaled by the percentage.

In one embodiment, the first methylation density (D) of a locus in the first (e.g., placental) tissue is deduced using the following equation:

D = mbc - (mbc - mp) / (f × CN)    (1)

where mbc denotes the methylation density of the second methylation profile at a locus (e.g., a short-listed locus determined from maternal blood cell bisulfite sequencing data); mp denotes the methylation density of the corresponding locus in the maternal plasma bisulfite sequencing data; f represents the percentage of cell-free DNA from the first tissue (e.g., the fractional fetal DNA concentration); and CN represents the copy number at the locus (e.g., a higher value for amplifications or a lower value for deletions, relative to normal). If there is no amplification or deletion in the first tissue, CN can be one. For a trisomy (or a duplication of a chromosomal region in a tumor or a fetus), CN would be 1.5 (an increase from 2 copies to 3 copies), and for a monosomy it would be 0.5. Higher amplifications can increase in increments of 0.5. In this example, D can correspond to the differential parameter.
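The deduction of D can be sketched in Python, assuming the relation D = mbc - (mbc - mp) / (f × CN), which agrees with the rearranged mixture model MP = BKG × (1-f) + PLN × f when CN is one. The function name and the example numbers are illustrative only.

```python
def deduce_first_tissue_density(mbc, mp, f, cn=1.0):
    """Deduced methylation density D of the first tissue at one locus.

    mbc: methylation density in the second tissue (e.g., maternal blood cells)
    mp:  methylation density in plasma at the same locus
    f:   fraction of cell-free DNA from the first tissue (0 < f <= 1)
    cn:  relative copy number at the locus (1.0 when there is no gain or loss)
    """
    if not 0 < f <= 1:
        raise ValueError("f must be a fraction in (0, 1]")
    if cn <= 0:
        raise ValueError("cn must be positive")
    return mbc - (mbc - mp) / (f * cn)
```

For example, with mbc = 0.80, mp = 0.74, and f = 0.15, the deduced value is 0.80 - 0.06 / 0.15 = 0.40. As noted in the text, deduced values may need a further transformation before they are read as methylation levels.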

At block 1070, the first methylation density is transformed to obtain a corrected first methylation density of the first tissue. The transformation can account for fixed differences between the differential parameter and the actual methylation profile of the first tissue. For example, the values may differ by a fixed constant or by a slope. The transformation can be linear or non-linear.

In one embodiment, the distribution of the deduced values, D, was found to be lower than the actual methylation levels of placental tissue. For example, the deduced values can be linearly transformed using data from CpG islands, which are genomic segments with an overrepresentation of CpG sites. The genomic positions of the CpG islands used in this study were obtained from the UCSC Genome Browser database (NCBI build 36/hg18) (PA Fujita et al. 2011 Nucleic Acids Res; 39: D876-882). For example, a CpG island can be defined as a genomic segment with GC content ≥50%, genomic length >200 bp, and a ratio of observed to expected CpG number >0.6 (M Gardiner-Garden et al. 1987 J Mol Biol; 196: 261-282).
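The three CpG island criteria can be checked on a candidate sequence as sketched below. The sketch assumes the usual Gardiner-Garden form of the observed/expected ratio, (number of CpG × length) / (number of C × number of G); the function name is hypothetical, and the UCSC track used in the study applies its own pipeline, which this does not reproduce.

```python
def meets_cpg_island_criteria(sequence):
    """Check GC content >= 50%, length > 200 bp, and observed/expected CpG > 0.6."""
    seq = sequence.upper()
    length = len(seq)
    n_c, n_g = seq.count("C"), seq.count("G")
    n_cpg = seq.count("CG")  # CpG dinucleotides cannot overlap one another
    if length <= 200 or n_c == 0 or n_g == 0:
        return False
    gc_content = (n_c + n_g) / length
    observed_over_expected = n_cpg * length / (n_c * n_g)
    return gc_content >= 0.5 and observed_over_expected > 0.6
```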

In one implementation, to derive the linear transformation equations, CpG islands with at least 4 CpG sites and an average read depth ≥5 per CpG site in the sequenced samples can be included. After determining the linear relationships between the methylation densities of CpG islands in the CVS or term placenta and the deduced values, D, the following equations are used to determine the predicted values:

First-trimester predicted value = D × 1.6 + 0.2

Third-trimester predicted value = D × 1.2 + 0.05
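The two linear transformations can be applied as in the following sketch. The coefficients are those given in the equations above; the function name and the "first"/"third" keys are hypothetical labels for the first-trimester and third-trimester fits.

```python
# (slope, intercept) pairs from the two equations in the text
TRANSFORM_COEFFICIENTS = {
    "first": (1.6, 0.2),
    "third": (1.2, 0.05),
}

def predicted_value(d, trimester):
    """Linearly transform a deduced value D into a predicted methylation value."""
    slope, intercept = TRANSFORM_COEFFICIENTS[trimester]
    return d * slope + intercept
```

For example, a deduced value of 0.25 in a first-trimester sample maps to 0.25 × 1.6 + 0.2 = 0.6.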

B. Fetal example

As mentioned above, method 1000 can be used to deduce the methylation landscape of the placenta from maternal plasma. Circulating DNA in plasma originates predominantly from hematopoietic cells, although an unknown proportion of cell-free DNA is contributed by other internal organs. Moreover, placenta-derived cell-free DNA accounts for approximately 5-40% of the total DNA in maternal plasma, with a mean of approximately 15%. One can therefore assume that the methylation level in maternal plasma equals an existing background methylation plus the placental contribution during pregnancy, as described above.

The maternal plasma methylation level, MP, can be determined using the following equation:

MP = BKG × (1-f) + PLN × f

where BKG is the background DNA methylation level in plasma derived from blood cells and internal organs, PLN is the methylation level of the placenta, and f is the fractional fetal DNA concentration in maternal plasma.

In one embodiment, the methylation level of the placenta can theoretically be deduced by:

PLN = BKG - (BKG - MP) / f    (2)

Equations (1) and (2) are equal when CN equals one, D equals PLN, and BKG equals mbc. In another embodiment, the fractional fetal DNA concentration can be assumed or set to a specified value, e.g., as part of an assumption that a minimum f is present.
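Rearranging the mixture equation MP = BKG × (1-f) + PLN × f for PLN gives the placental methylation level. The algebra can be written out as follows, as a worked sketch that uses only the symbols defined above:

```latex
\begin{aligned}
MP &= BKG\,(1-f) + PLN \cdot f \\
PLN \cdot f &= MP - BKG\,(1-f) \\
PLN &= \frac{MP - BKG + BKG \cdot f}{f} \;=\; BKG - \frac{BKG - MP}{f}
\end{aligned}
```

Substituting BKG = mbc, MP = mp, and PLN = D recovers the deduction formula for the case CN = 1.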

The methylation level of maternal blood was taken to represent the background methylation of maternal plasma. Besides the loci that were hypermethylated or hypomethylated in maternal blood cells, the deduction approach was explored further by focusing on defined regions with clinical relevance, e.g., CpG islands in the human genome.

The mean methylation densities of a total of 27,458 CpG islands (NCBI Build 36/hg18) on the autosomes and chromosome X were derived from the sequencing data of maternal plasma and placenta. Only CpG islands with ≥10 covered CpG sites and an average read depth ≥5 per covered site in all analyzed samples, including the placenta, maternal blood, and maternal plasma, were selected. As a result, 26,698 CpG islands (97.2%) remained valid, and their methylation levels were deduced from the plasma methylation data and the fractional fetal DNA concentration according to the equation above.

The distribution of the deduced PLN values was noted to be lower than the actual methylation levels of CpG islands in placental tissue. Thus, in one embodiment, the deduced PLN values, or simply deduced values (D), were used as an arbitrary unit for estimating the methylation levels of CpG islands in the placenta. After transformation, the deduced values were linearized and their distribution became more similar to the actual dataset. The transformed deduced values were named methylation predictive values (MPV) and were subsequently used to predict the methylation levels of genetic loci in the placenta.

In this example, the CpG islands were classified into 3 categories based on their methylation densities in the placenta: low (≤0.4), intermediate (>0.4-<0.8), and high (≥0.8). Using the deduction equation, the MPV of the same set of CpG islands was calculated, and the values were then classified into 3 categories with the same cutoffs. Comparing the actual and deduced datasets showed that 75.1% of the short-listed CpG islands could be matched correctly to the same category in the tissue data according to their MPV. About 22% of the CpG islands were assigned to groups with a 1-level difference (high versus intermediate, or intermediate versus low), and fewer than 3% were completely misclassified (high versus low) (FIG. 11A). The overall classification performance was also determined: 86.1%, 31.4%, and 68.8% of the CpG islands with methylation densities ≤0.4, >0.4-<0.8, and ≥0.8 in the placenta were correctly deduced to be "low", "intermediate", and "high", respectively (FIG. 11B).
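The three-way classification and the tally of exact, 1-level, and opposite assignments can be sketched as below. The cutoffs are the ones stated above; the function names, category labels, and result keys are hypothetical, and the sketch does not reproduce the reported percentages.

```python
LEVELS = {"low": 0, "intermediate": 1, "high": 2}

def classify(value):
    """Assign a methylation value to low (<=0.4), intermediate, or high (>=0.8)."""
    if value <= 0.4:
        return "low"
    if value < 0.8:
        return "intermediate"
    return "high"

def compare_categories(deduced, actual):
    """Count exact matches, 1-level differences, and opposite assignments."""
    tally = {"exact": 0, "one_level": 0, "opposite": 0}
    keys = ("exact", "one_level", "opposite")
    for d, a in zip(deduced, actual):
        tally[keys[abs(LEVELS[d] - LEVELS[a])]] += 1
    return tally
```

In use, the deduced list would hold `classify(mpv)` for each CpG island and the actual list `classify(placental_density)` for the same islands.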

FIGS. 11A and 11B show graphs of the performance of the prediction algorithm using maternal plasma data and the fractional fetal DNA concentration according to embodiments of the present invention. FIG. 11A is a graph 1100 showing the accuracy of CpG island classification using MPV: correct classification (the deduced category matches the actual dataset exactly), 1-level difference (the deduced category differs from the actual dataset by 1 level), and misclassification (the deduced category is opposite to the actual dataset). FIG. 11B is a graph 1150 showing the proportion of CpG islands correctly classified in each deduced category.

Provided that the maternal background methylation is low in the respective genomic regions, the presence of hypermethylated placenta-derived DNA in the circulation would increase the overall plasma methylation level to a degree that depends on the fractional fetal DNA concentration. A marked change could be observed when the fetal DNA released is fully methylated. Conversely, when the maternal background methylation is high, the change in the plasma methylation level would become more significant if hypomethylated fetal DNA is released. The deduction scheme may therefore be more practical when the methylation level is deduced for loci known to differ between the maternal background and the placenta, especially for markers that are hypermethylated or hypomethylated in the placenta.

FIG. 12A is a table 1200 showing details of the methylation prediction for 15 selected genomic loci according to embodiments of the present invention. To validate the technique, 15 differentially methylated loci that had been studied previously were selected. The methylation levels of the selected regions were deduced and compared with the previously reported results for those 15 loci (RWK Chiu et al. 2007 Am J Pathol; 170: 941-950; SSC Chim et al. 2008 Clin Chem; 54: 500-511; SSC Chim et al. 2005 Proc Natl Acad Sci U S A; 102: 14753-14758; DWY Tsui et al. 2010 PLoS One; 5: e15069).

FIG. 12B is a graph 1250 showing the deduced categories of the 15 selected genomic loci and their corresponding methylation levels in the placenta. The deduced methylation categories are: low, ≤0.4; intermediate, >0.4-<0.8; high, ≥0.8. Table 1200 and graph 1250 show that the methylation levels in the placenta could be deduced correctly, with several exceptions: RASSF1A, CGI009, CGI137, and VAPA. Of these 4 markers, only CGI009 showed a marked discrepancy with the actual dataset; the others were only marginally misclassified.

In table 1200, "1" refers to the deduced values (D), calculated by the equation D = mbc - (mbc - mp) / f, where f is the fractional fetal DNA concentration. Label "2" refers to the methylation predictive values (MPV), i.e., the deduced values linearly transformed using the equation MPV = D × 1.6 + 0.25. Label "3" refers to the classification cutoffs for the deduced values: low, ≤0.4; intermediate, >0.4-<0.8; high, ≥0.8. Label "4" refers to the classification cutoffs for the actual placental dataset: low, ≤0.4; intermediate, >0.4-<0.8; high, ≥0.8. Label "5" indicates that placental status refers to the methylation status of the placenta relative to that of maternal blood cells.

C. Calculation of the fractional concentration of fetal DNA

In one embodiment, the percentage of fetal DNA from the first tissue can be determined using the Y chromosome of a male fetus. The proportion of chromosome Y sequences in a maternal plasma sample (%chrY) is a composite of the chromosome Y reads derived from the male fetus and the maternal (female) reads that are misaligned to chromosome Y (RWK Chiu et al. 2011 BMJ; 342: c7401). The relationship between %chrY and the fractional fetal DNA concentration (f) in the sample can thus be given by:

%chrY = %chrY_male × f + %chrY_female × (1-f)

where %chrY_male refers to the proportion of reads aligned to chromosome Y in a plasma sample containing 100% male DNA, and %chrY_female refers to the proportion of reads aligned to chromosome Y in a plasma sample containing 100% female DNA.

%chrY can be determined from reads that align to chromosome Y with no mismatches for a sample from a female pregnant with a male fetus, e.g., where the reads are from bisulfite-converted samples. The %chrY_male value can be obtained from the bisulfite sequencing of two adult male plasma samples. The %chrY_female value can be obtained from the bisulfite sequencing of two non-pregnant adult female plasma samples.
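Solving the chromosome Y relationship for f can be sketched as follows. The sketch assumes the composite model %chrY = %chrY_male × f + %chrY_female × (1-f), with all three quantities expressed on the same scale; the function and argument names are hypothetical, and the numbers in the usage note are made up for illustration.

```python
def fetal_fraction_from_chr_y(chr_y, chr_y_male, chr_y_female):
    """Estimate fractional fetal DNA concentration f for a male fetus.

    chr_y:        proportion of chromosome Y reads in the maternal plasma sample
    chr_y_male:   proportion expected for 100% male plasma DNA
    chr_y_female: proportion expected for 100% female plasma DNA (misalignment)
    """
    if chr_y_male <= chr_y_female:
        raise ValueError("male reference must exceed female reference")
    return (chr_y - chr_y_female) / (chr_y_male - chr_y_female)
```

With illustrative reference values of 0.2 (male) and 0.004 (female), an observed proportion of 0.0334 corresponds to f = (0.0334 - 0.004) / 0.196 = 0.15.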

In other embodiments, the fetal DNA percentage can be determined from fetal-specific alleles on an autosome. As another example, epigenetic markers can be used to determine the fetal DNA percentage. Other ways of determining the fetal DNA percentage can also be used.

D. Method of using methylation to determine copy number

The placental genome is more hypomethylated than the maternal genome. As discussed above, the methylation of the plasma of a pregnant woman depends on the fractional concentration of placenta-derived fetal DNA in maternal plasma. By analyzing the methylation density of a chromosomal region, it is therefore possible to detect a difference in the contribution of fetal tissue to maternal plasma. For example, in a pregnant woman carrying a trisomic fetus (e.g., with trisomy 21, trisomy 18, or trisomy 13), the fetus would contribute an additional amount of DNA from the trisomic chromosome to maternal plasma compared with the disomic chromosomes. In this situation, the plasma methylation density of the trisomic chromosome (or of any chromosomal region with an amplification) would be lower than that of the disomic chromosomes. The degree of difference can be predicted mathematically by taking into account the fractional fetal DNA concentration in the plasma sample. The higher the fractional fetal DNA concentration in the plasma sample, the larger the difference in methylation density between the trisomic and disomic chromosomes. For regions with a deletion, the methylation density would be higher.

One example of a deletion is Turner syndrome, in which a female fetus has only one copy of chromosome X. In this case, for a pregnant woman carrying a fetus with Turner syndrome, the methylation density of chromosome X in her plasma DNA would be higher than it would be if the same woman were carrying a female fetus with the normal number of X chromosomes. In one embodiment of this strategy, maternal plasma can first be analyzed for the presence or absence of chromosome Y sequences (e.g., using MPS or a PCR-based technique). If chromosome Y sequences are present, the fetus can be classified as male and the following analysis would not be necessary. If, on the other hand, chromosome Y sequences are absent from maternal plasma, the fetus can be classified as female. In that case, the methylation density of chromosome X in maternal plasma can then be analyzed. A chromosome X methylation density higher than normal would indicate that the fetus has a high risk of Turner syndrome. This approach can also be applied to other sex chromosome aneuploidies. For example, for a fetus affected by XYY, the methylation density of the Y chromosome in maternal plasma would be lower than that of a normal XY fetus with a similar level of fetal DNA in maternal plasma. As another example, for a fetus with Klinefelter syndrome (XXY), chromosome Y sequences are present in maternal plasma, but the methylation density of chromosome X in maternal plasma would be lower than that of a normal XY fetus with a similar level of fetal DNA in maternal plasma.

From the previous discussion, the plasma methylation density of a disomic chromosome (MP_euploid) can be calculated as: MP_euploid = BKG × (1-f) + PLN × f, where BKG is the background DNA methylation level in plasma derived from blood cells and internal organs, PLN is the methylation level of the placenta, and f is the fractional fetal DNA concentration in maternal plasma.

The plasma methylation density of a trisomic chromosome (MP_aneuploid) can be calculated as: MP_aneuploid = BKG × (1-f) + PLN × f × 1.5, where 1.5 corresponds to the copy number CN, the addition of one more chromosome being a 50% increase. The difference between the trisomic and disomic chromosomes (MP_Diff) would be

MP_Diff = PLN × f × 0.5.
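The two density formulas and their difference can be evaluated directly, as in this minimal sketch. It implements the expressions exactly as written above, with CN applied only to the placental term; the function names and the example numbers are illustrative assumptions.

```python
def plasma_density(bkg, pln, f, cn=1.0):
    """Plasma methylation density as written in the text: BKG*(1-f) + PLN*f*CN."""
    return bkg * (1 - f) + pln * f * cn

def trisomy_difference(pln, f):
    """MP_Diff between a trisomic (CN = 1.5) and a disomic (CN = 1) chromosome."""
    return pln * f * 0.5
```

For example, with PLN = 0.4 and f = 0.1, `trisomy_difference` returns 0.4 × 0.1 × 0.5 = 0.02, which equals `plasma_density(bkg, 0.4, 0.1, cn=1.5) - plasma_density(bkg, 0.4, 0.1)` for any background level.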

In one embodiment, a comparison of the methylation density of the potentially aneuploid chromosome (or chromosomal region) with that of one or more other presumed euploid chromosomes, or with the overall methylation density of the genome, can be used to effectively normalize for the fetal DNA concentration in the plasma sample. The comparison can be made by computing a parameter (e.g., involving a ratio or a difference) between the methylation densities of the two regions to obtain a normalized methylation density. The comparison can remove the dependence of the resulting methylation level (e.g., determined as a parameter from the two methylation densities) on the fetal DNA concentration.

If the methylation density of the potentially aneuploid chromosome is not normalized to the methylation density of one or more other chromosomes, or to another parameter that reflects the fractional fetal DNA concentration, the fractional concentration would be a major factor affecting the methylation density in plasma. For example, the plasma methylation density of chromosome 21 in a pregnant woman carrying a trisomy 21 fetus with a fractional fetal DNA concentration of 10% would be the same as that of a pregnant woman carrying a euploid fetus with a fractional fetal DNA concentration of 15%, whereas the normalized methylation densities would show a difference.

In another embodiment, the methylation density of the potentially aneuploid chromosome can be normalized to the fractional fetal DNA concentration. For example, an equation relating the following quantities can be applied to normalize the methylation density: MPNormalized is the methylation density normalized with the fractional fetal DNA concentration in plasma, MPNon-normalized is the measured methylation density, BKG is the background methylation density from maternal blood cells or tissues, PLN is the methylation density in placental tissues, and f is the fractional fetal DNA concentration. The methylation densities of BKG and PLN can be based on reference values previously established from maternal blood cells and placental tissues obtained from healthy pregnancies. Different genetic and epigenetic methods can be used to determine the fractional fetal DNA concentration in the plasma sample, for example by measuring the percentage of sequence reads from chromosome Y, using massively parallel sequencing or PCR on non-bisulfite-converted DNA.

In one implementation, the normalized methylation density of the potentially aneuploid chromosome can be compared with a reference group consisting of pregnant women carrying euploid fetuses. The mean and SD of the normalized methylation density of the reference group can be determined. The normalized methylation density of the test case can then be expressed as a z-score, which indicates the number of SDs from the mean of the reference group: z-score = (MPNormalized - Mean) / SD, where MPNormalized is the normalized methylation density of the test case, Mean is the mean of the normalized methylation densities of the reference cases, and SD is the standard deviation of the normalized methylation densities of the reference cases. A threshold, for example a z-score < -3, can be used to classify whether a chromosome is significantly hypomethylated and, hence, to determine the aneuploidy status of the sample.
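The z-score comparison can be sketched as follows. This is a minimal illustration rather than the implementation used in the study: the function names and reference values are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not specify which estimator is used.

```python
from statistics import mean, stdev


def z_score(test_value, reference_values):
    """Number of SDs separating the test case from the reference mean."""
    mu = mean(reference_values)
    sd = stdev(reference_values)  # assumption: sample SD (n - 1)
    return (test_value - mu) / sd


def is_hypomethylated(test_value, reference_values, cutoff=-3.0):
    """True when the z-score falls below the cutoff (e.g., z < -3)."""
    return z_score(test_value, reference_values) < cutoff


# Hypothetical normalized methylation densities of three euploid references.
reference = [-0.010, -0.012, -0.008]

print(z_score(-0.020, reference))            # far below the reference mean
print(is_hypomethylated(-0.011, reference))  # within the reference range
```

With only a few reference cases the SD estimate is unstable, so a larger reference group would normally be preferable.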

In another embodiment, MPDiff can be used as the normalized methylation density. In such an embodiment, PLN can be inferred, e.g., using method 1000. In some implementations, a reference methylation density (which can be normalized using f) can be determined from the methylation level of a euploid region. For example, the mean can be determined from one or more chromosomal regions of the same sample. The threshold can be scaled by f, or simply set to a level that is sufficient provided that a minimum concentration is present.

Accordingly, the comparison of the methylation level of a region to a threshold can be accomplished in various ways. The comparison can involve a normalization (e.g., as described above), which can be performed equivalently on the methylation level or on the threshold, depending on how the values are defined. Thus, whether the determined methylation level of a region is statistically different from a reference level (determined from the same sample or from other samples) can be established in a variety of ways.

The above analysis can be applied to chromosomal regions, which can include a whole chromosome or part of a chromosome, including contiguous or disjoint subregions of a chromosome. In one embodiment, the potentially aneuploid chromosome can be divided into a number of regions. The regions can be of the same or different sizes. The methylation density of each region can be normalized to the fractional concentration of the sample, to the methylation density of one or more presumed euploid chromosomes, or to the overall methylation density of the genome. The normalized methylation density of each region can then be compared with a reference group to determine whether it is significantly hypomethylated. The percentage of regions that are significantly hypomethylated can then be determined. A threshold, for example more than 5%, 10%, 15%, 20% or 30% of the regions being significantly hypomethylated, can be used to classify the aneuploidy status of the case.
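The region-based classification can be sketched as below. The sketch assumes that a z-score has already been computed for each region against a reference group; the function names, the z-score cutoff of -3 and the example values are illustrative assumptions, and 5% is only one of the thresholds mentioned above.

```python
def fraction_hypomethylated(region_z_scores, z_cutoff=-3.0):
    """Fraction of regions whose z-score is below the cutoff."""
    if not region_z_scores:
        raise ValueError("at least one region is required")
    flagged = sum(1 for z in region_z_scores if z < z_cutoff)
    return flagged / len(region_z_scores)


def is_aneuploid(region_z_scores, fraction_cutoff=0.05, z_cutoff=-3.0):
    """Classify as aneuploid when more than fraction_cutoff of regions are flagged."""
    return fraction_hypomethylated(region_z_scores, z_cutoff) > fraction_cutoff


# Hypothetical z-scores for ten regions on the tested chromosome.
z_scores = [-4.0, -3.5, 0.0, 1.0, -1.0, 0.2, -0.3, 0.5, 0.1, -2.0]
print(fraction_hypomethylated(z_scores), is_aneuploid(z_scores))
```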

When testing for an amplification or a deletion, the methylation density can be compared with a reference methylation density that may be specific to the particular region being tested. Each region can have a different reference methylation density because methylation can vary from region to region, particularly depending on the size of the regions (e.g., the smaller the region, the greater the variation).

As mentioned above, one or more pregnant women, each carrying a euploid fetus, can be used to define the normal range of the methylation density of a region of interest or of the difference in methylation density between two chromosomal regions. A normal range can also be determined for PLN (e.g., by direct measurement or as inferred by method 1000). In other embodiments, a ratio between two methylation densities can be used for the analysis instead of their difference, e.g., the ratio between the methylation densities of a potentially aneuploid chromosome and a euploid chromosome. This methylation analysis approach can be combined with sequence read counting approaches (RWK Chiu et al. 2008 Proc Natl Acad Sci USA; 105:20458-20463) and approaches involving size analysis of plasma DNA (US Patent Publication 2011/0276277) to determine or confirm an aneuploidy. The sequence read counting approach used in combination with methylation analysis can be performed using random sequencing (RWK Chiu et al. 2008 Proc Natl Acad Sci USA; 105:20458-20463; DW Bianchi et al. 2012 Obstet Gynecol 119:890-901) or targeted sequencing (AB Sparks et al. 2012 Am J Obstet Gynecol 206:319.e1-9; B Zimmermann et al. 2012 Prenat Diagn 32:1233-1241; GJ Liao et al. 2012 PLoS One; 7:e38154).

The use of BKG can account for variations in the background between samples. For example, one woman might have a different BKG methylation level than another, but in such situations the difference between BKG and PLN can be used across samples. The thresholds for different chromosomal regions can differ, e.g., when the methylation density of one region of the genome differs from that of another region of the genome.

This approach can be generalized to detect any chromosomal abnormality in the fetal genome, including deletions and amplifications. In addition, the resolution of the analysis can be adjusted to the desired level; for example, the genome can be divided into regions of 10 Mb, 5 Mb, 2 Mb, 1 Mb, 500 kb or 100 kb. Hence, this technology can also be used to detect subchromosomal duplications or subchromosomal deletions, and would thus allow a prenatal fetal molecular karyotype to be obtained non-invasively. When used in this manner, the technology can be combined with non-invasive prenatal testing methods that are based on the counting of molecules (A Srinivasan et al. 2013 Am J Hum Genet; 92:167-176; SCY Yu et al. 2013 PLoS One 8:e60968). In other embodiments, the regions need not be of uniform size. For example, the size of the regions can be adjusted so that each region contains the same number of dinucleotides (e.g., CpG dinucleotides). In this case, the actual sizes of the regions would differ.

The equation can be rewritten as MPDiff = (BKG - PLN) × f × 0.5 × CN to apply to different types of chromosomal abnormalities. Here, CN represents the number of copies changed in the affected region. CN equals 1 for the gain of one copy of a chromosome, 2 for the gain of two copies, and -1 for the loss of one of the two homologous chromosomes (e.g., for detecting fetal Turner syndrome, in which a female fetus has lost one of the X chromosomes, leading to an XO karyotype). The equation need not change when the size of the regions changes. However, sensitivity and specificity may decrease when smaller regions are used, because fewer CpG dinucleotides (or other nucleotide combinations showing differential methylation between fetal and maternal DNA) are present in smaller regions, leading to increased stochastic variation in the measurement of methylation densities. In one embodiment, the number of reads required can be determined by analyzing the coefficient of variation of the methylation density and the desired level of sensitivity.
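A small sketch of the generalized expression is given below. The function name and the numerical inputs are hypothetical; only the formula (BKG - PLN) × f × 0.5 × CN and the CN values for a one-copy gain, a two-copy gain and a one-copy loss are taken from the text.

```python
def expected_mp_diff(bkg, pln, f, cn):
    """Expected methylation density difference: (BKG - PLN) * f * 0.5 * CN.

    cn is the number of copies changed in the affected region:
    +1 for one extra copy, +2 for two extra copies, -1 for a one-copy loss.
    """
    return (bkg - pln) * f * 0.5 * cn


# Hypothetical values: BKG = 0.70, PLN = 0.50, fractional concentration = 10%.
for cn in (1, 2, -1):
    print(cn, expected_mp_diff(0.70, 0.50, 0.10, cn))
```

Because the expected difference scales linearly with f, samples with a low fractional concentration yield small differences that are harder to separate from measurement noise.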

To illustrate the feasibility of this approach, plasma samples from nine pregnant women were analyzed. Five of the women were each carrying a euploid fetus, and the other four were each carrying a trisomy 21 (T21) fetus. Three of the five euploid pregnancies were randomly selected to form a reference group. The remaining two euploid pregnancies (Eu1 and Eu2) and the four T21 cases (T21-1, T21-2, T21-3 and T21-4) were analyzed with this approach to test for possible T21 status. The plasma DNA was bisulfite-converted and sequenced using the Illumina HiSeq 2000 platform. In one embodiment, the methylation densities of individual chromosomes were calculated. The difference in methylation density between chromosome 21 and the mean of the other 21 autosomes was then determined to obtain a normalized methylation density (Table 1). The mean and SD of the reference group were used to calculate the z-scores of the six test cases.

Table 1: Using a z-score threshold of < -3 to classify a sample as T21, all of the euploid and T21 cases were classified correctly.

In another embodiment, the genome was divided into 1 Mb regions and the methylation density of each 1 Mb region was determined. The methylation densities of all regions on the potentially aneuploid chromosome can be normalized to the median methylation density of all regions located on the presumed euploid chromosomes. In one implementation, the difference between the methylation density of each region and the median of the euploid regions can be calculated. Z-scores for these values can be calculated using the mean and SD values of the reference group. The percentage of regions showing hypomethylation can be determined (Table 2) and compared with a threshold percentage.

Table 2: Using 5% as the threshold for the percentage of regions on chromosome 21 that are significantly more hypomethylated, all cases were classified correctly for T21 status.

This DNA methylation-based approach for detecting fetal chromosomal or subchromosomal abnormalities can be used in conjunction with approaches based on the counting of molecules, e.g., by sequencing (RWK Chiu et al. 2008 Proc Natl Acad Sci USA; 105:20458-20463) or digital PCR (YMD Lo et al. 2007 Proc Natl Acad Sci USA; 104:13116-13121), or on the sizing of DNA molecules (US Patent Publication 2011/0276277). Such a combination (e.g., DNA methylation plus molecular counting, DNA methylation plus sizing, or DNA methylation plus molecular counting plus sizing) would have a synergistic effect that would be advantageous in a clinical setting, e.g., by improving sensitivity and/or specificity. For example, the number of DNA molecules that need to be analyzed, e.g., by sequencing, could be reduced without adversely affecting diagnostic accuracy. This feature would allow such tests to be performed more economically. As another example, for a given number of DNA molecules analyzed, the combined approach would allow fetal chromosomal or subchromosomal abnormalities to be detected at a lower fractional fetal DNA concentration.

FIG. 13 is a flowchart of a method 1300 for detecting a chromosomal abnormality from a biological sample of an organism. The biological sample includes cell-free DNA comprising a mixture of cell-free DNA originating from a first tissue and from a second tissue. The first tissue can be from a fetus or a tumor, and the second tissue can be from the pregnant female or the patient.

At block 1310, a plurality of DNA molecules from the biological sample are analyzed. The analysis of a DNA molecule can include determining the location of the DNA molecule in the genome of the organism and determining whether the DNA molecule is methylated at one or more sites. The analysis can be performed by receiving sequence reads from methylation-aware sequencing, so the analysis can be performed solely on data previously obtained from the DNA. In other embodiments, the analysis can include the actual sequencing or other active steps of obtaining the data.

Determining the location can include mapping the DNA molecules (e.g., via sequence reads) to the respective parts of the human genome, e.g., to specific regions. In one implementation, a read that does not align to a region of interest can be ignored.

At block 1320, for each of a plurality of sites, a respective number of DNA molecules that are methylated at the site is determined. In one embodiment, the sites are CpG sites, and may be only certain CpG sites selected using one or more of the criteria mentioned herein. Once normalized by the total number of DNA molecules analyzed at a particular site, e.g., the total number of sequence reads, determining the number of methylated DNA molecules is equivalent to determining the number of unmethylated DNA molecules.

At block 1330, a first methylation level of a first chromosomal region is calculated based on the respective numbers of DNA molecules methylated at sites within the first chromosomal region. The first chromosomal region can be of any size, e.g., the sizes described above. The methylation level can account for the total number of DNA molecules aligned to the first chromosomal region, e.g., as part of a normalization procedure.

The first chromosomal region can be of any size (e.g., a whole chromosome) and can be composed of disjoint subregions, i.e., subregions that are separated from each other. The methylation level of each subregion can be determined and the levels combined, e.g., as a mean or median, to determine the methylation level of the first chromosomal region.
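Blocks 1320 and 1330 can be sketched as follows. The sketch assumes, as one possible definition not taken verbatim from the text, that the methylation level of a subregion is the pooled fraction of methylated reads over all reads covering its sites; the function names and read counts are hypothetical.

```python
from statistics import mean, median


def methylation_density(site_counts):
    """Pooled methylation level of one subregion.

    site_counts is a list of (methylated_reads, total_reads) per site.
    Assumption: the level is methylated reads divided by total reads.
    """
    methylated = sum(m for m, _ in site_counts)
    total = sum(t for _, t in site_counts)
    if total == 0:
        raise ValueError("no reads cover the subregion")
    return methylated / total


def region_level(subregions, combine=median):
    """Combine subregion levels (mean or median) into one region level."""
    return combine([methylation_density(s) for s in subregions])


# Hypothetical read counts for three disjoint subregions.
subregions = [
    [(8, 10), (6, 10)],   # pooled level 0.7
    [(5, 10), (5, 10)],   # pooled level 0.5
    [(9, 10), (9, 10)],   # pooled level 0.9
]
print(region_level(subregions), region_level(subregions, combine=mean))
```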

At block 1340, the first methylation level is compared to a threshold. The threshold can be a reference methylation level or a value related to a reference methylation level (e.g., a specified distance from a normal level). The threshold can be determined from other pregnant female subjects carrying fetuses without a chromosomal abnormality in the first chromosomal region, from samples of individuals without cancer, or from loci of the organism that are known not to be associated with an aneuploidy (i.e., regions that are disomic).

In one embodiment, the threshold can be defined as a difference from the reference methylation level of (BKG - PLN) × f × 0.5 × CN, where BKG is the background of the female (or a mean or median from other subjects), f is the fractional concentration of cell-free DNA originating from the first tissue, and CN is the copy number being tested. CN is an example of a scaling factor corresponding to a type of abnormality (deletion or duplication). A threshold with a CN of 1 can be used initially to test for all amplifications, and other thresholds can then be used to determine the degree of amplification. The threshold can be based on the fractional concentration of cell-free DNA originating from the first tissue in order to determine the expected methylation level of a locus, e.g., when no copy number aberration is present.

At block 1350, a classification of an abnormality for the first chromosomal region is determined based on the comparison. Statistically significant differences of different levels can indicate an increased risk of the fetus having a chromosomal abnormality. In various embodiments, the chromosomal abnormality can be trisomy 21, trisomy 18, trisomy 13, Turner syndrome, or Klinefelter syndrome. Other examples are a subchromosomal deletion, a subchromosomal duplication, or DiGeorge syndrome.

V. DETERMINATION OF MARKERS

As described above, certain parts of the fetal genome are methylated differently from the maternal genome. These differences can be common across pregnancies. The differentially methylated regions can be used to identify DNA fragments that are from the fetus.

A. Method to Determine DMRs from Placental Tissue and Maternal Tissue

The placenta has tissue-specific methylation signatures. Fetal-specific DNA methylation markers have been developed for maternal plasma detection and for non-invasive prenatal diagnostic applications based on loci that are differentially methylated between placental tissues and maternal blood cells (SSC Chim et al. 2008 Clin Chem; 54:500-511; EA Papageorgiou et al. 2009 Am J Pathol; 174:1609-1618; and T Chu et al. 2011 PLoS One; 6:e14723). Embodiments for mining such differentially methylated regions (DMRs) on a genome-wide basis are provided.

FIG. 14 is a flowchart of a method 1400 for identifying methylation markers by comparing a placental methylation profile to a maternal methylation profile (e.g., determined from blood cells) according to embodiments of the present invention. Method 1400 can also be used to determine tumor markers by comparing a tumor methylation profile to a methylation profile corresponding to healthy tissue.

At block 1410, a placental methylome and a blood methylome are obtained. The placental methylome can be determined from a placental sample, e.g., a chorionic villus sample (CVS) or a term placenta. It should be understood that a methylome may include the methylation densities of only part of a genome.

At block 1420, a region is identified that includes a specified number of sites (e.g., 5 CpG sites) and for which a sufficient number of reads have been obtained. In one embodiment, the identification begins at one end of each chromosome to locate the first 500 bp region containing at least five qualified CpG sites. A CpG site can be deemed qualified if it is covered by at least five sequence reads.

At block 1430, a placental methylation index and a blood methylation index are calculated for each site. For example, the methylation index is calculated individually for all qualified CpG sites within each 500 bp region.

At block 1440, the methylation indices are compared between the maternal blood cells and the placental sample to determine whether the two sets of indices differ from each other. For example, the methylation indices are compared between the maternal blood cells and the CVS or term placenta using, for example, the Mann-Whitney test. A P-value of ≤ 0.01, for example, is considered statistically significantly different, although other values can be used, where a lower number would reduce false-positive regions.

In one embodiment, if the number of qualified CpG sites is fewer than five or the Mann-Whitney test is non-significant, the 500 bp region is shifted downstream by 100 bp. The region continues to be shifted downstream until the Mann-Whitney test becomes significant for a 500 bp region. The next 500 bp region is then considered. If the next region is found to exhibit statistical significance by the Mann-Whitney test, it is added to the current region, as long as the combined contiguous region is no larger than 1,000 bp.
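The window scan described above can be sketched as follows. This is a simplified illustration under stated assumptions: the function name, the coverage dictionary and the `is_significant` callback are hypothetical, and the callback is only a stand-in for the Mann-Whitney comparison of placental and blood methylation indices, which a statistics library would normally supply. The 500 bp window, 100 bp step, five-site minimum and five-read minimum are taken from the text.

```python
from bisect import bisect_left


def first_candidate_window(coverage, chrom_length,
                           is_significant=lambda start, end: True,
                           window=500, step=100, min_sites=5, min_reads=5):
    """Return (start, end) of the first qualifying window, or None.

    coverage maps a CpG position to its read depth. A site is qualified
    when it is covered by at least min_reads reads.
    """
    sites = sorted(p for p, depth in coverage.items() if depth >= min_reads)
    start = 0
    while start + window <= chrom_length:
        lo = bisect_left(sites, start)
        hi = bisect_left(sites, start + window)  # half-open interval
        if hi - lo >= min_sites and is_significant(start, start + window):
            return start, start + window
        start += step  # shift downstream by 100 bp and retry
    return None


# Hypothetical CpG positions and read depths; position 120 is under-covered.
coverage = {50: 10, 120: 3, 610: 6, 650: 5, 700: 8, 820: 9, 990: 7, 1040: 12}
print(first_candidate_window(coverage, chrom_length=2000))
```

The extension of a significant window with the next significant window, up to 1,000 bp, is omitted here to keep the sketch short.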

At block 1450, adjacent regions that are statistically significantly different (e.g., by the Mann-Whitney test) can be merged. Note that the difference is between the methylation indices of the two samples. In one embodiment, adjacent regions are merged if they are within a specified distance of each other (e.g., 1,000 bp) and if they show a similar methylation profile. In one implementation, the similarity of the methylation profile between adjacent regions can be defined using any of the following: (1) the regions show the same trend in the placental tissue relative to the maternal blood cells, e.g., both regions are more methylated in the placental tissue than in the blood cells; (2) the methylation densities of the adjacent regions in the placental tissue differ by less than 10%; and (3) the methylation densities of the adjacent regions in the maternal blood cells differ by less than 10%.
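The merge decision can be sketched as below. The `Region` class and function names are hypothetical, densities are expressed as fractions, and the 10% limit is interpreted as an absolute difference of 0.10, which is an assumption. Following the wording above, any one of the three similarity criteria is treated as sufficient.

```python
from dataclasses import dataclass


@dataclass
class Region:
    start: int
    end: int
    placenta: float  # methylation density in placental tissue (0 to 1)
    blood: float     # methylation density in maternal blood cells (0 to 1)


def similar_profile(a, b, max_diff=0.10):
    """True when any of the three similarity criteria is met."""
    same_trend = (a.placenta > a.blood) == (b.placenta > b.blood)
    placenta_close = abs(a.placenta - b.placenta) < max_diff
    blood_close = abs(a.blood - b.blood) < max_diff
    return same_trend or placenta_close or blood_close


def can_merge(a, b, max_gap=1000):
    """Adjacent regions merge when close enough and similar in profile."""
    gap = b.start - a.end
    return 0 <= gap <= max_gap and similar_profile(a, b)


# Hypothetical regions: r2 lies 300 bp after r1; r3 lies 1,700 bp after r2.
r1 = Region(1000, 1500, placenta=0.80, blood=0.20)
r2 = Region(1800, 2300, placenta=0.75, blood=0.25)
r3 = Region(4000, 4500, placenta=0.78, blood=0.22)
print(can_merge(r1, r2), can_merge(r2, r3))
```

A stricter variant could require all three criteria; changing `or` to `and` in `similar_profile` would implement that alternative.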

At block 1460, the methylation densities of the blood methylome from the maternal blood cell DNA and of the placental sample (e.g., CVS or term placental tissue) are calculated for the regions. The methylation densities can be determined as described herein.

At block 1470, candidate DMRs are determined as regions in which the overall placental methylation density and the overall blood methylation density across all sites in the region are statistically significantly different. In one embodiment, all qualified CpG sites within a merged region are subjected to a χ2 test. The χ2 test assesses whether the number of methylated cytosines, as a proportion of the methylated and unmethylated cytosines among all qualified CpG sites within the merged region, is statistically significantly different between the maternal blood cells and the placental tissue. In one implementation, a P-value of ≤ 0.01 for the χ2 test can be considered statistically significantly different. Merged segments that show significance by the χ2 test are considered candidate DMRs.
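The χ2 comparison on pooled cytosine counts can be sketched with the standard library alone. The sketch uses the Pearson χ2 statistic for a 2×2 table without continuity correction, which is an assumption because the text does not say whether a correction is applied; the function name and counts are hypothetical. For one degree of freedom, the upper-tail P-value equals erfc(sqrt(χ2 / 2)).

```python
from math import erfc, sqrt


def chi_square_2x2(meth_a, unmeth_a, meth_b, unmeth_b):
    """Pearson chi-square statistic and P-value for a 2x2 count table.

    Rows are the two tissues (e.g., blood cells and placenta); columns
    are methylated and unmethylated cytosine counts.
    """
    row_a, row_b = meth_a + unmeth_a, meth_b + unmeth_b
    col_m, col_u = meth_a + meth_b, unmeth_a + unmeth_b
    n = row_a + row_b
    denom = row_a * row_b * col_m * col_u
    if denom == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (meth_a * unmeth_b - unmeth_a * meth_b) ** 2 / denom
    p_value = erfc(sqrt(chi2 / 2.0))  # upper tail, 1 degree of freedom
    return chi2, p_value


# Hypothetical pooled counts: blood 80 methylated / 20 unmethylated,
# placenta 20 methylated / 80 unmethylated.
chi2, p = chi_square_2x2(80, 20, 20, 80)
print(chi2, p, p <= 0.01)
```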

At block 1480, loci are identified where the methylation density of the maternal blood cell DNA is above a high threshold or below a low threshold. In one embodiment, loci are identified where the methylation density of the maternal blood cell DNA is either ≤ 20% or ≥ 80%. In other embodiments, bodily fluids other than maternal blood can be used, including, but not limited to, saliva, uterine or cervical lavage fluid from the female genital tract, tears, sweat, and urine.

A key to the successful development of fetal-specific DNA methylation markers in maternal plasma can be that the methylation status of the maternal blood cells is either as highly methylated or as unmethylated as possible. This can reduce (e.g., minimize) the chance of maternal DNA molecules interfering with the analysis of the placenta-derived fetal DNA molecules, which show the opposite methylation profile. Thus, in one embodiment, candidate DMRs are selected by further filtering. Candidate hypomethylated loci are those that show methylation densities ≤ 20% in the maternal blood cells and methylation densities at least 20% higher in the placental tissue. Candidate hypermethylated loci are those that show methylation densities ≥ 80% in the maternal blood cells and methylation densities at least 20% lower in the placental tissue. Other percentages can be used.

At block 1490, DMRs are then identified among the subset of loci where the placental methylation density differs significantly from the blood methylation density, by comparing the difference to a threshold. In one embodiment, the threshold is 20%, so that the methylation density differs from that of the maternal blood cells by at least 20%. Accordingly, the difference between the placental methylation density and the blood methylation density can be calculated at each identified locus. The difference can be a simple subtraction. In other embodiments, scaling factors and other functions can be used to determine the difference (e.g., the difference can be the result of a function applied to the simple subtraction).
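The filtering in blocks 1480 and 1490 can be sketched as follows. The function name and example densities are hypothetical, densities are expressed as fractions, and the 20% difference is interpreted as an absolute difference of 0.20 obtained by simple subtraction, which is one of the options described above. The return labels describe only the direction of the placental difference and are neutral names chosen for this sketch.

```python
def classify_locus(blood, placenta, low=0.20, high=0.80, min_diff=0.20):
    """Return the direction of a candidate DMR, or None if it is filtered out.

    A locus is kept when the maternal blood cells are nearly unmethylated
    (<= low) and the placenta is at least min_diff higher, or when the
    blood cells are nearly fully methylated (>= high) and the placenta
    is at least min_diff lower.
    """
    diff = placenta - blood  # simple subtraction
    if blood <= low and diff >= min_diff:
        return "placenta_higher"
    if blood >= high and -diff >= min_diff:
        return "placenta_lower"
    return None


# Hypothetical (blood, placenta) methylation densities.
for blood, placenta in [(0.10, 0.45), (0.90, 0.50), (0.50, 0.90), (0.15, 0.25)]:
    print(blood, placenta, classify_locus(blood, placenta))
```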

In one implementation, using this approach, 11,729 hypermethylated and 239,747 hypomethylated loci were identified from the first trimester placental sample. The top 100 hypermethylated loci are listed in Table S2A of the appendix. The top 100 hypomethylated loci are listed in Table S2B of the appendix. Tables S2A and S2B list the chromosome, the start and end locations, the size of the region, the methylation density in maternal blood, the methylation density in the placental sample, the P-values (which are all very small), and the methylation difference. The locations correspond to the reference genome hg18, which can be found at hgdownload.soe.ucsc.edu/goldenPath/hg18/chromosomes.

From the third trimester placental sample, 11,920 hypermethylated and 204,768 hypomethylated loci were identified. The top 100 hypermethylated loci for the third trimester are listed in Table S2C, and the top 100 hypomethylated loci are listed in Table S2D. Thirty-three loci previously reported to be differentially methylated between maternal blood cells and first trimester placental tissues were used to validate the list of first trimester candidates. Using our algorithm, 79% of the 33 loci were identified as DMRs.

FIG. 15A is a table 1500 showing the performance of the DMR identification algorithm using first trimester data with respect to 33 previously reported first trimester markers. In the table, "a" indicates that loci 1 to 15 were previously described in (RWK Chiu et al. 2007 Am J Pathol; 170:941-950 and SSC Chim et al. 2008 Clin Chem; 54:500-511); loci 16 to 23 were previously described in (KC Yuen, thesis 2007, The Chinese University of Hong Kong, Hong Kong); and loci 24 to 33 were previously described in (EA Papageorgiou et al. 2009 Am J Pathol; 174:1609-1618). "b" indicates that these data were derived from the above publications. "c" indicates that the methylation densities of the maternal blood cells and the chorionic villus sample, and their differences, were observed from the sequencing data generated in the present study but were based on the genomic coordinates provided by the original studies. "d" indicates that the data on the loci were identified using an embodiment of method 1400 on the bisulfite sequencing data, without reference to the publications cited above by Chiu et al. (2007), Chim et al. (2008), Yuen (2007) and Papageorgiou et al. (2009). The spans of the loci include the previously reported genomic regions but generally extend over larger regions. "e" indicates that candidate DMRs were classified as true positive (TP) or false negative (FN) based on the requirement of observing a difference > 0.20 between the methylation densities of the corresponding genomic coordinates of the DMRs in the maternal blood cells and the chorionic villus sample.

Figure 15B is Table 1550, showing the performance of the DMR identification algorithm using third-trimester data, compared with a placental sample obtained at delivery. "a" indicates that the same list of 33 loci as described in Figure 17A was used. "b" indicates that, because the 33 loci were previously identified from first-trimester samples, they may not be applicable to the third-trimester data. Therefore, in this study, the bisulfite sequencing data for the term placental tissue were re-examined at the genomic coordinates provided by the original studies. A difference in methylation density of >0.20 between maternal blood cells and term placental tissue was used to determine whether these loci were in fact true DMRs in the third trimester. "c" indicates that the data for these loci were identified using method 1400 on the bisulfite sequencing data, without reference to the previously cited publications of Chiu et al. (2007), Chim et al. (2008), Yuen (2007) and Papageorgiou et al. (2009). The span of each locus includes the previously reported genomic region but generally covers a larger region. "d" indicates that candidate DMRs containing loci confirmed as differentially methylated in the third trimester were classified as true positives (TP) or false negatives (FN) based on the requirement that the difference between the methylation densities of maternal blood cells and term placental tissue at the corresponding genomic coordinates of the DMR be >0.20. For loci not confirmed as differentially methylated in the third trimester, their absence from the DMR list, or the presence of a DMR containing them but showing a methylation difference <0.20, was considered a true negative (TN).

B. DMRs from maternal plasma sequencing data

It should be possible to identify placental tissue DMRs directly from maternal plasma DNA bisulfite sequencing data, provided that the fractional fetal DNA concentration of the sample is also known. This is possible because the placenta is the main source of fetal DNA in maternal plasma (SSC Chim et al. 2005 Proc Natl Acad Sci USA 102, 14753-14758), and this study showed that the methylation status of fetal-specific DNA in maternal plasma correlates with the placental methylome.

Accordingly, aspects of method 1400 can be implemented using a placental methylome deduced from the plasma methylome instead of using a placental sample. Thus, method 1000 and method 1400 can be combined to determine DMRs. Method 1000 can be used to determine predicted values of the placental methylation profile, which are then used in method 1400. For this analysis, the example again focuses on loci that are ≤20% or ≥80% methylated in maternal blood cells.

In one implementation, to deduce loci that are hypermethylated in placental tissue relative to maternal blood cells, loci are selected that show ≤20% methylation in maternal blood cells and ≥60% methylation according to the predicted value, with a difference of at least 50% between the blood cell methylation density and the predicted value. To deduce loci that are hypomethylated in placental tissue relative to maternal blood cells, loci are selected that show ≥80% methylation in maternal blood cells and ≤40% methylation according to the predicted value, with a difference of at least 50% between the blood cell methylation density and the predicted value.
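The selection criteria above can be sketched as a small Python function. This is only an illustration under stated assumptions: methylation densities are expressed as fractions between 0 and 1, the "difference of at least 50%" is read as an absolute difference of 0.50 in methylation density, and the name `classify_locus` is hypothetical rather than part of the described method.

```python
from typing import Optional


def classify_locus(blood: float, predicted: float) -> Optional[str]:
    """Classify one locus from two methylation densities (fractions in [0, 1]).

    blood:     methylation density in maternal blood cells
    predicted: placental methylation density predicted from maternal plasma
    Returns "hyper", "hypo", or None if neither set of criteria is met.
    """
    # Assumption: the 50% difference is an absolute difference in density.
    difference = abs(predicted - blood)
    if blood <= 0.20 and predicted >= 0.60 and difference >= 0.50:
        return "hyper"  # placenta hypermethylated relative to blood cells
    if blood >= 0.80 and predicted <= 0.40 and difference >= 0.50:
        return "hypo"   # placenta hypomethylated relative to blood cells
    return None
```

Under these assumptions, a locus with a blood cell density of 0.20 and a predicted density of 0.60 meets both individual thresholds but is still rejected, because the difference between the two values is only 0.40.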

Figure 16 is Table 1600, showing the number of loci predicted to be hypermethylated or hypomethylated based on direct analysis of maternal plasma bisulfite sequencing data. "N/A" means not applicable. "a" indicates that the search for hypermethylated loci started from the list of loci showing a methylation density <20% in maternal blood cells. "b" indicates that the search for hypomethylated loci started from the list of loci showing a methylation density >80% in maternal blood cells. "c" indicates that bisulfite sequencing data from the chorionic villus sample were used to verify the first-trimester maternal plasma data, and term placental tissue was used to verify the third-trimester maternal plasma data.

As shown in Table 1600, most of the non-invasively deduced loci showed the expected methylation pattern in the tissues and overlapped with the DMRs identified from the tissue data and presented in the previous section. The appendix lists the DMRs identified from plasma. Table S3A lists the top 100 loci deduced to be hypermethylated from first-trimester maternal plasma bisulfite sequencing data. Table S3B lists the top 100 loci deduced to be hypomethylated from first-trimester maternal plasma bisulfite sequencing data. Table S3C lists the top 100 loci deduced to be hypermethylated from third-trimester maternal plasma bisulfite sequencing data. Table S3D lists the top 100 loci deduced to be hypomethylated from third-trimester maternal plasma bisulfite sequencing data.

C. Gestational variation in the placental and fetal methylomes

The overall proportion of methylated CpGs was 55% in the CVS and 59% in the term placenta (Table 100 of Figure 1). More hypomethylated DMRs could be identified from the CVS than from the term placenta, while the numbers of hypermethylated DMRs were similar for the two tissues. It is therefore evident that the CVS is more hypomethylated than the term placenta. This gestation-related methylation trend was also evident in the maternal plasma data. The proportion of methylated CpGs among the fetal-specific reads was 47.0% in first-trimester maternal plasma but 53.3% in third-trimester maternal plasma. The numbers of validated hypermethylated loci were similar in the first-trimester (1,457 loci) and third-trimester (1,279 loci) maternal plasma samples, but the first-trimester sample had substantially more hypomethylated loci (21,812 loci) than the third-trimester sample (12,677 loci) (Table 1600 of Figure 16).

D. Uses of the markers

Differentially methylated markers, or DMRs, are useful in several respects. The presence of such markers in maternal plasma indicates and confirms the presence of fetal or placental DNA. This confirmation can be used as a quality control for non-invasive prenatal testing. DMRs can serve as universal fetal DNA markers in maternal plasma and have advantages over markers that rely on genotypic differences between the mother and the fetus, such as polymorphism-based markers or markers based on chromosome Y. DMRs are universal fetal markers that can be used for all pregnancies. Polymorphism-based markers are applicable only to the subset of pregnancies in which the fetus has inherited the marker from its father and the mother does not carry the marker in her genome. In addition, the fetal DNA concentration in a maternal plasma sample can be measured by quantifying the DNA molecules derived from those DMRs. By knowing the DMR profile expected for normal pregnancies, pregnancy-associated complications, particularly those involving placental tissue changes, can be detected by observing a deviation of the maternal plasma DMR profile or methylation profile from the profile expected for normal pregnancies. Pregnancy-associated complications that involve placental tissue changes include, but are not limited to, fetal chromosomal aneuploidies. Examples include trisomy 21, preeclampsia, intrauterine growth retardation and preterm labor.

E. Kits using the markers

Embodiments can provide compositions and kits for practicing the methods described herein, as well as other applicable methods. Kits can be used to carry out assays for analyzing fetal DNA, for example cell-free fetal DNA, in maternal plasma. In one embodiment, a kit can include at least one oligonucleotide suitable for specific hybridization with one or more of the loci identified herein. A kit can also include at least one oligonucleotide suitable for specific hybridization with one or more reference loci. In one embodiment, placental hypermethylated markers are measured. The test locus may be methylated DNA in maternal plasma, and the reference locus may be methylated DNA in maternal plasma. Similar kits can be formed for analyzing tumor DNA in plasma.

In some cases, a kit can include at least two oligonucleotide primers that can be used to amplify at least a portion of a target locus (for example, a locus in the appendix) and of a reference locus. Instead of, or in addition to, primers, a kit can include labeled probes for detecting DNA fragments corresponding to the target locus and the reference locus. In various embodiments, one or more oligonucleotides of the kit correspond to loci in the tables of the appendix. Typically, a kit also provides an instruction manual to guide users in analyzing test samples and assessing the physiological or pathological state of the test subject.

In various embodiments, a kit is provided for analyzing fetal DNA in a biological sample containing a mixture of fetal DNA and DNA from a female subject pregnant with the fetus. The kit can comprise one or more oligonucleotides for specifically hybridizing to at least a portion of a genomic region listed in Tables S2A, S2B, S2C, S2D, S3A, S3B, S3C and S3D. Thus, any number of oligonucleotides can be used, whether drawn from across the tables or from just one table. The oligonucleotides can act as primers and can be organized as pairs of primers, where a pair corresponds to a particular region from the tables.

VI. Relationship between size and methylation density

Plasma DNA molecules are known to exist in the circulation as short molecules, the majority of which are about 160 bp in length (YMD Lo et al. 2010 Sci Transl Med; 2:61ra91; YW Zheng et al. 2012 Clin Chem; 58:549-558). Interestingly, the data revealed a relationship between the methylation status and the size of plasma DNA molecules. Thus, plasma DNA fragment length is linked to DNA methylation level. The characteristic size profile of plasma DNA molecules suggests that the majority are associated with mononucleosomes, possibly derived from enzymatic degradation during apoptosis.

Circulating DNA is fragmented in nature. In particular, circulating fetal DNA is shorter than maternally derived DNA in maternal plasma samples (KCA Chan et al. 2004 Clin Chem; 50:88-92). Because paired-end alignment enables size analysis of bisulfite-treated DNA, it is possible to assess directly whether any correlation exists between the size of plasma DNA molecules and their respective methylation levels. This was explored in maternal plasma as well as in a control plasma sample from a non-pregnant adult female.

In this study, each sample was analyzed by paired-end sequencing of both ends of each DNA molecule (which includes sequencing the entire molecule). By aligning the pair of end sequences of each DNA molecule to the reference human genome and noting the genomic coordinates of the outermost ends of the sequenced reads, the length of the sequenced DNA molecule can be determined. Plasma DNA molecules are naturally fragmented into small molecules, and sequencing libraries for plasma DNA are typically prepared without any fragmentation step. Hence, the lengths deduced from sequencing represent the sizes of the original plasma DNA molecules.
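As a minimal sketch of the length calculation described above, the following Python function derives a fragment length from the aligned coordinates of the two reads of a pair. It assumes that both reads align to the same chromosome and that coordinates are 1-based and inclusive; the function name and argument layout are hypothetical and do not correspond to any particular aligner's output format.

```python
def fragment_length(read1_start: int, read1_end: int,
                    read2_start: int, read2_end: int) -> int:
    """Length in bp of the original molecule, from paired-end alignments.

    Coordinates are assumed to be 1-based, inclusive positions on the
    same chromosome of the reference genome.
    """
    leftmost = min(read1_start, read2_start)   # outermost 5' coordinate
    rightmost = max(read1_end, read2_end)      # outermost 3' coordinate
    return rightmost - leftmost + 1


# Hypothetical pair: two 50-bp reads bracketing one plasma DNA molecule.
length = fragment_length(1000, 1049, 1117, 1166)
```

With the hypothetical coordinates shown, the outermost positions are 1000 and 1166, so the inferred length is 167 bp.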

In a previous study, the size profiles of fetal and maternal DNA molecules in maternal plasma were determined (YMD Lo et al. 2010 Sci Transl Med; 2:61ra91). That study showed that plasma DNA molecules have sizes resembling mononucleosomes and that fetal DNA molecules are shorter than maternal DNA molecules. In this study, the relationship between the methylation status of plasma DNA molecules and their sizes was determined.

A. Results

Figure 17A is plot 1700, showing the size distributions of maternal plasma, non-pregnant female control plasma, placental and peripheral blood DNA. For the maternal sample and the non-pregnant female control plasma, the two bisulfite-treated plasma samples showed the same characteristic size distribution as previously reported (YMD Lo et al. 2010 Sci Transl Med; 2:61ra91), with the most abundant total sequences 166-167 bp in length and a 10-bp periodicity among DNA molecules shorter than 143 bp.

Figure 17B is plot 1750 of the size distributions and methylation profiles of maternal plasma, adult female control plasma, placental tissue and adult female control blood. For DNA molecules of the same size that contained at least one CpG site, the mean methylation density was calculated. The relationship between the size of the DNA molecules and their methylation density was then plotted. Specifically, for sequenced reads covering at least one CpG site, the mean methylation density was determined for each fragment length from 50 bp up to 180 bp. Interestingly, the methylation density increased with plasma DNA size and peaked at around 166-167 bp. This pattern, however, was not observed in the placental and control blood DNA samples, which were fragmented using a sonicator system.
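The per-size calculation described above can be sketched as follows. This is an illustration only: the input is assumed to be a list of `(size_bp, methylated_cpgs, total_cpgs)` tuples, one per sequenced fragment, and the mean methylation density for a size is assumed to be the pooled ratio of methylated CpG calls to all CpG calls among fragments of that size. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def methylation_by_size(fragments: Iterable[Tuple[int, int, int]],
                        min_size: int = 50,
                        max_size: int = 180) -> Dict[int, float]:
    """Mean methylation density for each fragment size in [min_size, max_size].

    Each fragment is (size_bp, methylated_cpgs, total_cpgs). Fragments
    covering no CpG site, or outside the size range, are skipped.
    """
    methylated = defaultdict(int)
    total = defaultdict(int)
    for size, meth_cpgs, all_cpgs in fragments:
        if all_cpgs < 1 or not (min_size <= size <= max_size):
            continue
        methylated[size] += meth_cpgs
        total[size] += all_cpgs
    # Assumption: density is pooled over all CpG calls at a given size.
    return {size: methylated[size] / total[size] for size in sorted(total)}
```

Plotting the returned values against fragment size would give a curve of the kind described for Figure 17B. Averaging per-fragment densities instead of pooling CpG calls is an alternative that weights short and long fragments differently.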

Figure 18 shows plots of the methylation densities and sizes of plasma DNA molecules. Figure 18A is plot 1800 for first-trimester maternal plasma. Figure 18B is plot 1850 for third-trimester maternal plasma. Data for all sequenced reads that covered at least one CpG site are represented by blue curve 1805. Data for reads that also contained a fetal-specific SNP allele are represented by red curve 1810. Data for reads that also contained a maternal-specific SNP allele are represented by green curve 1815.

Reads that contained a fetal-specific SNP allele were considered to be derived from fetal DNA molecules. Reads that contained a maternal-specific SNP allele were considered to be derived from maternal DNA molecules. In general, DNA molecules with high methylation densities were longer. This trend was present in both the fetal and the maternal DNA molecules, in both the first and the third trimester. Overall, the fetal DNA molecules were shorter than the maternal DNA molecules, as previously reported.

Figure 19A shows plot 1900 of the methylation densities and sizes of sequenced reads for an adult non-pregnant female. The plasma DNA sample from the adult non-pregnant female also showed the same relationship between the size and the methylation status of the DNA molecules. In contrast, the genomic DNA samples were fragmented by an ultrasonication step before MPS analysis. As shown in plot 1900, the data from the blood cell and placental tissue samples did not show the same trend. Because the fragmentation of the cellular DNA is artificial, no relationship between size and density would be expected. Because the naturally fragmented DNA molecules in plasma do show a dependence on size, it can be hypothesized that a lower methylation density makes molecules more likely to break into smaller fragments.

Figure 19B is plot 1950, showing the size distributions and methylation profiles of fetal-specific and maternal-specific DNA molecules in maternal plasma. The fetal-specific and maternal-specific plasma DNA molecules also showed the same correlation between fragment size and methylation level. The fragment lengths of both placenta-derived and maternal circulating cell-free DNA increased with methylation level. Furthermore, the distributions of their methylation status did not overlap each other, suggesting that the phenomenon exists irrespective of the original fragment lengths of the sources of the circulating DNA molecules.

B. Method

Accordingly, the size distribution can be used to estimate the overall methylation percentage of a plasma sample. This methylation measurement can then be tracked during pregnancy, during cancer monitoring, or during treatment by serial measurement of the size distribution of plasma DNA, according to the relationships shown in Figures 18A and 18B. The methylation measurement can also be used to look for increased or decreased release of DNA from an organ or tissue of interest. For example, one can look specifically for DNA methylation signatures that are specific to a particular organ (for example, the liver) and measure the concentrations of these signatures in plasma. Because DNA is released into plasma when cells die, an increase in level could mean an increase in cell death or damage in that particular organ or tissue. A decrease in the level from a particular organ could mean that a treatment to counter damage or pathological processes in that organ is under control.

Figure 20 is a flowchart of method 2000 for estimating a methylation level of DNA in a biological sample of an organism according to embodiments of the present invention. The methylation level can be estimated for a particular region of the genome or for the entire genome. If a specific region is desired, then only DNA fragments from that region may be used.

At block 2010, the amounts of DNA fragments corresponding to various sizes are measured. For each size of a plurality of sizes, an amount of a plurality of DNA fragments from the biological sample corresponding to that size can be measured. For example, the number of DNA fragments having a length of 140 bases can be measured. The amounts can be saved as a histogram. In one embodiment, the size of each of a plurality of nucleic acids from the biological sample is measured, which can be done individually (for example, by single-molecule sequencing of the whole molecule or of just the ends of the molecule) or in groups (for example, by electrophoresis). The sizes can correspond to a range. Thus, an amount can be for DNA fragments whose sizes fall within a particular range. When paired-end sequencing is performed, the DNA fragments that align to a particular region (as determined by the paired sequence reads) can be used to determine the methylation level of that region.

At block 2020, a first value of a first parameter is calculated based on the amounts of DNA fragments at the multiple sizes. In one aspect, the first parameter provides a statistical measure of the size profile (for example, a histogram) of the DNA fragments in the biological sample. The parameter can be referred to as a size parameter because it is determined from the sizes of the plurality of DNA fragments.

The first parameter can take various forms. One such parameter is the percentage of DNA fragments of a particular size or size range relative to all DNA fragments, or relative to DNA fragments of another size or range. Such a parameter is the number of DNA fragments at a particular size divided by the total number of fragments, which can be obtained from a histogram (any data structure providing absolute or relative counts of fragments at particular sizes). As another example, a parameter can be the number of fragments at a particular size or within a particular range divided by the number of fragments of another size or range. The division acts as a normalization that accounts for the different numbers of DNA fragments analyzed for different samples. Normalization can also be accomplished by analyzing the same number of DNA fragments for each sample, which effectively gives the same result as dividing by the total number of fragments analyzed. Additional examples of parameters and of size analysis can be found in U.S. Patent Application 13/789,553, which is incorporated by reference for all purposes.
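The two forms of size parameter described above can be illustrated with a short Python sketch. It assumes the histogram is a dictionary mapping fragment size in bp to a fragment count; the function names, the size ranges and the example counts are all hypothetical.

```python
from typing import Dict, Tuple


def count_in_range(histogram: Dict[int, int], lo: int, hi: int) -> int:
    """Number of fragments whose size lies in [lo, hi] (inclusive)."""
    return sum(n for size, n in histogram.items() if lo <= size <= hi)


def size_fraction(histogram: Dict[int, int], lo: int, hi: int) -> float:
    """Fragments in [lo, hi] divided by the total number of fragments."""
    total = sum(histogram.values())
    if total == 0:
        raise ValueError("histogram contains no fragments")
    return count_in_range(histogram, lo, hi) / total


def size_ratio(histogram: Dict[int, int],
               range_a: Tuple[int, int],
               range_b: Tuple[int, int]) -> float:
    """Fragments in range_a divided by fragments in range_b."""
    denominator = count_in_range(histogram, *range_b)
    if denominator == 0:
        raise ValueError("no fragments in the reference size range")
    return count_in_range(histogram, *range_a) / denominator


# Hypothetical counts: size in bp -> number of fragments observed.
example = {140: 20, 150: 30, 166: 50}
short_fraction = size_fraction(example, 100, 150)
short_to_long = size_ratio(example, (100, 150), (160, 170))
```

Because both functions divide one count by another, the resulting values are comparable across samples with different sequencing depths, which is the normalization role of the division described above.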

At block 2030, the first size value is compared to a reference size value. The reference size value can be calculated from DNA fragments of a reference sample. To determine the reference size value, the methylation profile of the reference sample can be calculated and quantified, along with the value of the first size parameter for that sample. Thus, when the first size value is compared to the reference size value, a methylation level can be determined.

At block 2040, the methylation level is estimated based on the comparison. In one embodiment, it can be determined whether the first value of the first parameter is above or below the reference size value, and thereby whether the methylation level of the present sample is above or below the methylation level associated with the reference size value. In another embodiment, the comparison is accomplished by inputting the first value into a calibration function. The calibration function effectively compares the first value to calibration values (a set of reference size values) by identifying the point on a curve corresponding to the first value. The estimated methylation level is then provided as the output value of the calibration function.

Accordingly, a size parameter can be calibrated to a methylation level. For example, a methylation level can be measured and associated with a particular size parameter for a sample. Data points from various samples can then be fitted to a calibration function. In one implementation, different calibration functions can be used for different subsets of DNA. Thus, there may be several forms of calibration based on prior knowledge of the relationship between methylation and size for a particular subset of DNA. For example, the calibrations for fetal and maternal DNA could be different.
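One simple way to picture such a calibration is an ordinary least-squares line fitted to pairs of (size parameter, measured methylation level) from reference samples. The sketch below is an assumption for illustration only: the text does not specify the functional form of the calibration function, the linear model is chosen for simplicity, and the function name and example values are hypothetical rather than measured data.

```python
from typing import Callable, Sequence


def fit_calibration(size_values: Sequence[float],
                    methylation_levels: Sequence[float]
                    ) -> Callable[[float], float]:
    """Fit methylation = slope * size_value + intercept by least squares.

    Returns a calibration function that maps a size parameter value to
    an estimated methylation level.
    """
    n = len(size_values)
    if n < 2 or n != len(methylation_levels):
        raise ValueError("need at least two paired calibration points")
    mean_x = sum(size_values) / n
    mean_y = sum(methylation_levels) / n
    sxx = sum((x - mean_x) ** 2 for x in size_values)
    if sxx == 0:
        raise ValueError("size values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(size_values, methylation_levels))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda value: slope * value + intercept


# Hypothetical calibration points: fraction of short fragments versus
# measured methylation level in three reference samples.
calibrate = fit_calibration([0.2, 0.3, 0.4], [0.8, 0.7, 0.6])
estimated_level = calibrate(0.25)
```

Separate calibration functions could be fitted in the same way for different subsets of DNA, for example one from fetal-specific reads and one from maternal-specific reads.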

As shown above, the placenta is more hypomethylated than maternal blood, and fetal DNA is therefore smaller owing to its lower methylation. Accordingly, the mean size (or another statistical value) of the fragments of a sample can be used to estimate the methylation density. Because fragment sizes can be measured using paired-end sequencing, rather than the potentially more technically complex methylation-aware sequencing, this approach could be cost-effective if used clinically. The approach can be used to monitor the methylation changes associated with the progress of pregnancy or with pregnancy-associated disorders such as preeclampsia, preterm labor and fetal disorders (for example, those caused by chromosomal or genetic abnormalities, or intrauterine growth retardation).

In another embodiment, this approach can be used for detecting and monitoring cancer. For example, with successful treatment of a cancer, the methylation profile in plasma or another bodily fluid, as measured using this size-based approach, would change towards that of healthy individuals without cancer. Conversely, in the event that the cancer is progressing, the methylation profile in plasma or another bodily fluid would diverge from that of healthy individuals without cancer.

In summary, hypomethylated molecules in plasma are shorter than hypermethylated ones. The same trend was observed in both the fetal and the maternal DNA molecules. Because DNA methylation is known to influence nucleosome packing, the data suggest that hypomethylated DNA molecules may be less densely packed with histones and therefore more susceptible to enzymatic degradation. On the other hand, the data presented in Figures 18A and 18B also show that, although fetal DNA is much more hypomethylated than the maternal reads, the size distributions of fetal and maternal DNA are not completely separated from each other. In Figure 19B, it can be seen that, even for the same size category, the methylation levels of the fetal-specific and maternal-specific reads differ from each other. This observation suggests that the hypomethylated state of fetal DNA is not the only factor accounting for its relative shortness with respect to maternal DNA.

VII. Imprinting status of loci

Fetal-derived DNA molecules that share the same genotype as the mother but carry a different epigenetic signature can be detected in maternal plasma (LLM Poon et al. 2002 Clin Chem; 48:35-41). To show that the sequencing approach is sensitive in picking up fetal-derived DNA molecules in maternal plasma, the same strategy was applied to detect imprinted fetal alleles in a maternal plasma sample. Two genomic imprinted regions were identified: H19 (chr11:1,977,419-1,977,821, NCBI Build36/hg18) and MEST (chr7:129,917,976-129,920,347, NCBI Build36/hg18). Both contained SNPs informative for distinguishing maternal from fetal sequences. For H19, a maternally expressed gene, the mother was homozygous (A/A) and the fetus was heterozygous (A/C) for SNP rs2071094 (chr11:1,977,740) in the region. One of the maternal A alleles was fully methylated and the other was unmethylated. In the placenta, however, the A allele was unmethylated while the paternally inherited C allele was fully methylated. Two methylated reads with the C genotype, corresponding to the imprinted paternal allele derived from the placenta, were detected in maternal plasma.

MEST, also known as PEG1, is a paternally expressed gene. For SNP rs2301335 (chr7:129,920,062) within the imprinted locus, both the mother and the fetus are heterozygous (A/G). In maternal blood, the G allele is methylated and the A allele is unmethylated. The methylation pattern is reversed in the placenta, where the maternal A allele is methylated and the paternal G allele is unmethylated. Three unmethylated G alleles, which are of paternal origin, were detectable in maternal plasma. In contrast, VAV1, a non-imprinted locus on chromosome 19 (chr19:6,723,621-6,724,121), did not show any allelic methylation pattern in the tissue or plasma DNA samples.

Methylation status can therefore be used to determine which DNA fragments come from the fetus. For example, when the mother is GA heterozygous, merely detecting the A allele in maternal plasma cannot serve as a fetal marker. But if the methylation status of the A molecules in plasma is distinguished, then methylated A molecules are fetal-specific whereas unmethylated A molecules are maternal-specific, or vice versa.

Next, attention was focused on loci that have been reported to show genomic imprinting in placental tissue. From the list of loci reported by Woodfine et al. (2011 Epigenetics Chromatin; 4:1), those containing SNPs within the imprinting control region were further selected. Four loci met the criteria: H19, KCNQ1OT1, MEST and NESP.

For H19 and KCNQ1OT1, the maternal reads in the maternal blood cell sample were homozygous for the SNP, with approximately equal proportions of methylated and unmethylated reads. The CVS and term placental tissue samples revealed that the fetus was heterozygous at both loci and that each allele was either exclusively methylated or exclusively unmethylated, i.e., showing monoallelic methylation. In the maternal plasma sample, the paternally inherited fetal DNA molecules were detected for both loci. For H19, the paternally inherited molecules were represented by sequenced reads that contained the fetal-specific allele and were methylated. For KCNQ1OT1, the paternally inherited molecules were represented by sequenced reads that contained the fetal-specific allele and were unmethylated.

For MEST and NESP, on the other hand, the mother was heterozygous. For MEST, both the mother and the fetus were GA heterozygous at the SNP. However, as is evident from the Watson-strand data for the maternal blood cells and the placental tissue, the methylation status of the CpGs adjacent to the SNP was opposite in the mother and the fetus. The A allele was unmethylated in the mother's DNA but methylated in the fetus's DNA. For MEST, the maternal allele is methylated. It can therefore be deduced that the fetus inherited the A allele from its mother (methylated in the CVS) and that the mother inherited her A allele from her father (unmethylated in the maternal blood cells). Interestingly, all four groups of molecules could be readily distinguished in the maternal plasma sample, namely each of the two alleles of the mother and each of the two alleles of the fetus. Thus, by combining the genotype information with the methylation status at imprinted loci, the maternally inherited fetal DNA molecules can be readily distinguished from the background maternal DNA molecules (LLM Poon et al. 2002 Clin Chem; 48:35-41).
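The four groups of MEST molecules described above can be written as a lookup from (allele, methylation state) to origin. The following Python sketch is illustrative only: the table encodes the MEST case as described in the text, while the function name, the read representation and the example reads are assumptions introduced here and are not data from the study.

```python
from collections import Counter

# Origin of each (allele, is_methylated) combination at the MEST SNP for the
# case described in the text: in maternal blood cells G is methylated and A is
# unmethylated; in the placenta the maternally inherited A is methylated and
# the paternally inherited G is unmethylated.
MEST_ORIGIN = {
    ("A", False): "maternal background, A allele",
    ("G", True): "maternal background, G allele",
    ("A", True): "fetal, maternally inherited",
    ("G", False): "fetal, paternally inherited",
}


def classify_mest_reads(reads):
    """Count plasma reads per origin group.

    reads: iterable of (allele, is_methylated) tuples (hypothetical format).
    """
    return Counter(MEST_ORIGIN[(allele, meth)] for allele, meth in reads)


# Hypothetical reads, for illustration only.
example = [("G", False)] * 3 + [("A", True), ("A", False), ("G", True)]
counts = classify_mest_reads(example)
```

Such a table is specific to one locus and one family; the assignment of methylation states to parental origin would have to be established from the tissue data for each case.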

This approach can be used to detect uniparental disomy. For example, if the father of this fetus is known to be homozygous for the G allele, failure to detect the unmethylated G allele in maternal plasma indicates the lack of a contribution from the paternal allele. In such a situation, detecting both the methylated G allele and the methylated A allele in the plasma of this pregnancy would suggest that the fetus has maternal heterodisomy, i.e., it has inherited two different alleles from the mother and none from the father. Alternatively, if both the methylated A allele (the fetal allele inherited from the mother) and the unmethylated A allele (the maternal allele inherited from the maternal grandfather) are detected in maternal plasma without the unmethylated G allele (the paternal allele that the fetus should have inherited), this would suggest that the fetus has maternal isodisomy, i.e., it has inherited two identical alleles from the mother and none from the father.

For NESP, the mother was GA heterozygous at the SNP whereas the fetus was homozygous for the G allele. For NESP, the paternal allele is methylated. In the maternal plasma sample, the methylated, paternally inherited fetal G allele could be readily distinguished from the unmethylated background maternal G allele.

VIII. Cancer/Donor

Some embodiments can be used for the detection, screening, monitoring (e.g., for relapse, remission, or response (e.g., presence or absence) to treatment), staging, classification (e.g., to aid in choosing the most appropriate treatment modality) and prognostication of cancer using methylation analysis of circulating plasma/serum DNA.

Cancer DNA is known to exhibit aberrant DNA methylation (JG Herman et al. 2003 N Engl J Med; 349:2042-2054). For example, compared with non-cancer cells, the CpG island promoters of genes such as tumor suppressor genes are hypermethylated, while CpG sites in the gene bodies are hypomethylated. Provided that the methylation profile of the cancer cells is reflected in the methylation profile of the tumor-derived plasma DNA molecules obtained with the methods described herein, the overall methylation profile in plasma is expected to differ between individuals with cancer and healthy individuals without cancer, or individuals whose cancer has been cured. The differences in methylation profile can take the form of quantitative differences in the methylation density of the genome and/or of genomic segments. For example, because DNA from cancer tissues is globally hypomethylated (Gama-Sosa MA et al. 1983 Nucleic Acids Res; 11:6883-6894), a reduction in the methylation density of the plasma methylome, or of genomic segments, would be observed in the plasma of cancer patients.

Qualitative changes in the methylation profile should also be reflected in plasma methylome data. For example, plasma DNA molecules originating from genes that are hypermethylated only in cancer cells would show hypermethylation in the plasma of a cancer patient when compared with plasma DNA molecules originating from the same genes in a sample from a healthy control. Because aberrant methylation occurs in most cancers, the methods described herein can be applied to the detection of all forms of malignancy with aberrant methylation, for example (but not limited to) malignancies of the lung, breast, colorectum, prostate, nasopharynx, stomach, testes, skin, nervous system, bone, ovary, liver, hematologic tissues, pancreas, uterus, kidney, bladder, lymphoid tissues, etc. The malignancies may be of a variety of histological subtypes, for example carcinoma, adenocarcinoma, sarcoma, fibroadenocarcinoma, neuroendocrine, and undifferentiated.

On the other hand, tumor-derived DNA molecules are expected to be distinguishable from the background non-tumor-derived DNA molecules, because the overall short size profile of tumor-derived DNA would be accentuated for DNA molecules originating from loci with tumor-associated aberrant hypomethylation, which would have an additional effect on the size of the DNA molecules. In addition, tumor-derived plasma DNA molecules can be distinguished from the background non-tumor-derived plasma DNA molecules using multiple characteristic features associated with tumor DNA, including (but not limited to) single nucleotide variants, copy number gains and losses, translocations, inversions, aberrant hyper- or hypomethylation, and size profiling. Because all of these changes can occur independently, the combined use of these features may provide an additive advantage for the sensitive and specific detection of cancer DNA in plasma.

A. Size and cancer

The sizes of tumor-derived DNA molecules in plasma also resemble the size of mononucleosomal units and are shorter than the background non-tumor-derived DNA molecules that coexist in the plasma of cancer patients. Size parameters have been shown to correlate with cancer, as described in U.S. Patent Application 13/789,553, which is incorporated by reference for all purposes.

Because both fetal-derived and maternally derived DNA in plasma show a relationship between molecular size and methylation status, tumor-derived DNA molecules are expected to exhibit the same trend. For example, in the plasma of cancer patients or of individuals being screened for cancer, hypomethylated molecules would be shorter than hypermethylated ones.

B. Methylation density of different tissues in a cancer patient

In this example, the plasma and tissue samples of a hepatocellular carcinoma (HCC) patient were analyzed. Blood samples were collected from the HCC patient before and one week after surgical resection of the tumor. Plasma and buffy coat were harvested after centrifugation of the blood samples. The resected tumor and the adjacent non-tumor liver tissue were collected. DNA extracted from the plasma and tissue samples was analyzed by massively parallel sequencing with and without prior bisulfite treatment. Plasma DNA from four healthy individuals without cancer was also analyzed as controls. Bisulfite treatment of a DNA sample converts unmethylated cytosine residues to uracil. In the downstream polymerase chain reaction and sequencing, these uracil residues behave as thymidine. Bisulfite treatment does not, on the other hand, convert methylated cytosine residues to uracil. After massively parallel sequencing, the sequencing reads were analyzed with Methy-Pipe (P Jiang, et al. Methy-Pipe: An integrated bioinformatics data analysis pipeline for whole genome methylome analysis, paper presented at the IEEE International Conference on Bioinformatics and Biomedicine Workshops, Hong Kong, 18 to 21 December 2010) to determine the methylation status of the cytosine residues at all CG dinucleotide positions, i.e., CpG sites.
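The principle of reading methylation status from bisulfite-converted reads can be sketched as follows. This is a minimal Python illustration, not the Methy-Pipe implementation: it assumes a forward-strand read aligned to the reference without gaps and complete bisulfite conversion, and the function name and the example sequences are hypothetical.

```python
def call_cpg_methylation(reference, read):
    """Call methylation at each reference CpG covered by an aligned read.

    Assumes (for illustration) a gapless forward-strand alignment of equal
    length. At a reference CpG cytosine, a read C means the cytosine was
    protected from conversion (methylated); a read T means it was converted
    (unmethylated). Any other base is treated as uninformative and skipped.
    Returns a list of (position, is_methylated) tuples.
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must have the same length")
    calls = []
    for i in range(len(reference) - 1):
        if reference[i:i + 2] == "CG":
            if read[i] == "C":
                calls.append((i, True))
            elif read[i] == "T":
                calls.append((i, False))
    return calls


# Hypothetical example: the first CpG is retained as C, the second reads as T.
calls = call_cpg_methylation("ACGTTCGA", "ACGTTTGA")
```

Reads from the opposite strand would need the complementary rule (G retained versus G read as A), which is omitted here for brevity.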

Figure 21A is a table 2100 showing the methylation densities of the pre-operative plasma and tissue samples of the HCC patient. The CpG methylation density for a region of interest (e.g., CpG sites, promoters, or repeat regions) refers to the proportion of reads showing CpG methylation among all reads covering genomic CpG dinucleotides. The methylation densities of the buffy coat and the non-tumor liver tissue were similar. Based on data from all autosomes, the overall methylation density of the tumor tissue was 25% lower than those of the buffy coat and the non-tumor liver tissue. The hypomethylation was consistent across each individual chromosome. The methylation density of the plasma was between the values of the non-malignant tissues and the cancer tissue. This observation is consistent with the fact that both cancer and non-cancer tissues contribute to the circulating DNA of a cancer patient. The hematopoietic system has been shown to be the main source of circulating DNA in individuals without an active malignant condition (YYN Lui, et al. 2002 Clin Chem; 48:421-7). Plasma samples obtained from four healthy controls were therefore also analyzed. The number of sequence reads and the sequencing depth achieved for each sample are shown in table 2150 of Figure 21B.

Figure 22 is a table 2200 showing that the methylation densities of the autosomes in the plasma samples of the healthy controls ranged from 71.2% to 72.5%. These data show the expected level of DNA methylation in plasma samples obtained from individuals without a source of tumor DNA. In a cancer patient, the tumor tissue also releases DNA into the circulation (KCA Chan et al. 2013 Clin Chem; 59:211-224; Leary et al. 2012 Sci Transl Med; 4:162ra154). Because of the hypomethylated nature of the HCC tumor, the presence of both tumor-derived and non-tumor-derived DNA in the patient's pre-operative plasma resulted in a reduced methylation density compared with the plasma of the healthy controls. Indeed, the methylation density of the pre-operative plasma sample was between that of the tumor tissue and that of the plasma of the healthy controls. This is because the methylation level of the plasma DNA of a cancer patient is influenced by the degree of aberrant methylation (hypomethylation in this case) of the tumor tissue and by the fractional concentration of tumor-derived DNA in the circulation. A lower methylation density of the tumor tissue and a higher fractional concentration of tumor-derived DNA in the circulation both lead to a lower methylation density of the plasma DNA in a cancer patient. Most tumors have been reported to show global hypomethylation (JG Herman et al. 2003 N Engl J Med; 349:2042-2054; Gama-Sosa MA et al. 1983 Nucleic Acids Res; 11:6883-6894). The current observations in the HCC samples should therefore also apply to other types of tumors.

In one embodiment, the methylation density of plasma DNA can be used to determine the fractional concentration of tumor-derived DNA in a plasma/serum sample when the methylation level of the tumor tissue is known. The methylation level of the tumor tissue, e.g., its methylation density, can be obtained if a tumor sample or a tumor biopsy is available. In another embodiment, information on the methylation level of the tumor tissue can be obtained from a survey of the methylation levels in a group of tumors of a similar type, and this information (e.g., a mean or median level) is applied to the patient to be analyzed with the techniques described in this invention. The methylation level of the tumor tissue can thus be determined by analysis of the patient's own tumor tissue or inferred from the analysis of tumor tissues of other patients with the same or a similar cancer type. The methylation of tumor tissues can be determined using a range of methylation-aware platforms, including (but not limited to) massively parallel sequencing, single-molecule sequencing, microarrays (e.g., oligonucleotide arrays), or mass spectrometry (e.g., the Epityper analysis of Sequenom, Inc.). In some embodiments, such analyses may be preceded by procedures that are sensitive to the methylation status of DNA molecules, including (but not limited to) cytosine immunoprecipitation and methylation-aware restriction enzyme digestion. When the methylation level of the tumor is known, the fractional concentration of tumor DNA in the plasma of a cancer patient can be calculated after plasma methylome analysis.

The relationship between the plasma methylation level, P, the fractional tumor DNA concentration, f, and the tumor tissue methylation level, TUM, can be described as P = BKG × (1-f) + TUM × f, where BKG is the background DNA methylation level in plasma derived from blood cells and other internal organs. For example, the overall methylation density of all autosomes in the tumor biopsy tissue obtained from this HCC patient was 42.9%, which is the TUM value for this case. The mean methylation density of the plasma samples from the four healthy controls was 71.6%, which is the BKG value for this case. The plasma methylation density of the pre-operative plasma was 59.7%. Rearranging the relationship gives f = (BKG-P)/(BKG-TUM) = (71.6-59.7)/(71.6-42.9), so f is estimated to be 41.5%.
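The estimate of f follows from rearranging the mixture relationship. A minimal Python sketch, in which the function and argument names are illustrative assumptions and the numbers are the ones quoted in the text:

```python
def tumor_fraction(p, bkg, tum):
    """Solve P = BKG*(1 - f) + TUM*f for f.

    p   : plasma methylation density of the test sample
    bkg : background plasma methylation density (e.g., mean of controls)
    tum : tumor tissue methylation density
    All three must be on the same scale (here, percent).
    """
    if bkg == tum:
        raise ValueError("f is undetermined when BKG equals TUM")
    return (bkg - p) / (bkg - tum)


# Values from the HCC case described in the text.
f = tumor_fraction(p=59.7, bkg=71.6, tum=42.9)
print(f"{f:.1%}")  # 41.5%
```

The estimate becomes unstable as TUM approaches BKG, because the denominator shrinks; the sketch only guards against exact equality.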

In another embodiment, the methylation level of the tumor tissue can be estimated noninvasively from the plasma methylome data when the fractional concentration of tumor-derived DNA in the plasma sample is known. The fractional concentration of tumor-derived DNA in the plasma sample can be determined by other genetic analyses, for example the genomewide analysis of allelic loss (GAAL) and the analysis of single nucleotide mutations as previously described (U.S. Patent Application 13/308,473; KCA Chan et al. 2013 Clin Chem; 59:211-24). The calculation is based on the same relationship as described above, except that in this embodiment the value of f is known and the value of TUM becomes the unknown. The deduction can be performed for the whole genome or for parts of the genome, similar to what was observed when determining the placental tissue methylation level from maternal plasma data.
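With f known, the same relationship P = BKG × (1-f) + TUM × f can be solved for TUM instead. The Python sketch below is illustrative; the function name is an assumption, and the example simply feeds back the f implied by the values quoted earlier for the HCC case, so it is a consistency check rather than an independent result.

```python
def deduce_tumor_methylation(p, bkg, f):
    """Solve P = BKG*(1 - f) + TUM*f for TUM.

    p, bkg : plasma and background methylation densities (same scale)
    f      : fractional tumor DNA concentration, 0 < f <= 1
    """
    if not 0 < f <= 1:
        raise ValueError("f must be in (0, 1]")
    return (p - bkg * (1 - f)) / f


# Round-trip check with the values quoted for the HCC case.
f_known = (71.6 - 59.7) / (71.6 - 42.9)
tum = deduce_tumor_methylation(p=59.7, bkg=71.6, f=f_known)
```

The same function could be applied region by region, provided that P and BKG are measured for the same region; a small f amplifies measurement noise in P and BKG, since both are divided by f.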

In another embodiment, the inter-region variation or profile of methylation densities can be used to differentiate individuals with cancer from those without. The resolution of the methylation analysis can be further increased by dividing the genome into regions of a particular size, e.g., 1 Mb. In such an embodiment, the methylation density of each 1 Mb region was calculated for the collected samples, e.g., the buffy coat, the resected HCC tissue, the non-tumor liver tissue adjacent to the tumor, and the plasma collected before and after tumor resection. In another embodiment, the region sizes need not be kept constant. In one implementation, the number of CpG sites within each region is kept constant while the size of the region itself can vary.
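The two ways of defining regions, fixed genomic size or fixed number of CpG sites, can be sketched as follows. This Python example is a minimal illustration for a single chromosome; the input format (position, methylated read count, unmethylated read count), the function names and the example counts are assumptions introduced here.

```python
from collections import defaultdict


def density_by_fixed_size(sites, bin_size=1_000_000):
    """Methylation density per fixed-size bin.

    sites: iterable of (position, methylated_reads, unmethylated_reads).
    Returns {bin_index: density}, with density = M / (M + U).
    """
    totals = defaultdict(lambda: [0, 0])
    for pos, m, u in sites:
        totals[pos // bin_size][0] += m
        totals[pos // bin_size][1] += u
    return {b: m / (m + u) for b, (m, u) in sorted(totals.items()) if m + u > 0}


def density_by_fixed_cpg_count(sites, cpgs_per_bin):
    """Methylation density per bin holding a fixed number of CpG sites.

    Returns a list of (first_position, last_position, density); the last bin
    may hold fewer sites than cpgs_per_bin.
    """
    ordered = sorted(sites)
    bins = []
    for i in range(0, len(ordered), cpgs_per_bin):
        chunk = ordered[i:i + cpgs_per_bin]
        m = sum(s[1] for s in chunk)
        u = sum(s[2] for s in chunk)
        if m + u > 0:
            bins.append((chunk[0][0], chunk[-1][0], m / (m + u)))
    return bins


# Hypothetical per-CpG read counts on one chromosome.
sites = [(100, 8, 2), (900_000, 6, 4), (1_500_000, 3, 7)]
fixed_size = density_by_fixed_size(sites)
fixed_count = density_by_fixed_cpg_count(sites, cpgs_per_bin=2)
```

Fixed-count bins give each region a comparable number of informative sites, at the cost of regions whose genomic spans differ.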

Figures 23A and 23B show the methylation densities of the buffy coat, tumor tissue, non-tumor liver tissue, pre-operative plasma and post-operative plasma of the HCC patient. Figure 23A is a plot 2300 of the results for chromosome 1. Figure 23B is a plot 2350 of the results for chromosome 2.

For most of the 1 Mb windows, the methylation densities of the buffy coat and the non-tumor liver tissue adjacent to the tumor were similar, whereas those of the tumor tissue were lower. The methylation densities of the pre-operative plasma lay between those of the tumor and the non-malignant tissues. The methylation densities of the interrogated genomic regions in the tumor tissue can be deduced from the methylation data of the pre-operative plasma and the fractional tumor DNA concentration. The method is the same as described above for the methylation density values of all autosomes. The deduction of the tumor methylation can thus also be performed with this higher-resolution methylation data of plasma DNA. Other region sizes can also be used, e.g., 300 kb, 500 kb, 2 Mb, 3 Mb, 5 Mb, or more than 5 Mb. In one embodiment, the region sizes need not be kept constant. In one implementation, the number of CpG sites within each region is kept constant while the size of the region itself can vary.

C. Comparison of plasma methylation density between the cancer patient and healthy individuals

As shown in table 2100, the methylation density of the pre-operative plasma DNA of the cancer patient was lower than those of the non-malignant tissues. This likely results from the presence of hypomethylated DNA from the tumor tissue. This lower plasma DNA methylation density can potentially be used as a biomarker for the detection and monitoring of cancer. For cancer monitoring, if a cancer is progressing, the amount of cancer-derived DNA in plasma increases over time. In this example, an increased amount of circulating cancer-derived DNA in plasma would lead to a further reduction in the plasma DNA methylation density at the genomewide level.

Conversely, if a cancer responds to treatment, the amount of cancer-derived DNA in plasma decreases over time. In this example, a decrease in the amount of cancer-derived DNA in plasma would lead to an increase in the plasma DNA methylation density. For example, if a lung cancer patient with an epidermal growth factor receptor mutation has been treated with a targeted therapy, e.g., tyrosine kinase inhibition, an increase in the plasma DNA methylation density would signify a response. Subsequently, the emergence of a tumor clone resistant to tyrosine kinase inhibition would be associated with a decrease in the plasma DNA methylation density, which would indicate a relapse.

Plasma methylation density measurements can be performed serially, and the rate of change of such measurements can be calculated and used to predict or correlate with clinical progression, remission, or prognosis. For selected genomic loci that are hypermethylated in cancer tissues but hypomethylated in normal tissues, e.g., the promoter regions of a number of tumor suppressor genes, the relationship between methylation and cancer progression or a favorable response to treatment would be the opposite of the pattern described above.
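One simple way to express the rate of change of serial measurements is the slope between consecutive time points. The Python sketch below is only an illustration of that idea; the function name, the choice of a per-interval slope and the example values are assumptions and do not come from the study.

```python
def rates_of_change(measurements):
    """Per-interval rate of change of plasma methylation density.

    measurements: list of (day, density) pairs sorted by day, with strictly
    increasing days. Returns one slope (density units per day) per
    consecutive pair.
    """
    rates = []
    for (t0, d0), (t1, d1) in zip(measurements, measurements[1:]):
        if t1 <= t0:
            raise ValueError("days must be strictly increasing")
        rates.append((d1 - d0) / (t1 - t0))
    return rates


# Hypothetical serial measurements: (day, genomewide density in percent).
series = [(0, 60.0), (10, 65.0), (20, 62.0)]
slopes = rates_of_change(series)
```

For genomewide density in a globally hypomethylated tumor, a positive slope would be read as a falling tumor contribution and a negative slope as a rising one; for loci that are hypermethylated in the tumor the interpretation is reversed, as noted above.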

To demonstrate the feasibility of this approach, the DNA methylation densities of the plasma samples collected from the cancer patient before and after surgical removal of the tumor were compared with those of the plasma DNA obtained from four healthy control individuals.

Table 2200 shows the DNA methylation densities of each autosome, and the combined value for all autosomes, for the pre-operative and post-operative plasma samples of the cancer patient and for the four healthy control individuals. For all chromosomes, the methylation densities of the pre-operative plasma DNA sample were lower than those of the post-operative sample and of the plasma samples from the four healthy individuals. The difference in plasma DNA methylation density between the pre-operative and post-operative samples provides supporting evidence that the lower methylation densities in the pre-operative plasma sample were due to the presence of DNA from the HCC tumor.

The reversal of the DNA methylation densities in the post-operative plasma sample to levels similar to those of the plasma samples of the healthy controls suggests that much of the tumor-derived DNA had disappeared following surgical removal of its source, i.e., the tumor. These data suggest that the methylation density of the pre-operative plasma, as determined from data available from large genomic regions (e.g., all autosomes or individual chromosomes), was lower than that of the healthy controls, which allows the test case to be identified, i.e., diagnosed or screened, as having cancer.

The data of the pre-operative plasma also showed a much lower methylation level than the post-operative plasma, indicating that the plasma methylation level can also be used to monitor tumor load and hence to prognosticate and monitor the progress of cancer in a patient. Reference values can be determined from the plasma of healthy controls or of persons at risk of cancer but currently without cancer. Persons at risk of HCC include those with chronic hepatitis B or hepatitis C infection, those with hemochromatosis, and those with liver cirrhosis.

A plasma methylation density value beyond, e.g., lower than, a threshold defined from the reference values can be used to assess whether the plasma of a non-pregnant person contains tumor DNA. To detect the presence of hypomethylated circulating tumor DNA, the threshold can be defined as lower than the 5th or 1st percentile of the values of the control population, or based on a number of standard deviations (SDs), e.g., 2 or 3 SDs below the mean methylation density value of the controls, or based on determining the multiple of the median (MoM). For hypermethylated tumor DNA, the threshold can be defined as higher than the 95th or 99th percentile of the values of the control population, or based on a number of SDs, e.g., 2 or 3 SDs above the mean methylation density value of the controls, or based on determining the multiple of the median (MoM). In one embodiment, the control population is matched in age to the test individual. The age matching need not be exact and can be performed in age bands (e.g., 30 to 40 years for a test individual aged 35).
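The SD-based and MoM-based thresholds described above can be sketched with the Python standard library. This is an illustration under stated assumptions: the function names are hypothetical, and the four control values are invented to be consistent with the reported range (71.2% to 72.5%) and mean (71.6%), since the individual control values are not listed in this section.

```python
from statistics import mean, median, stdev


def sd_cutoff(controls, k=3, hypo=True):
    """Threshold k sample SDs below (hypo=True) or above the control mean."""
    offset = k * stdev(controls)
    return mean(controls) - offset if hypo else mean(controls) + offset


def is_hypomethylated(value, controls, k=3):
    """True if value lies more than k SDs below the control mean."""
    return value < sd_cutoff(controls, k=k, hypo=True)


def multiple_of_median(value, controls):
    """Express value as a multiple of the control median (MoM)."""
    return value / median(controls)


# Hypothetical control densities (percent); not the study's actual values.
controls = [71.2, 71.3, 71.4, 72.5]
flagged = is_hypomethylated(59.7, controls, k=3)
mom = multiple_of_median(59.7, controls)
```

With only four controls the SD estimate is coarse, and percentile-based cutoffs such as the 5th or 1st percentile would need a much larger control population to be meaningful.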

Next, the methylation densities of 1 Mb regions were compared between the plasma samples of the cancer patient and the four control individuals. For illustration, the results for chromosome 1 are shown.

Figure 24A is a plot 2400 showing the methylation densities of the pre-operative plasma from the HCC patient. Figure 24B is a plot 2450 showing the methylation densities of the post-operative plasma from the HCC patient. Blue dots represent the results of the control individuals and red dots represent the results of the plasma sample of the HCC patient.

As shown in Figure 24A, the methylation densities of the pre-operative plasma from the HCC patient were lower than those of the control individuals for most regions. Similar patterns were observed on the other chromosomes. As shown in Figure 24B, the methylation densities of the post-operative plasma from the HCC patient were similar to those of the control individuals for most regions. Similar patterns were observed on the other chromosomes.

To assess whether a test individual has cancer, the result of the test individual is compared with the values of a reference group. In one embodiment, the reference group can consist of a number of healthy individuals. In another embodiment, the reference group can consist of individuals with non-malignant conditions, for example chronic hepatitis B infection or cirrhosis. The difference in methylation density between the test individual and the reference group can then be quantified.

In one embodiment, a reference range can be derived from the values of the control group. The deviation of the test subject's result from the upper or lower limit of the reference group can then be used to determine whether the subject has a tumor. This quantity is affected by the fractional concentration of tumor-derived DNA in the plasma and by the difference in methylation level between malignant and non-malignant tissues. A higher fractional concentration of tumor-derived DNA in plasma leads to a larger methylation density difference between the test plasma sample and the controls. A larger difference in methylation level between malignant and non-malignant tissues likewise leads to a larger methylation density difference between the test plasma sample and the controls. In yet another embodiment, different reference groups are chosen for test subjects of different age ranges.

In another embodiment, the mean and SD of the methylation densities of the four control subjects are calculated for each 1 Mb region. Then, for the corresponding region, the difference between the methylation density of the HCC patient and the mean value of the control subjects is calculated. In one embodiment, this difference is then divided by the SD of the corresponding region to determine a z-score. In other words, the z-score represents the difference in methylation density between the test and control plasma samples, expressed as a number of SDs from the mean of the control subjects. A z-score > 3 for a region indicates that the plasma DNA of the HCC patient is more hypermethylated than that of the control subjects by more than 3 SDs in that region, whereas a z-score < -3 for a region indicates that the plasma DNA of the HCC patient is more hypomethylated than that of the control subjects by more than 3 SDs in that region.
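
The per-region z-score described above can be sketched as follows. This is a minimal illustration; the function name, the dictionary layout and the region labels are assumptions of the sketch, and regions whose control SD is zero are simply skipped because the z-score is undefined there.

```python
import statistics

def region_z_scores(test_density, control_densities):
    """Compute z = (test - control mean) / control SD for each region.

    test_density:      {region: methylation density of the test sample}
    control_densities: {region: [methylation densities of the control subjects]}
    """
    z_scores = {}
    for region, test_value in test_density.items():
        controls = control_densities[region]
        mean = statistics.mean(controls)
        sd = statistics.stdev(controls)  # sample SD across the control subjects
        if sd == 0:
            continue  # z-score undefined when the controls show no variation
        z_scores[region] = (test_value - mean) / sd
    return z_scores
```

A region with a resulting value below -3 would be called hypomethylated and a region with a value above 3 hypermethylated, following the convention stated above.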

Figures 25A and 25B show, for chromosome 1, the z-scores of the plasma DNA methylation densities of the preoperative (plot 2500) and postoperative (plot 2550) plasma samples of the HCC patient, using the plasma methylome data of the four healthy control subjects as reference. Each dot represents the result of one 1 Mb region. Black dots represent regions with z-scores between -3 and 3. Red dots represent regions with z-scores < -3.

Figure 26A is a table 2600 showing the z-score data for the preoperative and postoperative plasma. Most of the regions on chromosome 1 (80.9%) in the preoperative plasma sample had a z-score < -3, indicating that the preoperative plasma DNA of the HCC patient was significantly more hypomethylated than that of the control subjects. In contrast, the number of red dots decreased substantially in the postoperative plasma sample (8.3% of all regions on chromosome 1), suggesting that most of the tumor DNA had been cleared from the circulation after surgical resection of its source.

Figure 26B is a Circos plot 2620 showing the z-scores of the plasma DNA methylation densities of the preoperative and postoperative plasma samples of the HCC patient, using the four healthy control subjects as reference, for all 1 Mb regions analyzed from all autosomes. The outermost ring shows the ideograms of the human autosomes. The middle ring shows the data for the preoperative plasma sample. The innermost ring shows the data for the postoperative plasma sample. Each dot represents the result of one 1 Mb region. Black dots represent regions with z-scores between -3 and 3. Red dots represent regions with z-scores < -3. Green dots represent regions with z-scores > 3.

Figure 26C is a table 2640 showing the distribution of the z-scores of all 1 Mb regions across the whole genome in the preoperative and postoperative plasma samples of the HCC patient. The results indicate that, for most regions of the genome (85.2% of all 1 Mb regions), the preoperative plasma DNA of the HCC patient was more hypomethylated than that of the controls. In contrast, most regions in the postoperative plasma sample (93.5% of all 1 Mb regions) showed no significant hypermethylation or hypomethylation compared with the controls. These data indicate that much of the tumor DNA, which was predominantly hypomethylated for this HCC, was no longer present in the postoperative plasma sample.

In one embodiment, the number, percentage, or proportion of regions with z-scores < -3 can be used to indicate whether cancer is present. For example, as shown in table 2640, 2330 of the 2734 regions analyzed (85.2%) showed z-scores < -3 in the preoperative plasma, whereas only 171 of the 2734 regions analyzed (6.3%) showed z-scores < -3 in the postoperative plasma. The data indicate that the tumor DNA load was much higher in the preoperative plasma than in the postoperative plasma.

The threshold for the number of regions can be determined using a statistical method. For example, based on a normal distribution, approximately 0.15% of the regions would be expected to have a z-score < -3. Therefore, the cutoff number of regions can be 0.15% of the total number of regions analyzed. In other words, if a plasma sample from a non-pregnant individual shows more than 0.15% of regions with z-scores < -3, a source of hypomethylated DNA, namely cancer, is present in the plasma. For example, 0.15% of the 2734 1 Mb regions analyzed in this example is about 4 regions. Using this value as the threshold, both the preoperative and the postoperative plasma samples contained hypomethylated tumor-derived DNA, although the amount was much larger in the preoperative plasma sample than in the postoperative plasma sample. For the four healthy control subjects, none of the regions showed significant hypermethylation or hypomethylation. Other thresholds (e.g., 1.1%) can be used and can vary depending on the requirements of the assay being used. As other examples, the cutoff percentage can vary based on the statistical distribution, as well as on the required sensitivity and the acceptable specificity.
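
The chance expectation quoted above can be reproduced with a short calculation. This sketch assumes that region z-scores in a cancer-free sample follow a standard normal distribution; the function name is hypothetical. The exact one-sided tail below -3 is about 0.135%, which the text rounds to approximately 0.15%.

```python
from statistics import NormalDist

def chance_region_cutoff(n_regions, z_cutoff=-3.0):
    """Return (tail probability, expected number of regions below z_cutoff by chance)."""
    tail = NormalDist().cdf(z_cutoff)  # standard normal lower-tail probability
    return tail, tail * n_regions

tail, expected = chance_region_cutoff(2734)
print(f"{tail:.4%} of regions, about {expected:.1f} of 2734")
```

For the 2734 regions of this example, the expected count rounds to 4 regions, in line with the figure given above.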

In another embodiment, the cutoff number can be determined by receiver operating characteristic (ROC) curve analysis of a number of cancer patients and individuals without cancer. To further validate the specificity of this approach, a plasma sample from a patient seeking medical consultation for a non-malignant condition (C06) was analyzed. Of the regions, 1.1% had a z-score < -3. In one embodiment, different thresholds can be used to classify different levels of disease status. A lower percentage threshold can be used to differentiate a healthy status from benign conditions, and a higher percentage threshold can be used to differentiate benign conditions from malignancies.

The diagnostic performance of plasma hypomethylation analysis using massively parallel sequencing appears to be superior to that obtained using polymerase chain reaction (PCR)-based amplification of specific classes of repetitive elements, e.g., long interspersed nuclear element-1 (LINE-1) (P Tangkijvanich et al. 2007 Clin Chim Acta; 379:127-133). One possible explanation for this observation is that, while hypomethylation is pervasive in the tumor genome, it has some degree of heterogeneity from one genomic region to the next.

In fact, the mean plasma methylation densities of the reference subjects were observed to vary across the genome (Figure 56). Each red dot in Figure 56 shows the mean methylation density of one 1 Mb region among 32 healthy subjects. The plot shows all 1 Mb regions analyzed across the genome. The number within each box represents the chromosome number. The mean methylation density was observed to vary from region to region.

A simple PCR-based assay would not be able to take such region-to-region heterogeneity into account in its diagnostic algorithm. Such heterogeneity would broaden the range of methylation densities observed among healthy individuals. A larger reduction in methylation density would then be needed for a sample to be considered as showing hypomethylation. This would reduce the sensitivity of the test.

In contrast, an approach based on massively parallel sequencing divides the genome into 1 Mb regions (or regions of other sizes) and measures the methylation density of each such region individually. Because each region is compared between the test sample and the controls, this approach reduces the impact of variation in baseline methylation density across different genomic regions. Indeed, within the same region, the inter-individual variation across the 32 healthy controls was relatively small. Across the 32 healthy controls, 95% of the regions had a coefficient of variation (CV) ≤ 1.8%. To further enhance the sensitivity for detecting cancer-associated hypomethylation, the comparison can be performed across multiple genomic regions. Testing multiple genomic regions enhances sensitivity because it guards against the effect of biological variation, in which a particular region of a cancer sample happens not to show hypomethylation when only that one region is tested.

An approach that compares the methylation densities of equivalent genomic regions between controls and test samples, and that performs this comparison for multiple genomic regions (e.g., testing each genomic region separately and then possibly combining the results), has a higher signal-to-noise ratio for detecting cancer-associated hypomethylation. The massively parallel sequencing approach is presented by way of illustration. Other methodologies that can determine the methylation densities of multiple genomic regions, and that allow the methylation densities of corresponding regions to be compared between controls and test samples, would be expected to achieve a similar effect. For example, hybridization probes or molecular inversion probes that target plasma DNA molecules originating from specific genomic regions and that determine the methylation level of those regions can be designed to achieve the desired effect.

In yet another embodiment, the sum of the z-scores of all regions can be used to determine whether cancer is present or to monitor serial changes in the methylation level of plasma DNA. Because tumor DNA is hypomethylated overall, the sum of z-scores is lower in plasma collected from an individual with cancer than in healthy controls. The sums of the z-scores for the preoperative and postoperative plasma of the HCC patient were -49843.8 and -3132.13, respectively.

In other embodiments, other methods can be used to investigate the methylation level of plasma DNA. For example, the proportion of methylated cytosine residues among the total content of cytosine residues can be determined using mass spectrometry (ML Chen et al. 2013 Clin Chem; 59:824-832) or massively parallel sequencing. However, because most cytosine residues are not in the CpG dinucleotide context, the proportion of methylated cytosines among total cytosine residues would be relatively small compared with the methylation level estimated in the CpG dinucleotide context. The methylation levels of the tissue and plasma samples obtained from the HCC patient, as well as of the four plasma samples obtained from the healthy controls, were determined. Methylation levels were measured in the CpG context, in any cytosine context, and in the CHG and CHH contexts using the genome-wide massively parallel sequencing data. H refers to an adenine, thymine, or cytosine residue.
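
The sequence contexts named above (CpG, CHG and CHH, where H is A, T or C) can be assigned to a cytosine from the two bases that follow it. The following is a minimal sketch under the assumption that only forward-strand cytosines of a plain reference string are considered; the function name is hypothetical, and a real pipeline would also handle the reverse strand.

```python
def cytosine_context(seq, pos):
    """Classify the forward-strand cytosine at seq[pos] as 'CpG', 'CHG' or 'CHH'.

    Returns None if seq[pos] is not a cytosine or if the context cannot be
    determined (end of sequence or an ambiguous base such as N).
    """
    seq = seq.upper()
    if seq[pos] != "C":
        return None
    following = seq[pos + 1 : pos + 3]  # up to two downstream bases
    if len(following) >= 1 and following[0] == "G":
        return "CpG"
    if len(following) < 2 or any(base not in "ACGT" for base in following):
        return None
    # Here the first downstream base is A, T or C (that is, H).
    return "CHG" if following[1] == "G" else "CHH"
```

Methylation levels per context can then be tallied by grouping the methylated and unmethylated read counts of each cytosine under the label returned for it.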

Figure 26D is a table 2660 showing that, when the CHH and CHG contexts were used, the methylation levels of the tumor tissue and the preoperative plasma sample overlapped with those of some of the control plasma samples. For both CpG and unspecified cytosines, the methylation levels of the tumor tissue and the preoperative plasma sample were consistently lower than those of the buffy coat, the non-tumor liver tissue, the postoperative plasma sample, and the healthy control plasma samples. However, the data based on methylated CpGs, i.e., the methylation densities, showed a wider dynamic range than the data based on methylated cytosines.

In other embodiments, the methylation status of plasma DNA can be determined by methods that use antibodies against methylated cytosine, for example methylated DNA immunoprecipitation (MeDIP). However, the precision of these methods is expected to be inferior to that of sequencing-based methods because of the variability in antibody binding. In yet another embodiment, the level of 5-hydroxymethylcytosine in plasma DNA can be determined. In this regard, a reduction in the level of 5-hydroxymethylcytosine has been found to be an epigenetic feature of certain cancers, e.g., melanoma (CG Lian et al. 2012 Cell; 150:1135-1146).

In addition to HCC, it was investigated whether this approach could be applied to other types of cancer. Plasma samples from 2 patients with adenocarcinoma of the lung (CL1 and CL2), 2 patients with nasopharyngeal carcinoma (NPC1 and NPC2), 2 patients with colorectal cancer (CRC1 and CRC2), 1 patient with a metastatic neuroendocrine tumor (NE1), and 1 patient with metastatic leiomyosarcoma (SMS1) were analyzed. The plasma DNA of these subjects was bisulfite-converted and sequenced using the Illumina HiSeq2000 platform for 50 bp at one end. The four healthy control subjects mentioned above were used as the reference group for the analysis of these 8 patients. Sequence reads of 50 bp at one end were used. The whole genome was divided into 1 Mb regions. The mean and SD of the methylation density of each region were calculated using the data from the reference group. The results of the 8 cancer patients were then expressed as z-scores, which represent the number of SDs from the mean of the reference group. A negative value indicates that the methylation density of the test case is lower than the mean of the reference group, and vice versa. The number of sequence reads and the sequencing depth achieved for each sample are shown in table 2780 of Figure 27I.

Figures 27A-H show Circos plots of the methylation densities of the 8 cancer patients according to embodiments of the present invention. Each dot represents the result of one 1 Mb region. Black dots represent regions with z-scores between -3 and 3. Red dots represent regions with z-scores < -3. Green dots represent regions with z-scores > 3. The interval between two consecutive lines represents a z-score difference of 20.

For patients with most types of cancer, including lung cancer, nasopharyngeal carcinoma, colorectal cancer, and metastatic neuroendocrine tumor, significant hypomethylation was observed in multiple regions across the genome. Interestingly, in the case of metastatic leiomyosarcoma, significant hypermethylation was observed in multiple regions across the genome in addition to hypomethylation. The embryonic origin of leiomyosarcoma is the mesoderm, whereas the embryonic origin of the other types of cancer in the remaining 7 patients is the ectoderm. It is therefore possible that the DNA methylation pattern of sarcoma differs from that of carcinoma.

As can be seen from this case, the methylation pattern of plasma DNA can also be useful for differentiating between different types of cancer, in this example between carcinoma and sarcoma. These data also suggest that the approach can be used to detect aberrant hypermethylation associated with malignancy. For all 8 of these cases, only plasma samples were available and no tumor tissue had been analyzed. This shows that tumor-derived DNA can readily be detected in plasma using the described approach, even without prior knowledge of the methylation profile or methylation level of the tumor tissue.

Figure 27J is a table 2790 showing the distribution of the z-scores of all 1 Mb regions across the whole genome in the plasma of patients with different malignancies. The percentages of regions with z-scores < -3, between -3 and 3, and > 3 are shown for each case. More than 5% of the regions had a z-score < -3 in all of the cases. Therefore, if a threshold of 5% of the regions being significantly hypomethylated is used to classify a sample as positive for cancer, all of these cases would be classified as positive for cancer. The results show that hypomethylation is likely to be a general phenomenon across different types of cancer, and that plasma methylome analysis would be useful for detecting different types of cancer.

D. Method

Figure 28 is a flowchart of a method 2800 of analyzing a biological sample of an organism to determine a classification of a level of cancer according to embodiments of the present invention. The biological sample includes DNA originating from normal cells and may include DNA from cells associated with cancer. At least some of the DNA in the biological sample may be cell-free.

At block 2810, a plurality of DNA molecules from the biological sample are analyzed. The analysis of a DNA molecule can include determining the location of the DNA molecule in the genome of the organism and determining whether the DNA molecule is methylated at one or more sites. The analysis can be performed by receiving sequence reads from methylation-aware sequencing, and thus the analysis can be performed solely on data previously obtained from the DNA. In other embodiments, the analysis can include the actual sequencing or other active steps of obtaining the data.

At block 2820, for each of a plurality of sites, a respective number of DNA molecules that are methylated at the site is determined. In one embodiment, the sites are CpG sites, and they may be only certain CpG sites, as selected using one or more of the criteria mentioned herein. Once normalized by the total number of DNA molecules analyzed at a particular site (e.g., the total number of sequence reads), determining the number of methylated DNA molecules is equivalent to determining the number of unmethylated DNA molecules. For example, an increase in the CpG methylation density of a region is equivalent to a decrease in the density of unmethylated CpGs in the same region.

At block 2830, a first methylation level is calculated based on the respective numbers of DNA molecules methylated at the plurality of sites. The first methylation level can correspond to a methylation density that is determined based on the number of DNA molecules corresponding to the plurality of sites. The sites can correspond to a plurality of loci or to just one locus.
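
Blocks 2820 and 2830 can be illustrated with a small sketch. It assumes that the methylation density of a set of sites is the number of methylated reads divided by the total number of reads (methylated plus unmethylated) covering those sites; the function name and the input layout are hypothetical.

```python
def methylation_density(site_counts):
    """Pool per-site read counts into a single methylation density.

    site_counts: iterable of (methylated_reads, total_reads) pairs, one per site.
    """
    counts = list(site_counts)  # allow generators to be passed in
    methylated = sum(m for m, _ in counts)
    total = sum(t for _, t in counts)
    if total == 0:
        raise ValueError("no reads cover the selected sites")
    return methylated / total

# Two sites covered by 10 reads each, with 8 and 3 methylated reads.
density = methylation_density([(8, 10), (3, 10)])
unmethylated_density = 1.0 - density  # the complementary quantity noted at block 2820
```

Because the two quantities sum to one after normalization, a rise in one is necessarily a fall in the other, which is the equivalence stated for block 2820.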

At block 2840, the first methylation level is compared to a first threshold. The first threshold can be a reference methylation level or can be related to a reference methylation level (e.g., a specified distance from a normal level). The reference methylation level can be determined from samples of individuals without cancer, or from loci or organisms known not to be associated with the cancer of the organism. The first threshold can be established from a reference methylation level determined from a biological sample of the organism obtained earlier than the biological sample being tested.

In one embodiment, the first threshold is a specified distance (e.g., a specified number of standard deviations) from a reference methylation level established from a biological sample obtained from a healthy organism. The comparison can be performed by determining the difference between the first methylation level and the reference methylation level, and then comparing that difference to a cutoff corresponding to the first threshold (e.g., to determine whether the methylation level is statistically different from the reference methylation level).

At block 2850, a classification of a level of cancer is determined based on the comparison. Examples of a level of cancer include whether the subject has cancer or a premalignant condition, or has an increased likelihood of developing cancer. In one embodiment, the first threshold can be determined from a sample previously obtained from the subject (e.g., the reference methylation level can be determined from the previous sample).

In some embodiments, the first methylation level can correspond to a number of regions whose methylation levels exceed a threshold. For example, a plurality of regions of the genome of the organism can be identified. The regions can be identified using criteria mentioned herein, e.g., a certain length or a certain number of sites. One or more sites (e.g., CpG sites) can be identified within each region. A region methylation level can be calculated for each region. The first methylation level is for a first region. Each region methylation level is compared to a respective region threshold, which can be the same for all regions or vary from region to region. The region threshold for the first region is the first threshold. The respective region threshold can be a specified amount (e.g., 0.5) from a reference methylation level, so that only regions that differ significantly from the reference are counted; the reference can be determined from non-cancer subjects.

A first number of regions whose region methylation levels exceed the respective region thresholds can be determined and compared to a threshold to determine the classification. In one implementation, the threshold is a percentage. Comparing the first number to the threshold can include dividing the first number of regions by a second number of regions (e.g., all regions) before the comparison, e.g., as part of a normalization process.
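
As a hedged sketch of the region-counting comparison described above, the code below normalizes the count of regions with z-scores below a cutoff by the number of regions analyzed and compares the resulting fraction with a percentage threshold. The function names are hypothetical, and the z-score lists are synthetic stand-ins that reproduce only the counts reported for table 2640, not the actual data.

```python
def fraction_regions_below(z_scores, z_cutoff=-3.0):
    """Fraction of analyzed regions whose z-score is below z_cutoff."""
    values = list(z_scores)
    if not values:
        raise ValueError("no regions were analyzed")
    return sum(1 for z in values if z < z_cutoff) / len(values)

def exceeds_region_threshold(z_scores, fraction_threshold, z_cutoff=-3.0):
    """True if the fraction of aberrant regions is above the percentage threshold."""
    return fraction_regions_below(z_scores, z_cutoff) > fraction_threshold

# Synthetic inputs matching the reported counts: 2330 and 171 of 2734 regions.
preoperative = [-5.0] * 2330 + [0.0] * 404
postoperative = [-5.0] * 171 + [0.0] * 2563
print(round(fraction_regions_below(preoperative), 3))   # fraction for 2330 / 2734
print(round(fraction_regions_below(postoperative), 3))  # fraction for 171 / 2734
```

Whether a given sample is called positive then depends on the percentage threshold selected, such as the 0.15%, 1.1% or 5% values discussed in the preceding sections.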

As described above, the fractional concentration of tumor DNA in the biological sample can be used to calculate the first threshold. The fractional concentration can simply be estimated to be greater than a minimum value, and a sample with a fractional concentration lower than the minimum value can be flagged, e.g., as being unsuitable for analysis. The minimum value can be determined based on the expected difference between the methylation level of the tumor and the reference methylation level. For example, if the difference is 0.5 (e.g., as used for a threshold), then the tumor concentration would need to be high enough to produce this difference.

Certain techniques from method 1300 can be used in method 2800. In method 1300, copy number variations can be determined for a tumor (e.g., a first chromosomal region of the tumor can be tested for a copy number change relative to a second chromosomal region of the tumor). Method 1300 can therefore assume that a tumor exists. In method 2800, a sample can be tested for any indication that a tumor exists at all, regardless of any copy number characteristics. Some techniques of the two methods can be similar. However, the thresholds and methylation parameters (e.g., normalized methylation levels) of method 2800 can detect a statistical difference from a reference methylation level of non-cancer DNA, as opposed to a difference from a reference methylation level of a mixture of cancer DNA and non-cancer DNA in which some regions may have copy number variations. Thus, the reference values for method 2800 can be determined from samples without cancer, such as from organisms without cancer or from non-cancer tissue of the same patient (e.g., plasma collected previously, or a contemporaneously obtained sample known to be free of cancer, which can be determined from cellular DNA).

E. Prediction of the minimum fractional concentration of tumor DNA detectable using plasma DNA methylation analysis

One way of measuring the sensitivity of the approach of detecting cancer using the methylation level of plasma DNA relates to the minimum fractional concentration of tumor-derived DNA that is required to reveal a change in the plasma DNA methylation level when compared with controls. The test sensitivity also depends on the extent of the difference between the DNA methylation of the tumor tissue and the baseline plasma DNA methylation level of healthy controls, or the DNA methylation level of blood cells. Blood cells are the predominant source of DNA in the plasma of healthy individuals. The larger the difference, the more easily cancer patients can be discriminated from non-cancer individuals, which is reflected as a lower detection limit for tumor-derived DNA in plasma and a higher clinical sensitivity for detecting cancer patients. In addition, the variation in plasma DNA methylation among healthy subjects or among subjects of different ages (G Hannum et al. 2013 Mol Cell; 49:359-367) also affects the sensitivity of detecting the methylation changes associated with the presence of cancer. A smaller variation in plasma DNA methylation among healthy subjects makes it easier to detect the change caused by the presence of a small amount of cancer-derived DNA.

Figure 29A is a plot 2900 showing the distribution of methylation densities in reference subjects, assuming that this distribution follows a normal distribution. This analysis is based on each plasma sample providing only one methylation density value, for example the methylation density of all autosomes or of a particular chromosome. It illustrates how the specificity of the analysis would be affected. In one embodiment, a threshold of 3 SDs below the mean DNA methylation density of the reference subjects is used to determine whether a test sample is significantly more hypomethylated than the samples from the reference subjects. When this threshold is used, approximately 0.15% of non-cancer subjects would be expected to have a false-positive result of being classified as having cancer, giving a specificity of 99.85%.

Figure 29B is a plot 2950 showing the distributions of methylation densities in reference individuals and in cancer patients. The threshold is 3 SDs below the mean methylation density of the reference individuals. If the mean methylation density of the cancer patients is 2 SDs below the threshold (i.e., 5 SDs below the mean of the reference individuals), 97.5% of the cancer patients would be expected to have a methylation density below the threshold. In other words, the expected sensitivity would be 97.5% if one methylation density value is provided for each individual, for example when the total methylation density of the whole genome, of all autosomes, or of a particular chromosome is analyzed. The difference between the mean methylation densities of the two populations is governed by two factors: the degree of difference in methylation level between cancer and non-cancer tissues, and the percentage concentration of tumor-derived DNA in the plasma sample. The higher the values of these two parameters, the larger the difference between the methylation densities of the two populations. In addition, the lower the SDs of the two distributions, the smaller their overlap.

A hypothetical example illustrates this concept. Assume that the methylation density of the tumor tissue is approximately 0.45 and that of the plasma DNA of healthy individuals is approximately 0.7. These assumed values are similar to those obtained from the HCC patient, for whom the total autosomal methylation density of the tumor tissue was 42.9%, while the mean autosomal methylation density of the plasma samples from the healthy controls was 71.6%. Assuming that the CV of measuring the genome-wide plasma DNA methylation density is 1%, the threshold would be 0.7 × (100% - 3 × 1%) = 0.679. To achieve a sensitivity of 97.5%, the mean plasma DNA methylation density of cancer patients would need to be approximately 0.679 - 0.7 × (2 × 1%) = 0.665. Let f denote the percentage concentration of tumor-derived DNA in the plasma sample. Then f can be calculated from (0.7 - 0.45) × f = 0.7 - 0.665, so f is approximately 14%. From this calculation, it is estimated that if the total methylation density of the whole genome is used as the diagnostic parameter, the minimum percentage concentration detectable in plasma is 14% for a diagnostic sensitivity of 97.5%.

This analysis was next performed on the data obtained from the HCC patient. For this illustration, only one methylation density measurement, based on the value estimated from all autosomes, was made for each sample. The mean methylation density of the plasma samples obtained from the healthy individuals was 71.6%, and the SD of the methylation densities of these four samples was 0.631%. The threshold for the plasma methylation density would therefore need to be 71.6% - 3 × 0.631% = 69.7% to reach a z-score < -3 and a specificity of 99.85%. To achieve a sensitivity of 97.5%, the mean plasma methylation density of cancer patients would need to be 2 SDs below the threshold, i.e., 68.4%. Because the methylation density of the tumor tissue was 42.9%, the formula P = BKG × (1 - f) + TUM × f shows that f would need to be at least 11.1%.
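The two calculations above follow the same steps, which can be sketched in a few lines of Python. This is a minimal illustration under the assumptions stated in the text (normally distributed densities, a threshold 3 SDs below the reference mean, a patient mean 2 SDs below the threshold); the function name `min_tumor_fraction` and its parameter names are illustrative and do not come from the source.

```python
def min_tumor_fraction(bkg, tum, sd, z_spec=3.0, z_sens=2.0):
    """Smallest tumor DNA fraction f detectable from one genome-wide value.

    bkg: mean plasma methylation density of the reference individuals
    tum: methylation density of the tumor tissue
    sd:  SD of the plasma methylation density among reference individuals
    """
    threshold = bkg - z_spec * sd          # z-score < -3 cutoff
    target_mean = threshold - z_sens * sd  # mean needed in cancer patients
    # Mixture model from the text: target_mean = bkg * (1 - f) + tum * f
    f = (bkg - target_mean) / (bkg - tum)
    return threshold, target_mean, f


# Hypothetical example: a CV of 1% on a density of 0.7 gives SD = 0.007.
thr, mean_needed, f_hyp = min_tumor_fraction(0.70, 0.45, 0.70 * 0.01)

# HCC data from the text, expressed as fractions rather than percentages.
_, _, f_hcc = min_tumor_fraction(0.716, 0.429, 0.00631)

print(round(thr, 3), round(mean_needed, 3), round(f_hyp, 2), round(f_hcc, 2))
```

With unrounded intermediate values the HCC case comes out at roughly 11%; the 11.1% quoted in the text follows from first rounding the required patient mean to 68.4%.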

In another embodiment, the methylation densities of different genomic regions can be analyzed separately, for example as shown in Figures 25A or 26B. In other words, multiple measurements of the methylation level are made for each sample. As shown below, significant hypomethylation can then be detected at a much lower fractional tumor DNA concentration in plasma, which enhances the diagnostic performance of plasma DNA methylation analysis for detecting cancer. The number of genomic regions whose methylation density deviates significantly from the reference population can be counted. This number can then be compared with a threshold to determine whether there is an overall significant hypomethylation of plasma DNA across the set of genomic regions studied, for example the 1 Mb regions of the whole genome. The threshold can be established by analyzing a group of reference individuals without cancer, or it can be derived mathematically, for example from the normal distribution function.

Figure 30 is a plot 3000 showing the distributions of methylation densities of the plasma DNA of healthy individuals and cancer patients. The methylation density of each 1 Mb region is compared with the corresponding value of the reference population, and the percentage of regions showing significant hypomethylation (3 SDs below the mean of the reference population) is determined. A threshold of 10% of regions being significantly hypomethylated is used to determine whether tumor-derived DNA is present in the plasma sample. Other thresholds, such as 5%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 60%, 70%, 80%, or 90%, can also be used depending on the desired sensitivity and specificity of the test.
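The region-counting rule described here can be sketched as follows. This is only an illustration: the function names and the five-region toy data are hypothetical, and a real analysis would cover all 1 Mb regions of the genome with reference means and SDs estimated from cancer-free individuals.

```python
def fraction_hypomethylated(sample, ref_mean, ref_sd, z_cutoff=-3.0):
    """Fraction of regions whose z-score falls below z_cutoff.

    sample, ref_mean and ref_sd are equal-length sequences holding, per
    region, the tested sample's methylation density and the reference
    population's mean and SD for that region.
    """
    flagged = 0
    for x, m, s in zip(sample, ref_mean, ref_sd):
        z = (x - m) / s
        if z < z_cutoff:
            flagged += 1
    return flagged / len(sample)


def is_positive(sample, ref_mean, ref_sd, region_cutoff=0.10):
    """Positive when more than region_cutoff of the regions are flagged."""
    return fraction_hypomethylated(sample, ref_mean, ref_sd) > region_cutoff


# Toy data for five regions (hypothetical values).
ref_mean = [0.70, 0.72, 0.68, 0.71, 0.69]
ref_sd = [0.007] * 5
sample = [0.66, 0.72, 0.675, 0.70, 0.65]

frac = fraction_hypomethylated(sample, ref_mean, ref_sd)
positive = is_positive(sample, ref_mean, ref_sd)
```

In this toy sample, two of the five regions lie more than 3 SDs below the reference mean, so the sample exceeds the 10% cutoff.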

For example, to classify a sample as containing tumor-derived DNA, 10% of the 1 Mb regions showing significant hypomethylation (z-score < -3) can be used as the threshold. If more than 10% of the regions are significantly more hypomethylated than the reference population, the sample is classified as positive for the cancer test. For each 1 Mb region, a threshold of 3 SDs below the mean methylation density of the reference population is used to define a sample as significantly more hypomethylated. For each 1 Mb region, if the mean plasma DNA methylation density of cancer patients is 1.72 SDs below the mean plasma DNA methylation density of the reference individuals, there is a 10% chance that the methylation density of any particular region in a cancer patient falls below the threshold (i.e., z-score < -3) and gives a positive result. Consequently, if all 1 Mb regions of the whole genome are examined, approximately 10% of the regions would be expected to give a positive result of a significantly lower methylation density (i.e., z-score < -3). Assuming that the total methylation density of the plasma DNA of healthy individuals is approximately 0.7 and that the coefficient of variation (CV) of measuring the plasma DNA methylation density of each 1 Mb region is 1%, the mean plasma DNA methylation density of cancer patients would need to be 0.7 × (100% - 1.72 × 1%) = 0.68796. Let f be the percentage concentration of tumor-derived DNA in plasma required to reach this mean plasma DNA methylation density. Assuming that the methylation density of the tumor tissue is 0.45, f can be calculated using the following equation:

$$(\bar{M}_{\text{ref}} - M_{\text{tumor}}) \times f = \bar{M}_{\text{ref}} - \bar{M}_{\text{cancer}}$$

where $\bar{M}_{\text{ref}}$ represents the mean methylation density of plasma DNA in the reference individuals; $M_{\text{tumor}}$ represents the methylation density of the tumor tissue in the cancer patient; and $\bar{M}_{\text{cancer}}$ represents the mean methylation density of plasma DNA in the cancer patients.

Using this equation, (0.7 - 0.45) × f = 0.7 - 0.68796. The minimum percentage concentration detectable with this approach is therefore estimated to be 4.8%. The sensitivity can be further enhanced by lowering the cutoff percentage of regions that are significantly more hypomethylated, for example from 10% to 5%.
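The 1.72 SD figure and the resulting 4.8% can be reproduced from the normal distribution, which also makes it easy to see how the detection limit moves when the region cutoff is lowered. The sketch below assumes normally distributed region densities, as the text does; `min_fraction_regions` and its parameters are illustrative names, not part of the source.

```python
from statistics import NormalDist


def min_fraction_regions(bkg, tum, cv, region_cutoff=0.10, z_cutoff=-3.0):
    """Tumor DNA fraction at which region_cutoff of regions reach z < z_cutoff.

    bkg: plasma methylation density of healthy individuals
    tum: methylation density of the tumor tissue
    cv:  coefficient of variation of the per-region measurement
    """
    # z-value below which a standard normal holds region_cutoff of its mass
    # (about -1.28 when region_cutoff is 10%).
    z_quantile = NormalDist().inv_cdf(region_cutoff)
    # Required downward shift of the patient mean, in reference SDs
    # (about 1.72 for a 10% cutoff and z < -3).
    shift_sd = z_quantile - z_cutoff
    cancer_mean = bkg * (1 - shift_sd * cv)
    # (bkg - tum) * f = bkg - cancer_mean
    return (bkg - cancer_mean) / (bkg - tum)


f_10 = min_fraction_regions(0.70, 0.45, 0.01, region_cutoff=0.10)
f_05 = min_fraction_regions(0.70, 0.45, 0.01, region_cutoff=0.05)
print(f"10% cutoff: f = {f_10:.3f}; 5% cutoff: f = {f_05:.3f}")
```

Under these assumptions the 10% cutoff gives f of about 4.8%, and the 5% cutoff gives a smaller value, consistent with the statement that lowering the cutoff enhances sensitivity.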

As shown in the example above, the sensitivity of this approach is determined by the degree of difference in methylation level between cancer and non-cancer tissues, such as blood cells. In one embodiment, only chromosomal regions showing a large difference in methylation density between the plasma DNA of non-cancer individuals and the tumor tissue are selected. In one embodiment, only regions with a difference in methylation density of > 0.5 are selected. In other embodiments, a difference of 0.4, 0.6, 0.7, 0.8, or 0.9 can be used to select suitable regions. In yet another embodiment, the physical size of the genomic regions is not fixed. Instead, the genomic regions are defined, for example, by a fixed read depth or a fixed number of CpG sites. The methylation levels of a plurality of these genomic regions are assessed for each sample.

Figure 31 is a graph 3100 showing the distribution of the differences in methylation density between the mean of the plasma DNA of the healthy individuals and the tumor tissue of the HCC patient. A positive value indicates a higher methylation density in the plasma DNA of the healthy individuals, and a negative value indicates a higher methylation density in the tumor tissue.

In one embodiment, the regions with the largest difference between the methylation densities of cancer and non-cancer tissues can be selected, for example regions with a difference of > 0.5, regardless of whether the tumor is hypomethylated or hypermethylated at these regions. For the same percentage concentration of tumor-derived DNA in plasma, the detection limit for that concentration can be lowered by focusing on these regions, because the distributions of plasma DNA methylation levels in cancer and non-cancer individuals differ more. For example, if only regions with a difference of > 0.5 are used and a threshold of 10% of the regions being significantly more hypomethylated is adopted to determine whether a tested individual has cancer, the minimum percentage concentration (f) of tumor-derived DNA detected can be calculated using the following equation: $(\bar{M}_{\text{ref}} - M_{\text{tumor}}) \times f = \bar{M}_{\text{ref}} - \bar{M}_{\text{cancer}}$, where $\bar{M}_{\text{ref}}$ represents the mean methylation density of plasma DNA in the reference individuals; $M_{\text{tumor}}$ represents the methylation density of the tumor tissue in the cancer patient; and $\bar{M}_{\text{cancer}}$ represents the mean methylation density of plasma DNA in the cancer patients.

Here, the difference in methylation density between the plasma of the reference individuals and the tumor tissue is at least 0.5. Then 0.5 × f = 0.7 - 0.68796 and f = 2.4%. Therefore, by focusing on regions with a larger difference in methylation density between cancer and non-cancer tissues, the lower limit of the fractional tumor-derived DNA concentration can be reduced from 4.8% to 2.4%. Information on which regions show a larger degree of methylation difference between cancer and non-cancer tissues, such as blood cells, can be determined from tumor tissues of the same organ or of the same histological type obtained from other individuals.

In another embodiment, a parameter can be derived from the methylation densities of the plasma DNA of all regions while taking into account the difference in methylation density between cancer and non-cancer tissues. Regions with a larger difference can be given a heavier weight. In one embodiment, the difference in methylation density between cancer and non-cancer tissues for each region can be used directly as the weight of that region when calculating the final parameter.
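One possible way to form such a weighted parameter is sketched below. The text only states that the tissue difference can serve as the weight; combining it with per-region z-scores and normalizing by the total absolute weight are assumptions made for this illustration, and the function name is hypothetical.

```python
def weighted_methylation_score(z_scores, tissue_diffs):
    """Weighted mean of per-region z-scores.

    z_scores:     per-region z-scores of the tested plasma sample
    tissue_diffs: per-region difference (non-cancer minus tumor) in
                  methylation density, used directly as the weight.
    A signed weight means that hypomethylation in a region where the tumor
    is hypomethylated, and hypermethylation in a region where the tumor is
    hypermethylated, both push the score in the negative direction.
    """
    if len(z_scores) != len(tissue_diffs):
        raise ValueError("z_scores and tissue_diffs must have equal length")
    total_weight = sum(abs(d) for d in tissue_diffs)
    if total_weight == 0:
        raise ValueError("at least one region needs a non-zero weight")
    weighted_sum = sum(d * z for d, z in zip(tissue_diffs, z_scores))
    return weighted_sum / total_weight


# Hypothetical values for three regions.
score = weighted_methylation_score([-2.0, -4.0, 1.0], [0.3, 0.6, -0.1])
```

The resulting score could then be compared with a threshold established from cancer-free reference individuals, in the same way as the unweighted parameters.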

In yet another embodiment, different types of cancer can have different methylation patterns in the tumor tissue. A cancer-specific weighting profile can be derived from the degree of methylation of a particular type of cancer.

In yet another embodiment, the inter-region relationship of methylation densities can be determined in individuals with and without cancer. In Figure 8, it can be observed that in a small number of regions the tumor tissue is more methylated than the plasma DNA of the reference individuals. Regions at the extremes of the difference can therefore be selected, for example regions with a difference of > 0.5 and regions with a difference of < 0. The ratio of the methylation densities of these regions can then be used to indicate whether the tested individual has cancer. In other embodiments, the difference and the quotient of the methylation densities of different regions can be used as parameters indicating the inter-region relationship.

The detection sensitivity of the approach for detecting or assessing tumors using the methylation densities of multiple genomic regions was further evaluated, as illustrated by the data obtained from the HCC patient. First, reads from the pre-operative plasma were mixed with reads obtained from the plasma samples of the healthy controls to simulate plasma samples with fractional tumor DNA concentrations ranging from 20% down to 0.5%. The percentage of 1 Mb regions (out of 2,734 regions in the whole genome) with methylation densities corresponding to a z-score < -3 was then scored. When the fractional tumor DNA concentration in plasma was 20%, 80.0% of the regions showed significant hypomethylation. For fractional tumor DNA concentrations in plasma of 10%, 5%, 2%, 1%, and 0.5%, the corresponding figures were 67.6%, 49.7%, 18.9%, 3.8%, and 0.77% of the regions showing hypomethylation, respectively. Because the theoretical limit for the proportion of regions with a z-score < -3 in control samples is 0.15%, the data show that even when the tumor percentage concentration was only 0.5%, more regions (0.77%) than the theoretical limit were still flagged.

Figure 32A is a table 3200 showing the effect of reducing the sequencing depth when the plasma sample contains 5% or 2% tumor DNA. A high proportion of regions showing significant hypomethylation (> 0.15%) could still be detected when the mean sequencing depth was only 0.022 times the haploid genome.

Figure 32B is a graph 3250 showing the methylation densities of repeat elements and non-repeat regions in the plasma of the four healthy control individuals and in the buffy coat, normal liver tissue, tumor tissue, pre-operative plasma, and post-operative plasma samples of the HCC patient. It can be observed that repeat elements are more methylated (have a higher methylation density) than non-repeat regions in both cancer and non-cancer tissues. However, the difference in methylation between repeat elements and non-repeat regions is larger in the non-cancer tissues and in the plasma DNA of healthy individuals than in the tumor tissue.

As a result, the methylation density of the plasma DNA of the cancer patient is reduced more at repeat elements than at non-repeat regions. The differences in plasma DNA methylation density between the mean of the four healthy controls and the HCC patient were 0.163 for repeat elements and 0.088 for non-repeat regions. The data on the pre-operative and post-operative plasma samples also show that the dynamic range of the change in methylation density is larger in repeat regions than in non-repeat regions. In one embodiment, the plasma DNA methylation density of repeat elements can be used to determine whether a patient has cancer or to monitor disease progression.

As discussed above, the variation in methylation density in the plasma of the reference individuals also affects the accuracy of discriminating cancer patients from non-cancer individuals. The tighter the distribution of methylation densities (i.e., the smaller the standard deviation), the more accurate the discrimination between cancer and non-cancer individuals. In another embodiment, the coefficient of variation (CV) of the methylation density of a 1 Mb region can be used as a criterion for selecting regions with low variability of plasma DNA methylation density in the reference population. For example, only regions with a CV of < 1% are selected. Other values, such as 0.5%, 0.75%, 1.25%, and 1.5%, can also be used as the criterion for selecting regions with low variability in methylation density. In yet another embodiment, the selection criteria can include both the CV of a region and the difference in methylation density between cancer and non-cancer tissues.
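A combined selection on CV and tissue difference could look like the sketch below. The function name, the dictionary-based data layout, and the toy values are assumptions made for illustration; the absolute difference is used so that regions where the tumor is hypermethylated can also qualify.

```python
from statistics import mean, stdev


def select_regions(ref_densities, tumor_density, max_cv=0.01, min_diff=0.5):
    """Return regions with low reference variability and a large tissue gap.

    ref_densities: dict mapping region -> list of plasma methylation
                   densities measured in the reference individuals
    tumor_density: dict mapping region -> methylation density in tumor tissue
    """
    selected = []
    for region, values in ref_densities.items():
        ref_mean = mean(values)
        cv = stdev(values) / ref_mean
        diff = ref_mean - tumor_density[region]
        if cv < max_cv and abs(diff) > min_diff:
            selected.append(region)
    return selected


# Hypothetical data: four reference individuals, three regions.
ref = {
    "region_A": [0.800, 0.805, 0.795, 0.800],  # low CV, large difference
    "region_B": [0.70, 0.75, 0.65, 0.72],      # high CV
    "region_C": [0.700, 0.702, 0.698, 0.700],  # low CV, small difference
}
tumor = {"region_A": 0.25, "region_B": 0.10, "region_C": 0.45}

chosen = select_regions(ref, tumor)
```

Only the first region satisfies both criteria in this toy data set.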

When the methylation density of the tumor tissue is known, the plasma methylation density can also be used to estimate the percentage concentration of tumor-derived DNA in the plasma sample. The tumor methylation density can be obtained by analyzing the patient's tumor or from studies of the tumors of a large number of patients with the same type of cancer. As discussed above, the plasma methylation density (P) can be expressed using the following equation: P = BKG × (1 - f) + TUM × f, where BKG is the background methylation density from blood cells and other organs, TUM is the methylation density in the tumor tissue, and f is the percentage concentration of tumor-derived DNA in the plasma sample. This can be rewritten as:

$$f = \frac{BKG - P}{BKG - TUM}$$

The value of BKG can be determined by analyzing the patient's plasma sample at a time point when cancer is not present, or from a study of a reference population of individuals without cancer. Therefore, once the plasma methylation density has been measured, f can be determined.
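Solving the mixture equation P = BKG × (1 - f) + TUM × f for f gives a one-line estimate. The sketch below uses the HCC values quoted earlier purely as an illustration; the function name `tumor_fraction` is hypothetical.

```python
def tumor_fraction(p, bkg, tum):
    """Estimate f from P = BKG * (1 - f) + TUM * f.

    p:   measured plasma methylation density
    bkg: background methylation density (blood cells and other organs)
    tum: methylation density of the tumor tissue
    """
    if bkg == tum:
        raise ValueError("BKG and TUM must differ for f to be identifiable")
    return (bkg - p) / (bkg - tum)


# Plasma density of 68.4% with BKG = 71.6% and TUM = 42.9%.
f_est = tumor_fraction(0.684, 0.716, 0.429)
print(f"estimated tumor DNA fraction: {f_est:.3f}")
```

The estimate degrades as BKG and TUM approach each other, which is another reason to favor regions with a large methylation difference between tumor and background.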

F. Combination with Other Methods

The methylation analysis approaches described herein can be used in combination with other methods that are based on genetic changes of tumor-derived DNA in plasma. Examples of such methods include the analysis of cancer-associated chromosomal aberrations (Chan et al. 2013 Clin Chem; 59:211-224; Leary et al. 2012 Sci Transl Med; 4:162ra154) and of cancer-associated single nucleotide variations in plasma (Chan et al. 2013 Clin Chem; 59:211-224). The methylation analysis approaches have advantages over those genetic approaches.

As shown in Figure 21A, hypomethylation of tumor DNA is a global phenomenon involving regions distributed across almost the entire genome. DNA fragments from all chromosomal regions are therefore informative about the potential contribution of tumor-derived hypomethylated DNA to the plasma/serum DNA of the patient. In contrast, chromosomal aberrations (amplification or deletion of a chromosomal region) are present only in some chromosomal regions, and DNA fragments from regions without a chromosomal aberration in the tumor tissue are not informative in the analysis (Chan et al. 2013 Clin Chem; 59:211-224). Similarly, only a few thousand single nucleotide alterations are observed in each cancer genome (Chan et al. 2013 Clin Chem; 59:211-224). DNA fragments that do not overlap with these single nucleotide changes are not informative for determining whether tumor-derived DNA is present in plasma. This methylation analysis approach is therefore potentially more cost-effective than those genetic approaches for detecting cancer-associated changes in the circulation.

In one embodiment, the cost-effectiveness of plasma DNA methylation analysis can be further enhanced by enriching for DNA fragments from the most informative regions, for example the regions with the largest methylation difference between cancer and non-cancer tissues. Examples of methods for enriching these regions include the use of hybridization probes (e.g., the Nimblegen SeqCap system and the Agilent SureSelect Target Enrichment system), PCR amplification, and solid-phase hybridization.

G. Tissue-Specific Analysis/Donors

Tumor-derived cells invade and metastasize to adjacent or distant organs. The invaded tissues or metastatic foci contribute DNA to plasma as a result of cell death. By analyzing the methylation profile of the DNA in the plasma of cancer patients and detecting the presence of tissue-specific methylation signatures, the types of tissue involved in the disease process can be detected. This approach provides a non-invasive anatomical scan of the tissues involved in the cancer process, helping to identify the organs involved as primary and metastatic sites. Monitoring the relative concentrations of the methylation signatures of the involved organs in plasma would also allow the tumor burden of those organs to be assessed and would show whether the cancer process in an organ has deteriorated, has improved, or has been cured. For example, if gene X is specifically methylated in the liver, metastatic involvement of the liver by a cancer (e.g., colorectal cancer) would be expected to increase the concentration of methylated sequences from gene X in plasma. There would also be another sequence or group of sequences with methylation characteristics similar to those of gene X, and the results from such sequences can then be combined. Similar considerations apply to other tissues, such as the brain, bone, lung, and kidney.

On the other hand, DNA from different organs is known to exhibit tissue-specific methylation signatures (BW Futscher et al. 2002 Nat Genet; 31:175-179; Chim et al. 2008 Clin Chem; 54:500-511). Methylation profiling of plasma can therefore be used to elucidate the contribution of tissues from various organs to plasma. Elucidating such contributions can be used to assess organ damage, because plasma DNA is believed to be released when cells die. For example, liver pathologies such as hepatitis (e.g., caused by viruses or autoimmune processes) or drug-induced hepatotoxicity (e.g., from a drug overdose (such as of paracetamol) or from toxins (such as alcohol)) are associated with damage to liver cells and would be expected to be associated with an increased level of liver-derived DNA in plasma. For example, if gene X is specifically methylated in the liver, a liver pathology would be expected to increase the concentration of methylated sequences from gene X in plasma. Conversely, if gene Y is specifically hypomethylated in the liver, a liver pathology would be expected to decrease the concentration of methylated sequences from gene Y in plasma. In other embodiments, gene X or Y can be replaced by any genomic sequence that is not a gene and that shows differential methylation among the different tissues of the body.

The techniques described herein can also be applied to the assessment of donor-derived DNA in the plasma of organ transplant recipients (YMD Lo et al. 1998 Lancet; 351:1329-1330). Polymorphic differences between the donor and the recipient have been used to distinguish donor-derived DNA from recipient-derived DNA in plasma (Zheng et al. 2012 Clin Chem; 58:549-558). It is proposed that the tissue-specific methylation signatures of the transplanted organ can also be used as a way to detect the donor's DNA in the recipient's plasma.

By monitoring the concentration of the donor's DNA, the status of the transplanted organ can be assessed non-invasively. For example, transplant rejection is associated with a higher rate of cell death, so the concentration of the donor's DNA in the recipient's plasma (or serum), as reflected by the methylation signature of the transplanted organ, would be increased compared with the time when the patient is in a stable condition, or compared with other stable transplant recipients or healthy controls without a transplant. Similar to what has been described for cancer, donor-derived DNA can be identified in the plasma of transplant recipients by detecting all or some of its characteristic features, including polymorphic differences, the shorter size of DNA from transplanted solid organs (Zheng et al. 2012 Clin Chem; 58:549-558), and the tissue-specific methylation profile.

H. Normalizing Methylation Based on Size

As described above and by Lun et al. (FMF Lun et al. Clin Chem 2013; doi:10.1373/clinchem.2013.212274), the methylation density (e.g., of plasma DNA) is correlated with the size of the DNA fragments. The distribution of methylation densities for shorter plasma DNA fragments is significantly lower than that for longer fragments. It is proposed that some non-cancer conditions with an abnormal fragmentation pattern of plasma DNA (e.g., systemic lupus erythematosus (SLE)) may show an apparent hypomethylation of plasma DNA because of the presence of more abundant short plasma DNA fragments, which are less methylated. In other words, the size distribution of plasma DNA can be a confounding factor for the methylation density of plasma DNA.

Figure 34A shows the size distribution of plasma DNA in the SLE patient SLE04. The size distributions of nine healthy control individuals are shown as dotted grey lines and that of SLE04 as a solid black line. Short plasma DNA fragments are more abundant in SLE04 than in the nine healthy control individuals. Because shorter DNA fragments are generally less methylated, this size distribution pattern may confound the methylation analysis of plasma DNA and lead to a more apparent hypomethylation.

In some embodiments, the measured methylation level can be normalized to reduce the confounding effect of the size distribution on plasma DNA methylation analysis. For example, the sizes of the DNA molecules at the plurality of sites can be measured. In various implementations, the measurement can provide a specific size (e.g., a length) for a DNA molecule, or can simply determine that the size falls within a particular range, which can also correspond to a size. The normalized methylation level can then be compared with the threshold. There are several ways to perform the normalization so as to reduce the confounding effect of the size distribution on plasma DNA methylation analysis.

In one embodiment, size fractionation of the DNA (e.g., plasma DNA) can be performed. Size fractionation ensures that DNA fragments of similar size are used to determine the methylation level in a manner consistent with the threshold. As part of the size fractionation, DNA fragments having a first size (e.g., a first range of lengths) can be selected, where the first threshold corresponds to the first size. The normalization is then achieved by calculating the methylation level using only the selected DNA fragments.

Size fractionation can be achieved in various ways, e.g., by physical separation of DNA molecules of different sizes (e.g., by electrophoresis, microfluidics-based techniques, or centrifugation-based techniques) or by in silico analysis. For in silico analysis, in one embodiment, paired-end massively parallel sequencing of the plasma DNA molecules can be performed. The size of each sequenced molecule can then be deduced by comparing the positions of the two ends of the plasma DNA molecule on the reference human genome. Subsequent analysis can be performed by selecting the sequenced DNA molecules that match one or more size selection criteria (e.g., a size within a specified range). Thus, in one embodiment, the methylation density can be analyzed for fragments of similar size (e.g., within a specified range). The threshold (e.g., in block 2840 of method 2800) can be determined based on fragments within the same size range. For example, methylation levels can be determined from samples known to have cancer or known not to have cancer, and the threshold can be determined from these methylation levels.
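The in silico size selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the patent: the `Fragment` record, its field names, and the 1-based inclusive coordinate convention are all hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class Fragment(NamedTuple):
    """One paired-end sequenced plasma DNA molecule (hypothetical record layout)."""
    start: int         # leftmost aligned coordinate, 1-based inclusive (assumed)
    end: int           # rightmost aligned coordinate, 1-based inclusive (assumed)
    methylated: int    # number of methylated CpG calls on the fragment
    unmethylated: int  # number of unmethylated CpG calls on the fragment


def fragment_size(frag: Fragment) -> int:
    # Size is inferred from the reference positions of the two fragment ends.
    return frag.end - frag.start + 1


def methylation_density(fragments: Sequence[Fragment],
                        min_size: Optional[int] = None,
                        max_size: Optional[int] = None) -> Optional[float]:
    """Methylation density using only fragments inside the size range."""
    meth = unmeth = 0
    for frag in fragments:
        size = fragment_size(frag)
        if min_size is not None and size < min_size:
            continue
        if max_size is not None and size > max_size:
            continue
        meth += frag.methylated
        unmeth += frag.unmethylated
    total = meth + unmeth
    # No CpG calls survived the filter: the density is undefined.
    return meth / total if total else None
```

Calling `methylation_density(fragments, min_size=130)` restricts the calculation to fragments of 130 bp or longer; the threshold used for comparison would then need to be derived from reference samples filtered the same way.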

In another embodiment, a functional relationship between the methylation density and the size of circulating DNA in peripheral blood can be determined. The functional relationship can be defined by data points or by the coefficients of a function. The functional relationship can provide a correction value for each respective size (e.g., shorter sizes can have a corresponding increase applied to the methylation). In various implementations, the correction value can be between 0 and 1 or greater than 1.

The normalization can be based on an average size. For example, the average size of the DNA molecules used to calculate a first methylation level can be computed, and the first methylation level can be multiplied by the corresponding correction value (i.e., the value corresponding to the average size). As another example, the methylation density of each DNA molecule can be normalized according to the size of the DNA molecule and the relationship between DNA size and methylation.
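A minimal sketch of the average-size normalization follows. It assumes the functional relationship is supplied as data points and that values between points are obtained by linear interpolation; the interpolation scheme, the function names, and the example points are assumptions for illustration only.

```python
def correction_for_size(size, points):
    """Correction value for `size` from (size, correction) data points.

    `points` must be sorted by strictly increasing size. Linear interpolation
    between points and clamping at both ends are assumptions of this sketch.
    """
    if size <= points[0][0]:
        return points[0][1]
    if size >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= size <= x1:
            return y0 + (y1 - y0) * (size - x0) / (x1 - x0)


def normalize_by_mean_size(methylation_level, sizes, points):
    """Multiply the methylation level by the correction for the mean fragment size."""
    mean_size = sum(sizes) / len(sizes)
    return methylation_level * correction_for_size(mean_size, points)
```

With hypothetical points `[(100, 1.2), (200, 1.0)]`, a sample whose fragments average 150 bp would have its methylation level scaled by about 1.1.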

In another embodiment, the normalization can be done on a per-molecule basis. For example, the respective size of each DNA molecule at a particular site can be obtained (e.g., as described above), and the correction value corresponding to that size can be determined from the functional relationship. For a non-normalized calculation, each molecule would be counted equally in determining the methylation index at the site. For a normalized calculation, the contribution of a molecule to the methylation index can be weighted by the correction factor corresponding to the size of the molecule.

Figures 34B and 34C show the methylation analysis of plasma DNA from SLE patient SLE04 (Figure 34B) and HCC patient TBR36 (Figure 34C). The outer circles show the Zmeth results for plasma DNA without in silico size fractionation. The inner circles show the Zmeth results for plasma DNA of 130 bp or longer. For SLE patient SLE04, 84% of the regions showed hypomethylation without in silico size fractionation. When only fragments of 130 bp or longer were analyzed, the percentage of regions showing hypomethylation fell to 15%. For HCC patient TBR36, 98.5% and 98.6% of the regions showed plasma DNA hypomethylation with and without in silico size fractionation, respectively. These results suggest that in silico size fractionation can effectively reduce false-positive hypomethylation results associated with increased plasma DNA fragmentation, e.g., in patients with systemic lupus erythematosus or other inflammatory conditions.

In one embodiment, the results of analyses with and without size fractionation can be compared to indicate whether size has any confounding effect on the methylation results. Thus, in addition to or instead of normalization, calculating the methylation level at a particular size can be used to determine whether a false positive is likely when the percentage of regions exceeding the threshold differs with and without size fractionation, or whether only the particular methylation level differs. For example, a significant difference between the results for a sample with and without size fractionation can indicate a possible false-positive result caused by an abnormal fragmentation pattern. The threshold for deciding whether the difference is significant can be established by analyzing a cohort of cancer patients and a cohort of non-cancer control individuals.
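The comparison of results with and without size fractionation can be sketched as below. The function names and the `max_drop` cutoff are hypothetical; the text only states that the significance threshold would be established from cancer and non-cancer cohorts, so the value of 20 percentage points used in the usage note is a placeholder, not a value from the source.

```python
def percent_flagged(z_scores, cutoff):
    """Percentage of regions whose z-score is below `cutoff` (i.e., hypomethylated)."""
    return 100.0 * sum(z < cutoff for z in z_scores) / len(z_scores)


def fragmentation_artifact_suspected(pct_without, pct_with, max_drop):
    """Flag a possible false positive when size fractionation removes most of the signal.

    pct_without: percentage of hypomethylated regions without size fractionation
    pct_with:    percentage of hypomethylated regions using size-selected fragments
    max_drop:    largest drop (percentage points) still treated as insignificant
                 (hypothetical; would be calibrated on reference cohorts)
    """
    return (pct_without - pct_with) > max_drop
```

Using the figures reported for Figures 34B and 34C with the placeholder cutoff, `fragmentation_artifact_suspected(84.0, 15.0, 20.0)` is true for SLE04, whereas `fragmentation_artifact_suspected(98.6, 98.5, 20.0)` is false for TBR36.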

I. Analysis of genome-wide CpG island hypermethylation in plasma

In addition to global hypomethylation, hypermethylation of CpG islands is also commonly observed in cancers (SB Baylin et al. 2011 Nat Rev Cancer; 11:726-734; PA Jones et al. 2007 Cell; 128:683-692; M Esteller et al. 2007 Nat Rev Genet; 8:286-298; M Ehrlich et al. 2002 Oncogene; 21:5400-5413). This section describes the use of genome-wide analysis of CpG island hypermethylation for detecting and monitoring cancer.

Figure 35 is a flowchart of method 3500 for determining a classification of a level of cancer based on hypermethylation of CpG islands, according to embodiments of the present invention. The plurality of sites of method 2800 can include CpG sites, where the CpG sites are organized into a plurality of CpG islands, each CpG island including one or more CpG sites. The methylation level of each CpG island can be used to determine the classification of the level of cancer.

At block 3510, the CpG islands to be analyzed are identified. In this analysis, as an example, a set of CpG islands to be analyzed is first defined, characterized by relatively low methylation densities in the plasma of healthy reference individuals. In one aspect, the variation of the methylation densities in the reference group can be relatively small, so that cancer-associated hypermethylation is easier to detect. In one embodiment, the CpG islands have a mean methylation density below a first percentage in the reference group, and the coefficient of variation of the methylation density in the reference group is below a second percentage.

As an example, for illustration purposes, the following criteria are used to identify suitable CpG islands:

i. The mean methylation density of the CpG island in the reference group (e.g., healthy individuals) is <5%.

ii. The coefficient of variation of the methylation density in plasma across the reference group (e.g., healthy individuals) is <30%.

These parameters can be adjusted for a specific application. In the data set, 454 CpG islands in the genome fulfilled these criteria.
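The two selection criteria can be sketched as a simple filter. The input layout (a mapping from island identifier to the methylation densities observed in the reference individuals, expressed as fractions), the use of the sample standard deviation, and the handling of a zero mean are assumptions of this sketch.

```python
from statistics import mean, stdev


def select_islands(reference_densities, max_mean=0.05, max_cv=0.30):
    """Return the CpG islands that are lowly and consistently methylated in the reference group.

    reference_densities: dict mapping an island identifier to a list of
    methylation densities (fractions, one per reference individual).
    """
    selected = []
    for island, densities in reference_densities.items():
        m = mean(densities)
        if m <= 0:
            # Coefficient of variation is undefined for a zero mean; skipped here (assumption).
            continue
        cv = stdev(densities) / m  # sample standard deviation (assumption)
        if m < max_mean and cv < max_cv:
            selected.append(island)
    return selected
```

The defaults mirror the illustrative 5% and 30% cutoffs above and would be tuned for a given application.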

At block 3520, the methylation density of each CpG island is calculated. The methylation density can be determined as described herein.

At block 3530, it is determined whether each CpG island is hypermethylated. For example, to analyze CpG island hypermethylation in a test case, the methylation density of each CpG island is compared with the corresponding data of the reference group. The methylation density (an example of a methylation level) can be compared to one or more thresholds to determine whether a particular island is hypermethylated.

In one embodiment, a first threshold can correspond to the mean methylation density of the reference group plus a specified percentage. Another threshold can correspond to the mean methylation density of the reference group plus a specified number of standard deviations. In one implementation, a z-score (Zmeth) is calculated and compared to a threshold. As an example, a CpG island in a test individual (e.g., an individual being screened for cancer) is considered significantly hypermethylated if it meets the following criteria:

i. its methylation density is higher than the mean of the reference group by 2%, and

ii. Zmeth > 3.

These parameters can also be adjusted for a specific application.
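The two hypermethylation criteria can be sketched as follows. The sketch reads "by 2%" as an absolute difference of at least 0.02 in methylation density and uses the sample standard deviation of the reference group; both readings are assumptions, as are the function names.

```python
from statistics import mean, stdev


def island_z_score(test_density, reference_densities):
    """Zmeth: distance of the test density from the reference mean, in reference SDs."""
    return (test_density - mean(reference_densities)) / stdev(reference_densities)


def is_hypermethylated(test_density, reference_densities,
                       min_diff=0.02, z_cutoff=3.0):
    """Apply both criteria: an absolute difference (assumed) and a z-score cutoff."""
    diff = test_density - mean(reference_densities)
    return diff >= min_diff and island_z_score(test_density, reference_densities) > z_cutoff
```

Requiring both conditions guards against islands with very small reference variance, where a tiny absolute change could otherwise yield a large z-score.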

At block 3540, the methylation densities (e.g., as z-scores) of the hypermethylated CpG islands are used to determine a cumulative score. For example, after all significantly hypermethylated CpG islands have been identified, a score involving the sum of the z-scores, or a function of the z-scores, of all hypermethylated CpG islands can be calculated. One example of such a score is the cumulative probability (CP) score, described in another section. The cumulative probability score uses Zmeth to determine the probability of obtaining such an observation by chance according to a probability distribution (e.g., Student's t probability distribution with 3 degrees of freedom).

At block 3550, the cumulative score is compared to a cumulative threshold to determine the classification of the level of cancer. For example, if the total hypermethylation in the identified CpG islands is large enough, the organism can be identified as having cancer. In one embodiment, the cumulative threshold corresponds to the highest cumulative score in the reference group.
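Blocks 3540 and 3550 can be sketched using the simplest score mentioned above, the sum of z-scores over hypermethylated islands. For brevity the sketch treats an island as hypermethylated when its z-score exceeds the cutoff and omits the absolute-difference criterion; it does not implement the CP score. All names are hypothetical.

```python
def cumulative_score(z_scores, z_cutoff=3.0):
    """Sum of z-scores over islands called hypermethylated (z-score above the cutoff)."""
    return sum(z for z in z_scores if z > z_cutoff)


def exceeds_reference(test_z_scores, reference_z_score_sets, z_cutoff=3.0):
    """Compare the test score with the highest cumulative score seen in the reference group."""
    threshold = max(cumulative_score(ref, z_cutoff) for ref in reference_z_score_sets)
    return cumulative_score(test_z_scores, z_cutoff) > threshold
```

Using the maximum reference score as the cumulative threshold follows the embodiment described for block 3550; a stricter or looser threshold could be substituted.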

IX. Methylation and CNA

As mentioned above, the methylation analysis approaches described herein can be used in combination with other methods that are based on genetic alterations of tumor-derived DNA in plasma. Examples of such methods include the analysis of cancer-associated chromosomal aberrations (Chan et al. 2013 Clin Chem; 59:211-224; Leary et al. 2012 Sci Transl Med; 4:162ra154). Aspects of copy number aberrations (CNA) are described in U.S. Patent Application No. 13/308,473.

A. CNA

Copy number aberrations can be detected by counting the DNA fragments that align to a particular part of the genome, normalizing the count, and comparing the count to a threshold. In various embodiments, the normalization can be performed by counting the DNA fragments aligned to the other haplotype of the same part of the genome (relative haplotype dosage (RHDO)) or by counting the DNA fragments aligned to another part of the genome.

The RHDO method relies on the use of heterozygous loci. By comparing two regions rather than two haplotypes of the same region, the embodiments described in this section can also be used for homozygous loci and are therefore non-haplotype-specific. In a relative chromosomal region dosage method, the number of fragments from one chromosomal region (e.g., as determined by counting the sequence reads aligned to that region) is compared to an expected value (which can come from a reference chromosomal region or from the same region in another sample known to be healthy). In this way, fragments are counted for a chromosomal region regardless of which haplotype the sequenced tag comes from. Sequence reads that contain no heterozygous loci can therefore still be used. To perform the comparison, an embodiment can normalize the tag count before the comparison. Each region is defined by at least two loci (which are separated from each other), and the fragments at these loci can be used to obtain a collective value for the region.

A normalized value for the sequenced reads (tags) of a particular region can be calculated by dividing the number of sequenced reads aligning to that region by the total number of sequenced reads alignable to the whole genome. This normalized tag count allows the results from one sample to be compared with the results of another sample. For example, the normalized value can be the proportion (e.g., percentage or fraction) of sequenced reads expected to come from the particular region, as stated above. In other embodiments, other methods of normalization are possible. For example, one can normalize by dividing the number of counts for one region by the number of counts for a reference region (in the case above, the reference region is simply the whole genome). The normalized tag count can then be compared against a threshold, which can be determined from one or more reference samples not exhibiting cancer.

The normalized tag count of the test case is then compared with the normalized tag counts of one or more reference individuals, e.g., individuals without cancer. In one embodiment, the comparison is made by calculating the z-score of the case for the particular chromosomal region. The z-score can be calculated using the following equation: z-score = (normalized tag count of the case - mean) / SD, where "mean" is the mean normalized tag count aligning to the particular chromosomal region for the reference samples, and SD is the standard deviation of the normalized tag counts aligning to the particular region for the reference samples. Hence, the z-score is the number of standard deviations by which the normalized tag count of a chromosomal region for the test case differs from the mean normalized tag count for the same chromosomal region of the one or more reference individuals.
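The normalized tag count and the z-score equation above can be written out directly. The function names are hypothetical, and the sample standard deviation is assumed because the text does not specify which estimator is used.

```python
from statistics import mean, stdev


def normalized_tag_count(region_reads, total_alignable_reads):
    """Fraction of all alignable reads that align to the region."""
    return region_reads / total_alignable_reads


def region_z_score(case_value, reference_values):
    """z-score = (normalized tag count of the case - mean) / SD over the reference samples."""
    return (case_value - mean(reference_values)) / stdev(reference_values)
```

A positive result indicates over-representation of the region in the test sample relative to the reference samples, and a negative result indicates under-representation.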

When the tested organism has cancer, the chromosomal regions that are amplified in the tumor tissue would be over-represented in the plasma DNA. This would result in a positive z-score. On the other hand, chromosomal regions that are deleted in the tumor tissue would be under-represented in the plasma DNA. This would result in a negative z-score. The magnitude of the z-score is determined by several factors.

One factor is the fractional concentration of tumor-derived DNA in the biological sample (e.g., plasma). The higher the fractional concentration of tumor-derived DNA in the sample (e.g., plasma), the larger the difference between the normalized tag counts of the test case and the reference cases. Hence, the magnitude of the z-score would be larger.

Another factor is the variation of the normalized tag count in the one or more reference cases. For the same degree of over-representation of a chromosomal region in the biological sample (e.g., plasma) of the test case, a smaller variation (i.e., a smaller standard deviation) of the normalized tag count in the reference group would produce a higher z-score. Similarly, for the same degree of under-representation of a chromosomal region in the biological sample (e.g., plasma) of the test case, a smaller standard deviation of the normalized tag count in the reference group would produce a more negative z-score.

Another factor is the magnitude of the chromosomal aberration in the tumor tissue. The magnitude of a chromosomal aberration refers to the copy number change (gain or loss) of the particular chromosomal region. The larger the copy number change in the tumor tissue, the greater the degree of over-representation or under-representation of the particular chromosomal region in the plasma DNA. For example, the loss of both copies of a chromosome would cause greater under-representation of the chromosomal region in the plasma DNA than the loss of one of the two copies, and hence a more negative z-score. Typically, multiple chromosomal aberrations are present in a cancer. The chromosomal aberrations in each cancer can further vary in their nature (i.e., amplification or deletion), their degree (single or multiple copy gain or loss), and their extent (the size of the aberration in terms of chromosomal length).

The precision of measuring the normalized tag count is affected by the number of molecules analyzed. It is expected that approximately 15,000, 60,000, and 240,000 molecules would need to be analyzed to detect chromosomal aberrations with a one-copy change (gain or loss) when the fractional concentration is approximately 12.5%, 6.3%, and 3.2%, respectively. Further details of tag counting for the detection of cancer for different chromosomal regions are described in U.S. Patent Publication No. 2009/0029377 by Lo et al., entitled "Diagnosing Fetal Chromosomal Aneuploidy Using Massively Parallel Genomic Sequencing," the entire contents of which are incorporated herein by reference for all purposes.
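The three figures quoted above are consistent with the required number of molecules growing with the inverse square of the fractional concentration, so that each halving of the concentration quadruples the count. That scaling is an inference from the quoted numbers rather than a formula stated in the text, and the sketch below should be read in that light. It treats 6.3% and 3.2% as rounded values of 6.25% and 3.125%.

```python
def molecules_needed(fraction, ref_fraction=0.125, ref_molecules=15_000):
    """Estimate the molecules required at `fraction`, scaling from one reference point.

    Assumes an inverse-square relationship between the fractional concentration
    of tumor-derived DNA and the number of molecules needed (an inference from
    the quoted figures, not a rule given in the source).
    """
    return ref_molecules * (ref_fraction / fraction) ** 2
```

Under this assumption, `molecules_needed(0.0625)` gives 60,000 and `molecules_needed(0.03125)` gives 240,000, matching the figures in the preceding paragraph.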

Embodiments can also use size analysis instead of the tag counting method. Size analysis can also be used instead of a normalized tag count. The size analysis can use various parameters, as mentioned herein and in U.S. Patent Application No. 12/940,992. For example, the Q or F values from above may be used. Such size values do not need to be normalized by counts from other regions, because these values do not scale with the number of reads. Techniques of the haplotype-specific methods, such as the RHDO method described above and in more detail in U.S. Patent Application No. 13/308,473, can also be used for the non-specific methods. For example, techniques involving the sequencing depth and the refinement of a region may be used. In some embodiments, the GC bias of a particular region can be taken into account when comparing two regions. Because the RHDO method uses the same region, no such correction is needed.

Although certain cancers may typically present with aberrations in particular chromosomal regions, such cancers do not always present exclusively in those regions. For example, additional chromosomal regions could show aberrations, and the location of such additional regions may be unknown. Furthermore, when screening patients to identify early-stage cancer, one may want to identify a broad range of cancers, which could show aberrations anywhere in the genome. To address these situations, embodiments can analyze a plurality of regions in a systematic fashion to determine which regions show aberrations. The number of aberrations and their locations (e.g., whether they are contiguous) can be used, for example, to confirm aberrations, determine the stage of the cancer, provide a diagnosis of cancer (e.g., if the number is greater than a threshold), and provide a prognosis based on the number and location of the regions exhibiting an aberration.

Accordingly, embodiments can identify whether an organism has cancer based on the number of regions that show an aberration. A plurality of regions (e.g., 3,000) can thus be tested to identify the number of regions that exhibit an aberration. The regions can cover the entire genome or only parts of the genome, e.g., non-repeat regions.

Figure 36 is a flowchart of method 3600 for analyzing a biological sample of an organism using a plurality of chromosomal regions, according to embodiments of the present invention. The biological sample includes nucleic acid molecules (also called fragments).

At block 3610, a plurality of regions (e.g., non-overlapping regions) of the genome of the organism are identified. Each chromosomal region includes a plurality of loci. The regions can be 1 Mb in size, or some other equal size. With regions of 1 Mb, the entire genome would include about 3,000 regions, each of predetermined size and location. Such predetermined regions can vary to accommodate the length of a particular chromosome or a specified number of regions to be used, as well as any other criteria mentioned herein. If the regions have different lengths, those lengths can be used to normalize the results, e.g., as described herein. The regions can be specifically selected based on certain criteria of the specific organism and/or based on knowledge of the cancer being tested. The regions can also be selected arbitrarily.
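Dividing a genome into non-overlapping regions of equal size can be sketched as follows. The chromosome names and lengths in the usage note are placeholders, and the 0-based half-open coordinate convention is an assumption of the sketch.

```python
def make_bins(chrom_lengths, bin_size=1_000_000):
    """Split each chromosome into consecutive, non-overlapping bins.

    Returns (chromosome, start, end) tuples using 0-based half-open
    coordinates (assumed); the last bin on a chromosome may be shorter.
    """
    bins = []
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, bin_size):
            bins.append((chrom, start, min(start + bin_size, length)))
    return bins
```

For example, `make_bins({"chrA": 2_500_000, "chrB": 1_000_000})` yields four bins, the third of which is only 0.5 Mb long. When bins differ in length, as here, the lengths can be used to normalize the per-bin results, as the paragraph above notes.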

At block 3620, for each of a plurality of nucleic acid molecules, the location of the nucleic acid molecule in a reference genome of the organism is identified. The location can be determined in any of the ways mentioned herein, e.g., by sequencing the fragments to obtain sequenced tags and aligning the sequenced tags to the reference genome. For haplotype-specific methods, the particular haplotype of a molecule can also be determined.

Blocks 3630-3650 are performed for each of the chromosomal regions. At block 3630, a respective group of nucleic acid molecules is identified as being from the chromosomal region based on the identified locations. The respective group can include at least one nucleic acid molecule located at each of the plurality of loci of the chromosomal region. In one embodiment, the group can be the fragments that align to a particular haplotype of the chromosomal region, e.g., as in the RHDO method above. In another embodiment, the group can be any fragment that aligns to the chromosomal region.

At block 3640, a computer system calculates a respective value of the respective group of nucleic acid molecules. The respective value defines a property of the nucleic acid molecules of the respective group. The respective value can be any of the values mentioned herein. For example, the value can be the number of fragments in the group or a statistical value of the size distribution of the fragments in the group. The respective value can also be a normalized value, e.g., the tag count of the region divided by the total number of tag counts for the sample or by the number of tag counts for a reference region. The respective value can also be a difference or ratio relative to another value (e.g., in RHDO), thereby providing the property of a difference for the region.

At block 3650, the respective value is compared to a reference value to determine a classification of whether the chromosomal region exhibits a deletion or an amplification. The reference value can be any threshold or reference value described herein. For example, the reference value could be a threshold value determined for normal samples. For RHDO, the respective value could be the difference or ratio of the tag counts for the two haplotypes, and the reference value can be a threshold for determining that a statistically significant deviation exists. As another example, the reference value could be the tag count or size value for another haplotype or region, and the comparison can include taking a difference or ratio (or a function of these) and then determining whether the difference or ratio is greater than a threshold value.

The reference value can vary based on the results of other regions. For example, if neighboring regions also show a deviation (although one that is small compared to a threshold, e.g., a z-score of 3), then a lower threshold can be used. For example, if three consecutive regions are all above a first threshold, cancer may be more likely. This first threshold may therefore be lower than another threshold that is required to identify cancer from non-consecutive regions. Three regions (or more than three) each having even a small deviation can have a low enough probability of arising from random fluctuation that sensitivity and specificity are preserved.

At block 3660, the amount of genomic regions classified as exhibiting a deletion or an amplification is determined. The chromosomal regions that are counted can be subject to restrictions. For example, only regions that are contiguous with at least one other region may be counted (or the contiguous regions can be required to be of a certain size, e.g., four or more regions). For embodiments where the regions are not equal in size, the number can also account for the respective lengths (e.g., the number could be the total length of the aberrant regions).

At block 3670, the amount is compared to an amount threshold to determine a classification of the sample. As examples, the classification can be whether the organism has cancer, the stage of the cancer, and the prognosis of the cancer. In one embodiment, all aberrant regions are counted and a single threshold is used regardless of where the regions appear. In another embodiment, the threshold can vary based on the locations and sizes of the regions that are counted. For example, the amount of regions on a particular chromosome or chromosome arm may be compared to a threshold for that particular chromosome (or arm). Multiple thresholds may be used. For example, the amount of aberrant regions on a particular chromosome (or arm) must be greater than a first threshold, and the total amount of aberrant regions in the genome must be greater than a second threshold. The threshold can be a percentage of the regions that are determined to exhibit a deletion or an amplification.
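The two-threshold embodiment can be sketched as below. The representation of aberrant regions as a list of chromosome-arm labels (one entry per aberrant region), the function name, and all threshold values are hypothetical.

```python
from collections import Counter


def exceeds_thresholds(aberrant_arms, arm, arm_threshold, genome_threshold):
    """Check both an arm-specific and a genome-wide count of aberrant regions.

    aberrant_arms: one chromosome-arm label per region classified as showing
    a deletion or an amplification (hypothetical representation).
    """
    per_arm = Counter(aberrant_arms)
    return per_arm[arm] > arm_threshold and len(aberrant_arms) > genome_threshold
```

The single-threshold embodiment corresponds to checking only `len(aberrant_arms)` against one genome-wide value.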

This threshold for the amount of regions can also depend on how strong the imbalance is for the regions counted. For example, the amount of regions used as the threshold for determining a classification of cancer can depend on the specificity and sensitivity (aberration threshold) used to detect an aberration in each region. For example, if the aberration threshold is low (e.g., a z-score of 2), then the amount threshold may be chosen to be high (e.g., 150). But if the aberration threshold is high (e.g., a z-score of 3), then the amount threshold may be lower (e.g., 50). The amount of regions showing an aberration can also be a weighted value, e.g., a region that shows a high imbalance can be weighted more heavily than a region that shows only a slight imbalance (i.e., there are more classifications than just positive and negative for the aberration). As an example, the sum of the z-scores can be used, thereby providing a weighted value.

Accordingly, the amount (which can include the number and/or size) of chromosomal regions showing significant over-representation or under-representation of the normalized tag count (or another respective value for the property of the group) can be used to reflect the severity of the disease. The amount of chromosomal regions with an aberrant normalized tag count can be determined by two factors, namely the number (or size) of chromosomal aberrations in the tumor tissue and the fractional concentration of tumor-derived DNA in the biological sample (e.g., plasma). More advanced cancers tend to exhibit more (and larger) chromosomal aberrations. Hence, more cancer-associated chromosomal aberrations would potentially be detectable in the sample (e.g., plasma). In patients with more advanced cancer, the higher tumor load would lead to a higher fractional concentration of tumor-derived DNA in the plasma. As a result, the tumor-associated chromosomal aberrations would be more easily detected in the plasma sample.

One possible approach for improving sensitivity without sacrificing specificity is to take into account the results of adjacent chromosomal segments. In one embodiment, the z-score cutoffs remain >2 and <-2. However, a chromosomal region is classified as potentially aberrant only when two consecutive segments show the same type of aberration, e.g., both segments have a z-score >2. In other embodiments, the z-scores of neighboring segments can be added together and compared against a higher cutoff. For example, the z-scores of three consecutive segments can be summed and a cutoff of 5 can be used. This concept can be extended to more than three consecutive segments.
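The two adjacent-segment rules described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and applying the summed cutoff symmetrically to gains and losses (via the absolute value) is an assumption, since the text gives only the value 5.

```python
from typing import List


def flag_consecutive(z: List[float], cutoff: float = 2.0) -> List[bool]:
    """Flag a segment when it and a neighbour both pass the cutoff in the same direction."""
    flags = [False] * len(z)
    for i in range(len(z) - 1):
        both_gain = z[i] > cutoff and z[i + 1] > cutoff
        both_loss = z[i] < -cutoff and z[i + 1] < -cutoff
        if both_gain or both_loss:
            flags[i] = flags[i + 1] = True
    return flags


def flag_windowed_sum(z: List[float], window: int = 3, cutoff: float = 5.0) -> List[bool]:
    """Flag every segment of a window whose summed z-score exceeds the cutoff in magnitude.

    Assumption: the cutoff is applied to |sum| so that losses are treated like gains.
    """
    flags = [False] * len(z)
    for start in range(len(z) - window + 1):
        total = sum(z[start:start + window])
        if abs(total) > cutoff:
            for j in range(start, start + window):
                flags[j] = True
    return flags


if __name__ == "__main__":
    z_scores = [0.5, 2.4, 2.6, 0.1, -2.5, 1.0, -2.2, -2.3]
    print(flag_consecutive(z_scores))
    print(flag_windowed_sum(z_scores))
```

Note that the isolated segment with z = -2.5 is not flagged by the first rule, because its neighbours do not show the same type of aberration.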

The combination of amount threshold and abnormal threshold can also depend on the purpose of the analysis and on any prior knowledge of the organism (or lack thereof). For example, when screening a normal, healthy population for cancer, one would typically use high specificity, both in the amount of regions (i.e., a high threshold for the number of regions) and in the abnormal threshold at which a region is identified as having an aberration. In patients at higher risk (e.g., patients with a tumor or a family history, smokers, chronic human papillomavirus (HPV) carriers, hepatitis virus carriers, or carriers of other viruses), the thresholds can be lower to obtain higher sensitivity (fewer false negatives).

In one embodiment, if a 1 Mb resolution and a lower detection limit of 6.3% tumor-derived DNA are used for detecting a chromosomal aberration, the number of molecules in each 1 Mb segment would need to be 60,000. For the whole genome, this translates to approximately 180 million alignable reads (60,000 reads per megabase × 3,000 megabases).
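The arithmetic above can be written as a small helper. This is only a sketch: the function name is hypothetical, the 3,000 Mb genome size is the rounded figure used in the text, and the per-segment read requirement is treated as a fixed input rather than derived from the detection limit.

```python
def required_alignable_reads(reads_per_segment: int,
                             segment_mb: float,
                             genome_mb: float = 3000.0) -> int:
    """Total alignable reads needed when every segment must hold `reads_per_segment` reads."""
    n_segments = genome_mb / segment_mb
    return round(reads_per_segment * n_segments)


if __name__ == "__main__":
    # 60,000 reads per 1 Mb segment across a 3,000 Mb genome
    print(required_alignable_reads(60_000, 1.0))
    # Halving the segment size at the same per-segment depth doubles the total
    print(required_alignable_reads(60_000, 0.5))
```

Holding the per-segment requirement constant, a smaller segment size raises the total number of reads in inverse proportion to the segment size.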

A smaller segment size gives higher resolution for detecting smaller chromosomal aberrations, but it increases the total number of molecules that must be analyzed. A larger segment size reduces the number of molecules required at the expense of resolution, so only larger aberrations can be detected. In one implementation, larger regions can be used first, and the segments showing an aberration can then be subdivided, with the subregions analyzed to obtain better resolution (e.g., as described above). If an estimate is available for the size of the deletion or amplification to be detected (or for the minimum concentration to be detected), the number of molecules to analyze can be determined.

B. CNA based on sequencing of bisulfite-treated plasma DNA

Genome-wide hypomethylation and CNA are frequently observed in tumor tissue. Here, it is demonstrated that information on CNA and on cancer-associated methylation changes can be obtained simultaneously from bisulfite sequencing of plasma DNA. Because both types of analysis can be carried out on the same data set, the CNA analysis incurs practically no additional cost. Other embodiments can use different procedures to obtain the methylation information and the genetic information. In other embodiments, a similar analysis can be performed for cancer-associated hypermethylation in conjunction with the CNA analysis.

Figure 37A shows the CNA analysis of the tumor tissue, non-bisulfite (BS)-treated plasma DNA, and BS-treated plasma DNA (from inside to outside) of patient TBR36. The outermost ring shows the chromosome ideogram (G-banding). Each dot represents the result for a 1 Mb region. Green, red, and gray dots represent regions with copy number gain, copy number loss, and no copy number change, respectively. For the plasma analyses, z-scores are shown, with a difference of 5 between two concentric lines. For the tumor tissue analysis, copy number is shown, with a difference of one copy between two concentric lines. Figure 38A shows the CNA analysis of the tumor tissue, non-BS-treated plasma DNA, and BS-treated plasma DNA (from inside to outside) of patient TBR34. The patterns of CNA detected in the bisulfite-treated and non-bisulfite-treated plasma samples were concordant.

The patterns of CNA detected in the tumor tissue, the non-bisulfite-treated plasma, and the bisulfite-treated plasma were concordant. To further evaluate the agreement between the bisulfite-treated and non-bisulfite-treated plasma results, scatter plots were constructed. Figure 37B is a scatter plot showing, for patient TBR36, the relationship between the z-scores for CNA detection in 1 Mb regions obtained with bisulfite-treated plasma and with non-bisulfite-treated plasma. A positive correlation was observed between the z-scores of the two analyses (r = 0.89, p < 0.001, Pearson correlation). Figure 38B is the corresponding scatter plot for patient TBR34. A positive correlation was again observed between the z-scores of the two analyses (r = 0.81, p < 0.001, Pearson correlation).

C. Synergistic analysis of cancer-associated CNA and methylation changes

As described above, the CNA analysis can involve counting the number of sequence reads in each 1 Mb region, whereas the methylation density analysis can involve determining the proportion of cytosine residues at CpG dinucleotides that are methylated. Combining the two analyses can provide synergistic information for the detection of cancer. For example, the methylation classification and the CNA classification can be used to determine a third classification of a level of cancer.

In one embodiment, the presence of either cancer-associated CNA or cancer-associated methylation changes can be used to indicate the potential presence of cancer. In such an embodiment, the sensitivity of cancer detection is increased, because a positive result is given when either CNA or methylation changes are present in the plasma of the tested individual. In another embodiment, the presence of both changes is required to indicate the presence of cancer. In such an embodiment, the specificity of the test is improved, because either type of change alone may be detected in some individuals without cancer. Thus, the third classification can be positive for cancer only when both the first classification and the second classification indicate cancer.

Twenty-six HCC patients and 22 healthy individuals were recruited. A blood sample was collected from each individual, and the plasma DNA was sequenced after bisulfite treatment. For the HCC patients, the blood samples were collected at the time of diagnosis. The presence of a significant amount of CNA was defined, as an example, as >5% of regions showing a z-score <-3 or >3. The presence of a significant amount of cancer-associated hypomethylation was defined as >3% of regions showing a z-score <-3. As examples, the amount of regions can be expressed as a raw count of regions, as a percentage, or as the length of the regions.
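The two example criteria above can be sketched in code. The function names are hypothetical; the cutoffs (|z| > 3 in more than 5% of regions for CNA, z < -3 in more than 3% of regions for hypomethylation) are the example values stated in the text, and the amount of regions is expressed here as a fraction of regions.

```python
from typing import Sequence


def fraction_cna(z_cna: Sequence[float], cutoff: float = 3.0) -> float:
    """Fraction of regions with a CNA z-score below -cutoff or above +cutoff."""
    return sum(1 for z in z_cna if z < -cutoff or z > cutoff) / len(z_cna)


def fraction_hypomethylated(z_meth: Sequence[float], cutoff: float = 3.0) -> float:
    """Fraction of regions with a methylation z-score below -cutoff."""
    return sum(1 for z in z_meth if z < -cutoff) / len(z_meth)


def has_significant_cna(z_cna: Sequence[float], min_fraction: float = 0.05) -> bool:
    return fraction_cna(z_cna) > min_fraction


def has_significant_hypomethylation(z_meth: Sequence[float],
                                    min_fraction: float = 0.03) -> bool:
    return fraction_hypomethylated(z_meth) > min_fraction


if __name__ == "__main__":
    # Toy z-scores for 100 regions (illustrative values, not patient data)
    z_cna = [0.0] * 90 + [4.0] * 6 + [-4.0] * 4
    z_meth = [0.0] * 98 + [-5.0] * 2
    print(has_significant_cna(z_cna), has_significant_hypomethylation(z_meth))
```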

Table 3 shows the detection of significant amounts of CNA and methylation changes in the plasma of the 26 HCC patients using massively parallel sequencing of bisulfite-treated plasma DNA.

Table 3

The detection rates of cancer-associated methylation changes and of CNA were 69% and 50%, respectively. If the presence of either criterion was taken to indicate the potential presence of cancer, the detection rate (i.e., the diagnostic sensitivity) improved to 73%.

Results are shown for two patients who had either CNA (Figure 39A) or methylation changes (Figure 39B). Figure 39A is a Circos plot showing the CNA (inner ring) and methylation analysis (outer ring) of the bisulfite-treated plasma of HCC patient TBR240. For the CNA analysis, green, red, and gray dots represent regions with chromosomal gain, chromosomal loss, and no copy number change, respectively. For the methylation analysis, green, red, and gray dots represent regions with hypermethylation, hypomethylation, and normal methylation, respectively. In this patient, cancer-associated CNA was detected in the plasma, whereas the methylation analysis did not reveal a significant amount of cancer-associated hypomethylation. Figure 39B is a Circos plot showing the CNA (inner ring) and methylation analysis (outer ring) of the bisulfite-treated plasma of HCC patient TBR164. In this patient, cancer-associated hypomethylation was detected in the plasma, but no significant amount of CNA was observed. Results for two patients showing both CNA and methylation changes are shown in Figures 48A (TBR36) and 49A (TBR34).

Table 4 shows the detection of significant amounts of CNA and methylation changes in the plasma of the 22 control individuals using massively parallel sequencing of bisulfite-treated plasma DNA. A leave-one-out approach was used to evaluate each control individual: when a particular individual was evaluated, the other 21 individuals were used to calculate the mean and SD of the control group.
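The leave-one-out evaluation of the controls can be sketched as follows for a single region. The function name is hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not say which SD estimator was used.

```python
from statistics import mean, stdev
from typing import List


def leave_one_out_z(values: List[float]) -> List[float]:
    """z-score of each control against the mean and SD of all the other controls."""
    scores = []
    for i, value in enumerate(values):
        others = values[:i] + values[i + 1:]
        # Assumption: sample SD (n - 1 denominator); requires at least 3 controls
        scores.append((value - mean(others)) / stdev(others))
    return scores


if __name__ == "__main__":
    # Toy per-region values for four controls (illustrative only)
    print(leave_one_out_z([1.0, 2.0, 3.0, 10.0]))
```

With 22 controls, each individual's z-score is thereby computed from a reference group of 21 that excludes that individual.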

Table 4

The specificities for the detection of significant amounts of methylation changes and of CNA were 86% and 91%, respectively. If the presence of both criteria was required to indicate the potential presence of cancer, the specificity improved to 95%.

In one embodiment, a sample that is positive for CNA and/or hypomethylation is considered positive for cancer, and a sample is considered negative only when neither is detectable. Using this "or" logic provides higher sensitivity. In another embodiment, only a sample that is positive for both CNA and hypomethylation is considered positive for cancer, which provides higher specificity. In yet another embodiment, three tiers of classification can be used, with individuals classified as (i) both normal, (ii) one abnormal, or (iii) both abnormal.

Different follow-up strategies can be used for these three classifications. For example, individuals in (iii) can be subjected to the most intensive follow-up protocol, e.g., one involving whole-body imaging; individuals in (ii) can be subjected to a less intensive follow-up protocol, e.g., repeat plasma DNA sequencing after a relatively short interval of several weeks; and individuals in (i) can be subjected to the least intensive follow-up protocol, e.g., retesting after a number of years. In other embodiments, the methylation and CNA measurements can be used in conjunction with other clinical parameters (e.g., imaging results or serum biochemistry) to further refine the classification.
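The "or" logic, the "and" logic, and the three-tier classification can be sketched as follows. The names `Tier`, `classify_or`, `classify_and`, and `classify_tier` are hypothetical; the inputs are the two per-sample calls (significant CNA, significant hypomethylation) made upstream.

```python
from enum import Enum


class Tier(Enum):
    BOTH_NORMAL = "i"
    ONE_ABNORMAL = "ii"
    BOTH_ABNORMAL = "iii"


def classify_or(cna_positive: bool, hypomethylation_positive: bool) -> bool:
    """Cancer-positive if either change is present (favours sensitivity)."""
    return cna_positive or hypomethylation_positive


def classify_and(cna_positive: bool, hypomethylation_positive: bool) -> bool:
    """Cancer-positive only if both changes are present (favours specificity)."""
    return cna_positive and hypomethylation_positive


def classify_tier(cna_positive: bool, hypomethylation_positive: bool) -> Tier:
    """Three-tier classification by the number of abnormal analyses."""
    n_abnormal = int(cna_positive) + int(hypomethylation_positive)
    return (Tier.BOTH_NORMAL, Tier.ONE_ABNORMAL, Tier.BOTH_ABNORMAL)[n_abnormal]


if __name__ == "__main__":
    # A sample with CNA but no significant hypomethylation
    print(classify_or(True, False), classify_and(True, False), classify_tier(True, False))
```

A sample positive for only one analysis is positive under the "or" rule, negative under the "and" rule, and falls in tier (ii) under the three-tier scheme.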

D. Prognostic value of plasma DNA analysis after treatment

The presence of cancer-associated CNA and/or methylation changes in plasma indicates the presence of tumor-derived DNA in the circulation of the cancer patient. These cancer-associated changes would be expected to decrease or disappear after treatment (e.g., surgery). Conversely, the persistence of these changes in the plasma after treatment can indicate that the tumor cells have not been completely removed from the body, and can serve as a useful predictor of disease recurrence.

Blood samples were collected from two HCC patients, TBR34 and TBR36, one week after surgical resection of the tumor with curative intent. CNA and methylation analyses were performed on the bisulfite-treated post-treatment plasma samples.

Figure 40A shows the CNA analysis of bisulfite-treated plasma DNA collected before (inner ring) and after (outer ring) surgical resection of the tumor in HCC patient TBR36. Each dot represents the result for a 1 Mb region. Green, red, and gray dots represent regions with copy number gain, copy number loss, and no copy number change, respectively. Most of the CNAs observed before treatment disappeared after tumor resection. The proportion of regions with a z-score <-3 or >3 decreased from 25% to 6.6%.

Figure 40B shows the methylation analysis of bisulfite-treated plasma DNA collected before (inner ring) and after (outer ring) surgical resection of the tumor in HCC patient TBR36. Green, red, and gray dots represent regions with hypermethylation, hypomethylation, and normal methylation, respectively. The proportion of regions showing significant hypomethylation fell markedly, from 90% to 7.9%, and the degree of hypomethylation was also markedly reduced. This patient was in complete clinical remission at 22 months after tumor resection.

Figure 41A shows the CNA analysis of bisulfite-treated plasma DNA collected before (inner ring) and after (outer ring) surgical resection of the tumor in HCC patient TBR34. Although both the number of regions showing CNA and the magnitude of CNA in the affected regions were reduced after surgical resection of the tumor, residual CNA could be observed in the post-operative plasma sample. The red circles highlight the regions where the residual CNA is most apparent. The proportion of regions showing a z-score <-3 or >3 decreased from 57% to 12%.

Figure 41B shows the methylation analysis of bisulfite-treated plasma DNA collected before (inner ring) and after (outer ring) surgical resection of the tumor in HCC patient TBR34. The magnitude of hypomethylation was reduced after tumor resection, with the mean z-score of the hypomethylated regions changing from -7.9 to -4.0. However, the proportion of regions with a z-score <-3 changed in the opposite direction, increasing from 41% to 85%. This observation may indicate the presence of residual cancer cells after treatment. Clinically, multiple foci of tumor nodules were detected in the remaining non-resected liver 3 months after tumor resection. Lung metastases were observed in the 4th month after the operation. The patient died of local recurrence and metastatic disease 8 months after the operation.

The observations in these two patients (TBR34 and TBR36) suggest that the presence of residual cancer-associated CNA and hypomethylation can be used for monitoring and prognostication of cancer patients after treatment with curative intent. The data also show that the degree of change in the amount of plasma CNA detected can be used together with the degree of change in plasma DNA hypomethylation to predict and monitor treatment efficacy.

Thus, in some embodiments, one biological sample is obtained before treatment and a second biological sample is obtained after treatment (e.g., surgery). First values are obtained for the first sample, such as the z-scores of regions showing hypomethylation and CNA (e.g., normalized values of the region methylation level and of CNA) and the number of regions showing hypomethylation and CNA (e.g., amplification or deletion). Second values can be obtained for the second sample. In another embodiment, a third or even additional samples can be obtained after treatment, and the number of regions showing hypomethylation and CNA (e.g., amplification or deletion) can be obtained from the third or additional samples.

As described above for Figures 40A and 41A, a first number of regions showing CNA in the first sample can be compared with a second number of regions showing CNA in the second sample. As described above for Figures 40B and 41B, a first amount of regions showing hypomethylation in the first sample can be compared with a second amount of regions showing hypomethylation in the second sample. The comparison of the first amount with the second amount, and of the first number with the second number, can be used to determine the prognosis of the treatment. In various embodiments, one comparison alone can determine the prognosis, or both comparisons can be used. In embodiments where a third or even additional samples are obtained, one or more of these samples can be used, alone or in combination with the second sample, to determine the prognosis of the treatment.

In one implementation, the prognosis is predicted to be worse when a first difference, between the first amount and the second amount, is below a first difference threshold. In another embodiment, the prognosis is predicted to be worse when a second difference, between the first number and the second number, is below a second difference threshold. The thresholds can be the same or different. In one embodiment, the first difference threshold and the second difference threshold are zero. Thus, for the example above, the difference between the methylation values would indicate a worse prognosis for patient TBR34.

If the first difference and/or the second difference exceeds the same threshold or its respective threshold, the prognosis can be better. The classification of the prognosis can depend on how far the difference falls below or above the threshold, and multiple thresholds can be used to provide a range of classifications. A larger difference can predict a better outcome, and a smaller (or even negative) difference can predict a worse outcome.
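The difference-threshold rule can be sketched using the percentages reported above for the two patients. The function name is hypothetical; the difference is taken as the pre-treatment value minus the post-treatment value, and the threshold of zero is the example value given in the text.

```python
def worse_prognosis(before: float, after: float, threshold: float = 0.0) -> bool:
    """Predict a worse prognosis when the pre-minus-post difference is below the threshold."""
    return (before - after) < threshold


if __name__ == "__main__":
    # Percentage of regions with methylation z-score < -3, before and after resection
    print("TBR36 methylation:", worse_prognosis(90.0, 7.9))
    print("TBR34 methylation:", worse_prognosis(41.0, 85.0))
    # Percentage of regions with CNA z-score < -3 or > 3, before and after resection
    print("TBR36 CNA:", worse_prognosis(25.0, 6.6))
    print("TBR34 CNA:", worse_prognosis(57.0, 12.0))
```

With a threshold of zero, only the methylation comparison for TBR34 (41% rising to 85%) yields a negative difference and therefore a worse predicted prognosis.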

In some embodiments, the time points at which the various samples are collected are also recorded. With such temporal parameters, the kinetics, or rate of change, of the amounts can be determined. In one embodiment, a rapid reduction in tumor-associated hypomethylation in plasma and/or a rapid reduction in tumor-associated CNA in plasma would predict a good prognosis. Conversely, a static level or a rapid increase in tumor-associated hypomethylation in plasma and/or a static level or a rapid increase in tumor-associated CNA would predict a poor prognosis. The methylation and CNA measurements can be used in conjunction with other clinical parameters (e.g., imaging results, serum biochemistry, or protein markers) to predict clinical outcome.

Embodiments can use samples other than plasma. For example, tumor-associated methylation aberrations (e.g., hypomethylation) and/or tumor-associated CNA can be measured from tumor cells circulating in the blood of cancer patients, or from cell-free DNA or tumor cells in urine, stool, saliva, sputum, biliary fluid, pancreatic fluid, cervical swabs, secretions from the reproductive tract (e.g., vaginal secretions), ascitic fluid, pleural fluid, semen, sweat, and tears.

In various embodiments, tumor-associated methylation aberrations (e.g., hypomethylation) and/or tumor-associated CNA can be detected in the blood or plasma of patients with breast cancer, lung cancer, colorectal cancer, pancreatic cancer, ovarian cancer, nasopharyngeal carcinoma, cervical cancer, melanoma, brain tumors, and so on. Indeed, because methylation changes and genetic changes such as CNA are common phenomena in cancer, the approach can be applied to all cancer types. The methylation and CNA measurements can be used in conjunction with other clinical parameters (e.g., imaging results) to predict clinical outcome. Embodiments can also be used to screen and monitor patients with preneoplastic lesions, such as adenomas.

Thus, in one embodiment, the biological sample is collected before treatment, and the CNA and methylation measurements are repeated after treatment. The repeated measurements can yield a subsequent first amount of regions determined to exhibit a deletion or an amplification, and a subsequent second amount of regions determined to have a region methylation level exceeding the respective region cutoff. The first amount can be compared with the subsequent first amount, and the second amount can be compared with the subsequent second amount, to determine the prognosis of the organism.

The comparison for determining the prognosis of the organism can include determining a first difference between the first amount and the subsequent first amount, and comparing the first difference with one or more first difference thresholds. The comparison can also include determining a second difference between the second amount and the subsequent second amount, and comparing the second difference with one or more second difference thresholds. A threshold can be zero or another number.

The prognosis can be predicted to be worse when the first difference is below the first difference threshold than when it exceeds that threshold. Likewise, the prognosis can be predicted to be worse when the second difference is below the second difference threshold than when it exceeds that threshold. Examples of treatments include immunotherapy, surgery, radiotherapy, chemotherapy, antibody-based therapy, gene therapy, epigenetic therapy, and targeted therapy.

E. Performance

The diagnostic performance of the CNA and methylation analyses for different numbers of sequence reads and different region sizes is now described.

1. Number of sequence reads

According to one embodiment, plasma DNA was analyzed from 32 healthy control individuals, 26 patients with hepatocellular carcinoma, and 20 patients with other types of cancer, including nasopharyngeal carcinoma, breast cancer, lung cancer, neuroendocrine cancer, and leiomyosarcoma. Twenty-two of the 32 healthy individuals were randomly selected as the reference group. The mean and standard deviation (SD) of these 22 reference individuals were used to determine the normal ranges of methylation density and genomic representation. DNA extracted from the plasma sample of each individual was used to construct a sequencing library with the Illumina Paired-End sequencing kit. The sequencing library was then subjected to bisulfite treatment, which converts unmethylated cytosine residues to uracil. The bisulfite-converted sequencing library of each plasma sample was sequenced using one lane of an Illumina HiSeq 2000 sequencer.

After base calling, adapter sequences and low-quality bases (i.e., quality score < 5) at the ends of the fragments were removed. The trimmed reads in FASTQ format were then processed by a bioinformatics pipeline for methylation data analysis called Methy-Pipe (P Jiang et al. 2010, IEEE International Conference on Bioinformatics and Biomedicine, doi:10.1109/BIBMW.2010.5703866). To align the bisulfite-converted sequencing reads, all cytosine residues were first converted in silico to thymines, separately on the Watson and Crick strands, using the reference human genome (NCBI build 36/hg19). The same in silico cytosine-to-thymine conversion was then performed on all processed reads, and the positional information of each converted residue was retained. SOAP2 (R Li et al. 2009 Bioinformatics 25:1966-1967) was used to align the converted reads to the two converted reference human genomes, with a maximum of two mismatches allowed per aligned read. Only reads mappable to a unique genomic location were used for downstream analysis. Ambiguous reads that mapped to both the Watson and Crick strands, and duplicated (clonal) reads, were removed. Cytosine residues in the CpG dinucleotide context were used for downstream methylation analysis. After alignment, the cytosines originally present on the sequenced reads were recovered based on the positional information retained during the in silico conversion. A recovered cytosine within a CpG dinucleotide was scored as methylated, and a thymine within a CpG dinucleotide was scored as unmethylated.
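The in silico conversion and the CpG scoring rule can be illustrated with a toy sketch. This is not Methy-Pipe or SOAP2: the function names are hypothetical, and the example assumes a Watson-strand read that is already aligned without gaps at offset 0 of a short reference string.

```python
from typing import List, Tuple


def in_silico_convert(seq: str) -> Tuple[str, List[int]]:
    """Convert every C to T and keep the positions of the original cytosines."""
    positions = [i for i, base in enumerate(seq) if base == "C"]
    return seq.replace("C", "T"), positions


def restore_cytosines(converted: str, positions: List[int]) -> str:
    """Put back the cytosines recorded before conversion."""
    bases = list(converted)
    for i in positions:
        bases[i] = "C"
    return "".join(bases)


def call_cpg_methylation(read: str, reference: str) -> List[Tuple[int, bool]]:
    """Score each reference CpG covered by the read: C = methylated, T = unmethylated."""
    calls = []
    for i in range(min(len(read), len(reference) - 1)):
        if reference[i:i + 2] == "CG":
            if read[i] == "C":
                calls.append((i, True))
            elif read[i] == "T":
                calls.append((i, False))
    return calls


if __name__ == "__main__":
    reference = "ACGTTCGACG"   # CpG sites at positions 1, 5 and 8
    read = "ATGTTCGATG"        # bisulfite read: only the cytosine at position 5 survived
    converted, kept = in_silico_convert(read)
    recovered = restore_cytosines(converted, kept)
    print(converted, kept, call_cpg_methylation(recovered, reference))
```

Converting both the reads and the reference to a three-letter alphabet lets a conventional aligner place bisulfite reads; the recorded positions are what allow the methylation state to be read back afterwards.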

For the methylation analysis, the genome was divided into regions of equal size. The region sizes tested were 50 kb, 100 kb, 200 kb, and 1 Mb. The methylation density of each region was calculated as the number of methylated cytosines in the CpG dinucleotide context divided by the total number of cytosines at CpG positions. In other embodiments, the region size need not be uniform across the genome. In one embodiment, each such region of non-uniform size is compared across multiple individuals.
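The per-region methylation density can be sketched as follows. The function name and the input format (a list of CpG calls as position and methylation state on a single chromosome) are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def methylation_density_by_bin(calls: Iterable[Tuple[int, bool]],
                               bin_size: int = 1_000_000) -> Dict[int, float]:
    """Methylated CpG calls divided by all CpG calls, per fixed-size bin.

    `calls` holds (position, is_methylated) pairs for one chromosome (an assumed format).
    """
    methylated: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    for position, is_methylated in calls:
        bin_index = position // bin_size
        total[bin_index] += 1
        if is_methylated:
            methylated[bin_index] += 1
    return {b: methylated[b] / total[b] for b in total}


if __name__ == "__main__":
    toy_calls = [(100, True), (200, False), (999_999, True),
                 (1_000_000, False), (1_500_000, False)]
    print(methylation_density_by_bin(toy_calls))
    print(methylation_density_by_bin(toy_calls, bin_size=50_000))
```

Changing `bin_size` to 50,000, 100,000, or 200,000 reproduces the other region sizes tested; bins with no CpG calls are simply absent from the result.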

To determine whether the plasma methylation density of a tested case was normal, the methylation density was compared with the results of the reference group. Twenty-two of the 32 healthy individuals were randomly selected as the reference group for calculating the methylation z-score ($Z_{\mathrm{meth}}$):

$$Z_{\mathrm{meth}} = \frac{MD_{\mathrm{test}} - \overline{MD}_{\mathrm{ref}}}{MD_{\mathrm{SD}}}$$

where $MD_{\mathrm{test}}$ is the methylation density of the tested case for a particular 1 Mb region; $\overline{MD}_{\mathrm{ref}}$ is the mean methylation density of the reference group for the corresponding region; and $MD_{\mathrm{SD}}$ is the SD of the methylation density of the reference group for the corresponding region.
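The z-score for a single region can be computed as in the sketch below. The function name is hypothetical, and the sample standard deviation (n - 1 denominator) is an assumption, since the text does not specify the estimator. The same calculation applies to sequenced read densities in the CNA analysis.

```python
from statistics import mean, stdev
from typing import Sequence


def region_z_score(test_value: float, reference_values: Sequence[float]) -> float:
    """(test - mean of reference group) / SD of reference group, for one region."""
    # Assumption: sample SD (n - 1 denominator)
    return (test_value - mean(reference_values)) / stdev(reference_values)


if __name__ == "__main__":
    # Toy methylation densities for one 1 Mb region (illustrative values only)
    reference_group = [0.70, 0.72, 0.74]
    z = region_z_score(0.62, reference_group)
    print(round(z, 3), "hypomethylated" if z < -3 else "not hypomethylated")
```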

For the CNA analysis, the number of sequenced reads mapping to each 1 Mb region was determined (Chan et al. 2013 Clinical Chemistry 59:211-24). The sequenced read density of each region was determined after correction for GC bias using locally weighted scatter plot smoothing (LOESS) regression, as previously described (EZ Chen et al. 2011 PLoS One 6:e21791). For the plasma analysis, the sequenced read density of the tested case was compared with that of the reference group to calculate the CNA z-score ($Z_{\mathrm{CNA}}$):

$$Z_{\mathrm{CNA}} = \frac{RD_{\mathrm{test}} - \overline{RD}_{\mathrm{ref}}}{RD_{\mathrm{SD}}}$$

where $RD_{\mathrm{test}}$ is the sequenced read density of the tested case for a particular 1 Mb region; $\overline{RD}_{\mathrm{ref}}$ is the mean sequenced read density of the reference group for the corresponding region; and $RD_{\mathrm{SD}}$ is the SD of the sequenced read density of the reference group for the corresponding region. A region was defined as exhibiting CNA if its $Z_{\mathrm{CNA}}$ was <-3 or >3.

A mean of 93 million aligned reads (range: 39 million to 142 million) was obtained per case. To evaluate the effect of a reduced number of sequenced reads on diagnostic performance, 10 million aligned reads were randomly selected from each case. The same group of reference individuals was used to establish the reference range of each 1 Mb region for the data set with reduced sequenced reads. For each case, the percentage of regions showing significant hypomethylation, i.e., $Z_{\mathrm{meth}}$ < -3, and the percentage of regions with CNA, i.e., $Z_{\mathrm{CNA}}$ < -3 or > 3, were determined. Receiver operating characteristic (ROC) curves were used to illustrate the diagnostic performance of the genome-wide hypomethylation and CNA analyses for the data set with all sequenced reads from one lane and for the data set with 10 million reads per case. All 32 healthy individuals were used in the ROC analysis.
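The random selection of a fixed number of aligned reads per case can be sketched as follows. The function name and the fixed seed are illustrative assumptions; the text does not describe how the random selection was implemented.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def downsample_reads(reads: Sequence[T], n: int, seed: int = 0) -> List[T]:
    """Randomly select n reads without replacement; return all reads if fewer are available."""
    if n >= len(reads):
        return list(reads)
    rng = random.Random(seed)  # fixed seed (an assumption) so the subset is reproducible
    return rng.sample(list(reads), n)


if __name__ == "__main__":
    # Toy read identifiers; the study selected 10 million aligned reads per case
    all_reads = [f"read_{i}" for i in range(1000)]
    subset = downsample_reads(all_reads, 100)
    print(len(subset), len(set(subset)))
```

The z-scores are then recomputed on the downsampled data, with the reference ranges rebuilt from reference samples downsampled in the same way.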

Figure 42 is a diagram showing the diagnostic performance of genomewide hypomethylation analysis with different numbers of sequenced reads. For hypomethylation analysis, the areas under the ROC curves did not differ significantly between the two datasets, one analyzing all sequenced reads from one lane and the other analyzing 10 million reads per case (P = 0.761). For CNA analysis, diagnostic performance deteriorated, with a significant reduction in the area under the curve, when the number of sequenced reads was reduced from one lane of data to 10 million (P < 0.001).

2. Effect of using different region sizes

In addition to dividing the genome into 1 Mb regions, it was also explored whether smaller region sizes could be used. In theory, using smaller regions may reduce the variability of methylation density within a region, because methylation density can vary widely between different genomic regions. When a region is larger, the chance that it includes sub-regions with different methylation densities increases, which raises the overall variability of the region's methylation density.

Although a smaller region size may reduce the variability in methylation density that is related to inter-regional differences, it also reduces the number of sequenced reads aligning to a particular region. Fewer reads aligning to an individual region increases the variability due to sampling variation. The optimal region size, which gives the lowest overall variability in methylation density, can be determined experimentally for the requirements of a particular diagnostic application, for example the total number of sequenced reads per sample and the type of DNA sequencer used.
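The sampling side of this trade-off can be illustrated with a simple binomial model. This is only a sketch under strong assumptions that are not in the original text: CpG observations are treated as independent and spread uniformly over the genome, and the total of 30 million CpG observations, the 3 Gb genome size and the methylation density of 0.7 are hypothetical values chosen for illustration.

```python
import math


def sampling_se(methylation_density, n_cpg_calls):
    """Binomial standard error of a methylation density estimated from n_cpg_calls."""
    p = methylation_density
    return math.sqrt(p * (1.0 - p) / n_cpg_calls)


def calls_per_region(total_calls, genome_size_bp, region_size_bp):
    """Expected CpG calls per region if calls were spread uniformly over the genome."""
    return total_calls * region_size_bp / genome_size_bp


if __name__ == "__main__":
    for size in (50_000, 100_000, 200_000, 1_000_000):
        n = calls_per_region(30_000_000, 3_000_000_000, size)
        print(size, round(n), sampling_se(0.7, n))
```

Under this model the standard error grows as the region shrinks, which is the sampling cost that has to be weighed against the reduced inter-regional heterogeneity of smaller regions.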

Figure 43 is a diagram showing ROC curves for the detection of cancer based on genomewide hypomethylation analysis with different region sizes (50 kb, 100 kb, 200 kb and 1 Mb). The P-values shown are for comparison of the area under the curve with that obtained for the 1 Mb region size. A trend of improvement can be seen when the region size is reduced from 1 Mb to 200 kb.

F. Cumulative probability score

The amount of regions for methylation and CNA can be various values. The examples above describe the number of regions exceeding a threshold, or the percentage of such regions showing significant hypomethylation or CNA, as the parameter for classifying whether a sample is associated with cancer. Such approaches do not take into account the magnitude of the aberration in individual regions. For example, a region with a Z_meth of -3.5 would be treated the same as a region with a Z_meth of -30, because both would be classified as significantly hypomethylated. However, the degree of hypomethylation change in plasma, i.e. the magnitude of the Z_meth value, is affected by the amount of cancer-associated DNA in the sample, and can therefore complement the percentage of regions showing aberrations in reflecting tumor load. A higher fractional concentration of tumor DNA in a plasma sample leads to a lower methylation density, which translates into a lower Z_meth value.

1. Cumulative probability score as a diagnostic parameter

To make use of the information from the magnitude of the aberrations, an approach called the cumulative probability (CP) score was developed. Based on the normal distribution probability function, each Z_meth value is translated into the probability of obtaining such an observation by chance.

The CP score is calculated as:

$$\text{CP score} = \sum_{i:\; Z_{meth,i} < -3} -\log(Prob_i)$$

that is, the sum is taken over the regions (i) with Z_meth < -3,

where Prob_i is the probability of the Z_meth of region (i) according to the Student's t distribution with 3 degrees of freedom, and log is the natural logarithm function. In another embodiment, a logarithm with base 10 (or another number) can be used. In other embodiments, other distributions, for example but not limited to the normal distribution and the gamma distribution, can be used to transform the z-scores into the CP score.
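A minimal sketch of the CP score follows. It assumes that Prob_i is the one-sided lower-tail probability P(T <= Z_meth) of the Student's t distribution with 3 degrees of freedom; the text does not state whether a one-sided or two-sided probability was used, so this is an assumption. The closed-form expression for the 3-degree-of-freedom distribution function is used so that no external library is needed, and the function names are hypothetical.

```python
import math


def t3_lower_tail(t):
    """P(T <= t) for Student's t distribution with 3 degrees of freedom."""
    u = t / math.sqrt(3.0)
    return 0.5 + (u / (1.0 + u * u) + math.atan(u)) / math.pi


def cp_score(z_meth, cutoff=-3.0):
    """Sum of -ln(Prob_i) over the regions whose Z_meth is below the cutoff."""
    # Assumption: Prob_i is the one-sided lower-tail probability.
    return sum(-math.log(t3_lower_tail(z)) for z in z_meth if z < cutoff)
```

Two regions with Z_meth of -3.5 and -30 count equally toward the percentage of aberrant regions, but the second contributes a larger term to the CP score, which is how the score retains the magnitude information.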

A larger CP score indicates a lower probability that such a deviated methylation density would occur by chance in the normal population. Therefore, a high CP score indicates a higher chance that abnormally hypomethylated DNA, for example cancer-associated DNA, is present in the sample.

Compared with the percentage of regions showing aberrations, the CP score measurement has a higher dynamic range. Because tumor load can vary widely between patients, the larger range of CP values is useful for reflecting the tumor load of patients with relatively high and relatively low tumor loads. In addition, the CP score can potentially be more sensitive for detecting changes in the concentration of tumor-associated DNA in plasma. This is advantageous for monitoring treatment response and for prognostication. Thus, a reduction in the CP score during treatment indicates a good response to the treatment. The lack of a reduction, or even an increase, in the CP score during treatment would indicate a poor or absent response. For prognostication, a high CP score indicates a high tumor load and suggests a poor prognosis (e.g. a higher chance of death or tumor progression).

Figure 44A shows the diagnostic performance of the cumulative probability (CP) score and of the percentage of regions with aberrations. There was no significant difference between the areas under the curve of the two types of diagnostic algorithm (P = 0.791).

Figure 44B shows the diagnostic performance of plasma analyses for global hypomethylation, CpG island hypermethylation and CNA. With one lane sequenced per sample (a 200 kb region size for hypomethylation analysis, a 1 Mb region size for CNA analysis, and CpG islands as defined in the database hosted by the University of California, Santa Cruz (UCSC)), the areas under the curve for all three types of analysis were above 0.90.

In the subsequent analysis, the highest CP score among the control subjects was used as the threshold for each of the three types of analysis. Selecting these thresholds gave a diagnostic specificity of 100%. The diagnostic sensitivities of the global hypomethylation, CpG island hypermethylation and CNA analyses were 78%, 89% and 52%, respectively. At least one of the three types of aberration was detected in 43 of the 46 cancer patients, giving a sensitivity of 93.4% and a specificity of 100%. The results indicate that the three types of analysis can be used synergistically to detect cancer.

Figure 45 shows a table of the results for global hypomethylation, CpG island hypermethylation and CNA in patients with hepatocellular carcinoma. The CP score thresholds for the three types of analysis are 960, 2.9 and 211, respectively. Positive CP score results are in bold and underlined.

Figure 46 shows a table of the results for global hypomethylation, CpG island hypermethylation and CNA in patients with cancers other than hepatocellular carcinoma. The CP score thresholds for the three types of analysis are 960, 2.9 and 211, respectively. Positive CP score results are in bold and underlined.
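The combined use of the three analyses can be sketched as an "any of three" rule. The thresholds are the values reported for Figures 45 and 46; the dictionary keys, the function names and the strict "greater than" comparison (chosen because each threshold is the highest score seen in controls) are assumptions made for this illustration.

```python
# CP score thresholds reported for the three analyses.
CP_THRESHOLDS = {
    "hypomethylation": 960.0,
    "cpg_island_hypermethylation": 2.9,
    "cna": 211.0,
}


def positive_analyses(cp_scores, thresholds=CP_THRESHOLDS):
    """Return the names of the analyses whose CP score exceeds its threshold."""
    return [name for name, cutoff in thresholds.items() if cp_scores[name] > cutoff]


def is_positive(cp_scores, thresholds=CP_THRESHOLDS):
    """A sample is called positive when at least one analysis is positive."""
    return len(positive_analyses(cp_scores, thresholds)) > 0
```

For a hypothetical sample with CP scores of 500, 3.4 and 100, only the CpG island hypermethylation analysis is positive, which is sufficient for the sample to be called positive under this rule.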

2. Application of the CP score to cancer monitoring

Serial samples were collected from HCC patient TBR34 before and after treatment. The samples were analyzed for global hypomethylation.

Figure 47 shows a serial analysis of plasma methylation for case TBR34. The innermost ring shows the methylation density of the buffy coat (black) and of the tumor tissue (purple). For the plasma samples, the Z_meth of each 1 Mb region is shown. The distance between two lines represents a Z_meth difference of 5. Red and grey dots represent, respectively, regions with hypomethylation and regions with no change in methylation density compared with the reference group. From the second inner ring outwards are the plasma samples taken before treatment, 3 days after tumor resection and 2 months after tumor resection. Before treatment, a high degree of hypomethylation was observed in plasma, with over 18.5% of the regions having a Z_meth < -10. At 3 days after tumor resection, the degree of hypomethylation in plasma was reduced, with no region having a Z_meth < -10.

Table 5

Table 5 shows that, although the magnitude of the hypomethylation changes was reduced at 3 days after surgical resection of the tumor, the percentage of regions showing aberrations exhibited a paradoxical increase. The CP score, on the other hand, more accurately revealed the reduction in the degree of hypomethylation in plasma and may better reflect the change in tumor load.

At 2 months after surgical treatment, a significant percentage of regions still showed hypomethylation changes. The CP score also remained static at approximately 15,000. This patient was later diagnosed, at 3 months after the operation, as having multifocal tumors in the remaining unresected liver (previously unknown at the time of surgery), and was noted to have multiple lung metastases at 4 months after the operation. The patient died of metastatic disease at 8 months after the operation. These results suggest that the CP score may be more effective than the percentage of regions with aberrations in reflecting tumor load.

Overall, the CP score can be useful for applications that require measuring the amount of tumor DNA in plasma. Examples of such applications include prognostication and monitoring of cancer patients (e.g. to observe the response to treatment or to observe tumor progression).

The cumulative z-score is the plain sum of the z-scores, i.e. without conversion to probabilities. In this example, the cumulative z-score shows the same behavior as the CP score. In other situations, the CP score may be more sensitive than the cumulative z-score for monitoring residual disease, because the CP score has a larger dynamic range.

X. Effect of CNA on methylation

The use of CNA and methylation to determine respective classifications of a level of cancer, with the classifications combined to provide a third classification, is described above. Besides such a combination, CNA can be used to change the thresholds for the methylation analysis and to identify false positives by comparing the methylation levels of groups of regions that have different CNA characteristics. For example, the methylation level of over-abundant regions (e.g. Z_CNA > 3) can be compared with the methylation level of regions of normal abundance (e.g. -3 < Z_CNA < 3). First, the effect of CNA on methylation levels is described.

A. Alteration of methylation density at regions with chromosomal gains and losses

Because tumor tissues generally show global hypomethylation, the presence of tumor-derived DNA in the plasma of cancer patients leads to a reduced methylation density compared with non-cancer subjects. The degree of hypomethylation in the plasma of a cancer patient is theoretically proportional to the fractional concentration of tumor-derived DNA in the plasma sample.

For regions showing chromosomal gain in the tumor tissue, an additional dosage of tumor DNA is released into the plasma from the amplified DNA segments. This increased contribution of tumor DNA to the plasma would theoretically lead to a higher degree of hypomethylation in the plasma DNA for the affected region. An additional factor is that genomic regions showing amplification would be expected to confer a growth advantage on the tumor cells and would therefore be expected to be expressed. Such regions are generally hypomethylated.

In contrast, for regions showing chromosomal loss in the tumor tissue, the reduced contribution of tumor DNA to the plasma leads to a lower degree of hypomethylation compared with regions without copy number change. An additional factor is that genomic regions deleted in tumor cells may contain tumor suppressor genes, and it may be advantageous to the tumor cells for such regions to be silenced. Such regions are therefore expected to have a higher chance of being hypermethylated.

Here, the results of two HCC patients (TBR34 and TBR36) are used to illustrate this effect. Figures 48A (TBR36) and 49A (TBR34) have circles highlighting the regions with chromosomal gains or losses and the corresponding methylation analysis. Figures 48B and 49B show box plots of the methylation z-scores for regions with losses, normal regions and regions with gains for patients TBR36 and TBR34, respectively.

Figure 48A shows Circos plots demonstrating the CNA (inner ring) and methylation changes (outer ring) in the bisulfite-treated plasma DNA of HCC patient TBR36. The red circles highlight regions with chromosomal gains or losses. Regions showing chromosomal gains were more hypomethylated than regions without copy number change. Regions showing chromosomal losses were less hypomethylated than regions without copy number change. Figure 48B is a plot of the methylation z-scores for regions with chromosomal gains, regions with chromosomal losses and regions without copy number change for HCC patient TBR36. Compared with regions without copy number change, regions with chromosomal gains had more negative z-scores (more hypomethylated) and regions with chromosomal losses had less negative z-scores (less hypomethylated).

Figure 49A shows Circos plots demonstrating the CNA (inner ring) and methylation changes (outer ring) in the bisulfite-treated plasma DNA of HCC patient TBR34. Figure 49B is a box plot of the methylation z-scores for regions with chromosomal gains, regions with chromosomal losses and regions without copy number change for HCC patient TBR34. The difference in methylation density between regions with chromosomal gains and regions with chromosomal losses was larger in patient TBR36 than in patient TBR34, because the fractional concentration of tumor-derived DNA was higher in the former patient.

In this example, the regions used to determine CNA are the same as the regions used to determine methylation. In one embodiment, the respective region threshold depends on whether the respective region exhibits a deletion or an amplification. In one implementation, the respective region threshold (e.g. the z-score threshold used to determine hypomethylation) has a larger magnitude when the respective region exhibits an amplification than when it does not (e.g. the magnitude can be greater than 3, so that a threshold below -3 is used). Thus, for testing hypomethylation, the respective region threshold can have a more negative value when the respective region exhibits an amplification than when it does not. Such an implementation is expected to improve the specificity of the test for detecting cancer.

In another embodiment, the respective region threshold has a smaller magnitude (e.g. less than 3) when the respective region exhibits a deletion than when it does not. Thus, for testing hypomethylation, the respective region threshold can have a less negative value when the respective region exhibits a deletion than when it does not. Such an implementation is expected to improve the sensitivity of the test for detecting cancer. The adjustment of the thresholds in the above implementations can depend on the desired change in sensitivity and specificity for a particular diagnostic scenario. In other embodiments, the methylation and CNA measurements can be used in conjunction with other clinical parameters (e.g. imaging results or serum biochemistry) to predict cancer.
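The CNA-dependent thresholds described in these two embodiments can be sketched as follows. The cutoffs of -4 for amplified regions and -2 for deleted regions are hypothetical values chosen only to show the direction of the adjustment; the text does not specify the adjusted values, and the function names are likewise assumptions.

```python
def cna_state(z_cna, cutoff=3.0):
    """Classify a region by its Z_CNA."""
    if z_cna > cutoff:
        return "amplification"
    if z_cna < -cutoff:
        return "deletion"
    return "normal"


# Hypothetical hypomethylation cutoffs: stricter for amplified regions,
# more lenient for deleted regions.
REGION_CUTOFFS = {"amplification": -4.0, "normal": -3.0, "deletion": -2.0}


def is_hypomethylated(z_meth, z_cna, cutoffs=REGION_CUTOFFS):
    """Apply the hypomethylation cutoff that matches the region's CNA state."""
    return z_meth < cutoffs[cna_state(z_cna)]
```

With these illustrative cutoffs, a Z_meth of -3.5 is called hypomethylated in a region of normal abundance but not in an amplified region, while a Z_meth of -2.5 is called hypomethylated only in a deleted region.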

B. Using CNA to select regions

As described above, the plasma methylation density is altered in regions that have copy number aberrations in the tumor tissue. In regions with copy number gain in the tumor tissue, the increased contribution of hypomethylated tumor DNA to the plasma leads to a larger degree of plasma DNA hypomethylation than in regions without copy number aberration. Conversely, in regions with copy number loss in the tumor tissue, the reduced contribution of hypomethylated cancer-derived DNA to the plasma leads to a lesser degree of plasma DNA hypomethylation. This relationship between the methylation density of plasma DNA and its relative representation can be used to differentiate hypomethylation results that correspond to the presence of cancer-associated DNA from hypomethylation in plasma DNA that corresponds to other, non-cancerous causes (e.g. SLE).

To illustrate this approach, plasma samples from two hepatocellular carcinoma (HCC) patients and from two patients with SLE but without cancer were analyzed. The two SLE patients (SLE04 and SLE10) showed the apparent presence of hypomethylation and CNA in plasma. For patient SLE04, 84% of the regions showed hypomethylation and 11.2% of the regions showed CNA. For patient SLE10, 10.3% of the regions showed hypomethylation and 5.7% of the regions showed CNA.

Figures 50A and 50B show the results of plasma hypomethylation and CNA analysis for SLE patients SLE04 and SLE10. The outer circle shows the methylation z-scores (Z_meth) at 1 Mb resolution. Regions with Z_meth < -3 are in red and regions with Z_meth > -3 are in grey. The inner circle shows the CNA z-scores (Z_CNA). Green, red and grey dots represent regions with Z_CNA > 3, Z_CNA < -3 and Z_CNA between -3 and 3, respectively. In both SLE patients, hypomethylation and CNA changes were observed in plasma.

To determine whether the changes in methylation and CNA were consistent with the presence of cancer-derived DNA in plasma, the Z_meth values of regions with Z_CNA > 3, Z_CNA < -3 and Z_CNA between -3 and 3 were compared. For methylation changes and CNA that are caused by cancer-derived DNA in plasma, regions with Z_CNA < -3 would be expected to be less hypomethylated and to have less negative Z_meth values. In contrast, regions with Z_CNA > 3 would be expected to be more hypomethylated and to have more negative Z_meth values. For illustration, a one-sided rank sum test was applied to compare the Z_meth of regions with CNA (i.e. regions with Z_CNA < -3 or > 3) with that of regions without CNA (i.e. regions with Z_CNA between -3 and 3). In other embodiments, other statistical tests can be used, for example but not limited to Student's t-test, the analysis of variance (ANOVA) test and the Kruskal-Wallis test.
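A one-sided rank sum comparison of this kind can be sketched with the standard library alone. This version uses the large-sample normal approximation with a tie correction and no continuity correction, which is an assumption: the text does not say how the p-values were computed, and an exact test or a statistics package may give somewhat different values. The function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def rank_sum_one_sided(x, y, alternative="less"):
    """Approximate p-value that x tends to be lower ('less') or higher ('greater') than y."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    n = n1 + n2
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0  # Mann-Whitney U statistic of x
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(c ** 3 - c for c in counts.values())
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return 1.0  # all values identical: no evidence of a shift
    z = (u1 - n1 * n2 / 2.0) / math.sqrt(variance)
    lower = 0.5 * math.erfc(-z / math.sqrt(2.0))  # standard normal P(Z <= z)
    return lower if alternative == "less" else 1.0 - lower
```

To test whether over-represented regions are more hypomethylated, the Z_meth values of regions with Z_CNA > 3 would be passed as `x`, those of regions with Z_CNA between -3 and 3 as `y`, with `alternative="less"`; for under-represented regions the same call would use `alternative="greater"`.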

Figures 51A and 51B show the Z_meth analysis of regions with and without CNA in the plasma of the two HCC patients (TBR34 and TBR36). Regions with Z_CNA < -3 and Z_CNA > 3 represent regions with under-representation and over-representation of DNA in plasma, respectively. In both TBR34 and TBR36, the regions under-represented in plasma (i.e. regions with Z_CNA < -3) had significantly higher Z_meth than the regions with normal representation in plasma (i.e. regions with Z_CNA between -3 and 3) (P < 10^-5, one-sided rank sum test). Normal representation corresponds to that expected for a euploid genome. The regions over-represented in plasma (i.e. regions with Z_CNA > 3) had significantly lower Z_meth than the regions with normal representation in plasma (P < 10^-5, one-sided rank sum test). All of these changes are consistent with the presence of hypomethylated tumor DNA in the plasma samples.

Figures 51C and 51D show the Z_meth analysis of regions with and without CNA in the plasma of the two SLE patients (SLE04 and SLE10). Regions with Z_CNA < -3 and Z_CNA > 3 represent regions with under-representation and over-representation in plasma, respectively. For SLE04, the regions under-represented in plasma (i.e. regions with Z_CNA < -3) did not have significantly higher Z_meth than the regions with normal representation in plasma (i.e. regions with Z_CNA between -3 and 3) (P = 0.99, one-sided rank sum test), and the regions over-represented in plasma (i.e. regions with Z_CNA > 3) did not have significantly lower Z_meth than the regions with normal representation in plasma (P = 0.68, one-sided rank sum test). These results differ from the changes expected from the presence of tumor-derived hypomethylated DNA in plasma. Similarly, for SLE10, the regions with Z_CNA < -3 did not have significantly higher Z_meth than the regions with Z_CNA between -3 and 3 (P = 0.99, one-sided rank sum test).

The reason for the absence of the typical cancer-associated pattern between Z_meth and Z_CNA in SLE patients is that, in SLE patients, the CNA is not present in a specific cell type that also exhibits hypomethylation. Instead, the observed apparent CNA and hypomethylation are due to an altered size distribution of circulating DNA in SLE patients. The altered size distribution may change the sequenced read densities of different genomic regions, leading to apparent CNAs when the references are derived from healthy subjects. As described in previous sections, there is a correlation between the size of a circulating DNA fragment and its methylation density. The altered size distribution can therefore also lead to aberrant methylation.

Although the regions with Z_CNA > 3 had slightly lower methylation levels than the regions with Z_CNA between -3 and 3, the p-value of the comparison was much higher than those observed in the two cancer patients. In one embodiment, the p-value can be used as a parameter to determine the likelihood that a tested case has cancer. In another embodiment, the difference in Z_meth between regions with normal and aberrant representation can be used as a parameter indicating the likelihood of the presence of cancer. In one embodiment, a group of cancer patients can be used to establish the correlation between Z_meth and Z_CNA and to determine thresholds for the different parameters, so as to indicate whether the changes are consistent with the presence of cancer-derived hypomethylated DNA in the tested plasma sample.

Thus, in one embodiment, a CNA analysis can be performed to determine a first set of regions that all exhibit one of: a deletion, an amplification, or normal representation. For example, the first set of regions can all exhibit a deletion, or all exhibit an amplification, or all exhibit normal representation (e.g. have a normal first amount of regions, such as a normal Z_meth). A methylation level can be determined for this first set of regions (e.g. the first methylation level of method 2800 can correspond to the first set of regions).

The CNA analysis can determine a second set of regions that all exhibit a second of: a deletion, an amplification, or normal representation. The second set of regions would exhibit a different state from the first set. For example, if the first set of regions is normal, the second set of regions can exhibit a deletion or an amplification. A second methylation level can be calculated based on the respective numbers of DNA molecules methylated at sites in the second set of regions.

A parameter can then be computed between the first methylation level and the second methylation level. For example, a difference or a ratio can be computed and compared with a threshold. The difference or ratio can also be subjected to a probability distribution (e.g. as part of a statistical test) to determine the probability of obtaining the value, and this probability can be compared with a threshold to determine a level of cancer based on the methylation levels. Such a threshold can be chosen to differentiate samples that have cancer from samples that do not (e.g. SLE).

In one embodiment, a methylation level can be determined for the first set of regions or for a mix of regions (i.e. a mix of regions showing amplification, deletion and normal representation). This methylation level can then be compared with a first threshold as part of a first stage of analysis. If the threshold is exceeded, thereby indicating a possibility of cancer, the analysis above can then be performed to determine whether the indication is a false positive. The final classification of the level of cancer can thus include a comparison of the parameter for the two methylation levels with a second threshold.
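The two-stage analysis can be sketched as below. This is one possible reading under stated assumptions: the overall methylation level is summarized as the median Z_meth of all regions, "exceeding" the first threshold is taken to mean falling below it (more hypomethylated), the second-stage parameter is the difference between the median Z_meth of amplified regions and that of normal regions, and both threshold values and the returned labels are hypothetical.

```python
from statistics import median


def two_stage_classification(z_meth, cna_state, first_threshold, second_threshold):
    """Stage 1 screens on overall hypomethylation; stage 2 checks the CNA pattern.

    z_meth: Z_meth of each region.
    cna_state: 'gain', 'loss' or 'normal' for each region, in the same order.
    """
    overall = median(z_meth)
    if overall >= first_threshold:
        return "no evidence of cancer"
    gain = [z for z, s in zip(z_meth, cna_state) if s == "gain"]
    normal = [z for z, s in zip(z_meth, cna_state) if s == "normal"]
    if not gain or not normal:
        return "indeterminate"
    # In cancer, amplified regions are expected to be more hypomethylated.
    difference = median(gain) - median(normal)
    if difference < second_threshold:
        return "consistent with cancer"
    return "possible false positive"
```

A sample that is globally hypomethylated but shows no clear difference between amplified and normal regions, as described for the SLE cases, would be flagged as a possible false positive by the second stage.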

The first methylation level can be a statistical value (e.g. a mean or a median) of the region methylation levels calculated for each region of the first set of regions. The second methylation level can likewise be a statistical value of the region methylation levels calculated for each region of the second set of regions. As examples, the statistical values can be determined using a one-sided rank sum test, Student's t-test, the analysis of variance (ANOVA) test or the Kruskal-Wallis test.

XI. Classification of cancer type

In addition to determining whether an organism has cancer, embodiments can identify the type of cancer associated with a sample. This identification of the cancer type can use patterns of global hypomethylation, CpG island hypermethylation and/or CNA. The patterns can involve clustering of patients with a known diagnosis using the measured region methylation levels, the respective CNA values of the regions and the methylation levels of the CpG islands. The results below show that organisms with a similar type of cancer have similar values for the regions and the CpG islands, and that non-cancer patients also have similar values. In the clustering, each value for a region or an island can be a separate dimension of the clustering process.

Cancers of the same type are known to share similar genetic and epigenetic changes (E Gebhart et al. 2004 Cytogenet Genome Res; 104:352-358; PA Jones et al. 2007 Cell; 128:683-692). Below, we describe how the patterns of CNA and methylation changes detected in plasma can be used to infer the origin or type of a cancer. Plasma DNA samples from HCC patients, non-HCC patients, and healthy control subjects were classified using, for example, hierarchical clustering analysis. The analysis was performed using, for example, the heatmap.2 function in the gplots R package (cran.r-project.org/web/packages/gplots/gplots.pdf).
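As an illustrative sketch only, the grouping step of a hierarchical clustering analysis can be written in a few lines of Python. The example uses average linkage over Euclidean distance, which is an assumption (the text names only hierarchical clustering and heatmap.2). The sample names and z-score vectors are hypothetical values chosen for illustration and are not data from the study.

```python
from itertools import combinations
from math import dist


def average_linkage(samples, n_clusters):
    """Agglomerative clustering with average linkage.

    samples: dict mapping a sample name to its feature vector, one value
             per classification feature (each feature is one dimension).
    Returns a list of sets of sample names.
    """
    clusters = [{name} for name in samples]

    def cluster_distance(a, b):
        # Mean pairwise Euclidean distance between members of two clusters.
        pairs = [(x, y) for x in a for y in b]
        return sum(dist(samples[x], samples[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # Find the two closest clusters and merge them (i < j always holds).
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


# Hypothetical z-scores for three features per plasma sample.
samples = {
    "HCC_1": [-8.0, 5.0, 6.0],
    "HCC_2": [-7.5, 4.5, 6.5],
    "nonHCC_1": [-6.0, -4.0, 1.0],
    "nonHCC_2": [-5.5, -4.5, 0.5],
    "control_1": [0.2, -0.1, 0.3],
    "control_2": [-0.3, 0.4, 0.0],
}
groups = average_linkage(samples, 3)
print(sorted(sorted(group) for group in groups))
```

In an actual analysis each vector would hold one value per CNA, 1 Mb methylation, or CpG island feature, and the full merge order would be drawn as a dendrogram alongside a heat map.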

To illustrate the potential of this approach, two sets of criteria (group A and group B) were used as examples to identify features useful for classifying the plasma samples (see Table 6). In other embodiments, other criteria can be used to identify the features. The features used include global CNA at 1 Mb resolution, global methylation density at 1 Mb resolution, and CpG island methylation.

Table 6

In the first two examples, all of the CNA, global methylation at 1 Mb resolution, and CpG island methylation features were used for classification. In other embodiments, other criteria can be used, for example (but not limited to) the precision with which a feature is measured in the plasma of the reference group.

Figure 52A shows a hierarchical clustering analysis of plasma samples from HCC patients, non-HCC cancer patients, and healthy control subjects using all 1,130 group A features, including 355 CNA features, 584 global methylation features at 1 Mb resolution, and the methylation status of 110 CpG islands. The color bar at the top indicates the sample groups: green, blue, and red represent healthy subjects, HCC patients, and non-HCC cancer patients, respectively. In general, the three groups of subjects tend to cluster together. The vertical axis represents the classification features. Features with similar patterns across different subjects are clustered together. These results suggest that the patterns of CpG island methylation changes, genome-wide methylation changes at 1 Mb resolution, and CNA in plasma can potentially be used to determine the origin of a cancer in patients with an unknown primary.

Figure 52B shows a hierarchical clustering analysis of plasma samples from HCC patients, non-HCC cancer patients, and healthy control subjects using all 2,780 group B features, including 759 CNA features, 911 global methylation features at 1 Mb resolution, and the methylation status of 191 CpG islands. The color bar at the top indicates the sample groups: green, blue, and red represent healthy subjects, HCC patients, and non-HCC cancer patients, respectively. In general, the three groups of subjects tend to cluster together. The vertical axis represents the classification features. Features with similar patterns across different subjects are clustered together. These results suggest that the patterns of different sets of CpG island methylation changes, genome-wide methylation changes at 1 Mb resolution, and CNA in plasma can be used to determine the origin of a cancer in patients with an unknown primary. The selection of classification features can be adjusted for specific applications. In addition, the cancer type prediction can be weighted according to the subject's prior probability of different types of cancer. For example, patients with chronic viral hepatitis are prone to develop hepatocellular carcinoma, and chronic smokers are prone to develop lung cancer. A weighted probability of the cancer type can therefore be calculated using, for example (but not limited to), logistic regression, multiple regression, or clustering regression.

In other embodiments, a single type of feature can be used for the classification analysis. For example, in the following examples, only the global methylation at 1 Mb resolution, the CpG island hypermethylation, or the CNA at 1 Mb resolution was used for the hierarchical clustering analysis. The differentiation power may differ when different features are used. Further refinement of the classification features can improve the classification accuracy.

Figure 53A shows a hierarchical clustering analysis of plasma samples from HCC patients, non-HCC cancer patients, and healthy control subjects using the group A CpG island methylation features. In general, the cancer patients clustered together and the non-cancer subjects fell in another cluster. However, the HCC and non-HCC patients were less well separated than when all three types of features were used.

Figure 53B shows a hierarchical clustering analysis of plasma samples from HCC patients, non-HCC cancer patients, and healthy control subjects using the group A global methylation densities at 1 Mb resolution as classification features. Preferential clustering of HCC and non-HCC patients was observed.

Figure 54A shows a hierarchical clustering analysis of plasma samples from HCC patients, non-HCC cancer patients, and healthy control subjects using the group A global CNA at 1 Mb resolution as classification features. Preferential clustering of HCC and non-HCC patients was seen.

Figure 54B shows a hierarchical clustering analysis of plasma samples from HCC patients, non-HCC cancer patients, and healthy control subjects using the group B CpG island methylation densities as classification features. Preferential clustering of HCC and non-HCC cancer patients can be observed.

Figure 55A shows a hierarchical clustering analysis of plasma samples from HCC patients, non-HCC cancer patients, and healthy control subjects using the group B global methylation densities at 1 Mb resolution as classification features. Preferential clustering of HCC and non-HCC cancer patients can be observed.

Figure 55B shows a hierarchical clustering analysis of plasma samples from HCC patients, non-HCC cancer patients, and healthy control subjects using the group B global CNA at 1 Mb resolution as classification features. Preferential clustering of HCC and non-HCC cancer patients can be observed.

These hierarchical clustering results for plasma samples suggest that combinations of different features can be used to identify the primary cancer type. Further refinement of the selection criteria can further improve the accuracy of the classification.

Accordingly, in one embodiment, when the methylation classification indicates that cancer exists for the organism, the type of cancer associated with the organism can be identified by comparing a methylation level (e.g., the first methylation level from method 2800 or any regional methylation level) to corresponding values determined from other organisms (i.e., other organisms of the same type, such as humans). The corresponding values can be for the same regions or sets of sites for which the methylation level was calculated. At least two of the other organisms are identified as having different types of cancer. For example, the corresponding values can be organized into clusters, where two clusters are associated with different cancers.

In addition, when CNA and methylation are used together to obtain a third classification of the level of cancer, both the CNA and methylation features can be compared to corresponding values from other organisms. For example, a first amount of regions exhibiting a deletion or amplification (e.g., from Figure 36) can be compared to corresponding values determined from the other organisms to identify the type of cancer associated with the organism.

In some embodiments, the methylation features are the regional methylation levels of a plurality of regions of the genome. Regions determined to have a regional methylation level exceeding a respective regional cutoff can be used; for example, the regional methylation levels of the organism can be compared to the regional methylation levels of the same genomic regions in other organisms. The comparison can allow types of cancer to be differentiated, or can simply provide an additional filter to confirm cancer (e.g., to identify false positives). Thus, based on the comparison, it can be determined whether the organism has a first type of cancer, no cancer, or a second type of cancer.

The other organisms (along with the organism being tested) can be clustered using the regional methylation levels. A comparison of the regional methylation levels can thus be used to determine which cluster the organism belongs to. The clustering can also use CNA normalized values for regions determined to exhibit a deletion or amplification, as described above. The clustering can further use the respective methylation densities of hypermethylated CpG islands.

To illustrate the principle of this method, an example of classifying two unknown samples using logistic regression is shown. The purpose of this classification was to determine whether the two samples were HCC or non-HCC cancers. A training set of samples was compiled, consisting of 23 plasma samples collected from HCC patients and 18 samples from patients with cancers other than HCC, for a total of 41 cases. In this example, 13 features were selected: five features on CpG island methylation (X1-X5), six features on methylation of 1 Mb regions (X6-X11), and two features on CNA of 1 Mb regions (X12-X13). The CpG methylation features were selected on the criterion that at least 15 cases in the training set had a z-score > 3 or < -3. The 1 Mb methylation features were selected on the criterion that at least 39 cases in the training set had a z-score > 3 or < -3. The CNA features were selected on the criterion that at least 20 cases had a z-score > 3 or < -3. Logistic regression was performed on the samples of the training set to determine the regression coefficient of each feature (X1-X13). Features with regression coefficients of larger magnitude (irrespective of sign) provide better discrimination between HCC and non-HCC samples. The z-scores of the respective features for each case were used as the input values of the independent variables. The 13 features were then analyzed for two plasma samples, one from an HCC patient (TBR36) and one from a patient with lung cancer (TBR177).
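The feature selection criterion described above (a feature is kept when enough training cases have a z-score > 3 or < -3) can be sketched as follows. This is a minimal illustration, not the procedure actually used: the function name, feature names, and z-scores are hypothetical, and the toy example uses five cases with a minimum of three rather than the 41-case training set.

```python
def select_features(z_scores, min_cases, cutoff=3.0):
    """Keep features whose z-score is extreme in at least min_cases cases.

    z_scores: dict mapping a feature name to a list of z-scores,
              one per training case.
    """
    selected = []
    for feature, values in z_scores.items():
        # Count cases with z-score > cutoff or < -cutoff.
        n_extreme = sum(1 for z in values if z > cutoff or z < -cutoff)
        if n_extreme >= min_cases:
            selected.append(feature)
    return selected


# Hypothetical z-scores for three candidate features across five cases.
candidate_z_scores = {
    "cpg_island_1": [4.2, 3.5, -3.8, 0.4, 1.1],    # extreme in 3 cases
    "cpg_island_2": [0.3, 3.1, -0.2, 1.9, -2.5],   # extreme in 1 case
    "region_1mb_7": [-5.0, -4.1, -3.3, -6.2, 2.0], # extreme in 4 cases
}
selected = select_features(candidate_z_scores, min_cases=3)
print(selected)
```

With the criteria stated in the text, `min_cases` would be 15 for CpG island methylation features, 39 for 1 Mb methylation features, and 20 for CNA features.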

In this cancer type classification analysis, the two samples were assumed to have been collected from patients with cancers of unknown origin. For each sample, the z-scores of the respective features were entered into the logistic regression equation to determine the natural logarithm of the odds ratio (ln(odds ratio)), where the odds ratio represents the ratio of the probabilities of having HCC and not having HCC (HCC/non-HCC).

Table 7 shows the regression coefficients of the 13 features of the logistic regression equation. The z-scores of the respective features for the two test cases (TBR36 and TBR177) are also shown. The ln(odds ratio) of HCC for TBR36 and TBR177 was 37.03 and -4.37, respectively. From these odds ratios, the probabilities that the plasma samples were collected from HCC patients were calculated as > 99.9% and 1%, respectively. In short, TBR36 had a high likelihood of being a sample from an HCC patient, whereas TBR177 had a low likelihood of being a sample from an HCC patient.
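The conversion from ln(odds ratio) to probability can be checked with a short calculation. The sketch below treats the quantity named in the text as the odds p/(1-p), so that p = 1/(1 + exp(-ln(odds))). The two logarithm values are those reported above; the helper names, the optional intercept, and the toy coefficients are assumptions for illustration, since the Table 7 coefficients are not reproduced here.

```python
import math


def ln_odds(coefficients, z_scores, intercept=0.0):
    """Linear predictor of a logistic regression.

    The intercept is a hypothetical parameter; the text does not state
    whether the fitted equation includes one.
    """
    return intercept + sum(b * z for b, z in zip(coefficients, z_scores))


def probability_from_ln_odds(value):
    """Convert ln(p / (1 - p)) into the probability p."""
    return 1.0 / (1.0 + math.exp(-value))


p_tbr36 = probability_from_ln_odds(37.03)   # reported ln value for TBR36
p_tbr177 = probability_from_ln_odds(-4.37)  # reported ln value for TBR177
print(f"TBR36: {p_tbr36:.4%}, TBR177: {p_tbr177:.2%}")
```

The first value is above 99.9% and the second is roughly 1.2%, consistent with the rounded figures of > 99.9% and 1% given in the text.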

Table 7

In other embodiments, hierarchical clustering regression, classification tree analysis, and other regression models can be used to determine the likely primary origin of the cancer.

XII. Materials and Methods

A. Preparation of bisulfite-treated DNA libraries and sequencing

Genomic DNA spiked with 0.5% (w/w) unmethylated lambda DNA (Promega) was fragmented to approximately 200 bp using a Covaris S220 System (Covaris). DNA libraries were prepared using the Paired-End Sequencing Sample Preparation Kit (Illumina) according to the manufacturer's instructions, except that methylated adaptors (Illumina) were ligated to the DNA fragments. After two rounds of purification with AMPure XP magnetic beads (Beckman Coulter), the ligation products were split into two portions, one of which underwent two rounds of bisulfite modification with an EpiTect Bisulfite Kit (Qiagen). Unmethylated cytosines at CpG sites in the inserts were converted to uracils, while methylated cytosines remained unchanged. The adaptor-ligated DNA molecules, either bisulfite-treated or untreated, were enriched by 10 cycles of PCR using the following recipe: 2.5 U PfuTurboCx Hotstart DNA polymerase (Agilent Technologies), 1X PfuTurboCx reaction buffer, 25 μM dNTPs, 1 μl PCR Primer PE 1.0, and 1 μl PCR Primer PE 2.0 (Illumina) in a 50 μl reaction. The thermocycling profile was 95 °C for 2 min and 98 °C for 30 s, followed by 10 cycles of 98 °C for 15 s, 60 °C for 30 s, and 72 °C for 4 min, with a final step of 72 °C for 10 min (R Lister et al. 2009 Nature; 462:315-322). The PCR products were purified using AMPure XP magnetic beads.

Plasma DNA extracted from 3.2-4 ml of maternal plasma was spiked with fragmented lambda DNA (25 pg per ml of plasma) and subjected to library construction as described above (Chiu et al. 2011 BMJ; 342:c7401). After ligation to the methylated adaptors, the ligation products were split into two halves, and one portion was subjected to two rounds of bisulfite modification. The bisulfite-treated or untreated ligation products were then enriched by 10 cycles of PCR as described above.

The bisulfite-treated or untreated DNA libraries were sequenced for 75 bp in a paired-end format on HiSeq2000 instruments (Illumina). DNA clusters were generated with a Paired-End Cluster Generation Kit v3 on a cBot instrument (Illumina). Real-time image analysis and base calling were performed using the HiSeq Control Software (HCS) v1.4 and Real Time Analysis (RTA) software v1.13 (Illumina), with the automated matrix and phasing calculations based on the spiked-in PhiX control v3 sequenced with the DNA libraries.

B. Sequence alignment and identification of methylated cytosines

After base calling, adaptor sequences and low-quality bases (i.e., quality score < 20) at the fragment ends were removed. The trimmed reads in FASTQ format were then processed by a methylation data analysis pipeline called Methy-Pipe (P Jiang et al. Methy-Pipe: An integrated bioinformatics data analysis pipeline for whole genome methylome analysis, paper presented at the IEEE International Conference on Bioinformatics and Biomedicine Workshops, Hong Kong, 18 to 21 December 2010). To align the bisulfite-converted sequencing reads, all cytosine residues were first converted in silico to thymines on the Watson and Crick strands separately, using the reference human genome (NCBI build 36/hg18). Each cytosine in all the processed reads was then converted in silico to thymine, and the positional information of each converted residue was kept. SOAP2 (Li et al. 2009 Bioinformatics; 25:1966-1967) was used to align the converted reads to the two pre-converted reference human genomes, with a maximum of two mismatches allowed for each aligned read. Only reads mappable to a unique genomic location were selected. Ambiguous reads that mapped to both the Watson and Crick strands, and duplicated (clonal) reads that had the same start and end genomic positions, were removed. Sequenced reads with an insert size ≤ 600 bp were retained for the methylation and size analyses.
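The in silico conversion and duplicate removal steps can be sketched in Python as follows. This is a simplified illustration and not the Methy-Pipe implementation: all function names are hypothetical, sequences are assumed to be upper-case strings over A, C, G, and T, and an alignment is represented as a hypothetical (chromosome, start, end, read identifier) tuple.

```python
def convert_c_to_t(sequence):
    """Convert every cytosine to thymine and keep the converted positions."""
    positions = [i for i, base in enumerate(sequence) if base == "C"]
    return sequence.replace("C", "T"), positions


def reverse_complement(sequence):
    """Return the reverse complement (the Crick strand read 5' to 3')."""
    return sequence.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def converted_references(reference):
    """Build the two pre-converted references, one per strand."""
    watson, _ = convert_c_to_t(reference)
    crick, _ = convert_c_to_t(reverse_complement(reference))
    return watson, crick


def remove_duplicates(alignments):
    """Keep one read per (chromosome, start, end) combination."""
    seen = set()
    unique = []
    for chrom, start, end, read_id in alignments:
        key = (chrom, start, end)
        if key not in seen:
            seen.add(key)
            unique.append((chrom, start, end, read_id))
    return unique


converted_read, cytosine_positions = convert_c_to_t("ACGTCCGA")
watson_ref, crick_ref = converted_references("ACGTCCGA")
unique_reads = remove_duplicates([
    ("chr1", 100, 250, "read_a"),
    ("chr1", 100, 250, "read_b"),  # same start and end: treated as clonal
    ("chr1", 100, 260, "read_c"),
])
print(converted_read, cytosine_positions, watson_ref, crick_ref, len(unique_reads))
```

The stored positions are what allow the original cytosines to be restored after alignment, so that methylation can be scored at each CpG site.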

Cytosine residues in the CpG dinucleotide context were the main targets of the downstream DNA methylation studies. After alignment, the cytosines originally present on the sequenced reads were recovered based on the positional information kept during the in silico conversion. Recovered cytosines within CpG dinucleotides were scored as methylated. Thymines within CpG dinucleotides were scored as unmethylated. The unmethylated lambda DNA included during library preparation served as an internal control for estimating the efficiency of the sodium bisulfite modification. If the bisulfite conversion efficiency were 100%, all cytosines on the lambda DNA would have been converted to thymines.
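The scoring rule and the lambda DNA control can be illustrated with a short sketch. It rests on simplifying assumptions: the read is already aligned to the Watson strand of the reference with no gaps, its original cytosines have been restored, and the read and reference segment have equal length. The function names and the example sequences and counts are hypothetical.

```python
def score_cpg_sites(read, reference):
    """Count methylated and unmethylated calls at reference CpG sites.

    A C in the read at a reference CpG is scored as methylated;
    a T at that position is scored as unmethylated.
    """
    methylated = unmethylated = 0
    for i in range(len(reference) - 1):
        if reference[i:i + 2] == "CG":
            if read[i] == "C":
                methylated += 1
            elif read[i] == "T":
                unmethylated += 1
    return methylated, unmethylated


def conversion_efficiency(lambda_c_calls, lambda_t_calls):
    """Fraction of lambda DNA cytosines that were read as thymine."""
    total = lambda_c_calls + lambda_t_calls
    if total == 0:
        raise ValueError("no lambda DNA cytosine positions were covered")
    return lambda_t_calls / total


# Reference CpG sites start at positions 1, 5, and 8.
calls = score_cpg_sites(read="ACGTTTGACG", reference="ACGTTCGACG")
efficiency = conversion_efficiency(lambda_c_calls=3, lambda_t_calls=997)
print(calls, efficiency)
```

Because the spiked-in lambda DNA is unmethylated, any cytosine remaining on it reflects incomplete conversion, so the efficiency approaches 1.0 when the bisulfite treatment is complete.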

XIII. Summary

With the embodiments described herein, cancer can be screened for, detected, monitored, or prognosticated noninvasively using, for example, the plasma of a subject. Prenatal screening, diagnosis, investigation, or monitoring of a fetus can also be carried out by deducing the methylation profile of fetal DNA from maternal plasma. To illustrate the power of the approach, we showed that information conventionally obtained by studying placental tissue can be assessed directly from maternal plasma. For example, the imprinting status of gene loci, the identification of loci with differential methylation between fetal and maternal DNA, and the gestational variation in the methylation profile of gene loci were all achieved through direct analysis of maternal plasma DNA. The major advantage of our approach is that the fetal methylome can be assessed comprehensively during pregnancy without disrupting the pregnancy and without invasive sampling of fetal tissue. Given the known association between altered DNA methylation status and many pregnancy-associated conditions, the approach described in this study can serve as an important tool for investigating the pathophysiology of those conditions and for identifying biomarkers. By focusing on imprinted loci, we showed that both the paternally transmitted and the maternally transmitted fetal methylation profiles can be assessed from maternal plasma. This approach is useful for the investigation of imprinting diseases. Embodiments can also be applied directly to the prenatal assessment of fetal or pregnancy-associated diseases.

We have demonstrated that genome-wide bisulfite sequencing can be applied to investigate the DNA methylation profile of placental tissue. There are approximately 28 million CpG sites in the human genome (C Clark et al. 2012 PLoS One; 7:e50233). Our bisulfite sequencing data for the CVS and term placental tissue samples covered more than 80% of the CpGs. This represents substantially broader coverage than is achievable with other high-throughput platforms. For example, the Illumina Infinium HumanMethylation 27K beadchip array used in a previous study on placental tissue (T Chu et al. 2011 PLoS One; 6:e14723) covered only 0.1% of the CpGs in the genome. The more recently available Illumina Infinium HumanMethylation 450K beadchip array covers only 1.7% of the CpGs (C Clark et al. 2012 PLoS One; 7:e50233). Because the MPS approach is free of restrictions related to probe design, hybridization efficiency, or strength of antibody capture, CpGs within or outside CpG islands and in most sequence contexts can be assessed.

XIV. Computer System

Any of the computer systems mentioned herein can utilize any suitable number of subsystems. Examples of such subsystems are shown in Figure 33 in computer apparatus 3300. In some embodiments, a computer system includes a single computer apparatus, where the subsystems can be the components of the computer apparatus. In other embodiments, a computer system can include multiple computer apparatuses, each being a subsystem, with internal components.

The subsystems shown in Figure 33 are interconnected via a system bus 3375. Additional subsystems are shown, such as a printer 3374, a keyboard 3378, a storage device 3379, and a monitor 3376 coupled to a display adapter 3382. Peripherals and input/output (I/O) devices, which couple to an I/O controller 3371, can be connected to the computer system by any number of means known in the art, such as a serial port 3377. For example, the serial port 3377 or an external interface 3381 (e.g., Ethernet, Wi-Fi, etc.) can be used to connect the computer system 3300 to a wide area network such as the Internet, a mouse input device, or a scanner. The interconnection via the system bus 3375 allows the central processor 3373 to communicate with each subsystem and to control the execution of instructions from the system memory 3372 or the storage device 3379 (e.g., a fixed disk), as well as the exchange of information between subsystems. The system memory 3372 and/or the storage device 3379 can embody a computer-readable medium. Any of the values mentioned herein can be output from one component to another component and can be output to the user.

A computer system can include a plurality of the same components or subsystems, e.g., connected together by the external interface 3381 or by an internal interface. In some embodiments, computer systems, subsystems, or apparatuses can communicate over a network. In such instances, one computer can be considered a client and another computer a server, where each can be part of the same computer system. A client and a server can each include multiple systems, subsystems, or components.

It should be understood that any of the embodiments of the present invention can be implemented in the form of control logic using hardware (e.g., an application-specific integrated circuit or a field-programmable gate array) and/or using computer software with a generally programmable processor, in a modular or integrated manner. As used herein, a processor includes a multi-core processor on the same integrated chip, or multiple processing units on a single circuit board or networked. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will know and appreciate other ways and/or methods to implement embodiments of the present invention using hardware and a combination of hardware and software.

Any of the software components or functions described in this application can be implemented as software code to be executed by a processor using any suitable computer language, such as Java, C++, or Perl, using, for example, conventional or object-oriented techniques. The software code can be stored as a series of instructions or commands on a computer-readable medium for storage and/or transmission; suitable media include random access memory (RAM), read-only memory (ROM), a magnetic medium such as a hard drive or a floppy disk, an optical medium such as a compact disk (CD) or DVD (digital versatile disk), flash memory, and the like. The computer-readable medium can be any combination of such storage or transmission devices.

Such programs can also be encoded and transmitted using carrier signals adapted for transmission via wired, optical, and/or wireless networks conforming to a variety of protocols, including the Internet. As such, a computer-readable medium according to an embodiment of the present invention can be created using a data signal encoded with such programs. Computer-readable media encoded with the program code can be packaged with a compatible device or provided separately from other devices (e.g., via Internet download). Any such computer-readable medium can reside on or within a single computer program product (e.g., a hard drive, a CD, or an entire computer system), and can be present on or within different computer program products within a system or network. A computer system can include a monitor, printer, or other suitable display for providing any of the results mentioned herein to a user.

Any of the methods described herein can be totally or partially performed with a computer system including one or more processors, which can be configured to perform the steps. Thus, embodiments can be directed to computer systems configured to perform the steps of any of the methods described herein, potentially with different components performing a respective step or a respective group of steps. Although presented as numbered steps, the steps of the methods herein can be performed at the same time or in a different order. Additionally, portions of these steps can be used with portions of other steps from other methods. Also, all or portions of a step can be optional. Additionally, any of the steps of any of the methods can be performed with modules, circuits, or other means for performing these steps.

The specific details of particular embodiments can be combined in any suitable manner without departing from the spirit and scope of embodiments of the invention. However, other embodiments of the invention can be directed to specific embodiments relating to each individual aspect, or to specific combinations of these individual aspects.

The above description of exemplary embodiments of the invention has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form described, and many modifications and variations are possible in light of the teaching above. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, thereby enabling others skilled in the art to best utilize the invention in various embodiments and with various modifications as are suited to the particular use contemplated.

A recitation of "a", "an", or "the" is intended to mean "one or more" unless specifically indicated to the contrary.

All patents, patent applications, publications, and descriptions mentioned herein are incorporated by reference in their entirety for all purposes. None is admitted to be prior art.

Table S2A. List of the 100 most hypermethylated regions identified from first-trimester chorionic villus samples and maternal blood cells.

  

  

Table S2B. List of the 100 most hypomethylated regions identified from first-trimester chorionic villus samples and maternal blood cells.

Table S2C. List of the 100 most hypermethylated regions identified from third-trimester placental samples and maternal blood cells.

Table S2D. List of the 100 most hypomethylated regions identified from third-trimester placental samples and maternal blood cells.

Table S3A. List of the top 100 loci deduced to be hypermethylated from first-trimester maternal plasma bisulfite sequencing data.

  

  

Table S3B. List of the top 100 loci deduced to be hypomethylated from first-trimester maternal plasma bisulfite sequencing data.

  

  

Table S3C. List of the top 100 loci deduced to be hypermethylated from third-trimester maternal plasma bisulfite sequencing data.

  

  

Table S3D. List of the top 100 loci deduced to be hypomethylated from third-trimester maternal plasma bisulfite sequencing data.

  

  

Claims (234)

1. A method, for non-diagnostic purposes, of analyzing a biological sample of an organism, the biological sample including a mixture of cell-free deoxyribonucleic acid (DNA) molecules originating from normal cells and from cells associated with cancer, the method comprising: analyzing a plurality of cell-free DNA molecules from the biological sample, wherein the analysis is performed on sequence reads obtained from massively parallel sequencing, and wherein analyzing each of the plurality of cell-free DNA molecules includes: determining a location of the cell-free DNA molecule in a genome of the organism; and determining whether the cell-free DNA molecule is methylated at one or more sites; for each of a plurality of sites: counting, using the sequence reads, a respective number of cell-free DNA molecules from the biological sample that are methylated at the site; calculating a first methylation level using a sum of the respective numbers of cell-free DNA molecules methylated at the plurality of sites, wherein the plurality of sites are located on a plurality of chromosomes; comparing the first methylation level to a first threshold; and determining a first classification of a level of cancer based on the comparison.
2. The method of claim 1, wherein determining whether the cell-free DNA molecule is methylated at one or more sites includes: performing methylation-aware sequencing.
3.
The method of claim 2, wherein performing the methylation-aware sequencing includes: treating the cell-free DNA molecules with sodium bisulfite; and sequencing the treated cell-free DNA molecules.
4. The method of claim 3, wherein the treatment of the cell-free DNA molecules with sodium bisulfite is part of Tet-assisted bisulfite conversion or oxidative bisulfite sequencing for the detection of 5-hydroxymethylcytosine.
5. The method of claim 2, wherein the respective numbers of cell-free DNA molecules are determined by aligning sequence reads obtained from the methylation-aware sequencing.
6. The method of claim 1, wherein determining whether the cell-free DNA molecule is methylated at one or more sites includes using methylation-sensitive restriction enzyme digestion, methylation-specific PCR, methylation-dependent DNA precipitation, methylated DNA binding proteins or peptides, or single-molecule sequencing without sodium bisulfite treatment.
7. The method of claim 1, further comprising: for each of a first plurality of regions of the genome: determining a respective number of cell-free DNA molecules from the biological sample that are from the region; calculating a respective normalized value from the respective number of cell-free DNA molecules from the region; and comparing the respective normalized value to a reference value to determine whether the respective region exhibits a deletion or an amplification;
determining a first amount of regions of the first plurality of regions of the genome that are determined to exhibit a deletion or an amplification; comparing the first amount to a first threshold to determine a second classification of the level of cancer; and using the first classification and the second classification to determine a third classification of the level of cancer.
8. The method of claim 7, wherein the first threshold is a percentage of the first plurality of regions that are determined to exhibit a deletion or an amplification.
9. The method of claim 7, wherein the third classification is positive for cancer only when both the first classification and the second classification indicate cancer.
10. The method of claim 7, wherein the third classification is positive for cancer when either the first classification or the second classification indicates cancer.
11. The method of claim 1, wherein the first classification indicates that cancer exists in the organism, the method further comprising: identifying a type of cancer associated with the organism by comparing the first methylation level to corresponding values determined from other organisms, wherein at least two of the other organisms are identified as having different types of cancer.
12.
The method of claim 7, wherein the first classification indicates that cancer exists in the organism, the method further comprising: identifying a type of cancer associated with the organism by comparing the first methylation level to corresponding values determined from other organisms, wherein at least two of the other organisms are identified as having different types of cancer.
13. The method of claim 12, wherein the third classification indicates that cancer exists in the organism, the method further comprising: identifying the type of cancer associated with the organism by comparing the first amount of regions to corresponding values determined from the other organisms.
14. The method of claim 7, wherein calculating the first methylation level includes: identifying a second plurality of regions of the genome; identifying one or more sites within each of the second plurality of regions; and calculating a region methylation level for each region of the second plurality of regions, wherein the first methylation level is for a first region, the method further comprising: comparing each of the region methylation levels to a respective region threshold, including comparing the first methylation level to the first threshold; determining a second amount of regions whose region methylation level is determined to exceed the respective region threshold; and comparing the second amount of regions of the second plurality of regions to a second threshold to determine the first classification.
15.
The method of claim 1, wherein calculating the first methylation level includes: identifying a second plurality of regions of the genome; identifying one or more sites within each of the second plurality of regions; and calculating a region methylation level for each region of the second plurality of regions, wherein the first methylation level is for a first region, the method further comprising: comparing each of the region methylation levels to a respective region threshold, including comparing the first methylation level to the first threshold; determining a second amount of regions whose region methylation level is determined to exceed the respective region threshold; and comparing the second amount of regions of the second plurality of regions to a second threshold to determine the first classification.
16. The method of claim 15, wherein the regions whose region methylation level is determined to exceed the respective region threshold correspond to a first set of regions, the method further comprising: comparing the region methylation levels of the first set of regions to corresponding region methylation levels of other organisms for the first set of regions, the other organisms having at least two of: a first type of cancer, absence of cancer, and a second type of cancer; and determining, based on the comparison, whether the organism has the first type of cancer, absence of cancer, or the second type of cancer.
17.
The method of claim 16, further comprising: clustering the other organisms based on the corresponding region methylation levels of the first set of regions of the other organisms, wherein two of the clusters correspond to any two of: the first type of cancer, absence of cancer, and the second type of cancer, wherein the comparison of the region methylation levels of the second plurality of regions is used to determine which cluster the organism belongs to.
18. The method of claim 17, wherein the clustering of the other organisms uses the region methylation levels of the organism.
19. The method of claim 17, wherein the clusters include a first cluster corresponding to the first type of cancer, a second cluster corresponding to the second type of cancer, and a third cluster corresponding to absence of cancer.
20.
The method of claim 17, wherein the clustering of the other organisms is further based on respective normalized values of a second set of regions of the other organisms, wherein the second set of regions corresponds to regions determined to exhibit a deletion or an amplification, and wherein the respective normalized value of a region is determined from the respective number of cell-free DNA molecules from the region, the method further comprising: for each of the second set of regions: determining a respective number of cell-free DNA molecules from the region; and calculating a respective normalized value from the respective number of cell-free DNA molecules from the region; and comparing the respective normalized values of the second set of regions of the organism to the respective normalized values of the other organisms as part of determining which cluster the organism belongs to.
21. The method of claim 20, wherein the clustering of the other organisms is further based on respective methylation densities of hypermethylated CpG islands, the method further comprising: for each of the hypermethylated CpG islands: determining a respective methylation density; and comparing the respective methylation densities of the hypermethylated CpG islands of the organism to the methylation densities of the other organisms as part of determining which cluster the organism belongs to.
22.
The method of claim 15, further comprising: for each of the second plurality of regions: calculating a respective difference between the region methylation level and the respective region threshold; and calculating a respective probability corresponding to the respective difference; wherein determining the second amount of regions includes: calculating a cumulative score that includes the respective probabilities.
23. The method of claim 22, wherein calculating the cumulative score includes: taking a logarithm of the respective probability for each region of the second plurality of regions to obtain a respective logarithm result; and calculating a sum that includes the respective logarithm results.
24. The method of claim 23, wherein the cumulative score is the negative of the sum of the respective logarithm results.
25. The method of claim 22, wherein the respective difference for each region of the second plurality of regions is normalized by a standard deviation associated with the respective region threshold.
26. The method of claim 22, wherein the respective probability corresponds to a probability of the respective difference according to a statistical distribution.
27. The method of claim 22, wherein the second threshold corresponds to the highest cumulative score of a reference group of samples from other organisms.
28.
The method of claim 15, further comprising: for each of the first plurality of regions: calculating a respective difference between the respective normalized value and the reference value; and calculating a respective probability corresponding to the respective difference; wherein determining the first amount of regions includes: calculating a first sum that includes the respective probabilities.
29. The method of claim 15, wherein the respective region threshold is a specified amount from a reference methylation level.
30. The method of claim 15, wherein the second threshold is a percentage, and wherein comparing the second amount of regions to the second threshold includes: dividing the second amount of regions by a second number of the second plurality of regions before comparing to the second threshold.
31. The method of claim 30, wherein the second number corresponds to all of the second plurality of regions.
32. The method of claim 15, wherein the first plurality of regions is the same as the second plurality of regions, and wherein the respective region threshold depends on whether the respective region exhibits a deletion or an amplification.
33.
The method of claim 32, wherein a first one of the respective region thresholds has a larger magnitude when the respective region exhibits an amplification than when no amplification is exhibited, and wherein a second one of the respective region thresholds has a smaller magnitude when the respective region exhibits a deletion than when no deletion is exhibited.
34. The method of claim 33, wherein the respective region thresholds test for hypomethylation of the second plurality of regions, wherein a third one of the respective region thresholds has a larger negative value when the respective region exhibits an amplification than when no amplification is exhibited, and wherein a fourth one of the respective region thresholds has a smaller negative value when the respective region exhibits a deletion than when no deletion is exhibited.
35. The method of claim 15, wherein the biological sample is taken prior to treatment, the method further comprising: repeating the method of claim 15 for another biological sample taken after treatment to obtain: a subsequent first amount of regions determined to exhibit a deletion or an amplification; and a subsequent second amount of regions whose region methylation level is determined to exceed the respective region threshold; and comparing the first amount to the subsequent first amount and the second amount to the subsequent second amount to determine a prognosis of the organism.
36.
The method of claim 35, wherein comparing the first amount to the subsequent first amount and the second amount to the subsequent second amount to determine the prognosis of the organism includes: determining a first difference between the first amount and the subsequent first amount; comparing the first difference to one or more first difference thresholds; determining a second difference between the second amount and the subsequent second amount; and comparing the second difference to one or more second difference thresholds.
37. The method of claim 36, wherein the prognosis is predicted to be worse when the first difference is below one of the first difference thresholds than when the first difference exceeds the one of the first difference thresholds, and wherein the prognosis is predicted to be worse when the second difference is below one of the second difference thresholds than when the second difference exceeds the one of the second difference thresholds.
38. The method of claim 37, wherein the one of the first difference thresholds and the one of the second difference thresholds are zero.
39. The method of claim 35, wherein the treatment is immunotherapy, surgery, radiotherapy, chemotherapy, antibody-based therapy, epigenetic therapy, or targeted therapy.
40. The method of claim 1, wherein the first threshold is a specified distance from a reference methylation level established from biological samples obtained from healthy organisms.
41.
The method of claim 40, wherein the specified distance is a specified number of standard deviations from the reference methylation level.
42. The method of claim 1, wherein the first threshold is established from a reference methylation level, the reference methylation level being determined from a previous biological sample of the organism obtained before the biological sample being tested.
43. The method of claim 1, wherein comparing the first methylation level to the first threshold includes: determining a difference between the first methylation level and a reference methylation level; and comparing the difference to a threshold corresponding to the first threshold.
44. The method of claim 1, further comprising: determining a fractional concentration of tumor cell-free DNA in the biological sample; and calculating the first threshold based on the fractional concentration.
45. The method of claim 1, further comprising: determining whether a fractional concentration of tumor cell-free DNA in the biological sample exceeds a minimum value; and if the fractional concentration does not exceed the minimum value, flagging the biological sample.
46. The method of claim 45, wherein the minimum value is determined based on an expected difference in the methylation level of a tumor relative to a reference methylation level.
47.
The method of claim 1, further comprising: measuring sizes of the cell-free DNA molecules located at the plurality of sites; and before comparing the first methylation level to the first threshold, normalizing the first methylation level based on the measured sizes of the cell-free DNA molecules.
48. The method of claim 47, wherein normalizing the first methylation level based on the measured sizes includes: selecting cell-free DNA molecules having a first size; and calculating the first methylation level using the selected cell-free DNA molecules, the first threshold corresponding to the first size.
49. The method of claim 48, wherein the first size is a range of lengths.
50. The method of claim 48, wherein the cell-free DNA molecules are selected based on a size-dependent physical separation.
51. The method of claim 48, wherein selecting cell-free DNA molecules having the first size includes: performing paired-end massively parallel sequencing of the plurality of cell-free DNA molecules to obtain a pair of sequences for each of the plurality of cell-free DNA molecules; determining the sizes of the cell-free DNA molecules by comparing the pairs of sequences of the plurality of cell-free DNA molecules to a reference genome; and selecting the cell-free DNA molecules having the first size.
52.
The method of claim 47, wherein normalizing the first methylation level based on the measured sizes includes: obtaining a functional relationship between size and methylation level; and using the functional relationship to normalize the first methylation level.
53. The method of claim 52, wherein the functional relationship provides correction factors corresponding to respective sizes.
54. The method of claim 53, further comprising: calculating an average size of the cell-free DNA molecules used to calculate the first methylation level; and multiplying the first methylation level by the corresponding correction factor.
55. The method of claim 53, further comprising: for each of the plurality of sites: for each of the cell-free DNA molecules located at the site: obtaining a respective size of the cell-free DNA molecule at the site; and normalizing a contribution of the cell-free DNA molecule to the respective number of cell-free DNA molecules methylated at the site using the correction factor corresponding to the respective size.
56. The method of claim 1, wherein the plurality of sites includes CpG sites, wherein the CpG sites are organized into a plurality of CpG islands, each CpG island including a plurality of CpG sites, and wherein the first methylation level corresponds to a first CpG island of the plurality of CpG islands.
57.
The method of claim 56, wherein each of the CpG islands has a mean methylation density of less than a first percentage in a reference group of samples from other organisms, wherein each of the CpG islands has a coefficient of variation of the mean methylation density of less than a second percentage in the reference group, and wherein, for a respective CpG island, each methylation density is determined using a total number of methylated and unmethylated DNA molecules spanning the plurality of CpG sites.
58. The method of claim 56, further comprising: for each of the CpG islands: determining whether the CpG island is hypermethylated relative to a reference group of samples from other organisms by comparing a methylation level of the CpG island to a respective threshold; for each of the hypermethylated CpG islands: determining a respective methylation density; calculating a cumulative score from the respective methylation densities; and comparing the cumulative score to a cumulative threshold to determine the first classification.
59. The method of claim 58, wherein calculating the cumulative score from the respective methylation densities includes: for each of the hypermethylated CpG islands: calculating a respective difference between the respective methylation density and a reference density; and calculating a respective probability corresponding to the respective difference; and determining the cumulative score using the respective probabilities.
60.
The method of claim 59, wherein the cumulative score is determined by: for each of the hypermethylated CpG islands: taking a logarithm of the respective probability to obtain a respective logarithm result; and calculating a sum that includes the respective logarithm results, wherein the cumulative score is the negative of the sum.
61. The method of claim 59, wherein each respective difference is normalized by a standard deviation associated with the reference density.
62. The method of claim 58, wherein the cumulative threshold corresponds to the highest cumulative score from the reference group.
63. The method of claim 58, wherein determining whether the first CpG island is hypermethylated includes: comparing the first methylation level to the first threshold and to a third threshold, wherein the first threshold corresponds to a mean of the methylation densities of the reference group plus a specified percentage, and wherein the third threshold corresponds to the mean of the methylation densities of the reference group plus a specified number of standard deviations.
64. The method of claim 63, wherein the specified percentage is 2%.
65. The method of claim 63, wherein the specified number of standard deviations is three.
66.
The method of claim 1, further comprising: for each of a first plurality of regions of the genome: determining a respective number of cell-free DNA molecules from the region; calculating a respective normalized value from the respective number; and comparing the respective normalized value to a reference value to determine whether the region exhibits a deletion or an amplification; determining a first set of regions that are determined to have one of: a deletion, an amplification, or normal representation, wherein the first methylation level corresponds to the first set of regions; determining a second set of regions that are determined to have another of: a deletion, an amplification, or normal representation; and calculating a second methylation level based on the respective numbers of cell-free DNA molecules methylated at sites in the second set of regions, wherein comparing the first methylation level to the first threshold includes: calculating a parameter between the first methylation level and the second methylation level; and comparing the parameter to the first threshold.
67. The method of claim 66, wherein the first methylation level is a statistical value of region methylation levels calculated for each region of the first set of regions, and wherein the second methylation level is a statistical value of region methylation levels calculated for each region of the second set of regions.
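The comparison recited in claims 66 and 67 can be illustrated with a short Python sketch. It is not taken from the patent: the function names, the dictionary keys, the tolerance-based copy-number call, and the choice of an arithmetic mean as the "statistical value" are all assumptions made for illustration only.

```python
from statistics import mean


def region_methylation_level(methylated: int, unmethylated: int) -> float:
    """Fraction of methylated calls among all calls at the sites of one region."""
    total = methylated + unmethylated
    if total == 0:
        raise ValueError("region has no informative reads")
    return methylated / total


def call_copy_number(normalized_value: float, reference: float, tolerance: float) -> str:
    """Label a region by comparing its normalized read count to a reference value.

    The symmetric tolerance is a hypothetical rule; the claims only require
    a comparison against a reference value.
    """
    if normalized_value > reference + tolerance:
        return "amplification"
    if normalized_value < reference - tolerance:
        return "deletion"
    return "normal"


def group_methylation_parameter(regions, first_label, second_label,
                                reference=1.0, tolerance=0.2):
    """Return (difference, ratio) between the mean regional methylation levels
    of two groups of regions, each group defined by its copy-number label."""
    levels = {first_label: [], second_label: []}
    for region in regions:
        label = call_copy_number(region["normalized_count"], reference, tolerance)
        if label in levels:
            levels[label].append(
                region_methylation_level(region["methylated"], region["unmethylated"])
            )
    if not levels[first_label] or not levels[second_label]:
        raise ValueError("both groups need at least one region")
    first_level = mean(levels[first_label])
    second_level = mean(levels[second_label])
    # The ratio is undefined when the second group is fully unmethylated.
    ratio = first_level / second_level if second_level > 0 else None
    return first_level - second_level, ratio


# Made-up counts: two amplified, two deleted, and one normal region.
example_regions = [
    {"normalized_count": 1.5, "methylated": 30, "unmethylated": 70},
    {"normalized_count": 1.4, "methylated": 40, "unmethylated": 60},
    {"normalized_count": 0.5, "methylated": 60, "unmethylated": 40},
    {"normalized_count": 0.6, "methylated": 80, "unmethylated": 20},
    {"normalized_count": 1.0, "methylated": 70, "unmethylated": 30},
]
difference, ratio = group_methylation_parameter(
    example_regions, "amplification", "deletion"
)
print(difference, ratio)  # approximately -0.35 and 0.5
```

In this made-up example the amplified regions are less methylated than the deleted ones, so the difference is negative. Either the difference or the ratio would then be compared to a threshold; how that threshold is chosen is outside the scope of this sketch.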
68.如权利要求67所述的方法,其中所述统计值使用Student’s t检验(Student'st-test)、方差分析(ANOVA)检验或克鲁斯卡尔-沃利斯检验(Kruskal-Wallis test)确定。68. The method of claim 67, wherein the statistic is determined using a Student’s t-test, an analysis of variance (ANOVA) test, or a Kruskal-Wallis test. 69.如权利要求66所述的方法,其中所述参数包括所述第一甲基化水平与所述第二甲基化水平之间的比率或差异。69. The method of claim 66, wherein the parameter includes the ratio or difference between the first methylation level and the second methylation level. 70.如权利要求69所述的方法,其中计算所述参数包括将概率分布应用于所述比率或所述差异。70. The method of claim 69, wherein calculating the parameter includes applying a probability distribution to the ratio or the difference. 71.一种从生物体的生物样品确定第一甲基化型态的非诊断目的的方法,所述生物样品包括包含源自第一组织和第二组织的游离DNA分子的混合物的游离DNA,所述方法包含:71. A method for non-diagnostic purposes of determining a first methylation pattern from a biological sample of an organism, said biological sample comprising cell-free DNA containing a mixture of cell-free DNA molecules derived from a first tissue and a second tissue, said method comprising: 获得与所述第二组织的DNA分子相对应的第二甲基化型态,所述第二甲基化型态提供所述生物体的基因组中多个基因座每一者上的第二甲基化密度,每个基因座上的所述第二甲基化密度为在所述基因座上所述第二组织的甲基化的DNA分子的比例;A second methylation pattern corresponding to the DNA molecules of the second tissue is obtained, the second methylation pattern providing a second methylation density at each of a plurality of loci in the genome of the organism, the second methylation density at each locus being the proportion of methylated DNA molecules of the second tissue at the locus; 从所述混合物的所述游离DNA分子确定游离甲基化型态,所述游离甲基化型态提供所述多个基因座每一者上的混合甲基化密度,每个基因座上的所述混合甲基化密度为在所述基因座上所述混合物的甲基化的DNA分子的比例;The free methylation pattern is determined from the free DNA molecules of the mixture, the free methylation pattern provides the mixed methylation density at each of the plurality of loci, the mixed methylation density at each locus being the proportion of methylated DNA molecules of the mixture at that locus; 确定所述混合物中来自所述第一组织的所述游离DNA分子的百分比;以及Determine the percentage of the free DNA molecules from the 
first tissue in the mixture; and determining the first methylation profile of the first tissue by: for each locus of the plurality of loci: calculating a differential parameter that includes a difference between the second methylation density of the second methylation profile and the mixture methylation density of the cell-free methylation profile, the difference being scaled by the percentage of the cell-free DNA molecules from the first tissue in the mixture.
72. The method of claim 71, further comprising: determining a first methylation level from the first methylation profile; and comparing the first methylation level to a first threshold to determine a classification of a level of cancer.
73. The method of claim 71, further comprising: transforming the first methylation profile to obtain a corrected first methylation profile.
74. The method of claim 73, wherein the transformation is a linear transformation.
75. The method of claim 71, wherein the differential parameter D for a locus is defined as [equation not reproduced], where mbc denotes the second methylation density of the second methylation profile at the locus, mp denotes the mixture methylation density of the cell-free methylation profile at the locus, f is the fractional concentration of cell-free DNA molecules from the first tissue in the biological sample, and CN denotes the copy number at the locus.
76. The method of claim 75, wherein CN is one when the copy number of the first tissue at the locus is diploid.
77.
The method of claim 75, further comprising: identifying regions where D exceeds a threshold.
78. The method of claim 71, further comprising: identifying the plurality of loci by selecting loci that meet any one or more of the following criteria: a GC content of greater than 50%; a second methylation density of the second methylation profile that is below a first threshold or above a second threshold; and a minimum of five CpG sites in a region defined by the locus.
79. The method of claim 71, wherein the biological sample is from a female subject pregnant with a fetus, and wherein the first tissue is from the fetus or a placenta and the second tissue is from the female subject.
80. The method of claim 79, further comprising: determining a first methylation level from the first methylation profile; and comparing the first methylation level to a first threshold to determine a classification of a medical condition.
81. The method of claim 80, wherein the medical condition is preeclampsia or a chromosomal abnormality in the first tissue.
82. The method of claim 71, wherein the first tissue is from a tumor within the organism and the second tissue is from non-malignant tissue of the organism.
83. The method of claim 71, wherein the biological sample also includes DNA molecules from the second tissue, and wherein obtaining the second methylation profile comprises: analyzing the DNA molecules from the second tissue to determine the second methylation profile.
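By way of non-limiting illustration, the per-locus calculation of claims 71 and 75 to 77 may be sketched as follows. The equation of claim 75 is not reproduced in this text; the form D = (mbc - mp) / (f × CN) used below is an assumption that is consistent with the symbols defined in claim 75 and with the scaled difference recited in claim 71. The step that recovers the first-tissue density as mbc - D is likewise an assumption, following from a simple two-component mixture model with CN equal to one. All function names are hypothetical.

```python
def difference_parameter(mbc, mp, f, cn=1.0):
    """Scaled difference D at one locus (assumed form: (mbc - mp) / (f * cn)).

    mbc: methylation density of the second tissue at the locus (0..1)
    mp:  methylation density of the cell-free DNA mixture at the locus (0..1)
    f:   fractional concentration of first-tissue DNA in the mixture (0 < f <= 1)
    cn:  copy number factor at the locus (1 for a diploid first tissue)
    """
    if not 0 < f <= 1:
        raise ValueError("f must be in (0, 1]")
    if cn <= 0:
        raise ValueError("cn must be positive")
    return (mbc - mp) / (f * cn)


def first_tissue_density(mbc, mp, f, cn=1.0):
    """Deduced first-tissue methylation density, assuming it equals mbc - D."""
    return mbc - difference_parameter(mbc, mp, f, cn)


def loci_exceeding(d_by_locus, threshold):
    """Return the loci whose D exceeds a threshold (cf. claim 77)."""
    return [locus for locus, d in d_by_locus.items() if d > threshold]


if __name__ == "__main__":
    # Hypothetical locus: second tissue 80% methylated, mixture 75%, f = 10%.
    print(difference_parameter(0.80, 0.75, 0.10))
    print(first_tissue_density(0.80, 0.75, 0.10))
```

Under the mixture assumption, the example is self-consistent: a first tissue at about 0.3 mixed at 10% with a second tissue at 0.8 gives 0.1 × 0.3 + 0.9 × 0.8 = 0.75, the mixture density supplied as input.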
84. The method of claim 71, wherein the obtained second methylation profile corresponds to an average methylation profile obtained from a set of control samples.
85. The method of claim 71, wherein determining the mixture methylation density of the cell-free methylation profile for a first locus of the plurality of loci comprises: determining a first number of the cell-free DNA molecules that are from the first locus; determining whether each DNA molecule of the first number is methylated at one or more sites to obtain a second number of methylated DNA molecules; and calculating the mixture methylation density from the first number and the second number.
86. The method of claim 85, wherein the one or more sites are CpG sites.
87. The method of claim 71, wherein determining the cell-free methylation profile comprises: treating the cell-free DNA molecules with sodium bisulfite; and sequencing the treated DNA molecules.
88. The method of claim 87, wherein the treatment of the cell-free DNA molecules is part of Tet-assisted bisulfite conversion or includes treating the cell-free DNA molecules with potassium perruthenate (KRuO4).
89. The method of claim 71, wherein determining the cell-free methylation profile comprises: performing single-molecule sequencing of the DNA, wherein the single-molecule sequencing includes determining whether at least one site of a DNA molecule is methylated.
90.
A computer system comprising one or more processors and a computer-readable medium storing a plurality of instructions that, when executed, control the one or more processors of the computer system to perform the method of any one of claims 1 and 6-89.
91. A method, not for diagnostic purposes, of determining a first methylation profile from a biological sample of an organism, the biological sample including cell-free DNA comprising a mixture of cell-free DNA molecules originating from a first tissue and a second tissue, the method comprising: analyzing a plurality of DNA molecules from the biological sample, wherein analyzing a DNA molecule of the plurality of DNA molecules includes: determining a location of the DNA molecule in a genome of the organism; determining a genotype of the DNA molecule; and determining whether the DNA molecule is methylated at one or more sites; identifying a plurality of first loci at which a first genome of the first tissue is heterozygous for a respective first allele and a respective second allele and a second genome of the second tissue is homozygous for the respective first allele; for each of the first loci: for each of one or more sites associated with the locus: determining a number of DNA molecules that are methylated at the site and that correspond to the respective second allele of the locus; calculating a methylation density using the numbers of DNA molecules that are methylated at the one or more sites of the locus and that correspond to the respective second allele of the locus; and
generating the first methylation profile of the first tissue from the methylation densities of the first loci, wherein the first methylation density at each locus is the proportion of DNA molecules of the first tissue that are methylated at the locus.
92. The method of claim 91, wherein the biological sample is from a female subject pregnant with a fetus, and wherein the first tissue is from the fetus or a placenta and the second tissue corresponds to the female subject.
93. The method of claim 91, wherein the sites correspond to CpG sites.
94. The method of claim 91, wherein each locus includes at least one CpG site.
95. The method of claim 91, wherein each locus is adjacent to at least one CpG site.
96.
A method, not for diagnostic purposes, of detecting a chromosomal abnormality from a biological sample of an organism, the biological sample including cell-free DNA comprising a mixture of cell-free DNA molecules originating from a first tissue and a second tissue, the method comprising: analyzing a plurality of DNA molecules from the biological sample, wherein the analysis is of sequence reads obtained from massively parallel sequencing, and wherein analyzing a DNA molecule of the plurality of DNA molecules includes: determining a location of the DNA molecule in a reference genome; and determining whether the DNA molecule is methylated at one or more sites; for each of a plurality of sites: separately counting, using the sequence reads, a respective number of DNA molecules that are methylated at the site; calculating a first methylation level of a first chromosomal region using a sum of the respective numbers of DNA molecules methylated at sites within the first chromosomal region, the first methylation level being the proportion of DNA molecules that are methylated at the sites within the first chromosomal region; comparing the first methylation level to a first threshold; and determining a classification of an abnormality in the first tissue for the first chromosomal region based on the comparison.
97. The method of claim 96, wherein comparing the first methylation level to the threshold comprises: normalizing the first methylation level and comparing the normalized first methylation level to the threshold.
98.
The method of claim 97, wherein the normalization uses a second methylation level of a second chromosomal region.
99. The method of claim 97, wherein the normalization uses a fractional concentration of cell-free DNA from the first tissue.
100. The method of claim 96, wherein calculating the first methylation level comprises: identifying a plurality of regions of the genome of the organism; identifying one or more sites within each of the plurality of regions; and calculating a regional methylation level for each of the plurality of regions, wherein the first methylation level is for a first region, and wherein comparing the first methylation level to the first threshold comprises: comparing each of the regional methylation levels to a respective region threshold; determining a first number of regions whose regional methylation level exceeds the respective region threshold; and comparing the first number to a threshold to determine the classification.
101. The method of claim 100, wherein the threshold is a percentage, and wherein comparing the first number to the threshold comprises: dividing the first number of regions by a second number of regions before comparing to the threshold.
102. The method of claim 101, wherein the second number of regions is all of the identified plurality of regions.
103. The method of claim 100, wherein the respective region thresholds are a specified amount from a reference methylation level.
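By way of non-limiting illustration, the region-counting comparison of claims 100 to 103 may be sketched as follows. The sketch assumes that a region "exceeds" its region threshold when its regional methylation level deviates from the reference methylation level, in either direction, by more than the specified amount; that interpretation, the function name, and the parameter names are assumptions made for illustration.

```python
def classify_by_region_count(region_levels, reference_levels,
                             specified_amount, percentage_threshold):
    """Count regions beyond their thresholds and compare the fraction to a cutoff.

    region_levels:        regional methylation levels of the test sample
    reference_levels:     reference methylation level for each region
    specified_amount:     allowed deviation from the reference (cf. claim 103)
    percentage_threshold: cutoff on the fraction of deviating regions,
                          expressed as a fraction between 0 and 1 (cf. claim 101)
    Returns (fraction_of_regions_exceeding, is_abnormal).
    """
    if len(region_levels) != len(reference_levels) or not region_levels:
        raise ValueError("inputs must be non-empty and of equal length")
    # First number: regions whose level is beyond the respective region threshold.
    first_number = sum(
        1 for level, ref in zip(region_levels, reference_levels)
        if abs(level - ref) > specified_amount
    )
    # Second number: all identified regions (cf. claim 102).
    second_number = len(region_levels)
    fraction = first_number / second_number
    return fraction, fraction > percentage_threshold


if __name__ == "__main__":
    levels = [0.50, 0.90, 0.52, 0.20]      # hypothetical values
    references = [0.50, 0.50, 0.50, 0.50]
    print(classify_by_region_count(levels, references, 0.10, 0.25))
```

A one-sided test, for example counting only hypomethylated regions, would replace the absolute deviation with a signed difference.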
104. The method of claim 96, wherein the biological sample is from a female subject pregnant with a fetus, wherein the first tissue is from the fetus or a placenta and the second tissue is from the female subject, and wherein the chromosomal abnormality is a fetal chromosomal abnormality.
105. The method of claim 104, wherein the threshold is based on a background methylation level associated with DNA of the second tissue from the female subject.
106. The method of claim 104, wherein the threshold is determined for the first chromosomal region from other pregnant female subjects carrying fetuses without a chromosomal abnormality.
107. The method of claim 104, wherein the chromosomal abnormality is trisomy 21, trisomy 18, trisomy 13, Turner syndrome, or Klinefelter syndrome.
108. The method of claim 96, wherein the first tissue is from a tumor within the organism and the second tissue is from non-malignant tissue of the organism.
109. The method of claim 96, wherein the threshold is determined from other organisms not having cancer.
110. The method of claim 96, wherein the threshold is based on a concentration of cell-free DNA originating from the first tissue in the biological sample.
111. The method of claim 110, wherein the threshold is based on a scaling factor corresponding to a type of abnormality, wherein the type of abnormality is a deletion or a duplication.
112. The method of claim 96, wherein determining the location of the DNA molecule in the reference genome includes determining whether the location is within the first chromosomal region.
113. The method of claim 112, wherein determining the location of the DNA molecule in the reference genome is accomplished by determining whether the DNA molecule aligns to the first chromosomal region.
114. The method of claim 96, wherein the chromosomal abnormality is a subchromosomal deletion, a subchromosomal duplication, or DiGeorge syndrome.
115. The method of any one of claims 1-89 and 91-114, wherein the biological sample is obtained by a process that yields cell-free nucleic acid molecules in the biological sample.
116. The method of claim 115, wherein the biological sample is selected from plasma and serum.
117. The method of claim 115, wherein the biological sample is obtained by centrifugation.
118. A computer system comprising one or more processors and a computer-readable medium storing a plurality of instructions that, when executed, control the one or more processors of the computer system to perform the method of any one of claims 91-114.
119.
A system for analyzing a biological sample of an organism, the biological sample including a mixture of cell-free deoxyribonucleic acid (DNA) molecules originating from normal cells and from cells associated with cancer, the system comprising: means for analyzing a plurality of cell-free DNA molecules from the biological sample, the analysis being of sequence reads obtained from massively parallel sequencing, wherein analyzing each of the plurality of cell-free DNA molecules includes: determining a location of the cell-free DNA molecule in a genome of the organism; and determining whether the cell-free DNA molecule is methylated at one or more sites; for each of a plurality of sites: means for separately counting, using the sequence reads, a respective number of cell-free DNA molecules from the biological sample that are methylated at the site; means for calculating a first methylation level using a sum of the respective numbers of cell-free DNA molecules methylated at the plurality of sites, wherein the plurality of sites are located on a plurality of chromosomes; means for comparing the first methylation level to a first threshold; and means for determining a first classification of a level of cancer based on the comparison.
120. The system of claim 119, wherein the means for determining whether the cell-free DNA molecule is methylated at one or more sites comprises: means for performing methylation-aware sequencing.
121.
The system of claim 120, wherein the means for performing methylation-aware sequencing comprises: means for treating the cell-free DNA molecules with sodium bisulfite; and means for sequencing the treated cell-free DNA molecules.
122. The system of claim 121, wherein the treatment of the cell-free DNA molecules with sodium bisulfite is part of Tet-assisted bisulfite conversion or oxidative bisulfite sequencing for the detection of 5-hydroxymethylcytosine.
123. The system of claim 120, wherein the respective number of cell-free DNA molecules for each region is determined by aligning sequence reads obtained from the methylation-aware sequencing.
124. The system of claim 119, wherein the means for determining whether the cell-free DNA molecule is methylated at one or more sites comprises means for using methylation-sensitive restriction enzyme digestion, methylation-specific PCR, methylation-dependent DNA precipitation, methylated DNA-binding proteins or peptides, or single-molecule sequencing without sodium bisulfite treatment.
125.
The system of claim 119, further comprising: for each of a first plurality of regions of the genome: means for determining a respective number of cell-free DNA molecules from the biological sample that are from the region; means for calculating a respective normalized value from the respective number of cell-free DNA molecules from the region; and means for comparing the respective normalized value to a reference value to determine whether the respective region exhibits a deletion or an amplification; means for determining a first amount of regions of the first plurality of regions that are determined to exhibit a deletion or an amplification; means for comparing the first amount to a first threshold to determine a second classification of a level of cancer; and means for using the first classification and the second classification to determine a third classification of a level of cancer.
126. The system of claim 125, wherein the first threshold is a percentage of the first plurality of regions that are determined to exhibit a deletion or an amplification.
127. The system of claim 125, wherein the third classification is positive for cancer only when both the first classification and the second classification indicate cancer.
128. The system of claim 125, wherein the third classification is positive for cancer when either the first classification or the second classification indicates cancer.
129.
The system of claim 119, wherein the first classification indicates that cancer exists for the organism, the system further comprising: means for identifying a type of cancer associated with the organism by comparing the first methylation level to corresponding values determined from other organisms, wherein at least two of the other organisms are identified as having different types of cancer.
130. The system of claim 125, wherein the first classification indicates that cancer exists for the organism, the system further comprising: means for identifying a type of cancer associated with the organism by comparing the first methylation level to corresponding values determined from other organisms, wherein at least two of the other organisms are identified as having different types of cancer.
131. The system of claim 130, wherein the third classification indicates that cancer exists for the organism, the system further comprising: means for identifying the type of cancer associated with the organism by comparing the first amount of regions to corresponding values determined from the other organisms.
132.
The system of claim 125, wherein the means for calculating the first methylation level comprises: means for identifying a second plurality of regions of the genome; means for identifying one or more sites within each of the second plurality of regions; and means for calculating a regional methylation level for each region of the second plurality of regions, wherein the first methylation level is for a first region, the system further comprising: means for comparing each of the regional methylation levels to a respective region threshold, including comparing the first methylation level to the first threshold; means for determining a second amount of regions whose regional methylation level is determined to exceed the respective region threshold; and means for comparing the second amount of regions of the second plurality of regions to a second threshold to determine the first classification.
133. The system of claim 119, wherein the means for calculating the first methylation level comprises: means for identifying a second plurality of regions of the genome; means for identifying one or more sites within each of the second plurality of regions; and means for calculating a regional methylation level for each region of the second plurality of regions, wherein the first methylation level is for a first region,
the system further comprising: means for comparing each of the regional methylation levels to a respective region threshold, including comparing the first methylation level to the first threshold; means for determining a second amount of regions whose regional methylation level is determined to exceed the respective region threshold; and means for comparing the second amount of regions of the second plurality of regions to a second threshold to determine the first classification.
134. The system of claim 133, wherein the regions whose regional methylation level is determined to exceed the respective region threshold correspond to a first set of regions, the system further comprising: means for comparing the regional methylation levels of the first set of regions to corresponding regional methylation levels of other organisms for the first set of regions, the other organisms having at least two of: a first type of cancer, absence of cancer, and a second type of cancer; and means for determining, based on the comparison, whether the organism has the first type of cancer, absence of cancer, or the second type of cancer.
135. The system of claim 134, further comprising: means for clustering the other organisms based on the corresponding regional methylation levels of the first set of regions for the other organisms, wherein two of the clusters correspond to any two of: the first type of cancer, absence of cancer, and the second type of cancer,
wherein the comparison of the regional methylation levels of the second plurality of regions is used to determine which cluster the organism belongs to.
136. The system of claim 135, wherein the clustering of the other organisms uses the regional methylation levels of the organism.
137. The system of claim 135, wherein the clusters include a first cluster corresponding to the first type of cancer, a second cluster corresponding to the second type of cancer, and a third cluster corresponding to absence of cancer.
138. The system of claim 135, wherein the clustering of the other organisms is further based on respective normalized values of a second set of regions for the other organisms, wherein the second set of regions corresponds to regions determined to exhibit a deletion or an amplification, and wherein the respective normalized value for a region is determined from a respective number of cell-free DNA molecules from the region, the system further comprising: for each of the second set of regions: means for determining a respective number of cell-free DNA molecules from the region; and means for calculating a respective normalized value from the respective number of cell-free DNA molecules from the region; and means for comparing the respective normalized values of the second set of regions for the organism to the respective normalized values for the other organisms as part of determining which cluster the organism belongs to.
139.
The system of claim 138, wherein the clustering of the other organisms is further based on respective methylation densities of hypermethylated CpG islands, the system further comprising: for each of the hypermethylated CpG islands: means for determining a respective methylation density; and means for comparing the respective methylation densities of the hypermethylated CpG islands for the organism to the methylation densities for the other organisms as part of determining which cluster the organism belongs to.
140. The system of claim 133, further comprising: for each of the second plurality of regions: means for calculating a respective difference between the regional methylation level and the respective region threshold; and means for calculating a respective probability corresponding to the respective difference; wherein the means for determining the second amount of regions comprises: means for calculating a cumulative score including the respective probabilities.
141. The system of claim 140, wherein the means for calculating the cumulative score comprises: means for taking a logarithm of the respective probability for each region of the second plurality of regions to obtain a respective logarithm result; and means for calculating a sum including the respective logarithm results.
142. The system of claim 141, wherein the cumulative score is the negative of the sum of the respective logarithm results.
143.
The system of claim 140, wherein the respective difference for each region of the second plurality of regions is normalized by a standard deviation associated with the respective region threshold.
144. The system of claim 140, wherein the respective probability corresponds to a probability of the respective difference according to a statistical distribution.
145. The system of claim 140, wherein the second threshold corresponds to the highest cumulative score from a reference group of samples from other organisms.
146. The system of claim 133, further comprising: for each of the first plurality of regions: means for calculating a respective difference between the respective normalized value and the reference value; and means for calculating a respective probability corresponding to the respective difference; wherein the means for determining the first amount of regions comprises: means for calculating a first sum including the respective probabilities.
147. The system of claim 133, wherein the respective region thresholds are a specified amount from a reference methylation level.
148. The system of claim 133, wherein the second threshold is a percentage, and wherein comparing the second amount of regions to the second threshold comprises: means for dividing the second amount of regions by a second number of the second plurality of regions before comparing to the second threshold.
149. The system of claim 148, wherein the second number corresponds to all of the second plurality of regions.
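By way of non-limiting illustration, the cumulative score of claims 140 to 145 may be sketched as follows. The sketch assumes a standard normal distribution as the statistical distribution, a one-sided lower-tail probability (suited to testing hypomethylation), and the natural logarithm; these choices, the function name, and the small floor used to avoid taking the logarithm of zero are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def cumulative_score(region_levels, region_thresholds, standard_deviations):
    """Negative sum of log-probabilities over all regions.

    For each region, the difference between the regional methylation level and
    the respective region threshold is normalized by a standard deviation
    (cf. claim 143), converted to a lower-tail probability under a standard
    normal distribution (assumption, cf. claim 144), and log-transformed
    (cf. claim 141). The score is the negative of the summed logarithms
    (cf. claim 142), so stronger hypomethylation yields a larger score.
    """
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    log_sum = 0.0
    for level, threshold, sd in zip(region_levels, region_thresholds,
                                    standard_deviations):
        z = (level - threshold) / sd
        probability = standard_normal.cdf(z)
        probability = max(probability, 1e-300)  # floor to keep log finite
        log_sum += math.log(probability)
    return -log_sum


if __name__ == "__main__":
    thresholds = [0.5, 0.5]   # hypothetical region thresholds
    sds = [0.1, 0.1]          # hypothetical standard deviations
    print(cumulative_score([0.5, 0.5], thresholds, sds))
    print(cumulative_score([0.2, 0.2], thresholds, sds))
```

Under claim 145, the resulting score would then be compared with the highest cumulative score observed in a reference group of samples, which is not modeled here.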
150. The system of claim 133, wherein the first plurality of regions is the same as the second plurality of regions, and wherein the respective region threshold depends on whether the respective region exhibits a deletion or an amplification.
151. The system of claim 150, wherein one of the respective region thresholds has a larger magnitude when the respective region exhibits an amplification than when no amplification is exhibited, and wherein a second of the respective region thresholds has a smaller magnitude when the respective region exhibits a deletion than when no deletion is exhibited.
152. The system of claim 151, wherein the respective region thresholds test for hypomethylation of the second plurality of regions, wherein a third of the respective region thresholds has a larger negative value when the respective region exhibits an amplification than when no amplification is exhibited, and wherein a fourth of the respective region thresholds has a smaller negative value when the respective region exhibits a deletion than when no deletion is exhibited.
153. The system of claim 133, wherein the biological sample is taken before treatment, further comprising:
using the system of claim 133 on another biological sample taken after treatment to obtain:
a subsequent first amount of regions determined to exhibit a deletion or an amplification; and
a subsequent second amount of regions determined to have a regional methylation level exceeding the respective region threshold;
comparing the first amount with the subsequent first amount and comparing the second amount with the subsequent second amount to determine a prognosis of the organism.
154. The system of claim 153, wherein comparing the first amount with the subsequent first amount and comparing the second amount with the subsequent second amount to determine the prognosis of the organism comprises:
means for determining a first difference between the first amount and the subsequent first amount;
means for comparing the first difference with one or more first difference thresholds;
means for determining a second difference between the second amount and the subsequent second amount; and
means for comparing the second difference with one or more second difference thresholds.
155. The system of claim 154, wherein the prognosis is predicted to be worse when the first difference is below one of the first difference thresholds than when the first difference exceeds the one of the first difference thresholds, and wherein the prognosis is predicted to be worse when the second difference is below one of the second difference thresholds than when the second difference exceeds the one of the second difference thresholds.
156. The system of claim 155, wherein the one of the first difference thresholds and the one of the second difference thresholds are zero.
157. The system of claim 153, wherein the treatment is immunotherapy, surgery, radiotherapy, chemotherapy, antibody-based therapy, epigenetic therapy, or targeted therapy.
158. The system of claim 119, wherein the first threshold is a specified distance from a reference methylation level established from a biological sample obtained from a healthy organism.
159. The system of claim 158, wherein the specified distance is a specified number of standard deviations from the reference methylation level.
160. The system of claim 119, wherein the first threshold is established from a reference methylation level, the reference methylation level being determined from a previous biological sample of the organism obtained before the biological sample being tested.
161. The system of claim 119, wherein the means for comparing the first methylation level with the first threshold comprises:
means for determining a difference between the first methylation level and a reference methylation level; and
means for comparing the difference with a value corresponding to the first threshold.
162. The system of claim 119, further comprising:
means for determining a percentage concentration of tumor-derived cell-free DNA in the biological sample;
means for calculating the first threshold based on the percentage concentration.
163. The system of claim 119, further comprising:
means for determining whether the percentage concentration of tumor-derived cell-free DNA in the biological sample exceeds a minimum value; and
means for flagging the biological sample if the percentage concentration does not exceed the minimum value.
164. The system of claim 163, wherein the minimum value is determined based on an expected difference between a tumor methylation level and a reference methylation level.
165. The system of claim 119, further comprising:
means for measuring sizes of the cell-free DNA molecules located at the plurality of sites; and
means for normalizing the first methylation level based on the measured sizes of the cell-free DNA molecules before comparing the first methylation level with the first threshold.
166. The system of claim 165, wherein the means for normalizing the first methylation level based on the measured sizes comprises:
means for selecting cell-free DNA molecules having a first size;
means for calculating the first methylation level using the selected cell-free DNA molecules, the first threshold corresponding to the first size.
167. The system of claim 166, wherein the first size is a range of lengths.
168. The system of claim 166, wherein the cell-free DNA molecules are selected by size-dependent physical separation.
169. The system of claim 166, wherein the means for selecting cell-free DNA molecules having the first size comprises:
means for performing paired-end massively parallel sequencing of the plurality of cell-free DNA molecules to obtain a pair of sequences for each of the plurality of cell-free DNA molecules;
means for determining the sizes of the cell-free DNA molecules by comparing the pairs of sequences of the plurality of cell-free DNA molecules with a reference genome; and
means for selecting the cell-free DNA molecules having the first size.
170. The system of claim 165, wherein the means for normalizing the first methylation level based on the measured sizes comprises:
means for obtaining a functional relationship between size and methylation level; and
means for normalizing the first methylation level using the functional relationship.
171. The system of claim 170, wherein the functional relationship provides correction factors corresponding to respective sizes.
172. The system of claim 171, further comprising:
means for calculating an average size of the cell-free DNA molecules used to calculate the first methylation level; and
means for multiplying the first methylation level by the corresponding correction factor.
173. The system of claim 171, further comprising:
for each of the plurality of sites:
for each of the cell-free DNA molecules located at the site:
means for obtaining a respective size of the cell-free DNA molecule at the site; and
means for normalizing, using the correction factor corresponding to the respective size, the contribution of the cell-free DNA molecule to the respective number of cell-free DNA molecules methylated at the site.
174. The system of claim 119, wherein the plurality of sites includes CpG sites, wherein the CpG sites are organized into a plurality of CpG islands, each CpG island including a plurality of CpG sites, and wherein the first methylation level corresponds to a first CpG island of the plurality of CpG islands.
175. The system of claim 174, wherein each of the CpG islands has, in a reference population of samples from other organisms, a mean methylation density below a first percentage, wherein each of the CpG islands has, in the reference population, a coefficient of variation of the methylation density below a second percentage, and wherein, for a respective CpG island, each methylation density is determined using the total number of methylated and unmethylated DNA molecules across the plurality of CpG sites.
176. The system of claim 174, further comprising:
for each of the CpG islands:
means for determining whether the CpG island is hypermethylated relative to a reference group of samples from other organisms by comparing a methylation level of the CpG island with a respective threshold;
for each of the hypermethylated CpG islands:
means for determining a respective methylation density,
means for calculating a cumulative score from the respective methylation densities; and
means for comparing the cumulative score with a cumulative threshold to determine the first classification.
177. The system of claim 176, wherein the means for calculating the cumulative score from the respective methylation densities comprises:
for each of the hypermethylated CpG islands:
means for calculating a respective difference between the respective methylation density and a reference density; and
means for calculating a respective probability corresponding to the respective difference; and
means for determining the cumulative score using the respective probabilities.
178. The system of claim 177, wherein the cumulative score is determined by:
for each of the hypermethylated CpG islands:
taking a logarithm of the respective probability to obtain a respective logarithm result; and
calculating a sum that includes the respective logarithm results, the cumulative score being the negative of the sum.
179. The system of claim 177, wherein each respective difference is normalized by a standard deviation associated with the reference density.
180. The system of claim 176, wherein the cumulative threshold corresponds to the highest cumulative score from the reference group.
181. The system of claim 176, wherein the means for determining whether the first CpG island is hypermethylated comprises:
means for comparing the first methylation level with the first threshold and a third threshold,
wherein the first threshold corresponds to the mean methylation density of the reference group plus a specified percentage, and wherein the third threshold corresponds to the mean methylation density of the reference group plus a specified number of standard deviations.
182. The system of claim 181, wherein the specified percentage is 2%.
183. The system of claim 181, wherein the specified number of standard deviations is three.
184. The system of claim 119, further comprising:
for each of a first plurality of regions of the genome:
means for determining a respective number of cell-free DNA molecules from the region;
means for calculating a respective normalized value from the respective number; and
means for comparing the respective normalized value with a reference value to determine whether the region exhibits a deletion or an amplification;
means for determining a first set of regions, the first set of regions being determined to have one of the following: a deletion, an amplification, or normal representation, wherein the first methylation level corresponds to the first set of regions;
means for determining a second set of regions, the second set of regions being determined to have another of the following: a deletion, an amplification, or normal representation; and
means for calculating a second methylation level based on the respective numbers of cell-free DNA molecules methylated at sites in the second set of regions,
wherein the means for comparing the first methylation level with the first threshold comprises:
means for calculating a parameter between the first methylation level and the second methylation level; and
means for comparing the parameter with the first threshold.
185. The system of claim 184, wherein the first methylation level is a statistical value of the regional methylation levels calculated for each region of the first set of regions, and wherein the second methylation level is a statistical value of the regional methylation levels calculated for each region of the second set of regions.
186. The system of claim 185, wherein the statistical value is determined using a Student's t-test, an analysis of variance (ANOVA) test, or a Kruskal-Wallis test.
187. The system of claim 184, wherein the parameter includes a ratio or a difference between the first methylation level and the second methylation level.
188. The system of claim 187, wherein the means for calculating the parameter includes means for applying a probability distribution to the ratio or the difference.
189. A system for determining a first methylation profile from a biological sample of an organism, the biological sample including cell-free DNA comprising a mixture of cell-free DNA molecules originating from a first tissue and from a second tissue, the system comprising:
means for obtaining a second methylation profile corresponding to DNA molecules of the second tissue, the second methylation profile providing a second methylation density at each of a plurality of loci in a genome of the organism, the second methylation density at a particular locus being the proportion of DNA molecules of the second tissue that are methylated at the locus;
means for determining a cell-free methylation profile from the cell-free DNA molecules of the mixture, the cell-free methylation profile providing a mixture methylation density at each of the plurality of loci, the mixture methylation density at each locus being the proportion of DNA molecules of the mixture that are methylated at the locus;
means for determining a percentage of the cell-free DNA molecules in the mixture that are from the first tissue; and
means for determining the first methylation profile of the first tissue by:
for each locus of the plurality of loci:
means for calculating a differential parameter that includes a difference between the second methylation density of the second methylation profile and the mixture methylation density of the cell-free methylation profile, the difference being scaled by the percentage of the cell-free DNA molecules in the mixture that are from the first tissue.
190. The system of claim 189, further comprising:
means for determining a first methylation level from the first methylation profile; and
means for comparing the first methylation level with a first threshold to determine a classification of a level of cancer.
191. The system of claim 189, further comprising:
means for transforming the first methylation profile to obtain a corrected first methylation profile.
192. The system of claim 191, wherein the transformation is a linear transformation.
193. The system of claim 189, wherein the differential parameter D for a locus is defined as D = mbc - (mbc - mp) / (f × CN), where mbc denotes the second methylation density of the second methylation profile at the locus, mp denotes the mixture methylation density of the cell-free methylation profile at the locus, f is the percentage concentration of cell-free DNA molecules from the first tissue in the biological sample, and CN denotes the copy number at the locus.
194. The system of claim 193, wherein CN is one when the copy number of the first tissue at the locus is diploid.
195. The system of claim 193, further comprising:
means for identifying regions where D exceeds a threshold.
196. The system of claim 189, further comprising:
means for identifying the plurality of loci by selecting loci that satisfy any one or more of the following criteria:
a GC content greater than 50%;
a second methylation density of the second methylation profile below a first threshold or above a second threshold; and
a minimum of five CpG sites in the region defined by the locus.
197. The system of claim 189, wherein the biological sample is from a female individual pregnant with a fetus, and wherein the first tissue is from the fetus or a placenta and the second tissue is from the female individual.
198. The system of claim 197, further comprising:
means for determining a first methylation level from the first methylation profile; and
means for comparing the first methylation level with a first threshold to determine a classification of a medical condition.
199. The system of claim 198, wherein the medical condition is preeclampsia or a chromosomal abnormality in the first tissue.
200. The system of claim 189, wherein the first tissue is from a tumor within the organism and the second tissue is from non-malignant tissue of the organism.
201. The system of claim 189, wherein the biological sample further includes DNA molecules from the second tissue, and wherein obtaining the second methylation profile comprises:
means for analyzing the DNA molecules from the second tissue to determine the second methylation profile.
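The deduction of a first-tissue methylation density from a mixture, as recited in claims 189 and 193 to 194, can be sketched in Python as follows. The sketch assumes the relation D = mbc - (mbc - mp) / (f × CN), which is what results from treating the mixture density as a weighted average of the two tissues when CN is one. The function name, the parameter names, and the clamping of the result to the range 0 to 1 are hypothetical choices made for illustration, and f is expressed as a fraction rather than a percentage.

```python
def deduce_first_tissue_density(m_bc, m_p, f, cn=1.0):
    """Estimate the methylation density D of the first tissue at one locus.

    m_bc: methylation density of the second tissue (e.g. blood cells), 0..1
    m_p:  methylation density of the cell-free mixture (e.g. plasma), 0..1
    f:    fraction of cell-free DNA from the first tissue, 0 < f <= 1
    cn:   copy-number term at the locus (1.0 assumed for a diploid locus)
    """
    if not 0.0 < f <= 1.0:
        raise ValueError("f must be in (0, 1]")
    if cn <= 0.0:
        raise ValueError("cn must be positive")
    d = m_bc - (m_bc - m_p) / (f * cn)
    # Clamping is an assumption of this sketch: sampling noise can push
    # the raw estimate outside the valid range of a proportion.
    return min(1.0, max(0.0, d))
```

For example, with a second-tissue density of 0.8, a mixture density of 0.7, and a first-tissue fraction of 0.2, the sketch returns approximately 0.3, which is the value that reproduces the mixture as 0.8 × 0.8 + 0.2 × 0.3.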
202. The system of claim 189, wherein the obtained second methylation profile corresponds to an average methylation profile obtained from a set of control samples.
203. The system of claim 189, wherein determining the mixture methylation density of the cell-free methylation profile at a first locus of the plurality of loci comprises:
means for determining a first number of the cell-free DNA molecules from the first locus;
means for determining whether each DNA molecule of the first number is methylated at one or more sites to obtain a second number of methylated DNA molecules; and
means for calculating the mixture methylation density from the first number and the second number.
204. The system of claim 203, wherein the one or more sites are CpG sites.
205. The system of claim 189, wherein the means for determining the cell-free methylation profile comprises:
means for treating the cell-free DNA molecules with sodium bisulfite; and
means for sequencing the treated DNA molecules.
206. The system of claim 205, wherein the treatment of the cell-free DNA molecules is part of a Tet-assisted bisulfite conversion or includes treating the cell-free DNA molecules with potassium perruthenate (KRuO4).
207. The system of claim 189, wherein the means for determining the cell-free methylation profile comprises:
means for performing single-molecule sequencing of the DNA, wherein the single-molecule sequencing includes determining whether at least one site of a DNA molecule is methylated.
208. A system for determining a first methylation profile from a biological sample of an organism, the biological sample including cell-free DNA comprising a mixture of cell-free DNA molecules originating from a first tissue and from a second tissue, the system comprising:
means for analyzing a plurality of DNA molecules from the biological sample, wherein analyzing a DNA molecule of the plurality of DNA molecules includes:
means for determining a location of the DNA molecule in a genome of the organism;
means for determining a genotype of the DNA molecule; and
means for determining whether the DNA molecule is methylated at one or more sites;
means for identifying a plurality of first loci at which a first genome of the first tissue is heterozygous for a respective first allele and a respective second allele and a second genome of the second tissue is homozygous for the respective first allele;
for each of the first loci:
for each of one or more sites associated with the locus:
means for determining a number of DNA molecules that are methylated at the site and correspond to the respective second allele of the locus;
means for calculating a methylation density using the numbers of DNA molecules that are methylated at the one or more sites of the locus and correspond to the respective second allele of the locus; and
means for generating the first methylation profile of the first tissue from the methylation densities of the first loci, wherein the first methylation density at each locus is the proportion of DNA molecules of the first tissue that are methylated at the locus.
209. The system of claim 208, wherein the biological sample is from a female individual pregnant with a fetus, and wherein the first tissue is from the fetus or a placenta and the second tissue corresponds to the female individual.
210. The system of claim 208, wherein the sites correspond to CpG sites.
211. The system of claim 208, wherein each locus includes at least one CpG site.
212. The system of claim 208, wherein each locus is adjacent to at least one CpG site.
213. A system for detecting a chromosomal abnormality from a biological sample of an organism, the biological sample including cell-free DNA comprising a mixture of cell-free DNA molecules originating from a first tissue and from a second tissue, the system comprising:
means for analyzing a plurality of DNA molecules from the biological sample, the analysis being of sequence reads obtained from massively parallel sequencing, wherein analyzing a DNA molecule of the plurality of DNA molecules includes:
means for determining a location of the DNA molecule in a reference genome; and
means for determining whether the DNA molecule is methylated at one or more sites;
for each of a plurality of sites:
means for determining, using the sequence reads, a respective number of DNA molecules that are methylated at the site;
means for calculating a first methylation level of a first chromosomal region using the sum of the respective numbers of DNA molecules methylated at sites within the first chromosomal region, the first methylation level being the proportion of DNA molecules that are methylated at sites within the first chromosomal region;
means for comparing the first methylation level with a first threshold; and
means for determining, based on the comparison, a classification of an abnormality of the first chromosomal region in the first tissue.
214. The system of claim 213, wherein the means for comparing the first methylation level with the threshold comprises:
means for normalizing the first methylation level and comparing the normalized first methylation level with the threshold.
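The region-level calculation in claims 213 and 214 can be illustrated with a short Python sketch: per-site counts of methylated and unmethylated molecules are pooled into one methylation level for a chromosomal region, which is then normalized and compared with a threshold. Normalizing by the ratio to a reference region's level, treating any deviation of that ratio from 1 beyond the threshold as abnormal, and all names used here are assumptions of this sketch rather than requirements of the claims.

```python
from typing import Sequence, Tuple

def region_methylation_level(site_counts: Sequence[Tuple[int, int]]) -> float:
    """Pool (methylated, unmethylated) molecule counts over the sites of a region."""
    methylated = sum(m for m, _ in site_counts)
    total = sum(m + u for m, u in site_counts)
    if total == 0:
        raise ValueError("no DNA molecules cover the region")
    return methylated / total

def classify_region(level: float, reference_level: float, threshold: float) -> str:
    """Hypothetical rule: flag the region when the normalized level deviates from 1."""
    if reference_level <= 0.0:
        raise ValueError("reference_level must be positive")
    normalized = level / reference_level  # assumed form of normalization
    return "abnormal" if abs(normalized - 1.0) > threshold else "normal"
```

A two-sided rule is used only to keep the example short; a practical classifier would presumably pick the direction of change expected for the abnormality under test.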
215. The system of claim 214, wherein the normalization uses a second methylation level of a second chromosomal region.
216. The system of claim 214, wherein the normalization uses a fractional concentration of cell-free DNA from the first tissue.
217. The system of claim 213, wherein the means for calculating the first methylation level comprises:
means for identifying a plurality of regions of the genome of the organism;
means for identifying one or more sites within each of the plurality of regions; and
means for calculating a region methylation level for each of the plurality of regions to produce region methylation levels, wherein the first methylation level is for a first region, and
wherein the means for comparing the first methylation level with a first threshold comprises:
means for comparing each of the region methylation levels with a corresponding region threshold;
means for determining a first number of regions whose region methylation level exceeds the corresponding region threshold; and
means for comparing the first number with a threshold to determine the classification.
218. The system of claim 217, wherein the threshold is a percentage, and wherein the means for comparing the first number with the threshold comprises:
means for dividing the first number of regions by a second number of regions before comparing with the threshold.
219. The system of claim 218, wherein the second number of regions is all of the identified plurality of regions.
220. The system of claim 217, wherein the corresponding region threshold is a specified amount from a reference methylation level.
221. The system of claim 213, wherein the biological sample is from a female individual pregnant with a fetus, wherein the first tissue is from the fetus or the placenta and the second tissue is from the female individual, and wherein the chromosomal abnormality is a fetal chromosomal abnormality.
222. The system of claim 221, wherein the threshold is based on a background methylation level associated with DNA of the second tissue from the female individual.
223. The system of claim 221, wherein the threshold is determined for the first chromosomal region from other pregnant female individuals carrying fetuses without chromosomal abnormalities.
224. The system of claim 221, wherein the chromosomal abnormality is trisomy 21, trisomy 18, trisomy 13, Turner syndrome, or Klinefelter syndrome.
225. The system of claim 213, wherein the first tissue is from a tumor within the organism and the second tissue is from non-malignant tissue of the organism.
226. The system of claim 213, wherein the threshold is determined from other organisms that do not have cancer.
227. The system of claim 213, wherein the threshold is based on a concentration of cell-free DNA derived from the first tissue in the biological sample.
228. The system of claim 227, wherein the threshold is based on a scaling factor corresponding to a type of abnormality, wherein the type of abnormality is a deletion or a duplication.
229. The system of claim 213, wherein the means for determining the location of the DNA molecule in a reference genome comprises means for determining whether the location is within the first chromosomal region.
230. The system of claim 229, wherein the means for determining the location of the DNA molecule in the reference genome is implemented by means for determining whether the DNA molecule aligns to the first chromosomal region.
231. The system of claim 213, wherein the chromosomal abnormality is a subchromosomal deletion, a subchromosomal duplication, or DiGeorge syndrome.
232. The system of any one of claims 119-231, wherein the biological sample is harvested by a process that obtains cell-free nucleic acid molecules in the biological sample.
233. The system of claim 232, wherein the biological sample is selected from plasma and serum.
234. The system of claim 232, wherein the biological sample is harvested by centrifugation.
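The region-counting classification recited in claims 217 to 220 can be illustrated with a minimal Python sketch. The function name `classify_by_region_count`, the dictionary inputs, the example values, and the reading of "exceeds" as a deviation of more than a specified amount in either direction from a reference level are all assumptions made for illustration; the claims do not prescribe a particular implementation.

```python
from typing import Dict, Tuple


def classify_by_region_count(
    region_levels: Dict[str, float],
    reference_levels: Dict[str, float],
    specified_amount: float,
    percentage_threshold: float,
) -> Tuple[int, float, bool]:
    """Return (first number, percentage of regions, classification flag).

    Assumption: a region "exceeds" its region threshold when its methylation
    level differs from the reference level by more than `specified_amount`.
    """
    if not region_levels:
        raise ValueError("at least one region is required")

    # First number: regions whose level exceeds the region threshold.
    first_number = sum(
        1
        for region, level in region_levels.items()
        if abs(level - reference_levels[region]) > specified_amount
    )

    # Second number: all identified regions (as in claim 219).
    second_number = len(region_levels)

    # Divide before comparing with a percentage threshold (as in claim 218).
    percentage = 100.0 * first_number / second_number
    return first_number, percentage, percentage > percentage_threshold


# Hypothetical example values, not data from the patent.
levels = {"bin1": 0.72, "bin2": 0.41, "bin3": 0.70, "bin4": 0.38}
reference = {"bin1": 0.71, "bin2": 0.69, "bin3": 0.70, "bin4": 0.68}
count, pct, abnormal = classify_by_region_count(levels, reference, 0.10, 25.0)
print(count, pct, abnormal)
```

In this hypothetical input, two of the four regions deviate from their reference level by more than 0.10, so half of the regions are counted and the 25 percent threshold is passed. A one-sided test (for example, hypomethylation only) would need only the `abs(...)` comparison changed.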
HK15107703.6A 2012-09-20 2013-09-20 Non-invasive determination of methylome of fetus or tumor from plasma HK1207124B (en)

Applications Claiming Priority (7)

Application Number Priority Date Filing Date Title
US201261703512P 2012-09-20 2012-09-20
US61/703,512 2012-09-20
US13/842,209 US9732390B2 (en) 2012-09-20 2013-03-15 Non-invasive determination of methylome of fetus or tumor from plasma
US13/842,209 2013-03-15
US201361830571P 2013-06-03 2013-06-03
US61/830,571 2013-06-03
PCT/AU2013/001088 WO2014043763A1 (en) 2012-09-20 2013-09-20 Non-invasive determination of methylome of fetus or tumor from plasma

Publications (2)

Publication Number Publication Date
HK1207124A1 HK1207124A1 (en) 2016-01-22
HK1207124B HK1207124B (en) 2024-07-05

Similar Documents

Publication Publication Date Title
JP7594817B2 (en) Non-invasive determination of fetal or tumor methylome from plasma
US20250349386A1 (en) Non-invasive detection of tissue abnormality using methylation
HK40055436A (en) Non-invasive determination of methylome of tumor from plasma
HK40013800B (en) Non-invasive determination of methylome of tumor from plasma
HK1207124B (en) Non-invasive determination of methylome of fetus or tumor from plasma
HK1257868B (en) Non-invasive determination of methylome of tumor from plasma