WO2012079486A1 - Method of preparing dna sample for sequencing and use thereof - Google Patents

Info

Publication number
WO2012079486A1
Authority
WO
WIPO (PCT)
Prior art keywords
dna
sequencing
fragment
unit
dna fragment
Prior art date
Application number
PCT/CN2011/083726
Other languages
French (fr)
Chinese (zh)
Inventor
吴逵
阿叁
耿春雨
张秀清
杨焕明
Original Assignee
深圳华大基因科技有限公司
深圳华大基因研究院
Priority date
Filing date
Publication date
Application filed by 深圳华大基因科技有限公司, 深圳华大基因研究院
Publication of WO2012079486A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806: Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay

Abstract

Provided are a method of preparing a DNA sample for sequencing, a method of constructing a DNA sequencing library, a DNA sequencing library, a method of DNA sequencing, a method of determining the sequence information of genomic DNA, a device for preparing a DNA sample for sequencing, and a DNA sequencing system. The method of preparing a DNA sample for sequencing includes the following steps: fragmenting the genomic DNA to obtain first DNA fragments; end-repairing the first DNA fragments to obtain end-repaired DNA fragments; linking capture labels to the end-repaired DNA fragments to obtain DNA fragments carrying the capture labels; circularizing the DNA fragments carrying the capture labels to obtain circular DNA; fragmenting the circular DNA to obtain second DNA fragments; and screening the second DNA fragments to obtain target fragments, which form the DNA sample for sequencing.

Description

Method for preparing a DNA sample for sequencing and use thereof
Priority information
This application claims priority to and the benefit of Chinese patent application No. 201010591448.7, filed with the State Intellectual Property Office of China on December 16, 2010, the entire contents of which are incorporated herein by reference.
Technical field
The present invention relates to the field of biotechnology, specifically to DNA sequencing technology, and in particular to a method for preparing a DNA sample for sequencing and its use. More specifically, the present invention provides a method of preparing a DNA sample for sequencing, a method of constructing a DNA sequencing library, a DNA sequencing library, a method of DNA sequencing, a method of determining genomic DNA sequence information, an apparatus for preparing a DNA sample for sequencing, and a DNA sequencing system.
Background
The genome of an organism is the sum of the genetic information it carries. Sequencing and analyzing a genome makes it possible to study, at the whole-genome level, which genes are present, their structure and function, and the relationships between them, providing important data for basic biological research such as gene expression and regulation. It is also of great significance for applied research such as disease diagnostics and gene therapy.
Shotgun sequencing is currently a commonly used genome sequencing method. At present, its main steps are: fragmenting the genomic DNA to obtain DNA fragments, then performing high-throughput sequencing on these DNA fragments to obtain their sequence information; sequencing long fragments of more than 1 kb, that is, constructing a paired-end sequencing library of the long fragments and sequencing it to obtain the sequence information of the two ends of each long fragment; and assembling the genome from the end sequence information of the long fragments together with the sequence information of the DNA fragments, so as to obtain the sequence of the whole genome. The sequence information from the two ends of the long fragments makes it possible to assemble the short sequences, that is, the contigs of DNA fragment sequences, into larger scaffolds, which is a key breakthrough for assembling relatively large and complex genomes such as those of human or Drosophila (see Myers EW, et al: A whole-genome assembly of Drosophila. Science 2000, 287(5461):2196-2204, which is incorporated herein by reference in its entirety). Whether a paired-end sequencing library can be successfully constructed from fragments with a large span is therefore critical to this genome sequencing method.
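To illustrate why end sequences of long fragments help assembly, the following is a minimal Python sketch, not taken from the patent, of how read pairs whose two ends map to different contigs can be counted as evidence that the contigs belong to the same scaffold. The function name `link_contigs`, the `min_support` threshold, and the example contig names are hypothetical.

```python
from collections import Counter

def link_contigs(mate_pairs, min_support=2):
    """Count read pairs whose two end reads map to different contigs.

    mate_pairs: iterable of (contig_of_read1, contig_of_read2) tuples.
    Returns the contig pairs supported by at least min_support read pairs.
    """
    support = Counter()
    for c1, c2 in mate_pairs:
        if c1 != c2:  # pairs inside one contig carry no linking information
            support[tuple(sorted((c1, c2)))] += 1
    return {pair: n for pair, n in support.items() if n >= min_support}

# Hypothetical alignment results: each tuple names the contigs hit by the
# two end reads of one long fragment.
pairs = [
    ("contigA", "contigB"), ("contigB", "contigA"),
    ("contigB", "contigC"), ("contigA", "contigA"),
    ("contigB", "contigC"), ("contigA", "contigC"),
]
links = link_contigs(pairs)
print(links)
```

Real scaffolders also use read orientation and the expected fragment span to order and space the contigs; this sketch only shows the linking step.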
However, current methods of constructing paired-end sequencing libraries from long fragments (sometimes also called "fragments with a large span") still need improvement.
Summary of the invention
The present invention is based on the following findings of the inventors:
Current methods of constructing paired-end sequencing libraries from long fragments are time-consuming and costly, are difficult to apply to fragments of 20 kb or even 50 kb in length, and yield sequencing data that are hard to process, all of which hinders large-scale library preparation and sequencing.
The present invention aims to solve at least one of the technical problems existing in the prior art. To this end, the present invention provides a method of preparing a DNA sample for sequencing and uses thereof.
According to one aspect of the invention, the invention provides a method of preparing a DNA sample for sequencing. According to an embodiment of the invention, the method comprises the following steps: fragmenting genomic DNA to obtain first DNA fragments; end-repairing the first DNA fragments to obtain end-repaired DNA fragments; attaching a capture label to the end-repaired DNA fragments to obtain DNA fragments carrying the capture label; circularizing the DNA fragments carrying the capture label to obtain circular DNA; fragmenting the circular DNA to obtain second DNA fragments; and screening the second DNA fragments to obtain target fragments, which constitute the DNA sample for sequencing. The target fragments contain the end sequences of the first DNA fragments. According to an embodiment of the invention, this method makes it possible to prepare a DNA sample for sequencing from genomic DNA conveniently and effectively, and the resulting DNA sample can be used to construct a DNA sequencing library, so that a paired-end sequencing library of the first DNA fragments can be obtained. High-throughput sequencing can then accurately determine the end sequence information of the first DNA fragments, that is, of the long fragments of the genomic DNA, and this information can be used to assist genome assembly, so that the sequencing data of the genome can be assembled into complete genomic sequence information.
According to yet another aspect of the invention, the invention provides a method of constructing a DNA sequencing library. According to an embodiment of the invention, the method comprises the following steps: preparing target fragments by the method of preparing a DNA sample for sequencing according to an embodiment of the invention; end-repairing the target fragments and adding a base A to their 3' ends to obtain A-tailed target fragments; ligating the A-tailed target fragments to adapters to obtain ligation products; amplifying the ligation products by PCR to obtain amplification products; and isolating and purifying the amplification products, which constitute the DNA sequencing library. The inventors surprisingly found that constructing a DNA sequencing library by this method is simple, fast, efficient, and reproducible, and that the resulting library is of very good quality. According to an embodiment of the invention, a DNA sequencing library constructed by this method can be used on a high-throughput sequencing platform such as the Illumina sequencing platform, the sequence information of the library can be accurately determined from the sequencing results, and that sequence information can effectively assist genome assembly.
According to another aspect of the invention, the invention provides a DNA sequencing library. According to an embodiment of the invention, the DNA sequencing library is constructed by the method of constructing a DNA sequencing library according to an embodiment of the invention. According to an embodiment of the invention, the DNA sequencing library can be used on a high-throughput sequencing platform such as the Illumina sequencing platform, and the sequence information of the library determined from the sequencing results can effectively assist genome assembly.
According to still another aspect of the invention, the invention provides a method of DNA sequencing. According to an embodiment of the invention, the method comprises: constructing a DNA sequencing library of genomic DNA by the method of constructing a DNA sequencing library according to an embodiment of the invention; and sequencing the DNA sequencing library of the genomic DNA to obtain a sequencing result. The inventors found that this method can sequence genomic DNA reproducibly, and that the sequencing results are accurate and valid and can be used to assist genome assembly, and hence in the subsequent determination of the sequence information of the genomic DNA.
According to another aspect of the invention, the invention provides a method of determining genomic DNA sequence information. According to an embodiment of the invention, the method comprises the following steps: dividing the genomic DNA into a first genomic DNA sample and a second genomic DNA sample; sequencing the first genomic DNA sample by the method of DNA sequencing according to an embodiment of the invention, and determining partial sequence information of the genomic DNA from the sequencing result; sequencing the second genomic DNA sample by a conventional sequencing method to obtain sequencing data of the genomic DNA, wherein the conventional sequencing method is at least one selected from SOLEXA, SOLiD, 454, and single-molecule sequencing technologies; and assembling and splicing the partial sequence information of the genomic DNA with the sequencing data of the genomic DNA to determine the sequence information of the genomic DNA. The inventors found that this method can effectively determine the sequence information of genomic DNA, with good reproducibility and accurate, reliable results.
According to yet another aspect of the invention, the invention provides an apparatus for preparing a DNA sample for sequencing. According to an embodiment of the invention, the apparatus comprises: a first fragmentation unit for fragmenting genomic DNA to obtain first DNA fragments; an end-repair unit, connected to the first fragmentation unit, for end-repairing the first DNA fragments to obtain end-repaired DNA fragments; a labeling unit, connected to the end-repair unit, for attaching a capture label to the end-repaired DNA fragments to obtain DNA fragments carrying the capture label; a circularization unit, connected to the labeling unit, for circularizing the DNA fragments carrying the capture label to obtain circular DNA; a second fragmentation unit, connected to the circularization unit, for fragmenting the circular DNA to obtain second DNA fragments; and a screening unit, connected to the second fragmentation unit, for screening the second DNA fragments to obtain target fragments, which constitute the DNA sample for sequencing. The inventors surprisingly found that the apparatus according to an embodiment of the invention can prepare DNA samples for sequencing conveniently, effectively, and reproducibly, and that the resulting DNA samples can be used to prepare DNA sequencing libraries and thus be used successfully for high-throughput sequencing.
According to still another aspect of the invention, the invention provides a DNA sequencing system. According to an embodiment of the invention, the system comprises: a sample preparation device, which is the apparatus for preparing a DNA sample for sequencing according to an embodiment of the invention and is used to prepare target fragments, which constitute the DNA sample for sequencing; a library construction device, connected to the sample preparation device, for constructing a sequencing library from the DNA sample for sequencing; and a sequencing device, connected to the library construction device, for sequencing the sequencing library. According to an embodiment of the invention, the system can effectively sequence genomic DNA samples; it is convenient and fast to operate, takes little time, is reproducible, and gives accurate and reliable sequencing results.
Additional aspects and advantages of the invention will be set forth in part in the description below, will in part become apparent from that description, or will be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the invention will become apparent and readily understood from the following description of the embodiments taken together with the drawings, in which:
Figure 1 shows a schematic flow diagram of a method of constructing a DNA sequencing library according to one embodiment of the invention.
Figure 2 shows an electrophoretogram of the DNA fragments obtained in Example 1 of the invention by shearing penguin genomic DNA with a standard HydroShear instrument using different shearing parameters.
Figure 3 shows an electrophoretogram of the 40-45 kb biotin-labeled DNA fragments isolated by electrophoresis in Example 1 of the invention.
Figure 4 shows the result of verifying the insert range by aligning the paired-end sequences obtained in Example 1 of the invention to the penguin genome.
Figure 5 shows the result of verifying the insert range by aligning the paired-end sequences obtained in Example 2 of the invention to the Prunus mume (mei) genome.
Figure 6 shows the result of verifying the insert range by aligning the paired-end sequences obtained in Example 3 of the invention to the human genome.
Figure 7 shows a schematic diagram of an apparatus for preparing a DNA sample for sequencing according to one embodiment of the invention.
Figure 8 shows a DNA sequencing system according to one embodiment of the invention.
Detailed description of the invention
Embodiments of the invention are described in detail below, and examples of the embodiments are shown in the drawings. The embodiments described with reference to the drawings are exemplary; they serve only to explain the invention and are not to be construed as limiting it.
It should be noted that the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or as implicitly specifying the number of the technical features referred to. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include one or more of that feature. Further, in the description of the invention, "multiple" means two or more unless otherwise stated.
Method of preparing a DNA sample for sequencing
According to one aspect of the invention, the invention provides a method of preparing a DNA sample for sequencing. According to an embodiment of the invention, the method comprises the following steps:
First, the genomic DNA is fragmented to obtain first DNA fragments. According to an embodiment of the invention, the method may further include, before fragmenting the genomic DNA, a step of extracting the genomic DNA from a sample. The source of the sample is not particularly limited. According to some specific examples of the invention, the sample may be derived from at least one of an animal, a plant, and a microorganism. According to some embodiments, the animal may be at least one of human and penguin. According to one embodiment, the plant may be Prunus mume (mei). The method of fragmenting the genomic DNA is not particularly limited. According to some specific examples, the genomic DNA may be fragmented by at least one method selected from nebulization, sonication, HydroShear, and enzymatic digestion. According to one embodiment, the genomic DNA may be fragmented with a HydroShear instrument. According to an embodiment of the invention, after the first DNA fragments are obtained and before they are end-repaired, the method may further include performing fragment selection on the first DNA fragments. The length of the first DNA fragments is not particularly limited; preferably it is 20-50 kb, and more preferably 25-50 kb.
Second, the first DNA fragments are end-repaired to obtain end-repaired DNA fragments. According to an embodiment of the invention, the first DNA fragments may be end-repaired with Klenow fragment, T4 DNA polymerase, and T4 polynucleotide kinase, where the Klenow fragment has 5'→3' polymerase activity and 3'→5' exonuclease activity but lacks 5'→3' exonuclease activity.
Next, a capture label is attached to the end-repaired DNA fragments to obtain DNA fragments carrying the capture label. The term "capture label" as used herein refers to a label that, once attached to DNA fragments, allows the labeled fragments to be captured and screened effectively by means of a property of the label, for example with a reagent that binds the label specifically. According to an embodiment of the invention, the type of capture label is not particularly limited. According to one embodiment, the capture label is biotin; labeling the DNA fragments with biotin allows a reagent that binds biotin specifically, such as magnetic beads carrying streptavidin, to be used in subsequent operations to capture and screen the biotin-labeled DNA fragments effectively. The inventors also found that introducing biotin into the DNA fragments does not affect subsequent processing, which improves the efficiency of sequencing library construction and in turn the efficiency and accuracy of sequencing. According to an embodiment of the invention, the capture label may be attached using Klenow fragment, T4 DNA polymerase, and T4 polynucleotide kinase, where the Klenow fragment has 5'→3' polymerase activity and 3'→5' exonuclease activity but lacks 5'→3' exonuclease activity. According to an embodiment of the invention, after the DNA fragments carrying the capture label are obtained and before they are circularized, the method may further include performing fragment selection on the DNA fragments carrying the capture label. The method of fragment selection is not particularly limited. According to some specific examples, fragment selection may be performed by electrophoresis on 0.6% agarose. According to some embodiments, the DNA fragments carrying the capture label may be 20-50 kb in length.
Next, the DNA fragments carrying the capture label are circularized to obtain circular DNA. According to an embodiment of the invention, the labeled DNA fragments may be circularized with T4 DNA ligase and T3 DNA ligase. According to some embodiments, circularization may be followed by a step of removing DNA fragments that have not circularized. The method of removing uncircularized DNA fragments is not particularly limited. According to some specific examples, they may be removed with at least one enzyme selected from a DNase and an exonuclease. According to an embodiment of the invention, the DNase is an ATP-dependent DNase that does not degrade plasmids, and the exonuclease is exonuclease I.
Then, the circular DNA is fragmented to obtain second DNA fragments. According to an embodiment of the invention, the method of fragmenting the circular DNA is not particularly limited. According to some specific examples, the circular DNA may be fragmented by at least one method selected from nebulization, sonication, HydroShear, and enzymatic digestion. According to one embodiment, the circular DNA may be fragmented with a Covaris ultrasonicator. According to an embodiment of the invention, the second DNA fragments may be 100-1000 bp in length. According to some specific examples, the second DNA fragments may be 200-800 bp in length.
Finally, the second DNA fragments are screened to obtain target fragments, which constitute the DNA sample for sequencing. According to an embodiment of the invention, the method of screening the second DNA fragments is not particularly limited. According to some embodiments, the second DNA fragments are screened by magnetic bead capture, where the magnetic beads carry a molecular entity that specifically recognizes the capture label. According to one embodiment, the capture label is biotin and the molecular entity carried on the magnetic beads is streptavidin.
According to an embodiment of the invention, this method makes it possible to prepare a DNA sample for sequencing from genomic DNA conveniently and effectively, and the resulting DNA sample can be used to construct a DNA sequencing library of the genomic DNA (also called a paired-end sequencing library, where "end" herein refers to the two ends of a first DNA fragment). High-throughput sequencing can then accurately determine partial sequence information of the genomic DNA (herein, the end sequence information of the first DNA fragments), which can be used to assist genome assembly.
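The steps above can be sketched in silico to show why the captured target fragments contain both ends of a first DNA fragment. This is a toy Python model under stated assumptions, not the patent's protocol: the label is represented by a single `*` character at the ligation junction, shearing is modeled as cutting at regular intervals, and the 40 kb fragment length, 500 bp piece length, and all function names are illustrative choices within the ranges given in the text.

```python
import random

JUNCTION = "*"  # stands in for the capture label (e.g. biotin) at the joined ends

def circularize(fragment):
    """Join the two ends of a linear fragment; the label marks the junction.

    The circle is written starting at the junction, so the label is followed
    by the start of the fragment and (cyclically) preceded by its end.
    """
    return JUNCTION + fragment

def shear_circle(circle, piece_len, offset):
    """Cut a circular sequence into consecutive pieces, first cut at offset."""
    rotated = circle[offset:] + circle[:offset]
    return [rotated[i:i + piece_len] for i in range(0, len(rotated), piece_len)]

def capture(pieces):
    """Keep only pieces carrying the label (bead capture in the real method)."""
    return [p for p in pieces if JUNCTION in p]

rng = random.Random(0)
first_fragment = "".join(rng.choice("ACGT") for _ in range(40_000))

pieces = shear_circle(circularize(first_fragment), piece_len=500, offset=39_800)
targets = capture(pieces)

# The single captured piece holds the tail and the head of the long fragment.
tail, head = targets[0].split(JUNCTION)
print(len(pieces), len(targets), len(tail), len(head))
```

In this model only one piece out of roughly eighty carries the junction, which is why a capture step is needed to enrich the informative fragments before library construction.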
DNA sequencing library and method of constructing it
According to yet another aspect of the invention, the invention provides a method of constructing a DNA sequencing library. According to an embodiment of the invention, and with reference to Figure 1, the method comprises the following steps:
First, target fragments are prepared by the method of preparing a DNA sample for sequencing according to an embodiment of the invention.
Second, the target fragments are end-repaired and a base A is added to their 3' ends to obtain A-tailed target fragments. According to an embodiment of the invention, the method of end-repairing the target fragments is not particularly limited. According to one embodiment, the target fragments may be end-repaired with Klenow fragment, T4 DNA polymerase, and T4 polynucleotide kinase, where the Klenow fragment has 5'→3' polymerase activity and 3'→5' exonuclease activity but lacks 5'→3' exonuclease activity. The method of adding a base A to the 3' ends of the target fragments is likewise not particularly limited. According to one embodiment, the base A may be added to the 3' ends of the target fragments with Klenow (3'→5' exo-).
接着, 将末端添加碱基 A的目的片段与接头相连, 以便获得连接产物。 根据本发明的 实施例, 将末端添加碱基 A的目的片段与接头相连的方法不受特别限制。 根据本发明的一 个实施例, 将末端添加碱基 A的目的片段与接头相连是利用 T4 DNA连接酶进行的。  Next, the target fragment to which the base A is added at the end is attached to a linker to obtain a ligation product. According to an embodiment of the present invention, the method of attaching the target fragment to which the terminal A is added to the terminal is not particularly limited. According to one embodiment of the present invention, the target fragment to which the terminal A is added at the end is linked to a linker by using T4 DNA ligase.
接下来, 将连接产物进行 PCR扩增, 以便获得扩增产物。  Next, the ligation product is subjected to PCR amplification to obtain an amplification product.
然后, 分离纯化扩增产物, 该扩增产物构成 DNA测序文库。 根据本发明的实施例, 分 离纯化扩增产物的方法不受特别限制。 根据本发明的一个实施例, 可以利用 2%琼脂糖凝胶 电泳分离纯化扩增产物。  Then, the amplified product is isolated and purified, and the amplified product constitutes a DNA sequencing library. According to an embodiment of the present invention, the method of separating and purifying the amplification product is not particularly limited. According to one embodiment of the present invention, the amplified product can be isolated and purified by 2% agarose gel electrophoresis.
The method of constructing a DNA sequencing library according to embodiments of the present invention builds a library simply, quickly and efficiently, with good reproducibility and very good library quality. The inventors surprisingly found that a library constructed by this method can be used effectively on high-throughput sequencing platforms such as the Illumina platform, that the sequence information of the library can be determined accurately from the sequencing results, and that this sequence information effectively assists genome assembly.
Specifically, according to embodiments of the present invention, the method of constructing a DNA sequencing library may comprise the following steps:
1) randomly fragmenting the sample genomic DNA into DNA fragments of 20-50 kb;
2) performing step A or step B below:
A. blunting both ends of the fragmented DNA and adding a capture label, then isolating first DNA fragments of 20-50 kb; or
B. isolating the fragmented DNA of 20-50 kb, then blunting both ends of the DNA fragments and adding a capture label;
3) circularizing the isolated DNA fragments to obtain circular DNA, and removing the uncircularized DNA fragments;
4) fragmenting the circular DNA into DNA fragments of 100-2000 bp;
5) isolating the DNA fragments carrying the capture label from the DNA fragments obtained in step 4), to obtain captured fragments;
preferably, the method may further comprise
6) blunting the ends of the captured fragments;
preferably, the method may further comprise
7) adding base A to the ends of the blunted DNA fragments from step 6) and ligating sequencing adaptors, to obtain ligation products;
preferably, the method may further comprise
8) amplifying the ligation products obtained in step 7) by PCR.
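The size windows named in steps 1) and 4) lend themselves to a simple quality-control check. The following is a minimal Python sketch, not part of the disclosed method: the stage names, `SIZE_WINDOWS_BP` and both helper functions are illustrative assumptions, and only the numeric windows (20-50 kb and 100-2000 bp) come from the steps above.

```python
from typing import Sequence

# Hypothetical QC helper; only the size windows come from steps 1) and 4).
SIZE_WINDOWS_BP = {
    "first_fragmentation": (20_000, 50_000),  # step 1): 20-50 kb
    "second_fragmentation": (100, 2_000),     # step 4): 100-2000 bp
}


def in_window(stage: str, size_bp: int) -> bool:
    """Return True if a fragment size lies inside the window for the stage."""
    low, high = SIZE_WINDOWS_BP[stage]
    return low <= size_bp <= high


def fraction_in_window(stage: str, sizes_bp: Sequence[int]) -> float:
    """Fraction of measured fragment sizes that fall inside the window."""
    if not sizes_bp:
        raise ValueError("no fragment sizes supplied")
    return sum(in_window(stage, s) for s in sizes_bp) / len(sizes_bp)


if __name__ == "__main__":
    # Made-up example sizes, for illustration only.
    sizes = [15_000, 25_000, 40_000, 60_000]
    print(fraction_in_window("first_fragmentation", sizes))
```

In practice the sizes would be estimated from a gel or a fragment analyzer trace; the list here is invented purely to exercise the functions.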
According to embodiments of the invention, in step 1) the genomic DNA may be fragmented into DNA fragments of 25-50 kb. According to specific examples, it may be fragmented into DNA fragments of 20-40 kb, 30-50 kb, 35-50 kb, 40-50 kb, or 40-45 kb. According to embodiments of the invention, the sample genomic DNA may be genomic DNA of any species, including but not limited to mammals, birds, or plants (such as dicotyledons); specifically including Primates, Sphenisciformes, or Rosales; and more specifically including Hominidae, Spheniscidae, or Rosaceae (such as the genus Prunus). According to one embodiment, the sample genomic DNA may be that of a human, a penguin (for example the Adélie penguin, Pygoscelis adeliae), or a mei (for example wild mei, Prunus mume). According to embodiments of the invention, the genomic DNA may be fragmented by a physical method, for example nebulization, sonication, or a HydroShear instrument, into fragments of 20-50 kb. Preferably, a HydroShear instrument is used: by adjusting the flow rate through the contraction orifice and the diameter of the orifice, the size of the sheared fragments can be controlled, and the genomic DNA is sheared into fragments of fairly uniform size. According to one embodiment, a HydroShear instrument may be used with the large-fragment shearing assembly, the speed parameter set to 14-16 and the number of cycles set to 30-40 (chosen according to the desired fragment size), which raises the size range of the sheared genomic DNA fragments to 20-50 kb.
According to embodiments of the invention, the isolation in step 2) may be performed by gel electrophoresis, specifically agarose gel electrophoresis. Either ordinary agarose gel electrophoresis or pulsed-field gel electrophoresis may be used, followed by gel excision and recovery to isolate and purify DNA fragments of the target size. According to some embodiments, the capture label may be biotin, in which case the isolation in step 5) may be performed with streptavidin-coated magnetic beads. According to embodiments of the invention, steps 2) and 5) may also be performed with other binding systems that work like an antibody-antigen reaction.
Because physically sheared DNA fragments may have 5' or 3' overhangs, their ends must be blunted. Therefore, according to embodiments of the invention, in step 2) the ends may be filled in with polymerases such as Klenow large fragment and T4 DNA polymerase, together with T4 polynucleotide kinase and dNTPs, to produce blunt-ended DNA. T4 DNA polymerase blunts 3' overhangs and fills in 5' overhangs; Klenow large fragment fills in 5' overhangs or removes 3' overhangs; and T4 polynucleotide kinase phosphorylates the 5' ends and removes 3' phosphate groups so that ligation can proceed.
According to embodiments of the invention, in step 2), after the ends are blunted, the blunt-ended DNA fragments may be labeled with biotin. The labeling reaction system and conditions are similar to those of the end-blunting reaction, except that the ordinary dNTPs are replaced with a mixture of biotin-dNTP and ordinary dNTPs. The 3'-5' exonuclease activity and 5'-3' polymerase activity of Klenow large fragment and T4 DNA polymerase then drive a replacement reaction at the 3' ends of the DNA fragments, in which ordinary nucleotides are replaced by biotinylated nucleotides, so that the fragments are biotin-labeled while remaining blunt-ended. According to specific examples, the ends may also be blunted directly with biotin-labeled nucleotides. All of these methods of biotin-labeling blunt-ended DNA fragments are within the knowledge and skill of those skilled in the art.
According to embodiments of the invention, in step 3) the isolated DNA fragments of the target size are circularized. T4 DNA ligase and T3 DNA ligase may be used in combination to join the two ends of each target fragment so that it forms a circle. According to some specific examples, T4 DNA ligase or T3 DNA ligase may also be used alone. According to embodiments of the invention, circularization with T3 plus T4 DNA ligase, with T3 DNA ligase alone, and with T4 DNA ligase alone was compared in a PEG-containing ligation buffer incubated at 16°C for 16 hours. The results showed that, relative to either ligase alone, the combination raised the circularization efficiency (the proportion of fragmented linear DNA that self-ligates into circular DNA) from 1%-3% to 5%-10%. The combined use of T3 DNA ligase and T4 DNA ligase is therefore preferred for circularization.
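To put the reported efficiencies in perspective, the arithmetic can be sketched in Python. This is an illustration only: the function name and the 1000 ng input are assumptions, and the sketch treats the efficiency as a mass fraction, which holds only if fragment sizes are roughly uniform. The percentage ranges are the ones stated above.

```python
def circular_yield_ng(input_ng: float, efficiency_percent: float) -> float:
    """Mass of circular DNA expected from a given mass of linear input."""
    if not 0 <= efficiency_percent <= 100:
        raise ValueError("efficiency must be between 0 and 100 percent")
    return input_ng * efficiency_percent / 100


# Efficiency ranges reported in the text (percent of linear DNA circularized).
SINGLE_LIGASE = (1, 3)      # T3 or T4 DNA ligase alone
COMBINED_LIGASES = (5, 10)  # T3 plus T4 DNA ligase

if __name__ == "__main__":
    input_ng = 1000  # hypothetical input mass, not a value from the text
    for label, (low, high) in (("single", SINGLE_LIGASE),
                               ("combined", COMBINED_LIGASES)):
        print(label,
              circular_yield_ng(input_ng, low),
              circular_yield_ng(input_ng, high))
```

Even at the improved efficiency, most of the input stays linear, which is why the removal of uncircularized DNA in step 3) matters.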
According to embodiments of the invention, in step 3), a step is preferably added before the circularization reaction in which the DNA mixture is incubated at 50-75°C for 1-30 minutes and then immediately placed in a water bath. This step reduces the probability that different DNA fragments are joined to one another and helps ensure that each circularized DNA molecule is a single fragment. Specifically, according to embodiments of the invention, the incubation temperature may be 60-70°C, for example 61, 62, 63, 64, 65, 66, 67, 68, 69, or 70°C, and the incubation time may be 5-25 minutes, more specifically 10-20 minutes, for example 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 minutes. According to one embodiment, the DNA mixture is incubated at 65°C for 15 minutes and then immediately placed in a water bath.
In step 3), unligated fragmented DNA must be removed; otherwise it interferes with sequencing of the paired-end library. According to embodiments of the invention, unligated fragmented DNA may be removed by known methods of digesting linear DNA. According to specific examples, Plasmid-Safe ATP-dependent DNase or Exonuclease I may be used to degrade uncircularized double-stranded or single-stranded DNA. Preferably, Plasmid-Safe ATP-dependent DNase and Exonuclease I are used together, which digests double-stranded and single-stranded linear DNA more thoroughly and minimizes the effect of uncircularized linear DNA on the library.
According to one embodiment of the invention, the efficient self-ligation of blunt-ended DNA fragments is used for circularization. This omits steps that would otherwise be needed, such as designing restriction sites when a foreign vector is used or introducing an internal adaptor to achieve circularization. Combined with random shearing of the circular DNA, it greatly increases the usable fraction of paired-end sequencing data. The reason is twofold. If enzymatic digestion is used for the fragmentation, as described earlier, the paired-end reads obtained are too short (only about 25 bp of usable data per end). If an internal adaptor is used for circularization, a break can easily fall within the adaptor during shearing, so that the library loses the sequence of one end, no pair is formed, and the richness of the data is limited. With the efficient self-ligation of DNA fragments described above, the sequences on both sides of the junction are genomic sequence, with no foreign sequence or internal adaptor, so the data can be used to the fullest (usable data per end can reach 100 bp or more).
In step 4), because circular DNA cannot be sequenced directly, it must be converted back into linear DNA by fragmentation, which also releases the paired-end sequences. According to embodiments of the invention, the circular DNA may be fragmented by any of various known methods; according to specific examples, nebulization, sonication, or HydroShear may be used. According to one embodiment, sonication with a Covaris S2 instrument is preferred, shearing circular DNA of 20-40 kb into linear DNA fragments of 200-800 bp. Not all of the resulting linear fragments are the paired-end fragments needed for sequencing. The capture labeling (biotin labeling) performed in step 2) replaces a few bases at the fragment ends with labeled bases, so only the fragment ends carry biotin. After circularization these biotin-labeled ends are joined together, and streptavidin magnetic beads can specifically capture the biotin-labeled paired-end fragments, whereas internal fragments without biotin cannot bind the beads and are removed.
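A back-of-the-envelope estimate shows why the capture step is needed. The Python sketch below is an illustration under stated assumptions, not a calculation from the source: it assumes one ligation junction per circle and random shearing into fragments of similar length, so that roughly one fragment per circle spans the junction. The function name and the example sizes (chosen within the 20-40 kb and 200-800 bp ranges above) are hypothetical.

```python
def junction_fragment_fraction(circle_bp: int, mean_fragment_bp: int) -> float:
    """Approximate fraction of sheared fragments that span the ligation junction.

    Assumes one junction per circle and random shearing, so a circle yields
    about circle_bp / mean_fragment_bp fragments, one of which holds the junction.
    """
    if circle_bp <= 0 or mean_fragment_bp <= 0:
        raise ValueError("sizes must be positive")
    if mean_fragment_bp > circle_bp:
        raise ValueError("fragments cannot be longer than the circle")
    return mean_fragment_bp / circle_bp


if __name__ == "__main__":
    # Illustrative sizes: a 40 kb circle sheared to about 500 bp.
    fraction = junction_fragment_fraction(40_000, 500)
    print(f"{fraction:.2%} of fragments are expected to carry the junction")
```

Under these assumptions only about one fragment in eighty is informative, so without biotin capture the library would consist almost entirely of internal fragments.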
According to embodiments of the invention, in step 4) the circular DNA may be fragmented into DNA fragments of 100-1,000 bp. According to one embodiment, it may be fragmented into fragments of 200-800 bp; according to another embodiment, into fragments of 200-700 bp; according to a further embodiment, into fragments of 200-600 bp; and according to yet another embodiment, into fragments of 200-500 bp.
In steps 6) to 8), the DNA fragments captured on the magnetic beads must be end-blunted. According to embodiments of the invention, the ends may again be filled in with polymerases such as Klenow large fragment and T4 DNA polymerase, together with T4 polynucleotide kinase and dNTPs, to produce blunt-ended DNA. Klenow (3'-5' exo-) polymerase and dATP may then be used to add one base A to the 3' ends of the DNA fragments; Klenow (3'-5' exo-) polymerase retains DNA polymerase activity but has lost both 5'-3' and 3'-5' exonuclease activity. According to embodiments of the invention, after the base A is added, T4 DNA ligase may be used to ligate sequencing adaptors to the fragment ends, the ligation relying on complementary pairing between the T overhang at the adaptor end and the A overhang at the fragment end. According to embodiments of the invention, Illumina, SOLiD or 454 sequencing adaptors may be chosen to suit the sequencing platform. According to embodiments of the invention, after adaptor ligation the paired-end fragments may be enriched by PCR amplification with specific primers, forming the sequencing library. According to embodiments of the invention, the resulting sequencing library may be sequenced in one or both directions on a second-generation sequencing platform such as Illumina, SOLiD or 454, to obtain the sequence information of the two paired ends, which is then used for assembly of, or alignment to, a genome map.
According to another aspect, the present invention provides a DNA sequencing library. According to embodiments of the invention, the DNA sequencing library is constructed by the method of constructing a DNA sequencing library according to embodiments of the present invention. The inventors found that this DNA sequencing library is of good quality and high purity, can be used effectively on high-throughput sequencing platforms such as the Illumina platform, and that the library sequence information determined from the sequencing results effectively assists assembly of the genome from which the library was derived.
Method of DNA sequencing and method of determining genomic DNA sequence information
According to a further aspect, the present invention provides a method of DNA sequencing. According to embodiments of the invention, the method comprises:
First, a DNA sequencing library of genomic DNA is constructed by the method of constructing a DNA sequencing library according to embodiments of the present invention.
Then, the DNA sequencing library of the genomic DNA is sequenced to obtain sequencing results. According to embodiments of the invention, the method of sequencing the library is not particularly limited. According to specific examples, a high-throughput sequencing platform (also called "high-throughput sequencing technology") may be used. According to some embodiments, at least one platform selected from second-generation sequencing platforms and single-molecule sequencing platforms may be used. According to some specific examples, the second-generation sequencing platform may be at least one selected from the Illumina-Solexa, ABI-SOLiD and Roche-454 sequencing platforms, and the single-molecule sequencing platform may be at least one selected from the True Single Molecule sequencing platform of Helicos, the single-molecule real-time sequencing platform of Pacific Biosciences, and the nanopore sequencing platform of Oxford Nanopore Technologies. According to embodiments of the invention, the method may further comprise determining partial sequence information of the genomic DNA based on the sequencing results.
The method of DNA sequencing according to embodiments of the present invention sequences genomic DNA effectively and reproducibly, and the sequencing results are accurate and reliable. The results can be used to assist genome assembly and thus in subsequently determining the sequence information of the genomic DNA.
According to another aspect, the present invention provides a method of determining genomic DNA sequence information. According to embodiments of the invention, the method comprises the following steps:
First, the genomic DNA is divided into a first genomic DNA sample and a second genomic DNA sample.
Second, the first genomic DNA sample is sequenced by the method of DNA sequencing according to embodiments of the present invention, and partial sequence information of the genomic DNA is determined from the sequencing results.
Next, the second genomic DNA sample is sequenced by a conventional sequencing method to obtain sequencing data of the genomic DNA, where the conventional sequencing method is at least one selected from Solexa, SOLiD, 454, and single-molecule sequencing technologies.
Then, the partial sequence information of the genomic DNA and the sequencing data of the genomic DNA are assembled and spliced together to determine the sequence information of the genomic DNA.
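The text does not specify how the paired-end information enters the assembly. One common use, shown here purely as an illustrative assumption, is to link two contigs and estimate the gap between them from pairs whose ends fall on different contigs. In the Python sketch below, the function names, the coordinate conventions and all numbers are hypothetical; the only idea taken from the source is that the two ends of a first DNA fragment lie a known distance apart (for example about 40 kb).

```python
from statistics import median
from typing import Iterable, Tuple


def gap_from_pair(insert_bp: int, contig_a_len: int,
                  start_on_a: int, end_on_b: int) -> int:
    """Gap implied by one pair whose ends land on contig A and contig B.

    Layout assumed: [contig A] --gap-- [contig B], both on the same strand.
    start_on_a: 0-based position on A where the original fragment begins.
    end_on_b:   0-based exclusive position on B where the fragment ends.
    """
    span_on_a = contig_a_len - start_on_a  # part of the fragment lying on A
    span_on_b = end_on_b                   # part of the fragment lying on B
    return insert_bp - span_on_a - span_on_b


def estimate_gap(insert_bp: int, contig_a_len: int,
                 pairs: Iterable[Tuple[int, int]]) -> float:
    """Median gap over several (start_on_a, end_on_b) linking pairs."""
    gaps = [gap_from_pair(insert_bp, contig_a_len, a, b) for a, b in pairs]
    if not gaps:
        raise ValueError("at least one linking pair is required")
    return median(gaps)


if __name__ == "__main__":
    # Hypothetical 40 kb fragments linking a 100 kb contig A to contig B.
    linking_pairs = [(85_000, 20_000), (90_000, 24_000), (70_000, 6_000)]
    print(estimate_gap(40_000, 100_000, linking_pairs))
```

Taking the median rather than the mean makes the estimate less sensitive to the spread in first-fragment sizes (20-50 kb in the method above); a real assembler would also model that spread explicitly.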
The inventors found that the method of determining genomic DNA sequence information according to embodiments of the present invention determines the sequence information of genomic DNA effectively and reproducibly, and that the results are accurate and reliable.
Apparatus for preparing a DNA sample for sequencing, and DNA sequencing system
According to yet another aspect, the present invention provides an apparatus for preparing a DNA sample for sequencing. According to embodiments of the invention, and referring to Figure 7, the apparatus 1000 for preparing a DNA sample for sequencing comprises: a first fragmentation unit 100, an end-repair unit 200, a labeling unit 300, a circularization unit 400, a second fragmentation unit 500, and a screening unit 600. The first fragmentation unit 100 fragments genomic DNA to obtain first DNA fragments. According to embodiments of the invention, any device suitable for fragmenting genomic DNA may be used as the first fragmentation unit 100; according to one embodiment, a HydroShear instrument may be used. The end-repair unit 200 is connected to the first fragmentation unit 100 and end-repairs the first DNA fragments to obtain end-repaired DNA fragments. The labeling unit 300 is connected to the end-repair unit 200 and attaches a capture label to the end-repaired DNA fragments to obtain DNA fragments carrying the capture label. According to one embodiment, biotin is provided in the labeling unit 300. The circularization unit 400 is connected to the labeling unit 300 and circularizes the DNA fragments carrying the capture label to obtain circular DNA. The second fragmentation unit 500 is connected to the circularization unit 400 and fragments the circular DNA to obtain second DNA fragments; according to embodiments of the invention, the second fragmentation unit 500 may be a Covaris sonicator. The screening unit 600 is connected to the second fragmentation unit 500 and screens the second DNA fragments to obtain target fragments, which constitute the DNA sample for sequencing. According to one embodiment, streptavidin-carrying magnetic beads are provided in the screening unit 600. According to some embodiments, the apparatus may further comprise a genome extraction unit, connected to the first fragmentation unit 100, for extracting genomic DNA from a biological sample.
The inventors surprisingly found that the apparatus for preparing a DNA sample for sequencing according to embodiments of the present invention prepares such samples conveniently, efficiently and reproducibly, and that the resulting DNA samples are of good quality and high purity and can be used effectively to prepare DNA sequencing libraries for high-throughput sequencing.
According to a further aspect, the present invention provides a DNA sequencing system. According to embodiments of the invention, and referring to Figure 8, the DNA sequencing system 10000 comprises: a sample preparation apparatus 1000, a library construction apparatus 2000, and a sequencing apparatus 3000. The sample preparation apparatus 1000 is the apparatus for preparing a DNA sample for sequencing according to embodiments of the present invention, and prepares target fragments, which constitute the DNA sample for sequencing. The library construction apparatus 2000 is connected to the sample preparation apparatus 1000 and constructs a sequencing library from the DNA sample for sequencing. According to embodiments of the invention, the library construction apparatus 2000 may further comprise: an end-modification unit, which ligates the target fragments to adaptors to obtain ligation products; a PCR amplification unit, connected to the end-modification unit, which amplifies the ligation products to obtain amplification products; and a purification unit, connected to the PCR amplification unit, which isolates and purifies the amplification products, the purified amplification products constituting the DNA sequencing library. The sequencing apparatus 3000 is connected to the library construction apparatus 2000 and sequences the sequencing library. According to embodiments of the invention, any apparatus suitable for sequencing a sequencing library may be used as the sequencing apparatus 3000. According to some embodiments, the sequencing apparatus 3000 may be at least one selected from second-generation sequencing platforms and single-molecule sequencing platforms. According to some embodiments, the second-generation sequencing platform may be at least one selected from the Illumina-Solexa, ABI-SOLiD and Roche-454 sequencing platforms, and the single-molecule sequencing platform may be at least one selected from the True Single Molecule sequencing platform of Helicos, the single-molecule real-time sequencing platform of Pacific Biosciences, and the nanopore sequencing platform of Oxford Nanopore Technologies.
According to embodiments of the invention, the system sequences genomic DNA samples effectively; it is convenient and fast to operate, takes little time, is reproducible, and gives accurate and reliable sequencing results.
It should be noted that the method for preparing a DNA sample for sequencing according to embodiments of the present invention, and its applications, were completed by the inventors of the present application only after arduous creative work and optimization.
The solutions of the present invention are explained below with reference to examples. Those skilled in the art will understand that the following examples serve only to illustrate the present invention and should not be regarded as limiting its scope. Where specific techniques or conditions are not indicated in the examples, they follow the techniques or conditions described in the literature in the field (for example, J. Sambrook et al., Molecular Cloning: A Laboratory Manual, 3rd edition, translated by Huang Peitang et al., Science Press) or the product instructions. Reagents and instruments for which no manufacturer is indicated are conventional, commercially available products, for example products that can be purchased from Illumina.
Example 1: Construction and sequencing of a DNA sequencing library of the penguin genome
1. Construction of a DNA sequencing library of the penguin genome
Using 50 μg of genomic DNA of the Adélie penguin (Pygoscelis adeliae) as the library sample, a DNA sequencing library was constructed. In this example, the DNA sequencing library is a paired-end library with 40-45 kb inserts.
1) Random shearing of genomic DNA
The genomic DNA was made up into a shearing reaction, and a standard Hydroshear instrument (GeneMachine, San Carlos, CA, USA) was then used with different shearing parameters (speed and number of cycles) as separate experimental treatments. In every treatment, the genomic DNA was sheared in a 100 μl shearing reaction to obtain DNA fragments. The DNA fragments from each treatment were run on a gel to assess how well each treatment sheared the genomic DNA; the electrophoresis results are shown in Fig. 2. As shown in Fig. 2, the lanes and their sample loads are:
Lane 1: molecular weight standard λ-Hind III digest (Takara, Cat. No. D3403A)
Lane 2: original genomic DNA, 150 ng loaded
Lane 3: molecular weight standard Low Range PFG Marker (NEB, Cat. No. M0350S)
Lane 4: speed 14, 40 cycles, 200 ng loaded
Lane 5: speed 14, 30 cycles, 200 ng loaded
Lane 6: molecular weight standard 1 kb DNA Extension Ladder (Invitrogen, Cat. No. 10511-012)
Lane 7: speed 15, 40 cycles, 200 ng loaded
Lane 8: speed 15, 30 cycles, 200 ng loaded
Lane 9: molecular weight standard Low Range PFG Marker (NEB, Cat. No. M0350S)
Lane 10: speed 16, 40 cycles, 200 ng loaded
Lane 11: speed 16, 30 cycles, 200 ng loaded
Lane 12: molecular weight standard 1 kb DNA Extension Ladder (Invitrogen, Cat. No. 10511-012)
Lane 13: original genomic DNA, 150 ng loaded
As can be seen from Fig. 2, the treatment shown in lane 8 gave the best shearing result. That is, shearing the genomic DNA on the standard Hydroshear instrument at speed 15 with 30 cycles was the most effective setting and efficiently yielded 20-50 kb DNA fragments.
The shearing parameters of the standard Hydroshear instrument were set to speed 15 and 30 cycles, and the 100 μl shearing reaction was sheared to obtain 20-50 kb DNA fragments. The DNA fragments were recovered into an EP tube and purified with Agencourt AMPure beads (Beckman Coulter) as follows. 1.8 volumes of Agencourt AMPure beads were added to the EP tube and mixed by inversion. The tube was left at room temperature for 10 minutes so that the DNA bound fully to the beads, then placed on a magnetic stand for 2 minutes so that the beads were fully drawn to the tube wall, and the supernatant was removed. 500 μl of 70% ethanol was added, the tube was inverted several times, and the supernatant was removed; this ethanol wash was then repeated once. The EP tube was dried at 37 °C until the bead pellet cracked, and the beads were resuspended in 200 μl of elution buffer (QIAGEN): the tube was left at room temperature for 10 minutes so that the DNA dissolved fully in the elution buffer, placed on the magnetic stand for 2 minutes, and the supernatant was transferred to a new EP tube. A further 185 μl of elution buffer was added to the original tube to resuspend the beads; the original tube was again left at room temperature for 10 minutes and placed on the magnetic stand for 2 minutes, and the supernatant was transferred to a new EP tube. The purpose of this second elution is to maximize recovery of the DNA fragments bound to the beads. The supernatants were combined to give the purified DNA fragments, which were set aside for later use.
2) End filling and biotin labeling
To 385 μl of the DNA fragment solution from the previous step were added 50 μl of 10x T4 polynucleotide kinase buffer, 8 μl of 25 mM dNTP, 25 μl of T4 DNA polymerase (3000 units/ml, Enzymatics, Beverly, MA, USA), 5 μl of Klenow polymerase (5000 units/ml, Enzymatics), and 25 μl of T4 polynucleotide kinase (10000 units/ml, Enzymatics). The mixture was incubated at 20 °C for 30 minutes to fill in the ends of the DNA fragments from the previous step. The end-filled DNA fragments were purified with Agencourt AMPure beads to give 345 μl of end-filled DNA fragments, to which were added 50 μl of 10x T4 polynucleotide kinase buffer, 50 μl of Biotin-dNTP, 25 μl of T4 DNA polymerase (3000 units/ml, Enzymatics, Beverly, MA, USA), 5 μl of Klenow polymerase (5000 units/ml, Enzymatics), and 25 μl of T4 polynucleotide kinase (10000 units/ml, Enzymatics). The mixture was incubated at 20 °C for 30 minutes to obtain the biotin-labeled product, which was set aside for later use.
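The component volumes listed in this step can be checked with a few lines of arithmetic. The sketch below is only an illustration: the volumes and stock concentrations are copied from the step above, while the helper names `total_volume` and `final_concentration` are hypothetical and the dilution formula is the ordinary C_final = C_stock x V_stock / V_total.

```python
# Component volumes in microlitres, copied from the end-filling reaction.
end_fill = {
    "DNA fragment solution": 385,
    "10x T4 polynucleotide kinase buffer": 50,
    "25 mM dNTP": 8,
    "T4 DNA polymerase": 25,
    "Klenow polymerase": 5,
    "T4 polynucleotide kinase": 25,
}

# Component volumes in microlitres for the biotin-labeling reaction.
biotin_label = {
    "end-filled DNA fragments": 345,
    "10x T4 polynucleotide kinase buffer": 50,
    "Biotin-dNTP": 50,
    "T4 DNA polymerase": 25,
    "Klenow polymerase": 5,
    "T4 polynucleotide kinase": 25,
}


def total_volume(components):
    """Return the summed reaction volume in microlitres."""
    return sum(components.values())


def final_concentration(stock_conc, stock_volume, reaction_volume):
    """Dilution formula: C_final = C_stock * V_stock / V_total."""
    return stock_conc * stock_volume / reaction_volume


end_fill_total = total_volume(end_fill)        # 498 ul
biotin_total = total_volume(biotin_label)      # 500 ul

# Final dNTP concentration in the end-filling reaction, in mM (about 0.40).
dntp_mM = final_concentration(25, 8, end_fill_total)

# Final buffer strength in the biotin-labeling reaction (1.0, i.e. 1x).
buffer_x = final_concentration(10, 50, biotin_total)

print(end_fill_total, biotin_total, round(dntp_mM, 2), buffer_x)
```

Both reactions come to roughly 500 μl, so the 50 μl of 10x buffer is diluted to approximately 1x in each.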
3) Electrophoretic separation
By adding 5 μl of 20% SDS and 50 μl of 10x bromophenol blue, the biotin-labeled product from the previous step was made up into a 500 μl reaction. After mixing, it was incubated at 65 °C for 10 minutes, cooled on water for 3 minutes, and then run on a 0.6% Megabase agarose gel; specifically, pulsed-field electrophoresis was run for 16 hours at 3.5 V/cm with a switch time of 1-10 s. The gel was stained with ethidium bromide (EB), and the 40-45 kb band was excised under a Dark Reader to obtain 40-45 kb biotin-labeled DNA fragments, which were recovered from the gel and purified with the QIAEX II purification kit and set aside for later use.
The 40-45 kb biotin-labeled DNA fragments obtained by the electrophoretic separation above were checked by electrophoresis; the results are shown in Fig. 3. As shown in Fig. 3, the lanes and their sample loads are: lane 1, molecular weight standard 1 kb DNA Extension Ladder (Invitrogen, Cat. No. 10511-012); lane 2, the 40-45 kb biotin-labeled DNA fragments obtained by electrophoretic separation, about 50 μg loaded; lane 3, molecular weight standard 1 kb DNA Extension Ladder (Invitrogen, Cat. No. 10511-012); lane 4, molecular weight standard Low Range PFG Marker (NEB, Cat. No. M0350S). As can be seen from Fig. 3, the 40-45 kb biotin-labeled DNA fragments obtained by electrophoretic separation were of acceptable quality.
4) Circularization
The 40-45 kb biotin-labeled DNA fragments obtained in the previous step were circularized as follows. To a solution containing 1000 ng of the 40-45 kb biotin-labeled DNA fragments were added 2000 μl of 2x ligase buffer, 100 μl of T4 DNA ligase (400,000 units/ml, NEB), and 100 μl of T3 DNA ligase (300,000 units/ml, Enzymatics). The reaction was made up to 4 ml with ultrapure water and divided among eight 1.5 ml EP tubes, 500 μl per tube, so that the DNA concentration in the reaction was 0.25 ng/μl. The EP tubes were then incubated at 16 °C for 18 hours.
To each EP tube were then added 5 μl of 100 mM ATP, 60 μl of 10x Plasmid-Safe ATP-dependent DNase buffer, 25 μl of Plasmid-Safe ATP-dependent DNase (10000 units/ml, Epicentre), and 15 μl of exonuclease I (20000 units/ml, NEB). The tubes were kept at 37 °C for 30 minutes to digest away double-stranded and single-stranded linear DNA that had not circularized, then at 75 °C for 20 minutes to inactivate the enzymes. 16 μl of 0.5 M EDTA was added to inhibit enzyme activity, and the tubes were placed in a water bath for 3 minutes to renature the DNA, giving circular DNA.
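The dilution in the circularization reaction above can be verified with a short calculation. This is a sketch only: the masses and volumes come from the step above, whereas the molar estimate rests on two assumptions that are not in the source, namely a mean fragment length of 42.5 kb and the common approximation of about 650 g/mol per base pair of double-stranded DNA.

```python
# Values taken from the circularization step.
dna_ng = 1000            # biotin-labeled 40-45 kb fragments, in ng
reaction_ul = 4000       # final reaction volume: 4 ml
ligase_buffer_ul = 2000  # volume of 2x ligase buffer
tubes = 8

conc_ng_per_ul = dna_ng / reaction_ul                  # 0.25 ng/ul
volume_per_tube_ul = reaction_ul / tubes               # 500 ul per tube
dna_per_tube_ng = conc_ng_per_ul * volume_per_tube_ul  # 125 ng per tube
buffer_strength = 2 * ligase_buffer_ul / reaction_ul   # 1.0, i.e. 1x buffer

# Assumptions (not from the source): mean fragment length of 42.5 kb and
# about 650 g/mol per base pair of double-stranded DNA.
mean_length_bp = 42_500
grams_per_mol = mean_length_bp * 650
grams_per_litre = conc_ng_per_ul * 1e-9 / 1e-6         # ng/ul converted to g/l
picomolar = grams_per_litre / grams_per_mol * 1e12     # roughly 9 pM

print(conc_ng_per_ul, volume_per_tube_ul, dna_per_tube_ng, round(picomolar, 2))
```

Under these assumptions the fragments are present at roughly 9 pM. Ligation at such low concentrations is generally understood to favor joining the two ends of the same molecule over joining different molecules, which is consistent with the large reaction volume used here.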
5) Fragmentation of the circular DNA
The circular DNA was sheared with a Covaris instrument into linear DNA fragments of 200-800 bp, which were recovered and purified with the QIAGEN MinElute PCR Purification Kit and dissolved in 50 μl of elution buffer to give purified DNA fragments for later use.
6) Bead capture, end repair of bead-bound DNA, and addition of base A
20 μl of Dynabeads® M-280 Streptavidin magnetic beads (Invitrogen) were placed in a 1.5 ml non-stick RNase-free centrifuge tube (Non-stick RNase-Free 1.5 ml Microfuge Tube, Ambion, AM12450; "non-stick tube" below), left on a magnetic stand for 1 minute, and the supernatant was removed. The beads were washed twice with 50 μl of Bead Binding Buffer: the pellet was carefully resuspended, the tube was left on the magnetic separation rack for 1 minute, and the supernatant was discarded; this step was repeated once. The beads were then resuspended in 50 μl of Bead Binding Buffer and set aside for later use.
The purified DNA fragments and the resuspended beads were mixed thoroughly in equal volumes in a centrifuge tube and incubated on a Thermomixer at 20 °C for 15 minutes (shaking for 15 s every 2 min at 500 rpm). At this point, the biotin-labeled paired-end fragments among the purified DNA fragments bind specifically to the beads, whereas DNA fragments without a biotin label cannot bind. The beads were then purified on the magnetic separation rack with Bead Wash Buffer I and elution buffer as follows. The centrifuge tube was left on the magnetic separation rack for 1 minute and the supernatant was discarded. The beads were washed with 200 μl of Bead Wash Buffer I, gently pipetting five times to resuspend the beads in each wash, and the supernatant was removed; the wash with Bead Wash Buffer I was repeated twice more. The tube was then left on the magnetic separation rack for 1 minute, the supernatant was discarded, and the beads were washed twice with 200 μl of elution buffer, again gently pipetting five times per wash. Finally, the elution buffer from the last wash was removed and 50 μl of elution buffer was added to the tube to resuspend the beads, giving a resuspended bead-DNA solution for later use.
To the centrifuge tube were added 10 μl of 10x T4 polynucleotide kinase buffer, 1.6 μl of 25 mM dNTP, 5 μl of T4 DNA polymerase (3000 units/ml, Enzymatics, Beverly, MA, USA), 1 μl of Klenow polymerase (5000 units/ml, Enzymatics), and 5 μl of T4 polynucleotide kinase (10000 units/ml, Enzymatics), followed by incubation at 20 °C for 30 minutes to fill in the ends of the DNA bound to the beads. The beads were then purified on the magnetic separation rack with Bead Wash Buffer I and elution buffer, as above. 32 μl of elution buffer was added to the tube to resuspend the beads, which were transferred to a new non-stick tube; 5 μl of 10x Blue buffer, 10 μl of 1 mM dATP, and 3 μl of Klenow (3'→5' exo-) were added, mixed, and incubated at 37 °C for 30 minutes to add base A to the ends. The beads were again purified on the magnetic separation rack with Bead Wash Buffer I and elution buffer, as above. 19 μl of elution buffer was then added to the non-stick tube to resuspend the beads, which were transferred to a new non-stick tube and set aside for later use.
7) Adapter ligation and PCR amplification
To the non-stick tube set aside above were added 25 μl of 2x Rapid ligation buffer, 1 μl of Illumina PE Adapter Oligo (Illumina paired-end adapter oligonucleotides), and 5 μl of T4 DNA ligase (600,000 units/ml, Enzymatics), followed by incubation at 20 °C for 15 minutes to ligate the sequencing adapters. The beads were then purified on the magnetic separation rack with Bead Wash Buffer I and elution buffer, as above. 23 μl of elution buffer was added to the non-stick tube to resuspend the beads, which were transferred to a 0.2 ml PCR tube; 25 μl of Phusion DNA polymerase and 1 μl each of the forward and reverse primers were added and mixed, and PCR amplification was run with the following program: (a) 98 °C for 30 seconds; (b) 98 °C for 10 seconds; (c) 65 °C for 30 seconds; (d) 72 °C for 40 seconds, with steps (b) to (d) run for 18 cycles; (e) 72 °C for 5 minutes. The resulting amplification product constitutes the DNA sequencing library of the penguin genome, that is, a paired-end library of the penguin genome with 40-45 kb inserts. The PCR tube was then stored at 4 °C for later use.
2. Sequencing
The PCR tube was left on the magnetic separation rack for 1 minute, and the supernatant, which contains the amplification product, was transferred to a new 1.5 ml EP tube. The amplification product was then run on a 2.0% Low Range Ultra agarose gel; specifically, pulsed-field electrophoresis was run at 15 V/cm for 16 hours. The gel was stained with ethidium bromide (EB), DNA fragments of 400-700 bp were excised under a Dark Reader, and the fragments were recovered from the gel and purified with the Qiagen MinElute Gel Purification Kit to obtain the DNA sequencing library of the penguin genome for later use.
The DNA sequencing library was sequenced on the Illumina HiSeq 2000 sequencing platform with 50 sequencing cycles.
3. Sequencing results and analysis
Sequencing the DNA sequencing library of the penguin genome yielded paired-end sequence information for 40 kb inserts of the penguin genome. SOAPdenovo software (available, for example, from http://soap.genomics.org.cn/soapdenovo.html) was then used to align these data to the penguin genome sequence and to verify the genomic distance spanned by the paired-end sequences obtained from the library; the results are shown in Fig. 4. Fig. 4 shows the insert-range verification obtained by aligning the paired-end sequences of this example to the penguin genome. As can be seen from Fig. 4, the paired-end sequences obtained in this example span a distance of 40 kb, in line with the expected fragment range.
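The span verification described above can be illustrated with a small sketch. It is not the procedure used by SOAPdenovo or by the inventors: the function names `pair_spans` and `span_summary`, the input layout, and the toy coordinates are all hypothetical, and the 20-50 kb window is borrowed from the fragment range quoted elsewhere in this document. The idea is simply to take the mapped positions of the two ends of each read pair and summarize the distance between them.

```python
from statistics import median


def pair_spans(pairs):
    """Distance between the two mapped ends of each read pair.

    Each pair is (scaffold_a, pos_a, scaffold_b, pos_b). Pairs whose ends
    map to different scaffolds have no defined span and are skipped.
    """
    return [abs(pos_b - pos_a)
            for scaf_a, pos_a, scaf_b, pos_b in pairs
            if scaf_a == scaf_b]


def span_summary(pairs, low=20_000, high=50_000):
    """Summarize spans and the fraction falling inside [low, high]."""
    spans = pair_spans(pairs)
    if not spans:
        return None
    in_range = sum(low <= s <= high for s in spans)
    return {
        "n": len(spans),
        "median": median(spans),
        "fraction_in_range": in_range / len(spans),
    }


# Made-up alignment coordinates, for illustration only.
toy_pairs = [
    ("scaf1", 10_000, "scaf1", 50_500),    # span 40,500
    ("scaf1", 200_000, "scaf1", 239_000),  # span 39,000
    ("scaf2", 5_000, "scaf2", 47_000),     # span 42,000
    ("scaf2", 80_000, "scaf2", 80_450),    # span 450, outside the window
    ("scaf1", 1_000, "scaf3", 9_000),      # different scaffolds, skipped
]

summary = span_summary(toy_pairs)
print(summary)
```

A histogram of such spans, peaking near the intended insert size, is one common way of presenting this kind of check.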
The penguin genome was assembled with SOAPdenovo software (see, for example, Li, R. et al. The sequence and de novo assembly of the giant panda genome. Nature 463, 311-317 (2010); Li, R. et al. De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 20:265-272 (2010), both incorporated herein by reference in their entirety). When the assembly had reached a scaffold N50 of 890 kb, adding the 40 kb insert paired-end sequence information of the penguin genome obtained above raised the scaffold N50 significantly, to 7500 kb. When the penguin genome assembly had reached a scaffold N50 of 5000 kb, adding the same 40 kb insert paired-end sequence information raised the scaffold N50 significantly, to 12000 kb.
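Because the results above are reported as scaffold N50, a minimal sketch of the usual N50 calculation may be helpful: sort the lengths from largest to smallest, accumulate them, and report the length at which the running total first reaches half of the overall total. The function name `n50` and the toy lengths are illustrative assumptions, not data from this document.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("no lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half of the
        # total is the N50.
        if running >= half_total:
            return length


# Toy scaffold lengths (arbitrary units); the total is 300, so half is 150.
before = [80, 70, 50, 40, 30, 20, 10]

# Suppose long-insert read pairs link the scaffolds of length 50 and 40
# into a single scaffold of length 90 (the total is unchanged).
after = [90, 80, 70, 30, 20, 10]

print(n50(before), n50(after))
```

In the toy data, joining two mid-sized scaffolds raises the N50 from 70 to 80 without adding any sequence, which is the kind of effect that long-insert paired-end information is expected to have on scaffolding.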
As used herein, the terms "contig N50" and "scaffold N50" have the following meaning. In drawing up (assembling) a genome map, scaffold N50 is an important indicator of assembly quality. Genome assembly first joins DNA fragment sequences into longer sequences on the basis of their mutual overlaps; these overlapping groups are the contigs. Restriction site information, or other "marker" information that can establish arrangement or order, is then used to join several contigs, giving the linear arrangement or relative positions of the contigs on the chromosome; the result is a scaffold. N50 is the length of the sequence at which the largest sequences together cover 50% of all nucleotides: the contigs or scaffolds are sorted from largest to smallest and their lengths are accumulated, and when the cumulative length reaches half of the total length of all contigs or scaffolds, the length of the last contig or scaffold added is the contig N50 or scaffold N50.
Example 2: Construction and sequencing of a DNA sequencing library of the mei (plum blossom) genome
Using genomic DNA of mei as the genomic DNA sample, a DNA sequencing library of the wild mei (Prunus mume) genome was constructed and sequenced by the same method as in Example 1, giving sequencing results for the DNA sequencing library of the mei genome (a paired-end DNA library of 40 kb long fragments).
Sequencing results and analysis
Using SOAPdenovo software, the sequencing results were aligned to the mei genome sequence to verify the genomic distance spanned by the paired-end sequences obtained from the library; the results are shown in Fig. 5. Fig. 5 shows the insert-range verification obtained by aligning the paired-end sequences of this example to the mei genome. As can be seen from Fig. 5, the paired-end sequences obtained in this example span a distance of 40 kb, in line with the expected fragment range.
The mei genome was assembled with SOAPdenovo software. When the assembly had reached a scaffold N50 of 570 kb, adding the 40 kb insert paired-end sequence information of the mei genome obtained above raised the scaffold N50 significantly, to 970 kb.
Example 3: Construction and sequencing of a DNA library of the human genome
Using human genomic DNA as the genomic DNA sample, a DNA sequencing library of the human genome was constructed and sequenced by the same method as in Example 1, giving sequencing results for the DNA sequencing library of the human genome (a paired-end DNA library of 40 kb long fragments).
Sequencing results and analysis
Using SOAPdenovo software, the sequencing results were aligned to the human genome sequence to verify the genomic distance spanned by the paired-end sequences obtained from the library; the results are shown in Fig. 6. Fig. 6 shows the insert-range verification obtained by aligning the paired-end sequences of this example to the human genome. As can be seen from Fig. 6, the paired-end sequences obtained in this example span a distance of 40 kb, in line with the expected fragment range.
The human genome was assembled with SOAPdenovo software. When the assembly had reached a scaffold N50 of 1000 kb, adding the 40 kb insert paired-end sequence information of the human genome obtained above raised the scaffold N50 significantly, to 2000 kb.
Industrial Applicability
The method of preparing a DNA sample for sequencing, the method of constructing a DNA sequencing library, the DNA sequencing library, the DNA sequencing method, the method of determining genomic DNA sequence information, the device for preparing a DNA sample for sequencing, and the DNA sequencing system of the present invention can be applied effectively to the construction and sequencing of long-fragment paired-end sequencing libraries, and the libraries obtained are of good quality and give accurate sequencing results. According to embodiments of the present invention, end sequencing of long-span sequences in the genome is achieved by constructing a paired-end library. The whole experimental procedure is simple and fast: one library takes only 3 days to construct, a marked time advantage over fosmid clone end sequencing, and the procedure avoids tedious experimental steps and lowers the risk of library construction failure. Sequencing the paired-end libraries with 20-50 kb inserts constructed according to the present invention yields valid data that, when used in assembly, effectively increase the scaffold N50 and help bring genome assembly up to the standard of a fine map or even a finished map.
Although specific embodiments of the present invention have been described in detail, those skilled in the art will understand that, in light of all the teachings disclosed, various modifications and substitutions may be made to those details, and that all such changes fall within the scope of protection of the present invention. The full scope of the present invention is given by the appended claims and any equivalents thereof.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "illustrative embodiment", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, such expressions do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.

Claims

1、 一种制备用于测序的 DNA样品的方法, 其特征在于, 包括以下步骤: A method of preparing a DNA sample for sequencing, comprising the steps of:
将基因组 DNA片段化, 以便获得第一 DNA片段;  Fragmenting the genomic DNA to obtain a first DNA fragment;
将所述第一 DNA片段进行末端修复, 以便获得经过末端修复的 DNA片段; 将所述经过末端修复的 DNA片段连接捕获标记,以便获得具有捕获标记的 DNA片段; 将所述具有捕获标记的 DNA片段进行环化处理, 以便获得环状 DNA;  End-repairing the first DNA fragment to obtain a DNA fragment subjected to end repair; attaching the end-repaired DNA fragment to a capture marker to obtain a DNA fragment having a capture marker; and the DNA having the capture marker The fragment is subjected to a cyclization treatment to obtain a circular DNA;
将所述环状 DNA片段化, 以便获得第二 DNA片段; 以及  Fragmenting the circular DNA to obtain a second DNA fragment;
将所述第二 DNA 片段进行筛选, 以便获得目的片段, 所述目的片段构成用于测序的 DNA样品。  The second DNA fragment is screened to obtain a fragment of interest, which constitutes a DNA sample for sequencing.
2. The method according to claim 1, further comprising a step of extracting the genomic DNA from a sample.
3. The method according to claim 2, wherein the sample is derived from at least one of an animal, a plant, and a microorganism.
4. The method according to claim 3, wherein the animal is at least one of a human and a penguin.
5. The method according to claim 3, wherein the plant is Prunus mume (mei, plum blossom).
6. The method according to claim 1, wherein the genomic DNA is fragmented by at least one method selected from nebulization, sonication, HydroShear shearing, and enzymatic digestion.
7. The method according to claim 6, wherein the genomic DNA is fragmented using a HydroShear instrument.
8. The method according to claim 1, further comprising, after obtaining the first DNA fragments and before end-repairing the first DNA fragments: performing fragment selection on the first DNA fragments.
9. The method according to claim 1, wherein the first DNA fragments are 20-50 kb in length.
10. The method according to claim 1, wherein the first DNA fragments are 25-50 kb in length.
11. The method according to claim 1, wherein both the end repair of the first DNA fragments and the attachment of the capture label are carried out using Klenow fragment, T4 DNA polymerase, and T4 polynucleotide kinase, wherein the Klenow fragment has 5'→3' polymerase activity and 3'→5' polymerase activity but lacks 5'→3' exonuclease activity.
12. The method according to claim 1, wherein the capture label is biotin.
13. The method according to claim 1, further comprising, after obtaining the DNA fragments carrying the capture label and before circularizing the DNA fragments carrying the capture label: performing fragment selection on the DNA fragments carrying the capture label.
14. The method according to claim 12 or 13, wherein the fragment selection is performed by 0.6% agarose gel electrophoresis.
15. The method according to claim 1, wherein the DNA fragments carrying the capture label are 20-50 kb in length.
16. The method according to claim 1, wherein the DNA fragments carrying the label are circularized using T4 DNA ligase and T3 DNA ligase.
17. The method according to claim 1, further comprising, after circularizing the DNA fragments carrying the label, a step of removing uncircularized DNA fragments.
18. The method according to claim 1, wherein uncircularized DNA fragments are removed using at least one selected from a DNase and an exonuclease.
19. The method according to claim 18, wherein the DNase is an ATP-dependent DNase that does not degrade plasmids, and the exonuclease is exonuclease I.
20. The method according to claim 1, wherein the circular DNA is fragmented by at least one method selected from nebulization, sonication, HydroShear shearing, and enzymatic digestion.
21. The method according to claim 1, wherein the circular DNA is fragmented using a Covaris ultrasonic shearing instrument.
22. The method according to claim 1, wherein the second DNA fragments are 100-1000 bp in length.
23. The method according to claim 1, wherein the second DNA fragments are 200-800 bp in length.
24. The method according to claim 1, wherein the second DNA fragments are screened by magnetic bead capture, wherein the magnetic beads carry a molecular entity capable of specifically recognizing the capture label.
25. The method according to claim 24, wherein the capture label is biotin and the molecular entity carried on the magnetic beads is streptavidin.
26. A method of constructing a DNA sequencing library, comprising the following steps:
preparing target fragments by the method according to claim 1;
subjecting the target fragments to end repair and adding a base A to their 3' ends to obtain A-tailed target fragments;
ligating the A-tailed target fragments to adapters to obtain ligation products;
amplifying the ligation products by PCR to obtain amplification products; and
isolating and purifying the amplification products, the amplification products constituting the DNA sequencing library.
27. The method according to claim 26, wherein the end repair of the target fragments is carried out using Klenow fragment, T4 DNA polymerase, and T4 polynucleotide kinase, wherein the Klenow fragment has 5'→3' polymerase activity and 3'→5' polymerase activity but lacks 5'→3' exonuclease activity.
28. The method according to claim 26, wherein the base A is added to the 3' ends of the target fragments using Klenow (3'-5' exo-).
29. The method according to claim 26, wherein the A-tailed target fragments are ligated to the adapters using T4 DNA ligase.
30. The method according to claim 26, wherein the amplification products are isolated and purified by 2% agarose gel electrophoresis.
31. A DNA sequencing library constructed by the method according to any one of claims 26-30.
32. A method of DNA sequencing, comprising:
constructing a DNA sequencing library of genomic DNA by the method according to any one of claims 26-30; and
sequencing the DNA sequencing library of the genomic DNA to obtain sequencing results.
33. The method according to claim 32, wherein the sequencing is performed using a high-throughput sequencing platform.
34. The method according to claim 32, wherein the sequencing is performed using at least one selected from a second-generation sequencing platform and a single-molecule sequencing platform.
35. The method according to claim 34, wherein the second-generation sequencing platform is at least one selected from the Illumina-Solexa, ABI-Solid, and Roche-454 sequencing platforms, and the single-molecule sequencing platform is at least one selected from the true single-molecule sequencing platform of Helicos, the single-molecule real-time sequencing platform of Pacific Biosciences, and the nanopore sequencing platform of Oxford Nanopore Technologies.
36. The method according to claim 34, further comprising:
determining partial sequence information of the genomic DNA based on the sequencing results.
37. A method of determining sequence information of genomic DNA, comprising the following steps:
dividing genomic DNA into a first genomic DNA sample and a second genomic DNA sample;
sequencing the first genomic DNA sample by the method according to any one of claims 32-36, and determining partial sequence information of the genomic DNA based on the sequencing results;
sequencing the second genomic DNA sample by a conventional sequencing method to obtain sequencing data of the genomic DNA, wherein the conventional sequencing method is at least one selected from SOLEXA, SOLID, 454, and single-molecule sequencing technologies; and
assembling and splicing the partial sequence information of the genomic DNA with the sequencing data of the genomic DNA to determine the sequence information of the genomic DNA.
38. A device for preparing a DNA sample for sequencing, comprising:
a first fragmentation unit for fragmenting genomic DNA to obtain first DNA fragments;
an end repair unit, connected to the first fragmentation unit, for end-repairing the first DNA fragments to obtain end-repaired DNA fragments;
a labeling unit, connected to the end repair unit, for attaching a capture label to the end-repaired DNA fragments to obtain DNA fragments carrying the capture label;
a circularization unit, connected to the labeling unit, for circularizing the DNA fragments carrying the capture label to obtain circular DNA;
a second fragmentation unit, connected to the circularization unit, for fragmenting the circular DNA to obtain second DNA fragments; and
a screening unit, connected to the second fragmentation unit, for screening the second DNA fragments to obtain target fragments, the target fragments constituting the DNA sample for sequencing.
39. The device according to claim 38, further comprising:
a genome extraction unit, connected to the first fragmentation unit, for extracting genomic DNA from a biological sample.
40. The device according to claim 38, wherein the first fragmentation unit is a HydroShear instrument.
41. The device according to claim 38, wherein the second fragmentation unit is a Covaris ultrasonic shearing instrument.
42. The device according to claim 38, wherein biotin is provided in the labeling unit.
43. The device according to claim 42, wherein magnetic beads carrying streptavidin are provided in the screening unit.
44. A DNA sequencing system, comprising:
a sample preparation device, which is the device for preparing a DNA sample for sequencing according to any one of claims 38-43, for preparing target fragments, the target fragments constituting the DNA sample for sequencing;
a library construction device, connected to the sample preparation device, for constructing a sequencing library from the DNA sample for sequencing; and
a sequencing device, connected to the library construction device, for sequencing the sequencing library.
45. The system according to claim 44, wherein the library construction device further comprises:
an end modification unit for ligating the target fragments to adapters to obtain ligation products;
a PCR amplification unit, connected to the end modification unit, for amplifying the ligation products to obtain amplification products; and
a purification unit, connected to the PCR amplification unit, for isolating and purifying the amplification products, the amplification products constituting the DNA sequencing library.
46. The system according to claim 44, wherein the sequencing device is at least one selected from a second-generation sequencing platform and a single-molecule sequencing platform.
47. The system according to claim 46, wherein the second-generation sequencing platform is at least one selected from the Illumina-Solexa, ABI-Solid, and Roche-454 sequencing platforms.
48. The system according to claim 46, wherein the single-molecule sequencing platform is at least one selected from the true single-molecule sequencing platform of Helicos, the single-molecule real-time sequencing platform of Pacific Biosciences, and the nanopore sequencing platform of Oxford Nanopore Technologies.
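
The enrichment logic of claim 1 (circularize a labelled long fragment, shear the circle, keep only the pieces that carry the label) can be sketched in silico. The Python toy model below is an illustration under stated assumptions, not the claimed wet-lab procedure: it assumes the capture label sits exactly at the junction where the two ends of the long fragment were joined, it models shearing as random consecutive cuts, and the function names `shear_circle`, `capture_junction_arcs`, and `mate_pair` are hypothetical. The 20 kb fragment length and the 200-800 bp shearing range are taken from claims 9 and 23.

```python
import random

def shear_circle(circle_len, min_len, max_len, rng):
    """Cut a circle of circle_len bases into consecutive arcs.

    Returns (start, size) pairs in circular coordinates. The last arc is
    truncated to close the circle, so it may be shorter than min_len.
    """
    offset = rng.randrange(circle_len)  # random position of the first cut
    arcs, covered = [], 0
    while covered < circle_len:
        size = min(rng.randint(min_len, max_len), circle_len - covered)
        arcs.append(((offset + covered) % circle_len, size))
        covered += size
    return arcs

def capture_junction_arcs(arcs, circle_len):
    """Keep only arcs that run across the labelled junction.

    The junction lies between position circle_len - 1 and position 0,
    where the two labelled ends of the long fragment were joined
    (an assumption of this toy model).
    """
    return [(start, size) for start, size in arcs if start + size > circle_len]

def mate_pair(fragment, arc):
    """Split a junction-spanning arc into (tail, head) of the long fragment."""
    start, size = arc
    tail = fragment[start:]             # bases running up to the fragment's last base
    head = fragment[:size - len(tail)]  # bases running on from the fragment's first base
    return tail, head

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical 20 kb "first DNA fragment" (lower bound of claim 9).
    fragment = "".join(rng.choice("ACGT") for _ in range(20_000))
    # Shear the circularized fragment into 200-800 bp pieces (claim 23).
    arcs = shear_circle(len(fragment), 200, 800, rng)
    for arc in capture_junction_arcs(arcs, len(fragment)):
        tail, head = mate_pair(fragment, arc)
        print(len(tail), len(head))
```

In this model at most one sheared piece per circle spans the junction, and that piece joins the two ends of the original long fragment. Sequencing both ends of such a piece would therefore give a read pair whose genomic separation is roughly the length of the first DNA fragment, which is the kind of long-range information an assembly step such as that of claim 37 could use.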
PCT/CN2011/083726 2010-12-16 2011-12-08 Method of preparing dna sample for sequencing and use thereof WO2012079486A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201010591448.7 2010-12-16
CN2010105914487A CN102534811B (en) 2010-12-16 2010-12-16 DNA (deoxyribonucleic acid) library and preparation method thereof, as well as DNA sequencing method and device

Publications (1)

Publication Number Publication Date
WO2012079486A1 true WO2012079486A1 (en) 2012-06-21

Family

ID=46244096

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/083726 WO2012079486A1 (en) 2010-12-16 2011-12-08 Method of preparing dna sample for sequencing and use thereof

Country Status (3)

Country Link
CN (1) CN102534811B (en)
HK (1) HK1169460A1 (en)
WO (1) WO2012079486A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103290104A (en) * 2013-01-23 2013-09-11 蒋智 Simple and cheap genome sample breaking method applied to second generation sequencing
WO2016037358A1 (en) * 2014-09-12 2016-03-17 深圳华大基因科技有限公司 Isolated oligonucleotide and use thereof in nucleic acid sequencing
CN107794572A (en) * 2016-08-31 2018-03-13 安诺优达基因科技(北京)有限公司 A kind of method and its application for building large fragment library
EP3225721A4 (en) * 2014-11-26 2018-05-16 MGI Tech Co., Ltd. Method and reagent for constructing nucleic acid double-linker single-strand cyclical library
CN112176028A (en) * 2019-07-05 2021-01-05 深圳华大生命科学研究院 Rapid WGS library establishment method based on endonuclease

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102839168A (en) * 2012-07-31 2012-12-26 深圳华大基因研究院 Nucleic acid probe, and preparation method and application thereof
CN103627710B (en) * 2012-08-22 2016-08-03 中国人民解放军总医院 SPG11 gene mutation body and application thereof
US10023856B2 (en) * 2013-09-25 2018-07-17 Thermo Fisher Scientific Baltics Uab Enzyme composition for DNA end repair, adenylation, phosphorylation
CN103849617A (en) * 2013-12-16 2014-06-11 复旦大学 Connector and method for Permanent preservation of genome DNA (deoxyribonucleic acid)
CN104109709A (en) * 2014-04-04 2014-10-22 北京泛生子生物科技有限公司 Important gene enrichment method used for individual cancer diagnosis and treatment
CN104153003A (en) * 2014-08-08 2014-11-19 上海美吉生物医药科技有限公司 Method for establishing DNA (Deoxyribose Nucleic Acid) library based on illumina sequencing platform
WO2016049929A1 (en) * 2014-09-30 2016-04-07 天津华大基因科技有限公司 Method for constructing sequencing library and application thereof
US10479991B2 (en) 2014-11-26 2019-11-19 Mgi Tech Co., Ltd Method and reagent for constructing nucleic acid double-linker single-strand cyclical library
CN105986324B (en) * 2015-02-11 2018-08-14 深圳华大智造科技有限公司 Cyclic annular tiny RNA library constructing method and its application
CN106319639B (en) * 2015-06-17 2018-09-04 深圳华大智造科技有限公司 Build the method and apparatus of sequencing library
CN105002570B (en) * 2015-07-21 2017-09-05 中国农业科学院深圳农业基因组研究所 A kind of method for once preparing multiple double end sequencing libraries of DNA large fragments insertion
CN106554957B (en) 2015-09-30 2020-04-21 中国农业科学院深圳农业基因组研究所 Sequencing library, preparation and application thereof
CN107794573B (en) * 2016-08-31 2022-09-13 浙江安诺优达生物科技有限公司 Method for constructing DNA large fragment library and application thereof
CN108342385A (en) * 2017-01-22 2018-07-31 中国科学院天津工业生物技术研究所 A kind of connector and the method that sequencing library is built by way of high efficiency cyclisation
CN108866154B (en) * 2017-05-15 2021-11-16 深圳华大基因股份有限公司 Noninvasive prenatal haplotype construction method based on long-fragment DNA capture and third-generation sequencing
CN108866172B (en) * 2017-05-15 2021-11-16 深圳华大基因股份有限公司 Noninvasive prenatal haplotype construction method based on long-fragment DNA cyclization and third-generation sequencing
CN109988817B (en) * 2017-12-30 2020-12-04 安诺优达基因科技(北京)有限公司 Method for randomly breaking DNA
CN109266718A (en) * 2018-10-19 2019-01-25 广东菲鹏生物有限公司 Detect method existing for endonuclease
CN109868270B (en) * 2019-03-11 2021-02-26 深圳乐土生物科技有限公司 Low initial amount DNA library construction method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101351552A (en) * 2005-06-06 2009-01-21 454生命科学公司 Paired end sequencing
CN101460633A (en) * 2006-03-14 2009-06-17 基尼宗生物科学公司 Methods and means for nucleic acid sequencing

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU6188599A (en) * 1998-10-26 2000-05-15 Novozymes A/S Constructing and screening a dna library of interest in filamentous fungal cells
US20100009856A1 (en) * 2002-06-21 2010-01-14 Sinogenomax Sompany LTD. Randomized dna libraries and double-stranded rna libraries, use and method of production thereof


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FULLWOOD M J. ET AL.: "Next-generation DNA sequencing of paired-end tags (PET) for transcriptome and genome analyses.", GENOME RES., vol. 19, no. 4, April 2009 (2009-04-01), pages 521 - 532, XP055015048, DOI: doi:10.1101/gr.074906.107 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103290104A (en) * 2013-01-23 2013-09-11 蒋智 Simple and cheap genome sample breaking method applied to second generation sequencing
WO2016037358A1 (en) * 2014-09-12 2016-03-17 深圳华大基因科技有限公司 Isolated oligonucleotide and use thereof in nucleic acid sequencing
US9890375B2 (en) 2014-09-12 2018-02-13 Bgi Shenzhen Co., Limited Isolated oligonucleotide and use thereof in nucleic acid sequencing
US10023906B2 (en) 2014-09-12 2018-07-17 Mgi Tech Co., Ltd. Method for constructing nucleic acid single-stranded cyclic library and reagents thereof
US10544451B2 (en) 2014-09-12 2020-01-28 Mgi Tech Co., Ltd. Vesicular linker and uses thereof in nucleic acid library construction and sequencing
US10995367B2 (en) 2014-09-12 2021-05-04 Mgi Tech Co., Ltd. Vesicular adaptor and uses thereof in nucleic acid library construction and sequencing
EP3225721A4 (en) * 2014-11-26 2018-05-16 MGI Tech Co., Ltd. Method and reagent for constructing nucleic acid double-linker single-strand cyclical library
CN107794572A (en) * 2016-08-31 2018-03-13 安诺优达基因科技(北京)有限公司 A kind of method and its application for building large fragment library
CN107794572B (en) * 2016-08-31 2022-04-05 安诺优达基因科技(北京)有限公司 Method for constructing large fragment library and application thereof
CN112176028A (en) * 2019-07-05 2021-01-05 深圳华大生命科学研究院 Rapid WGS library establishment method based on endonuclease
CN112176028B (en) * 2019-07-05 2024-04-16 深圳华大生命科学研究院 Rapid WGS library construction method based on endonuclease

Also Published As

Publication number Publication date
CN102534811A (en) 2012-07-04
CN102534811B (en) 2013-11-20
HK1169460A1 (en) 2013-01-25

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11847991

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11847991

Country of ref document: EP

Kind code of ref document: A1