WO2012126398A1 - DNA tags and uses thereof - Google Patents

Publication number
WO2012126398A1
Authority
WO
WIPO (PCT)
Prior art keywords
dna
tag
linker
sequencing
library
Prior art date
Application number
PCT/CN2012/072970
Other languages
English (en)
French (fr)
Inventor
程磊
王俊
Original Assignee
深圳华大基因科技有限公司
深圳华大基因研究院
Priority date
Filing date
Publication date
Application filed by 深圳华大基因科技有限公司, 深圳华大基因研究院
Publication of WO2012126398A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806: Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay

Definitions

  • The present invention relates to the field of gene sequencing, and in particular to DNA tags and uses thereof.
  • Background art
  • Mate-paired library sequencing refers to constructing a large-fragment library and obtaining the sequences at both ends of large-span (2-10 kb) fragments. Sequences obtained from both ends of such a span play an important role in the assembly of large or complex genomes and in the discovery of genomic structural variation, and are particularly suitable for de novo sequencing projects.
  • At present, the template beads of different paired end library samples are usually deposited in separate partitions of the sequencing chip and then sequenced, and the library samples are finally distinguished by partition.
  • However, partitioning the sequencing chip occupies chip space and reduces the utilization of a single chip, which lowers data output.
  • Table 1 compares the total bead deposition per chip and the expected data output for partitioned chips of different specifications.
  • In addition, the fourth-version sequencing chip of the existing SOLiD sequencer can be divided into at most 8 regions; that is, each sequencing chip can sequence at most 8 paired end library samples, which falls far short of the growing demand for sequencing throughput.
  • Table 1: Comparison of total single-chip bead deposition and expected data output for partitioned chips of different specifications
  • DNA tag library sequencing maximizes sequencing capacity, simplifies sample preparation, and enables mixed sequencing of multiple DNA samples.
  • At present, in the SOLiD system, mixed sequencing of multiple samples on a single-partition chip relies on Barcode technology (SOLiD™ System Barcoding).
  • the present invention aims to solve at least one of the technical problems existing in the prior art.
  • As used herein, "indexed-cap adaptor" (tag cap adaptor) refers to a cap adaptor that carries a tag.
  • As used herein, "mate-paired indexed library" (paired end tag library) refers to a paired end library constructed using a tag cap adaptor. Since the tag cap adaptor used in the paired end tag library contains a tag specific for the sample, the molecules in the paired end tag library can be matched one-to-one to samples by the tag sequence.
  • the first aspect of the invention proposes a set of isolated DNA tags consisting of the nucleotides set forth in SEQ ID NOS: 1-24.
  • the terms “index” and “DNA index” are used interchangeably and refer to a stretch of double-stranded oligonucleotide having a particular base sequence.
  • the DNA tag is an oligonucleotide duplex of 5 bp in length, and the sequence of one of the strands is selected from SEQ ID NOS: 1-24 (shown in Table 2 below).
  • In particular, when a tag is represented by a sequence identifier (SEQ ID NO:), the sequence of one strand of the tag is the sequence indicated by that identifier.
  • For example, when a tag is described by SEQ ID NO: 1, the sequence of one strand of the tag is SEQ ID NO: 1.
  • all DNA sequences are given in the direction of 5' to 3'.
  • When used to construct and sequence a paired end library, the DNA tags according to an embodiment of the present invention keep the sequencer workflow running normally and the information analysis workflow simple, and their sequences satisfy the following: (1) the tag sequences of all samples in a mixed sequencing run are of equal length; (2) the combination of tag sequences in a mixed sequencing run ensures that all four fluorescent dye signals can be read in the same SOLiD sequencing cycle; (3) any two tag sequences in a mixed sequencing run differ by at least two bases, so that misreading a single base does not confuse the sample source; (4) the last base of each tag sequence is G.
  • the inventors have surprisingly found that the construction of a sequencing library using a DNA tag according to an embodiment of the present invention can effectively reduce the problem of data output bias and can accurately distinguish a plurality of sequencing libraries.
  • the invention proposes a set of isolated oligonucleotides.
  • the isolated oligonucleotide has a first strand and a second strand, wherein the first strand is 5'-ACAGCAG(N)5 or 5'-phos-ACAGCAG(N)5 and the second strand is 5'-phos-(N')5CTGCTGTAC, wherein (N)5 is the isolated DNA tag of claim 1 and (N')5 is the complementary sequence of (N)5.
  • the set of isolated oligonucleotides can be used as a tag linker to introduce a DNA tag according to an embodiment of the invention into a paired end tag library.
  • these oligonucleotides may also be referred to as "indexed-cap adaptors" (tag cap adaptors), i.e., cap adaptors that carry a tag.
  • the invention provides the use of a DNA tag in the construction or sequencing of a paired end tag library according to an embodiment of the invention.
  • the term "mate-paired indexed library" (paired end tag library) as used herein refers to a paired end library constructed using a tag cap adaptor. Since the tag cap adaptor used in the paired end tag library contains a tag specific for the sample, the molecules in the paired end tag library can be matched one-to-one to samples through the tag sequence.
  • the invention provides the use of a DNA tag in a kit for the preparation or sequencing of a paired end tag library, in accordance with an embodiment of the invention.
  • the invention provides the use of a tag cap adaptor in the construction or sequencing of a paired end tag library in accordance with an embodiment of the invention.
  • the invention provides the use of a tag cap adaptor in a kit for the preparation or sequencing of a paired end tag library, in accordance with an embodiment of the invention.
  • the invention provides a kit for constructing a paired end tag library, comprising a tag cap adaptor according to an embodiment of the invention, i.e., a set of isolated oligonucleotides, each isolated oligonucleotide having a first strand and a second strand, wherein the first strand is 5'-ACAGCAG(N)5 or 5'-phos-ACAGCAG(N)5 and the second strand is 5'-phos-(N')5CTGCTGTAC, wherein (N)5 is the isolated DNA tag of claim 1 and (N')5 is the complementary sequence of (N)5.
  • the invention proposes a method of constructing a library of paired end tags.
  • the method comprises the steps of: fragmenting a DNA sample to obtain DNA fragments; ligating the DNA fragments to a DNA tag linker to obtain DNA fragments ligated to the DNA tag linker, the DNA tag linker comprising the set of isolated DNA tags described above, preferably a tag cap adaptor according to an embodiment of the invention; cyclizing the DNA fragments ligated to the DNA tag linker with a biotinylated intermediate linker to obtain a cyclized product, the two strands of the intermediate linker preferably having the nucleotide sequences set forth in SEQ ID NO: 27 and SEQ ID NO: 28, respectively; cleaving the cyclized product to obtain a cleaved cyclized product; enriching the target fragment from the cleaved cyclized product; ligating linkers to both ends of the target fragment to obtain a ligation product; and amplifying the ligation product by PCR to obtain an amplification product, which constitutes the paired end tag library.
  • with this method, DNA tags can be efficiently introduced into the constructed library, so that multiple samples can be sequenced simultaneously.
  • the source of each sample is finally distinguished based on the DNA tag.
  • the inventors found that the sequencing data obtained with the DNA tag of the present invention were highly stable and reproducible.
  • when library samples are sequenced using the DNA tag of the present invention, only two independent sequencing reactions are needed to achieve mixed sequencing of multiple paired end libraries on one chip partition.
  • those skilled in the art will understand that one or more of the DNA tags of the present invention may be employed in the construction of the paired end tag library.
  • when several DNA tags are used, a sequencing library can be constructed separately for each tag and the libraries combined at the end; alternatively, after the tags have been ligated separately, the samples can be combined at a step that can be processed jointly, for example the PCR step, which makes it convenient to construct a paired end tag library containing multiple samples.
  • the source of the DNA sample which can be used in the above method is not particularly limited.
  • it can be a prokaryotic or eukaryotic DNA sample.
  • the method of fragmenting the DNA sample is also not particularly limited; according to an embodiment of the present invention, it may be carried out by at least one method selected from nebulization, sonication, and the Hydroshear method, and the DNA sample is preferably fragmented by the Hydroshear method.
  • the DNA sample is fragmented, and the length of the obtained DNA fragment is not particularly limited.
  • the length of the DNA fragment is 1000-4000 bp.
  • before the DNA fragments ligated to the DNA tag linker are cyclized with the biotinylated intermediate linker, the method further includes a step of size-selecting those fragments.
  • the fragment selection can be carried out using at least one method selected from pulsed-field gel electrophoresis, sucrose or cesium chloride gradient sedimentation, and size exclusion chromatography.
  • according to a specific embodiment, the fragment selection can be performed using pulsed-field gel electrophoresis.
  • the selected DNA fragments ligated to the DNA tag linker are 1500-2000 bp in length.
  • before the cyclized product is cleaved, the method further includes a step of digesting the uncyclized DNA fragments ligated to the DNA tag linker that remain in the cyclized product.
  • the digestion can be carried out using a Plasmidsafe nuclease.
  • the means for breaking the cyclized product is not particularly limited.
  • the cyclized product may be cleaved using at least one method selected from sonication and enzymatic cleavage.
  • according to a specific embodiment, the cleavage can be carried out using at least one method selected from the restriction endonuclease method and the nick translation-exonuclease method.
  • the type of linker ligated to both ends of the target fragment is not particularly limited and can be chosen conveniently according to the sequencing system used.
  • a P1 linker and a P2 linker are ligated to the two ends of the target fragment, wherein the two strands of the P1 linker have the nucleotide sequences set forth in SEQ ID NO: 33 and SEQ ID NO: 34, respectively, and the two strands of the P2 linker have the nucleotide sequences set forth in SEQ ID NO: 35 and SEQ ID NO: 36, respectively.
  • the P1 linker and the P2 linker are respectively attached to the 5' end and the 3' end of the target fragment.
  • PCR amplification can be carried out using primers having the nucleotide sequences shown in SEQ ID NO: 37 and SEQ ID NO: 38, respectively.
  • PCR amplification is emulsion PCR, wherein the emulsion PCR uses magnetic beads carrying an oligonucleotide that specifically recognizes a P1 linker.
  • the constructed sequencing library can be conveniently applied to the ABI SOLiD sequencing platform for sequencing.
  • the invention provides a paired end tag library obtained by the method described above. With this library, bidirectional end sequencing can be performed efficiently, and the nucleic acid sequence information obtained can be accurately classified by sample source using the tag sequence.
  • the present invention provides a method of determining the sequence information of a DNA sample, comprising the steps of: constructing a paired end tag library of the DNA sample according to the method described above; and sequencing the paired end tag library to determine the sequence information of the DNA sample. Bidirectional end sequencing is thereby performed efficiently, and the nucleic acid sequence information obtained can be accurately classified by sample source using the tag sequence.
  • a platform that can be used for sequencing is not particularly limited.
  • the paired end tag library can be sequenced using an ABI SOLiD sequencing platform.
  • the paired end tag library is sequenced using sequencing primers that are specifically paired with the two ends of the library, respectively.
  • nucleic acid sequencing can be conveniently performed using a high throughput sequencing platform.
  • the invention provides a set of DNA tags, each tag being an oligonucleotide of 5 bp in length, the sequence of one of its strands being selected from SEQ ID NOs: 1-24 (see Table 2). In a preferred embodiment of the invention, there are at least 2 base differences between any two of said tag sequences.
  • the set of tags comprises at least 2, preferably at least 4, or at least 6, or at least 8, or at least 10, or at least 12, or at least 16, or at least 20, or all 24 tags selected from SEQ ID NOs: 1-24; more preferably, the set of tags comprises at least the tags of SEQ ID NOs: 1 and 2, or SEQ ID NOs: 3 and 4, or SEQ ID NOs: 5 and 6, or SEQ ID NOs: 7 and 8, or SEQ ID NOs: 9 and 10, or SEQ ID NOs: 11 and 12, or SEQ ID NOs: 13 and 14, or SEQ ID NOs: 15 and 16, or SEQ ID NOs: 17 and 18, or SEQ ID NOs: 19 and 20, or SEQ ID NOs: 21 and 22, or SEQ ID NOs: 23 and 24, or a combination of any two or more thereof.
  • the tag of the invention is used to tag a cap adaptor, the sequences of the two strands of the cap adaptor being set forth in SEQ ID NO: 25 and SEQ ID NO: 26, respectively.
  • another aspect provides the use of the DNA tag of the invention for preparing a tag cap adaptor and/or for constructing and sequencing a paired end tag library.
  • the tag of the present invention is used to tag the cap adaptor whose two strands have the sequences of SEQ ID NO: 25 and SEQ ID NO: 26, respectively, thereby preparing the tag cap adaptor of the present invention.
  • the DNA tags of the invention can also be used in the preparation of kits for the preparation of tag cap adapters and/or for the construction and sequencing of paired end tag libraries.
  • a tag cap adaptor having the following structure: a first strand 5'-ACAGCAG(N)5 or 5'-phos-ACAGCAG(N)5 and a second strand 5'-phos-(N')5CTGCTGTAC, wherein (N)5 is a tag sequence selected from SEQ ID NOs: 1-24 and (N')5 is its complementary sequence.
  • two kinds of cap adaptor can be used to construct a paired end library, namely the EcoP15I cap adaptor and the LMP cap adaptor; the 5' ends of both strands of the EcoP15I cap adaptor are phosphorylated, whereas only one strand of the LMP cap adaptor is phosphorylated at its 5' end.
  • another aspect provides the use of the tag cap adaptor of the invention for constructing and sequencing a paired end tag library.
  • the tag cap adaptor of the present invention can also be used to prepare kits for constructing and sequencing paired end tag libraries.
  • another aspect provides a kit comprising a set of tags of the invention or a tag cap adaptor of the invention.
  • the kit of the invention may further comprise other reagents, for example the cap adaptor whose two strands have the sequences of SEQ ID NO: 25 and SEQ ID NO: 26, respectively.
  • another aspect provides the use of the kit of the invention for constructing and sequencing a paired end tag library.
  • a method of constructing and sequencing a paired end tag library of a DNA sample comprising the steps of:
  • fragmenting the DNA sample, wherein the fragmented DNA fragments have a length of 1000-4000 bp; preferably, the fragmentation method is selected from nebulization, sonication, and the Hydroshear method;
  • cyclizing the DNA fragments carrying the tag cap adaptor using a biotinylated intermediate linker; optionally, size-selecting the ligation product, preferably by a method selected from pulsed-field gel electrophoresis, sucrose or cesium chloride gradient sedimentation, and size exclusion chromatography; preferably, the sequences of the two strands of the intermediate linker are SEQ ID NO: 27 and SEQ ID NO: 28, respectively;
  • cleaving the resulting cyclized ligation product, preferably by sonication or enzymatic cleavage, the latter for example by the restriction endonuclease method or the nick translation-exonuclease method;
  • enriching the DNA fragments obtained in the preceding step using streptavidin magnetic beads, and ligating the P1 linker and the P2 linker to the 5' end and the 3' end of the enriched DNA fragments, respectively;
  • designing primers according to the sequences of the P1 linker and the P2 linker, and amplifying the DNA fragments obtained in the preceding step to form the paired end tag library;
  • sequencing the product of the preceding step using a high-throughput sequencing technique such as the ABI SOLiD sequencing platform, in which one paired end region (TAG1) is sequenced using a set of sequencing primers that pair specifically with the P1 linker, and the other paired end region (TAG2) is sequenced using a set of sequencing primers that pair specifically with a sequence consisting of the intermediate linker and part of the tag cap adaptor, so as to obtain the sequences at both ends of the fragmented DNA fragments;
  • processing the sequencing data obtained in the preceding step, wherein the sequencing reads are assigned to their respective DNA samples using the tag sequence, and the complete DNA sequence of each sample is then assembled from the end sequences of that sample's fragmented DNA through sequence overlap and linkage relationships.
  • the DNA sample is a prokaryotic or eukaryotic DNA sample.
  • the resulting cyclized ligation product is cleaved using enzymatic cleavage.
  • the digestion method comprises a restriction endonuclease method and a nick translation-exonuclease method; wherein the restriction endonuclease method utilizes a restriction endonuclease, such as EcoP15I.
  • the two strands of the intermediate sequencing link consisting of an intermediate linker and a partial tag cap linker are respectively
  • the tag sequences in the sequencing reads are removed after the sequencing reads have been assigned to their respective DNA samples.
  • Another aspect of the invention provides a paired end tag library made using the methods provided herein.
  • when library samples are sequenced using the DNA tag of the present invention, only two separate sequencing reactions are needed to achieve mixed sequencing of multiple paired end libraries on one chip partition.
  • the result after sequencing is:
  • the first 5 bases of the second paired end (TAG2) are the tag sequence, which is used to determine the sample source of the read;
  • the remaining sequence of TAG2 and the entire sequence of the first paired end (TAG1) come from the sample and can be used for further information analysis.
  • a 5-10 base tag sequence is introduced at the cap adaptor ligation step during SOLiD paired-end library construction, so that only two independent sequencing reactions (one for TAG1 and one for TAG2 together with the tag) are needed to sequence multiple paired-end libraries simultaneously in a single chip partition of the SOLiD sequencer, which speeds up high-throughput sequencing and reduces time and reagent costs.
  • Figure 1 shows the structure and sequencing procedure of the tagged paired-end library constructed in Example 2 of the present invention, wherein the boxed portion is the introduced tag sequence, Primer denotes a sequencing primer, cycle denotes a sequencing cycle, and LA denotes the intermediate linker; and Figure 2 shows a correlation analysis between the expected and actual values of the sequencing statistics in Example 3 of the present invention.
  • Detailed description of the invention
  • the tag cap adaptors were synthesized using the tag sequences in Table 2.
  • the synthesis of the Index1 LMP cap adaptor from the Index1 sequence in Table 2 is taken as an example.
  • the preparation process is as follows.
  • Annealing was performed on a thermocycler (96-well GeneAmp® PCR System 9700) according to the following program. [Table: temperature and reaction time]
  • taking genomic DNA from human blood monocytes as an example, a 2×50 bp paired end tag library was prepared according to the method of the present invention for constructing a paired end tag library; the specific construction process is as follows:
  • the relevant protein solutions, buffers, linkers and primer sequences in this example are from the Applied Biosystems SOLiD™ Mate-Paired Library Oligo kit (4400468) or the Applied Biosystems SOLiD™ Long Mate-Paired Library Construction kit (4443474).
  • Antisense strand 5'-phos-GGCCAAGGCGGATGTACGGT-3' (SEQ ID NO: 28).
  • the biotin-labeled target fragments are enriched with Dynal streptavidin magnetic beads (Invitrogen) and end-polished, and the P1 and P2 linkers are then ligated to them.
  • the sequence of the P1 linker is as follows:
  • the sequence of the P2 linker is as follows:
  • the sense strand 5'-phos-AGAGAATGAGGAACCCGGGGCAGTT-3' (SEQ ID NO: 35), antisense strand 5'-CTGCCCCGGGTTCCTCATTCTCT-3' (SEQ ID NO: 36).
  • PCR primer 1 5'-CCACTACGCCTCCGCTTTCCTCTCTATG-3' (SEQ ID NO: 37), PCR primer 2 5'-CTGCCCCGGGTTCCTCATTCT-3' (SEQ ID NO: 38).
  • the Index1-8 libraries obtained in step 8) were checked using an ABI 3730 sequencer; for each library, at least 48 randomly selected positive clones were tested.
  • SEQ ID NOS: 39-60 show some of the positive clone sequences of the Index1 library obtained using the 3730 sequencer.
  • each of SEQ ID NOS: 39-60 includes the intermediate sequencing linker sequence, i.e., cap linker sequence (CTGCTGTAC) + intermediate linker sequence (CGTACATCCGCCTTGGCCGT) + cap linker sequence (ACAGCAG),
  • the complete sequence of which is CTGCTGTACCGTACATCCGCCTTGGCCGTACAGCAG (SEQ ID NO: 29).
  • the 5 bases immediately downstream of the intermediate sequencing linker sequence are the Index1 sequence (GGAAG) that the cap linker was expected to introduce during library construction.
  • FIG. 1 shows a block diagram of a tagged paired end library for SOLiD sequencing constructed in accordance with the method of the present embodiment, wherein the block portion is a tag sequence.
  • the TAG2 sequencing primer and the intermediate sequencing linker sequence are paired, and the first 5 bp of the generated sequencing result (TAG2 sequence) is the introduced tag sequence, so that the sample source of the library product can be determined according to the tag sequence.
  • the sequence following the tag sequence will be used for information analysis.
  • Example 3 Mixed sequencing of paired end tag libraries
  • the Index1-4 libraries constructed according to the procedure of Example 2 were mixed in equimolar amounts to give library 9; the Index5-8 libraries constructed according to the procedure of Example 2 were mixed in equimolar amounts to give library 10; and the Index1-8 libraries constructed according to the procedure of Example 2 were mixed in equimolar amounts to give library 11.
  • emulsion PCR was carried out in accordance with the emPCR standard procedure (Applied Biosystems SOLiD™ 3 System Templated Bead Preparation Guide P/N4407421B) provided by Applied Biosystems to obtain magnetic beads carrying template strands.
  • the DNA on the magnetic beads is modified at the 3' end so that it can be immobilized on a SOLiD sequencing chip. Sequencing was then performed in accordance with the SOLiD 3 sequencer operating procedure (Applied Biosystems SOLiD™ 3 System Instrument Operation Guide P/N4407430B) provided by Applied Biosystems. Specifically, sequencing was performed on the ABI SOLiD 3 sequencing platform, with each mixed library occupying 1/4 of a sequencing chip (the total TAG yield per library was expected to be 50 M pairs).
  • the sequence of tags in the sequencing data can be utilized to determine the source of the sample for the data. After the sample source has been distinguished, the tag sequence at the 5' end of TAG2 is removed and the remaining sequence is applied for subsequent analysis. Through the sequence overlap and linkage, the complete target nucleic acid is spliced from the sequences at both ends of the interrupted DNA fragment.
  • Table 3 gives the statistical analysis of the sequencing results, showing the total yield of the three libraries and the distribution of the number of reads detected for each tag.
  • the full length of TAG1 and bases 6-50 of TAG2 were aligned using BioScope v1.2 software.
  • the results showed that, in the tag libraries constructed according to the method of the present invention, the mappable data accounted for about 70% of the raw data (in this experiment, the mean was 70.32% and the standard deviation 1.11%). This is consistent with the mapping rate (70-72%) obtained with the conventional library construction method, demonstrating that the tag library construction method of the present invention does not significantly affect the alignment efficiency of the TAGs.
  • the sequencing results of the library are reproducible and reliable.
  • Industrial applicability
  • the isolated DNA tag and the isolated oligonucleotide of the present invention can be effectively applied to the construction or sequencing of the paired terminal tag library of the sample DNA, and the obtained library is of good quality and the sequencing result is accurate.
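The clone check described for Example 2 above (find the intermediate sequencing linker in a clone read, then read the 5 bases immediately downstream) can be sketched in Python. This is a minimal illustration, not the applicants' analysis pipeline: the function name `tag_after_linker` and the example read are hypothetical, and an exact string match is assumed, whereas real Sanger reads may need error-tolerant matching.

```python
# Intermediate sequencing linker as described in the text (SEQ ID NO: 29):
# cap linker + intermediate linker + cap linker.
CAP_LINKER_A = "CTGCTGTAC"
INTERMEDIATE_LINKER = "CGTACATCCGCCTTGGCCGT"
CAP_LINKER_B = "ACAGCAG"
SEQUENCING_LINKER = CAP_LINKER_A + INTERMEDIATE_LINKER + CAP_LINKER_B
TAG_LENGTH = 5


def tag_after_linker(clone_seq):
    """Return the 5 bases directly downstream of the linker, or None.

    Hypothetical helper: assumes an exact, error-free linker match.
    """
    seq = clone_seq.upper()
    pos = seq.find(SEQUENCING_LINKER)
    if pos < 0:
        return None  # linker not present in this read
    start = pos + len(SEQUENCING_LINKER)
    tag = seq[start:start + TAG_LENGTH]
    return tag if len(tag) == TAG_LENGTH else None


# Made-up clone read: flanking bases + linker + Index1 tag + insert.
example_read = "TTTT" + SEQUENCING_LINKER + "GGAAG" + "ACGTACGT"
print(tag_after_linker(example_read))
```

For a clone from the Index1 library, the returned tag would be expected to equal GGAAG; any other value would point to a wrong or damaged tag.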

Landscapes

  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Immunology (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • Physics & Mathematics (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Description

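DNA tags and uses thereof
Priority information
This application claims priority to and the benefit of patent application No. 201110071176.2, filed with the State Intellectual Property Office of China on March 24, 2011, the entire content of which is incorporated herein by reference.
Technical field
The present invention relates to the field of gene sequencing, and in particular to DNA tags and uses thereof.
Background art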
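Mate-paired library sequencing refers to constructing a large-fragment library and obtaining the sequences at both ends of large-span (2-10 kb) fragments. Sequences obtained from both ends of such a span play a very important role in the assembly of large or complex genomes and in the discovery of genomic structural variation, and are particularly suitable for de novo sequencing projects.
At present, the template beads of different paired end library samples are usually deposited in separate partitions of the sequencing chip and then sequenced, and the library samples are finally distinguished by partition. However, partitioning the sequencing chip occupies chip space and reduces the utilization of a single chip, which lowers data output. Table 1 compares the total bead deposition per chip and the expected data output for partitioned chips of different specifications. In addition, the fourth-version sequencing chip of the existing SOLiD sequencer can be divided into at most 8 regions; that is, each sequencing chip can sequence at most 8 paired end library samples, which falls far short of the growing demand for sequencing throughput.
Table 1: Comparison of total single-chip bead deposition and expected data output for partitioned chips of different specifications
[Table 1 appears as an image in the original publication: Figure imgf000003_0001]
DNA tag library sequencing maximizes sequencing capacity, simplifies sample preparation, and enables mixed sequencing of multiple DNA samples. At present, in the SOLiD system, mixed sequencing of multiple samples on a single-partition chip relies on Barcode technology (SOLiD™ System Barcoding).
However, the DNA tags currently available for sequencing paired end libraries still need improvement.
Summary of the invention
The present invention aims to solve at least one of the technical problems existing in the prior art.
As used herein, "tag cap adaptor" (indexed-cap adaptor) refers to a cap adaptor that carries a tag. As used herein, "paired end tag library" (mate-paired indexed library) refers to a paired end library constructed using a tag cap adaptor. Since the tag cap adaptor used in the paired end tag library contains a tag specific for the sample, the molecules in the paired end tag library can be matched one-to-one to samples by the tag sequence.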
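To this end, the first aspect of the invention proposes a set of isolated DNA tags consisting of the nucleotides set forth in SEQ ID NO: 1-24. As used herein, the terms "tag" (index) and "DNA tag" (DNA index) are used interchangeably and refer to a stretch of double-stranded oligonucleotide having a particular base sequence. Herein, a DNA tag is an oligonucleotide duplex of 5 bp in length, the sequence of one of its strands being selected from SEQ ID NO: 1-24 (shown in Table 2 below). In particular, in this specification, when a tag is represented by a sequence identifier (SEQ ID NO:), the sequence of one strand of the tag is the sequence indicated by that identifier. For example, when a tag is described by SEQ ID NO: 1, the sequence of one strand of the tag is SEQ ID NO: 1. In addition, all DNA sequences in this specification are given in the 5' to 3' direction. When used to construct and sequence a paired end library, the DNA tags according to an embodiment of the present invention keep the sequencer workflow running normally and the information analysis workflow simple, and their sequences satisfy the following: (1) the tag sequences of all samples in a mixed sequencing run are of equal length; (2) the combination of tag sequences in a mixed sequencing run ensures that all four fluorescent dye signals can be read in the same SOLiD sequencing cycle; (3) any two tag sequences in a mixed sequencing run differ by at least two bases, so that misreading a single base does not confuse the sample source; (4) the last base of each tag sequence is G. The inventors surprisingly found that constructing sequencing libraries with the DNA tags according to an embodiment of the present invention effectively reduces bias in data output and allows multiple sequencing libraries to be distinguished accurately.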
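Criteria (1), (2) and (4) above lend themselves to a mechanical check on whichever pool of tags is to be mixed. The Python sketch below is an illustration under stated assumptions rather than the applicants' procedure: it uses the standard SOLiD two-base colour code, and it assumes that the tag is read directly after the final G of the cap sequence ACAGCAG, so that the first colour encodes the G-to-tag transition. Under that assumption the computed colours match the fluorescent signal sequences listed for the tags in Table 2. The function names `colors` and `check_pool` are hypothetical.

```python
# SOLiD two-base colour code: each colour (0-3) encodes a pair of adjacent bases.
COLOR = {
    "AA": 0, "CC": 0, "GG": 0, "TT": 0,
    "AC": 1, "CA": 1, "GT": 1, "TG": 1,
    "AG": 2, "GA": 2, "CT": 2, "TC": 2,
    "AT": 3, "TA": 3, "CG": 3, "GC": 3,
}


def colors(tag, preceding="G"):
    """Colour-space string for a tag; `preceding` is the base read just
    before the tag (assumed here to be the last G of ACAGCAG)."""
    seq = preceding + tag
    return "".join(str(COLOR[seq[i:i + 2]]) for i in range(len(tag)))


def check_pool(tags):
    """Check criteria (1), (2) and (4) for a pool of tags sequenced together."""
    coded = [colors(t) for t in tags]
    return {
        "equal_length": len({len(t) for t in tags}) == 1,
        "ends_in_G": all(t.endswith("G") for t in tags),
        # every sequencing cycle (column) must show all four colours
        "four_colors_every_cycle": all(
            set(column) == set("0123") for column in zip(*coded)
        ),
    }


POOL = ["GGAAG", "TCATG", "CAAGG", "ATACG"]  # Index1-4 from Table 2
print([colors(t) for t in POOL])
print(check_pool(POOL))
```

A pool of fewer than four tags cannot show all four colours in one cycle, so the balance check is only meaningful for pools of four or more tags.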
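Table 2: DNA tag sequences
SEQ ID NO:  Name     Base sequence  Fluorescent signal sequence
1           Index1   GGAAG          00202
2           Index2   TCATG          12131
3           Index3   CAAGG          31020
4           Index4   ATACG          23313
5           Index5   AGCCG          22303
6           Index6   TTCGG          10230
7           Index7   GCCAG          03012
8           Index8   CACTG          31121
9           Index9   TTCTG          10221
10          Index10  CGCCG          33303
11          Index11  GACAG          02112
12          Index12  ACCGG          21030
13          Index13  CGATG          33231
14          Index14  ACCCG          21003
15          Index15  TCGAG          12322
16          Index16  GGTGG          00110
17          Index17  ACGTG          21311
18          Index18  TCAGG          12120
19          Index19  GCCCG          03003
20          Index20  CCTAG          30232
21          Index21  AAGAG          20222
22          Index22  GATGG          02310
23          Index23  TAATG          13031
24          Index24  CACCG          31103

According to a second aspect, the invention proposes a set of isolated oligonucleotides. According to an embodiment of the invention, the isolated oligonucleotide has a first strand and a second strand, wherein the first strand is 5'-ACAGCAG(N)5 or 5'-phos-ACAGCAG(N)5 and the second strand is 5'-phos-(N')5CTGCTGTAC, wherein (N)5 is the isolated DNA tag of claim 1 and (N')5 is the complementary sequence of (N)5. The set of isolated oligonucleotides can thus be used as tag linkers to introduce the DNA tags according to an embodiment of the invention into a paired end tag library. These oligonucleotides may therefore also be referred to as "tag cap adaptors" (indexed-cap adaptors), i.e., cap adaptors that carry a tag.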
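The two-strand structure above can be written out for any given tag. The following Python sketch is illustrative only: it assumes that (N')5, written 5' to 3' like every other sequence in the specification, is the reverse complement of (N)5, and the function names and the `phosphorylate_first` parameter are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence given 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def indexed_cap_adaptor(tag, phosphorylate_first=False):
    """Return (first_strand, second_strand) of a tag cap adaptor.

    first strand : 5'-ACAGCAG(N)5, optionally 5'-phosphorylated
    second strand: 5'-phos-(N')5CTGCTGTAC
    Assumption: (N')5 is the reverse complement of the tag, written 5'->3'.
    """
    prefix = "5'-phos-" if phosphorylate_first else "5'-"
    first = prefix + "ACAGCAG" + tag + "-3'"
    second = "5'-phos-" + reverse_complement(tag) + "CTGCTGTAC" + "-3'"
    return first, second


first, second = indexed_cap_adaptor("GGAAG")  # Index1
print(first)
print(second)
```

Note that CTGCTGT is itself the reverse complement of ACAGCAG, so under this reading the two strands anneal over the cap sequence and the tag and leave a two-base AC overhang on the second strand.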
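Further, in a third aspect, the invention proposes the use of the DNA tags according to an embodiment of the invention in the construction or sequencing of a paired end tag library. The term "paired end tag library" (mate-paired indexed library) as used herein refers to a paired end library constructed using a tag cap adaptor. Since the tag cap adaptor used in the paired end tag library contains a tag specific for the sample, the molecules in the paired end tag library can be matched one-to-one to samples by the tag sequence.
In a fourth aspect, the invention proposes the use of the DNA tags according to an embodiment of the invention in the preparation of a kit, the kit being for the construction or sequencing of a paired end tag library.
In a fifth aspect, the invention proposes the use of the tag cap adaptor according to an embodiment of the invention in the construction or sequencing of a paired end tag library.
In a sixth aspect, the invention proposes the use of the tag cap adaptor according to an embodiment of the invention in the preparation of a kit, the kit being for the construction or sequencing of a paired end tag library.
According to a seventh aspect, the invention proposes a kit for constructing a paired end tag library, comprising a tag cap adaptor according to an embodiment of the invention, i.e., a set of isolated oligonucleotides, each isolated oligonucleotide having a first strand and a second strand, wherein the first strand is 5'-ACAGCAG(N)5 or 5'-phos-ACAGCAG(N)5 and the second strand is 5'-phos-(N')5CTGCTGTAC, wherein (N)5 is the isolated DNA tag of claim 1 and (N')5 is the complementary sequence of (N)5.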
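In an eighth aspect, the invention proposes a method of constructing a paired end tag library. According to an embodiment of the invention, the method comprises the steps of: fragmenting a DNA sample to obtain DNA fragments; ligating the DNA fragments to a DNA tag linker to obtain DNA fragments ligated to the DNA tag linker, the DNA tag linker comprising the set of isolated DNA tags described above, preferably a tag cap adaptor according to an embodiment of the invention; cyclizing the DNA fragments ligated to the DNA tag linker with a biotinylated intermediate linker to obtain a cyclized product, the two strands of the intermediate linker preferably having the nucleotide sequences set forth in SEQ ID NO: 27 and SEQ ID NO: 28, respectively; cleaving the cyclized product to obtain a cleaved cyclized product; enriching the target fragment from the cleaved cyclized product; ligating linkers to both ends of the target fragment to obtain a ligation product; and amplifying the ligation product by PCR to obtain an amplification product, which constitutes the paired end tag library. With this method, DNA tags can be efficiently introduced into the constructed library, so that multiple samples can be sequenced simultaneously and the source of each sample finally distinguished by its DNA tag. In addition, the inventors found that the sequencing data obtained with the DNA tags of the present invention are highly stable and reproducible. Moreover, when library samples are sequenced using the DNA tags of the present invention, only 2 independent sequencing reactions are needed to achieve mixed sequencing of multiple paired end libraries on one chip partition.
Those skilled in the art will understand that, according to embodiments of the invention, one or more of the DNA tags of the present invention may be used in constructing a paired end tag library. When several DNA tags are used, a sequencing library can be constructed separately for each tag and the libraries combined at the end; alternatively, after the tags have been ligated separately, the samples can be combined at a step that can be processed jointly, for example the PCR step, which makes it convenient to construct a paired end tag library containing multiple samples.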
才艮据本发明的实施例, 可以用于上述方法的 DNA样品的来源不受特别限制。 例如可以 为原核生物或真核生物 DNA样品。根据本发明的实施例,对 DNA样品进行片段化的方法, 也不受特别限制, 根据本发明的实施例, 可以通过选自雾化法, 超声法和 Hydroshear法的 至少一种进行, 优选利用 Hydroshear法将所述 DNA样品片段化。 根据本发明的实施例, 将 DNA样品片段化, 所得到的 DNA片段的长度并不受特别限制, 根据具体的实施例, DNA 片段的长度为 1000-4000bp。 由此, 可以进一步提高构建测序文库以及后续测序的效率。
根据本发明的实施例,在利用生物素化的中间接头将所述连接 DNA标签接头的 DNA 片段进行环化之前, 进一步包括将所述连接 DNA标签接头的 DNA片段进行片段选择的步 骤。 例如, 可以利用选自脉冲凝胶电泳、 蔗糖或氯化铯梯度沉降和分子排阻层析的至少一 种进行所述片段选择。 根据具体的实施例, 可以利用脉冲凝胶电泳进行所述片段选择。 根 据本发明的实施例, 所选择的连接 DNA标签接头的 DNA片段的长度为 1500-2000 bp。 由 此, 可以进一步提高构建测序文库以及后续测序的效率。 根据本发明的实施例, 将所述环 化产物进行断裂之前, 进一步包括将环化产物中未环化的连接 DNA标签接头的 DNA片段 进行消化的步骤。 根据本发明的实施例, 可以利用 Plasmidsafe核酸酶进行所述消化。
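According to embodiments of the invention, the means of cleaving the circularized products is not particularly limited. The circularized products may be cleaved by at least one method selected from sonication and enzymatic digestion. According to a specific embodiment, the cleavage may be carried out by at least one method selected from the restriction endonuclease method and the nick translation-exonuclease method.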
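According to embodiments of the invention, the type of adaptor ligated to the two ends of the target fragments is not particularly limited and can be chosen to suit the sequencing system used. According to embodiments of the invention, a P1 adaptor and a P2 adaptor are ligated to the two ends of the target fragment, wherein the two strands of the P1 adaptor have the nucleotide sequences shown in SEQ ID NO: 33 and SEQ ID NO: 34, respectively, and the two strands of the P2 adaptor have the nucleotide sequences shown in SEQ ID NO: 35 and SEQ ID NO: 36, respectively. According to a specific example, the P1 adaptor and the P2 adaptor are ligated to the 5' end and the 3' end of the target fragment, respectively. Further, PCR amplification may be carried out with primers having the nucleotide sequences shown in SEQ ID NO: 37 and SEQ ID NO: 38, respectively. According to a specific example, the PCR amplification is emulsion PCR using magnetic beads that carry an oligonucleotide specifically recognizing the P1 adaptor. The constructed sequencing library can thus be applied conveniently to the ABI SOLiD sequencing platform.
In a ninth aspect, the invention provides a mate-paired indexed library obtained by the method described above. With this library, mate-pair (bidirectional end) sequencing can be performed effectively, and the nucleic acid sequence information obtained can be classified accurately by sample of origin by reading the tag sequence.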
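In a tenth aspect, the invention provides a method of determining sequence information of a DNA sample, comprising the following steps: constructing a mate-paired indexed library of the DNA sample according to the method described above; and sequencing the mate-paired indexed library to determine the sequence information of the DNA sample. In this way mate-pair (bidirectional end) sequencing can be performed effectively, and the nucleic acid sequence information obtained can be classified accurately by sample of origin by reading the tag sequence.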
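According to embodiments of the invention, the platform used for sequencing is not particularly limited. According to embodiments of the invention, the mate-paired indexed library may be sequenced on the ABI SOLiD sequencing platform. Preferably, the library is sequenced using sequencing primers that pair specifically with the sequences at the two ends of the library. Nucleic acid sequencing can thus be carried out conveniently on a high-throughput sequencing platform.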
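Accordingly, in one aspect, the invention provides a set of DNA tags, each tag being an oligonucleotide 5 bp in length, one strand of which has a sequence selected from SEQ ID NO: 1-24 (see Table 2). In a preferred embodiment of the invention, any two of the tag sequences differ by at least 2 bases. In a preferred embodiment of the invention, a set of tags comprises at least 2, preferably at least 4, or at least 6, or at least 8, or at least 10, or at least 12, or at least 16, or at least 20, or all 24 of the tags selected from SEQ ID NO: 1-24; more preferably, a set of tags comprises at least the tags shown in SEQ ID NO: 1 and 2, or SEQ ID NO: 3 and 4, or SEQ ID NO: 5 and 6, or SEQ ID NO: 7 and 8, or SEQ ID NO: 9 and 10, or SEQ ID NO: 11 and 12, or SEQ ID NO: 13 and 14, or SEQ ID NO: 15 and 16, or SEQ ID NO: 17 and 18, or SEQ ID NO: 19 and 20, or SEQ ID NO: 21 and 22, or SEQ ID NO: 23 and 24, or any combination of two or more of these pairs. In a preferred embodiment, the tags of the invention are used to label a cap adaptor whose two strands have the sequences shown in SEQ ID NO: 25 and SEQ ID NO: 26, respectively.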
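The requirement that any two tags differ by at least 2 bases can be checked mechanically. The following is a minimal sketch, not the applicants' procedure: only GGAAG (Index1, from Example 1) is taken from the text, and the other three tags are hypothetical placeholders, since the SEQ ID NO: 1-24 sequences of Table 2 are not reproduced in this section.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length tags differ."""
    if len(a) != len(b):
        raise ValueError("tags must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(tags):
    """Smallest Hamming distance over all pairs of tags in the set."""
    return min(hamming(a, b) for a, b in combinations(tags, 2))


# GGAAG is Index1 from Example 1; the remaining tags are hypothetical.
example_tags = ["GGAAG", "CCTTG", "AAGGC", "TTCCA"]

if __name__ == "__main__":
    ok = min_pairwise_distance(example_tags) >= 2
    print("tag set satisfies the 2-base difference rule:", ok)
```

A minimum distance of 2 means a single sequencing error in the tag can never turn one valid tag into another; such a read is simply left unrecognized.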
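In another aspect, the invention provides use of the DNA tags of the invention for preparing tag cap adaptors and/or for constructing and sequencing mate-paired indexed libraries. Preferably, the tags of the invention are used to label a cap adaptor whose two strands have the sequences SEQ ID NO: 25 and SEQ ID NO: 26, respectively, thereby producing the tag cap adaptor of the invention. The DNA tags of the invention may also be used to prepare a kit for preparing tag cap adaptors and/or for constructing and sequencing mate-paired indexed libraries.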
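In another aspect, the invention provides a tag cap adaptor having the following structure:
First strand: 5'-ACAGCAG(N)5 or 5'-phos-ACAGCAG(N)5
Second strand: 5'-phos-(N')5CTGCTGTAC
where (N)5 denotes a tag sequence selected from SEQ ID NO: 1-24 and (N')5 denotes the complementary sequence of that tag sequence.
Two kinds of cap adaptor can be used to construct mate-paired libraries, the EcoP15I cap adaptor and the LMP cap adaptor. In the EcoP15I cap adaptor the 5' ends of both strands are phosphorylated, whereas in the LMP cap adaptor the 5' end of only one strand is phosphorylated. In another aspect, the invention provides use of the tag cap adaptors of the invention for constructing and sequencing mate-paired indexed libraries. The tag cap adaptors of the invention may also be used to prepare a kit for constructing and sequencing mate-paired indexed libraries.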
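The two strands of a tag cap adaptor follow directly from the tag. The sketch below is illustrative only: the function names are hypothetical, and it assumes that (N')5 is the reverse complement of (N)5 written 5' to 3', which is consistent with the Index1 LMP oligonucleotides given in Example 1 (SEQ ID NO: 31 and 32).

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def tag_cap_adaptor(tag: str, phosphorylate_both: bool = False):
    """Return (first_strand, second_strand) for a 5-base tag.

    phosphorylate_both=False gives the LMP-style adaptor (only the
    second strand carries a 5' phosphate); True gives the
    EcoP15I-style adaptor (both strands 5'-phosphorylated).
    """
    tag = tag.upper()
    if len(tag) != 5 or set(tag) - set("ACGT"):
        raise ValueError("tag must be 5 bases of A/C/G/T")
    first_prefix = "5'-phos-" if phosphorylate_both else "5'-"
    first = first_prefix + "ACAGCAG" + tag + "-3'"
    second = "5'-phos-" + reverse_complement(tag) + "CTGCTGTAC" + "-3'"
    return first, second


if __name__ == "__main__":
    for strand in tag_cap_adaptor("GGAAG"):  # Index1 from Example 1
        print(strand)
```

For the Index1 tag GGAAG, the LMP-style call reproduces the two oligonucleotides listed in Example 1.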
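In another aspect, the invention provides a kit comprising a set of tags of the invention or the tag cap adaptors of the invention. In a preferred embodiment, the kit further comprises other reagents, for example a cap adaptor whose two strands have the sequences SEQ ID NO: 25 and SEQ ID NO: 26, respectively.
In another aspect, the invention provides use of the kit of the invention for constructing and sequencing mate-paired indexed libraries.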
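In another aspect, the invention provides a method of constructing and sequencing a mate-paired indexed library of a DNA sample, comprising the following steps:
1) fragmenting the sample DNA, wherein the fragments are preferably 1000-4000 bp long after fragmentation, and the fragmentation method is preferably selected from nebulization, sonication and the Hydroshear method;
2) constructing the mate-paired indexed library of the DNA sample by the following steps:
a. preparing tag cap adaptors from the tags of the invention and ligating them to both ends of the fragmented DNA, or ligating the tag cap adaptors of the invention to both ends of the fragmented DNA, thereby forming DNA fragments carrying tag cap adaptors, wherein one tag cap adaptor is used for each DNA sample;
b. circularizing the DNA fragments carrying tag cap adaptors with a biotinylated internal adaptor; optionally, size-selecting the circularized ligation products, preferably by a method selected from pulsed-field gel electrophoresis, sucrose or cesium chloride gradient sedimentation, and size-exclusion chromatography; preferably, the two strands of the internal adaptor have the sequences SEQ ID NO: 27 and SEQ ID NO: 28, respectively;
c. cleaving the resulting circularized ligation products, preferably by sonication or enzymatic digestion, for example the restriction endonuclease method or the nick translation-exonuclease method;
d. enriching the DNA fragments obtained in step c) with streptavidin magnetic beads, and ligating a P1 adaptor and a P2 adaptor to the 5' end and the 3' end, respectively, of the enriched DNA fragments;
e. designing primers based on the sequences of the P1 and P2 adaptors and amplifying the DNA fragments obtained in step d) to form the mate-paired indexed library;
3) optionally, mixing the mate-paired indexed libraries of samples that used different tag cap adaptors in equimolar amounts to obtain a pooled mate-paired indexed library;
4) amplifying the mate-paired indexed library of step 2) or the pooled mate-paired indexed library of step 3) onto P1 magnetic beads by emulsion PCR, the beads carrying an immobilized P1 adaptor primer;
5) sequencing the product of step 4) by a high-throughput sequencing technology, for example on the ABI SOLiD sequencing platform, wherein one mate-pair region (TAG1) is sequenced with a set of sequencing primers that pair specifically with the P1 adaptor and the other mate-pair region (TAG2) is sequenced with a set of sequencing primers that pair specifically with the sequence formed by the internal adaptor and part of the tag cap adaptor, thereby obtaining the sequences of the two ends of the fragmented DNA;
6) processing the sequencing data obtained in step 5), wherein the tag sequences are used to assign the sequencing reads to the different DNA samples, and the complete DNA sequence of each sample is then assembled from the end sequences of DNA fragments from the same sample by means of sequence overlap and linkage relationships.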
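Step 3) above pools the indexed libraries in equimolar amounts. The arithmetic can be sketched as follows; the library names, concentrations and target amount are hypothetical illustration values, not data from this document.

```python
def equimolar_volumes(conc_nM: dict, fmol_each: float) -> dict:
    """Volume (in microlitres) of each library that contains fmol_each.

    Uses the identity 1 nM == 1 fmol per microlitre, so
    volume = amount / concentration.
    """
    volumes = {}
    for name, conc in conc_nM.items():
        if conc <= 0:
            raise ValueError(f"concentration of {name} must be positive")
        volumes[name] = fmol_each / conc
    return volumes


if __name__ == "__main__":
    # Hypothetical library concentrations in nM.
    libraries = {"Index1": 50.0, "Index2": 25.0, "Index3": 40.0}
    for name, vol in equimolar_volumes(libraries, fmol_each=100.0).items():
        print(f"{name}: {vol:.2f} ul")
```

Each library then contributes the same number of molecules to the pool, so the expected share of reads per tag is simply 1 divided by the number of pooled libraries.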
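In a preferred embodiment of the invention, the DNA sample is a prokaryotic or eukaryotic DNA sample.
In a preferred embodiment of the invention, the circularized ligation products are cleaved by enzymatic digestion. Preferably, the enzymatic digestion comprises the restriction endonuclease method and the nick translation-exonuclease method, wherein the restriction endonuclease method uses a Type III restriction endonuclease, for example EcoP15I.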
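In a preferred embodiment of the invention, the two strands of the intermediate sequencing adaptor, which is formed by the internal adaptor and part of the tag cap adaptor, are
5'-CTGCTGTACCGTACATCCGCCTTGGCCGTACAGCAG-3' (SEQ ID NO: 29) and
5'-CTGCTGTACGGCCAAGGCGGATGTACGGTACAGCAG-3' (SEQ ID NO: 30).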
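The relationship between these two strands and their building blocks can be verified with a few lines of code. This is only a consistency check under the composition stated in Example 2 (cap adaptor part CTGCTGTAC, then the sense strand of the internal adaptor, then cap adaptor part ACAGCAG); the variable names are illustrative.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


CAP_PART_LEFT = "CTGCTGTAC"                      # from the cap adaptor second strand
INTERNAL_ADAPTOR_SENSE = "CGTACATCCGCCTTGGCCGT"  # SEQ ID NO: 27
CAP_PART_RIGHT = "ACAGCAG"                       # from the cap adaptor first strand

SEQ_ID_29 = "CTGCTGTACCGTACATCCGCCTTGGCCGTACAGCAG"
SEQ_ID_30 = "CTGCTGTACGGCCAAGGCGGATGTACGGTACAGCAG"

assembled = CAP_PART_LEFT + INTERNAL_ADAPTOR_SENSE + CAP_PART_RIGHT

if __name__ == "__main__":
    print("assembled equals SEQ ID NO: 29:", assembled == SEQ_ID_29)
    print("SEQ ID NO: 30 is its reverse complement:",
          reverse_complement(SEQ_ID_29) == SEQ_ID_30)
```

Both checks hold, so SEQ ID NO: 30 is the exact reverse complement of SEQ ID NO: 29 and the two strands form a fully paired 36 bp duplex.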
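In a preferred embodiment of the invention, after the sequencing reads have been assigned to the different DNA samples, the tag sequence is removed from the reads.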
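Another aspect of the invention provides a mate-paired indexed library produced by the method of the invention.
When library samples are sequenced using the DNA tags of the invention, only two independent sequencing reactions are needed to sequence a pool of multiple mate-paired libraries on a single chip partition. In particular, for the 50+50 bp mate-pair sequencing format, the result is as follows: the first 5 bases of the second mate-pair tag (TAG2) are the tag sequence, which identifies the sample of origin; the remainder of TAG2 and the whole of the first mate-pair tag (TAG1) come from the sample and can be used for further analysis.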
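In-depth studies of short-read sequencing have shown that a read length of 25-30 bp meets the bioinformatic requirements of resequencing studies, and that once the read length reaches 100 bp or more, de novo genome assembly and sequencing become feasible (Whiteford N, Haslam N, Weber G, et al. An analysis of the feasibility of short read sequencing. Nucleic Acids Res, 2005, 33: e171). Using the first 5 bases of TAG2 as the tag sequence that marks the sample of origin therefore does not hinder further analysis.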
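Assigning reads to samples by the first 5 bases of TAG2 and then stripping the tag can be sketched as below. This is a simplified illustration, not the applicants' software: it treats reads as base-space strings (SOLiD instruments actually report color-space data), the function name is hypothetical, and apart from GGAAG (Index1) the tag and read sequences are made up.

```python
from collections import defaultdict

TAG_LENGTH = 5  # the first 5 bases of TAG2 carry the sample tag


def demultiplex(tag2_reads, tag_to_sample):
    """Group TAG2 reads by sample and strip the leading tag.

    Reads whose first TAG_LENGTH bases match no known tag (for
    example because of a sequencing error in the tag) are collected
    under the key "unassigned".
    """
    by_sample = defaultdict(list)
    for read in tag2_reads:
        tag, insert = read[:TAG_LENGTH], read[TAG_LENGTH:]
        sample = tag_to_sample.get(tag, "unassigned")
        by_sample[sample].append(insert)
    return dict(by_sample)


if __name__ == "__main__":
    tags = {"GGAAG": "sample_1", "CCTTG": "sample_2"}  # CCTTG is hypothetical
    reads = ["GGAAGACGTACGTAC", "CCTTGTTTTGGGGCC", "GGAACAAAAAAAAAA"]
    for sample, inserts in demultiplex(reads, tags).items():
        print(sample, inserts)
```

For a 50 bp TAG2 read this leaves 45 bases of sample sequence, which is still above the 25-30 bp range cited above for resequencing.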
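According to embodiments of the invention, the invention achieves at least one of the following technical effects:
A) A tag sequence of 5-10 bases is introduced at the cap adaptor ligation step of SOLiD mate-paired library construction, so that only two independent sequencing reactions (one for TAG1, the other for TAG2 and the tag) are needed to sequence a pool of multiple mate-paired libraries within a single chip partition of the SOLiD sequencer, which speeds up high-throughput sequencing and reduces time and reagent costs.
B) Multiple mate-paired indexed library samples can be pooled and sequenced on a single chip without physically partitioning the chip, which improves utilization of the chip area, increases the data yield of a single sequencing reaction, and lowers the cost per unit of data.
C) Up to 48 mate-paired libraries can be pooled and sequenced together, which further improves sequencing efficiency compared with the prior-art chip partitioning method (which can pool only 8 mate-paired libraries).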
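Additional aspects and advantages of the invention will be set forth in part in the description that follows, and in part will become apparent from that description or be learned through practice of the invention.
Brief Description of the Drawings
The above and/or additional aspects and advantages of the invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the drawings, in which:
Figure 1 shows the structure and sequencing workflow of the tagged mate-paired library constructed in Example 2, where the boxed portion is the introduced tag sequence, "Primer" denotes a primer, "Cycle" denotes a cycle, and "LA." denotes the internal adaptor; and
Figure 2 shows the correlation analysis between the expected and observed values of the sequencing statistics in Example 3.
Detailed Description of the Invention
The solutions of the invention are explained below with reference to examples. Those skilled in the art will understand that the following examples serve only to illustrate the invention and should not be regarded as limiting its scope. Where no specific technique or condition is indicated in an example, the techniques or conditions described in the literature of the field (for example J. Sambrook et al., Molecular Cloning: A Laboratory Manual, 3rd edition, translated by Huang Peitang et al., Science Press) or the product instructions were followed.
Example 1: Preparation of tag cap adaptors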
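In this example, tag cap adaptors were synthesized using the tag sequences in Table 2.
Taking the synthesis of the Index1 LMP cap adaptor from the Index1 sequence in Table 2 as an example, the preparation procedure is as follows.
a) Synthesize the two oligonucleotides required for the tag cap adaptor:
Index1 LMP cap adaptor-a: 5'-ACAGCAGGGAAG-3' (SEQ ID NO: 31);
Index1 LMP cap adaptor-b: 5'-phos-CTTCCCTGCTGTAC-3' (SEQ ID NO: 32).
b) Dilute the dry-powder or film-form oligonucleotides to 125 µM.
c) Mix the 125 µM Index1 cap adaptor-a solution, the 125 µM Index1 cap adaptor-b solution and 5X T4 ligase buffer (Invitrogen) at a volume ratio of 2:2:1, and dispense into PCR tubes at 100 µl per tube.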
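d) Anneal and hybridize on a thermal cycler (96-well GeneAmp® PCR System 9700) according to the following program.
Temperature    Time
95 °C          5 min
72 °C          5 min
65 °C          5 min
60 °C          5 min
50 °C          3 min
40 °C          3 min
30 °C          3 min
20 °C          3 min
10 °C          3 min
4 °C
e) Remove the annealed double-stranded Index1 cap adaptor and store it at -20 °C until use.
Example 2: Construction of a 2x50 bp mate-paired indexed library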
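In this example, a 2x50 bp mate-paired indexed library was prepared from genomic DNA of human blood mononuclear cells according to the library construction method of the invention. The construction procedure is as follows:
2.1 Main reagents
Unless otherwise noted, the protein solutions, buffers, adaptors and primer sequences used in this example are from the Applied Biosystems SOLiD™ Mate-Paired Library Oligo kit (4400468) or the Applied Biosystems SOLiD™ Long Mate-Paired Library Construction kit (4443474).
2.2 Experimental procedure
The procedure follows the Applied Biosystems SOLiD™ 4 System Library Preparation Guide, P/N 4445673, section 3.1, which is incorporated herein by reference.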
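1) Check the DNA sample: run at least 20 µg on a 1% agarose gel for 40 minutes (130 V) to check DNA integrity; the sample must be free of RNA and protein contamination.
2) Shear the sample DNA into fragments of 1000-4000 bp by the Hydroshear method and end-polish the fragments.
3) Ligate the tagged LMP cap adaptors prepared in Example 1 (Index1-8 LMP cap adaptors) to the DNA fragments; use one tag LMP cap adaptor per sample.
4) Select ligation product fragments of 1.5-2 kb by pulsed-field gel electrophoresis.
5) Using T4 DNA ligase, circularize the size-selected ligation products with the biotinylated internal adaptor, and digest the uncircularized DNA molecules with Plasmidsafe nuclease (Epicentre). The internal adaptor has the following sequences:
Sense strand: 5'-phos-CGTACATCCGCCTTGGCCGT-3' (SEQ ID NO: 27),
Antisense strand: 5'-phos-GGCCAAGGCGGATGTACGGT-3' (SEQ ID NO: 28).
6) Cleave the circularized ligation products by nick translation-exonuclease digestion.
7) Enrich the biotin-labeled target fragments with Dynal streptavidin magnetic beads (Invitrogen), end-polish them, and then ligate the P1 and P2 adaptors to them.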
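The P1 adaptor has the following sequences:
Sense strand: 5'-CCACTACGCCTCCGCTTTCCTCTCTATGGGCAGTCGGTGAT-3' (SEQ ID NO: 33),
Antisense strand: [sequence missing in this text] (SEQ ID NO: 34).
The P2 adaptor has the following sequences:
Sense strand: 5'-phos-AGAGAATGAGGAACCCGGGGCAGTT-3' (SEQ ID NO: 35),
Antisense strand: 5'-CTGCCCCGGGTTCCTCATTCTCT-3' (SEQ ID NO: 36).
8) Amplify the ligation products obtained in step 7) by PCR with the following primers to obtain the library products (Index1-8 libraries):
PCR primer 1: 5'-CCACTACGCCTCCGCTTTCCTCTCTATG-3' (SEQ ID NO: 37),
PCR primer 2: 5'-CTGCCCCGGGTTCCTCATTCT-3' (SEQ ID NO: 38).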
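9) Check the Index1-8 libraries obtained in step 8) on an ABI 3730 sequencer, sequencing at least 48 randomly picked positive clones per library.
All clones of every library were found to contain the intended tag sequence, with no contamination by unintended tag sequences. Taking the positive clones of the Index1 library as an example, SEQ ID NO: 39-60 show part of the positive clone sequences of the Index1 library obtained on the 3730 sequencer. Every one of SEQ ID NO: 39-60 was found to contain the intermediate sequencing adaptor sequence (that is, cap adaptor sequence (CTGCTGTAC) + internal adaptor sequence (CGTACATCCGCCTTGGCCGT) + cap adaptor sequence (ACAGCAG), giving the complete sequence CTGCTGTACCGTACATCCGCCTTGGCCGTACAGCAG (SEQ ID NO: 29)), and the 5 bases immediately downstream of the intermediate sequencing adaptor sequence were in every case the Index1 sequence (GGAAG) that library construction was intended to introduce via the cap adaptor.
It follows that all library products obtained by the above construction procedure contain the expected intermediate sequencing adaptor sequence and tag sequence, and that the procedure is reproducible and reliable.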
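The clone check described above (find the intermediate sequencing adaptor, then read the 5 bases downstream) can be sketched in code. The function name and the flanking sequence in the usage example are hypothetical; the actual clone sequences SEQ ID NO: 39-60 are not reproduced in this section.

```python
INTERMEDIATE_ADAPTOR = "CTGCTGTACCGTACATCCGCCTTGGCCGTACAGCAG"  # SEQ ID NO: 29


def tag_after_adaptor(clone_seq: str, tag_length: int = 5):
    """Return the bases immediately downstream of the intermediate
    sequencing adaptor, or None if the adaptor is absent or the read
    ends before a full tag has been seen."""
    seq = clone_seq.upper()
    pos = seq.find(INTERMEDIATE_ADAPTOR)
    if pos < 0:
        return None
    start = pos + len(INTERMEDIATE_ADAPTOR)
    tag = seq[start:start + tag_length]
    return tag if len(tag) == tag_length else None


if __name__ == "__main__":
    # Hypothetical clone read: flank + adaptor + Index1 tag + flank.
    clone = "TTGACC" + INTERMEDIATE_ADAPTOR + "GGAAG" + "ACGTAC"
    print(tag_after_adaptor(clone))
```

A clone passes the check when the returned tag equals the tag intended for its library, GGAAG in the case of the Index1 library.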
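Figure 1 shows the structure of the tagged mate-paired library constructed by the method of this example for SOLiD sequencing, where the boxed portion is the tag sequence. During SOLiD sequencing, the TAG2 sequencing primer pairs with the intermediate sequencing adaptor sequence, and the first 5 bp of the resulting read (the TAG2 sequence) are the introduced tag sequence. The sample of origin of a library product can therefore be determined from this tag sequence, and the sequence following the tag is used for downstream analysis.
Example 3: Pooled sequencing of mate-paired indexed libraries
3.1 Main reagents
Unless otherwise noted, the reagents used in this example are from Applied Biosystems.
3.2 Experimental procedure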
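1) Pooling the libraries
The Index1-4 libraries constructed as in Example 2 were mixed in equimolar amounts to give library 9; the Index5-8 libraries constructed as in Example 2 were mixed in equimolar amounts to give library 10; and the Index1-8 libraries constructed as in Example 2 were mixed in equimolar amounts to give library 11.
2) Amplification
Libraries 9-11 were each subjected to emulsion PCR (emPCR) following the standard emPCR protocol provided by Applied Biosystems (Applied Biosystems SOLiD™ 3 System Templated Bead Preparation Guide, P/N 4407421B) to obtain magnetic beads carrying template strands.
3) Sequencing
The DNA on the beads was modified at its 3' end so that it could be immobilized on the SOLiD sequencing chip. Sequencing was then carried out following the SOLiD 3 instrument operating procedure provided by Applied Biosystems (Applied Biosystems SOLiD™ 3 System Instrument Operation Guide, P/N 4407430B). Specifically, sequencing was performed on the ABI SOLiD 3 platform, with each pooled library occupying one quarter of a sequencing chip (the expected total TAG yield per pooled library was 50 M pairs).
4) Data processing
Because different samples correspond to different tag sequences, the tag sequence in the sequencing data can be used to determine the sample of origin. Once the reads had been assigned to samples, the tag sequence at the 5' end of TAG2 was removed and the remaining sequence was used for downstream analysis. The complete target nucleic acid was assembled from the sequences of the two ends of the sheared DNA fragments by means of sequence overlap and linkage relationships.
The statistical analysis of the sequencing results in Table 3 shows the total yield of the 3 libraries and the distribution of the detected counts of each tag. The full length of TAG1 and bases 6-50 of TAG2 were aligned using the software Bioscope V1.2.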
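Table 3: Sequencing data statistics of the pooled libraries
[Table 3 is provided as an image (imgf000013_0001) in the original publication.]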
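The statistics in Table 3 show that the detected percentages of the tags within each of the 3 pooled libraries are uniform and that there is no contamination by unintended tags. Because errors inevitably occur during sequencing, any TAG with a sequencing error in the tag position was simply treated as unidentified. The raw error rate of the SOLiD sequencing platform is currently about 3%. In this example, the proportion of unidentified TAGs is essentially consistent with that value, which demonstrates the reliability of the method of the invention.
The full length of TAG1 and bases 6-50 of TAG2 were aligned using the software Bioscope V1.2. The results show that, in the indexed libraries constructed by the method of the invention, alignable data account for about 70% of the raw data (in this experiment, mean 70.32%, standard deviation 1.11%). This agrees with the alignment rate obtained for libraries built by the conventional method (70-72%), showing that the indexed library construction method of the invention does not significantly affect the alignment efficiency of the TAGs.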
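Figure 2 shows the result of a correlation analysis between the expected and observed tag detection percentages for the three data sets above, with the expected tag detection percentage on the horizontal axis and the observed tag detection percentage on the vertical axis. Ideally, the expected and observed values would satisfy Y = X. In this example, the linear fit of the two is Y = 0.953X + 0.254 with a correlation coefficient of R² = 0.997; that is, the observed values deviate from the expected values by less than 5%. This shows that the sequencing results of the indexed libraries of the invention are highly reproducible and reliable.
Industrial Applicability
The isolated DNA tags and isolated oligonucleotides of the invention can be applied effectively to the construction or sequencing of mate-paired indexed libraries of sample DNA, and the libraries obtained are of good quality and give accurate sequencing results.
Although specific embodiments of the invention have been described in detail, those skilled in the art will understand that, in light of all the teachings disclosed, various modifications and substitutions can be made to those details, and that all such changes fall within the scope of protection of the invention. The full scope of the invention is given by the appended claims and any equivalents thereof.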
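In this specification, descriptions referring to the terms "one embodiment", "some embodiments", "illustrative embodiment", "example", "specific example" or "some examples" mean that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. Such expressions in this specification do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.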

Claims

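1. A set of isolated DNA tags, consisting of the nucleotides shown in SEQ ID NO: 1-24.
2. A set of isolated oligonucleotides, the isolated oligonucleotides having a first strand and a second strand,
wherein
the first strand is 5'-ACAGCAG(N)5 or 5'-phos-ACAGCAG(N)5, and
the second strand is 5'-phos-(N')5CTGCTGTAC,
where (N)5 is the isolated DNA tag of claim 1 and (N')5 is the complementary sequence of (N)5.
3. Use of the DNA tags of claim 1 in constructing or sequencing a mate-paired indexed library.
4. Use of the DNA tags of claim 1 in preparing a kit, the kit being for constructing or sequencing a mate-paired indexed library.
5. Use of the oligonucleotides of claim 2 in constructing or sequencing a mate-paired indexed library.
6. Use of the oligonucleotides of claim 2 in preparing a kit, the kit being for constructing or sequencing a mate-paired indexed library.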
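7. A method of constructing a mate-paired indexed library, characterized by comprising the following steps:
fragmenting a DNA sample to obtain DNA fragments;
ligating the DNA fragments to DNA tag adaptors to obtain DNA fragments carrying DNA tag adaptors, the DNA tag adaptors comprising the set of isolated DNA tags of claim 1;
circularizing the DNA fragments carrying DNA tag adaptors with a biotinylated internal adaptor to obtain circularized products;
cleaving the circularized products to obtain cleaved circularized products;
enriching target fragments from the cleaved circularized products;
ligating adaptors to both ends of the target fragments to obtain ligation products; and
amplifying the ligation products by PCR to obtain amplification products, the amplification products constituting the mate-paired indexed library.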
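8. The method of claim 7, characterized in that the DNA sample is a prokaryotic or eukaryotic DNA sample.
9. The method of claim 7, wherein the fragmentation is carried out by at least one method selected from nebulization, sonication and the Hydroshear method.
10. The method of claim 7, characterized in that the enrichment is carried out with streptavidin magnetic beads.
11. The method of claim 7, characterized in that the DNA fragments are 1000-4000 bp long.
12. The method of claim 7, characterized in that the DNA tag adaptors are the set of isolated oligonucleotides of claim 2.
13. The method of claim 7, characterized by further comprising, before circularizing the DNA fragments carrying DNA tag adaptors with the biotinylated internal adaptor, a step of size-selecting the DNA fragments carrying DNA tag adaptors.
14. The method of claim 13, characterized in that the size selection is carried out by at least one method selected from pulsed-field gel electrophoresis, sucrose or cesium chloride gradient sedimentation, and size-exclusion chromatography.
15. The method of claim 14, characterized in that the size selection is carried out by pulsed-field gel electrophoresis.
16. The method of claim 15, characterized in that the DNA fragments carrying DNA tag adaptors are 1500-2000 bp long.
17. The method of claim 7, characterized in that the two strands of the internal adaptor have the nucleotide sequences shown in SEQ ID NO: 27 and SEQ ID NO: 28, respectively.
18. The method of claim 7, characterized in that the circularization is carried out with T4 DNA ligase.
19. The method of claim 7, characterized by further comprising, before cleaving the circularized products, a step of digesting the uncircularized DNA fragments carrying DNA tag adaptors among the circularized products.
20. The method of claim 19, characterized in that the digestion is carried out with Plasmidsafe nuclease.
21. The method of claim 7, characterized in that the circularized products are cleaved by at least one method selected from sonication and enzymatic digestion.
22. The method of claim 21, characterized in that the cleavage is carried out by at least one method selected from the restriction endonuclease method and the nick translation-exonuclease method.
23. The method of claim 7, characterized in that a P1 adaptor and a P2 adaptor are ligated to the two ends of the target fragment, wherein the two strands of the P1 adaptor have the nucleotide sequences shown in SEQ ID NO: 33 and SEQ ID NO: 34, respectively, and the two strands of the P2 adaptor have the nucleotide sequences shown in SEQ ID NO: 35 and SEQ ID NO: 36, respectively.
24. The method of claim 7, characterized in that the P1 adaptor and the P2 adaptor are ligated to the 5' end and the 3' end of the target fragment, respectively.
25. The method of claim 24, characterized in that the PCR amplification uses primers having the nucleotide sequences shown in SEQ ID NO: 37 and SEQ ID NO: 38, respectively.
26. The method of claim 23, characterized in that the PCR amplification is emulsion PCR, wherein the emulsion PCR uses magnetic beads carrying an oligonucleotide that specifically recognizes the P1 adaptor.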
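27. A mate-paired indexed library obtained by the method of any one of claims 7-26.
28. A method of determining sequence information of a DNA sample, comprising the following steps:
constructing a mate-paired indexed library of the DNA sample by the method of any one of claims 7-26; and
sequencing the mate-paired indexed library to determine the sequence information of the DNA sample.
29. The method of claim 28, characterized in that the mate-paired indexed library is sequenced on the ABI SOLiD sequencing platform.
30. The method of claim 28, characterized in that the mate-paired indexed library is sequenced using sequencing primers that pair specifically with the sequences at the two ends of the library.
31. A kit for constructing a mate-paired indexed library, comprising:
a set of isolated oligonucleotides, the isolated oligonucleotides having a first strand and a second strand,
wherein the first strand is 5'-ACAGCAG(N)5 or 5'-phos-ACAGCAG(N)5 and the second strand is 5'-phos-(N')5CTGCTGTAC,
where (N)5 is the isolated DNA tag of claim 1 and (N')5 is the complementary sequence of (N)5.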
PCT/CN2012/072970 2011-03-24 2012-03-23 DNA tags and uses thereof WO2012126398A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201110071176.2 2011-03-24
CN2011100711762A CN102690809B (zh) 2011-03-24 2011-03-24 DNA tags and their use in constructing and sequencing mate-paired indexed libraries

Publications (1)

Publication Number Publication Date
WO2012126398A1 true WO2012126398A1 (zh) 2012-09-27

Family

ID=46856545

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/072970 WO2012126398A1 (zh) 2011-03-24 2012-03-23 DNA tags and uses thereof

Country Status (3)

Country Link
CN (1) CN102690809B (zh)
HK (1) HK1175196A1 (zh)
WO (1) WO2012126398A1 (zh)

Also Published As

Publication number Publication date
HK1175196A1 (en) 2013-06-28
CN102690809A (zh) 2012-09-26
CN102690809B (zh) 2013-12-04

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12760061

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12760061

Country of ref document: EP

Kind code of ref document: A1