WO2017054302A1 - Sequencing library and its preparation and application - Google Patents

Sequencing library and its preparation and application

Publication number
WO2017054302A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
nucleotide sequence
sequencing
stranded
double
Prior art date
Application number
PCT/CN2015/095380
Other languages
English (en)
French (fr)
Inventor
阮珏
王开乐
Original Assignee
中国农业科学院深圳农业基因组研究所
Priority date
Filing date
Publication date
Application filed by 中国农业科学院深圳农业基因组研究所
Publication of WO2017054302A1
Priority to US15/903,911 (US11702690B2)

Classifications

    • C CHEMISTRY; METALLURGY
      • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
        • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
          • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
            • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
              • C12Q1/6806 Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
              • C12Q1/6844 Nucleic acid amplification reactions
                • C12Q1/686 Polymerase chain reaction [PCR]
              • C12Q1/6869 Methods for sequencing
          • C12Q2521/00 Reaction characterised by the enzymatic activity
            • C12Q2521/50 Other enzymatic activities
              • C12Q2521/501 Ligase
              • C12Q2521/514 Mismatch repair protein
          • C12Q2600/00 Oligonucleotides characterized by their use
            • C12Q2600/16 Primer sets for multiplex assays
            • C12Q2600/166 Oligonucleotides used as internal standards, controls or normalisation probes
        • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
          • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
            • C12N15/09 Recombinant DNA-technology
              • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
                • C12N15/1034 Isolating an individual clone by screening libraries
                  • C12N15/1065 Preparation or screening of tagged libraries, e.g. tagged microorganisms by STM-mutagenesis, tagged polynucleotides, gene tags
                  • C12N15/1068 Template (nucleic acid) mediated chemical library synthesis, e.g. chemical and enzymatical DNA-templated organic molecule synthesis, libraries prepared by non ribosomal polypeptide synthesis [NRPS], DNA/RNA-polymerase mediated polypeptide synthesis
                  • C12N15/1086 Preparation or screening of expression libraries, e.g. reporter assays
                  • C12N15/1089 Design, preparation, screening or analysis of libraries using computer algorithms
                  • C12N15/1093 General methods of preparing gene libraries, not provided for in other subgroups
          • C12N2310/00 Structure or type of the nucleic acid
            • C12N2310/30 Chemical structure
              • C12N2310/35 Nature of the modification
                • C12N2310/351 Conjugate
                  • C12N2310/3517 Marker; Tag
          • C12N2320/00 Applications; Uses
            • C12N2320/10 Applications; Uses in screening processes
      • C40 COMBINATORIAL TECHNOLOGY
        • C40B COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
          • C40B40/00 Libraries per se, e.g. arrays, mixtures
            • C40B40/04 Libraries containing only organic compounds
              • C40B40/06 Libraries containing nucleotides or polynucleotides, or derivatives thereof
          • C40B50/00 Methods of creating libraries, e.g. combinatorial synthesis
            • C40B50/06 Biochemical methods, e.g. using enzymes or whole viable microorganisms

Definitions

  • the invention relates to a sequencing library and its preparation and use.
  • One method of adding a tag attaches the tag to the DNA by polymerase chain reaction, with the tag placed at the end of a specific primer.
  • If a polymerase chain reaction error occurs while the tag is being added, the error is difficult to remove in subsequent experiments, which limits the detection of extremely low-frequency sites.
  • A major limitation of exogenous labeling of DNA is that it can only target small genomes or a small number of target genes; comprehensive detection across the entire genome cannot be achieved. Because the labeling method must detect both identical and complementary tags in order to correct the positive and negative strands of DNA, it requires a very high sequencing depth and is therefore difficult to apply to large genomes.
  • A previously developed method for constructing a DNA library (Patent Application No. 201310651462.5) cyclizes a single strand of a DNA molecule and performs rolling circle amplification, so that copies of the same molecule are concatenated in tandem. By separately determining the sequence of two consecutive copies, errors introduced during library construction and sequencing are corrected, which effectively reduces the sequencing error rate and increases data utilization.
  • the present invention provides a sequencing library and its preparation and application.
  • The target sequence referred to in the present invention is the sequence that serves as the insert of the sequencing library provided by the present invention, that is, the sequencing target.
  • linker sequence refers to a sequence designed by the present invention for ligation to one or both ends of a target sequence to effect cyclization of the target sequence.
  • The linker sequence in the present invention can be designed as a single linker or as a double linker; a double linker is a duplex formed by annealing two at least partially complementary single-stranded nucleotide sequences.
  • The linker sequence can be designed by a person skilled in the art according to the selected enzymes and reaction conditions, based on conventional techniques in the art. In the prepared double strand, the denaturation temperature of the region between the two strand gaps should be higher than the reaction temperature of the strand displacement enzyme used.
  • the linker sequence can be, for example, SEQ ID NO: 1 and/or SEQ ID NO: 2.
  • When designed as a double linker, the two linker sequences can be annealed to yield the double-linker sequence, and the 5' end of the ligation product of the target sequence and the linker sequence should be phosphorylated.
  • The ligation sequence (linking sequence) refers to the sequence, in the linear double-stranded nucleotide sequence obtained from the double-stranded circular nucleotide sequence prepared by the present invention, that joins the target sequences at its two ends. At least a portion of the linking sequence has a reverse complementary sequence.
  • The sequencing length of the sequencer referred to in the present invention is defined as follows: for paired-end sequencing, it equals the sum of the lengths of the two reads; for single-end sequencing, it equals the length of the single read.
  • Nicking enzyme: in general, when a restriction enzyme binds to its DNA recognition sequence, both strands of the DNA are hydrolyzed simultaneously.
  • Each such endonuclease has two hydrolysis domains, which act on the two strands of DNA and catalyze the hydrolysis reaction on each. A nicking endonuclease, by contrast, hydrolyzes only one strand of the double-stranded DNA, nicking that strand rather than cutting the duplex; the nick carries a 3'-hydroxyl and a 5'-phosphate group.
  • The site to be cleaved referred to in the present invention is, for a single-stranded nucleotide sequence, a site in the nucleotide sequence that can be cleaved; for a double-stranded nucleotide sequence, it is a site on one strand that can be cleaved while the corresponding site on the other strand cannot.
  • gap refers to a sequence non-contiguous region in a double-stranded nucleotide sequence, which may be one or more bases in length.
  • the sequencing library refers to a collection of DNA fragments for sequencing containing a target sequence and other sequences (e.g., a sequencing linker).
  • the invention relates to a single-stranded circular nucleotide sequence having at least one site to be cleaved.
  • the single-stranded circular nucleotide sequence can have one site to be cleaved.
  • the site to be cleaved can be a dUTP base, an 8-oxo-dGTP or a nicking endonuclease recognition site.
  • a double-stranded circular nucleotide sequence is involved in which each strand has at least one site to be cleaved or one gap.
  • the double-stranded circular nucleotide sequence has a gap in one strand and at least one site to be cleaved in the other strand.
  • the site to be cleaved is in the 5' direction of the gap.
  • both strands of the double-stranded circular nucleotide sequence have at least one site to be cleaved.
  • the double-stranded circular nucleotide sequence has a gap in each of the two strands.
  • the closest distance between the gap or site to be cleaved on one strand of the double-stranded circular nucleotide sequence and the gap or site to be cleaved on the other strand is preferably greater than 6 bases.
  • the site to be cleaved can be a dUTP base, an 8-oxo-dGTP or a nicking endonuclease recognition site.
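The spacing preference stated above (closest distance between the gap or site to be cleaved on one strand and that on the other strand preferably greater than 6 bases) can be checked with a short sketch. This is only an illustration under stated assumptions: positions are 0-based coordinates on a shared circular axis, and the function names `circular_distance` and `sites_well_separated` are hypothetical, not part of the patent.

```python
def circular_distance(pos_a: int, pos_b: int, circle_len: int) -> int:
    """Shortest distance, in bases, between two positions on a circle."""
    d = abs(pos_a - pos_b) % circle_len
    return min(d, circle_len - d)


def sites_well_separated(sites_strand_1, sites_strand_2, circle_len, min_gap=6):
    """True if every site on strand 1 is more than min_gap bases
    away from every site on strand 2 (assumed shared coordinates)."""
    return all(
        circular_distance(a, b, circle_len) > min_gap
        for a in sites_strand_1
        for b in sites_strand_2
    )
```

For example, on a 100-base circle, sites at positions 2 and 98 are only 4 bases apart once the wrap-around is taken into account, so they would not satisfy the preference.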
  • a nucleotide sequence comprising a ligation sequence and two target sequences, in which the two ends of the ligation sequence are each linked to a target sequence, and the two target sequences are direct (same-orientation) repeats.
  • the linker sequence has a reverse complementary region.
  • At the end not joined to the ligation sequence, at least one of the target sequences may be further linked to another sequence that is at least partially identical to part of the ligation sequence.
  • the target sequence length is less than the sequencing length of the sequencer.
  • the sum of the length of the other sequence and the target sequence is less than the sequencing length of the sequencer.
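The length constraints above follow directly from the definition of sequencing length given earlier (sum of both reads for paired-end sequencing, the single read length otherwise). A minimal sketch, assuming read lengths are supplied in bases and using the hypothetical names `sequencer_read_span` and `insert_fits`:

```python
def sequencer_read_span(read1_len: int, read2_len: int = 0) -> int:
    """Sequencing length: read 1 plus read 2 for paired-end runs,
    read 1 alone (read2_len=0) for single-end runs."""
    return read1_len + read2_len


def insert_fits(target_len: int, read1_len: int, read2_len: int = 0,
                other_len: int = 0) -> bool:
    """True if the target (plus any additional flanking sequence)
    is strictly shorter than the sequencing length."""
    return target_len + other_len < sequencer_read_span(read1_len, read2_len)
```

Under these assumptions, a 200-base target fits a 2 x 150 paired-end run but not a single 150-base read.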
  • a nucleotide sequence is composed of a ligation sequence and target sequences joined to both ends of the ligation sequence, and the two target sequences are direct repeats.
  • the target sequence length is less than the sequencing length of the sequencer.
  • a nucleotide sequence is composed of a ligation sequence and two target sequences, in which the two ends of the ligation sequence are each linked to a target sequence, and the target regions of the two target sequences are direct repeats.
  • a sequencing library comprising any of the nucleotide sequences described above.
  • a linker sequence which has at least one cleavable site in the case of joining other nucleotides at both ends.
  • the linker sequence is 6-100 bp.
  • the linker sequence can be a double stranded nucleotide sequence.
  • One aspect of the present invention relates to the use of the above linker sequence for the preparation of the above-described single-stranded circular nucleotide sequence, the above-described double-stranded circular nucleotide sequence, the above nucleotide sequences or the above-described sequencing library.
  • One aspect of the present invention relates to the use of the above-described single-stranded circular nucleotide sequence for the preparation of the above-described double-stranded circular nucleotide sequence, the above nucleotide sequences or the above-described sequencing library.
  • One aspect of the present invention relates to the use of the above-described double-stranded circular nucleotide sequence for the preparation of the above nucleotide sequences or the above-described sequencing library.
  • One aspect of the present invention relates to the use of the above nucleotide sequences in the preparation of the above-described sequencing library.
  • a method for producing the above single-stranded circular nucleotide sequence, comprising:
  • ligating the target sequence to a linker sequence comprising a cleavable base, a nicking-enzyme cleavage site or a gap, to obtain a double-stranded or single-stranded ligated sequence; when the resulting ligated sequence is double-stranded, it is denatured into single strands and then subjected to single-strand cyclization; when it is single-stranded, single-strand cyclization is performed directly.
  • a method for producing the above double-stranded circular nucleotide sequence, comprising:
  • subjecting the single-stranded circular nucleotide sequence to complementary strand synthesis using a primer that is not phosphorylated at the 5' end, to form a double-stranded circular structure having a complementary-strand gap; or directly cyclizing, as a double strand, the double-stranded sequence obtained by ligating the target sequence to a linker sequence containing a cleavable base, a nicking-enzyme cleavage site or a gap.
  • a method for producing the above nucleotide sequence comprising:
  • the double-stranded circular nucleotide sequence is cleaved to obtain a double-stranded circular nucleotide sequence having a gap in both strands, and strand displacement amplification is then performed on the double-stranded circular nucleotide sequence having a gap in both strands.
  • a method for producing the above-described sequencing library comprises the steps of: subjecting the above nucleotide sequence to end repair and A-tailing, ligating a sequencing adaptor, and performing a PCR reaction.
  • the gene sequencing includes, but is not limited to, genomic DNA sequencing, target fragment capture sequencing (e.g., exon capture sequencing), sequencing of single-stranded DNA fragments, sequencing of fossil DNA, or sequencing of free DNA in body fluids (e.g., blood, urine, saliva).
  • a sequencing method is involved, the sequencing method comprising the step of using the above-described sequencing library.
  • a sequencing kit comprising an end-repair and A-tailing reagent, DNA ligase, a linker sequence, a single-strand cyclization reagent, a nicking enzyme and a strand displacement reagent.
  • Embodiments of the present invention provide a method for preparing the above-described sequencing library, the method comprising:
  • the cyclized sequence obtained in the step (2) is subjected to complementary strand synthesis using a primer that is not phosphorylated at the 5' end, forming a double-stranded circular structure with a complementary-strand gap (notch 1), see Figures 1 and 2; in the case of double-stranded cyclization, the product of cyclization is already a double-stranded circular structure, see Figure 3.
  • the double-stranded circular structure obtained in the step (3) is nicked: when the double-stranded circular structure was obtained by synthesizing a complementary strand after single-strand cyclization, a nicked gap (notch 2) is formed on the DNA strand carrying the cleavable site.
  • the closest distance between the nicked gap (notch 2) and the complementary-strand gap (notch 1) is preferably 6 or more bases, and the nicked gap is located in the 5' direction of the complementary-strand gap, see Figure 1. Further, the primer used for complementary strand synthesis may also contain a site to be cleaved, which after nicking forms a complementary-strand nicked gap (notch 3).
  • the complementary-strand nicked gap is preferably located in the 3' direction of the complementary-strand gap, see Figure 2; when the double-stranded circular structure is obtained by double-stranded cyclization, a nicked gap is formed on each of the two strands (notch 1 and notch 2 in the figure).
  • the second or third generation sequencing library construction can be carried out by using the nucleotide sequence obtained in the step (5). For example, after performing the terminal repair and the A tail on the nucleotide sequence obtained in the step (5), the sequencing linker is ligated, and then the PCR reaction is carried out to obtain a DNA sequencing library.
  • the linker sequence ligated after DNA fragmentation may be ligated to the 5' end, the 3' end or both ends of the target sequence, either in a single-stranded form or in a double-stranded form.
  • When the linker sequence is ligated to both ends of the target sequence, the linking sequence of the resulting single-stranded or double-stranded circular nucleotide sequence is formed by joining the linker sequences at the two ends, and the linking sequence has a reverse complementary region.
  • the resulting single-stranded circular nucleotide sequence or the linked sequence of the double-stranded circular nucleotide sequence is a linker sequence.
  • When the linker sequence is joined to the target sequence by single-stranded ligation, direct single-strand cyclization yields a single-stranded circular nucleotide sequence.
  • When the linker sequence is joined to the target sequence by double-stranded ligation, the double strand is first denatured into single strands and then cyclized to yield a single-stranded circular nucleotide sequence.
  • the linked linker sequence contains a cleavable site or a gap already exists.
  • the cyclization mode of the present invention may adopt a single chain cyclization mode or a double chain cyclization mode.
  • the primers used are not phosphorylated at the 5' end. The primers may contain bases capable of forming cleavable sites (e.g., dUTP, 8-oxo-dGTP) or a nicking endonuclease recognition site (e.g., 5'-GC↓TGAGG-3' for the nicking endonuclease Nb.BbvCI), or may contain neither. A primer may match a partial region of the linker sequence or a known sequence in the target sequence.
  • In double-stranded cyclization, since the cyclized DNA is already double-stranded, no complementary strand synthesis is required.
  • The gap can be generated in various ways. For example, one or more bases such as dUTP or 8-oxo-dGTP can be designed into the primer used for complementary strand synthesis or into the ligated linker sequence, and an enzyme that cleaves at dUTP or 8-oxo-dGTP (such as UDG or USER enzyme) is then used to form the gap. Alternatively, a nicking endonuclease recognition site can be designed into the primer used for complementary strand synthesis or into the ligated linker sequence, and a DNA nicking enzyme then nicks at that site to generate the gap.
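As an illustration of the nicking-site option, the following sketch scans one strand for the Nb.BbvCI example motif quoted above and reports where that strand would be nicked. It is a simplified sketch under stated assumptions: the sequence is treated as linear (wrap-around on a circle is ignored), the nick is taken to fall between GC and TGAGG as in the quoted motif, and the names `NICKED_STRAND_MOTIF`, `NICK_OFFSET`, `reverse_complement` and `find_nick_positions` are hypothetical.

```python
NICKED_STRAND_MOTIF = "GCTGAGG"  # motif as read 5'->3' on the strand that is nicked
NICK_OFFSET = 2                  # assumed: nick falls after the leading "GC"


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase/lowercase A/C/G/T sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_nick_positions(strand: str) -> list:
    """Return 0-based indices p such that the nick lies between
    strand[p-1] and strand[p]; linear scan only."""
    strand = strand.upper()
    positions = []
    start = strand.find(NICKED_STRAND_MOTIF)
    while start != -1:
        positions.append(start + NICK_OFFSET)
        start = strand.find(NICKED_STRAND_MOTIF, start + 1)
    return positions
```

To check the opposite strand of a duplex, the same function can be applied to `reverse_complement(strand)`; a strand that carries CCTCAGC is itself left intact, and its complement is the one nicked.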
  • DNA polymerases with strand displacement activity include, e.g., Bst DNA polymerase (large fragment), Bst 2.0 DNA polymerase, phi29 DNA polymerase, DisplaceAce™ DNA Polymerase, etc.
  • The product comprising the linker sequence and two copies of the target sequence is arranged as: 5'-partial linker sequence (may be absent)-target sequence-linker sequence-target sequence-partial linker sequence (may be absent)-3', as shown in Figures 1, 2 and 3.
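The layout just described can be made concrete with a small sketch that assembles such a product and then splits it at the full internal linker. The assumptions are labeled in the code: the 5' partial linker is modeled as a suffix of the linker and the 3' partial linker as a prefix, and the names `build_product` and `split_at_linker` are hypothetical helpers, not part of the patent.

```python
def build_product(target: str, linker: str,
                  five_prime_partial: int = 0,
                  three_prime_partial: int = 0) -> str:
    """Assemble 5'-[partial linker]-target-linker-target-[partial linker]-3'.
    Assumption: the 5' partial is the last k bases of the linker and the
    3' partial is the first k bases."""
    if not (0 <= five_prime_partial <= len(linker)
            and 0 <= three_prime_partial <= len(linker)):
        raise ValueError("partial linker length must be within the linker length")
    head = linker[len(linker) - five_prime_partial:]
    tail = linker[:three_prime_partial]
    return head + target + linker + target + tail


def split_at_linker(product: str, linker: str):
    """Split at the first full linker occurrence. Returns (upstream, downstream)
    flanks, each holding one target copy plus any partial linker, or None
    if the linker is not found."""
    idx = product.find(linker)
    if idx == -1:
        return None
    return product[:idx], product[idx + len(linker):]
```

The upstream flank ends with one copy of the target and the downstream flank begins with the other; trimming the optional partial linker pieces is deliberately left out of this sketch.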
  • the sequencing library provided by the embodiments of the present invention can be used for sequencing platforms such as second generation and third generation sequencing.
  • the linker sequence may contain a random base region, for example 2-30 bases long, which serves as a tag to distinguish between different target sequences.
  • DNA amplification technology based on the strand displacement reaction: certain DNA polymerases (for example, phi29 DNA polymerase and Bst DNA polymerase (large fragment)), on encountering a downstream DNA strand while extending a new strand, can continue the extension reaction while peeling off the downstream strand, producing free single-stranded DNA under isothermal conditions.
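To give a sense of how many target sequences such a random region can distinguish, the sketch below counts the possible tags for a given length and draws one at random. The 2-30 base range comes from the text; uniform sampling over A, C, G and T and the names `tag_space` and `random_tag` are assumptions made for illustration.

```python
import random

BASES = "ACGT"


def tag_space(n_random_bases: int) -> int:
    """Number of distinct tags available with n random bases (4**n)."""
    return 4 ** n_random_bases


def random_tag(n_random_bases: int, rng: random.Random) -> str:
    """Draw one random tag; length limited to the 2-30 base range in the text."""
    if not 2 <= n_random_bases <= 30:
        raise ValueError("random region expected to be 2-30 bases")
    return "".join(rng.choice(BASES) for _ in range(n_random_bases))
```

For instance, a 10-base random region gives 4^10 = 1,048,576 possible tags.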
  • DNA amplification based on strand displacement reactions does not require thermal denaturation.
  • the DNA amplification based on the strand displacement reaction includes, for example, strand displacement amplification, rolling circle amplification, multiple strand displacement amplification, and loop-mediated amplification.
  • the second generation sequencing method refers to sequencing by synthesis, a method for determining the sequence of DNA by capturing the label of each newly incorporated terminal base, including but not limited to Roche/454 FLX, Illumina/Solexa Genome Analyzer, and the Applied Biosystems SOLiD system.
  • the third-generation sequencing method refers to a single-molecule sequencing technology, that is, when DNA sequencing is performed, individual sequencing of each DNA molecule can be realized without PCR amplification.
  • These include, but are not limited to, single molecule fluorescent sequencing, represented by the Helicos SMS technology and the Pacific Biosciences SMRT technology, and nanopore sequencing.
  • DNA amplification errors and sequencing errors can be effectively removed to accurately detect mutations present on DNA molecules.
  • the linker sequence By ligating the linker sequence to the end of the small fragment of the DNA to be sequenced, and then denaturation of the chimera, the single-stranded target sequence and the linker sequence are ligated to the fragment DNA, followed by single-strand cyclization, followed by cyclized single-stranded DNA.
  • Complementary strand synthesis, nicking site nicking, and strand displacement enzyme chain replacement are independent of each other during the amplification process, and therefore the errors generated when replicating on the respective units are also independent.
  • the above-mentioned product is constructed by sequencing library, and the library is sequenced, and one or two repeating units are detected for each sequencing, and the sequences measured by the two repeating units are mutually confirmed, and the bases of the two repeating units are inconsistent. That is, the polymerase chain reaction error or sequencing error generated during the preparation of the library or during the sequencing process, and the consensus sequence is the original sequence.
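The mutual confirmation of the two repeating units can be sketched as a position-by-position comparison. This is a minimal illustration, assuming the two units have already been extracted and aligned to equal length; masking disagreements with "N" is a choice made here for the example, and the name `consensus_of_repeats` is hypothetical.

```python
def consensus_of_repeats(unit_a: str, unit_b: str, mask: str = "N") -> str:
    """Keep bases on which the two repeating units agree; replace
    disagreements (candidate amplification/sequencing errors) with mask."""
    if len(unit_a) != len(unit_b):
        raise ValueError("repeating units must be aligned to the same length")
    return "".join(a if a == b else mask for a, b in zip(unit_a, unit_b))
```

Positions reported as "N" are those where the two copies disagree and which the text treats as errors rather than true variants.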
  • The principle of the present invention is illustrated below using a sequencing error rate of $1/100$ (the error rate of second generation sequencing is $1/100$ to $1/1000$).
  • The probability that the same type of error occurs simultaneously at the same site of the two repeating units of one consensus sequence is $\tfrac{1}{3}\times(1/100)^{2}\approx 3\times10^{-5}$ (a base that is consistent across more repeating units has an even lower probability of error).
  • The probability of the same error occurring in two different consensus sequences is $\left(\tfrac{1}{3}\times(1/100)^{2}\right)^{2}\approx 1.1\times10^{-9}$ (or $9\times10^{-10}$ when the rounded value $3\times10^{-5}$ is squared). This method is therefore extremely effective in removing errors generated during library construction and sequencing, achieving the goal of precise sequencing.
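The arithmetic above can be reproduced in a few lines. This is only a sketch of the stated reasoning, assuming errors at the two repeating units are independent and that a miscalled base is equally likely to be any of the three wrong bases; the function names are hypothetical.

```python
def same_error_in_both_units(per_base_error: float) -> float:
    """Probability that both repeating units carry the same wrong base
    at one site: e * e * (1/3)."""
    return (1.0 / 3.0) * per_base_error ** 2


def same_error_in_two_consensus(per_base_error: float) -> float:
    """Probability that two independent consensus sequences carry the
    same error at one site."""
    return same_error_in_both_units(per_base_error) ** 2
```

With a per-base error rate of 1/100, the first function gives roughly 3.3e-5 and the second roughly 1.1e-9, in line with the orders of magnitude quoted in the text.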
  • Because the method uses double-nick strand-displacement amplification, each original DNA molecule can be amplified at most four-fold. This avoids the behavior seen in rolling circle amplification, where easily amplified sequences are amplified rapidly within a given time while hard-to-amplify regions are amplified slowly or not at all. The extreme imbalance of rolling circle amplification is thereby eliminated, giving effective and balanced coverage of the genome.
  • The method is compatible with target region capture (e.g., exon capture, target gene capture, target gene screening).
  • In the sequence composed of the adaptor sequence and two copies of the target sequence, the two copies made from the original DNA are joined in tandem and are mutually independent.
  • During target region capture, each molecule captured by a probe contains at least two direct-repeat units, so sequencing the captured molecules determines the DNA sequence accurately.
  • For target gene screening with single-strand cyclization, complementary strand synthesis on the circularized DNA is primed directly with primers that match the target gene (one or more). With double-strand cyclization, the duplex is first denatured, and complementary strand synthesis is then primed with primers that match the target gene (one or more), so that only the target genes are enriched.
  • The sequence constructed by this method, consisting of the adaptor sequence and two copies of the target sequence, can be used to build a variety of second-generation short-fragment sequencing libraries, making it suitable for various sequencing platforms.
  • Figure 1: Flow chart of sequencing library construction according to the present invention (primer without nickable bases).
  • Large DNA molecules are fragmented, ligated to an adaptor carrying a nickable base or site (e.g., dUTP, 8-oxo-dGTP, a nicking endonuclease recognition site), and circularized as single strands.
  • Complementary strand synthesis on the circularized DNA uses an ordinary primer without nickable bases; nicking then generates a gap (the nicking method is chosen according to the nickable base in the adaptor), and strand displacement synthesizes a sequence consisting of the linking sequence and two copies of the target sequence.
  • The double-stranded DNA from strand displacement is used for standard high-throughput sequencing library construction, sequencing, and data analysis.
  • Figure 2: Flow chart of sequencing library construction according to the present invention (primer with nickable bases).
  • Large DNA molecules are fragmented, ligated to an adaptor carrying a nickable base or site (e.g., dUTP, 8-oxo-dGTP, a nicking endonuclease recognition site), and circularized as single strands.
  • Complementary strand synthesis on the circularized DNA uses a primer containing a nickable base; nicking then generates gaps (the nicking method is chosen according to the nickable base in the adaptor), and strand displacement synthesizes a sequence consisting of the linking sequence and two copies of the target sequence.
  • The double-stranded DNA from strand displacement is used for standard high-throughput sequencing library construction, sequencing, and data analysis.
  • Figure 3: Flow chart of sequencing library construction according to the present invention (primers with nicked bases).
  • Large DNA molecules are fragmented, ligated to an adaptor carrying a nickable base or site (e.g., dUTP, 8-oxo-dGTP, a nicking endonuclease recognition site), and circularized as double strands.
  • The circularized DNA is nicked to generate gaps (the nicking method is chosen according to the nickable base in the adaptor), and strand displacement synthesizes a sequence consisting of the linking sequence and two copies of the target sequence.
  • The double-stranded DNA from strand displacement is used for standard high-throughput sequencing library construction, sequencing, and data analysis.
  • Figure 4: The method can be used for target gene screening.
  • Complementary strand synthesis on the circularized DNA is primed with primers that match the target gene (one or more); after nicking and strand-displacement synthesis, a sequencing library is constructed.
  • The target genes are thereby enriched effectively, and the screened target genes are sequenced.
  • One innovation of the present invention is that short DNA fragments are ligated to an adaptor sequence and circularized as single or double strands; nicking after cyclization gives double-stranded circular DNA with two, three, or more gaps. Amplification with a strand-displacing enzyme then yields a sequence in which one linking sequence joins two target sequences that are identical over at least part of their length, and a sequencing library is constructed from it and sequenced. Specifically, at least the following three schemes can be used.
  • DNA is first randomly sheared into fragments shorter than half the read length of the second-generation sequencer (the sheared length plus the length of the 5' partial adaptor sequence is preferably less than half the read length), and the adaptor sequence is then ligated. The adaptor contains a nickable base or site (e.g., dUTP, 8-oxo-dGTP, a nicking endonuclease recognition site). The DNA is denatured at high temperature and cooled immediately to make it single-stranded.
  • The single-stranded DNA carrying the adaptor sequence is circularized with a single-strand circularizing enzyme.
  • Complementary strand synthesis on the circularized DNA uses an ordinary primer without nickable bases; nicking then generates a gap (the nicking method is chosen according to the nickable base in the adaptor), and strand displacement synthesizes a sequence consisting of the linking sequence and two copies of the target sequence.
  • The double-stranded DNA from strand displacement is used for standard high-throughput sequencing library construction, sequencing, and data analysis.
  • DNA is first randomly sheared into fragments shorter than half the read length of the second-generation sequencer (the sheared length plus the length of the 5' partial adaptor sequence is preferably less than half the read length), and the adaptor sequence is then ligated. The adaptor contains nickable bases or sites (e.g., dUTP, 8-oxo-dGTP, nicking endonuclease recognition sites; the number of nickable bases is not limited).
  • Complementary strand synthesis on the circularized DNA uses a primer containing nickable bases or sites (e.g., dUTP, 8-oxo-dGTP, nicking endonuclease recognition sites); nicking then generates gaps (the nicking method is chosen according to the nickable base in the adaptor), and strand displacement synthesizes a sequence consisting of the linking sequence and two copies of the target sequence.
  • DNA is first randomly sheared into fragments shorter than half the read length of the second-generation sequencer (the sheared length plus the length of the 5' partial adaptor sequence is preferably less than half the read length), and the adaptor sequence is then ligated. Either the adaptor contains a nickable base or site (e.g., dUTP, 8-oxo-dGTP, a nicking endonuclease recognition site), or the DNA molecule or adaptor sequence is dephosphorylated before cyclization.
  • The double-stranded DNA from strand displacement is used for standard high-throughput sequencing library construction, sequencing, and data analysis.
  • Shearing tube: Covaris Microtube 6x16 mm, Catalog #: 520045
  • the product was purified using Agencourt AMPure XP (Beckman Coulter, Inc) magnetic beads.
  • Adaptor UO-A was formed by mixing equal volumes of 100 pmol UO-adaptor1 and 100 pmol UO-adaptor2, each dissolved in annealing buffer (10 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.1 mM NaCl), and annealing them (94 °C for 5 min, then cooling to 25 °C at 0.1 °C per second).
  • Note: adaptor sequences include, but are not limited to, the forms of UO-adaptor1 and UO-adaptor2 used in the examples. The same applies below.
  • Exonuclease III (E. coli): 1 μl
  • the product was purified using a MinElute Reaction Cleanup Kit
  • the product was purified using Agencourt AMPure XP (Beckman Coulter, Inc) magnetic beads.
  • UO-p1 5'-AGCACGTACGACTGATCT-3' (SEQ ID NO: 3)
  • the product was purified using Agencourt AMPure XP (Beckman Coulter, Inc) magnetic beads.
  • NEXTflex™ DNA Barcodes (Bioo Scientific Corporation, Catalog #: 514101): 0.5 μl; total: 83 μl
  • the product was purified using Agencourt AMPure XP (Beckman Coulter, Inc) magnetic beads.
  • the product was purified using Agencourt AMPure XP (Beckman Coulter, Inc) magnetic beads.
  • the eluted DNA is the constructed library, which can be used for sequencing on the second generation sequencing platform.
  • Example 2: Construction of a whole-genome DNA library according to Scheme 2 above (the three-gap scheme is taken as an example)
  • DNA fragmentation, end-filling plus A, adaptor ligation, and single-strand cyclization are the same as in Example 1.
  • the product was purified using Agencourt AMPure XP (Beckman Coulter, Inc) magnetic beads.
  • UO-p1-2 5'-AGCACGTACGACTGAUCT-3' (SEQ ID NO: 4)
  • The product can be used to construct second- or third-generation sequencing libraries.
  • Example 3: Construction of a whole-genome DNA library according to Scheme 3 above (double-strand cyclization scheme; the adaptor contains a site to be nicked)
  • DNA fragmentation (~700 bp; fragmentation conditions: duty cycle: 5%, intensity: 3, cycles per burst: 200, time: 75 s), end-filling plus A, and adaptor ligation are as in Example 1. The adaptor sequence is UO-A2, formed by annealing the following two sequences:
  • T4 PNK (T4 Polynucleotide Kinase, NEB, M0201S)
  • The product was purified with 1× AMPure XP magnetic beads.
  • The product was purified using 1× AMPure XP magnetic beads.
  • Exonuclease III (E. coli): 1 μl
  • the product was purified using a MinElute Reaction Cleanup Kit
  • the product was purified using Agencourt AMPure XP (Beckman Coulter, Inc) magnetic beads.
  • The product can be used to construct first-, second-, or third-generation sequencing libraries.
  • DNA fragmentation (~700 bp; fragmentation conditions: duty cycle: 5%, intensity: 3, cycles per burst: 200, time: 75 s), end-filling plus A
  • Incubation at 37 °C for 60 min; the product was purified using 1× AMPure XP magnetic beads.
  • The product was purified using 1× AMPure XP magnetic beads.
  • The adaptor sequence is UO-A3, formed by annealing the following two sequences:
  • Exonuclease III (E. coli): 1 μl
  • the product was purified using a MinElute Reaction Cleanup Kit
  • the product was purified using Agencourt AMPure XP (Beckman Coulter, Inc) magnetic beads.
  • The product can be used to construct first-, second-, or third-generation sequencing libraries.
  • Magnetic beads (Invitrogen™ M-280 Streptavidin, Catalog #: 11205D) were used to capture the hybridized fragments: 50 μl of beads were washed three times with 200 μl SureSelect Binding Buffer and resuspended in 200 μl SureSelect Binding Buffer; the hybridization product was added and left at room temperature for 30 min; the beads were collected on a magnet, washed once with SureSelect Wash 1 and three times with SureSelect Wash 2, and resuspended in 36.5 μl ddH2O. See the Agilent SureSelect Human All Exon Kits operating manual.
  • Beckman Coulter, Inc Agencourt AMPure XP, Item No. A63880
  • dNTPs (100 mM; 25 mM each dNTP): 0.5 μl
  • The primer sequences are as follows:
  • The eluted DNA is the constructed human exon library, which can be sequenced on a second-generation sequencing platform.
  • Example 6: DNA library construction from peripheral blood cell-free DNA
  • Peripheral blood DNA was subjected to end-filling plus A, adaptor ligation, single-strand cyclization, complementary strand synthesis, strand displacement, and the subsequent Illumina library construction steps, as in Example 1.
  • Example 7: Analysis of the sequencing data of the phage PhiX174 library from Example 1
  • the target sequence size range is 30-107 bp, the average size is 91.86817 bp, the standard deviation is 14.42506, and the median is 94 bp.
  • The sequencing error rate is the proportion of sites in the consensus sequences that differ from the reference sequence; the DNA error rate in the measured data is calculated on this basis. Assuming there are no low-frequency mutations in the sample, the sequencing error rate of this method is $10^{-5}$. Sequencing errors are distributed differently over the different bases (bases of the reference genome); the specific error rates are shown in Table 1.
  • The single-base error rate of the method ($10^{-5}$) is much lower than the error rate of second-generation sequencing (1%) and is also far lower than that of some existing improved methods.
  • the method completely eliminates the error rate problem of the second generation sequencing, and realizes ultra-precise sequencing of the DNA molecule by means of the second generation sequencing technology platform.
  • Ideally, the sequencing depth at any site in the genome should equal the average sequencing depth of the whole genome, i.e., the ratio should be 1, and the natural logarithm of this ratio should be 0. If the starting templates cannot be amplified equally, some sites on the genome must have a sequencing depth that differs from the genome-wide average, i.e., a ratio greater or less than 1 and a logarithm greater or less than 0.
  • For almost all sites, the natural logarithm of the ratio of the site's sequencing depth to the genome-wide average depth is distributed evenly around zero. Even at the site with the highest sequencing depth, the ratio is smaller than e, i.e., its logarithm is less than 1. The whole genome is therefore replicated uniformly, and the amplification products cover the entire genome more evenly.
  • the techniques provided by the present invention efficiently and uniformly amplify circular DNA molecules.
  • Another advantage of this method is that sequencing accuracy is independent of sequencing depth. This removes the limitation of tagging methods, which can determine DNA sequences accurately only at extremely high sequencing coverage, and makes accurate sequencing of large genomes (such as the human genome) feasible.
  • The method of the present invention can determine the DNA composition of cells with very high accuracy and can reveal the DNA composition of normal cell populations or of populations with pathological changes, such as cancer tissue.
  • In cancer detection, it can be used to check whether a tissue or organ of a normal individual has acquired potentially carcinogenic mutations, for early detection and prevention of cancer.
  • The method can detect the distribution of DNA mutations in cancer cell populations, discover potential small clonal populations in cancer tissue so that the heterogeneous structure of a tumor is properly understood, help explain the role mutations play in the development of cancer, and help find cancer stem cells.
  • The method can be used to detect DNA mutations in normal cells of an individual and so trace the growth pattern of normal tissues; to measure the number of DNA mutations in a tissue in individuals of different ages and so estimate the DNA mutation rate; and to detect mutations associated with various diseases in a normal individual for the purpose of disease prevention.
  • The method can effectively construct libraries from cell-free DNA in peripheral blood and detect low-frequency mutation sites present in peripheral blood.
  • This non-invasive approach can monitor the occurrence and development of cancer and can effectively detect and evaluate harmful fetal mutations during prenatal diagnosis.
  • Sequencing ancient human DNA is the main means of studying human evolution, but determining ancient human DNA sequences presents many problems.
  • The biggest problems are that extracted ancient human DNA is low in quantity, severely degraded, and heavily contaminated with microbial DNA.
  • The method can build a library from a very small amount of DNA (single- or double-stranded), and the resulting library is compatible with exon capture (which removes microbial genome contamination), so it addresses these problems in ancient DNA library construction.
  • A sequencing library construction kit can be provided. It may include an end-filling plus A-tailing reagent, DNA ligase, adaptor sequence, single-strand cyclization reagent, second-strand synthesis reagent, nicking enzyme, strand-displacement reagent, dNTP (2.5 mM), and BSA (100X). Specifically, it may include the following:
  • End-filling plus A-tailing reagent: 10X end-filling plus A buffer (500 mM Tris-HCl, 100 mM MgCl2, 100 mM DTT, 10 mM ATP, 4 mM dATP, 4 mM dCTP, 4 mM dGTP, 4 mM dTTP, pH 7.5 @ 25 °C), T4 DNA Polymerase (3 U/μl), Klenow DNA Polymerase (0.5 U/μl), T4 Polynucleotide Kinase (10 U/μl), thermophilic modified DNA polymerase (5 U/μl).
  • DNA ligase: T4 DNA ligase (20 U/μl), 5X T4 DNA ligase buffer (250 mM Tris-HCl, 50 mM MgCl2, 5 mM ATP, 50 mM DTT, pH 7.5 @ 25 °C)
  • Single-strand cyclization reagent: single-strand circularizing enzyme (100 U/μl), 50 mM MnCl2, 10X single-strand circularizing enzyme buffer (0.33 M Tris-acetate (pH 7.5), 0.66 M potassium acetate, and 5 mM DTT)
  • Second-strand synthesis reagent: DNA Polymerase I (E. coli) (10 U/μl), 10X buffer (500 mM NaCl, 100 mM Tris-HCl, 100 mM MgCl2, 10 mM DTT, pH 7.9 @ 25 °C)
  • Strand-displacement reagent: Bst DNA polymerase large fragment (8 U/μl), 10X Bst DNA polymerase buffer (200 mM Tris-HCl, 100 mM (NH4)2SO4, 100 mM KCl, 20 mM MgSO4, 1% Triton X-100, pH 8.8 @ 25 °C)
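
The coverage-uniformity check described earlier in this list (the natural logarithm of each site's sequencing depth divided by the genome-wide mean depth, expected to be 0 under perfectly even amplification) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' analysis code: the function name and the per-site depths are hypothetical.

```python
import math

def log_depth_ratios(depths):
    """Return ln(depth / mean depth) for every site with non-zero depth."""
    mean_depth = sum(depths) / len(depths)
    return [math.log(d / mean_depth) for d in depths if d > 0]

# Hypothetical per-site sequencing depths (not data from the patent).
example_depths = [90, 110, 100, 95, 105]
ratios = log_depth_ratios(example_depths)

# A site whose ratio to the mean depth is below e has a log ratio below 1.
all_below_e = max(ratios) < 1
```

Sites with zero depth are skipped here because their logarithm is undefined; a real analysis would need to report them separately.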


Abstract

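Provided is a sequencing library comprising a nucleotide sequence that includes one linking sequence and two target sequences; the target sequences are joined to the two ends of the linking sequence, and the two target sequences are direct repeats.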

Description

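Sequencing library, and preparation and use thereof
This application claims priority to Chinese patent application No. 201510638417.5, entitled "Sequencing library, and preparation and use thereof", filed with the Chinese Patent Office on September 30, 2015, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to a sequencing library and to its preparation and use.
Background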
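The development of second-generation sequencing has driven revolutionary progress in biology and biomedical research. However, owing to the nature of high-throughput sequencing, about 1% of the bases in the resulting reads are erroneous. Although a 1% error rate is tolerable in some applications, in many cases it masks real information and becomes an obstacle to research. Examples include: in microbial mutagenesis, monitoring the mutation frequency distribution produced by a mutagen at different concentrations, so as to optimize the mutagenesis system and improve its efficiency; screening a large mutagenized population for a strain carrying a target mutation; detecting whether a tissue or organ of a normal individual harbors potentially carcinogenic mutation sites; detecting heterogeneity of DNA composition and hidden small clonal populations in cancer cell populations; using the DNA mutations in each cell as markers to trace the cell's origin and division pattern; accurately genotyping a highly heterozygous cancer population; calculating the rate at which mutations arise when cancer cells or somatic cells divide; finding pathogenic mutations present in small populations relevant to biomedical treatment (such as cancer stem cells); and screening for or detecting pathogenic and carcinogenic mutations in peripheral blood cell-free DNA for early prediction of disease. How to determine DNA sequences accurately with existing second-generation sequencing technology has therefore become a critical question.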
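To date, several approaches have tried to reduce second-generation sequencing errors from the biological or chemical side. Amplification-free library construction avoids the errors introduced by PCR amplification during library preparation. Adding separate tags to sample DNA and reference DNA allows strand-specific errors to be filtered out. Other approaches reduce the error rate of second-generation sequencing through data analysis. Still others correct PCR amplification errors by using the breakpoint information produced when DNA is randomly fragmented, or by adding tags to the DNA templates before PCR amplification. With tags, reads that derive from the same original molecule can be identified and used for correction.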
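These methods improve the accuracy of second-generation sequencing to some extent, but each has shortcomings. For example, in the work of Kinde and colleagues (Kinde I, Wu J, Papadopoulos N, Kinzler KW, Vogelstein B (2011) Detection and quantification of rare mutations with massively parallel sequencing. Proc Natl Acad Sci USA 108:9530–9535), the tag is placed at the end of a specific primer and added to the DNA molecule by PCR. If an error occurs in the PCR that adds the tag, it is difficult to remove in later steps, which limits detection of very low-frequency sites. A major limitation of exogenous tagging is that it works only for small genomes or a few target genes and cannot survey a whole genome. Tagging methods must sequence both identical and complementary tags so that the plus and minus DNA strands can correct each other; this demands extremely high sequencing depth and is therefore hard to achieve for large genomes.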
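In view of this, Ruan Jue, Wang Kaile and colleagues developed a DNA library construction method (patent application No. 201310651462.5) in which DNA molecules are circularized as single strands and amplified by rolling circle amplification, so that the copies of one molecule are joined in tandem. The separate sequence information from two consecutive copies is used to correct and remove errors arising in library construction and sequencing, which lowers the sequencing error rate and increases data utilization. However, the bias introduced by rolling circle amplification greatly limited its application. In later experiments, Wang Kaile, Ruan Jue and colleagues made further improvements aimed at this bias (patent application No. 201410448968.0) and reduced it to some extent, but the large bias introduced by rolling circle amplification was still not well resolved. Rolling circle amplification has a strong sequence bias: a few circular DNA molecules are amplified enormously while the vast majority are amplified very little, so comprehensive, effective, and accurate detection of a whole genome is hard to achieve in subsequent sequencing.
In summary, there is a need for a sequencing library that allows DNA sequences to be determined quickly, effectively, and with high accuracy.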
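Summary of the Invention
To solve the problem that DNA sequencing accuracy in the prior art cannot meet practical needs, the present invention provides a sequencing library and its preparation and use.
Unless specifically limited otherwise in a particular description, the terms used in the present invention have the following definitions: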
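Target sequence means the sequence that serves as the insert of the sequencing library provided by the present invention and is the object of sequencing.
Adaptor sequence means a sequence designed in the present invention to be ligated to one or both ends of the target sequence so that the target sequence can be circularized. The adaptor may be designed as a single-stranded or a double-stranded adaptor; a double-stranded adaptor is formed by annealing two single-stranded nucleotide sequences that are at least partially complementary. The adaptor sequence can be designed by a person skilled in the art using routine techniques, according to the enzymes and reaction conditions chosen; in the resulting double-stranded circular nucleotide sequence, the denaturation temperature of the region between the gaps on the two strands should be higher than the reaction temperature of the strand-displacing enzyme used. In one embodiment, the adaptor sequence may be, for example, as shown in SEQ ID NO: 1 and/or SEQ ID NO: 2. In embodiments that use a double-stranded adaptor, the two adaptor strands can be annealed to give the double-stranded adaptor, and the 5' end of the ligation product of target sequence and adaptor sequence that is to be circularized should be phosphorylated.
Linking sequence means, in the linear double-stranded nucleotide sequence obtained from the double-stranded circular nucleotide sequence prepared by the present invention, the sequence that joins the target sequences at its two ends. At least part of the linking sequence contains reverse-complementary sequence.
Sequencing length of the sequencer means: for paired-end sequencing, the sum of the two read lengths; for single-end sequencing, the length of the single read.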
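Nicking endonuclease: in general, when a restriction endonuclease binds its DNA recognition sequence it hydrolyzes both DNA strands, because each such enzyme has two hydrolytic domains that act on the two strands and catalyze their hydrolysis separately. A nicking endonuclease, by contrast, hydrolyzes only one strand of double-stranded DNA: it nicks the DNA rather than cutting through it, and nicking leaves a 3'-hydroxyl and a 5'-phosphate.
Site to be nicked means, for a single-stranded nucleotide sequence, a site in the sequence that can be cut; for a double-stranded nucleotide sequence, a site that can be cut on one strand while the corresponding site on the other strand cannot.
Gap means a discontinuous region in a double-stranded nucleotide sequence; it may be one or more bases long.
Sequencing library means a collection of DNA fragments for sequencing that contain the target sequence and other sequences (for example, sequencing adaptors).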
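One aspect of the present invention relates to a single-stranded circular nucleotide sequence having at least one site to be nicked.
In one embodiment, the single-stranded circular nucleotide sequence may have one site to be nicked.
In one example, the site to be nicked may be a dUTP base, 8-oxo-dGTP, or a nicking endonuclease recognition site.
One aspect of the present invention relates to a double-stranded circular nucleotide sequence in which each strand has at least one site to be nicked or one gap.
In one embodiment, one strand of the double-stranded circular nucleotide sequence has a gap and the other strand has at least one site to be nicked. Specifically, the site to be nicked lies in the 5' direction of the gap.
In one embodiment, both strands of the double-stranded circular nucleotide sequence have at least one site to be nicked.
In one embodiment, each of the two strands of the double-stranded circular nucleotide sequence has one gap.
In examples, the shortest distance between a gap or site to be nicked on one strand of the double-stranded circular nucleotide sequence and a gap or site to be nicked on the other strand is preferably greater than 6 bases.
In examples, the site to be nicked may be a dUTP base, 8-oxo-dGTP, or a nicking endonuclease recognition site.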
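One aspect of the present invention relates to a nucleotide sequence comprising one linking sequence and two target sequences; the target sequences are joined to the two ends of the linking sequence, and the two target sequences are direct repeats.
In one embodiment, the linking sequence contains a reverse-complementary region.
In examples, at least one of the target sequences may carry a further sequence at the end opposite the one joined to the adaptor sequence; at least part of this further sequence is identical to part of the linking sequence.
In embodiments, the target sequence is shorter than the sequencing length of the sequencer.
In examples, the combined length of the further sequence and the target sequence is less than the sequencing length of the sequencer.
One aspect of the present invention relates to a nucleotide sequence consisting of a linking sequence and target sequences joined to the two ends of the linking sequence, the two target sequences being direct repeats. The target sequence is shorter than the sequencing length of the sequencer.
One aspect of the present invention relates to a nucleotide sequence consisting of one linking sequence and two target sequences; the target sequences are joined to the two ends of the linking sequence, and the two target sequences are direct repeats over part of their length.
One aspect of the present invention relates to a sequencing library comprising any of the above nucleotide sequences.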
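One aspect of the present invention relates to an adaptor sequence which, when other nucleotides are ligated at its two ends, has at least one nickable site.
In embodiments, the adaptor sequence is 6-100 bp.
In examples, the adaptor sequence may be a double-stranded nucleotide sequence.
One aspect of the present invention relates to use of the above adaptor sequence in preparing the above single-stranded circular nucleotide sequence, the above double-stranded circular nucleotide sequence, either of the above nucleotide sequences, or the above sequencing library.
One aspect of the present invention relates to use of the above single-stranded circular nucleotide sequence in preparing the above double-stranded circular nucleotide sequence, either of the above nucleotide sequences, or the above sequencing library.
One aspect of the present invention relates to use of the above double-stranded circular nucleotide sequence in preparing either of the above nucleotide sequences or the above sequencing library.
One aspect of the present invention relates to use of the above nucleotide sequence in preparing the above nucleotide sequence or the above sequencing library.
One aspect of the present invention relates to use of the above nucleotide sequence in preparing the above sequencing library.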
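One aspect of the present invention relates to a method for preparing the above single-stranded circular nucleotide sequence, comprising:
ligating the target sequence to an adaptor sequence containing a nickable base, a nickable enzyme site, or a gap, to give a double-stranded or single-stranded ligation product; when the ligation product is double-stranded, denaturing it to single strands and then performing single-strand cyclization; when the ligation product is single-stranded, performing single-strand cyclization directly.
One aspect of the present invention relates to a method for preparing the above double-stranded circular nucleotide sequence, comprising:
performing complementary strand synthesis on the above single-stranded circular nucleotide sequence with a primer whose 5' end is not phosphorylated, to form a double-stranded circular structure with a gap in the complementary strand; or directly circularizing, as a double strand, the double-stranded sequence obtained by ligating the target sequence to an adaptor sequence containing a nickable base, a nickable enzyme site, or a gap.
One aspect of the present invention relates to a method for preparing the above nucleotide sequence, comprising:
nicking the above double-stranded circular nucleotide sequence to give a double-stranded circular nucleotide sequence with gaps on both strands, and performing strand-displacement amplification on it.
One aspect of the present invention relates to a method for preparing the above sequencing library, comprising: subjecting the above nucleotide sequence to end repair and A-tailing, ligating sequencing adaptors, and performing PCR.
One aspect of the present invention relates to use of the above sequencing library in gene sequencing. Gene sequencing includes, but is not limited to, genomic DNA sequencing, target fragment capture sequencing (e.g., exon capture sequencing), sequencing of single-stranded DNA fragments, sequencing of fossil DNA, and sequencing of cell-free DNA in body fluids (e.g., blood, urine, saliva).
One aspect of the present invention relates to a sequencing method comprising a step of using the above sequencing library.
One aspect of the present invention relates to a sequencing kit comprising an end-filling plus A-tailing reagent, DNA ligase, adaptor sequence, single-strand cyclization reagent, nicking enzyme, and strand-displacement reagent.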
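An embodiment of the present invention provides a method for preparing the above sequencing library, the method comprising:
(1) ligating the target sequence to an adaptor sequence (containing a nickable base, a nickable enzyme site, or a gap) to give a double-stranded or single-stranded ligation product;
(2) circularizing the ligation product of step (1): when it is double-stranded, circularizing it directly as a double strand, or denaturing it to single strands and then performing single-strand cyclization; when it is single-stranded, performing single-strand cyclization directly;
(3) for single-strand cyclization, performing complementary strand synthesis on the circularized product of step (2) with a primer whose 5' end is not phosphorylated, forming a double-stranded circular structure with a gap in the complementary strand (gap 1), see Figures 1 and 2; for double-strand cyclization, the circularized product is itself the double-stranded circular structure, see Figure 3.
(4) nicking the double-stranded circular structure of step (3): when the structure was obtained by complementary strand synthesis after single-strand cyclization, a nicked gap (gap 2) is formed on the DNA strand that carries the nickable site. The shortest distance between the nicked gap (gap 2) and the complementary-strand gap (gap 1) is preferably at least 6 bases, and the nicked gap lies in the 5' direction of the complementary-strand gap, see Figure 1. Further, the primer used for complementary strand synthesis may also contain a site to be nicked, which after nicking forms a nicked gap in the complementary strand (gap 3); this gap preferably lies in the 3' direction of the complementary-strand gap, see Figure 2. When the structure was obtained by double-strand cyclization, a nicked gap is formed on each of the two strands, shown as gap 1 and gap 2 in Figure 3.
(5) performing strand-displacement amplification on the circular DNA of step (4), which has gaps on both strands, to form a nucleotide sequence of the form 5'-partial adaptor sequence (optional)-target sequence-linking sequence-target sequence-partial adaptor sequence (optional)-3', in which the linking sequence is formed by joining the adaptor sequences introduced in the preceding steps.
(6) using the nucleotide sequence of step (5) to construct a second- or third-generation sequencing library. For example, the nucleotide sequence of step (5) is subjected to end repair and A-tailing, sequencing adaptors are ligated, and PCR is then performed to give a DNA sequencing library.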
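In the present invention, the adaptor sequence ligated after DNA fragmentation may be joined to the 5' end, the 3' end, or both ends of the target sequence, by single-strand or double-strand ligation. When adaptor sequences are ligated to both ends of the target sequence, the linking sequence of the resulting single-stranded or double-stranded circular nucleotide sequence is formed by joining the adaptor sequences from the two ends, and it contains a reverse-complementary region. When the adaptor sequence is ligated to one end only, the linking sequence of the resulting single-stranded or double-stranded circular nucleotide sequence is the adaptor sequence itself. When the adaptor is joined to the target sequence by single-strand ligation, the single strand is circularized directly to give the single-stranded circular nucleotide sequence. When the adaptor is joined by double-strand ligation, the duplex is first denatured to single strands and then circularized to give the single-stranded circular nucleotide sequence. A person skilled in the art can carry out these steps using routine techniques.
In the present invention, the adaptor sequence ligated into the double-stranded circular nucleotide sequence contains a nickable site or an existing gap.
Cyclization in the present invention may be single-stranded or double-stranded.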
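With single-strand cyclization, a complementary strand must be synthesized. The primer used for this is not phosphorylated at its 5' end. It may contain a base that can form a nickable site (e.g., dUTP, 8-oxo-dGTP) or a nicking endonuclease recognition site (e.g., 5'-GC▲TGAGG-3' for the nicking endonuclease Nb.BbvCI), or it may contain neither. The primer may match part of the adaptor sequence, or it may match a known sequence within the target sequence. With double-strand cyclization, the circularized DNA is already double-stranded and no complementary strand synthesis is needed.
Gaps can be generated in several ways in the present invention. For example, one or more dUTP, 8-oxo-dGTP, or similar bases can be designed into the primer used for complementary strand synthesis or into the ligated adaptor sequence; after complementary strand synthesis, an enzyme that cuts at dUTP or 8-oxo-dGTP (e.g., UDG, USER enzyme) nicks the strand to form a gap. Alternatively, a nicking endonuclease recognition site can be designed into the primer or the adaptor sequence, and the gap is produced by nicking with the DNA nicking endonuclease.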
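
As a rough illustration of where such gaps could arise, the sketch below scans one strand, written 5' to 3', for dU bases (written as `U`) and for the Nb.BbvCI motif quoted above. It is a simplified assumption-laden example, not part of the patented method: the function name is hypothetical, only the one strand orientation given in the text is searched, and 8-oxo-dG is not modeled because it has no single-letter code here.

```python
NB_BBVCI_NICKED_STRAND = "GCTGAGG"  # motif as written in the text: 5'-GC^TGAGG-3'

def nickable_positions(strand):
    """Return 0-based positions of dU bases and Nb.BbvCI motifs on one strand."""
    uracils = [i for i, base in enumerate(strand) if base == "U"]
    sites = []
    start = strand.find(NB_BBVCI_NICKED_STRAND)
    while start != -1:
        sites.append(start)
        start = strand.find(NB_BBVCI_NICKED_STRAND, start + 1)
    return {"dU": uracils, "Nb.BbvCI": sites}

# UO-adaptor2 (SEQ ID NO: 2) from Example 1 carries a single dU near its 3' end.
adaptor2_sites = nickable_positions("GTGGGCAGTCGGTGAACGACTGAUCT")

# Hypothetical toy strand containing one Nb.BbvCI motif.
toy_sites = nickable_positions("AAGCTGAGGTT")
```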
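In the present invention, strand-displacement synthesis is carried out with a DNA polymerase that has strand-displacement activity (e.g., Bst DNA polymerase (large fragment), Bst 2.0 DNA polymerase, phi29 DNA polymerase, DisplaceAce™ DNA Polymerase).
In the sequencing library provided by one embodiment of the present invention, the adaptor sequence and the two copies of the target sequence are arranged in the insert in the order: 5'-partial adaptor sequence (optional)-target sequence-linking sequence-target sequence-partial adaptor sequence (optional)-3', as shown in Figures 1, 2 and 3.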
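
Given that layout, a read covering the whole insert could be split into its two target copies by locating the linking sequence. The following is a minimal sketch under simplifying assumptions: the linking sequence is known and matched exactly, flanking partial adaptor sequence has already been trimmed, and the function name and toy sequences are hypothetical.

```python
def split_repeat_units(read, linking_seq):
    """Split a read into the target copies on either side of the linking sequence.

    Returns (upstream_copy, downstream_copy), or None if the linking
    sequence is not found in the read.
    """
    idx = read.find(linking_seq)
    if idx == -1:
        return None
    return read[:idx], read[idx + len(linking_seq):]

# Toy example: a hypothetical 8-base target and a hypothetical linking sequence.
toy_target = "ACGTTGCA"
toy_linker = "CCCGGG"
units = split_repeat_units(toy_target + toy_linker + toy_target, toy_linker)
```

Real reads would need error-tolerant matching of the linking sequence, since sequencing errors can fall inside it as well.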
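The sequencing library provided by embodiments of the present invention can be used on second-generation, third-generation, and other sequencing platforms.
In embodiments of the present invention, the adaptor sequence may contain a region of random bases, for example 2-30 bases, used as a tag to distinguish different target sequences.
The present invention uses DNA amplification based on strand displacement. In such amplification, certain DNA polymerases (for example Phi29 DNA polymerase and Bst DNA polymerase (large fragment)), on meeting a downstream DNA strand while extending a new strand, can continue the extension and at the same time peel off the downstream strand to release free single-stranded DNA, giving isothermal amplification. Strand-displacement-based amplification generally requires no heat denaturation. It includes, for example, strand displacement amplification, rolling circle amplification, multiple displacement amplification, and loop-mediated amplification.
In the present invention, second-generation sequencing means sequencing by synthesis, i.e., determining the DNA sequence by capturing the labels of newly synthesized ends. It includes, but is not limited to, Roche/454 FLX, Illumina/Solexa Genome Analyzer, and the Applied Biosystems SOLiD system.
In the present invention, third-generation sequencing means single-molecule sequencing, in which each DNA molecule is sequenced individually without PCR amplification. It includes, but is not limited to, single-molecule fluorescence sequencing, represented by the SMS technology of Helicos and the SMRT technology of Pacific Biosciences, and nanopore sequencing.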
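The sequencing library provided by the present invention and its use achieve at least the following benefits:
1. At any sequencing depth, DNA amplification errors and sequencing errors are effectively removed, so mutations present on DNA molecules are detected with very high accuracy.
An adaptor sequence is ligated to the ends of the short DNA fragments to be sequenced, and the resulting chimera is denatured to give single-stranded DNA in which the target sequence is joined to the adaptor sequence. This DNA is circularized as a single strand, and the circular single-stranded DNA then undergoes complementary strand synthesis, nicking at the nicking site, and strand displacement by a strand-displacing enzyme. The two repeating units produced by strand-displacement replication are copied independently of each other during amplification, so the errors introduced when each unit is copied are also independent. A sequencing library is constructed from this product and sequenced, and each read covers one to two repeating units. The sequences of the two repeating units are checked against each other: bases that disagree between the two units are PCR errors or sequencing errors introduced during library preparation or sequencing, and the consensus sequence is the original sequence.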
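
The mutual confirmation of the two repeating units can be pictured with a short sketch. It assumes the two copies have already been extracted and aligned base for base from the same starting position (no insertions or deletions), and it masks disagreements with `N`; the function name and the masking convention are illustrative assumptions, not the authors' pipeline.

```python
def consensus(copy_a, copy_b, mask="N"):
    """Keep bases on which both copies agree; mask positions where they differ.

    Only the overlapping prefix of the two copies is compared.
    """
    return "".join(a if a == b else mask for a, b in zip(copy_a, copy_b))

# Hypothetical copies of one target that differ at a single position.
called = consensus("ACGTAC", "ACGAAC")
```

A base that survives in the consensus is one on which two independently copied units agree, which is what makes a shared error so unlikely.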
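The principle of the present invention is illustrated below using a sequencing error rate of 1/100 (the error rate of second-generation sequencing is 1/100 to 1/1000). The probability that the same type of error occurs at the same site in both repeating units of one consensus sequence is $\tfrac{1}{3}\times(1/100)^{2}$, an error rate of about $3\times10^{-5}$ (when more repeating units agree on a base, the probability of error is lower still). The probability that two different consensus sequences carry the same error is then $\left(\tfrac{1}{3}\times(1/100)^{2}\right)^{2}$, i.e. about $10^{-9}$ (given as $9\times10^{-10}$ when the rounded value $3\times10^{-5}$ is squared). The method therefore removes errors arising during library construction and sequencing very effectively and achieves accurate sequencing.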
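
The arithmetic above can be reproduced directly. This sketch only restates the calculation in the text under its own assumptions: a per-base error rate of 1/100, independent errors in the two units, and a 1/3 chance that a second error is the same substitution; the variable names are illustrative.

```python
per_base_error = 1 / 100

# Both repeating units show the same wrong base at the same site.
same_error_in_both_units = (1 / 3) * per_base_error ** 2

# Two different consensus sequences independently carry that same error.
same_error_in_two_consensus = same_error_in_both_units ** 2

print(f"{same_error_in_both_units:.2e}")     # about 3.3e-05
print(f"{same_error_in_two_consensus:.2e}")  # about 1.1e-09
```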
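2. The DNA to be sequenced is amplified evenly, giving balanced sequencing of the genome.
Because the method uses double-nick strand-displacement amplification, each original DNA molecule can be amplified at most four-fold. This avoids the behavior seen in rolling circle amplification, where easily amplified sequences are amplified rapidly within a given time while hard-to-amplify regions are amplified slowly or not at all. The extreme imbalance of rolling circle amplification is thereby eliminated, giving effective and balanced coverage of the genome.
3. The method is compatible with target region capture (e.g., exon capture, target gene capture, target gene screening) and similar methods.
In the sequence provided by the present invention, composed of the adaptor sequence and two copies of the target sequence, the two copies made from the original DNA are joined in tandem and are mutually independent. During target region capture, each molecule captured by a probe contains at least two direct-repeat units, so sequencing the captured molecules determines the DNA sequence accurately. For target gene screening with single-strand cyclization, complementary strand synthesis on the circularized DNA is primed directly with primers that match the target gene (one or more). With double-strand cyclization, the duplex is first denatured, and complementary strand synthesis is then primed with primers that match the target gene (one or more), so that only the target genes are enriched, see Figure 4.
4. The method is suitable for building sequencing libraries from trace amounts of short DNA fragments and even from single-stranded DNA.
Single-strand cyclization needs only a small amount of starting DNA (nanograms or less) and works on short fragments (30-200 bp). It is therefore suitable for library construction from severely degraded DNA such as peripheral blood cell-free DNA and ancient fossil DNA.
5. The sequence constructed by this method, consisting of the adaptor sequence and two copies of the target sequence, can be used to build a variety of second-generation short-fragment sequencing libraries, making it suitable for various sequencing platforms.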
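Brief Description of the Drawings
Figure 1: Flow chart of sequencing library construction according to the present invention (primer without nickable bases). Large DNA molecules are fragmented, ligated to an adaptor carrying a nickable base or site (e.g., dUTP, 8-oxo-dGTP, a nicking endonuclease recognition site), and circularized as single strands. Complementary strand synthesis on the circularized DNA uses an ordinary primer without nickable bases; nicking then generates a gap (the nicking method is chosen according to the nickable base in the adaptor), and strand displacement synthesizes a sequence consisting of the linking sequence and two copies of the target sequence. The double-stranded DNA from strand displacement is used for standard high-throughput sequencing library construction, sequencing, and data analysis.
Figure 2: Flow chart of sequencing library construction according to the present invention (primer with nickable bases). Large DNA molecules are fragmented, ligated to an adaptor carrying a nickable base or site (e.g., dUTP, 8-oxo-dGTP, a nicking endonuclease recognition site), and circularized as single strands. Complementary strand synthesis on the circularized DNA uses a primer containing a nickable base; nicking then generates gaps (the nicking method is chosen according to the nickable base in the adaptor), and strand displacement synthesizes a sequence consisting of the linking sequence and two copies of the target sequence. The double-stranded DNA from strand displacement is used for standard high-throughput sequencing library construction, sequencing, and data analysis.
Figure 3: Flow chart of sequencing library construction according to the present invention (primer with nickable bases). Large DNA molecules are fragmented, ligated to an adaptor carrying a nickable base or site (e.g., dUTP, 8-oxo-dGTP, a nicking endonuclease recognition site), and circularized as double strands. The circularized DNA is nicked to generate gaps (the nicking method is chosen according to the nickable base in the adaptor), and strand displacement synthesizes a sequence consisting of the linking sequence and two copies of the target sequence. The double-stranded DNA from strand displacement is used for standard high-throughput sequencing library construction, sequencing, and data analysis.
Figure 4: The method can be used for target gene screening. Complementary strand synthesis on the circularized DNA is primed with primers that match the target gene (one or more); after nicking and strand-displacement synthesis, a sequencing library is constructed, so that the target genes are enriched effectively and the screened target genes are sequenced.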
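Detailed Description
Embodiments of the present invention are described in detail below with reference to examples, but those skilled in the art will understand that the examples only illustrate the invention and should not be taken to limit its scope. Where specific conditions are not stated in an example, conventional conditions or those recommended by the manufacturer were used. Reagents and instruments whose manufacturer is not stated are conventional, commercially available products.
One innovation of the present invention is that short DNA fragments are ligated to an adaptor sequence and circularized as single or double strands; nicking after cyclization gives double-stranded circular DNA with two, three, or more gaps. Amplification with a strand-displacing enzyme then yields a sequence in which one linking sequence joins two target sequences that are identical over at least part of their length, and a sequencing library is constructed from it and sequenced. Specifically, at least the following three schemes can be used.
Scheme 1 (single-strand cyclization, double-gap scheme):
DNA is first randomly sheared into fragments shorter than half the read length of the second-generation sequencer (the sheared length plus the length of the 5' partial adaptor sequence is preferably less than half the read length), and the adaptor sequence is then ligated; the adaptor contains a nickable base or site (e.g., dUTP, 8-oxo-dGTP, a nicking endonuclease recognition site). The DNA is denatured at high temperature and cooled immediately to make it single-stranded. The single-stranded DNA carrying the adaptor sequence is circularized with a single-strand circularizing enzyme. Complementary strand synthesis on the circularized DNA uses an ordinary primer without nickable bases; nicking then generates a gap (the nicking method is chosen according to the nickable base in the adaptor), and strand displacement synthesizes a sequence consisting of the linking sequence and two copies of the target sequence. The double-stranded DNA from strand displacement is used for standard high-throughput sequencing library construction, sequencing, and data analysis.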
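
The length constraint in this scheme can be written as a small check. The sketch below combines it with the definition of the sequencer's sequencing length given earlier (for paired-end runs, the sum of both reads); treating that sum as the "read length" in the constraint is an interpretation, and the function name and example numbers are hypothetical.

```python
def fits_half_read_length(fragment_len, partial_adaptor_len, read_len, paired_end=True):
    """Check that fragment plus 5' partial adaptor is under half the sequencing length.

    read_len is the length of one read; for paired-end runs the sequencing
    length is taken as the sum of both reads.
    """
    sequencing_length = 2 * read_len if paired_end else read_len
    return fragment_len + partial_adaptor_len < sequencing_length / 2

# Hypothetical example: a 90 bp fragment with a 17-base partial adaptor.
ok_paired = fits_half_read_length(90, 17, 150)                     # 107 < 150
ok_single = fits_half_read_length(90, 17, 150, paired_end=False)   # 107 < 75 fails
```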
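Scheme 2 (single-strand cyclization, three-gap and multi-gap schemes):
DNA is first randomly sheared into fragments shorter than half the read length of the second-generation sequencer (the sheared length plus the length of the 5' partial adaptor sequence is preferably less than half the read length), and the adaptor sequence is then ligated; the adaptor contains nickable bases or sites (e.g., dUTP, 8-oxo-dGTP, nicking endonuclease recognition sites; the number of nickable bases is not limited). The DNA is denatured at high temperature and cooled immediately to make it single-stranded. The single-stranded DNA carrying the adaptor sequence is circularized with a single-strand circularizing enzyme. Complementary strand synthesis on the circularized DNA uses a primer containing nickable bases or sites (e.g., dUTP, 8-oxo-dGTP, nicking endonuclease recognition sites; the number of nickable bases is not limited); nicking then generates gaps (the nicking method is chosen according to the nickable base in the adaptor), and strand displacement synthesizes a sequence consisting of the linking sequence and two copies of the target sequence. The double-stranded DNA from strand displacement is used for standard high-throughput sequencing library construction, sequencing, and data analysis.
Scheme 3 (double-strand cyclization scheme):
DNA is first randomly sheared into fragments shorter than half the read length of the second-generation sequencer (the sheared length plus the length of the 5' partial adaptor sequence is preferably less than half the read length), and the adaptor sequence is then ligated. Either the adaptor contains a nickable base or site (e.g., dUTP, 8-oxo-dGTP, a nicking endonuclease recognition site), or the DNA molecule or adaptor sequence is dephosphorylated before cyclization. Double-strand cyclization is carried out with DNA ligase. The circularized DNA is nicked to generate gaps (the nicking method is chosen according to the nickable base in the adaptor; if the adaptor already contains a gap, nicking can be omitted), and strand displacement synthesizes a sequence consisting of the linking sequence and two copies of the target sequence. The double-stranded DNA from strand displacement is used for standard high-throughput sequencing library construction, sequencing, and data analysis.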
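Example 1: Construction of a whole-genome DNA library according to Scheme 1 above (double-gap scheme) (Illumina platform)
1) DNA fragmentation
Instruments and reagents used:
Ultrasonicator: Covaris S2 Focused-ultrasonicator
Shearing tube: Covaris Microtube 6x16 mm, Catalog#: 520045
QIAGEN MinElute Gel Extraction Kit (250), Catalog#: 28606
Takara 20 bp DNA Ladder (Dye Plus), Takara Code 3420A
Using the ultrasonicator (Covaris S2 Focused-ultrasonicator), 5 μg of purified PhiX174 genomic DNA was sheared to 150-200 bp (Intensity: 5, Duty Cycle: 10%, Cycles per Burst: 200, Temperature: 4 °C, time: 60 s, number of cycles: 5) in a 50 μl volume.
After 4% agarose gel electrophoresis (80 V, 70 min; 1× TAE), the 60-90 bp fragments (Takara 20 bp DNA Ladder) were recovered by gel extraction (QIAGEN MinElute Gel Extraction Kit); for the procedure, see the QIAGEN MinElute Gel Extraction Kit manual.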
2) End-filling plus A
Reagent used: New England Biolabs NEBNext® Ultra™ DNA Library Prep Kit for Illumina®, Catalog#: E7370S
Fragmented DNA: 55.5 μl
End Prep Enzyme Mix: 3 μl
End Repair Reaction Buffer (10×): 6.5 μl
Total: 65 μl
20 °C for 30 min, 65 °C for 30 min.
3) Adaptor ligation
Reagent used: New England Biolabs NEBNext® Ultra™ DNA Library Prep Kit for Illumina®, Catalog#: E7370S
End-filled DNA: 65 μl
Blunt/TA Ligase Master Mix: 15 μl
Ligation Enhancer: 1 μl
Adaptor UO-A (50 pmol): 2 μl
Total: 83 μl
20 °C for 30 min, 65 °C for 5 min, then immediately on ice for 3 min.
The product was purified with Agencourt AMPure XP (Beckman Coulter, Inc) magnetic beads.
Adaptor sequence: UO-A was formed by mixing equal volumes of 100 pmol UO-adaptor1 and 100 pmol UO-adaptor2, each dissolved in annealing buffer (10 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.1 mM NaCl), and annealing them (94 °C for 5 min, then cooling to 25 °C at 0.1 °C per second).
UO-adaptor1:
5'-pGATCAGTCGTACGTGCTTACTCTCAATAGCAGCTT-3' (SEQ ID NO: 1)
UO-adaptor2:
5'-pGTGGGCAGTCGGTGAACGACTGAUCT-3' (SEQ ID NO: 2)
Note: adaptor sequences include, but are not limited to, the forms of UO-adaptor1 and UO-adaptor2 used in the examples. The same applies below.
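
The two adaptor strands are only partially complementary, which a short check makes concrete. The sketch below counts how many bases at the 5' end of UO-adaptor1 pair with the 3' end of UO-adaptor2, treating dU as T and assuming, as an interpretation not stated in the text, that the final base of UO-adaptor2 is a single-base 3' overhang for TA ligation. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")

def reverse_complement(seq):
    """Reverse complement of a DNA strand; dU is complemented like T."""
    return seq.translate(COMPLEMENT)[::-1]

def duplex_length(strand_a, strand_b, overhang=1):
    """Count bases at the 5' end of strand_a that pair with the 3' end of
    strand_b, ignoring an assumed 3' overhang on strand_b."""
    paired_b = strand_b[:len(strand_b) - overhang].replace("U", "T")
    k = 0
    while k < min(len(strand_a), len(paired_b)):
        if reverse_complement(strand_a[:k + 1]) != paired_b[-(k + 1):]:
            break
        k += 1
    return k

UO_ADAPTOR1 = "GATCAGTCGTACGTGCTTACTCTCAATAGCAGCTT"  # SEQ ID NO: 1
UO_ADAPTOR2 = "GTGGGCAGTCGGTGAACGACTGAUCT"           # SEQ ID NO: 2

paired_bases = duplex_length(UO_ADAPTOR1, UO_ADAPTOR2)
```

Under these assumptions the annealed region is 10 bases long, leaving the remainder of each strand single-stranded.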
4) Single-strand cyclization
New England Biolabs: Exonuclease I (E. coli), Catalog#: M0293
New England Biolabs: Exonuclease III (E. coli), Catalog#: M0206
Epicentre: CircLigase II ssDNA Ligase, Catalog#: CL9025K
DNA: 24 μl
95 °C for 3 min, then immediately on ice for 3 min
10× CircLigase buffer: 6 μl
10mmol MnCl2: 1.5 μl
CircLigase (100 U/μl): 1.5 μl
60 °C for 2 h, 80 °C for 10 min
Digestion of linear and dimeric DNA:
Exonuclease I (E. coli): 1 μl
Exonuclease III (E. coli): 1 μl
37 °C, 1 h
The product was purified with the MinElute Reaction Cleanup Kit.
5) Complementary strand synthesis
New England Biolabs: Klenow Fragment (3'→5' exo-), Catalog #: M0212S
New England Biolabs: USER™ Enzyme, Catalog #: M5505S
NEB buffer 4: 2 μl
Primer (UO-p1, 10 μM): 1 μl
DNA: 15.8 μl
95 °C for 3 min, then immediately on ice for 3 min.
Then add:
2.5 mM dNTP: 0.5 μl
100× BSA: 0.2 μl
Klenow Fragment (3'→5' exo-): 1 μl
Total: 20 μl
20 °C for 30 min, then 75 °C for 20 min.
USER™ Enzyme: 1 μl
37 °C, 30 min
The product was purified with Agencourt AMPure XP beads (Beckman Coulter, Inc).
UO-p1: 5'-AGCACGTACGACTGATCT-3' (SEQ ID NO: 3)
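Stock concentrations in the reaction above translate into working concentrations through the usual dilution relation C1·V1 = C2·V2. The sketch below applies it to the dNTP addition, using the stated 20 μl total; the function name is hypothetical, and whether "2.5 mM" refers to each nucleotide or to the mixture is not specified in the text.

```python
def final_concentration(stock_conc, stock_volume, total_volume):
    """Dilution relation C2 = C1 * V1 / V2 (same units in, same units out)."""
    return stock_conc * stock_volume / total_volume

# 0.5 ul of 2.5 mM dNTP in the stated 20 ul complementary-strand reaction.
dntp_mM = final_concentration(2.5, 0.5, 20.0)
print(f"dNTP working concentration: {dntp_mM * 1000:.1f} uM")
```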
6) Strand-displacement synthesis
New England Biolabs: Bst 2.0 WarmStart® DNA Polymerase, Catalog #: M0538S
DNA: 16.5 μl
Isothermal Amplification Buffer: 2 μl
2.5 mM dNTP: 0.5 μl
Bst 2.0 WarmStart® DNA Polymerase: 0.5 μl
60 °C, 30 min
The product was purified with Agencourt AMPure XP beads (Beckman Coulter, Inc).
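7) Construction of an Illumina library from the above product
A commercial kit for standard Illumina library construction, such as the TruSeq DNA Sample Preparation Kits, can be used. The specific steps are:
(1) End repair and A-tailing (as in "End repair and A-tailing" above)
(2) Sequencing adapter ligation
End-repaired DNA: 65 μl
Blunt/TA Ligase Master Mix: 15 μl
Ligation Enhancer: 1 μl
NEXTflex™ DNA Barcodes (Bioo Scientific Corporation, Catalog #: 514101): 0.5 μl
Total: 83 μl
20 °C, 30 min
The product was purified with Agencourt AMPure XP beads (Beckman Coulter, Inc).
(3) PCR amplification
DNA: 24 μl
NEXTflex™ Primer Mix (Bioo Scientific Corporation, Catalog #: 514101): 1 μl
KAPA HiFi HotStart ReadyMix (Kapa Biosystems, Catalog #: KK2601): 25 μl
Total: 50 μl
PCR cycling conditions:
Initial denaturation at 98 °C for 45 s; 13 cycles of (98 °C for 15 s, 65 °C for 30 s, 72 °C for 60 s); 72 °C for 4 min; hold at 4 °C.
The product was purified with Agencourt AMPure XP beads (Beckman Coulter, Inc).
The product was run on a 2% agarose gel, and the 300-500 bp fraction was excised and recovered (QIAGEN MinElute Gel Extraction Kit).
The eluted DNA is the finished library, which can be sequenced on a second-generation sequencing platform.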
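For planning purposes, the hold time of the PCR program in step (3) can be added up from its stages. This is a minimal sketch with a hypothetical helper name; it counts only the programmed holds and ignores ramping between temperatures and the final 4 °C hold, so real run times will be longer.

```python
def program_seconds(initial, cycle_steps, cycles, final_extension):
    """Total programmed hold time of a PCR run, in seconds."""
    return initial + cycles * sum(cycle_steps) + final_extension

# 98 C 45 s; 13 x (98 C 15 s, 65 C 30 s, 72 C 60 s); 72 C 4 min
seconds = program_seconds(45, (15, 30, 60), 13, 4 * 60)
print(f"{seconds} s ({seconds / 60:.1f} min) of programmed hold time")
```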
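Example 2: Construction of a whole-genome DNA library according to Scheme 2 above (the three-nick variant is used here as an example)
(1) DNA fragmentation, end repair and A-tailing, adapter ligation, and single-strand circularization are as in Example 1.
(2) Complementary strand synthesis
New England Biolabs: Klenow Fragment (3'→5' exo-), Catalog #: M0212S
New England Biolabs: USER™ Enzyme, Catalog #: M5505S
NEB buffer 4: 2 μl
Primer (UO-p1-2, 10 μM): 1 μl
DNA: 15.8 μl
95 °C for 3 min, then immediately on ice for 3 min.
Then add:
2.5 mM dNTP: 0.5 μl
100× BSA: 0.2 μl
Klenow Fragment (3'→5' exo-): 1 μl
Total: 20 μl
20 °C for 30 min, then 75 °C for 20 min.
USER™ Enzyme: 1 μl
37 °C for 30 min, 50 °C for 5 min, then immediately on ice.
The product was purified with Agencourt AMPure XP beads (Beckman Coulter, Inc).
UO-p1-2: 5'-AGCACGTACGACTGAUCT-3' (SEQ ID NO: 4)
The product can be used to construct second- or third-generation sequencing libraries.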
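Example 3: Construction of a whole-genome DNA library according to Scheme 3 above (double-strand circularization, with the adapter containing a site to be cut)
(1) DNA fragmentation (~700 bp; fragmentation conditions: duty cycle: 5%, intensity: 3, cycles per burst: 200, time: 75 s), end repair and A-tailing, and adapter ligation are as in Example 1. The adapter is UO-A2, formed by annealing the following two sequences:
5'-AGCACGTACGACTGAUCT-3' (SEQ ID NO: 5)
5'-pGATCAGTCGTACGTGCT-3' (SEQ ID NO: 6)
(2) End phosphorylation
44 μl DNA, 10 U T4 PNK (T4 Polynucleotide Kinase, NEB, M0201S), 50 mM Tris-HCl pH 7.5, 10 mM MgCl2, 1 mM ATP, 10 mM DTT; 37 °C for 30 min. The product was purified with 1× AMPure XP beads.
(3) Double-strand circularization
NEBNext® Quick Ligation Module (NEB, E6056S)
DNA: 35 μl
T4 Quick Ligase: 5 μl
5× ligase buffer: 10 μl
20 °C, 30 min
The product was purified with 1× AMPure XP beads.
(4) Enzymatic digestion
Exonuclease I (E. coli): 1 μl
Exonuclease III (E. coli): 1 μl
USER™ Enzyme: 1 μl
DNA: 42 μl
NEB buffer 4: 5 μl
37 °C, 1 h
The product was purified with the MinElute Reaction Cleanup Kit.
(5) Strand-displacement synthesis
New England Biolabs: Bst 2.0 WarmStart® DNA Polymerase, Catalog #: M0538S
DNA: 16.5 μl
Isothermal Amplification Buffer: 2 μl
2.5 mM dNTP: 0.5 μl
Bst 2.0 WarmStart® DNA Polymerase: 0.5 μl
60 °C, 60 min
The product was purified with Agencourt AMPure XP beads (Beckman Coulter, Inc).
The product can be used to construct first-, second-, or third-generation sequencing libraries.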
Example 4: Construction of a whole-genome DNA library according to Scheme 3 above (double-strand circularization)
(1) DNA fragmentation (~700 bp; fragmentation conditions: duty cycle: 5%, intensity: 3, cycles per burst: 200, time: 75 s), end repair and A-tailing
(2) 5' dephosphorylation (NEB: M0289)
DNA: 44 μl
Antarctic Phosphatase: 1 μl
Antarctic Phosphatase Reaction Buffer: 5 μl
37 °C, 60 min. The product was purified with 1× AMPure XP beads.
(3) Double-strand circularization
NEBNext® Quick Ligation Module (NEB, E6056S)
DNA: 34 μl
UO-A3: 1 μl
T4 Quick Ligase: 5 μl
5× ligase buffer: 10 μl
20 °C, 30 min
The product was purified with 1× AMPure XP beads.
The adapter is UO-A3, formed by annealing the following two sequences:
5'-pGATCAGTCGTACGTGCTTACTCTCAATAGCAGCTT-3' (SEQ ID NO: 7)
5'-pAGCTGCTATTGAGAGTAAGCACGTACGACTGATCT-3' (SEQ ID NO: 8)
(4) Enzymatic digestion
Exonuclease I (E. coli): 1 μl
Exonuclease III (E. coli): 1 μl
DNA: 43 μl
NEB buffer 4: 5 μl
37 °C, 1 h
The product was purified with the MinElute Reaction Cleanup Kit.
(5) Strand-displacement synthesis
New England Biolabs: Bst 2.0 WarmStart® DNA Polymerase, Catalog #: M0538S
DNA: 16.5 μl
Isothermal Amplification Buffer: 2 μl
2.5 mM dNTP: 0.5 μl
Bst 2.0 WarmStart® DNA Polymerase: 0.5 μl
60 °C, 60 min
The product was purified with Agencourt AMPure XP beads (Beckman Coulter, Inc).
The product can be used to construct first-, second-, or third-generation sequencing libraries.
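Example 5: Construction of a target-region capture library
A library was constructed from human genomic DNA following the method of Example 1, and target-region capture was performed on the post-PCR product.
Exon probe hybridization
The Agilent SureSelect Human All Exon Kit was used to hybridize exon probes to the above PCR product. Hybridization buffer:
SureSelect Hyb #1 (orange cap, or bottle): 25 μl
SureSelect Hyb #2 (red cap): 1 μl
SureSelect Hyb #3 (yellow cap): 10 μl
SureSelect Hyb #4 (black cap, or bottle): 13 μl
Total: 49 μl; 65 °C, 5 min.
Capture library mix:
SureSelect Library: 5 μl
SureSelect RNase Block (purple cap): 0.5 μl
ddH2O: 1.5 μl
Total: 7 μl; 65 °C, 2 min.
Sample mix:
Purified DNA (about 700 ng): 3.4 μl
SureSelect Indexing Block #1 (green cap): 2.5 μl
SureSelect Block #2 (blue cap): 2.5 μl
SureSelect Indexing Block #3 (brown cap): 0.6 μl
Total: 9 μl; 95 °C for 5 min, then hold at 65 °C.
Add 13 μl of the prepared hybridization buffer to the capture library mix (7 μl), then add the sample mix (9 μl), for a total of 29 μl; hybridize at 65 °C for 24 h.
Hybridized fragments were captured with magnetic beads (Invitrogen™ Dynabeads® M-280 Streptavidin, Catalog #: 11205D): 50 μl of beads were washed three times with 200 μl SureSelect Binding Buffer and resuspended in 200 μl SureSelect Binding Buffer; the hybridization product was added and left at room temperature for 30 min; the beads were collected on a magnet, washed once with SureSelect Wash 1 and three times with SureSelect Wash 2, and resuspended in 36.5 μl ddH2O. See the Agilent SureSelect Human All Exon Kits manual for details.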
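(7) Post-hybridization PCR
Instruments and reagents:
Thermal cycler: Eppendorf Mastercycler pro S
Agilent: Herculase II Fusion DNA Polymerases, Catalog #: 600677
Beckman Coulter, Inc: Agencourt AMPure XP, Item No. A63880
Reaction mix:
Beads resuspended after exon probe hybridization: 36.5 μl
MP PCR primer 1.0 (10 pmol): 1 μl
MP PCR primer 2.0 (10 pmol): 1 μl
5× Herculase II Reaction Buffer: 10 μl
dNTPs (100 mM; 25 mM each dNTP): 0.5 μl
Herculase II Fusion DNA Polymerase: 1 μl
Total: 50 μl.
PCR cycling conditions:
Initial denaturation at 98 °C for 2 min; 12 cycles of (98 °C for 30 s, 65 °C for 30 s, 72 °C for 30 s); 72 °C for 10 min; hold at 4 °C.
Primer sequences:
MP PCR primer 1.0:
5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-3' (SEQ ID NO: 9)
MP PCR primer 2.0:
5'-CAAGCAGAAGACGGCATACGAGAT-3' (SEQ ID NO: 10)
After PCR, the product was purified with Agencourt AMPure XP beads. In brief: add 1.8 volumes of beads to the amplified product, leave at room temperature for 5 min, collect on a magnetic stand for 5 min, discard the supernatant, wash twice with 70% ethanol, air-dry, and elute in 16 μl ddH2O. See the kit manual for details.
The eluted DNA is the finished human exome library, which can be sequenced on a second-generation sequencing platform.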
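Example 6: DNA library construction from peripheral blood cell-free DNA
(1) Extraction of peripheral blood cell-free DNA and measurement of its fragment size.
Instruments and reagents:
QIAGEN: QIAamp Circulating Nucleic Acid Kit, Catalog #: 55114
Agilent: 2100 Bioanalyzer
DNA (cell-free circulating DNA) was extracted from 2 ml of plasma with the QIAGEN QIAamp Circulating Nucleic Acid Kit and eluted in 20 μl ddH2O (see the kit manual for the extraction method). The fragment size distribution was measured on an Agilent 2100 Bioanalyzer. The results show that cell-free DNA fragments from a liver cancer patient are concentrated around 164 bp, with a range of about 110-210 bp; the concentration was 4.78 ng/μl and the total amount of DNA was about 100 ng.
(2) End repair and A-tailing, adapter ligation, single-strand circularization, complementary strand synthesis, strand displacement, and subsequent Illumina library construction were performed on the peripheral blood DNA as in Example 1.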
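The "about 100 ng" figure in step (1) follows from the measured concentration and the elution volume. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def yield_ng(concentration_ng_per_ul, volume_ul):
    """Total DNA mass in nanograms from concentration and volume."""
    return concentration_ng_per_ul * volume_ul

# 4.78 ng/ul measured in a 20 ul eluate
total_ng = yield_ng(4.78, 20.0)
print(f"total cell-free DNA: {total_ng:.1f} ng")
```

The product is slightly under 100 ng, which matches the rounded total reported in the example.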
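Example 7: Analysis of sequencing data from the phage PhiX174 library of Example 1
About 1 G of paired-end data was generated on a HiSeq 2500 (read length 2 × 125 = 250 bp). The data were processed and analyzed as follows:
1. A total of 1,410,463 reads were obtained, of which 631,353 contained the correct structure described above.
2. Target sequence sizes ranged from 30 to 107 bp, with a mean of 91.86817 bp, a standard deviation of 14.42506, and a median of 94 bp.
3. The constructed library was subjected to paired-end high-throughput sequencing. The two target sequences in each sequencing result were compared with each other, and inconsistent sequences were removed. The sequencing error rate is defined as the proportion of sites in the consistent sequences that differ from the reference sequence. The DNA error rate in the data was calculated on this principle. Assuming the sample contains no low-frequency mutations, the sequencing error rate of this method is 10^-5. Sequencing errors are distributed differently across bases (bases of the reference genome); the specific error rates are given in Table 1.
Table 1. Sequencing error rates for different bases
Error type    Error rate
A=>C          1.85E-06
T=>G          1.25E-06
A=>G          6.56E-06
T=>C          7.55E-06
A=>T          3.59E-06
T=>A          2.80E-06
C=>A          3.11E-05
G=>T          3.22E-05
C=>G          9.94E-06
G=>C          7.42E-06
C=>T          1.67E-05
G=>A          1.34E-05
These results show that the single-base error rate of this method (10^-5) is far below the error rate of second-generation sequencing (1%) and also far below that of existing improved methods. The method therefore largely eliminates the error-rate problem of second-generation sequencing and achieves ultra-accurate sequencing of DNA molecules on a second-generation sequencing platform.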
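The error-rate definition in item 3 can be illustrated with a toy calculation. The sketch below is one possible reading of that definition, not the authors' pipeline: it assumes the two target copies and the matching reference segment have already been extracted and aligned to equal length, it discards a molecule entirely when its two copies disagree, and the function name and toy data are hypothetical.

```python
def error_rate(molecules):
    """Fraction of sites, among molecules whose two target copies agree,
    that differ from the reference.

    `molecules` is an iterable of (copy1, copy2, reference) strings of
    equal length (an assumption of this sketch)."""
    sites = 0
    errors = 0
    for copy1, copy2, reference in molecules:
        if copy1 != copy2:
            continue  # inconsistent copies are removed
        sites += len(copy1)
        errors += sum(a != r for a, r in zip(copy1, reference))
    return errors / sites if sites else float("nan")

# Toy data only: three molecules over a 4-base reference segment.
toy = [
    ("ACGT", "ACGT", "ACGT"),  # consistent, no difference
    ("ACGA", "ACGT", "ACGT"),  # copies disagree, discarded
    ("ACGA", "ACGA", "ACGT"),  # consistent, one difference
]
print(error_rate(toy))
```

A difference that survives this filter appears in both copies, so it is either a true variant or an error introduced before the copies were made; that is why the text assumes the sample contains no low-frequency mutations when reporting the rate.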
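4. Sequencing coverage distribution
Based on the above sequencing results, the coverage of the sequenced reads across the whole PhiX174 genome was analyzed. The method provided by the invention was found to effectively reduce amplification bias, and the sequencing data covered the whole genome effectively and evenly.
If the starting template were amplified perfectly evenly, the sequencing depth at any site in the genome would equal the genome-wide mean sequencing depth, i.e. the ratio would be 1 and its natural logarithm (base e) would be 0. If the starting template cannot be amplified evenly, some sites must have a sequencing depth different from the genome-wide mean, i.e. a ratio greater or less than 1, and hence a log ratio greater or less than 0.
For libraries constructed according to patents 201310651462.5 and 201410448968.0, the log ratio of per-site depth to genome-wide mean depth deviates severely from 0 at almost all sites: the log values at the great majority of sites are concentrated below -1, while at a small fraction of sites they are greater than 0 and reach as high as 4. This means that some sites are replicated tens or even hundreds of times more than the genome-wide average. The cause is that rolling-circle replication of circular DNA is extremely uneven, so some sites are amplified heavily; these heavily amplified sites raise the genome-wide mean depth and thereby lower the ratio at the vast majority of sites. With the present invention, by contrast, the log ratios at almost all sites are evenly distributed around 0. Even at the site with the highest sequencing depth, the ratio to the genome-wide mean depth is less than e, i.e. the log ratio is less than 1, so the whole genome is replicated evenly and the amplification products cover it well and uniformly. In summary, the technique provided by the invention amplifies circular DNA molecules effectively and evenly.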
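The uniformity measure described above, the natural log of each site's depth divided by the genome-wide mean depth, can be computed as in the following sketch. The function name and the toy depth values are hypothetical, and reporting sites with zero depth as negative infinity is a choice of this sketch rather than something the text specifies.

```python
import math

def log_depth_ratios(depths):
    """Natural log of (site depth / mean depth) for each site.

    Sites with zero depth are reported as -inf."""
    mean_depth = sum(depths) / len(depths)
    return [
        math.log(d / mean_depth) if d > 0 else float("-inf")
        for d in depths
    ]

# Toy per-site depths, for illustration only.
even = [98, 101, 100, 102, 99]
skewed = [5, 4, 6, 5, 480]

print(max(abs(x) for x in log_depth_ratios(even)))    # close to 0
print(log_depth_ratios(skewed))                       # mostly strongly negative, one large positive
```

In the skewed toy case a single heavily amplified site inflates the mean, which pushes the log ratio of every other site well below 0, the same effect the text attributes to uneven rolling-circle replication.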
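Another advantage of the method is that sequencing accuracy is independent of sequencing depth. This overcomes the limitation of tag-based methods, which can determine DNA sequences accurately only at extremely high sequencing coverage, and thus makes accurate sequencing of large genomes (such as the human genome) possible.
The method of the invention can determine the DNA composition of cells with ultra-high accuracy and can give a faithful picture of the DNA composition of a population of normal or diseased (e.g. cancerous) cells. In cancer detection, it can be used to test whether a tissue or organ of a healthy individual already carries potentially carcinogenic mutations, enabling early detection and prevention of cancer. In cancer research, the method can detect the distribution of DNA mutations within a cancer cell population; discover potential small subclones in cancer tissue so that the heterogeneous structure of the tumor can be properly understood; help elucidate the role of mutations in the initiation and progression of cancer; and be used to search for tumor stem cells. In cancer therapy, it can be used to identify tumor stem cell populations so that specific drug targets can be designed against them for effective treatment. For healthy individuals, the method can detect mutations arising in the DNA of normal cells and thereby trace the growth pattern of normal tissues; it can measure the number of DNA mutations in a given tissue across individuals of different ages and thereby estimate the DNA mutation rate; and it can test whether a healthy individual carries mutations associated with various diseases, for the purpose of disease prevention.
The method can also construct libraries efficiently from cell-free DNA in peripheral blood and can effectively detect low-frequency mutation sites present in peripheral blood. This non-invasive approach allows effective detection and evaluation of the onset and progression of cancer, and of harmful mutations in the fetus in prenatal diagnosis.
Sequencing ancient human DNA is a principal means of studying human evolution, but it faces many difficulties, the greatest being that extracted ancient human DNA is low in quantity, severely degraded, and heavily contaminated with microbial DNA. The method can construct libraries from very small amounts of DNA (single- or double-stranded), and the resulting libraries can be subjected to exon capture (removing microbial genome contamination), which effectively addresses these difficulties in ancient DNA library construction.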
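Based on the invention, a sequencing library construction kit can be provided. The kit may include end repair and A-tailing reagents, DNA ligase, an adapter, single-strand circularization reagents, second-strand synthesis reagents, a nicking enzyme, strand-displacement reagents, dNTP (2.5 mM), and BSA (100×). For example, the kit may specifically contain the following:
End repair and A-tailing reagents: 10× end repair and A-tailing buffer (500 mM Tris-HCl, 100 mM MgCl2, 100 mM DTT, 10 mM ATP, 4 mM dATP, 4 mM dCTP, 4 mM dGTP, 4 mM dTTP, pH 7.5 at 25 °C), T4 DNA Polymerase (3 U/μl), Klenow DNA Polymerase (0.5 U/μl), T4 Polynucleotide Kinase (10 U/μl), thermophilic modified DNA polymerase (5 U/μl).
DNA ligase: T4 DNA ligase (20 U/μl), 5× T4 DNA ligase buffer (250 mM Tris-HCl, 50 mM MgCl2, 5 mM ATP, 50 mM DTT, pH 7.5 at 25 °C)
Adapter:
A Y-shaped structure formed by annealing 5'-pGATCAGTCGTACGTGCTTACTCTCAATAGCAGCTT-3' (SEQ ID NO: 1) and 5'-pGTGGGCAGTCGGTGAACGACTGAUCT-3' (SEQ ID NO: 2)
Single-strand circularization reagents: single-strand circularization ligase (100 U/μl), 50 mM MnCl2, 10× single-strand circularization buffer (0.33 M Tris-acetate (pH 7.5), 0.66 M potassium acetate, and 5 mM DTT)
Second-strand synthesis reagents: DNA Polymerase I (E. coli) (10 U/μl), 10× buffer (500 mM NaCl, 100 mM Tris-HCl, 100 mM MgCl2, 10 mM DTT, pH 7.9 at 25 °C)
Nicking enzyme: Uracil DNA glycosylase (UDG) (1 U/μl), DNA glycosylase-lyase Endonuclease VIII (1 U/μl)
Strand-displacement reagents: Bst DNA polymerase, large fragment (8 U/μl), 10× Bst DNA polymerase buffer (200 mM Tris-HCl, 100 mM (NH4)2SO4, 100 mM KCl, 20 mM MgSO4, 1% Triton® X-100, pH 8.8 at 25 °C)
Although specific embodiments of the invention have been described in detail, those skilled in the art will understand that, in light of all the teachings disclosed, various modifications and substitutions may be made to those details, and that such changes all fall within the scope of protection of the invention. The full scope of the invention is given by the appended claims and any equivalents thereof.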

Claims (35)

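  1. A single-stranded circular nucleotide sequence, characterized in that the single-stranded circular nucleotide sequence has at least one site to be cut.
  2. The single-stranded circular nucleotide sequence of claim 1, characterized in that the single-stranded circular nucleotide sequence has one site to be cut.
  3. The single-stranded circular nucleotide sequence of claim 1 or 2, characterized in that the site to be cut is a dUTP base, 8-oxo-dGTP, or a nicking endonuclease recognition site.
  4. A double-stranded circular nucleotide sequence, characterized in that each strand of the double-stranded circular nucleotide sequence has at least one site to be cut or one nick.
  5. The double-stranded circular nucleotide sequence of claim 4, characterized in that one strand of the double-stranded circular nucleotide sequence has a nick and the other strand has at least one site to be cut.
  6. The double-stranded circular nucleotide sequence of claim 5, characterized in that the site to be cut is located in the 5' direction of the nick.
  7. The double-stranded circular nucleotide sequence of claim 4, characterized in that both strands of the double-stranded circular nucleotide sequence have at least one site to be cut.
  8. The double-stranded circular nucleotide sequence of claim 4, characterized in that each of the two strands of the double-stranded circular nucleotide sequence has one nick.
  9. The double-stranded circular nucleotide sequence of claim 4, characterized in that the shortest distance between the nick/site to be cut on one strand and the nick/site to be cut on the other strand is greater than 6 bases.
  10. The double-stranded circular nucleotide sequence of any one of claims 4-9, characterized in that the site to be cut is a dUTP base, 8-oxo-dGTP, or a nicking endonuclease recognition site.
  11. A nucleotide sequence, characterized in that the sequence comprises one linker sequence and two target sequences, the target sequences being joined to the two ends of the linker sequence respectively, and the two target sequences being direct repeats.
  12. The nucleotide sequence of claim 11, characterized in that the linker sequence contains reverse-complementary regions.
  13. The nucleotide sequence of claim 11, characterized in that at least one of the target sequences is further joined, at the end opposite to the end joined to the adapter sequence, to another sequence, at least part of which is identical to part of the linker sequence.
  14. The nucleotide sequence of claim 11, characterized in that the target sequence is shorter than the read length of the sequencer.
  15. The nucleotide sequence of claim 14, characterized in that the combined length of the other sequence and the target sequence is less than the read length of the sequencer.
  16. The nucleotide sequence of claim 11, characterized in that the nucleotide sequence consists of a linker sequence and target sequences joined to the two ends of the linker sequence, the two target sequences being direct repeats.
  17. The nucleotide sequence of claim 16, characterized in that the target sequence is shorter than the read length of the sequencer.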
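  18. A nucleotide sequence, characterized in that the sequence consists of one linker sequence and two target sequences, the target sequences being joined to the two ends of the linker sequence respectively, and partial regions of the two target sequences being direct repeats.
  19. A sequencing library, characterized in that the library comprises the nucleotide sequence of any one of claims 11-18.
  20. An adapter sequence, characterized in that, when other nucleotides are joined to its two ends, the adapter sequence has at least one nickable site.
  21. The adapter sequence of claim 20, characterized in that the adapter sequence is 6-100 bp.
  22. The adapter sequence of claim 20, characterized in that the adapter sequence is a double-stranded nucleotide sequence.
  23. Use of the adapter sequence of any one of claims 20-22 in preparing the single-stranded circular nucleotide sequence of any one of claims 1-3, the double-stranded circular nucleotide sequence of any one of claims 4-10, the nucleotide sequence of any one of claims 11-17, the nucleotide sequence of claim 18, or the sequencing library of claim 19.
  24. Use of the single-stranded circular nucleotide sequence of any one of claims 1-3 in preparing the double-stranded circular nucleotide sequence of any one of claims 4-10, the nucleotide sequence of any one of claims 11-17, the nucleotide sequence of claim 18, or the sequencing library of claim 19.
  25. Use of the double-stranded circular nucleotide sequence of any one of claims 4-10 in preparing the nucleotide sequence of any one of claims 11-17, the nucleotide sequence of claim 18, or the sequencing library of claim 19.
  26. Use of the nucleotide sequence of any one of claims 11-17 in preparing the nucleotide sequence of claim 18 or the sequencing library of claim 19.
  27. Use of the nucleotide sequence of claim 18 in preparing the sequencing library of claim 19.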
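  28. A method for preparing the single-stranded circular nucleotide sequence of any one of claims 1-3, characterized by comprising:
    ligating a target sequence to an adapter sequence containing a nickable base, a nickable restriction site, or a nick, to obtain a double-stranded or single-stranded ligated sequence;
    when the ligated sequence obtained is double-stranded, denaturing it into single strands and then performing single-strand circularization; when the ligated sequence obtained is single-stranded, performing single-strand circularization directly.
  29. A method for preparing the double-stranded circular nucleotide sequence of any one of claims 4-10, characterized by comprising:
    performing complementary strand synthesis on the single-stranded circular nucleotide sequence of any one of claims 1-3 using a primer whose 5' end is not phosphorylated, to form a double-stranded circular structure with a nick in the complementary strand; or directly circularizing, as a double strand, the double-stranded sequence obtained by ligating a target sequence to an adapter sequence containing a nickable base, a nickable restriction site, or a nick.
  30. A method for preparing the nucleotide sequence of any one of claims 11-18, characterized by comprising:
    nicking the double-stranded circular nucleotide sequence of any one of claims 4-10 to obtain a double-stranded circular nucleotide sequence with nicks on both strands;
    performing strand-displacement amplification on the double-stranded circular nucleotide sequence with nicks on both strands.
  31. A method for preparing the sequencing library of claim 19, characterized by comprising: subjecting the nucleotide sequence of claims 11-18 to end repair and A-tailing, ligating sequencing adapters, and performing a PCR reaction.
  32. Use of the sequencing library of claim 19 in gene sequencing.
  33. The use of claim 32, characterized in that the gene sequencing refers to genomic DNA sequencing, target fragment capture sequencing, sequencing of single-stranded DNA fragments, sequencing of fossil DNA, or sequencing of cell-free DNA in body fluids.
  34. A sequencing method, characterized in that the sequencing method comprises a step of using the sequencing library of claim 19.
  35. A sequencing kit, characterized in that the sequencing kit comprises end repair and A-tailing reagents, DNA ligase, an adapter sequence, single-strand circularization reagents, a nicking enzyme, and strand-displacement reagents.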
PCT/CN2015/095380 2015-09-30 2015-11-24 Sequencing library, and preparation and use thereof WO2017054302A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/903,911 US11702690B2 (en) 2015-09-30 2018-02-23 Sequencing library, and preparation and use thereof

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510638417.5A CN106554957B (zh) 2015-09-30 2015-09-30 Sequencing library, and preparation and use thereof
CN201510638417.5 2015-09-30

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/903,911 Continuation US11702690B2 (en) 2015-09-30 2018-02-23 Sequencing library, and preparation and use thereof

Publications (1)

Publication Number Publication Date
WO2017054302A1 true WO2017054302A1 (zh) 2017-04-06

Family

ID=58418051

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/095380 WO2017054302A1 (zh) 2015-09-30 2015-11-24 Sequencing library, and preparation and use thereof

Country Status (3)

Country Link
US (1) US11702690B2 (zh)
CN (1) CN106554957B (zh)
WO (1) WO2017054302A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
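CN102628079A (zh) * 2012-03-31 2012-08-08 盛司潼 Method for constructing a sequencing library by circularization
CN104532360A (zh) * 2014-12-17 2015-04-22 北京诺禾致源生物信息科技有限公司 Whole-genome methylation sequencing library and construction method thereof
CN104561362A (zh) * 2015-02-03 2015-04-29 北京诺禾致源生物信息科技有限公司 High-throughput sequencing library and construction method thereof
CN104695027A (zh) * 2013-12-06 2015-06-10 中国科学院北京基因组研究所 Sequencing library, and preparation and use thereof
CN105420348A (zh) * 2014-09-04 2016-03-23 中国科学院北京基因组研究所 Improved sequencing library, and preparation and use thereof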

Also Published As

Publication number Publication date
US11702690B2 (en) 2023-07-18
CN106554957A (zh) 2017-04-05
US20190078157A1 (en) 2019-03-14
CN106554957B (zh) 2020-04-21

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15905214

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15905214

Country of ref document: EP

Kind code of ref document: A1