WO2015144045A1 - Plasmid library comprising two random markers and use thereof in high-throughput sequencing - Google Patents

Publication number
WO2015144045A1
WO2015144045A1 PCT/CN2015/074981 CN2015074981W WO2015144045A1 WO 2015144045 A1 WO2015144045 A1 WO 2015144045A1 CN 2015074981 W CN2015074981 W CN 2015074981W WO 2015144045 A1 WO2015144045 A1 WO 2015144045A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
plasmid
primer
library
dna
Prior art date
Application number
PCT/CN2015/074981
Other languages
English (en)
French (fr)
Inventor
刘晓
徐志超
尉晓林
吴仲义
阮珏
Original Assignee
Tsinghua University (清华大学)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University (清华大学)
Priority to US15/128,557 (US20200131504A1)
Publication of WO2015144045A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/10: Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034: Isolating an individual clone by screening libraries
    • C12N15/1065: Preparation or screening of tagged libraries, e.g. tagged microorganisms by STM-mutagenesis, tagged polynucleotides, gene tags
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/10: Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034: Isolating an individual clone by screening libraries
    • C12N15/1093: General methods of preparing gene libraries, not provided for in other subgroups
    • C: CHEMISTRY; METALLURGY
    • C40: COMBINATORIAL TECHNOLOGY
    • C40B: COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B40/00: Libraries per se, e.g. arrays, mixtures
    • C40B40/02: Libraries contained in or displayed by microorganisms, e.g. bacteria or animal cells; Libraries contained in or displayed by vectors, e.g. plasmids; Libraries containing only microorganisms or vectors
    • C: CHEMISTRY; METALLURGY
    • C40: COMBINATORIAL TECHNOLOGY
    • C40B: COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B50/00: Methods of creating libraries, e.g. combinatorial synthesis
    • C40B50/06: Biochemical methods, e.g. using enzymes or whole viable microorganisms
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing

Definitions

  • The invention belongs to the field of genomics and relates to a method for high-throughput two-end sequencing of DNA fragments using random-sequence-tagged plasmids.
  • Owing to its low cost and high speed, whole-genome shotgun sequencing based on second-generation sequencing technology has driven rapid progress in genomics over the past decade or more.
  • When the fragment to be sequenced is 1 kb or longer, second-generation sequencing runs into bottlenecks of uncontrollability, error rate and sharply rising cost. Because read length is limited, repeats longer than 1 kb in a genome cannot be sequenced through effectively and leave gaps, which greatly complicates whole-genome assembly, haplotyping, metagenomics and related research.
  • BAC: bacterial artificial chromosome
  • YAC: yeast artificial chromosome
  • Fosmid and Cosmid plasmids
  • Libraries built from these vectors provide long genomic DNA fragments whose two ends can be sequenced by the Sanger method, establishing links across gaps and compensating for the short read length of second-generation sequencing; such a library can also supply material at any time for genetic, molecular-biological and biochemical studies of the species.
  • The drawback of this approach is that Sanger sequencing is slow and expensive.
  • Each plasmid is a double-stranded circular DNA formed by joining a plasmid backbone fragment to a DNA fragment having a specific structure.
  • From upstream to downstream, the DNA fragment having the specific structure consists of marker sequence 1, the test-DNA insertion site sequence and marker sequence 2;
  • for any two plasmids in the plasmid library, the combinations of marker sequence 1 and marker sequence 2 differ from each other;
  • the plasmid backbone fragment contains no sequence identical to the test-DNA insertion site sequence.
  • the marker sequence 1 and the marker sequence 2 are both random sequences.
  • the random sequence does not need to have any biological function, such as: no transcription produces RNA, no expression produces protein, does not bind to any RNA or protein as a cis-acting element.
  • any two plasmids in the plasmid library, the plasmid backbone fragment and the DNA insertion site sequence to be tested are identical to each other.
  • The plasmid library contains more than 100 kinds of plasmid.
  • "The combinations of marker sequence 1 and marker sequence 2 differ from each other" means that, for any two plasmids in the library, at least one of the two marker sequences carried by one plasmid differs from the corresponding marker sequence of the other; preferably both marker sequences differ.
  • the marker sequence 1 and the marker sequence 2 can each have a length of 10-200 bp, such as 10-40 bp, and further 15-25 bp.
  • The test-DNA insertion site sequence may be a restriction site recognition sequence, upstream and downstream homology arm sequences for homologous recombination, another structural sequence used for inserting the DNA to be tested, or any of these with additional DNA sequence added that can still be used for inserting the DNA to be tested.
  • The test-DNA insertion site sequence may be 4 bp to 1 kb long. When it is a restriction site recognition sequence, its length may be 4 bp to 100 bp; when it consists of upstream and downstream homology arm sequences for homologous recombination, its length may be 50 bp to 1 kb.
  • In one embodiment, the test-DNA insertion site sequence is specifically a restriction site recognition sequence, and in each plasmid of the library the region outside that recognition sequence contains no cleavage site corresponding to it.
  • the plasmid backbone fragment may be derived from a bacterial artificial chromosome plasmid, a yeast artificial chromosome plasmid, a Fosmid plasmid or a Cosmid plasmid.
  • The plasmid backbone fragment is derived from a Fosmid plasmid, the pcc2FOS plasmid. Specifically, it is the fragment obtained by removing the nucleotides from position 362 to position 403 of the pcc2FOS plasmid and mutating the A at position 355 to C, the T at position 410 to G and the A at position 437 to G.
  • the cleavage site recognition sequence added is a sequence formed by sequentially ligating the recognition sequences of BamH I, Nhe I and Hind III.
  • Marker sequence 1 and marker sequence 2 may consist entirely of random sequence (the order of the nucleotides is random), or may combine random sequence with specified sequence in various forms (for example, a sequence containing several discrete random stretches of 1 bp or more).
  • The principle is that the number of possible nucleotide sequence combinations of marker sequence 1 and marker sequence 2 is, in theory, more than 100, so that the plasmids in the library fall into more than 100 kinds (with marker sequence 1 and marker sequence 2 differing between any two of the great majority of plasmids), which meets the requirements of high-throughput sequencing.
  • the method for preparing the plasmid library provided by the present invention may specifically include the following steps (a) and (b):
  • A sequence A of 10-200 bp is attached to the 5' end of the reverse primer, and the resulting primer is designated reverse primer B; a sequence B of 10-200 bp is attached to the 5' end of the forward primer, and the resulting primer is designated forward primer B;
  • sequence A and sequence B are each a random sequence (the order of the nucleotides is random) or a sequence containing at least several discrete random stretches of 1 bp or more;
  • sequence C and sequence D satisfy the following conditions: the 5' end of sequence C and the 5' end of sequence D each contain a restriction site K that is absent from the plasmid backbone fragment; the 5' end of sequence C and the 5' end of sequence D are reverse complementary; sequence C is the reverse complement of the 5' end of one strand of the test-DNA insertion site sequence; and sequence D is the sequence of the 3' end of that same strand;
  • after self-ligation of the PCR product, the method further comprises transforming the ligation product into a recipient strain (such as Escherichia coli, specifically E. coli EPI300) and extracting plasmid from the transformed strain to obtain the plasmid library.
  • the sequence A and the sequence B may further have a length of 10 to 40 bp. In one embodiment of the invention, both sequence A and sequence B are each 15-25 bp in length.
  • The test-DNA insertion site sequence may be a restriction site recognition sequence, upstream and downstream homology arm sequences for homologous recombination, or another structural sequence used for inserting the DNA to be tested.
  • The test-DNA insertion site sequence may be 4 bp to 1 kb long. When it is a restriction site recognition sequence, its length may be 4 bp to 100 bp; when it consists of upstream and downstream homology arm sequences for homologous recombination, its length may be 50 bp to 1 kb.
  • The plasmid backbone fragment contains no sequence identical to the test-DNA insertion site sequence.
  • the DNA insertion site sequence to be tested is specifically an enzyme cleavage site recognition sequence.
  • the starting plasmid is a bacterial artificial chromosome plasmid, a yeast artificial chromosome plasmid, a Fosmid plasmid or a Cosmid plasmid.
  • the starting plasmid is specifically a Fosmid plasmid-pcc2FOS plasmid.
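  • The region to be replaced in the starting plasmid is the sequence consisting of nucleotides 362-403 of the pcc2FOS plasmid; the plasmid backbone fragment is the fragment obtained by removing the nucleotides from position 362 to position 403 of the pcc2FOS plasmid and mutating the A at position 355 to C, the T at position 410 to G and the A at position 437 to G.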
  • the cleavage site recognition sequence of the DNA insertion site sequence is a sequence formed by sequentially ligating the recognition sequences of BamHI, Nhe I and Hind III.
  • step (a3) of the foregoing method is specifically:
  • Reverse primer C is obtained by attaching the following sequence to the 5' end of reverse primer B: the sequence formed by joining the recognition sequences of the restriction sites Nhe I and BamH I in that order (corresponding to sequence C);
  • forward primer C is obtained by attaching the following sequence to the 5' end of forward primer B: the sequence formed by joining the recognition sequences of the restriction sites Nhe I and Hind III in that order (corresponding to sequence D).
  • That is, restriction site K is the Nhe I site.
  • Step (b) of the above method is: performing PCR amplification with forward primer C and reverse primer C using the starting plasmid as template, digesting the PCR product with restriction endonuclease Nhe I and self-ligating the digested product to obtain the plasmid library.
  • the DNA fragment to be tested may be from 15 kb to 400 kb in length.
  • A linearized plasmid library satisfying the following condition is also within the scope of the present invention:
  • the linearized plasmid library is identical in sequence to the linearized fragments obtained by linearizing the plasmid library provided by the invention at the test-DNA insertion site sequence.
  • Another object of the present invention is to provide a method for high-throughput two-end sequencing of a DNA fragment to be tested using the plasmid library or the linearized plasmid.
  • The method for high-throughput two-end sequencing of DNA fragments to be tested using the plasmid library is shown in FIG. 1 and specifically includes the following steps:
  • Forward primer 1 is designed from the sequence of the 3' end of the plasmid backbone fragment and reverse primer 1 from the sequence of the 5' end of the plasmid backbone fragment; linker sequence 1 for high-throughput sequencing is attached to the 5' end of forward primer 1, and the resulting primer is designated forward primer A;
  • linker sequence 2, used as a pair with linker sequence 1, is attached to the 5' end of reverse primer 1, and the resulting primer is designated reverse primer A;
  • restriction endonuclease M and restriction endonuclease M' satisfy the following condition: in the plasmid library, the site of restriction endonuclease M lies at the 3' end of the plasmid backbone fragment and the site of restriction endonuclease M' lies at the 5' end of the plasmid backbone fragment, and each lies less than 10 kb from marker sequence 1 or marker sequence 2;
  • restriction endonuclease M and the restriction endonuclease M' may be the same restriction endonuclease or different restriction enzymes;
  • Forward primer 2 and reverse primer 2 were designed according to the sequence of the 3' end of the plasmid backbone fragment, and forward primer 3 and reverse primer 3 were designed according to the sequence of the 5' end of the plasmid backbone fragment;
  • Linker sequence 3 for high-throughput sequencing is attached to the 5' end of forward primer 2, and the resulting primer is designated forward primer B; linker sequence 4, used as a pair with linker sequence 3, is attached to the 5' end of reverse primer 2, and the resulting primer is designated reverse primer B;
  • linker sequence 3 is attached to the 5' end of forward primer 3, and the resulting primer is designated forward primer C; linker sequence 4 is attached to the 5' end of reverse primer 3, and the resulting primer is designated reverse primer C;
  • PCR amplification is performed with forward primer B and reverse primer B using the circularized DNA molecule library 1 obtained in step (5) as template, giving PCR product 2;
  • PCR amplification is performed with forward primer C and reverse primer C using the circularized DNA molecule library 2 obtained in step (5) as template, giving PCR product 3;
  • PCR product 2 and PCR product 3 are each subjected to high-throughput sequencing according to linker sequence 3 and linker sequence 4; marker sequence 1 and the 5'-end sequence of the DNA fragment to be tested downstream of it are obtained from circularized DNA molecule library 1, and marker sequence 2 and the 3'-end sequence of the DNA fragment to be tested upstream of it are obtained from circularized DNA molecule library 2.
  • the recipient strain may be Escherichia coli.
  • the recipient strain is an E. coli DH10b strain.
  • the high throughput sequencing can be a second generation DNA sequencing.
  • the linker sequence used for high throughput sequencing was determined according to the sequencer used.
  • the sequencer used is specifically a Hiseq2000 and a Miseq sequencer manufactured by Illumina. High-throughput sequencing in step (1) (first round of high-throughput sequencing) using the Hiseq2000 sequencer; high-throughput sequencing in step (7) (second round of high-throughput sequencing) using Miseq sequencing instrument.
  • The adaptor sequences employed are as follows: adaptor sequence 1 and adaptor sequence 3 are both 5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-3' (SEQ ID NO: 1); adaptor sequence 2 and adaptor sequence 4 are both 5'-CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT-3' (SEQ ID NO: 2), where NNNNNN is the reaction index, a sequence used to distinguish the sample from other samples loaded on the same flow cell.
  • The "sonication" may specifically be performed with an S220/E220 focused ultrasonicator manufactured by Covaris, at an instantaneous peak power of 105 W and a duty cycle of 5% for 40 seconds.
  • "Circularizing the fragmented DNA" may specifically mean repairing both ends of each fragmented DNA molecule to blunt ends with DNA end-repair enzyme (NEB) and then joining the two ends with T4 DNA ligase (NEB).
  • the restriction endonuclease M and the restriction endonuclease M' described in the step (5) are each specifically a restriction endonuclease PvuII.
  • the DNA fragment to be tested may have a length of 15 kb to 400 kb.
  • step (II) determining the sequence of the two ends of each of the DNA fragments to be tested according to the information obtained in the step (I), thereby effecting sequencing of the high-throughput ends of the DNA fragments to be tested.
  • The present invention prepares a plasmid library tagged with random sequences. In addition to having the features of a conventional library, a library constructed with this plasmid library allows both ends of the genomic DNA inserts to be sequenced by high-throughput methods such as second-generation sequencing.
  • The invention makes two-end sequencing of long DNA fragments fast, cheap and accurate.
  • FIG. 1 is a flow chart of the high-throughput two-end sequencing of DNA fragments to be tested provided by the present invention.
  • FIG. 2 is a schematic diagram of the method for constructing the random-sequence-tagged plasmid library provided by the present invention.
  • FIG. 3 takes BAC vector a in Table 1 as an example: the sequences at the two ends of its insert map to two locations on chromosome 4 of the yeast genome, and the earlier sequencing of the empty vector shows that the random sequence tags attached to the two insert ends come from the same vector, giving two mutually paired sequences 153,401 bp apart.
  • FIG. 4 is a distribution plot of the high-throughput sequencing results for 1536 yeast BAC library clones.
  • pcc2FOS plasmid: Epicentre, catalog number ccfos059.
  • Yeast S288C: American Type Culture Collection (ATCC), No. 204508.
  • E. coli EPI300: Epicentre, catalog number EC300105.
  • E. coli DH10b: Life Technologies, Inc., catalog number 18297-010.
  • This example uses the pcc2FOS plasmid to construct a plasmid library in which an exogenous fragment containing the random sequences replaces nucleotides 362-403 of the pcc2FOS plasmid. The details are as follows:
  • a reverse primer A for amplifying a plasmid backbone fragment was designed based on a sequence upstream of the position of the pcc2FOS plasmid to be inserted, and a forward primer A for amplifying a plasmid backbone fragment was designed based on the sequence downstream of the position of the pcc2FOS plasmid to be inserted.
  • Reverse primer C: the recognition sequences of the restriction sites Nhe I and BamH I were attached in that order to the 5' end of reverse primer B. Forward primer C: the recognition sequences of the restriction sites Nhe I and Hind III were attached in that order to the 5' end of forward primer B.
  • In the primer sequences, N15-25 denotes a random sequence: N may be A, T, C or G, and the subscript 15-25 is the number of random bases.
  • Forward mutation primer: 5'-ttcctaggctgtttcctggtgggaGcctctagagtcgacctgcaggcatgcGagctt-3' (SEQ ID NO: 3) (the first capital G is the G that replaces the T at position 410; the second capital G is the G that replaces the A at position 437)
  • Reverse mutation primer: 5'-gtctaggtgtcgttgtacgtgggaGccccgggtaccgagctc-3' (SEQ ID NO: 4) (the capital G is the reverse-complementary base of the C that replaces the A at position 355)
  • PCR amplification was carried out with forward primer C and reverse primer C from step (2); the PCR product was gel-purified, digested with NheI and then self-ligated to circularize it, giving the random-sequence-tagged plasmid library (FIG. 2). The plasmids were transformed into E. coli EPI300 and stored at -80 °C.
  • Example 2: Using the plasmid library prepared in Example 1 for high-throughput two-end sequencing of long DNA fragments
  • the long DNA fragment to be tested in this example was derived from the genome of yeast S288C (http://downloads.yeastgenome.org/sequence/S288C_reference/genome_releases/S288C_reference_genome_Current_Release.tgz).
  • the method for performing high-throughput sequencing of the long DNA fragments to be tested using the plasmid library prepared in Example 1 is as follows:
  • Sequencer: the Hiseq2000 sequencer from Illumina.
  • Adaptor sequence 1 was attached to the 5' end of forward primer 1 to give forward primer A; adaptor sequence 2, used as a pair with adaptor sequence 1, was attached to the 5' end of reverse primer 1 to give reverse primer A.
  • The "NNNNNN" in reverse primer A is the reaction index, a sequence used to distinguish the sample from other samples loaded on the same flow cell; N may be A, T, C or G.
  • Yeast genomic DNA: liquid-cultured yeast S288C cells were collected and, after digestion of the cell wall, the yeast protoplasts were embedded evenly in low-melting-point gel plugs. Protein was removed by proteinase K treatment. Trial digestion of the yeast-containing plugs with restriction endonuclease Hind III established the reaction conditions as 20 U/ml enzyme at 37 °C for 10 minutes. Finally, yeast genomic DNA fragments of 120 kb to 300 kb were recovered by pulsed-field electrophoresis.
  • The plasmid library prepared in Example 1 was digested with restriction endonuclease Hind III, its ends were treated by dephosphorylation or partial fill-in so that it could not self-ligate, and it was then ligated with the long genomic DNA fragments extracted in step (1).
  • the plasmid in which the long fragment genomic DNA was inserted was transformed into Escherichia coli DH10b to obtain a genomic BAC library of yeast S288C.
  • Escherichia coli in the entire BAC library was cultured together, and a plasmid in which the genomic fragment was inserted was extracted (11 plasmids were randomly selected, numbered a-k, and Sanger sequencing was performed for verification of the accuracy of the method of the present invention).
  • The plasmid was first digested with restriction endonuclease PvuII (PvuII recognition sequences lie a short distance upstream and downstream of the insertion position in the pcc2FOS plasmid, at positions 218 bp and 651 bp) and then sheared with a Covaris S220/E220 focused ultrasonicator at an instantaneous peak power of 105 W and a duty cycle of 5% for 40 seconds. The ends of the sheared DNA fragments were repaired to blunt ends with DNA end-repair enzyme (NEB) and joined with T4 DNA ligase (NEB) to circularize each fragment, giving a circular DNA molecule library.
  • NNNNN is the reaction index, a sequence used to distinguish the sample from other samples loaded on the same flow cell; N may be A, T, C or G.
  • Using the random sequence tag pairing obtained in step 1, together with the relationship between each random sequence and the end sequence of the long genomic DNA fragment, the sequences at both ends of each long DNA fragment to be tested were obtained.
  • The method of Example 2 thus achieves fast and accurate high-throughput two-end sequencing of the long DNA fragments to be tested.
  • Example 3: An alternative second round of high-throughput sequencing of the yeast S288C genomic BAC library obtained in Example 2
  • The Escherichia coli of the entire BAC library were cultured together, and the plasmids carrying genomic inserts were extracted.
  • The plasmid was first digested with restriction endonuclease NotI (NotI recognition sequences lie a short distance upstream and downstream of the insertion position in the pcc2FOS plasmid, at positions 3 bp and 686 bp) and then sheared with a Covaris S220/E220 focused ultrasonicator at an instantaneous peak power of 105 W and a duty cycle of 5% for 40 seconds.
  • The ends of the sheared DNA fragments were repaired to blunt ends with DNA end-repair enzyme (NEB) and joined with T4 DNA ligase (NEB) to circularize each fragment, giving a circular DNA molecule library.
  • NNNNN is the reaction index, a sequence used to distinguish the sample from other samples loaded on the same flow cell; N may be A, T, C or G.
  • Using the random sequence tag pairing obtained in step 1, together with the relationship between each random sequence and the end sequence of the long genomic DNA fragment, the sequences at both ends of each long DNA fragment to be tested were obtained.
  • The two-end sequences of 1251 BAC plasmids were obtained and aligned to the genome sequence; more than 99.8% of the plasmid tag sequences correctly pointed to the paired long genomic sequences to which they were linked.

Abstract

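Provided is a plasmid library in which each plasmid comprises a DNA insertion site and two marker sequences located upstream and downstream of that site. For any two plasmids selected from the library, the combinations of the two marker sequences differ. Also provided is a method for high-throughput two-end sequencing of inserted DNA using the plasmid library.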

Description

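Method for high-throughput two-end sequencing of DNA fragments using random-sequence-tagged plasmids
Technical Field
The present invention belongs to the field of genomics and relates to a method for high-throughput two-end sequencing of DNA fragments using random-sequence-tagged plasmids.
Background Art
Owing to its low cost and high speed, whole-genome shotgun sequencing based on second-generation sequencing technology has driven rapid progress in genomics over the past decade or more. However, when the fragment to be sequenced is 1 kb or longer, second-generation sequencing runs into bottlenecks of uncontrollability, error rate and sharply rising cost. Because read length is limited, repeats longer than 1 kb in a genome cannot be sequenced through effectively and leave gaps, which greatly complicates whole-genome assembly, haplotyping, metagenomics and related research.
Libraries built from bacterial artificial chromosome (BAC) plasmids, yeast artificial chromosome (YAC) plasmids, Fosmids, Cosmids and the like provide long genomic DNA fragments whose two ends can be sequenced by the Sanger method, establishing links across gaps and compensating for the short read length of second-generation sequencing; such a library can also supply material at any time for genetic, molecular-biological and biochemical studies of the species. The drawback of this approach is that Sanger sequencing is slow and expensive.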
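Summary of the Invention
One object of the present invention is to provide a plasmid library for high-throughput two-end sequencing of DNA fragments to be tested.
In the plasmid library provided by the present invention, each plasmid is a double-stranded circular DNA formed by joining a plasmid backbone fragment to a DNA fragment having a specific structure; from upstream to downstream, the DNA fragment having the specific structure consists of marker sequence 1, the test-DNA insertion site sequence and marker sequence 2;
for any two plasmids in the plasmid library, the combinations of marker sequence 1 and marker sequence 2 differ from each other; and
in the plasmid library, the plasmid backbone fragment contains no sequence identical to the test-DNA insertion site sequence.
In one embodiment of the invention, marker sequence 1 and marker sequence 2 are both random sequences. A random sequence need not have any biological function: for example, it is not transcribed into RNA, is not expressed as protein, and does not act as a cis-acting element that binds any RNA or protein.
In one embodiment of the invention, the plasmid backbone fragment and the test-DNA insertion site sequence are identical in any two plasmids of the plasmid library.
The plasmid library contains more than 100 kinds of plasmid.
Here, "the combinations of marker sequence 1 and marker sequence 2 differ from each other" means that, for any two plasmids in the library, at least one of the two marker sequences carried by one plasmid differs from the corresponding marker sequence of the other; preferably both marker sequences differ.
Marker sequence 1 and marker sequence 2 may each be 10-200 bp long, for example 10-40 bp, or further 15-25 bp.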
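As a rough illustration of why marker sequences in this length range give ample diversity, the number of distinct fully random sequences of length n is 4^n. The sketch below is plain arithmetic under that assumption; the function names `tag_space` and `pair_space` are illustrative and do not come from the patent.

```python
def tag_space(length: int) -> int:
    """Number of distinct DNA sequences of the given length (4 bases per position)."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return 4 ** length


def pair_space(len1: int, len2: int) -> int:
    """Number of distinct (marker sequence 1, marker sequence 2) combinations."""
    return tag_space(len1) * tag_space(len2)


if __name__ == "__main__":
    # Lengths taken from the ranges quoted in the text (10-200, 10-40, 15-25 bp).
    for n in (10, 15, 25, 40):
        print(f"{n} bp random tag: {tag_space(n):.3e} possible sequences")
    print(f"two 15 bp tags: {pair_space(15, 15):.3e} possible combinations")
```

Even at the short end of the range the space of combinations is many orders of magnitude larger than the "more than 100 kinds" the text requires.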
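The test-DNA insertion site sequence may be a restriction site recognition sequence, upstream and downstream homology arm sequences for homologous recombination, another structural sequence used for inserting the DNA to be tested, or any of these with additional DNA sequence added that can still be used for inserting the DNA to be tested. The test-DNA insertion site sequence may be 4 bp to 1 kb long. When it is a restriction site recognition sequence, its length may be 4 bp to 100 bp; when it consists of upstream and downstream homology arm sequences for homologous recombination, its length may be 50 bp to 1 kb.
In one embodiment of the invention, the test-DNA insertion site sequence is specifically a restriction site recognition sequence, and in each plasmid of the library the region outside that recognition sequence contains no cleavage site corresponding to it.
The plasmid backbone fragment may be derived from a bacterial artificial chromosome plasmid, a yeast artificial chromosome plasmid, a Fosmid plasmid or a Cosmid plasmid.
In one embodiment of the invention, the plasmid backbone fragment is derived from a Fosmid plasmid, the pcc2FOS plasmid. Specifically, the plasmid backbone fragment is the fragment obtained by removing the nucleotides from position 362 to position 403 of the pcc2FOS plasmid and mutating the A at position 355 to C, the T at position 410 to G and the A at position 437 to G. Correspondingly, the restriction site recognition sequence that is added is the sequence formed by joining the recognition sequences of BamH I, Nhe I and Hind III in that order.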
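The text does not say why positions 410 and 437 are mutated. One plausible reading, offered here only as an inference, is that the changes remove BamH I and Hind III sites from the backbone so that the insertion site stays unique. The sketch below checks this against the forward mutation primer quoted elsewhere in this document (SEQ ID NO: 3), whose two capital letters mark the mutated bases; the helper names are illustrative.

```python
# Recognition sequences of the three enzymes named in the text (all palindromic).
SITES = {"BamHI": "GGATCC", "NheI": "GCTAGC", "HindIII": "AAGCTT"}

# Forward mutation primer (SEQ ID NO: 3); capitals mark the two mutated bases.
MUTANT_PRIMER = "ttcctaggctgtttcctggtgggaGcctctagagtcgacctgcaggcatgcGagctt"
# Original bases at those positions according to the text: T (410) and A (437).
ORIGINAL_BASES = ["t", "a"]


def revert(primer: str, original_bases: list) -> str:
    """Put the original base back at every capitalised (mutated) position."""
    remaining = list(original_bases)
    return "".join(remaining.pop(0) if c.isupper() else c for c in primer)


def sites_present(seq: str) -> list:
    """Names of the recognition sequences found in seq (case-insensitive)."""
    seq = seq.upper()
    return sorted(name for name, site in SITES.items() if site in seq)


if __name__ == "__main__":
    print("before mutation:", sites_present(revert(MUTANT_PRIMER, ORIGINAL_BASES)))
    print("after mutation: ", sites_present(MUTANT_PRIMER))
```

On this stretch of sequence the unmutated version contains a BamH I and a Hind III recognition sequence and the mutated version contains neither, which is consistent with the requirement that the backbone carry no copy of the insertion-site enzymes' sites.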
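In the plasmid library, marker sequence 1 and marker sequence 2 may consist entirely of random sequence (the order of the nucleotides is random), or may combine random sequence with specified sequence in various forms (for example, a sequence containing several discrete random stretches of 1 bp or more). In either case, the principle is that the number of possible nucleotide sequence combinations of marker sequence 1 and marker sequence 2 is, in theory, more than 100, so that the plasmids in the library fall into more than 100 kinds (with marker sequence 1 and marker sequence 2 differing between any two of the great majority of plasmids), which meets the requirements of high-throughput sequencing.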
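To get a feel for how often two plasmids would share the same marker combination by chance, one can use the standard birthday-problem approximation. This is a back-of-the-envelope sketch that assumes every combination is equally likely and drawn independently, which the patent does not claim; `prob_all_pairs_distinct` is an illustrative name.

```python
import math


def prob_all_pairs_distinct(n_plasmids: int, n_combinations: int) -> float:
    """Approximate probability that no two plasmids share a marker combination.

    Uses exp(-k(k-1)/(2N)), which is accurate when k is much smaller than N.
    """
    if n_plasmids < 0 or n_combinations <= 0:
        raise ValueError("need n_plasmids >= 0 and n_combinations > 0")
    return math.exp(-n_plasmids * (n_plasmids - 1) / (2 * n_combinations))


if __name__ == "__main__":
    combos = 4 ** 30  # two fully random 15 bp markers
    for k in (10**3, 10**6, 10**8):
        print(f"{k:>11,d} plasmids: P(all distinct) ~ {prob_all_pairs_distinct(k, combos):.6f}")
```

Under these assumptions, collisions only become likely once the library size approaches the square root of the number of possible combinations.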
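A further object of the present invention is to provide a method for preparing the plasmid library.
The method provided by the present invention for preparing the plasmid library may specifically comprise the following steps (a) and (b):
(a) designing forward primer III and reverse primer III according to the following steps (a1)-(a3):
(a1) designing reverse primer I, for amplifying the plasmid backbone fragment, from the sequence upstream of the site to be inserted into, or the region to be replaced, in the starting plasmid, and designing forward primer I, for amplifying the plasmid backbone fragment, from the sequence downstream of that site or region;
(a2) attaching a sequence A of 10-200 bp to the 5' end of reverse primer I to give reverse primer II; attaching a sequence B of 10-200 bp to the 5' end of forward primer I to give forward primer II;
sequence A and sequence B are each a random sequence (the order of the nucleotides is random) or a sequence containing at least several discrete random stretches of 1 bp or more;
(a3) attaching sequence C to the 5' end of reverse primer II to give reverse primer III; attaching sequence D to the 5' end of forward primer II to give forward primer III;
sequence C and sequence D satisfy the following conditions: the 5' end of sequence C and the 5' end of sequence D each contain a restriction site K that is absent from the plasmid backbone fragment; the 5' end of sequence C and the 5' end of sequence D are reverse complementary; sequence C is the reverse complement of the 5' end of one strand of the test-DNA insertion site sequence; and sequence D is the sequence of the 3' end of that same strand;
(b) performing PCR amplification with forward primer III and reverse primer III using the starting plasmid as template, digesting the PCR product with endonuclease K and self-ligating it, thereby obtaining the plasmid library.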
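The primer layout built up in steps (a1)-(a3) can be sketched as a simple 5'-to-3' concatenation: site extension (sequence C or D), then the random marker (sequence A or B), then the backbone-binding primer. The code below draws one random marker to stand in for a degenerate oligo pool; the backbone-binding sequence is a made-up placeholder, and the function names are illustrative rather than anything defined in the patent.

```python
import random

BASES = "ACGT"


def random_tag(rng: random.Random, min_len: int = 15, max_len: int = 25) -> str:
    """Draw one random marker sequence with a length in [min_len, max_len]."""
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(BASES) for _ in range(length))


def build_full_primer(site_extension: str, tag: str, backbone_primer: str) -> str:
    """Return 5'-[sequence C or D][sequence A or B][backbone-binding primer]-3'."""
    return site_extension + tag + backbone_primer


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    # Hypothetical placeholder for the backbone-binding part; NOT a pcc2FOS sequence.
    backbone_binding = "ACGTACGTACGTACGTACGT"
    # Nhe I + BamH I recognition sequences, the extension used in the embodiment.
    seq_c = "GCTAGC" + "GGATCC"
    print(build_full_primer(seq_c, random_tag(rng), backbone_binding))
```

In a real synthesis the random positions would be ordered as N mixtures, so a single oligo order yields the whole pool of differently tagged primers.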
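After self-ligation of the PCR product, the method further comprises transforming the ligation product into a recipient strain (such as Escherichia coli, specifically E. coli EPI300) and extracting plasmid from the transformed strain to obtain the plasmid library.
In step (a2) of the method, sequence A and sequence B may further be 10-40 bp long. In one embodiment of the invention, sequence A and sequence B are each specifically 15-25 bp long.
In step (a3) of the above method, the test-DNA insertion site sequence may be a restriction site recognition sequence, upstream and downstream homology arm sequences for homologous recombination, or another structural sequence used for inserting the DNA to be tested. The test-DNA insertion site sequence may be 4 bp to 1 kb long. When it is a restriction site recognition sequence, its length may be 4 bp to 100 bp; when it consists of upstream and downstream homology arm sequences for homologous recombination, its length may be 50 bp to 1 kb.
The plasmid backbone fragment contains no sequence identical to the test-DNA insertion site sequence.
In one embodiment of the invention, the test-DNA insertion site sequence is specifically a restriction site recognition sequence.
In the above method, the starting plasmid is a bacterial artificial chromosome plasmid, a yeast artificial chromosome plasmid, a Fosmid plasmid or a Cosmid plasmid. In one embodiment of the invention, the starting plasmid is specifically a Fosmid plasmid, the pcc2FOS plasmid. Correspondingly, the region to be replaced in the starting plasmid is the sequence consisting of nucleotides 362-403 of the pcc2FOS plasmid; the plasmid backbone fragment is the fragment obtained by removing the nucleotides from position 362 to position 403 of the pcc2FOS plasmid and mutating the A at position 355 to C, the T at position 410 to G and the A at position 437 to G; and the restriction site recognition sequence serving as the test-DNA insertion site sequence is the sequence formed by joining the recognition sequences of BamHI, Nhe I and Hind III in that order.
In one embodiment of the invention, step (a3) of the above method is specifically:
attaching the following sequence to the 5' end of reverse primer II to give reverse primer III: the sequence formed by joining the recognition sequences of the restriction sites Nhe I and BamH I in that order (corresponding to sequence C);
attaching the following sequence to the 5' end of forward primer II to give forward primer III: the sequence formed by joining the recognition sequences of the restriction sites Nhe I and Hind III in that order (corresponding to sequence D).
That is, restriction site K is the Nhe I site.
Correspondingly, step (b) of the above method is: performing PCR amplification with forward primer III and reverse primer III using the starting plasmid as template, digesting the PCR product with restriction endonuclease Nhe I and self-ligating the digested product, thereby obtaining the plasmid library.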
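The three conditions on sequence C and sequence D can be checked directly for this embodiment using the standard recognition sequences of BamH I (GGATCC), Nhe I (GCTAGC) and Hind III (AAGCTT). The sketch below is a consistency check, not part of the patent's procedure; variable and function names are illustrative.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


BAMHI, NHEI, HINDIII = "GGATCC", "GCTAGC", "AAGCTT"

# Insertion site: BamH I, Nhe I and Hind III recognition sequences joined in order.
INSERTION_SITE = BAMHI + NHEI + HINDIII
SEQ_C = NHEI + BAMHI    # 5' extension of the reverse primer
SEQ_D = NHEI + HINDIII  # 5' extension of the forward primer


def check_design() -> dict:
    """Evaluate the stated conditions on sequences C and D."""
    return {
        "C is the reverse complement of the site's 5' end":
            SEQ_C == revcomp(INSERTION_SITE[:len(SEQ_C)]),
        "D is the site's 3' end":
            SEQ_D == INSERTION_SITE[-len(SEQ_D):],
        "5' ends of C and D are reverse complementary (site K = Nhe I)":
            revcomp(SEQ_C[:len(NHEI)]) == SEQ_D[:len(NHEI)],
    }


if __name__ == "__main__":
    for condition, ok in check_design().items():
        print(f"{ok!s:5} {condition}")
```

Because the Nhe I site is palindromic, cutting both primer-derived ends with Nhe I and self-ligating them regenerates a single Nhe I site flanked by BamH I and Hind III, that is, the full insertion site between the two markers.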
所述质粒库在对待测DNA片段进行高通量两端测序中的应用也属于本发明的保护范围。
在所述应用中,所述待测DNA片段的长度可为15kb-400kb。
另外,满足如下条件的线性化质粒库也属于本发明的保护范围:
所述线性化质粒库与将上述本发明所提供的所述质粒库从所述待测DNA插入位点序列处进行线性化后得到的线性化片段,在序列上相同。
本发明的另一个目的是提供一种利用所述质粒库或所述线性化质粒对待测DNA片段进行高通量两端测序的方法。
本发明所提供的利用所述质粒库对待测DNA片段进行高通量两端测序的方法,流程图如图1所示,具体可包括如下步骤:
(1)按照如下设计正向引物A和反向引物A:
根据所述质粒骨架片段的3’端的序列设计正向引物1,根据所述质粒骨架片段的5’端的序列设计反向引物1;在所述正向引物1的5’端连接用于高通量测序的接头序列1,得到的引物记为正向引物A;在所述反向引物1的5’端连接与所述接头序列1配对使用的接头序列2,得到的引物记为反向引物A;
(2)以所述质粒库为模板,以所述正向引物A和所述反向引物A进行PCR扩增,得到PCR产物1;根据所述接头序列1和所述接头序列2将所得PCR产物1进行高通量测序,获得所述质粒库中每个质 粒的所述标记序列1和所述标记序列2的序列,将存在于同一质粒中的所述标记序列1和所述标记序列2配对;
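(3) Clone a batch of test DNA fragments into the test-DNA insertion site sequence of the plasmid library, with one test DNA fragment cloned into each plasmid, and transform the resulting recombinant plasmids into a recipient bacterium to obtain a DNA library;
(4) Extract the recombinant plasmids from the DNA library obtained in step (3) to obtain a recombinant plasmid library;
(5) Carry out the following I) and II) in parallel:
I) Digest the recombinant plasmid library obtained in step (4) with restriction endonuclease M, shear it by sonication, and circularize the sheared DNA fragments to obtain circularized DNA molecule library 1;
II) Digest the recombinant plasmid library obtained in step (4) with restriction endonuclease M', shear it by sonication, and circularize the sheared DNA fragments to obtain circularized DNA molecule library 2;
Restriction endonuclease M and restriction endonuclease M' satisfy the following conditions: in the plasmid library, the recognition site of restriction endonuclease M lies at the 3' end of the plasmid backbone fragment and the recognition site of restriction endonuclease M' lies at the 5' end of the plasmid backbone fragment, and each lies less than 10 kb from tag sequence 1 or tag sequence 2;
Restriction endonuclease M and restriction endonuclease M' may be the same enzyme or different enzymes;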
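(6) Design forward primer B, reverse primer B, forward primer C and reverse primer C as follows:
Design forward primer 2 and reverse primer 2 from the sequence at the 3' end of the plasmid backbone fragment, and forward primer 3 and reverse primer 3 from the sequence at the 5' end of the plasmid backbone fragment;
Attach adapter sequence 3, used for high-throughput sequencing, to the 5' end of forward primer 2 to give forward primer B; attach adapter sequence 4, which is used as a pair with adapter sequence 3, to the 5' end of reverse primer 2 to give reverse primer B;
Attach adapter sequence 3 to the 5' end of forward primer 3 to give forward primer C; attach adapter sequence 4 to the 5' end of reverse primer 3 to give reverse primer C;
(7) Using circularized DNA molecule library 1 obtained in step (5) as the template, perform PCR amplification with forward primer B and reverse primer B to obtain PCR product 2;
Using circularized DNA molecule library 2 obtained in step (5) as the template, perform PCR amplification with forward primer C and reverse primer C to obtain PCR product 3;
Sequence PCR product 2 and PCR product 3 separately by high-throughput sequencing via adapter sequence 3 and adapter sequence 4, obtaining from circularized DNA molecule library 1 tag sequence 1 together with the 5'-end sequence of the test DNA fragment downstream of it, and from circularized DNA molecule library 2 tag sequence 2 together with the 3'-end sequence of the test DNA fragment upstream of it;
(8) From the pairing of tag sequence 1 and tag sequence 2 obtained in step (2), determine the sequences of both ends of each test DNA fragment, thereby achieving high-throughput sequencing of both ends of the test DNA fragments.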
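The bookkeeping behind step (8) amounts to a join on the tags. The following is a minimal sketch, not the inventors' software: the function name, the dictionary layout and the toy sequences are all assumptions made for illustration. It takes the tag pairs from step (2) and the two tag-to-end maps from step (7) and returns both end sequences for each plasmid in which both were observed.

```python
def pair_insert_ends(tag_pairs, five_prime_by_tag1, three_prime_by_tag2):
    """Combine the two sequencing rounds.

    tag_pairs: iterable of (tag1, tag2) found in the same empty plasmid.
    five_prime_by_tag1: dict mapping tag1 to the insert's 5'-end sequence.
    three_prime_by_tag2: dict mapping tag2 to the insert's 3'-end sequence.
    Returns {(tag1, tag2): (five_prime_end, three_prime_end)} for every
    plasmid in which both ends were observed.
    """
    paired = {}
    for tag1, tag2 in tag_pairs:
        if tag1 in five_prime_by_tag1 and tag2 in three_prime_by_tag2:
            paired[(tag1, tag2)] = (
                five_prime_by_tag1[tag1],
                three_prime_by_tag2[tag2],
            )
    return paired


# Toy data (hypothetical; real tags are 15-25 nt and real ends are reads).
tag_pairs = [("AAAC", "GGGT"), ("CCCA", "TTTG")]
five_prime = {"AAAC": "ATGGCC", "CCCA": "TTGACA"}
three_prime = {"GGGT": "CCGTAA"}  # the second plasmid's 3' end was not seen
result = pair_insert_ends(tag_pairs, five_prime, three_prime)
print(result)
```

Plasmids for which only one end was recovered are simply dropped here; a real pipeline might instead report them separately.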
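In step (3) of the method, the recipient bacterium may be E. coli. In one embodiment of the invention, the recipient bacterium is E. coli strain DH10b.
In the method, the high-throughput sequencing may be second-generation DNA sequencing. The adapter sequences used for high-throughput sequencing depend on the sequencer used. In the invention, the sequencers used are specifically the Hiseq2000 and Miseq sequencers produced by Illumina. The high-throughput sequencing in step (2) (the first round of high-throughput sequencing) uses the Hiseq2000 sequencer; the high-throughput sequencing in step (7) (the second round of high-throughput sequencing) uses the Miseq sequencer. Accordingly, the adapter sequences used are as follows: adapter sequence 1 and adapter sequence 3 are both 5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-3' (SEQ ID NO:1); adapter sequence 2 and adapter sequence 4 are both 5'-CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT-3' (SEQ ID NO:2) (where NNNNNN is the reaction index, a sequence used to distinguish the sample from other samples run on the same flow cell).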
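In step (5) of the method, the "sonication shearing" may specifically be carried out with an S220/E220 focused-ultrasonicator produced by Covaris, at a peak power of 105 W and a duty cycle of 5% for 40 seconds. "Circularizing the sheared DNA fragments" may specifically be carried out by repairing both ends of the sheared DNA fragments to blunt ends with a DNA end-repair enzyme (NEB) and then joining the two ends of the DNA with T4 DNA ligase (NEB).
In one embodiment of the invention, restriction endonuclease M and restriction endonuclease M' in step (5) are both specifically restriction endonuclease PvuII.
In the method, the test DNA fragments may be 15 kb to 400 kb long.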
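A person skilled in the art can foresee the feasibility of the following method of high-throughput sequencing using the linearized plasmid library:
(I) Ligate the test DNA fragments directly into the linearized plasmid library (for example, one linearized with HindIII) to construct the DNA library (corresponding to step (3) above). On the one hand, sequence the DNA library directly by high-throughput sequencing (corresponding to steps (4)-(7) above) to obtain tag sequence 1 together with the 5'-end sequence of the test DNA fragment downstream of it, and tag sequence 2 together with the 3'-end sequence of the test DNA fragment upstream of it. On the other hand, remove the inserted test DNA fragments from the DNA library (for example, with HindIII, the same enzyme used for the earlier linearization), recircularize the plasmid backbone to obtain empty plasmids, and then sequence the empty plasmids by high-throughput sequencing (corresponding to steps (1)-(2) above) to obtain the pairing of tag sequence 1 and tag sequence 2;
(II) From the information obtained in step (I), determine the sequences of both ends of each test DNA fragment, thereby achieving high-throughput sequencing of both ends of the test DNA fragments.
The above method also falls within the scope of protection of the invention.
The invention prepares a plasmid library tagged with random sequences. In addition to having the features of a conventional library, a library constructed with this plasmid library allows both ends of the genomic DNA it carries to be sequenced by high-throughput methods such as second-generation sequencing. The invention makes sequencing of both ends of long DNA fragments fast, inexpensive and accurate.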
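Brief Description of the Drawings
Figure 1 is a flow chart of high-throughput sequencing of both ends of test DNA fragments using the plasmid library provided by the invention.
Figure 2 is a schematic diagram of the method for constructing the random-sequence-tagged plasmid library provided by the invention.
Figure 3 takes BAC vector a in Table 1 as an example: the sequences at the two ends of its insert map to two locations on chromosome 4 of the yeast genome, and because the earlier sequencing of the empty vectors showed that the random sequence tags attached to these two end sequences come from the same vector, the two sequences are paired with each other, 153,401 bp apart.
[Corrected under Rule 26, 29.05.2015]
Figure 4 shows the distribution of the results of high-throughput sequencing of both ends of 1536 yeast BAC clones.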
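Detailed Description of the Embodiments
Unless otherwise specified, the experimental methods used in the following examples are conventional methods.
Unless otherwise specified, the materials, reagents and the like used in the following examples are commercially available.
pcc2FOS plasmid: product of Epicentre, catalog number ccfos059.
Yeast S288C: American Type Culture Collection (ATCC), No. 204508.
E. coli EPI300: product of Epicentre, catalog number EC300105.
E. coli DH10b: product of Life technologies, catalog number 18297-010.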
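Example 1. Preparation of a plasmid library tagged with random sequences
This example uses the pcc2FOS plasmid to construct a plasmid library in which nucleotides 362-403 of the pcc2FOS plasmid are replaced by an exogenous fragment containing random sequences. The details are as follows:
(1) Design reverse primer Jia (甲), used to amplify the plasmid backbone fragment, from the sequence upstream of the intended insertion position in the pcc2FOS plasmid, and design forward primer Jia (甲), used to amplify the plasmid backbone fragment, from the sequence downstream of the intended insertion position in the pcc2FOS plasmid.
(2) Attach a random sequence 15-25 bp long, serving as a tag, to the 5' end of reverse primer Jia and to the 5' end of forward primer Jia; the resulting primers are designated reverse primer Yi (乙) and forward primer Yi (乙), respectively;
Attach the recognition sequences of Nhe I and BamH I, in that order, to the 5' end of reverse primer Yi to give reverse primer Bing (丙) (sequence below); attach the recognition sequences of Nhe I and Hind III, in that order, to the 5' end of forward primer Yi to give forward primer Bing (丙) (sequence below).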
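Forward primer Bing (丙):
5'-TAGC-GCTAGC-AAGCTT-CC-(N)15-25-GTGGGAGCCTCTAGAGTCG-3' (the hexamers GCTAGC and AAGCTT are the recognition sequences of Nhe I and Hind III, respectively; the sequence after (N)15-25 is that of forward primer Jia (甲); the seventh base of that sequence, G, is the mutated base at position 410 of the pcc2FOS plasmid);
Reverse primer Bing (丙):
5'-CGAT-GCTAGC-GGATCC-(N)15-25-GTGGGAGCCCCGGGTA-3' (the hexamers GCTAGC and GGATCC are the recognition sequences of Nhe I and BamH I, respectively; the sequence after (N)15-25 is that of reverse primer Jia (甲); the seventh base of that sequence, G, corresponds to the mutated base at position 355 of the pcc2FOS plasmid).
Here (N)15-25 denotes the random sequence, where N may be A, T, C or G, and the subscript 15-25 gives the number of random bases.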
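To make the primer layout concrete, the following sketch assembles one possible forward and reverse primer Bing from the fixed parts given above and a randomly drawn tag, and computes a lower bound on tag diversity. It is only an illustration: in practice the (N)15-25 stretch would presumably be synthesized as a degenerate oligonucleotide rather than generated in software, and the function names and the fixed random seed are assumptions.

```python
import random

FORWARD_JIA = "GTGGGAGCCTCTAGAGTCG"  # forward primer Jia part, from the text
REVERSE_JIA = "GTGGGAGCCCCGGGTA"     # reverse primer Jia part, from the text


def random_tag(rng: random.Random, min_len: int = 15, max_len: int = 25) -> str:
    """Draw one random tag of 15-25 nt, each base chosen uniformly."""
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice("ACGT") for _ in range(length))


def forward_primer_bing(tag: str) -> str:
    """TAGC + Nhe I + Hind III + CC + tag + forward primer Jia."""
    return "TAGC" + "GCTAGC" + "AAGCTT" + "CC" + tag + FORWARD_JIA


def reverse_primer_bing(tag: str) -> str:
    """CGAT + Nhe I + BamH I + tag + reverse primer Jia."""
    return "CGAT" + "GCTAGC" + "GGATCC" + tag + REVERSE_JIA


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
fwd = forward_primer_bing(random_tag(rng))
rev = reverse_primer_bing(random_tag(rng))

# Even at the shortest tag length there are 4**15 possible tags per end,
# and therefore (4**15)**2 possible tag pairs per plasmid.
min_tag_space = 4 ** 15
min_pair_space = min_tag_space ** 2
print(fwd, rev, min_tag_space, min_pair_space, sep="\n")
```

The size of this space relative to the number of clones is what makes it plausible for every plasmid in a library to carry a distinct tag combination.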
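(3) Using the pcc2FOS plasmid as the template, first perform PCR amplification with the following forward and reverse mutagenic primers to obtain mutated pcc2FOS.
Forward mutagenic primer: 5'-ttcctaggctgtttcctggtgggaGcctctagagtcgacctgcaggcatgcGagctt-3' (SEQ ID NO:3) (the first uppercase G is the G that replaces the T at position 410; the second uppercase G is the G that replaces the A at position 437)
Reverse mutagenic primer: 5'-gtctaggtgtcgttgtacgtgggaGccccgggtaccgagctc-3' (SEQ ID NO:4) (the uppercase G is the reverse complement of the C that replaces the A at position 355)
Then, using the mutated pcc2FOS as the template, perform PCR amplification with forward primer Bing and reverse primer Bing from step (2), gel-purify the PCR product, digest it with NheI, and finally recover the digestion product and circularize it by self-ligation to obtain the random-sequence-tagged plasmid library (Figure 2); the plasmids are then transformed into E. coli EPI300 and stored at -80 °C.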
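The text does not say why these three bases are mutated. One reading consistent with the requirement that the backbone contain no copy of the insertion-site sequence is that the changes remove native BamH I and Hind III sites next to the replaced region. The sketch below tests that reading using only the two mutagenic primer sequences given above. It assumes that uppercase letters mark the mutated bases and that the pre-mutation base in the reverse primer is T (the complement of the original A at position 355); the function names are hypothetical.

```python
BAMH_I, HIND_III = "GGATCC", "AAGCTT"

SEQ_ID_3 = "ttcctaggctgtttcctggtgggaGcctctagagtcgacctgcaggcatgcGagctt"
SEQ_ID_4 = "gtctaggtgtcgttgtacgtgggaGccccgggtaccgagctc"


def revert_mutations(primer: str, original_bases: str) -> str:
    """Replace each uppercase (mutated) base, in order, with its original."""
    originals = list(original_bases)
    reverted = []
    for base in primer:
        reverted.append(originals.pop(0) if base.isupper() else base)
    return "".join(reverted)


def sites_present(seq: str) -> dict:
    """Report whether BamH I and Hind III recognition sequences occur."""
    upper = seq.upper()
    return {"BamH I": BAMH_I in upper, "Hind III": HIND_III in upper}


# Position 410 was T and position 437 was A before mutation (forward primer);
# position 355 was A, i.e. T on the strand the reverse primer represents.
before_3 = sites_present(revert_mutations(SEQ_ID_3, "ta"))
after_3 = sites_present(SEQ_ID_3)
before_4 = sites_present(revert_mutations(SEQ_ID_4, "t"))
after_4 = sites_present(SEQ_ID_4)
print(before_3, after_3, before_4, after_4, sep="\n")
```

Under these assumptions each reverted primer contains a GGATCC or AAGCTT site that is absent from the mutated primer, which fits the reading above, although the check covers only the primer regions and not the whole plasmid.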
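Example 2. High-throughput sequencing of both ends of long test DNA fragments using the plasmid library prepared in Example 1
The long test DNA fragments in this example come from the genome of yeast S288C (http://downloads.yeastgenome.org/sequence/S288C_reference/genome_releases/S288C_reference_genome_Current_Release.tgz).
The method of high-throughput sequencing of both ends of long test DNA fragments using the plasmid library prepared in Example 1 is as follows:
1. First round of high-throughput sequencing
Sequencer: Hiseq2000 sequencer produced by Illumina.
(1) Design forward primer 1 from the sequence upstream of the intended insertion site in the pcc2FOS plasmid, and reverse primer 1 from the sequence downstream of the intended insertion site in the pcc2FOS plasmid; attach adapter sequence 1, used for high-throughput sequencing, to the 5' end of forward primer 1 to give forward primer A (sequence below); attach adapter sequence 2, which is used as a pair with adapter sequence 1, to the 5' end of reverse primer 1 to give reverse primer A (sequence below).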
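Forward primer A:
5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-acgactcactatagggcgaat-3' (SEQ ID NO:5) (the uppercase sequence is adapter sequence 1; the lowercase sequence is forward primer 1);
Reverse primer A:
5'-CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT-cgccaagctatttaggtgagac-3' (SEQ ID NO:6) (the uppercase sequence is adapter sequence 2; the lowercase sequence is reverse primer 1);
In reverse primer A, "NNNNNN" is the reaction index, where N may be A, T, C or G; it is a sequence used to distinguish the sample from other samples run on the same flow cell.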
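For a given sample, the "NNNNNN" placeholder is replaced by one concrete 6-nt index before the primer is ordered. The following is a small sketch of that substitution; the function name and the example index ACTGAT are hypothetical and are not taken from the text.

```python
REVERSE_PRIMER_A = (
    "CAAGCAGAAGACGGCATACGAGATNNNNNN"
    "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"
    "cgccaagctatttaggtgagac"
)


def with_index(primer_template: str, index: str) -> str:
    """Return the primer with its single NNNNNN placeholder set to `index`."""
    placeholder = "NNNNNN"
    if primer_template.count(placeholder) != 1:
        raise ValueError("template must contain exactly one NNNNNN placeholder")
    if len(index) != len(placeholder) or set(index) - set("ACGT"):
        raise ValueError("index must be six bases drawn from A, C, G, T")
    return primer_template.replace(placeholder, index)


indexed_primer = with_index(REVERSE_PRIMER_A, "ACTGAT")  # hypothetical index
print(indexed_primer)
```

Giving each sample on a run its own index is what lets the reads be separated again after sequencing.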
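(2) Take the frozen E. coli EPI300 transformants carrying the plasmid library from Example 1, culture them in LB liquid medium, and extract the plasmids. Using the extracted plasmids as the template, perform PCR amplification with forward primer A and reverse primer A to obtain the PCR product (random sequence - restriction site recognition sequence - random sequence); sequence the PCR product by high-throughput sequencing via adapter sequence 1 and adapter sequence 2 to obtain the specific sequences of the two random sequences in every plasmid of the library, and pair the two random sequences that occur in the same plasmid, giving the pairing relationships between the different random sequences.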
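Pairing the two random sequences from a first-round read can be sketched with a regular expression. The flanking sequences below are derived from the primer Bing sequences of Example 1 (after self-ligation the top strand reads tag, GGATCC-GCTAGC-AAGCTT, CC, tag); the sketch assumes that a single read in that orientation spans the whole tag-site-tag region, and the function names and test read are hypothetical.

```python
import re
from collections import Counter

# upstream flank, tag (15-25 nt), BamH I + Nhe I + Hind III + CC spacer,
# tag (15-25 nt), downstream flank
TAG_PAIR = re.compile(
    r"GCTCCCAC([ACGT]{15,25})GGATCCGCTAGCAAGCTTCC([ACGT]{15,25})GTGGGAGCC"
)


def extract_tag_pair(read: str):
    """Return (upstream_tag, downstream_tag), or None if the layout is absent."""
    match = TAG_PAIR.search(read.upper())
    return match.groups() if match else None


def count_tag_pairs(reads):
    """Count how often each tag pair is observed across the reads."""
    pairs = Counter()
    for read in reads:
        pair = extract_tag_pair(read)
        if pair is not None:
            pairs[pair] += 1
    return pairs


# Hypothetical read assembled from two made-up 17-nt tags.
tag_up, tag_down = "ACGTACGTACGTACGTA", "TTGCAATTGCAATTGCA"
read = (
    "TACCCGGGGCTCCCAC" + tag_up + "GGATCCGCTAGCAAGCTT" + "CC"
    + tag_down + "GTGGGAGCCTCTAGAGTCG"
)
pair_counts = count_tag_pairs([read, read, "ACGT"])
print(pair_counts)
```

Counting each pair rather than storing it once makes it possible to discard pairs supported by very few reads, which may be sequencing errors.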
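2. Inserting long test DNA fragments to build the library
(1) Obtaining long yeast genomic DNA fragments: harvest liquid-cultured yeast S288C, digest the cell wall, and embed the yeast protoplasts evenly in low-melting-point agarose plugs. Treat with proteinase K to remove protein. Subject the yeast-containing plugs to trial digestion with restriction endonuclease Hind III; the conditions finally adopted were an enzyme concentration of 20 U/ml at 37 °C for 10 minutes. Finally, recover yeast genomic DNA fragments of 120 kb to 300 kb by pulsed-field gel electrophoresis.
(2) Digest the plasmid library prepared in Example 1 with restriction endonuclease Hind III and treat the ends by dephosphorylation or partial fill-in so that the vector cannot self-ligate; then add the long genomic DNA fragments prepared in step (1) and ligate. Transform the plasmids carrying long genomic DNA fragments into E. coli DH10b to obtain a genomic BAC library of yeast S288C.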
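3. Second round of high-throughput sequencing
Sequencer: Miseq sequencer produced by Illumina.
(1) Culture the E. coli of the whole BAC library together and extract the plasmids carrying genomic inserts (in addition, 11 plasmids, labeled a-k, were picked at random and Sanger sequenced to verify the accuracy of the method of the invention). First digest the plasmids with restriction endonuclease PvuII (the pcc2FOS plasmid has a PvuII recognition sequence a short distance upstream and downstream of the intended insertion position, at bp 218 and bp 651), then shear them with an S220/E220 focused-ultrasonicator produced by Covaris at a peak power of 105 W and a duty cycle of 5% for 40 seconds. Then repair the sheared DNA fragments to blunt ends with a DNA end-repair enzyme (NEB) and join the two ends of each fragment with T4 DNA ligase (NEB) to circularize it, giving the circularized DNA molecule library.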
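How close the PvuII sites sit to the tags can be checked with simple arithmetic, and the sites themselves can be located by a plain substring scan. The sketch below uses the coordinates quoted above in unmodified pcc2FOS numbering; the function names and the toy sequence are hypothetical, and the real plasmid sequence is not reproduced here.

```python
PVU_II = "CAGCTG"  # PvuII recognition sequence


def find_sites(seq: str, site: str) -> list:
    """Return the 1-based start position of every occurrence of `site`."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + 1)
        start = seq.find(site, start + 1)
    return positions


def flanking_distances(upstream_site, downstream_site, region_start, region_end):
    """Distances (bp) from the two sites to the edges of the replaced region."""
    return region_start - upstream_site, downstream_site - region_end


# Toy sequence with two PvuII sites (hypothetical, for illustration only).
toy_positions = find_sites("AACAGCTGTTCAGCTG", PVU_II)

# Coordinates from the text: sites at bp 218 and 651, region 362-403.
upstream_gap, downstream_gap = flanking_distances(218, 651, 362, 403)
print(toy_positions, upstream_gap, downstream_gap)
```

Both gaps are a few hundred base pairs at most, well inside the 10 kb limit stated for restriction endonucleases M and M'.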
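(2) Design forward primer 2 and reverse primer 2 from the sequence upstream of the intended insertion position in the pcc2FOS plasmid, and forward primer 3 and reverse primer 3 from the sequence downstream of the intended insertion position; attach adapter sequence 3, used for high-throughput sequencing, to the 5' end of forward primer 2 to give forward primer B (sequence below); attach adapter sequence 4, which is used as a pair with adapter sequence 3, to the 5' end of reverse primer 2 to give reverse primer B (sequence below); attach adapter sequence 3 to the 5' end of forward primer 3 to give forward primer C (sequence below); attach adapter sequence 4 to the 5' end of reverse primer 3 to give reverse primer C (sequence below).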
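Forward primer B:
5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-acgactcactatagggcgaat-3' (SEQ ID NO:7) (the uppercase sequence is adapter sequence 3; the lowercase sequence is forward primer 2);
Reverse primer B:
5'-CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT-aatcgccttgcagcacatcc-3' (SEQ ID NO:8) (the uppercase sequence is adapter sequence 4; the lowercase sequence is reverse primer 2).
Forward primer C:
5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-ttccagtcgggaaacctgtc-3' (SEQ ID NO:9) (the uppercase sequence is adapter sequence 3; the lowercase sequence is forward primer 3);
Reverse primer C:
5'-CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT-cgccaagctatttaggtgagac-3' (SEQ ID NO:10) (the uppercase sequence is adapter sequence 4; the lowercase sequence is reverse primer 3).
In reverse primer B and reverse primer C, "NNNNNN" is the reaction index, where N may be A, T, C or G; it is a sequence used to distinguish the sample from other samples run on the same flow cell.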
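(3) Using the circularized DNA molecule library obtained in step (1) as the template, perform PCR amplification separately with the primer pair consisting of forward primer B and reverse primer B and with the primer pair consisting of forward primer C and reverse primer C to obtain the PCR products; sequence the PCR products separately by high-throughput sequencing via adapter sequence 3 and adapter sequence 4, finally obtaining the relationship between the random sequence tags and the end sequences of the long genomic DNA fragments.
Finally, use the pairing of the random sequence tags obtained in part 1, together with the relationship between the random sequences and the end sequences of the long genomic DNA fragments, to obtain the sequences at both ends of each long test DNA fragment.
Taking as an example the 11 BAC recombinant vectors labeled a-k, extracted from the yeast S288C genomic BAC library obtained in part 2, the sequences obtained in the second round of sequencing were aligned to the yeast S288C genome sequence by BLAST. The results show that in all 11 plasmids the random sequences correctly guide the pairing of the long genomic sequences attached to them. Except for one BAC recombinant vector whose insert falls in a repetitive region of the genome, the inserts of all other vectors could be located correctly on the yeast S288C genome, with normal fragment sizes. See Table 1 and Figure 3 for details.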
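Table 1. Sequencing and alignment results for the 11 BAC recombinant vectors
Figure PCTCN2015074981-appb-000001
Thus, with the plasmid library prepared in Example 1 of the invention and following the method of Example 2, both ends of long test DNA fragments can be sequenced rapidly and accurately in high throughput.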
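Example 3. Another second round of high-throughput sequencing of the yeast S288C genomic BAC library obtained in Example 2
Sequencer: Miseq sequencer produced by Illumina.
(1) Culture the E. coli of the whole BAC library together and extract the plasmids carrying genomic inserts. First digest the plasmids with restriction endonuclease NotI (the pcc2FOS plasmid has a NotI recognition sequence a short distance upstream and downstream of the intended insertion position, at bp 3 and bp 686), then shear them with an S220/E220 focused-ultrasonicator produced by Covaris at a peak power of 105 W and a duty cycle of 5% for 40 seconds. Then repair the sheared DNA fragments to blunt ends with a DNA end-repair enzyme (NEB) and join the two ends of each fragment with T4 DNA ligase (NEB) to circularize it, giving the circularized DNA molecule library.
(2) Design forward primer 2 and reverse primer 2 from the sequence upstream of the intended insertion position in the pcc2FOS plasmid, and forward primer 3 and reverse primer 3 from the sequence downstream of the intended insertion position; attach adapter sequence 3, used for high-throughput sequencing, to the 5' end of forward primer 2 to give forward primer B (sequence below); attach adapter sequence 4, which is used as a pair with adapter sequence 3, to the 5' end of reverse primer 2 to give reverse primer B (sequence below); attach adapter sequence 3 to the 5' end of forward primer 3 to give forward primer C (sequence below); attach adapter sequence 4 to the 5' end of reverse primer 3 to give reverse primer C (sequence below).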
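Forward primer B:
5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-acgactcactatagggcgaat-3' (SEQ ID NO:11) (the uppercase sequence is adapter sequence 3; the lowercase sequence is forward primer 2);
Reverse primer B:
5'-CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT-aagccagccccgacacc-3' (SEQ ID NO:12) (the uppercase sequence is adapter sequence 4; the lowercase sequence is reverse primer 2).
Forward primer C:
5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-gcattaatgaatcggccaa-3' (SEQ ID NO:13) (the uppercase sequence is adapter sequence 3; the lowercase sequence is forward primer 3);
Reverse primer C:
5'-CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT-cgccaagctatttaggtgagac-3' (SEQ ID NO:14) (the uppercase sequence is adapter sequence 4; the lowercase sequence is reverse primer 3).
In reverse primer B and reverse primer C, "NNNNNN" is the reaction index, where N may be A, T, C or G; it is a sequence used to distinguish the sample from other samples run on the same flow cell.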
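(3) Using the circularized DNA molecule library obtained in step (1) as the template, perform PCR amplification separately with the primer pair consisting of forward primer B and reverse primer B and with the primer pair consisting of forward primer C and reverse primer C to obtain the PCR products; sequence the PCR products separately by high-throughput sequencing via adapter sequence 3 and adapter sequence 4, finally obtaining the relationship between the random sequence tags and the end sequences of the long genomic DNA fragments.
Finally, use the pairing of the random sequence tags obtained in part 1 of Example 2, together with the relationship between the random sequences and the end sequences of the long genomic DNA fragments, to obtain the sequences at both ends of each long test DNA fragment.
Using the above method, both ends of 1536 yeast BAC clones were sequenced in high throughput, with the following results:
Figure PCTCN2015074981-appb-000002
Paired-end sequences were obtained for 1251 BAC plasmids and aligned to the genome sequence; in more than 99.8% of these plasmids the tag sequences correctly guided the pairing of the long genomic sequences attached to them.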
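The two reported figures can be turned into a recovery rate and an upper bound on mispaired clones with a few lines of arithmetic. This is a derived illustration, not a result stated in the text: it assumes "more than 99.8%" is a strict lower bound on the fraction of the 1251 plasmids that paired correctly.

```python
total_clones = 1536
paired_clones = 1251

# Fraction of clones for which both end sequences were recovered.
recovery = paired_clones / total_clones
print(f"{recovery:.1%}")  # prints 81.4%

# If strictly more than 99.8% paired correctly, the number of mispaired
# plasmids d must satisfy d / 1251 < 0.002, i.e. 1000 * d < 2 * 1251.
max_mispaired = (2 * paired_clones - 1) // 1000
print(max_mispaired)  # prints 2
```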

Claims (12)

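  1. A plasmid library, characterized in that: each plasmid in the plasmid library is a double-stranded circular DNA formed by joining a plasmid backbone fragment and a DNA fragment having a specific structure, the DNA fragment having a specific structure consisting, from upstream to downstream, of tag sequence 1, a test-DNA insertion site sequence and tag sequence 2; and
    for any two plasmids in the plasmid library, the combinations of tag sequence 1 and tag sequence 2 differ from each other; and
    in the plasmid library, the plasmid backbone fragment contains no sequence identical to the test-DNA insertion site sequence.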
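  2. A method for preparing the plasmid library of claim 1, comprising the following steps:
    (a) designing forward primer Bing (丙) and reverse primer Bing (丙) according to the following steps (a1)-(a3):
    (a1) designing reverse primer Jia (甲), used to amplify the plasmid backbone fragment, from the sequence upstream of the intended insertion site or of the region to be replaced in the starting plasmid, and designing forward primer Jia (甲), used to amplify the plasmid backbone fragment, from the sequence downstream of the intended insertion site or of the region to be replaced in the starting plasmid;
    (a2) attaching sequence A, 10-200 bp long, to the 5' end of reverse primer Jia to give reverse primer Yi (乙); attaching sequence B, 10-200 bp long, to the 5' end of forward primer Jia to give forward primer Yi (乙); sequence A and sequence B being random sequences or sequences containing multiple discrete random stretches of 1 bp or more;
    (a3) attaching sequence C to the 5' end of reverse primer Yi to give reverse primer Bing; attaching sequence D to the 5' end of forward primer Yi to give forward primer Bing;
    sequence C and sequence D satisfying the following conditions: the 5' end of sequence C and the 5' end of sequence D each contain a restriction site K that is absent from the plasmid backbone fragment; and the 5' end of sequence C and the 5' end of sequence D are reverse complementary; and sequence C is the reverse complement of the 5' end of one strand of the test-DNA insertion site sequence; sequence D is the sequence of the 3' end of that strand of the test-DNA insertion site sequence;
    (b) using the starting plasmid as the template, performing PCR amplification with forward primer Bing and reverse primer Bing, and digesting the resulting PCR product with endonuclease K and self-ligating it to obtain the plasmid library.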
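  3. The plasmid library of claim 1, characterized in that: tag sequence 1 and tag sequence 2 are both random sequences.
  4. The plasmid library of claim 1, characterized in that: for any two plasmids in the plasmid library, the plasmid backbone fragments and the test-DNA insertion site sequences are identical to each other.
  5. The plasmid library of claim 1, characterized in that: tag sequence 1 and tag sequence 2 are each 10-200 bp long.
  6. The plasmid library or method of any one of claims 1-5, characterized in that: the test-DNA insertion site sequence is a restriction enzyme recognition sequence;
    the restriction enzyme recognition sequence is specifically 4 bp to 100 bp long.
  7. The plasmid library or method of any one of claims 1-6, characterized in that:
    the plasmid backbone fragment is derived from a bacterial artificial chromosome plasmid, a yeast artificial chromosome plasmid, a Fosmid plasmid or a Cosmid plasmid; or
    the starting plasmid is a bacterial artificial chromosome plasmid, a yeast artificial chromosome plasmid, a Fosmid plasmid or a Cosmid plasmid.
  8. The plasmid library or method of claim 7, characterized in that: the bacterial artificial chromosome plasmid is the pcc2FOS plasmid; or
    the plasmid backbone fragment is the fragment obtained by removing nucleotides 362 to 403 of the pcc2FOS plasmid while mutating the A at position 355 to C, the T at position 410 to G, and the A at position 437 to G.
  9. The plasmid library or method of claim 8, characterized in that: in the plasmid library, the restriction enzyme recognition sequence is the sequence formed by joining the recognition sequences of BamH I, Nhe I and Hind III in that order; or
    in step (a3) of the method, sequence C is the sequence formed by joining the recognition sequences of Nhe I and BamH I in that order, and sequence D is the sequence formed by joining the recognition sequences of Nhe I and Hind III in that order; or
    in step (b) of the method, endonuclease K is restriction endonuclease Nhe I.
  10. A linearized plasmid library, characterized in that: the linearized plasmid library is identical in sequence to the linearized fragments obtained by linearizing the plasmid library of any one of claims 1 and 3-9 at the test-DNA insertion site sequence.
  11. Use of the plasmid library or the linearized plasmid library of any one of claims 1 and 3-10 in high-throughput sequencing of both ends of test DNA fragments.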
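  12. A method for high-throughput sequencing of both ends of test DNA fragments using the plasmid library or the linearized plasmid library of any one of claims 1 and 3-10, comprising the following steps:
    (1) designing forward primer A and reverse primer A as follows:
    designing forward primer 1 from the sequence at the 3' end of the plasmid backbone fragment of any one of claims 1 and 3-10, and reverse primer 1 from the sequence at the 5' end of the plasmid backbone fragment; attaching adapter sequence 1, used for high-throughput sequencing, to the 5' end of forward primer 1 to give forward primer A; attaching adapter sequence 2, which is used as a pair with adapter sequence 1, to the 5' end of reverse primer 1 to give reverse primer A;
    (2) using the plasmid library of any one of claims 1 and 3-10 as the template, performing PCR amplification with forward primer A and reverse primer A to obtain PCR product 1; sequencing PCR product 1 by high-throughput sequencing via adapter sequence 1 and adapter sequence 2 to obtain the sequences of tag sequence 1 and tag sequence 2 of every plasmid in the plasmid library, and pairing the tag sequence 1 and tag sequence 2 that occur in the same plasmid;
    (3) cloning a batch of test DNA fragments into the restriction enzyme recognition sequence of the plasmid library, with one test DNA fragment cloned into each plasmid of the library, and transforming the resulting recombinant plasmids into a recipient bacterium to obtain a DNA library;
    (4) extracting the recombinant plasmids from the DNA library obtained in step (3) to obtain a recombinant plasmid library;
    (5) carrying out the following I) and II) in parallel:
    I) digesting the recombinant plasmid library obtained in step (4) with restriction endonuclease M, shearing by sonication, and circularizing the sheared DNA fragments to obtain circularized DNA molecule library 1;
    II) digesting the recombinant plasmid library obtained in step (4) with restriction endonuclease M', shearing by sonication, and circularizing the sheared DNA fragments to obtain circularized DNA molecule library 2;
    restriction endonuclease M and restriction endonuclease M' satisfying the following conditions: in the plasmid library, the recognition site of restriction endonuclease M lies at the 3' end of the plasmid backbone fragment and the recognition site of restriction endonuclease M' lies at the 5' end of the plasmid backbone fragment, and each lies less than 10 kb from tag sequence 1 or tag sequence 2 of any one of claims 1 and 3-10;
    (6) designing forward primer B, reverse primer B, forward primer C and reverse primer C as follows:
    designing forward primer 2 and reverse primer 2 from the sequence at the 3' end of the plasmid backbone fragment of any one of claims 1 and 3-10, and forward primer 3 and reverse primer 3 from the sequence at the 5' end of the plasmid backbone fragment;
    attaching adapter sequence 3, used for high-throughput sequencing, to the 5' end of forward primer 2 to give forward primer B; attaching adapter sequence 4, which is used as a pair with adapter sequence 3, to the 5' end of reverse primer 2 to give reverse primer B;
    attaching adapter sequence 3 to the 5' end of forward primer 3 to give forward primer C; attaching adapter sequence 4 to the 5' end of reverse primer 3 to give reverse primer C;
    (7) using circularized DNA molecule library 1 obtained in step (5) as the template, performing PCR amplification with forward primer B and reverse primer B to obtain PCR product 2;
    using circularized DNA molecule library 2 obtained in step (5) as the template, performing PCR amplification with forward primer C and reverse primer C to obtain PCR product 3;
    sequencing PCR product 2 and PCR product 3 by high-throughput sequencing via adapter sequence 3 and adapter sequence 4, obtaining from circularized DNA molecule library 1 tag sequence 1 together with the 5'-end sequence of the test DNA fragment downstream of it, and from circularized DNA molecule library 2 tag sequence 2 together with the 3'-end sequence of the test DNA fragment upstream of it;
    (8) from the pairing of tag sequence 1 and tag sequence 2 obtained in step (2), determining the sequences of both ends of each test DNA fragment, thereby achieving high-throughput sequencing of both ends of the test DNA fragments.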
PCT/CN2015/074981 2014-03-26 2015-03-24 Plasmid library comprising two random markers and use thereof in high throughput sequencing WO2015144045A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/128,557 US20200131504A1 (en) 2014-03-26 2015-03-24 Plasmid library comprising two random markers and use thereof in high throughput sequencing

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410116844.2A CN103882530B (zh) 2014-03-26 2014-03-26 Method for high-throughput sequencing of both ends of DNA fragments using random-sequence-tagged plasmids
CN201410116844.2 2014-03-26

Publications (1)

Publication Number Publication Date
WO2015144045A1 true WO2015144045A1 (zh) 2015-10-01

Family

ID=50951639

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/074981 WO2015144045A1 (zh) 2014-03-26 2015-03-24 Plasmid library comprising two random markers and use thereof in high throughput sequencing

Country Status (3)

Country Link
US (1) US20200131504A1 (zh)
CN (1) CN103882530B (zh)
WO (1) WO2015144045A1 (zh)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
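CN103882530B (zh) * 2014-03-26 2016-02-24 清华大学 Method for high-throughput sequencing of both ends of DNA fragments using random-sequence-tagged plasmids
CN106367485B (zh) * 2016-08-29 2019-04-26 厦门艾德生物医药科技股份有限公司 Multi-positioning dual-tag adapter set for detecting gene mutations, and preparation method and application thereof
CN107034210A (zh) * 2017-05-09 2017-08-11 古博 Method for making a vector for simple construction of high-throughput sequencing libraries for enhancer screening
CN108866173A (zh) * 2017-05-16 2018-11-23 深圳华大基因科技服务有限公司 Method and device for verifying a standard sequence, and application thereof
WO2018232595A1 (zh) * 2017-06-20 2018-12-27 深圳华大智造科技有限公司 PCR primer pair and application thereof
CN110527715A (zh) * 2019-09-16 2019-12-03 中国科学院遗传与发育生物学研究所农业资源研究中心 Method for sequencing a functional genomic clone library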

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1990000621A1 (en) * 1988-07-15 1990-01-25 Rijksuniversiteit Te Leiden Dna sequencing method and primer suitable therefor
WO2007011733A2 (en) * 2005-07-18 2007-01-25 Pioneer Hi-Bred International, Inc. Modified frt recombination sites and methods of use
CN103882530A (zh) * 2014-03-26 2014-06-25 清华大学 Method for high-throughput sequencing of both ends of DNA fragments using random-sequence-tagged plasmids

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5356773A (en) * 1989-05-16 1994-10-18 Kinetic Investments Limited Generation of unidirectional deletion mutants
US9018138B2 (en) * 2007-08-16 2015-04-28 The Johns Hopkins University Compositions and methods for generating and screening adenoviral libraries

Also Published As

Publication number Publication date
CN103882530B (zh) 2016-02-24
US20200131504A1 (en) 2020-04-30
CN103882530A (zh) 2014-06-25

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15768494

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15768494

Country of ref document: EP

Kind code of ref document: A1