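WO2020164015A1 - Fusion primer, library construction method, sequencing method and library construction kit for third-generation sequencing library construction - Google Patents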

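Fusion primer, library construction method, sequencing method and library construction kit for third-generation sequencing library construction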

Info

Publication number
WO2020164015A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
primer
homologous recombination
fusion primer
sequencing
Prior art date
Application number
PCT/CN2019/074977
Other languages
English (en)
French (fr)
Inventor
黄标
黄金
丁芬
朱欢文
田志坚
Original Assignee
武汉华大医学检验所有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 武汉华大医学检验所有限公司
Priority to CN201980082604.6A (patent CN113166756B)
Priority to PCT/CN2019/074977 (patent WO2020164015A1)
Publication of WO2020164015A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C CHEMISTRY; METALLURGY
    • C40 COMBINATORIAL TECHNOLOGY
    • C40B COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B50/00 Methods of creating libraries, e.g. combinatorial synthesis
    • C40B50/06 Biochemical methods, e.g. using enzymes or whole viable microorganisms

Definitions

  • The present invention relates to the technical field of third-generation sequencing, and in particular to a fusion primer, a library construction method, a sequencing method and a library construction kit for third-generation sequencing library construction.
  • PacBio third-generation sequencing is based on the sequencing-by-synthesis principle and uses SMRT (single-molecule real-time fluorescence sequencing) chips as the carrier for the sequencing reaction. During sequencing, genomic DNA is broken into many small fragments, which are made into droplets and dispersed into different ZMW nanowells. When the polymerization reaction occurs at the bottom of a ZMW well, nucleotides labeled with different fluorophores are held by the polymerase in the fluorescence detection zone of the well. The base composition of the template DNA is determined from the type of fluorescence and its duration.
  • SMRT: Single Molecule Real-Time fluorescence sequencing technology.
  • One SMRT chip on the PacBio platform has 1 million ZMW sequencing wells. Each well can generate one sequence read (about 20-30 Kb long), and each chip yields 5-15 Gb of data on average. For species with small genomes, where little data is needed (less than 1 Gb), a different molecular tag usually has to be added to each sample so that the samples can be pooled and sequenced together. After sequencing, the molecular tag sequences are used to split out the data belonging to each sample.
  • Metagenomics, also known as microbial environmental genomics, constructs a metagenomic library by directly extracting the DNA of all microorganisms from an environmental sample, and uses the research strategies of genomics to study the genetic composition and community functions of all microorganisms contained in that sample.
  • The traditional method is to sequence the 16S rDNA genes in microbial genomes. These genes are usually about 1.5 Kb long and are widely distributed in prokaryotes. They carry sufficient information and evolve relatively slowly; conserved and specific regions coexist in them, and microbial genera and species are distinguished by these regions. Based on these characteristics, scientists can conveniently study the composition and diversity of species in an environment by selecting these gene regions, but cannot yet fully analyze the gene functions in the environment. With the widespread application of high-throughput, low-cost next-generation sequencing, scientists can now sequence whole genomes in the environment and, with the massive data obtained, comprehensively analyze microbial community structure and gene function composition.
  • 16S rDNA is the most commonly used "molecular clock" in bacterial taxonomy, and its sequence contains 9 variable regions and 10 conserved regions. The variable regions differ between bacteria, and the degree of variation is closely related to bacterial phylogeny. By detecting the sequence variation and abundance of 16S rDNA, the community diversity of an environmental sample can be characterized. Analysis based on 16S rDNA plays an important role in microbial classification and identification and in microecological research. With the development of DNA sequencing technology, 16S sequencing can be roughly divided into three stages. The first stage is first-generation sequencing, represented by the ABI 3730 sequencer.
  • With its long reads and high accuracy, this method can sequence 16S, 18S and ITS rDNA at full length, which assists conventional strain identification methods and improves the accuracy of strain identification.
  • This method is only suitable for species identification of single strains that can be isolated and cultured, and more than 99% of the strains in the environment cannot be isolated and cultured, which greatly limits the application scope of first-generation sequencing.
  • The second stage is second-generation sequencing, whose main features are high throughput and low cost. This method offers high throughput, high accuracy and low cost, and can be widely used to identify the strains of communities that cannot be isolated and cultured. It is currently the commonly used 16S rDNA sequencing platform.
  • Owing to read-length limits, second-generation sequencing can only sequence one or a few variable regions of the 1.5 Kb full-length 16S rDNA, such as V4, V3-V4, V1-V3 or V4-V5, and can hardly reach the level of first-generation sequencing in the accuracy of species classification and identification.
  • Third-generation sequencing, whose main features are single-molecule detection and long reads, can achieve high-throughput full-length 16S rDNA sequencing. Without isolation and culture, it obtains the full-length 16S rDNA information of all microorganisms in a community, combining the species identification accuracy of first-generation sequencing with the breadth of application of second-generation sequencing. At present the cost of third-generation sequencing is relatively high, which limits its large-scale application; as throughput rises and cost falls, third-generation sequencing is expected to replace second-generation sequencing as the main method of microbial community research.
  • Homologous recombination refers to recombination that occurs between sister chromatids, or between or within DNA molecules containing homologous sequences on the same chromosome. Homologous recombination requires catalysis by a series of proteins. Homologous recombination technology is now widely used in molecular cloning, where a target fragment is joined into a vector by homologous recombination.
  • This application provides a fusion primer, a library construction method, a sequencing method and a library construction kit for third-generation sequencing library construction.
  • By amplifying with the fusion primers of the present invention and splicing the amplification products by homologous recombination, the number of samples on a sequencing chip is increased and the sequencing cost is reduced.
  • According to a first aspect, an embodiment provides a fusion primer for third-generation sequencing library construction. The fusion primer comprises, in order from the 5' end to the 3' end, a homologous recombination sequence, a tag sequence and a specific amplification primer sequence, wherein the homologous recombination sequence is used for splicing the amplification products of the fusion primer by homologous recombination, the tag sequence is used to distinguish different amplification products, and the specific amplification primer sequence is used to bind to the target sequence for primer extension.
  • The target sequence is 16S rDNA.
  • The specific amplification primer sequence is a sequence that specifically binds to the 16S rDNA.
  • The length of the homologous recombination sequence is 5 to 25 bp, preferably 16 bp.
  • The fusion primer is selected from any one or more pairs of primers in SEQ ID NO: 1 to 48.
  • According to a second aspect, an embodiment provides a third-generation sequencing library construction method, the method comprising: amplifying the target sequence with the fusion primer of the first aspect; and then splicing the amplification products by homologous recombination through the homologous recombination sequence on the fusion primer, to obtain a splicing product comprising at least two amplification products.
  • The target sequence is 16S rDNA.
  • The target sequence is the full-length 16S rDNA sequence.
  • Every 2 to 20 amplification products are spliced together.
  • Every 2 to 4 amplification products are spliced together.
  • The length of the amplification product is 1.5 Kb, and the length of the splicing product is 3 to 6 Kb.
  • The homologous recombination splicing uses NEBuilder homologous recombinase.
  • The method further comprises performing a damage repair reaction, an end repair reaction and an adapter ligation reaction on the splicing product.
  • The method further comprises enzymatic digestion of the product of the adapter ligation reaction to remove unligated adapters, and size selection to obtain a product of a predetermined size.
  • According to a third aspect, an embodiment provides a third-generation sequencing method, which comprises sequencing, on the instrument, a library obtained by the library construction method of the second aspect.
  • According to a fourth aspect, an embodiment provides a third-generation sequencing library construction kit, which comprises the fusion primer of the first aspect.
  • The kit further comprises a homologous recombinase.
  • The homologous recombinase is NEBuilder homologous recombinase.
  • The kit further comprises reagents for the damage repair reaction, the end repair reaction and the adapter ligation reaction.
  • The kit further comprises digestion enzymes.
  • The digestion enzymes include ExoIII and ExoVII.
  • Amplifying with the fusion primers of the present invention and splicing the amplification products by homologous recombination increases the number of samples on a sequencing chip and reduces the sequencing cost.
  • Specifically, when the target sequence is 16S rDNA, the full-length 16S rDNA (1.5 Kb) of each sample is amplified by PCR with the fusion primers of the present invention, a homologous recombinase is used to splice multiple (for example 2-4) amplification products into a longer (for example 3-6 Kb) homologous recombination splicing product, and multiple (for example 10-12) splicing products are then pooled for library construction and sequencing.
  • The number of samples on each chip rises from 10-12 to 20-48, that is, the number of amplicon samples per chip increases about 3-fold, which effectively reduces the cost of library construction and sequencing, increases pooling flexibility and shortens the delivery time.
  • Figure 1 is a schematic diagram of the structure of a fusion primer amplified product in an embodiment of the present invention
  • FIG. 2 is a schematic diagram of the splicing principle of homologous recombination of amplified products in an embodiment of the present invention.
  • The fusion primer of the present invention for third-generation sequencing library construction comprises, in order from the 5' end to the 3' end, a homologous recombination sequence, a tag sequence and a specific amplification primer sequence.
  • The homologous recombination sequence is used for splicing the amplification products of the fusion primer by homologous recombination, the tag sequence is used to distinguish different amplification products, and the specific amplification primer sequence is used to bind to the target sequence for primer extension.
  • As shown in Figure 1, both ends of each amplification product carry the fusion primer structure, with the insert in the middle.
  • The insert can be 16S rDNA, in particular the full-length 16S rDNA sequence, which is generally about 1.5 Kb long.
  • The insert can also be 18S rDNA or the like.
  • Amplifying 18S rDNA with the fusion primers of the present invention allows classification and identification of eukaryotic microorganisms such as yeast.
  • The homologous recombination sequence in the fusion primer of the present invention is generally 5 to 25 bp long, for example 5 bp, 6 bp, 8 bp, 10 bp, 12 bp, 15 bp, 16 bp, 18 bp, 20 bp, 23 bp or 25 bp, preferably 16 bp.
  • The base composition of the homologous recombination sequence should be balanced; in general it is sufficient that no three consecutive identical bases occur.
  • Amplification products carrying the same homologous recombination sequence can be spliced to one another by homologous recombination through that shared sequence to form a splicing product.
  • Generally 2 to 20 amplification products are spliced together; in a preferred embodiment, 2 to 4 amplification products are spliced together.
  • When the amplification product is 1.5 Kb long, for example when it is 16S rDNA, splicing 2 to 4 amplification products together gives a splicing product of 3 to 6 Kb.
  • The fusion primers of the present invention can be divided into groups.
  • Fusion primers in the same group share the same homologous recombination sequence, so their amplification products can be spliced to one another by homologous recombination; the homologous recombination sequences of different groups differ, so products from different groups cannot be spliced together.
  • The tag sequence (barcode) is used to distinguish different amplification products, that is, amplification products from different samples.
  • The tag sequence can be a sequence of any suitable length (for example 10 to 20 bp). In one embodiment, the tag sequences are those officially published by PacBio and are 16 bases long.
  • Different fusion primers carry different tag sequences, so each fusion primer specifically amplifies one sample and marks it with a specific tag sequence by which its sample of origin can be identified.
  • The specific amplification primer sequence is used to bind to the target sequence for primer extension.
  • The specific amplification primer sequence is essentially determined by the target sequence.
  • For example, when the target sequence is 16S rDNA, the specific amplification primer sequence is the sequence complementary to 16S rDNA. It is a fixed, unmodifiable part of the fusion primer, that is, all fusion primers carry this same specific amplification primer sequence.
  • An embodiment of the present invention provides a third-generation sequencing library construction method.
  • The method comprises: amplifying a target sequence with the fusion primers of the present invention; and then splicing the amplification products by homologous recombination through the homologous recombination sequence on the fusion primers, to obtain a splicing product comprising at least two amplification products.
  • For example, as shown in Figure 2, after the 1.5 Kb full-length 16S rDNA sequence of each sample is amplified by PCR with the fusion primers of the present invention, a homologous recombinase (such as NEBuilder homologous recombinase) is used to splice 2 to 4 of the 1.5 Kb amplification products into a 3 to 6 Kb homologous recombination splicing product. Then 10 to 12 splicing products are pooled for library construction and sequencing, which raises the number of samples on each chip from 10 to 12 up to 20 to 48 and thereby reduces the cost of library construction and sequencing.
  • In a preferred embodiment, the splicing product is subjected to a damage repair reaction, an end repair reaction and an adapter ligation reaction; the product of the adapter ligation reaction is digested enzymatically to remove unligated adapters; and size selection is performed to obtain a product of a predetermined size, which is the final library of the present invention.
  • An embodiment of the present invention provides a third-generation sequencing library construction kit, which comprises the fusion primers of the present invention. It can also comprise a homologous recombinase, such as NEBuilder homologous recombinase. In addition, it comprises reagents for the damage repair reaction, the end repair reaction and the adapter ligation reaction, as well as digestion enzymes, such as ExoIII and ExoVII.
  • Note to the primer table: the 5' end is the homologous recombination sequence (splicing sequence), the underlined sequence in the middle is the tag sequence (barcode), and the 3' end is the 16S rDNA-specific amplification primer sequence. The amplification products of primers numbered 1 and 2 form one splicing group, in which 1_R and 2_F carry the splice recognition site.
  • The amplification products of primers numbered 3 and 4 form one splicing group, in which 3_R and 4_F carry the splice recognition site. In every two pairs, the downstream primer of the first pair and the upstream primer of the second pair carry the splice recognition site, and so on.
  • Amplification system and volume (μL): 10X Pfx Buffer, 5; MgSO4 (50 mM), 2; dNTP (10 mM), 2; Pfx enzyme (2.5 U), 1; P1 primer (10 μM), 2; P2 primer (10 μM), 2; DNA, 6; nuclease-free (NF) water, 30.
  • The target size of two spliced full-length 16S rDNA amplicons is about 3 Kb, and the size selection range in the experiment was roughly 2-4 Kb.
  • 16S splicing products (number, sample number, fragment sizes in bp, mass concentration in ng/μL):
  • 16S splicing product 2, WH1812009964, 1520\3243, 13.80
  • 16S splicing product 3, WH1812009966, 1709\3226, 18.90
  • 16S splicing product 4, WHYD18125226_A, 1701\3201, 19.70
  • 16S splicing product 5, WHYD18125229_A, 1667\3216, 18.80
  • 16S splicing product 6, WH1812004714, 1600\3181, 15.80
  • 16S splicing product 7, WH1812004716, 1507\3178, 2.00
  • 16S splicing product 8, WH1812004718, 1544\3306, 21.10
  • 16S splicing product 9, WH1812004720, 1543\3207, 19.30
  • 16S splicing product 10, WHYD18066115_C, 1524\3112, 26.90
  • 16S splicing product 11, WHYD18066119_C, 1517\3034, 19.40
  • 16S splicing product 12, WHYD18066121_C, 1630\3240, 23.50
  • The sequencing result statistics are as follows:
  • Sequencing results before splicing: data volume after removing reads with an RQ value below 0.8: 7.58 Gb (the acceptance threshold is greater than 5 Gb); polymerase read length after removing reads with an RQ value below 0.8: 19057 bp (the acceptance threshold is greater than 10 kb).
  • Sequencing results after splicing: data volume after removing reads with an RQ value below 0.8: 8.34 Gb (the acceptance threshold is greater than 5 Gb); polymerase read length after removing reads with an RQ value below 0.8: 13761 bp (the acceptance threshold is greater than 10 kb).
  • RQ (read quality) is a prediction of the accuracy of the subreads from the same zero-mode waveguide (sequencing well). The RQ cutoff currently set on the Sequel sequencer is 0.8, and the sequencer automatically filters out data below this value.
  • The data volume is the total amount of data produced by the whole chip after low-quality reads are filtered out, in Gb.
  • The average polymerase read length is the mean length of all reads on the whole chip after low-quality reads are filtered out, in bp.
  • With the present invention, the number of amplicon samples on each chip can be increased about 3-fold, which effectively saves sequencing cost (expected to fall from the current 1500 RMB/sample to 500 RMB/sample), increases the flexibility of pooling and effectively shortens the delivery time.

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Biochemistry (AREA)
  • Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Biotechnology (AREA)
  • General Engineering & Computer Science (AREA)
  • Zoology (AREA)
  • Molecular Biology (AREA)
  • Wood Science & Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Microbiology (AREA)
  • Plant Pathology (AREA)
  • Biophysics (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • General Chemical & Material Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

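A fusion primer, a library construction method, a sequencing method and a library construction kit for third-generation sequencing library construction. The fusion primer comprises, in order from the 5' end to the 3' end, a homologous recombination sequence, a tag sequence and a specific amplification primer sequence, wherein the homologous recombination sequence is used for splicing the amplification products of the fusion primer by homologous recombination, the tag sequence is used to distinguish different amplification products, and the specific amplification primer sequence is used to bind to the target sequence for primer extension. The target sequence is amplified with the fusion primer and the amplification products are spliced by homologous recombination, which increases the number of samples on a sequencing chip and reduces the sequencing cost.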

Description

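Fusion primer, library construction method, sequencing method and library construction kit for third-generation sequencing library construction
Technical Field
The present invention relates to the technical field of third-generation sequencing, and in particular to a fusion primer, a library construction method, a sequencing method and a library construction kit for third-generation sequencing library construction.
Background
PacBio third-generation sequencing is based on the sequencing-by-synthesis principle and uses SMRT (single-molecule real-time fluorescence sequencing) chips as the carrier for the sequencing reaction. During sequencing, genomic DNA is broken into many small fragments, which are made into droplets and dispersed into different ZMW nanowells. When the polymerization reaction occurs at the bottom of a ZMW well, nucleotides labeled with different fluorophores are held by the polymerase in the fluorescence detection zone of the well, and the base composition of the template DNA is determined from the type of fluorescence and its duration.
One SMRT chip on the PacBio platform has 1 million ZMW sequencing wells. Each well can generate one sequence read (about 20-30 Kb long), and each chip yields 5-15 Gb of data on average. For species with small genomes, where little data is needed (less than 1 Gb), a different molecular tag usually has to be added to each sample so that the samples can be pooled and sequenced together. After sequencing, the molecular tag sequences are used to split out the data belonging to each sample.
Metagenomics, also known as microbial environmental genomics, constructs a metagenomic library by directly extracting the DNA of all microorganisms from an environmental sample, and uses the research strategies of genomics to study the genetic composition and community functions of all microorganisms contained in that sample.
The traditional method is to sequence the 16S rDNA genes in microbial genomes. These genes are usually about 1.5 Kb long and are widely distributed in prokaryotes. They carry sufficient information and evolve relatively slowly; conserved and specific regions coexist in them, and microbial genera and species are distinguished by these regions. Based on these characteristics, scientists can conveniently study the composition and diversity of species in an environment by selecting these gene regions, but cannot yet fully analyze the gene functions in the environment. With the widespread application of high-throughput, low-cost next-generation sequencing, scientists can now sequence whole genomes in the environment and, with the massive data obtained, comprehensively analyze microbial community structure and gene function composition.
16S rDNA is the most commonly used "molecular clock" in bacterial taxonomy, and its sequence contains 9 variable regions and 10 conserved regions. The variable regions differ between bacteria, and the degree of variation is closely related to bacterial phylogeny. By detecting the sequence variation and abundance of 16S rDNA, the community diversity of an environmental sample can be characterized. Analysis based on 16S rDNA plays an important role in microbial classification and identification and in microecological research. With the development of DNA sequencing technology, 16S sequencing can be roughly divided into three stages. The first stage is first-generation sequencing, represented by the ABI 3730 sequencer. With its long reads and high accuracy, this method can sequence 16S, 18S and ITS rDNA at full length, which assists conventional strain identification methods and improves the accuracy of strain identification. However, it is only suitable for species identification of single strains that can be isolated and cultured, and more than 99% of the strains in the environment cannot be isolated and cultured, which greatly limits the application scope of first-generation sequencing. The second stage is second-generation sequencing, whose main features are high throughput and low cost. This method offers high throughput, high accuracy and low cost, can be widely used to identify the strains of communities that cannot be isolated and cultured, and is currently the commonly used 16S rDNA sequencing platform. Owing to read-length limits, second-generation sequencing can only sequence one or a few variable regions of the 1.5 Kb full-length 16S rDNA, such as V4, V3-V4, V1-V3 or V4-V5, and can hardly reach the level of first-generation sequencing in the accuracy of species classification and identification.
Third-generation sequencing, whose main features are single-molecule detection and long reads, can achieve high-throughput full-length 16S rDNA sequencing. Without isolation and culture, it obtains the full-length 16S rDNA information of all microorganisms in a community, combining the species identification accuracy of first-generation sequencing with the breadth of application of second-generation sequencing. At present the cost of third-generation sequencing is relatively high, which limits its large-scale application; as throughput rises and cost falls, third-generation sequencing is expected to replace second-generation sequencing as the main method of microbial community research.
Homologous recombination refers to recombination that occurs between sister chromatids, or between or within DNA molecules containing homologous sequences on the same chromosome. Homologous recombination requires catalysis by a series of proteins. Homologous recombination technology is now widely used in molecular cloning, where a target fragment is joined into a vector by homologous recombination.
Because the insert of a 16S rDNA library is about 1500 bp long while the current average read length of the PacBio platform is 20-30 kb, the same insert is read repeatedly for 10-20 passes. In general, once the same fragment has been read four or more times, 99.9% of the errors are corrected. The extra passes therefore waste a large amount of data, which implicitly raises the sequencing cost and limits the development of the PacBio platform and of amplicon projects.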
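The data-waste argument above can be illustrated with a rough calculation. The sketch below assumes, as a simplification that the source does not state, that the number of passes over an insert is the read length divided by the insert length (adapter sequence is ignored). The function name `estimated_passes` is illustrative only.

```python
def estimated_passes(read_length_bp: float, insert_length_bp: float) -> float:
    """Approximate how many times one insert is read: read length / insert length."""
    if insert_length_bp <= 0:
        raise ValueError("insert length must be positive")
    return read_length_bp / insert_length_bp


READ_LENGTHS_BP = (20_000, 30_000)  # 20-30 kb average read length, from the text

# Unspliced 1.5 kb 16S rDNA insert versus inserts made of 2 or 4 spliced amplicons.
single = [estimated_passes(r, 1_500) for r in READ_LENGTHS_BP]
spliced_2 = [estimated_passes(r, 3_000) for r in READ_LENGTHS_BP]
spliced_4 = [estimated_passes(r, 6_000) for r in READ_LENGTHS_BP]

print([round(x, 1) for x in single])     # [13.3, 20.0]
print([round(x, 1) for x in spliced_2])  # [6.7, 10.0]
print([round(x, 1) for x in spliced_4])  # [3.3, 5.0]
```

Under this simplified model, a 3 kb splicing product is still read more than four times at either read length, while far fewer passes are spent per sample.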
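Summary of the Invention
This application provides a fusion primer, a library construction method, a sequencing method and a library construction kit for third-generation sequencing library construction. By amplifying with the fusion primers of the present invention and splicing the amplification products by homologous recombination, the number of samples on a sequencing chip is increased and the sequencing cost is reduced.
According to a first aspect, an embodiment provides a fusion primer for third-generation sequencing library construction. The fusion primer comprises, in order from the 5' end to the 3' end, a homologous recombination sequence, a tag sequence and a specific amplification primer sequence, wherein the homologous recombination sequence is used for splicing the amplification products of the fusion primer by homologous recombination, the tag sequence is used to distinguish different amplification products, and the specific amplification primer sequence is used to bind to the target sequence for primer extension.
In a preferred embodiment, the target sequence is 16S rDNA, and the specific amplification primer sequence is a sequence that specifically binds to the 16S rDNA.
In a preferred embodiment, the length of the homologous recombination sequence is 5 to 25 bp.
In a preferred embodiment, the length of the homologous recombination sequence is 16 bp.
In a preferred embodiment, the fusion primer is selected from any one or more pairs of primers in SEQ ID NO: 1 to 48.
According to a second aspect, an embodiment provides a third-generation sequencing library construction method, the method comprising: amplifying the target sequence with the fusion primer of the first aspect; and then splicing the amplification products by homologous recombination through the homologous recombination sequence on the fusion primer, to obtain a splicing product comprising at least two amplification products.
In a preferred embodiment, the target sequence is 16S rDNA.
In a preferred embodiment, the target sequence is the full-length 16S rDNA sequence.
In a preferred embodiment, every 2 to 20 amplification products are spliced together.
In a preferred embodiment, every 2 to 4 amplification products are spliced together.
In a preferred embodiment, the length of the amplification product is 1.5 Kb, and the length of the splicing product is 3 to 6 Kb.
In a preferred embodiment, the homologous recombination splicing uses NEBuilder homologous recombinase.
In a preferred embodiment, the method further comprises performing a damage repair reaction, an end repair reaction and an adapter ligation reaction on the splicing product.
In a preferred embodiment, the method further comprises enzymatic digestion of the product of the adapter ligation reaction to remove unligated adapters, and size selection to obtain a product of a predetermined size.
According to a third aspect, an embodiment provides a third-generation sequencing method, which comprises sequencing, on the instrument, the library obtained by the library construction method of the second aspect.
According to a fourth aspect, an embodiment provides a third-generation sequencing library construction kit, which comprises the fusion primer of the first aspect.
In a preferred embodiment, the kit further comprises a homologous recombinase.
In a preferred embodiment, the homologous recombinase is NEBuilder homologous recombinase.
In a preferred embodiment, the kit further comprises reagents for the damage repair reaction, the end repair reaction and the adapter ligation reaction.
In a preferred embodiment, the kit further comprises digestion enzymes.
In a preferred embodiment, the digestion enzymes include ExoIII and ExoVII.
Amplifying with the fusion primers of the present invention and splicing the amplification products by homologous recombination increases the number of samples on a sequencing chip and reduces the sequencing cost. Specifically, when the target sequence is 16S rDNA, the full-length 16S rDNA (1.5 Kb) of each sample is amplified by PCR with the fusion primers of the present invention, a homologous recombinase is used to splice multiple (for example 2-4) amplification products into a longer (for example 3-6 Kb) homologous recombination splicing product, and multiple (for example 10-12) splicing products are then pooled for library construction and sequencing. The number of samples on each chip rises from 10-12 to 20-48, that is, the number of amplicon samples per chip increases about 3-fold, which effectively reduces the cost of library construction and sequencing, increases pooling flexibility and shortens the delivery time.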
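Brief Description of the Drawings
Figure 1 is a schematic diagram of the structure of a fusion primer amplification product in an embodiment of the present invention;
Figure 2 is a schematic diagram of the principle of splicing amplification products by homologous recombination in an embodiment of the present invention.
Detailed Description
The present invention is described in further detail below through specific embodiments with reference to the drawings. In the following embodiments, many details are described so that this application can be better understood. However, those skilled in the art will readily recognize that some of these features can be omitted in different situations or replaced by other elements, materials or methods.
In addition, the characteristics, operations or features described in the specification can be combined in any suitable manner to form various embodiments. The steps or actions in the method description can also be reordered or adjusted in ways obvious to those skilled in the art. The various orders in the specification and drawings therefore serve only to describe a particular embodiment clearly and do not imply a required order, unless it is stated otherwise that a certain order must be followed.
The fusion primer of the present invention for third-generation sequencing library construction comprises, in order from the 5' end to the 3' end, a homologous recombination sequence, a tag sequence and a specific amplification primer sequence, wherein the homologous recombination sequence is used for splicing the amplification products of the fusion primer by homologous recombination, the tag sequence is used to distinguish different amplification products, and the specific amplification primer sequence is used to bind to the target sequence for primer extension. As shown in Figure 1, both ends of each amplification product carry the fusion primer structure, with the insert in the middle. The insert can be 16S rDNA, in particular the full-length 16S rDNA sequence, which is generally about 1.5 Kb long. The insert can also be 18S rDNA or the like; amplifying 18S rDNA with the fusion primers of the present invention allows classification and identification of eukaryotic microorganisms such as yeast.
The homologous recombination sequence in the fusion primer of the present invention is generally 5 to 25 bp long, for example 5 bp, 6 bp, 8 bp, 10 bp, 12 bp, 15 bp, 16 bp, 18 bp, 20 bp, 23 bp or 25 bp, preferably 16 bp. Its base composition should be balanced; in general it is sufficient that no three consecutive identical bases occur. Amplification products carrying the same homologous recombination sequence can be spliced to one another by homologous recombination through that shared sequence to form a splicing product. Generally 2 to 20 amplification products are spliced together; in a preferred embodiment, 2 to 4 amplification products are spliced together. When the amplification product is 1.5 Kb long, for example when it is 16S rDNA, splicing 2 to 4 amplification products together gives a splicing product of 3 to 6 Kb. The fusion primers of the present invention can be divided into groups: fusion primers in the same group share the same homologous recombination sequence, so their amplification products can be spliced to one another, whereas the homologous recombination sequences of different groups differ, so products from different groups cannot be spliced together. The tag sequence (barcode) is used to distinguish different amplification products, that is, amplification products from different samples. It can be a sequence of any suitable length (for example 10 to 20 bp); in one embodiment, the tag sequences are those officially published by PacBio and are 16 bases long. Different fusion primers carry different tag sequences, so each fusion primer specifically amplifies one sample and marks it with a specific tag sequence by which its sample of origin can be identified. The specific amplification primer sequence is used to bind to the target sequence for primer extension and is essentially determined by the target sequence. For example, when the target sequence is 16S rDNA, the specific amplification primer sequence is the sequence complementary to 16S rDNA; it is a fixed, unmodifiable part of the fusion primer, that is, all fusion primers carry this same specific amplification primer sequence.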
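The primer layout and the design rules for the homologous recombination sequence described above can be sketched in a few lines. This is a minimal illustration, not the applicant's design tool: the function names are hypothetical, and the three example sequences are arbitrary placeholders that are not taken from SEQ ID NO: 1 to 48 or from any published barcode set.

```python
import re


def is_valid_hr_sequence(seq: str, min_len: int = 5, max_len: int = 25) -> bool:
    """Check the stated rules: 5-25 bp long and no three identical bases in a row."""
    seq = seq.upper()
    if not (min_len <= len(seq) <= max_len):
        return False
    if re.fullmatch(r"[ACGT]+", seq) is None:
        return False
    return re.search(r"(.)\1\1", seq) is None


def build_fusion_primer(hr_seq: str, barcode: str, specific_primer: str) -> str:
    """Concatenate the three parts in 5'->3' order: HR sequence, tag, specific primer."""
    if not is_valid_hr_sequence(hr_seq):
        raise ValueError("homologous recombination sequence fails the design rules")
    return hr_seq.upper() + barcode.upper() + specific_primer.upper()


# Placeholder sequences for illustration only.
hr_seq = "ACGTTGCATCGAGTCA"            # 16 bp homologous recombination sequence
barcode = "GATCGTACCTAGCTGA"           # 16 bp tag sequence
specific = "ACTGACTGACTGACTGACTG"      # 20 bp stand-in for the 16S-specific part

primer = build_fusion_primer(hr_seq, barcode, specific)
print(len(primer))  # 52
```

Keeping the validity check separate from the assembly step makes it easy to screen candidate homologous recombination sequences for a new primer group before any primer is ordered.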
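An embodiment of the present invention provides a third-generation sequencing library construction method, the method comprising: amplifying a target sequence with the fusion primers of the present invention; and then splicing the amplification products by homologous recombination through the homologous recombination sequence on the fusion primers, to obtain a splicing product comprising at least two amplification products.
For example, as shown in Figure 2, after the 1.5 Kb full-length 16S rDNA sequence of each sample is amplified by PCR with the fusion primers of the present invention, a homologous recombinase (such as NEBuilder homologous recombinase) is used to splice 2 to 4 of the 1.5 Kb amplification products into a 3 to 6 Kb homologous recombination splicing product. Then 10 to 12 splicing products are pooled for library construction and sequencing, which raises the number of samples on each chip from 10 to 12 up to 20 to 48 and thereby reduces the cost of library construction and sequencing.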
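The sample counts quoted above follow from a single multiplication, shown here as a small sketch. The function name is illustrative; the figures (10 to 12 splicing products per chip, 2 to 4 amplification products per splicing product, 1.5 Kb per amplification product) come from the text.

```python
def samples_per_chip(splicing_products: int, amplicons_per_product: int) -> int:
    """Each splicing product carries one barcoded sample per spliced amplicon."""
    return splicing_products * amplicons_per_product


def spliced_length_kb(amplicons_per_product: int, amplicon_kb: float = 1.5) -> float:
    """Nominal length of a splicing product, ignoring the short primer overlaps."""
    return amplicons_per_product * amplicon_kb


low = samples_per_chip(10, 2)
high = samples_per_chip(12, 4)
print(low, high)                                   # 20 48
print(spliced_length_kb(2), spliced_length_kb(4))  # 3.0 6.0
```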
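In a preferred embodiment of the present invention, the splicing product is subjected to a damage repair reaction, an end repair reaction and an adapter ligation reaction; the product of the adapter ligation reaction is digested enzymatically to remove unligated adapters; and size selection is performed to obtain a product of a predetermined size, which is the final library of the present invention.
An embodiment of the present invention provides a third-generation sequencing library construction kit, which comprises the fusion primers of the present invention. It can also comprise a homologous recombinase, such as NEBuilder homologous recombinase. In addition, it comprises reagents for the damage repair reaction, the end repair reaction and the adapter ligation reaction, as well as digestion enzymes, such as ExoIII and ExoVII.
The technical solution of the present invention is described in detail below through an example. It should be understood that the example is illustrative only and is not to be construed as limiting the scope of protection of the present invention.
The library construction kits used in the example are listed in Table 1 below:
Table 1. Commercial library construction kits
Figure PCTCN2019074977-appb-000001
The primer sequences used in the example are listed in Table 2 below:
Table 2. Fusion primer sequences
Figure PCTCN2019074977-appb-000002
Figure PCTCN2019074977-appb-000003
Figure PCTCN2019074977-appb-000004
Figure PCTCN2019074977-appb-000005
Note: the 5' end is the homologous recombination sequence (splicing sequence), the underlined sequence in the middle is the tag sequence (barcode), and the 3' end is the 16S rDNA-specific amplification primer sequence. The amplification products of primers numbered 1 and 2 form one splicing group, in which 1_R and 2_F carry the splice recognition site. The amplification products of primers numbered 3 and 4 form one splicing group, in which 3_R and 4_F carry the splice recognition site. In every two pairs, the downstream primer of the first pair and the upstream primer of the second pair carry the splice recognition site, and so on.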
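The grouping rule in the note can be written out as a short sketch. It assumes the numbering scheme described there (consecutive primer-pair numbers form a group, and the `_R` primer of one pair meets the `_F` primer of the next); the function names are hypothetical.

```python
def splice_groups(n_primer_pairs: int, per_group: int = 2) -> list[tuple[int, ...]]:
    """Group consecutive primer-pair numbers: (1, 2), (3, 4), and so on."""
    if n_primer_pairs % per_group != 0:
        raise ValueError("number of primer pairs must be a multiple of the group size")
    return [
        tuple(range(start, start + per_group))
        for start in range(1, n_primer_pairs + 1, per_group)
    ]


def junction_primers(group: tuple[int, ...]) -> list[tuple[str, str]]:
    """For each adjacent pair in a group, name the two primers that share the splice site."""
    return [(f"{a}_R", f"{b}_F") for a, b in zip(group, group[1:])]


groups = splice_groups(24)
print(len(groups))                             # 12
print(groups[0], junction_primers(groups[0]))  # (1, 2) [('1_R', '2_F')]
print(groups[1], junction_primers(groups[1]))  # (3, 4) [('3_R', '4_F')]
```

With 24 primer pairs this yields the 12 splicing groups used in the example that follows.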
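Example
Sample: rat intestinal samples
(1) First, 24 sets of fusion primers carrying splice sites (shown in Table 2) were synthesized by 上海生工生物技术有限公司 (Sangon Biotech, Shanghai). The full-length 16S rDNA target region was then amplified by PCR, with each amplification using one of the 24 primer sets, and the 24 PCR amplification products were named PCR-1, PCR-2 ... PCR-24 in order. The kit used for the amplification reaction was INVITROGEN's
Figure PCTCN2019074977-appb-000006
Pfx DNA polymerase (catalog no. 0101025002). The amplification reaction system and reaction conditions are shown in Tables 3 and 4 below:
Table 3
Amplification system Volume (μL)
10X Pfx Buffer 5
MgSO4 (50 mM) 2
dNTP (10 mM) 2
Pfx enzyme (2.5 U) 1
P1 primer (10 μM) 2
P2 primer (10 μM) 2
DNA 6
Nuclease-free (NF) water 30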
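As a quick consistency check on Table 3, the component volumes can be summed and the final concentrations derived by simple dilution. The sketch below uses only the volumes and stock concentrations listed in the table; the helper name `final_concentration` is illustrative.

```python
# Component volumes in microliters, copied from Table 3.
reaction_ul = {
    "10X Pfx Buffer": 5,
    "MgSO4 (50 mM)": 2,
    "dNTP (10 mM)": 2,
    "Pfx enzyme (2.5 U)": 1,
    "P1 primer (10 uM)": 2,
    "P2 primer (10 uM)": 2,
    "DNA": 6,
    "Nuclease-free water": 30,
}

total_ul = sum(reaction_ul.values())


def final_concentration(stock: float, volume_ul: float, total_volume_ul: float) -> float:
    """Dilution rule C_final = C_stock * V_added / V_total (same unit as the stock)."""
    return stock * volume_ul / total_volume_ul


mgso4_mm = final_concentration(50, reaction_ul["MgSO4 (50 mM)"], total_ul)
dntp_mm = final_concentration(10, reaction_ul["dNTP (10 mM)"], total_ul)
primer_um = final_concentration(10, reaction_ul["P1 primer (10 uM)"], total_ul)

print(total_ul)   # 50
print(mgso4_mm)   # 2.0
print(dntp_mm)    # 0.4
print(primer_um)  # 0.4
```

The 5 μL of 10X buffer in a 50 μL reaction likewise gives a 1X working concentration.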
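Table 4
Figure PCTCN2019074977-appb-000007
(2) Every two PCR amplification products (for example PCR-1 and PCR-2, PCR-3 and PCR-4, and so on) were mixed in equal amounts and reacted overnight at 37°C with NEBuilder homologous recombinase, using the reaction system in Table 5 below. The samples were purified with 0.6× magnetic beads, and the concentration and fragment distribution were then measured.
Table 5
Reaction component Volume (μL)
PCR product 5
NEBuilder HiFi Master Mix 10
5
Total volume 20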
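The outcome of the splicing step can be modeled at the string level. The sketch below is a deliberately simplified illustration and not a model of the enzyme chemistry: it treats each amplification product as a single-strand string, assumes that the shared homologous sequence sits at the 3' end of the first product and the 5' end of the second, and keeps one copy of the overlap in the joined product. All sequences are toy placeholders.

```python
def splice_by_overlap(first: str, second: str, overlap_len: int = 16) -> str:
    """Join two sequences whose ends share an identical overlap, keeping one copy of it."""
    if len(first) < overlap_len or len(second) < overlap_len:
        raise ValueError("sequences are shorter than the overlap")
    if first[-overlap_len:] != second[:overlap_len]:
        raise ValueError("ends do not share the expected homologous sequence")
    return first + second[overlap_len:]


hr_seq = "ACGTTGCATCGAGTCA"         # toy 16 bp homologous recombination sequence
amplicon_1 = "GGGAAACCC" + hr_seq   # toy product ending in the shared sequence
amplicon_2 = hr_seq + "TTTCCCGGG"   # toy product starting with the shared sequence

product = splice_by_overlap(amplicon_1, amplicon_2)
print(product)       # GGGAAACCCACGTTGCATCGAGTCATTTCCCGGG
print(len(product))  # 34
```

Products from different primer groups would fail the overlap check here, which mirrors the statement that groups with different homologous recombination sequences cannot be spliced to each other.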
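(3) For the 12 spliced homologous recombination products, the concentration of each splicing product was measured after splicing. The volume to take from each product was calculated from its concentration (Table 6) as volume = total amount / concentration, and the products were mixed in equal amounts.
Table 6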
Figure PCTCN2019074977-appb-000008
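The pooling rule in step (3), volume = total amount / concentration, can be sketched as follows. Because Table 6 is only available as an image, the concentrations below are three values borrowed from Table 14 as stand-ins, and the 20 ng target amount per product is a hypothetical figure chosen for illustration.

```python
def pooling_volumes(concentration_ng_per_ul: dict[str, float], amount_ng: float) -> dict[str, float]:
    """Volume (uL) of each product that contains the same target mass (ng)."""
    volumes = {}
    for name, conc in concentration_ng_per_ul.items():
        if conc <= 0:
            raise ValueError(f"{name}: concentration must be positive")
        volumes[name] = amount_ng / conc
    return volumes


# Stand-in concentrations in ng/uL (splicing products 1, 2 and 7 in Table 14).
concentrations = {"product 1": 11.80, "product 2": 13.80, "product 7": 2.00}
TARGET_NG = 20.0  # hypothetical equal amount taken from every product

volumes = pooling_volumes(concentrations, TARGET_NG)
for name, vol in volumes.items():
    print(f"{name}: {vol:.2f} uL")
```

Low-concentration products such as product 7 need proportionally larger volumes, so the weakest product limits how much mass can be pooled from each one.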
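(4) Damage repair + end repair + adapter ligation + double-enzyme digestion + size selection
A. Damage repair reaction
Using the system shown in Table 7 below, react at 37°C for 60 min:
Table 7
Figure PCTCN2019074977-appb-000009
Figure PCTCN2019074977-appb-000010
B. End repair reaction
Using the system shown in Table 8 below, react at 25°C for 10 min:
Table 8
Figure PCTCN2019074977-appb-000011
C. Adapter ligation
Using the system shown in Table 9 below, react at 25°C for 12-16 hours:
Table 9
Figure PCTCN2019074977-appb-000012
D. Double-enzyme digestion
Using the system shown in Table 10 below, react at 37°C for 60 min:
Table 10
Reagent Volume (μL)
DNA 40
ExoIII 1
ExoVII 1
E. Size selection
Fragments are selected around the insert band. For example, the target size of two spliced full-length 16S rDNA amplicons is about 3 Kb, and the size selection range in the experiment was roughly 2-4 Kb.
(5) Preparation for loading: add primer + add sequencing polymerase.
A. Primer binding
Using the system shown in Table 11 below, react at 20°C for 60 min:
Table 11
Reagent Volume (μL)
6.1
10× Primer Buffer 1
Sample Volume 1.9
Diluted Sequencing Primer 1
B. Polymerase binding
Using the system shown in Table 12 below, react at 30°C for 60 min:
Table 12
Reagent Volume (μL)
dNTP 1.6
DTT 1.6
Binding Buffer V2 1.6
Sample Volume 9.2
Polymerase Diluted 1
C. Magnetic bead purification
Purify the product with 0.6× magnetic beads.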
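(6) Results:
According to the sequencing results, the fragment sizes before splicing are shown in Table 13 below:
Table 13
No. Sample number Fragment size (bp) Mass concentration (ng/μL)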
16S-1 WH1812009962 1642 25.24
16S-2 WH1812009963 1633 20.45
16S-3 WH1812009964 1733 17.41
16S-4 WH1812009965 1705 11.85
16S-5 WH1812009966 1685 20.56
16S-6 WH1812009967 1662 12.66
16S-7 WHYD18125226_A 1701 3.67
16S-8 WHYD18125227_A 1721 7.43
16S-9 WHYD18125229_A 1706 10.23
16S-10 WHYD18125230_A 1708 10.54
16S-11 WH1812004714 1603 73.88
16S-12 WH1812004715 1586 20.13
16S-13 WH1812004716 1550 59.80
16S-14 WH1812004717 1534 36.27
16S-15 WH1812004718 1499 35.90
16S-16 WH1812004719 1513 39.78
16S-17 WH1812004720 1515 36.17
16S-18 WHYD18066114_C 1605 48.93
16S-19 WHYD18066115_C 1540 37.65
16S-20 WHYD18066118_C 1512 23.62
16S-21 WHYD18066119_C 1527 13.93
16S-22 WHYD18066120_C 1677 34.28
16S-23 WHYD18066121_C 1626 53.74
16S-24 WHYD18066122_C 1716 19.11
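According to the sequencing results, the fragment sizes after splicing are shown in Table 14 below:
Table 14
No. Sample number Fragment size (bp) Mass concentration (ng/μL)
16S splicing product 1 WH1812009962 1528\3252 11.80
16S splicing product 2 WH1812009964 1520\3243 13.80
16S splicing product 3 WH1812009966 1709\3226 18.90
16S splicing product 4 WHYD18125226_A 1701\3201 19.70
16S splicing product 5 WHYD18125229_A 1667\3216 18.80
16S splicing product 6 WH1812004714 1600\3181 15.80
16S splicing product 7 WH1812004716 1507\3178 2.00
16S splicing product 8 WH1812004718 1544\3306 21.10
16S splicing product 9 WH1812004720 1543\3207 19.30
16S splicing product 10 WHYD18066115_C 1524\3112 26.90
16S splicing product 11 WHYD18066119_C 1517\3034 19.40
16S splicing product 12 WHYD18066121_C 1630\3240 23.50
The results show that the amplified full-length 16S rDNA samples are about 1500 bp, and that after splicing, fragments are distributed at both about 1540 bp and about 3080 bp, indicating that the splicing succeeded.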
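The conclusion drawn from Table 14 can be checked numerically. The sketch below reads each "a\b" entry as two fragment peaks, taking the first as unspliced amplicon and the second as the two-amplicon splicing product; that reading is an interpretation and is not stated explicitly in the table. It then tests whether the second peak falls in the 2-4 Kb size selection window mentioned in step (4) E.

```python
# (first peak, second peak) in bp for splicing products 1-12, copied from Table 14.
peaks_bp = [
    (1528, 3252), (1520, 3243), (1709, 3226), (1701, 3201),
    (1667, 3216), (1600, 3181), (1507, 3178), (1544, 3306),
    (1543, 3207), (1524, 3112), (1517, 3034), (1630, 3240),
]

SIZE_WINDOW_BP = (2000, 4000)  # 2-4 Kb size selection range from the example


def in_window(size_bp: int, window: tuple[int, int] = SIZE_WINDOW_BP) -> bool:
    """True if a fragment size falls inside the size selection window."""
    low, high = window
    return low <= size_bp <= high


all_spliced = all(in_window(second) for _, second in peaks_bp)
ratios = [second / first for first, second in peaks_bp]

print(all_spliced)  # True
print(f"second/first peak ratio: {min(ratios):.2f} to {max(ratios):.2f}")
```

A ratio close to 2 for every product is what one would expect if the larger peak consists of two joined amplicons.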
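The sequencing result statistics are as follows:
Sequencing results before splicing: data volume after removing reads with an RQ value below 0.8: 7.58 Gb (the acceptance threshold is greater than 5 Gb); polymerase read length after removing reads with an RQ value below 0.8: 19057 bp (the acceptance threshold is greater than 10 kb).
Sequencing results after splicing: data volume after removing reads with an RQ value below 0.8: 8.34 Gb (the acceptance threshold is greater than 5 Gb); polymerase read length after removing reads with an RQ value below 0.8: 13761 bp (the acceptance threshold is greater than 10 kb).
Here, RQ (read quality) is a prediction of the accuracy of the subreads from the same zero-mode waveguide (sequencing well). The RQ cutoff currently set on the Sequel sequencer is 0.8, and the sequencer automatically filters out data below this value. The data volume is the total amount of data produced by the whole chip after low-quality reads are filtered out, in Gb. The average polymerase read length is the mean length of all reads on the whole chip after low-quality reads are filtered out, in bp.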
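The two summary statistics defined above (data volume and average read length after RQ filtering) can be computed as in the following sketch. The `Read` class and the three toy reads are invented for illustration; only the 0.8 cutoff and the definitions of the two metrics come from the text.

```python
from dataclasses import dataclass


@dataclass
class Read:
    length_bp: int
    rq: float  # predicted read quality, between 0 and 1


def summarize(reads: list[Read], rq_cutoff: float = 0.8) -> tuple[float, float]:
    """Return (data volume in Gb, mean read length in bp) for reads with RQ >= cutoff."""
    kept = [r for r in reads if r.rq >= rq_cutoff]
    total_bp = sum(r.length_bp for r in kept)
    mean_length = total_bp / len(kept) if kept else 0.0
    return total_bp / 1e9, mean_length


# Toy reads, not real sequencing data; the third read is below the cutoff.
reads = [Read(20_000, 0.95), Read(12_000, 0.85), Read(30_000, 0.60)]
data_gb, mean_length_bp = summarize(reads)

print(mean_length_bp)  # 16000.0
print(data_gb >= 5.0)  # False: three toy reads are far below the 5 Gb threshold
```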
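With the present invention, the number of amplicon samples on each chip can be increased about 3-fold, which effectively saves sequencing cost (expected to fall from the current 1500 RMB/sample to 500 RMB/sample), increases the flexibility of pooling and effectively shortens the delivery time.
The present invention has been described above using specific examples, which serve only to aid understanding of the invention and are not intended to limit it. Those skilled in the art to which the invention belongs can make a number of simple deductions, variations or substitutions based on the idea of the invention.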

Claims (21)

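  1. A fusion primer for third-generation sequencing library construction, characterized in that the fusion primer comprises, in order from the 5' end to the 3' end, a homologous recombination sequence, a tag sequence and a specific amplification primer sequence, wherein the homologous recombination sequence is used for splicing the amplification products of the fusion primer by homologous recombination, the tag sequence is used to distinguish different amplification products, and the specific amplification primer sequence is used to bind to a target sequence for primer extension.
  2. The fusion primer according to claim 1, characterized in that the target sequence is 16S rDNA, and the specific amplification primer sequence is a sequence that specifically binds to the 16S rDNA.
  3. The fusion primer according to claim 1, characterized in that the length of the homologous recombination sequence is 5 to 25 bp.
  4. The fusion primer according to claim 3, characterized in that the length of the homologous recombination sequence is 16 bp.
  5. The fusion primer according to claim 1, characterized in that the fusion primer is selected from any one or more pairs of primers in SEQ ID NO: 1 to 48.
  6. A third-generation sequencing library construction method, characterized in that the method comprises: amplifying a target sequence with the fusion primer according to any one of claims 1 to 5; and then splicing the amplification products by homologous recombination through the homologous recombination sequence on the fusion primer, to obtain a splicing product comprising at least two amplification products.
  7. The method according to claim 6, characterized in that the target sequence is 16S rDNA.
  8. The method according to claim 7, characterized in that the target sequence is the full-length 16S rDNA sequence.
  9. The method according to claim 6, characterized in that every 2 to 20 of the amplification products are spliced together.
  10. The method according to claim 9, characterized in that every 2 to 4 of the amplification products are spliced together.
  11. The method according to claim 6, characterized in that the length of the amplification product is 1.5 Kb and the length of the splicing product is 3 to 6 Kb.
  12. The method according to claim 6, characterized in that the homologous recombination splicing uses NEBuilder homologous recombinase.
  13. The method according to claim 6, characterized in that the method further comprises performing a damage repair reaction, an end repair reaction and an adapter ligation reaction on the splicing product.
  14. The method according to claim 13, characterized in that the method further comprises enzymatic digestion of the product of the adapter ligation reaction to remove unligated adapters, and size selection to obtain a product of a predetermined size.
  15. A third-generation sequencing method, characterized in that the method comprises sequencing, on the instrument, a library obtained by the library construction method according to any one of claims 6 to 14.
  16. A third-generation sequencing library construction kit, characterized in that the kit comprises the fusion primer according to any one of claims 1 to 5.
  17. The kit according to claim 16, characterized in that the kit further comprises a homologous recombinase.
  18. The kit according to claim 17, characterized in that the homologous recombinase is NEBuilder homologous recombinase.
  19. The kit according to claim 16, characterized in that the kit further comprises reagents for the damage repair reaction, the end repair reaction and the adapter ligation reaction.
  20. The kit according to claim 16, characterized in that the kit further comprises digestion enzymes.
  21. The kit according to claim 20, characterized in that the digestion enzymes include ExoIII and ExoVII.
PCT/CN2019/074977 2019-02-13 2019-02-13 Fusion primer, library construction method, sequencing method and library construction kit for third-generation sequencing library construction WO2020164015A1 (zh)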

Priority Applications (2)

Application Number Priority Date Filing Date Title
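CN201980082604.6A CN113166756B (zh) 2019-02-13 2019-02-13 Fusion primer for third-generation sequencing library construction, library construction method, sequencing method and library construction kit
PCT/CN2019/074977 WO2020164015A1 (zh) 2019-02-13 2019-02-13 Fusion primer for third-generation sequencing library construction, library construction method, sequencing method and library construction kit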

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
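PCT/CN2019/074977 WO2020164015A1 (zh) 2019-02-13 2019-02-13 Fusion primer for third-generation sequencing library construction, library construction method, sequencing method and library construction kit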

Publications (1)

Publication Number Publication Date
WO2020164015A1 true WO2020164015A1 (zh) 2020-08-20

Family

ID=72044303

Family Applications (1)

Application Number Title Priority Date Filing Date
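PCT/CN2019/074977 WO2020164015A1 (zh) 2019-02-13 2019-02-13 Fusion primer for third-generation sequencing library construction, library construction method, sequencing method and library construction kit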

Country Status (2)

Country Link
CN (1) CN113166756B (zh)
WO (1) WO2020164015A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
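CN104293783A (zh) * 2014-09-30 2015-01-21 天津诺禾致源生物信息科技有限公司 Primers suitable for amplicon sequencing library construction, construction method, amplicon library and kit comprising same
CN106754870A (zh) * 2016-11-30 2017-05-31 武汉菲沙基因信息有限公司 Method for constructing a mixed full-length transcriptome library from multiple samples
CN107829146A (zh) * 2017-11-29 2018-03-23 广州赛哲生物科技股份有限公司 Primer set for constructing a 16S rRNA gene amplicon sequencing library, and construction method
CN108070643A (zh) * 2017-10-31 2018-05-25 南京格致基因生物科技有限公司 Method for constructing a single-molecule-level sequencing library of microbial 16S rDNA
CN109136222A (zh) * 2018-09-13 2019-01-04 武汉菲沙基因信息有限公司 Tagged adapters for constructing multi-sample pooled sequencing libraries on the PacBio sequencing platform, and use thereof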

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050069938A1 (en) * 2003-09-26 2005-03-31 Youxiang Wang Amplification of polynucleotides by rolling circle amplification
KR101406720B1 (ko) * 2012-06-19 2014-06-13 (주)지노첵 Method for designing fusion primers for next-generation sequencing, and method for genotyping a target gene using such fusion primers and next-generation sequencing

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SCHLOSS, P.D.: "Sequencing 16S rRNA gene fragments using the PacBio SMRT DNA sequencing system", PEERJ., 28 March 2016 (2016-03-28), XP055733544 *
WANG, B.: "Unveiling the complexity of the maize transcriptome by single-molecule long-read sequencing", NATURE COMMUNICATIONS, 24 June 2016 (2016-06-24), XP055733535 *

Also Published As

Publication number Publication date
CN113166756A (zh) 2021-07-23
CN113166756B (zh) 2023-10-13

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19915285

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19915285

Country of ref document: EP

Kind code of ref document: A1