WO2020228298A1 - Method for constructing a sequencing library and application thereof - Google Patents

Method for constructing a sequencing library and application thereof

Info

Publication number
WO2020228298A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequencing
sequenced
sequence
library
competent bacteria
Prior art date
Application number
PCT/CN2019/121334
Other languages
English (en)
French (fr)
Inventor
史泓杰
冯建龙
叶立
张利民
陈大飞
倪志伟
姜伟
张飞
陈豫
周祯祯
吴昕
Original Assignee
苏州金唯智生物科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州金唯智生物科技有限公司
Publication of WO2020228298A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C40: COMBINATORIAL TECHNOLOGY
    • C40B: COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B50/00: Methods of creating libraries, e.g. combinatorial synthesis
    • C40B50/06: Biochemical methods, e.g. using enzymes or whole viable microorganisms
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing

Definitions

  • This application belongs to the field of biotechnology and relates to a method for constructing a sequencing library and its application.
  • the synthesized fragments need to be introduced into the vector for sequencing verification.
  • first-generation Sanger sequencing is used in most cases.
  • the instrument used is the ABI 3730 sequencer, which can run 96 reactions at a time with a read length of about 700 bp per reaction, about 76200 bp in total.
  • the specific steps of Sanger sequencing are: transfer the vector containing the insert into competent bacteria, spread evenly on a petri dish and culture overnight; pick monoclonal colonies and culture them in 200 μL of medium for 2 hours; perform PCR amplification with specific primers; verify the resulting bands by Sanger sequencing; manually compare the sequencing results with the standard sequence; and select 100% correct clones for the next experiment.
  • using the Sanger method for sequencing verification requires a lot of manpower in the early stage for sample amplification and screening, which is costly and time-consuming.
  • CN 107760672 A discloses an industrialized gene synthesis method based on second-generation sequencing technology.
  • the method includes the following steps: splitting the sequence to be synthesized into multiple small fragments, and synthesizing the small fragments with head and tail synthesis primers; amplifying the synthesized small fragments with upstream and downstream amplification primers containing 20 random bases at the 5' end; mixing all the amplified small fragments, sequencing them with second-generation sequencing technology, and comparing the sequencing results to find a sequence that is exactly as expected, thereby identifying the random sequences at both ends of that sequence as the required sequences, and designing retrieval primers according to the required sequences; performing a first round of retrieval amplification on the amplified small fragments with the retrieval primers, then a second round of retrieval amplification with the head and tail synthesis primers, and assembling the fragments from the second round to obtain the sequence product.
  • this method needs to split the sample and design multiple sets of primers for PCR amplification; the process is cumbersome and time-consuming, and sequencing performance and accuracy are poor for special samples with high GC content, highly repetitive sequences or poly structures.
  • third-generation PacBio sequencing is based on single-molecule real-time sequencing and zero-mode waveguide technology, and can perform independent single-molecule sequencing for each library.
  • the sequencing process involves no PCR amplification and has no GC bias.
  • the sequencing depth reaches 20×, the sequencing accuracy reaches 99.99%, and the read length can exceed 100 kb.
  • the application of third-generation sequencing in gene synthesis is limited by the large number of samples and the high cost of library construction, so it has not really been put to use.
  • this application provides a pooled extraction and restriction digestion method and its application.
  • the method pools the bacterial cultures and then extracts the plasmids, adds tag sequences to the linearized fragments obtained by digesting the plasmids and performs third-generation sequencing, and finally processes the results with an automated demultiplexing and analysis program, achieving large-scale, low-cost third-generation sequencing of a large number of clones, and has been successfully applied to gene synthesis.
  • this application provides a method for constructing a sequencing library.
  • the method includes the following steps:
  • M and N are positive integers.
  • a pooled extraction and digestion step is used in library construction; while increasing the sample size, it omits the monoclonal screening process, reduces plasmid extraction work by 99% and shortens the bacterial culture time, which helps third-generation sequencing replace first-generation sequencing in gene synthesis.
  • the sample to be sequenced in step (1) includes synthetic gene fragments.
  • the length of the gene fragment is 500-10000bp, for example, it may be 500bp, 1000bp, 2000bp, 3000bp, 4000bp, 5000bp, 6000bp, 7000bp, 8000bp, 9000bp or 10000bp, preferably 4000-6000bp.
  • the library construction and sequencing processes do not involve PCR, the sample to be sequenced does not need to be split, and the sequencing results need no assembly and are processed directly by the analysis program, which achieves complete sequencing of long fragments and significantly reduces sequencing costs.
  • the culture in step (2) is performed in a 96-well plate.
  • the digestion in step (4) is performed with a restriction endonuclease.
  • suitable restriction sites and restriction endonucleases are selected according to the plasmid information to linearize the plasmid.
  • the restriction endonucleases include but are not limited to any one of EcoR I, BamH I, Hind II, Hind III, Alu I, BsuR I, Bal I, Hal III, Hpa I or Sma I.
  • before step (5), the method further includes a step of repairing the linearized plasmid.
  • the repair includes damage repair and/or end repair.
  • repairing the linearized plasmid yields complete double-stranded DNA, which benefits the subsequent third-generation sequencing.
  • the tag sequence in step (5) is ligated to both ends of the linearized plasmid by DNA ligase.
  • after step (5), the method further includes a step of recovering and purifying the library.
  • the recovery and purification includes recovering the DNA with magnetic beads and then digesting DNA without ligated tag sequences with nuclease.
  • this application provides a method for constructing a sequencing library.
  • the method includes the following steps:
  • M and N are positive integers.
  • this application provides a sequencing verification method based on third-generation sequencing, and the method includes the following steps:
  • the third-generation sequencing in step (3') includes PacBio single-molecule fluorescence sequencing and/or nanopore sequencing, preferably PacBio single-molecule fluorescence sequencing.
  • the result analysis in step (4') includes:
  • the tag sequence is used to determine the number of the pooled bacterial culture
  • the conserved sequence of the sample to be sequenced is used to determine the type of sample in the pooled bacterial culture.
  • an automated demultiplexing and analysis program splits the sequencing results according to the tag sequence and the conserved sequence of the sample to be sequenced, so that sequencing results are automatically matched to samples.
  • low-abundance CCS sequence refers to a CCS sequence with an abundance of less than 3.
  • the present application provides a gene synthesis method, which includes the step of sequencing and verifying the synthesized gene fragments using the method described in the second aspect.
  • this application provides a gene synthesis method, which includes the following steps:
  • this application provides the use in gene synthesis of the method for constructing a sequencing library described in the first aspect and/or the sequencing verification method based on third-generation sequencing described in the second aspect.
  • the bacterial cultures are pooled before plasmid extraction, tag sequences are added to the linearized fragments obtained by digesting the plasmids for third-generation sequencing, and the results are finally processed with an automated demultiplexing and analysis program; while increasing the sample size, this omits the monoclonal screening process, reduces plasmid extraction work by 99% and shortens the bacterial culture time, and it has been successfully applied to gene synthesis;
  • this application uses third-generation sequencing to verify synthetic genes by sequencing. At least 5,000 single clones can be sequenced in one run; at 5,000 bp per clone, this totals about 2.5×10^7 bp, and the per-base sequencing cost is only 4.7% of that of Sanger sequencing, achieving large-scale, low-cost third-generation sequencing of a large number of clones;
  • this application uses a pooled extraction and digestion step that involves no amplification, places no restriction on the gene sequence and requires no splitting of the sequence; the sequencing results need no assembly and can be processed by the analysis program, achieving full-length sequencing of the gene.
  • 600 genes were synthesized by the method of gene synthesis, numbered 1, 2, 3...600.
  • this example lists the reference sequences SEQ ID NO: 1 to 6 of 6 of the genes; see the sequence listing for details.
  • SMRTbell library construction: after DNA repair of the linearized plasmid, T4 DNA ligase is used to ligate the tag sequences shown in Table 2 to the double-stranded DNA. After the ligation reaction is complete, the 8 samples carrying different tag sequences are mixed and recovered once with 1.0× AMPure beads, and double-stranded DNA without ligated tags is then digested with nucleases III and VII to obtain a purified library.
  • the library purified in Example 1 was subjected to QC: Qubit was used to quantify the library concentration and an Agilent 2100 was used to check the size distribution of the library;
  • the analysis pipeline periodically checks the sequencing result directory to determine whether sequencing has finished and whether the data have been uploaded;
  • c) Use the provided index information and the tag sequence information of each clone to split the data. While splitting, remove low-abundance (less than 3) CCS sequences, assign the sequencing results to each sequencing sample, and count the data that cannot be split;
  • the reference sequence number corresponding to well F1 is LB3214-1
  • the clone number is L008133
  • the reference sequence is 662 bp long.
  • the specific sequence information is shown in SEQ ID NO:1.
  • the best PacBio read matches the reference sequence exactly (100%) with no mutation sites; its abundance is 108, which accounts for 95.575% of the total abundance under the tag sequence.
  • the highest-abundance read under the tag sequence matches the reference sequence exactly (100%) with no mutation sites and an abundance of 108; the best sequence accounts for 95.575% of the total abundance under the tag sequence, and the total abundance under the tag sequence is 113 reads.
  • compared with Example 2, the library purified in Example 1 was verified by first-generation sequencing.
  • compared with Example 2, the library purified in Example 1 was verified by second-generation sequencing.
  • this application adopts a pooled extraction and digestion step: tag sequences are added to the linearized fragments obtained by plasmid digestion for third-generation sequencing, and the results are finally processed with an automated demultiplexing and analysis program. While increasing the sample size, this omits the monoclonal screening process, reduces plasmid extraction work by 99%, shortens the bacterial culture time, lowers the sequencing cost and automatically matches sequencing results to samples, so that third-generation sequencing replaces first-generation sequencing in gene synthesis.

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Biochemistry (AREA)
  • Health & Medical Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Molecular Biology (AREA)
  • General Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biophysics (AREA)
  • Physics & Mathematics (AREA)
  • Biotechnology (AREA)
  • Immunology (AREA)
  • Medicinal Chemistry (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

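This application provides a method for constructing a sequencing library and its application. The method includes the following steps: (1) transforming M samples to be sequenced into competent bacteria separately, streaking and culturing overnight; (2) picking N colonies from the competent bacteria of each sample to be sequenced and culturing them individually; (3) pooling bacterial cultures containing different samples to be sequenced to form N pooled cultures, each containing the competent bacteria of the M samples to be sequenced; (4) extracting plasmids from each of the N pooled cultures and linearizing them by digestion; and (5) adding a different tag sequence to each of the N linearized plasmid pools and mixing them to obtain the sequencing library; where M and N are positive integers. By using a pooled extraction and digestion step, this application increases the sample size while omitting the monoclonal screening process and reducing plasmid extraction work by 99%, which helps third-generation sequencing replace first-generation sequencing in gene synthesis.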

Description

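Method for constructing a sequencing library and application thereof
Technical Field
This application belongs to the field of biotechnology and relates to a method for constructing a sequencing library and its application.
Background
In traditional industrial gene synthesis, synthesized fragments must be introduced into a vector for sequencing verification. At present, first-generation Sanger sequencing is used in most cases, on the ABI 3730 sequencer, which can run 96 reactions at a time with a read length of about 700 bp per reaction, about 76200 bp in total. The specific steps of Sanger sequencing are: transfer the vector containing the insert into competent bacteria, spread evenly on a petri dish and culture overnight; pick monoclonal colonies and culture them in 200 μL of medium for 2 hours; perform PCR amplification with specific primers, verify the resulting bands by Sanger sequencing, manually compare the sequencing results with the standard sequence, and select 100% correct clones for the next experiment. However, sequencing verification by the Sanger method requires a great deal of labor for sample amplification and screening in the early stage, which is costly and time-consuming.
CN 107760672 A discloses an industrial gene synthesis method based on second-generation sequencing technology. The method includes the following steps: splitting the sequence to be synthesized into multiple small fragments, and synthesizing the small fragments with head and tail synthesis primers; amplifying the synthesized small fragments with upstream and downstream amplification primers containing 20 random bases at the 5' end; mixing all the amplified small fragments, sequencing them with second-generation sequencing technology, and comparing the sequencing results to find a sequence that is exactly as expected, thereby identifying the random sequences at both ends of that sequence as the required sequences, and designing retrieval primers according to the required sequences; performing a first round of retrieval amplification on the amplified small fragments with the retrieval primers, then a second round of retrieval amplification with the head and tail synthesis primers, and assembling the fragments from the second round to obtain the sequence product. However, this method needs to split the sample and design multiple sets of primers for PCR amplification; the process is cumbersome and time-consuming, and sequencing performance and accuracy are poor for special samples with high GC content, highly repetitive sequences or poly structures.
Third-generation PacBio sequencing is based on single-molecule real-time sequencing and zero-mode waveguide technology and can sequence each library independently at the single-molecule level. Because the sequencing process involves no PCR amplification, it has no GC bias; the sequencing depth reaches 20×, the sequencing accuracy reaches 99.99%, and the read length can exceed 100 kb. A PacBio sequencing chip has 1 million zero-mode waveguides and read lengths of several thousand bases, requires no additional sequencing primer design, and can sequence tens of thousands of samples simultaneously; within the specified number of samples, the sequencing cost does not rise as the number of samples increases. However, the application of third-generation sequencing in gene synthesis is limited by the large number of samples and the high cost of library construction, so it has not really been put to use.
How to use third-generation sequencing in place of first- or second-generation sequencing in gene synthesis is a problem that urgently needs to be solved in this field.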
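Summary
In view of the shortcomings of the prior art, this application provides a pooled extraction and digestion method and its application. The method pools bacterial cultures and then extracts the plasmids, adds tag sequences to the linearized fragments obtained by digesting the plasmids and performs third-generation sequencing, and finally processes the results with an automated demultiplexing and analysis program. It achieves large-scale, low-cost third-generation sequencing of a large number of clones and has been successfully applied to gene synthesis.
To this end, this application adopts the following technical solutions:
In a first aspect, this application provides a method for constructing a sequencing library, the method including the following steps:
(1) transforming M samples to be sequenced into competent bacteria separately, streaking and culturing overnight;
(2) picking N colonies from the competent bacteria of each sample to be sequenced and culturing them individually;
(3) pooling bacterial cultures containing different samples to be sequenced to form N pooled cultures, each pooled culture containing the competent bacteria of the M samples to be sequenced;
(4) extracting plasmids from each of the N pooled cultures and linearizing them by digestion; and
(5) adding a different tag sequence to each of the N linearized plasmid pools and mixing them to obtain the sequencing library;
where M and N are positive integers.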
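The pooling scheme of steps (2) and (3) can be sketched as follows. This is a minimal illustration and not the applicant's software; the function name `build_pools` and the "sample-colony" labels are assumptions introduced here.

```python
def build_pools(m_samples, n_colonies):
    """Return N pools; pool j holds colony j of every one of the M samples.

    Hypothetical helper for illustration: samples are numbered 1..M,
    colonies 1..N, and each culture is labelled "sample-colony".
    """
    if m_samples < 1 or n_colonies < 1:
        raise ValueError("M and N must be positive integers")
    pools = []
    for colony in range(1, n_colonies + 1):
        # One culture per sample goes into this pool, so a single plasmid
        # extraction covers M clones.
        pool = [f"{sample}-{colony}" for sample in range(1, m_samples + 1)]
        pools.append(pool)
    return pools


pools = build_pools(m_samples=600, n_colonies=8)
print(len(pools), len(pools[0]))  # number of pools, clones per pool
```

With M = 600 and N = 8, the values used in Example 1, this gives 8 pools of 600 clones each, so 4800 clones would need 8 plasmid extractions rather than one per clone.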
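In this application, a pooled extraction and digestion step is used during library construction. While increasing the sample size, it omits the monoclonal screening process, reduces plasmid extraction work by 99% and shortens the bacterial culture time, which helps third-generation sequencing replace first-generation sequencing in gene synthesis.
Preferably, the samples to be sequenced in step (1) include synthesized gene fragments.
Preferably, the length of the gene fragments is 500-10000 bp, for example 500 bp, 1000 bp, 2000 bp, 3000 bp, 4000 bp, 5000 bp, 6000 bp, 7000 bp, 8000 bp, 9000 bp or 10000 bp, preferably 4000-6000 bp.
In this application, the library construction and sequencing processes do not involve PCR, the samples to be sequenced do not need to be split, and the sequencing results need no assembly and are processed directly by the analysis program. This achieves complete sequencing of long fragments and significantly reduces sequencing costs.
Preferably, the culture in step (2) is performed in a 96-well plate.
Preferably, the digestion in step (4) is performed with a restriction endonuclease.
In this application, suitable restriction sites and restriction endonucleases are selected according to the plasmid information to linearize the plasmid. The restriction endonucleases include but are not limited to any one of EcoR I, BamH I, Hind II, Hind III, Alu I, BsuR I, Bal I, Hal III, Hpa I or Sma I.
Preferably, the method further includes, before step (5), a step of repairing the linearized plasmid.
Preferably, the repair includes damage repair and/or end repair.
In this application, repairing the linearized plasmid yields complete double-stranded DNA, which benefits the subsequent third-generation sequencing.
Preferably, the tag sequence in step (5) is ligated to both ends of the linearized plasmid by DNA ligase.
Preferably, the method further includes, after step (5), a step of recovering and purifying the library.
Preferably, the recovery and purification includes recovery with magnetic beads followed by nuclease digestion of DNA without ligated tag sequences.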
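As a preferred technical solution, this application provides a method for constructing a sequencing library, the method including the following steps:
(1) transforming M synthesized gene fragments to be verified, 500-10000 bp in length, into competent bacteria separately, streaking and culturing overnight;
(2) picking N colonies from the competent bacteria of each sample to be sequenced and culturing them individually in the same column of a 96-well plate;
(3) pooling the bacterial cultures in the same row of the 96-well plates, which contain different samples to be sequenced, to form N pooled cultures, each pooled culture containing the competent bacteria of the M samples to be sequenced;
(4) extracting plasmids from each of the N pooled cultures, linearizing them by digestion with a restriction endonuclease, and repairing them to obtain complete double-stranded plasmid DNA; and
(5) using DNA ligase to add a different tag sequence to each of the N linearized plasmid pools, mixing them, recovering once with magnetic beads, and then digesting DNA without ligated tag sequences with nuclease to obtain the sequencing library;
where M and N are positive integers.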
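In a second aspect, this application provides a sequencing verification method based on third-generation sequencing, the method including the following steps:
(1') constructing a library by the method described in the first aspect;
(2') checking the concentration and size distribution of the constructed library;
(3') third-generation sequencing; and
(4') result analysis.
Preferably, the third-generation sequencing in step (3') includes PacBio single-molecule fluorescence sequencing and/or nanopore sequencing, preferably PacBio single-molecule fluorescence sequencing.
Preferably, the result analysis in step (4') includes:
splitting the sequencing results according to the tag sequences and the conserved sequences of the samples to be sequenced;
removing low-abundance CCS (Circular Consensus Sequence) sequences; and
comparing the sequencing results with the reference sequences.
In this application, the tag sequence is used to determine the number of the pooled culture, and the conserved sequence of the sample to be sequenced is used to determine the type of sample in the pooled culture. An automated demultiplexing and analysis program splits the sequencing results according to the tag sequences and the conserved sequences of the samples to be sequenced, so that sequencing results are automatically matched to samples.
In this application, the term "low-abundance CCS sequence" refers to a CCS sequence with an abundance of less than 3.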
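The tag-based split and the low-abundance filter described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the applicant's analysis program: the function names are hypothetical, the tag is assumed to sit at the start of each read, and only exact matches in one orientation are handled. The two tag sequences are taken from Table 2; their assignment to pools "A" and "B" is an assumption.

```python
from collections import Counter, defaultdict

MIN_ABUNDANCE = 3  # CCS sequences seen fewer than 3 times are discarded


def split_by_tag(ccs_reads, tags):
    """Group CCS reads by the tag found at the start of each read.

    `tags` maps a tag sequence to a pool label. Exact prefix matching is an
    assumption made for this sketch; reads without a known tag are counted
    under the key None.
    """
    by_pool = defaultdict(Counter)
    for read in ccs_reads:
        pool = None
        for tag, label in tags.items():
            if read.startswith(tag):
                pool = label
                read = read[len(tag):]  # strip the tag before counting
                break
        by_pool[pool][read] += 1
    return by_pool


def drop_low_abundance(counter, min_abundance=MIN_ABUNDANCE):
    """Keep only sequences whose abundance is at least `min_abundance`."""
    return {seq: n for seq, n in counter.items() if n >= min_abundance}


# Toy input: the read bodies are made up for illustration.
tags = {"CGTCTGACTACTCACG": "A", "CAACTGACTACTCACG": "B"}
reads = ["CGTCTGACTACTCACG" + "ACGT"] * 4 + ["CGTCTGACTACTCACG" + "ACGA", "TTTT"]
pools = split_by_tag(reads, tags)
print(drop_low_abundance(pools["A"]))  # sequences kept for pool A
print(sum(pools[None].values()))       # reads that could not be split
```

Counting the unassigned reads separately mirrors the requirement to report data that cannot be split.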
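In a third aspect, this application provides a gene synthesis method, which includes a step of verifying the synthesized gene fragments by sequencing using the method described in the second aspect.
As a preferred technical solution, this application provides a gene synthesis method, the method including the following steps:
(1) transforming M synthesized gene fragments to be verified, 500-10000 bp in length, into competent bacteria separately, streaking and culturing overnight;
(2) picking N colonies from the competent bacteria of each sample to be sequenced and culturing them individually in the same column of a 96-well plate;
(3) pooling the bacterial cultures in the same row of the 96-well plates, which contain different samples to be sequenced, to form N pooled cultures, each pooled culture containing the competent bacteria of the M samples to be sequenced;
(4) extracting plasmids from each of the N pooled cultures, linearizing them by digestion with a restriction endonuclease, and repairing them to obtain complete double-stranded plasmid DNA;
(5) using DNA ligase to add a different tag sequence to each of the N linearized plasmid pools, mixing them, recovering once with magnetic beads, and then digesting DNA without ligated tag sequences with nuclease to obtain the sequencing library;
(6) checking the concentration and size distribution of the constructed library;
(7) PacBio single-molecule fluorescence sequencing; and
(8) splitting the sequencing results according to the tag sequences and the conserved sequences of the gene fragments to be verified, removing low-abundance CCS sequences with an abundance of less than 3, and comparing the sequencing results with the reference sequences.
In a fourth aspect, this application provides the use in gene synthesis of the method for constructing a sequencing library described in the first aspect and/or the sequencing verification method based on third-generation sequencing described in the second aspect.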
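Compared with the prior art, this application has the following beneficial effects:
(1) this application pools the bacterial cultures before plasmid extraction, adds tag sequences to the linearized fragments obtained by digesting the plasmids for third-generation sequencing, and finally processes the results with an automated demultiplexing and analysis program; while increasing the sample size, it omits the monoclonal screening process, reduces plasmid extraction work by 99% and shortens the bacterial culture time, and it has been successfully applied to gene synthesis;
(2) this application uses third-generation sequencing to verify synthetic genes by sequencing; at least 5000 single clones can be sequenced in one run, which at 5000 bp per clone totals about 2.5×10^7 bp, and the per-base sequencing cost is only 4.7% of that of Sanger sequencing, achieving large-scale, low-cost third-generation sequencing of a large number of clones;
(3) this application uses a pooled extraction and digestion step that involves no amplification, places no restriction on the gene sequence and requires no splitting of the sequence; the sequencing results need no assembly and can be processed by the analysis program, achieving full-length sequencing of the gene.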
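Detailed Description
To further explain the technical means adopted by this application and their effects, this application is further described below with reference to the examples and drawings. It should be understood that the specific embodiments described here are only intended to explain this application and do not limit it.
Where specific techniques or conditions are not indicated in the examples, the techniques or conditions described in the literature in this field, or the product instructions, are followed. Reagents or instruments for which no manufacturer is indicated are conventional products available commercially through regular channels.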
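Example 1 Library construction
In this example, 600 genes were synthesized by gene synthesis and numbered 1, 2, 3 ... 600. By way of illustration, this example lists the reference sequences SEQ ID NO: 1-6 of 6 of the genes; see the sequence listing for details.
(1) The 600 genes to be verified were transformed into competent bacteria separately, plated and cultured overnight;
(2) The overnight plates were sorted, and 8 round, isolated, plump colonies were picked for each gene and placed in 50 96-well plates for culture in the arrangement shown in Table 1-1 and Table 1-2, giving 4800 single clones in total:
Table 1-1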
  1 2 3 4 5 6 7 8 9 10 11 12
A 1A 2A 3A 4A 5A 6A 7A 8A 9A 10A 11A 12A
B 1B 2B 3B 4B 5B 6B 7B 8B 9B 10B 11B 12B
C 1C 2C 3C 4C 5C 6C 7C 8C 9C 10C 11C 12C
D 1D 2D 3D 4D 5D 6D 7D 8D 9D 10D 11D 12D
E 1E 2E 3E 4E 5E 6E 7E 8E 9E 10E 11E 12E
F 1F 2F 3F 4F 5F 6F 7F 8F 9F 10F 11F 12F
G 1G 2G 3G 4G 5G 6G 7G 8G 9G 10G 11G 12G
H 1H 2H 3H 4H 5H 6H 7H 8H 9H 10H 11H 12H
Table 1-2
Figure PCTCN2019121334-appb-000001
Figure PCTCN2019121334-appb-000002
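...
(3) After culturing for a period of time, the cultures with the same row letter in the 50 96-well plates were pooled to give 8 pooled cultures, A, B ... H;
(4) The 8 pooled cultures were extracted with an AXYGEN plasmid extraction kit to give 8 plasmid pools, A, B ... H, and the plasmid pools were linearized by digestion with the restriction endonuclease Hind III to give 8 pools of linearized plasmids, A, B ... H;
(5) The DNA was quantified with a Qubit 3.0, and 150-200 ng of DNA from each sample was used for SMRTbell library construction: after DNA repair of the linearized plasmids, T4 DNA ligase was used to ligate the tag sequences shown in Table 2 to the double-stranded DNA; after the ligation reaction, the 8 samples carrying different tag sequences were mixed and recovered once with 1.0× AMPure beads, and double-stranded DNA without ligated tags was then digested with nucleases III and VII to give the purified library.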
Table 2
No. Sequence
SEQ ID NO:7 CGTCTGACTACTCACG
SEQ ID NO:8 CAACTGACTACTCACG
SEQ ID NO:9 CCCCTGACTACTCACG
SEQ ID NO:10 CGGCTGACTACTCACG
SEQ ID NO:11 CTTCTGACTACTCACG
SEQ ID NO:12 CATCTGACTACTCACG
SEQ ID NO:13 CCTCTGACTACTCACG
SEQ ID NO:14 CTCCTGACTACTCACG
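The plate layout of Table 1-1 and the row-wise pooling of step (3) can be sketched as follows. This is an illustrative sketch only: it assumes that genes fill the 50 plates in numerical order, 12 genes per plate with one gene per column, and the plate numbering and function names are hypothetical.

```python
ROWS = "ABCDEFGH"        # one row per colony picked from a gene (8 colonies)
COLUMNS_PER_PLATE = 12   # one column per gene on a 96-well plate


def well_for(gene, colony_row):
    """Return (plate, well) for a 1-based gene number and a colony row letter.

    Assumes genes fill plates in numerical order, 12 genes per plate, as in
    Table 1-1; the plate numbering itself is hypothetical.
    """
    if colony_row not in ROWS:
        raise ValueError("colony_row must be one of A-H")
    plate = (gene - 1) // COLUMNS_PER_PLATE + 1
    column = (gene - 1) % COLUMNS_PER_PLATE + 1
    return plate, f"{colony_row}{column}"


def pooled_wells(row, n_genes=600):
    """List every (plate, well) that goes into the pooled culture for one row."""
    return [well_for(gene, row) for gene in range(1, n_genes + 1)]


print(well_for(600, "H"))      # last gene, last colony
print(len(pooled_wells("A")))  # cultures combined into pool A
```

Under these assumptions, 600 genes occupy exactly 50 plates, and each of the 8 row pools combines one culture of every gene.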
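Example 2 PacBio Sequel sequencing and result analysis
(1) The library purified in Example 1 was subjected to QC: Qubit was used to quantify the library concentration and an Agilent 2100 was used to check the size distribution of the library;
(2) A loading concentration of 3 pM was chosen based on experience, the sequencing primer Pacbio Sequencing Primer v3 and the enzyme Sequel DNA Polymerase 2.1 were added, and PacBio Sequel sequencing was performed, taking about 13 hours;
(3) After sequencing finished, the automated analysis pipeline detected the completion signal, started the bioinformatics analysis and generated the analysis results. The specific steps were:
a) the pipeline periodically checks the sequencing result directory to determine whether sequencing has finished and whether the data have been fully uploaded;
b) once the sequencing data have been uploaded, the PacBio data quality correction program is started and, with the number of passes set to greater than 10, generates high-quality reads;
c) the data are split using the provided index information and the tag sequence information of each clone; while splitting, low-abundance (less than 3) CCS sequences are removed, the sequencing results are assigned to each sequencing sample, and the data that cannot be split are counted;
d) based on the conserved plasmid sequences at both ends of the synthesized sequence, the target synthesized sequence is extracted from the reads and aligned to the reference sequence using Minimap2;
e) the numbers of reads identical to the reference sequence and of mutant reads in the alignment results are counted; the statistics report the best sequence among the reads of the sample and the alignment result of the highest-abundance sequence;
f) for sequences with mutations, the BLAST alignment of the read against the reference sequence is provided to show the mutation information and assist sequence repair;
g) all analysis results are compiled into the GS run information sheet.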
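Steps d) and e) can be illustrated with a much-simplified sketch. It is not the pipeline described here: a plain string comparison stands in for the Minimap2 alignment, only reads in the forward orientation are handled, and the function names are hypothetical. The flank sequences are the two vector sequences quoted in this example; treating them as the left and right flanks in this order is an assumption.

```python
LEFT_FLANK = "gaattgacgcgtattgggat"
RIGHT_FLANK = "atcccaatggcgcgccgagc"


def extract_insert(read, left_flank=LEFT_FLANK, right_flank=RIGHT_FLANK):
    """Return the sequence between the two vector flanks, or None if absent."""
    read = read.upper()
    start = read.find(left_flank.upper())
    if start == -1:
        return None
    start += len(left_flank)
    end = read.find(right_flank.upper(), start)
    if end == -1:
        return None
    return read[start:end]


def summarize(reads_with_counts, reference):
    """Sum the abundance of reads whose insert equals the reference or not."""
    summary = {"match": 0, "mutant": 0, "no_insert": 0}
    for read, count in reads_with_counts.items():
        insert = extract_insert(read)
        if insert is None:
            summary["no_insert"] += count
        elif insert == reference.upper():
            summary["match"] += count
        else:
            summary["mutant"] += count
    return summary


# Toy reads with made-up inserts and abundances.
reads = {
    LEFT_FLANK + "ATGAAATAA" + RIGHT_FLANK: 5,
    LEFT_FLANK + "ATGACATAA" + RIGHT_FLANK: 1,
    "ACGT": 2,
}
print(summarize(reads, "ATGAAATAA"))
```

A real implementation would align rather than compare strings, so that the position and type of each mutation can be reported as in step f).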
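In the 96-well plate numbered 0227-Amp-1, well F1 corresponds to reference sequence number LB3214-1 and clone number L008133; the reference sequence is 662 bp long, and the specific sequence is shown in SEQ ID NO: 1.
Using the provided index information (TTTATTATTAGCATATAAAA), the tag sequence information of the clone (CGTCTGACTACTCACG, CGTCTGACTACTCACG) and the vector tag sequences gaattgacgcgtattgggat and atcccaatggcgcgccgagc, the best-aligned PacBio read matched the reference sequence exactly (100%), with no mutation sites and an abundance of 108; the best sequence accounted for 95.575% of the total abundance under that tag sequence.
The highest-abundance read under that tag sequence matched the reference sequence exactly (100%), with no mutation sites and an abundance of 108; the best sequence accounted for 95.575% of the total abundance under that tag sequence, and the total abundance under that tag sequence was 113 reads.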
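The reported share can be checked with a short calculation. The helper name below is hypothetical, and how the remaining 5 reads are distributed among other sequences is not given in the source, so a single sequence of abundance 5 is assumed.

```python
def best_sequence_share(abundances):
    """Return (best abundance, total abundance, share of the best in percent)."""
    total = sum(abundances)
    best = max(abundances)
    return best, total, round(100 * best / total, 3)


# 108 reads for the best sequence out of 113 in total; the split of the
# other 5 reads is an assumption.
print(best_sequence_share([108, 5]))  # (108, 113, 95.575)
```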
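Comparative Example 1
Compared with Example 2, the library purified in Example 1 was verified by first-generation sequencing.
Comparative Example 2
Compared with Example 2, the library purified in Example 1 was verified by second-generation sequencing.
The duration and cost (per 100 bases) of the sequencing verification methods of Example 2, Comparative Example 1 and Comparative Example 2 are shown in Table 3.
Table 3
No.                    Sequencing method             Time (h)  Cost per 100 bases (yuan)
Example 2              PacBio Sequel sequencing      13        0.139
Comparative Example 1  First-generation sequencing   2040      1.667
Comparative Example 2  Second-generation sequencing  40        8.403
It can be seen that, in gene synthesis, sequencing verification by third-generation sequencing not only shortens the sequencing time but also significantly reduces the sequencing cost compared with first- and second-generation sequencing.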
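The comparison in Table 3 can be expressed as ratios with a short sketch. The numbers are copied from Table 3; the dictionary keys and the function name are labels chosen here for illustration.

```python
# Values copied from Table 3: (time in hours, cost in yuan per 100 bases)
TABLE_3 = {
    "third-generation (PacBio Sequel)": (13, 0.139),
    "first-generation": (2040, 1.667),
    "second-generation": (40, 8.403),
}


def relative_to(baseline, table=TABLE_3):
    """Express each method's time and cost as a fraction of the baseline's."""
    base_time, base_cost = table[baseline]
    return {
        name: (time / base_time, cost / base_cost)
        for name, (time, cost) in table.items()
    }


for name, (time_ratio, cost_ratio) in relative_to("first-generation").items():
    print(f"{name}: {time_ratio:.3f}x time, {cost_ratio:.3f}x cost")
```

Changing the baseline argument to "second-generation" gives the corresponding comparison against second-generation sequencing.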
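In summary, this application adopts a pooled extraction and digestion step, adds tag sequences to the linearized fragments obtained by plasmid digestion for third-generation sequencing, and finally processes the results with an automated demultiplexing and analysis program. While increasing the sample size, it omits the monoclonal screening process, reduces plasmid extraction work by 99%, shortens the bacterial culture time, lowers the sequencing cost and automatically matches sequencing results to samples, so that third-generation sequencing replaces first-generation sequencing in gene synthesis.
The applicant declares that the above examples illustrate the detailed method of this application, but this application is not limited to that detailed method; that is, it does not mean that this application must rely on that detailed method to be implemented. Those skilled in the art should understand that any improvement of this application, equivalent replacement of the raw materials of the product of this application, addition of auxiliary components, selection of specific modes and the like all fall within the scope of protection and disclosure of this application.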

Claims (15)

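  1. A method for constructing a sequencing library, comprising the following steps:
    (1) transforming M samples to be sequenced into competent bacteria separately, streaking and culturing overnight;
    (2) picking N colonies from the competent bacteria of each sample to be sequenced and culturing them individually;
    (3) pooling bacterial cultures containing different samples to be sequenced to form N pooled cultures, each pooled culture containing the competent bacteria of the M samples to be sequenced;
    (4) extracting plasmids from each of the N pooled cultures and linearizing them by digestion; and
    (5) adding a different tag sequence to each of the N linearized plasmid pools and mixing them to obtain the sequencing library;
    wherein M and N are positive integers.
  2. The method according to claim 1, wherein the samples to be sequenced in step (1) include synthesized gene fragments.
  3. The method according to claim 2, wherein the length of the gene fragments is 500-10000 bp, preferably 4000-6000 bp.
  4. The method according to claim 1, wherein the culture in step (2) is performed in a 96-well plate.
  5. The method according to any one of claims 1-4, wherein the digestion in step (4) is performed with a restriction endonuclease.
  6. The method according to any one of claims 1-5, further comprising, before step (5), a step of repairing the linearized plasmid;
    preferably, the repair includes damage repair and/or end repair.
  7. The method according to any one of claims 1-6, wherein the tag sequence in step (5) is ligated to both ends of the linearized plasmid by DNA ligase.
  8. The method according to any one of claims 1-7, further comprising, after step (5), a step of recovering and purifying the library;
    preferably, the recovery and purification includes recovery with magnetic beads followed by nuclease digestion of DNA without ligated tag sequences.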
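  9. The method according to any one of claims 1-8, wherein the method comprises the following steps:
    (1) transforming M synthesized gene fragments to be verified, 500-10000 bp in length, into competent bacteria separately, streaking and culturing overnight;
    (2) picking N colonies from the competent bacteria of each sample to be sequenced and culturing them individually in the same column of a 96-well plate;
    (3) pooling the bacterial cultures in the same row of the 96-well plates, which contain different samples to be sequenced, to form N pooled cultures, each pooled culture containing the competent bacteria of the M samples to be sequenced;
    (4) extracting plasmids from each of the N pooled cultures, linearizing them by digestion with a restriction endonuclease, and repairing them to obtain complete double-stranded plasmid DNA; and
    (5) using DNA ligase to add a different tag sequence to each of the N linearized plasmid pools, mixing them, recovering once with magnetic beads, and then digesting DNA without ligated tag sequences with nuclease to obtain the sequencing library;
    wherein M and N are positive integers.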
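  10. A sequencing verification method based on third-generation sequencing, comprising the following steps:
    (1') constructing a library by the method according to any one of claims 1-9;
    (2') checking the concentration and size distribution of the constructed library;
    (3') third-generation sequencing; and
    (4') result analysis.
  11. The method according to claim 10, wherein the third-generation sequencing in step (3') includes PacBio single-molecule fluorescence sequencing and/or nanopore sequencing, preferably PacBio single-molecule fluorescence sequencing.
  12. The method according to claim 11, wherein the result analysis in step (4') includes:
    splitting the sequencing results according to the tag sequences and the conserved sequences of the samples to be sequenced;
    removing low-abundance CCS sequences; and
    comparing the sequencing results with the reference sequences.
  13. A gene synthesis method, comprising a step of verifying the synthesized gene fragments by sequencing using the method according to any one of claims 10-12.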
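  14. The method according to claim 13, wherein the method comprises the following steps:
    (1) transforming M synthesized gene fragments to be verified, 500-10000 bp in length, into competent bacteria separately, streaking and culturing overnight;
    (2) picking N colonies from the competent bacteria of each sample to be sequenced and culturing them individually in the same column of a 96-well plate;
    (3) pooling the bacterial cultures in the same row of the 96-well plates, which contain different samples to be sequenced, to form N pooled cultures, each pooled culture containing the competent bacteria of the M samples to be sequenced;
    (4) extracting plasmids from each of the N pooled cultures, linearizing them by digestion with a restriction endonuclease, and repairing them to obtain complete double-stranded plasmid DNA;
    (5) using DNA ligase to add a different tag sequence to each of the N linearized plasmid pools, mixing them, recovering once with magnetic beads, and then digesting DNA without ligated tag sequences with nuclease to obtain the sequencing library;
    (6) checking the concentration and size distribution of the constructed library;
    (7) PacBio single-molecule fluorescence sequencing; and
    (8) splitting the sequencing results according to the tag sequences and the conserved sequences of the gene fragments to be verified, removing low-abundance CCS sequences with an abundance of less than 3, and comparing the sequencing results with the reference sequences.
  15. Use in gene synthesis of the method for constructing a sequencing library according to any one of claims 1-9 or the sequencing verification method based on third-generation sequencing according to any one of claims 10-12.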
PCT/CN2019/121334 2019-05-13 2019-11-27 Method for constructing a sequencing library and application thereof WO2020228298A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910394932.1A CN111926393A (zh) 2019-05-13 2019-05-13 Method for constructing a sequencing library and application thereof
CN201910394932.1 2019-05-13

Publications (1)

Publication Number Publication Date
WO2020228298A1 (zh) 2020-11-19

Family

ID=73282900

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/121334 WO2020228298A1 (zh) 2019-05-13 2019-11-27 Method for constructing a sequencing library and application thereof

Country Status (2)

Country Link
CN (1) CN111926393A (zh)
WO (1) WO2020228298A1 (zh)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
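CN107760672A (zh) * 2016-08-17 2018-03-06 苏州泓迅生物科技股份有限公司 Industrial gene synthesis method based on second-generation sequencing technology
CN108866173A (zh) * 2017-05-16 2018-11-23 深圳华大基因科技服务有限公司 Method and device for verifying a standard sequence and application thereof
CN109056077A (zh) * 2018-09-13 2018-12-21 武汉菲沙基因信息有限公司 Method for constructing an amplicon mixed-sample sequencing library suitable for the PacBio sequencing platform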

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2398915B1 (en) * 2009-02-20 2016-08-24 Synthetic Genomics, Inc. Synthesis of sequence-verified nucleic acids
WO2016109981A1 (zh) * 2015-01-09 2016-07-14 深圳华大基因研究院 High-throughput detection method for DNA synthesis products
CN105671644A (zh) * 2016-02-26 2016-06-15 武汉冰港生物科技有限公司 Method for preparing a genomic mixed-sample sequencing library
CN107190001A (zh) * 2017-04-17 2017-09-22 武汉金开瑞生物工程有限公司 Gene synthesis method

Also Published As

Publication number Publication date
CN111926393A (zh) 2020-11-13

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19928835

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19928835

Country of ref document: EP

Kind code of ref document: A1