WO2022144003A1 - A multiplex PCR library construction method for high-throughput targeted sequencing - Google Patents

A multiplex PCR library construction method for high-throughput targeted sequencing

Info

Publication number
WO2022144003A1
Authority
WO
WIPO (PCT)
Prior art keywords
mocode
barcode
sequence
sequencing
adapter
Prior art date
Application number
PCT/CN2021/143948
Other languages
English (en)
French (fr)
Inventor
朱钧
若林良之
张用书
杨楚烜
白冰
Original Assignee
东科智生基因科技(北京)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 东科智生基因科技(北京)有限公司
Priority to CN202180088322.4A priority Critical patent/CN116888276A/zh
Priority to US18/270,492 priority patent/US20240076653A1/en
Publication of WO2022144003A1 publication Critical patent/WO2022144003A1/zh

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C12N15/1065 Preparation or screening of tagged libraries, e.g. tagged microorganisms by STM-mutagenesis, tagged polynucleotides, gene tags
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C12N15/1093 General methods of preparing gene libraries, not provided for in other subgroups
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids

Definitions

  • the present disclosure relates to the field of biomedicine, and more particularly, to a method for constructing a DNA library, in particular to a method for constructing a multiplex PCR library for high-throughput targeted sequencing.
  • the present disclosure relates to the technical field of library construction, in particular to a method for constructing a targeted high-throughput DNA library.
  • In the past decade, with the continuous advancement of next-generation sequencing technology, its applications in life science research have continued to expand. Nucleic acid preparation methods and sequencing library construction methods have also become more efficient.
  • High-throughput sequencing (HTS)
  • NGS: next-generation sequencing technology
  • the sequencing read length is short; typical read lengths are 2×300 bp or 2×150 bp.
  • the resulting short reads are very difficult to align and assemble when no reference genome is available or when the genome contains highly complex structural sequences.
  • the splicing and assembly of short sequences can be assisted by a large-span, large-fragment library (mate pair library).
  • mate pair library: a large-span, large-fragment library
  • structural variation of large fragments of chromosomes such as insertions, deletions, inversions, and translocations, can be detected.
  • High-throughput targeted sequencing is a very cost-effective and highly sensitive detection method, and its key step is the targeted enrichment of target genes.
  • the main methods to achieve targeted enrichment include hybridization capture and PCR-based methods.
  • In general, hybridization capture-based methods require streptavidin-coated magnetic beads, are expensive and complicated, and require more DNA sample.
  • PCR-based targeted enrichment using molecular barcode (Unique Molecular Identifier, UMI) technology has made great progress and can solve the original difficulty of removing PCR duplicates. However, errors in the UMIs are still difficult to eliminate and the operation steps are cumbersome. It is therefore necessary to provide an accurate, efficient and simple method for constructing a multiplex PCR targeted enrichment library.
  • molecular barcode: Unique Molecular Identifier (UMI)
  • PCR-based targeted enrichment library construction methods mainly include AmpliSeq (Thermo Fisher), SLIM Amplification, Relay PCR, etc. These methods all include a two-step PCR reaction: the first step is targeted amplification of the target fragment, and the second step is PCR enrichment after adapter ligation. However, they all use traditional TA ligation or blunt-end ligation, and the overall library construction process includes no control over non-specific amplification, so non-specific amplification products cannot be removed effectively. This problem is particularly prominent in targeted methylation sequencing: in bisulfite-treated DNA most cytosines are changed to thymines, which makes it easier for multiple primers to form primer-dimers or give non-specific amplification.
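The effect of bisulfite treatment on sequence complexity can be illustrated with a minimal in-silico sketch. It assumes the simplified model stated above (every unmethylated cytosine is read as thymine after treatment and amplification); the function name and the example sequence are hypothetical and are not taken from the disclosure.

```python
from typing import Iterable


def bisulfite_convert(seq: str, methylated: Iterable[int] = ()) -> str:
    """Return the sequence as read after bisulfite treatment and amplification.

    Simplified model (an assumption): every unmethylated C is read as T, while
    cytosines at the 0-based positions listed in `methylated` are protected.
    """
    protected = set(methylated)
    return "".join(
        "T" if base == "C" and i not in protected else base
        for i, base in enumerate(seq.upper())
    )


# Hypothetical template; only the C at index 1 is treated as methylated.
template = "ACGTCCGATCGGCTA"
converted = bisulfite_convert(template, methylated={1})
print(converted)
```

Because almost all cytosines disappear, converted templates and the primers designed against them are built from an effectively three-letter alphabet, which is one way to see why primer-dimers and off-target priming become more likely.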
  • the purpose of the present disclosure is to provide a multiplex PCR library construction method for high-throughput targeted sequencing.
  • the present disclosure relates to a method for constructing a multiplex PCR library for high-throughput targeted sequencing.
  • a multi-base MoCODE barcode is added to the specific amplification product, and the MoCODE barcode is used to ligate the amplification product efficiently to a sequencing adapter containing the MoCODE barcode decoding sequence in order to build the library.
  • the MoCODE barcode refers to the protruding single-stranded nucleotide sequence at each of the two sticky ends of the PCR product obtained after digesting the multiplex PCR products with a specific endonuclease.
  • the MoCODE barcode decoding sequence is the nucleotide sequence complementary to the MoCODE barcode.
  • the MoCODE barcode can be generated using one or more of modified nucleotides, nicking enzymes, endonucleases, chemical modifications, photolytic bases, etc.; preferably, the modified nucleotides include one or more of dUTP, dITP, and RNA bases.
  • the MoCODE barcodes may or may not be identical within the molecule.
  • the MoCODE barcode is a non-random specific barcode.
  • the length of the MoCODE barcode is 2-20nt.
  • the MoCODE barcode decoding sequence and the MoCODE barcode sequence are complementary sequences with a length of 2-20 nt.
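The relationship between a MoCODE barcode and its decoding sequence can be sketched as a reverse-complement computation with the 2-20 nt length limit applied. This is only an illustration: the function name and the example barcode are hypothetical, and both sequences are written 5' to 3'.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def decoding_sequence(barcode: str) -> str:
    """Return the sequence complementary to a MoCODE barcode (both 5'->3')."""
    barcode = barcode.upper()
    if not 2 <= len(barcode) <= 20:
        raise ValueError("barcode length must be 2-20 nt")
    if set(barcode) - set("ACGT"):
        raise ValueError("barcode may only contain A, C, G and T")
    # Complement each base, then reverse so the result also reads 5'->3'.
    return barcode.translate(_COMPLEMENT)[::-1]


print(decoding_sequence("AGGTCA"))  # hypothetical 6-nt barcode
```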
  • the sequencing adapter can be artificially designed and synthesized, or match the sequence of the target segment itself.
  • the sequencing adapter can be a single adapter or a bidirectional adapter.
  • each specific segment enrichment can be decoded by single-linker decoding, double-linker decoding or automatic circularization decoding.
  • the present disclosure also relates to a primer for multiplex PCR for high-throughput targeted sequencing, the primer comprising a MoCODE barcode generation sequence; preferably, the sequence of the primer comprises the sequences shown in Seq ID Nos: 1-22, 27-52, 53, 55, 57-104, 109, 111.
  • the present disclosure also relates to a sequencing adapter for multiplex PCR for high-throughput targeted sequencing
  • the sequencing adapter comprises a MoCODE barcode decoding sequence
  • the sequencing adapter further comprises the adapter sequence and the index of a sequencing platform
  • the sequencing adapter includes a high-throughput sequencing universal sequence, an index tag and the MoCODE barcode decoding sequence
  • the sequence of the sequencing adapter includes Seq ID Nos: 23-26, 54, 56, 105-108, 110, 112.
  • a multiplex PCR library construction method for high-throughput targeted sequencing of the present disclosure includes the following steps:
  • each primer participating in the multiple PCR reactions includes a specific MoCODE barcode generation sequence, preferably, the primers also include gene-specific sequences;
  • step 6) ligating the purified PCR product containing the MoCODE barcode obtained in step 5) to a sequencing adapter, the sequencing adapter containing the MoCODE barcode decoding sequence complementary to the MoCODE barcode;
  • step 7) purifying the ligation product obtained in step 6) with magnetic beads to complete the construction of a multiplex PCR library for high-throughput targeted sequencing.
  • the MoCODE barcode in step 4) can be generated using one or more of modified nucleotides, nicking enzymes, endonucleases, chemical modifications, photodegradable bases, etc.; preferably, the modified nucleotides include one or more of dUTP, dITP, and RNA bases. More preferably, the MoCODE barcode is generated by enzymatic digestion with a specific endonuclease.
  • a MoCODE barcode is generated at each of the 5' and 3' sticky ends, wherein the MoCODE barcodes of the 5' and 3' sticky ends may be the same or different.
  • the sequencing adapter in step 6) can be a single adapter, a bidirectional adapter or a circularization adapter.
  • the present disclosure has the following advantages:
  • the library construction process is more efficient: compared with other companies' PCR-based targeted enrichment library construction methods, hands-on time is reduced by 40-50% and overall library construction time is reduced by 30-40%.
  • FIG. 1 shows the library construction process of the disclosed method using different MoCODE barcodes.
  • FIG. 2 is a schematic diagram of the upstream and downstream primer structures of the disclosed multiplex PCR.
  • FIG. 3 is a schematic structural diagram of the upstream and downstream adapters of the present disclosure.
  • FIG. 4A is a schematic diagram of the double-stranded structure with non-identical MoCODE barcodes at both ends of the PCR product in Example 3 of the disclosure.
  • FIG. 4B is a schematic diagram of the double-stranded structure of the upstream adapter in Example 3 of the disclosure.
  • FIG. 4C is a schematic diagram of the double-stranded structure of the downstream adapter in Example 3 of the disclosure.
  • FIG. 5A is a schematic diagram of the double-stranded structure with identical MoCODE barcodes at both ends of the PCR product in Example 4 of the disclosure.
  • FIG. 5B is a schematic diagram of the double-stranded structure of the upstream adapter in Example 4 of the disclosure.
  • FIG. 5C is a schematic diagram of the double-stranded structure of the downstream adapter in Example 4 of the disclosure.
  • FIG. 6A is a schematic diagram of the primers used when the MoCODE barcode is generated from a MoCODE generation sequence contained in the amplification target segment itself.
  • FIG. 6B is a schematic diagram of a PCR-amplified target fragment that itself contains a MoCODE generation sequence, for the case where the MoCODE barcode is generated from that sequence.
  • FIG. 6C is a schematic diagram of the PCR product carrying the MoCODE barcode generated from the MoCODE generation sequence contained in the amplification target segment itself.
  • FIG. 8 is the result of agarose gel electrophoresis of the products ligated to sequencing adapters in Example 2 of the present disclosure.
  • sample including a sample or culture (eg, a microbial culture) comprising nucleic acid
  • the sample may include a sample of synthetic origin.
  • Biological samples include whole blood, serum, plasma, umbilical cord blood, chorionic villus, amniotic fluid, cerebrospinal fluid, spinal fluid, lavage fluid (e.g., bronchoalveolar, gastric, peritoneal, catheter, ear, arthroscopic lavage), biopsy samples, urine, feces, sputum, saliva, nasal mucus, prostatic fluid, semen, lymph, bile, tears, sweat, breast milk, breast fluid, embryonic and fetal cells.
  • the biological sample is blood, and more preferably plasma.
  • blood as used herein includes whole blood or any blood fraction, such as serum and plasma as conventionally defined.
  • Blood plasma refers to the whole blood fraction produced by centrifugation of anticoagulant-treated blood.
  • Blood serum refers to the watery portion of the fluid that remains after a blood sample has clotted.
  • Environmental samples include environmental materials such as surface materials, soil, water, and industrial samples, as well as samples obtained from food and dairy processing units, instruments, equipment, utensils, disposable and non-disposable items. These examples should not be construed as limiting the types of samples applicable to the present invention.
  • the terms "target", "target nucleic acid" and "gene of interest" refer to any molecule whose presence is to be detected or measured, or whose function, interaction or property is to be studied.
  • nucleic acid and “nucleic acid molecule” are used interchangeably throughout this disclosure.
  • the terms refer to oligonucleotides, oligomers, polynucleotides, deoxyribonucleic acid (DNA), genomic DNA, mitochondrial DNA (mtDNA), complementary DNA (cDNA), bacterial DNA, viral DNA, viral RNA, RNA, messenger RNA (mRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), siRNA, catalytic RNA, clones, plasmids, M13, P1, cosmids, bacterial artificial chromosomes (BAC), yeast artificial chromosomes (YAC), amplified nucleic acids, amplicons, PCR products and other types of amplified nucleic acids, RNA/DNA hybrids, and polyamide nucleic acids (PNA), all of which may be in single- or double-stranded form, and, unless otherwise limited, known analogs of natural nucleotides, and combinations thereof.
  • nucleotide refers to naturally occurring and modified/non-naturally occurring nucleotides, including nucleoside tri-, di- and monophosphates, as well as the monophosphate monomers present within polynucleic acids or oligonucleotides. Nucleotides can also be ribo; 2'-deoxy; 2',3'-deoxy; and numerous other nucleotide mimetics well known in the art.
  • Mimics include chain-terminating nucleotides, such as 3'-O-methyl, halogenated base or sugar substitutions; alternative sugar structures, including non-sugar, alkyl ring structures; alternative bases, including inosine; deaza-modified; chi and psi; linker-modified; mass label-modified; phosphodiester modifications or substitutions, including phosphorothioates, methylphosphonates, boranophosphates, amides, esters, ethers; and basic or complete internucleotide substitutions, including cleavable linkages, such as photocleavable nitrophenyl moieties.
  • amplification reaction refers to any in vitro means for amplifying copies of a target nucleic acid sequence.
  • Amplification refers to the step of subjecting a solution to conditions sufficient to allow amplification.
  • Components of an amplification reaction can include, but are not limited to, for example, primers, polynucleotide templates, polymerases, nucleotides, dNTPs, and the like.
  • the term “amplification” generally refers to an "exponential" increase in a target nucleic acid. However, “amplification” as used herein may also refer to a linear increase in the number of selected target nucleic acid sequences, but is different from a one-time, single primer extension step.
  • PCR: polymerase chain reaction
  • oligonucleotide refers to a linear oligomer of natural or modified nucleoside monomers linked by phosphodiester bonds or analogs thereof. Oligonucleotides include deoxyribonucleosides, ribonucleosides, anomeric forms thereof, peptide nucleic acids (PNA), and the like, which are capable of specifically binding a target nucleic acid. Typically, monomers are linked by phosphodiester bonds or analogs thereof to form oligonucleotides ranging in size from a few monomer units (eg, 3-4) to tens of monomer units (eg 40-60).
  • whenever an oligonucleotide is represented by a sequence of letters (such as "ATGCCTG"), it should be understood that, unless otherwise indicated, the nucleotides are in 5'-3' order from left to right, and "A" refers to deoxyadenosine, "C" refers to deoxycytidine, "G" refers to deoxyguanosine, "T" refers to deoxythymidine, and "U" refers to the ribonucleoside uridine.
  • oligonucleotides contain the four natural deoxynucleotides; however, they may also contain ribonucleosides or non-natural nucleotide analogs.
  • where an enzyme has specific oligonucleotide or polynucleotide substrate requirements for activity (e.g., single-stranded DNA, RNA/DNA duplexes, etc.), selecting an appropriate composition for the oligonucleotide or polynucleotide substrate is entirely within the knowledge of the ordinary skilled person.
  • oligonucleotide primer refers to a polynucleotide sequence that hybridizes to a sequence on a target nucleic acid template and facilitates detection by an oligonucleotide probe.
  • oligonucleotide primers serve as starting points for nucleic acid synthesis.
  • oligonucleotide primers can be used to create structures that can be cleaved by cleavage reagents.
  • Primers can be of various lengths, and are typically less than 50 nucleotides in length. The lengths and sequences of primers used in PCR can be designed based on principles known to those skilled in the art.
  • mismatched nucleotide or “mismatch” refers to a nucleotide that is not complementary to the target sequence at the one or more positions. Oligonucleotide probes can have at least one mismatch, but can also have 2, 3, 4, 5, 6, or 7 or more mismatched nucleotides.
  • specific binding refers to the recognition, contact and formation of a stable complex between two molecules, together with greatly reduced recognition, contact, or complex formation of that molecule with other molecules.
  • annealing refers to the formation of a stable complex between two molecules.
  • cleavage reagent refers to any tool, including but not limited to enzymes, capable of cleaving an oligonucleotide to produce fragments.
  • the cleavage reagent may be used only to cleave, degrade, or otherwise isolate the second portion of the oligonucleotide probe, or a fragment thereof.
  • the cleavage reagent can be an enzyme.
  • Cleavage reagents can be natural, synthetic, unmodified or modified.
  • the cleavage reagent is preferably an enzyme having both synthetic (or polymerization) activity and nuclease activity.
  • Such enzymes are typically nucleic acid amplification enzymes.
  • nucleic acid amplification enzymes are nucleic acid polymerases such as Thermus aquaticus (Taq) DNA polymerase or E. coli DNA polymerase I.
  • the enzymes may be naturally occurring, unmodified or modified.
  • nucleic acid polymerase refers to an enzyme that catalyzes the incorporation of nucleotides into nucleic acids.
  • exemplary nucleic acid polymerases include DNA polymerases, RNA polymerases, terminal transferases, reverse transcriptases, telomerases, and the like.
  • thermostable DNA polymerase refers to a DNA polymerase that is stable (ie, resistant to decomposition or denaturation) and retains sufficient catalytic activity when subjected to elevated temperatures for a selected period of time.
  • thermostable DNA polymerases retain sufficient activity to effect subsequent primer extension reactions when subjected to high temperatures for the time necessary to denature double-stranded nucleic acids.
  • the heating conditions necessary for nucleic acid denaturation are well known in the art and are exemplified in US Pat. Nos. 4,683,202 and 4,683,195.
  • Thermostable polymerases as used herein are generally suitable for use in temperature cycling reactions such as the polymerase chain reaction ("PCR").
  • examples of thermostable nucleic acid polymerases include Thermus aquaticus Taq DNA polymerase, Thermus sp. Z05 polymerase, Thermus flavus polymerase, Thermotoga maritima polymerases such as TMA-25 and TMA-30 polymerase, Tth DNA polymerase, etc.
  • modified polymerase refers to a polymerase in which at least one monomer differs from a reference sequence, such as a native or wild-type form of the polymerase or another modified form of the polymerase. Exemplary modifications include monomeric insertions, deletions and substitutions. Modified polymerases also include chimeric polymerases having identifiable component sequences (eg, structural or functional domains, etc.) derived from two or more parents. Also included in the definition of modified polymerase are those chemically modified polymerases that contain the reference sequence.
  • modified polymerases include G46E E678G CS5 DNA polymerase, G46E L329A E678G CS5 DNA polymerase, G46E L329A D640G S671F CS5 DNA polymerase, G46E L329A D640G S671F E678G CS5 DNA polymerase, G46E E678G CS6 DNA polymerase, ΔZ05 polymerase, ΔZ05-Gold polymerase, ΔZ05R polymerase, E615G Taq DNA polymerase, E678G TMA-25 polymerase, E678G TMA-30 polymerase, etc.
  • "5' to 3' nuclease activity" or "5'-3' nuclease activity" refers to the activity of a nucleic acid polymerase, typically associated with nucleic acid strand synthesis, whereby nucleotides are removed from the 5' end of a nucleic acid strand; for example, E. coli DNA polymerase I has this activity, but the Klenow fragment does not.
  • Some enzymes with 5' to 3' nuclease activity are 5' to 3' exonucleases. Examples of such 5' to 3' exonucleases include: exonuclease from B. subtilis, phosphodiesterase from spleen, lambda exonuclease, exonuclease II from yeast, exonuclease V from yeast and exonuclease from Neurospora crassa.
  • "MoCODE barcode", "Molecular Code" and "specific molecular barcode" as used in the present disclosure refer to the overhanging single-stranded sequences that constitute the two sticky ends of the PCR product obtained after digestion of the multiplex PCR products with specific endonucleases.
  • MoCODE barcode decoding sequence or “molecular barcode decoding sequence” used in the present disclosure refers to the nucleotide sequence complementary to the “MoCODE barcode", “Molecular Code” and “specific molecular barcode”.
  • the MoCODE barcodes of each pair of amplification primers can be different or the same.
  • MoCODE barcodes can be from 2nt-20nt in length or longer.
  • the matching ligation between the MoCODE barcode and the adapter is a sticky-end ligation. Compared with the TA ligation or blunt-end ligation currently used for library construction, this improves the ligation efficiency and the final detection sensitivity.
  • Gene-specific amplification, universal amplification, and the introduction of MoCODE barcodes can be carried out in the same PCR reaction, which shortens the operation steps and hands-on time, avoids cross-contamination during library construction, reduces costs, and improves clinical practicability.
  • MoCODE barcodes can be used with UMI to further improve the mutation detection accuracy of targeted sequencing through error correction.
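The disclosure does not describe how the UMI error correction is performed. One common approach, shown here only as a hedged sketch and not as the disclosed method, groups reads by UMI and takes a per-position majority vote. The function name and the read data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def umi_consensus(reads: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Collapse (umi, sequence) pairs into one majority-vote consensus per UMI."""
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)

    consensus = {}
    for umi, seqs in groups.items():
        # Only compare positions that every read in the group covers.
        length = min(len(s) for s in seqs)
        consensus[umi] = "".join(
            Counter(s[i] for s in seqs).most_common(1)[0][0]
            for i in range(length)
        )
    return consensus


# Hypothetical reads: the third read of UMI "AAT" carries a single error.
reads = [("AAT", "ACGT"), ("AAT", "ACGT"), ("AAT", "ACTT"), ("GGC", "TTAA")]
print(umi_consensus(reads))
```

An error that appears in only a minority of the reads sharing a UMI is voted out, whereas a true variant is present in all of them.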
  • a method for constructing a multiplex PCR library for high-throughput targeted sequencing of the present disclosure includes adding MoCODE barcodes to specific amplification products, and using matching sequencing adapters containing MoCODE barcode decoding sequences for efficient ligation and library construction.
  • the sample source of the specific amplification product includes, but is not limited to, genomic DNA, cell-free DNA, cDNA generated by reverse transcription of an RNA sample, and the like.
  • the template DNA of the multiplex PCR reaction can be DNA, bisulfite-converted DNA, cDNA, and the like.
  • the extraction method of the template DNA of the multiplex PCR reaction may be column extraction, magnetic bead method, phenol-chloroform extraction-ethanol or isopropanol precipitation, and the like.
  • the primers participating in the multiplex PCR reaction comprise a specific MoCODE barcode generating sequence, preferably, the primers also comprise gene-specific sequences;
  • the MoCODE barcode generation methods include modified nucleotides (dUTP, dITP, RNA bases), nicking enzymes, endonucleases, chemical modifications, photolytic bases, etc. The purpose is to create a recognizable cleavage site at the end of the PCR product and then cut out the sticky end containing the MoCODE barcode.
  • the MoCODE barcode is generated as follows: in the primers of the multiplex PCR reaction, in addition to a gene-specific sequence, the 5' end may also include the recognition site of a specific endonuclease that is common among the primers; the purified PCR product is then digested with one or two specific endonucleases.
  • the enzymatically digested PCR product will contain two sticky ends.
  • the protruding single-stranded sequence of each sticky end forms a specific molecular barcode, namely the Molecular CODE (MoCODE) barcode.
  • the primer sequence comprises the sequence shown in Seq ID No: 1-22, 27-52, 53, 55, 57-104, 109, 111, wherein n represents the nucleotide dITP or dUTP.
  • the MoCODE barcode is generated as follows: in addition to a gene-specific sequence, each primer of the multiplex PCR reaction also includes a dITP site; after this site is recognized and digested by a specific enzyme, a sticky end of 6 bases is formed, that is, the MoCODE barcode sequence is generated.
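How a dITP site near the 5' end of a primer could yield a 6-base sticky end can be sketched as follows. The cleavage rule is an assumption, not something stated in the text: the enzyme is taken to nick the primer strand at the second phosphodiester bond 3' of the inosine (the behaviour commonly described for endonuclease V), after which the short 5' fragment is assumed to dissociate and leave the opposite strand single-stranded over that length. The primer sequence and function name are hypothetical.

```python
from typing import Tuple


def nick_at_inosine(primer: str, bonds_after_inosine: int = 2) -> Tuple[str, str]:
    """Split a primer strand at a nick placed 3' of its single inosine ('I').

    bonds_after_inosine=2 is an assumed cleavage rule: the nick falls on the
    second phosphodiester bond 3' of the inosine.
    """
    seq = primer.upper()
    if seq.count("I") != 1:
        raise ValueError("expected exactly one inosine ('I') in the primer")
    cut = seq.index("I") + bonds_after_inosine
    if cut >= len(seq):
        raise ValueError("nick position falls outside the primer")
    # seq[:cut] is the short 5' fragment released by the nick;
    # seq[cut:] stays base-paired in the PCR product.
    return seq[:cut], seq[cut:]


# Hypothetical primer with inosine at position 5 (1-based).
released, retained = nick_at_inosine("GTCAIGACCTGAAGTCCATG")
print(released, retained, len(released))
```

Under these assumptions an inosine at position 5 gives a released fragment, and therefore an overhang, of 6 nucleotides.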
  • the MoCODE barcodes within a molecule may or may not be identical: "identical" means that the MoCODE barcodes at both ends of the same PCR product molecule are formed by recognition and cleavage by a single endonuclease, and "different" means that the MoCODE barcodes at both ends of the same PCR product molecule are formed by recognition and cleavage by two different endonucleases.
  • a MoCODE barcode is contained within the same nucleotide molecule, eg, the same MoCODE barcode generated at the 5' and 3' sticky ends of a PCR product molecule.
  • two MoCODE barcodes are contained within the same nucleotide molecule, eg, the MoCODE barcodes generated at the 5' and 3' sticky ends of a PCR product molecule are different.
  • the MoCODE barcode is a non-random specific barcode.
  • the MoCODE barcode is 2-20 nt in length.
  • the MoCODE barcode sequence comprises the sequences shown in Seq ID Nos: 53, 59, 109, 111.
  • the MoCODE barcode decoding sequence and the MoCODE barcode sequence are complementary sequences, with a length of 2-20 nt.
  • the MoCODE barcode decoding sequence comprises the sequences shown in Seq ID Nos: 54, 56, 110, 112.
  • the sequencing adapter comprising the MoCODE barcode decoding sequence may be artificially designed and synthesized, or may match the sequence of the target segment itself.
  • the sequencing adapter comprising the MoCODE barcode decoding sequence can be matched with the sequence of the target segment itself.
  • if the amplified target segment itself contains a MoCODE generating sequence that will be used to generate the MoCODE barcode at the 5' end, the 5' PCR primer does not need to carry the MoCODE generating sequence; likewise, if the amplified target segment itself contains a MoCODE generating sequence that will be used to generate the MoCODE barcode at the 3' end, the 3' PCR primer does not need to carry the MoCODE generating sequence (Figure 6A).
  • the sequencing adapters comprise sequences shown in Seq ID Nos: 23-26, 105-108, wherein "nnnnnnn", [i5] or [i7] represent an index tag, such as an 8nt Illumina Index tag sequence.
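Filling the index placeholder of an adapter template can be sketched as a simple substitution. The template below is a made-up sequence, not one of the listed Seq ID Nos, and the helper name is hypothetical. The placeholder is assumed to be either a run of lowercase "n" or the literal "[i5]" / "[i7]".

```python
import re

# A run of lowercase n, or the literal [i5] / [i7] placeholder.
_PLACEHOLDER = re.compile(r"n{2,}|\[i5\]|\[i7\]")


def fill_index(template: str, index: str) -> str:
    """Replace the single index placeholder in an adapter template."""
    matches = list(_PLACEHOLDER.finditer(template))
    if len(matches) != 1:
        raise ValueError("template must contain exactly one index placeholder")
    match = matches[0]
    index = index.upper()
    if set(index) - set("ACGT"):
        raise ValueError("index may only contain A, C, G and T")
    # An n-run fixes the index length; [i5]/[i7] accept any length.
    if match.group().startswith("n") and len(match.group()) != len(index):
        raise ValueError("index length does not match the n-run placeholder")
    return template[:match.start()] + index + template[match.end():]


print(fill_index("ACGTnnnnnnnnTTGCA", "ATCACGTT"))  # hypothetical 8-nt index
```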
  • the 5' end for sticky linking can be phosphorylated as is known in the art.
  • the "n” or “I” at position 5 in the primer sequence Seq ID No: 57-104 is "dITP”.
  • the PCR-amplified fragments of interest may contain one or two self-MoCODE generating sequences within themselves (FIG. 6B).
  • the own MoCODE generating sequence can be used to generate MoCODE barcodes on one or both ends of the DNA molecule.
  • the corresponding MoCODE barcodes can be generated at one or both ends of the PCR product through endonuclease digestion corresponding to the self-contained MoCODE generating sequence (Fig. 6C).
  • the sequencing adapters comprising MoCODE barcode decoding sequences can be single adapters, bidirectional adapters, and each specific segment enrichment can be decoded by single adapter decoding, double adapter decoding, or automatic circularization decoding.
  • the "single adapter" is used when the MoCODE barcodes at both ends of the PCR product are "identical"; the "bidirectional adapter" is used when the barcodes at both ends of the PCR product are "different". When different adapters are used, a non-specific product carries the same adapter on both sides and therefore cannot form a correct sequencing product, so it is eliminated in the sequencing process.
  • the "circularization” can use a variety of different MoCODE barcodes, and the structure is MoCODE + common sequence bound by sequencing primer + gene specific sequence.
  • the circularization decoding steps are: PCR, digestion, circularization, exonuclease digestion, and add-on PCR (adding a complete sequencing primer binding site + library index + sequencing adapter); this approach can be used with multiple amplicons.
  • the sequencing adapter comprising a MoCODE barcode decoding sequence comprises an upstream sequencing adapter and a downstream sequencing adapter; the upstream sequencing adapter comprises a MoCODE barcode decoding sequence complementary to the MoCODE barcode at the 5' end of the digested PCR product, and the downstream sequencing adapter comprises a MoCODE barcode decoding sequence complementary to the MoCODE barcode at the 3' end of the digested PCR product.
  • the upstream sequencing adapter and the downstream sequencing adapter each further comprise an adapter upper strand and an adapter lower strand; the adapter upper strand is the sense strand, and the adapter lower strand is the antisense strand.
  • the MoCODE barcode decoding sequence may be located at the 3' end of the upper strand or the 5' end of the lower strand of the upstream sequencing adapter, or at the 5' end of the upper strand or the 3' end of the lower strand of the downstream sequencing adapter (Figure 3).
  • multiplex amplification of 2-1000 target segments can be achieved, and each target segment can have its own specific barcode, or multiple target segments can share the same barcode.
  • the MoCODE barcodes are non-random specific barcodes that can also be used for concatemerization of multiple target segments.
  • the DNA polymerase used in the multiplex PCR can be Taq polymerase, PFx, KOD, Pfu, Q5, Bst, Phusion and other commercialized enzymes.
  • the ligase used in the multiplex PCR can be T4 DNA ligase, 9°N™ DNA ligase, Taq DNA ligase, Tth DNA ligase, Tfi DNA ligase, Ampligase®, and the like.
  • excess sequencing adapters can be removed by the magnetic bead method, column purification, ethanol precipitation, recovery from agarose or polyacrylamide gel, and the like.
  • the constructed library is suitable for high-throughput sequencing platforms such as Illumina, Roche, ThermoFisher, Pacific Biosciences, BGI, Oxford Nanopore Technologies, Huayinkang, and Hanhai Gene.
  • the method for constructing a multiplex PCR library for high-throughput targeted sequencing includes the following steps (an exemplary library construction process is shown in Figure 1 ):
  • Step 1 Prepare the sample to be tested and extract DNA; for methylation sequencing library construction, bisulfite conversion is then required;
  • Step 2 Using the DNA sample from Step 1 as a template, perform a multiplex PCR reaction with a high-fidelity PCR enzyme and multiple primer pairs (Figure 2). In addition to a gene-specific sequence, each primer pair contains at its 5' end a specific molecular barcode generation sequence that is common among primers;
  • Step 3 Purify the PCR product of Step 2 with magnetic beads;
  • Step 4 Digest the purified product of Step 3 with a specific endonuclease. The 3' and 5' ends of correctly amplified multiplex PCR products should contain a specific barcode generation site which, when digested with the specific endonuclease, forms sticky ends, i.e. generates the MoCODE barcode sequences that mediate the ligation in Step 6. Barcodes can be generated in many ways, including modified nucleotides (dUTP, dITP, RNA bases), nickases, endonucleases, chemical modification, photocleavable bases, etc.;
  • Step 5 Purify the enzymatic digestion product of Step 4 with magnetic beads;
  • Step 6 Using a ligase that can catalyze ligation between sticky ends, attach the upstream sequencing adapter and the downstream sequencing adapter to the purified digestion product obtained in Step 5. The upstream sequencing adapter contains a high-throughput sequencing universal sequence (which can include an index tag sequence) and a MoCODE barcode decoding sequence complementary to the MoCODE at the 5' end of the digested PCR product obtained in Step 4. The downstream sequencing adapter contains a high-throughput sequencing universal sequence (including an index tag sequence) and a MoCODE barcode decoding sequence complementary to the MoCODE at the 3' end of the digested PCR product obtained in Step 4 (Figure 3);
  • Step 7 Purify the ligation product of Step 6 with magnetic beads to complete construction of the sequencing library.
  • Example 1 Targeted methylation multiplex PCR enrichment using MoCODE to eliminate non-specific PCR products
  • each pair of BSP primers in the experimental group contained, in addition to the gene-specific sequence, a specific molecular (MoCODE) barcode generation sequence at its 5' end that is common among primers; in the control group, each pair of BSP primers contained only gene-specific sequences and no specific molecular (MoCODE) barcode generation sequence at its 5' end.
  • the two MoCODE barcode sequences were generated by digesting the PCR product with two restriction enzymes. The enrichment achieved in the two groups was then assessed by agarose gel electrophoresis.
  • HeLa cell genomic DNA (NEB, USA) was bisulfite-converted with the EZ DNA Methylation-Gold Kit (ZYMO, USA).
  • Step 1 94°C, 2 minutes.
  • Second step 6 cycles (98°C, 10 seconds; 59°C, 5 seconds; 68°C, 5 seconds).
  • Step 3 35 cycles (98°C, 10 seconds; 68°C, 10 seconds).
  • Step 4 68°C, 1 minute.
  • Step 5 Keep at 8°C.
  • the reaction mixture was purified using HiPrep PCR magnetic beads (1.2x) and eluted in 15 μl of water.
  • the universal specific molecular barcode generation sequences of the upstream and downstream primers are Seq ID Nos: 1 and 12, respectively; the upstream primer sequences of Moko1-10 are Seq ID Nos: 2-11, and the downstream primer sequences of Moko1-10 are Seq ID Nos: 13-22.
  • sequencing adapter ligation was performed on the restriction-endonuclease-treated and purified PCR products of the experimental group in Example 1. The result of adapter ligation was then assessed by agarose gel electrophoresis.
  • Annealing program: 82°C, 2 minutes; 570 x {82°C, 3 seconds, -0.1°C/cycle}; 4°C hold.
  • Ligation reaction (component, volume): 10x T4 DNA Ligase Buffer (NEB), 2 μl; purified digested PCR product, 15 μl; upstream adapter (10 μM), 1 μl; downstream adapter (10 μM), 1 μl; T4 DNA ligase (NEB, 200 U/μl), 1 μl; total volume, 20 μl.
  • the ligation mixture was purified using HiPrep PCR magnetic beads (1x) and eluted in 10 μl of water.
  • the concentration of the 1:10,000 dilution was determined with the Kapa library quantification kit.
  • the concentration of the library was adjusted to 4 nM with water.
  • raw .fastq files from Illumina paired-end sequencing were assembled into complete sequenced segments with the PEAR software. Each assembled read was compared with the target segment sequences; a read of the expected length produced by a correctly paired primer set was counted as on-target, and the on-target rate is the proportion of on-target reads among the total reads.
  • the upstream primer sequences of Moko11-23 are Seq ID No: 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51
  • the downstream primer sequences of Moko11-23 are Seq ID No: 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52.
  • MoCODE barcode sequence (5'>3') and MoCODE barcode decoding sequence (5'>3'): upstream adapter, TGTA (Seq ID No: 53) and TACA (Seq ID No: 54); downstream adapter, GAT (Seq ID No: 55) and ATC (Seq ID No: 56).
  • TCT/LCT Thin-Cytologic Test/Liquid-based cytologic test
  • the first step 94°C, 2 minutes;
  • Step 2 6 cycles (98°C, 10 seconds; 59°C, 5 seconds; 68°C, 5 seconds);
  • the third step 35 cycles (98°C, 10 seconds; 64°C, 5 seconds; 68°C, 5 seconds);
  • the fourth step 68 °C, 1 minute
  • Step 5 Keep at 8°C.
  • the reaction mixture was purified using AMPure XP magnetic beads (1.5x) and eluted in 13 μl of water.
  • Annealing program: 82°C, 2 minutes; 570 x {82°C, 3 seconds, -0.1°C/cycle}; 4°C hold.
  • the ligation mixture was purified using AMPure XP magnetic beads (1.2x) and eluted in 10 μl of water.
  • raw .fastq files from Illumina paired-end sequencing were assembled into complete sequenced segments with the PEAR software. Each assembled read was compared with the target segment sequences; a read of the expected length produced by a correctly paired primer set was counted as on-target, and the on-target rate is the proportion of on-target reads among the total reads.
  • Sample 1: total reads 1225399, on-target rate 98.0%; Sample 2: total reads 1143004, on-target rate 98.2%.
  • the underlined sequence fragment is the specific target gene sequence
  • MoCODE barcode sequence (5'>3') and MoCODE barcode decoding sequence (5'>3'): upstream adapter, CACAT (Seq ID No: 109) and ATGTG (Seq ID No: 110); downstream adapter, CGGAA (Seq ID No: 111) and TTCCG (Seq ID No: 112).

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Chemical & Material Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Organic Chemistry (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Biotechnology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • Microbiology (AREA)
  • Physics & Mathematics (AREA)
  • Molecular Biology (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Plant Pathology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Immunology (AREA)
  • Analytical Chemistry (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

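A method for constructing a multiplex PCR library for high-throughput targeted sequencing. Targeted DNA products are first obtained through a highly specific multiplex PCR reaction and are then digested with a specific endonuclease, so that specific molecular barcodes are generated at the ends of the PCR products. This makes library construction more efficient and safeguards both the accuracy of the resulting data and the sequencing depth.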

Description

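A method for constructing a multiplex PCR library for high-throughput targeted sequencing
Technical Field
The present disclosure relates to the field of biomedicine, and more specifically to a method for constructing a DNA library, in particular a method for constructing a multiplex PCR library for high-throughput targeted sequencing.
Background
The present disclosure relates to the technical field of library construction, and specifically to a method for constructing a targeted high-throughput DNA library. Over the past decade, as next-generation sequencing technology has continued to advance, its applications in life science research have kept expanding. Methods for preparing different nucleic acids and for constructing sequencing libraries have also become more efficient.
High-throughput sequencing, also known as next-generation sequencing (NGS), performs massively parallel sequencing on high-density biochips and is characterized by high data output and low cost per unit of data. Its drawback is short read length, typically 2x300 bp or 2x150 bp. When such short reads must be aligned and assembled without a reference genome, or come from a genome containing highly complex structural sequences, alignment and assembly become very difficult. In that case, a long-span large-fragment library (mate pair library) can assist the assembly of the short reads. In addition, analyzing the large-fragment library with a link algorithm can detect large-scale chromosomal structural variation such as insertions, deletions, inversions, and translocations.
High-throughput targeted sequencing is a highly cost-effective and highly sensitive detection approach. Its key step is targeted enrichment of the genes of interest, and the main current approaches are library construction methods based on hybridization capture and on PCR. In general, hybridization-capture methods require streptavidin-coated magnetic beads and are therefore expensive and laborious, and they also require more DNA input. With recent technical developments, PCR-based targeted enrichment using unique molecular identifiers (UMIs) has made considerable progress relative to hybridization capture and can overcome the earlier difficulty of removing PCR duplicates, but errors within the UMIs themselves remain hard to eliminate and the procedure is cumbersome. There is therefore a need for an accurate, efficient, and simple multiplex PCR targeted enrichment library construction method.
Existing PCR-based targeted enrichment library construction methods mainly include AmpliSeq (Thermo), SLIM Amplification, and Relay PCR. All of them comprise two PCR steps: a first step that amplifies the target fragments, and a second step of PCR enrichment after adapter ligation. However, all of them use conventional TA ligation or blunt-end ligation, the overall workflow includes no step that controls non-specific amplification, and non-specific amplification products are not removed well. This problem is especially pronounced in targeted methylation sequencing: in bisulfite-treated DNA the vast majority of cytosines become thymines, so multiplexed primers readily form primer dimers or non-specific amplification products.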
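Summary of the Invention
The object of the present disclosure is to provide a method for constructing a multiplex PCR library for high-throughput targeted sequencing.
To achieve this object, the present disclosure adopts the following technical means:
The present disclosure relates to a method for constructing a multiplex PCR library for high-throughput targeted sequencing, in which multi-base MoCODE barcodes are added to the specific amplification products and the MoCODE barcodes are used to ligate the amplification products efficiently to sequencing adapters containing MoCODE barcode decoding sequences. The MoCODE barcode refers to the protruding single-stranded nucleotide sequences that make up the two sticky ends of the PCR product obtained after the multiplex PCR product is digested with a specific endonuclease, and the MoCODE barcode decoding sequence is a nucleotide sequence complementary to the MoCODE barcode.
Preferably, the MoCODE barcode is generated by one or more of: modified nucleotides, a nicking enzyme, an endonuclease, chemical modification, photocleavable bases, and the like; preferably, the modified nucleotides include one or more of dUTP, dITP, and RNA bases.
Preferably, the MoCODE barcodes within a molecule may be identical or different.
Preferably, the MoCODE barcode is a non-random specific barcode.
Preferably, the MoCODE barcode is 2-20 nt long.
Preferably, the MoCODE barcode decoding sequence is complementary to the MoCODE barcode sequence and is 2-20 nt long.
Preferably, the sequencing adapter may be artificially designed and synthesized, or may match a fragment sequence of the target segment itself.
Preferably, the sequencing adapter may be a single adapter or a bidirectional adapter.
Preferably, the enrichment of each specific segment can be decoded by single-adapter decoding, dual-adapter decoding, or automatic circularization decoding.
The present disclosure also relates to a primer for multiplex PCR for high-throughput targeted sequencing, the primer comprising a MoCODE barcode generation sequence; preferably, the sequence of the primer comprises a sequence shown in Seq ID No: 1-22, 27-52, 53, 55, 57-104, 109, 111.
Correspondingly, the present disclosure also relates to a sequencing adapter for multiplex PCR for high-throughput targeted sequencing, the sequencing adapter comprising a MoCODE barcode decoding sequence. Preferably, the sequencing adapter further comprises one or more of a sequencing adapter of the sequencing platform and an index tag; preferably, the sequencing adapter comprises a high-throughput sequencing universal sequence, an index tag, and the MoCODE barcode decoding sequence, and the sequence of the sequencing adapter comprises a sequence shown in Seq ID No: 23-26, 54, 56, 105-108, 110, 112.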
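A method of the present disclosure for constructing a multiplex PCR library for high-throughput targeted sequencing comprises the following steps:
1) extracting DNA from the sample to be tested;
2) performing a multiplex PCR reaction in which each primer contains a specific MoCODE barcode generation sequence; preferably, the primer also contains a gene-specific sequence;
3) purifying the PCR product of step 2) by the magnetic bead method;
4) producing 5' and 3' sticky ends on the purified PCR product of step 3), and generating a MoCODE barcode at the 5' and/or 3' sticky end;
5) purifying the MoCODE-barcode-containing PCR product of step 4) by the magnetic bead method;
6) ligating the purified MoCODE-barcode-containing PCR product of step 5) to sequencing adapters, the sequencing adapters containing MoCODE barcode decoding sequences complementary to the MoCODE;
7) purifying the ligation product of step 6) with magnetic beads, completing construction of the multiplex PCR library for high-throughput targeted sequencing.
Preferably, the MoCODE barcode in step 4) is generated by one or more of: modified nucleotides, a nicking enzyme, an endonuclease, chemical modification, photocleavable bases, and the like; preferably, the modified nucleotides include one or more of dUTP, dITP, and RNA bases; more preferably, the MoCODE barcode is generated by enzymatic digestion with a specific endonuclease.
Preferably, in step 4) one MoCODE barcode is generated at each of the 5' and 3' sticky ends, and the MoCODE barcodes at the 5' and 3' sticky ends may be the same or different.
Preferably, the sequencing adapter in step 6) may be a single adapter, a bidirectional adapter, or a circularization adapter.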
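Compared with the prior art, the present disclosure has the following advantages:
(1) Fewer non-specific products in multiplex PCR amplification
Current PCR-based targeted enrichment library construction methods have introduced UMIs, which can filter out errors arising during library construction and sequencing to some extent. Random errors, however, arise not only in the template fragment sequence but also in the UMI sequences themselves. If an error occurs in a UMI, PCR duplicates are wrongly identified as unique molecules labeled by distinct UMIs, which leads to an overestimate of sequencing depth and degrades sequencing quality. UMIs are themselves random sequences and cannot remove the non-specific amplification products, primer dimers, or more complex single- or double-stranded multimers formed in multiplex PCR.
By designing a highly specific multiplex PCR primer set and adding a specific restriction site and a unique characteristic sequence to each primer set, only correctly amplified PCR products can, after enzymatic digestion, be ligated to their specifically matched adapters and go on to form the sequencing library. Dimers and multimers produced during amplification are removed by digestion with the specific endonuclease. Because non-specific amplification products cannot form the correct combination with the decoding adapters, their ligation products cannot be amplified or recognized during high-throughput sequencing. All or nearly all of the resulting sequencing data therefore come from the specific target fragments, which greatly increases the on-target rate and thus safeguards sequencing depth.
(2) High efficiency, less contamination
Ligation through designed sticky-end adapters relies not only on the ligase, as blunt-end ligation does, but also on base complementarity, and it increases the affinity between enzyme and substrate, so ligation efficiency is markedly higher. Whereas other companies' PCR-based targeted enrichment library construction methods require two PCR reactions, the entire library construction process here requires only one, which reduces contamination and gives better resistance to contamination.
(3) Simple operation, time saving
By designing a highly specific multiplex PCR primer set and increasing adapter ligation efficiency, library construction becomes more efficient. Compared with other companies' PCR-based targeted enrichment library construction methods, hands-on time is reduced by 40-50% and overall library construction time by 30-40%.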
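Brief Description of the Drawings
Figure 1 shows the library construction process of the disclosed method using non-identical MoCODEs;
Figure 2 is a schematic diagram of the structure of the upstream and downstream primers for the multiplex PCR of the present disclosure;
Figure 3 is a schematic diagram of the structure of the upstream and downstream adapters of the present disclosure;
Figure 4A is a schematic diagram of the double-stranded structure of the (non-identical) MoCODEs at the two ends of the PCR product in Example 3;
Figure 4B is a schematic diagram of the double-stranded structure of the upstream adapter in Example 3;
Figure 4C is a schematic diagram of the double-stranded structure of the downstream adapter in Example 3;
Figure 5A is a schematic diagram of the double-stranded structure of the (identical) MoCODEs at the two ends of the PCR product in Example 4;
Figure 5B is a schematic diagram of the double-stranded structure of the upstream adapter in Example 4;
Figure 5C is a schematic diagram of the double-stranded structure of the downstream adapter in Example 4;
Figure 6A is a schematic diagram of the primers used when the MoCODE barcode is produced from a MoCODE generation sequence contained in the amplified target segment itself;
Figure 6B is a schematic diagram of the PCR-amplified target fragment that itself contains a MoCODE generation sequence, for the same case;
Figure 6C is a schematic diagram of the PCR product on which the MoCODE barcode has been generated, for the same case;
Figure 7 shows the agarose gel electrophoresis result of the PCR amplification products of Example 1;
Figure 8 shows the agarose gel electrophoresis result of the adapter-ligated products of Example 2.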
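Detailed Description
On the basis of the above content of the present disclosure, and in accordance with ordinary technical knowledge and customary practice in the art, various other modifications, substitutions, or changes can be made without departing from the basic technical idea of the present disclosure described above.
I. Definitions
The term "sample" includes specimens or cultures (e.g., microbiological cultures) that contain nucleic acid, and is also intended to include biological samples and environmental samples. Samples may include specimens of synthetic origin. Biological samples include whole blood, serum, plasma, umbilical cord blood, chorionic villi, amniotic fluid, cerebrospinal fluid, spinal fluid, lavage fluid (e.g., bronchoalveolar, gastric, peritoneal, ductal, ear, or arthroscopic lavage), biopsy samples, urine, feces, sputum, saliva, nasal mucus, prostatic fluid, semen, lymph, bile, tears, sweat, breast milk, breast fluid, embryonic cells, and fetal cells. In a preferred embodiment, the biological sample is blood, and more preferably plasma. As used herein, the term "blood" includes whole blood or any blood fraction, such as serum and plasma as conventionally defined. Blood plasma refers to the fraction of whole blood obtained by centrifuging blood treated with an anticoagulant. Blood serum refers to the watery portion of the fluid remaining after a blood sample has clotted. Environmental samples include environmental material such as surface matter, soil, water, and industrial samples, as well as samples obtained from food and dairy processing devices, instruments, equipment, utensils, and disposable and non-disposable items. These examples are not to be construed as limiting the sample types applicable to the present invention.
The terms "target", "target nucleic acid", and "gene of interest" are intended to refer to any molecule whose presence is to be detected or measured, or whose function, interactions, or properties are to be studied.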
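The terms "nucleic acid" and "nucleic acid molecule" are used interchangeably throughout this disclosure. They refer to oligonucleotides, oligomers, polynucleotides, deoxyribonucleic acid (DNA), genomic DNA, mitochondrial DNA (mtDNA), complementary DNA (cDNA), bacterial DNA, viral DNA, viral RNA, RNA, messenger RNA (mRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), siRNA, catalytic RNA, clones, plasmids, M13, P1, cosmids, bacterial artificial chromosomes (BAC), yeast artificial chromosomes (YAC), amplified nucleic acids, amplicons, PCR products and other types of amplified nucleic acid, RNA/DNA hybrids, and polyamide nucleic acids (PNA), all of which may be in single-stranded or double-stranded form and, unless otherwise limited, include known analogs of natural nucleotides that can function in a manner similar to naturally occurring nucleotides, as well as combinations and/or mixtures thereof. Accordingly, the term "nucleotide" refers to both naturally occurring and modified or non-naturally occurring nucleotides, including nucleoside tri-, di-, and monophosphates as well as the monophosphate monomers present within a polynucleic acid or oligonucleotide. A nucleotide may also be a ribo-, 2'-deoxy-, or 2',3'-deoxy-nucleotide, or any of the large number of other nucleotide mimetics well known in the art. Mimetics include chain-terminating nucleotides, such as 3'-O-methyl, halogenated base or sugar substitutions; alternative sugar structures, including non-sugar, alkyl ring structures; alternative bases, including inosine; deaza-modified; chi and psi, linker-modified; mass-label-modified; phosphodiester modifications or replacements, including phosphorothioate, methylphosphonate, boranophosphate, amide, ester, and ether; and basic or complete internucleotide replacements, including cleavable linkages such as photocleavable nitrophenyl moieties.
The term "amplification reaction" refers to any in vitro means of multiplying the copies of a target nucleic acid sequence. "Amplifying" refers to the step of subjecting a solution to conditions sufficient to allow amplification. The components of an amplification reaction may include, but are not limited to, primers, a polynucleotide template, a polymerase, nucleotides, dNTPs, and the like. The term "amplification" usually refers to an "exponential" increase in the target nucleic acid. As used herein, however, "amplification" may also refer to a linear increase in the number of copies of a selected target nucleic acid sequence, as distinct from a one-time, single primer extension step.
The term "polymerase chain reaction" or "PCR" refers to a method for amplifying a specific segment or subsequence of a target double-stranded DNA in geometric progression. PCR is well known to those skilled in the art.
The term "oligonucleotide" refers to a linear oligomer of natural or modified nucleoside monomers linked by phosphodiester bonds or analogs thereof. Oligonucleotides include deoxyribonucleosides, ribonucleosides, anomeric forms thereof, peptide nucleic acids (PNA), and the like that are capable of specifically binding a target nucleic acid. Usually the monomers are linked by phosphodiester bonds or analogs thereof to form oligonucleotides ranging in size from a few monomer units (e.g., 3-4) to several tens of monomer units (e.g., 40-60). Whenever an oligonucleotide is represented by a sequence of letters (such as "ATGCCTG"), it should be understood that, unless otherwise indicated, the nucleotides are in 5'-3' order from left to right, and that "A" denotes deoxyadenosine, "C" denotes deoxycytidine, "G" denotes deoxyguanosine, "T" denotes deoxythymidine, and "U" denotes the ribonucleoside uridine. Oligonucleotides usually comprise the four natural deoxynucleotides; however, they may also comprise ribonucleosides or non-natural nucleotide analogs. Where an enzyme has specific oligonucleotide or polynucleotide substrate requirements for activity (e.g., single-stranded DNA, RNA/DNA duplex, etc.), selecting an appropriate composition for the oligonucleotide or polynucleotide substrate is well within the knowledge of a person of ordinary skill.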
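The term "primer", i.e. "oligonucleotide primer", refers to a polynucleotide sequence that hybridizes to a sequence on a target nucleic acid template and facilitates the detection of an oligonucleotide probe. In the amplification embodiments of the invention, an oligonucleotide primer serves as the starting point for nucleic acid synthesis. In non-amplification embodiments, an oligonucleotide primer can be used to create a structure that can be cleaved by a cleavage reagent. Primers can be of various lengths and are usually less than 50 nucleotides long. The length and sequence of primers for use in PCR can be designed according to principles known to those skilled in the art.
A "mismatched nucleotide" or "mismatch" refers to a nucleotide that is not complementary to the target sequence at one or more given positions. An oligonucleotide probe may have at least one mismatch, but may also have 2, 3, 4, 5, 6, 7, or more mismatched nucleotides.
The term "specific" or "specificity", with respect to the binding of one molecule to another (such as a probe to a target polynucleotide), refers to the recognition, contact, and formation of a stable complex between the two molecules, together with substantially reduced recognition, contact, or complex formation between that molecule and other molecules. As used herein, the term "annealing" refers to the formation of a stable complex between two molecules.
The term "cleavage reagent" refers to any means capable of cleaving an oligonucleotide to produce fragments, including but not limited to enzymes. For methods in which no amplification occurs, the cleavage reagent may serve only to cleave, degrade, or otherwise separate the second portion of the oligonucleotide probe or fragments thereof. The cleavage reagent may be an enzyme. The cleavage reagent may be natural, synthetic, unmodified, or modified.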
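For methods in which amplification occurs, the cleavage reagent is preferably an enzyme having both synthetic (or polymerizing) activity and nuclease activity. Such enzymes are usually nucleic acid amplification enzymes. Examples of nucleic acid amplification enzymes are nucleic acid polymerases, such as Thermus aquaticus (Taq) DNA polymerase
Figure PCTCN2021143948-appb-000001
or E. coli DNA polymerase I. The enzyme may be naturally occurring, unmodified, or modified.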
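The term "nucleic acid polymerase" refers to an enzyme that catalyzes the incorporation of nucleotides into a nucleic acid. Exemplary nucleic acid polymerases include DNA polymerases, RNA polymerases, terminal transferases, reverse transcriptases, telomerases, and the like.
A "thermostable DNA polymerase" refers to a DNA polymerase that is stable (i.e., resists breakdown or denaturation) and retains sufficient catalytic activity when subjected to elevated temperatures for a selected period of time. For example, a thermostable DNA polymerase retains sufficient activity to carry out subsequent primer extension reactions after being subjected to elevated temperatures for the time necessary to denature double-stranded nucleic acid. The heating conditions necessary for nucleic acid denaturation are well known in the art and are exemplified in U.S. Patent Nos. 4,683,202 and 4,683,195. As used herein, a thermostable polymerase is generally suitable for use in temperature cycling reactions such as the polymerase chain reaction ("PCR"). Examples of thermostable nucleic acid polymerases include Thermus aquaticus Taq DNA polymerase, Thermus species Z05 polymerase, Thermus flavus polymerase, Thermotoga maritima polymerases such as TMA-25 and TMA-30 polymerases, Tth DNA polymerase, and the like.
A "modified polymerase" refers to a polymerase in which at least one monomer differs from a reference sequence, such as the natural or wild-type form of the polymerase or another modified form of the polymerase. Exemplary modifications include monomer insertions, deletions, and substitutions. Modified polymerases also include chimeric polymerases having identifiable component sequences (e.g., structural or functional domains) derived from two or more parents. The definition of modified polymerase also covers polymerases that comprise chemical modifications of the reference sequence. Examples of modified polymerases include G46E E678G CS5 DNA polymerase, G46E L329A E678G CS5 DNA polymerase, G46E L329A D640G S671F CS5 DNA polymerase, G46E L329A D640G S671F E678G CS5 DNA polymerase, G46E E678G CS6 DNA polymerase, Z05 DNA polymerase, ΔZ05 polymerase, ΔZ05-Gold polymerase, ΔZ05R polymerase, E615G Taq DNA polymerase, E678G TMA-25 polymerase, E678G TMA-30 polymerase, and the like.
The term "5' to 3' nuclease activity" or "5'-3' nuclease activity" refers to an activity of a nucleic acid polymerase, usually associated with nucleic acid strand synthesis, whereby nucleotides are removed from the 5' end of a nucleic acid strand; for example, E. coli DNA polymerase I has this activity, whereas the Klenow fragment does not. Some enzymes with 5' to 3' nuclease activity are 5' to 3' exonucleases. Examples of such 5' to 3' exonucleases include the exonuclease from B. subtilis, the phosphodiesterase from spleen, lambda exonuclease, exonuclease II from yeast, exonuclease V from yeast, and the exonuclease from Neurospora crassa.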
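As used in the present disclosure, the terms "MoCODE barcode", "molecular code", and "specific molecular barcode" refer to the protruding single-stranded sequences that make up the two sticky ends of the PCR product obtained after the multiplex PCR product is digested with a specific endonuclease.
As used in the present disclosure, the term "MoCODE barcode decoding sequence", also called "molecular barcode decoding sequence", is a nucleotide sequence complementary to the "MoCODE barcode", "molecular code", or "specific molecular barcode".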
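II. Embodiments
The method of the present disclosure for constructing a multiplex PCR targeted enrichment library for high-throughput sequencing is based on the following principles:
1. A MoCODE barcode (Molecular Code) is introduced into the primers of each amplified segment.
2. The MoCODE barcodes of each pair of amplification primers may be different or identical.
Specific amplification products are selected through their matching with the adapters during the later adapter ligation. The MoCODE barcode may be 2 nt to 20 nt long, or longer.
3. Non-specific fragments cannot match the adapters effectively and so cannot form the structure required for sequencing; they cannot be amplified in the sequencing reaction and are thereby removed from the reaction system.
4. The matched ligation between the MoCODE barcode and the adapter is a sticky-end ligation; compared with the TA ligation or blunt-end ligation currently used in library construction, this improves ligation efficiency and the final detection sensitivity.
5. Amplification: gene-specific and universal amplification, together with introduction of the MoCODE barcode, can be carried out in a single PCR reaction, which shortens the procedure and hands-on time, avoids cross-contamination during library construction, lowers cost, and improves clinical practicality.
6. MoCODE barcodes can be combined with UMIs to further improve the accuracy of mutation detection in targeted sequencing through error correction.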
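The selection principle in points 2 to 4 can be illustrated with a small sketch. It is a simplified model, not the disclosed protocol: each digested product is reduced to the two overhangs it carries, and a product counts as sequenceable only if each overhang is the reverse complement of the decoding sequence on the corresponding adapter. The barcode values are the ones listed for Example 3 of this document; the function names and the "primer dimer" input are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'>3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

def forms_sequencing_library(overhang_5p: str, overhang_3p: str,
                             upstream_decode: str, downstream_decode: str) -> bool:
    """True only if both sticky ends pair with their intended adapter."""
    return (reverse_complement(overhang_5p) == upstream_decode
            and reverse_complement(overhang_3p) == downstream_decode)

# Correct product: upstream barcode TGTA, downstream barcode GAT (Example 3).
specific_product = forms_sequencing_library("TGTA", "GAT", "TACA", "ATC")

# Hypothetical by-product carrying the upstream barcode on both ends.
primer_dimer = forms_sequencing_library("TGTA", "TGTA", "TACA", "ATC")

print(specific_product, primer_dimer)
```

In this model the by-product can still pick up the upstream adapter, but it never acquires the downstream one, which is the situation the text describes as unable to form the structure required for sequencing.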
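In the method of the present disclosure for constructing a multiplex PCR library for high-throughput targeted sequencing, MoCODE barcodes are added to the specific amplification products, and the library is built by efficient ligation to matching sequencing adapters that contain MoCODE barcode decoding sequences.
In certain embodiments of the present disclosure, the sample sources for the specific amplification products include, but are not limited to, genomic DNA, cell-free DNA, free cells, and cDNA produced by reverse transcription of RNA samples.
In certain embodiments of the present disclosure, the template DNA for the multiplex PCR reaction may be DNA, bisulfite-converted DNA, cDNA, and the like.
In certain embodiments of the present disclosure, the template DNA for the multiplex PCR reaction may be extracted by a column method, a magnetic bead method, phenol-chloroform extraction followed by ethanol or isopropanol precipitation, and the like.
In certain embodiments of the present disclosure, the primers in the multiplex PCR reaction contain a specific MoCODE barcode generation sequence; preferably, the primers also contain a gene-specific sequence.
In certain embodiments of the present disclosure, the MoCODE barcode is generated by means including modified nucleotides (dUTP, dITP, RNA bases), a nicking enzyme, an endonuclease, chemical modification, photocleavable bases, and the like. The purpose is to create a recognizable cleavage site at the end of the PCR product, from which a sticky end containing the MoCODE barcode is then cut.
In a specific embodiment of the present disclosure, the MoCODE barcode is generated as follows: in addition to a gene-specific sequence, each primer of the multiplex PCR reaction may contain at its 5' end a recognition site for a specific endonuclease that is common among primers, and the purified PCR product is then digested with the specific endonuclease (one or two enzymes). The digested PCR product has two sticky ends. The protruding single-stranded sequence of each sticky end forms a specific molecular barcode, i.e. the Molecular CODE (MoCODE) barcode.
In certain embodiments of the present disclosure, the primer sequences comprise the sequences shown in Seq ID No: 1-22, 27-52, 53, 55, 57-104, 109, 111, where n denotes the nucleotide dITP or dUTP.
In a specific embodiment of the present disclosure, the MoCODE barcode is generated as follows: in addition to a gene-specific sequence, each primer of the multiplex PCR reaction contains a dITP site that serves as the cleavage site; after recognition and cleavage by a specific enzyme, a 6-base sticky end can be formed, i.e. the MoCODE barcode sequence is produced.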
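In certain embodiments of the present disclosure, the MoCODE barcodes within a molecule may be identical or different. For example, "identical" means that the MoCODE barcodes at the two ends of the same PCR product molecule are formed by recognition and cleavage by one endonuclease, and "different" means that they are formed by recognition and cleavage by two different endonucleases.
In certain embodiments of the present disclosure, a single nucleotide molecule contains one kind of MoCODE barcode; for example, the MoCODE barcodes generated at the 5' and 3' sticky ends of a PCR product molecule are the same.
In certain embodiments of the present disclosure, a single nucleotide molecule contains two kinds of MoCODE barcode; for example, the MoCODE barcodes generated at the 5' and 3' sticky ends of a PCR product molecule are different.
In certain embodiments of the present disclosure, the MoCODE barcode is a non-random specific barcode.
In certain embodiments of the present disclosure, the MoCODE barcode is 2-20 nt long.
In certain embodiments of the present disclosure, the MoCODE barcode sequences comprise the sequences shown in Seq ID No: 53, 55, 109, 111.
In certain embodiments of the present disclosure, the MoCODE barcode decoding sequence is complementary to the MoCODE barcode sequence and is 2-20 nt long.
In certain embodiments of the present disclosure, the MoCODE barcode decoding sequences comprise the sequences shown in Seq ID No: 54, 56, 110, 112.
In certain embodiments of the present disclosure, the sequencing adapter containing the MoCODE barcode decoding sequence may be artificially designed and synthesized, or may match a fragment sequence of the target segment itself.
As an illustration of a sequencing adapter that matches a fragment sequence of the target segment itself: if the PCR-amplified target segment itself contains a MoCODE generation sequence that will be used to produce the MoCODE barcode at the 5' end, the 5' PCR primer does not need to carry a MoCODE generation sequence; if the MoCODE generation sequence contained in the amplified target segment will be used to produce the MoCODE barcode at the 3' end, the 3' PCR primer does not need to carry a MoCODE generation sequence (Figure 6A).
In certain embodiments of the present disclosure, the sequencing adapters comprise the sequences shown in Seq ID No: 23-26, 105-108, where "nnnnnnnn", [i5], or [i7] denotes an index tag, for example an 8 nt Illumina index tag sequence. As is well known in the art, the 5' end used for sticky-end ligation may be phosphorylated.
In certain embodiments of the present disclosure, the "n" or "I" at position 5 of the primer sequences Seq ID No: 57-104 is "dITP".
In certain embodiments of the present disclosure, the PCR-amplified target fragment may contain one or two MoCODE generation sequences of its own (Figure 6B). Accordingly, these intrinsic MoCODE generation sequences can be used to produce the MoCODE barcode at one or both ends of the DNA molecule. Digestion with the endonuclease corresponding to the intrinsic MoCODE generation sequence produces the corresponding MoCODE barcode at one or both ends of the PCR product (Figure 6C).
In certain embodiments of the present disclosure, the sequencing adapter containing the MoCODE barcode decoding sequence may be a single adapter or a bidirectional adapter, and the enrichment of each specific segment can be decoded by single-adapter decoding, dual-adapter decoding, or automatic circularization decoding. The "single adapter" is used when the MoCODE barcodes at the two ends of the PCR product are "identical"; the "bidirectional adapter" is used when the barcodes at the two ends of the PCR product are "different". Understandably, when different adapters are used, a non-specific product carries the same adapter on both sides, cannot form a correct product for sequencing, and is therefore eliminated during sequencing.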
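In certain embodiments of the present disclosure, the "circularization" approach can use multiple different MoCODE barcodes, with the structure MoCODE + common sequencing-primer-binding sequence + gene-specific sequence. The circularization decoding steps are: PCR, digestion, circularization, exonuclease digestion, and add-on PCR (which adds a complete sequencing primer binding site + library index + sequencing adapter); this can be used to form multiple amplicons.
In certain embodiments of the present disclosure, the sequencing adapters containing MoCODE barcode decoding sequences comprise an upstream sequencing adapter and a downstream sequencing adapter. The upstream sequencing adapter contains a MoCODE barcode decoding sequence complementary to the MoCODE barcode at the 5' end of the digested PCR product, and the downstream sequencing adapter contains a MoCODE barcode decoding sequence complementary to the MoCODE barcode at the 3' end of the digested PCR product.
In addition, the upstream and downstream sequencing adapters each comprise an adapter upper strand and an adapter lower strand; the upper strand is the sense strand and the lower strand is the antisense strand. The MoCODE barcode decoding sequence may be located at the 3' end of the upper strand or the 5' end of the lower strand of the upstream sequencing adapter, or at the 5' end of the upper strand or the 3' end of the lower strand of the downstream sequencing adapter (Figure 3).
In certain embodiments of the present disclosure, multiplex amplification of 2-1000 target segments can be achieved; each target segment may have its own specific barcode, or several target segments may share the same barcode.
In certain embodiments of the present disclosure, the MoCODE barcode is a non-random specific barcode and can also be used for concatemerization of multiple target segments.
In certain embodiments of the present disclosure, the DNA polymerase used in the multiplex PCR may be a commercial enzyme such as Taq polymerase, PFx, KOD, Pfu, Q5, Bst, or Phusion.
In certain embodiments of the present disclosure, the ligase used in the multiplex PCR may be T4 DNA ligase, 9°N™ DNA ligase, Taq DNA ligase, Tth DNA ligase, Tfi DNA ligase, Ampligase®, and the like.
In certain embodiments of the present disclosure, excess sequencing adapters can be removed by the magnetic bead method, column purification, ethanol precipitation, recovery from agarose or polyacrylamide gel, and the like.
In certain embodiments of the present disclosure, the constructed library is suitable for high-throughput sequencing platforms such as Illumina, Roche, ThermoFisher, Pacific Biosciences, BGI, Oxford Nanopore Technologies, 华因康, and 瀚海基因.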
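Specifically, in certain embodiments of the present disclosure, the method for constructing a multiplex PCR library for high-throughput targeted sequencing comprises the following steps (an exemplary library construction workflow is shown in Figure 1):
Step 1: Prepare the sample to be tested and extract DNA; for methylation sequencing library construction, bisulfite conversion is then required;
Step 2: Using the DNA sample from Step 1 as a template, perform a multiplex PCR reaction with a high-fidelity PCR enzyme and multiple primer pairs (Figure 2); in addition to a gene-specific sequence, each primer pair contains at its 5' end a specific molecular barcode generation sequence that is common among primers.
Step 3: Purify the PCR product of Step 2 with magnetic beads;
Step 4: Digest the purified product of Step 3 with a specific endonuclease. The 3' and 5' ends of correctly amplified multiplex PCR products should contain a specific barcode generation site which, upon digestion with the specific endonuclease, forms sticky ends, i.e. generates the MoCODE barcode sequences that mediate the ligation in Step 6. Barcodes can be generated in many ways, including modified nucleotides (dUTP, dITP, RNA bases), nicking enzymes, endonucleases, chemical modification, photocleavable bases, and the like;
Step 5: Purify the enzymatic digestion product of Step 4 with magnetic beads;
Step 6: Using a ligase that can catalyze ligation between sticky ends, attach the upstream and downstream sequencing adapters to the purified digestion product obtained in Step 5. The upstream sequencing adapter contains a high-throughput sequencing universal sequence (which may include an index tag sequence) and a MoCODE barcode decoding sequence complementary to the MoCODE at the 5' end of the digested PCR product obtained in Step 4. The downstream sequencing adapter contains a high-throughput sequencing universal sequence (including an index tag sequence) and a MoCODE barcode decoding sequence complementary to the MoCODE at the 3' end of the digested PCR product obtained in Step 4 (Figure 3);
Step 7: Purify the ligation product of Step 6 with magnetic beads to complete construction of the sequencing library.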
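III. Examples
The present invention is further described below with reference to specific examples, from which its advantages and features will become clearer. These examples are merely illustrative and do not limit the scope of the invention in any way. Those skilled in the art will understand that the details and form of the technical solution can be modified or substituted without departing from the spirit and scope of the invention, and that such modifications and substitutions all fall within the scope of protection of the invention.
Example 1: Targeted methylation multiplex PCR enrichment using MoCODE to eliminate non-specific PCR products
In this example, two groups of 10 pairs of bisulfite sequencing primers (BSP) were designed; every primer in both groups contained the same gene-specific sequence. In the experimental group, each pair of BSP primers contained, in addition to the gene-specific sequence, a specific molecular (MoCODE) barcode generation sequence at its 5' end that is common among primers; in the control group, each pair of BSP primers contained only the gene-specific sequence and no specific molecular (MoCODE) barcode generation sequence at its 5' end. The two MoCODE barcode sequences were generated by digesting the PCR product with two restriction endonucleases. The enrichment achieved in the two groups was then assessed by agarose gel electrophoresis.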
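1) PCR template preparation
a) HeLa cell genomic DNA (NEB, USA) was bisulfite-converted with the EZ DNA Methylation-Gold Kit (ZYMO, USA).
b) The concentration of the converted DNA was measured with a Qubit fluorometer.
c) The concentration of the bisulfite-converted DNA was adjusted to 50 ng/μl with water.
2) Multiplex PCR
a) PCR reaction mix
Component  Volume
Nuclease-free water  21.5 μl
2x KOD-Multi Epi PCR master mix (TOYOBO)  25 μl
Primer mix (10 μM)  1.5 μl
Bisulfite-treated HeLa cell genomic DNA  1 μl (50 ng)
KOD-Multi & Epi (TOYOBO)  1 μl
Total volume  50 μl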
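A quick consistency check of the reaction table can be scripted. The sketch below simply re-adds the listed volumes and recomputes the template mass from the stated 50 ng/μl stock; the dictionary keys are descriptive labels chosen here, not reagent catalog names.

```python
# Volumes in microlitres, copied from the reaction table above.
pcr_mix_ul = {
    "nuclease-free water": 21.5,
    "2x PCR master mix": 25.0,
    "primer mix (10 uM)": 1.5,
    "bisulfite-treated genomic DNA": 1.0,
    "polymerase": 1.0,
}

total_volume_ul = sum(pcr_mix_ul.values())

# Template mass = volume x stock concentration (50 ng/ul after adjustment).
template_ng = pcr_mix_ul["bisulfite-treated genomic DNA"] * 50

# The 2x master mix should make up half of the final volume.
master_mix_fraction = pcr_mix_ul["2x PCR master mix"] / total_volume_ul

print(total_volume_ul, template_ng, master_mix_fraction)
```

The same pattern applies to the digestion and ligation tables in the later steps, each of which is stated to total 20 μl.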
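b) PCR program
Step 1: 94°C, 2 minutes.
Step 2: 6 cycles (98°C, 10 seconds; 59°C, 5 seconds; 68°C, 5 seconds).
Step 3: 35 cycles (98°C, 10 seconds; 68°C, 10 seconds).
Step 4: 68°C, 1 minute.
Step 5: Hold at 8°C.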
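For planning purposes, the program can be written down as data and summed. The sketch below counts the amplification cycles and the programmed hold time only; instrument ramp time between temperatures is not specified in the text and is deliberately left out, so the real run time will be longer.

```python
# Each entry: (number of repeats, [(temperature in C, seconds), ...]).
program = [
    (1, [(94, 120)]),                       # Step 1: initial denaturation
    (6, [(98, 10), (59, 5), (68, 5)]),      # Step 2: three-step cycles
    (35, [(98, 10), (68, 10)]),             # Step 3: two-step cycles
    (1, [(68, 60)]),                        # Step 4: final extension
]

# Only the multi-segment entries are amplification cycles.
total_cycles = sum(repeats for repeats, segments in program if len(segments) > 1)

# Sum of programmed hold times, excluding ramping and the final 8 C hold.
hold_seconds = sum(repeats * sum(sec for _, sec in segments)
                   for repeats, segments in program)

print(total_cycles, hold_seconds, round(hold_seconds / 60, 1))
```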
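3) Purification of the multiplex PCR product with HiPrep PCR magnetic beads (MAGBIO, USA)
a) Purify the PCR product with 60 μl of magnetic beads (1.2x).
b) Elute the purified product in 15 μl of water.
c) Measure the concentration of the purified PCR product with a Qubit fluorometer.
d) Adjust the product concentration to 10 ng/μl with water.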
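The bead ratio and the input mass used downstream follow from simple proportions. The sketch below assumes the whole 50 μl PCR reaction is purified (which is what makes 60 μl of beads a 1.2x ratio) and that 5 μl of the 10 ng/μl product is later used for digestion, as the digestion table states; the helper names are illustrative.

```python
def bead_ratio(bead_volume_ul: float, sample_volume_ul: float) -> float:
    """Bead-to-sample volume ratio, e.g. 1.2 for a '1.2x' cleanup."""
    return bead_volume_ul / sample_volume_ul

def mass_ng(volume_ul: float, concentration_ng_per_ul: float) -> float:
    """DNA mass contained in a given volume at a given concentration."""
    return volume_ul * concentration_ng_per_ul

ratio = bead_ratio(60, 50)      # 60 ul beads on a 50 ul PCR reaction
digest_input = mass_ng(5, 10)   # 5 ul of product adjusted to 10 ng/ul

print(ratio, digest_input)
```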
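4) Treatment of the purified PCR product with the restriction endonucleases BbvI and EarI (a schematic of the resulting product structure is shown in Figure 5A)
Component  Volume
10x CutSmart buffer (NEB)  2 μl
BbvI (NEB, 2 U/μl)  1 μl
EarI (NEB, 20 U/μl)  0.5 μl
Purified PCR product  5 μl (50 ng)
Nuclease-free water  11.5 μl
Total volume  20 μl
Incubate in a thermal cycler at 37°C for 30 minutes.
Incubate at 65°C for 20 minutes to inactivate the enzymes.
Purify the reaction mixture with HiPrep PCR magnetic beads (1.2x) and elute in 15 μl of water.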
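How a restriction enzyme that cuts outside its recognition site leaves a defined overhang can be sketched as below. This is an illustration under stated assumptions, not the disclosed primer design: the cut offsets GCAGC(8/12) for BbvI and CTCTTC(1/4) for EarI are quoted from the usual supplier notation as recalled here and should be checked against the enzyme datasheets, and both test sequences are hypothetical, constructed so that the overhangs equal the barcodes listed for Example 3.

```python
from typing import Optional

def five_prime_overhang(strand: str, site: str,
                        top_cut: int, bottom_cut: int) -> Optional[str]:
    """Overhang (5'>3') left on the fragment downstream of the site.

    top_cut / bottom_cut are the distances, in bases after the recognition
    site, at which the given strand and its complement are cut.
    """
    start = strand.find(site)
    if start < 0:
        return None                      # enzyme does not recognise this strand
    site_end = start + len(site)
    if site_end + bottom_cut > len(strand):
        return None                      # cut would fall outside the molecule
    return strand[site_end + top_cut: site_end + bottom_cut]

# Hypothetical primer-derived ends (site + spacer + barcode + insert).
bbvi_end = "AA" + "GCAGC" + "ACGTACGT" + "TGTA" + "GGTTCC"
eari_end = "AA" + "CTCTTC" + "A" + "GAT" + "GGTTCC"

print(five_prime_overhang(bbvi_end, "GCAGC", 8, 12))   # 4-nt overhang
print(five_prime_overhang(eari_end, "CTCTTC", 1, 4))   # 3-nt overhang
```

Because the overhang sequence is set by the bases next to the site rather than by the site itself, the barcode can be chosen freely in the primer, and a product lacking the site at one end yields no overhang there.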
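5) Agarose gel electrophoresis
a) Prepare a 2% agarose gel in 0.5x TBE and add nucleic acid stain (GelSafe) (1 μl of stain per 10 ml).
b) Load 5 μl of the restriction-endonuclease-treated and purified PCR product.
c) Run at 150 V for 30 minutes, then image with a gel imaging system.
6) Agarose gel electrophoresis results
In the experimental group, the PCR amplification product bands of the 10 primer pairs were sharp and no primer dimers were produced; in the control group, the PCR products appeared as diffuse bands with obvious primer dimers (Figure 7).
7) PCR primer sequences used in this example
The sequences are shown below. The universal specific molecular barcode generation sequences of the upstream and downstream primers are Seq ID No: 1 and 12, respectively; the Moko1-10 upstream primer sequences are Seq ID No: 2-11, and the Moko1-10 downstream primer sequences are Seq ID No: 13-22.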
Figure PCTCN2021143948-appb-000002
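Example 2: Sequencing adapter ligation after targeted methylation multiplex PCR enrichment using MoCODE
In this example, sequencing adapters were ligated to the restriction-endonuclease-treated and purified PCR products of the experimental group in Example 1. The result of adapter ligation was then assessed by agarose gel electrophoresis.
1) Adapter ligation (schematics of the adapter structures are shown in Figures 5B-C)
a) Adapter preparation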
Figure PCTCN2021143948-appb-000003
Figure PCTCN2021143948-appb-000004
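Incubate in a thermal cycler at 82°C for 2 minutes.
Cool to 25°C at a rate of 0.1°C per 3 seconds.
Annealing program: 82°C, 2 minutes; 570 x {82°C, 3 seconds, -0.1°C/cycle}; 4°C hold.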
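The two descriptions of the annealing step (a ramp rate, and a 570-cycle touchdown program) are consistent with each other, as the short calculation below shows. It uses only the numbers given in the text; the variable names are chosen here.

```python
start_temp_c = 82.0        # starting temperature of the ramp
cycles = 570               # number of 3-second steps in the program
drop_per_cycle_c = 0.1     # temperature decrement per step
seconds_per_cycle = 3

# Temperature reached after the last decrement.
final_temp_c = round(start_temp_c - cycles * drop_per_cycle_c, 1)

# Duration of the ramp itself, excluding the initial 2-minute hold.
ramp_minutes = cycles * seconds_per_cycle / 60

print(final_temp_c, ramp_minutes)
```

So the 570 steps bring the adapters from 82°C down to 25°C in 28.5 minutes, matching the stated rate of 0.1°C per 3 seconds.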
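b) Ligation reaction
Component  Volume
10x T4 DNA ligase buffer (NEB)  2 μl
Purified digested PCR product  15 μl
Upstream adapter (10 μM)  1 μl
Downstream adapter (10 μM)  1 μl
T4 DNA ligase (NEB, 200 U/μl)  1 μl
Total volume  20 μl
Mix the reaction gently by pipetting up and down, then centrifuge briefly.
Incubate at room temperature for 15 minutes.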
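2) Agarose gel electrophoresis
a) Prepare a 2% agarose gel in 0.5x TBE and add nucleic acid stain (GelSafe) (1 μl of stain per 10 ml).
b) Load 5 μl of the restriction-endonuclease-treated and purified PCR product.
c) Run at 150 V for 30 minutes, then image with a gel imaging system.
3) Agarose gel electrophoresis results
The gel clearly shows that every adapter-ligated product increased in size by about 100 bp, indicating that adapter ligation was successful (Figure 8).
4) Adapter sequences used in this example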
Figure PCTCN2021143948-appb-000005
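[i5]/[i7] denotes an 8 nt Illumina index tag sequence
Example 3: NGS library construction using MoCODE, method 1
In this example, two different adapters were used for library construction. The two MoCODE barcode sequences were generated by digesting the PCR product with two restriction endonucleases.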
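1) PCR template preparation
a) HeLa cell genomic DNA (NEB, USA) was bisulfite-converted with the EZ DNA Methylation-Gold Kit (ZYMO, USA).
b) The concentration of the converted DNA was measured with a Qubit fluorometer.
c) The concentration of the bisulfite-converted DNA was adjusted to 50 ng/μl with water.
2) Multiplex PCR
a) PCR reaction mix
Component  Volume
Nuclease-free water  21.5 μl
2x KOD-Multi Epi PCR master mix (TOYOBO)  25 μl
Primer mix (10 μM)  1.5 μl
Bisulfite-treated HeLa cell genomic DNA  1 μl (50 ng)
KOD-Multi & Epi (TOYOBO)  1 μl
Total volume  50 μl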
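b) PCR program
Step 1: 94°C, 2 minutes.
Step 2: 6 cycles (98°C, 10 seconds; 59°C, 5 seconds; 68°C, 5 seconds).
Step 3: 35 cycles (98°C, 10 seconds; 68°C, 10 seconds).
Step 4: 68°C, 1 minute.
Step 5: Hold at 8°C.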
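3) Purification of the multiplex PCR product with HiPrep PCR magnetic beads (MAGBIO, USA)
a) Purify the PCR product with 60 μl of magnetic beads (1.2x).
b) Elute the purified product in 15 μl of water.
c) Measure the concentration of the purified PCR product with a Qubit fluorometer.
d) Adjust the product concentration to 10 ng/μl with water.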
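4) Treatment of the purified PCR product with the restriction endonucleases BbvI and EarI (a schematic of the resulting product structure is shown in Figure 4A)
Component  Volume
10x CutSmart buffer (NEB)  2 μl
BbvI (NEB, 2 U/μl)  1 μl
EarI (NEB, 20 U/μl)  0.5 μl
Purified PCR product  5 μl (50 ng)
Nuclease-free water  11.5 μl
Total volume  20 μl
Incubate in a thermal cycler at 37°C for 30 minutes.
Incubate at 65°C for 20 minutes to inactivate the enzymes.
Purify the reaction mixture with HiPrep PCR magnetic beads (1.2x) and elute in 15 μl of water.
5) Adapter ligation (schematics of the adapter structures are shown in Figures 4B-C)
a) Adapter preparation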
Figure PCTCN2021143948-appb-000006
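Incubate in a thermal cycler at 82°C for 2 minutes.
Cool to 25°C at a rate of 0.1°C per 3 seconds.
Annealing program: 82°C, 2 minutes; 570 x {82°C, 3 seconds, -0.1°C/cycle}; 4°C hold.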
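b) Ligation reaction
Component  Volume
10x T4 DNA ligase buffer (NEB)  2 μl
Purified digested PCR product  15 μl
Upstream adapter (10 μM)  1 μl
Downstream adapter (10 μM)  1 μl
T4 DNA ligase (NEB, 200 U/μl)  1 μl
Total volume  20 μl
Mix the reaction gently by pipetting up and down, then centrifuge briefly.
Incubate at room temperature for 15 minutes.
Purify the ligation mixture with HiPrep PCR magnetic beads (1x) and elute in 10 μl of water.
6) Measuring library concentration
Take 1 μl of the purified ligation product and prepare a 10-fold serial dilution (1:10 to 1:10,000).
Determine the concentration of the 1:10,000 dilution with the Kapa library quantification kit.
Adjust the library concentration to 4 nM with water.
Sequence on an Illumina sequencing platform.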
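The adjustment to 4 nM is a standard C1·V1 = C2·V2 dilution once the stock concentration has been back-calculated from the 1:10,000 dilution. The sketch below shows the arithmetic; the measured value of 2.0 pM and the 9 μl of remaining library are hypothetical inputs chosen for illustration, since the text reports neither.

```python
def stock_concentration_nM(measured_pM: float, dilution_factor: float = 10_000) -> float:
    """Back-calculate the undiluted library concentration in nM."""
    return measured_pM * dilution_factor / 1000.0   # 1000 pM = 1 nM

def water_to_add_ul(volume_ul: float, stock_nM: float, target_nM: float = 4.0) -> float:
    """Water needed to dilute volume_ul of stock down to target_nM."""
    if stock_nM < target_nM:
        raise ValueError("library is already below the target concentration")
    return volume_ul * (stock_nM / target_nM - 1.0)

stock = stock_concentration_nM(2.0)    # hypothetical qPCR readout of 2.0 pM
water = water_to_add_ul(9.0, stock)    # hypothetical 9 ul of library left

print(stock, water)
```

The explicit error for an over-dilute library is a design choice of this sketch: a library below 4 nM cannot be fixed by adding water and would need concentrating or re-preparation.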
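7) Sequencing results
Raw .fastq files from Illumina paired-end sequencing were assembled into complete sequenced segments with the PEAR software. Each assembled read was compared with the target segment sequences; a read of the expected length produced by a correctly paired primer set was counted as on-target, and the on-target rate is the proportion of on-target reads among the total reads.
Total reads: 554265; on-target rate: 97.0%.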
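The on-target definition above can be expressed as a small classifier. This is a minimal sketch of the counting rule only: it assumes each merged read has already been annotated with the primer pair identified at its two ends and with its merged length, and the amplicon lengths and reads shown are hypothetical, not data from this example.

```python
from typing import Dict, List, Tuple

# (forward primer ID, reverse primer ID, merged read length)
Read = Tuple[str, str, int]

def on_target_rate(reads: List[Read], expected_length: Dict[str, int],
                   tolerance: int = 0) -> float:
    """Fraction of reads from a correctly paired primer set with the expected length."""
    if not reads:
        return 0.0
    hits = sum(
        1 for fwd, rev, length in reads
        if fwd == rev and fwd in expected_length
        and abs(length - expected_length[fwd]) <= tolerance
    )
    return hits / len(reads)

expected = {"Moko1": 150, "Moko2": 162}      # hypothetical amplicon lengths
reads = [
    ("Moko1", "Moko1", 150),                 # on-target
    ("Moko2", "Moko2", 162),                 # on-target
    ("Moko1", "Moko2", 95),                  # mispaired primers
    ("Moko2", "Moko2", 162),                 # on-target
]

print(on_target_rate(reads, expected))
```

A `tolerance` parameter is included because merged lengths can differ by a few bases in practice; the text itself does not state a tolerance.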
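8) PCR primer sequences used in this example
The sequences are shown below. The upstream and downstream universal specific molecular barcode generation sequences and the Moko1-10 upstream and downstream primers are the same as in Example 1; the Moko11-23 upstream primer sequences are Seq ID No: 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, and the Moko11-23 downstream primer sequences are Seq ID No: 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52.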
Figure PCTCN2021143948-appb-000007
Figure PCTCN2021143948-appb-000008
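The underlined portions are the specific target gene sequences
9) Adapter sequences used in this example
The sequences are shown below; they are the same as the adapter sequences used in Example 2 (Seq ID No: 23-26)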
Figure PCTCN2021143948-appb-000009
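[i5]/[i7] denotes an 8 nt Illumina index tag sequence
10) MoCODE barcode sequences and MoCODE barcode decoding sequences used in this example
  MoCODE barcode sequence (5'>3')  MoCODE barcode decoding sequence (5'>3')
Upstream adapter  TGTA (Seq ID No: 53)  TACA (Seq ID No: 54)
Downstream adapter  GAT (Seq ID No: 55)  ATC (Seq ID No: 56)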
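The relationship between each barcode and its decoding sequence in the table is a reverse complement, which can be confirmed directly. The sketch below uses only the four sequences listed above; the function and variable names are chosen here.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'>3'."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

# (barcode, decoding sequence), both written 5'>3' as in the table.
pairs = {
    "upstream": ("TGTA", "TACA"),
    "downstream": ("GAT", "ATC"),
}

decodes_correctly = {
    name: reverse_complement(barcode) == decode
    for name, (barcode, decode) in pairs.items()
}
barcode_lengths = {name: len(barcode) for name, (barcode, _) in pairs.items()}

print(decodes_correctly, barcode_lengths)
```

Both pairs check out, and the overhang lengths of 4 nt and 3 nt fall within the 2-20 nt range stated for MoCODE barcodes.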
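Example 4: NGS library construction using MoCODE, method 2
In this example, two different adapters were used for library construction. The two MoCODE barcode sequences were generated by digesting the PCR product with a single endonuclease.
1) PCR template preparation
a) Take 1-1.5 ml of TCT/LCT (Thin-Cytologic Test/Liquid-based cytologic test) cell preservation fluid from the sample to be tested, centrifuge and discard the supernatant, resuspend in 200 ml of PBS, and extract DNA with the DNeasy Blood & Tissue Kit (QIAGEN, Germany).
b) Measure the concentration of the extracted DNA with a Qubit fluorometer.
c) Bisulfite-convert the extracted DNA with the EZ DNA Methylation-Gold Kit (ZYMO, USA).
d) Measure the concentration of the converted DNA with a Qubit fluorometer.
e) Adjust the concentration of the bisulfite-converted DNA to 10 ng/μl with water.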
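2) Multiplex PCR
a) PCR reaction mix
Component  Volume
Nuclease-free water  17.5 μl
2x KOD-Multi Epi PCR master mix (TOYOBO)  25 μl
Primer mix (10 μM)  1.5 μl
Bisulfite-treated genomic DNA  5 μl (50 ng)
KOD-Multi & Epi (TOYOBO)  1 μl
Total volume  50 μl
b) PCR program:
Step 1: 94°C, 2 minutes;
Step 2: 6 cycles (98°C, 10 seconds; 59°C, 5 seconds; 68°C, 5 seconds);
Step 3: 35 cycles (98°C, 10 seconds; 64°C, 5 seconds; 68°C, 5 seconds);
Step 4: 68°C, 1 minute;
Step 5: Hold at 8°C.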
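3) Purification of the multiplex PCR product with AMPure XP magnetic beads (Beckman Coulter, USA)
a) Purify the PCR product with 75 μl of magnetic beads (1.5x).
b) Elute the purified product in 15 μl of water.
c) Measure the concentration of the purified PCR product with a Qubit fluorometer.
d) Adjust the product concentration to 20 ng/μl with water.
4) Treatment of the purified PCR product with Endonuclease V (NEB, USA) (a schematic of the resulting product structure is shown in Figure 5A)
Component  Volume
10x Buffer 4 (NEB)  2 μl
Endonuclease V (NEB, 10 U/μl)  1 μl
Purified PCR product  5 μl (100 ng)
Nuclease-free water  12 μl
Total volume  20 μl
Incubate in a thermal cycler at 37°C for 30 minutes.
Incubate at 65°C for 20 minutes to inactivate the enzyme.
Purify the reaction mixture with AMPure XP magnetic beads (1.5x) and elute in 13 μl of water.
5) Adapter ligation
a) Adapter preparation (schematics of the adapter structures are shown in Figures 5B-C)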
Figure PCTCN2021143948-appb-000010
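Incubate in a thermal cycler at 82°C for 2 minutes.
Cool to 25°C at a rate of 0.1°C per 3 seconds.
Annealing program: 82°C, 2 minutes; 570 x {82°C, 3 seconds, -0.1°C/cycle}; 4°C hold.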
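b) Ligation reaction
Component  Volume
10x T4 DNA ligase buffer (NEB)  2 μl
Purified digested PCR product  13 μl
Upstream adapter (10 μM)  2 μl
Downstream adapter (10 μM)  2 μl
T4 DNA ligase (NEB, 200 U/μl)  1 μl
Total volume  20 μl
Mix the reaction gently by pipetting up and down, then centrifuge briefly.
Incubate at room temperature for 15 minutes.
Purify the ligation mixture with AMPure XP magnetic beads (1.2x) and elute in 10 μl of water.
6) Measuring library concentration
a) Take 1 μl of the purified ligation product and prepare a 10-fold serial dilution (1:10 to 1:10,000).
b) Determine the concentration of the 1:10,000 dilution with the Kapa library quantification kit.
c) Adjust the library concentration to 4 nM with water.
d) Sequence on an Illumina sequencing platform.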
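7) Sequencing results
Raw .fastq files from Illumina paired-end sequencing were assembled into complete sequenced segments with the PEAR software. Each assembled read was compared with the target segment sequences; a read of the expected length produced by a correctly paired primer set was counted as on-target, and the on-target rate is the proportion of on-target reads among the total reads.
  Sample 1  Sample 2
Total reads  1225399  1143004
On-target rate  98.0%  98.2%
8) PCR primer sequences used in this example
The sequences are shown below; from left to right and top to bottom they are Seq ID No: 57-104.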
Figure PCTCN2021143948-appb-000011
Figure PCTCN2021143948-appb-000012
Figure PCTCN2021143948-appb-000013
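I: dITP
The underlined sequence fragments are the specific target gene sequences
9) Adapter sequences used in this example
The sequences are shown below; in order, they are Seq ID No: 105-108.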
Figure PCTCN2021143948-appb-000014
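[i5]/[i7] denotes an 8 nt Illumina index tag sequence
10) MoCODE barcode sequences and MoCODE barcode decoding sequences used in this example
The sequences are shown below; in order, they are Seq ID No: 109-112.
  MoCODE barcode sequence (5'>3')  MoCODE barcode decoding sequence (5'>3')
Upstream adapter  CACAT (Seq ID No: 109)  ATGTG (Seq ID No: 110)
Downstream adapter  CGGAA (Seq ID No: 111)  TTCCG (Seq ID No: 112)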
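One practical property of non-random barcodes is that they can be screened in advance for unwanted pairings. The check below is an assumption about good design practice rather than a requirement stated in the text: it tests whether either barcode is self-complementary (which would let products ligate to each other) or whether the two barcodes are complementary to one another (which would let the two ends of a product ligate together).

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'>3'."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

barcodes = {"upstream": "CACAT", "downstream": "CGGAA"}

# A self-complementary overhang can anneal to another copy of itself.
self_complementary = {
    name: reverse_complement(seq) == seq for name, seq in barcodes.items()
}

# Mutually complementary overhangs would let the two product ends pair.
cross_complementary = (
    reverse_complement(barcodes["upstream"]) == barcodes["downstream"]
)

print(self_complementary, cross_complementary)
```

Both checks come out negative for the Example 4 barcodes. Odd-length overhangs such as these 5-nt ones can never be self-complementary, because the middle base would have to pair with itself.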

Claims (10)

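  1. A method for constructing a multiplex PCR library for high-throughput targeted sequencing, characterized in that multi-base MoCODE barcodes are added to the specific amplification products and the MoCODE barcodes are used to ligate the amplification products efficiently to sequencing adapters containing MoCODE barcode decoding sequences, wherein the MoCODE barcode refers to the protruding single-stranded nucleotide sequences that make up the two sticky ends of the PCR product obtained after the multiplex PCR product is digested with a specific endonuclease, and the MoCODE barcode decoding sequence is a nucleotide sequence complementary to the MoCODE barcode.
  2. The method of claim 1, wherein the MoCODE barcode is generated by one or more of: modified nucleotides, a nicking enzyme, an endonuclease, chemical modification, photocleavable bases, and the like; preferably, the modified nucleotides include one or more of dUTP, dITP, and RNA bases.
  3. The method of claim 1 or 2, wherein the MoCODE barcodes within a molecule may be identical or different.
  4. The method of any one of claims 1-3, wherein the MoCODE barcode is a non-random specific barcode.
  5. The method of any one of claims 1-4, wherein the MoCODE barcode is 2-20 nt long; preferably, the MoCODE barcode decoding sequence is complementary to the MoCODE barcode sequence and is 2-20 nt long.
  6. The method of any one of claims 1-5, wherein the sequencing adapter may be artificially designed and synthesized, or may match a fragment sequence of the target segment itself; preferably, the sequencing adapter may be a single adapter or a bidirectional adapter; preferably, the enrichment of each specific segment can be decoded by single-adapter decoding, dual-adapter decoding, or automatic circularization decoding.
  7. A primer for multiplex PCR for high-throughput targeted sequencing, characterized in that the primer comprises a MoCODE barcode generation sequence; preferably, the sequence of the primer comprises a sequence selected from those shown in Seq ID No: 1-22, 27-52, 53, 55, 57-104, 109, 111.
  8. A sequencing adapter for multiplex PCR for high-throughput targeted sequencing, characterized in that the sequencing adapter comprises a MoCODE barcode decoding sequence; preferably, the sequencing adapter further comprises one or more of a sequencing adapter of the sequencing platform and an index tag; preferably, the sequencing adapter comprises a high-throughput sequencing universal sequence, an index tag, and the MoCODE barcode decoding sequence; preferably, the sequence of the sequencing adapter comprises a sequence selected from those shown in Seq ID No: 23-26, 54, 56, 105-108, 110, 112.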
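  9. A method for constructing a multiplex PCR library for high-throughput targeted sequencing, characterized in that the method comprises the following steps:
    1) extracting DNA from the sample to be tested;
    2) performing a multiplex PCR reaction in which each primer contains a specific MoCODE barcode generation sequence; preferably, the primer also contains a gene-specific sequence;
    3) purifying the PCR product of step 2) by the magnetic bead method;
    4) producing 5' and 3' sticky ends on the purified PCR product of step 3), and generating a MoCODE barcode at the 5' and/or 3' sticky end;
    5) purifying the MoCODE-barcode-containing PCR product of step 4) by the magnetic bead method;
    6) ligating the purified MoCODE-barcode-containing PCR product of step 5) to sequencing adapters, the sequencing adapters containing MoCODE barcode decoding sequences complementary to the MoCODE;
    7) purifying the ligation product of step 6) with magnetic beads, completing construction of the multiplex PCR library for high-throughput targeted sequencing.
  10. The method of claim 9, wherein the MoCODE barcode in step 4) is generated by one or more of: modified nucleotides, a nicking enzyme, an endonuclease, chemical modification, photocleavable bases, and the like; preferably, the modified nucleotides include one or more of dUTP, dITP, and RNA bases; more preferably, the MoCODE barcode is generated by enzymatic digestion with a specific endonuclease;
    preferably, in step 4) one MoCODE barcode is generated at each of the 5' and 3' sticky ends, and the MoCODE barcodes at the 5' and 3' sticky ends may be the same or different;
    preferably, the sequencing adapter in step 6) may be a single adapter, a bidirectional adapter, or a circularization adapter.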
PCT/CN2021/143948 2020-12-31 2021-12-31 Method for constructing multiplex PCR library for high-throughput targeted sequencing WO2022144003A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202180088322.4A CN116888276A (zh) 2020-12-31 2021-12-31 Method for constructing multiplex PCR library for high-throughput targeted sequencing
US18/270,492 US20240076653A1 (en) 2020-12-31 2021-12-31 Method for constructing multiplex pcr library for high-throughput targeted sequencing

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011628234 2020-12-31
CN202011628234.2 2020-12-31

Publications (1)

Publication Number Publication Date
WO2022144003A1 true WO2022144003A1 (zh) 2022-07-07

Family

ID=82260289

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/143948 WO2022144003A1 (zh) 2020-12-31 2021-12-31 Method for constructing multiplex PCR library for high-throughput targeted sequencing

Country Status (3)

Country Link
US (1) US20240076653A1 (zh)
CN (1) CN116888276A (zh)
WO (1) WO2022144003A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115992243A (zh) * 2022-11-11 2023-04-21 深圳凯瑞思医疗科技有限公司 Primer combination, kit and library construction method for detecting ovarian cancer

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5508169A (en) * 1990-04-06 1996-04-16 Queen's University At Kingston Indexing linkers
US6045994A (en) * 1991-09-24 2000-04-04 Keygene N.V. Selective restriction fragment amplification: fingerprinting
WO2001075154A2 (de) * 2000-04-03 2001-10-11 Axaron Bioscience Ag Neue verfahren zur parallelen sequenzierung eines nukleinsäuregemisches an einer oberfläche
US20030232348A1 (en) * 2002-06-17 2003-12-18 Affymetrix, Inc. Complexity management of genomic DNA by locus specific amplification
US20060024681A1 (en) * 2003-10-31 2006-02-02 Agencourt Bioscience Corporation Methods for producing a paired tag from a nucleic acid sequence and methods of use thereof
CN101374963A (zh) * 2005-12-22 2009-02-25 凯津公司 Method for AFLP-based high-throughput polymorphism detection
US20090092967A1 (en) * 2006-06-26 2009-04-09 Epoch Biosciences, Inc. Method for generating target nucleic acid sequences
CN102373287A (zh) * 2011-11-30 2012-03-14 盛司潼 Method and kit for detecting lung cancer susceptibility genes
WO2018040961A1 (zh) * 2016-08-30 2018-03-08 广州康昕瑞基因健康科技有限公司 Library construction method and SNP typing method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110734908B (zh) * 2019-11-15 2021-06-08 福州福瑞医学检验实验室有限公司 Method for constructing high-throughput sequencing library and kit for library construction

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
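CN115992243A (zh) * 2022-11-11 2023-04-21 深圳凯瑞思医疗科技有限公司 Primer combination, kit and library construction method for detecting ovarian cancer
CN115992243B (zh) * 2022-11-11 2024-01-26 深圳凯瑞思医疗科技有限公司 Primer combination, kit and library construction method for detecting ovarian cancer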

Also Published As

Publication number Publication date
CN116888276A (zh) 2023-10-13
US20240076653A1 (en) 2024-03-07

Similar Documents

Publication Publication Date Title
US11697843B2 (en) Methods for creating directional bisulfite-converted nucleic acid libraries for next generation sequencing
US20220017893A1 (en) Capture methodologies for circulating cell free dna
JP7535611B2 (ja) Library preparation methods and compositions and uses therefor
CN109511265B (zh) Method for improving sequencing by strand identification
US20220364169A1 (en) Sequencing method for genomic rearrangement detection
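CN111801427B (zh) Generation of single-stranded circular DNA templates for single molecules
CN110603326A (zh) Method for amplifying target nucleic acid
WO2022144003A1 (zh) Method for constructing multiplex PCR library for high-throughput targeted sequencing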
EP3601611B1 (en) Polynucleotide adapters and methods of use thereof
US20220348940A1 (en) Method for introducing mutations
CA3223987A1 (en) Methods, compositions, and kits for preparing sequencing library
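CN110468179A (zh) Method for selectively amplifying nucleic acid sequences
KR20230124636A (ko) Compositions and methods for highly sensitive detection of target sequences in multiplex reactions
JP2022546485A (ja) Compositions and methods for high-precision tumor assays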
US12091715B2 (en) Methods and compositions for reducing base errors of massive parallel sequencing using triseq sequencing
WO2018009677A1 (en) Fast target enrichment by multiplexed relay pcr with modified bubble primers
CN115279918A (zh) Novel nucleic acid template structure for sequencing
KR20230028450A (ko) Comprehensive enrichment of amplicons

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21914731

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 202180088322.4

Country of ref document: CN

WWE Wipo information: entry into national phase

Ref document number: 18270492

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 10/10/2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21914731

Country of ref document: EP

Kind code of ref document: A1