WO2016181128A1 - Methods, compositions, and kits for preparing sequencing library

Methods, compositions, and kits for preparing sequencing library

Publication number
WO2016181128A1
Authority
WO
WIPO (PCT)
Prior art keywords
primers
target
primer
sequence
target specific
Prior art date
Application number
PCT/GB2016/051335
Other languages
French (fr)
Inventor
Guoliang Fu
Original Assignee
Genefirst Ltd
Priority date
Filing date
Publication date
Priority claimed from GBGB1507978.3A
Priority claimed from GBGB1517255.4A
Priority claimed from GBGB1600415.2A
Application filed by Genefirst Ltd
Publication of WO2016181128A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6844: Nucleic acid amplification reactions
    • C12Q1/6853: Nucleic acid amplification reactions using modified primers or templates

Definitions

  • This invention relates to methods, compositions and kits for making a target enriched sequencing library from one or more samples involving linear amplification and tagging two strands of target sequence.
  • the sequencing library is suitable for massively parallel sequencing and comprises a plurality of double-stranded nucleic acid molecules.
  • Duplex sequencing (Schmitt, et al PNAS 109: 14508-14513) is one of them. This approach greatly reduces errors by independently tagging and sequencing each of the two strands of a DNA duplex. As the two strands are complementary, true mutations are found at the same position in both strands. In contrast, PCR or sequencing errors result in mutations in only one strand and can thus be discounted as technical error.
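The strand comparison described above can be sketched in Python as follows. This is only an illustration, not the published duplex sequencing pipeline. It assumes single-base substitutions only, that variants have already been called per strand, and that each variant is a hypothetical `(position, ref, alt)` tuple in a shared coordinate system, with second-strand variants reported in that strand's own bases. The names `to_first_strand` and `duplex_confirmed` are invented for this example.

```python
# Complement table for the four DNA bases (substitutions only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_first_strand(variant):
    """Express a second-strand substitution in first-strand bases."""
    position, ref, alt = variant
    return (position, ref.translate(COMPLEMENT), alt.translate(COMPLEMENT))


def duplex_confirmed(first_strand_variants, second_strand_variants):
    """Keep only substitutions seen at the same position on both strands."""
    converted = {to_first_strand(v) for v in second_strand_variants}
    return set(first_strand_variants) & converted


# Hypothetical calls: position 100 is supported by both strands,
# position 150 appears on one strand only and is discarded.
first = {(100, "C", "T"), (150, "G", "A")}
second = {(100, "G", "A")}
confirmed = duplex_confirmed(first, second)
```

A variant that survives this intersection is supported by both strands; anything else is treated as a candidate PCR or sequencing error.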
  • Safe-Sequencing System
  • the keys to this approach are (i) assignment of a unique identifier (UID) to each template molecule, (ii) amplification of each uniquely tagged template molecule to create UID families, and (iii) redundant sequencing of the amplification products. PCR fragments with the same UID are considered mutant ("supermutants") only if 95% of them contain the identical mutation.
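The UID family rule above can be illustrated with a short Python sketch. It assumes reads at a single position are supplied as hypothetical `(uid, base)` pairs, and it interprets "95% of them" as "at least 95% of the reads in the family"; the function name `supermutant_uids` and the example UIDs are invented for illustration.

```python
from collections import Counter, defaultdict


def supermutant_uids(reads, ref_base, min_fraction=0.95):
    """Return {uid: variant_base} for UID families that qualify as supermutants.

    reads: iterable of (uid, base) pairs observed at one position.
    A family qualifies when at least min_fraction of its reads carry
    the same non-reference base.
    """
    families = defaultdict(list)
    for uid, base in reads:
        families[uid].append(base)

    result = {}
    for uid, bases in families.items():
        variant_counts = Counter(b for b in bases if b != ref_base)
        if not variant_counts:
            continue  # family matches the reference everywhere
        base, count = variant_counts.most_common(1)[0]
        if count / len(bases) >= min_fraction:
            result[uid] = base
    return result


# Hypothetical data: family "AAC" is uniformly mutant, family "GGT" is mixed.
reads = [("AAC", "T")] * 20 + [("GGT", "T")] * 5 + [("GGT", "C")] * 15
calls = supermutant_uids(reads, ref_base="C")
```

Here only the `AAC` family is reported, because the `T` base in the `GGT` family is present in just 25% of its reads.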
  • UID unique identifier
  • US Patents US8,722,368, US8,685,678, US8,742,606 describe methods of sequencing polynucleotides with an attached degenerate base region to determine/estimate the number of different starting polynucleotides.
  • these methods do not compare sequence information of the original two strands and involve ligation and PCR to attach the degenerate base region.
  • Targeted next generation sequencing often involves the analysis of large complex fragments and this is achieved by multiplex PCR (the simultaneous amplification of different target DNA sequences in a single PCR reaction).
  • Results obtained with multiplex PCR are often complicated by artifacts of the amplification products. These include false negative results due to reaction failure and false-positive results (such as amplification of spurious products) due to non-specific priming events. Since the possibility of non-specific priming increases with each additional primer pair, conditions must be modified as necessary as individual primer sets are added.
  • sample refers to any substance containing or presumed to contain nucleic acids and includes a sample of tissue or fluid isolated from an individual or individuals.
  • the nucleic acid sample may be obtained from an organism selected from viruses, bacteria, fungi, plants, and animals.
  • the nucleic acid sample is obtained from a mammal.
  • the mammal is human.
  • the nucleic acid sample can be obtained from a specimen of body fluid or tissue biopsy of a subject, or from cultured cells.
  • the body fluid may be selected from whole blood, serum, plasma, urine, sputum, bile, stool, bone marrow, lymph, semen, breast exudate, bile, saliva, tears, bronchial washings, gastric washings, spinal fluids, synovial fluids, peritoneal fluids, pleural effusions, and amniotic fluid.
  • an "individual sample" may be a single cell, which can be one T cell or one B cell, while the plurality of samples may be many blood cells in a blood sample.
  • nucleotide sequence refers to either a homopolymer or a heteropolymer of deoxyribonucleotides, ribonucleotides or other nucleic acids.
  • nucleotide generally refers to the monomer components of nucleotide sequences even though the monomers may be nucleoside and/or nucleotide analogs, and/or modified nucleosides such as amino modified nucleosides in addition to nucleotides.
  • nucleotide includes non-naturally occurring analog structures.
  • nucleic acid refers to at least two nucleotides covalently linked together.
  • a nucleic acid of the present invention will generally contain phosphodiester bonds, although in some cases nucleic acid analogs are included that may have alternate backbones.
  • Nucleic acids may be single-stranded or double-stranded, as specified, or contain portions of both double-stranded and single-stranded sequence.
  • the nucleic acid may be DNA, both genomic and cDNA, RNA or DNA-RNA hybrids, where the nucleic acid contains any combination of deoxyribo- and ribo-nucleotides, and any combination of bases, including uracil, adenine, thymine, cytosine, guanine, inosine, xanthine, hypoxanthine, etc.
  • Reference to a "DNA sequence" can include both single-stranded and double-stranded DNA. A specific sequence, unless the context indicates otherwise, refers to the single stranded DNA of such sequence, the duplex of such sequence with its complement (double stranded DNA) and/or the complement of such sequence.
  • polynucleotide and “oligonucleotide” are types of “nucleic acid”, and generally refer to primers, oligomer fragments to be detected. There is no intended
  • nucleic acid refers to any nucleic acid
  • polynucleotide and oligonucleotide
  • Nucleic acids also include nucleic acid analogs.
  • the oligonucleotide is not necessarily physically derived from any existing or natural sequence but may be generated in any manner, including chemical synthesis, DNA replication, reverse transcription or a combination thereof.
  • As used herein, the terms "target sequence", "target nucleic acid", "target nucleic acid sequence" and "nucleic acids of interest" are used interchangeably and refer to a desired region which is to be either amplified, detected or both, or is the subject of hybridization with a complementary oligonucleotide or polynucleotide, e.g., a blocking oligomer, or the subject of a primer extension process.
  • the target sequence can be composed of DNA, RNA, analogs thereof, or combinations thereof.
  • the target sequence can be single-stranded or double-stranded.
  • the target nucleic acid which forms a hybridization duplex with the primer may also be referred to as a "template.”
  • a template serves as a pattern for the synthesis of a complementary polynucleotide.
  • a target sequence for use with the present invention may be derived from any living or once living organism, including but not limited to prokaryotes, eukaryotes, plants, animals, and viruses, as well as synthetic and/or recombinant target sequences.
  • The term "primer" may refer to more than one primer and refers to an oligonucleotide, whether occurring naturally or produced synthetically, which is capable of acting as a point of initiation of synthesis when placed under conditions in which synthesis of a primer extension product complementary to a nucleic acid strand is induced, i.e., in the presence of nucleotides and an agent for polymerization such as DNA polymerase, at a suitable temperature and in a suitable buffer.
  • Such conditions include the presence of four different deoxyribonucleoside triphosphates and a polymerization-inducing agent such as DNA
  • the primer is preferably single-stranded for maximum efficiency in amplification.
  • the primers herein are selected to be substantially complementary to a strand of each specific sequence to be amplified. This means that the primers must be sufficiently complementary to hybridize with their respective strands.
  • a non-complementary nucleotide fragment may be attached to the 5'-end of the primer (5' tail portion) or in the primer (bulge portion), with the remainder of the primer sequence being complementary to the desired section of the target base sequence.
  • the primers are complementary, except when non-complementary nucleotides may be present at a predetermined primer terminus or middle region as described.
  • the primers herein are selected to be substantially identical to a strand of each specific sequence to be amplified. This means that the primers must be sufficiently identical to one strand, so that they can hybridize with their respective other strands.
  • the term "complementary" refers to the ability of two nucleotide sequences to bind sequence-specifically to each other by hydrogen bonding through their purine and/or pyrimidine bases according to the usual Watson-Crick rules for forming duplex nucleic acid complexes. It can also refer to the ability of nucleotide sequences that may include modified nucleotides or analogues of deoxyribonucleotides and ribonucleotides to bind sequence-specifically to each other by other than the usual Watson-Crick rules to form alternative nucleic acid duplex structures.
  • "hybridization" and "annealing" are interchangeable, and refer to the process by which two nucleotide sequences complementary to each other bind together to form a duplex sequence or segment.
  • duplex and “double-stranded” are interchangeable, meaning a structure formed as a result of hybridization between two complementary sequences of nucleic acids.
  • duplexes can be formed by the complementary binding of two DNA segments to each other, two RNA segments to each other, or of a DNA segment to an RNA segment, the latter structure being termed as a hybrid duplex.
  • Either or both members of such duplexes can contain modified nucleotides and/or nucleotide analogues as well as nucleoside analogues.
  • such duplexes are formed as the result of binding of one or more blocking oligonucleotides to a sample sequence.
  • As used herein, the terms "wild-type nucleic acid", "normal nucleic acid", "nucleic acid with normal nucleotides", "wild-type DNA" and "wild-type template" are used interchangeably and refer to a polynucleotide which has a nucleotide sequence that is considered to be normal or unaltered.
  • mutant polynucleotide refers to a polynucleotide which has a nucleotide sequence that is different from the nucleotide sequence of the corresponding wild-type polynucleotide.
  • the difference in the nucleotide sequence of the mutant polynucleotide as compared to the wild-type polynucleotide is referred to as the nucleotide "mutation", "variant nucleotide" or "variation."
  • variant nucleotide(s) also refers to one or more nucleotide(s) substitution, deletion, insertion, methylation, and/or modification changes.
  • Amplification denotes the use of any amplification procedures to increase the concentration of a particular nucleic acid sequence within a mixture of nucleic acid sequences.
  • reaction mixture refers to a mixture of components necessary to amplify at least one product from nucleic acid templates.
  • the mixture may comprise nucleotides (dNTPs), a thermostable polymerase, primers, and a plurality of nucleic acid templates.
  • the mixture may further comprise a Tris buffer, a monovalent salt and Mg2+.
  • concentration of each component is well known in the art and can be further optimized by an ordinary skilled artisan.
  • amplified product or "amplicon” refer to a fragment of DNA amplified by a polymerase using single-side primers or a pair of primers in an amplification method.
  • primer extension product refers to a fragment of DNA extended by a polymerase using one or a pair of primers in a reaction, which may involve one pass extension, for example first strand cDNA synthesis, or two pass extension, for example double strand cDNA syntheses, or many cycles of extension, which may be a PCR.
  • the invention provides methods of processing target nucleic acids from one or more samples, wherein a target nucleic acid in a sample may be a single-stranded molecule (which is referred to as the first strand, wherein its complement is referred to as the second strand) or a double-stranded duplex which comprises a first strand and a complementary second strand, wherein the method comprises:
  • reaction mixture(s) each comprising a first set of multiple target specific primers annealing to multiple target sequences, for any particular target sequence, forward primers are designed to hybridise to the first strands of the target sequences, reverse primers are designed to hybridise to the second strands of the target sequences, wherein for one reaction the set of the target specific primers comprises either forward primers or reverse primers but not both;
  • step (d) PCR amplifying the products of step (c) using primers to generate double-stranded PCR products, wherein the product of this step may be used directly for sequencing,
  • the method may further comprise step (e) processing the PCR products of step (d) to complete the sequencing library preparation for massive parallel sequencing such as a NGS platform.
  • step (c) and/or step (e) may comprise removing the unreacted primers, wherein the removing of the unreacted primers may comprise purifying the single-stranded linear amplification products of step (b) or the double-stranded product of step (e); for example, a bead- or column-based method is used to remove unreacted primers.
  • the removing of the unreacted primers may comprise treating the amplification products by enzymatic digestion to remove the unreacted primers, wherein the enzymatic digestion may be exonuclease I digestion.
  • the step (c) may comprise hybridising the single-stranded amplification products to a second set of target-specific primers.
  • the hybridised target-specific primers of the second set of primers may be extended on the single-stranded amplification products (one pass extension).
  • the target-specific primer may comprise an affinity label or 5' universal tail portion, wherein the 5' universal tail portion of the hybridised target-specific primers is hybridised with an affinity-labelled oligonucleotide complementary to the 5' universal tail.
  • the affinity label may be biotin, in which case the complex of the hybridised amplification products/target-specific oligonucleotides/biotin-labelled oligonucleotide is captured by avidin solid supports.
  • the target specific primer may comprise a 5' tail portion and a 3' target complementary portion (Fig. 4).
  • the target specific primer may comprise a bulge portion which is non-complementary to a target sequence and is located between the two target complementary portions; in other words, the bulge portion interrupts the target complementary portion of a primer, creating a complementary 5' part and 3' part with a non-complementary bulge in the middle.
  • the bulge portion may be located 4-15 nucleotides from the 3' end, preferably 5-10 nucleotides away from the 3' end.
  • although a primer may have a bulge portion, it can still function as a primer.
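As a rough illustration of the bulge layout described above, the following Python sketch splits a target-complementary sequence and places a non-complementary bulge a given number of nucleotides from the 3' end. The function name `insert_bulge`, the placeholder sequences, and the decision to reject distances outside the stated 4-15 nucleotide range are assumptions made for this example only.

```python
def insert_bulge(target_complementary, bulge, distance_from_3prime):
    """Place a bulge inside a primer, measured from the primer's 3' end.

    target_complementary: primer sequence matching the target, written 5'->3'.
    bulge: non-complementary sequence to insert (for example an RSI as N's).
    distance_from_3prime: number of complementary bases kept 3' of the bulge.
    """
    if not 4 <= distance_from_3prime <= 15:
        raise ValueError("bulge should sit 4-15 nucleotides from the 3' end")
    if distance_from_3prime >= len(target_complementary):
        raise ValueError("no complementary 5' part would remain")
    split = len(target_complementary) - distance_from_3prime
    five_prime_part = target_complementary[:split]
    three_prime_part = target_complementary[split:]
    return five_prime_part + bulge + three_prime_part


# Placeholder sequences, not taken from the patent.
primer = insert_bulge("AAAAAAAAAACCCCC", "NNNN", 5)
```

In this placeholder case the result keeps ten complementary bases on the 5' side and five on the 3' side of the bulge.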
  • the 5' tail portion or bulge portion not complementary to the target sequence may comprise a random sequence identifier (RSI), and/or a sequence compatible with an NGS platform, which may comprise universal PCR primer sequence, NGS sequencing primer sequence, and/or NGS adaptor sequences.
  • RSI random sequence identifier
  • first set of target-specific primers are present in a reaction, wherein the target-specific primer in the first set is capable of hybridising to the first strand of a target duplex. Since there is no primer pair annealing to both strands of a target sequence, linear amplification occurs in step (b).
  • in step (c) and/or step (d) there may be a second set of the target-specific primers present in the reaction for enriching, one-pass extension of, or amplifying the products. The second set of primers is capable of hybridising to the extension strands generated from the first set of primers.
  • the target-specific primers in the first set or second set may comprise a random sequence identifier (RSI) which is located between the 5' tail portion and the 3' target complementary portion or in the bulge portion, wherein the RSI portion comprises at least three random or degenerate nucleotides, wherein during step (b) and/or step (c) the RSI assigns each extended strand a unique sequence identifier such that during sequence analysis based on the unique RSI, the sequences sharing the same RSI are grouped into a family.
  • the random sequence identifier may comprise a sequence that is between approximately 3 and 20 nucleotides in length.
  • the RSI portion may comprise at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 random or degenerate nucleotides, wherein during linear amplification step (b) or step (c) the RSI assigns each amplified strand a unique sequence identifier such that during sequence analysis based on the unique RSI, the sequences sharing the same RSI are grouped into a family (Fig. 3).
  • the random sequence identifier may comprise degenerate or semi-degenerate or completely random nucleic acid sequence.
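To give a feel for why RSI length matters, the Python sketch below computes how many distinct identifiers a fully random RSI of a given length can encode, and the chance that at least two molecules receive the same RSI. It assumes each position is drawn uniformly and independently from A, C, G and T, which real degenerate oligo synthesis only approximates; the function names are invented for this example.

```python
import random


def rsi_diversity(length):
    """Number of distinct fully random RSIs of the given length."""
    return 4 ** length


def collision_probability(num_molecules, length):
    """Probability that at least two molecules share an RSI (birthday problem)."""
    total = rsi_diversity(length)
    p_all_unique = 1.0
    for i in range(num_molecules):
        p_all_unique *= max(0.0, 1 - i / total)
    return 1 - p_all_unique


def random_rsi(length, rng=random):
    """Draw one RSI, assuming uniform and independent bases."""
    return "".join(rng.choice("ACGT") for _ in range(length))


shortest = rsi_diversity(3)    # minimum length named in the text
longer = rsi_diversity(15)
two_molecule_clash = collision_probability(2, 3)
```

Under these assumptions a 3-nucleotide RSI distinguishes only 64 molecules, so longer identifiers are needed when many template molecules of the same target are tagged.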
  • the target specific primer may comprise a multiplex identifier (MID) located between the 5' tail portion and the 3' target complementary portion or in the bulge portion, wherein MID is used to identify a sample when multiple samples are sequenced together.
  • the target specific primer may comprise both RSI and MID located between the 5' tail portion and the 3' target complementary portion. The relative order of RSI and MID is not critical. It may be preferred that the MID is located 3' of the RSI such that a primer may have the structure 5' tail portion - RSI - MID - 3' target complementary portion (Fig. 2C).
  • the step (c) may comprise purifying the single-stranded linear amplification products of step (b). For example, enzymatic digestion, or a bead- or column-based method, may be used to remove unreacted primers. Any method can be used; a preferred method uses Agencourt AMPure XP beads from Beckman Coulter. After digestion or purification, the purified product may proceed immediately to step (d). In step (d), the PCR primers may comprise a second set of target-specific primers annealing to the linear amplification product, and a universal primer which is related to the 5' tail portion of the primers of the first set.
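The layout 5' tail portion - RSI - MID - 3' target complementary portion can be sketched in Python as below. The sketch writes the RSI as a run of `N` (degenerate) positions and shows how a read starting with a copied primer could be split back into its parts when the segment lengths are known. All sequences, lengths and function names are placeholders chosen for illustration, not values from the patent.

```python
def build_primer(tail, rsi_length, mid, target_complementary):
    """Assemble a primer design string, 5'->3', with the RSI written as N's."""
    return tail + "N" * rsi_length + mid + target_complementary


def split_read_prefix(read, tail_length, rsi_length, mid_length):
    """Return (rsi, mid, remainder) from a read that starts with the primer."""
    rsi_start = tail_length
    mid_start = rsi_start + rsi_length
    insert_start = mid_start + mid_length
    return (
        read[rsi_start:mid_start],
        read[mid_start:insert_start],
        read[insert_start:],
    )


# Placeholder design: 4-nt tail, 4-nt RSI, 4-nt MID, 4-nt target portion.
design = build_primer("ACGT", 4, "TTAG", "GGCC")

# A hypothetical read in which the RSI positions were filled with CATG.
rsi, mid, remainder = split_read_prefix("ACGTCATGTTAGGGCCAA", 4, 4, 4)
```

During analysis, the extracted MID would assign the read to a sample and the extracted RSI would assign it to a family.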
  • each primer of the first set comprises bulge portion without a 5' tail portion
  • the linear amplification product may be purified, for example beads purification
  • the PCR primers may include a second set of target specific primer annealing to the linear amplification product, and third set of target specific primers related to the 5' part sequence of the bulge primers of first set.
  • related means comprising same sequence or similar sequence, for example similar may mean sharing at least 95, 96, 97, 98 or 99% sequence identity.
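The identity threshold above can be illustrated with a simple Python check. This sketch assumes the two sequences are already aligned, ungapped and of equal length, which is a simplification; a real comparison would normally use an alignment tool. The names `percent_identity` and `is_related` are invented for this example.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of matching positions between two equal-length sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def is_related(seq_a, seq_b, min_identity=95.0):
    """True when the sequences share at least min_identity percent identity."""
    return percent_identity(seq_a, seq_b) >= min_identity


# Placeholder 20-mers: one mismatch gives 95%, two mismatches give 90%.
reference = "ACGTACGTACGTACGTACGT"
one_mismatch = "ACGTACGTACGTACGTACGA"
two_mismatches = "ACGTACGTACGTACGTACAA"
```

With these placeholders, `is_related(reference, one_mismatch)` is true and `is_related(reference, two_mismatches)` is false.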
  • the universal primer is capable of hybridising to the 5' tail portion of primers of first set.
  • the universal primer is capable of hybridising to the 5' tail portion of primers of second set.
  • the universal primer is capable of hybridising to the copied part of the 5' tail portion of the primers of the first set.
  • the universal primer is capable of hybridising to the copied part of the 5' tail portion of the primers of the second set.
  • the step (c) may comprise hybridising the single-stranded single-side amplification products to a second set of multiple target-specific primers which are capable of annealing to the linear amplification products generated from the first set of the target-specific primers.
  • RSI is preferably incorporated into primer extended target nucleic acids in the step (a) and (b), but RSI may be also incorporated into target nucleic acids in the step (c).
  • the target-specific primer in the first set comprises only 3' target
  • each primer in the second set comprises a 5' tail portion or bulge portion, which comprises random sequence identifier (RSI).
  • the annealed primers of the second set may be extended on the templates generated from step (b), wherein the RSI is incorporated into the extended target nucleic acids.
  • the extension may be done once or twice, or more than two times, which may be achieved by temperature cycling through denaturing, annealing and extension.
  • the PCR primers may include third set of target specific primer nested to the first set of target specific primer, and the universal primer related to the 5' tail sequence of the primers of second set if the primers in the second set comprise a 5' tail portion.
  • the PCR primers may include third set of target specific primer nested to the first set of target specific primer, and fourth set of target specific primers related to the 5' part sequence of the bulge primers of second set if the primer in the second set comprises a bulge portion.
  • Nested primers for use in the PCR amplification are oligonucleotides having sequence complementary to a region on a target sequence between reverse and forward primer targeting sites. One primer is called outer primer; its nested primer is called inner primer. The nested inner primer may overlap with its outer primer.
  • the hybridised target-specific primers of the second set may be extended on the templates of the single-stranded single-side amplification products.
  • the extension reaction may be performed in the same reaction vessel as the linear amplification reaction. After linear amplification, with or without removing the unreacted primers of the first set, the target-specific primers of the second set are added into the reaction, heat denatured, and subjected to hybridisation/extension conditions.
  • the extension condition may include the same ingredients in the linear amplification reaction.
  • the extension may be performed at cycling condition to extend the oligonucleotides several times, but preferably the extension is performed only once or twice.
  • the extended double-stranded products may be purified by any means known in the art, for example the Qiagen PCR purification kit or the Agencourt AMPure XP kit.
  • the target-specific primer in the second set may comprise a 5' universal tail, wherein the 5' universal tail portion of the target-specific primers may be hybridised with an affinity-labelled oligonucleotide complementary to the 5' universal tail (Fig. 4D).
  • the affinity label may be biotin, in which case the complex of the linear amplification products/target-specific oligonucleotides/biotin-labelled oligonucleotide may be captured by avidin solid supports.
  • the target specific primer of step (a) may be ordinary primer comprising target complementary sequence only.
  • the target specific primer of step (a) may comprise a 5' tail portion and a 3' target complementary portion.
  • the 3' target complementary portion is used to hybridise to the target sequence and prime DNA synthesis.
  • the 5' tail portion may comprise a random sequence identifier, and/or a sequence compatible with the subsequent amplification and/or sequencing process on an NGS platform (Fig. 4).
  • the 5' tail portion may comprise sequence compatible to the primer used in the NGS.
  • the target specific primer may comprise a 3' target complementary portion, which is disrupted by a bulge portion located at 3, 4, 5, 6, 7, 8, 9, 10 or more nucleotides from 3' end of the primer.
  • the bulge portion may have any desired length.
  • the bulge portion may be solely RSI, which is 3-20 nucleotides long.
  • the 5' tail portion or the bulge portion is not complementary to the initial target sequence (Fig. 4).
  • the 5' tail portion or the bulge portion of the primer may comprise random sequence identifier or/and sequence compatible to a NGS platform.
  • the hybridisation creates an unpaired base bulge.
  • the bulge portion like the 5' tail portion may comprise restriction site, RSI, MID or sequence compatible for NGS sequencing, such as sequencing primer.
  • in step (a), only primers on one side of a particular target are present in the reaction, so that single-stranded linear amplification products are generated in step (b).
  • first strand the target specific forward primers complementary to the RNA template may be present in the reaction, but no reverse primers are in the same reaction.
  • double stranded DNA templates the target specific forward primers complementary to the first strands of the DNA templates are present in forward reaction, but no reverse primers are in the same forward reaction.
  • the target specific reverse primers complementary to the second strands of the DNA templates are present in the reverse reaction, but no forward primers are present in the same reverse reaction (Fig. 3).
  • step (b) is carried out.
  • the linear single-side amplification can be isothermal amplification.
  • the linear single-side amplification is a thermal cycling amplification involving temperature cycling, including a denaturing step and an annealing/extension step.
  • the cycle number can be any suitable number, which may be between 1-100 cycles, for example 1 cycle, 2 cycles, 3 cycles, 10 cycles, 15 cycles, 20 cycles, 25 cycles, 30 cycles, 35 cycles, 40 cycles, 45 cycles, 50 cycles, 60 cycles or 100 cycles.
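The practical difference between single-side (linear) amplification and two-primer PCR can be seen from an idealised yield calculation, sketched here in Python. It assumes a constant per-cycle efficiency and unlimited reagents, which real reactions do not have; the function names and the example numbers are illustrative assumptions.

```python
def linear_copies(templates, cycles, efficiency=1.0):
    """New strands made by single-side extension.

    Each cycle copies only the original templates, so the yield grows
    in proportion to the cycle number.
    """
    return templates * cycles * efficiency


def exponential_copies(templates, cycles, efficiency=1.0):
    """Total molecules after two-primer PCR.

    Products of earlier cycles also serve as templates, so the yield
    multiplies by (1 + efficiency) every cycle.
    """
    return templates * (1 + efficiency) ** cycles


# Idealised example with 100 starting templates.
linear_yield = linear_copies(100, 30)
pcr_yield = exponential_copies(100, 10)
```

Because every linear product is copied directly from an original template strand, an error introduced in one copy is not propagated into later copies, unlike in exponential PCR.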
  • after step (b), the reaction can proceed immediately to step (d) without any purification or enrichment step. It is preferred that the primers remaining after the reaction of step (b) are kept at a considerably low level, so that they do not interfere with the next step.
  • One way to achieve this is for the primers to be consumed in the linear amplification so that they reach a very low level by its end. For this to happen, the primers added to the starting reaction must be in a very small amount, so that most primers are consumed during linear amplification.
  • an optional purification or enrichment in step (c) may be carried out. Any purification method can be used to remove the unreacted primers, for example using beads to purify. Alternatively, enrichment of desired linear amplification product may be carried out.
  • the step (c) may comprise hybridising the linear amplification products to a second set of multiple target-specific primers.
  • the same second set of the target-specific primers may be used in step (c) and/or step (d).
  • step (d) may use a different set of target-specific primers or may not use target specific primers.
  • the hybridised second set of the target-specific primers may be extended on the templates of the linear amplification products (one pass extension).
  • the extension reaction may be performed in the same reaction.
  • the extended double-stranded products may be purified by any means known in the art.
  • the purified extended products are amplified in step (d).
  • the primers used for amplification may comprise a first universal primer and a second universal primer, wherein the first universal primer comprises a sequence related to the 5' tail portion sequence of primers in the first set, and the second universal primer comprises a sequence related to the 5' tail portion sequence of the second set of the target-specific primers.
  • the primers used for amplification may comprise a universal primer related to the first set of primers and a second set of multiple target specific primers, wherein the second set of multiple target specific primers is capable of hybridising to the extended products of the first set of the primers, and wherein the universal primer comprises a sequence related to the 5' tail portion sequence of primers in the first set.
  • the primers used for amplification may comprise a second set of multiple target specific primers, wherein the second set of multiple target specific primers is capable of hybridising to the extended products of the first set of the primers, and a third set of multiple target specific primers, which are nested primers relative to the first set, or are related to the 5' part of the bulge primers of the first set.
  • the step (c) may comprise exonuclease I treatment or/and purifying the product of step (b) to remove the unreacted primers
  • the purified product of step (c) is amplified by second set of target specific primers comprising 3' priming sequences capable of hybridising to the purified linear amplified product of step (b) and third set of target specific primers comprising 3' priming sequences which are identical or substantially identical to the 5' part of bulge portion of the first set of target specific primers (Fig. 2).
  • the linear amplification products may be enriched by hybridising probes on a solid support.
  • the probes bind the desired linear amplification product specifically.
  • the pairing second set of primers capable of hybridising to the single-stranded linear product of step (b) may be used in step (c) as probes to enrich the target sequence.
  • the term "pairing" means, if one primer is forward primer, the pairing primer is reverse primer.
  • the target specific primers may comprise a 5' tail portion and a 3' target complementary portion (Fig. 1 and 4).
  • An affinity labelled oligonucleotide is complementary to the 5' tail portion or bulge portion of the target specific primers.
  • the affinity label may be biotin.
  • the linear amplification products are hybridised to the target specific primers, which are then hybridised to the biotin-labelled oligonucleotide through the 5' tail portion or bulge portion. The biotin-labelled oligonucleotides are then pulled out by streptavidin beads (Fig. 4D). All unreacted primers, template DNA and non-specific products are removed by the enrichment. In particular, if the primers in the forward reaction are forward primers, the linear amplification product from the forward reaction may be enriched by hybridising to the target specific reverse primers, which either comprise an affinity label, or comprise a 5' tail portion/bulge portion which is hybridised to a universal oligonucleotide which comprises an affinity label.
  • the capture of the linear amplification products can be performed either on a solid phase or in liquid phase.
  • the capture operation of the enrichment will employ hybridisation to probes representing multiple target nucleic acids.
  • On a solid phase non-binding fragments are separated from binding fragments.
  • Suitable solid supports known in the art include filters, glass slides, membranes, beads, columns, etc.
  • a capture reagent can be added which binds to the probes, for example through a biotin-avidin type interaction. After capture, desired fragments can be eluted for further processing.
  • primers used to generate double stranded PCR products may comprise target specific forward primers and target specific reverse primers. If the primers in the reaction of the step (a) are forward primers, another set of the target specific forward primers of step (d) may be nested primers in terms of forward primers of step (a).
  • primers used to generate double stranded PCR products may comprise a universal primer and a second set of multiple target specific primers. The second set of multiple target specific primers comprises either reverse primers or forward primers but not both, wherein the universal primer comprises sequence related to the 5' tail portion sequence or bulge portion of primers in the first set.
  • the primers used in the forward reaction of step (d) comprise a second set of target specific reverse primers and universal primer, which are capable of targeting the 5' tail portion of the primers used in steps (a and b). If in the reverse reaction of steps (a and b) the target specific primers are reverse primers, which comprise 3' target complementary portion and 5' tail portion, the primers used in the reverse reaction of step (d) comprise a second set of target specific forward primers and universal primer, which are capable of targeting to the 5' tail portion of the primers used in steps (a and b).
  • the single-stranded starting molecule may be RNA, or single-stranded cDNA.
  • the double-stranded duplex may be genomic DNA, or any suitable dsDNA present in a sample.
  • in step (a), the reaction mixtures may comprise two reactions: a forward reaction and a reverse reaction.
  • the forward reaction comprises a first set (forward set) of multiple target specific forward primers annealing to first strands of the multiple target sequences from one sample.
  • the reverse reaction comprises a first set (reverse set) of multiple target specific reverse primers annealing to the second strands of the multiple target sequences from the same one sample.
  • the primers used to generate PCR products may comprise a universal primer targeting the 5' tail portion of the first set of primers and another universal primer targeting the 5' tail portion of the second set of primers, if step (c) comprises enriching the linear amplification products by hybridisation and extension of the second set of target-specific primers.
  • the primers used to generate PCR products in step (d) may comprise a universal primer targeting the 5' tail portion of the first set of primers and a second set of multiple target specific primers annealing to second strands of the multiple target sequences.
  • the primers used to generate PCR products in step (d) may comprise a universal primer targeting the 5' tail portion of the first set of primers and a third set of multiple target specific primers annealing to second strands of the multiple target sequences, wherein the third set of target-specific primers (inner primers) is nested relative to the second set of target-specific primers (outer primers).
  • the universal primers in the forward and reverse reactions may be the same.
  • the reaction mixtures may comprise multiple reactions for more than one sample, which may be two samples, three samples or more than three samples, or more than 10 samples.
  • Each sample may comprise two reactions: forward reaction and reverse reaction.
  • Different sample reactions (all forward reactions, or all reverse reactions) may preferably be mixed in step (c) or (d), where the identity of each sample is assigned in the linear amplification by target specific primers having a MID. All forward reactions or all reverse reactions after linear amplification may be processed in one mixture in step (c) and subsequent steps.
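The pooling described above relies on each read carrying its sample MID. A minimal sketch of how pooled reads might be reassigned to samples, assuming (hypothetically) that the MID sits at a fixed position at the start of each read; the MID sequences, sample names and the `demultiplex` helper are illustrative and are not taken from the specification.

```python
from collections import defaultdict

# Hypothetical MID-to-sample table; the sequences are placeholders.
MID_TO_SAMPLE = {"ACGT": "sample_1", "TGCA": "sample_2"}

def demultiplex(reads, mid_to_sample, mid_len=4):
    """Group reads by the MID assumed to occupy the first mid_len bases."""
    by_sample = defaultdict(list)
    for read in reads:
        mid, insert = read[:mid_len], read[mid_len:]
        # Reads whose MID is not in the table are set aside, not guessed.
        sample = mid_to_sample.get(mid, "unassigned")
        by_sample[sample].append(insert)
    return dict(by_sample)

pooled = ["ACGTGGGAAA", "TGCAGGGAAA", "ACGTCCCTTT", "NNNNGGGAAA"]
assigned = demultiplex(pooled, MID_TO_SAMPLE)
```

In practice the MID position depends on the primer design (tail or bulge), so the slicing step would need to follow the actual primer layout.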
  • the PCR products may be purified and ready for sequencing, or may be further amplified in another PCR to add universal primers used for sequencing.
  • all forward reaction and reverse reactions may be mixed and amplified by using universal primers, which target to the 5' tail portion of the target specific primers used in step (a) or/and step (d). Then the PCR products may be purified and size selected ready for NGS sequencing.
  • the method further comprises analysing the NGS reads derived from the forward reaction and the reverse reaction, which represent two different strands of target sequences, comprising generating error-corrected consensus sequences by (i) grouping reads into families containing the same random sequence identifier sequences; (ii) removing the target sequences of the same family having one or more nucleotide positions where the target sequence disagrees with the majority of members; and (iii) examining whether the same mutations appear in the two reactions, which represent different strands of a target sequence.
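Steps (i) to (iii) can be sketched as follows. This is an illustration under stated assumptions, not the applicant's implementation: reads are assumed to be pre-parsed into `(rsi, target_sequence)` pairs of equal length, reverse-reaction reads are assumed to be already converted to the reference orientation, and all function names are hypothetical.

```python
from collections import Counter, defaultdict

def family_consensus(reads):
    """Return one consensus sequence per RSI family.

    reads: iterable of (rsi, target_sequence) pairs of equal target length.
    """
    families = defaultdict(list)
    for rsi, seq in reads:                       # step (i): group by RSI
        families[rsi].append(seq)
    consensus = {}
    for rsi, seqs in families.items():
        majority = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
        # Step (ii): discard members that disagree with the majority.
        kept = [s for s in seqs if s == majority]
        if kept:                                 # drop families with no clean member
            consensus[rsi] = majority
    return consensus

def variants(consensus, reference):
    """Collect (position, base) differences from the reference."""
    found = set()
    for seq in consensus.values():
        found |= {(i, b) for i, (a, b) in enumerate(zip(reference, seq)) if a != b}
    return found

def confirmed_on_both_strands(fwd_reads, rev_reads, reference):
    """Step (iii): keep variants seen in both the forward and reverse reaction."""
    fwd = variants(family_consensus(fwd_reads), reference)
    rev = variants(family_consensus(rev_reads), reference)
    return fwd & rev

reference = "ACGTAC"
fwd_reads = [("AAA", "ACTTAC"), ("AAA", "ACTTAC"), ("AAA", "ACTTAG"), ("CCC", "ACGTAC")]
rev_reads = [("GGG", "ACTTAC"), ("GGG", "ACTTAC"), ("TTT", "AGGTAC")]
confirmed = confirmed_on_both_strands(fwd_reads, rev_reads, reference)
```

Here the substitution at position 2 is supported by both reactions, whereas the change seen only in the reverse reaction is not reported.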
  • the method further comprises analysing the NGS reads derived from the forward reaction and the reverse reaction, which represent two different strands of target sequences, comprising generating consensus sequences by grouping reads into families containing the same random sequence identifier (RSI) sequences; and counting the number of families.
  • This method provides an accurate counting for the numbers of original target nucleic acids present in a sample.
  • the methods can be used to quantitate the starting molecules, although the single-side amplification may distort the apparent number of original target molecules. Nevertheless, counting the RSI families of a target sequence and comparing the counts with other samples, or between the forward reaction and the reverse reaction, may provide accurate counting information.
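A minimal sketch of RSI family counting, assuming reads have already been reduced to `(target, rsi)` pairs; the target labels and the `count_rsi_families` helper are hypothetical. Each distinct RSI is counted once per target, however many reads carry it.

```python
from collections import defaultdict

def count_rsi_families(reads):
    """Count the distinct RSI sequences observed for each target."""
    rsis = defaultdict(set)
    for target, rsi in reads:
        rsis[target].add(rsi)      # duplicate reads of one family collapse here
    return {target: len(ids) for target, ids in rsis.items()}

reads = [
    ("target_A", "AACG"), ("target_A", "AACG"),
    ("target_A", "TTGA"), ("target_B", "CCAT"),
]
counts = count_rsi_families(reads)
```

As the text cautions, such counts are best interpreted comparatively (between samples, or between forward and reverse reactions) rather than as absolute molecule numbers.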
  • the present invention further provides a kit for performing a method according to one or more of the preceding methods, comprising: providing reaction mixture(s), each comprising a first set of multiple target specific primers annealing to multiple target sequences, wherein for any particular target sequence, forward primers are designed to hybridise to the first strands of the target sequences, reverse primers are designed to hybridise to the second strands of the target sequences, wherein the set of the target specific primers in the forward reaction comprises forward primers and the set of the target specific primers in the reverse reaction comprises reverse primers; wherein the target specific primer comprises a 5' tail portion and a 3' target complementary portion, or/and the target specific primer comprises a bulge portion, both 5' part and 3' part of which are target specific sequences capable of hybridising to the target sequence; wherein the target-specific primer in the first set or second set comprises a random sequence identifier (RSI) located between the 5' tail portion and the 3' target complementary portion or in the bulge portion, wherein RSI portion
  • reaction mixtures are capable of carrying out linear amplification of the target sequences to generate single-stranded linear amplification products; optionally purifying or enriching reagents for purifying or enriching the single-stranded linear amplification products; and PCR amplifying reagents for amplifying the single-stranded linear amplification products using primers to generate double-stranded PCR products; wherein the primers and reagents are described in the preceding methods.
  • a target-specific primer may comprise a random sequence identifier (RSI) between 5' universal tail and 3' target complementary portion or in the bulge portion.
  • the purpose of RSI is twofold. First is the assignment of a unique RSI to each DNA template molecule. The second is the amplification of each uniquely tagged template, so that many daughter molecules with the identical RSI sequence are generated (defined as a RSI family). If a mutation pre-existed in the template molecule used for amplification, that mutation should be present in every daughter molecule containing that RSI.
  • a target-specific oligonucleotide may further comprise a fixed multiplexing barcode sequence between 5' universal tail and 3' target complementary portion or in the bulge portion.
  • the barcode sequence and RSI may both be present; the barcode can be located 5' or 3' of the RSI.
  • the universal primers may contain two or more terminal phosphorothioates to make them resistant to any exonuclease activity. They may also contain 5'-grafting sequences necessary for hybridization to an NGS flow cell, for example the Illumina GA IIx flow cell. Finally, they may contain an index sequence between the grafting sequence and the universal tag sequence. This index sequence enables the PCR products from multiple different individuals to be
  • the target nucleic acid sequence may comprise a nucleic acid fragment or gene which contains variant nucleotide(s), and may be selected from the group consisting of disorder-associated SNP/deletion/insertion, chromosome rearrangement, trisomy, or cancer genes, drug-resistance gene, and virulence gene.
  • the disorder-associated gene may include, but is not limited to cancer-associated genes and genes associated with a hereditary disease.
  • the variant nucleotide(s) in the diagnostic region of the target polynucleotide sequence may include one or more nucleotide substitutions, chromosome rearrangement, deletions, insertions and/or abnormal methylation.
  • DNA methylation is an important epigenetic modification of the genome. Abnormal DNA methylation may result in silencing of tumor suppressor genes and is common in a variety of human cancer cells.
  • a preliminary treatment should be conducted prior to the practice of the present method.
  • the nucleic acid sample should be chemically modified by a bisulphite treatment, which will convert cytosine to uracil but not the methylated cytosine (i.e., 5- methylcytosine, which is resistant to this treatment and remains as cytosine).
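The effect of bisulphite treatment on a sequence can be illustrated with a small simulation. This is a sketch only: the methylated positions are supplied by hand (0-based indices) as an assumption, and the function names are hypothetical.

```python
def bisulphite_convert(seq, methylated_positions=frozenset()):
    """Convert unmethylated C to U; 5-methylcytosine is left as C."""
    return "".join(
        "U" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )

def as_sequenced(converted):
    """After amplification, uracil is read as thymine."""
    return converted.replace("U", "T")

# The cytosine at index 1 is assumed methylated; those at 3 and 4 are not.
converted = bisulphite_convert("ACGCCGTA", methylated_positions={1})
read = as_sequenced(converted)
```

A cytosine that survives in the read therefore marks a methylated position, while unmethylated cytosines appear as thymine.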
  • the method of this invention can be applied to the detection of abnormal methylation(s) in the target nucleic acid.
  • the present invention provides a method of analysing a biological sample for gene expression.
  • the unique barcoded RSI is assigned to every linear amplification strand and is subsequently identified during sequence analysis.
  • the present invention provides a method of analysing a biological sample for the presence and/or the amount of mutations or polymorphisms at multiple loci of different target nucleic acid sequences.
  • the present invention provides a method of analysing a biological sample for chromosomal abnormality, for example trisomy.
  • the amplification and enriching step may be followed by next generation sequencing, digital PCR, microarray, or other high throughput analysis.
  • the number of multiplexing of target loci may be more than 5, or more than 10, or more than 30, or more than 50, or more than 100, or even more than 500.
  • One of the limitations of these methods is that when a mutant is very rare in a sample, for example when only one or two mutant molecules are present, only one reaction may contain the mutant after the sample nucleic acid is divided into two reactions. Comparing the sequences of the two strands across the two reactions is then impossible.
  • the specificity can be increased by requiring more than one mutant sequencing read in one reaction for mutation identification, since the probability of introducing the same artefactual mutation twice or three times would be extremely low.
  • more than one mutant sequencing read in different RSI molecules in the forward or reverse reaction may also be classified as mutant positive, since it would be very rare for the same artefact to appear more than twice during the single-side linear amplification step.
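The single-reaction rule above can be sketched as a simple filter: a mutation is called only when it is supported by more than one distinct RSI. The input format (one `(rsi, mutation)` pair per family consensus), the mutation labels and the default threshold of two families are assumptions made for illustration.

```python
from collections import defaultdict

def mutation_positive(family_calls, min_families=2):
    """Return mutations supported by at least min_families distinct RSIs.

    family_calls: iterable of (rsi, mutation) pairs from a single
    (forward or reverse) reaction.
    """
    support = defaultdict(set)
    for rsi, mutation in family_calls:
        support[mutation].add(rsi)   # repeated reads of one RSI count once
    return {m for m, rsis in support.items() if len(rsis) >= min_families}

calls = [
    ("AAC", "mut_1"), ("GTT", "mut_1"),   # two independent RSI molecules
    ("CCA", "mut_2"), ("CCA", "mut_2"),   # same RSI seen twice
]
positive = mutation_positive(calls)
```

Because support is counted per RSI rather than per read, amplification duplicates of a single artefact do not satisfy the rule.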
  • the invention provides methods of linking an individual nucleic acid molecule with a unique sequence identifier, or linking nucleic acids of a single cell with a unique sequence identifier, or making a targeted sequencing library from one or more samples, wherein the sequencing library is suitable for massively parallel sequencing and comprises a plurality of double-stranded nucleic acid molecules, wherein the method comprises:
  • each reaction is a primer extension reaction or PCR amplification, comprising a DNA polymerase, and a set of forward and reverse primers;
  • the at least one primer extension reaction is preferably a PCR amplification, which may include RT-PCR with reverse transcription before the PCR.
  • the RT-PCR is preferably performed in one step, single reaction.
  • the first population of primer extension product may be derived from a first single molecule or from a limited number of molecules, and the second population of primer extension product may be derived from a second single molecule or a limited number of molecules; the reaction is performed in one chamber, wherein the limited number can be 2, 3, 4, 5, 6, 7, 8, or up to 50.
  • the first single molecule or limited number of molecules may be a target molecule from a sample, the second single molecule or limited number of molecules may be an artificial template.
  • the first population of primer extension product may be derived from a single cell, and second population of primer extension product may be derived from second single molecule, which may be an artificial template, the reaction is performed in one chamber.
  • the artificial template comprises a random sequence identifier (RSI), which comprises a degenerate, semi-degenerate or completely random nucleic acid sequence.
  • the length of RSI can be from 2 nucleotides to 50 nucleotides long, preferably from 3 nucleotides to 30 nucleotides long, or more preferably from 3 nucleotides to 20 nucleotides long.
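The choice of RSI length determines how many distinct identifiers are available. The sketch below assumes a fully random RSI drawn uniformly from the four bases (semi-degenerate designs would have lower diversity) and uses the standard birthday approximation for the chance that two molecules receive the same RSI; the helper names are hypothetical.

```python
import math
import random

def rsi_diversity(length):
    """Number of distinct fully random RSIs of the given length."""
    return 4 ** length

def collision_probability(n_molecules, length):
    """Approximate chance that at least two molecules share an RSI,
    assuming uniform, independent draws (birthday approximation)."""
    d = rsi_diversity(length)
    return 1.0 - math.exp(-n_molecules * (n_molecules - 1) / (2.0 * d))

def random_rsi(length, rng=random):
    """Draw one fully degenerate RSI."""
    return "".join(rng.choice("ACGT") for _ in range(length))

diversity_12 = rsi_diversity(12)             # 4**12 possible identifiers
p_shared = collision_probability(1000, 12)   # 1000 tagged molecules
example = random_rsi(8)
```

Longer identifiers reduce the chance of two template molecules sharing an RSI, at the cost of a longer primer.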
  • the linking may comprise a ligation reaction by a DNA ligase.
  • the primer extension products may be treated by enzyme to allow the ends of the products to be ligated.
  • the treating may comprise restriction enzyme digestion or phosphorylation by a kinase or trimming the ends to be blunt ends.
  • the linking may comprise an extension reaction by the DNA polymerase, wherein the 3' end part of a forward strand derived from the forward primer of an extended duplex of the first population is complementary to the 3' end part of a forward strand derived from the forward primer of an extended duplex of the second population.
  • the primer extension reaction may comprise an asymmetric PCR, wherein the ratio of forward primers and reverse primers is more than 1, preferably more than 2.
  • the primer comprises a 3' target complementary portion and a 5' tail portion, or the primer comprises a 3' target complementary portion, which is disrupted by a bulge portion located at 3, 4, 5, 6 or more nucleotides from 3' end of the primer, wherein the 5' tail portion or the bulge portion is not complementary to the initial target sequence (Fig. 7).
  • the 5' tail portion or the bulge portion of the primer may comprise a restriction enzyme site, such that primer extension products, after digestion, are ligatable (Fig. 7C and D).
  • the hybridisation creates an unpaired base bulge.
  • the bulge can have any length, depending on what functional sequence it contains.
  • the bulge portion like the 5' tail portion may comprise restriction site, RSI, or sequence compatible for NGS sequencing, such as sequencing primer site.
  • the 5' tail portion of the primers for the first population comprises sequence complementary to the sequence of the 5' tail portion of the primers for the second population, such that the two populations of the primer extension products can be linked together by an extension reaction (Fig. 7F).
  • the primer may comprise a RSI portion between the 5' tail portion and the 3' target complementary portion.
  • the 5' tail portion or the bulge portion of at least one set of the primers may comprise random sequence identifier which is used to group sequencing reads to a family of target sequences (Fig. 7 B, C and D).
  • the 5' tail portion or the bulge portion of the primer may comprise a restriction enzyme site and random sequence identifier, wherein the restriction site is located 5' to the random sequence identifier (Fig. 7C and D).
  • the random sequence identifier comprises degenerate or semi-degenerate or completely random nucleic acid sequence.
  • the random sequence identifier may comprise a sequence that is between approximately 3 and 18 nucleotides in length.
  • the 5' tail portion or bulge portion of the primer may comprise sample multiplex identifier (MID) which is used to group sequencing reads to a sample.
  • the 5' tail portion or the bulge portion may comprise sequence compatible for sequencing (Fig. 7B, C and D).
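The primer layouts described above can be pictured as a concatenation of functional segments. The sketch below assembles one possible tailed primer; the order of the MID relative to the RSI, the placeholder sequences and the `TailedPrimer` class are assumptions for illustration only, with `N` denoting a degenerate position.

```python
from dataclasses import dataclass

@dataclass
class TailedPrimer:
    """5' tail, restriction site, RSI, MID, then 3' target portion."""
    tail: str
    target: str
    restriction_site: str = ""
    rsi: str = ""
    mid: str = ""

    def sequence(self):
        # The restriction site is placed 5' of the RSI, as in the text;
        # putting the MID 3' of the RSI is an assumption of this sketch.
        return self.tail + self.restriction_site + self.rsi + self.mid + self.target

primer = TailedPrimer(
    tail="TTTTGGGG",              # placeholder universal tail
    restriction_site="GAATTC",    # EcoRI recognition site, as an example
    rsi="NNNNNN",                 # six degenerate positions
    mid="ACGT",                   # placeholder sample identifier
    target="CCCCAAAA",            # placeholder 3' target complementary portion
)
full = primer.sequence()
```

A bulge-type primer would instead place the functional segments between two target specific parts.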
  • the linked products may be used directly for next generation sequencing.
  • the linked products may be amplified using primers targeting to the 5' tail portion or the bulge portion of an extended forward primer flanking the target sequences, or primers which hybridise to the internal part of the target sequences, which is commonly referred to as nested primers.
  • the invention provides a method for high-throughput sequencing and analysis of nucleic acids from a plurality of biological samples comprising: (a) providing in a single reaction chamber at least two populations of primer extension products wherein forward and reverse primers are designed for each target/template sequence;
  • the two populations of primer extension products may be generated from target nucleic acid sequences, which are derived from an individual single sample of the plurality of biological samples.
  • the first population of primer extension product may be generated from a single target sample; second population of primer extension product may be generated from one barcoded artificial template or no more than five barcoded artificial templates.
  • the single target sample may be a single target molecule.
  • the single target sample may be a single cell.
  • the reaction chamber may be a compartment of water/oil emulsion or droplet.
  • the step (a) may be in a single assay.
  • the biological samples may be cells. The reaction may be performed on target
  • the barcoded artificial template comprises random sequence identifier which is used to group sequencing reads to a family of target sequences.
  • the random sequence identifier comprises degenerate or semi-degenerate or completely random nucleic acid sequence, which comprises a sequence that is between approximately 2 and 30 nucleotides in length.
  • the barcoded artificial template or the 5' tail portion of the reverse primer may comprise a sample sequence identifier which is used to group sequencing reads to a sample.
  • the linking of the two populations may comprise an extension reaction by the DNA polymerase, wherein the 3' end part of a single-stranded DNA derived from the forward primer extension product of the first population is complementary to the 3' end part of a single-stranded DNA derived from the forward primer extension product of the second population.
  • the primer extension reaction may comprise an asymmetric PCR, wherein the ratio of forward primers and reverse primers is more than 1, preferably more than 2.
  • generating the two populations of the primer extension products can be performed in a single PCR reaction. This method of coupling the two populations of amplicons is also called splicing by overlap extension or fusion PCR.
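The overlap extension step can be modelled on sequence strings: two single strands whose 3' ends are complementary anneal, and each is extended along the other. The following sketch returns the resulting top strand; the minimum overlap of six bases, the sequences and the helper names are assumptions chosen for illustration.

```python
_COMP = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMP)[::-1]

def overlap_extend(strand_a, strand_b, min_overlap=6):
    """Fuse two 5'->3' strands whose 3' ends are complementary.

    Returns the extended strand_a, or None if no overlap of at least
    min_overlap bases is found.
    """
    b_rc = revcomp(strand_b)   # strand_b's complement, in strand_a's orientation
    for k in range(min(len(strand_a), len(b_rc)), min_overlap - 1, -1):
        if strand_a[-k:] == b_rc[:k]:
            return strand_a + b_rc[k:]
    return None

link = "GGATCCAT"                              # shared linking sequence
amplicon_1 = "ACGTACGT" + link                 # first population strand
amplicon_2 = "TTGACCAA" + revcomp(link)        # second population strand
fused = overlap_extend(amplicon_1, amplicon_2)
```

The fused strand carries the first amplicon, the linking sequence and the complement of the second amplicon in a single molecule.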
  • the single reaction chamber can be achieved by emulsion or droplet method.
  • in an emulsion-based method, single molecules or single cells are placed into individual compartments (chambers) of a water-in-oil emulsion.
  • Emulsion may be formed by using physical methods (e.g., vortexing) that depend on Poisson statistics to achieve clonality.
  • emulsions may be generated using microfluidic technology (droplet fusion).
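The reliance on Poisson statistics can be made concrete. If templates are distributed randomly among compartments with a mean of `lam` templates per compartment, the fraction of compartments holding zero, one or several templates follows the Poisson distribution. The mean loading of 0.1 used below is an assumed value for illustration, and the helper names are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k templates in a compartment."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def loading_summary(lam):
    """Fractions of empty, singly and multiply occupied compartments."""
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    multiple = 1.0 - empty - single
    return {
        "empty": empty,
        "single": single,
        "multiple": multiple,
        # Share of occupied compartments that hold exactly one template.
        "clonal_fraction_of_occupied": single / (1.0 - empty),
    }

summary = loading_summary(0.1)
```

Lower loading improves clonality among occupied compartments but leaves most compartments empty.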
  • the target nucleic acid sequence may be a single molecule, which generates a first population of the amplicon.
  • the second population of the amplicon is generated from a single artificial template, which comprises a unique random sequence identifier (the barcode).
  • the single target molecule and single artificial template are isolated in an oil/water emulsion or droplet. The linking of the two populations occurs in the same reaction chamber through extending the complementary ends of the two populations of the amplicons.
  • the target nucleic acid sequences may be a single cell, which generates a first population of the amplicon.
  • the second population of the amplicon is generated from a single artificial template, which comprises a unique random sequence identifier (the barcode).
  • the second population of the amplicon is generated from the same single cell, which comprises unique target sequences, for example the heavy or light chain of antibody or TCR genes.
  • the single cell and single artificial template are isolated in an oil/water emulsion.
  • the linking of the two populations occurs in the same reaction chamber through extending the complementary ends of the two populations of the amplicons.
  • the sequencing and analysis may be of a transcriptome or genome.
  • the reaction is emulsion PCR.
  • the step (a) may be in a single assay.
  • the biological samples may be cells, which may be selected from the group consisting of cells in in vitro culture, stem cells, tumour cells, tissue biopsy cells, hybridomas, blood cells, and tissue section cells, wherein the blood cells may be T-lymphocytes or B-lymphocytes, wherein one of the primers may be an oligo (dT) primer.
  • the two populations of initial amplicons are derived from the transcripts of one sample and are ligated and sequenced. Accurate gene expression analysis can be obtained by counting the families of sequence reads.
  • each single original transcript may be linked to a unique artificial template through emulsion PCR; the sequence reads sharing the same unique identifier of the artificial template will be counted as one transcript.
  • transcripts of millions of individual cells can be sequenced and analysed. Individual cells from a sample are mixed with one or two unique artificial templates in an emulsion, so that one emulsion chamber may contain one cell with one or two unique artificial templates.
  • the unique artificial template is also amplified by PCR.
  • the unique artificial template is amplified first to generate enough unique artificial sequence amplicons, which are used to link the transcript amplicons.
  • the annealing temperature can be lowered to amplify the transcripts. This low annealing temperature PCR also promotes the joining (linking) of the two populations of the amplicons together.
  • the linked products may be further amplified and sequenced.
  • the present invention provides a method for multiplex amplifying and enriching multiple mutated target nucleic acid sequences in a sample, which may contain a small proportion of mutated sequences in a large wild-type background, for next generation sequencing analysis or other high-throughput detection.
  • tumour DNA can be used as a non-invasive biomarker to detect the presence of malignancy, follow treatment response, or monitor for recurrence.
  • current methods of detection have significant limitations.
  • Next Generation Sequencing (NGS) methods have revolutionised genomic exploration by allowing simultaneous sequencing of hundreds of billions of base pairs at a small fraction of the time and cost of traditional methods.
  • the error rate of ~1% results in hundreds of millions of sequencing mistakes, which is unacceptable when aiming to identify rare mutants in genetically heterogeneous mixtures, such as tumours and plasma.
  • the methods of this invention overcome these limitations in sequencing accuracy. Mutation-harbouring cfDNA can be obscured by a relative excess of background wild-type DNA.
  • the method greatly reduces errors by independently tagging and sequencing each original DNA duplex.
  • the methods of the present invention can substantially improve the accuracy of massively parallel sequencing. They can be implemented through an RSI in the target specific primer and can be applied to virtually any sample preparation workflow or sequencing platform.
  • the approach can easily be used to identify rare mutants in a population of DNA templates.
  • One of the advantages of the strategy is that it yields the number of templates analysed as well as the fraction of templates containing variant bases.
  • the two strands of one target template in a sample are each uniquely tagged and independently sequenced. Comparing the sequences of the two strands results in either agreement or disagreement. Agreement gives the confidence to score a mutation as a true positive. Artifactual mutations introduced during PCR amplification are detectable as errors if the strand sequences of the two populations do not agree with each other.
  • RNA molecules are created, each of which arose from a single strand of an individual DNA molecule.
  • members of each PCR family are identified and grouped by virtue of sharing the identical RSI tag sequence.
  • the sequences of uniquely RSI tagged family and two strands of target sequences are then compared to create a PCR consensus sequence. This step filters out random errors introduced during sequencing or PCR to yield a set of sequences, each of which derives from an individual molecule of single-stranded DNA.
  • sequences belonging to the two complementary strands of each target are identified by searching for complementary sequences among sequencing reads. Following partnering of the two strands, the sequences of the strands are compared. A sequence base at a given position is kept only if the read data from each of the two strands matches perfectly. The ratio of any mutation among the two strands is also compared; only a similar ratio of mutant to normal sequences on the two strands indicates a true positive mutation. Comparing the sequences obtained from both strands eliminates errors introduced during the first round of PCR, where an artifactual mutation may be propagated to all PCR duplicates of one strand and would not be removed by single strand sequencing filtering alone.
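The position-wise comparison of the two strands can be sketched as below. The inputs are assumed to be per-strand consensus sequences covering the same region, the second one written 5'->3' on the complementary strand; marking disagreements as `N` is a choice of this sketch, and the helper names are hypothetical.

```python
_COMP = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMP)[::-1]

def duplex_consensus(first_strand, second_strand):
    """Keep a base only where both strand consensuses agree.

    first_strand is in reference orientation; second_strand is the
    complementary strand read 5'->3'. Disagreeing positions become 'N'.
    """
    partner = revcomp(second_strand)
    if len(partner) != len(first_strand):
        raise ValueError("strand consensuses must cover the same region")
    return "".join(a if a == b else "N" for a, b in zip(first_strand, partner))

agree = duplex_consensus("ACGTTAGC", "GCTAACGT")      # perfect partners
artefact = duplex_consensus("ACGTTAGC", "GCGAACGT")   # one strand altered
```

An error present on only one strand is masked, whereas a true mutation would appear as complementary changes on both strands and be retained.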
  • the barcoded random sequence identifier in the target specific primer can also be used for single- molecule counting to precisely determine absolute DNA or RNA copy numbers. Because tagging occurs before major amplification, the relative abundance of variants in a population can be accurately assessed given that proportional representation is not subject to skewing by amplification biases.
  • Kits include the primers, in separate containers or in a single master mixture container.
  • the kit may also contain other suitably packaged reagents and materials needed for extension, amplification and enrichment, for example buffers, dNTPs, and/or polymerizing means; and for detection analysis, for example enzymes; as well as instructions for conducting the assay.
  • the methods of the present invention greatly reduce errors by: tagging two strands of any target sequences or linking two populations of the same set of target sequences (or one target sequence and one artificial unique template with random sequence identifier) derived from one or two separate initial preparations with identifiable sequence signatures; tagging each target sequence with random sequence identifier; BiDirectional sequencing the two strands or linked two target sequences.
  • the methods provide uniform amplification of multiple target sequences.
  • Analysis provides error-corrected consensus sequences by grouping the sequenced uniquely tagged sequences or linked amplicon pairs into families containing the same pair of amplicons, which are further grouped into families containing the same set of random sequence identifier sequences; removing the target sequences of the same family having one or more nucleotide positions where the target sequence disagrees with the majority of members in the family; and treating the same mutations appearing in the two populations as true mutations.
  • the method can be used for detecting mutation in any sample such as FFPE or blood.
  • the accurate counting of sequencing reads which reflect the original molecules present in a sample provides information for copy number variations or for prenatal test for chromosome abnormality.
  • FIG. 1 depicts a schematic of an illustrative embodiment of the present invention.
  • In a forward reaction, a set of multiple forward primers is hybridised to the first strands of the target sequences.
  • single stranded amplification products are generated.
  • the linear amplification may be thermal cycling amplification with one side of primers (not primer pairs).
  • only one strand of a target sequence is amplified. For example, if there are 20 cycles, the strand is amplified 20-fold in theory.
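The arithmetic behind the 20-fold figure can be compared with conventional two-primer PCR. The sketch assumes idealised cycling; the `efficiency` parameter and function names are hypothetical.

```python
def linear_copies(cycles, efficiency=1.0):
    """Copies made per template strand by one-sided (linear) cycling.

    Only the original template is copied, once per cycle.
    """
    return cycles * efficiency

def exponential_fold(cycles, efficiency=1.0):
    """Fold increase for two-primer PCR, where products are also copied."""
    return (1.0 + efficiency) ** cycles

linear_20 = linear_copies(20)
exponential_20 = exponential_fold(20)
```

With 20 ideal cycles, linear amplification yields 20 copies per template strand, whereas exponential PCR would yield roughly a million-fold increase.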
  • Each primer has a random sequence identifier, such that each amplified single-stranded product has a unique sequence identifier, which can be identified during sequence analysis.
  • the single-stranded linear amplification product may be enzymatically treated to remove unreacted primers, or purified or enriched. This step is optional, as it may not be necessary if the primers are greatly diminished after linear amplification.
  • the single-stranded linear product is then PCR amplified using forward primers (which may be universal primers or target specific primers) and target specific reverse primers.
  • the PCR products may be further amplified in another PCR to add universal primers used for sequencing. In this step, all forward reaction and reverse reactions may be mixed and amplified by using universal primers, which target to the 5' tail portion/bulge portion of the target specific primers used in step (a) and step (d). Then the PCR products may be purified and size selected ready for NGS sequencing.
  • FIG. 2 depicts a schematic of an illustrative embodiment of the present invention.
  • In a forward reaction, a set of multiple forward primers is hybridised to the first strands of the target sequences.
  • the forward primer comprises a bulge portion; alternatively, the forward primer may comprise both a 5' tail portion and a bulge portion.
  • the linear amplification may be thermal cycling amplification with one side of primers (not primer pairs). In the linear amplification, only one strand of a target sequence is amplified. For example, if there are 20 cycles, the strand is amplified 20-fold in theory.
  • Each primer comprises a random sequence identifier in the bulge, such that each amplified single-stranded product has a unique sequence identifier, which can be identified during sequence analysis.
  • the single-stranded linear amplification product may be enzymatically treated to remove unreacted primers, or purified or enriched. This step is optional, as it may not be necessary if the primers are greatly diminished after linear amplification.
  • the single-stranded linear product is then PCR amplified using forward primers, which may be universal primers or target specific primers, and target specific reverse primers.
  • the PCR products may be further amplified in another PCR to add universal primers used for sequencing.
  • all forward reaction and reverse reactions may be mixed and amplified by using universal primers, which target to the 5' tail portion/bulge portion of the target specific primers used in step (a) and step (d). Then the PCR products may be purified and size selected ready for NGS sequencing.
  • FIG. 3 Starting DNA sample is divided into two reactions; each amplifies one strand of a double-stranded target molecule. This amplification is a single-side linear amplification generating single-stranded product.
  • the primer contains a unique random sequence identifier (RSI), which gives each single-stranded amplification molecule an identity.
  • the single-stranded amplification product is enriched by hybridising a second set of target-specific primers, followed by one-pass extension and purifying, or by capturing on beads. The unreacted primers and primer dimers are removed.
  • the enriched product is PCR amplified using primers compatible with an NGS platform, then proceeds to NGS.
  • Analysing the NGS reads derived from the first reaction and the second reaction, which represent two different strands of target sequences, comprises generating error-corrected consensus sequences by (i) grouping reads into families containing the same set of random sequence identifier sequences; (ii) removing the target sequences of the same family having one or more nucleotide positions where the target sequence disagrees with the majority of members; and (iii) examining whether the same mutations appear in the two reactions, which represent different strands of a target sequence.
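The three analysis steps for FIG. 3 can be sketched in a few lines of Python. This is only an illustrative sketch, not the applicant's software: it assumes reads are already trimmed to equal length, paired with their RSI, and reported in the reference orientation for both reactions. The function names and the tie-breaking rule (first base seen wins) are assumptions.

```python
from collections import Counter, defaultdict

def family_consensus(seqs):
    """Step (ii): per-position majority vote within one RSI family.

    Returns the majority sequence and the members that agree with it at
    every position (members that disagree anywhere are dropped).
    Assumes all sequences have the same length.
    """
    majority = "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))
    kept = [s for s in seqs if s == majority]
    return majority, kept

def variants(consensus, reference):
    """Return {(position, ref_base, alt_base)} where consensus differs."""
    return {(i, r, c) for i, (r, c) in enumerate(zip(reference, consensus)) if r != c}

def call_reaction(reads, reference):
    """Steps (i) and (ii) for one reaction: group by RSI, then call variants."""
    families = defaultdict(list)
    for rsi, seq in reads:
        families[rsi].append(seq)
    found = set()
    for seqs in families.values():
        majority, _ = family_consensus(seqs)
        found |= variants(majority, reference)
    return found

def confirmed_variants(forward_reads, reverse_reads, reference):
    """Step (iii): keep only variants seen in both strand-specific reactions."""
    return call_reaction(forward_reads, reference) & call_reaction(reverse_reads, reference)
```

A variant that appears in only one reaction is treated here as a likely technical error, in line with the two-strand comparison described above.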
  • FIG. 4 depicts primers and affinity labelled oligonucleotide.
  • A a primer with 5' tail portion and 3' target complementary portion (left), a primer with bulge portion (right).
  • B primer comprises 5' tail portion, RSI and 3' target complementary portion, or comprises RSI in the bulge portion.
  • C primer comprises 5' tail portion, RSI, sample MID and 3' target complementary portion.
  • D affinity labelled oligonucleotide hybridises to the 5' tail portion of a primer, the affinity label is attached to a bead.
  • FIG.5 depicts a schematic of an illustrative embodiment of the present invention.
  • a single sample which may be a single target molecule is present in a plurality of multiple samples which for example may be a plasma DNA.
  • Multiple single-molecule reactions may be performed in multiple reaction chambers which can be water/oil emulsion.
  • Each chamber (or a single reaction vessel) may contain a single target molecule (first population), a single barcoded artificial template (second population) and PCR reagents for generating two populations of amplicons.
  • the 5' tail sequence of the reverse primers for the first population is complementary to the whole reverse primer (or the 5' tail sequence of the reverse primers) for the second population.
  • each artificial template comprises a barcoded random sequence identifier, such that the amplicons from the single target molecule are tagged with such unique sequence identifier.
  • FIG.6 depicts a schematic of an illustrative embodiment of the present invention.
  • a single sample which may be a single cell is present in a plurality of multiple samples which can be a blood tissue. Multiple single cell reactions may be performed in multiple reaction chambers which can be water/oil emulsion. Each chamber (or a single reaction compartment) may contain single cell (for generating the first population of amplicons), single barcoded artificial template (second population) and PCR reagents (or RT-PCR) for generating two populations of amplicons.
  • the 5' tail sequence of the reverse primers for the first population is complementary to the whole reverse primer (or the 5' tail sequence of the reverse primers) for the second population.
  • each artificial template comprises a barcoded random sequence identifier, such that the amplicons from the single cell are tagged with such unique sequence identifier.
  • FIG. 7 shows primers used in the initial amplification/primer extension.
  • the reverse target specific primer comprises 3' target complementary portion and 5' tail portion (or a bulge portion).
  • the forward target specific primer comprises 3' target complementary portion and optional 5' tail portion (or a bulge portion). If forward primer comprises 5' tail portion, a universal primer targeting the 5' tail portion may be used to amplify the linked products. If forward primer does not comprise 5' tail portion, nested target specific primers targeting internal regions of target sequences may be used to amplify the linked products.
  • the 5' tail portion of the primer may comprise RSI and/or sample index identifier.
  • the 5' tail portion of the primer may comprise RSI.
  • the 5' tail portion of the primer may comprise RSI and restriction endonuclease recognition site.
  • the bulge portion of the primer may comprise RSI and restriction endonuclease recognition site.
  • the bulge portion of the primer may comprise a sequence compatible with sequencing during NGS.
  • the 5' tail sequence of the reverse primers for the first population is complementary to the 5' tail sequence of the reverse primers for the second population.
  • Example 1 A cancer mutation hot spot panel was designed, containing 245x2 primer pairs.
  • the Panel contains four pools of primers used to perform multiplex PCR for preparation of amplicon libraries from genomic "hot spot" regions that are frequently mutated in human cancer genes.
  • the Hotspot Panel was designed to amplify 245 amplicons covering approximately 3,000 COSMIC mutations from 50 oncogenes and tumor suppressor genes.
  • Fp5 pool contains first set of forward primers, which has the structure: 5' tail(universal)-RSI-
  • Rp5 pool contains first set of reverse primers, which has the structure: 5' tail(universal)-RSI- MID-target specific;
  • Fp7 pool contains second set of forward primers, which has the structure: 5' tail(universal)-target specific;
  • Rp7 pool contains second set of reverse primers, which has the structure: 5' tail(universal)-target specific.
  • Each pool contains 245 primers.
  • all target regions of the gene(s) of interest were linearly amplified in two separate multiplex reactions per sample: forward reaction and reverse reaction, using the Fp5 primer mix and the Rp5 primer mix respectively, with a hot-start DNA polymerase.
  • the linear amplified products were then purified to remove unreacted primers.
  • PCR was performed using a universal primer targeting the tail portion of the Fp5 and Rp5 primers, and target specific primer mix RP7 or FP7.
  • the amplified PCR product was purified to remove unreacted primers. This product was ready for Ion Torrent sequencing.
  • a universal PCR was performed to enable tagging of the amplicons with specific MIDs and adaptors required for sequencing with the Illumina MiSeq MPS systems using the MID for Illumina MiSeq kits.
  • Each tagged amplicon library was subsequently purified from small residual DNA fragments and the DNA concentration determined. Next, these purified and individually tagged amplicon libraries were pooled equimolarly, resulting in an amplicon pool or sequencing sample.
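Equimolar pooling reduces to a small calculation once each library's concentration and mean fragment length are known. The sketch below is illustrative only: the average mass of 660 g/mol per base pair is a common approximation for double-stranded DNA and is not stated in the source, and the function names and example figures are hypothetical.

```python
def library_molarity_nM(conc_ng_per_ul, mean_length_bp, bp_mass=660.0):
    """Convert a dsDNA concentration (ng/µL) to nM.

    bp_mass is the assumed average mass of one base pair in g/mol.
    """
    return conc_ng_per_ul / (bp_mass * mean_length_bp) * 1e6

def equimolar_volumes(libraries, fmol_per_library):
    """Return the volume (µL) of each library that contains fmol_per_library.

    libraries maps a library name to (concentration in ng/µL, mean length in bp).
    1 nM equals 1 fmol/µL, so volume = fmol / nM.
    """
    return {
        name: fmol_per_library / library_molarity_nM(conc, length)
        for name, (conc, length) in libraries.items()
    }

# Hypothetical example: two tagged amplicon libraries of 200 bp mean length.
volumes = equimolar_volumes({"lib1": (6.6, 200), "lib2": (13.2, 200)}, fmol_per_library=100)
```

The more concentrated library contributes proportionally less volume, so every library adds the same number of molecules to the pool.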
  • Human DNA was either freshly prepared or stored between 4°C and 8°C (short-term storage) or in a freezer between -15°C and -25°C (long-term storage).
  • Suitable buffers are TE (10 mM Tris, 1 mM EDTA; pH8) or TE-4 (10 mM Tris, 0.1 mM EDTA; pH8).
  • Successful amplification of FFPE-derived DNA is highly dependent on the DNA quality. DNA extracted from fresh frozen tissue or blood.
  • forward reaction contains FP5 primer mix
  • reverse reaction contains RP5 primer mix
  • the linear amplified product was PCR amplified using universal P5SEq primer which contains 5' p5 sequence and 3' sequence targeting the tail portion of RP5 and FP5 primer, and second set of target specific primer, which comprise 5' tail portion and 3' target specific portion.
  • all target regions of the gene(s) of interest were linearly amplified in two separate multiplex reactions per sample: forward reaction and reverse reaction, using the Fp5 primer mix and the Rp5 primer mix respectively, with a hot-start DNA polymerase.
  • the linear amplified products were then hybridised with a second set of multiple target specific primers Fp7pl or Rp7pl.
  • the 5' tail portion of the second set of primers was hybridised with a biotin labelled probe P7extBio: 5'-CCTCGTATGCCGTCTTCTGCT-3' (SEQ ID NO: 1).
  • the complex containing linear amplified product/second set of primer/P7extBio was affinity purified by streptavidin beads.
  • all target regions of the gene(s) of interest were linearly amplified in two separate multiplex reactions per sample: forward reaction and reverse reaction, using the Fp5 primer mix and the Rp5 primer mix respectively, with a hot-start DNA polymerase.
  • the linear amplified products were then hybridised with a second set of multiple target specific primers Fp7pl or Rp7pl.
  • One-pass extension of the second set of primers on the template of the linear amplified product was performed.
  • the double-stranded extension product was either purified by beads or was digested with Exonuclease I, followed by heat inactivation.
  • the above product was amplified using universal primers p5SEq and P7index. Then the product was purified for sequencing.
  • nucleic acid molecule with a unique sequence identifier is an artificial template.
  • Target specific primers were designed to amplify EGFR and Kras gene fragments: EGFRF333e GTCGTTTTACGTTGGcgcagttgggcacttttgaa (SEQ ID NO: 4); EGFRR443e CTTTAAGAAGGAAAGATCATAtg (SEQ ID NO: 5); KRASF170e
  • Oil-surfactant mixture is prepared first in a 50 ml tube and mixed at room temperature:
  • SOE PCR is a method for combining two DNA sequences (splicing) without the need for restriction sites. This is achieved by designing two of the four PCR primers to have
  • one reaction contained the artificial template only (NNNtemp), another genomic DNA only (SUN DNA) and one more containing both artificial and genomic DNA (NNNtemp+SUN) as well as an NTC.
  • Oil-surfactant mixture is prepared first in a 50 ml tube and mixed at room temperature as described above.
  • reaction mixture(s) each comprising a first set of multiple target specific primers annealing to multiple target sequences, for any particular target sequence, a forward primer is designed to hybridise to the first strand of the target sequence, and reverse primer is designed to hybridise to the second strand of the target sequence, wherein in one reaction mixture the set of the target specific primers comprises either forward primers or reverse primers but not both;
  • step (d) PCR amplifying the products of step (c) using primers to generate double-stranded PCR products.
  • step (e) further processing the PCR products of step (d) to complete the library preparation for massive parallel sequencing.
  • step (c) and/or step (e) comprise removing the unreacted primers.
  • removing of the unreacted primers comprises purifying the single-stranded linear amplification products of step (b) or double-stranded product of step (e), wherein a bead or column based method is used to remove unreacted primers.
  • step (c) comprises hybridising the single-stranded amplification products to a second set of target-specific primers.
  • the target-specific primer comprises an affinity label or 5' universal tail portion, wherein the 5' universal tail portion of the hybridised target-specific primers are hybridised with an affinity-labelled oligonucleotide complementary to the 5' universal tail.
  • oligonucleotide are captured by avidin solid supports.
  • the target specific primer comprises a 5' tail portion and a 3' target complementary portion.
  • primers comprise a universal primer and a second set of multiple target specific primers, wherein the multiple target specific primers in the second set are either reverse primers or forward primers, but not both, wherein the universal primer comprises a sequence related to the 5' tail portion sequence of primers in the first set.
  • primers comprise a first universal primer and a second universal primer, wherein the first universal primer comprises a sequence related to the 5' tail portion sequence of primers in the first set, the second universal primer related to the 5' tail portion sequence of the second set of the target specific primers.
  • the target specific primer comprises a bulge portion, both 5' part and 3' part of which are target specific sequences capable of hybridising to the target sequence.
  • step (c) comprises purifying the product of step (b) to remove the unreacted primers
  • the purified product of step (c) is amplified by second set of target specific primers comprising 3' priming sequences capable of hybridising to the purified linear amplified product of step (b) and third set of target specific primers comprising 3' priming sequences which comprise sequence identical to the 5' part of bulge portion of the target specific primers of the first set.
  • the target-specific primer in the first set or second set comprises a random sequence identifier (RSI) located between the 5' tail portion and the 3' target complementary portion or in the bulge portion, wherein the RSI portion comprises at least three random or degenerate nucleotides.
  • RSI random sequence identifier
  • the target specific primer comprises a multiplex identifier (MID) located between the 5' tail portion and the 3' target complementary portion or in the bulge portion, wherein MID is used to identify a sample when multiple samples are sequenced together.
  • MID multiplex identifier
  • reaction mixtures comprise two reactions: forward reaction comprises a first set of multiple target specific forward primers annealing to first strands of the multiple target sequences from one sample, reverse reaction comprises a first set of multiple target specific reverse primers annealing to the second strands of the multiple target sequences from the same one sample.
  • the primers comprise a second set of multiple target specific reverse primers, for the reverse reaction the primers comprise a second set of multiple target specific forward primers.
  • a kit for performing a method according to one or more of preceding methods comprising: providing reaction mixture(s), each comprising a first set of multiple target specific primers annealing to multiple target sequences, wherein for any particular target sequence, forward primers are designed to hybridise to the first strands of the target sequences, reverse primers are designed to hybridise to the second strands of the target sequences, wherein the set of the target specific primers in the forward reaction comprises either forward primers or reverse primers but not both and the set of the target specific primers in the reverse reaction comprises reverse primers; wherein the target specific primer comprises a 5' tail portion and a 3' target complementary portion, or/and the target specific primer comprises a bulge portion, both 5' part and 3' part of which are target specific sequences capable of hybridising to the target sequence;
  • the target-specific primer in the first set or second set comprises a random sequence identifier (RSI) located between the 5' tail portion and the 3' target complementary portion or in the bulge portion, wherein the RSI portion comprises at least three random or degenerate nucleotides;
  • RSI random sequence identifier
  • reaction mixtures are capable of carrying out linear amplification of the target sequences to generate single-stranded linear amplification products; optionally purifying or enriching reagents for purifying or enriching the single-stranded linear amplification products; and PCR amplifying reagents for amplifying the single-stranded linear amplification products using primers to generate double-stranded PCR products; wherein the primers and reagents are described in the preceding methods.

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Engineering & Computer Science (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Immunology (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Analytical Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Biotechnology (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

A method of processing target nucleic acids from one or more samples, wherein a target nucleic acid in a sample is either: (i) a double-stranded duplex which comprises a first strand and a complementary second strand; or (ii) a single-stranded molecule which is a first strand or its complementary second strand, wherein the method comprises: (a) providing a reaction mixture(s), each reaction mixture comprising a first set of multiple target specific primers capable of annealing to multiple target sequences, wherein in any one reaction mixture the set of the target specific primers comprises either forward target specific primers or reverse target specific primers but not both; (b) performing a single-side linear amplification of the target sequences to generate single-stranded amplification products; (c) treating the products of step (b) to enrich the products; and (d) PCR amplifying the products of step (c) using primers to generate double-stranded PCR products.

Description

METHODS, COMPOSITIONS, AND KITS FOR PREPARING SEQUENCING LIBRARY
BACKGROUND OF THE INVENTION
This invention relates to methods, compositions and kits for making a target enriched sequencing library from one or more samples involving linear amplification and tagging two strands of target sequence. The sequencing library is suitable for massive parallel sequencing and comprises a plurality of double-stranded nucleic acid molecules.
Next-generation DNA sequencing promises to revolutionize clinical medicine and basic research. However, while this technology has the capacity to generate hundreds of billions of nucleotides of DNA sequence in a single experiment, the error rate of ~1% results in hundreds of millions of sequencing mistakes. These scattered errors become extremely problematic when "deep sequencing" genetically heterogeneous mixtures, such as tumours or mixed microbial populations.
To overcome limitations in sequencing accuracy, several methods have been reported. Duplex sequencing (Schmitt et al., PNAS 109: 14508-14513) is one of them. This approach greatly reduces errors by independently tagging and sequencing each of the two strands of a DNA duplex. As the two strands are complementary, true mutations are found at the same position in both strands. In contrast, PCR or sequencing errors result in mutations in only one strand and can thus be discounted as technical error. Another approach called Safe-Sequencing System ("Safe-SeqS") was reported by Kinde et al. (PNAS 2011 Jun 7;108(23):9530-5). The keys to this approach are (i) assignment of a unique identifier (UID) to each template molecule, (ii) amplification of each uniquely tagged template molecule to create UID families, and (iii) redundant sequencing of the amplification products. PCR fragments with the same UID are considered mutant ("supermutants") only if 95% of them contain the identical mutation. US Patents US8,722,368, US8,685,678, US8,742,606 describe methods of sequencing polynucleotides attached with degenerate base region to determine/estimate the number of different starting polynucleotides. However, these methods do not compare sequence information of the original two strands and involve ligating and PCR to attach degenerate base region.
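The Safe-SeqS rule quoted above, a call only when 95% of a UID family carries the identical mutation, can be illustrated with a short Python sketch. It is a simplified reading of that one sentence rather than the published pipeline; the function name, the equal-length read assumption and the example data are hypothetical.

```python
from collections import Counter

def supermutant_positions(family, reference, threshold=0.95):
    """Return {position: base} for mutations shared by >= threshold of a UID family.

    family is a list of equal-length reads that carry the same UID;
    reference is the wild-type sequence of the same length.
    """
    n = len(family)
    calls = {}
    for pos, ref_base in enumerate(reference):
        base, count = Counter(read[pos] for read in family).most_common(1)[0]
        if base != ref_base and count / n >= threshold:
            calls[pos] = base
    return calls

# Hypothetical family of 20 reads: all carry G>T at position 2,
# and a single read also carries a sporadic error at position 3.
family = ["ACTT"] * 19 + ["ACTA"]
calls = supermutant_positions(family, "ACGT")
```

The mutation shared by the whole family is reported, while the single-read change falls below the threshold and is discarded.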
Targeted next generation sequencing often involves the analysis of large complex fragments and this is achieved by multiplex PCR (the simultaneous amplification of different target DNA sequences in a single PCR reaction). Results obtained with multiplex PCR however are often complicated by artifacts of the amplification products. These include false negative results due to reaction failure and false-positive results (such as amplification of spurious products) due to non-specific priming events. Since the possibility of non-specific priming increases with each additional primer pair, conditions must be modified as necessary as individual primer sets are added.
DETAILED DESCRIPTION
To facilitate an understanding of the invention, a number of terms are defined below.
As used herein, a "sample" refers to any substance containing or presumed to contain nucleic acids and includes a sample of tissue or fluid isolated from an individual or individuals.
Particularly, the nucleic acid sample may be obtained from an organism selected from viruses, bacteria, fungi, plants, and animals. Preferably, the nucleic acid sample is obtained from a mammal. In a preferred embodiment of this invention, the mammal is human. The nucleic acid sample can be obtained from a specimen of body fluid or tissue biopsy of a subject, or from cultured cells. The body fluid may be selected from whole blood, serum, plasma, urine, sputum, bile, stool, bone marrow, lymph, semen, breast exudate, saliva, tears, bronchial washings, gastric washings, spinal fluids, synovial fluids, peritoneal fluids, pleural effusions, and amniotic fluid. An "individual sample" may be a single cell, which can be one T cell or one B cell, while the plurality of samples may be many blood cells in a blood sample.
As used herein, the term "nucleotide sequence" refers to either a homopolymer or a heteropolymer of deoxyribonucleotides, ribonucleotides or other nucleic acids.
As used herein, the term "nucleotide" generally refers to the monomer components of nucleotide sequences even though the monomers may be nucleoside and/or nucleotide analogs, and/or modified nucleosides such as amino modified nucleosides in addition to nucleotides. In addition, "nucleotide" includes non-naturally occurring analog structures.
As used herein, the term "nucleic acid" refers to at least two nucleotides covalently linked together. A nucleic acid of the present invention will generally contain phosphodiester bonds, although in some cases nucleic acid analogs are included that may have alternate backbones. Nucleic acids may be single-stranded or double-stranded, as specified, or contain portions of both double-stranded and single-stranded sequence. The nucleic acid may be DNA, both genomic and cDNA, RNA or DNA-RNA hybrids, where the nucleic acid contains any combination of deoxyribo- and ribo-nucleotides, and any combination of bases, including uracil, adenine, thymine, cytosine, guanine, inosine, xanthine, hypoxanthine, etc. Reference to a "DNA sequence" can include both single-stranded and double-stranded DNA. A specific sequence, unless the context indicates otherwise, refers to the single stranded DNA of such sequence, the duplex of such sequence with its complement (double stranded DNA) and/or the complement of such sequence.
As used herein, the terms "polynucleotide" and "oligonucleotide" are types of "nucleic acid", and generally refer to primers and oligomer fragments to be detected. There is no intended distinction in length between the terms "nucleic acid", "polynucleotide" and "oligonucleotide", and these terms will be used interchangeably. "Nucleic acid", "DNA" and similar terms also include nucleic acid analogs. The oligonucleotide is not necessarily physically derived from any existing or natural sequence but may be generated in any manner, including chemical synthesis, DNA replication, reverse transcription or a combination thereof.
As used herein, the terms "target sequence", "target nucleic acid", "target nucleic acid sequence" and "nucleic acids of interest" are used interchangeably and refer to a desired region which is to be either amplified, detected or both, or is the subject of hybridization with a complementary oligonucleotide, polynucleotide, e.g., a blocking oligomer, or the subject of a primer extension process. The target sequence can be composed of DNA, RNA, analogs thereof, or combinations thereof. The target sequence can be single-stranded or double-stranded. In primer extension processes, the target nucleic acid which forms a hybridization duplex with the primer may also be referred to as a "template." A template serves as a pattern for the synthesis of a complementary polynucleotide. A target sequence for use with the present invention may be derived from any living or once living organism, including but not limited to prokaryotes, eukaryotes, plants, animals, and viruses, as well as synthetic and/or recombinant target sequences.
"Primer" as used herein may refer to more than one primer and refers to an oligonucleotide, whether occurring naturally or produced synthetically, which is capable of acting as a point of initiation of synthesis when placed under conditions in which synthesis of a primer extension product which is complementary to a nucleic acid strand is induced, i.e., in the presence of nucleotides and an agent for polymerization such as DNA polymerase and at a suitable temperature and in a suitable buffer. Such conditions include the presence of four different deoxyribonucleoside triphosphates and a polymerization-inducing agent such as DNA polymerase or reverse transcriptase, in a suitable buffer ("buffer" includes substituents which are cofactors, or affect pH, ionic strength, etc.), and at a suitable temperature. The primer is preferably single-stranded for maximum efficiency in amplification. The primers herein are selected to be substantially complementary to a strand of each specific sequence to be amplified. This means that the primers must be sufficiently complementary to hybridize with their respective strands. A non-complementary nucleotide fragment may be attached to the 5'-end of the primer (5' tail portion) or in the primer (bulge portion), with the remainder of the primer sequence being complementary to the desired section of the target base sequence. Commonly, the primers are complementary, except when non-complementary nucleotides may be present at a predetermined primer terminus or middle region as described. Expressed another way, the primers herein are selected to be substantially identical to a strand of each specific sequence to be amplified. This means that the primers must be sufficiently identical to one strand, so that they can hybridize with their respective other strands.
As used herein, the term "complementary" refers to the ability of two nucleotide sequences to bind sequence-specifically to each other by hydrogen bonding through their purine and/or pyrimidine bases according to the usual Watson-Crick rules for forming duplex nucleic acid complexes. It can also refer to the ability of nucleotide sequences that may include modified nucleotides or analogues of deoxyribonucleotides and ribonucleotides to bind sequence-specifically to each other by other than the usual Watson-Crick rules to form alternative nucleic acid duplex structures.
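For plain DNA sequences, Watson-Crick complementarity can be checked computationally by comparing one sequence with the reverse complement of the other. The sketch below covers only the four standard bases and exact matches; modified nucleotides, analogues and partial complementarity are outside its scope, and the function names are illustrative.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]

def is_complementary(a, b):
    """True if a and b (both written 5' to 3') form a perfect antiparallel duplex."""
    return reverse_complement(a) == b

def anneals_to(primer, template):
    """True if the primer is perfectly complementary to some stretch of the template."""
    return reverse_complement(primer) in template
```

For example, a primer `ATGC` anneals to a template containing `GCAT`, because both strands are read 5' to 3' and pair in antiparallel orientation.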
As used herein, the terms "hybridization" and "annealing" are interchangeable, and refer to the process by which two nucleotide sequences complementary to each other bind together to form a duplex sequence or segment.
The terms "duplex" and "double-stranded" are interchangeable, meaning a structure formed as a result of hybridization between two complementary sequences of nucleic acids. Such duplexes can be formed by the complementary binding of two DNA segments to each other, two RNA segments to each other, or of a DNA segment to an RNA segment, the latter structure being termed as a hybrid duplex. Either or both members of such duplexes can contain modified nucleotides and/or nucleotide analogues as well as nucleoside analogues. As disclosed herein, such duplexes are formed as the result of binding of one or more blocking oligonucleotides to a sample sequence.
As used herein, the terms "wild-type nucleic acid", "normal nucleic acid", "nucleic acid with normal nucleotides", "wild-type DNA" and "wild-type template" are used interchangeably and refer to a polynucleotide which has a nucleotide sequence that is considered to be normal or unaltered.
As used herein, the terms "mutant polynucleotide", "mutant nucleic acid", "variant nucleic acid", and "nucleic acid with variant nucleotides" refer to a polynucleotide which has a nucleotide sequence that is different from the nucleotide sequence of the corresponding wild-type polynucleotide. The difference in the nucleotide sequence of the mutant polynucleotide as compared to the wild-type polynucleotide is referred to as the nucleotide "mutation", "variant nucleotide" or "variation." The term "variant nucleotide(s)" also refers to one or more nucleotide(s) substitution, deletion, insertion, methylation, and/or modification changes.
"Amplification" as used herein denotes the use of any amplification procedures to increase the concentration of a particular nucleic acid sequence within a mixture of nucleic acid sequences.
The terms "reaction mixture", "amplification mixture" or "PCR mixture" as used herein refer to a mixture of components necessary to amplify at least one product from nucleic acid templates. The mixture may comprise nucleotides (dNTPs), a thermostable polymerase, primers, and a plurality of nucleic acid templates. The mixture may further comprise a Tris buffer, a monovalent salt and Mg2+. The concentration of each component is well known in the art and can be further optimized by an ordinary skilled artisan.
The terms "amplified product" or "amplicon" refer to a fragment of DNA amplified by a polymerase using single-side primers or a pair of primers in an amplification method.
The term "primer extension product" refers to a fragment of DNA extended by a polymerase using one or a pair of primers in a reaction, which may involve one pass extension, for example first strand cDNA synthesis, or two pass extension, for example double strand cDNA synthesis, or many cycles of extension, which may be a PCR.
The practice of the present invention will employ, unless otherwise indicated, conventional techniques of molecular biology, microbiology and recombinant DNA techniques, which are within the skill of a person skilled in the art. All patents, patent applications, and publications mentioned herein, both supra and infra, are hereby incorporated by reference.
In one aspect (Fig. 1 and 2), the invention provides methods of processing target nucleic acids from one or more samples, wherein a target nucleic acid in a sample may be a single-stranded molecule (which is referred to as first strand, wherein its complement is referred to as second strand) or a double-stranded duplex which comprises a first strand and a complementary second strand, wherein the method comprises:
(a) providing reaction mixture(s), each comprising a first set of multiple target specific primers annealing to multiple target sequences, wherein for any particular target sequence, forward primers are designed to hybridise to the first strands of the target sequences and reverse primers are designed to hybridise to the second strands of the target sequences, wherein for one reaction the set of the target specific primers comprises either forward primers or reverse primers but not both;
(b) single-side linear amplification of the target sequences to generate single-stranded linear amplification products;
(c) treating the products of step (b) to enrich the products;
(d) PCR amplifying the products of step (c) using primers to generate double-stranded PCR products, wherein the product of this step may be used directly for sequencing.
The method may further comprise step (e) processing the PCR products of step (d) to complete the sequencing library preparation for massive parallel sequencing such as a NGS platform.
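The difference between the single-side linear amplification of step (b) and the PCR of step (d) can be made concrete with an idealised yield calculation. The sketch assumes perfect or uniform per-cycle efficiency and ignores reagent depletion; the function names are illustrative and the numbers are theoretical, not measured.

```python
def linear_copies(templates, cycles, efficiency=1.0):
    """New single strands made when only one primer per target is present.

    Each cycle copies the original template once, so the product grows in
    proportion to the cycle number (20 cycles give about 20-fold).
    """
    return templates * cycles * efficiency

def exponential_copies(templates, cycles, efficiency=1.0):
    """Molecules after PCR with a primer pair: products are themselves copied."""
    return templates * (1 + efficiency) ** cycles

# 100 starting molecules, 20 cycles, ideal efficiency.
linear = linear_copies(100, 20)
exponential = exponential_copies(100, 20)
```

Because every linear product is copied directly from an original strand, a polymerase error made in one copy is not propagated into later copies, which is not the case once exponential PCR begins.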
The step (c) and/or step (e) may comprise removing the unreacted primers, wherein the removing of the unreacted primers may comprise purifying the single-stranded linear amplification products of step (b) or double-stranded product of step (e), for example a bead or column based method is used to remove unreacted primers. The removing of the unreacted primers may comprise treating the amplification products by enzymatic digestion to remove the unreacted primers, wherein the enzymatic digestion may be exonuclease I digestion.
The step (c) may comprise hybridising the single-stranded amplification products to a second set of target-specific primers. The hybridised target-specific primers of the second set of primers may be extended on the single-stranded amplification products (one-pass extension).
Optionally, the target-specific primer may comprise an affinity label or a 5' universal tail portion, wherein the 5' universal tail portion of the hybridised target-specific primers is hybridised with an affinity-labelled oligonucleotide complementary to the 5' universal tail. The affinity label may be biotin, in which case the complex of the hybridised amplification products/target-specific oligonucleotides/biotin-labelled oligonucleotide is captured by avidin solid supports.
The target specific primer may comprise a 5' tail portion and a 3' target complementary portion (Fig. 4). Alternatively, the target specific primer may comprise a bulge portion which is non-complementary to a target sequence and is located between the two target complementary portions; in other words, the bulge portion interrupts the target complementary portion of a primer, creating a complementary 5' part, a complementary 3' part and a non-complementary bulge in the middle. The bulge portion may be located 4-15 nucleotides from the 3' end, preferably 5-10 nucleotides from the 3' end. Although a primer may have a bulge portion, it can still function as a primer. The 5' tail portion or bulge portion not complementary to the target sequence may comprise a random sequence identifier (RSI), or/and a sequence compatible with an NGS platform, which may comprise a universal PCR primer sequence, an NGS sequencing primer sequence, and/or NGS adaptor sequences. In step (a), the first set of target-specific primers is present in a reaction, wherein the target-specific primer in the first set is capable of hybridising to the first strand of a target duplex. Since there is no primer pair annealing to both strands of a target sequence, linear amplification occurs in step (b). In step (c) and/or step (d), there may be a second set of the target-specific primers present in the reaction for enriching, one-pass extension of, or amplifying the products. The second set of primers is capable of hybridising to the extension strands generated from the first set of primers.
The target-specific primers in the first set or second set may comprise a random sequence identifier (RSI) which is located between the 5' tail portion and the 3' target complementary portion or in the bulge portion, wherein the RSI portion comprises at least three random or degenerate nucleotides, wherein during step (b) or/and step (c) the RSI assigns each extended strand a unique sequence identifier such that, during sequence analysis based on the unique RSI, the sequences sharing the same RSI are grouped into a family. The random sequence identifier may comprise a sequence that is between approximately 3 and 20 nucleotides in length.
Specifically, the RSI portion may comprise at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 random or degenerate nucleotides, wherein during linear amplification step (b) or step (c) the RSI assigns each amplified strand a unique sequence identifier such that, during sequence analysis based on the unique RSI, the sequences sharing the same RSI are grouped into a family (Fig. 3). The random sequence identifier may comprise a degenerate, semi-degenerate or completely random nucleic acid sequence.
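The relationship between RSI length and the number of distinguishable strands can be illustrated with a short calculation. The sketch below is not part of the described method; it assumes a completely random RSI in which each position is drawn uniformly and independently from the four bases, which is an idealisation, and the function names are hypothetical.

```python
def rsi_diversity(length: int) -> int:
    """Number of distinct completely random RSIs of the given length (4 bases per position)."""
    return 4 ** length


def collision_probability(num_molecules: int, length: int) -> float:
    """Probability that at least two of num_molecules tagged strands share an RSI.

    Assumes every RSI is drawn uniformly and independently (an idealisation).
    """
    n = rsi_diversity(length)
    if num_molecules > n:
        return 1.0  # more strands than identifiers: a shared RSI is unavoidable
    p_all_distinct = 1.0
    for i in range(num_molecules):
        p_all_distinct *= (n - i) / n
    return 1.0 - p_all_distinct
```

Under these assumptions a 3-nucleotide RSI offers only 64 identifiers, whereas a 15-nucleotide RSI offers 4^15, so a longer RSI makes it less likely that two independent template strands are merged into one family.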
The target specific primer may comprise a multiplex identifier (MID) located between the 5' tail portion and the 3' target complementary portion or in the bulge portion, wherein the MID is used to identify a sample when multiple samples are sequenced together. The target specific primer may comprise both an RSI and an MID located between the 5' tail portion and the 3' target complementary portion. The relative order of the RSI and MID is not critical. It may be preferred that the MID is located 3' of the RSI such that a primer may have the structure 5' tail portion - RSI - MID - 3' target complementary portion (Fig. 2C).
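The preferred layout, 5' tail portion - RSI - MID - 3' target complementary portion, can be sketched as simple string assembly. This is an illustration only: the function names are hypothetical, the tail, MID and target sequences used in the usage note are arbitrary placeholders rather than sequences from this disclosure, and a completely random RSI is assumed.

```python
import random

BASES = "ACGT"


def random_rsi(length, rng):
    """Draw a completely random RSI of the requested length."""
    return "".join(rng.choice(BASES) for _ in range(length))


def build_tailed_primer(tail, mid, target_portion, rsi_length=8, rng=None):
    """Assemble a primer as 5' tail - RSI - MID - 3' target complementary portion.

    All sequences are written 5' to 3'. Returns the RSI and the full primer.
    """
    rng = rng or random.Random()
    rsi = random_rsi(rsi_length, rng)
    return {"rsi": rsi, "sequence": tail + rsi + mid + target_portion}
```

For example, `build_tailed_primer("ACGTACGTACGTACGTACGT", "ATCACG", "GGTGGAGCTGGTGGC")` returns a primer whose first 20 bases are the placeholder tail, followed by 8 random RSI bases, the 6-base placeholder MID and the placeholder target complementary portion at the 3' end.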
The step (c) may comprise purifying the single-stranded linear amplification products of step (b). For example, enzymatic digestion or a bead or column based method may be used to remove unreacted primers. The purification method removes the unreacted primers. Any method can be used; a preferred method uses Agencourt AMPure XP beads from Beckman Coulter. After digestion or purification, the purified product may proceed immediately to step (d). In step (d), the PCR primers may comprise a second set of target-specific primers annealing to the linear amplification product, and a universal primer which is related to the 5' tail portion of the primers of the first set. If each primer of the first set comprises a bulge portion without a 5' tail portion, in step (c) the linear amplification product may be purified, for example by bead purification, and in step (d) the PCR primers may include a second set of target specific primers annealing to the linear amplification product, and a third set of target specific primers related to the 5' part sequence of the bulge primers of the first set. As used herein, "related" means comprising the same sequence or a similar sequence; for example, similar may mean sharing at least 95, 96, 97, 98 or 99% sequence identity. In one embodiment the universal primer is capable of hybridising to the 5' tail portion of primers of the first set. In one embodiment the universal primer is capable of hybridising to the 5' tail portion of primers of the second set. In one embodiment the universal primer is capable of hybridising to the copied part of the 5' tail portion of the primers of the first set. In one embodiment the universal primer is capable of hybridising to the copied part of the 5' tail portion of the primers of the second set.
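The percentage thresholds given for "related" sequences can be checked with a simple identity calculation. The sketch below is one possible reading, not a definition from this disclosure: it assumes the two sequences are already aligned without gaps and have equal length, and the function names and the default 95% threshold are illustrative choices.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length, ungapped sequences match."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def is_related(a: str, b: str, threshold: float = 95.0) -> bool:
    """True if the sequences share at least `threshold` percent identity."""
    return percent_identity(a, b) >= threshold
```

With a 20-nucleotide tail, one mismatch gives 95% identity and still counts as related at the lowest stated threshold, while two mismatches give 90% and do not.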
The step (c) may comprise hybridising the single-stranded single-side amplification products to a second set of multiple target-specific primers which are capable of annealing to the linear amplification products generated from the first set of the target-specific primers.
RSI is preferably incorporated into primer extended target nucleic acids in steps (a) and (b), but RSI may also be incorporated into target nucleic acids in step (c). In one embodiment, when the target-specific primer in the first set comprises only a 3' target complementary region without a 5' tail or bulge portion, each primer in the second set comprises a 5' tail portion or bulge portion, which comprises a random sequence identifier (RSI). In step (c), after removing the unreacted primers of the first set, the annealed primers of the second set may be extended on the templates generated from step (b), wherein the RSI is incorporated into the extended target nucleic acids. The extension may be done once or twice, or more than two times, which may be achieved by temperature cycling through denaturing, annealing and extension. In this embodiment, in step (d) the PCR primers may include a third set of target specific primers nested to the first set of target specific primers, and the universal primer related to the 5' tail sequence of the primers of the second set if the primers in the second set comprise a 5' tail portion. Alternatively, in step (d) the PCR primers may include a third set of target specific primers nested to the first set of target specific primers, and a fourth set of target specific primers related to the 5' part sequence of the bulge primers of the second set if the primers in the second set comprise a bulge portion. Nested primers for use in the PCR amplification are oligonucleotides having a sequence complementary to a region on a target sequence between the reverse and forward primer targeting sites. One primer is called the outer primer; its nested primer is called the inner primer. The nested inner primer may overlap with its outer primer.
In one embodiment, in step (c), to enrich the linear amplification product, the hybridised target-specific primers of the second set may be extended on the templates of the single-stranded single-side amplification products. The extension reaction may be performed in the same reaction vessel as the linear amplification reaction. After linear amplification, with or without removing the unreacted primers of the first set, the target-specific primers of the second set are added into the reaction, heat denatured, and subjected to hybridisation/extension conditions. The extension conditions may include the same ingredients as the linear amplification reaction. The extension may be performed under cycling conditions to extend the oligonucleotides several times, but preferably the extension is performed only once or twice. The extended double-strand products may be purified by any means known in the art, for example the Qiagen PCR purification kit or the Agencourt AMPure XP kit.
In another embodiment, the target-specific primer in the second set may comprise a 5' universal tail, wherein the 5' universal tail portion of the target-specific primers may be hybridised with an affinity-labelled oligonucleotide complementary to the 5' universal tail (Fig. 4D). The affinity label may be biotin, in which case the complex of the linear amplification products/target-specific oligonucleotides/biotin-labelled oligonucleotide may be captured by avidin solid supports.
The target specific primer of step (a) may be an ordinary primer comprising a target complementary sequence only. Preferably, the target specific primer of step (a) may comprise a 5' tail portion and a 3' target complementary portion. The 3' target complementary portion is used to hybridise to the target sequence and prime DNA synthesis. The 5' tail portion may comprise a random sequence identifier, or/and a sequence compatible with the subsequent amplification or/and sequencing process on an NGS platform (Fig. 4). For example, the 5' tail portion may comprise a sequence compatible with the primer used in the NGS. Alternatively, the target specific primer may comprise a 3' target complementary portion which is disrupted by a bulge portion located at 3, 4, 5, 6, 7, 8, 9, 10 or more nucleotides from the 3' end of the primer. The bulge portion may have any desired length. The bulge portion may be solely RSI, which is 3-20 nucleotides long. The 5' tail portion or the bulge portion is not complementary to the initial target sequence (Fig. 4). The 5' tail portion or the bulge portion of the primer may comprise a random sequence identifier or/and a sequence compatible with an NGS platform. When the primer with a bulge portion anneals to the initial nucleic acid target template, the hybridisation creates an unpaired base bulge. The bulge portion, like the 5' tail portion, may comprise a restriction site, RSI, MID or a sequence compatible with NGS sequencing, such as a sequencing primer.
In step (a), primers for only one side of a particular target are present in the reaction so that single-stranded linear amplification products are generated in step (b). For a single-stranded initial RNA target (referred to as the first strand), the target specific forward primers complementary to the RNA template may be present in the reaction, but no reverse primers are in the same reaction. For double-stranded DNA templates, the target specific forward primers complementary to the first strands of the DNA templates are present in the forward reaction, but no reverse primers are in the same forward reaction. In a second, separate reaction, the target specific reverse primers complementary to the second strands of the DNA templates are present in the reverse reaction, but no forward primers are present in the same reverse reaction (Fig. 3).
When primers anneal to the target sequences, in the presence of reagents for linear amplification, step (b) is carried out. The linear single-side amplification can be isothermal amplification. Preferably, the linear single-side amplification is a thermal cycling amplification involving temperature cycling, including a denaturing step and an annealing/extension step. The cycle number can be any suitable number, which may be between 1 and 100 cycles, for example 1 cycle, 2 cycles, 3 cycles, 10 cycles, 15 cycles, 20 cycles, 25 cycles, 30 cycles, 35 cycles, 40 cycles, 45 cycles, 50 cycles, 60 cycles or 100 cycles.
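The difference between single-side linear amplification and conventional two-primer PCR can be made concrete with an idealised yield calculation. The sketch below is illustrative only: it assumes that in linear amplification only the original template strands are copied in each cycle, that a fixed fraction of templates (the hypothetical `efficiency` parameter) is extended per cycle, and that primers and reagents never run out.

```python
def linear_products(templates, cycles, efficiency=1.0):
    """New single strands made by single-side extension.

    Each cycle copies only the original templates, so the yield grows in
    proportion to the number of cycles.
    """
    return templates * cycles * efficiency


def exponential_products(templates, cycles, efficiency=1.0):
    """Idealised two-primer PCR for comparison.

    Products also serve as templates, so the copy number is multiplied by
    (1 + efficiency) in every cycle.
    """
    return templates * (1 + efficiency) ** cycles
```

Under these assumptions, 100 template strands give 2,000 new strands after 20 linear cycles, compared with 100 x 2^20 copies after 20 idealised PCR cycles, which is why each linear product is an independent copy of an original template rather than a copy of a copy.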
After step (b), the reaction can proceed immediately to step (d) without any purification or enrichment step. It is preferred that the primers remaining after the reaction of step (b) are kept at a considerably low level, so that they do not interfere with the next step. One way to achieve this is for the primers to be consumed in the linear amplification so that they reach a very low level at the end of linear amplification. For this to happen, the primers added in the starting reaction must be in a very small amount, so that most primers are consumed after linear amplification. Alternatively, an optional purification or enrichment in step (c) may be carried out. Any purification method can be used to remove the unreacted primers, for example purification using beads. Alternatively, enrichment of the desired linear amplification product may be carried out. Any enrichment method to enrich the linear amplification products can be used. The step (c) may comprise hybridising the linear amplification products to a second set of multiple target-specific primers. The same second set of the target-specific primers may be used in step (c) or/and step (d). Alternatively, step (d) may use a different set of target-specific primers or may not use target specific primers. In one embodiment, the hybridised second set of the target-specific primers may be extended on the templates of the linear amplification products (one-pass extension). The extension reaction may be performed in the same reaction. The extended double-strand products may be purified by any means known in the art. The purified extended products are amplified in step (d).
In step (d), the primers used for amplification may comprise a first universal primer and a second universal primer, wherein the first universal primer comprises a sequence related to the 5' tail portion sequence of primers in the first set, and the second universal primer comprises a sequence related to the 5' tail portion sequence of the second set of the target-specific primers. Alternatively, in step (d) the primers used for amplification may comprise a universal primer related to the first set of primers and a second set of multiple target specific primers, wherein the second set of multiple target specific primers is capable of hybridising to the extended products of the first set of the primers, wherein the universal primer comprises a sequence related to the 5' tail portion sequence of primers in the first set. Alternatively, in step (d) the primers used for amplification may comprise a second set of multiple target specific primers, wherein the second set of multiple target specific primers is capable of hybridising to the extended products of the first set of the primers, and a third set of multiple target specific primers, which are nested primers relative to the first set, or are related to the 5' part of the bulge primers of the first set.
When the reaction mixture of step (a) comprises target specific primers with a bulge portion, step (c) may comprise exonuclease I treatment or/and purifying the product of step (b) to remove the unreacted primers; in step (d) the purified product of step (c) is amplified by a second set of target specific primers comprising 3' priming sequences capable of hybridising to the purified linear amplified product of step (b) and a third set of target specific primers comprising 3' priming sequences which are identical or substantially identical to the 5' part of the bulge portion of the first set of target specific primers (Fig. 2).
The linear amplification products may be enriched by hybridising to probes on a solid support. The probes bind the desired linear amplification product specifically. Since the first set of target-specific primers is used in linear amplification, the pairing second set of primers capable of hybridising to the single-stranded linear product of step (b) may be used in step (c) as probes to enrich the target sequence. The term "pairing" means that, if one primer is a forward primer, the pairing primer is a reverse primer. The target specific primers may comprise a 5' tail portion and a 3' target complementary portion (Fig. 1 and 4). An affinity-labelled oligonucleotide is complementary to the 5' tail portion or bulge portion of the target specific primers. The affinity label may be biotin. The linear amplification products are hybridised to the target specific primers, which are then hybridised to the biotin-labelled oligonucleotide through the 5' tail portion or bulge portion. The biotin-labelled oligonucleotides are then pulled down by streptavidin beads (Fig. 4D). All unreacted primers, template DNA and non-specific products are removed by the enrichment. In particular, if in the forward reaction the primers are forward primers, the linear amplification product from the forward reaction may be enriched by hybridising to the target specific reverse primers, which either comprise an affinity label, or comprise a 5' tail portion/bulge portion which is hybridised to a universal oligonucleotide which comprises an affinity label.
The capture of the linear amplification products can be performed either on a solid phase or in a liquid phase. Typically, the capture operation of the enrichment will employ hybridisation to probes representing multiple target nucleic acids. On a solid phase, non-binding fragments are separated from binding fragments. Suitable solid supports known in the art include filters, glass slides, membranes, beads, columns, etc. In a liquid phase, a capture reagent can be added which binds to the probes, for example through a biotin-avidin type interaction. After capture, desired fragments can be eluted for further processing.
In step (d), primers used to generate double-stranded PCR products may comprise target specific forward primers and target specific reverse primers. If the primers in the reaction of step (a) are forward primers, another set of the target specific forward primers of step (d) may be nested primers relative to the forward primers of step (a). Alternatively, in step (d), primers used to generate double-stranded PCR products may comprise a universal primer and a second set of multiple target specific primers. The second set of multiple target specific primers comprises either reverse primers or forward primers but not both, wherein the universal primer comprises a sequence related to the 5' tail portion sequence or bulge portion of primers in the first set. If in the forward reaction of steps (a) and (b) the target specific primers are forward primers, which comprise a 3' target complementary portion and a 5' tail portion, the primers used in the forward reaction of step (d) comprise a second set of target specific reverse primers and a universal primer, which is capable of targeting the 5' tail portion of the primers used in steps (a) and (b). If in the reverse reaction of steps (a) and (b) the target specific primers are reverse primers, which comprise a 3' target complementary portion and a 5' tail portion, the primers used in the reverse reaction of step (d) comprise a second set of target specific forward primers and a universal primer, which is capable of targeting the 5' tail portion of the primers used in steps (a) and (b).
The single-stranded starting molecule may be RNA, or single-stranded cDNA. The double-stranded duplex may be genomic DNA, or any suitable dsDNA present in a sample. In step (a), the reaction mixtures may comprise two reactions: a forward reaction and a reverse reaction. The forward reaction comprises a first set (forward set) of multiple target specific forward primers annealing to the first strands of the multiple target sequences from one sample, and the reverse reaction comprises a first set (reverse set) of multiple target specific reverse primers annealing to the second strands of the multiple target sequences from the same sample. In step (d), the primers used to generate PCR products may comprise a universal primer targeting the 5' tail portion of the first set of primers and another universal primer targeting the 5' tail portion of the second set of primers if step (c) comprises enriching the linear amplification products by hybridising and extension of the second set of the target-specific primers. Alternatively, the primers used to generate PCR products in step (d) may comprise a universal primer targeting the 5' tail portion of the first set of primers and a second set of multiple target specific primers annealing to the second strands of the multiple target sequences. Alternatively, the primers used to generate PCR products in step (d) may comprise a universal primer targeting the 5' tail portion of the first set of primers and a third set of multiple target specific primers annealing to the second strands of the multiple target sequences, wherein the third set of the target-specific primers (inner primers) is nested to the second set of the target-specific primers (outer primers). The universal primers in the forward and reverse reactions may be the same.
The reaction mixtures may comprise multiple reactions for more than one sample, which may be two samples, three samples or more than three samples, or more than 10 samples.
Different samples may be processed together in parallel. Each sample may comprise two reactions: a forward reaction and a reverse reaction. Different sample reactions (all forward reactions, or all reverse reactions) may preferably be mixed in step (c) or (d), where the identity of each sample is assigned in the linear amplification by target specific primers having an MID. All forward reactions or reverse reactions after linear amplification may be processed in one mixture in step (c) and subsequent steps.
In step (d) or (e), the PCR products may be purified and ready for sequencing, or may be further amplified in another PCR to add universal primers used for sequencing. In this step, all forward reactions and reverse reactions may be mixed and amplified by using universal primers, which target the 5' tail portion of the target specific primers used in step (a) or/and step (d). The PCR products may then be purified and size selected ready for NGS sequencing.
The method further comprises analysing the NGS reads derived from the forward reaction and the reverse reaction, which represent two different strands of target sequences, comprising generating error-corrected consensus sequences by (i) grouping the reads into families containing the same random sequence identifier sequences; (ii) removing the target sequences of the same family having one or more nucleotide positions where the target sequence disagrees with the majority of members; and (iii) examining whether the same mutations appear in the two reactions, which represent different strands of a target sequence.
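Steps (i) to (iii) above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the analysis pipeline of the invention: reads are given as hypothetical (RSI, sequence) pairs, all sequences are already trimmed to the same length and oriented to the same reference strand, and a mutation is reported as a (position, reference base, observed base) tuple.

```python
from collections import Counter, defaultdict


def group_by_rsi(reads):
    """Step (i): group (rsi, sequence) reads into families sharing the same RSI."""
    families = defaultdict(list)
    for rsi, seq in reads:
        families[rsi].append(seq)
    return dict(families)


def family_consensus(seqs):
    """Majority base at each position of a family of equal-length sequences."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))


def variants(consensus, reference):
    """Positions where the consensus differs from the reference."""
    return {(i, ref, alt)
            for i, (ref, alt) in enumerate(zip(reference, consensus))
            if ref != alt}


def call_reaction(reads, reference):
    """Mutations supported by the families of one reaction."""
    found = set()
    for seqs in group_by_rsi(reads).values():
        consensus = family_consensus(seqs)
        # Step (ii): discard family members that disagree with the majority.
        kept = [s for s in seqs if s == consensus]
        if kept:
            found |= variants(consensus, reference)
    return found


def confirmed_variants(forward_reads, reverse_reads, reference):
    """Step (iii): keep only mutations seen in both forward and reverse reactions."""
    return call_reaction(forward_reads, reference) & call_reaction(reverse_reads, reference)
```

A mutation that appears in only one reaction, for example a polymerase error in a single-member family, is dropped by the intersection in `confirmed_variants`, whereas a mutation present on both strands of the original duplex is retained.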
The method further comprises analysing the NGS reads derived from the forward reaction and the reverse reaction, which represent two different strands of target sequences, comprising generating consensus sequences by grouping the reads into families containing the same random sequence identifier (RSI) sequences, and counting the number of families. This method provides an accurate count of the number of original target nucleic acids present in a sample.
The methods can be used to quantitate the starting molecules, although the single-side amplification may distort the apparent number of original target molecules. Nevertheless, counting the RSI families of a target sequence in comparison with other samples, or comparing between the forward reaction and the reverse reaction, may provide accurate counting information.
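Counting RSI families per target reduces to counting distinct identifiers, regardless of how many reads each family contains. The sketch below is a minimal illustration; the input format, a list of hypothetical (target name, RSI) pairs taken from aligned reads, and the function name are assumptions.

```python
from collections import defaultdict


def count_rsi_families(reads):
    """Return {target name: number of distinct RSIs observed for that target}.

    `reads` is an iterable of (target_name, rsi) pairs; repeated reads that
    carry the same RSI for the same target are counted once.
    """
    seen = defaultdict(set)
    for target, rsi in reads:
        seen[target].add(rsi)
    return {target: len(rsis) for target, rsis in seen.items()}
```

Running the same function on the forward and reverse reactions, or on different samples, gives family counts that can be compared directly, because read depth within a family does not affect the count.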
The present invention further provides a kit for performing a method according to one or more of the preceding methods, comprising: providing reaction mixture(s), each comprising a first set of multiple target specific primers annealing to multiple target sequences, wherein for any particular target sequence, forward primers are designed to hybridise to the first strands of the target sequences and reverse primers are designed to hybridise to the second strands of the target sequences, wherein the set of the target specific primers in the forward reaction comprises forward primers and the set of the target specific primers in the reverse reaction comprises reverse primers; wherein the target specific primer comprises a 5' tail portion and a 3' target complementary portion, or/and the target specific primer comprises a bulge portion, both the 5' part and the 3' part of which are target specific sequences capable of hybridising to the target sequence; wherein the target-specific primer in the first set or second set comprises a random sequence identifier (RSI) located between the 5' tail portion and the 3' target complementary portion or in the bulge portion, wherein the RSI portion comprises at least three random or degenerate nucleotides, wherein during step (b) or step (c) the RSI assigns each extended strand a unique sequence identifier such that, during sequence analysis based on the unique RSI, the sequences sharing the same RSI are grouped into a family;
wherein the reaction mixtures are capable of carrying out linear amplification of the target sequences to generate single-stranded linear amplification products; optionally purifying or enriching reagents for purifying or enriching the single-stranded linear amplification products; and PCR amplifying reagents for amplifying the single-stranded linear amplification products using primers to generate double-stranded PCR products; wherein the primers and reagents are as described in the preceding methods.
A target-specific primer may comprise a random sequence identifier (RSI) between the 5' universal tail and the 3' target complementary portion or in the bulge portion. The purpose of the RSI is twofold. The first is the assignment of a unique RSI to each DNA template molecule. The second is the amplification of each uniquely tagged template, so that many daughter molecules with the identical RSI sequence are generated (defined as an RSI family). If a mutation pre-existed in the template molecule used for amplification, that mutation should be present in every daughter molecule containing that RSI.
A target-specific oligonucleotide may further comprise a fixed multiplexing barcode sequence between the 5' universal tail and the 3' target complementary portion or in the bulge portion. The barcode sequence and the RSI may both be present; the barcode can be located 5' or 3' of the RSI.
The universal primers may contain two or more terminal phosphorothioates to make them resistant to any exonuclease activity. They may also contain 5'-grafting sequences necessary for hybridization to an NGS flow cell, for example the Illumina GA IIx flow cell. Finally, they may contain an index sequence between the grafting sequence and the universal tag sequence. This index sequence enables the PCR products from multiple different individuals to be simultaneously analysed in the same flow cell compartment of the sequencer.
The target nucleic acid sequence may comprise a nucleic acid fragment or gene which contains variant nucleotide(s), and may be selected from the group consisting of disorder-associated SNP/deletion/insertion, chromosome rearrangement, trisomy, or cancer genes, drug-resistance gene, and virulence gene. The disorder-associated gene may include, but is not limited to, cancer-associated genes and genes associated with a hereditary disease.
The variant nucleotide(s) in the diagnostic region of the target polynucleotide sequence may include one or more nucleotide substitutions, chromosome rearrangement, deletions, insertions and/or abnormal methylation.
DNA methylation is an important epigenetic modification of the genome. Abnormal DNA methylation may result in silencing of tumor suppressor genes and is common in a variety of human cancer cells. In order to detect the presence of any abnormal methylation in the target polynucleotide, a preliminary treatment should be conducted prior to the practice of the present method. Preferably, the nucleic acid sample should be chemically modified by a bisulphite treatment, which converts unmethylated cytosine to uracil but leaves methylated cytosine unchanged (i.e., 5-methylcytosine is resistant to this treatment and remains as cytosine). With these modifications, the method of this invention can be applied to the detection of abnormal methylation(s) in the target nucleic acid.
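The effect of bisulphite treatment on a sequence can be simulated in silico, which is useful when designing target specific primers against converted templates. The sketch below is an idealisation that assumes complete conversion of every unmethylated cytosine; the function name and the representation of methylation as a set of 0-based positions are illustrative choices.

```python
def bisulphite_convert(sequence, methylated_positions=()):
    """Simulate complete bisulphite conversion of one DNA strand.

    Unmethylated cytosines become uracil (U); cytosines at the 0-based
    positions listed in `methylated_positions` are treated as
    5-methylcytosine and remain C.
    """
    methylated = set(methylated_positions)
    return "".join(
        "U" if base == "C" and i not in methylated else base
        for i, base in enumerate(sequence.upper())
    )
```

For example, `bisulphite_convert("ACGCCG", {3})` leaves the methylated cytosine at position 3 as C and converts the cytosines at positions 1 and 4 to U, so the methylation state is read out as a C versus U (T after amplification) difference at that position.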
The present invention provides a method of analysing a biological sample for gene expression. In one embodiment, the unique barcoded RSI is assigned to every linear amplification strand and subsequently is identified during sequence analysis.
The present invention provides a method of analysing a biological sample for the presence and/or the amount of mutations or polymorphisms at multiple loci of different target nucleic acid sequences. In another aspect, the present invention provides a method of analysing a biological sample for a chromosomal abnormality, for example trisomy. The amplification and enriching step may be followed by next generation sequencing, digital PCR, microarray, or other high throughput analysis. The number of multiplexed target loci may be more than 5, or more than 10, or more than 30, or more than 50, or more than 100, or even more than 500.
One limitation of these methods is that when a mutant is very rare in a sample, for example when only one or two mutant molecules are present, after dividing the sample nucleic acid into two reactions only one reaction may contain the mutant. Comparing the sequences of the two strands in the two reactions is then impossible. However, the specificity can be increased by requiring more than one mutant sequencing read in one reaction for mutation identification; the probability of introducing the same artefactual mutation twice or three times would be extremely low.
Instead of matching sequencing reads of the forward and reverse reactions, more than one mutant sequencing read in different RSI molecules in the forward or reverse reaction may also be classified as mutant positive, since the same artefact appearing more than twice during the single-side linear amplification step would be very rare.
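The single-reaction rule described above, in which a mutation is accepted when it is seen in more than one RSI molecule of the same reaction, can be sketched as follows. The input format, a list of hypothetical (RSI, set of mutations) pairs with one entry per family consensus, and the default threshold of two families are assumptions chosen to match "more than one".

```python
from collections import defaultdict


def mutant_positive(family_calls, min_families=2):
    """Return mutations supported by at least `min_families` distinct RSI families.

    `family_calls` is an iterable of (rsi, mutations) pairs from one reaction,
    where `mutations` is a set of hashable mutation labels for that family.
    """
    support = defaultdict(set)
    for rsi, mutations in family_calls:
        for mutation in mutations:
            support[mutation].add(rsi)
    return {m for m, rsis in support.items() if len(rsis) >= min_families}
```

Because each RSI family traces back to an independent linear extension of an original template, a mutation supported by two or more families is unlikely to be the same artefact introduced repeatedly.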
In another aspect, the invention provides methods of linking an individual nucleic acid molecule with a unique sequence identifier or linking nucleic acids of a single cell with a unique sequence identifier, or making a targeted sequencing library from one or more samples, wherein the sequencing library is suitable for massively parallel sequencing and comprises a plurality of double-stranded nucleic acid molecules, wherein the method comprises:
(a) providing at least two populations of primer extension products, which are derived from one or more samples or artificial templates in one reaction or more separate reactions which are performed in one chamber or more separate chambers, wherein forward and reverse primers are designed for each target or artificial template, and each reaction is a primer extension reaction or PCR amplification, comprising a DNA polymerase and a set of forward and reverse primers;
(b) linking the primer extension products of a first population with the primer extension product of a second population through end joining, wherein it is designed that the reverse primer side of the primer extension product is the end for end joining;
(c) processing the linked products for sequencing, which comprises optionally amplifying the linked products.
The at least one primer extension reaction is preferably a PCR amplification, which may include RT-PCR with reverse transcription before the PCR. The RT-PCR is preferably performed in one step, in a single reaction. The first population of primer extension product may be derived from a first single molecule or from a limited number of molecules, and the second population of primer extension product may be derived from a second single molecule or a limited number of molecules, with the reaction performed in one chamber, wherein the limited number can be 2, 3, 4, 5, 6, 7, 8, or up to 50. The first single molecule or limited number of molecules may be a target molecule from a sample, and the second single molecule or limited number of molecules may be an artificial template. In another aspect, the first population of primer extension product may be derived from a single cell, and the second population of primer extension product may be derived from a second single molecule, which may be an artificial template, with the reaction performed in one chamber.
The artificial template comprises random sequence identifier (RSI) which comprises a degenerate or semi-degenerate or completely random nucleic acid sequence. The length of RSI can be from 2 nucleotides to 50 nucleotides long, preferably from 3 nucleotides to 30 nucleotides long, or more preferably from 3 nucleotides to 20 nucleotides long.
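To give a feel for how RSI length relates to the number of distinguishable molecules, the following sketch computes the number of possible identifiers for a fully random RSI and an approximate chance that two molecules share one. It assumes every position is an equiprobable A, C, G or T, which is an idealisation; the function names are hypothetical.

```python
import math

def rsi_diversity(length):
    """Number of distinct fully random RSIs of the given length (4 bases per position)."""
    return 4 ** length

def collision_probability(length, n_molecules):
    """Approximate probability that at least two of n molecules receive the same RSI.

    Uses the standard birthday approximation 1 - exp(-n(n-1)/(2D)),
    assuming RSIs are drawn uniformly and independently.
    """
    d = rsi_diversity(length)
    return 1.0 - math.exp(-n_molecules * (n_molecules - 1) / (2.0 * d))
```

Under these assumptions a 3-nucleotide RSI offers 64 identifiers and a 12-nucleotide RSI about 16.8 million, so longer identifiers are needed as the number of tagged molecules grows.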
The linking may comprise a ligation reaction by a DNA ligase. Before ligation reaction, the primer extension products may be treated by enzyme to allow the ends of the products to be ligated. The treating may comprise restriction enzyme digestion or phosphorylation by a kinase or trimming the ends to be blunt ends.
The linking may comprise an extension reaction by the DNA polymerase, wherein the 3' end part of a forward strand derived from the forward primer of an extended duplex of the first population is complementary to the 3' end part of a forward strand of the forward primer of an extended duplex of the second population. The primer extension reaction may comprise an asymmetric PCR, wherein the ratio of forward primers to reverse primers is more than 1, preferably more than 2. The primer comprises a 3' target complementary portion and a 5' tail portion, or the primer comprises a 3' target complementary portion, which is disrupted by a bulge portion located at 3, 4, 5, 6 or more nucleotides from the 3' end of the primer, wherein the 5' tail portion or the bulge portion is not complementary to the initial target sequence (Fig. 7). The 5' tail portion or the bulge portion of the primer may comprise a restriction enzyme site, such that primer extension products, after digestion, are ligatable (Fig. 7C and D). When the primer with the bulge portion anneals to the primer binding site of the initial nucleic acid target template, the hybridisation creates an unpaired base bulge. The bulge can have any length, depending on what functional sequence it contains. The bulge portion, like the 5' tail portion, may comprise a restriction site, an RSI, or a sequence compatible with NGS sequencing, such as a sequencing primer site.
Alternatively, the 5' tail portion of the primers for the first population comprises sequence complementary to the sequence of the 5' tail portion of the primers for the second population, such that the two populations of the primer extension products can be linked together by an extension reaction (Fig. 7F).
The primer may comprise a RSI portion between the 5' tail portion and the 3' target complementary portion. In another expression version, the 5' tail portion or the bulge portion of at least one set of the primers may comprise random sequence identifier which is used to group sequencing reads to a family of target sequences (Fig. 7 B, C and D). The 5' tail portion or the bulge portion of the primer may comprise a restriction enzyme site and random sequence identifier, wherein the restriction site is located 5' to the random sequence identifier (Fig. 7C and D). The random sequence identifier comprises degenerate or semi-degenerate or completely random nucleic acid sequence. The random sequence identifier may comprise a sequence that is between approximately 3 and 18 nucleotides in length.
The 5' tail portion or bulge portion of the primer may comprise sample multiplex identifier (MID) which is used to group sequencing reads to a sample. The 5' tail portion or the bulge portion may comprise sequence compatible for sequencing (Fig. 7B, C and D).
The linked products may be used directly for next generation sequencing. Alternatively, the linked products may be amplified using primers targeting to the 5' tail portion or the bulge portion of an extended forward primer flanking the target sequences, or primers which hybridise to the internal part of the target sequences, which is commonly referred to as nested primers.
In another aspect, the invention provides a method for high-throughput sequencing and analysis of nucleic acids from a plurality of biological samples comprising: (a) providing in a single reaction chamber at least two populations of primer extension products wherein forward and reverse primers are designed for each target/template sequence;
(b) linking the two populations of the primer extension products through end joining, wherein it is designed that the reverse primer side of the primer extension product is the end for product joining;
(c) processing the linked products for sequencing, which comprises optionally amplifying the linked products.
(d) sequencing the product of (c); and
(e) correlating the nucleic acid sequences to a single sample of the plurality of biological samples through linked correlation identification.
The two populations of primer extension products may be generated from target nucleic acid sequences, which are derived from an individual single sample of the plurality of biological samples. The first population of primer extension product may be generated from a single target sample; the second population of primer extension product may be generated from one barcoded artificial template or no more than five barcoded artificial templates. The single target sample may be a single target molecule. The single target sample may be a single cell. The reaction chamber may be a compartment of water/oil emulsion or droplet. The step (a) may be in a single assay. The biological samples may be cells. The reaction may be performed on target polynucleotides from an isolated single cell or single target molecule. The reaction mixture and the single molecule or single cell may be isolated in an oil/water emulsion or droplet. The barcoded artificial template comprises a random sequence identifier which is used to group sequencing reads to a family of target sequences. The random sequence identifier comprises a degenerate or semi-degenerate or completely random nucleic acid sequence, which comprises a sequence that is between approximately 2 and 30 nucleotides in length. The barcoded artificial template or the 5' tail portion of the reverse primer may comprise a sample sequence identifier which is used to group sequencing reads to a sample.
The linking of two populations may comprise an extension reaction by the DNA polymerase, wherein the 3' end part of a single stranded DNA derived from the forward primer extension product of the first population is complementary to the 3' end part of a single stranded DNA of the forward primer extension product of the second population. The primer extension reaction may comprise an asymmetric PCR, wherein the ratio of forward primers to reverse primers is more than 1, preferably more than 2. In this method, generating the two populations of the primer extension products can be performed in a single PCR reaction. This method of coupling the two populations of amplicons is also called splicing by overlap extension or fusion PCR. Two of the PCR primers have complementary sequences so that the two amplicons, when single-stranded, function as primers and fuse to each other. In this way, a single target molecule, or multiple nucleic acid sequences in a single cell, can be linked to an artificial template, which contains a unique random sequence identifier. The single reaction chamber can be achieved by an emulsion or droplet method. In the emulsion-based method, single molecules or single cells are placed into individual compartments (chambers) of a water-in-oil emulsion. The emulsion may be formed by using physical methods (e.g., vortexing) that depend on Poisson statistics to achieve clonality. Alternatively, emulsions may be generated using microfluidic technology (droplet fusion).
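The reliance on Poisson statistics mentioned above can be illustrated with a short calculation. This sketch assumes that molecules (or cells) are distributed randomly and independently over equally sized compartments, so that occupancy follows a Poisson distribution with mean `lam` per compartment; the names are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k items in a compartment when the mean is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def droplet_occupancy(lam):
    """Fractions of compartments that are empty, singly occupied, or multiply occupied."""
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    return {"empty": empty, "single": single, "multiple": 1.0 - empty - single}
```

Lowering `lam` reduces the share of compartments holding more than one molecule at the cost of more empty compartments, which is the usual trade-off when clonality depends on dilution alone.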
In one embodiment, the target nucleic acid sequence may be a single molecule, which generates a first population of the amplicon. The second population of the amplicon is generated from a single artificial template, which comprises a unique random sequence identifier (the barcode). The single target molecule and single artificial template are isolated in an oil/water emulsion or droplet. The linking of the two populations occurs in the same reaction chamber through extending the complementary ends of the two populations of the amplicons. In another embodiment, the target nucleic acid sequences may be from a single cell, which generates a first population of the amplicon. The second population of the amplicon is generated from a single artificial template, which comprises a unique random sequence identifier (the barcode).
Alternatively, the second population of the amplicon is generated from the same single cell, which comprises unique target sequences, for example the heavy or light chain of antibody or TCR genes. The single cell and single artificial template are isolated in an oil/water emulsion. The linking of the two populations occurs in the same reaction chamber through extending the complementary ends of the two populations of the amplicons. The sequencing and analysis may be of a transcriptome or genome. The reaction is emulsion PCR. The step (a) may be in a single assay. The biological samples may be cells, which may be selected from the group consisting of cells in in vitro culture, stem cells, tumour cells, tissue biopsy cells, hybridomas, blood cells, and tissue section cells, wherein the blood cells may be T-lymphocytes or B-lymphocytes, wherein one of the primers may be an oligo (dT) primer.
In another embodiment, the two populations of initial amplicons are derived from the transcripts of one sample, and are ligated and sequenced. Accurate gene expression analysis can be obtained by counting the families of sequence reads. In another embodiment, each single original transcript may be linked to a unique artificial template through emulsion PCR; the sequence reads sharing the same unique identifier of the artificial template will be counted as one transcript. In a further embodiment (Fig. 6), transcripts of a million individual cells can be sequenced and analysed. Individual cells from a sample will be mixed with a single or two unique artificial templates in an emulsion. In one emulsion chamber there may be one cell with one or two unique artificial templates. Every transcript of interest is amplified by RT-PCR; meanwhile, the unique artificial template is also amplified by PCR. Preferably, before the transcripts are amplified by RT-PCR, the unique artificial template is amplified first to generate enough unique artificial sequence amplicons, which are used to link the transcript amplicons. To amplify the artificial template before amplifying the transcripts, one may use a higher annealing temperature capable of amplifying the artificial template without amplifying the transcripts, as the Tm of the primers for the artificial template is higher than the Tm of the primers for the transcripts. After enough artificial template amplicon has been generated, the annealing temperature can be lowered to amplify the transcripts. This low annealing temperature PCR also promotes the joining (linking) of the two populations of the amplicons together. The linked products may be further amplified and sequenced.
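The counting principle described above, in which all reads sharing one unique identifier count as a single transcript, can be sketched as below. The input layout (pairs of transcript name and identifier) and the function name are assumptions for illustration only.

```python
from collections import defaultdict

def count_transcripts(reads):
    """Count original transcript molecules per gene.

    reads: iterable of (gene, identifier) pairs, one pair per sequencing read.
    Reads that share an identifier are treated as copies of one molecule.
    """
    identifiers = defaultdict(set)
    for gene, identifier in reads:
        identifiers[gene].add(identifier)
    return {gene: len(ids) for gene, ids in identifiers.items()}
```

Because duplicates collapse before counting, the result reflects the number of tagged starting molecules rather than the number of reads.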
In another aspect, the present invention provides a method for multiplex amplifying and enriching multiple mutated target nucleic acid sequences in a sample that may contain a small proportion of mutated sequences in a large wild-type background, for next generation sequencing analysis or other high-throughput detection.
The release of cell-free DNA into the bloodstream from dying tumour cells has been well documented in patients with various types of cancer. Research has shown that circulating tumour DNA can be used as a non-invasive biomarker to detect the presence of malignancy, follow treatment response, or monitor for recurrence. However, current methods of detection have significant limitations. Next Generation Sequencing (NGS) methods have revolutionised genomic exploration by allowing simultaneous sequencing of hundreds of billions of base pairs at a small fraction of the time and cost of traditional methods. However, the error rate of ~1% results in hundreds of millions of sequencing mistakes, which is unacceptable when aiming to identify rare mutants in genetically heterogeneous mixtures, such as tumours and plasma. The methods of this invention overcome these limitations in sequencing accuracy. Mutation-harbouring cfDNA can be obscured by a relative excess of background wild-type DNA; detection has proven to be challenging. The method greatly reduces errors by independently tagging and sequencing each original DNA duplex.
The methods of the present invention can substantially improve the accuracy of massively parallel sequencing. They can be implemented through an RSI in the target specific primer and can be applied to virtually any sample preparation workflow or sequencing platform. The approach can easily be used to identify rare mutants in a population of DNA templates. One of the advantages of the strategy is that it yields the number of templates analysed as well as the fraction of templates containing variant bases. Each of the two strands of one target template in a sample is uniquely tagged and independently sequenced. Comparing the sequences of the two strands results in either agreement or disagreement. Agreement gives the confidence to score a mutation as a true positive. Artifactual mutations introduced during PCR amplification are detectable as errors if the strand sequences of the two populations do not agree with each other.
In one embodiment, during the linear amplification and RSI tagging, many "families" of molecules are created, each of which arose from a single strand of an individual DNA molecule. After sequencing, members of each PCR family are identified and grouped by virtue of sharing the identical RSI tag sequence. The sequences of uniquely RSI tagged family and two strands of target sequences are then compared to create a PCR consensus sequence. This step filters out random errors introduced during sequencing or PCR to yield a set of sequences, each of which derives from an individual molecule of single-stranded DNA.
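The family grouping and consensus step described above can be sketched as follows. This is an illustrative simplification: it assumes reads within a family are already trimmed to the same length and orientation, uses a simple per-position majority vote, and the names `family_consensus` and `min_size` are hypothetical.

```python
from collections import Counter, defaultdict

def family_consensus(reads, min_size=2):
    """Group reads by RSI and build a majority-vote consensus per family.

    reads: iterable of (rsi, sequence) pairs; sequences in one family are
    assumed to be equal in length. Families smaller than min_size are skipped.
    Ties are resolved in favour of the base seen first in the family.
    """
    families = defaultdict(list)
    for rsi, seq in reads:
        families[rsi].append(seq)

    consensus = {}
    for rsi, seqs in families.items():
        if len(seqs) < min_size:
            continue
        # zip(*seqs) walks the family column by column
        consensus[rsi] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus
```

A base that differs in only one member of a family is outvoted, which corresponds to filtering random errors introduced during sequencing or late PCR cycles.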
Next, sequences belonging to the two complementary strands of each target are identified by searching for complementary sequences among sequencing reads. Following partnering of the two strands, the sequences of the strands are compared. A sequence base at a given position is kept only if the read data from each of the two strands match perfectly. The ratios of any mutation in the two strands are also compared; only a similar ratio of the numbers of mutant and normal sequences in the two strands indicates a true mutation positive. Comparing the sequences obtained from both strands eliminates errors introduced during the first round of PCR, where an artifactual mutation may be propagated to all PCR duplicates of one strand and would not be removed by single strand sequencing filtering alone.
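The strand comparison described above can be sketched as a position-by-position check. The example assumes both consensus sequences have already been brought into the same orientation (that is, one strand has been reverse complemented) and are aligned over the same region; the function name and the mask character are assumptions.

```python
def duplex_consensus(strand_a, strand_b, mask="N"):
    """Keep a base only where the two strand consensus sequences agree.

    strand_a, strand_b: consensus sequences of the two strands of one target,
    expressed in the same orientation and of equal length.
    Disagreeing positions are replaced by `mask`.
    """
    if len(strand_a) != len(strand_b):
        raise ValueError("strand consensus sequences must have equal length")
    return "".join(a if a == b else mask for a, b in zip(strand_a, strand_b))
```

A change present on only one strand is masked, whereas a true mutation carried by both strands of the original duplex survives the comparison.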
In addition to their application for high sensitivity detection of rare DNA variants, the barcoded random sequence identifier in the target specific primer can also be used for single- molecule counting to precisely determine absolute DNA or RNA copy numbers. Because tagging occurs before major amplification, the relative abundance of variants in a population can be accurately assessed given that proportional representation is not subject to skewing by amplification biases.
Reagents employed in the methods of the invention can be packaged into kits. Kits include the primers, in separate containers or in a single master mixture container. The kit may also contain other suitably packaged reagents and materials needed for extension, amplification and enrichment, for example buffers, dNTPs, and/or polymerizing means, and for detection analysis, for example enzymes, as well as instructions for conducting the assay.
The methods of the present invention greatly reduce errors by: tagging two strands of any target sequences or linking two populations of the same set of target sequences (or one target sequence and one artificial unique template with random sequence identifier) derived from one or two separate initial preparations with identifiable sequence signatures; tagging each target sequence with random sequence identifier; BiDirectional sequencing the two strands or linked two target sequences. In addition, the methods provide uniform amplification of multiple target sequences. Analysis provides error-corrected consensus sequences by grouping the sequenced uniquely tagged sequences or linked two amplicons into families containing the same pair of the two amplicons, which is further grouped into families containing the same set of random sequence identifier sequences; removing the target sequences of the same family having one or more nucleotide positions where the target sequence disagree with majority members in a family; and same mutations appearing in the two populations would be the true mutations.
The method can be used for detecting mutation in any sample such as FFPE or blood. The accurate counting of sequencing reads which reflect the original molecules present in a sample provides information for copy number variations or for prenatal test for chromosome abnormality.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 depicts a schematic of an illustrative embodiment of the present invention. In a forward reaction, a set of multiple forward primers is hybridised to the first strands of the target sequences. In the presence of ingredients for linear amplification, single-stranded amplification products are generated. The linear amplification may be thermal cycling amplification with one side of primers (not primer pairs). In the linear amplification, only one strand of a target sequence is amplified. For example, if there are 20 cycles, the strand is amplified 20-fold in theory. Each primer has a random sequence identifier, such that each amplified single-stranded product has a unique sequence identifier, which can be identified during sequence analysis. The single-stranded linear amplification product may be enzymatically treated to remove unreacted primers, or purified or enriched. This step is optional, as it may not be necessary if the primers are greatly diminished after linear amplification. The single-stranded linear product is then PCR amplified using forward primers (which may be universal primers or target specific primers) and target specific reverse primers. The PCR products may be further amplified in another PCR to add universal primers used for sequencing. In this step, all forward reactions and reverse reactions may be mixed and amplified by using universal primers, which target the 5' tail portion/bulge portion of the target specific primers used in step (a) and step (d). Then the PCR products may be purified and size selected ready for NGS sequencing.
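The "20 cycles, 20-fold" statement above can be contrasted with exponential PCR in a short calculation. This is a theoretical sketch only; the `efficiency` parameter (fraction of templates copied per cycle) and the function names are assumptions, and real reactions fall short of these ideal yields.

```python
def linear_yield(cycles, efficiency=1.0):
    """Copies made per original strand by single-primer (linear) amplification.

    Each cycle copies only the original strand, so yield grows by addition.
    """
    return cycles * efficiency

def exponential_yield(cycles, efficiency=1.0):
    """Fold amplification of paired-primer PCR, where products are re-copied."""
    return (1.0 + efficiency) ** cycles
```

With ideal efficiency, 20 cycles give 20 copies in the linear case and about a million-fold in the exponential case. In the linear case every copy is made directly from the original strand, so a polymerase error in one copy is not inherited by the others.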
FIG. 2 depicts a schematic of an illustrative embodiment of the present invention. In a forward reaction, a set of multiple forward primers is hybridised to the first strands of the target sequences. The forward primer comprises a bulge portion; alternatively, the forward primer may comprise both a 5' tail portion and a bulge portion. In the presence of ingredients for linear amplification, single-stranded amplification products are generated. The linear amplification may be thermal cycling amplification with one side of primers (not primer pairs). In the linear amplification, only one strand of a target sequence is amplified. For example, if there are 20 cycles, the strand is amplified 20-fold in theory. Each primer comprises a random sequence identifier in the bulge, such that each amplified single-stranded product has a unique sequence identifier, which can be identified during sequence analysis. The single-stranded linear amplification product may be enzymatically treated to remove unreacted primers, or purified or enriched. This step is optional, as it may not be necessary if the primers are greatly diminished after linear amplification. The single-stranded linear product is then PCR amplified using forward primers, which may be universal primers or target specific primers, and target specific reverse primers. The PCR products may be further amplified in another PCR to add universal primers used for sequencing. In this step, all forward reactions and reverse reactions may be mixed and amplified by using universal primers, which target the 5' tail portion/bulge portion of the target specific primers used in step (a) and step (d). Then the PCR products may be purified and size selected ready for NGS sequencing.
FIG. 3 The starting DNA sample is divided into two reactions; each amplifies one strand of a double-stranded target molecule. This amplification is a single-side linear amplification generating a single-stranded product. The primer contains a unique random sequence identifier (RSI), which gives each single-stranded amplification molecule an identity. The single-stranded amplification product is enriched by hybridising a second set of target-specific primers, one-pass extension and purifying, or capturing on beads. The unreacted primers and primer dimers are removed. The enriched product is PCR amplified using primers compatible with an NGS platform, and then proceeds to NGS. The NGS reads derived from the first reaction and the second reaction, which represent two different strands of the target sequences, are analysed, comprising generating error-corrected consensus sequences by (i) grouping into families containing the same set of random sequence identifier sequences; (ii) removing the target sequences of the same family having one or more nucleotide positions where the target sequence disagrees with the majority of members; and (iii) examining whether the same mutations appear in the two reactions, which represent different strands of a target sequence.
FIG. 4 depicts primers and an affinity labelled oligonucleotide. (A) a primer with a 5' tail portion and a 3' target complementary portion (left), a primer with a bulge portion (right). (B) a primer comprising a 5' tail portion, RSI and 3' target complementary portion, or comprising RSI in the bulge portion. (C) a primer comprising a 5' tail portion, RSI, sample MID and 3' target complementary portion. (D) an affinity labelled oligonucleotide hybridises to the 5' tail portion of a primer; the affinity label is attached to a bead.
FIG. 5 depicts a schematic of an illustrative embodiment of the present invention. A single sample, which may be a single target molecule, is present in a plurality of multiple samples, which for example may be a plasma DNA. Multiple single-molecule reactions may be performed in multiple reaction chambers, which can be water/oil emulsion. Each chamber (or a single reaction vessel) may contain a single target molecule (first population), a single barcoded artificial template (second population) and PCR reagents for generating two populations of amplicons. The 5' tail sequence of the reverse primers for the first population is complementary to the whole reverse primer (or the 5' tail sequence of the reverse primers) for the second population. After a few cycles of an initial PCR, in the same reaction vessel under an appropriate condition such as a lowered annealing temperature, the end part of the forward strand of the first population anneals to the end part of the forward strand of the second population, and each then extends using the other strand as template. The initial PCR is preferably performed as asymmetric PCR with the forward primer in a higher amount than the reverse primer. After end joining, the linked products may be sequenced directly. Alternatively, the linked products may be optionally amplified using primers compatible with a next generation sequencing platform. In this method, each artificial template comprises a barcoded random sequence identifier, such that the amplicons from the single target molecule are tagged with such a unique sequence identifier.
FIG. 6 depicts a schematic of an illustrative embodiment of the present invention. A single sample, which may be a single cell, is present in a plurality of multiple samples, which can be a blood tissue. Multiple single cell reactions may be performed in multiple reaction chambers, which can be water/oil emulsion. Each chamber (or a single reaction compartment) may contain a single cell (for generating the first population of amplicons), a single barcoded artificial template (second population) and PCR reagents (or RT-PCR reagents) for generating two populations of amplicons. The 5' tail sequence of the reverse primers for the first population is complementary to the whole reverse primer (or the 5' tail sequence of the reverse primers) for the second population. After a few cycles of an initial PCR, in the same reaction vessel under an appropriate condition such as a lowered annealing temperature, the end part of the forward strand of the first population anneals to the end part of the forward strand of the second population, and each then extends using the other strand as template. The initial PCR is preferably performed as asymmetric PCR with the forward primer in a higher amount than the reverse primer. After end joining, the linked products may be sequenced directly. Alternatively, the linked products may be optionally amplified using primers compatible with a next generation sequencing platform. In this method, each artificial template comprises a barcoded random sequence identifier, such that the amplicons from the single cell are tagged with such a unique sequence identifier.
FIG. 7 shows primers used in the initial amplification/primer extension. (A) The reverse target specific primer comprises a 3' target complementary portion and a 5' tail portion (or a bulge portion). The forward target specific primer comprises a 3' target complementary portion and an optional 5' tail portion (or a bulge portion). If the forward primer comprises a 5' tail portion, a universal primer targeting the 5' tail portion may be used to amplify the linked products. If the forward primer does not comprise a 5' tail portion, nested target specific primers targeting internal regions of target sequences may be used to amplify the linked products. The 5' tail portion of the primer may comprise RSI and/or a sample index identifier. (B) The 5' tail portion of the primer may comprise RSI. (C) The 5' tail portion of the primer may comprise RSI and a restriction endonuclease recognition site. (D) The bulge portion of the primer may comprise RSI and a restriction endonuclease recognition site. (E) The bulge portion of the primer may comprise sequence compatible with the sequencing during NGS. (F) The 5' tail sequence of the reverse primers for the first population is complementary to the 5' tail sequence of the reverse primers for the second population.
Example 1
A cancer mutation hot spot panel was designed, containing 245x2 primer pairs. The Panel contains four pools of primers used to perform multiplex PCR for preparation of amplicon libraries from genomic "hot spot" regions that are frequently mutated in human cancer genes. The Hotspot Panel was designed to amplify 245 amplicons covering approximately 3,000 COSMIC mutations from 50 oncogenes and tumor suppressor genes.
ABL1     EZH2    JAK3     PTEN
AKT1     FBXW7   IDH2     PTPN11
ALK      FGFR1   KDR      RB1
APC      FGFR2   KIT      RET
ATM      FGFR3   KRAS     SMAD4
BRAF     FLT3    MET      SMARCB1
CDH1     GNA11   MLH1     SMO
CDKN2A   GNAS    MPL      SRC
CSF1R    GNAQ    NOTCH1   STK11
CTNNB1   HNF1A   NPM1     TP53
EGFR     HRAS    NRAS     VHL
ERBB2    IDH1    PDGFRA
ERBB4    JAK2    PIK3CA
Fp5 pool contains first set of forward primers, which has the structure: 5' tail(universal)-RSI-MID-target specific;
Rp5 pool contains first set of reverse primers, which has the structure: 5' tail(universal)-RSI-MID-target specific;
Fp7 pool contains second set of forward primers, which has the structure: 5' tail(universal)-target specific;
Rp7 pool contains second set of reverse primers, which has the structure: 5' tail(universal)-target specific.
Each pool contains 245 primers.
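The primer layout listed above (5' tail, RSI, MID, target specific portion) can be sketched as string assembly, together with the reverse operation of pulling the RSI and MID back out of a read. All sequences and lengths used here are placeholders, not the primers of this example, and the function names are hypothetical.

```python
def degenerate_primer(tail, rsi_length, mid, target):
    """Write a primer in 5'->3' order with the RSI shown as degenerate N bases."""
    return tail + "N" * rsi_length + mid + target

def split_read(read, tail_length, rsi_length, mid_length):
    """Split a read that starts with the primer into (rsi, mid, remainder).

    Assumes fixed, known lengths for the tail, RSI and MID portions.
    """
    rsi_start = tail_length
    mid_start = rsi_start + rsi_length
    target_start = mid_start + mid_length
    if len(read) < target_start:
        raise ValueError("read is shorter than tail + RSI + MID")
    return read[rsi_start:mid_start], read[mid_start:target_start], read[target_start:]
```

Fixed-position parsing is the simplest choice when every primer in a pool shares the same tail, RSI and MID lengths; a pool with variable lengths would need an anchored search instead.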
In the first step, all target regions of the gene(s) of interest were linearly amplified in two separate multiplex reactions per sample: forward reaction and reverse reaction, using Fp5 primer mix and Rp5 primer mix respectively and a hot-start DNA polymerase. The linear amplified products were then purified to remove unreacted primers.
In the second step, PCR was performed using a universal primer targeting the tail portion of the Fp5 and Rp5 primers, and target specific primer mix RP7 or FP7. The amplified PCR product was purified from unreacted primers. This product was ready for Ion Torrent sequencing. For the Illumina platform, in the third step, a universal PCR was performed to enable tagging of the amplicons with specific MIDs and adaptors required for sequencing with the Illumina MiSeq MPS systems using the MID for Illumina MiSeq kits.
Each tagged amplicon library was subsequently purified from small residual DNA fragments and the DNA concentration determined. Next, these purified and individually tagged amplicon libraries were pooled equimolarly, resulting in an amplicon pool or sequencing sample.
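Equimolar pooling as described above needs mass concentrations converted to molar concentrations. The sketch below uses the commonly quoted average of 660 g/mol per base pair of double-stranded DNA; the input layout, the function names and the example figures are assumptions, and the source does not state how the pooling volumes were calculated.

```python
def ng_per_ul_to_nM(conc_ng_per_ul, mean_length_bp):
    """Convert a dsDNA concentration from ng/ul to nM (660 g/mol per bp assumed)."""
    return conc_ng_per_ul * 1e6 / (660.0 * mean_length_bp)

def equimolar_volumes(libraries, fmol_each):
    """Volume (ul) of each library that contributes `fmol_each` femtomoles.

    libraries: dict of name -> (concentration in ng/ul, mean fragment length in bp).
    Note that 1 nM equals 1 fmol per ul.
    """
    return {
        name: fmol_each / ng_per_ul_to_nM(conc, length)
        for name, (conc, length) in libraries.items()
    }
```

For instance, two hypothetical libraries of the same mean length at 6.6 and 13.2 ng/ul would be pooled at a 2:1 volume ratio.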
Human DNA was either freshly prepared or stored between 4°C and 8°C (short-term storage) or in a freezer between -15°C and -25°C (long-term storage). Suitable buffers are TE (10 mM Tris, 1 mM EDTA; pH8) or TE-4 (10 mM Tris, 0.1 mM EDTA; pH8). Successful amplification of FFPE-derived DNA is highly dependent on the DNA quality. DNA extracted from fresh frozen tissue or blood.
All primers were synthesised by Eurofins or Eurogentec and were diluted to 10 µM. DNA polymerases were purchased from Promega, Thermo Fisher or NEB.
Linear Amplification
Two separate reactions were prepared: forward reaction contains FP5 primer mix, reverse reaction contains RP5 primer mix, as follows
[Figure imgf000029_0001: reaction composition table, image not reproduced]
Cycling conditions:
[Figure imgf000029_0002: cycling conditions table, image not reproduced]
Purify all 10 µl with 1.8x Ampure beads (18 µl) as described below.
Bead Purification Process
The Workflow for the Purification process was as follows:
1. Add the appropriate amount of Ampure beads (0.7-1.8 µl) per 1 µl of sample.
2. Pipette mix 10x and incubate at room temperature for approximately 2 mins.
3. Place on a magnetic plate for approximately another 2 mins and remove the supernatant. If beads are easily disturbed, incubate on the magnetic plate for a few more minutes.
4. Wash beads twice with 80% ethanol for 30 seconds each time. Leave tubes uncapped on the magnet to dry for 25-30 mins. To remove residual ethanol, centrifuge briefly.
5. Add 20 µl of H2O and pipette mix, making sure to re-suspend all the beads. Incubate on the bench for approx. 2 mins.
6. Place back on the magnet for approx. 5 mins and retain the supernatant.
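The bead amount in step 1 is a simple ratio of bead volume to sample volume. The helper below is a hypothetical convenience, with the allowed range taken from the 0.7-1.8 µl per 1 µl stated in the workflow.

```python
def bead_volume_ul(sample_volume_ul, ratio):
    """Volume of Ampure beads for a given sample volume and bead:sample ratio.

    The ratio is limited to the 0.7-1.8 range given in the workflow above.
    """
    if not 0.7 <= ratio <= 1.8:
        raise ValueError("ratio outside the 0.7-1.8 range used in this workflow")
    return sample_volume_ul * ratio
```

A 10 µl linear amplification reaction at a 1.8x ratio therefore takes 18 µl of beads, matching the volume stated for the purification after linear amplification.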
1st PCR
After purification, the linear amplified product was PCR amplified using universal P5SEq primer which contains 5' p5 sequence and 3' sequence targeting the tail portion of RP5 and FP5 primer, and second set of target specific primer, which comprise 5' tail portion and 3' target specific portion.
[Figure imgf000030_0001: reaction composition table, image not reproduced]
Cycling conditions:
[Figure imgf000030_0002: cycling conditions table, image not reproduced]
Purify 18 µl of the PCR (leaving behind 5 µl to run on a gel) after running the gel. Add between 1 and 1.6 µl of Ampure beads per 1 µl of PCR product, depending on the size and intensity of the non-specific bands. To remove anything at 150 bp and below, use 1.0 µl Ampure beads per 1 µl PCR product. The purified product was ready for Ion Torrent platform sequencing.
2nd PCR
To add an index barcode for Illumina sequencing, the purified product was amplified using universal P5 primer and index primer P7index.
Figure imgf000031_0002
Cycling conditions:
Figure imgf000031_0003
Final purification was performed after running 5 μl on an agarose gel. All bands below 200 bp should be removed. Use between 0.7 and 0.8 μl Ampure beads per 1 μl of product. If uncertain, retain the supernatant to avoid losing all samples. Elute in 20 μl H2O and quantify 2 μl of each sample on the Qubit.
Adjust the concentrations so that they are all in a similar range. Add 5-10 μl of each sample, at the adjusted concentration, into a LoBind Eppendorf tube and quantify 2 μl of this mix on the Qubit as well.
Sequence analysis
After the quality check, the reads were aligned against hg19.
Figure imgf000031_0001
Sample         Amplicon Territory   % in Target   % off target   Mean Coverage
Sample_AD011   38572                77            23             1697 x
Sample_AD012   38572                76            24             1647 x
Sample_AD013   38572                77            23             1696 x
Sample_AD014   38572                76            24             2659 x
After the alignment, reads were grouped into families; the reads within a family shared 100% identity in their random identifier.
Figure imgf000032_0001
Reads of the forward reaction and the reverse reaction were compared, and rare mutations appearing in both reactions were identified.
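The family grouping and the forward/reverse comparison can be illustrated with a short sketch. It is not the analysis pipeline used in this example: it assumes reads are already aligned, trimmed to the same length and oriented to the reference strand, and it takes a simple per-position majority vote as the family consensus. All function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def family_consensus(reads):
    """Group (rsi, sequence) reads by identical RSI and return
    {rsi: consensus}, using a per-position majority vote."""
    families = defaultdict(list)
    for rsi, seq in reads:
        families[rsi].append(seq)
    return {
        rsi: "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))
        for rsi, seqs in families.items()
    }


def variants(consensus_seqs, reference):
    """Return the set of (position, ref_base, alt_base) differences."""
    found = set()
    for seq in consensus_seqs:
        for pos, (ref_base, base) in enumerate(zip(reference, seq)):
            if base != ref_base:
                found.add((pos, ref_base, base))
    return found


def confirmed_variants(forward_reads, reverse_reads, reference):
    """Keep only variants seen in family consensuses of both reactions."""
    fwd = variants(family_consensus(forward_reads).values(), reference)
    rev = variants(family_consensus(reverse_reads).values(), reference)
    return fwd & rev


if __name__ == "__main__":
    reference = "ACGTAC"
    forward = [
        ("AAA", "ACGTAC"), ("AAA", "ACTTAC"), ("AAA", "ACGTAC"),  # one read error, outvoted
        ("CCC", "ACGAAC"), ("CCC", "ACGAAC"),                      # true variant at position 3
        ("AAC", "CCGTAC"), ("AAC", "CCGTAC"),                      # seen in forward reaction only
    ]
    reverse = [
        ("GGG", "ACGAAC"), ("GGG", "ACGAAC"), ("GGG", "ACGAAC"),
        ("TTT", "ACGTAC"), ("TTT", "ACGTAC"),
    ]
    print(confirmed_variants(forward, reverse, reference))
    print(len(family_consensus(forward)))  # number of forward families
```

In this toy data the sporadic error inside family AAA is removed by the consensus, and the variant seen only in the forward reaction is discarded because it has no support from the reverse reaction.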
Example 2
In the first step, all target regions of the gene(s) of interest were linearly amplified in two separate multiplex reactions per sample: forward reaction and reverse reaction, using Fp5 primer mix and Rp5 primer mix respectively and a hot-start DNA polymerase. The linear amplified products were then hybridised with a second set of multiple target specific primers Fp7pl or Rp7pl. The 5' tail portion of the second set of primers was hybridised with a biotin-labelled probe P7extBio: 5'-CCTCGTATGCCGTCTTCTGCT-3' (SEQ ID NO: 1). The complex containing linear amplified product/second set of primers/P7extBio was affinity purified by streptavidin beads.
The subsequent steps were the same as in Example 1.
Example 3
In the first step, all target regions of the gene(s) of interest were linearly amplified in two separate multiplex reactions per sample: forward reaction and reverse reaction, using Fp5 primer mix and Rp5 primer mix respectively and a hot-start DNA polymerase. The linear amplified products were then hybridised with a second set of multiple target specific primers Fp7pl or Rp7pl. One-pass extension of the second set of primers on the template of the linear amplified product was performed. The double-stranded extension product was either purified by beads or digested with Exonuclease I, followed by heat inactivation.
The above product was amplified using universal primers p5SEq and P7index. Then the product was purified for sequencing.
Example 4.
Forward and reverse bulge primers were designed to have the structure 5' complementary part - RSI - 3' complementary part. Examples of bulge primers are Ab922F, 5'-Caagcccactgtctatggtgnnnnnnnnnccccaac-3' (SEQ ID NO: 2), and A 073R, 5'-CTTCACGGCCACCGTCAGGCnnnnnnnnnCTTCCAC-3' (SEQ ID NO: 3). All target regions of the gene(s) of interest were linearly amplified in two separate multiplex reactions per sample: forward reaction and reverse reaction, using Fp5bulge primer mix and Rp5bulge primer mix respectively and a hot-start DNA polymerase. The linear amplified products were purified using beads as described in Example 1. The purified products were amplified using a second set of target specific primers and a third set of target specific primers. The second set of target specific primers is capable of annealing to the linear amplification product, and the third set of target specific primers is related to the 5' part sequence of the bulge primers of the first set. The remaining steps were the same as in Example 1.
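The three-part layout of a bulge primer can be checked programmatically. The sketch below splits the two primers given above (SEQ ID NO: 2 and 3) into their 5' complementary part, RSI and 3' complementary part; the helper name `split_bulge_primer` and the use of a regular expression are illustrative assumptions, not part of the method.

```python
import re

# Sequences as printed in the text (SEQ ID NO: 2 and SEQ ID NO: 3)
BULGE_PRIMERS = {
    "SEQ ID NO: 2": "Caagcccactgtctatggtgnnnnnnnnnccccaac",
    "SEQ ID NO: 3": "CTTCACGGCCACCGTCAGGCnnnnnnnnnCTTCCAC",
}

# 5' target-specific part, run of random bases (N), 3' target-specific part
BULGE_PATTERN = re.compile(r"^([ACGT]+)(N+)([ACGT]+)$")


def split_bulge_primer(primer: str):
    """Return (five_prime_part, rsi, three_prime_part) in upper case."""
    match = BULGE_PATTERN.match(primer.upper())
    if match is None:
        raise ValueError("primer does not have a 5' part - N run - 3' part layout")
    return match.groups()


if __name__ == "__main__":
    for name, seq in BULGE_PRIMERS.items():
        five, rsi, three = split_bulge_primer(seq)
        # 4 possible bases at each random position
        print(name, len(five), len(rsi), len(three), 4 ** len(rsi))
```

With nine random positions, each primer can carry up to 4^9 = 262,144 distinct identifiers.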
Example 5
Emulsion PCR experiments were performed for linking an individual nucleic acid molecule with a unique sequence identifier, or linking the nucleic acids of a single cell with a unique sequence identifier, to make targeted sequencing libraries. In this example, the nucleic acid molecule with a unique sequence identifier is an artificial template.
Target specific primers were designed to amplify EGFR and Kras gene fragments:
EGFRF333e, GTCGTTTTACGTTGGcgcagttgggcacttttgaa (SEQ ID NO: 4);
EGFRR443e, CTTTAAGAAGGAAAGATCATAtg (SEQ ID NO: 5);
KRASF170e, GTCGTTTTACGTTGGCCTGCTGAAAATGACTGAA (SEQ ID NO: 6);
KRASR283e, GGATCATATTCGTCCACAA (SEQ ID NO: 7).
An artificial template containing 18 random nucleotides was synthesised by Eurofins:
NNNtemp, ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNNNNNNNNNNNNTCTGGCCGTCGTTTTAC (SEQ ID NO: 8).
The primer pair for this artificial template is:
PISE, ACACTCTTTCCCTACACGACGCTC (SEQ ID NO: 9);
M13m, CCAACGTAAAACGACGGCCAGA (SEQ ID NO: 10).
Materials
Figure imgf000034_0002
Procedure
1. The oil-surfactant mixture was prepared first in a 50 ml tube and mixed at room temperature:
Figure imgf000034_0003
2. 150 μl of the oil-surfactant mix was added per 25 μl of the aqueous phase mix (PCR reaction)
An 'open' (not in emulsion) PCR was run simultaneously as a positive control
Aqueous Phase mix:
Figure imgf000034_0001
Figure imgf000035_0001
For the emulsion, 300 μl of oil mix was added to each Eppendorf tube (x2) and vortexed at 2200 rpm for approximately 2 minutes per tube. This was then divided between 2 wells, approximately 120 μl each (some was lost because solution stuck to the pipette tips)
3. For the open PCR control, 25 μl of the aqueous phase (Step 2) was added per well
4. The following cycling program was used:
Figure imgf000035_0002
5. Both the emulsion and open PCR reactions were kept at +4°C overnight
6. The reactions were taken out from +4°C and equilibrated to room temperature. The emulsion PCR was pooled into a 1.5 ml tube, 1 ml of isobutanol was added, and the tube was vortexed briefly and centrifuged at 10,500 rpm for 1 min. The top oil phase was discarded and the pellet was left to dry for approximately 5 mins
7. The pellet was resuspended by adding 40 μl of water and gently mixing with a pipette. No purification kit was used at this point
8. Both emulsion and open PCR samples were run on a gel to verify amplification, as shown in Figure 1
Splicing by overlap extension PCR (SOE-PCR)
SOE PCR is a method for combining two DNA sequences (splicing) without the need for restriction sites. This is achieved by designing two of the four PCR primers to have complementary sequences so that the resulting amplicons fuse to each other. Using this principle, an artificial template containing a string of random nucleotides (Ns) is fused to the genomic DNA amplicon, providing a unique sequence identifier (tag).
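The complementary-primer design can be verified directly from the sequences of this example (SEQ ID NO: 4, 6, 8 and 10). The sketch below is an illustration only: it computes the reverse complement of M13m and measures how much of it is shared with the 5' tails of the forward gene primers. The helper names are hypothetical, and the sequences are used with extraction spaces removed and in upper case.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (upper case)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def suffix_prefix_overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


# Sequences from this example
NNN_TEMP = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT" + "N" * 18 + "TCTGGCCGTCGTTTTAC"
M13M = "CCAACGTAAAACGACGGCCAGA"
EGFR_F333E = "GTCGTTTTACGTTGGcgcagttgggcacttttgaa".upper()
KRAS_F170E = "GTCGTTTTACGTTGGCCTGCTGAAAATGACTGAA"

if __name__ == "__main__":
    # 3' end of the artificial amplicon's top strand after extension from M13m
    amplicon_end = reverse_complement(M13M)
    print(amplicon_end)
    # M13m anneals to the 3' end of the artificial template
    print(NNN_TEMP.endswith(amplicon_end[:17]))
    # Shared sequence between the artificial amplicon and each gene amplicon
    print(suffix_prefix_overlap(amplicon_end, EGFR_F333E))
    print(suffix_prefix_overlap(amplicon_end, KRAS_F170E))
```

The shared stretch corresponds to the common 5' tail GTCGTTTTACGTTGG of the two forward primers, which is what allows the tagged artificial amplicon and the genomic amplicon to prime on each other.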
Procedure
1. Made 10 μM of the primer and the following mix:
Components X1 (μl) X5 (μl)
Figure imgf000036_0001
To make sure that the fragment amplified was the spliced product, one reaction contained the artificial template only (NNNtemp), another genomic DNA only (SUN DNA) and one more containing both artificial and genomic DNA (NNNtemp+SUN) as well as an NTC.
2. The cycling conditions used were as follows:
Figure imgf000036_0002
3. Gel run used to identify if the SOE PCR was successful by comparing fragment size of reaction containing both templates to reactions containing either genomic DNA or artificial DNA only.
Experiments combining emulsion PCR with splicing by overlap extension PCR (SOE-PCR) were performed in emulsion as described above.
1. The oil-surfactant mixture was prepared first in a 50 ml tube and mixed at room temperature as described above.
2. 150 μl of the oil-surfactant mix was added per 25 μl of the aqueous phase mix (PCR reaction). An 'open' (not in emulsion) PCR was run simultaneously as a positive control, using primers for SOE-PCR, the artificial template and human DNA, diluted to approximately one copy per emulsion well.
3. For the open PCR control, 25 μl of the aqueous phase (Step 2) was added per well.
4. The cycling program described above was used.
5. Both the emulsion and open PCR reactions were kept at +4°C overnight. The reactions were taken out from +4°C and equilibrated to room temperature. The emulsion PCR was pooled into a 1.5 ml tube, 1 ml of isobutanol was added, and the tube was vortexed briefly and centrifuged at 10,500 rpm for 1 min. The top oil phase was discarded and the pellet was left to dry for approximately 5 mins.
6. The pellet was resuspended by adding 40 μl of water and gently mixing with a pipette. No purification kit was used at this point. Both emulsion and open PCR samples were run on a gel to verify amplification.
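Diluting the templates to about one copy per compartment does not place exactly one molecule in each. As a rough illustration, and assuming that templates distribute randomly and independently across compartments (a Poisson model, which is an assumption and is not stated in this example), the occupancy can be estimated as follows.

```python
import math


def poisson_pmf(k: int, mean_copies: float) -> float:
    """Probability that a compartment holds exactly k template copies
    when the average loading is mean_copies (Poisson assumption)."""
    return mean_copies ** k * math.exp(-mean_copies) / math.factorial(k)


if __name__ == "__main__":
    mean_copies = 1.0  # "approximately one copy" per compartment
    empty = poisson_pmf(0, mean_copies)
    single = poisson_pmf(1, mean_copies)
    multiple = 1.0 - empty - single
    print(f"empty: {empty:.3f}, single: {single:.3f}, multiple: {multiple:.3f}")
```

Under this model, lowering the average loading reduces the share of compartments with more than one template at the cost of more empty compartments.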
FURTHER ASPECTS OF THE DISCLOSURE
What is described herein is:
1. A method of processing target nucleic acids from one or more samples, wherein a target nucleic acid in a sample is either single-stranded molecule (referred to as first strand, its complement is referred to as second strand) or double-stranded duplex which comprises a first strand and a complementary second strand, wherein the method comprises:
(a) providing reaction mixture(s), each comprising a first set of multiple target specific primers annealing to multiple target sequences, for any particular target sequence, a forward primer is designed to hybridise to the first strand of the target sequence, and reverse primer is designed to hybridise to the second strand of the target sequence, wherein in one reaction mixture the set of the target specific primers comprises either forward primers or reverse primers but not both;
(b) single-side linear amplification of the target sequences to generate single-stranded amplification products;
(c) treating the products of step (b) to enrich the products;
(d) PCR amplifying the products of step (c) using primers to generate double-stranded PCR products.
2. The method of paragraph 1, wherein the method comprises, step (e) further processing the PCR products of step (d) to complete the library preparation for massive parallel sequencing.
3. The method of paragraph 1 or 2, wherein the step (c) and/or step (e) comprise removing the unreacted primers.
4. The method of paragraph 3, wherein the removing of the unreacted primers comprises purifying the single-stranded linear amplification products of step (b) or double-stranded product of step (e), wherein a bead or column based method is used to remove unreacted primers.
5. The method of paragraph 3, wherein the removing of the unreacted primers comprises treating the amplification products by enzymatic digestion to remove the unreacted primers.
6. The method of paragraph 5, wherein the enzymatic digestion is exonuclease I digestion.
7. The method of paragraph 1, wherein the step (c) comprises hybridising the single-stranded amplification products to a second set of target-specific primers.
8. The method of paragraph 7, wherein the hybridised target-specific primers of the second set of primers are extended on the single-stranded amplification products.
9. The method of paragraph 8, wherein the target-specific primer comprises an affinity label or 5' universal tail portion, wherein the 5' universal tail portion of the hybridised target-specific primers are hybridised with an affinity-labelled oligonucleotide complementary to the 5' universal tail.
10. The method of paragraph 9, wherein the affinity label is biotin, the complex of the hybridised amplification products/target-specific oligonucleotides/biotin-labelled oligonucleotide are captured by avidin solid supports.
11. The method of paragraph 8, wherein the extended double-strand products are purified.
12. The method of paragraph 1 or 7, wherein the target specific primer comprises a 5' tail portion and a 3' target complementary portion.
13. The method of paragraph 12, wherein in the step (d) primers comprise a universal primer and a second set of multiple target specific primers, wherein the multiple target specific primers in the second set are either reverse primers or forward primers, but not both, wherein the universal primer comprises a sequence related to the 5' tail portion sequence of primers in the first set.
14. The method of paragraph 12, wherein the step (d) primers comprise a first universal primer and a second universal primer, wherein the first universal primer comprises a sequence related to the 5' tail portion sequence of primers in the first set, the second universal primer related to the 5' tail portion sequence of the second set of the target specific primers.
15. The method of paragraph 1 or 7, wherein the target specific primer comprises a bulge portion, both 5' part and 3' part of which are target specific sequences capable of hybridising to the target sequence.
16. The method of paragraph 15, wherein when in the step (a) the reaction mixture comprises target specific primers with bulge portion, the step (c) comprises purifying the product of step (b) to remove the unreacted primers, in the step (d) the purified product of step (c) is amplified by second set of target specific primers comprising 3' priming sequences capable of hybridising to the purified linear amplified product of step (b) and third set of target specific primers comprising 3' priming sequences which comprise sequence identical to the 5' part of bulge portion of the target specific primers of the first set.
17. The method of any one preceding paragraph, wherein the target-specific primer in the first set or second set comprises a random sequence identifier (RSI) located between the 5' tail portion and the 3' target complementary portion or in the bulge portion, wherein RSI portion comprises at least three random or degenerated nucleotides.
18. The method of any one preceding paragraph, wherein the target specific primer comprises a multiplex identifier (MID) located between the 5' tail portion and the 3' target complementary portion or in the bulge portion, wherein MID is used to identify a sample when multiple samples are sequenced together.
19. The method of paragraph 17 and 18, wherein the target specific primer comprises both MID and RSI.
20. The method of paragraph 1, wherein in the step (a) for each sample the reaction mixtures comprise two reactions: forward reaction comprises a first set of multiple target specific forward primers annealing to first strands of the multiple target sequences from one sample, reverse reaction comprises a first set of multiple target specific reverse primers annealing to the second strands of the multiple target sequences from the same one sample.
21. The method of paragraph 20, wherein in the step (d) for the forward reaction, the primers comprise a second set of multiple target specific reverse primers, for the reverse reaction the primers comprise a second set of multiple target specific forward primers.
22. The method of paragraph 1, further comprising analysing the NGS reads derived from the forward reaction and the reverse reaction, which represent two different strands of target sequences, comprising generating consensus sequences by grouping into families containing the same random sequence identifier (RSI) sequences; and counting the numbers of families, which in consideration of linear amplification represent an accurate relative count for the numbers of original target nucleic acids present in a sample.
23. The method of paragraph 1, wherein the single-side linear amplification is thermal cycle amplification with cycling denaturing and annealing/extension steps, wherein the cycle number is more than four.
24. The method of paragraph 1, wherein the single-side linear amplification is isothermal amplification.
25. A kit for performing a method according to one or more of the preceding methods, comprising: providing reaction mixture(s), each comprising a first set of multiple target specific primers annealing to multiple target sequences, wherein for any particular target sequence, forward primers are designed to hybridise to the first strands of the target sequences, reverse primers are designed to hybridise to the second strands of the target sequences, wherein the set of the target specific primers in the forward reaction comprises either forward primers or reverse primers but not both and the set of the target specific primers in the reverse reaction comprises reverse primers; wherein the target specific primer comprises a 5' tail portion and a 3' target complementary portion, or/and the target specific primer comprises a bulge portion, both 5' part and 3' part of which are target specific sequences capable of hybridising to the target sequence;
wherein the target-specific primer in the first set or second set comprises a random sequence identifier (RSI) located between the 5' tail portion and the 3' target complementary portion or in the bulge portion, wherein RSI portion comprises at least three random or degenerated nucleotides;
wherein the reaction mixtures are capable of carrying out linear amplification of the target sequences to generate single-stranded linear amplification products; optionally purifying or enriching reagents for purifying or enriching the single-stranded linear amplification products; and PCR amplifying reagents for amplifying the single-stranded linear amplification products using primers to generate double-stranded PCR products; wherein the primers and reagents are described in the preceding methods.

Claims

What is claimed is:
1. A method of processing target nucleic acids from one or more samples, wherein a target nucleic acid in a sample is either:
(i) a double-stranded duplex which comprises a first strand and a complementary second strand; or
(ii) a single-stranded molecule which is a first strand or its complementary second strand;
wherein the method comprises:
(a) providing a reaction mixture(s), each reaction mixture comprising a first set of multiple target specific primers capable of annealing to multiple target sequences, wherein in any one reaction mixture the set of the target specific primers comprises either forward target specific primers or reverse target specific primers but not both;
(b) performing a single-side linear amplification of the target sequences to generate single-stranded amplification products;
(c) treating the products of step (b) to enrich the products; and
(d) PCR amplifying the products of step (c) using primers to generate double-stranded PCR products.
2. The method of claim 1, wherein the method further comprises step (e) which comprises processing the PCR products of step (d) to provide a library preparation for massive parallel sequencing.
3. The method of claim 1 or 2, wherein the step (c) and/or step (e) comprises removing the unreacted primers.
4. The method of claim 3, wherein the removing of the unreacted primers comprises purifying the single-stranded linear amplification products of step (b) and/or double-stranded product of step (e) using a bead or column based method.
5. The method of claim 3, wherein the removing of the unreacted primers comprises treating the amplification products by enzymatic digestion.
6. The method of claim 5, wherein the enzymatic digestion is exonuclease I digestion.
7. The method of any preceding claim, wherein the step (c) comprises hybridising the single-stranded amplification products to a second set of target-specific primers.
8. The method of claim 7, wherein the hybridised target-specific primers of the second set of primers are extended on the single-stranded amplification products.
9. The method of claim 7 or 8, wherein the target-specific primer comprises an affinity label or 5' universal tail portion, wherein the 5' universal tail portion of the hybridised target-specific primers are hybridised with an affinity-labelled oligonucleotide complementary to the 5' universal tail.
10. The method of claim 9, wherein the affinity label is biotin and the complex of the hybridised amplification products/target-specific oligonucleotides/biotin-labelled oligonucleotide are captured by avidin solid supports.
11. The method of any of claims 8 to 10, wherein the extended double-strand products are purified.
12. The method of any preceding claim, wherein the target specific primer comprises a 5' tail portion and a 3' target complementary portion.
13. The method of claim 12, wherein in step (d) the primers comprise a universal primer and a second set of multiple target specific primers, wherein the multiple target specific primers in the second set are either reverse primers or forward primers, but not both, wherein the universal primer comprises a sequence related to the 5' tail portion sequence of primers in the first set.
14. The method of claim 12, wherein in step (d) the primers comprise a first universal primer and a second universal primer, wherein the first universal primer comprises a sequence related to the 5' tail portion sequence of primers in the first set of the target specific primers and the second universal primer comprises a sequence related to the 5' tail portion sequence of the second set of the target specific primers.
15. The method of any of claims 1 to 14, wherein the target specific primer comprises a bulge portion between a 5' part and a 3' part of the primer which are target specific sequences capable of hybridising to the target sequence.
16. The method of claim 15, wherein when in step (a) the reaction mixture comprises target specific primers with a bulge portion, step (c) comprises purifying the product of step (b) to remove the unreacted primers, in step (d) the purified product of step (c) is amplified by a second set of target specific primers comprising 3' priming sequences capable of hybridising to the purified linear amplified product of step (b) and a third set of target specific primers comprising 3' priming sequences which comprise sequence identical to the 5' part of the bulge portion of the target specific primers of the first set.
17. The method of any of claims 12 to 16, wherein the target-specific primer in the first set or second set comprises a random sequence identifier (RSI) located between the 5' tail portion and the 3' target complementary portion or in the bulge portion, which RSI portion comprises at least three random or degenerated nucleotides.
18. The method of any of claims 11 to 17, wherein the target specific primer comprises a multiplex identifier (MID) located between the 5' tail portion and the 3' target complementary portion or in the bulge portion, which MID allows a sample to be identified when multiple samples are sequenced together.
19. The method of claim 17 or 18, wherein the target specific primer comprises both a MID and a RSI.
20. The method of any preceding claim, wherein step (a) for each sample comprises two reaction mixtures:
(i) a forward reaction which comprises a first set of multiple target specific forward primers capable of annealing to first strands of the multiple target sequences from one sample, and
(ii) a reverse reaction which comprises a first set of multiple target specific reverse primers capable of annealing to the second strands of the multiple target sequences from the same one sample.
21. The method of claim 20, wherein in step (d):
(i) for the forward reaction, the primers comprise a second set of multiple target specific reverse primers; and
(ii) for the reverse reaction, the primers comprise a second set of multiple target specific forward primers.
22. The method of any of claims 17 to 21, further comprising analysing the massive parallel sequencing reads derived from the forward reaction and the reverse reaction, comprising generating consensus sequences by grouping into families containing the same random sequence identifier (RSI) sequences; and counting the numbers of families.
23. The method of any preceding claim, wherein the single-side linear amplification is thermal cycle amplification with cycling denaturing and annealing/extension steps.
24. The method according to claim 23, wherein the cycle number is more than four.
25. The method of any of claims 1 to 22, wherein the single-side linear amplification is isothermal amplification.
26. A kit for performing a method according to any preceding claim comprising:
(a) a first set of multiple target specific primers as defined in any of claims 1 to 24 which are forward or reverse primers capable of annealing to multiple target sequences of either a first strand or a second strand of the target sequences; and/or
(b) a second set of multiple target specific primers as defined in any of claims 13 to 25; and/or
(c) primers for generating double-stranded PCR products.
27. A kit according to claim 26 further comprising: purifying or enriching reagents for purifying or enriching single-stranded amplification and PCR amplifying reagents for amplifying single-stranded linear amplification products using primers to generate double-stranded PCR products.
PCT/GB2016/051335 2015-05-11 2016-05-11 Methods, compositions, and kits for preparing sequencing library WO2016181128A1 (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
GBGB1507978.3A GB201507978D0 (en) 2015-05-11 2015-05-11 Methods, compositions, and kits for preparing sequencing library
GB1507978.3 2015-05-11
GB1517255.4 2015-09-30
GBGB1517255.4A GB201517255D0 (en) 2015-09-30 2015-09-30 Methods,compositions and kits for preparing sequencing library
GBGB1600415.2A GB201600415D0 (en) 2016-01-10 2016-01-10 Methods, compositions, and kits for preparing sequencing library
GB1600415.2 2016-01-10

Publications (1)

Publication Number Publication Date
WO2016181128A1 true WO2016181128A1 (en) 2016-11-17

Family

ID=56015037

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2016/051335 WO2016181128A1 (en) 2015-05-11 2016-05-11 Methods, compositions, and kits for preparing sequencing library

Country Status (1)

Country Link
WO (1) WO2016181128A1 (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001055454A1 (en) * 2000-01-28 2001-08-02 Althea Technologies, Inc. Methods for analysis of gene expression
US20040121364A1 (en) * 2000-02-07 2004-06-24 Mark Chee Multiplex nucleic acid reactions
WO2010056513A2 (en) * 2008-10-30 2010-05-20 Sequenom, Inc. Products and processes for multiplex nucleic acid identification
US8742606B2 (en) 2009-12-24 2014-06-03 Doosan Infracore Co., Ltd. Power converting device for hybrid
US20110294689A1 (en) * 2010-05-27 2011-12-01 Affymetrix, Inc Multiplex Amplification Methods
US8685678B2 (en) 2010-09-21 2014-04-01 Population Genetics Technologies Ltd Increasing confidence of allele calls with molecular counting
US8722368B2 (en) 2010-09-21 2014-05-13 Population Genetics Technologies Ltd. Method for preparing a counter-tagged population of nucleic acid molecules
WO2015121236A1 (en) * 2014-02-11 2015-08-20 F. Hoffmann-La Roche Ag Targeted sequencing and uid filtering

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KINDE ET AL., PNAS, vol. 108, no. 23, 7 June 2011 (2011-06-07), pages 9530 - 5
SCHMITT ET AL., PNAS, vol. 109, pages 14508 - 14513

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11773440B2 (en) 2011-04-15 2023-10-03 The Johns Hopkins University Safe sequencing system
US11459611B2 (en) 2011-04-15 2022-10-04 The Johns Hopkins University Safe sequencing system
WO2018057846A1 (en) * 2016-09-22 2018-03-29 Sigma-Aldrich Co, Llc Single primer to dual primer amplicon switching
US11390913B2 (en) 2016-09-22 2022-07-19 Sigma-Aldrich Co. Llc Single primer to dual primer amplicon switching
CN110662756A (en) * 2017-03-09 2020-01-07 艾瑞普特公司 Multiplex polymerase chain reaction avoiding dimers for multi-target amplification
JP2020509747A (en) * 2017-03-09 2020-04-02 アイレパートリー インコーポレイテッド Dimer avoidance multiplex polymerase chain reaction for amplifying multiple targets
KR102593421B1 (en) * 2017-03-09 2023-10-25 아이리퍼트와, 인크. Dimer avoidance multiplex polymerase chain reaction for amplification of multiple targets
CN110662756B (en) * 2017-03-09 2023-09-15 艾瑞普特公司 Dimer-avoiding multiplex polymerase chain reaction for multi-target amplification
EP3585797A4 (en) * 2017-03-09 2020-12-30 Irepertoire, Inc. Dimer avoided multiplex polymerase chain reaction for amplification of multiple targets
JP7280191B2 (en) 2017-03-09 2023-05-23 アイレパートリー インコーポレイテッド Dimer Avoidance Multiplex Polymerase Chain Reaction to Amplify Multiple Targets
KR20190127804A (en) * 2017-03-09 2019-11-13 아이리퍼트와, 인크. Dimer Avoidance Multiple Polymerase Chain Reaction for Amplification of Multiple Targets
CN110603334A (en) * 2017-06-20 2019-12-20 深圳华大智造科技有限公司 PCR primer pair and application thereof
CN110603334B (en) * 2017-06-20 2024-01-16 深圳华大智造科技股份有限公司 PCR primer pair and application thereof
EP3643789A4 (en) * 2017-06-20 2021-01-06 MGI Tech Co., Ltd. Pcr primer pair and application thereof
CN110603327A (en) * 2017-06-20 2019-12-20 深圳华大智造科技有限公司 PCR primer pair and application thereof
JP2021521878A (en) * 2018-05-04 2021-08-30 ショアライン バイオミー エルエルシー Multiple specific // non-specific primers used for PCR of gene complex pools
CN112074613A (en) * 2018-05-04 2020-12-11 海岸线生物群有限责任公司 Multiple specific/non-specific primers for PCR of complex gene libraries
WO2020113577A1 (en) * 2018-12-07 2020-06-11 深圳华大生命科学研究院 Method for constructing target gene library, detection device and application thereof
CN113710815A (en) * 2019-01-04 2021-11-26 威廉马歇莱思大学 Quantitative amplicon sequencing for multiple copy number variation detection and allele ratio quantification
EP3906320A4 (en) * 2019-01-04 2022-10-19 William Marsh Rice University Quantitative amplicon sequencing for multiplexed copy number variation detection and allele ratio quantitation
WO2020178772A1 (en) * 2019-03-04 2020-09-10 King Abdullah University Of Science And Technology Compositions and methods of labeling nucleic acids and sequencing and analysis thereof
CN110079868A (en) * 2019-03-20 2019-08-02 上海思路迪生物医学科技有限公司 BRCA1/2 genetic mutation detects library constructing method and kit
WO2021163546A1 (en) * 2020-02-14 2021-08-19 The Johns Hopkins University Methods and materials for assessing nucleic acids

Similar Documents

Publication Publication Date Title
CN110036118B (en) Compositions and methods for identifying nucleic acid molecules
CN110191961B (en) Method for preparing asymmetrically tagged sequencing library
WO2016181128A1 (en) Methods, compositions, and kits for preparing sequencing library
JP7008407B2 (en) Methods for Identifying and Counting Methylation Changes in Nucleic Acid Sequences, Expressions, Copies, or DNA Using Combinations of Nucleases, Ligases, Polymerases, and Sequencing Reactions
CN107075581B (en) Digital measurement by targeted sequencing
JP6525473B2 (en) Compositions and methods for identifying replicate sequencing reads
AU2018277019A1 (en) A method of amplifying single cell transcriptome
CN109844137B (en) Barcoded circular library construction for identification of chimeric products
CN109477142B (en) Asymmetric templates and asymmetric methods of nucleic acid sequencing
CN110023504B (en) Nucleic acid sample preparation method for analyzing cell-free DNA
JP2018521675A (en) Target enrichment by single probe primer extension
CN110777195A (en) Human identity recognition using a set of SNPs
CN111801427B (en) Generation of single-stranded circular DNA templates for single molecules
US20220364169A1 (en) Sequencing method for genomic rearrangement detection
CN111868257A (en) Generation of double stranded DNA templates for single molecule sequencing
JP7079029B2 (en) Methods, Compositions, and Kits for Preparing Nucleic Acid Libraries
JP7134186B2 (en) Generation of nucleic acid libraries from RNA and DNA
WO2017142989A1 (en) Nucleic acid preparation and analysis
US20230374574A1 (en) Compositions and methods for highly sensitive detection of target sequences in multiplex reactions
US20230112730A1 (en) Compositions and methods for oncology precision assays
JP2023519979A (en) Methods for detecting structural rearrangements within the genome
EP4048812B1 (en) Methods for 3' overhang repair
US20230340588A1 (en) Methods and compositions for reducing base errors of massive parallel sequencing using triseq sequencing
CA3223987A1 (en) Methods, compositions, and kits for preparing sequencing library
WO2024039272A1 (en) Nucleic acid amplification

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 16723452

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 16723452

Country of ref document: EP

Kind code of ref document: A1