WO2021216574A1 - Nucleic acid preparations from multiple samples and uses thereof - Google Patents

Nucleic acid preparations from multiple samples and uses thereof

Info

Publication number
WO2021216574A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample
nucleic acid
target nucleic
sequence
pcr
Prior art date
Application number
PCT/US2021/028193
Other languages
French (fr)
Inventor
Quan Peng
Sarah BAUGHER
Original Assignee
Qiagen Sciences, Llc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qiagen Sciences, Llc
Publication of WO2021216574A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C12N15/1093 General methods of preparing gene libraries, not provided for in other subgroups
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806 Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay

Definitions

  • An alternative method is bead-based normalization.
  • a limited amount of DNA binding beads is added to each library.
  • the library amount greatly surpasses the binding capacity of the beads, so that the beads are saturated. Excess, unbound DNA is removed.
  • each library is normalized to the amount of bead-bound DNA. Libraries can be pooled accordingly. No further quantification of each individual library is necessary. Because of this simplicity, commercial kits are available from many manufacturers. However, the approach provides a lower level of consistency and reproducibility than qPCR-based normalization methods. Mehta, B. et al., Int. J. Legal Med. 132(1): 125-132 (2016).
  • the methods can further comprise (e) releasing the captured target nucleic acid molecules from the capture probes.
  • the target nucleic acid molecule concentration in each sample can be greater than the corresponding capture probe concentration.
  • the attaching in (a) can comprise ligating the sample tag oligonucleotide to the target nucleic acid molecule in each sample.
  • the sample tag oligonucleotide can comprise a unique sequence specific for each sample.
  • the sample tag oligonucleotide can further comprise a spacer.
  • the sample tag oligonucleotide can further comprise one or more dU bases.
  • the attaching in (a) can comprise performing PCR amplification of each sample with a PCR primer.
  • the PCR primer can comprise a sample tag oligonucleotide, a spacer, and a universal sequence.
  • the PCR primer can further comprise one or more dU bases.
  • the corresponding capture probe can bind to the unique sequence of a corresponding sample tag oligonucleotide attached to the target nucleic acid molecule.
  • the capture probes can be pooled at a desired concentration prior to the adding in (c) to the pooled samples.
  • the concentration of each capture probe can be less than that of each corresponding PCR product.
  • the releasing in (e) can comprise releasing the nucleic acid molecules from the capture probes by, but not limited to, heat or alkaline denaturation.
  • the releasing in (e) can comprise releasing the nucleic acid molecules from the capture probes by uracil-DNA glycosylase digestion of the dU base in the sample specific oligonucleotide.
  • the unique sequence of the sample tag oligonucleotide can comprise 1-100 base differences compared to the unique sequence of the sample tag oligonucleotide for another sample.
  • the ligation or PCR products can comprise a 5’ overhang.
  • the 5’ overhang can comprise, but not limited to, 5-50 bases, 5-40 bases, 5-30 bases, 5-20 bases, 10-20 bases, or any specific number of bases or ranges of bases derived therefrom.
  • the capture probe can be linked to, but not limited to, an affinity tag for binding to a solid surface, His tag, TAP tag, or antibody tag.
  • the affinity tag can be biotin and the solid surface can be a streptavidin coated magnetic bead.
  • the methods disclosed herein can further comprise purifying the pooled target nucleic acid molecules attached to the sample tag oligonucleotides from the multiple samples.
  • the purifying can be by, but not limited to, spin column, magnetic beads, or gel electrophoresis.
  • the method can further comprise quantifying the pooled ligation or PCR products.
  • the quantifying can be performed by, but not limited to, qPCR or electrophoresis.
  • the methods can further comprise sequencing the released target nucleic acid molecules.
  • the methods can further comprise performing next generation sequencing.
  • the methods can further comprise performing third generation sequencing.
  • the methods can further comprise performing single cell analysis.
  • each sample comprises one or more target nucleic acid molecules
  • PCR primer comprises a 5’ variable region comprising a 5’ sample tag sequence comprising a unique sequence specific for each sample, a spacer, and a universal sequence;
  • kits comprising: multiple PCR primers for use with multiple samples comprising target nucleic acid molecules, wherein each PCR primer comprises a 5’ sample tag sequence comprising a unique sequence specific for each sample, a spacer, and a universal sequence; and a corresponding capture probe capable of binding to the unique sequence specific for each sample.
  • the kits can further comprise a ligase.
  • the PCR primer can further comprise one or more dU bases.
  • the kits can further comprise a uracil-DNA glycosylase.
  • FIG. 2 Comparison of traditional workflow and new, improved workflow disclosed herein.
  • FIG. 3A PCR primer design.
  • FIG. 3B PCR products using the PCR primer design.
  • FIG. 3C Capture probe binding and streptavidin beads pull down.
  • FIG. 3D Final normalized library.
  • FIGS. 4A and 4B Exemplary capture probes (FIG. 4B) and corresponding uPCR (universal PCR) primers (FIG. 4A) are provided in corresponding rows as numbered. Sequence differences are underlined. “IdeoxyU” is internal deoxyUridine, “iSp9” is internal triethylene glycol spacer, and “3Bio” is 3’ Biotin.
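The primer and capture probe layout described for FIGS. 3A and 4A-4B can be sketched in a few lines. This is only an illustration under stated assumptions: the sequences are made up, the slash-delimited modification tokens are a hypothetical notation borrowed from the “iSp9” and “3Bio” labels above, the capture probe is assumed to be the exact reverse complement of the unique sample tag, and dU bases are omitted because their position is not given here.

```python
# Hypothetical sketch: assemble a tagged PCR primer and its capture probe.
# Sequences and the "/iSp9/" and "/3Bio/" tokens are illustrative assumptions.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def build_primer(sample_tag, universal, spacer="/iSp9/"):
    """5' sample tag, then spacer, then universal sequence at the 3' end."""
    return f"{sample_tag}{spacer}{universal}"


def build_capture_probe(sample_tag, label="/3Bio/"):
    """Probe that can hybridize to the sample tag, with a 3' affinity label."""
    return reverse_complement(sample_tag) + label


primer = build_primer("AACCGGTA", "GTTCAGAGTTCTACAG")
probe = build_capture_probe("AACCGGTA")
print(primer)
print(probe)
```

The spacer is kept as an opaque token so that the tag portion of the primer stays easy to locate, which matches the idea that the tag remains available as a single-stranded arm for hybridization.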
  • next generation sequencing enables analyzing many samples simultaneously in one sequencing run.
  • each sample a.k.a. library
  • each sample needs to be individually prepared, quantified, and accurately pooled together before loading on a sequencing instrument. This process is tedious and time consuming.
  • Disclosed herein are new methods for high throughput library normalization from multiple samples. Purification and size selection can be performed after sample pooling. The complexity of handling is reduced to one tube, instead of tens or hundreds of individual tubes.
  • the methods disclosed herein use a highly specific hybridization pull-down approach to ensure that each library is at the desired concentration in the final library pool, so that individual quantification of each library is not required.
  • the libraries can be pooled together directly after ligation of sample tag oligonucleotides or PCR amplification without requiring individual purification and size selection.
  • the library pool can be purified and normalized by capture probes through highly specific hybridization and pull-down.
  • the resulting library mix is normalized to the amount of the predetermined capture probe mix used (usually equimolar). It can be directly loaded on a sequencing instrument.
  • Customers or users also have the option to quantify the library pool by other methods such as qPCR or electrophoresis, although it is not necessary.
  • the early pooling feature makes the approach disclosed herein suitable for high throughput applications when customers or users have many samples to analyze at once.
  • Disclosed herein are methods for high throughput library normalization.
  • the process involves five major steps: 1) attaching indexed normalization oligonucleotides to the libraries (individually for each library), so that each library has its own sample index as well as a single-stranded arm for hybridization; 2) pooling the libraries together and performing a combined cleanup; 3) adding normalization capture probes, which bind to their respective targets; 4) immobilizing the bound targets and washing off unbound ones; and 5) releasing the targets.
  • the output library is normalized and ready to be sequenced.
  • FIG. 1 summarizes the process.
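The normalization principle behind these five steps can be illustrated with a toy calculation. The sketch below is an idealized model, not the disclosed protocol: it assumes each capture probe molecule recovers exactly one target molecule, that capture is complete, and that there is no cross-hybridization between samples. The sample names and amounts are made up.

```python
# Idealized model of probe-limited library normalization.
# Assumptions: 1:1 probe-to-target capture, complete recovery, no cross-talk.


def normalize_by_capture(library_pmol, probe_pmol):
    """Return the amount of each library recovered after pull-down."""
    recovered = {}
    for sample, amount in library_pmol.items():
        # Recovery is capped by whichever is scarcer: library or its probe.
        recovered[sample] = min(amount, probe_pmol[sample])
    return recovered


# Libraries differ widely in yield, but each exceeds its probe amount.
libraries = {"S1": 50.0, "S2": 200.0, "S3": 12.0}
probes = {sample: 2.0 for sample in libraries}  # equimolar probe mix

pool = normalize_by_capture(libraries, probes)
print(pool)
```

In this model every library comes out at the probe amount as long as the library is in excess, which is why the target concentration in each sample is kept above the corresponding capture probe concentration. A library below its probe amount would come out under-represented.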
  • the uPCR universal PCR
  • the uPCR can be seamlessly integrated into the library amplification reaction of most PCR-based NGS library prep kits.
  • a user can replace the PCR primers with specially designed primers.
  • No additional handling is required before sample pooling.
  • purification and size selection for each sample individually is the major bottleneck. It involves multiple manual pipetting steps for each sample.
  • One of the benefits of the workflow disclosed herein is that purification and size selection is after sample pooling. Comparison of a workflow disclosed herein and a traditional workflow is summarized in FIG. 2.
  • the methods can further comprise (e) releasing the captured target nucleic acid molecules from the capture probes.
  • the target nucleic acid molecule concentration in each sample can be greater than the corresponding capture probe concentration.
  • the attaching in (a) can comprise ligating the sample tag oligonucleotide to the target nucleic acid molecule in each sample.
  • the sample tag oligonucleotide can comprise a unique sequence specific for each sample.
  • the sample tag oligonucleotide can further comprise one or more dU bases.
  • the sample tag oligonucleotide can further comprise a spacer.
  • the attaching in (a) can comprise performing PCR amplification of each sample with a PCR primer.
  • the PCR primer can comprise a sample tag oligonucleotide comprising a unique sequence specific for each sample, a spacer, and a universal sequence.
  • the PCR primer can further comprise one or more dU bases.
  • the corresponding capture probe can bind to the unique sequence of a corresponding sample tag oligonucleotide attached to the target nucleic acid molecule, e.g., of the ligation or PCR products.
  • the capture probes can be pooled at a desired concentration prior to the adding in (c) to the pooled samples.
  • the concentration of each corresponding capture probe can be less than that of each corresponding PCR product.
  • the releasing in (e) can comprise releasing the target nucleic acid molecules from the capture probes by, but not limited to, heat or alkaline denaturation.
  • the target nucleic acid molecules can be released from the capture probes by RNase digestion of an RNA base in the primer, e.g., when a dU base is replaced by an RNA base.
  • the releasing in (e) can comprise releasing the target nucleic acid molecules from the capture probes by uracil-DNA glycosylase digestion of the dU base in the sample specific oligonucleotide.
  • the sample specific oligonucleotide can comprise 1-100 base differences compared to a sample specific oligonucleotide for another sample.
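A quick way to check how many base differences separate the sample tags in a set is to compute pairwise mismatch counts. The sketch below assumes equal-length tags compared position by position, which is only one way to count differences, and the tag sequences are hypothetical.

```python
# Sketch: count base differences between equal-length sample tags.
# Tag sequences are made up; position-wise comparison is an assumption.
from itertools import combinations


def base_differences(tag_a, tag_b):
    """Number of positions at which two equal-length tags differ."""
    if len(tag_a) != len(tag_b):
        raise ValueError("tags must have the same length")
    return sum(a != b for a, b in zip(tag_a, tag_b))


def min_pairwise_differences(tags):
    """Smallest number of base differences between any two tags in the set."""
    return min(
        base_differences(a, b) for a, b in combinations(tags.values(), 2)
    )


tags = {"S1": "AACCGGTA", "S2": "AACCGGTT", "S3": "TTGGCCAA"}
print(min_pairwise_differences(tags))
```

A set whose minimum is small relies more heavily on stringent hybridization for each capture probe to discriminate its own sample from the others.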
  • the ligation or PCR products can comprise a 5’ overhang.
  • the 5’ overhang can comprise 5-50 bases, 5-40 bases, 5-30 bases, 5-20 bases, 10-20 bases, or any specific number of bases or ranges of bases derived therefrom.
  • the capture probe can be linked to, but not limited to, an affinity tag for binding to a solid surface, His tag, TAP tag, or antibody tag.
  • the affinity tag can be biotin and the solid surface can be a streptavidin coated magnetic bead.
  • the methods can further comprise purifying the pooled target nucleic acid molecules attached to the sample tag oligonucleotides, e.g., the ligation or PCR products, from the multiple samples.
  • the purifying can be by, but not limited to, spin column, magnetic beads, or gel electrophoresis.
  • the methods can further comprise quantifying the pooled PCR products. The quantifying can be performed by, but not limited to, qPCR or electrophoresis.
  • the methods can further comprise sequencing the released target nucleic acid molecules.
  • the methods can further comprise performing next generation sequencing.
  • the methods can further comprise performing third generation sequencing.
  • the methods can further comprise performing single cell analysis.
  • each sample comprises target nucleic acid molecules
  • PCR primer comprises a 5’ variable region comprising a 5’ sample tag sequence comprising a unique sequence specific for each sample, a spacer, and a universal sequence;
  • kits comprising: multiple PCR primers for use with multiple samples comprising target nucleic acid molecules, wherein each PCR primer comprises a 5’ sample tag sequence comprising a unique sequence specific for each sample, a spacer, and a universal sequence; and a capture probe.
  • the PCR primers can further comprise one or more dU bases.
  • the kits can further comprise a ligase.
  • the kits can further comprise a uracil-DNA glycosylase.
  • target nucleic acid molecule(s) are nucleic acid molecules of interest, such as for analysis, e.g., by sequencing. As disclosed herein, target nucleic acid molecules from multiple samples can be analyzed simultaneously. The methods disclosed herein provide for a single or multiple target nucleic acid molecules from each sample to be pooled with another or other multiple samples for analysis simultaneously.
  • sample can include nucleic acid molecules, such as RNA or DNA, a single cell, multiple cells, fragments of cells, or an aliquot of body fluid, taken from a subject (e.g., a mammalian subject, an animal subject, a human subject, or a non-human animal subject).
  • Samples can be selected by one of skill in the art using any means known in the art, including but not limited to centrifugation, venipuncture, blood draw, excretion, swabbing, biopsy, needle aspirate, lavage sample, scraping, surgical incision, laser capture microdissection, gradient separation, or intervention.
  • the term “mammal” or “mammalian” as used herein includes both humans and non-humans and includes, but is not limited to, humans, non-human primates, canines, felines, murines, bovines, equines, and porcines.
  • biological sample is intended to include, but is not limited to, tissues, cells, biological fluids and isolates thereof, isolated from a subject, as well as tissues, cells, and fluids present within a subject.
  • a “single cell” refers to one cell.
  • Single cells useful in the methods described herein can be obtained from a tissue of interest, or from a biopsy, blood sample, or cell culture. Additionally, cells from specific organs, tissues, tumors, neoplasms, or the like can be obtained and used in the methods described herein. In general, cells from any population can be used in the methods, such as a population of prokaryotic or eukaryotic organisms, including bacteria or yeast.
  • a single cell suspension can be obtained using standard methods known in the art including, for example, enzymatically using trypsin or papain to digest proteins connecting cells in tissue samples or releasing adherent cells in culture, or mechanically separating cells in a sample. Samples can also be selected by one of skill in the art using one or more markers known to be associated with a sample of interest.
  • Methods for manipulating single cells include fluorescence activated cell sorting (FACS), micromanipulation and the use of semi-automated cell pickers (e.g., the Quixell™ cell transfer system from Stoelting Co.).
  • Individual cells can, for example, be individually selected based on features detectable by microscopic observation, such as location, morphology, or reporter gene expression.
  • the sample is prepared and the cell(s) are lysed to release cellular contents including DNA and RNA, such as gDNA and mRNA, using methods known to those of skill in the art. Lysis can be achieved by, for example, heating the cells, or by the use of detergents or other chemical methods, or by a combination of these. Any suitable lysis method known in the art can be used.
  • Multiple samples means more than one sample, such as but not limited to 2 or more, 2-5, 6-10, 11-15, 16-20, 21-30, 31-40, 41-50, 51-100, more than 100, or any specific number or ranges of samples derived therefrom.
  • the multiple samples can be derived from one source or origin or from different sources or origins.
  • polynucleotide(s) or “oligonucleotide(s)” refers to nucleic acids such as DNA molecules and RNA molecules and analogs thereof (e.g., DNA or RNA generated using nucleotide analogs or using nucleic acid chemistry).
  • polynucleotides can be made synthetically, e.g., using art-recognized nucleic acid chemistry or enzymatically using, e.g., a polymerase, and, if desired, can be modified. Typical modifications include methylation, biotinylation, and other art-known modifications.
  • a polynucleotide can be single-stranded or double-stranded and, where desired, linked to a detectable moiety.
  • a polynucleotide can include hybrid molecules, e.g., comprising DNA and RNA.
  • “G,” “C,” “A,” “T” and “U” each generally stand for a nucleotide that contains guanine, cytosine, adenine, thymine and uracil as a base, respectively.
  • “ribonucleotide” or “nucleotide” can also refer to a modified nucleotide or a surrogate replacement moiety.
  • guanine, cytosine, adenine, and uracil can be replaced by other moieties without substantially altering the base pairing properties of an oligonucleotide comprising a nucleotide bearing such replacement moiety.
  • a nucleotide comprising inosine as its base can base pair with nucleotides containing adenine, cytosine, or uracil.
  • nucleotides containing uracil, guanine, or adenine can be replaced in nucleotide sequences by a nucleotide containing, for example, inosine.
  • adenine and cytosine anywhere in the oligonucleotide can be replaced with guanine and uracil, respectively, to form G-U Wobble base pairing with the target mRNA. Sequences containing such replacement moieties are suitable for the compositions and methods described herein.
  • DNA refers to chromosomal DNA, plasmid DNA, phage DNA, or viral DNA that is single stranded or double stranded.
  • DNA can be obtained from prokaryotes or eukaryotes.
  • genomic DNA or gDNA refers to chromosomal DNA.
  • messenger RNA or “mRNA” refers to an RNA that is without introns and that can be translated into a polypeptide.
  • cDNA refers to a DNA that is complementary or identical to an mRNA, in either single stranded or double stranded form.
  • polymerase and its derivatives, generally refers to any enzyme that can catalyze the polymerization of nucleotides (including analogs thereof) into a nucleic acid strand. Typically, but not necessarily, such nucleotide polymerization can occur in a template-dependent fashion.
  • Such polymerases can include without limitation naturally occurring polymerases and any subunits and truncations thereof, mutant polymerases, variant polymerases, recombinant, fusion or otherwise engineered polymerases, chemically modified polymerases, synthetic molecules or assemblies, and any analogs, derivatives or fragments thereof that retain the ability to catalyze such polymerization.
  • the polymerase can be a mutant polymerase comprising one or more mutations involving the replacement of one or more amino acids with other amino acids, the insertion or deletion of one or more amino acids from the polymerase, or the linkage of parts of two or more polymerases.
  • the polymerase comprises one or more active sites at which nucleotide binding and/or catalysis of nucleotide polymerization can occur.
  • Some exemplary polymerases include without limitation DNA polymerases and RNA polymerases.
  • polymerase and its variants, as used herein, also refers to fusion proteins comprising at least two portions linked to each other, where the first portion comprises a peptide that can catalyze the polymerization of nucleotides into a nucleic acid strand and is linked to a second portion that comprises a second polypeptide.
  • the second polypeptide can include a reporter enzyme or a processivity-enhancing domain.
  • the polymerase can possess 5’ exonuclease activity or terminal transferase activity.
  • the polymerase can be optionally reactivated, for example through the use of heat, chemicals or re-addition of new amounts of polymerase into a reaction mixture.
  • the polymerase can include a hot-start polymerase or an aptamer-based polymerase that optionally can be reactivated.
  • extension and its variants, as used herein, when used in reference to a given primer, comprises any in vivo or in vitro enzymatic activity characteristic of a given polymerase that relates to polymerization of one or more nucleotides onto an end of an existing nucleic acid molecule.
  • primer extension occurs in a template-dependent fashion; during template-dependent extension, the order and selection of bases is driven by established base pairing rules, which can include Watson-Crick type base pairing rules or alternatively (and especially in the case of extension reactions involving nucleotide analogs) by some other type of base pairing paradigm.
  • extension occurs via polymerization of nucleotides on the 3’ OH end of the nucleic acid molecule by the polymerase.
  • ligating refers generally to the act or process for covalently linking two or more molecules together, for example, covalently linking two or more nucleic acid molecules to each other.
  • ligation includes joining nicks between adjacent nucleotides of nucleic acids.
  • ligation includes forming a covalent bond between an end of a first and an end of a second nucleic acid molecule.
  • the ligation can include forming a covalent bond between a 5’ phosphate group of one nucleic acid and a 3’ hydroxyl group of a second nucleic acid thereby forming a ligated nucleic acid molecule.
  • any means for joining nicks or bonding a 5’ phosphate to a 3’ hydroxyl between adjacent nucleotides can be employed.
  • an enzyme such as a ligase can be used.
  • an amplified target sequence can be ligated to an adapter to generate an adapter-ligated amplified target sequence.
  • ligase refers generally to any agent capable of catalyzing the ligation of two substrate molecules.
  • the ligase includes an enzyme capable of catalyzing the joining of nicks between adjacent nucleotides of a nucleic acid.
  • the ligase includes an enzyme capable of catalyzing the formation of a covalent bond between a 5’ phosphate of one nucleic acid molecule to a 3’ hydroxyl of another nucleic acid molecule thereby forming a ligated nucleic acid molecule.
  • Suitable ligases can include, but not limited to, T4 DNA ligase, T4 RNA ligase, and E. coli DNA ligase.
  • ligation conditions generally refers to conditions suitable for ligating two molecules to each other. In some embodiments, the ligation conditions are suitable for sealing nicks or gaps between nucleic acids.
  • a “nick” or “gap” refers to a nucleic acid molecule that lacks a directly bound 5’ phosphate of a mononucleotide pentose ring to a 3’ hydroxyl of a neighboring mononucleotide pentose ring within internal nucleotides of a nucleic acid sequence.
  • the term nick or gap is consistent with the use of the term in the art.
  • a nick or gap can be ligated in the presence of an enzyme, such as ligase at an appropriate temperature and pH.
  • T4 DNA ligase can join a nick between nucleic acids at a temperature of about 70°C-72°C.
  • blunt-end ligation refers generally to ligation of two blunt-end double-stranded nucleic acid molecules to each other.
  • a “blunt end” refers to an end of a double-stranded nucleic acid molecule wherein substantially all of the nucleotides in the end of one strand of the nucleic acid molecule are base paired with opposing nucleotides in the other strand of the same nucleic acid molecule.
  • a nucleic acid molecule is not blunt ended if it has an end that includes a single-stranded portion greater than two nucleotides in length, referred to herein as an “overhang.”
  • the end of the nucleic acid molecule does not include any single-stranded portion, such that every nucleotide in one strand of the end is base paired with opposing nucleotides in the other strand of the same nucleic acid molecule.
  • the ends of the two blunt ended nucleic acid molecules that become ligated to each other do not include any overlapping, shared or complementary sequence.
  • blunted-end ligation excludes the use of additional oligonucleotide adapters to assist in the ligation of the double-stranded amplified target sequence to the double-stranded adapter, such as patch oligonucleotides as described in Mitra and Varley, US2010/0129874.
  • blunt-ended ligation includes a nick translation reaction to seal a nick created during the ligation process.
  • amplicon refers to the amplified product of a nucleic acid amplification reaction, e.g., RT-PCR.
  • reverse-transcriptase PCR and “RT-PCR” refer to a type of PCR where the starting material is mRNA.
  • the starting mRNA is enzymatically converted to complementary DNA or “cDNA” using a reverse transcriptase enzyme.
  • the cDNA is then used as a template for a PCR reaction.
  • PCR product refers to the resultant mixture of compounds after two or more cycles of the PCR steps of denaturation, annealing and extension are complete. These terms encompass the case where there has been amplification of one or more segments of one or more target sequences.
  • amplification reagents refers to those reagents (deoxyribonucleotide triphosphates, buffer, etc.), needed for amplification except for primers, nucleic acid template, and the amplification enzyme.
  • amplification reagents along with other reaction components are placed and contained in a reaction vessel (test tube, microwell, etc.).
  • Amplification methods include PCR methods known to those of skill in the art and also include rolling circle amplification (Blanco et al., J. Biol. Chem., 264, 8935-8940, 1989), hyperbranched rolling circle amplification (Lizardi et al., Nat.
  • hybridize refers to a sequence specific non-covalent binding interaction with a complementary nucleic acid. Hybridization can occur to all or a portion of a nucleic acid sequence. Those skilled in the art will recognize that the stability of a nucleic acid duplex, or hybrids, can be determined by the Tm. Additional guidance regarding hybridization conditions can be found in: Current Protocols in Molecular Biology, John Wiley & Sons, N.Y., 1989, 6.3.1-6.3.6 and in: Sambrook et al., Molecular Cloning, a Laboratory Manual, Cold Spring Harbor Laboratory Press, Vol. 3, 1989.
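As a rough illustration of how duplex stability relates to base composition, the Wallace rule estimates Tm for short oligonucleotides as 2 °C per A or T plus 4 °C per G or C. This rule is not taken from the text above; it is a common back-of-the-envelope approximation that ignores salt, concentration, and nearest-neighbor effects, and the sequence used is made up.

```python
# Rough Tm estimate using the Wallace rule (an approximation for short oligos).
# Not part of the disclosure; shown only to illustrate the concept of Tm.


def wallace_tm(seq):
    """Estimate melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


print(wallace_tm("AACCGGTA"))
```

For probe design in practice, a nearest-neighbor thermodynamic model is usually preferred over this rule.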
  • incorporating refers to covalently linking a series of nucleotides with the rest of the polynucleotide, for example at the 3’ or 5’ end of the polynucleotide, by phosphodiester bonds, wherein the nucleotides are linked in the order prescribed by the sequence.
  • a sequence has been “incorporated” into a polynucleotide, or equivalently the polynucleotide “incorporates” the sequence, if the polynucleotide contains the sequence or a complement thereof. Incorporation of a sequence into a polynucleotide can occur enzymatically (e.g., by ligation or polymerization) or using chemical synthesis (e.g., by phosphoramidite chemistry).
  • the terms “amplify” and “amplification” refer to enzymatically copying the sequence of a polynucleotide, in whole or in part, so as to generate more polynucleotides that also contain the sequence or a complement thereof.
  • the sequence being copied is referred to as the template sequence.
  • Examples of amplification include DNA-templated RNA synthesis by RNA polymerase, RNA-templated first-strand cDNA synthesis by reverse transcriptase, and DNA-templated PCR amplification using a thermostable DNA polymerase.
  • Amplification includes all primer-extension reactions.
  • Amplification includes methods such as PCR, ligation amplification (or ligase chain reaction, LCR) and other amplification methods.
  • the PCR procedure describes a method of gene amplification which is comprised of (i) sequence-specific hybridization of primers to specific genes within a DNA sample (or library), (ii) subsequent amplification involving multiple rounds of annealing, elongation, and denaturation using a DNA polymerase, and (iii) screening the PCR products for a band of the correct size.
  • the primers used are oligonucleotides of sufficient length and appropriate sequence to provide initiation of polymerization, i.e. each primer is specifically designed to be complementary to each strand of the genomic locus to be amplified.
  • Primers useful to amplify sequences from a particular gene region are preferably complementary to and hybridize specifically to sequences in the target region or in its flanking regions and can be prepared using the polynucleotide sequences provided herein. Nucleic acid sequences generated by amplification can be sequenced directly.
  • the term “associated” is used herein to refer to the relationship between a sample and the DNA molecules, RNA molecules, or other polynucleotides originating from or derived from that sample.
  • a polynucleotide is associated with a sample if it is an endogenous polynucleotide, i.e., it occurs in the sample at the time the sample is selected or is derived from an endogenous polynucleotide.
  • the mRNAs endogenous to a cell are associated with that cell.
  • cDNAs resulting from reverse transcription of these mRNAs, and DNA amplicons resulting from PCR amplification of the cDNAs contain the sequences of the mRNAs and are also associated with the cell.
  • the polynucleotides associated with a sample need not be located or synthesized in the sample and are considered associated with the sample even after the sample has been destroyed (for example, after a cell has been lysed). Molecular barcoding or other techniques can be used to determine which polynucleotides in a mixture are associated with a particular sample.
  • the reaction is called “annealing” or “binding” and those polynucleotides are described as “complementary”.
  • complementary when used to describe a first nucleotide sequence in relation to a second nucleotide sequence, refers to the ability of a polynucleotide comprising the first nucleotide sequence to hybridize and form a duplex structure under certain conditions with a polynucleotide comprising the second nucleotide sequence, as will be understood by the skilled person.
  • Such conditions can, for example, be stringent conditions, where stringent conditions can include: 400 mM NaCl, 40 mM PIPES pH 6.4, 1 mM EDTA, 50°C or 70°C for 12-16 hours followed by washing.
  • Other conditions such as physiologically relevant conditions as can be encountered inside an organism, can apply. The skilled person will be able to determine the set of conditions most appropriate for a test of complementarity of two sequences in accordance with the ultimate application of the hybridized nucleotides.
  • Complementary sequences include base-pairing of a region of a polynucleotide comprising a first nucleotide sequence to a region of a polynucleotide comprising a second nucleotide sequence over the length or a portion of the length of one or both nucleotide sequences.
  • Such sequences can be referred to as “complementary” with respect to each other herein.
  • the two sequences can be complementary, or they can include one or more, but generally not more than about 5, 4, 3, or 2 mismatched base pairs within regions that are base-paired. For two sequences with mismatched base pairs, the sequences will be considered “substantially complementary” as long as the two nucleotide sequences bind to each other via base-pairing.
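The mismatch tolerance described above can be sketched in Python. This is a minimal illustration under stated assumptions: the function names, the restriction to equal-length ungapped DNA sequences, and the use of 5 as the cut-off (taken from the "generally not more than about 5" wording) are choices made for the example. Whether a real duplex forms also depends on the hybridization conditions, which the sketch does not model.

```python
# Illustrative sketch only; names and thresholds are hypothetical.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_mismatches(first: str, second: str) -> int:
    """Count mismatched base pairs when `first` pairs antiparallel with `second`.

    Assumes both sequences have the same length and pair without gaps.
    """
    if len(first) != len(second):
        raise ValueError("sequences must have equal length")
    expected = reverse_complement(second)
    return sum(a != b for a, b in zip(first.upper(), expected))


def classify(first: str, second: str, max_mismatches: int = 5) -> str:
    """Label a pair as complementary, substantially complementary, or neither."""
    n = count_mismatches(first, second)
    if n == 0:
        return "complementary"
    if n <= max_mismatches:
        return "substantially complementary"
    return "not complementary"
```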
  • for nucleotide sequences, the left-hand end of a single-stranded nucleotide sequence is the 5’-end; the left-hand direction of a double-stranded nucleotide sequence is referred to as the 5’-direction.
  • the direction of 5’ to 3’ addition of nucleotides to nascent RNA transcripts is referred to as the transcription direction.
  • the DNA strand having the same sequence as an mRNA is referred to as the “coding strand”; sequences on the DNA strand having the same sequence as an mRNA transcribed from that DNA and which are located 5’ to the 5’-end of the RNA transcript are referred to as “upstream sequences”; sequences on the DNA strand having the same sequence as the RNA and which are 3’ to the 3’ end of the coding RNA transcript are referred to as “downstream sequences.”
  • the double stranded DNA fragments can be end polished so that they are amenable for ligation.
  • the ends of the DNA fragments can be polished to have blunt ends. As known in the art, this can be achieved with enzymes that can either fill in or remove the protruding strand.
  • the sample tag oligonucleotide or sample tag sequence can comprise a unique sequence specific for each sample.
  • the sample tag oligonucleotide can comprise a unique sequence specific for each sample and a spacer.
  • the sample tag oligonucleotide can comprise a unique sequence specific for each sample, one or more dU bases, and a spacer, e.g., in the 5’ to 3’ direction.
  • the PCR primer comprises a sample tag sequence comprising a unique sequence specific for each sample, one or more dU bases, a spacer, and a universal sequence.
  • the PCR primer can comprise a sample tag oligonucleotide comprising a unique sequence specific for each sample and a universal sequence, e.g., in the 5’ to 3’ direction.
  • the number of dU bases can vary, e.g., but not limited to, at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 15, at least 20, at least 25, at least 30, or any number or ranges of dU bases derived therefrom.
  • the spacer can be a piece of sequence that, when serving as template, cannot be extended by polymerase in normal PCR conditions.
  • a spacer can be, but not limited to, one or more RNA bases or a linker, such as a carbon backbone without any nucleobase.
  • the polymerase stops incorporating a nucleotide at the spacer site in the template, thus leaving a single strand overhang at 5’ end beyond the spacer site.
  • the spacer can be a covalent linker between 2 oligonucleotides and can be, e.g., but not limited to, at least 1 base long, at least 2 bases long, at least 3 bases long, at least 4 bases long, at least 5 bases long, at least 6 bases long, at least 7 bases long, at least 8 bases long, at least 9 bases long, at least 10 bases long, or any number or ranges of bases derived therefrom.
  • the sample tag oligonucleotide can comprise 1-100 base differences in its unique sequence compared to a sample specific oligonucleotide for another sample.
  • the sample tag oligonucleotide can comprise 1-100, 1-90, 1-80, 1-70, 1-60, 1-50, 1-40, 1-30, 1-20, 1-10, 1-9, 1-8, 1-7, 1-6, 1-5, 1-4, 1-3, 1-2, or a specific number or ranges of base differences derived therefrom in its unique sequence compared to a sample specific oligonucleotide for another sample.
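The number of base differences between the unique sequences of two sample tags can be counted position by position. The Python sketch below is one hedged way to do this; it assumes equal-length tags compared without insertions or deletions, and the function names and tag sequences are hypothetical.

```python
from itertools import combinations


def base_differences(tag_a: str, tag_b: str) -> int:
    """Count positions at which two equal-length tag sequences differ."""
    if len(tag_a) != len(tag_b):
        raise ValueError("tags must have equal length")
    return sum(a != b for a, b in zip(tag_a.upper(), tag_b.upper()))


def min_pairwise_differences(tags) -> int:
    """Smallest number of base differences between any two tags in a set."""
    tags = list(tags)
    if len(tags) < 2:
        raise ValueError("need at least two tags")
    return min(base_differences(a, b) for a, b in combinations(tags, 2))
```

Checking the minimum over all pairs shows how distinguishable the least-separated pair of sample tags in a pool is.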
  • the universal sequence is the region of the PCR primer that is complementary to nucleotide sequences that are very common in a particular set of DNA molecules and cloning vectors. Thus, it is able to bind to a wide variety of DNA templates.
  • amplified target sequences refers generally to a nucleic acid sequence produced by amplifying the target sequences using target-specific primers and the methods provided herein.
  • the amplified target sequences can be either of the same sense (the positive strand produced in the second round and subsequent even-numbered rounds of amplification) or antisense (i.e., the negative strand produced during the first and subsequent odd-numbered rounds of amplification) with respect to the target sequences.
  • the amplified target sequences are typically less than 50% complementary to any portion of another amplified target sequence in the reaction.
  • PCR polymerase chain reaction
  • the mixture is denatured and the primers then annealed to their complementary sequences within the target molecule.
  • the primers are extended with a polymerase so as to form a new pair of complementary strands.
  • the steps of denaturation, primer annealing, and polymerase extension can be repeated many times (i.e., denaturation, annealing and extension constitute one “cycle;” there can be numerous “cycles”) to obtain a high concentration of an amplified segment of the desired target sequence.
  • the length of the amplified segment of the desired target sequence is determined by the relative positions of the primers with respect to each other, and therefore, this length is a controllable parameter.
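Two points from the cycle description above lend themselves to a short numerical sketch: the amplified segment length follows from the primer positions, and repeated cycles multiply the copy number. The Python below assumes a simple coordinate convention (0-based positions on one reference strand) and the standard idealized growth model N0 × (1 + E)^n, where E is a per-cycle efficiency between 0 and 1. Both function names and the coordinate convention are assumptions for illustration, and real reactions plateau rather than growing indefinitely.

```python
def amplicon_length(forward_start: int, reverse_end: int) -> int:
    """Length of the amplified segment, both primer sites included.

    forward_start: 0-based index on the reference strand where the forward
        primer begins.
    reverse_end: exclusive index on the same strand where the reverse
        primer's binding site ends.
    """
    if reverse_end <= forward_start:
        raise ValueError("reverse primer site must lie downstream of forward primer")
    return reverse_end - forward_start


def ideal_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Copy number after `cycles` cycles under the idealized model N0 * (1 + E)**n."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles
```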
  • the term “primer” includes a PCR primer or an oligonucleotide, either natural or synthetic, that is capable, upon forming a duplex with a polynucleotide template, of acting as a point of initiation of nucleic acid synthesis and being extended from its 3’ end along the template so that an extended duplex is formed.
  • the sequence of nucleotides added during the extension process is determined by the sequence of the template polynucleotide.
  • primers are extended by a DNA polymerase. Primers usually have a length in the range of 3 to 36 nucleotides, also 5 to 24 nucleotides, also from 14 to 36 nucleotides.
  • Primers within the scope of the disclosure herein include orthogonal primers, amplification primers, construction primers and the like. Pairs of primers can flank a sequence of interest or a set of sequences of interest. Primers and probes can be degenerate in sequence. Primers within the scope of the disclosure herein bind adjacent to a target sequence.
  • a “primer” can be considered a short polynucleotide, generally with a free 3’-OH group, that binds to a target or template potentially present in a sample of interest by hybridizing with the target, and thereafter promotes polymerization of a polynucleotide complementary to the target. Primers of the disclosure herein range from 17 to 30 nucleotides in length.
  • the primer is at least 17 nucleotides, or alternatively, at least 18 nucleotides, or alternatively, at least 19 nucleotides, or alternatively, at least 20 nucleotides, or alternatively, at least 21 nucleotides, or alternatively, at least 22 nucleotides, or alternatively, at least 23 nucleotides, or alternatively, at least 24 nucleotides, or alternatively, at least 25 nucleotides, or alternatively, at least 26 nucleotides, or alternatively, at least 27 nucleotides, or alternatively, at least 28 nucleotides, or alternatively, at least 29 nucleotides, or alternatively, at least 30 nucleotides, or alternatively at least 50 nucleotides, or alternatively at least 75 nucleotides or alternatively at least 100 nucleotides.
  • target-specific primer refers generally to a single stranded or double-stranded polynucleotide, typically an oligonucleotide, that includes at least one sequence that is at least 50% complementary, typically at least 75% complementary or at least 85% complementary, more typically at least 90% complementary, more typically at least 95% complementary, more typically at least 98% or at least 99% complementary, or 100% identical, to at least a portion of a nucleic acid molecule that includes a target sequence of interest, i.e., a target nucleic acid molecule.
  • the target-specific primer and target sequence are described as “corresponding” to each other.
  • the target-specific primer is capable of hybridizing to at least a portion of its corresponding target sequence (or to a complement of the target sequence); such hybridization can optionally be performed under standard hybridization conditions or under stringent hybridization conditions. In some embodiments, the target-specific primer is not capable of hybridizing to the target sequence, or to its complement, but is capable of hybridizing to a portion of a nucleic acid strand including the target sequence, or to its complement.
  • the target-specific primer includes at least one sequence that is at least 75% complementary, typically at least 85% complementary, more typically at least 90% complementary, more typically at least 95% complementary, more typically at least 98% complementary, or more typically at least 99% complementary, to at least a portion of the target sequence itself; in other embodiments, the target-specific primer includes at least one sequence that is at least 75% complementary, typically at least 85% complementary, more typically at least 90% complementary, more typically at least 95% complementary, more typically at least 98% complementary, or more typically at least 99% complementary, to at least a portion of the nucleic acid molecule other than the target sequence.
  • the target-specific primer is substantially non-complementary to other target sequences present in the sample; optionally, the target-specific primer is substantially non-complementary to other nucleic acid molecules present in the sample.
  • nucleic acid molecules present in the sample that do not include or correspond to a target sequence (or to a complement of the target sequence) are referred to as “non-specific” sequences or “non-specific nucleic acids”.
  • the target-specific primer is designed to include a nucleotide sequence that is substantially complementary to at least a portion of its corresponding target sequence.
  • a target-specific primer is at least 95% complementary, or at least 99% complementary, or 100% identical, across its entire length to at least a portion of a nucleic acid molecule that includes its corresponding target sequence.
  • a target-specific primer can be at least 90%, at least 95% complementary, at least 98% complementary or at least 99% complementary, or 100% identical, across its entire length to at least a portion of its corresponding target sequence.
  • a forward target-specific primer and a reverse target-specific primer define a target-specific primer pair that can be used to amplify the target sequence via template-dependent primer extension.
  • each primer of a target-specific primer pair includes at least one sequence that is substantially complementary to at least a portion of a nucleic acid molecule including a corresponding target sequence but that is less than 50% complementary to at least one other target sequence in the sample.
  • amplification can be performed using multiple target-specific primer pairs in a single amplification reaction, wherein each primer pair includes a forward target-specific primer and a reverse target-specific primer, each including at least one sequence that is substantially complementary or substantially identical to a corresponding target sequence in the sample, and each primer pair having a different corresponding target sequence.
  • the target-specific primer can be substantially non-complementary at its 3’ end or its 5’ end to any other target-specific primer present in an amplification reaction.
  • the target-specific primer can include minimal cross-hybridization to other target-specific primers in the amplification reaction. In some embodiments, target-specific primers include minimal cross-hybridization to non-specific sequences in the amplification reaction mixture. In some embodiments, the target-specific primers include minimal self-complementarity. In some embodiments, the target-specific primers can include one or more cleavable groups located at the 3’ end. In some embodiments, the target-specific primers can include one or more cleavable groups located near or about a central nucleotide of the target-specific primer. In some embodiments, one or more target-specific primers include only non-cleavable nucleotides at the 5’ end of the target-specific primer.
  • a target-specific primer includes minimal nucleotide sequence overlap at the 3’ end or the 5’ end of the primer as compared to one or more different target-specific primers, optionally in the same amplification reaction.
  • 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more target-specific primers in a single reaction mixture include one or more of the above embodiments.
  • substantially all of the plurality of target-specific primers in a single reaction mixture includes one or more of the above embodiments.
  • Primer design is based on single primer extension, in which each genomic target is enriched by one target-specific primer and one universal primer - a strategy that removes conventional two target-specific primer design restriction and reduces the amount of required primers. All primers required for a panel are pooled into an individual primer pool to reduce panel handling and the number of pools required for enrichment and library construction.
  • the booster panel is a pool of up to 100 primers that can be used to boost the performance of certain primers in any panel (cataloged, extended, or custom), or to extend the contents of an existing custom panel. The primers are delivered as a single pool that can be spiked into the existing panel.
  • PCR cycles can be conducted using an adapter primer and a pool of single primers, each carrying a gene specific sequence and a 5’ universal sequence. During this process, each single primer repeatedly samples the same target locus from different DNA templates. Afterwards, additional PCR cycles can be conducted using universal primers to attach complete adapter sequences and to amplify the library to the desired quantity.
  • a real-time polymerase chain reaction also known as quantitative polymerase chain reaction (qPCR)
  • Real-time PCR can be used quantitatively (quantitative real-time PCR), and semi-quantitatively, i.e. above/below a certain amount of DNA molecules (semi-quantitative real-time PCR).
  • PCRs include but are not limited to nested PCR (used to analyze DNA sequences coming from different organisms of the same species but that can differ by a single nucleotide (SNPs) and to ensure amplification of the sequence of interest in each of the organisms analyzed) and Inverse-PCR (usually used to clone a region flanking an insert or a transposable element).
  • Two common methods for the detection of PCR products in real-time PCR are: (1) non-specific fluorescent dyes that intercalate with any double-stranded DNA, and (2) sequence-specific DNA probes consisting of oligonucleotides that are labeled with a fluorescent reporter which permits detection only after hybridization of the probe with its complementary sequence.
  • PCR is a reaction in which replicate copies are made of a target polynucleotide using a pair of primers or a set of primers consisting of an upstream and a downstream primer, and a catalyst of polymerization, such as a DNA polymerase, and typically a thermally-stable polymerase enzyme.
  • the corresponding capture probe can bind to the corresponding sample tag sequence of the ligation or PCR product, i.e., the unique sequence specific for a sample.
  • the capture probes can be pooled at a desired concentration prior to the adding in (c) to the pooled samples.
  • a “capture probe” is meant a moiety which can be used to bind or attach to a strand of nucleic acid.
  • a capture probe can comprise a nucleotide sequence that can bind to a corresponding or complementary nucleotide sequence in a sample tag oligonucleotide, or a ligation or PCR product containing the unique sequence specific for a sample.
  • the capture probe can also include a polyT tail.
  • the capture probe comprises a nucleotide sequence that is at least 80%, at least 85%, at least 90%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identical to a nucleotide sequence that is complementary to a region of the sample tag oligonucleotide or sequence that comprises a unique sequence for a sample.
  • the capture probe can be attached or linked to an affinity tag.
  • the capture probe is an affinity tag that is attached to a nucleic acid.
  • An “affinity tag” is a moiety that can bind to a solid surface or support.
  • An affinity tag can be, but is not limited to: antigens (such as proteins (including peptides)) and antibodies (including fragments thereof (FAbs, etc.)); proteins and small molecules, including biotin/streptavidin; enzymes and substrates or inhibitors; other protein-protein interacting pairs; receptor-ligands; carbohydrates and their binding partners, nucleic acid— nucleic acid binding proteins pairs; or oligonucleotides and their reverse complement oligonucleotide sequence pairs.
  • the capture probe can be linked to, but not limited to, an affinity tag for binding to a solid surface, His tag, TAP tag, or antibody tag.
  • the affinity tag can be, but not limited to, biotin and the solid surface can be a streptavidin coated magnetic bead.
  • the affinity tag comprises biotin or imino-biotin, which can bind to streptavidin. Imino-biotin disassociates from streptavidin in pH 4.0 buffer while biotin requires a denaturant (e.g., 6 M guanidinium HCl, pH 1.5 or 90% formamide at 95°C).
  • specifically bind is meant that the capture probe binds with specificity to a sample tag oligonucleotide or a moiety on a solid support to differentiate between the pair and other components or contaminants of the system. The binding should be sufficient to remain bound under the conditions of the assay, including washes to remove non-specific binding.
  • the dissociation constants of the pair will be less than about 10⁻⁴ to 10⁻⁶ M⁻¹, less than about 10⁻⁵ to 10⁻⁹ M⁻¹, or less than about 10⁻⁷ to 10⁻⁹ M⁻¹.
  • the target nucleic acid comprising the sample tag sequence can be immobilized on a solid surface or support rather than non-sample tag oligonucleotides.
  • the sample tag oligonucleotide is immobilized by binding to a capture probe attached to a solid support.
  • sample tag sequence region or primer containing the sample tag sequence binds to a corresponding capture probe, which can be attached to a solid surface or support.
  • substrate or “solid support” or other grammatical equivalents herein is meant any material that is appropriate for or can be modified to be appropriate for the attachment of the target sequences. As will be appreciated by those in the art, the number of possible substrates is very large.
  • Possible substrates include, but are not limited to, glass and modified or functionalized glass, plastics (including acrylics, polystyrene and copolymers of styrene and other materials, polypropylene, polyethylene, polybutylene, polyurethanes, Teflon, etc.), polysaccharides, nylon or nitrocellulose, ceramics, resins, silica or silica-based materials including silicon and modified silicon, carbon, metals, inorganic glasses, plastics, optical fiber bundles, and a variety of other polymers.
  • the solid surface can be magnetic beads or high throughput microtiter plates.
  • sample tag oligonucleotide can bind or attach to the solid surface or support in a number of ways, such as by its corresponding capture probes.
  • the non-hybridized nucleic acids can be removed by washing.
  • the hybridization complexes are immobilized on a solid support and washed under conditions sufficient to remove non-hybridized nucleic acids, i.e., non-hybridized probes and sample nucleic acids.
  • immobilized complexes are washed under conditions sufficient to remove imperfectly hybridized complexes. That is, hybridization complexes that contain mismatches are also removed in the wash steps.
  • a variety of hybridization or washing conditions can be used, including high, moderate and low stringency conditions; see for example, Maniatis et al., Molecular Cloning: A Laboratory Manual, 2d Edition, 1989, and Short Protocols in Molecular Biology, ed. Ausubel, et al., hereby incorporated by reference. Stringent conditions are sequence-dependent and will be different in different circumstances. Longer sequences hybridize specifically at higher temperatures. An extensive guide to the hybridization of nucleic acids is found in Tijssen, Techniques in Biochemistry and Molecular Biology— Hybridization with Nucleic Acid Probes, "Overview of principles of hybridization and the strategy of nucleic acid assays" (1993). Generally, stringent conditions are selected to be about 5-10°C lower than the thermal melting point (Tm).
  • the Tm is the temperature (under defined ionic strength, pH and nucleic acid concentration) at which 50% of the probes complementary to the target hybridize to the target sequence at equilibrium (as the target sequences are present in excess, at Tm, 50% of the probes are occupied at equilibrium).
  • Stringent conditions will be those in which the salt concentration is less than about 1.0 M sodium ion, typically about 0.01 to 1.0 M sodium ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30°C for short probes (e.g. 10 to 50 nucleotides) and at least about 60°C for long probes (e.g.
  • Stringent conditions may also be achieved with the addition of helix destabilizing agents such as formamide.
  • the hybridization or washing conditions may also vary when a non-ionic backbone, i.e. PNA is used, as is known in the art.
  • cross-linking agents may be added after target binding to cross-link, i.e. covalently attach, the two strands of the hybridization complex.
  • bead types are commercially available, including but not limited to, beads selected from agarose beads, streptavidin-coated beads, NeutrAvidin-coated beads, antibody-coated beads, paramagnetic beads, magnetic beads, electrostatic beads, electrically conducting beads, fluorescently labeled beads, colloidal beads, glass beads, semiconductor beads, and polymeric beads.
  • the releasing in (e) can comprise releasing the nucleic acid molecules from the capture probes by, but not limited to, heat or alkaline denaturation.
  • the releasing in (e) can comprise releasing the nucleic acid molecules from the capture probes by uracil-DNA glycosylase digestion of the dU base in the sample specific oligonucleotide or sequence.
  • the cleavage enzyme composition comprises an N-glycosylase and an AP endonuclease.
  • the cleavage enzyme is an N-glycosylase selected from among uracil-N-glycosylase and an AP endonuclease and FPG protein and the AP endonuclease is selected from among E. coli endonuclease III or endonuclease IV.
  • the cleavage enzyme can also be a restriction enzyme.
  • the sample tag oligonucleotide bound to the capture probe can exhibit a sequence of a restriction site, which can then be incubated with a restriction endonuclease that recognizes the restriction site, wherein the restriction endonuclease cleaves the nucleic acid molecule from the capture probe.
  • the methods disclosed herein result in normalization of the quantity of target nucleic acid molecules between samples, such that the quantity or number of the target nucleic acid molecules in a sample is similar to the quantity or number of the same or different target nucleic acid molecules in other sample(s).
  • the sample pool can be normalized by hybridization of the capture probe to the sample tag oligonucleotide or sequence for 10 min to 2 hr, 15 min to 1.5 hr, 30 min to 1 hr, or any length of time or ranges of time derived therefrom as needed for the normalization.
  • the normalization of the target nucleic acid molecules in the pooled samples is within 5-fold to 0.5-fold, within 4-fold to 1-fold, within 3-fold to 2-fold, within 1.5-fold to 0.5-fold, or within folds or ranges of folds derived therefrom in the pooled samples. In some embodiments, the normalization of the target nucleic acid molecules in the pooled samples is within 1% to 10%, 2% to 7%, 3% to 5%, 4% to 6%, or any % or ranges of % derived therefrom.
  • the normalization of the target nucleic acid molecules in the pooled samples has a Gini coefficient of 0.5 to 0.03, 0.4 to 0.05, 0.3 to 0.06, 0.2 to 0.03, 0.015 to 0.07, or any Gini coefficient or ranges of Gini coefficient derived therefrom.
  • a “Gini coefficient” is a measure of the inequality of values across the population. A Gini coefficient of zero expresses perfect equality, where all values are the same. A Gini coefficient of one (or 100%) expresses maximal inequality among values. In some embodiments, such normalizations as disclosed herein can be compared to prenormalization measurements.
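The two normalization measures discussed above, a fold range across samples and a Gini coefficient, can be computed from per-sample quantities such as read counts. The Python sketch below uses the standard rank-based formula for the Gini coefficient of sorted values; the function names and the choice of read counts as the input are assumptions for illustration, not a method stated in the disclosure.

```python
def gini_coefficient(values) -> float:
    """Gini coefficient of non-negative values (0 means all values equal).

    Uses the rank-based form G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    with x sorted ascending and i starting at 1. For a finite set of n values
    the maximum attainable value is (n - 1) / n, which approaches 1 as n grows.
    """
    xs = sorted(float(v) for v in values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one value")
    total = sum(xs)
    if xs[0] < 0 or total <= 0:
        raise ValueError("values must be non-negative with a positive sum")
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def fold_range(values) -> float:
    """Ratio of the largest to the smallest per-sample quantity."""
    xs = [float(v) for v in values]
    if not xs or min(xs) <= 0:
        raise ValueError("values must be positive")
    return max(xs) / min(xs)
```

Computing both measures on the same per-sample counts before and after normalization gives a direct comparison with prenormalization measurements.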
  • RNA expression profiles are determined using any sequencing methods known in the art. Determination of the sequence of a nucleic acid sequence of interest can be performed using a variety of sequencing methods known in the art including, but not limited to, sequencing by synthesis (SBS), sequencing by hybridization (SBH), sequencing by ligation (SBL) (Shendure et al.
  • Embodiments of the disclosure herein also provide methods for analyzing gene expression in a plurality of single cells, the method comprising the steps of preparing a cDNA library using the method described herein and sequencing the cDNA library.
  • a “gene” refers to a polynucleotide containing at least one open reading frame (ORF) that is capable of encoding a particular polypeptide or protein after being transcribed and translated. Any of the polynucleotide sequences described herein can be used to identify larger fragments or full-length coding sequences of the gene with which they are associated. Methods of isolating larger fragment sequences are known to those of skill in the art.
  • expression refers to the process by which polynucleotides are transcribed into mRNA and/or the process by which the transcribed mRNA is subsequently being translated into peptides, polypeptides, or proteins. If the polynucleotide is derived from genomic DNA, expression can include splicing of the mRNA in a eukaryotic cell.
  • the cDNA library can be sequenced by any suitable screening method.
  • the cDNA library can be sequenced using a high-throughput screening method, such as Applied Biosystems’ SOLiD sequencing technology, or Illumina’s Genome Analyzer.
  • the cDNA library can be shotgun sequenced.
  • the number of reads can be at least 10,000, at least 1 million, at least 10 million, at least 100 million, or at least 1000 million.
  • the number of reads can be from 10,000 to 100,000, or alternatively from 100,000 to 1 million, or alternatively from 1 million to 10 million, or alternatively from 10 million to 100 million, or alternatively from 100 million to 1000 million.
  • a “read” is a length of continuous nucleic acid sequence obtained by a sequencing reaction.
  • the DNA or gDNA library generated by the methods disclosed herein can be useful for, but not limited to, DNA variant detection, copy number analysis, fusion gene detection and structural variant detection.
  • the cDNA library generated by the methods disclosed herein can be useful for, but not limited to, RNA variant detection, gene expression analysis, and fusion gene detection.
  • the DNA and cDNA libraries can also be used for paired DNA and RNA profiling.
  • the expression profiles described herein are useful in the field of predictive medicine in which diagnostic assays, prognostic assays, pharmacogenomics, and monitoring clinical trials are used for prognostic (predictive) purposes to thereby treat an individual prophylactically. Accordingly, some embodiments relate to diagnostic assays for determining the expression profile of nucleic acid sequences (e.g., RNAs), in order to determine whether an individual is at risk of developing a disorder and/or disease. Such assays can be used for prognostic or predictive purposes to thereby prophylactically treat an individual prior to the onset of the disorder and/or disease. Accordingly, in certain exemplary embodiments, methods of diagnosing and/or prognosing one or more diseases and/or disorders using one or more of expression profiling methods described herein are provided.
  • Some embodiments pertain to monitoring the influence of agents (e.g., drugs or other compounds administered either to inhibit or to treat or prevent a disorder and/or disease) on the expression profile of nucleic acid sequences (e.g., RNAs) in clinical trials. Accordingly, in certain exemplary embodiments, methods of monitoring one or more diseases and/or disorders before, during and/or subsequent to treatment with one or more agents using one or more of expression profiling methods described herein are provided.
  • Monitoring the influence of agents (e.g., drug compounds) on the level of expression of a marker can be applied not only in basic drug screening, but also in clinical trials.
  • the effectiveness of an agent to affect an expression profile can be monitored in clinical trials of subjects receiving treatment for a disease and/or disorder associated with the expression profile.
  • the methods for monitoring the effectiveness of treatment of a subject with an agent comprise the steps of (i) obtaining a pre-administration sample from a subject prior to administration of the agent; (ii) detecting one or more expression profiles in the pre-administration sample; (iii) obtaining one or more post-administration samples from the subject; (iv) detecting one or more expression profiles in the post-administration samples; (v) comparing the one or more expression profiles in the pre-administration sample with the one or more expression profiles in the post-administration sample or samples; and (vi) altering the administration of the agent to the subject accordingly.
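The comparison in step (v) can be sketched as a per-gene log2 ratio between a post-administration and a pre-administration profile. The Python below is only one possible way to compare profiles: representing a profile as a mapping from gene name to a count, the pseudocount of 1.0, and the function name `compare_profiles` are all assumptions for illustration.

```python
import math


def compare_profiles(pre, post, pseudocount: float = 1.0):
    """Return log2((post + pseudocount) / (pre + pseudocount)) for each gene.

    pre, post: dicts mapping a gene name to a non-negative expression value.
    Genes missing from one profile are treated as having a value of 0 there.
    The pseudocount avoids division by zero and is an arbitrary choice here.
    """
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive")
    genes = sorted(set(pre) | set(post))
    return {
        gene: math.log2(
            (post.get(gene, 0.0) + pseudocount) / (pre.get(gene, 0.0) + pseudocount)
        )
        for gene in genes
    }
```

Positive values indicate higher expression after administration and negative values lower expression; how large a change should alter administration in step (vi) is left to the practitioner.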
  • an agent e.g., an agonist, antagonist, peptidomimetic, protein, peptide, nucleic acid, small molecule, or other drug candidate
  • the expression profiling methods described herein allow the quantitation of gene expression.
  • tissue specificity but also the level of expression of a variety of genes in the tissue is ascertainable.
  • genes can be grouped on the basis of their tissue expression per se and level of expression in that tissue. This is useful, for example, in ascertaining the relationship of gene expression between or among tissues.
  • one tissue can be perturbed and the effect on gene expression in a second tissue can be determined.
  • the effect of one cell type on another cell type in response to a biological stimulus can be determined.
  • Such a determination is useful, for example, to know the effect of cell-cell interaction at the level of gene expression.
  • an assay can be used to determine the molecular basis of the undesirable effect and thus provides the opportunity to co-administer a counteracting agent or otherwise treat the undesired effect.
  • undesirable biological effects can be determined at the molecular level.
  • the effects of an agent on expression of other than the target gene can be ascertained and counteracted.
  • the time course of expression of one or more nucleic acid sequences can be monitored. This can occur in various biological contexts, as disclosed herein, for example development of a disease and/or disorder, progression of a disease and/or disorder, and processes, such as cellular alterations associated with the disease and/or disorder.
  • the expression profiling methods described herein are also useful for ascertaining the effect of the expression of one or more nucleic acid sequences (e.g., genes, mRNAs and the like) on the expression of other nucleic acid sequences (e.g., genes, mRNAs and the like) in the same cell or in different cells. This provides, for example, for a selection of alternate molecular targets for therapeutic intervention if the ultimate or downstream target cannot be regulated.
  • the expression profiling methods described herein are also useful for ascertaining differential expression patterns of one or more nucleic acid sequences (e.g., genes, mRNAs and the like) in normal and abnormal cells. This provides a battery of nucleic acid sequences (e.g., genes, mRNAs and the like) that could serve as a molecular target for diagnosis or therapeutic intervention.
  • step 1 to create PCR product with single stranded arm for hybridization, modified PCR primers can be used.
  • In the following example, the modified PCR primer consists of a 5’ variable region for hybridization to the capture probe, a dU base for cleavage in a later step, and a spacer that prevents polymerase elongation beyond this point, so that the variable region for probe binding always remains single stranded (FIG. 3A).
  • step 2 the PCR products are pooled from multiple samples.
  • step 3 the capture probe is designed to bind to corresponding variable region of the PCR products.
  • step 4 the capture probe is 3’ biotinylated for binding to streptavidin-coated magnetic beads (FIG. 3C).
  • The capture probe is in limited supply in the reaction, in lesser quantity than the corresponding PCR products, so that only a subset of the PCR products binds. Capture probes are pooled at pre-determined concentrations, depending on the sequencing capacity allocated to each sample. A typical normalization is equimolar across samples. The total amount of capture probe determines the output library concentration.
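The relationship between sequencing capacity allocation, capture probe amounts, and normalized output can be illustrated with a minimal Python sketch. It assumes an idealized 1:1 capture in which each probe molecule pulls down one library molecule; the function names, sample names, and femtomole values are hypothetical and are not taken from the disclosure.

```python
def capture_probe_mix(read_share, total_probe_fmol):
    """Split a total capture probe amount according to each sample's read share.

    read_share maps sample name -> relative share of sequencing capacity
    (equal shares give an equimolar mix). Values are hypothetical.
    """
    total_share = sum(read_share.values())
    return {
        sample: total_probe_fmol * share / total_share
        for sample, share in read_share.items()
    }


def expected_output(library_fmol, probe_fmol):
    """Idealized output: the limiting reagent caps what is captured.

    When the library is in excess of its probe, the output equals the
    probe amount, regardless of how much library was put in.
    """
    return {
        sample: min(library_fmol[sample], probe_fmol[sample])
        for sample in probe_fmol
    }


# Hypothetical example: sample C is allocated twice the capacity of A and B.
shares = {"A": 1, "B": 1, "C": 2}
probes = capture_probe_mix(shares, total_probe_fmol=40)
libraries = {"A": 500, "B": 80, "C": 1200}  # uneven input, all in excess
output = expected_output(libraries, probes)
```

In this sketch the uneven inputs (500, 80, and 1200 fmol) collapse to outputs set entirely by the probe mix, which is the behavior the limited-probe design aims for.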
  • step 5 the release of normalized library can be achieved by UNG (Uracil-DNA glycosylase) digestion of the dU base from the PCR primer.
  • the library molecules maintain a double-stranded structure. The library can be visualized and quantified by optional library quantification methods, such as a Bioanalyzer. Traditional qPCR quantification is also possible, although it is not necessary.
  • the library is ready to be sequenced (FIG. 3D).
  • a panel of 105 probes was designed. Sequences of any two of the capture probes differed by at least 4 bases. Out of the 105 designs, 25 were randomly selected and used for laboratory testing (each set of uPCR primer and corresponding capture probe sequences shown in SEQ ID NOS:1 and 2, and 3; 16 and 2, and 17; 20 and 2, and 21; 24 and 2, and 25; 26 and 2, and 27; 38 and 2, and 39; 40 and 2, and 41; 42 and 2, and 43; 44 and 2, and 45; 46 and 2, and 47; 48 and 2, and 49; 50 and 2, and 51; 52 and 2, and 53; 54 and 2, and 55; 56 and 2, and 57; 58 and 2, and 59; 60 and 2, and 61; 62 and 2, and 63; 64 and 2, and 65; 66 and 2, and 67; 68 and
  • the library pool was normalized using the workflow disclosed herein, aiming for equimolar output. Two normalization reactions were performed: one with a 1-hour hybridization to the capture probes and the other with a 30-min hybridization. The abundance of each sample was determined by sequencing on an Illumina MiSeq sequencer (Table 3, Post-normalization abundance).
Table 3
[0143] Using the highly uneven library mix with a Gini coefficient of 0.310 as input, both normalization reactions achieved very uniform output. The highest- and lowest-abundance samples were within 2-fold of each other after normalization (5.1% vs 2.7% for the 1-hr hybridization, 5.1% vs 2.8% for the 30-min hybridization).
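The two uniformity measures used above, the Gini coefficient and the fold difference between the most and least abundant samples, can be computed as in the following Python sketch. The text does not state which Gini formula was used, so the standard sorted-rank formula is an assumption here, and the function names are hypothetical. Only the 5.1% and 2.7% figures come from the text; the per-sample values of Table 3 are not reproduced.

```python
def gini(values):
    """Gini coefficient via the sorted-rank formula.

    Returns 0 for perfectly even abundances and approaches 1 as a single
    sample dominates. Assumes non-negative values with a positive sum.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one value and a positive total")
    # Rank-weighted sum with ranks 1..n over the ascending values.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def fold_range(values):
    """Ratio of the most abundant to the least abundant sample."""
    return max(values) / min(values)


# Extremes reported for the 1-hr hybridization: 5.1% vs 2.7%.
one_hour_fold = fold_range([5.1, 2.7])
```

A perfectly equimolar pool gives a Gini coefficient of 0, so values closer to 0 after normalization indicate a more even allocation of reads.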
  • the normalization method was applied to a whole genome sequencing (WGS) workflow as a real-world example.
  • WGS NGS libraries were prepared using the QIAseq FX DNA Library Kit, following the manufacturer's recommended conditions with minor modifications. Briefly, the final PCR amplification was replaced with normalization-compatible conditions. This modification did not add any substantial handling complexity or reaction time to the original workflow.
  • 5 µL (out of 50 µL) of each library was pooled together and purified with Agencourt AMPure XP Beads. This library pool was normalized using the 30-min hybridization normalization method and sequenced on the MiSeq platform (Table 4).
  • the turnaround time for this normalization is about 1 hour and 45 mins, including purification of only one pooled library (30 mins).
  • the hands-on time is about 45 mins.
  • the turnaround time for an equivalent traditional method is about 2 hrs. It requires individual purification of all the samples, which takes at least 45 mins to process 25 samples. Additionally, individual quantification, dilution and pooling of 25 samples takes about 1 hr and 15 mins to perform.
  • the total hands-on time is about 1.5 hours. The time increases with more samples, whereas for the normalization workflow the time remains substantially unchanged with more samples. This makes the methods disclosed herein particularly attractive for high throughput applications.
  • the library normalization workflows disclosed herein provide high-throughput alternatives to traditional library quantification and mixing methods.
  • the workflows can be used as integrated components in current DNA or RNA sequencing kits.
  • the workflows can also be used as a stand-alone kit, complementary to other universal NGS library preparation kits.

Abstract

The invention relates to methods of nucleic acid preparations. More specifically, the methods relate to obtaining a normalized quantity of target nucleic acid molecules from multiple samples for simultaneous analysis of the multiple samples, such as by sequencing.

Description

NUCLEIC ACID PREPARATIONS FROM MULTIPLE SAMPLES AND USES THEREOF
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims benefit of the filing date of U.S. Appl. No. 63/012,467, filed April 20, 2020, the disclosure of which is incorporated herein by reference in its entirety.
REFERENCE TO SEQUENCE LISTING SUBMITTED ELECTRONICALLY
[0002] The content of the electronically submitted sequence listing in ASCII text file (Name: 2495-0014W001_Sequence Listing_ST25.txt; Size: 15 KB; and Date of Creation: April 16, 2021) filed with the application is incorporated herein by reference in its entirety.
BACKGROUND OF THE INVENTION
[0003] Because of high throughput of NGS (next generation sequencing) platforms and high cost per sequencing run, many people tend to combine multiple samples into one sequencing run so that the per sample cost is low. However, many NGS platforms have very low tolerance to library concentration variation. Accurate quantification of the library is thus required. To date, the best practice is to precisely measure the concentration of each sample by qPCR, which usually involves serial dilution of each sample and control standard. In addition, the calculation involves adjustments based on library fragment length, and thus, qPCR measurement is usually complemented by an electrophoresis method, such as by Agilent’s Bioanalyzer. After determining the concentration, libraries need to be manually diluted again, and carefully pooled together at a desired amount of each sample. This process involves multiple pipetting steps and is time-consuming. Any mismeasurement and handling errors may result in insufficient coverage for some samples, deteriorated data quality, or even failure of the entire sequencing run.
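The fragment-length adjustment and manual dilution described above can be illustrated with a short Python sketch. It assumes one common form of the adjustment, in which the qPCR-derived concentration is scaled by the ratio of the control standard length to the average library fragment length; the function names and numeric values are hypothetical and do not come from the disclosure.

```python
def size_adjusted_nM(measured_nM, standard_bp, library_bp):
    """Scale a qPCR-derived concentration for fragment length.

    Assumes the adjustment is the ratio of the standard's length to the
    library's average fragment length (for example, from electrophoresis).
    """
    return measured_nM * standard_bp / library_bp


def dilution_volumes(stock_nM, target_nM, final_ul):
    """Return (library volume, diluent volume) in µL for C1*V1 = C2*V2."""
    if target_nM > stock_nM:
        raise ValueError("target concentration exceeds stock concentration")
    stock_ul = final_ul * target_nM / stock_nM
    return stock_ul, final_ul - stock_ul


# Hypothetical library: 10 nM by qPCR against a 400 bp standard,
# with a 500 bp average fragment length.
adjusted = size_adjusted_nM(10.0, standard_bp=400, library_bp=500)
library_ul, diluent_ul = dilution_volumes(adjusted, target_nM=4.0, final_ul=50.0)
```

In the traditional workflow this calculation and the pipetting that follows are repeated for every library, which is where the handling burden and the opportunity for error accumulate.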
[0004] An alternative method is bead-based normalization. In this method, a limited amount of DNA-binding beads is added to each library. The library amount greatly surpasses the binding capacity of the beads, so that the beads are saturated. Excess, unbound DNA is removed. After elution from the beads, each library is normalized to the amount of bead-bound DNA. Libraries can be pooled accordingly. No further quantification of each individual library is necessary. Due to the simplicity of this method, commercial kits are available from many manufacturers. However, it provides a lower level of consistency and reproducibility than qPCR-based normalization methods. Mehta, B. et al., Int. J. Legal Med. 132(1): 125-132 (2018).
[0005] Another approach for library normalization has been developed. See US2019/0211374 A1. In this approach, each library is PCR amplified to an excessive amount. Next, a limited amount of normalization probe (N-probe) is provided and ligated to each DNA library. The DNA library is in excess so that only a subset is ligated. The N-probe protects the DNA libraries from digestion by exonuclease. After all the samples are pooled and subjected to exonuclease treatment, all unprotected libraries are digested. The amount of remaining library is equal to the amount of ligated N-probe. This method frees customers from the qPCR quantification assay and concentration adjustment of individual libraries. However, there are some drawbacks. The ligation reaction is non-specific, so it requires separate handling of each sample. Only after individual purification and size selection of the PCR product and completion of the ligation can the samples be pooled together. This is the major bottleneck for its throughput.
[0006] SNOP (Stoichiometrically Normalizing Oligonucleotide Purification) has been demonstrated for synthetic oligonucleotide purification. Pinto, A. et al., Nat. Commun. 9(1):2467 (2018).
[0007] There remains a need for improved nucleic acid preparations amenable to analysis in a combined pool of multiple samples.
BRIEF SUMMARY OF THE INVENTION
[0008] Accordingly, disclosed herein are methods of obtaining a normalized quantity of target nucleic acid molecules from multiple samples, comprising:
(a) attaching a sample tag oligonucleotide to a target nucleic acid molecule in each sample, wherein the sample tag oligonucleotide comprises a unique sequence specific for each sample;
(b) pooling multiple samples;
(c) adding corresponding capture probes to the pooled samples, wherein the sample tag oligonucleotide attached to the target nucleic acid molecule binds to the corresponding capture probe; and
(d) separating captured target nucleic acid molecules from uncaptured nucleic acid molecules. The methods can further comprise (e) releasing the captured target nucleic acid molecules from the capture probes.
[0009] The target nucleic acid molecule concentration in each sample can be greater than the corresponding capture probe concentration.
[0010] The attaching in (a) can comprise ligating the sample tag oligonucleotide to the target nucleic acid molecule in each sample.
[0011] The sample tag oligonucleotide can comprise a unique sequence specific for each sample. The sample tag oligonucleotide can further comprise a spacer. The sample tag oligonucleotide can further comprise one or more dU bases.
[0012] The attaching in (a) can comprise performing PCR amplification of each sample with a PCR primer. The PCR primer can comprise a sample tag oligonucleotide, a spacer, and a universal sequence. The PCR primer can further comprise one or more dU bases.
[0013] The corresponding capture probe can bind to the unique sequence of a corresponding sample tag oligonucleotide attached to the target nucleic acid molecule.
[0014] The capture probes can be pooled at a desired concentration prior to the adding in (c) to the pooled samples. In some embodiments, the concentration of each capture probe can be less than that of the corresponding PCR product.
[0015] The releasing in (e) can comprise releasing the nucleic acid molecules from the capture probes by, but not limited to, heat or alkaline denaturation.
[0016] The releasing in (e) can comprise releasing the nucleic acid molecules from the capture probes by uracil-DNA glycosylase digestion of the dU base in the sample specific oligonucleotide.
[0017] The unique sequence of the sample tag oligonucleotide can comprise 1-100 base differences compared to the unique sequence of the sample tag oligonucleotide for another sample.
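A minimum number of base differences between sample tag sequences can be checked with a pairwise Hamming distance, as in the Python sketch below. The tag sequences shown are hypothetical placeholders and are not the sequences of the sequence listing; the threshold of 4 matches the minimum difference reported for the 105-probe panel, and the function names are assumptions made for illustration.

```python
from itertools import combinations


def hamming(a, b):
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(tags):
    """Smallest Hamming distance over all pairs of tags."""
    return min(hamming(a, b) for a, b in combinations(tags, 2))


def tags_are_distinct(tags, min_diff):
    """True if every pair of tags differs by at least min_diff bases."""
    return min_pairwise_distance(tags) >= min_diff


# Hypothetical 8-base tags, for illustration only.
tags = ["ACGTACGT", "TGCATGCA", "ACGTTGCA"]
panel_ok = tags_are_distinct(tags, min_diff=4)
```

This sketch compares tags of equal length only; tags of different lengths would need an alignment-based distance instead.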
[0018] In some embodiments, the ligation or PCR products can comprise a 5’ overhang. The 5’ overhang can comprise, but not limited to, 5-50 bases, 5-40 bases, 5-30 bases, 5-20 bases, 10-20 bases, or any specific number of bases or ranges of bases derived therefrom.
[0019] The capture probe can be linked to, but not limited to, an affinity tag for binding to a solid surface, His tag, TAP tag, or antibody tag. The affinity tag can be biotin and the solid surface can be a streptavidin coated magnetic bead.
[0020] The methods disclosed herein can further comprise purifying the pooled target nucleic acid molecules attached to the sample tag oligonucleotides from the multiple samples. The purifying can be by, but not limited to, spin column, magnetic beads, or gel electrophoresis.
[0021] The method can further comprise quantifying the pooled ligation or PCR products. The quantifying can be performed by, but not limited to, qPCR or electrophoresis.
[0022] The methods can further comprise sequencing the released target nucleic acid molecules. The methods can further comprise performing next generation sequencing. The methods can further comprise performing third generation sequencing. The methods can further comprise performing single cell analysis.
[0023] Also disclosed herein are methods of simultaneously sequencing target nucleic acid molecules from multiple samples, comprising:
(a) preparing two or more samples, wherein each sample comprises one or more target nucleic acid molecules;
(b) performing PCR amplification of the target nucleic acid molecule in each sample with a PCR primer, wherein the PCR primer comprises a 5’ variable region comprising a 5’ sample tag sequence comprising a unique sequence specific for each sample, a spacer, and a universal sequence;
(c) pooling the multiple samples;
(d) adding corresponding capture probes to the pooled samples, wherein the sample tag sequence of the PCR product binds to a corresponding capture probe and the PCR product concentration of each sample is greater than the corresponding capture probe concentration;
(e) separating the captured target nucleic acid molecules from the uncaptured nucleic acid molecules;
(f) releasing the captured target nucleic acids from the capture probes; and
(g) simultaneously sequencing the target nucleic acids from multiple samples.
[0024] Further disclosed herein are kits comprising: multiple PCR primers for use with multiple samples comprising target nucleic acid molecules, wherein each PCR primer comprises a 5’ sample tag sequence comprising a unique sequence specific for each sample, a spacer, and a universal sequence; and a corresponding capture probe capable of binding to the unique sequence specific for each sample. The kits can further comprise a ligase. The PCR primer can further comprise one or more dU bases. The kits can further comprise a uracil-DNA glycosylase.
BRIEF DESCRIPTION OF THE DRAWINGS/FIGURES
[0025] FIG. 1. Summary of a method for high throughput NGS library normalization disclosed herein.
[0026] FIG. 2. Comparison of traditional workflow and new, improved workflow disclosed herein.
[0027] FIG. 3A. PCR primer design. FIG. 3B. PCR products using the PCR primer design. FIG. 3C. Capture probe binding and streptavidin beads pull down. FIG. 3D. Final normalized library.
[0028] FIGS. 4A and 4B. Exemplary capture probes (FIG. 4B) and corresponding uPCR (universal PCR) primers (FIG. 4A) are provided in corresponding rows as numbered. Sequence differences are underlined. “IdeoxyU” is internal deoxyUridine, “iSp9” is internal triethylene glycol spacer, and “3Bio” is 3’ Biotin.
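The primer and probe layout described for FIGS. 3A and 4 can be sketched in Python as simple string assembly. The sketch assumes that the uPCR primer reads, 5’ to 3’, sample tag, internal dU, spacer, universal sequence, and that the capture probe is the reverse complement of the sample tag carrying a 3’ biotin. The modification labels reuse the notation quoted above; the sequences and function names are hypothetical placeholders, not entries from the sequence listing.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence made of A, C, G, and T."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def build_upcr_primer(sample_tag, universal):
    """Assumed layout: 5'-tag, internal dU, spacer, universal sequence-3'.

    The spacer stops polymerase extension, so the tag stays single
    stranded; the dU provides a cleavage site for later release.
    """
    return f"{sample_tag}/IdeoxyU//iSp9/{universal}"


def build_capture_probe(sample_tag):
    """Assumed probe: reverse complement of the tag with a 3' biotin."""
    return f"{reverse_complement(sample_tag)}/3Bio/"


# Hypothetical placeholder sequences, for illustration only.
primer = build_upcr_primer("AACG", "GGTT")
probe = build_capture_probe("AACG")
```

Because only the sample tag differs between primers, one universal sequence can be shared across all samples while each capture probe remains specific to a single sample.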
DETAILED DESCRIPTION OF THE INVENTION
[0029] The high throughput of next generation sequencing (NGS) enables analyzing many samples simultaneously in one sequencing run. However, each sample (a.k.a. library) needs to be individually prepared, quantified, and accurately pooled together before loading on a sequencing instrument. This process is tedious and time consuming. Disclosed herein are new methods for high throughput library normalization from multiple samples. Purification and size selection can be performed after sample pooling. The complexity of handling is reduced to one tube, instead of tens or hundreds of individual tubes. The methods disclosed herein utilize a highly specific hybridization pull-down approach to ensure that each library is at the desired concentration in the final library pool, so that individual quantification of each library is not required.
[0030] The libraries can be pooled together directly after ligation of sample tag oligonucleotides or PCR amplification without requiring individual purification and size selection. The library pool can be purified and normalized by capture probes through highly specific hybridization and pull-down. The resulting library mix is normalized to the amount of the predetermined capture probe mix used (usually equimolar). It can be directly loaded on a sequencing instrument. Customers or users also have the option to quantify the library pool by other methods such as qPCR or electrophoresis, although it is not necessary. The early pooling feature makes the approach disclosed herein suitable for high throughput applications when customers or users have many samples to analyze at once.
[0031] Disclosed herein are methods for high throughput library normalization. The process involves five major steps: 1) attaching indexed normalization oligonucleotides to the libraries (individually for each library), so that each library has its own sample index as well as a single-stranded arm for hybridization; 2) pooling the libraries together and performing a combined cleanup; 3) adding normalization capture probes, which bind to their respective targets; 4) immobilizing the bound targets and washing off unbound ones; and 5) releasing the targets. The output library is normalized and ready to be sequenced. FIG. 1 summarizes the process.
[0032] For example, the uPCR (universal PCR) can be seamlessly integrated into the library amplification reaction of most PCR-based NGS library prep kits. A user can replace the PCR primers with specially designed primers. No additional handling is required before sample pooling. In a traditional NGS workflow, individual purification and size selection of each sample is the major bottleneck. It involves multiple manual pipetting steps for each sample. One of the benefits of the workflow disclosed herein is that purification and size selection are performed after sample pooling. A comparison of a workflow disclosed herein and a traditional workflow is summarized in FIG. 2.
[0033] Accordingly, disclosed herein are methods of obtaining a normalized quantity of target nucleic acid molecules from multiple samples, comprising:
(a) attaching a sample tag oligonucleotide to a target nucleic acid molecule in each sample, wherein the sample tag oligonucleotide comprises a unique sequence specific for each sample;
(b) pooling multiple samples;
(c) adding corresponding capture probes to the pooled samples, wherein the sample tag oligonucleotide attached to the target nucleic acid molecule binds to a corresponding capture probe; and
(d) separating the captured target nucleic acid molecules from the uncaptured nucleic acid molecules.
The methods can further comprise (e) releasing the captured target nucleic acid molecules from the capture probes.
[0034] The target nucleic acid molecule concentration in each sample can be greater than the corresponding capture probe concentration.
[0035] The attaching in (a) can comprise ligating the sample tag oligonucleotide to the target nucleic acid molecule in each sample.
[0036] The sample tag oligonucleotide can comprise a unique sequence specific for each sample. The sample tag oligonucleotide can further comprise one or more dU bases. The sample tag oligonucleotide can further comprise a spacer.
[0037] The attaching in (a) can comprise performing PCR amplification of each sample with a PCR primer. The PCR primer can comprise a sample tag oligonucleotide comprising a unique sequence specific for each sample, a spacer, and a universal sequence. The PCR primer can further comprise one or more dU bases.
[0038] The corresponding capture probe can bind to the unique sequence of a corresponding sample tag oligonucleotide attached to the target nucleic acid molecule, e.g., of the ligation or PCR products.
[0039] The capture probes can be pooled at a desired concentration prior to the adding in (c) to the pooled samples. In some embodiments, the concentration of each capture probe can be less than that of the corresponding PCR product.
[0040] The releasing in (e) can comprise releasing the target nucleic acid molecules from the capture probes by, but not limited to, heat or alkaline denaturation. In some embodiments, the target nucleic acid molecules can be released from the capture probes by RNase digestion of an RNA base in the primer, e.g., when a dU base is replaced by an RNA base.
[0041] The releasing in (e) can comprise releasing the target nucleic acid molecules from the capture probes by uracil-DNA glycosylase digestion of the dU base in the sample specific oligonucleotide.
[0042] The sample specific oligonucleotide can comprise 1-100 base differences compared to a sample specific oligonucleotide for another sample.
[0043] In some embodiments, the ligation or PCR products can comprise a 5’ overhang. The 5’ overhang can comprise 5-50 bases, 5-40 bases, 5-30 bases, 5-20 bases, 10-20 bases, or any specific number of bases or ranges of bases derived therefrom.
[0044] The capture probe can be linked to, but not limited to, an affinity tag for binding to a solid surface, His tag, TAP tag, or antibody tag. The affinity tag can be biotin and the solid surface can be a streptavidin coated magnetic bead.
[0045] The methods can further comprise purifying the pooled target nucleic acid molecules attached to the sample tag oligonucleotides, e.g., the ligation or PCR products, from the multiple samples. The purifying can be by, but not limited to, spin column, magnetic beads, or gel electrophoresis.
[0046] The methods can further comprise quantifying the pooled PCR products. The quantifying can be performed by, but not limited to, qPCR or electrophoresis.
[0047] The methods can further comprise sequencing the released target nucleic acid molecules. The methods can further comprise performing next generation sequencing. The methods can further comprise performing third generation sequencing. The methods can further comprise performing single cell analysis.
[0048] Also disclosed herein are methods of simultaneously sequencing target nucleic acids from multiple samples, comprising:
(a) preparing two or more samples, wherein each sample comprises target nucleic acid molecules;
(b) performing PCR amplification of the target nucleic acid molecule in each sample with a PCR primer, wherein the PCR primer comprises a 5’ variable region comprising a 5’ sample tag sequence comprising a unique sequence specific for each sample, a spacer, and a universal sequence;
(c) pooling the multiple samples;
(d) adding corresponding capture probes to the pooled samples, wherein the sample tag sequence of the PCR product binds to a corresponding capture probe and each PCR product concentration is greater than the corresponding capture probe concentration;
(e) separating the captured target nucleic acid molecules from the uncaptured nucleic acid molecules;
(f) releasing the captured target nucleic acids from the capture probes; and
(g) simultaneously sequencing the target nucleic acids from multiple samples.
[0049] Further disclosed herein are kits comprising: multiple PCR primers for use with multiple samples comprising target nucleic acid molecules, wherein each PCR primer comprises a 5’ sample tag sequence comprising a unique sequence specific for each sample, a spacer, and a universal sequence; and a capture probe. The PCR primers can further comprise one or more dU bases. The kits can further comprise a ligase. The kits can further comprise a uracil-DNA glycosylase.
[0050] The term “target nucleic acid molecule(s)” refers to nucleic acid molecules of interest, such as for analysis, e.g., by sequencing. As disclosed herein, target nucleic acid molecules from multiple samples can be analyzed simultaneously. The methods disclosed herein provide for a single or multiple target nucleic acid molecules from each sample to be pooled with those from one or more other samples for simultaneous analysis.
[0051] The term “sample” can include nucleic acid molecules, such as RNA or DNA, a single cell, multiple cells, fragments of cells, or an aliquot of body fluid, taken from a subject (e.g., a mammalian subject, an animal subject, a human subject, or a non-human animal subject). Samples can be selected by one of skill in the art using any known means, including but not limited to centrifugation, venipuncture, blood draw, excretion, swabbing, biopsy, needle aspirate, lavage sample, scraping, surgical incision, laser capture microdissection, gradient separation, or intervention or other means known in the art. The term “mammal” or “mammalian” as used herein includes both humans and non-humans, including but not limited to humans, non-human primates, canines, felines, murines, bovines, equines, and porcines.
[0052] As used herein, the term “biological sample” is intended to include, but is not limited to, tissues, cells, biological fluids and isolates thereof, isolated from a subject, as well as tissues, cells, and fluids present within a subject.
[0053] As used herein, a “single cell” refers to one cell. Single cells useful in the methods described herein can be obtained from a tissue of interest, or from a biopsy, blood sample, or cell culture. Additionally, cells from specific organs, tissues, tumors, neoplasms, or the like can be obtained and used in the methods described herein. In general, cells from any population can be used in the methods, such as a population of prokaryotic or eukaryotic organisms, including bacteria or yeast.
[0054] A single cell suspension can be obtained using standard methods known in the art including, for example, enzymatically using trypsin or papain to digest proteins connecting cells in tissue samples or releasing adherent cells in culture, or mechanically separating cells in a sample. Samples can also be selected by one of skill in the art using one or more markers known to be associated with a sample of interest.
[0055] Methods for manipulating single cells are known in the art and include fluorescence activated cell sorting (FACS), micromanipulation and the use of semi-automated cell pickers (e.g., the Quixell™ cell transfer system from Stoelting Co.). Individual cells can, for example, be individually selected based on features detectable by microscopic observation, such as location, morphology, or reporter gene expression.
[0056] Once a desired sample has been identified, the sample is prepared and the cell(s) are lysed to release cellular contents including DNA and RNA, such as gDNA and mRNA, using methods known to those of skill in the art. Lysis can be achieved by, for example, heating the cells, or by the use of detergents or other chemical methods, or by a combination of these. Any suitable lysis method known in the art can be used.
[0057] Methods for preparation of samples comprising nucleic acid molecules are well known in the art. See also WO2019/191122.
[0058] “Multiple samples” means more than one sample, such as but not limited to 2 or more, 2-5, 6-10, 11-15, 16-20, 21-30, 31-40, 41-50, 51-100, more than 100, or any specific number or ranges of samples derived therefrom. The multiple samples can be derived from one source or origin or from different sources or origins.
[0059] The term “polynucleotide(s)” or “oligonucleotide(s)” refers to nucleic acids such as DNA molecules and RNA molecules and analogs thereof (e.g., DNA or RNA generated using nucleotide analogs or using nucleic acid chemistry). As desired, polynucleotides can be made synthetically, e.g., using art-recognized nucleic acid chemistry or enzymatically using, e.g., a polymerase, and, if desired, can be modified. Typical modifications include methylation, biotinylation, and other art-known modifications. In addition, a polynucleotide can be single-stranded or double-stranded and, where desired, linked to a detectable moiety. In some aspects, a polynucleotide can include hybrid molecules, e.g., comprising DNA and RNA.
[0060] “G,” “C,” “A,” “T” and “U” each generally stands for a nucleotide that contains guanine, cytosine, adenine, thymine and uracil as a base, respectively. However, it will be understood that the term “ribonucleotide” or “nucleotide” can also refer to a modified nucleotide or a surrogate replacement moiety. The skilled person is well aware that guanine, cytosine, adenine, and uracil can be replaced by other moieties without substantially altering the base pairing properties of an oligonucleotide comprising a nucleotide bearing such replacement moiety. For example, without limitation, a nucleotide comprising inosine as its base can base pair with nucleotides containing adenine, cytosine, or uracil. Hence, nucleotides containing uracil, guanine, or adenine can be replaced in nucleotide sequences by a nucleotide containing, for example, inosine. In another example, adenine and cytosine anywhere in the oligonucleotide can be replaced with guanine and uracil, respectively, to form G-U Wobble base pairing with the target mRNA. Sequences containing such replacement moieties are suitable for the compositions and methods described herein.
[0061] The term “DNA” refers to chromosomal DNA, plasmid DNA, phage DNA, or viral DNA that is single stranded or double stranded. DNA can be obtained from prokaryotes or eukaryotes.
[0062] The term “genomic DNA” or “gDNA” refers to chromosomal DNA.
[0063] The term “messenger RNA” or “mRNA” refers to an RNA that is without introns and that can be translated into a polypeptide.
[0064] The term “cDNA” refers to a DNA that is complementary or identical to an mRNA, in either single stranded or double stranded form.
[0065] As used herein, “polymerase” and its derivatives, generally refers to any enzyme that can catalyze the polymerization of nucleotides (including analogs thereof) into a nucleic acid strand. Typically, but not necessarily, such nucleotide polymerization can occur in a template-dependent fashion. Such polymerases can include without limitation naturally occurring polymerases and any subunits and truncations thereof, mutant polymerases, variant polymerases, recombinant, fusion or otherwise engineered polymerases, chemically modified polymerases, synthetic molecules or assemblies, and any analogs, derivatives or fragments thereof that retain the ability to catalyze such polymerization. Optionally, the polymerase can be a mutant polymerase comprising one or more mutations involving the replacement of one or more amino acids with other amino acids, the insertion or deletion of one or more amino acids from the polymerase, or the linkage of parts of two or more polymerases. Typically, the polymerase comprises one or more active sites at which nucleotide binding and/or catalysis of nucleotide polymerization can occur. Some exemplary polymerases include without limitation DNA polymerases and RNA polymerases. The term “polymerase” and its variants, as used herein, also refers to fusion proteins comprising at least two portions linked to each other, where the first portion comprises a peptide that can catalyze the polymerization of nucleotides into a nucleic acid strand and is linked to a second portion that comprises a second polypeptide. In some embodiments, the second polypeptide can include a reporter enzyme or a processivity-enhancing domain. Optionally, the polymerase can possess 5’ exonuclease activity or terminal transferase activity. In some embodiments, the polymerase can be optionally reactivated, for example through the use of heat, chemicals or re-addition of new amounts of polymerase into a reaction mixture. 
In some embodiments, the polymerase can include a hot-start polymerase or an aptamer-based polymerase that optionally can be reactivated.
[0066] The term “extension” and its variants, as used herein, when used in reference to a given primer, comprises any in vivo or in vitro enzymatic activity characteristic of a given polymerase that relates to polymerization of one or more nucleotides onto an end of an existing nucleic acid molecule. Typically, but not necessarily, such primer extension occurs in a template-dependent fashion; during template-dependent extension, the order and selection of bases is driven by established base pairing rules, which can include Watson-Crick type base pairing rules or alternatively (and especially in the case of extension reactions involving nucleotide analogs) by some other type of base pairing paradigm. In one non-limiting example, extension occurs via polymerization of nucleotides on the 3’ OH end of the nucleic acid molecule by the polymerase.
[0067] As used herein, the terms “ligating,” “ligation,” and their derivatives refer generally to the act or process for covalently linking two or more molecules together, for example, covalently linking two or more nucleic acid molecules to each other. In some embodiments, ligation includes joining nicks between adjacent nucleotides of nucleic acids. In some embodiments, ligation includes forming a covalent bond between an end of a first and an end of a second nucleic acid molecule. In some embodiments, for example embodiments wherein the nucleic acid molecules to be ligated include conventional nucleotide residues, the ligation can include forming a covalent bond between a 5’ phosphate group of one nucleic acid and a 3’ hydroxyl group of a second nucleic acid thereby forming a ligated nucleic acid molecule. In some embodiments, any means for joining nicks or bonding a 5’ phosphate to a 3’ hydroxyl between adjacent nucleotides can be employed. In an exemplary embodiment, an enzyme such as a ligase can be used. Generally, for the purposes of this disclosure, an amplified target sequence can be ligated to an adapter to generate an adapter-ligated amplified target sequence.
[0068] As used herein, “ligase” and its derivatives, refers generally to any agent capable of catalyzing the ligation of two substrate molecules. In some embodiments, the ligase includes an enzyme capable of catalyzing the joining of nicks between adjacent nucleotides of a nucleic acid. In some embodiments, the ligase includes an enzyme capable of catalyzing the formation of a covalent bond between a 5’ phosphate of one nucleic acid molecule and a 3’ hydroxyl of another nucleic acid molecule thereby forming a ligated nucleic acid molecule. Suitable ligases can include, but are not limited to, T4 DNA ligase, T4 RNA ligase, and E. coli DNA ligase.
[0069] As used herein, “ligation conditions” and its derivatives, generally refers to conditions suitable for ligating two molecules to each other. In some embodiments, the ligation conditions are suitable for sealing nicks or gaps between nucleic acids. As defined herein, a “nick” or “gap” refers to a nucleic acid molecule that lacks a directly bound 5’ phosphate of a mononucleotide pentose ring to a 3’ hydroxyl of a neighboring mononucleotide pentose ring within internal nucleotides of a nucleic acid sequence. As used herein, the term nick or gap is consistent with the use of the term in the art. Typically, a nick or gap can be ligated in the presence of an enzyme, such as ligase at an appropriate temperature and pH. In some embodiments, T4 DNA ligase can join a nick between nucleic acids at a temperature of about 70°C-72°C.
[0070] As used herein, “blunt-end ligation” and its derivatives, refers generally to ligation of two blunt-end double-stranded nucleic acid molecules to each other. A “blunt end” refers to an end of a double-stranded nucleic acid molecule wherein substantially all of the nucleotides in the end of one strand of the nucleic acid molecule are base paired with opposing nucleotides in the other strand of the same nucleic acid molecule. A nucleic acid molecule is not blunt ended if it has an end that includes a single-stranded portion greater than two nucleotides in length, referred to herein as an “overhang.” In some embodiments, the end of the nucleic acid molecule does not include any single-stranded portion, such that every nucleotide in one strand of the end is base paired with opposing nucleotides in the other strand of the same nucleic acid molecule. In some embodiments, the ends of the two blunt-ended nucleic acid molecules that become ligated to each other do not include any overlapping, shared or complementary sequence. Typically, blunt-end ligation excludes the use of additional oligonucleotide adapters to assist in the ligation of the double-stranded amplified target sequence to the double-stranded adapter, such as patch oligonucleotides as described in Mitra and Varley, US2010/0129874. In some embodiments, blunt-end ligation includes a nick translation reaction to seal a nick created during the ligation process.
[0071] The term “amplicon” refers to the amplified product of a nucleic acid amplification reaction, e.g., RT-PCR.
[0072] The terms “reverse-transcriptase PCR” and “RT-PCR” refer to a type of PCR where the starting material is mRNA. The starting mRNA is enzymatically converted to complementary DNA or “cDNA” using a reverse transcriptase enzyme. The cDNA is then used as a template for a PCR reaction.
[0073] The terms “PCR product,” “PCR fragment,” and “amplification product” refer to the resultant mixture of compounds after two or more cycles of the PCR steps of denaturation, annealing and extension are complete. These terms encompass the case where there has been amplification of one or more segments of one or more target sequences.
[0074] The term “amplification reagents” refers to those reagents (deoxyribonucleotide triphosphates, buffer, etc.), needed for amplification except for primers, nucleic acid template, and the amplification enzyme. Typically, amplification reagents along with other reaction components are placed and contained in a reaction vessel (test tube, microwell, etc.). Amplification methods include PCR methods known to those of skill in the art and also include rolling circle amplification (Blanco et al., J. Biol. Chem., 264, 8935-8940, 1989), hyperbranched rolling circle amplification (Lizardi et al., Nat. Genetics, 19, 225-232, 1998), and loop-mediated isothermal amplification (Notomi et al., Nucl. Acids Res., 28, e63, 2000), each of which is hereby incorporated by reference in its entirety.
[0075] The term “hybridize” refers to a sequence-specific non-covalent binding interaction with a complementary nucleic acid. Hybridization can occur to all or a portion of a nucleic acid sequence. Those skilled in the art will recognize that the stability of a nucleic acid duplex, or hybrid, can be determined by the melting temperature (Tm). Additional guidance regarding hybridization conditions can be found in: Current Protocols in Molecular Biology, John Wiley & Sons, N.Y., 1989, 6.3.1-6.3.6 and in: Sambrook et al., Molecular Cloning, a Laboratory Manual, Cold Spring Harbor Laboratory Press, Vol. 3, 1989.
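By way of non-limiting illustration, the Tm of a short DNA oligonucleotide can be roughly estimated from its base composition. The sketch below assumes the commonly used Wallace rule (2 °C per A or T, 4 °C per G or C), which is not recited in this disclosure and is only a coarse approximation for short oligonucleotides; the function name `wallace_tm` is hypothetical.

```python
def wallace_tm(seq: str) -> int:
    """Rough Tm estimate (degrees C) for a short DNA oligo.

    Assumption: Wallace rule, Tm = 2*(A+T) + 4*(G+C). This is an
    approximation and ignores salt, concentration, and mismatches.
    """
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    if at + gc != len(s):
        raise ValueError("sequence must contain only A, C, G, T")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    # Hypothetical 8-mer with four A/T and four G/C bases.
    print(wallace_tm("ATGCATGC"))
```

Nearest-neighbor thermodynamic models are generally more accurate than this rule of thumb and would be preferred for actual primer or probe design.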
[0076] As used herein, “incorporating” a sequence into a polynucleotide refers to covalently linking a series of nucleotides with the rest of the polynucleotide, for example at the 3’ or 5’ end of the polynucleotide, by phosphodiester bonds, wherein the nucleotides are linked in the order prescribed by the sequence. A sequence has been “incorporated” into a polynucleotide, or equivalently the polynucleotide “incorporates” the sequence, if the polynucleotide contains the sequence or a complement thereof. Incorporation of a sequence into a polynucleotide can occur enzymatically (e.g., by ligation or polymerization) or using chemical synthesis (e.g., by phosphoramidite chemistry).
[0077] As used herein, the terms “amplify” and “amplification” refer to enzymatically copying the sequence of a polynucleotide, in whole or in part, so as to generate more polynucleotides that also contain the sequence or a complement thereof. The sequence being copied is referred to as the template sequence. Examples of amplification include DNA-templated RNA synthesis by RNA polymerase, RNA-templated first-strand cDNA synthesis by reverse transcriptase, and DNA-templated PCR amplification using a thermostable DNA polymerase. Amplification includes all primer-extension reactions. Amplification includes methods such as PCR, ligation amplification (or ligase chain reaction, LCR) and amplification methods. These methods are known and widely practiced in the art. See, e.g., U.S. Pat. Nos. 4,683,195 and 4,683,202 and Innis et al., “PCR protocols: a guide to method and applications” Academic Press, Incorporated (1990) (for PCR); and Wu et al. (1989) Genomics 4:560-569 (for LCR). In general, the PCR procedure describes a method of gene amplification which is comprised of (i) sequence-specific hybridization of primers to specific genes within a DNA sample (or library), (ii) subsequent amplification involving multiple rounds of annealing, elongation, and denaturation using a DNA polymerase, and (iii) screening the PCR products for a band of the correct size. The primers used are oligonucleotides of sufficient length and appropriate sequence to provide initiation of polymerization, i.e., each primer is specifically designed to be complementary to a strand of the genomic locus to be amplified.
[0078] Reagents and hardware for conducting amplification reaction are commercially available. Primers useful to amplify sequences from a particular gene region are preferably complementary to and hybridize specifically to sequences in the target region or in its flanking regions and can be prepared using the polynucleotide sequences provided herein. Nucleic acid sequences generated by amplification can be sequenced directly.
[0079] The term “associated” is used herein to refer to the relationship between a sample and the DNA molecules, RNA molecules, or other polynucleotides originating from or derived from that sample. A polynucleotide is associated with a sample if it is an endogenous polynucleotide, i.e., it occurs in the sample at the time the sample is selected or is derived from an endogenous polynucleotide. For example, the mRNAs endogenous to a cell are associated with that cell. cDNAs resulting from reverse transcription of these mRNAs, and DNA amplicons resulting from PCR amplification of the cDNAs, contain the sequences of the mRNAs and are also associated with the cell. The polynucleotides associated with a sample need not be located or synthesized in the sample and are considered associated with the sample even after the sample has been destroyed (for example, after a cell has been lysed). Molecular barcoding or other techniques can be used to determine which polynucleotides in a mixture are associated with a particular sample.
[0080] When hybridization occurs in an antiparallel configuration between two single-stranded polynucleotides, the reaction is called “annealing” or “binding” and those polynucleotides are described as “complementary”. As used herein, and unless otherwise indicated, the term “complementary,” when used to describe a first nucleotide sequence in relation to a second nucleotide sequence, refers to the ability of a polynucleotide comprising the first nucleotide sequence to hybridize and form a duplex structure under certain conditions with a polynucleotide comprising the second nucleotide sequence, as will be understood by the skilled person. Such conditions can, for example, be stringent conditions, where stringent conditions can include: 400 mM NaCl, 40 mM PIPES pH 6.4, 1 mM EDTA, 50°C or 70°C for 12-16 hours followed by washing. Other conditions, such as physiologically relevant conditions as can be encountered inside an organism, can apply. The skilled person will be able to determine the set of conditions most appropriate for a test of complementarity of two sequences in accordance with the ultimate application of the hybridized nucleotides.
[0081] Complementary sequences include base-pairing of a region of a polynucleotide comprising a first nucleotide sequence to a region of a polynucleotide comprising a second nucleotide sequence over the length or a portion of the length of one or both nucleotide sequences. Such sequences can be referred to as “complementary” with respect to each other herein. However, where a first sequence is referred to as “substantially complementary” with respect to a second sequence herein, the two sequences can be complementary, or they can include one or more, but generally not more than about 5, 4, 3, or 2 mismatched base pairs within regions that are base-paired. For two sequences with mismatched base pairs, the sequences will be considered “substantially complementary” as long as the two nucleotide sequences bind to each other via base-pairing.
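By way of non-limiting illustration, the mismatch count between two equal-length DNA sequences paired in antiparallel orientation can be computed as sketched below. The function names and the default threshold of 5 mismatches are assumptions chosen for this example (the threshold mirrors the “not more than about 5” language above); actual binding depends on hybridization conditions, which this sketch does not model.

```python
# Watson-Crick complement table for DNA (illustrative; no analogs handled).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_mismatches(first: str, second: str) -> int:
    """Count mismatched base pairs when `first` pairs antiparallel to `second`.

    Both sequences are written 5' to 3' and must have equal length.
    """
    if len(first) != len(second):
        raise ValueError("sequences must have equal length")
    perfect_partner = reverse_complement(second)
    return sum(a != b for a, b in zip(first.upper(), perfect_partner))


def is_substantially_complementary(first: str, second: str,
                                   max_mismatches: int = 5) -> bool:
    """Hypothetical check: at most `max_mismatches` mismatched pairs."""
    return count_mismatches(first, second) <= max_mismatches


if __name__ == "__main__":
    print(count_mismatches("ACGTTG", "CAACGT"))  # fully complementary pair
    print(count_mismatches("ACGTTG", "CAACGA"))  # one mismatched pair
```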
[0082] Conventional notation is used herein to describe nucleotide sequences: the left-hand end of a single-stranded nucleotide sequence is the 5’-end; the left-hand direction of a double-stranded nucleotide sequence is referred to as the 5’-direction. The direction of 5’ to 3’ addition of nucleotides to nascent RNA transcripts is referred to as the transcription direction. The DNA strand having the same sequence as an mRNA is referred to as the “coding strand”; sequences on the DNA strand having the same sequence as an mRNA transcribed from that DNA and which are located 5’ to the 5’-end of the RNA transcript are referred to as “upstream sequences”; sequences on the DNA strand having the same sequence as the RNA and which are 3’ to the 3’-end of the coding RNA transcript are referred to as “downstream sequences.”
[0083] In some embodiments, the double stranded DNA fragments can be end polished so that they are amenable for ligation. For example, the ends of the DNA fragments can be polished to have blunt ends. As known in the art, this can be achieved with enzymes that can either fill in or remove the protruding strand.
[0084] In some embodiments, the sample tag oligonucleotide or sample tag sequence can comprise a unique sequence specific for each sample. The sample tag oligonucleotide can comprise a unique sequence specific for each sample and a spacer. The sample tag oligonucleotide can comprise a unique sequence specific for each sample, one or more dU bases, and a spacer, e.g., in the 5’ to 3’ direction.
[0085] In some embodiments, the PCR primer comprises a sample tag sequence comprising a unique sequence specific for each sample, one or more dU bases, a spacer, and a universal sequence. The PCR primer can comprise a sample tag oligonucleotide comprising a unique sequence specific for each sample and a universal sequence, e.g., in the 5’ to 3’ direction. In some embodiments, the number of dU bases can vary, e.g., but not limited to, at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 15, at least 20, at least 25, at least 30, or any number or ranges of dU bases derived therefrom.
[0086] The spacer can be a piece of sequence that, when serving as template, cannot be copied by a polymerase under normal PCR conditions. For example, a spacer can be, but is not limited to, one or more RNA bases or a linker, such as a carbon backbone without any nucleobase. The polymerase stops incorporating nucleotides at the spacer site in the template, thus leaving a single-stranded overhang at the 5’ end beyond the spacer site. In some embodiments, the spacer can be a covalent linker between 2 oligonucleotides and can be, e.g., but not limited to, at least 1 base long, at least 2 bases long, at least 3 bases long, at least 4 bases long, at least 5 bases long, at least 6 bases long, at least 7 bases long, at least 8 bases long, at least 9 bases long, at least 10 bases long, or any number or ranges of bases derived therefrom.
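By way of non-limiting illustration, the 5’ to 3’ arrangement of a tagged PCR primer described above (unique sequence, one or more dU bases, spacer, universal sequence) can be represented as an ordered list of segments. The function names, the `[spacer]` placeholder, and the example sequences are hypothetical and are used only to show the ordering; they do not describe any particular primer of this disclosure.

```python
from typing import List, Tuple

Segment = Tuple[str, str]  # (segment name, text representation)


def build_tagged_primer(unique_seq: str, n_du: int, universal_seq: str,
                        spacer: str = "[spacer]") -> List[Segment]:
    """Return primer segments in 5' to 3' order.

    Assumed order: unique sample sequence, dU bases, spacer, universal
    sequence. The spacer is a placeholder string, since it may be a
    non-nucleotide linker with no base to write down.
    """
    if n_du < 1:
        raise ValueError("at least one dU base is expected")
    return [
        ("unique", unique_seq.upper()),
        ("dU", "U" * n_du),
        ("spacer", spacer),
        ("universal", universal_seq.upper()),
    ]


def render(segments: List[Segment]) -> str:
    """Render the segments as a single 5' to 3' string."""
    return "5'-" + "".join(text for _, text in segments) + "-3'"


if __name__ == "__main__":
    primer = build_tagged_primer("acgtac", n_du=2, universal_seq="ggcc")
    print(render(primer))
```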
[0087] The sample tag oligonucleotide can comprise 1-100 base differences in its unique sequence compared to a sample specific oligonucleotide for another sample. In some embodiments, the sample tag oligonucleotide can comprise 1-100, 1-90, 1-80, 1-70, 1-60, 1-50, 1-40, 1-30, 1-20, 1-10, 1-9, 1-8, 1-7, 1-6, 1-5, 1-4, 1-3, 1-2, or a specific number or ranges of base differences derived therefrom in its unique sequence compared to a sample specific oligonucleotide for another sample.
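By way of non-limiting illustration, the number of base differences between the unique sequences of two sample tags of equal length can be counted as a Hamming distance, and the smallest such distance across a set of tags indicates how easily two samples could be confused. The function names and the example tags below are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def base_differences(tag_a: str, tag_b: str) -> int:
    """Count positions at which two equal-length tags differ."""
    if len(tag_a) != len(tag_b):
        raise ValueError("tags must have equal length")
    return sum(a != b for a, b in zip(tag_a.upper(), tag_b.upper()))


def min_pairwise_differences(tags: Sequence[str]) -> int:
    """Smallest number of base differences between any two tags."""
    if len(tags) < 2:
        raise ValueError("need at least two tags")
    return min(base_differences(a, b) for a, b in combinations(tags, 2))


if __name__ == "__main__":
    example_tags = ["ACGTAC", "ACGTTG", "TTGTAC"]  # hypothetical tags
    print(min_pairwise_differences(example_tags))
```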
[0088] The universal sequence is the region of the PCR primer that is complementary to nucleotide sequences that are very common in a particular set of DNA molecules and cloning vectors. Thus, universal sequences are able to bind to a wide variety of DNA templates.
[0089] As used herein, “amplified target sequences” and its derivatives, refers generally to a nucleic acid sequence produced by the amplification of/amplifying the target sequences using target-specific primers and the methods provided herein. The amplified target sequences can be either of the same sense (the positive strand produced in the second round and subsequent even-numbered rounds of amplification) or antisense (i.e., the negative strand produced during the first and subsequent odd-numbered rounds of amplification) with respect to the target sequences. For the purposes of this disclosure, the amplified target sequences are typically less than 50% complementary to any portion of another amplified target sequence in the reaction.
[0090] The term “polymerase chain reaction” (“PCR”) of Mullis (U.S. Pat. Nos. 4,683,195, 4,683,202, and 4,965,188) refers to a method for increasing the concentration of a segment of a target sequence in a mixture of nucleic acid sequences without cloning or purification. This process for amplifying the target sequence consists of introducing a large excess of two oligonucleotide primers to the nucleic acid sequence mixture containing the desired target sequence, followed by a precise sequence of thermal cycling in the presence of a polymerase (e.g., DNA polymerase). The two primers are complementary to their respective strands of the double stranded target sequence. To effect amplification, the mixture is denatured and the primers then annealed to their complementary sequences within the target molecule. Following annealing, the primers are extended with a polymerase so as to form a new pair of complementary strands. The steps of denaturation, primer annealing, and polymerase extension can be repeated many times (i.e., denaturation, annealing and extension constitute one “cycle;” there can be numerous “cycles”) to obtain a high concentration of an amplified segment of the desired target sequence. The length of the amplified segment of the desired target sequence is determined by the relative positions of the primers with respect to each other, and therefore, this length is a controllable parameter. By virtue of the repeating aspect of the process, the method is referred to as the “polymerase chain reaction” (hereinafter “PCR”). Because the desired amplified segments of the target sequence become the predominant sequences (in terms of concentration) in the mixture, they are said to be “PCR amplified.”
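By way of non-limiting illustration, the increase in concentration over repeated cycles can be approximated with the idealized model N = N0 × (1 + E)^n, where N0 is the starting copy number, n is the number of cycles, and E is the per-cycle efficiency (1.0 for perfect doubling). This model and the function name are assumptions for this example and are not recited in the disclosure; real reactions plateau as reagents are consumed.

```python
def expected_copies(initial_copies: float, cycles: int,
                    efficiency: float = 1.0) -> float:
    """Idealized copy number after `cycles` PCR cycles.

    Assumption: constant per-cycle efficiency in [0, 1]; no plateau.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # Ten starting copies, three cycles, perfect doubling.
    print(expected_copies(10, 3))
```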
[0091] As used herein, the term “primer” includes a PCR primer or an oligonucleotide, either natural or synthetic, that is capable, upon forming a duplex with a polynucleotide template, of acting as a point of initiation of nucleic acid synthesis and being extended from its 3’ end along the template so that an extended duplex is formed. The sequence of nucleotides added during the extension process is determined by the sequence of the template polynucleotide. Usually primers are extended by a DNA polymerase. Primers usually have a length in the range of 3 to 36 nucleotides, also 5 to 24 nucleotides, also from 14 to 36 nucleotides. Primers within the scope of the disclosure herein include orthogonal primers, amplification primers, construction primers and the like. Pairs of primers can flank a sequence of interest or a set of sequences of interest. Primers and probes can be degenerate in sequence. Primers within the scope of the disclosure herein bind adjacent to a target sequence. A “primer” can be considered a short polynucleotide, generally with a free 3’-OH group, that binds to a target or template potentially present in a sample of interest by hybridizing with the target, and thereafter promoting polymerization of a polynucleotide complementary to the target. Primers of the disclosure herein are comprised of nucleotides ranging from 17 to 30 nucleotides. 
In some embodiments, the primer is at least 17 nucleotides, or alternatively, at least 18 nucleotides, or alternatively, at least 19 nucleotides, or alternatively, at least 20 nucleotides, or alternatively, at least 21 nucleotides, or alternatively, at least 22 nucleotides, or alternatively, at least 23 nucleotides, or alternatively, at least 24 nucleotides, or alternatively, at least 25 nucleotides, or alternatively, at least 26 nucleotides, or alternatively, at least 27 nucleotides, or alternatively, at least 28 nucleotides, or alternatively, at least 29 nucleotides, or alternatively, at least 30 nucleotides, or alternatively at least 50 nucleotides, or alternatively at least 75 nucleotides or alternatively at least 100 nucleotides.
[0092] As used herein, “target-specific primer” and its derivatives, refers generally to a single stranded or double-stranded polynucleotide, typically an oligonucleotide, that includes at least one sequence that is at least 50% complementary, typically at least 75% complementary or at least 85% complementary, more typically at least 90% complementary, more typically at least 95% complementary, more typically at least 98% or at least 99% complementary, or 100% identical, to at least a portion of a nucleic acid molecule that includes a target sequence of interest, i.e., a target nucleic acid molecule. In such instances, the target-specific primer and target sequence are described as “corresponding” to each other. In some embodiments, the target-specific primer is capable of hybridizing to at least a portion of its corresponding target sequence (or to a complement of the target sequence); such hybridization can optionally be performed under standard hybridization conditions or under stringent hybridization conditions. In some embodiments, the target-specific primer is not capable of hybridizing to the target sequence, or to its complement, but is capable of hybridizing to a portion of a nucleic acid strand including the target sequence, or to its complement. 
In some embodiments, the target-specific primer includes at least one sequence that is at least 75% complementary, typically at least 85% complementary, more typically at least 90% complementary, more typically at least 95% complementary, more typically at least 98% complementary, or more typically at least 99% complementary, to at least a portion of the target sequence itself; in other embodiments, the target-specific primer includes at least one sequence that is at least 75% complementary, typically at least 85% complementary, more typically at least 90% complementary, more typically at least 95% complementary, more typically at least 98% complementary, or more typically at least 99% complementary, to at least a portion of the nucleic acid molecule other than the target sequence. In some embodiments, the target-specific primer is substantially non-complementary to other target sequences present in the sample; optionally, the target-specific primer is substantially non-complementary to other nucleic acid molecules present in the sample. In some embodiments, nucleic acid molecules present in the sample that do not include or correspond to a target sequence (or to a complement of the target sequence) are referred to as “non-specific” sequences or “non-specific nucleic acids”. In some embodiments, the target-specific primer is designed to include a nucleotide sequence that is substantially complementary to at least a portion of its corresponding target sequence. In some embodiments, a target-specific primer is at least 95% complementary, or at least 99% complementary, or 100% identical, across its entire length to at least a portion of a nucleic acid molecule that includes its corresponding target sequence. In some embodiments, a target-specific primer can be at least 90%, at least 95% complementary, at least 98% complementary or at least 99% complementary, or 100% identical, across its entire length to at least a portion of its corresponding target sequence. 
In some embodiments, a forward target-specific primer and a reverse target-specific primer define a target-specific primer pair that can be used to amplify the target sequence via template-dependent primer extension. Typically, each primer of a target-specific primer pair includes at least one sequence that is substantially complementary to at least a portion of a nucleic acid molecule including a corresponding target sequence but that is less than 50% complementary to at least one other target sequence in the sample. In some embodiments, amplification can be performed using multiple target-specific primer pairs in a single amplification reaction, wherein each primer pair includes a forward target-specific primer and a reverse target-specific primer, each including at least one sequence that is substantially complementary or substantially identical to a corresponding target sequence in the sample, and each primer pair having a different corresponding target sequence. In some embodiments, the target-specific primer can be substantially non-complementary at its 3’ end or its 5’ end to any other target-specific primer present in an amplification reaction. In some embodiments, the target-specific primer can include minimal cross-hybridization to other target-specific primers in the amplification reaction. In some embodiments, target-specific primers include minimal cross-hybridization to non-specific sequences in the amplification reaction mixture. In some embodiments, the target-specific primers include minimal self-complementarity. In some embodiments, the target-specific primers can include one or more cleavable groups located at the 3’ end. In some embodiments, the target-specific primers can include one or more cleavable groups located near or about a central nucleotide of the target-specific primer. In some embodiments, one or more target-specific primers include only non-cleavable nucleotides at the 5’ end of the target-specific primer. 
In some embodiments, a target-specific primer includes minimal nucleotide sequence overlap at the 3’ end or the 5’ end of the primer as compared to one or more different target-specific primers, optionally in the same amplification reaction. In some embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more target-specific primers in a single reaction mixture include one or more of the above embodiments. In some embodiments, substantially all of the plurality of target-specific primers in a single reaction mixture include one or more of the above embodiments.
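By way of non-limiting illustration, the percent complementarity of a primer to a candidate site of equal length can be computed by pairing the two sequences in antiparallel orientation and counting Watson-Crick pairs. The function names, the example sequences, and the 90% default threshold are assumptions for this sketch; the disclosure recites several alternative thresholds (e.g., 50%, 75%, 85%, 90%, 95%, 98%, 99%).

```python
# Watson-Crick partners for DNA bases (illustrative; analogs not handled).
_PAIR = {"A": "T", "C": "G", "G": "C", "T": "A"}


def percent_complementarity(primer: str, site: str) -> float:
    """Percent of primer positions that pair with `site`.

    Both sequences are written 5' to 3' and must have equal, non-zero
    length; the site is read in reverse to model antiparallel pairing.
    """
    if not primer or len(primer) != len(site):
        raise ValueError("primer and site must have equal, non-zero length")
    paired = sum(
        _PAIR.get(p) == s
        for p, s in zip(primer.upper(), reversed(site.upper()))
    )
    return 100.0 * paired / len(primer)


def meets_threshold(primer: str, site: str, threshold: float = 90.0) -> bool:
    """Hypothetical check against a chosen percent threshold."""
    return percent_complementarity(primer, site) >= threshold


if __name__ == "__main__":
    print(percent_complementarity("ACGTTG", "CAACGT"))
    print(meets_threshold("ACGTTG", "CAACGA", threshold=75.0))
```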
[0093] Primer design is based on single primer extension, in which each genomic target is enriched by one target-specific primer and one universal primer, a strategy that removes the conventional restriction of designing two target-specific primers per target and reduces the number of primers required. All primers required for a panel are pooled into an individual primer pool to reduce panel handling and the number of pools required for enrichment and library construction.
[0094] The booster panel is a pool of up to 100 primers that can be used to boost the performance of certain primers in any panel (cataloged, extended, or custom), or to extend the contents of an existing custom panel. The primers are delivered as a single pool that can be spiked into the existing panel.
[0095] After removing unused adapters, a limited number of PCR cycles can be conducted using an adapter primer and a pool of single primers, each carrying a gene specific sequence and a 5’ universal sequence. During this process, each single primer repeatedly samples the same target locus from different DNA templates. Afterwards, additional PCR cycles can be conducted using universal primers to attach complete adapter sequences and to amplify the library to the desired quantity.
[0096] A real-time polymerase chain reaction (Real-Time PCR), also known as quantitative polymerase chain reaction (qPCR), is a laboratory technique of molecular biology based on the polymerase chain reaction (PCR). It monitors the amplification of a targeted DNA molecule during the PCR, i.e., in real time, and not at its end, as in conventional PCR. Real-time PCR can be used quantitatively (quantitative real-time PCR), and semi-quantitatively, i.e., above/below a certain amount of DNA molecules (semi-quantitative real-time PCR). Other types of PCRs include but are not limited to nested PCR (used to analyze DNA sequences coming from different organisms of the same species but that can differ for a single nucleotide (SNPs) and to ensure amplification of the sequence of interest in each of the organisms analyzed) and Inverse-PCR (usually used to clone a region flanking an insert or a transposable element).
[0097] Two common methods for the detection of PCR products in real-time PCR are: (1) non-specific fluorescent dyes that intercalate with any double-stranded DNA, and (2) sequence-specific DNA probes consisting of oligonucleotides that are labeled with a fluorescent reporter which permits detection only after hybridization of the probe with its complementary sequence.
[0098] Methods and kits for performing PCR are well known in the art. PCR is a reaction in which replicate copies are made of a target polynucleotide using a pair of primers or a set of primers consisting of an upstream and a downstream primer, and a catalyst of polymerization, such as a DNA polymerase, and typically a thermally-stable polymerase enzyme. Methods for PCR are taught, for example, in McPherson et al. (1991) PCR 1: A Practical Approach (IRL Press at Oxford University Press).
[0099] In the methods disclosed herein, the corresponding capture probe can bind to the corresponding sample tag sequence of the ligation or PCR product, i.e., the unique sequence specific for a sample.
[0100] The capture probes can be pooled at a desired concentration prior to the adding in (c) to the pooled samples.
[0101] By “capture probe” is meant a moiety which can be used to bind or attach to a strand of nucleic acid. A capture probe can comprise a nucleotide sequence that can bind to a corresponding or complementary nucleotide sequence in a sample tag oligonucleotide, or a ligation or PCR product containing the unique sequence specific for a sample. The capture probe can also include a polyT tail. In some embodiments, the capture probe comprises a nucleotide sequence that is at least 80%, at least 85%, at least 90%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identical to a nucleotide sequence that is complementary to a region of the sample tag oligonucleotide or sequence that comprises a unique sequence for a sample.
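The relationship between a sample tag and its capture probe described above can be illustrated with a minimal Python sketch. The tag sequence shown is hypothetical and is not one of the SEQ ID NOS of this disclosure; the position-by-position identity calculation assumes equal-length, ungapped sequences, which is a simplification rather than the method used herein.

```python
# Minimal sketch: derive a perfectly complementary capture probe for a
# (hypothetical) sample tag and score a candidate probe by percent identity.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions between two equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


# Hypothetical 16-nt sample tag, for illustration only.
sample_tag = "ACGTTGCAAGCTTGCA"
ideal_probe = reverse_complement(sample_tag)

# A candidate probe is compared against the ideal (fully complementary) probe.
candidate_probe = ideal_probe[:-1] + ("A" if ideal_probe[-1] != "A" else "C")
identity = percent_identity(candidate_probe, ideal_probe)
print(f"ideal probe: {ideal_probe}, candidate identity: {identity:.2f}%")
```

A candidate probe whose identity to the ideal probe falls below the chosen threshold (for example, 80%) would fall outside the corresponding embodiment.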
[0102] The capture probe can be attached or linked to an affinity tag. In some embodiments, the capture probe is an affinity tag that is attached to a nucleic acid. An “affinity tag” is a moiety that can bind to a solid surface or support. An affinity tag can be, but is not limited to: antigens (such as proteins (including peptides)) and antibodies (including fragments thereof (Fabs, etc.)); proteins and small molecules, including biotin/streptavidin; enzymes and substrates or inhibitors; other protein-protein interacting pairs; receptor-ligands; carbohydrates and their binding partners; nucleic acid/nucleic acid binding protein pairs; or oligonucleotides and their reverse complement oligonucleotide sequence pairs. In some embodiments, the capture probe can be linked to, without limitation, an affinity tag for binding to a solid surface, a His tag, a TAP tag, or an antibody tag. The affinity tag can be, but is not limited to, biotin, and the solid surface can be a streptavidin-coated magnetic bead.
[0103] In some embodiments, the affinity tag comprises biotin or imino-biotin, which can bind to streptavidin. Imino-biotin dissociates from streptavidin in pH 4.0 buffer, while biotin requires a denaturant (e.g., 6 M guanidinium HCl, pH 1.5, or 90% formamide at 95°C).
[0104] By "specifically bind" is meant that the capture probe binds with specificity to a sample tag oligonucleotide or a moiety on a solid support to differentiate between the pair and other components or contaminants of the system. The binding should be sufficient to remain bound under the conditions of the assay, including washes to remove non-specific binding. In some embodiments, the dissociation constants of the pair will be less than about 10⁻⁴ to 10⁻⁶ M⁻¹, less than about 10⁻⁵ to 10⁻⁹ M⁻¹, or less than about 10⁻⁷ to 10⁻⁹ M⁻¹.
[0105] Thus, the target nucleic acid comprising the sample tag sequence can be immobilized on a solid surface or support rather than non-sample tag oligonucleotides. In other embodiments, the sample tag oligonucleotide is immobilized by binding to a capture probe attached to a solid support.
[0106] The sample tag sequence region or primer containing the sample tag sequence binds to a corresponding capture probe, which can be attached to a solid surface or support. By "substrate" or "solid support" or other grammatical equivalents herein is meant any material that is appropriate for or can be modified to be appropriate for the attachment of the target sequences. As will be appreciated by those in the art, the number of possible substrates is very large. Possible substrates include, but are not limited to, glass and modified or functionalized glass, plastics (including acrylics, polystyrene and copolymers of styrene and other materials, polypropylene, polyethylene, polybutylene, polyurethanes, Teflon, etc.), polysaccharides, nylon or nitrocellulose, ceramics, resins, silica or silica-based materials including silicon and modified silicon, carbon, metals, inorganic glasses, plastics, optical fiber bundles, and a variety of other polymers. For example, the solid surface can be magnetic beads or high throughput microtiter plates.
[0107] The sample tag oligonucleotide can bind or attach to the solid surface or support in a number of ways, such as by its corresponding capture probes.
[0108] The non-hybridized nucleic acids can be removed by washing. For example, the hybridization complexes are immobilized on a solid support and washed under conditions sufficient to remove non-hybridized nucleic acids, i.e., non-hybridized probes and sample nucleic acids. In a particularly preferred embodiment immobilized complexes are washed under conditions sufficient to remove imperfectly hybridized complexes. That is, hybridization complexes that contain mismatches are also removed in the wash steps.
[0109] A variety of hybridization or washing conditions can be used, including high, moderate and low stringency conditions; see, for example, Maniatis et al., Molecular Cloning: A Laboratory Manual, 2d Edition, 1989, and Short Protocols in Molecular Biology, ed. Ausubel, et al., hereby incorporated by reference. Stringent conditions are sequence-dependent and will be different in different circumstances. Longer sequences hybridize specifically at higher temperatures. An extensive guide to the hybridization of nucleic acids is found in Tijssen, Techniques in Biochemistry and Molecular Biology: Hybridization with Nucleic Acid Probes, "Overview of principles of hybridization and the strategy of nucleic acid assays" (1993). Generally, stringent conditions are selected to be about 5-10°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength, pH and nucleic acid concentration) at which 50% of the probes complementary to the target hybridize to the target sequence at equilibrium (as the target sequences are present in excess, at Tm, 50% of the probes are occupied at equilibrium). Stringent conditions will be those in which the salt concentration is less than about 1.0 M sodium ion, typically about 0.01 to 1.0 M sodium ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30°C for short probes (e.g., 10 to 50 nucleotides) and at least about 60°C for long probes (e.g., greater than 50 nucleotides). Stringent conditions may also be achieved with the addition of helix destabilizing agents such as formamide. The hybridization or washing conditions may also vary when a non-ionic backbone, i.e., PNA, is used, as is known in the art. In addition, cross-linking agents may be added after target binding to cross-link, i.e., covalently attach, the two strands of the hybridization complex.
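As a rough illustration of selecting a stringent temperature about 5-10°C below the Tm, the following Python sketch estimates Tm with the Wallace rule (2°C per A/T and 4°C per G/C). The Wallace rule is not part of this disclosure; it is a coarse approximation assumed here for short oligonucleotides only, and it ignores the ionic strength, pH and concentration dependence discussed above. The function names and the example sequence are hypothetical.

```python
# Minimal sketch, assuming the Wallace rule as a stand-in Tm estimate.


def wallace_tm(seq: str) -> float:
    """Approximate Tm in degrees C: 2*(A+T) + 4*(G+C), short oligos only."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    if not s or at + gc != len(s):
        raise ValueError("sequence must be non-empty and contain only A/C/G/T")
    return 2.0 * at + 4.0 * gc


def stringent_window(tm: float) -> tuple[float, float]:
    """Temperature window about 5-10 degrees C below the Tm (low, high)."""
    return (tm - 10.0, tm - 5.0)


# Hypothetical 12-nt probe region, for illustration only.
tm = wallace_tm("ACGTACGTACGT")
low, high = stringent_window(tm)
print(f"estimated Tm {tm:.0f} C, stringent window {low:.0f}-{high:.0f} C")
```

In practice a nearest-neighbor thermodynamic model with explicit salt correction would give a more realistic estimate; the sketch only shows how the 5-10°C offset is applied.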
[0110] Where beads are used, it is not intended that the disclosure herein be limited to the particular type. A variety of bead types are commercially available, including but not limited to, beads selected from agarose beads, streptavidin-coated beads, NeutrAvidin-coated beads, antibody-coated beads, paramagnetic beads, magnetic beads, electrostatic beads, electrically conducting beads, fluorescently labeled beads, colloidal beads, glass beads, semiconductor beads, and polymeric beads.
[0111] In some embodiments, it is desirable to release or cleave the nucleic acid molecules attached to the capture probes.
[0112] In the methods disclosed herein, the releasing in (e) can comprise releasing the nucleic acid molecules from the capture probes by, but not limited to, heat or alkaline denaturation. The releasing in (e) can comprise releasing the nucleic acid molecules from the capture probes by uracil-DNA glycosylase digestion of the dU base in the sample specific oligonucleotide or sequence.
[0113] In some embodiments, the cleavage enzyme composition comprises an N-glycosylase and an AP endonuclease. In some embodiments, the cleavage enzyme is an N-glycosylase selected from among uracil-N-glycosylase and an AP endonuclease and FPG protein and the AP endonuclease is selected from among E. coli endonuclease III or endonuclease IV.
[0114] The cleavage enzyme can also be a restriction enzyme. For example, the sample tag oligonucleotide bound to the capture probe can exhibit a sequence of a restriction site, which can then be incubated with a restriction endonuclease that recognizes the restriction site, wherein the restriction endonuclease cleaves the nucleic acid molecule from the capture probe.
[0115] The methods disclosed herein result in normalization of, or a normalized quantity of, target nucleic acid molecules between samples, such that the quantity or number of the target nucleic acid molecules in a sample is similar to the quantity or number of the same or different target nucleic acid molecules in the other sample(s).
[0116] In some embodiments, the sample pool can be normalized by hybridization of the capture probe to the sample tag oligonucleotide or sequence for 10 min to 2 hr, 15 min to 1.5 hr, 30 min to 1 hr, or any length of time or ranges of time derived therefrom as needed for the normalization.
[0117] In some embodiments, the normalization of the target nucleic acid molecules in the pooled samples is within 5-fold to 0.5-fold, within 4-fold to 1-fold, within 3-fold to 2-fold, within 1.5-fold to 0.5-fold, or within folds or ranges of folds derived therefrom in the pooled samples. In some embodiments, the normalization of the target nucleic acid molecules in the pooled samples is within 1% to 10%, 2% to 7%, 3% to 5%, 4% to 6%, or any % or ranges of % derived therefrom. In some embodiments, the normalization of the target nucleic acid molecules in the pooled samples has a Gini coefficient of 0.5 to 0.03, 0.4 to 0.05, 0.3 to 0.06, 0.2 to 0.03, 0.015 to 0.07, or any Gini coefficient or ranges of Gini coefficient derived therefrom. As used herein, a “Gini coefficient” is a measure of the inequality of values across the population. A Gini coefficient of zero expresses perfect equality, where all values are the same. A Gini coefficient of one (or 100%) expresses maximal inequality among values. In some embodiments, such normalizations as disclosed herein can be compared to pre-normalization measurements.
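The Gini coefficient defined above can be computed from per-sample abundances as in the following Python sketch. The estimator shown (mean absolute difference divided by twice the mean, evaluated on sorted values) is one common form and is an assumption here; the disclosure does not state which estimator was used. With this form, the maximum for n samples is (n-1)/n, which approaches one as n grows.

```python
# Minimal sketch of a Gini coefficient over per-sample abundances.
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Gini coefficient of non-negative values (0 = perfectly even)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0 or xs[0] < 0:
        raise ValueError("need at least one value, none negative, sum > 0")
    # Rank-weighted sum over the ascending-sorted values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical abundances (percent of reads), for illustration only.
print(gini([25.0, 25.0, 25.0, 25.0]))  # perfectly even pool
print(gini([0.0, 0.0, 0.0, 100.0]))    # one sample dominates the pool
```

Comparing the value computed on pre-normalization abundances with the value computed on post-normalization abundances gives a single-number summary of how much more even the pool has become.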
[0118] The amplified and captured target nucleic acid molecule, DNA or cDNA library thereof can be sequenced and analyzed using methods known to those of skill in the art, e.g., by next-generation sequencing (NGS). In certain exemplary embodiments, RNA expression profiles are determined using any sequencing methods known in the art. Determination of the sequence of a nucleic acid sequence of interest can be performed using a variety of sequencing methods known in the art including, but not limited to, sequencing by synthesis (SBS), sequencing by hybridization (SBH), sequencing by ligation (SBL) (Shendure et al. (2005) Science 309:1728), quantitative incremental fluorescent nucleotide addition sequencing (QIFNAS), stepwise ligation and cleavage, fluorescence resonance energy transfer (FRET), molecular beacons, TaqMan reporter probe digestion, pyrosequencing, fluorescent in situ sequencing (FISSEQ), FISSEQ beads (U.S. Pat. No. 7,425,431), wobble sequencing (PCT/US05/27695), multiplex sequencing (U.S. Ser. No. 12/027,039, filed Feb. 6, 2008; Porreca et al. (2007) Nat. Methods 4:931), polymerized colony (POLONY) sequencing (U.S. Pat. Nos. 6,432,360, 6,485,944 and 6,511,803, and PCT/US05/06425); nanogrid rolling circle sequencing (ROLONY) (US2009/0018024), allele-specific oligo ligation assays (e.g., oligo ligation assay (OLA), single template molecule OLA using a ligated linear probe and a rolling circle amplification (RCA) readout, ligated padlock probes, and/or single template molecule OLA using a ligated circular padlock probe and a rolling circle amplification (RCA) readout) and the like. High-throughput sequencing methods, e.g., using platforms such as Roche 454, Illumina Solexa, AB-SOLiD, Helicos, Complete Genomics, Polonator platforms and the like, can also be utilized. A variety of light-based sequencing technologies are known in the art (Landegren et al. (1998) Genome Res. 8:769-76; Kwok (2000) Pharmacogenomics 1:95-100; and Shi (2001) Clin. Chem. 47:164-172).
[0119] Embodiments of the disclosure herein also provide methods for analyzing gene expression in a plurality of single cells, the method comprising the steps of preparing a cDNA library using the method described herein and sequencing the cDNA library. A “gene” refers to a polynucleotide containing at least one open reading frame (ORF) that is capable of encoding a particular polypeptide or protein after being transcribed and translated. Any of the polynucleotide sequences described herein can be used to identify larger fragments or full-length coding sequences of the gene with which they are associated. Methods of isolating larger fragment sequences are known to those of skill in the art.
[0120] As used herein, “expression” refers to the process by which polynucleotides are transcribed into mRNA and/or the process by which the transcribed mRNA is subsequently being translated into peptides, polypeptides, or proteins. If the polynucleotide is derived from genomic DNA, expression can include splicing of the mRNA in a eukaryotic cell.
[0121] The cDNA library can be sequenced by any suitable screening method. In particular, the cDNA library can be sequenced using a high-throughput screening method, such as Applied Biosystems’ SOLiD sequencing technology, or Illumina’s Genome Analyzer. In some aspects, the cDNA library can be shotgun sequenced. The number of reads can be at least 10,000, at least 1 million, at least 10 million, at least 100 million, or at least 1000 million. In another aspect, the number of reads can be from 10,000 to 100,000, or alternatively from 100,000 to 1 million, or alternatively from 1 million to 10 million, or alternatively from 10 million to 100 million, or alternatively from 100 million to 1000 million. A “read” is a length of continuous nucleic acid sequence obtained by a sequencing reaction.
[0122] The DNA or gDNA library generated by the methods disclosed herein can be useful for, but not limited to, DNA variant detection, copy number analysis, fusion gene detection and structural variant detection. The cDNA library generated by the methods disclosed herein can be useful for, but not limited to, RNA variant detection, gene expression analysis, and fusion gene detection. The DNA and cDNA libraries can also be used for paired DNA and RNA profiling.
[0123] The expression profiles described herein are useful in the field of predictive medicine in which diagnostic assays, prognostic assays, pharmacogenomics, and monitoring clinical trials are used for prognostic (predictive) purposes to thereby treat an individual prophylactically. Accordingly, some embodiments relate to diagnostic assays for determining the expression profile of nucleic acid sequences (e.g., RNAs), in order to determine whether an individual is at risk of developing a disorder and/or disease. Such assays can be used for prognostic or predictive purposes to thereby prophylactically treat an individual prior to the onset of the disorder and/or disease. Accordingly, in certain exemplary embodiments, methods of diagnosing and/or prognosing one or more diseases and/or disorders using one or more of expression profiling methods described herein are provided.
[0124] Some embodiments pertain to monitoring the influence of agents (e.g., drugs or other compounds administered either to inhibit or to treat or prevent a disorder and/or disease) on the expression profile of nucleic acid sequences (e.g., RNAs) in clinical trials. Accordingly, in certain exemplary embodiments, methods of monitoring one or more diseases and/or disorders before, during and/or subsequent to treatment with one or more agents using one or more of expression profiling methods described herein are provided.
[0125] Monitoring the influence of agents (e.g., drug compounds) on the level of expression of a marker can be applied not only in basic drug screening, but also in clinical trials. For example, the effectiveness of an agent to affect an expression profile can be monitored in clinical trials of subjects receiving treatment for a disease and/or disorder associated with the expression profile. In certain exemplary embodiments, the methods for monitoring the effectiveness of treatment of a subject with an agent (e.g., an agonist, antagonist, peptidomimetic, protein, peptide, nucleic acid, small molecule, or other drug candidate) comprise the steps of (i) obtaining a pre-administration sample from a subject prior to administration of the agent; (ii) detecting one or more expression profiles in the pre-administration sample; (iii) obtaining one or more post-administration samples from the subject; (iv) detecting one or more expression profiles in the post-administration samples; (v) comparing the one or more expression profiles in the pre-administration sample with the one or more expression profiles in the post-administration sample or samples; and (vi) altering the administration of the agent to the subject accordingly.
[0126] The expression profiling methods described herein allow the quantitation of gene expression. Thus, not only tissue specificity, but also the level of expression of a variety of genes in the tissue is ascertainable. Thus, genes can be grouped on the basis of their tissue expression per se and level of expression in that tissue. This is useful, for example, in ascertaining the relationship of gene expression between or among tissues. Thus, one tissue can be perturbed and the effect on gene expression in a second tissue can be determined. In this context, the effect of one cell type on another cell type in response to a biological stimulus can be determined. Such a determination is useful, for example, to know the effect of cell-cell interaction at the level of gene expression. If an agent is administered therapeutically to treat one cell type but has an undesirable effect on another cell type, an assay can be used to determine the molecular basis of the undesirable effect and thus provides the opportunity to co-administer a counteracting agent or otherwise treat the undesired effect. Similarly, even within a single cell type, undesirable biological effects can be determined at the molecular level. Thus, the effects of an agent on expression of other than the target gene can be ascertained and counteracted.
[0127] In other embodiments, the time course of expression of one or more nucleic acid sequences (e.g., genes, mRNAs and the like) in an expression profile can be monitored. This can occur in various biological contexts, as disclosed herein, for example development of a disease and/or disorder, progression of a disease and/or disorder, and processes, such as cellular alterations associated with the disease and/or disorder.
[0128] The expression profiling methods described herein are also useful for ascertaining the effect of the expression of one or more nucleic acid sequences (e.g., genes, mRNAs and the like) on the expression of other nucleic acid sequences (e.g., genes, mRNAs and the like) in the same cell or in different cells. This provides, for example, for a selection of alternate molecular targets for therapeutic intervention if the ultimate or downstream target cannot be regulated.
[0129] The expression profiling methods described herein are also useful for ascertaining differential expression patterns of one or more nucleic acid sequences (e.g., genes, mRNAs and the like) in normal and abnormal cells. This provides a battery of nucleic acid sequences (e.g., genes, mRNAs and the like) that could serve as a molecular target for diagnosis or therapeutic intervention.
EXAMPLES
Example 1
[0130] The implementation of the library normalization process can be in many forms, of which a few are shown below.
[0131] In step 1, modified PCR primers can be used to create a PCR product with a single-stranded arm for hybridization. In the following example, the primer consists of a 5’ variable region for hybridization to the capture probe, a dU base for cleavage in later steps, and a spacer that prevents polymerase elongation beyond this point, so that the variable region for probe binding is always single stranded (FIG. 3A).
[0132] After PCR amplification, the product is as shown in FIG. 3B.
[0133] In step 2, the PCR products are pooled from multiple samples.
[0134] In step 3, the capture probe is designed to bind to the corresponding variable region of the PCR products. For immobilization in step 4, it is 3’ biotinylated for binding to streptavidin-coated magnetic beads (FIG. 3C).
[0135] The capture probe is in limited supply in the reaction, in lesser quantity than the corresponding PCR products, so that only a subset of the PCR product binds. Capture probes are pooled at pre-determined concentrations, depending on the sequencing capacity allocated to each sample. A typical normalization is equimolar for each sample. The total amount of capture probe determines the output library concentration.
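The limiting-probe principle described in the paragraph above can be sketched in Python as an idealized model in which each sample's captured amount is the smaller of its PCR product amount and its capture probe amount. This assumes complete and perfectly specific hybridization, which is a simplification; the function names, units and amounts are hypothetical and are not data from this disclosure.

```python
# Minimal idealized model: captured amount = min(product, probe) per sample.
from typing import Dict


def normalize(product: Dict[str, float], probe: Dict[str, float]) -> Dict[str, float]:
    """Amount of each sample retained when its capture probe is limiting."""
    return {s: min(amount, probe.get(s, 0.0)) for s, amount in product.items()}


def fractions(amounts: Dict[str, float]) -> Dict[str, float]:
    """Share of the pool contributed by each sample."""
    total = sum(amounts.values())
    if total <= 0:
        raise ValueError("pool is empty")
    return {s: a / total for s, a in amounts.items()}


# Hypothetical pool with a 99-fold imbalance (arbitrary units).
product = {"sample_A": 990.0, "sample_B": 10.0}
# Equimolar capture probes, each below the corresponding product amount.
probe = {"sample_A": 5.0, "sample_B": 5.0}

captured = normalize(product, probe)
print(fractions(product))   # uneven input shares
print(fractions(captured))  # even output shares under the ideal model
print(sum(captured.values()))  # total output is set by total probe amount
```

Unequal probe amounts in the same model would allocate unequal shares of sequencing capacity, and a sample whose product amount falls below its probe amount would be under-represented, which is why the probe is kept limiting.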
[0136] In step 5, the normalized library can be released by UNG (uracil-DNA glycosylase) digestion of the dU base from the PCR primer. In this case, the library molecules maintain a double-stranded structure. The library can be visualized and quantified by optional library quantification methods, such as a bioanalyzer. Traditional qPCR quantification is also possible, although it is not necessary. The library is ready to be sequenced (FIG. 3D).
Example 2
[0137] To test how sequence differences between capture probes affected normalization performance, especially nonspecific binding when the sequences were similar, two probes that differed by 4 bases (SEQ ID NOS:3 and 33) and two probes that differed by 8 bases (SEQ ID NOS:21 and 23) were selected. DNA template derived from a PCR amplicon was used for uPCR with each of the uPCR primers. Four (4) NGS libraries were generated, representing 4 different samples. The libraries were pooled into two batches as two pre-normalization conditions. Each batch had two samples, with a nearly 100-fold difference in sample abundance between them. The library mixes were normalized according to the workflow disclosed herein, aiming for equal molarity of each sample as output. The abundance of each sample in the pre-normalization and post-normalization library mix was determined by sequencing on an Illumina MiSeq sequencer.
Table 1
Figure imgf000033_0001
[0138] When probes SEQ ID NOS:3 and 33 (edit distance of 4) were used, a 99-fold sample difference in the pre-normalization library was reduced to less than 3-fold. When the edit distance was increased to 8, the post-normalization library difference was within 2-fold. These results demonstrated that the methods disclosed herein have good normalization capacity, as long as the probes differ from one another in sequence.
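A pairwise sequence-difference check of the kind implied above can be sketched in Python as follows. The sketch counts substitutions only (Hamming distance) between equal-length probes, which is an assumed simplification of the edit distance referred to above; the probe sequences are hypothetical and are not SEQ ID NOS of this disclosure.

```python
# Minimal sketch: smallest pairwise difference within a capture probe panel.
from itertools import combinations
from typing import Sequence


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


def min_pairwise_distance(probes: Sequence[str]) -> int:
    """Smallest Hamming distance over all pairs of probes in the panel."""
    if len(probes) < 2:
        raise ValueError("need at least two probes")
    return min(hamming(a, b) for a, b in combinations(probes, 2))


# Hypothetical 8-nt variable regions, for illustration only.
panel = ["AAAAAAAA", "AAAATTTT", "TTTTTTTT"]
print(min_pairwise_distance(panel))
```

A panel could be accepted only if the returned value meets a chosen minimum, for example at least 4 bases.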
Example 3
[0139] To expand the test, a panel of 5 probes was selected. Sequences of any two of the capture probes differed by at least 3 bases. DNA template derived from a PCR amplicon was used for uPCR with each of the uPCR primers, resulting in 5 NGS libraries representing 5 different samples. These libraries were pooled together at different concentrations as the pre-normalization condition. The library pool was normalized using the workflow disclosed herein, aiming for equal molarity of each sample as the output. The abundance of each sample in the pre-normalization and post-normalization library mix was determined by sequencing on an Illumina MiSeq sequencer.
Table 2
Figure imgf000033_0002
Figure imgf000034_0001
[0140] For a highly uneven library mix with a Gini coefficient of 0.420, the normalization method resulted in a fairly uniform library mix with a Gini coefficient of 0.118. The highest and lowest abundance samples were within 2-fold of each other after normalization.
Example 4
[0141] A panel of 105 probes was designed. Sequences of any two of the capture probes differed by at least 4 bases. Out of the 105 designs, 25 were randomly selected and used for lab testing (each set of uPCR primer and corresponding capture probe sequences shown in SEQ ID NOS:1 and 2, and 3; 16 and 2, and 17; 20 and 2, and 21; 24 and 2, and 25; 26 and 2, and 27; 38 and 2, and 39; 40 and 2, and 41; 42 and 2, and 43; 44 and 2, and 45; 46 and 2, and 47; 48 and 2, and 49; 50 and 2, and 51; 52 and 2, and 53; 54 and 2, and 55; 56 and 2, and 57; 58 and 2, and 59; 60 and 2, and 61; 62 and 2, and 63; 64 and 2, and 65; 66 and 2, and 67; 68 and 2, and 69; 70 and 2, and 71; 72 and 2, and 73; 74 and 2, and 75; and 76 and 2, and 77) (FIGS. 4A and 4B). Twenty-five (25) DNA NGS libraries were prepared using the QIAseq Targeted DNA Panel system following the manufacturer's recommended conditions with minor modifications. Briefly, the primers used in the final uPCR were substituted with normalization-compatible PCR primers. This modification did not add any substantial handling complexity or additional hands-on time to the regular workflow. After that, libraries were individually quantified using Agilent’s bioanalyzer. The libraries were then pooled together at different concentrations based on the bioanalyzer measurements. This created an uneven library pool, which was used to assess normalization performance. This uneven pool was sequenced on the Illumina MiSeq platform to confirm the abundance of each library (Table 3, Pre-normalization abundance). The least abundant library constituted only 1% of the pool (Target Lib 2), while the most abundant one constituted 9.2% (Target Lib 13).
[0142] The library pool was normalized using the workflow disclosed herein, aiming for equimolar output. Two normalization reactions were performed: one used a 1-hour hybridization to the capture probes, and the other used a 30-min hybridization. The abundance of each sample was determined by sequencing on an Illumina MiSeq sequencer (Table 3, Post-normalization abundance).
Table 3
Figure imgf000035_0001
[0143] Using the highly uneven library mix with a Gini coefficient of 0.310 as input, both of the normalization reactions achieved very uniform output. The highest and lowest abundant samples were within 2-fold after normalization (5.1% vs 2.7% for 1-hr hyb, 5.1% vs 2.8% for 30-min hyb). For the 1-hr hyb with the capture probes, 23 out of the 25 samples were within the 3%-5% range. For the 30-min hyb with the capture probes, 22 out of the 25 samples were in that range. The 1-hr hybridization workflow performed slightly better than the 30-min hybridization, but the difference was not significant. This experiment demonstrated that with only a 30-min hybridization, the normalization workflow achieved exceptional performance.
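The fold comparisons reported above are simple ratios, as the following Python sketch shows using only the highest and lowest post-normalization abundances stated in the text (5.1% vs 2.7% and 5.1% vs 2.8%). The helper names are hypothetical, and the per-sample values of Table 3 are not reproduced here, so the in-range counter is shown as a function only.

```python
# Minimal sketch of the fold-range and in-range summaries.
from typing import Sequence


def fold_range(highest: float, lowest: float) -> float:
    """Ratio of the most abundant to the least abundant sample."""
    if lowest <= 0:
        raise ValueError("lowest abundance must be positive")
    return highest / lowest


def equimolar_share(n_samples: int) -> float:
    """Ideal percent share per sample for an equimolar pool."""
    return 100.0 / n_samples


def count_in_range(abundances: Sequence[float], low: float, high: float) -> int:
    """Number of samples whose abundance lies within [low, high]."""
    return sum(low <= a <= high for a in abundances)


fold_1hr = fold_range(5.1, 2.7)    # extremes reported for the 1-hr hyb
fold_30min = fold_range(5.1, 2.8)  # extremes reported for the 30-min hyb
print(round(fold_1hr, 2), round(fold_30min, 2), equimolar_share(25))
```

Both ratios are below 2, consistent with the within 2-fold statement, and the ideal equimolar share for 25 samples is 4%, the midpoint of the 3%-5% range.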
Example 5
[0144] The normalization method was applied to a Whole Genome Sequencing (WGS) workflow as a real case example. The same 25 capture probes/PCR primers from Example 4 were used. Twenty-five (25) WGS NGS libraries were prepared using the QIAseq FX DNA Library Kit following the manufacturer's recommended conditions with minor modifications. Briefly, the final PCR amplification was substituted with the normalization-compatible conditions. This modification did not add any substantial handling complexity or reaction time to the original workflow. After PCR amplification, 5 µl (out of 50 µl) of each library was pooled together, and the pool was purified with Agencourt AMPure XP Beads. This library pool was normalized using the 30-min hybridization normalization method and sequenced on the MiSeq platform (Table 4).
Table 4
Figure imgf000036_0001
Figure imgf000037_0001
[0145] The normalization achieved uniform library output with a Gini coefficient of 0.092. The highest and lowest abundant samples were within 2-fold (5.3% vs 3.0%). Twenty-two (22) out of the 25 samples were within a 3%-5% range.
[0146] The turnaround time for this normalization is about 1 hour and 45 mins, including purification of only one pooled library (30 mins). The hands-on time is about 45 mins. In contrast, the turnaround time for an equivalent traditional method is about 2 hrs. It requires individual purification of all the samples, which takes at least 45 mins to process 25 samples. Additionally, individual quantification, dilution and pooling of 25 samples takes about 1 hr and 15 mins to perform. The total hands-on time is about 1.5 hours. The time required increases with more samples, while for the normalization workflow the time remains substantially unchanged with more samples. This makes the methods disclosed herein particularly attractive for high-throughput applications.
[0147] The library normalization workflows disclosed herein provide high-throughput alternatives to traditional library quantification and mixing methods. The workflows can be used as integrated components in current DNA or RNA sequencing kits. The workflows can also be used as a stand-alone kit, complementary to other universal NGS library preparation kits.
[0148] The foregoing description of the specific embodiments will so fully reveal the general nature of the invention that others can, by applying knowledge within the skill of the art, readily modify and/or adapt for various applications, without departing from the general concept of the invention. Therefore, such adaptations and modifications are intended to be within the meaning and range of equivalents of the disclosed embodiments, based on the teaching and guidance presented herein. It is to be understood that the phraseology or terminology herein is for the purpose of description and not of limitation, such that the terminology or phraseology of the present specification is to be interpreted by the skilled artisan in light of the teachings and guidance.
[0149] The breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments but should be defined only in accordance with the following claims and their equivalents.
[0150] All of the various aspects, embodiments, and options described herein can be combined in any and all variations.
[0151] All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be herein incorporated by reference.

Claims

WHAT IS CLAIMED IS:
1. A method of obtaining a normalized quantity of target nucleic acid molecules from multiple samples, comprising
(a) attaching a sample tag oligonucleotide to a target nucleic acid molecule in each sample, wherein the sample tag oligonucleotide comprises a unique sequence specific for each sample;
(b) pooling multiple samples;
(c) adding corresponding capture probes to the pooled samples, wherein the sample tag oligonucleotide attached to the target nucleic acid molecule binds to the corresponding capture probe; and
(d) separating captured target nucleic acid molecules from uncaptured nucleic acid molecules.
2. The method of claim 1, further comprising (e) releasing the captured target nucleic acid molecules from the capture probes.
3. The method of claim 1 or 2, wherein the target nucleic acid molecule concentration in each sample is greater than the corresponding capture probe concentration.
4. The method of any one of claims 1 to 3, wherein the attaching in (a) comprises ligating the sample tag oligonucleotide to the target nucleic acid molecule in each sample.
5. The method of any one of claims 1 to 3, wherein the attaching in (a) comprises performing PCR amplification with a PCR primer.
6. The method of claim 5, wherein the PCR primer comprises the sample tag oligonucleotide, a spacer, and a universal sequence.
7. The method of any one of claims 1 to 6, wherein the corresponding capture probe binds to the unique sequence of a corresponding sample tag oligonucleotide attached to the target nucleic acid molecule.
8. The method of any one of claims 1 to 7, wherein the capture probes are pooled at a desired concentration prior to the adding in (c) to the pooled samples.
9. The method of any one of claims 2 to 8, wherein the releasing in (e) comprises heat or alkaline denaturation.
10. The method of any one of claims 1 to 9, wherein the sample tag oligonucleotide further comprises a dU base.
11. The method of claim 10, wherein the releasing in (e) comprises releasing the target nucleic acid molecules from the capture probes by uracil-DNA glycosylase digestion of the dU base in the sample tag oligonucleotide.
12. The method of any one of claims 1 to 11, wherein the unique sequence of the sample tag oligonucleotide comprises 1-100 base differences compared to the unique sequence of the sample tag oligonucleotide for another sample.
13. The method of any one of claims 4 to 12, wherein the ligation or PCR products comprise a 5’ overhang and wherein the 5’ overhang comprises 10-50 bases.
14. The method of any one of claims 1 to 13, wherein the capture probe is linked to an affinity tag for binding to a solid surface.
15. The method of claim 14, wherein the affinity tag is biotin and the solid surface is a streptavidin coated magnetic bead.
16. The method of any one of claims 1 to 15, further comprising purifying the pooled target nucleic acid molecules attached to the sample tag oligonucleotides from the multiple samples.
17. The method of claim 16, wherein the purifying is by spin column, magnetic beads, or gel electrophoresis.
18. The method of any one of claims 5 to 17, further comprising quantifying the pooled PCR products.
19. The method of claim 18, wherein the quantifying is performed by qPCR or electrophoresis.
20. The method of any one of claims 2 to 19, further comprising sequencing the released target nucleic acid molecules.
21. The method of any one of claims 1 to 20, further comprising performing next generation sequencing.
22. The method of any one of claims 1 to 20, further comprising performing third generation sequencing.
23. The method of any one of claims 1 to 20, further comprising performing single cell analysis.
24. A method of simultaneously sequencing target nucleic acids from multiple samples, comprising:
(a) preparing two or more samples, each sample comprising a target nucleic acid molecule;
(b) performing PCR amplification of the target nucleic acid molecule in each sample with a PCR primer, wherein the PCR primer comprises a 5’ sample tag sequence comprising a unique sequence specific for each sample, a spacer, and a universal sequence;
(c) pooling the samples;
(d) adding corresponding capture probes to the pooled samples, wherein the sample tag sequence of the PCR product binds to a corresponding capture probe and the PCR product concentration of each sample is greater than the corresponding capture probe concentration;
(e) separating captured target nucleic acid molecules from uncaptured nucleic acid molecules;
(f) releasing the captured target nucleic acids from the capture probes; and
(g) simultaneously sequencing the target nucleic acids from multiple samples.
25. The method of claim 24, further comprising performing target enrichment of the prepared sample prior to PCR amplification.
26. A kit comprising: multiple PCR primers for use with multiple samples comprising target nucleic acid molecules, wherein each PCR primer comprises a 5’ sample tag sequence comprising a unique sequence specific for each sample, a spacer, and a universal sequence; and corresponding capture probes capable of binding to the unique sequence specific for each sample.
27. The kit of claim 26, further comprising a ligase.
28. The kit of claim 26 or 27, wherein the PCR primers further comprise dU bases.
29. The kit of claim 28, further comprising a uracil-DNA glycosylase.
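
The normalization principle recited in claims 1 and 3 can be illustrated with a minimal sketch. It assumes an idealized model that is not stated in the claims: each capture probe binds at most one tagged target molecule, and capture runs to completion. Under those assumptions, the quantity recovered from each sample is the smaller of the tagged target amount and the corresponding capture probe amount. The function name, the units, and the sample quantities below are hypothetical and chosen only for illustration.

```python
def captured_amounts(target_fmol, probe_fmol):
    """Return the amount captured per sample under an idealized 1:1 binding model.

    target_fmol: dict mapping sample name -> tagged target amount (fmol)
    probe_fmol:  dict mapping sample name -> corresponding capture probe amount (fmol)
    """
    captured = {}
    for sample, target in target_fmol.items():
        probe = probe_fmol[sample]
        # Capture is limited by whichever component is scarcer.
        captured[sample] = min(target, probe)
    return captured


# Hypothetical inputs: unequal target amounts, equal probe amounts.
targets = {"S1": 500.0, "S2": 120.0, "S3": 40.0}
probes = {"S1": 20.0, "S2": 20.0, "S3": 20.0}

normalized = captured_amounts(targets, probes)
print(normalized)
```

In this sketch every sample has more tagged target than probe, as in claim 3, so each sample contributes the same captured amount regardless of its starting quantity. If a sample held less target than probe, the model would return the target amount for that sample, and it would be under-represented in the pool.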
PCT/US2021/028193 2020-04-20 2021-04-20 Nucleic acid preparations from multiple samples and uses thereof WO2021216574A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202063012467P 2020-04-20 2020-04-20
US63/012,467 2020-04-20

Publications (1)

Publication Number Publication Date
WO2021216574A1 (en) 2021-10-28

Family

ID=78269900

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2021/028193 WO2021216574A1 (en) 2020-04-20 2021-04-20 Nucleic acid preparations from multiple samples and uses thereof

Country Status (1)

Country Link
WO (1) WO2021216574A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5688669A (en) * 1993-04-19 1997-11-18 Murtagh; James J. Methods for nucleic acid detection, sequencing, and cloning using exonuclease
US20130053253A1 (en) * 2010-02-22 2013-02-28 Population Genetics Technologies Ltd Region of Interest Extraction and Normalization Methods
US20170292124A1 (en) * 2016-04-07 2017-10-12 Illumina, Inc. Methods and systems for construction of normalized nucleic acid libraries
US20190010543A1 (en) * 2010-05-18 2019-01-10 Natera, Inc. Methods for simultaneous amplification of target loci
US20190203202A1 (en) * 2012-07-18 2019-07-04 Siemens Healthcare Diagnostics Inc. Method of Normalizing Biological Samples

Similar Documents

Publication Publication Date Title
CN106795514B (en) Bubble joint and application thereof in nucleic acid library construction and sequencing
KR102592367B1 (en) Systems and methods for clonal replication and amplification of nucleic acid molecules for genomic and therapeutic applications
CN114150041A (en) Reagents and methods for analyzing associated nucleic acids
CN110719957B (en) Methods and kits for targeted enrichment of nucleic acids
US20210024920A1 (en) Integrative DNA and RNA Library Preparations and Uses Thereof
JP2020501554A (en) Method for increasing the throughput of single molecule sequencing by linking short DNA fragments
JP2015516814A (en) Enrichment and sequencing of targeted DNA
US20220017954A1 (en) Methods for Preparing CDNA Samples for RNA Sequencing, and CDNA Samples and Uses Thereof
CN111801427A (en) Generation of single-stranded circular DNA templates for single molecules
WO2013079649A1 (en) Method and kit for characterizing rna in a composition
CA3170345A1 (en) Methods and materials for assessing nucleic acids
WO2021053008A1 (en) Immune repertoire profiling by primer extension target enrichment
US20220127600A1 (en) Methods of Detecting Analytes and Compositions Thereof
WO2021216574A1 (en) Nucleic acid preparations from multiple samples and uses thereof
KR20230124636A (en) Compositions and methods for highly sensitive detection of target sequences in multiplex reactions
KR20220130591A (en) Methods for accurate parallel quantification of nucleic acids in dilute or non-purified samples
CN115380119A (en) Method for detecting structural rearrangement in genome
CN114450420A (en) Compositions and methods for accurate determination of oncology
US11268087B2 (en) Isolation and immobilization of nucleic acids and uses thereof
WO2023063958A1 (en) Methods for producing dna libraries and uses thereof
WO2021180791A1 (en) Novel nucleic acid template structure for sequencing
KR20240032630A (en) Methods for accurate parallel detection and quantification of nucleic acids
JP2024035110A (en) Sensitive method for accurate parallel quantification of mutant nucleic acids
WO2023225515A1 (en) Compositions and methods for oncology assays
WO2024059622A2 (en) Methods for simultaneous amplification of dna and rna

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21793171

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21793171

Country of ref document: EP

Kind code of ref document: A1