WO2011146942A1 - Methods and kits to analyze microrna by nucleic acid sequencing - Google Patents

Methods and kits to analyze microRNA by nucleic acid sequencing

Info

Publication number
WO2011146942A1
Authority
WO
WIPO (PCT)
Prior art keywords
seq
sequence
mirna
group
barcode
Application number
PCT/US2011/037616
Inventor
Glen Weiss
Xueliang Xia
Original Assignee
The Translational Genomics Research Institute
Application filed by The Translational Genomics Research Institute

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806: Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay

Definitions

  • The present invention relates to a method, DNA constructs, kits and designed sequences for high-throughput sequencing of small non-coding RNA in general and miRNA in particular.
  • MicroRNAs are small, 18-25 nucleotide, non-coding, single-stranded RNA molecules capable of regulating gene expression at both the transcriptional and translational level.
  • Small non-coding RNA plays a key role in regulating a variety of biological processes, including developmental timing, cellular differentiation, tumor progression, neurogenesis, transposon silencing and viral defense.
  • the current tools for studying small RNA are inadequate for whole genome discovery and characterization of novel small RNA.
  • High throughput platforms based on probe hybridization, such as chip-based microarrays, require prior knowledge of the miRNA sequences, and have problems such as limited dynamic range, poor concordance, poor sensitivity, and lack of reproducibility.
  • Current methods of using nucleic acid sequencing technologies to analyze miRNA are inefficient and costly.
  • One aspect of the invention provides a method of generating a DNA construct comprising a sequence derived from a miRNA molecule.
  • the method generally comprises the steps of: isolating one or more miRNA molecules from a first sample; subjecting a first mixture comprising said miRNA molecule(s) to conditions that allow addition of a poly-A tail to the miRNA; adding a first adaptor to said first mixture, wherein said first adaptor comprises a sequence selected from the group consisting of SEQ ID NO. 1, SEQ ID NO. 2, SEQ ID NO. 3 and SEQ ID NO. 4; adding a reverse transcription primer to said first mixture, wherein said reverse transcription primer includes a sequence selected from the group consisting of SEQ ID NO. 5 and SEQ ID NO. 6; adding a forward amplification primer and a reverse amplification primer to said first mixture; amplifying said first mixture; and obtaining an amplified DNA construct from said amplification, said amplified DNA construct comprising a sequence derived from said miRNA.
  • the method further comprises the step of sequencing the amplified DNA construct comprising the nucleic acid sequence of the miRNA molecule.
  • the method may further comprise the additional steps of isolating miRNA from a second sample; adding a second adaptor to a second mixture comprising the miRNA from the second sample; wherein the second adaptor includes a sequence selected from SEQ ID NO. 1, SEQ ID NO. 2, SEQ ID NO. 3, and SEQ ID NO. 4 and comprises a second barcode differing from the first barcode; and adding at least a portion of the second mixture to the first mixture.
  • the first barcode sequence and said second barcode sequence are each at least 2 nucleotides in length. Further, said first barcode sequence and said second barcode sequence are each from 2 to 6 nucleotides in length.
  • the first barcode and the second barcode, which differs from the first barcode, are each a sequence selected from the group consisting of AGAGAT (SEQ ID NO. 11), CATGCT (SEQ ID NO. 12), CGCGCT (SEQ ID NO. 13), GCCGGT (SEQ ID NO. 14), GCTAGT (SEQ ID NO. 15), CCGGCT (SEQ ID NO. 16), GGAAGT (SEQ ID NO. 17), GACTGT (SEQ ID NO.
  • AACCAT SEQ ID NO. 35
  • CCCCCT SEQ ID NO. 36
  • CGATCT SEQ ID NO. 37
  • TCGATT SEQ ID NO. 38
  • TGCATT SEQ ID NO. 39
  • CAACCT SEQ ID NO. 40
  • GGTTGT SEQ ID NO. 41
  • AAGGAT SEQ ID NO. 42
  • AGCTAT SEQ ID NO. 43
  • AAAAAT SEQ ID NO. 44
  • ACACAT SEQ ID NO. 45
  • AATTAT SEQ ID NO. 46
  • TCTCTT SEQ ID NO. 47
  • TCAGTT SEQ ID NO. 48
  • TATATT SEQ ID NO. 49
  • AGTCAT SEQ ID NO.
  • TAGCTT SEQ ID NO. 51
  • TGTGTT SEQ ID NO. 52
  • TGGTTT SEQ ID NO. 53
  • TAATTT SEQ ID NO. 54
  • TCCTTT SEQ ID NO. 55
  • TGACTT SEQ ID NO. 56
  • the forward amplification primer of the provided method includes a sequence selected from SEQ ID NO. 7, SEQ ID NO. 8 and SEQ ID NO. 57.
  • the reverse amplification primer of the method includes a sequence selected from the group consisting of SEQ ID NO. 9, SEQ ID NO. 10 and SEQ ID NO. 58.
  • the DNA sequencing of the method is selected from Sanger sequencing, pyrosequencing, SOLiD sequencing, massively parallel sequencing, and derivatives thereof.
  • the DNA construct comprises: a first sequence derived from a miRNA; a second sequence selected from the group consisting of SEQ ID NO. 1, SEQ ID NO. 2, SEQ ID NO. 3 and SEQ ID NO. 4; and a third sequence selected from the group consisting of SEQ ID NO. 5 and SEQ ID NO. 6.
  • the DNA construct further comprises a fourth sequence selected from the group consisting of SEQ ID NO. 7, SEQ ID NO. 8 and SEQ ID NO. 57.
  • the DNA construct further comprises a fifth sequence selected from the group consisting of SEQ ID NO. 9, SEQ ID NO. 10 and SEQ ID NO. 58.
  • the DNA construct further comprises a barcode sequence that is 2, 3, 4, 5, 6, 7, or more nucleotides in length.
  • the barcode can comprise 6 or fewer, but more than 2, nucleotides.
  • Such a barcode sequence in the DNA construct is selected from the group consisting of AGAGAT (SEQ ID NO. 11), CATGCT (SEQ ID NO. 12), CGCGCT (SEQ ID NO. 13), GCCGGT (SEQ ID NO. 14), GCTAGT (SEQ ID NO. 15), CCGGCT (SEQ ID NO. 16), GGAAGT (SEQ ID NO. 17), GACTGT (SEQ ID NO. 18), GAGAGT (SEQ ID NO. 19), CGGCCT (SEQ ID NO. 20), CCAACT (SEQ ID NO.
  • kits for generating a DNA construct comprising a sequence derived from a miRNA.
  • the kit comprises: an adaptor wherein said adaptor includes a sequence selected from the group consisting of SEQ ID NO. 1, SEQ ID NO. 2, SEQ ID NO. 3 and SEQ ID NO. 4; and a reverse transcription primer wherein the reverse transcription primer is selected from the group consisting of SEQ ID NO. 5 and SEQ ID NO. 6.
  • the kit further comprises: a forward amplification primer, wherein said forward amplification primer includes a sequence selected from the group consisting of SEQ ID NO. 7, SEQ ID NO. 8 and SEQ ID NO.
  • the adaptor of the kit comprises a barcode sequence having a length of at least 2 nucleotides.
  • said barcode sequence is from 2 to 6 nucleotides in length.
  • Such a barcode in the kit is selected from the group consisting of AGAGAT (SEQ ID NO. 11), CATGCT (SEQ ID NO. 12), CGCGCT (SEQ ID NO. 13), GCCGGT (SEQ ID NO. 14), GCTAGT (SEQ ID NO. 15), CCGGCT (SEQ ID NO. 16), GGAAGT (SEQ ID NO.
  • GCATGT SEQ ID NO. 34
  • AACCAT SEQ ID NO. 35
  • CCCCCT SEQ ID NO. 36
  • CGATCT SEQ ID NO. 37
  • TCGATT SEQ ID NO. 38
  • TGCATT SEQ ID NO. 39
  • CAACCT SEQ ID NO. 40
  • GGTTGT SEQ ID NO. 41
  • AAGGAT SEQ ID NO. 42
  • AGCTAT SEQ ID NO. 43
  • AAAAAT SEQ ID NO. 44
  • ACACAT SEQ ID NO. 45
  • AATTAT SEQ ID NO. 46
  • TCTCTT SEQ ID NO. 47
  • TCAGTT SEQ ID NO. 48
  • TATATT SEQ ID NO.
  • AGTCAT SEQ ID NO. 50
  • TAGCTT SEQ ID NO. 51
  • TGTGTT SEQ ID NO. 52
  • TGGTTT SEQ ID NO. 53
  • TAATTT SEQ ID NO. 54
  • TCCTTT SEQ ID NO. 55
  • TGACTT SEQ ID NO. 56
  • Still another aspect of the invention provides an isolated sequence having at least
  • the isolated sequence having at least 80% sequence identity with a sequence selected from the group consisting of SEQ ID NO. 1, SEQ ID NO. 2, SEQ ID NO. 3 and SEQ ID NO. 4 comprises a barcode sequence having a length of at least 2 nucleotides; said barcode sequence has a length of from 2 to 6 nucleotides.
  • Such a barcode in these isolated sequences is selected from the group consisting of AGAGAT (SEQ ID NO.
  • Figure 1 depicts an example of generating a DNA template from miRNA in preparation for sequencing.
  • Fig. 2 depicts gel electrophoresis of DNA templates containing sequences derived from miRNA.
  • Fig. 3 depicts the distribution of DNA templates containing sequences derived from miRNA on an array configured to hybridize to miRNA.
  • MiRNAs have been shown to be a major new class of regulatory gene products. For example, in human heart, liver or brain, miRNAs play a role in tissue specification or cell lineage decisions. In addition, miRNAs influence a variety of processes, including early development, cell proliferation, cell death, apoptosis and fat metabolism. The large number of miRNA genes, the diverse expression patterns and the abundance of potential miRNA targets suggest that miRNAs may be a significant but unrecognized source of human genetic disease. Differences in miRNA expression have also been found to be associated with cancer diagnosis, prognosis, and susceptibility to treatments.
  • a mature miRNA is typically an 18-25 nucleotide, non-coding RNA that regulates expression of mRNA including sequences complementary to the miRNA.
  • These small RNA molecules are known to control gene expression by regulating the stability and/or translation of mRNAs.
  • miRNAs bind to the 3' UTR of target mRNAs and suppress translation.
  • MiRNAs may also bind to target mRNAs and mediate gene silencing through the RNAi pathway. MiRNAs may also regulate gene expression by causing chromatin condensation.
  • Endogenously expressed miRNAs are processed by endonucleolytic cleavage from larger double-stranded RNA precursor molecules.
  • the resulting small single-stranded miRNAs are incorporated into a multi-protein complex, termed RISC.
  • the small RNA in RISC provides sequence information that is used to guide the RNA-protein complex to its target RNA molecules.
  • the degree of complementarity between the small RNA and its target determines the fate of the bound mRNA. Perfect pairing induces target RNA cleavage, as is the case for siRNAs and most plant miRNAs. In comparison, imperfect pairing in the central part of the duplex leads to a block in translation.
  • miRNAs regulate various biological functions including developmental processes, developmental timing, cell proliferation, neuronal gene expression and cell fate, apoptosis, tissue growth, viral pathogenesis, brain morphogenesis, muscle differentiation, stem cell division and progression of human diseases. Many miRNAs are conserved in sequence and function between distantly related organisms. However, condition-specific, time-specific, and individual-specific levels of gene expression may be due to the interactions of different miRNAs which lead to genetic expression of various traits. The large number of miRNA genes, the diverse expression patterns and the abundance of potential miRNA targets suggest that miRNAs may be a significant but unrecognized source of human genetic diseases.
  • miRNA genetic alterations such as deletions, insertions, reversions or conversions, may affect the accuracy of miRNA related gene regulation.
  • miRNA genetic alterations may be used as markers for disease prognosis and diagnosis.
  • miRNA alleles may alternatively be used as target for disease treatment, and markers for disease prognosis and diagnosis.
  • Common methods of analyzing miRNA such as array-based methods are unable to detect mutated miRNA.
  • the invention provides methods for detecting the presence of a known or unknown miRNA in a sample or total RNA provided.
  • a sample may be derived from a subject.
  • a subject may be any organism including bacteria, fungi, plants, animals that include chordates, mammals, humans, insects, endangered species, or any other organism of agricultural, environmental, or other significance.
  • Samples derived from animals include, but are not limited to, biopsy or other in vivo or ex vivo analysis of prostate, breast, skin, muscle, fascia, brain, endometrium, lung, head and neck, pancreas, small intestine, blood, liver, testes, ovaries, colon, stomach, esophagus, spleen, lymph node, bone marrow, kidney, placenta, or fetus.
  • Samples derived from subjects may also take the form of a fluid sample such as peripheral blood, lymph fluid, ascites, serous fluid, pleural effusion, sputum, bronchial wash, bronchioalveolar lavage fluid (BALF), cerebrospinal fluid, semen, amniotic fluid, lacrimal fluid, stool, urine, or any other source in which a miRNA might be present.
  • Samples may be collected by any method now known or yet to be disclosed, including swiping or swabbing an area or orifice, removal of a piece of tissue as in a biopsy, or any method known to collect bodily fluids.
  • samples may be derived from a subject displaying symptoms of a cancer and/or from a subject suspected of having cancer.
  • a sample may be from any environmental source including soil, air, water, solid surfaces (whether natural or artificial), culture media, foodstuffs, and any interfaces between or combinations of these elements.
  • the invention may comprise a step of isolating (interchangeably called purifying or extracting) miRNA from a sample. miRNA is also readily detectable in blood and blood compartments such as serum, plasma or whole blood by any of a number of methods. miRNA may be isolated from a sample by any method now known in the art or yet to be disclosed that may be used to isolate RNA. Such methods include guanidinium thiocyanate-phenol-chloroform extraction, also known as Trizol® extraction, and spin-column based methods such as methods involving glass fiber filter columns. The method may alternatively include a method that removes highly expressed miRNA.
  • the invention may further comprise a step of adding a poly-A tail to the miRNA.
  • RNA in a mixture of 50 mM Tris (pH 8.0), 250 mM NaCl, 10 mM MgCl2, 1 mM DTT, 0.1 mM ATP, and 0.4 unit of poly(A) polymerase is incubated at 37 °C for 30 minutes. This may be followed by incubation at 95 °C for 5 minutes to inactivate the poly(A) polymerase.
  • the invention may further comprise a step of ligating one or more adaptor molecules to the miRNA.
  • the length of the adaptor may and will vary with regard to the length of the barcode region and the length of areas to which sequencing primers, PCR primers, or other primers may bind or hybridize.
  • a first adaptor sequence may be added to the 5' end of the miRNA and a second adaptor sequence may be added to the 3' end of the miRNA.
  • An adaptor sequence comprises at least one universal priming sequence that is known to those skilled in the art.
  • the universal priming sequence may be any sequence that is unlikely to hybridize to miRNA sequences.
  • the adaptor may comprise one or more barcode (also known as index or zip code) sequences. Among other purposes, the one or more barcodes facilitate indexing and identifying the miRNA to be analyzed.
  • Figure 1 depicts one of the embodiments of the present invention.
  • a barcode is any sequence of two or more nucleotides that may aid in the identification of a nucleic acid as being derived from a particular sample.
  • a barcode may be any identifiable DNA sequence of two or more nucleotides.
  • barcodes may be 3, 4, 5, 6, 7, or more than 7 nucleotides in length.
  • Nonlimiting examples of barcode sequences include AGAGAT (SEQ ID NO. 11), CATGCT (SEQ ID NO. 12), CGCGCT (SEQ ID NO. 13), GCCGGT (SEQ ID NO. 14), GCTAGT (SEQ ID NO. 15), CCGGCT (SEQ ID NO. 16), GGAAGT (SEQ ID NO. 17), GACTGT (SEQ ID NO.
  • AACCAT SEQ ID NO. 35
  • CCCCCT SEQ ID NO. 36
  • CGATCT SEQ ID NO. 37
  • TCGATT SEQ ID NO. 38
  • TGCATT SEQ ID NO. 39
  • CAACCT SEQ ID NO. 40
  • GGTTGT SEQ ID NO. 41
  • AAGGAT SEQ ID NO. 42
  • AGCTAT SEQ ID NO. 43
  • AAAAAT SEQ ID NO. 44
  • ACACAT SEQ ID NO. 45
  • AATTAT SEQ ID NO. 46
  • TCTCTT SEQ ID NO. 47
  • TCAGTT SEQ ID NO. 48
  • TATATT SEQ ID NO. 49
  • AGTCAT SEQ ID NO.
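  • The role of the barcodes listed above in identifying which sample a sequence came from can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of the disclosure: it assumes each read begins with the 6-nucleotide barcode, and the sample names and the reads in the test are hypothetical.

```python
from collections import defaultdict

# Barcodes copied from the list above; the sample names are hypothetical.
BARCODE_TO_SAMPLE = {
    "AACCAT": "sample_A",  # SEQ ID NO. 35
    "CCCCCT": "sample_B",  # SEQ ID NO. 36
    "CGATCT": "sample_C",  # SEQ ID NO. 37
}
BARCODE_LENGTH = 6


def demultiplex(reads):
    """Group pooled reads by their leading barcode.

    Assumption (for illustration only): the barcode is the first
    BARCODE_LENGTH bases of each read. Assigned reads are stored with the
    barcode removed; reads with an unknown prefix are kept whole under
    the "unassigned" key.
    """
    bins = defaultdict(list)
    for read in reads:
        prefix = read[:BARCODE_LENGTH]
        sample = BARCODE_TO_SAMPLE.get(prefix)
        if sample is None:
            bins["unassigned"].append(read)
        else:
            bins[sample].append(read[BARCODE_LENGTH:])
    return dict(bins)
```

  • Exact prefix matching is the simplest possible choice; a practical pipeline would likely also tolerate sequencing errors within the barcode.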
  • the adaptor sequence comprises a sequence selected from a group consisting of: 5'-CCGCGTACTGGAAGATTTGCGCATTTTTATC-3' (SEQ ID NO.
  • the number of bases represented by Y may be 1, 2, 3, 4, 5 or any other higher number that is suitable but without affecting the following steps of the invention disclosed herein.
  • the optional bases represented by Y are 5'-AAA-3'
  • the letter "n" represents the number of nucleotides in the barcode sequence and "n" may be any positive whole number or zero.
  • the "[X] n Y" portion of these sequences is not included therein as these will be variable.
  • the sequence listed in the sequence listing is the "base" sequence, to which the "[X] n Y" portion, as defined above, is added.
  • An adaptor may be ligated to a miRNA by any appropriate method including the following example: a mixture of 50 ⁇ adaptor, 50 mM Tris (pH 8.0), 10 mM MgCl2, 10 mg/ml BSA, 1 mM hexamine CoCl2, 10% DMSO, 0.1 mM ATP, and 5 units of T4 RNA ligase may be incubated at 20 °C for 2 hours.
  • the invention may further comprise a step of annealing of a reverse transcription primer to the construct.
  • the reverse transcription primer may be any sequence that facilitates reverse transcription of the miRNA construct and may include a sequence of two or more T nucleotides. Nucleic acid sequences may be identified by the IUPAC letter code, which is as follows: A - adenine base; C - cytosine base; G - guanine base; T or U - thymine or uracil base.
  • T or U may be used interchangeably depending on whether the nucleic acid is DNA or RNA.
  • Nonlimiting examples of reverse transcription primers include:
  • the invention may further comprise a step of reverse transcription of the construct with a reverse transcriptase.
  • the reverse transcriptase may be derived from any source including M-MLV, AMV, HIV, or any other enzyme or other chemical or physical entity capable of generating single stranded cDNA from an RNA template.
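  • The relationship between a poly-A-tailed RNA and the single-stranded cDNA produced from it can be illustrated with a short Python sketch. It is a string-level model only, under stated assumptions: the function names and the default tail length are hypothetical, and no enzyme behavior is modeled.

```python
# Watson-Crick partner (DNA alphabet) for each RNA base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}


def add_poly_a_tail(rna, tail_length=20):
    """Append a poly-A tail to the 3' end of an RNA written 5'->3'.

    The default tail length is an arbitrary illustrative value.
    """
    return rna + "A" * tail_length


def reverse_transcribe(rna):
    """Return the first-strand cDNA, written 5'->3'.

    The cDNA is the reverse complement of the RNA template, with T in
    place of U. For a poly-A-tailed template it therefore starts with a
    run of T, which is where a primer containing T nucleotides anneals.
    """
    return "".join(COMPLEMENT[base] for base in reversed(rna))
```

  • For example, the hypothetical template AUGC with a 3-base tail becomes AUGCAAA, and its cDNA reads TTTGCAT.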
  • the invention may further comprise a step of nucleic acid amplification of the cDNA through any of a number of processes.
  • Nucleic acids that may be subjected to amplification may be from any samples of any source or subject.
  • nucleic acid amplification is a process by which copies of a nucleic acid may be made from a source nucleic acid. In some nucleic amplification methods, the copies are generated exponentially.
  • nucleic acid amplification examples include but are not limited to: the polymerase chain reaction (PCR), ligase chain reaction (LCR), self-sustained sequence replication (3SR), nucleic acid sequence based amplification (NASBA), strand displacement amplification (SDA), amplification with Qβ replicase, whole genome amplification with enzymes such as Φ29, whole genome PCR, in vitro transcription with Klenow or any other RNA polymerase, or any other method by which copies of a desired sequence are generated.
  • PCR uses a DNA polymerase, which may be a thermostable DNA polymerase such as Taq or Pfu, and deoxyribose nucleoside triphosphates (dNTPs).
  • the reaction mixture is subjected to temperature cycles comprising a denaturation stage (typically 80-100 °C); an annealing stage at a temperature that may be based on the melting temperature (Tm) of the primers and the degeneracy of the primers; and an extension stage at a suitable temperature, for which an exemplary range may be 40-75 °C.
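  • As a rough illustration of how a primer's melting temperature might be estimated when choosing an annealing temperature, the sketch below applies the Wallace rule of thumb, Tm ≈ 2 °C × (A+T) + 4 °C × (G+C). This rule is an assumption brought in for illustration, is generally regarded as meaningful only for short oligonucleotides, and is not a method stated in this disclosure; the function name is hypothetical.

```python
def wallace_tm(primer):
    """Estimate primer melting temperature (deg C) with the Wallace rule.

    Tm = 2 * (number of A and T) + 4 * (number of G and C).
    A rough rule of thumb for short oligonucleotides only; it ignores
    salt concentration, primer concentration and mismatches.
    """
    primer = primer.upper()
    at = sum(primer.count(base) for base in "AT")
    gc = sum(primer.count(base) for base in "GC")
    if at + gc != len(primer):
        raise ValueError("primer must contain only A, C, G and T")
    return 2 * at + 4 * gc


# Example input: the 17-base reverse primer given as SEQ ID NO. 9.
estimate = wallace_tm("AATGCGCATACTTATAA")
```

  • Longer primers, such as those carrying sequencing-platform tails, would normally call for a nearest-neighbor thermodynamic model instead.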
  • RNA may be detected by PCR analysis by creating a DNA template from RNA through a reverse transcriptase enzyme.
  • Primers for nucleic acid amplification may contain additional sequences that add features to the construct including sites for protein binding such as restriction enzyme sites or promoter binding sites or sites that facilitate additional methods of nucleic acid analysis including sequences that facilitate DNA sequencing.
  • Forward primers used in nucleic acid amplification may include sequences such as 5'-ATCTCCGCGTACTGGAAGATTTGC-3' (SEQ ID NO. 7) or 5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTCCGCGTACTGGAAGATTTGC-3' (SEQ ID NO. 8).
  • Reverse primers used in nucleic acid amplification may include sequences such as 5'-AATGCGCATACTTATAA-3' (SEQ ID NO. 9) or 5'-CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCTGTTAATGCGCATACTTATAA-3' (SEQ ID NO. 10).
  • the consensus sequence of the forward primer is: 5'- TAATACGACTCACTATAGGGCGATCGTCACCGTGTACAAANNNNNNNNNNNNNN NNNNNNAAA AAAAAAAAAAA-3 ' (SEQ ID NO. 57 ); and the consensus sequence of the reverse primer is: 5'-
  • SEQ ID NO. 58 TCGGCCTGCCTGAAAGCGTGGTGATTTCCGTTTTTTTTTTTTNNNNNNNNNNNNNNN NNNNNNNTTTGTACACG-3 ' (SEQ ID NO. 58).
  • SEQ ID NO. 57 and SEQ ID NO. 58 are each 76 bases long.
  • the 24 Ns of SEQ ID NO. 57 and SEQ ID NO. 58 represent where the miRNAs should be. Thus the 24 Ns of SEQ ID NO. 57 hybridize to the cDNA product containing the miRNA sequence.
  • the first 40 nucleotides of SEQ ID NO. 57 are configured to bind to a sequencing primer. For SEQ ID NO. 58, the 24 Ns of SEQ ID NO.
  • the length of the forward and reverse PCR primers may be between 50-110 bases, as applicable.
  • the length of the forward and reverse PCR primers is between 60-100 bases. Still more preferably, the length of the forward and reverse PCR primers is between 70-90 bases.
  • Sequence identity refers to a relationship between two or more polypeptide sequences or two or more polynucleotide sequences, namely a reference sequence and a given sequence to be compared with the reference sequence.
  • Sequence identity is determined by comparing the given sequence to the reference sequence after the sequences have been optimally aligned to produce the highest degree of sequence similarity, as determined by the match between strings of such sequences. Upon such alignment, sequence identity is ascertained on a position-by-position basis, e.g., the sequences are "identical" at a particular position if at that position, the nucleotides or amino acid residues are identical. The total number of such position identities is then divided by the total number of nucleotides or residues in the reference sequence to give % sequence identity. Sequence identity can be readily calculated by known methods, including but not limited to, those described in Computational Molecular Biology, Lesk, A.
  • nucleotide sequence having at least, for example, 85%, preferably 90%, even more preferably 95% "sequence identity" to a reference nucleotide sequence it is intended that the nucleotide sequence of the given polynucleotide is identical to the reference sequence except that the given polynucleotide sequence may include up to 15, preferably up to 10, even more preferably up to 5 point mutations per each 100 nucleotides of the reference nucleotide sequence.
  • a polynucleotide having a nucleotide sequence having at least 85%, preferably 90%, even more preferably 95% identity relative to the reference nucleotide sequence up to 15%, preferably 10%, even more preferably 5% of the nucleotides in the reference sequence may be deleted or substituted with another nucleotide, or a number of nucleotides up to 15%, preferably 10%, even more preferably 5% of the total nucleotides in the reference sequence may be inserted into the reference sequence.
  • These mutations of the reference sequence may occur at the 5' or 3' terminal positions of the reference nucleotide sequence or anywhere between those terminal positions, interspersed either individually among nucleotides in the reference sequence or in one or more contiguous groups within the reference sequence.
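  • The position-by-position calculation described above (identical positions divided by the number of nucleotides in the reference sequence) can be sketched as follows. The sketch assumes the two sequences have already been optimally aligned by some other tool and that gaps are written as "-"; the function name and the sequences in the test are hypothetical.

```python
def percent_identity(aligned_reference, aligned_query, gap="-"):
    """Percent sequence identity relative to the reference sequence.

    Both inputs must come from the same pairwise alignment, so they have
    equal length, with `gap` marking inserted or deleted positions.
    Identical positions are divided by the number of residues in the
    reference (gap characters in the reference are not counted).
    """
    if len(aligned_reference) != len(aligned_query):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(
        1
        for ref_base, query_base in zip(aligned_reference, aligned_query)
        if ref_base == query_base and ref_base != gap
    )
    reference_length = sum(1 for base in aligned_reference if base != gap)
    if reference_length == 0:
        raise ValueError("reference contains no residues")
    return 100.0 * matches / reference_length
```

  • With a 10-base reference and a single substitution, the function returns 90.0, which matches the "up to 10 point mutations per each 100 nucleotides" reading of 90% identity.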
  • the invention may further comprise the step of sequencing the amplified construct.
  • Methods of sequencing include but need not be limited to any form of DNA sequencing including Sanger, next generation sequencing, pyrosequencing, SOLiD sequencing, massively parallel sequencing, pooled, and barcoded DNA sequencing or any other sequencing method now known or yet to be disclosed.
  • In Sanger sequencing, a single-stranded DNA template, a primer, a DNA polymerase, nucleotides, a label such as a radioactive label conjugated with the nucleotide base or a fluorescent label conjugated to the primer, and one chain terminator base comprising a dideoxynucleotide (ddATP, ddGTP, ddCTP, or ddTTP) are added to each of four reactions (one reaction for each of the chain terminator bases).
  • the sequence may be determined by electrophoresis of the resulting strands.
  • In dye terminator sequencing, each of the chain termination bases is labeled with a fluorescent label of a different wavelength, which allows the sequencing to be performed in a single reaction.
  • In pyrosequencing, the addition of a base to a single-stranded template to be sequenced by a polymerase results in the release of a pyrophosphate upon nucleotide incorporation.
  • An ATP sulfurylase enzyme converts pyrophosphate into ATP, which in turn drives the conversion of luciferin to oxyluciferin, generating visible light that is then detected by a camera or other sensor capable of capturing visible light.
  • In SOLiD sequencing, the molecule to be sequenced is fragmented and used to prepare a population of clonal magnetic beads (in which each bead is conjugated to a plurality of copies of a single fragment) with an adaptor sequence and alternatively a barcode sequence.
  • the beads are bound to a glass surface. Sequencing is then performed through 2-base encoding.
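  • The 2-base encoding mentioned above is commonly described as assigning one of four "colors" to each pair of adjacent bases. The sketch below uses the usual formulation in which bases are numbered A=0, C=1, G=2, T=3 and the color of a pair is the bitwise XOR of the two numbers. This is offered as an illustration under that assumption rather than as a description of any instrument's software; the function names are hypothetical.

```python
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def encode_colors(sequence):
    """Convert a base sequence into colors, one per adjacent base pair."""
    return [
        BASE_TO_CODE[first] ^ BASE_TO_CODE[second]
        for first, second in zip(sequence, sequence[1:])
    ]


def decode_colors(first_base, colors):
    """Rebuild the base sequence from a known first base and the colors.

    Each color, XORed with the previous base's code, gives the next base,
    so decoding needs the identity of the first base.
    """
    bases = [first_base]
    for color in colors:
        bases.append(CODE_TO_BASE[BASE_TO_CODE[bases[-1]] ^ color])
    return "".join(bases)
```

  • Because every base contributes to two neighboring colors, a single wrong color changes all downstream decoded bases, which is why the encoding is usually interpreted against a reference.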
  • nucleic acid sequences may be identified by the IUPAC letter code, which is as follows: A - adenine base; C - cytosine base; G - guanine base; T or U - thymine or uracil base.
  • T or U may be used interchangeably depending on whether the nucleic acid is DNA or RNA.
  • a sequence having less than 60%, 70%, 80%, 90%, 95%, 99% or 100% identity to the identifying sequence may still be encompassed by the invention if it is capable of binding to its complementary sequence and/or facilitating nucleic acid amplification of a desired target sequence.
  • Sequence identity is as defined above.
  • a sequence is called degenerate if some of its positions have several possible bases. If a sequence is represented in degenerate form, for example, through the use of codes other than A, C, G, T, or U (for example: R, Y, M, K, S, W, H, B, V, D, N according to IUPAC nomenclature of mixed bases, see above for the detail), the nucleic acid sequence presented in a degenerate form includes individual sequences each of which has a specific nucleotide encompassed by the IUPAC code at the degenerate position. In some embodiments, a degenerate sequence is applied as a mixture of nucleic acids having individual sequences, each with a specific nucleotide encompassed by the IUPAC code at the degenerate position.
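  • The idea that a degenerate sequence stands for a set of individual sequences can be made concrete with a short Python sketch that expands IUPAC mixed-base codes into every specific sequence they encompass. The function name and the sequences in the test are hypothetical.

```python
from itertools import product

# IUPAC nucleotide codes and the specific bases each one encompasses.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(sequence):
    """List every specific sequence encompassed by a degenerate sequence.

    The number of results is the product of the number of choices at each
    position, so it grows quickly with the number of degenerate positions.
    """
    choices = [IUPAC_BASES[code] for code in sequence.upper()]
    return ["".join(bases) for bases in product(*choices)]
```

  • A sequence with 24 N positions would expand to 4^24 individual sequences, so full enumeration is practical only for short degenerate stretches.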
  • a nucleic acid may be added to a sample by any of a number of methods including manual methods, mechanical methods, or any combination thereof.
  • one aspect of the present invention provides a method of generating a DNA construct, the method comprising: isolating miRNA from a first sample; adding a first adaptor to a first mixture comprising the miRNA, wherein the first adaptor includes a sequence selected from the group consisting of SEQ ID NO. 1 and SEQ ID NO. 3; adding a reverse transcription primer to the mixture, wherein the reverse transcription primer includes a sequence selected from SEQ ID NO. 5 and SEQ ID NO. 6; adding a forward amplification primer to the first mixture; adding a reverse amplification primer to the first mixture; subjecting the mixture to conditions that allow addition of a poly-A tail to the miRNA; ligating the adaptor to the miRNA; generating a cDNA template by reverse transcription; and amplifying the nucleic acid.
  • the forward amplification primer may include a sequence represented by SEQ ID NO. 7. If the primer includes a sequence represented by SEQ ID NO. 7, then the primer may also include SEQ ID NO. 8.
  • the reverse amplification primer may include SEQ ID NO. 9. If the primer includes SEQ ID NO. 9, it may also include SEQ ID NO. 10.
  • the first adaptor may comprise a first barcode. If the first adaptor comprises a first barcode, the method may further comprise the steps of isolating miRNA from a second sample; adding a second adaptor to a second mixture comprising the miRNA from the second sample, wherein the second adaptor includes a sequence selected from SEQ ID NO. 1 and SEQ ID NO. 3 and further comprises a second barcode; and adding at least a portion of the second mixture to the first mixture.
  • the barcode may comprise any number of nucleotides including six or fewer nucleotides.
  • the nonlimiting exemplary barcodes include the following sequences: AGAGAT (SEQ ID NO. 11), CATGCT (SEQ ID NO. 12), CGCGCT (SEQ ID NO.
  • the method may further comprise the step of performing DNA sequencing on the DNA template.
  • Preferred methods of DNA sequencing include Sanger sequencing, pyrosequencing, SOLiD sequencing, or massively parallel sequencing.
  • Another aspect of the present invention provides a DNA construct, which comprises a first sequence derived from a miRNA; a second sequence selected from the group consisting of SEQ ID NO. 1 and SEQ ID NO. 3; and a third sequence selected from the group consisting of SEQ ID NO. 5 and SEQ ID NO. 6.
  • the second sequence may be 5' of the first sequence.
  • the third sequence may be 3' of the first sequence.
  • the construct may further comprise a barcode sequence.
  • the barcode sequence may be 5' of the first sequence.
  • the barcode may comprise six or fewer nucleotides such as AGAGAT (SEQ ID NO. 11), CATGCT (SEQ ID NO. 12), CGCGCT (SEQ ID NO.
  • AATTAT SEQ ID NO. 46
  • TCTCTT SEQ ID NO. 47
  • TCAGTT SEQ ID NO. 48
  • TATATT SEQ ID NO. 49
  • AGTCAT SEQ ID NO. 50
  • TAGCTT SEQ ID NO. 51
  • TGTGTT SEQ ID NO. 52
  • TGGTTT SEQ ID NO. 53
  • TAATTT SEQ ID NO. 54
  • TCCTTT SEQ ID NO. 55
  • TGACTT SEQ ID NO. 56
  • Yet another aspect of the invention provides kits to facilitate miRNA sequencing.
  • a kit may include any combination of components that facilitates the performance of an assay.
  • a kit may include suitable nucleic acid-based and immunological reagents as well as suitable buffers, control reagents, and printed protocols.
  • Kits that facilitate nucleic acid sequencing may further include one or more of the following: specific nucleic acids such as oligonucleotides, primers, or probes, labeling reagents, enzymes including PCR amplification reagents such as Taq or Pfu, reverse transcriptase, or one or more other polymerases, and/or reagents that facilitate binding of the nucleic acids to their targets.
  • Specific nucleic acids may further include nucleic acids, polynucleotides, oligonucleotides (DNA, or RNA), or any combination of molecules that includes one or more of the above, or any other molecular entity capable of specific binding to a nucleic acid sequence.
  • An oligonucleotide may be any polynucleotide of at least 2 nucleotides.
  • Oligonucleotides may be less than 10, 15, 20, 30, 40, 50, 75, 100, 200, or 500 nucleotides in length. While oligonucleotides are often linear, they may, depending on their sequence and conditions, assume a two- or three-dimensional structure. Oligonucleotides may be chemically synthesized by any of a number of methods including sequential synthesis, solid phase synthesis, or any other synthesis method now known or yet to be disclosed. Alternatively, oligonucleotides may be produced by recombinant DNA based methods.
  • the oligonucleotide may be affixed to a solid substrate.
  • the sample may be affixed to a solid substrate.
  • a probe or sample may be covalently bound to the substrate or it may be bound by some noncovalent interaction including electrostatic, hydrophobic, hydrogen bonding, van der Waals, magnetic, or any other interaction by which a probe such as an oligonucleotide probe may be attached to a substrate while maintaining its ability to recognize the allele to which it has specificity.
  • a substrate may be any solid or semi solid material onto which a probe may be affixed, attached or printed, either singly or in the formation of a microarray.
  • substrate materials include but are not limited to polyvinyl, polystyrene, polypropylene, polyester or any other plastic, glass, silicon dioxide or other silanes, hydrogels, gold, platinum, microbeads, micelles and other lipid formations, nitrocellulose, or nylon membranes.
  • the substrate may take any form, including a spherical bead or flat surface.
  • the probe may be bound to a substrate in the case of an array.
  • the sample may be bound to a substrate as in, for example, a Southern blot, Northern blot or other method that affixes the sample to a substrate.
  • the kit for miRNA sequencing generally comprises a first adaptor wherein the first adaptor includes a sequence selected from the group consisting of SEQ ID NO. 1 and SEQ ID NO. 3; and a reverse transcription primer, wherein the reverse transcription primer is selected from the group consisting of SEQ ID NO. 5 and SEQ ID NO. 6.
  • the first adaptor of the kit comprises a first barcode. If the first adaptor comprises a first barcode, then the kit may further comprise a second adaptor that includes a sequence selected from the group consisting of SEQ ID NO. 1 and SEQ ID NO. 3 and wherein the second adaptor comprises a second barcode.
  • the first barcode may comprise any number of nucleotides including six or fewer nucleotides such as AGAGAT (SEQ ID NO. 11), CATGCT (SEQ ID NO. 12), CGCGCT (SEQ ID NO. 13), GCCGGT (SEQ ID NO. 14), GCTAGT (SEQ ID NO. 15), CCGGCT (SEQ ID NO. 16), GGAAGT (SEQ ID NO. 17), GACTGT (SEQ ID NO. 18), GAGAGT (SEQ ID NO. 19), CGGCCT (SEQ ID NO. 20), CCAACT (SEQ ID NO. 21), ACCAAT (SEQ ID NO. 22), CACACT (SEQ ID NO. 23), GAAGGT (SEQ ID NO.
  • GGTTGT SEQ ID NO. 41
  • AAGGAT SEQ ID NO. 42
  • AGCTAT SEQ ID NO. 43
  • AAAAAT SEQ ID NO. 44
  • ACACAT SEQ ID NO. 45
  • AATTAT SEQ ID NO. 46
  • TCTCTT SEQ ID NO. 47
  • TCAGTT SEQ ID NO. 48
  • TATATT SEQ ID NO. 49
  • AGTCAT SEQ ID NO. 50
  • TAGCTT SEQ ID NO. 51
  • TGTGTT SEQ ID NO. 52
  • TGGTTT SEQ ID NO. 53
  • TAATTT SEQ ID NO. 54
  • TCCTTT SEQ ID NO. 55
  • TGACTT SEQ ID NO. 56
  • the kit may further comprise a forward amplification primer that includes SEQ
  • the forward amplification primer may also include SEQ ID NO. 8.
  • the kit may further comprise a reverse amplification primer that includes SEQ ID NO. 9. If the reverse amplification primer includes SEQ ID NO. 9, then the reverse amplification primer may further include SEQ ID NO. 10.
  • the kit may further comprise a DNA polymerase such as a thermostable DNA polymerase.
  • the kit may further comprise a reverse transcriptase.
  • New "next-generation" high-throughput sequencing platforms, capable of generating massive amounts of reads very quickly, can be more effectively utilized for systematic and in-depth study of genetic variation.
  • This invention provides the reverse transcription process to facilitate barcoding of cDNA derived from miRNA. Furthermore, sample oligo-pooling is optimized to increase density and coverage on a flowcell, which substantially increases the throughput for sample analysis. Without oligo-pooling, sequencing of the miRNA transcriptome can be performed on 8 samples on a flowcell with 1 unique sample per channel. Using sample oligo-pooling, up to 400 samples per flowcell can be sequenced without loss of data.
  • FIG. 1 depicts the generation of a cDNA construct that may be used in next-generation sequencing that contains miRNA sequence.
  • the cDNA derived from miRNA is 75-130 nucleotides in length (a 30-50 nucleotide 5' adaptor + a 15-30 nucleotide miRNA sequence + a 30-50 nucleotide reverse transcription primer).
  • This example includes a barcode of 5-6 nucleotides in length near the 3' end of the adaptor, just 5' of the miRNA sequence.
  • SEQ ID NO. 57: 5'-TAATACGACTCACTATAGGGCGATCGTCACCGTGTACAAANNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAA-3'
  • consensus sequence of the reverse primer is: 5'-TCGGCCTGCCTGAAAGCGTGGTGATTTCCGTTTTTTTTTTTTNNNNNNNNNNNNNNNNNNNNNNNNTTTGTACACG-3' (SEQ ID NO. 58).
  • Each of the SEQ ID NO. 57 and SEQ ID NO. 58 is 76 bases.
  • the 24 Ns of SEQ ID NO. 57 and SEQ ID NO. 58 represent where the miRNAs should be. Thus the 24 Ns of SEQ ID NO.
  • the first 40 nucleotides of SEQ ID NO. 57 hybridize to the cDNA product containing the miRNA sequence.
  • the first 40 nucleotides of SEQ ID NO. 57 are configured to bind to a sequencing primer.
  • the 24 Ns of SEQ ID NO. 57 hybridize to the cDNA product containing the miRNA sequence, and the first 40 nucleotides are configured to bind to the sequencing primer.
  • the length of the forward and reverse PCR primers may vary depending on the number of nucleotides configured to bind to a sequencing primer and the number of nucleotides that hybridize to the cDNA construct containing the miRNA sequence.
  • the length of the forward and reverse PCR primers may be between 50-110 bases, as applicable.
  • the length of the forward and reverse PCR primers is between 60-100 bases. Still more preferably, the length of the forward and reverse PCR primers is between 70-90 bases.
  • the forward PCR primer is 82 nucleotides in length.
  • the first 58 nucleotides are configured to bind to a sequencing primer and the last 24 nucleotides hybridize to the cDNA product containing the miRNA sequence.
  • the reverse PCR primer is 78 nucleotides in length.
  • the 17 nucleotides at the 5'-end hybridize to the cDNA construct that includes the miRNA sequence.
  • the remaining 61 nucleotides are configured to bind to the sequencing primer.
  • the final cDNA product containing a miRNA sequence should be approximately 220 nucleotides in length.
  • Figure 2 depicts the cDNA resulting from the study.
  • a DNA size ladder is in lane 1.
  • Lanes 2 through 8 contain products from seven replicates of the exemplary method outlined in Figure 1.
  • the bright bands at ~160 bp are primer dimers. These are of the expected size given an 82 nucleotide forward primer and a 78 nucleotide reverse primer.
  • the region around 220 base pairs was excised from the gel.
  • Figure 3 depicts a miRNA array analysis on cDNA excised from the gel. This confirms that the cDNA created using these primers contains a wide range of miRNA, in terms of expression level and content with similar expression levels of replicate miRNA.
  • qRT-PCR using primers capable of specifically amplifying the miRNAs let-7d and miR-125b.
  • In Table 1, the row labeled qRT-PCR displays the Cp values for let-7d and miR-125b (relative to a no-template control). Lower numbers correlate with greater miRNA expression.
  • the second row displays miRNA microarray signal intensity for let-7a and miR-125 (relative to background signal). By qRT-PCR and miRNA microarray, let-7a appears to be in higher abundance relative to miR-125b in this cDNA product.
  • Presence of let-7a and miR-125b was verified by Sanger sequencing of the quantitative real-time PCR product.
  • the results for let-7a and miR-125 by qRT-PCR and miRNA microarray are shown in Table 1:

Abstract

The present invention provides DNA constructs useful in the analysis of miRNA by sequencing. The invention also provides methods of generating said constructs and kits that facilitate the generation of said constructs.

Description

METHODS AND KITS TO ANALYZE MICRORNA BY NUCLEIC ACID SEQUENCING
CROSS REFERENCE
[0001] This application is related to and claims the priority benefit of U.S. provisional application 61/347,202, filed on May 21, 2010, the teachings and content of which are incorporated by reference herein.
FIELD OF INVENTION
[0002] The present invention is related to a method, DNA constructs, kits and designed sequences for high-throughput sequencing of small non-coding RNA in general and miRNA in particular.
BACKGROUND OF THE INVENTION
[0003] MicroRNAs (miRNAs) are small, 18-25 nucleotide, non-coding, single-stranded RNA molecules capable of regulating gene expression at both the transcriptional and translational level. Small non-coding RNA plays a key role in regulating a variety of biological processes, including developmental timing, cellular differentiation, tumor progression, neurogenesis, transposon silencing and viral defense. The current tools for studying small RNA are inadequate for whole genome discovery and characterization of novel small RNA. High throughput platforms based on probe-hybridization, such as chip-based microarrays, require prior knowledge of the miRNA sequences, and have problems such as limited dynamic range, poor concordance, poor sensitivity, and lack of reproducibility. Current methods of using nucleic acid sequencing technologies to analyze miRNA are inefficient and costly. Clearly, there is a need for new methods and kits that facilitate miRNA analysis using sequencing technologies, whether the miRNA was known or unknown before the analysis.
BRIEF SUMMARY OF THE INVENTION
[0004] One aspect of the invention provides a method of generating a DNA construct comprising a sequence derived from a miRNA molecule. The method generally comprises the steps of: isolating one or more miRNA molecules from a first sample; subjecting a first mixture comprising said miRNA molecule(s) to conditions that allow addition of a poly-A tail to the miRNA; adding a first adaptor to said first mixture, wherein said first adaptor comprises a sequence selected from the group consisting of SEQ ID NO. 1, SEQ ID NO. 2, SEQ ID NO. 3 and SEQ ID NO. 4; adding a reverse transcription primer to said first mixture, wherein said reverse transcription primer includes a sequence selected from the group consisting of SEQ ID NO. 5 and SEQ ID NO. 6; adding a forward amplification primer and a reverse amplification primer to said first mixture; amplifying said first mixture; and obtaining an amplified DNA construct from said amplification, said amplified DNA construct comprising a sequence derived from said miRNA.
[0005] In some forms, the method further comprises the step of sequencing the amplified DNA construct comprising the nucleic acid sequence of the miRNA molecule.
[0006] The method may further comprise the additional steps of isolating miRNA from a second sample; adding a second adaptor to a second mixture comprising the miRNA from the second sample, wherein the second adaptor includes a sequence selected from SEQ ID NO. 1, SEQ ID NO. 2, SEQ ID NO. 3, and SEQ ID NO. 4 and comprises a second barcode differing from the first barcode; and adding at least a portion of the second mixture to the first mixture.
[0007] In this provided method, the first barcode sequence and the second barcode sequence are each at least 2 nucleotides in length. Further, the first barcode sequence and the second barcode sequence are each from 2 to 6 nucleotides in length. The first barcode and the second barcode, which differs from the first barcode, are each a sequence selected from the group consisting of AGAGAT (SEQ ID NO. 11), CATGCT (SEQ ID NO. 12), CGCGCT (SEQ ID NO. 13), GCCGGT (SEQ ID NO. 14), GCTAGT (SEQ ID NO. 15), CCGGCT (SEQ ID NO. 16), GGAAGT (SEQ ID NO. 17), GACTGT (SEQ ID NO. 18), GAGAGT (SEQ ID NO. 19), CGGCCT (SEQ ID NO. 20), CCAACT (SEQ ID NO. 21), ACCAAT (SEQ ID NO. 22), CACACT (SEQ ID NO. 23), GAAGGT (SEQ ID NO. 24), GGGGGG (SEQ ID NO. 25), CGTACT (SEQ ID NO. 26), CAGTCT (SEQ ID NO. 27), AGGAAT (SEQ ID NO. 28), ACTGAT (SEQ ID NO. 29), ACGTAT (SEQ ID NO. 30), TACGTT (SEQ ID NO. 31), CCTTCT (SEQ ID NO. 32), GATCGT (SEQ ID NO. 33), GCATGT (SEQ ID NO. 34), AACCAT (SEQ ID NO. 35), CCCCCT (SEQ ID NO. 36), CGATCT (SEQ ID NO. 37), TCGATT (SEQ ID NO. 38), TGCATT (SEQ ID NO. 39), CAACCT (SEQ ID NO. 40), GGTTGT (SEQ ID NO. 41), AAGGAT (SEQ ID NO. 42), AGCTAT (SEQ ID NO. 43), AAAAAT (SEQ ID NO. 44), ACACAT (SEQ ID NO. 45), AATTAT (SEQ ID NO. 46), TCTCTT (SEQ ID NO. 47), TCAGTT (SEQ ID NO. 48), TATATT (SEQ ID NO. 49), AGTCAT (SEQ ID NO. 50), TAGCTT (SEQ ID NO. 51), TGTGTT (SEQ ID NO. 52), TGGTTT (SEQ ID NO. 53), TAATTT (SEQ ID NO. 54), TCCTTT (SEQ ID NO. 55), and TGACTT (SEQ ID NO. 56).
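Because pooled samples are told apart only by their barcodes, it can be useful to check how many positions separate any two barcodes in a chosen set. The following is a minimal sketch of such a check, not part of the disclosed method; the function names are hypothetical, and only four of the barcodes listed above (SEQ ID NOs. 11-14) are used for brevity.

```python
from itertools import combinations

# Illustrative subset of the listed barcodes (SEQ ID NOs. 11-14).
BARCODES = ["AGAGAT", "CATGCT", "CGCGCT", "GCCGGT"]


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes):
    """Return the smallest Hamming distance over all distinct barcode pairs."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))
```

A larger minimum distance means that more sequencing errors are needed before one barcode is misread as another; whether a given distance is sufficient depends on the platform and is not addressed by this sketch.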
[0008] Further, the forward amplification primer of the provided method includes a sequence selected from SEQ ID NO. 7, SEQ ID NO. 8 and SEQ ID NO. 57. The reverse amplification primer of the method includes a sequence selected from the group consisting of SEQ ID NO. 9, SEQ ID NO. 10 and SEQ ID NO. 58. The DNA sequencing of the method is selected from Sanger sequencing, pyrosequencing, SOLiD sequencing, massively parallel sequencing, and derivatives thereof.
[0009] Another aspect of the invention provides a DNA construct. The DNA construct comprises: a first sequence derived from a miRNA; a second sequence selected from the group consisting of SEQ ID NO. 1, SEQ ID NO. 2, SEQ ID NO. 3 and SEQ ID NO. 4; and a third sequence selected from the group consisting of SEQ ID NO. 5 and SEQ ID NO. 6. The DNA construct further comprises a fourth sequence selected from the group consisting of SEQ ID NO. 7, SEQ ID NO. 8 and SEQ ID NO. 57. The DNA construct further comprises a fifth sequence selected from the group consisting of SEQ ID NO. 9, SEQ ID NO. 10 and SEQ ID NO. 58. The DNA construct further comprises a barcode sequence of 2, 3, 4, 5, 6, 7, or more nucleotides in length. The barcode can comprise 6 or fewer, but more than 2, nucleotides. Such barcode sequence in the DNA construct is selected from the group consisting of AGAGAT (SEQ ID NO. 11), CATGCT (SEQ ID NO. 12), CGCGCT (SEQ ID NO. 13), GCCGGT (SEQ ID NO. 14), GCTAGT (SEQ ID NO. 15), CCGGCT (SEQ ID NO. 16), GGAAGT (SEQ ID NO. 17), GACTGT (SEQ ID NO. 18), GAGAGT (SEQ ID NO. 19), CGGCCT (SEQ ID NO. 20), CCAACT (SEQ ID NO. 21), ACCAAT (SEQ ID NO. 22), CACACT (SEQ ID NO. 23), GAAGGT (SEQ ID NO. 24), GGGGGG (SEQ ID NO. 25), CGTACT (SEQ ID NO. 26), CAGTCT (SEQ ID NO. 27), AGGAAT (SEQ ID NO. 28), ACTGAT (SEQ ID NO. 29), ACGTAT (SEQ ID NO. 30), TACGTT (SEQ ID NO. 31), CCTTCT (SEQ ID NO. 32), GATCGT (SEQ ID NO. 33), GCATGT (SEQ ID NO. 34), AACCAT (SEQ ID NO. 35), CCCCCT (SEQ ID NO. 36), CGATCT (SEQ ID NO. 37), TCGATT (SEQ ID NO. 38), TGCATT (SEQ ID NO. 39), CAACCT (SEQ ID NO. 40), GGTTGT (SEQ ID NO. 41), AAGGAT (SEQ ID NO. 42), AGCTAT (SEQ ID NO. 43), AAAAAT (SEQ ID NO. 44), ACACAT (SEQ ID NO. 45), AATTAT (SEQ ID NO. 46), TCTCTT (SEQ ID NO. 47), TCAGTT (SEQ ID NO. 48), TATATT (SEQ ID NO. 49), AGTCAT (SEQ ID NO. 50), TAGCTT (SEQ ID NO. 51), TGTGTT (SEQ ID NO. 52), TGGTTT (SEQ ID NO. 53), TAATTT (SEQ ID NO. 54), TCCTTT (SEQ ID NO. 55), and TGACTT (SEQ ID NO. 56).
[0010] Yet another aspect of the invention provides a kit for generating a DNA construct comprising a sequence derived from a miRNA. The kit comprises: an adaptor wherein said adaptor includes a sequence selected from the group consisting of SEQ ID NO. 1, SEQ ID NO. 2, SEQ ID NO. 3 and SEQ ID NO. 4; and a reverse transcription primer wherein the reverse transcription primer is selected from the group consisting of SEQ ID NO. 5 and SEQ ID NO. 6. The kit further comprises: a forward amplification primer, wherein said forward amplification primer includes a sequence selected from the group consisting of SEQ ID NO. 7, SEQ ID NO. 8 and SEQ ID NO. 57; and a reverse amplification primer, wherein said reverse amplification primer includes a sequence selected from the group consisting of SEQ ID NO. 9, SEQ ID NO. 10 and SEQ ID NO. 58. The adaptor of the kit comprises a barcode sequence having a length of at least 2 nucleotides. The barcode sequence is between 2 and 6 nucleotides in length. Such a barcode in the kit is selected from the group consisting of AGAGAT (SEQ ID NO. 11), CATGCT (SEQ ID NO. 12), CGCGCT (SEQ ID NO. 13), GCCGGT (SEQ ID NO. 14), GCTAGT (SEQ ID NO. 15), CCGGCT (SEQ ID NO. 16), GGAAGT (SEQ ID NO. 17), GACTGT (SEQ ID NO. 18), GAGAGT (SEQ ID NO. 19), CGGCCT (SEQ ID NO. 20), CCAACT (SEQ ID NO. 21), ACCAAT (SEQ ID NO. 22), CACACT (SEQ ID NO. 23), GAAGGT (SEQ ID NO. 24), GGGGGG (SEQ ID NO. 25), CGTACT (SEQ ID NO. 26), CAGTCT (SEQ ID NO. 27), AGGAAT (SEQ ID NO. 28), ACTGAT (SEQ ID NO. 29), ACGTAT (SEQ ID NO. 30), TACGTT (SEQ ID NO. 31), CCTTCT (SEQ ID NO. 32), GATCGT (SEQ ID NO. 33), GCATGT (SEQ ID NO. 34), AACCAT (SEQ ID NO. 35), CCCCCT (SEQ ID NO. 36), CGATCT (SEQ ID NO. 37), TCGATT (SEQ ID NO. 38), TGCATT (SEQ ID NO. 39), CAACCT (SEQ ID NO. 40), GGTTGT (SEQ ID NO. 41), AAGGAT (SEQ ID NO. 42), AGCTAT (SEQ ID NO. 43), AAAAAT (SEQ ID NO. 44), ACACAT (SEQ ID NO. 45), AATTAT (SEQ ID NO. 46), TCTCTT (SEQ ID NO. 47), TCAGTT (SEQ ID NO. 48), TATATT (SEQ ID NO. 49), AGTCAT (SEQ ID NO. 50), TAGCTT (SEQ ID NO. 51), TGTGTT (SEQ ID NO. 52), TGGTTT (SEQ ID NO. 53), TAATTT (SEQ ID NO. 54), TCCTTT (SEQ ID NO. 55), and TGACTT (SEQ ID NO. 56).
[0011] Still another aspect of the invention provides an isolated sequence having at least 80% sequence identity with a sequence selected from the group consisting of SEQ ID NO. 1, SEQ ID NO. 2, SEQ ID NO. 3, SEQ ID NO. 4, SEQ ID NO. 5, SEQ ID NO. 6, SEQ ID NO. 7, SEQ ID NO. 8, SEQ ID NO. 57, and SEQ ID NO. 58. The isolated sequence having at least 80% sequence identity with a sequence selected from the group consisting of SEQ ID NO. 1, SEQ ID NO. 2, SEQ ID NO. 3, and SEQ ID NO. 4 comprises a barcode sequence having a length of at least 2 nucleotides; said barcode sequence has a length between 2 and 6 nucleotides. Such a barcode in these isolated sequences is selected from the group consisting of AGAGAT (SEQ ID NO. 11), CATGCT (SEQ ID NO. 12), CGCGCT (SEQ ID NO. 13), GCCGGT (SEQ ID NO. 14), GCTAGT (SEQ ID NO. 15), CCGGCT (SEQ ID NO. 16), GGAAGT (SEQ ID NO. 17), GACTGT (SEQ ID NO. 18), GAGAGT (SEQ ID NO. 19), CGGCCT (SEQ ID NO. 20), CCAACT (SEQ ID NO. 21), ACCAAT (SEQ ID NO. 22), CACACT (SEQ ID NO. 23), GAAGGT (SEQ ID NO. 24), GGGGGG (SEQ ID NO. 25), CGTACT (SEQ ID NO. 26), CAGTCT (SEQ ID NO. 27), AGGAAT (SEQ ID NO. 28), ACTGAT (SEQ ID NO. 29), ACGTAT (SEQ ID NO. 30), TACGTT (SEQ ID NO. 31), CCTTCT (SEQ ID NO. 32), GATCGT (SEQ ID NO. 33), GCATGT (SEQ ID NO. 34), AACCAT (SEQ ID NO. 35), CCCCCT (SEQ ID NO. 36), CGATCT (SEQ ID NO. 37), TCGATT (SEQ ID NO. 38), TGCATT (SEQ ID NO. 39), CAACCT (SEQ ID NO. 40), GGTTGT (SEQ ID NO. 41), AAGGAT (SEQ ID NO. 42), AGCTAT (SEQ ID NO. 43), AAAAAT (SEQ ID NO. 44), ACACAT (SEQ ID NO. 45), AATTAT (SEQ ID NO. 46), TCTCTT (SEQ ID NO. 47), TCAGTT (SEQ ID NO. 48), TATATT (SEQ ID NO. 49), AGTCAT (SEQ ID NO. 50), TAGCTT (SEQ ID NO. 51), TGTGTT (SEQ ID NO. 52), TGGTTT (SEQ ID NO. 53), TAATTT (SEQ ID NO. 54), TCCTTT (SEQ ID NO. 55), and TGACTT (SEQ ID NO. 56).
[0012] Other aspects and features of the disclosure are described more thoroughly below.
BRIEF DESCRIPTION OF THE FIGURES
[0013] Figure 1 depicts an example of generating a DNA template from miRNA in preparation for sequencing;
[0014] Fig. 2 depicts gel electrophoresis of DNA templates containing sequences derived from miRNA; and
[0015] Fig. 3 depicts the distribution of DNA templates containing sequences derived from miRNA on an array configured to hybridize to miRNA.
DETAILED DESCRIPTION OF THE INVENTION
[0016] MiRNAs have been shown to be a major new class of regulatory gene products. For example, in human heart, liver or brain, miRNAs play a role in tissue specification or cell lineage decisions. In addition, miRNAs influence a variety of processes, including early development, cell proliferation, cell death, apoptosis and fat metabolism. The large number of miRNA genes, the diverse expression patterns and the abundance of potential miRNA targets suggest that miRNAs may be a significant but unrecognized source of human genetic disease. Differences in miRNA expression have also been found to be associated with cancer diagnosis, prognosis, and susceptibility to treatments.
[0017] A mature miRNA is typically an 18-25 nucleotide, non-coding RNA that regulates expression of mRNA including sequences complementary to the miRNA. These small RNA molecules are known to control gene expression by regulating the stability and/or translation of mRNAs. For example, miRNAs bind to the 3' UTR of target mRNAs and suppress translation. MiRNAs may also bind to target mRNAs and mediate gene silencing through the RNAi pathway. MiRNAs may also regulate gene expression by causing chromatin condensation.
[0018] Endogenously expressed miRNAs are processed by endonucleolytic cleavage from larger double-stranded RNA precursor molecules. The resulting small single-stranded miRNAs are incorporated into a multi-protein complex, termed RISC. The small RNA in RISC provides sequence information that is used to guide the RNA-protein complex to its target RNA molecules. The degree of complementarity between the small RNA and its target determines the fate of the bound mRNA. Perfect pairing induces target RNA cleavage, as is the case for siRNAs and most plant miRNAs. In comparison, imperfect pairing in the central part of the duplex leads to a block in translation.
[0019] miRNAs regulate various biological functions including developmental processes, developmental timing, cell proliferation, neuronal gene expression and cell fate, apoptosis, tissue growth, viral pathogenesis, brain morphogenesis, muscle differentiation, stem cell division and progression of human diseases. Many miRNAs are conserved in sequence and function between distantly related organisms. However, condition-specific, time-specific, and individual-specific levels of gene expression may be due to the interactions of different miRNAs which lead to genetic expression of various traits. The large number of miRNA genes, the diverse expression patterns and the abundance of potential miRNA targets suggest that miRNAs may be a significant but unrecognized source of human genetic diseases. miRNA genetic alterations, such as deletions, insertions, reversions or conversions, may affect the accuracy of miRNA related gene regulation. miRNA genetic alterations may be used as markers for disease prognosis and diagnosis. miRNA alleles may alternatively be used as targets for disease treatment, and markers for disease prognosis and diagnosis. Common methods of analyzing miRNA such as array-based methods are unable to detect mutated miRNA.
[0020] The invention provides methods for detecting the presence of a known or unknown miRNA in a sample or total RNA provided. A sample may be derived from a subject. A subject may be any organism including bacteria, fungi, plants, animals that include chordates, mammals, humans, insects, endangered species, or any other organism of agricultural, environmental, or other significance. Samples derived from animals include, but are not limited to, biopsy or other in vivo or ex vivo analysis of prostate, breast, skin, muscle, fascia, brain, endometrium, lung, head and neck, pancreas, small intestine, blood, liver, testes, ovaries, colon, skin, stomach, esophagus, spleen, lymph node, bone marrow, kidney, placenta, or fetus. Samples derived from subjects may also take the form of a fluid sample such as peripheral blood, lymph fluid, ascites, serous fluid, pleural effusion, sputum, bronchial wash, bronchioalveolar lavage fluid (BALF), cerebrospinal fluid, semen, amniotic fluid, lacrimal fluid, stool, urine, or any other source in which a miRNA might be present. Samples may be collected by any method now known or yet to be disclosed, including swiping or swabbing an area or orifice, removal of a piece of tissue as in a biopsy, or any method known to collect bodily fluids. In one embodiment, samples may be derived from a subject displaying symptoms of a cancer and/or from a subject suspected of having cancer. In another embodiment, a sample may be from any environmental source including soil, air, water, solid surfaces (whether natural or artificial), culture media, foodstuffs, and any interfaces between or combinations of these elements.
[0021] The invention may comprise a step of isolating (interchangeably called purifying or extracting) miRNA from a sample. miRNA is also readily detectable in blood and blood compartments such as serum or plasma or whole blood by any of a number of methods. miRNA may be isolated from a sample by any method now known in the art or yet to be disclosed that may be used to isolate RNA. Such methods include guanidinium thiocyanate phenol-chloroform extraction, also known as Trizol® extraction, and spin-column based methods such as methods involving glass fiber filter columns. The method may alternatively include a method that removes highly expressed miRNA.
[0022] The invention may further comprise a step of adding a poly-A tail to the miRNA. For example, a mixture with a total RNA of 500ng - ^g may be used. The poly-A tail may contain 3, 4, 5 or any higher number of As. The poly-A tail may be added by any appropriate method. For example, a poly-A tail may be added to the 3' end of RNA through the following conditions: RNA in a mixture of 50 mM Tris (pH 8.0), 250 mM NaCl, 10 mM MgCl2, 1 mM DTT, 0.1 mM ATP, and 0.4 unit of Poly(A) polymerase is incubated at 37 °C for 30 minutes. This may be followed by 95 °C incubation for 5 minutes to inactivate the Poly(A) polymerase.
[0023] The invention may further comprise a step of ligating one or more adaptor molecules to the miRNA. The length of the adaptor may and will vary with regard to the length of the barcode region and the length of areas to which sequencing primers, PCR primers, or other primers may bind or hybridize. In one embodiment of the invention, a first adaptor sequence may be added to the 5' end of the miRNA and a second adaptor sequence may be added to the 3' end of the miRNA. An adaptor sequence comprises at least one universal priming sequence that is known to those skilled in the art. The universal priming sequence may be any sequence that is unlikely to hybridize to miRNA sequences. The adaptor may comprise one or more barcode (also known as index or zip code) sequences. Among other purposes, the one or more barcodes facilitate indexing and identifying the miRNA to be analyzed. Figure 1 depicts one of the embodiments of the present invention.
[0024] A barcode is any sequence of two or more nucleic acids that may aid in the identification of a nucleic acid as being derived from a particular sample. In one embodiment, a barcode may be any identifiable DNA sequence of two or more nucleotides. For example, barcodes may be 3, 4, 5, 6, 7, or more than 7 nucleotides in length. Nonlimiting examples of barcode sequences include AGAGAT (SEQ ID NO. 11), CATGCT (SEQ ID NO. 12), CGCGCT (SEQ ID NO. 13), GCCGGT (SEQ ID NO. 14), GCTAGT (SEQ ID NO. 15), CCGGCT (SEQ ID NO. 16), GGAAGT (SEQ ID NO. 17), GACTGT (SEQ ID NO. 18), GAGAGT (SEQ ID NO. 19), CGGCCT (SEQ ID NO. 20), CCAACT (SEQ ID NO. 21), ACCAAT (SEQ ID NO. 22), CACACT (SEQ ID NO. 23), GAAGGT (SEQ ID NO. 24), GGGGGG (SEQ ID NO. 25), CGTACT (SEQ ID NO. 26), CAGTCT (SEQ ID NO. 27), AGGAAT (SEQ ID NO. 28), ACTGAT (SEQ ID NO. 29), ACGTAT (SEQ ID NO. 30), TACGTT (SEQ ID NO. 31), CCTTCT (SEQ ID NO. 32), GATCGT (SEQ ID NO. 33), GCATGT (SEQ ID NO. 34), AACCAT (SEQ ID NO. 35), CCCCCT (SEQ ID NO. 36), CGATCT (SEQ ID NO. 37), TCGATT (SEQ ID NO. 38), TGCATT (SEQ ID NO. 39), CAACCT (SEQ ID NO. 40), GGTTGT (SEQ ID NO. 41), AAGGAT (SEQ ID NO. 42), AGCTAT (SEQ ID NO. 43), AAAAAT (SEQ ID NO. 44), ACACAT (SEQ ID NO. 45), AATTAT (SEQ ID NO. 46), TCTCTT (SEQ ID NO. 47), TCAGTT (SEQ ID NO. 48), TATATT (SEQ ID NO. 49), AGTCAT (SEQ ID NO. 50), TAGCTT (SEQ ID NO. 51), TGTGTT (SEQ ID NO. 52), TGGTTT (SEQ ID NO. 53), TAATTT (SEQ ID NO. 54), TCCTTT (SEQ ID NO. 55), and TGACTT (SEQ ID NO. 56). In one embodiment, the adaptor sequence comprises a sequence selected from a group consisting of: 5'-CCGCGTACTGGAAGATTTGCGCATTTTTATC-3' (SEQ ID NO. 1); 5'-CCGCGTACTGGAAGATTTGCGCATTTTTATC[X]nY-3' (SEQ ID NO. 2); 5'-CCGCGTACTGGAAGATTTGCC-3' (SEQ ID NO. 3); and 5'-CCGCGTACTGGAAGATTTGCC[X]nY-3' (SEQ ID NO. 4); wherein the "X" in brackets represents the barcode sequence, and wherein the letter "Y" represents optional additional bases that can be added immediately following the barcode region of the adaptor in order to minimize ligation efficiency variation due to the barcode sequence. The number of bases represented by Y may be 1, 2, 3, 4, 5 or any higher number that is suitable and does not affect the following steps of the invention disclosed herein. In one preferred embodiment, the optional bases represented by Y are 5'-AAA-3'. The letter "n" represents the number of nucleotides in the barcode sequence and "n" may be any positive whole number or zero. For purposes of the sequence listing, the "[X]nY" portion of these sequences is not included therein as these will be variable. As those skilled in the art will understand, the sequence listed in the sequence listing is the "base" sequence, to which the "[X]nY" portion, as defined above, is added. An adaptor may be ligated to a miRNA by any appropriate method including the following example: a mixture of 50 μM adaptor, 50 mM Tris (pH 8.0), 10 mM MgCl2, 10 mg/ml BSA, 1 mM hexamine CoCl2, 10% DMSO, 0.1 mM ATP, and 5 units of T4 RNA ligase may be incubated at 20 °C for 2 hours.
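The "[X]nY" notation can be illustrated with a short sketch that appends a barcode and the preferred AAA spacer to the SEQ ID NO. 3 base sequence, then uses the same layout to assign a read to a sample. This is an illustration only, under stated assumptions: the function names and sample labels are hypothetical, reads are assumed to be DNA strings that begin at the 5' end of the adaptor, matching is exact with no tolerance for sequencing errors, and all barcodes are assumed to have the same length.

```python
# Base adaptor sequence as given in the text for SEQ ID NO. 3.
ADAPTOR_BASE = "CCGCGTACTGGAAGATTTGCC"


def build_adaptor(barcode: str, spacer: str = "AAA", base: str = ADAPTOR_BASE) -> str:
    """Return base + [X]n + Y: the adaptor followed by barcode and spacer."""
    return base + barcode + spacer


def demultiplex(read: str, sample_by_barcode: dict, spacer: str = "AAA",
                base: str = ADAPTOR_BASE):
    """Return (sample, insert) for a read starting with a known barcoded
    adaptor, or None when the adaptor or barcode is not recognized."""
    if not read.startswith(base):
        return None
    rest = read[len(base):]
    for barcode, sample in sample_by_barcode.items():
        prefix = barcode + spacer
        if rest.startswith(prefix):
            # Everything after the spacer is treated as the miRNA-derived insert.
            return sample, rest[len(prefix):]
    return None
```

For example, with a hypothetical mapping `{"AGAGAT": "sample_1", "CATGCT": "sample_2"}`, a read made of `build_adaptor("CATGCT")` followed by a 22-nucleotide insert would be assigned to `sample_2` and the insert returned unchanged.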
[0025] The invention may further comprise a step of annealing of a reverse transcription primer to the construct. The reverse transcription primer may be any sequence that facilitates reverse transcription of the miRNA construct and may include a sequence of two or more T nucleotides. Nucleic acid sequences may be identified by the IUPAC letter code, which is as follows: A - adenine base; C - cytosine base; G - guanine base; T or U - thymine or uracil base; M - A or C; R - A or G; W - A or T; S - C or G; Y - C or T; K - G or T; V - A or C or G; H - A or C or T; D - A or G or T; B - C or G or T; N or X - A or C or G or T. Note that T or U may be used interchangeably depending on whether the nucleic acid is DNA or RNA. Nonlimiting examples of reverse transcription primers include:
5'-TTATAAGTATGCGCATTAAAATAGTCACGCTTTTTTTTTTTTVN-3' (SEQ ID NO. 5); and
5'-TTATAAGTATGCGCATTAACTTTTTTTTTTTTVN-3' (SEQ ID NO. 6).
[0026] The invention may further comprise a step of reverse transcription of the construct with a reverse transcriptase. The reverse transcriptase may be derived from any source including M-MLV, AMV, HIV, or any other enzyme or other chemical or physical entity capable of generating single stranded cDNA from an RNA template.
[0027] The invention may further comprise a step of nucleic acid amplification of the cDNA through any of a number of processes. Nucleic acids that may be subjected to amplification may be from any sample of any source or subject. In general, nucleic acid amplification is a process by which copies of a nucleic acid may be made from a source nucleic acid. In some nucleic acid amplification methods, the copies are generated exponentially. Examples of nucleic acid amplification include but are not limited to: the polymerase chain reaction (PCR), ligase chain reaction (LCR), self-sustained sequence replication (3SR), nucleic acid sequence based amplification (NASBA), strand displacement amplification (SDA), amplification with Qβ replicase, whole genome amplification with enzymes such as φ29, whole genome PCR, in vitro transcription with Klenow or any other RNA polymerase, or any other method by which copies of a desired sequence are generated.
[0028] Polymerase chain reaction (PCR) is a particular method of amplifying DNA, generally involving the mixing of a nucleic acid sample, two or more primers, a DNA polymerase, which may be a thermostable DNA polymerase such as Taq or Pfu, and deoxyribonucleoside triphosphates (dNTPs). In general, the reaction mixture is subjected to temperature cycles comprising a denaturation stage (typically 80-100 °C); an annealing stage with a temperature that may be based on the melting temperature (Tm) of the primers and the degeneracy of the primers; and an extension stage at a suitable temperature, for which an exemplary range may be 40-75 °C. In real-time PCR analysis, additional reagents, methods, optical detection systems, and devices are used that allow a measurement of the magnitude of fluorescence in proportion to the concentration of amplified DNA. In such analyses, incorporation of fluorescent dye into the amplified strands may be detected, or labeled probes that bind to a specific sequence during the annealing phase release their fluorescent tags during the extension phase. Either of these will allow for the quantification of the amount of specific DNA present in the initial sample. RNA may be detected by PCR analysis by creating a DNA template from RNA through a reverse transcriptase enzyme. Primers for nucleic acid amplification may contain additional sequences that add features to the construct, including sites for protein binding such as restriction enzyme sites or promoter binding sites, or sites that facilitate additional methods of nucleic acid analysis, including sequences that facilitate DNA sequencing. Forward primers used in nucleic acid amplification may include sequences such as 5'-ATCTCCGCGTACTGGAAGATTTGC-3' (SEQ ID NO. 7) or 5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTCCGCGTACTGGAAGATTTGC-3' (SEQ ID NO. 8). Reverse primers used in nucleic acid amplification may include sequences such as 5'-AATGCGCATACTTATAA-3' (SEQ ID NO. 9) or 5'-CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCTGTTAATGCGCATACTTATAA-3' (SEQ ID NO. 10). The consensus sequence of the forward primer is: 5'-TAATACGACTCACTATAGGGCGATCGTCACCGTGTACAAANNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAA-3' (SEQ ID NO. 57); and the consensus sequence of the reverse primer is: 5'-TCGGCCTGCCTGAAAGCGTGGTGATTTCCGTTTTTTTTTTTTNNNNNNNNNNNNNNNNNNNNNNNNTTTGTACACG-3' (SEQ ID NO. 58). Each of SEQ ID NO. 57 and SEQ ID NO. 58 is 76 bases long. The 24 Ns of SEQ ID NO. 57 and SEQ ID NO. 58 represent where the miRNAs should be. Thus the 24 Ns of SEQ ID NO. 57 hybridize to the cDNA product containing the miRNA sequence. The first 40 nucleotides of SEQ ID NO. 57 are configured to bind to a sequencing primer. For SEQ ID NO. 58, the 24 Ns of SEQ ID NO. 57 hybridize to the cDNA product containing the miRNA sequence, and the first 40 nucleotides are configured to bind to the sequencing primer. The length of the forward and reverse PCR primers may be between 50-110 bases, as applicable. Preferably, the length of the forward and reverse PCR primers is between 60-100 bases. Still more preferably, the length of the forward and reverse PCR primers is between 70-90 bases.
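The stated length (76 bases) and number of N positions (24) of the two consensus primers can be checked with a short Python sketch such as the following. The repeat counts used to assemble each sequence (24 N, 12 A, 12 T) reflect a reading of the sequences as printed above and should be treated as an assumption; the helper name describe is hypothetical.

```python
# Consensus primers assembled from their parts. The run lengths are an
# assumption based on the printed sequences, not on the sequence listing.
SEQ_57 = "TAATACGACTCACTATAGGGCGATCGTCACCGTGTACAAA" + "N" * 24 + "A" * 12
SEQ_58 = "TCGGCCTGCCTGAAAGCGTGGTGATTTCCG" + "T" * 12 + "N" * 24 + "TTTGTACACG"


def describe(seq: str) -> dict:
    """Report total length, number of N positions, and 0-based index of the first N."""
    return {
        "length": len(seq),
        "n_count": seq.count("N"),
        "n_start": seq.index("N"),
    }


for name, seq in (("SEQ ID NO. 57", SEQ_57), ("SEQ ID NO. 58", SEQ_58)):
    print(name, describe(seq))
```

A check of this kind is a convenient guard against transcription errors when long primers containing homopolymer runs are ordered or re-entered by hand.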
[0029] For each of the sequences described above and identified by its SEQ ID NO., a sequence sharing about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99% or 100% identity with the described individual sequence is encompassed by the invention if it is capable of binding to the complementary sequence of the described individual sequence. "Sequence identity" as it is known in the art refers to a relationship between two or more polypeptide sequences or two or more polynucleotide sequences, namely a reference sequence and a given sequence to be compared with the reference sequence. Sequence identity is determined by comparing the given sequence to the reference sequence after the sequences have been optimally aligned to produce the highest degree of sequence similarity, as determined by the match between strings of such sequences. Upon such alignment, sequence identity is ascertained on a position-by-position basis, e.g., the sequences are "identical" at a particular position if at that position, the nucleotides or amino acid residues are identical. The total number of such position identities is then divided by the total number of nucleotides or residues in the reference sequence to give % sequence identity. Sequence identity can be readily calculated by known methods, including but not limited to, those described in Computational Molecular Biology, Lesk, A. M., ed., Oxford University Press, New York (1988); Biocomputing: Informatics and Genome Projects, Smith, D.W., ed., Academic Press, New York (1993); Computer Analysis of Sequence Data, Part I, Griffin, A.M., and Griffin, H. G., eds., Humana Press, New Jersey (1994); Sequence Analysis in Molecular Biology, von Heijne, G., Academic Press (1987); Sequence Analysis Primer, Gribskov, M. and Devereux, J., eds., M. Stockton Press, New York (1991); and Carillo, H., and Lipman, D., SIAM J. Applied Math., 48: 1073 (1988), the teachings of which are incorporated herein by reference.
Preferred methods to determine the sequence identity are designed to give the largest match between the sequences tested. Methods to determine sequence identity are codified in publicly available computer programs which determine sequence identity between given sequences. Examples of such programs include, but are not limited to, the GCG program package (Devereux, J., et al., Nucleic Acids Research, 12(1):387 (1984)), BLASTP, BLASTN and FASTA (Altschul, S. F. et al., J. Molec. Biol., 215:403-410 (1990)). The BLASTX program is publicly available from NCBI and other sources (BLAST Manual, Altschul, S. et al., NCBI NLM NIH Bethesda, MD 20894; Altschul, S. F. et al., J. Molec. Biol., 215:403-410 (1990), the teachings of which are incorporated herein by reference). These programs optimally align sequences using default gap weights in order to produce the highest level of sequence identity between the given and reference sequences. As an illustration, by a polynucleotide having a nucleotide sequence having at least, for example, 85%, preferably 90%, even more preferably 95% "sequence identity" to a reference nucleotide sequence, it is intended that the nucleotide sequence of the given polynucleotide is identical to the reference sequence except that the given polynucleotide sequence may include up to 15, preferably up to 10, even more preferably up to 5 point mutations per each 100 nucleotides of the reference nucleotide sequence. In other words, in a polynucleotide having a nucleotide sequence having at least 85%, preferably 90%, even more preferably 95% identity relative to the reference nucleotide sequence, up to 15%, preferably 10%, even more preferably 5% of the nucleotides in the reference sequence may be deleted or substituted with another nucleotide, or a number of nucleotides up to 15%, preferably 10%, even more preferably 5% of the total nucleotides in the reference sequence may be inserted into the reference sequence.
These mutations of the reference sequence may occur at the 5' or 3' terminal positions of the reference nucleotide sequence or anywhere between those terminal positions, interspersed either individually among nucleotides in the reference sequence or in one or more contiguous groups within the reference sequence.
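By way of illustration only, the position-by-position calculation described above (identical positions divided by the number of nucleotides in the reference sequence) may be sketched in Python as follows. The sketch assumes that the two sequences have already been optimally aligned by an external program and are supplied as equal-length strings in which "-" marks a gap; the function name percent_identity and the mutated example sequences are hypothetical.

```python
def percent_identity(aligned_given: str, aligned_reference: str) -> float:
    """Percent identity of a given sequence relative to a reference.

    Both inputs must come from the same pairwise alignment, so they have
    equal length; "-" denotes a gap. The denominator is the number of
    nucleotides in the reference sequence, as described in the text.
    """
    if len(aligned_given) != len(aligned_reference):
        raise ValueError("aligned sequences must have the same length")
    given = aligned_given.upper()
    reference = aligned_reference.upper()
    # Count aligned columns where both sequences carry the same nucleotide.
    identities = sum(1 for g, r in zip(given, reference) if r != "-" and g == r)
    # Reference length excludes gap columns introduced by the alignment.
    reference_length = sum(1 for r in reference if r != "-")
    if reference_length == 0:
        raise ValueError("reference sequence is empty")
    return 100.0 * identities / reference_length


reference = "AATGCGCATACTTATAA"      # SEQ ID NO. 9 (17 nucleotides)
one_mismatch = "AATGCGCATACTTATAT"   # hypothetical variant, last base substituted
print(percent_identity(one_mismatch, reference))   # 16 of 17 positions identical
print(percent_identity("AATG-GCA", "AATGCGCA"))    # one deletion, 7 of 8 identical
```

The alignment step itself is deliberately left out of the sketch, since the text above refers to established programs for that purpose.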
[0030] The invention may further comprise the step of sequencing the amplified construct. Methods of sequencing include but need not be limited to any form of DNA sequencing including Sanger, next generation sequencing, pyrosequencing, SOLiD sequencing, massively parallel sequencing, pooled, and barcoded DNA sequencing or any other sequencing method now known or yet to be disclosed.
[0031] In Sanger sequencing, a single-stranded DNA template, a primer, a DNA polymerase, nucleotides, a label such as a radioactive label conjugated with the nucleotide base or a fluorescent label conjugated to the primer, and one chain terminator base comprising a dideoxynucleotide (ddATP, ddGTP, ddCTP, or ddTTP) are added to each of four reactions (one reaction for each of the chain terminator bases). The sequence may be determined by electrophoresis of the resulting strands. In dye terminator sequencing, each of the chain termination bases is labeled with a fluorescent label of a different wavelength, which allows the sequencing to be performed in a single reaction.
[0032] In pyrosequencing, the addition of a base to a single stranded template to be sequenced by a polymerase results in the release of a pyrophosphate upon nucleotide incorporation. An ATP sulfurylase enzyme converts pyrophosphate into ATP, which in turn drives the conversion of luciferin to oxyluciferin, resulting in the generation of visible light that is then detected by a camera or other sensor capable of capturing visible light.
[0033] In SOLiD sequencing, the molecule to be sequenced is fragmented and used to prepare a population of clonal magnetic beads (in which each bead is conjugated to a plurality of copies of a single fragment) with an adaptor sequence and alternatively a barcode sequence. The beads are bound to a glass surface. Sequencing is then performed through 2-base encoding.
[0034] In massively parallel sequencing, randomly fragmented targeted DNA is attached to a surface. The fragments are extended and bridge amplified to create a flow cell with clusters, each with a plurality of copies of a single fragment sequence. The templates are sequenced by synthesizing the fragments in parallel. Bases are indicated by the release of a fluorescent dye correlating to the addition of the particular base to the fragment. Nucleic acid sequences may be identified by the IUPAC letter code, which is as follows: A - adenine base; C - cytosine base; G - guanine base; T or U - thymine or uracil base; M - A or C; R - A or G; W - A or T; S - C or G; Y - C or T; K - G or T; V - A or C or G; H - A or C or T; D - A or G or T; B - C or G or T; N or X - A or C or G or T. Note that T or U may be used interchangeably depending on whether the nucleic acid is DNA or RNA. A sequence having less than 60%, 70%, 80%, 90%, 95%, 99% or 100% identity to the identifying sequence may still be encompassed by the invention if it is capable of binding to its complementary sequence and/or facilitating nucleic acid amplification of a desired target sequence. "Sequence identity" is defined as provided above.
[0035] A sequence is called degenerate if some of its positions have several possible bases. If a sequence is represented in degenerate form, for example, through the use of codes other than A, C, G, T, or U (for example: R, Y, M, K, S, W, H, B, V, D, N according to the IUPAC nomenclature of mixed bases; see above for details), the nucleic acid sequence presented in degenerate form includes individual sequences, each of which has a specific nucleotide encompassed by the IUPAC code at the degenerate position. In some embodiments, a degenerate sequence is applied as a mixture of nucleic acids of individual sequences, each having a specific nucleotide encompassed by the IUPAC code at the degenerate position.
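As an illustration of how a degenerate sequence corresponds to a set of individual sequences, the following Python sketch expands IUPAC codes into every specific sequence they encompass. The function name expand_degenerate is hypothetical; the code table follows the letter code given above, with U treated as T so that the output is written as DNA.

```python
from itertools import product

# IUPAC letter code as listed in the description (U is written as T here).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT",
    "N": "ACGT", "X": "ACGT",
}


def expand_degenerate(seq: str) -> list:
    """Return every individual sequence encompassed by a degenerate sequence."""
    choices = [IUPAC[base] for base in seq.upper()]
    return ["".join(combo) for combo in product(*choices)]


# The 3' end of the reverse transcription primers ends in "...TTVN";
# V allows 3 bases and N allows 4, giving 3 x 4 = 12 individual sequences.
tails = expand_degenerate("TTVN")
print(len(tails), tails)
```

Because the number of individual sequences multiplies at each degenerate position, exhaustive expansion is practical only for sequences with a few such positions, such as the two-base VN anchor shown here.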
[0036] A nucleic acid may be added to a sample by any of a number of methods including manual methods, mechanical methods, or any combination thereof.
[0037] Therefore, one aspect of the present invention provides a method of generating a DNA construct, and the method comprises isolating miRNA from a first sample; adding a first adaptor to a first mixture comprising the miRNA, wherein the first adaptor includes a sequence selected from the group consisting of SEQ ID NO. 1 and SEQ ID NO. 3; adding a reverse transcription primer to the mixture, wherein the reverse transcription primer includes a sequence selected from SEQ ID NO. 5 and SEQ ID NO. 6; adding a forward amplification primer to the first mixture; adding a reverse amplification primer to the first mixture; and subjecting the mixture to conditions that allow addition of a poly-A tail to the miRNA, ligation of the adaptor to the miRNA, generation of a cDNA template by reverse transcription, and amplification of the nucleic acid. The forward amplification primer may include a sequence represented by SEQ ID NO. 7. If the primer includes a sequence represented by SEQ ID NO. 7, then the primer may also include SEQ ID NO. 8. The reverse amplification primer may include SEQ ID NO. 9. If the primer includes SEQ ID NO. 9, it may also include SEQ ID NO. 10.
[0038] The first adaptor may comprise a first barcode. If the first adaptor comprises a first barcode, the method may further comprise the steps of isolating miRNA from a second sample; adding a second adaptor to a second mixture comprising the miRNA from the second sample, wherein the second adaptor includes a sequence selected from SEQ ID NO. 1 and SEQ ID NO. 3 and further comprises a second barcode; and adding at least a portion of the second mixture to the first mixture. The barcode may comprise any number of nucleotides including six or fewer nucleotides. Nonlimiting exemplary barcodes include the following sequences: AGAGAT (SEQ ID NO. 11), CATGCT (SEQ ID NO. 12), CGCGCT (SEQ ID NO. 13), GCCGGT (SEQ ID NO. 14), GCTAGT (SEQ ID NO. 15), CCGGCT (SEQ ID NO. 16), GGAAGT (SEQ ID NO. 17), GACTGT (SEQ ID NO. 18), GAGAGT (SEQ ID NO. 19), CGGCCT (SEQ ID NO. 20), CCAACT (SEQ ID NO. 21), ACCAAT (SEQ ID NO. 22), CACACT (SEQ ID NO. 23), GAAGGT (SEQ ID NO. 24), GGGGGG (SEQ ID NO. 25), CGTACT (SEQ ID NO. 26), CAGTCT (SEQ ID NO. 27), AGGAAT (SEQ ID NO. 28), ACTGAT (SEQ ID NO. 29), ACGTAT (SEQ ID NO. 30), TACGTT (SEQ ID NO. 31), CCTTCT (SEQ ID NO. 32), GATCGT (SEQ ID NO. 33), GCATGT (SEQ ID NO. 34), AACCAT (SEQ ID NO. 35), CCCCCT (SEQ ID NO. 36), CGATCT (SEQ ID NO. 37), TCGATT (SEQ ID NO. 38), TGCATT (SEQ ID NO. 39), CAACCT (SEQ ID NO. 40), GGTTGT (SEQ ID NO. 41), AAGGAT (SEQ ID NO. 42), AGCTAT (SEQ ID NO. 43), AAAAAT (SEQ ID NO. 44), ACACAT (SEQ ID NO. 45), AATTAT (SEQ ID NO. 46), TCTCTT (SEQ ID NO. 47), TCAGTT (SEQ ID NO. 48), TATATT (SEQ ID NO. 49), AGTCAT (SEQ ID NO. 50), TAGCTT (SEQ ID NO. 51), TGTGTT (SEQ ID NO. 52), TGGTTT (SEQ ID NO. 53), TAATTT (SEQ ID NO. 54), TCCTTT (SEQ ID NO. 55), and TGACTT (SEQ ID NO. 56). The method may further comprise the step of performing DNA sequencing on the DNA template. Preferred methods of DNA sequencing include Sanger sequencing, pyrosequencing, SOLiD sequencing, or massively parallel sequencing.
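Once barcoded samples have been pooled and sequenced, reads may be assigned back to their samples by barcode. The following Python sketch illustrates one way this could be done; it is not the method of the invention. It assumes, purely for illustration, that each read begins with the six-nucleotide barcode. The sample names, the toy reads and the function name demultiplex are hypothetical; only the three barcodes are taken from the list above.

```python
from collections import defaultdict

# Three barcodes from the exemplary list, mapped to hypothetical sample names.
SAMPLE_BARCODES = {
    "AGAGAT": "sample_1",  # SEQ ID NO. 11
    "CATGCT": "sample_2",  # SEQ ID NO. 12
    "CGCGCT": "sample_3",  # SEQ ID NO. 13
}


def demultiplex(reads, barcode_to_sample, barcode_start=0, barcode_len=6):
    """Group reads by sample and strip the barcode from each read.

    Assumption: the barcode occupies a fixed window of every read
    (by default the first six bases). Reads whose barcode is not in
    the table are collected under "unassigned".
    """
    bins = defaultdict(list)
    end = barcode_start + barcode_len
    for read in reads:
        barcode = read[barcode_start:end].upper()
        sample = barcode_to_sample.get(barcode, "unassigned")
        bins[sample].append(read[end:])
    return dict(bins)


# Toy reads: barcode followed by an arbitrary insert.
toy_reads = ["AGAGATAAATTT", "CATGCTAAAGGG", "NNNNNNAAACCC"]
print(demultiplex(toy_reads, SAMPLE_BARCODES))
```

Exact matching is the simplest policy; tolerating sequencing errors in the barcode would require a mismatch-aware lookup, which is outside the scope of this sketch.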
[0039] Another aspect of the present invention provides a DNA construct, which comprises a first sequence derived from a miRNA; a second sequence selected from the group consisting of SEQ ID NO. 1 and SEQ ID NO. 3; and a third sequence selected from the group consisting of SEQ ID NO. 5 and SEQ ID NO. 6. In some forms of the invention, the second sequence may be 5' of the first sequence. In some forms of the invention, the third sequence may be 3' of the first sequence. The construct may further comprise a barcode sequence. The barcode sequence may be 5' of the first sequence. The barcode may comprise six or fewer nucleotides such as AGAGAT (SEQ ID NO. 11), CATGCT (SEQ ID NO. 12), CGCGCT (SEQ ID NO. 13), GCCGGT (SEQ ID NO. 14), GCTAGT (SEQ ID NO. 15), CCGGCT (SEQ ID NO. 16), GGAAGT (SEQ ID NO. 17), GACTGT (SEQ ID NO. 18), GAGAGT (SEQ ID NO. 19), CGGCCT (SEQ ID NO. 20), CCAACT (SEQ ID NO. 21), ACCAAT (SEQ ID NO. 22), CACACT (SEQ ID NO. 23), GAAGGT (SEQ ID NO. 24), GGGGGG (SEQ ID NO. 25), CGTACT (SEQ ID NO. 26), CAGTCT (SEQ ID NO. 27), AGGAAT (SEQ ID NO. 28), ACTGAT (SEQ ID NO. 29), ACGTAT (SEQ ID NO. 30), TACGTT (SEQ ID NO. 31), CCTTCT (SEQ ID NO. 32), GATCGT (SEQ ID NO. 33), GCATGT (SEQ ID NO. 34), AACCAT (SEQ ID NO. 35), CCCCCT (SEQ ID NO. 36), CGATCT (SEQ ID NO. 37), TCGATT (SEQ ID NO. 38), TGCATT (SEQ ID NO. 39), CAACCT (SEQ ID NO. 40), GGTTGT (SEQ ID NO. 41), AAGGAT (SEQ ID NO. 42), AGCTAT (SEQ ID NO. 43), AAAAAT (SEQ ID NO. 44), ACACAT (SEQ ID NO. 45), AATTAT (SEQ ID NO. 46), TCTCTT (SEQ ID NO. 47), TCAGTT (SEQ ID NO. 48), TATATT (SEQ ID NO. 49), AGTCAT (SEQ ID NO. 50), TAGCTT (SEQ ID NO. 51), TGTGTT (SEQ ID NO. 52), TGGTTT (SEQ ID NO. 53), TAATTT (SEQ ID NO. 54), TCCTTT (SEQ ID NO. 55), and TGACTT (SEQ ID NO. 56).
[0040] Yet another aspect of the invention provides kits to facilitate miRNA sequencing. A kit may include any combination of components that facilitates the performance of an assay. A kit may include suitable nucleic acid-based and immunological reagents as well as suitable buffers, control reagents, and printed protocols.
[0041] Kits that facilitate nucleic acid sequencing may further include one or more of the following: specific nucleic acids such as oligonucleotides, primers, or probes, labeling reagents, enzymes including PCR amplification reagents such as Taq or Pfu, reverse transcriptase, or one or more other polymerases, and/or reagents that facilitate binding of the nucleic acids to their targets. Specific nucleic acids may further include nucleic acids, polynucleotides, oligonucleotides (DNA, or RNA), or any combination of molecules that includes one or more of the above, or any other molecular entity capable of specific binding to a nucleic acid sequence.
[0042] An oligonucleotide may be any polynucleotide of at least 2 nucleotides. Oligonucleotides may be less than 10, 15, 20, 30, 40, 50, 75, 100, 200, or 500 nucleotides in length. While oligonucleotides are often linear, they may, depending on their sequence and conditions, assume a two- or three-dimensional structure. Oligonucleotides may be chemically synthesized by any of a number of methods including sequential synthesis, solid phase synthesis, or any other synthesis method now known or yet to be disclosed. Alternatively, oligonucleotides may be produced by recombinant DNA based methods.
[0043] In one embodiment of the invention, the oligonucleotide may be affixed to a solid substrate. In another embodiment of the invention, the sample may be affixed to a solid substrate. A probe or sample may be covalently bound to the substrate or it may be bound by some noncovalent interaction including electrostatic, hydrophobic, hydrogen bonding, van der Waals, magnetic, or any other interaction by which a probe such as an oligonucleotide probe may be attached to a substrate while maintaining its ability to recognize the allele to which it has specificity. A substrate may be any solid or semisolid material onto which a probe may be affixed, attached or printed, either singly or in the formation of a microarray. Examples of substrate materials include but are not limited to polyvinyl, polystyrene, polypropylene, polyester or any other plastic, glass, silicon dioxide or other silanes, hydrogels, gold, platinum, microbeads, micelles and other lipid formations, nitrocellulose, or nylon membranes. The substrate may take any form, including a spherical bead or flat surface. For example, the probe may be bound to a substrate in the case of an array. The sample may be bound to a substrate as in, for example, a Southern blot, Northern blot or other method that affixes the sample to a substrate.
[0044] In one embodiment of the present invention, the kit for miRNA sequencing generally comprises a first adaptor, wherein the first adaptor includes a sequence selected from the group consisting of SEQ ID NO. 1 and SEQ ID NO. 3; and a reverse transcription primer, wherein the reverse transcription primer is selected from the group consisting of SEQ ID NO. 5 and SEQ ID NO. 6. Preferably, the first adaptor of the kit comprises a first barcode. If the first adaptor comprises a first barcode, then the kit may further comprise a second adaptor that includes a sequence selected from the group consisting of SEQ ID NO. 1 and SEQ ID NO. 3, wherein the second adaptor comprises a second barcode. The first barcode may comprise any number of nucleotides including six or fewer nucleotides such as AGAGAT (SEQ ID NO. 11), CATGCT (SEQ ID NO. 12), CGCGCT (SEQ ID NO. 13), GCCGGT (SEQ ID NO. 14), GCTAGT (SEQ ID NO. 15), CCGGCT (SEQ ID NO. 16), GGAAGT (SEQ ID NO. 17), GACTGT (SEQ ID NO. 18), GAGAGT (SEQ ID NO. 19), CGGCCT (SEQ ID NO. 20), CCAACT (SEQ ID NO. 21), ACCAAT (SEQ ID NO. 22), CACACT (SEQ ID NO. 23), GAAGGT (SEQ ID NO. 24), GGGGGG (SEQ ID NO. 25), CGTACT (SEQ ID NO. 26), CAGTCT (SEQ ID NO. 27), AGGAAT (SEQ ID NO. 28), ACTGAT (SEQ ID NO. 29), ACGTAT (SEQ ID NO. 30), TACGTT (SEQ ID NO. 31), CCTTCT (SEQ ID NO. 32), GATCGT (SEQ ID NO. 33), GCATGT (SEQ ID NO. 34), AACCAT (SEQ ID NO. 35), CCCCCT (SEQ ID NO. 36), CGATCT (SEQ ID NO. 37), TCGATT (SEQ ID NO. 38), TGCATT (SEQ ID NO. 39), CAACCT (SEQ ID NO. 40), GGTTGT (SEQ ID NO. 41), AAGGAT (SEQ ID NO. 42), AGCTAT (SEQ ID NO. 43), AAAAAT (SEQ ID NO. 44), ACACAT (SEQ ID NO. 45), AATTAT (SEQ ID NO. 46), TCTCTT (SEQ ID NO. 47), TCAGTT (SEQ ID NO. 48), TATATT (SEQ ID NO. 49), AGTCAT (SEQ ID NO. 50), TAGCTT (SEQ ID NO. 51), TGTGTT (SEQ ID NO. 52), TGGTTT (SEQ ID NO. 53), TAATTT (SEQ ID NO. 54), TCCTTT (SEQ ID NO. 55), and TGACTT (SEQ ID NO. 56).
[0045] The kit may further comprise a forward amplification primer that includes SEQ ID NO. 7. If the forward amplification primer includes SEQ ID NO. 7, then the forward amplification primer may also include SEQ ID NO. 8. The kit may further comprise a reverse amplification primer that includes SEQ ID NO. 9. If the reverse amplification primer includes SEQ ID NO. 9, then the reverse amplification primer may further include SEQ ID NO. 10. The kit may further comprise a DNA polymerase such as a thermostable DNA polymerase. The kit may further comprise a reverse transcriptase.
EXAMPLE
[0046] New "next-generation" high-throughput sequencing platforms, capable of generating massive amounts of reads very quickly, can be more effectively utilized for systematic and in-depth study of genetic variation.
[0047] This invention provides a reverse transcription process to facilitate barcoding of cDNA derived from miRNA. Furthermore, sample oligo-pooling is optimized to increase density and coverage on a flowcell, which substantially increases the throughput of sample analysis. Without oligo-pooling, sequencing of the miRNA transcriptome can be performed on 8 samples per flowcell, with 1 unique sample per channel. Using sample oligo-pooling, up to 400 samples per flowcell can be sequenced without loss of data.
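The throughput figures above can be related by simple arithmetic, sketched below in Python. The sketch assumes a flowcell of 8 channels, as implied by the statement that 8 samples can be run with 1 unique sample per channel; the function names are hypothetical.

```python
# Assumption: 8 channels per flowcell, implied by "8 samples ... with
# 1 unique sample per channel" in the paragraph above.
CHANNELS_PER_FLOWCELL = 8


def samples_per_flowcell(samples_per_channel: int,
                         channels: int = CHANNELS_PER_FLOWCELL) -> int:
    """Total samples on one flowcell for a given pooling depth per channel."""
    return samples_per_channel * channels


def pooling_depth_needed(target_samples: int,
                         channels: int = CHANNELS_PER_FLOWCELL) -> int:
    """Smallest number of pooled samples per channel that reaches the target."""
    return -(-target_samples // channels)  # ceiling division


print(samples_per_flowcell(1))      # no pooling
print(pooling_depth_needed(400))    # pooled samples per channel for 400 samples
```

Under this assumption, reaching 400 samples per flowcell corresponds to pooling 50 distinguishable samples in each channel.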
[0048] One example of the invention is outlined in Figure 1. This figure depicts the generation of a cDNA construct that may be used in next-generation sequencing that contains miRNA sequence. The cDNA derived from miRNA is 75-130 nucleotides in length (a 30-50 nucleotide 5' adaptor + a 15-30 nucleotide miRNA sequence + a 30-50 nucleotide reverse transcription primer). This example includes a barcode of 5-6 nucleotides in length near the 3' end of the adaptor, just 5' of the miRNA sequence.
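The stated cDNA length range follows from summing the ranges of the three parts. A minimal Python sketch of that arithmetic is given below; the variable and function names are hypothetical, and the ranges are those given in the paragraph above.

```python
# (minimum, maximum) lengths in nucleotides, as stated in the description.
ADAPTOR = (30, 50)      # 5' adaptor
MIRNA = (15, 30)        # miRNA sequence
RT_PRIMER = (30, 50)    # reverse transcription primer


def total_range(*parts):
    """Sum the minimum and maximum lengths of the listed parts."""
    return (sum(p[0] for p in parts), sum(p[1] for p in parts))


print(total_range(ADAPTOR, MIRNA, RT_PRIMER))  # overall cDNA length range
```

The result, 75 to 130 nucleotides, matches the range stated for the cDNA derived from miRNA.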
[0049] The example of the invention outlined in Figure 1 was validated through the following study. In this study, 3-5 additional bases are added immediately following the barcode region of the adaptor in order to minimize ligation efficiency variation due to the barcode sequence. Additional sequences were designed and synthesized on both PCR primers in order to facilitate sequencing on the Illumina Genome Analyzer. The consensus sequence of the forward primer is: 5'-TAATACGACTCACTATAGGGCGATCGTCACCGTGTACAAANNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAA-3' (SEQ ID NO. 57); and the consensus sequence of the reverse primer is: 5'-TCGGCCTGCCTGAAAGCGTGGTGATTTCCGTTTTTTTTTTTTNNNNNNNNNNNNNNNNNNNNNNNNTTTGTACACG-3' (SEQ ID NO. 58). Each of SEQ ID NO. 57 and SEQ ID NO. 58 is 76 bases long. The 24 Ns of SEQ ID NO. 57 and SEQ ID NO. 58 represent where the miRNAs should be. Thus the 24 Ns of SEQ ID NO. 57 hybridize to the cDNA product containing the miRNA sequence. The first 40 nucleotides of SEQ ID NO. 57 are configured to bind to a sequencing primer. For SEQ ID NO. 58, the 24 Ns of SEQ ID NO. 57 hybridize to the cDNA product containing the miRNA sequence, and the first 40 nucleotides are configured to bind to the sequencing primer. The length of the forward and reverse PCR primers may vary depending on the number of nucleotides configured to bind to a sequencing primer and the number of nucleotides that hybridize to the cDNA construct containing the miRNA sequence. The length of the forward and reverse PCR primers may be between 50-110 bases, as applicable. Preferably, the length of the forward and reverse PCR primers is between 60-100 bases. Still more preferably, the length of the forward and reverse PCR primers is between 70-90 bases. As one example, the forward PCR primer is 82 nucleotides in length. The first 58 nucleotides are configured to bind to a sequencing primer and the last 24 nucleotides hybridize to the cDNA product containing the miRNA sequence. In this example, the reverse PCR primer is 78 nucleotides in length. The 17 nucleotides at the 5'-end hybridize to the cDNA construct that includes the miRNA sequence. The remaining 61 nucleotides are configured to bind to the sequencing primer. In this example, the final cDNA product containing a miRNA sequence should be approximately 220 nucleotides in length.
[0050] Total RNA was extracted from HEK293 cells. Figure 2 depicts the cDNA resulting from the study. A DNA size ladder is in lane 1. Lanes 2 through 8 contain products from seven replicates of the exemplary method outlined in Figure 1. The bright bands at ~160bp are primer dimers. These are of the expected size given an 82 nucleotide forward primer and a 78 nucleotide reverse primer. The region around 220 base pairs was excised from the gel.
[0051] Figure 3 depicts a miRNA array analysis on cDNA excised from the gel. This confirms that the cDNA created using these primers contains a wide range of miRNA, in terms of expression level and content with similar expression levels of replicate miRNA.
[0052] The gel-extracted cDNA constructs were also analyzed by quantitative real-time PCR using primers capable of specifically amplifying the miRNAs let-7d and miR-125b. In Table 1, the row labeled qRT-PCR displays the Cp values for let-7d and miR-125b (relative to a no-template control). Lower numbers correlate with greater miRNA expression. The second row displays miRNA microarray signal intensity for let-7a and miR-125 (relative to background signal). By qRT-PCR and miRNA microarray, let-7a appears to be in higher abundance relative to miR-125b in this cDNA product.
[0053] The presence of let-7a and miR-125b was verified by Sanger sequencing of the quantitative real-time PCR product. The results for let-7a and miR-125 by qRT-PCR and miRNA microarray are shown in Table 1:
Figure imgf000022_0001

Claims

CLAIMS We claim:
1. A method of generating a DNA construct comprising a sequence derived from a miRNA molecule, comprising the steps of:
isolating one or more miRNA molecules from a first sample;
subjecting a first mixture comprising said miRNA molecule(s) to conditions that allow addition of a poly-A tail to the miRNA;
adding a first adaptor to said first mixture, wherein said first adaptor comprises a sequence selected from the group consisting of SEQ ID NO. 1, SEQ ID NO. 2, SEQ ID NO. 3 and SEQ ID NO. 4;
adding a reverse transcription primer to said first mixture, wherein said reverse transcription primer includes a sequence selected from the group consisting of SEQ ID NO. 5 and SEQ ID NO. 6;
adding a forward amplification primer and a reverse amplification primer to said first mixture;
amplifying said first mixture; and
obtaining an amplified DNA construct from said amplification, said amplified DNA construct comprising a sequence derived from said miRNA.
2. The method of claim 1, further comprising the step of sequencing said amplified DNA construct comprising the nucleic sequence of said miRNA molecule.
3. The method of claim 1, wherein said first adaptor comprises a first barcode sequence.
4. The method of claim 3, further comprising the steps of isolating miRNA from a second sample; adding a second adaptor to a second mixture comprising said miRNA from said second sample; wherein said second adaptor includes a sequence selected from the group consisting of SEQ ID NO. 1, SEQ ID NO. 2, SEQ ID NO. 3, and SEQ ID NO. 4 and wherein said second adaptor comprises a second barcode sequence that differs from said first barcode sequence; and adding at least a portion of said second mixture to said first mixture.
5. The method of claim 4, wherein said first barcode sequence and said second barcode sequence are each at least 2 nucleotides in length.
6. The method of claim 4, wherein said first barcode sequence and said second barcode sequence are each from 2 to 6 nucleotides in length.
7. The method of claim 6, wherein the first barcode sequence and the second barcode sequence are each individually selected from the group consisting of SEQ ID NOs. 11-56.
8. The method of claim 1, wherein said forward amplification primer includes a sequence selected from the group consisting of SEQ ID NO. 7, SEQ ID NO. 8 and SEQ ID No. 57.
9. The method of claim 1, wherein said reverse amplification primer includes a sequence selected from the group consisting of SEQ ID NO. 9, SEQ ID NO. 10 and SEQ ID No. 58.
11. The method of claim 2, wherein the DNA sequencing is selected from Sanger sequencing, pyrosequencing, SOLiD sequencing, massively parallel sequencing, and derivatives thereof.
12. A DNA construct comprising:
a first sequence derived from a miRNA;
a second sequence selected from the group consisting of SEQ ID NO. 1, SEQ ID NO. 2, SEQ ID NO. 3, and SEQ ID NO. 4; and
a third sequence selected from the group consisting of SEQ ID NO. 5 and SEQ ID NO. 6.
13. The DNA construct of claim 12, further comprising a fourth sequence selected from the group consisting of SEQ ID NO. 7, SEQ ID NO. 8 and SEQ ID No. 57.
14. The DNA construct of claim 12, further comprising a fifth sequence selected from the group consisting of SEQ ID NO. 9, SEQ ID NO. 10 and SEQ ID No. 58.
15. The DNA construct of claim 12, further comprising a barcode sequence being at least 2 nucleotides in length.
16. The DNA construct of claim 15, wherein said barcode sequence has a length between 2 to 6 nucleotides.
17. The DNA construct of claim 16, wherein said barcode sequence is selected from the group consisting of SEQ ID NOS. 11-56.
18. A kit for generating a DNA construct, said DNA construct comprising a sequence derived from a miRNA, said kit comprising:
an adaptor wherein said adaptor includes a sequence selected from the group consisting of SEQ ID NO. 1, SEQ ID NO. 2, SEQ ID NO. 3 and SEQ ID NO. 4; and
a reverse transcription primer wherein the reverse transcription primer is selected from the group consisting of SEQ ID NO. 5 and SEQ ID NO. 6.
19. The kit of claim 18, further comprising:
a forward amplification primer, wherein said forward amplification primer includes a sequence selected from the group consisting of SEQ ID NO. 7, SEQ ID NO. 8 and SEQ ID NO. 57; and
a reverse amplification primer, wherein said reverse amplification primer includes a sequence selected from the group consisting of SEQ ID NO. 9, SEQ ID NO. 10 and SEQ ID NO. 58.
20. The kit of claim 18, wherein said adaptor comprises a barcode sequence having a length of at least 2 nucleotides.
21. The kit of claim 20, wherein said barcode sequence is between 2 and 6 nucleotides in length.
22. The kit of claim 20, wherein said barcode sequence is selected from the group consisting of SEQ ID NOS. 11-56.
23. An isolated sequence having at least 80% sequence identity with a sequence selected from the group consisting of SEQ ID NO. 1, SEQ ID NO. 2, SEQ ID NO. 3, SEQ ID NO. 4, SEQ ID NO. 5, SEQ ID NO. 6, SEQ ID NO. 7, SEQ ID NO. 8, SEQ ID NO. 57, and SEQ ID NO. 58.
24. The isolated sequence of claim 23, further comprising a barcode sequence having a length of at least 2 nucleotides.
25. The isolated sequence of claim 24, wherein said barcode sequence has a length of between 2 and 6 nucleotides.
26. The isolated sequence of claim 25, wherein said barcode sequence is selected from the group consisting of SEQ ID NOS. 11-56.
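
The claims above recite two numerical limits: a barcode length of 2 to 6 nucleotides and a sequence identity of at least 80%. The following is a minimal sketch of how such limits might be checked computationally. It is not part of the claims. The function names, the DNA-only alphabet, and the ungapped, position-by-position definition of identity are all assumptions made for illustration; the claims do not specify how identity is to be calculated, and the sequences shown are placeholders, not any of the SEQ ID NOs.

```python
def is_valid_barcode(barcode: str, min_len: int = 2, max_len: int = 6) -> bool:
    """Return True if the barcode is min_len..max_len nt long and uses only A, C, G, T.

    The 2-6 nt range follows the claims; restricting to a DNA alphabet is an assumption.
    """
    return min_len <= len(barcode) <= max_len and set(barcode.upper()) <= set("ACGT")


def percent_identity(query: str, reference: str) -> float:
    """Ungapped percent identity (an assumed definition, not taken from the claims).

    Counts matching positions from the 5' end and divides by the length of
    the longer sequence, so a length difference lowers the score.
    """
    query, reference = query.upper(), reference.upper()
    longest = max(len(query), len(reference))
    if longest == 0:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(query, reference))
    return 100.0 * matches / longest


def meets_identity_threshold(query: str, reference: str, threshold: float = 80.0) -> bool:
    """Return True if the ungapped identity is at least the threshold (default 80%)."""
    return percent_identity(query, reference) >= threshold


# Placeholder sequences (hypothetical, not from the sequence listing).
candidate = "ACGTACGTAA"
reference = "ACGTACGTAC"
print(is_valid_barcode("ACGT"))                        # barcode within 2-6 nt
print(percent_identity(candidate, reference))          # 9 of 10 positions match
print(meets_identity_threshold(candidate, reference))
```

A gapped alignment would generally give a different, often higher, identity for sequences that differ by insertions or deletions, so the ungapped measure here should be read only as one possible convention.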

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US34720210P 2010-05-21 2010-05-21
US61/347,202 2010-05-21

Publications (1)

Publication Number Publication Date
WO2011146942A1 (en) 2011-11-24

Family

ID=44992094

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2011/037616 WO2011146942A1 (en) 2010-05-21 2011-05-23 Methods and kits to analyze microrna by nucleic acid sequencing

Country Status (1)

Country Link
WO (1) WO2011146942A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050196754A1 (en) * 2000-03-31 2005-09-08 Drmanac Radoje T. Novel nucleic acids and polypeptides
US20060246464A1 (en) * 2005-02-04 2006-11-02 Xueliang Xia Method of isolating, labeling and profiling small RNAs
US20070054278A1 (en) * 2003-11-18 2007-03-08 Applera Corporation Polymorphisms in nucleic acid molecules encoding human enzyme proteins, methods of detection and uses thereof
US20080045418A1 (en) * 2005-02-04 2008-02-21 Xia Xueliang J Method of labeling and profiling rnas
US20080113342A1 (en) * 1998-11-16 2008-05-15 Monsanto Technology Llc Plant Genome Sequence and Uses Thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"The Power You Need to Fuel Your microRNA Research", STRATAGENE, 2 April 2010 (2010-04-02), pages 1 - 15, Retrieved from the Internet <URL:http://www.genomics.agilent.com/GenericAaspx?PageType=Product&SubPageType=ProductLiterature&PageID=309> [retrieved on 20110815] *

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11939634B2 (en) 2010-05-18 2024-03-26 Natera, Inc. Methods for simultaneous amplification of target loci
US11859240B2 (en) 2011-01-31 2024-01-02 Roche Sequencing Solutions, Inc. Methods of identifying multiple epitopes in cells
US11926864B1 (en) 2011-01-31 2024-03-12 Roche Sequencing Solutions, Inc. Method for labeling ligation products with cell-specific barcodes I
US11692214B2 (en) * 2011-01-31 2023-07-04 Roche Sequencing Solutions, Inc. Barcoded beads and method for making the same by split-pool synthesis
US11939624B2 (en) 2011-01-31 2024-03-26 Roche Sequencing Solutions, Inc. Method for labeling ligation products with cell-specific barcodes II
US11708599B2 (en) * 2011-01-31 2023-07-25 Roche Sequencing Solutions, Inc. Methods of identifying multiple epitopes in cells
US11932903B2 (en) 2011-01-31 2024-03-19 Roche Sequencing Solutions, Inc. Kit for split-pool barcoding target molecules that are in or on cells or cell organelles
US11932902B2 (en) 2011-01-31 2024-03-19 Roche Sequencing Solutions, Inc. Barcoded beads and method for making the same by split-pool synthesis
US11732290B2 (en) 2011-01-31 2023-08-22 Roche Sequencing Solutions, Inc. Methods of identifying multiple epitopes in cells
US20240068013A1 (en) * 2011-01-31 2024-02-29 Roche Sequencing Solutions, Inc. Method for labeling ligation products with cell-specific barcodes ii
US20230407368A1 (en) * 2011-01-31 2023-12-21 Roche Sequencing Solutions, Inc. Barcoded Beads and Method for Making the Same by Split-Pool Synthesis
US20230046411A1 (en) * 2011-01-31 2023-02-16 Roche Sequencing Solutions, Inc. Methods of identifying multiple epitopes in cells
US20230146787A1 (en) * 2011-01-31 2023-05-11 Roche Sequencing Solutions, Inc. Methods of identifying multiple epitopes in cells
US11781171B1 (en) 2011-01-31 2023-10-10 Roche Sequencing Solutions, Inc. Methods of identifying multiple epitopes in cells
US10954553B2 (en) 2012-11-02 2021-03-23 Life Technologies Corporation Compositions, methods and kits for enhancing PCR specificity
US11473132B2 (en) 2012-11-02 2022-10-18 Life Technologies Corporation Compositions, methods and kits for enhancing PCR specificity
US11208688B2 (en) 2012-11-02 2021-12-28 Life Technologies Corporation Small RNA capture, detection and quantification
EP2733221A1 (en) * 2012-11-19 2014-05-21 Samsung Electronics Co., Ltd Polynucleotide and use thereof
US9657345B2 (en) 2012-11-19 2017-05-23 Samsung Electronics Co., Ltd. Polynucleotide and use thereof
US11274340B2 (en) 2015-03-13 2022-03-15 Life Technologies Corporation Methods, compositions and kits for small RNA capture, detection and quantification
CN107429296A (en) * 2015-03-13 2017-12-01 Life Technologies Corporation Methods, compositions and kits for capturing, detecting and quantifying small RNAs
EP3967768A1 (en) * 2015-03-13 2022-03-16 Life Technologies Corporation Compositions for small rna capture, detection and quantification
CN107429296B (en) * 2015-03-13 2022-01-28 Life Technologies Corporation Methods, compositions and kits for capturing, detecting and quantifying small RNAs
US10563250B2 (en) 2015-03-13 2020-02-18 Life Technologies Corporation Methods, compositions and kits for small RNA capture, detection and quantification
WO2016149021A1 (en) * 2015-03-13 2016-09-22 Life Technologies Corporation Methods, compositions and kits for small rna capture, detection and quantification
US11946101B2 (en) 2015-05-11 2024-04-02 Natera, Inc. Methods and compositions for determining ploidy

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 11784396; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 Ep: PCT application non-entry in European phase. Ref document number: 11784396; Country of ref document: EP; Kind code of ref document: A1