WO2011120101A1 - Small rna molecules and methods of use - Google Patents

Small RNA molecules and methods of use

Publication number
WO2011120101A1
Authority
WO
WIPO (PCT)
Prior art keywords
rna molecule
isolated rna
isolated
nucleotide sequence
fragment
Prior art date
Application number
PCT/AU2011/000380
Other languages
French (fr)
Inventor
Ryan James Taft
Cas Simons
Original Assignee
The University Of Queensland
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by The University Of Queensland
Publication of WO2011120101A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/18 Type of nucleic acid acting by a non-sequence specific mechanism

Definitions

  • RNA molecules More particularly, this invention relates to non-protein-coding, splice-site associated small RNA molecules linked to gene regulatory activities and uses thereof.
  • these small RNAs, which are present in all kingdoms of life, are thought to be involved in many, if not most, fundamental cellular processes (Chu and Rana, 2007; Mattick and Makunin, 2005).
  • RNA sequencing techniques have led to the detection of new members of established classes of small RNAs (Ghildiyal & Zamore, 2009; Malone & Hannon, 2009), and to the discovery and characterization of several classes of promoter-proximal species including 5' capped promoter-associated small RNAs (PASRs) (Fejes-Toth et al., 2009), TSSa-RNAs (Seila et al., 2008), and transcription initiation RNAs (tiRNAs) (Taft et al., 2009 (I)).
  • PASRs 5' capped promoter-associated small RNAs
  • tiRNAs transcription initiation RNAs
  • RNA molecules e.g., miRNAs, piwi-interacting RNAs (piRNAs) and tiRNAs
  • tiRNAs RNA molecules that are distinguishable from any previously identified class of small RNA molecules, including tiRNAs.
  • the invention provides an isolated substantially single-stranded RNA molecule that comprises a nucleotide sequence comprising no more than 26 contiguous nucleotides that correspond to a genomic DNA sequence associated with gene regulation, wherein said nucleotide sequence comprises a nucleotide sequence corresponding to a 3' nucleotide sequence of an internal exon of said genomic DNA sequence.
  • the isolated RNA molecule comprises a nucleotide sequence that corresponds to a sense strand of the internal exon of the genomic DNA sequence.
  • said isolated RNA molecule terminates in a 3' nucleotide sequence that corresponds to the 3' end of the internal exon of the genomic DNA sequence.
  • said isolated RNA molecule consists of a nucleotide sequence that corresponds to the 3' end of the internal exon of the genomic DNA sequence.
  • said isolated RNA molecule comprises a nucleotide sequence comprising 14-26 contiguous nucleotides.
  • said isolated RNA molecule comprises a nucleotide sequence comprising 14-20 contiguous nucleotides.
  • said isolated RNA molecule comprises a nucleotide sequence comprising 16-18 contiguous nucleotides.
  • said isolated RNA molecule comprises a nucleotide sequence comprising 17 or 18 nucleotides.
  • the isolated RNA molecule comprises a nucleotide sequence that is located in, or obtainable from, a cell nucleus.
  • the isolated RNA molecule comprises a nucleotide sequence that corresponds to a nucleotide sequence that is located at or near a 5' splice site of the internal exon of the genomic DNA sequence.
  • the isolated RNA molecule consists of a nucleotide sequence that corresponds to a nucleotide sequence that is located at the 5' splice site of the internal exon of the genomic DNA sequence.
  • the genomic DNA sequence is of or obtainable from a eukaryote.
  • the genomic DNA sequence is of or obtainable from a metazoan.
  • the genomic DNA sequence is of or obtainable from a vertebrate or a mammal.
  • the genomic DNA sequence is of or obtainable from a human.
  • the nucleotide sequence of the isolated RNA molecule is GC enriched.
  • This aspect of the invention also provides a modified, isolated RNA molecule, a fragment of an isolated RNA molecule and/or an RNA or DNA molecule at least partly complementary to said isolated RNA molecule.
  • the invention provides a genetic construct which comprises or encodes one or a plurality of:
  • the genetic construct is an expression construct comprising a DNA sequence complementary to one or a plurality of the isolated RNA molecules of the first aspect operably linked or connected to one or more regulatory nucleotide sequences.
  • the invention provides a method of identifying the isolated RNA molecule of the first aspect, said method including the step of isolating one or more of said isolated RNA molecules from a nucleic acid sample obtained from an organism.
  • the invention provides a method of identifying the isolated RNA molecule of the first aspect, said method including the step of identifying a DNA sequence in a genome of an organism which is complementary to the nucleotide sequence of said one or more isolated RNA molecules.
  • the invention provides a computer-readable storage medium or device encoded with data corresponding to one or more of:
  • the invention provides a method of identifying a regulatory region in a genome, said method including the step of identifying an isolated RNA molecule according to the first aspect to thereby identify said regulatory region.
  • said regulatory region is associated with one or more regulatory activities selected from the group consisting of mRNA splicing, transcription, epigenetic modification, epigenetic regulation, chromatin modification and nucleosome positioning.
  • the invention provides a method of determining whether a mammal has, or is predisposed to, a disease or condition associated with one or more regulatory regions of a genome, said method including the step of determining whether said mammal comprises one or more isolated RNA molecules according to the first aspect, wherein the or each nucleotide sequence of said one or more isolated RNA molecules corresponds to a genomic DNA sequence associated with said disease or condition.
  • said regulatory region is associated with one or more regulatory activities selected from the group consisting of mRNA splicing, transcription, epigenetic modification, epigenetic regulation, chromatin modification and nucleosome positioning.
  • the mammal is a human.
  • the invention provides a nucleic acid array comprising a plurality of isolated RNA molecules according to the first aspect (or their DNA equivalents), immobilized, affixed or otherwise mounted to a substrate.
  • the invention provides an antibody which binds:
  • the invention provides a kit comprising one or more isolated RNA molecules according to the first aspect, or one or more isolated nucleic acids respectively complementary thereto, and/or an antibody according to the ninth aspect, and one or more detection reagents.
  • the invention provides a method of treating a disease or condition in a mammal, said method including the step of administering to the mammal a therapeutic agent selected from the group consisting of:
  • said disease or condition is associated with aberrant regulatory activity of one or more genes.
  • said disease or condition is associated with aberrant mRNA splicing of one or more genes.
  • said disease or condition is associated with aberrant transcriptional activity of one or more genes.
  • said disease or condition is associated with aberrant epigenetic modification and/or regulation of one or more genes.
  • said disease or condition is associated with aberrant chromatin modification.
  • said disease or condition is associated with aberrant nucleosome positioning.
  • the mammal is a human.
  • the invention provides a pharmaceutical composition
  • a therapeutic agent selected from the group consisting of:
  • the pharmaceutical composition is for treating a disease or condition, such as but not limited to a disease or condition associated with one or more aberrant regulatory activities of one or more genes.
  • said one or more aberrant regulatory activities is selected from the group consisting of mRNA splicing, transcription, epigenetic modification, epigenetic regulation, chromatin modification and nucleosome positioning.
  • FIGURE 3 Features of nuclear-localized very small RNAs.
  • (a) A schematic showing an example of nuclear small RNA density at the human CAP-1 locus. tiRNAs are present downstream of the CAP1 TSS, and antisense and upstream, consistent with bidirectional promoter activity. Splice-site RNAs (spliRNAs), small RNAs whose 3' ends map to the exon 3' end (i.e., the 5' splice site), are expressed at exons 4 and 7.
  • FIGURE 4 Splice site RNAs are conserved across metazoa. The position of small RNA 3' ends is plotted with respect to the 5' splice site, i.e., the 3' end of the exon.
  • FIGURE 5 Quantitative real time PCR validation of the expression of (A) H/ACA snoRNA SNORA-19, (B-D) three snRNAs, and (E-G) three tRNAs in nuclear, cytoplasmic, and total RNA fractions from cultured human THP-1 cells which were subsequently interrogated by deep sequencing.
  • FIGURE 6 Northern validation of cytoplasmic and nuclear THP-1 RNA fractions.
  • A Validation of the nuclear expression of C/D snoRNA SNORD77, which has been previously shown to generate a ~30 nt sdRNA, and H/ACA snoRNA SNORA19.
  • B Validation of the cytoplasmic localization of tRNA Lys (AAG) and tRNA His (CAY).
  • FIGURE 7 The density of (A) nuclear, and (B) cytoplasmic THP-1 small RNAs across exon-intron boundaries. Note that (A) and (B) have vertical axes on different scales.
  • FIGURE 8 The density of (A) nuclear, and (B) cytoplasmic THP-1 small RNAs that multimap (2-5x) across exon-exon boundaries. Note that (A) and (B) have vertical axes on different scales.
  • FIGURE 9 The density of total THP-1 small RNAs at (A) exon-exon junctions, or (B) exon-intron junctions.
  • FIGURE 10 The density of small RNAs in primary mouse granulocyte nuclei at (A) exon-exon, or (B) exon-intron junctions. See Fig. 11 and the Methods for more information about the isolation and sequencing of these small RNAs.
  • FIGURE 11 Mouse primary granulocyte purification.
  • A-C Flow cytometric analysis of FACS purified granulocytes showing percentage of sorted cells in each gate (>95% purity). Granulocytes were defined as (A) lineage negative (B220, CD19, CD3, Sca-1 negative), side scatter high (SSC-A), (B) CD34 negative, c-kit negative and (C) Gr-1 high, CD16/32 positive.
  • D Granulocyte morphology was confirmed by May-Grunwald Giemsa staining of FACS purified cells.
  • FIGURE 12 The density of small RNAs in mouse embryonic stem (ES) cells lacking the microRNA processing enzymes (A) Dicer or (B) Dgcr8 across exon-exon junctions. spliRNA biogenesis is not affected by the loss of known components of the RNAi pathway in mice.
  • FIGURE 13 The density of C. elegans small RNAs across exon-exon junctions.
  • spliRNAs are expressed in (A) Fog-2, (B) Glp-4, and (C) Prg-1 mutant animals, indicating that spliRNA production is not abolished in response to mutations affecting the development of the germline.
  • FIGURE 14 The density of Drosophila small RNAs across exon-exon junctions. spliRNAs are expressed in (A,B) early embryonic development and (C,D) cultured S2 and KC cells.
  • FIGURE 15 The density of C. elegans small RNAs across exon-exon junctions. spliRNAs are expressed in (A) embryonic worms, and (B-E) developing larvae. (F) This pattern is consistent in a mixed stage library.
  • FIGURE 16 The density of Drosophila small RNAs across exon-exon junctions in various anatomical regions. spliRNAs are expressed in adult (A) male and (C) female heads but are not apparent in (B) male or (D) female bodies. spliRNAs are weakly expressed compared to background in (E) imaginal discs.
  • FIGURE 17 The density of Amphimedon queenslandica (marine sponge) small RNAs across exon-exon junctions. Although spliRNAs are expressed in both (A) embryonic and (B) adult sponge, their relative expression compared to background is ~2-fold higher in embryo.
  • FIGURE 18 The relationship between spliRNA and gene expression.
  • A Genes with spliRNAs are significantly more highly expressed than those without in THP-1 cells.
  • B Genes with spliRNAs are generally more highly expressed than those without throughout Drosophila early embryonic development. **p-value < 2.2e-16 and **p-value < 0.05 by Wilcoxon rank sum test with continuity correction.
  • FIGURE 19 The density of small RNAs from Arabidopsis and two budding yeast (S. cerevisiae and S. castellii) across exon-exon junctions. spliRNA expression is not evident in (A) Arabidopsis or (B,C) budding yeast.
  • FIGURE 20 spliRNA sequence logos. Sequence logos of (A) 18 nt, (B) 17 nt, and (C) 16 nt nuclear-localized THP-1 spliRNAs are shown.
  • FIGURE 21 The correlation between GRO-seq density and spliRNAs. GRO-seq mappings were obtained from NCBI GEO (see Methods) and their density was plotted against all Refseq exons (excluding annotations on the haplotype, random, or mitochondrial chromosomes), or against the subset of exons that show spliRNAs in the THP-1 nuclear dataset. The spliRNA average density shown is derived from the THP-1 nuclear dataset. A local minimum of GRO-seq density correlates with the presence of spliRNAs, consistent with a model of spliRNA biogenesis dependent on RNAPII backtracking and TFIIS-mediated cleavage.
  • FIGURE 22 The relationship between spliRNAs and downstream intron size.
  • the size distribution of introns immediately upstream or downstream of an exon with a THP-1 spliRNA is shown in orange and green, respectively. Short introns are twofold more common immediately downstream of exons with spliRNAs in THP-1 cells.
  • FIGURE 23 The relationship between spliRNAs and exon size.
  • spliRNAs are ~2-fold less abundant at very short exons, which do not have positioned nucleosomes.
  • FIGURE 24 Small RNAs from wild type S. cerevisiae exhibit characteristics similar to tiRNAs and spliRNAs.
  • A S. cerevisiae small RNA average density is plotted with respect to the average density of nucleosomes downstream of all annotated Refgene transcription start sites. Peak small RNA density coincides with local nucleosome density minima.
  • FIGURE 26 KEGG pathway enrichment analysis for genes with spliRNAs in DCIS tissue.
  • FIGURE 27 Splice-site RNA expression comparison per gene in the normal adjacent (i.e., control) versus the DCIS (i.e., cancer) tissues. Note the number of genes with differentially highly expressed spliRNAs in the DCIS tissue.
  • FIGURE 28 Genes with more than twofold greater spliRNA expression in the DCIS sample are significantly associated with breast cancer pathways in GeneSigDB.
  • the present inventors analysed the relationship between 5' splice-sites and small RNAs present in deep sequencing libraries from human and mouse cell lines, mouse primary granulocyte nuclei, and a variety of primary tissues from the nematode worm Caenorhabditis elegans (C. elegans), the fruit fly Drosophila melanogaster (D. melanogaster) and the marine sponge Amphimedon queenslandica (A. queenslandica).
  • the present invention arises from the finding of a novel class of "splice-site" RNA molecules (spliRNAs) that may be associated with gene regulatory mechanisms including, but not limited to, mRNA splicing, transcription, epigenetic modification, epigenetic regulation, chromatin modification and nucleosome positioning.
  • the present inventors have surprisingly found that these spliRNAs may be distinguished from any previously identified class of small RNAs based on their location in the genome and the features they are associated with.
  • the spliRNA may be characterized as comprising a nucleotide sequence that (i) corresponds to a nucleotide sequence that is located at or near a 5' splice-site of an internal exon of a genomic DNA sequence, and (ii) corresponds to a 3' nucleotide sequence of the internal exon of the genomic DNA sequence.
  • RNA molecules exhibit different characteristics to the small non-coding RNA molecules (e.g., miRNAs, piRNAs and tiRNAs) previously identified.
  • the present invention is based on the inventors' identification of spliRNA molecules, the manipulation of these spliRNAs and the use of spliRNAs to characterize their role and function in cells.
  • the invention also concerns methods and compositions for identifying spliRNAs, arrays comprising spliRNAs (spliRNA arrays) and use of spliRNAs for diagnostic, therapeutic and prognostic applications in mammals, particularly humans.
  • By "isolated" is meant present in an environment removed from a natural state or otherwise subjected to human manipulation. Isolated material may be substantially or essentially free from components that normally accompany it in its natural state, or may be manipulated so as to be in an artificial state together with components that normally accompany it in its natural state.
  • the term "isolated" also encompasses terms such as "enriched", "purified", "synthetic" and/or "recombinant".
  • nucleic acid as used herein designates single- or double- stranded mRNA, RNA, cRNA, RNAi and DNA inclusive of cDNA and genomic DNA.
  • Nucleic acids may comprise naturally-occurring nucleotides or synthetic, modified or derivatized bases (e.g., inosine, methylinosine, pseudouridine, methylcytosine etc). Nucleic acids may also comprise chemical moieties coupled thereto. Examples of chemical moieties include, but are not limited to, biotin, locked nucleic acids (LNAs), peptide nucleic acids (PNAs), cholesterol, 2'-O-methyl, Morpholino, and fluorophores such as HEX, FAM, Fluorescein and FITC.
  • the invention provides an isolated substantially single-stranded RNA molecule (referred to herein as a "splice-site RNA" or "spliRNA") that comprises a nucleotide sequence comprising no more than 26 contiguous nucleotides that correspond to a genomic DNA sequence associated with gene regulation, wherein said nucleotide sequence comprises a nucleotide sequence corresponding to a 3' nucleotide sequence of an internal exon of said genomic DNA sequence.
  • "corresponding to" and "corresponds to" mean that the spliRNA molecule has a nucleotide sequence of, or a sequence complementary to, a genomic DNA nucleotide sequence. It will be appreciated that this definition should take into account that RNA uses a U instead of a T, as found in DNA.
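The "corresponds to" definition can be illustrated with a short sketch. The helper names below (`rna_to_dna`, `reverse_complement`, `corresponds_to`) are hypothetical and are not taken from the specification. The sketch assumes exact, ungapped matching: it rewrites the RNA in the DNA alphabet (U to T) and tests whether the result occurs in the genomic sequence or in its reverse complement.

```python
# Hypothetical helpers illustrating the "corresponds to" definition.
# Assumption: exact, ungapped matching over the letters A, C, G, T/U.
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def rna_to_dna(rna: str) -> str:
    """Rewrite an RNA sequence in the DNA alphabet (U becomes T)."""
    return rna.upper().replace("U", "T")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    return dna.upper().translate(_DNA_COMPLEMENT)[::-1]


def corresponds_to(rna: str, genomic_dna: str) -> bool:
    """True if the RNA has the sequence of, or a sequence complementary to,
    a stretch of the given genomic DNA."""
    query = rna_to_dna(rna)
    genome = genomic_dna.upper()
    return query in genome or reverse_complement(query) in genome
```

For a made-up genomic fragment such as `TTGACCGTAGGCAAT`, both `GACCGUAGG` (same sequence) and `CCUACGGUC` (complementary sequence) satisfy the check.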
  • the spliRNA comprises a nucleotide sequence that corresponds to a sense strand of the internal exon of the genomic DNA sequence.
  • said spliRNA molecule terminates in a 3' nucleotide sequence that corresponds to the 3' end of the internal exon of the genomic DNA sequence.
  • said isolated RNA molecule consists of a nucleotide sequence that corresponds to the 3' end of the internal exon of the genomic DNA sequence.
  • the spliRNA molecule comprises a nucleotide sequence comprising 14-26 contiguous nucleotides.
  • the spliRNA molecule comprises a nucleotide sequence comprising 14-20 contiguous nucleotides. In yet another preferred form, the spliRNA molecule comprises a nucleotide sequence comprising 16-18 contiguous nucleotides.
  • the spliRNA molecule comprises a nucleotide sequence comprising 17 or 18 contiguous nucleotides.
  • the length of the spliRNA molecule may vary depending on the genome that it is located in and/or derived from.
  • a spliRNA molecule of a human genome may comprise on average 17 or 18 nucleotides while a spliRNA molecule of a different eukaryotic genome (e.g., mouse and Drosophila) may comprise on average 17 nucleotides, although without limitation thereto.
  • the isolated RNA molecule comprises a nucleotide sequence that corresponds to a nucleotide sequence that is located at or near a 5' splice site of the internal exon of the genomic DNA sequence.
  • the isolated RNA molecule consists of a nucleotide sequence that corresponds to a nucleotide sequence that is located at the 5' splice site of the internal exon of the genomic DNA sequence.
  • the spliRNA does not encode a peptide or a protein encoded by a genome. Accordingly, the spliRNA comprises a nucleotide sequence that is referred to herein as "non-coding".
  • spliRNAs are typically not linked to the pathways that produce miRNAs, siRNAs and piRNAs. Thus, in contrast to other small non-coding RNA molecules (e.g., miRNAs, siRNAs and piRNAs), spliRNAs typically do not require Dicer and/or Dgcr8 for their processing and/or production.
  • the spliRNAs are expressed in most tissues and developmental stages in eukaryotes such as, but not limited to, humans, mice, Drosophila and C. elegans. In some cases, the spliRNAs are enriched, increased or otherwise more prevalent in actively differentiating and/or undifferentiated tissues. For example, the expression of a spliRNA may be higher in a Drosophila head compared to in a Drosophila body. It will also be appreciated that the expression of the spliRNA may be upregulated, induced, elevated or otherwise increased in an embryonic sponge compared to in an adult sponge (e.g., the marine sponge A. queenslandica). It will also be appreciated that genes associated with spliRNAs may be more highly expressed compared to genes that lack or have a low abundance of spliRNAs.
  • Although the spliRNA molecule has a nucleotide sequence transcribed from the corresponding DNA sequence, it will be appreciated that said spliRNA molecule may be chemically synthesized de novo, rather than transcribed from a DNA sequence.
  • Examples include RNA synthesis using TOM amidite chemistry, 2-cyanoethoxymethyl (CEM), a 2'-hydroxyl protecting group, and fast oligonucleotide deprotecting groups.
  • the nucleotide sequence of a spliRNA molecule is typically GC rich.
  • the percent GC content of the nucleotide sequence is substantially greater than the average GC content of the genome from which the spliRNA is derived. This GC content also differs from that of miRNAs.
  • the GC content of spliRNAs is greater than about 50-70%, or greater than about 51-60%.
  • the GC content of spliRNAs is greater than about 51%, greater than about 52%, greater than about 53%, or greater than about 54%, compared to about 50% for miRNAs and about 70% for tiRNAs.
  • a spliRNA may be associated with the maintenance and/or initiation of nucleosome positioning.
  • a spliRNA may be located at a region of a genome with (i) RNA polymerase II binding and/or (ii) transcription elongation factor IIS (TFIIS) activity.
  • spliRNA expression may be at least partly upregulated, elevated or otherwise increased at exons located upstream of short introns.
  • spliRNA expression at an exon located upstream of an intron that comprises 60-120 nucleotides or less may be, for example, 25%, 50%, 75%, 100%, 150%, 200%, 250%, 300%, 350%, 400%, 450% and up to about 500% increased compared to spliRNA expression at an exon located upstream of an intron that comprises 121 nucleotides or more.
  • spliRNA expression may be at least partly reduced, decreased or otherwise down-regulated at short exons. Consequently, spliRNA expression may be, for example, 25%, 50%, 75%, 100%, 150%, 200%, 250%, 300%, 350%, 400%, 450% and up to about 500% reduced at an exon comprising 60 nucleotides or less compared to spliRNA expression at an exon comprising 61 nucleotides or more.
  • spliRNA molecules do not form secondary structures, such as stem and loop structures. Accordingly, spliRNA molecules are substantially free of internal base-pairing. In this context, by “substantially free” is meant fewer than 3, 2 or 1 internal base pairs.
  • the invention provides an isolated substantially single-stranded spliRNA molecule, wherein said isolated spliRNA molecule comprises a nucleotide sequence that:
  • (i) consists of 17 or 18 contiguous nucleotides that correspond to a sense strand of an internal exon of a mammalian genomic DNA sequence located at a 5' splice site;
  • (ii) corresponds to a 3' end of the internal exon of the mammalian genomic DNA sequence
  • (iii) comprises a GC content greater than 54%
  • the mammalian genomic sequence is of, or obtainable from, a human genome.
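Criteria (i) to (iii) above can be expressed as a simple screen. The following is a minimal sketch, not the inventors' method: the function names are hypothetical, the exon is assumed to be supplied as its sense-strand DNA sequence, and the caller is assumed to have already confirmed that the exon is internal and mammalian, which the code does not check.

```python
# Hypothetical screen for criteria (i)-(iii); names are illustrative only.
def gc_fraction(seq: str) -> float:
    """Fraction of G and C residues in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def is_candidate_splirna(rna: str, exon_sense_dna: str,
                         min_gc: float = 0.54) -> bool:
    """Apply the three sequence criteria to one small RNA and one exon.

    Assumption: exon_sense_dna is the sense strand of an internal exon,
    written 5' to 3', so its last base sits at the 5' splice site.
    """
    dna = rna.upper().replace("U", "T")
    exon = exon_sense_dna.upper()
    return (
        len(dna) in (17, 18)            # (i) 17 or 18 contiguous nucleotides
        and exon.endswith(dna)          # (i)/(ii) sense strand, ends at exon 3' end
        and gc_fraction(dna) > min_gc   # (iii) GC content greater than 54%
    )
```

A small RNA that matches the last 17 bases of a GC-rich exon passes, whereas a 16-nucleotide RNA, an AT-rich RNA, or one that stops a base short of the exon end does not.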
  • Non-limiting examples of the isolated spliRNA molecules of the invention are set forth in SEQ ID NOs: 1-6884 (Fig. 1 (human)) and SEQ ID NOs: 6885-16,898 (Fig. 2A-C (mouse, C. elegans, and Drosophila)).
  • the isolated spliRNA molecule comprises a nucleotide sequence that is located in, or obtainable from, a cell nucleus. It will also be appreciated that the invention contemplates nucleic acid molecules (e.g., RNA or DNA) complementary to or at least partly complementary to the spliRNA molecules of the invention. Complementary or at least partly complementary nucleic acid molecules may be in DNA or RNA form.
  • the invention also provides a modified spliRNA molecule.
  • a modified spliRNA may be altered by, complexed, labeled or otherwise covalently or non-covalently coupled to one or more other chemical entities.
  • the chemical entity may be bonded, linked or otherwise attached directly to the spliRNA, or it may be bonded, linked or otherwise attached to the spliRNA via a linking group (e.g., a spacer).
  • Examples of such chemical entities include, but are not limited to, incorporation of modified bases (e.g., inosine, methyl inosine, pseudouridine and morpholino), sugars and other carbohydrates such as 2'-O-methyl and locked nucleic acids (LNA), amino groups and peptides (e.g., peptide nucleic acids (PNA)), biotin, cholesterol, fluorophores (e.g., FITC, Fluorescein, Rhodamine, HEX, FAM, TET and Oregon Green), radionuclides and metals, although without limitation thereto (Fabani and Gait, 2008; You et al., 2006; Summerton and Weller, 1997). A more complete list of possible chemical modifications can be found at http://www.oligos.com/ModificationsList.htm.
  • the modified spliRNA is an "antisense inhibitor".
  • By "antisense inhibitor" is meant a nucleic acid sequence that is either complementary to or at least partly complementary to the spliRNA molecule (Dias and Stein, 2002; Kurreck, 2003; Sahu et al., 2007). The antisense inhibitor pairs with the spliRNA and interferes with interactions such as, but not limited to, spliRNA-mRNA and spliRNA-DNA interactions. Sequence-specific inhibition of small RNA function has previously been demonstrated both in vitro (Meister et al., 2004; Hutvagner et al., 2004) and in vivo (Krützfeldt et al., 2005).
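As an illustration of what "complementary" and "at least partly complementary" mean at the sequence level, the sketch below derives the fully complementary RNA for a given spliRNA and scores an equal-length candidate inhibitor against it. The function names are hypothetical, only Watson-Crick pairs are counted, and the equal-length, ungapped comparison is a simplifying assumption rather than anything stated in the specification.

```python
# Hypothetical helpers; Watson-Crick pairing only, equal-length sequences.
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense_rna(splirna: str) -> str:
    """Return the fully complementary RNA strand, written 5' to 3'."""
    seq = splirna.upper().replace("T", "U")
    return seq.translate(_RNA_COMPLEMENT)[::-1]


def complementarity(splirna: str, inhibitor: str) -> float:
    """Fraction of positions at which the inhibitor pairs with the spliRNA."""
    target = antisense_rna(splirna)
    probe = inhibitor.upper().replace("T", "U")
    if len(probe) != len(target):
        raise ValueError("this sketch assumes sequences of equal length")
    return sum(a == b for a, b in zip(target, probe)) / len(target)
```

A score of 1.0 corresponds to a fully complementary inhibitor; lower scores correspond to partial complementarity.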
  • the modified spliRNA is a "point mutant".
  • By "point mutant" is meant a spliRNA molecule where 1 or 2 nucleotides have been removed, substituted or otherwise altered. Point mutants of spliRNAs or their targets can be employed to study the function of spliRNAs in disease or to increase the affinity of spliRNAs to variant targets.
  • Small RNA molecules involved in disease processes, including spliRNAs, may have "seed-sequences".
  • By "seed-sequences" is meant nucleic acid sequences that comprise 2-7 nucleotides and are involved in target recognition (Lewis et al., 2003; Lewis et al., 2005). Increasing the mismatch in these sequences is predicted to significantly decrease the gene regulation function of spliRNAs. This approach may be applicable for partial inhibition of spliRNA targets.
  • the modified spliRNA molecule is a "spliRNA sponge".
  • spliRNA sponge is meant a genetically encoded competitive spliRNA inhibitor that may be stably expressed in a cell, such as a eukaryotic cell.
  • the spliRNA sponge binds to the spliRNA, thereby preventing it from binding its mRNA target, in a technique called "sponging".
  • the spliRNA sponges may be produced using methods such as the ones described in Cohen, 2009, Ebert et al., 2007, Hammond, 2007 and Rooij et al., 2008. It will be appreciated that a spliRNA sponge may bind to, soak up and/or inhibit a specific spliRNA and/or a family of spliRNAs.
  • the modified spliRNA is a "spliRNA mimic".
  • a "spliRNA mimic" is a single-stranded RNA oligonucleotide that is complementary to or at least partly complementary to the spliRNA.
  • the spliRNA mimic may inactivate pathological spliRNAs through complementary base-pairing. It will also be appreciated that chemical modification to LNA, PNA or morpholino and conjugation to cholesterol may stabilize the spliRNA mimic molecule and facilitate delivery of single-stranded RNA molecules to targets following intravenous administration (Rooij and Olson, 2007).
  • the invention also provides a fragment of a spliRNA of the invention.
  • By "fragment" is meant a portion, domain, region or sub-sequence of a spliRNA molecule which comprises one or more structural and/or functional characteristics of a spliRNA molecule.
  • a fragment may comprise at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16 or at least 17 nucleotides of a spliRNA molecule.
  • the spliRNA molecules can be chemically modified to facilitate penetration into cells.
  • modifications include, but are not limited to, conjugation to cholesterol, Morpholino, 2'-O-methyl, PNA or LNA (Partridge et al., 1996; Corey and Abrams, 2001; os et al., 2003).
  • Modified spliRNA molecules also include "variants" of the spliRNA molecules of the invention.
  • Variants include RNA or DNA molecules comprising a nucleotide sequence at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identical to a nucleotide sequence of a spliRNA molecule such as described in Fig. 1 and Fig. 2.
  • Such variants may include one or more point mutations, nucleotide substitutions, deletions or additions.
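  • As a minimal sketch of how a percent identity threshold might be checked, the following compares two sequences position by position. It assumes sequences of equal length with no gaps; variants containing deletions or additions would first require a pairwise alignment, which is not shown. The function names and example sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of positions that are identical between two equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def is_variant(candidate: str, reference: str, threshold: float = 70.0) -> bool:
    """True if `candidate` is at least `threshold` percent identical to `reference`."""
    return percent_identity(candidate, reference) >= threshold
```

  • For example, a hypothetical 10-nt candidate differing from its reference at two positions is 80% identical, and so satisfies a 70% threshold but not a 90% threshold.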
  • a genetic construct comprising or encoding one or a plurality of the same or different spliRNA molecules, modified spliRNA molecules, at least partly complementary DNA or RNA molecules, or fragments thereof.
  • spliRNA molecules may be oriented in tandem repeats or with multiple copies of each spliRNA sequence.
  • a "genetic construct" is any artificially constructed nucleic acid molecule comprising heterologous nucleotide sequences.
  • a genetic construct is typically in DNA form, such as a phage, plasmid, cosmid, artificial chromosome (e.g., a YAC or BAC), although without limitation thereto.
  • the genetic construct suitably comprises one or more additional nucleotide sequences, such as for assisting propagation and/or selection of bacterial or other cells transformed or transfected with the genetic construct.
  • the genetic construct is a DNA expression construct that comprises one or more regulatory sequences that facilitate transcription of one or more spliRNA molecules, modified spliRNA molecules or fragments thereof.
  • regulatory sequences may include promoters, enhancers, polyadenylation sequences, splice donor/acceptor sites, although without limitation thereto.
  • Suitable promoters may be selected according to the cell or organism in which the spliRNA molecule is to be expressed. Promoters may be selected to facilitate constitutive, conditional, tissue-specific, inducible or repressible expression as is well understood in the art. Examples of promoters are T7, SP6, SV40, PolIII, U6, H1 and 7S, although without limitation thereto.
  • the spliRNA molecule may be provided as an encoding DNA sequence in an expression construct that, when transcribed, produces the spliRNA molecule as a transcript.
  • spliRNA molecules appear to be a hitherto unknown form of small, single stranded RNA molecules that occur throughout evolution. Accordingly, spliRNA molecules may be isolated, identified, purified or otherwise obtained from a number of different organisms.
  • the organism is a eukaryote.
  • the organism is a metazoan inclusive of all multi-celled animals ranging from marine sponge to insects and vertebrates.
  • the organism is a vertebrate, inclusive of mammals, avians such as chickens and ducks and aquaculture species such as fish, although without limitation thereto.
  • the organism is a mammal.
  • Mammals include humans, livestock such as horses, pigs, cows and sheep, domestic animals such as cats and dogs, although without limitation thereto.
  • the invention therefore provides methods of identifying, purifying or otherwise obtaining a spliRNA molecule.
  • such methods may include analysis of nucleic acid samples obtained from an organism, and/or bioinformatic analysis of genome sequence information.
  • the nucleic acid samples are derived from the genome of a eukaryote. More preferably, the nucleic acid samples are derived from the genome of a metazoan inclusive of marine sponge, insects and vertebrates.
  • the nucleic acid samples are derived from the genome of a vertebrate, inclusive of mammals, avians such as chickens and ducks and aquaculture species such as fish, although without limitation thereto.
  • the nucleic acid samples are derived from the genome of a mammal.
  • Mammals include humans, livestock such as horses, pigs, cows and sheep, domestic animals such as cats and dogs, although without limitation thereto.
  • methods for analyzing a nucleic acid sample to identify a spliRNA include "deep sequencing" and mapping strategies that consider exon-exon and/or exon-intron boundaries and multi-mapping deep sequencing reads.
  • deep sequencing technologies employed for the identification of exon boundaries and spliRNAs include, but are not limited to, 454™-, Helicos-, PacBio-, Solexa/Illumina- and SOLiD-sequencing.
  • the invention provides a computer-readable storage medium or device encoded with structural information of one or more spliRNA molecules.
  • the structural information may be nucleotide sequence, sequence length,
  • a computer-readable storage medium may have computer readable program code components stored thereon for programming a computer (e.g., any device comprising a processor) to perform a method as described herein.
  • Examples of such computer-readable storage media include, but are not limited to, a hard disk, a CD-ROM, an optical storage device, a magnetic storage device, a ROM (Read Only Memory), a PROM (Programmable Read Only Memory), an EPROM (Erasable Programmable Read Only Memory), an EEPROM (Electrically Erasable Programmable Read Only Memory) and a Flash memory.
  • the computer-readable storage medium or device is part of a computer or computer network capable of interrogating, searching or querying a genome sequence database.
  • a bioinformatic method may utilize a high performance computing station which houses a local mirror of the UCSC Genome Browser.
  • One further aspect of the invention provides antibodies which bind, recognize and/or have been raised against a spliRNA of the invention, inclusive of fragments and modified spliRNA molecules.
  • Antibodies may be monoclonal or polyclonal. Antibodies also include antibody fragments such as Fc fragments, Fab and Fab'2 fragments, diabodies and ScFv fragments. Antibodies may be made in a suitable production animal such as a mouse, rat, rabbit, sheep, chicken or goat.
  • the invention also contemplates recombinant methods of producing antibodies and antibody fragments.
  • antibodies to RNA molecules have been produced by a method utilizing a synthetic phage display library approach to select RNA-binding antibody fragments (Ye et al, 2008).
  • antibodies may be conjugated with labels selected from a group including an enzyme, a fluorophore, a chemiluminescent molecule, biotin, radioisotope or other label.
  • suitable enzyme labels useful in the present invention include alkaline phosphatase, horseradish peroxidase, luciferase, β-galactosidase, glucose oxidase, lysozyme, malate dehydrogenase and the like.
  • the enzyme label may be used alone or in combination with a second enzyme in solution or with a suitable chromogenic or chemiluminescent substrate.
  • examples of chromogens include diaminobenzidine (DAB), permanent red, 3-ethylbenzthiazoline sulfonic acid (ABTS), 5-bromo-4-chloro-3-indolyl phosphate (BCIP), nitro blue tetrazolium (NBT), 3,3',5,5'-tetramethylbenzidine (TMB) and 4-chloro-1-naphthol (4-CN), although without limitation thereto.
  • a non-limiting example of a chemiluminescent substrate is Luminol™, which is oxidized in the presence of horseradish peroxidase and hydrogen peroxide to form an excited state product (3-aminophthalate).
  • Fluorophores may be fluorescein isothiocyanate (FITC), tetramethylrhodamine isothiocyanate (TRITC), allophycocyanin (APC), Texas Red (TR), Cy5 or R-Phycoerythrin (RPE), although without limitation thereto.
  • Radioisotope labels may include 123I, 131I, 51Cr and 99Tc, although without limitation thereto.
  • antibody labels that may be useful include colloidal gold particles and digoxigenin.
  • the invention provides a method of identifying a spliRNA expression profile as a quantitative or qualitative indicator or measure of gene regulation. These methods may be particularly, although not exclusively, relevant to diagnosis of diseases and conditions associated with differential gene regulation.
  • said spliRNA expression profile is an indicator and/or measure of mRNA splicing activity.
  • said spliRNA expression profile is an indicator of particular exon inclusion as a result of mRNA splicing activity.
  • said spliRNA expression profile is an indicator and/or measure of gene transcriptional activity.
  • said spliRNA expression profile is an indicator and/or measure of epigenetic modulatory and/or regulatory activity.
  • said spliRNA expression profile is an indicator and/or measure of chromatin modification activity.
  • said spliRNA expression profile is an indicator and/or measure of nucleosome positioning activity.
  • the method uses a "nucleic acid array" (spliRNA array).
  • by "nucleic acid array" is meant a plurality of nucleic acids, preferably ranging in size from 10, 15, 20 or 50 bp to 250, 500, 700 or 900 kb, immobilized, affixed or otherwise mounted to a substrate or solid support. Typically, each of the plurality of nucleic acids has been placed at a defined location, either by spotting or direct synthesis. In array analysis, a nucleic acid-containing sample is labeled and allowed to hybridize with the plurality of nucleic acids on the array. Nucleic acids attached to arrays are referred to as "targets" whereas the labelled nucleic acids comprising the sample are called "probes".
  • Based on the amount of probe hybridized to each target spot, information is gained about the specific nucleic acid composition of the sample.
  • the major advantage of gene arrays is that they can provide information on thousands of targets in a single experiment and are most often used to monitor gene expression levels and "differential expression".
  • "Differential expression" indicates whether the level of a particular spliRNA in a sample is higher or lower than the level of that particular spliRNA in a normal or reference sample.
  • nucleic acid samples representing entire genomes, ranging from 3,000-32,000 genes, may be packaged onto one solid support.
  • the arrayed nucleic acids may be composed of oligonucleotides, PCR products or cDNA vectors or purified inserts.
  • the sequences may represent entire genomes and may include both known and unknown sequences or may be collections of sequences such as miRNAs.
  • gene profiling, such as but not limited to using a spliRNA array, is used to identify mRNAs whose expression shows a positive or inverse correlation with the expression of a specific spliRNA.
  • an absence of spliRNA expression could correlate with a presence of mRNA expression, or vice versa.
  • a presence of spliRNA expression could correlate with a presence of mRNA expression or an absence of spliRNA expression could correlate with an absence of mRNA expression.
  • a level of spliRNA expression could correlate with a level of mRNA expression, whether directly or inversely. It will be appreciated that a level of expression may be measured as a quantitative or a relative expression level.
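  • A direct or inverse correlation between spliRNA and mRNA expression levels may, for example, be quantified with a correlation coefficient. The following is a minimal sketch using the Pearson coefficient; the choice of Pearson rather than any other statistic, the function name and the example values are assumptions made for illustration and are not part of the described methods.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists of expression levels."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)
```

  • A coefficient near +1 across a set of samples would indicate a direct correlation between the spliRNA and the mRNA, and a coefficient near -1 an inverse correlation.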
  • gene profiling allows the identification of regulators of disease processes and potential therapeutic targets.
  • diseases and conditions that show differential gene regulation include but are not limited to Crohn's disease, Alzheimer's disease, Parkinson's disease, schizophrenia, infertility, rheumatoid arthritis, myocardial infarction, diabetes, congenital developmental disorders, coronary heart disease, and cancer such as breast cancer, lymphoma, leukemia, colorectal cancer, gastric cancer, ovarian cancer, aggressive metastatic brain cancer, and pituitary tumors (McKaig et al., 2003; Grunblatt et al., 2007; Liang et al., 2008; Lübke et al., 2008; Ridker,
  • said gene regulation may refer to aberrant gene transcription, aberrant mRNA splicing, aberrant epigenetic modification, aberrant epigenetic regulation, aberrant chromatin modification and/or aberrant nucleosome positioning.
  • spliRNAs may be associated with aberrant regulatory activity of oncogenes or tumor suppressors (Zhang et al , 2006) and may therefore become useful biomarkers for cancer diagnostics.
  • said aberrant regulatory activity may in some embodiments refer to activities such as transcription, mRNA splicing, epigenetic modification, epigenetic regulation, chromatin modification and/or nucleosome positioning.
  • the spliRNAs may be associated with oncogenes such as myc*, Bcl-2 and -3*, myb* , mdm2*, mdmx*, and ras.
  • the spliRNAs may be associated with a tumour suppressor gene, for example, p21 and/or p53 (see, e.g., Sotos-Reyes & Recillas-Targa, 2010).
  • the spliRNAs may be linked to aberrant mRNA splicing of genes associated with diseases and conditions, such as Dnmt1* and Dnmt3* associated with cancer progression, and APP and beta amyloid in Alzheimer's disease.
  • the spliRNAs may be linked to aberrant epigenetic modification and/or regulation of genes associated with various cancers, α-Thalassaemia, and Prader-Willi, ATR-X, Fragile X, ICF, Angelman's, and Rett syndromes.
  • spliRNAs may be detected in biological samples in order to determine and classify certain cell types or tissue types or spliRNA-associated pathogenic disorders which are characterized by differential expression of spliRNA molecules or spliRNA molecule patterns. Further, the developmental stage of cells, organs and/or tissues may be classified by determining spatial and/or temporal expression patterns of spliRNA molecules.
  • the invention provides a method of treating a disease or condition in an animal, said method including the step of administering to the animal a therapeutic agent selected from the group consisting of:
  • the aforementioned therapeutic agents may be suitable for prophylaxis and/or therapy of animals, including mammals such as humans.
  • the therapeutic agents may be used to treat diseases, conditions, developmental processes and/or disorders associated with developmental dysfunctions.
  • Certain spliRNAs may function as tumour-suppressors and thus expression or delivery of these spliRNAs or "spliRNA mimics" to tumor cells may provide therapeutic efficacy.
  • the use of chemically modified spliRNAs to target either a specific spliRNA or to disrupt the binding of a spliRNA and its specific mRNA target in vivo may provide a potentially effective means of inactivating pathological spliRNAs.
  • spliRNAs may be administered to potentiate the effects of natural spliRNAs by promoting the expression of beneficial gene products such as tumour suppressor proteins (Rooij and Olson, 2007).
  • Therapeutic agents may be delivered to an animal in the form of a pharmaceutical composition comprising a pharmaceutically acceptable carrier diluent or excipient.
  • the invention provides a pharmaceutical composition
  • a therapeutic agent selected from the group consisting of:
  • pharmaceutically-acceptable carrier diluent or excipient
  • a solid or liquid filler diluent or encapsulating substance that may be safely used in systemic administration. This includes carriers, diluents or excipients suitable for veterinary use.
  • carriers may be selected from a group including sugars, starches, cellulose and its derivatives, malt, gelatine, talc, calcium sulfate, vegetable oils, synthetic oils, polyols, alginic acid, phosphate buffered solutions, emulsifiers, isotonic saline and salts such as mineral acid salts including hydrochlorides, bromides and sulfates, organic acids such as acetates, propionates and malonates and pyrogen-free water.
  • any safe route of administration may be employed for providing a patient with the composition of the invention.
  • oral, rectal, parenteral, sublingual, buccal, intravenous, intra-articular, intra-muscular, intra-dermal, subcutaneous, inhalational, intraocular, intraperitoneal, intracerebroventricular, transdermal and the like may be employed.
  • Intra-muscular and subcutaneous injection is appropriate, for example, for administration of immunotherapeutic compositions, proteinaceous vaccines and nucleic acid vaccines.
  • the drug may be transfected into cells together with the DNA.
  • Dosage forms include tablets, dispersions, suspensions, injections, solutions, syrups, troches, capsules, suppositories, aerosols, transdermal patches and the like. These dosage forms may also include injecting or implanting controlled releasing devices designed specifically for this purpose or other forms of implants modified to act additionally in this fashion. Controlled release of the therapeutic agent may be achieved by coating the same, for example, with hydrophobic polymers including acrylic resins, waxes, higher aliphatic alcohols, polylactic and polyglycolic acids and certain cellulose derivatives such as hydroxypropylmethyl cellulose. In addition, the controlled release may be achieved by using other polymer matrices, liposomes and/or microspheres.
  • compositions of the present invention suitable for oral or parenteral administration may be presented as discrete units such as capsules, sachets or tablets each containing a pre-determined amount of one or more therapeutic agents of the invention, as a powder or granules or as a solution or a suspension in an aqueous liquid, a non-aqueous liquid, an oil-in-water emulsion or a water-in- oil liquid emulsion.
  • Such compositions may be prepared by any of the methods of pharmacy but all methods include the step of bringing into association one or more agents as described above with the carrier which constitutes one or more necessary ingredients.
  • compositions are prepared by uniformly and intimately admixing the agents of the invention with liquid carriers or finely divided solid carriers or both, and then, if necessary, shaping the product into the desired presentation.
  • the above compositions may be administered in a manner compatible with the dosage formulation, and in such amount as is pharmaceutically-effective.
  • the dose administered to a patient in the context of the present invention, should be sufficient to achieve a beneficial response in a patient over an appropriate period of time.
  • the quantity of agent(s) to be administered may depend on the subject to be treated inclusive of the age, sex, weight and general health condition thereof, factors that will depend on the judgement of the practitioner.
  • Animals include and encompass fish, avians (e.g., chickens and other poultry) and mammals inclusive of humans, livestock, domestic pets and performance animals (e.g., racehorses), although without limitation thereto.
  • Example 1 - Nuclear-localized tiny RNAs are associated with transcription initiation and splice sites in metazoans
  • THP-1 cells were grown in suspension culture according to previously published methods (Taft et al., 2009 (I); Suzuki et al., 2009) and harvested. Growth media was aspirated and the pellets were resuspended and washed twice in equivalent volumes of ice-cold PBS. The cells were then split into two equal volumes for the extraction of total RNA and the extraction of nuclear and cytoplasmic RNA. Total RNA was extracted using TRIzol (Invitrogen), according to the manufacturer's instructions.
  • Nuclear and cytoplasmic RNA was isolated as previously described (Hwang et al., 2007), except that each wash was carried out using 1 ml of wash buffer and Tween-40 was substituted for Tween-20 in the final wash. Additionally, to ensure complete clearing of any intact cells or nuclei, the cytoplasmic fraction was subjected to an additional centrifugation step at 1000 x g for 5 minutes at 4°C, after which the supernatant was transferred to clean tubes for RNA extraction. To validate the integrity of the cytoplasmic fraction an aliquot was aspirated and inspected under an inverted light microscope to confirm that it was free from intact cells or nuclei.
  • RNA species were assessed by qPCR on cDNA prepared from total, cytoplasmic, and nuclear RNA fractions (for a list of targets, primers and results, see Table 5 and Fig. 5).
  • a total of 6 µl from each cell-equivalent fraction was used (0.336 µg from the nuclear fraction, 1.74 µg from the cytoplasmic fraction and 3 µg from the total fraction).
  • Each 6 µl fraction was DNase treated using TURBO DNase (Ambion) according to manufacturer's instructions and then reverse transcribed using the Superscript III Reverse Transcriptase kit (Invitrogen). Reverse transcription was primed using random hexamers, which were added at a concentration of 250 ng per 5 µg RNA. RT negative reactions were carried out in parallel.
  • PCR amplicons were generated from each of the qPCR targets and extracted from a 2% agarose gel using the Wizard SV Gel and PCR Clean-Up System kit (Promega) and quantified using a Nanodrop spectrophotometer. Each amplicon was then diluted to a concentration of 2 ng/µl and used as templates for generating qPCR standard curves as a 1/10 serial dilution series.
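  • For illustration, a 1/10 serial dilution series and a standard curve fitted to it may be sketched as follows. The number of dilution steps and any Ct values supplied to the fit are hypothetical; the least-squares line of Ct against log10 concentration and the efficiency formula E = 10^(-1/slope) - 1 are the conventional qPCR treatment and are assumed here rather than taken from the Example.

```python
import math


def serial_dilution(start, factor=10.0, steps=5):
    """Concentrations of a serial dilution, beginning with the undiluted stock."""
    return [start / factor ** i for i in range(steps)]


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amplification_efficiency(slope):
    """Efficiency from the slope of Ct versus log10(concentration); 1.0 means perfect doubling."""
    return 10 ** (-1.0 / slope) - 1.0


def standard_curve(concentrations, ct_values):
    """Fit Ct against log10(concentration) for a dilution series."""
    logs = [math.log10(c) for c in concentrations]
    return fit_line(logs, ct_values)
```

  • Starting from the 2 ng/µl stock, serial_dilution(2.0, 10.0, 3) gives 2, 0.2 and 0.02 ng/µl; a fitted slope of about -3.32 cycles per tenfold dilution corresponds to an efficiency close to 1.0.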
  • C57BL/6J mice were obtained from the Animal Resource Centre (Perth, Australia), with all animal experiments performed in a pathogen free facility according to national and institutional guidelines. Bone marrow was harvested from the femur, tibia and spine using a mortar and pestle in PBS supplemented with 2% fetal calf serum (FCS) as previously described (Hoist et al. , 2006).
  • Bone marrow cells were passed through a 70 µm filter to ensure a single cell suspension, and incubated with lineage specific antibodies (B220, CD19, CD3, Sca-1; BioLegend), conjugated to biotin, together with anti-Gr-1-fluorescein isothiocyanate (FITC; BioLegend), anti-cKit-phycoerythrin (PE; Becton Dickinson), anti-CD34-Alexafluor 647 (eBioscience) and anti-CD16/32-PerCP/Cy5.5 (Becton Dickinson) antibodies. Cells were washed twice in PBS with 2% FCS and incubated with streptavidin-APC-Cy7 (BioLegend).
  • Control stains for FITC, PE, PerCP/Cy5.5, Alexafluor 647 and APC-Cy7 were used to determine compensation settings and gating for each population.
  • Mature granulocytes were purified as shown previously (Guibal et al., 2009), using fluorescence activated cell sorting (FACS) on a Becton Dickinson Aria II. After purification, a small sample of the cells was reanalyzed for purity by flow cytometry, and a separate sample was stained using May-Grünwald Giemsa following a cytospin. Purification of the nucleus from sorted cells was carried out using the PARIS kit (Ambion) with modifications to minimise RNA degradation.
  • RNA datasets from mouse (Babiarz et al , 2008), Drosophila (Chung et al , 2008), C. elegans (Batista et al , 2008), A. queenslandica (Grimson et al , 2008) and budding yeast (Drinnenberg et al , 2009) were obtained from the NCBI Gene Expression Omnibus (NCBI GEO). See Table 7 for a complete list of the datasets and their corresponding identifiers. Human GRO-seq data (Core et al , 2008) was obtained through NCBI GEO (GSE13518). The summary 'aligned' BED files provided by the authors, and available at the GEO website, were used for all analyses.
  • Arabidopsis small RNA datasets were obtained from the Arabidopsis SBS database, available at http:/ mpss.udsl.edu at sbs/ (Nakano, 2006), and pooled. Reference genome and annotation sources
  • CD4+ T-cell nucleosome modification data (Barski et al , 2007; Wang et al , 2008) was downloaded directly from the authors' website, and is available at http;//dir.nhlbi.iiih.gov/papers/lniL1 ⁇ 2pigenomes/listcell.as and lTitp;//dir.nhlbi,niL
  • the summary bed files provided by the authors were used as the basis for all analyses (see below for more details).
  • Control CD4+ T-cell nucleosome datasets were obtained from the NCBI Sequence Read Archive (SRR000711 - SRR000720), and processed to obtain nucleosome-length fragments as described previously (Nahkuri et al., 2009) (more below).
  • S. cerevisiae combined H3 and H4 nucleosome data was obtained from http://atlas.bx.psu.edu/veast-maps/veast- index.html (Mavrich et al. , 2008).
  • RNA datasets, raw CD4+ nucleosome data, and S. cerevisiae H3 and H4 nucleosome data were mapped to the appropriate genome using ZOOM (Lin et al., 2008).
  • Small RNA, GRO-seq, chromatin modification and nucleosome density distributions were accomplished by converting mapped tag positions (i.e., BED coordinates) to genome-wide wiggle density plots and averaging these densities across all loci of interest (e.g., Refgene TSSs) using a set of in-house Perl scripts.
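  • The in-house Perl scripts are not reproduced in this specification. Purely as an illustrative sketch in Python, the conversion of BED-style intervals to a per-base density and the averaging of that density around loci of interest might look as follows. The function names, the 0-based half-open coordinate convention and the omission of strand handling are all assumptions.

```python
from collections import defaultdict


def coverage(intervals):
    """Per-base tag density from (chrom, start, end) intervals (0-based, half-open)."""
    depth = defaultdict(int)
    for chrom, start, end in intervals:
        for pos in range(start, end):
            depth[(chrom, pos)] += 1
    return depth


def average_profile(depth, loci, upstream, downstream):
    """Mean density at each offset from -upstream to +downstream across all loci.

    `loci` is a list of (chrom, position) pairs, e.g. annotated TSS positions.
    Strand is ignored in this sketch.
    """
    if not loci:
        raise ValueError("at least one locus is required")
    profile = []
    for offset in range(-upstream, downstream + 1):
        total = sum(depth.get((chrom, pos + offset), 0) for chrom, pos in loci)
        profile.append(total / len(loci))
    return profile
```

  • With two overlapping tags on a hypothetical chromosome, chr1:10-13 and chr1:12-15, and loci at positions 12 and 14, the average profile over offsets -1, 0 and +1 is 1.0, 1.5 and 0.5.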
  • CD4+ T-cell nucleosome data (Barski et al , 2007; Wang et al , 2008) were processed to facilitate high-resolution bioinformatic queries.
  • the signals from the plus and minus strands associated with the same nucleosome are typically ~150 bp apart, because the sequence tags are derived from the ends of the strands rather than over their whole length.
  • To generate ChIP-seq nucleosome profiles from the publicly available deep sequencing data we extended the genomic matches of all uniquely mapping tags in silico in the 3' direction so that they reached a total length of 150 nt, consistent with the expected length of nucleosome associated DNA, as described previously (Nahkuri et al., 2009; Schmid & Bucher, 2007).
  • To compute the 'wiggle profile' we summed the distribution of all tags across the genome downstream of the TSS, and then computed the average based on the number of TSSs queried. This resolves into a distinct single curve representing peak nucleosome density.
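  • The in silico extension of a mapped tag in its 3' direction to a total length of 150 nt may be sketched as below. The function name and the 0-based half-open coordinate convention are assumptions; clipping at position 0 is added so that minus-strand tags near the start of a chromosome do not produce negative coordinates.

```python
def extend_tag(start, end, strand, length=150):
    """Extend a mapped tag in its 3' direction so that it spans `length` nt.

    Coordinates are 0-based and half-open. For a plus-strand tag the 5' end is
    `start`; for a minus-strand tag the 5' end is `end`.
    """
    if strand == "+":
        return start, start + length
    if strand == "-":
        return max(0, end - length), end
    raise ValueError("strand must be '+' or '-'")
```

  • A 25-nt plus-strand tag at 1000-1025 becomes 1000-1150, whereas the same tag on the minus strand becomes 875-1025, so that both extensions cover the DNA fragment from which the tag was read.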
  • Pol II ChIP-seq fragments were also summed across all genes with tiRNAs and, like the nucleosome data, resolved to a single high-resolution curve.
  • THP-1 nuclear tiRNAs were normalized by the relative expression of spike-ins 2 and 6 (Table 2). Bioinformatic queries against spike-ins were performed without mismatches to ensure accurate quantification and normalization. Identification and analysis of THP-1 nuclear tiRNAs was performed as previously described (Taft et al., 2009 (I)). Briefly, tags that mapped to known small RNA loci, repeat elements, or other potential confounding features were removed and small RNAs with a modal length of ~18 nt that map sense and proximal (generally -60 to +120 nt) to TSSs (defined by Refgene annotations) were identified.
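  • The positional criterion described above (sense orientation, generally -60 to +120 nt relative to the TSS) may be expressed as a simple filter. This is a sketch only: measuring the offset from the 5' end of the read, the inclusive window bounds and the function names are assumptions, and the removal of known small RNA loci and repeat elements is not shown.

```python
def offset_from_tss(read_5p, tss, gene_strand):
    """Signed distance of a read's 5' end from the TSS; positive is downstream of the TSS."""
    return read_5p - tss if gene_strand == "+" else tss - read_5p


def is_tss_proximal(read_5p, read_strand, tss, gene_strand, window=(-60, 120)):
    """True if the read is sense to the gene and lies within `window` of the TSS."""
    if read_strand != gene_strand:
        return False
    lower, upper = window
    return lower <= offset_from_tss(read_5p, tss, gene_strand) <= upper
```

  • For a plus-strand gene with a hypothetical TSS at position 1000, a sense read beginning at 1050 passes the filter, whereas one beginning at 1130 or one on the opposite strand does not.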
  • Splice site RNAs are defined as small RNAs less than or equal to 26 nt, dominantly 17-18 nt, whose 3' ends are precisely coincident with any exonic splice donor site (i.e., the 3' end of an exon), including donor sites at both non-protein-coding and canonical protein-coding genes.
  • Splice site RNAs also include small RNAs less than or equal to 26 nt, dominantly 17-18 nt, whose 3' ends map to the 3' end of internal exons, that is, exons that are bounded on both sides by an intron and/or exons that are exclusively protein encoding.
  • RNAs were mapped to both the genome and a library of splice site junctions for each organism. Dips of small RNAs just across the splice site in some organisms may reflect poor gene annotations (i.e., incorrectly annotated or missed exons). Analysis of the expression of genes with spliRNAs in human and Drosophila was accomplished using gene expression data from undifferentiated THP-1 cells (http://fantom.gsc.riken.
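  • The definition above (length of 26 nt or less, 3' end coincident with the 3' end of an exon) lends itself to a direct check against an exon annotation. The following sketch assumes 0-based half-open coordinates and uses hypothetical function names and example coordinates; it does not distinguish internal exons from other exons.

```python
def three_prime_end(start, end, strand):
    """Genomic position of the 3'-most base of a feature (0-based, half-open input)."""
    return end - 1 if strand == "+" else start


def donor_sites(exons):
    """Set of (chrom, position, strand) for the 3' end of each exon."""
    return {(chrom, three_prime_end(start, end, strand), strand)
            for chrom, start, end, strand in exons}


def is_splirna_candidate(read, sites, max_len=26):
    """True if a mapped small RNA is at most `max_len` nt and ends exactly at a donor site."""
    chrom, start, end, strand = read
    if end - start > max_len:
        return False
    return (chrom, three_prime_end(start, end, strand), strand) in sites
```

  • Given a plus-strand exon at chr1:100-200, an 18-nt read at chr1:182-200 qualifies because its 3' end is the last base of the exon, whereas a 30-nt read ending at the same position does not.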
  • spliRNAs are nuclear localized in primary mouse granulocyte nuclei (Figs. 10 and 11), and are detectable in a diverse set of evolutionarily distant animals.
  • Splice-site RNAs are expressed in mouse embryonic stem (ES) cells (Fig. 4c), a wide range of Drosophila melanogaster (Fig. 4d) and Caenorhabditis elegans (Fig. 4e) tissues, and in one of the most basal multicellular animals, the marine sponge Amphimedon queenslandica (Fig. 4f). They have a modal length of 17-18 nt in human THP-1 cells, and a modal length of 17 nt in all other species examined.
  • spliRNAs are expressed in most tissues and developmental stages in D. melanogaster and C. elegans (Figs. 14 and 15). However, spliRNAs are more enriched relative to background in Drosophila heads than in bodies, are almost undetectable in imaginal discs, and are less abundant in adult sponge than in embryo (Figs.
  • spliRNAs may be connected with high gene expression in actively proliferating or undifferentiated tissues. Indeed, THP-1 and Drosophila genes with spliRNAs are more highly expressed than those without (Fig. 18).
  • small RNA distributions at splice donor sites in the flowering plant Arabidopsis thaliana and the budding yeasts Saccharomyces castellii and Saccharomyces cerevisiae were investigated. No evidence of spliRNAs was detected in yeast or plants (Fig. 19).
  • spliRNAs are weakly expressed (the median abundance in THP-1 nuclei is 1) and show a strong enrichment for 3'-terminal guanines, which is likely, however, driven by the consensus splice site sequence (Fig. 20). Additionally, although spliRNAs are statistically more common at constitutive splice sites, we also observed a mild but statistically significant enrichment of spliRNAs at alternative first exons (Table 4). To query the relationship between RNAPII activity and spliRNAs we examined the recently described GRO-seq (Core et al., 2008) dataset, which captures the position, amount and orientation of transcriptionally engaged RNA polymerases.
  • RNAPII backtracking and TFIIS activity (Taft et al., 2009 (II)) that allows efferent signals to be produced in parallel with transcription elongation to mark the position for future reference.
  • spliRNAs are linked to, or are by-products of, splicing or result from post-transcriptional cleavage of longer capped RNAs (Fejes-Toth et al., 2009).
  • RNAs derived from wild type S. cerevisiae (Drinnenberg et al., 2009), which lacks RNAi, are dominantly ~17-18 nt, have a 3'-terminal nucleotide purine (i.e., adenine) bias, and are phased such that local small RNA maxima coincide with minima of nucleosome density (Fig. 24).
  • RNAs do not meet the criteria we have used to define tiRNAs and spliRNAs in metazoans, they exhibit many similar characteristics, suggesting that very small RNAs are a basal feature within the eukaryotic lineage that may have been coopted to specific genomic positions, and into specific roles, in animals.
  • Example 2 - Splice site RNAs differentiate normal and breast cancer tissue
  • RNA deep sequencing data obtained from normal adjacent and ductal carcinoma in situ (DCIS, stage 2) human breast tissue from a single patient. These data show that spliRNAs are highly expressed in both normal and DCIS tissue, but are nonetheless significantly different. Indeed, a Gene Ontology enrichment analysis showed that while genes with spliRNAs in the normal adjacent tissue are strongly enriched for terms associated with terminal differentiation (Table 8), including three morphogenesis-associated Gene Ontology terms, they are completely absent in the breast carcinoma sample (Table 9). Likewise, KEGG pathway analysis revealed that enrichment for the Cell Cycle and Focal Adhesion pathways was specific to genes with spliRNAs in the normal adjacent tissue (Fig. 25). These enrichments were lost in DCIS tissue, and replaced with enrichments for the spliceosome and the RNA degradation pathway (Fig. 26).
  • a differential expression analysis showed that a subset of more than 170 genes had spliRNA expression values twofold or more up-regulated in the DCIS sample compared to normal (Fig. 27).
  • Querying GeneSigDB revealed that these genes are significantly associated with gene expression signatures previously linked to breast cancer, specifically those with ERCC2 and BRCA1 mutations (Fig. 28).
  • querying the GSEA MSigDB revealed significant overlap of genes with highly expressed spliRNAs in the DCIS sample with gene sets previously associated with a number of breast cancer subtypes and responses to treatments (Table 10). Overall, these data indicate that spliRNAs are differentially expressed between normal and cancer tissue, and are as good a diagnostic as much larger gene sets. This strongly suggests that spliRNAs may be a better diagnostic and prognostic indicator than traditional gene profiling.
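  • A twofold-or-greater selection of this kind may be sketched as follows. The pseudocount added to both samples (so that genes with no reads in the normal sample do not cause division by zero), the function name and the gene labels in the usage note are assumptions; the actual normalization and statistics used for Fig. 27 are not described in this passage.

```python
def upregulated(case, control, min_fold=2.0, pseudocount=1.0):
    """Genes whose expression in `case` is at least `min_fold` times that in `control`.

    `case` and `control` map gene identifiers to expression values. Returns a
    dict of gene -> fold change for the genes that pass the threshold.
    """
    hits = {}
    for gene, value in case.items():
        fold = (value + pseudocount) / (control.get(gene, 0.0) + pseudocount)
        if fold >= min_fold:
            hits[gene] = fold
    return hits
```

  • With hypothetical values of 9 and 1 for gene A in the case and control samples, the fold change with a pseudocount of 1 is 5.0 and the gene is retained; a gene with equal values in both samples is not.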
  • spliRNAs i.e. , small RNAs that map to 5' splice sites
  • SpliRNAs may be counted more than once since they may map to exons present in more than one mRNA isoform.
  • spliRNAs are not preferentially positioned at any one exon, but do appear to be generally enriched at internal exons. Overall, 7.6% of exons within genes with spliRNAs show evidence of spliRNA expression.
  • Target Probe sequence (5' to 3')
  • MCF-7 cells (breast cancer).
  • S.60E-05 Genes up-regulated in hepatoblastoma samples

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Biomedical Technology (AREA)
  • Chemical & Material Sciences (AREA)
  • Molecular Biology (AREA)
  • Organic Chemistry (AREA)
  • Biotechnology (AREA)
  • General Engineering & Computer Science (AREA)
  • Zoology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Wood Science & Technology (AREA)
  • Microbiology (AREA)
  • Plant Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Pharmaceuticals Containing Other Organic And Inorganic Compounds (AREA)

Abstract

The invention relates to a class of small RNA molecules involved in gene regulatory activities. More particularly, this invention relates to non-protein-coding, splice-site associated small RNA molecules linked to gene regulatory activities and uses thereof.

Description

TITLE
SMALL RNA MOLECULES AND METHODS OF USE
FIELD OF THE INVENTION
THIS INVENTION relates to RNA molecules. More particularly, this invention relates to non-protein-coding, splice-site associated small RNA molecules linked to gene regulatory activities and uses thereof.
BACKGROUND OF THE INVENTION
In recent years, numerous small RNAs with interesting biological roles have emerged. These small RNAs, which are present in all kingdoms of life, are thought to be involved in many, if not most, fundamental cellular processes (Chu and Rana, 2007; Mattick and Makunin, 2005). For example, the best-studied class of small regulatory RNAs, microRNAs (miRNAs), are predicted to regulate thousands of eukaryotic genes (Pillai et al., 2007; Vasudevan et al., 2007) and their dysregulation is associated with various diseases and conditions (Rooij and Olson, 2007; Zhang et al., 2007).
More recently, advances in RNA sequencing techniques have led to the detection of new members of established classes of small RNAs (Ghildiyal & Zamore, 2009; Malone & Hannon, 2009), and to the discovery and characterization of several classes of promoter-proximal species including 5' capped promoter-associated small RNAs (PASRs) (Fejes-Toth et al., 2009), TSSa-RNAs (Seila et al., 2008), and transcription initiation RNAs (tiRNAs) (Taft et al., 2009 (I)).
SUMMARY OF THE INVENTION
It has previously been established that small RNA molecules (e.g., miRNAs, piwi-interacting RNAs (piRNAs) and tiRNAs) are widespread in mammalian genomes. However, it has remained unclear as to whether there are as yet unidentified classes of small non-coding RNAs involved in regulating transcription and developmental pathways in mammalian and other genomes. The present invention has arisen from the inventors' unexpected discovery of a new class of small RNA molecules involved in gene regulatory activities that are distinguishable from any previously identified class of small RNA molecules, including tiRNAs.
In a first aspect, the invention provides an isolated substantially single-stranded RNA molecule that comprises a nucleotide sequence comprising no more than 26 contiguous nucleotides that correspond to a genomic DNA sequence associated with gene regulation, wherein said nucleotide sequence comprises a nucleotide sequence corresponding to a 3' nucleotide sequence of an internal exon of said genomic DNA sequence.
Typically, the isolated RNA molecule comprises a nucleotide sequence that corresponds to a sense strand of the internal exon of the genomic DNA sequence.
In one preferred form, said isolated RNA molecule terminates in a 3' nucleotide sequence that corresponds to the 3' end of the internal exon of the genomic DNA sequence.
In another preferred form, said isolated RNA molecule consists of a nucleotide sequence that corresponds to the 3' end of the internal exon of the genomic DNA sequence.
In one preferred form, said isolated RNA molecule comprises a nucleotide sequence comprising 14-26 contiguous nucleotides.
In another preferred form, said isolated RNA molecule comprises a nucleotide sequence comprising 14-20 contiguous nucleotides.
In yet another preferred form, said isolated RNA molecule comprises a nucleotide sequence comprising 16-18 contiguous nucleotides.
In still yet another preferred form, said isolated RNA molecule comprises a nucleotide sequence comprising 17 or 18 nucleotides.
Typically, although not exclusively, the isolated RNA molecule comprises a nucleotide sequence that is located in, or obtainable from, a cell nucleus.
Typically, the isolated RNA molecule comprises a nucleotide sequence that corresponds to a nucleotide sequence that is located at or near a 5' splice site of the internal exon of the genomic DNA sequence. Preferably, the isolated RNA molecule consists of a nucleotide sequence that corresponds to a nucleotide sequence that is located at the 5' splice site of the internal exon of the genomic DNA sequence.
It will be appreciated that the terms "3' end of an internal exon" and "5' splice site of an internal exon" refer to the same location in the genomic DNA sequence and may hereafter be used interchangeably.
Preferably, the genomic DNA sequence is of or obtainable from a eukaryote.
More preferably, the genomic DNA sequence is of or obtainable from a metazoan.
Even more preferably, the genomic DNA sequence is of or obtainable from a vertebrate or a mammal.
Advantageously, the genomic DNA sequence is of or obtainable from a human.
In certain embodiments, the nucleotide sequence of the isolated RNA molecule is GC enriched.
This aspect of the invention also provides a modified, isolated RNA molecule, a fragment of an isolated RNA molecule and/or an RNA or DNA molecule at least partly complementary to said isolated RNA molecule.
In a second aspect, the invention provides a genetic construct which comprises or encodes one or a plurality of:
(i) an isolated RNA molecule according to the first aspect;
(ii) a fragment of the isolated RNA molecule according to the first aspect;
(iii) a modified RNA molecule according to the first aspect; and/or
(iv) an at least partly complementary RNA or DNA molecule according to the first aspect.
In one particular embodiment, the genetic construct is an expression construct comprising a DNA sequence complementary to one or a plurality of the isolated RNA molecules of the first aspect operably linked or connected to one or more regulatory nucleotide sequences.
In a third aspect, the invention provides a method of identifying the isolated RNA molecule of the first aspect, said method including the step of isolating one or more of said isolated RNA molecules from a nucleic acid sample obtained from an organism.
In a fourth aspect, the invention provides a method of identifying the isolated RNA molecule of the first aspect, said method including the step of identifying a DNA sequence in a genome of an organism which is complementary to the nucleotide sequence of said one or more isolated RNA molecules.
In a fifth aspect, the invention provides a computer-readable storage medium or device encoded with data corresponding to one or more of:
(i) an isolated RNA molecule according to the first aspect;
(ii) a fragment of the isolated RNA molecule according to the first aspect;
(iii) a modified RNA molecule according to the first aspect; and/or
(iv) an at least partly complementary RNA or DNA molecule according to the first aspect.
In a sixth aspect, the invention provides a method of identifying a regulatory region in a genome, said method including the step of identifying an isolated RNA molecule according to the first aspect to thereby identify said regulatory region.
In some particular embodiments, said regulatory region is associated with one or more regulatory activities selected from the group consisting of mRNA splicing, transcription, epigenetic modification, epigenetic regulation, chromatin modification and nucleosome positioning.
In a seventh aspect, the invention provides a method of determining whether a mammal has, or is predisposed to, a disease or condition associated with one or more regulatory regions of a genome, said method including the step of determining whether said mammal comprises one or more isolated RNA molecules according to the first aspect, wherein the or each nucleotide sequence of said one or more isolated RNA molecules corresponds to a genomic DNA sequence associated with said disease or condition. In some particular embodiments, said regulatory region is associated with one or more regulatory activities selected from the group consisting of mRNA splicing, transcription, epigenetic modification, epigenetic regulation, chromatin modification and nucleosome positioning.
Preferably, the mammal is a human.
In an eighth aspect, the invention provides a nucleic acid array comprising a plurality of isolated RNA molecules according to the first aspect (or their DNA equivalents), immobilized, affixed or otherwise mounted to a substrate.
In a ninth aspect, the invention provides an antibody which binds:
(i) an isolated RNA molecule according to the first aspect;
(ii) a fragment of the isolated RNA molecule according to the first aspect;
(iii) a modified RNA molecule according to the first aspect; and/or
(iv) an at least partly complementary RNA or DNA molecule according to the first aspect.
In a tenth aspect, the invention provides a kit comprising one or more isolated RNA molecules according to the first aspect, or one or more isolated nucleic acids respectively complementary thereto, and/or an antibody according to the ninth aspect, and one or more detection reagents.
In an eleventh aspect, the invention provides a method of treating a disease or condition in a mammal, said method including the step of administering to the mammal a therapeutic agent selected from the group consisting of:
(i) an isolated RNA molecule according to the first aspect;
(ii) a fragment of the isolated RNA molecule according to the first aspect;
(iii) a modified RNA molecule according to the first aspect;
(iv) an at least partly complementary RNA or DNA molecule according to the first aspect; and/or
(v) an antibody according to the ninth aspect;
to thereby treat said disease or condition.
In one non-limiting embodiment, said disease or condition is associated with aberrant regulatory activity of one or more genes.
In another non-limiting embodiment, said disease or condition is associated with aberrant mRNA splicing of one or more genes.
In yet another non-limiting embodiment, said disease or condition is associated with aberrant transcriptional activity of one or more genes.
In still another non-limiting embodiment, said disease or condition is associated with aberrant epigenetic modification and/or regulation of one or more genes.
In still another non-limiting embodiment, said disease or condition is associated with aberrant chromatin modification.
In still yet another non-limiting embodiment, said disease or condition is associated with aberrant nucleosome positioning.
Preferably, the mammal is a human.
In a twelfth aspect the invention provides a pharmaceutical composition comprising a therapeutic agent selected from the group consisting of:
(i) an isolated RNA molecule according to the first aspect;
(ii) a fragment of the isolated RNA molecule according to the first aspect;
(iii) a modified RNA molecule according to the first aspect;
(iv) an at least partly complementary RNA or DNA molecule according to the first aspect; and
(v) an antibody according to the ninth aspect;
and a pharmaceutically acceptable carrier, diluent or excipient.
In one embodiment, the pharmaceutical composition is for treating a disease or condition, such as but not limited to a disease or condition associated with one or more aberrant regulatory activities of one or more genes.
Typically, although not exclusively, said one or more aberrant regulatory activities is selected from the group consisting of mRNA splicing, transcription, epigenetic modification, epigenetic regulation, chromatin modification and nucleosome positioning.
Throughout this specification, unless the context requires otherwise, the words "comprise", "comprises" and "comprising" will be understood to imply the inclusion of a stated integer or group of integers but not the exclusion of any other integer or group of integers.
BRIEF DESCRIPTION OF THE FIGURES
FIGURE 1. Human spliRNA sequences from THP-1 cells (n=16,298).
FIGURE 2. (A) Mouse spliRNAs derived from (wild type) mouse primary granulocyte nuclei (n=200); (B) Embryonic Drosophila spliRNAs (n=200); (C) C. elegans spliRNAs (n=200).
FIGURE 3. Features of nuclear-localized very small RNAs. (a) A schematic showing an example of nuclear small RNA density at the human CAP-1 locus. tiRNAs are present downstream of the CAP1 TSS, and antisense and upstream, consistent with bidirectional promoter activity. Splice-site RNAs (spliRNAs), small RNAs whose 3' ends map to the exon 3' end (i.e., the 5' splice site), are expressed at exons 4 and 7.
FIGURE 4. Splice site RNAs are conserved across metazoa. The position of small RNA 3' ends is plotted with respect to the 5' splice site, i.e., the 3' end of the exon. Schematics at the top of the image depict the position of spliRNAs and their strand orientation with respect to exon-exon junctions. (a) Small RNAs, dominantly ~17-18 nt, are >35-fold enriched at the 5' splice site in THP-1 nuclei compared to either the background or (b) cytosolic THP-1 small RNAs. SpliRNA expression is conserved in species representative of all major metazoan lineages, including (c) mouse (see also Figs. 10-12), (d) the fruit fly Drosophila melanogaster (see also Figs. 14, 16), (e) the nematode worm Caenorhabditis elegans (see also Figs. 13, 15), and (f) the marine sponge Amphimedon queenslandica (see also Fig. 17). The data presented in (c), (d), (e) and (f) are derived from the publicly available NCBI GEO series GSE1251, GSE11624, GSE11738 and GSE12578, respectively. See Table 7 for further detail.
FIGURE 5. Quantitative real time PCR validation of the expression of (A) H/ACA snoRNA SNORA-19, (B-D) three snRNAs, and (E-G) three tRNAs in nuclear, cytoplasmic, and total RNA fractions from cultured human THP-1 cells which were subsequently interrogated by deep sequencing.
FIGURE 6. Northern validation of cytoplasmic and nuclear THP-1 RNA fractions. (A) Validation of the nuclear expression of C/D snoRNA SNORD77, which has been previously shown to generate an ~30 nt sdRNA, and H/ACA snoRNA SNORA19. (B) Validation of the cytoplasmic localization of tRNA Lys (AAG) and tRNA His (CAY).
FIGURE 7. The density of (A) nuclear, and (B) cytoplasmic THP-1 small RNAs across exon-intron boundaries. Note that (A) and (B) have vertical axes on different scales.
FIGURE 8. The density of (A) nuclear, and (B) cytoplasmic THP-1 small RNA that multimap (2-5x) across exon-exon boundaries. Note that (A) and (B) have vertical axes on different scales.
FIGURE 9. The density of total THP-1 small RNAs at (A) exon-exon junctions, or (B) exon-intron junctions.
FIGURE 10. The density of small RNAs in primary mouse granulocyte nuclei at (A) exon-exon, or (B) exon-intron junctions. See Fig. 11 and the Methods for more information about the isolation and sequencing of these small RNAs.
FIGURE 11. Mouse primary granulocyte purification. (A-C) Flow cytometric analysis of FACS purified granulocytes showing percentage of sorted cells in each gate (>95% purity). Granulocytes were defined as (A) lineage negative (B220, CD19, CD3, Sca-1 negative), side scatter high (SSC-A), (B) CD34 negative, c-kit negative and (C) Gr-1 high, CD16/32 positive. (D) Granulocyte morphology was confirmed by May-Grunwald Giemsa staining of FACS purified cells. (E) The purity of whole cells (cell lysate) and nuclear fractions (nuclear lysate) was determined by SDS-PAGE and Western blot for β-actin (42kDa; both nuclear and cytoplasmic) and GAPDH (36kDa; cytoplasmic).
FIGURE 12. The density of small RNAs in mouse embryonic stem (ES) cells lacking the microRNA processing enzymes (A) Dicer or (B) Dgcr8 across exon-exon junctions. spliRNA biogenesis is not affected by the loss of known components of the RNAi pathway in mice.
FIGURE 13. The density of C. elegans small RNAs across exon-exon junctions. spliRNAs are expressed in (A) Fog-2, (B) Glp-4, and (C) Prg-1 mutant animals, indicating that spliRNA production is not abolished in response to mutations affecting the development of the germline.
FIGURE 14. The density of Drosophila small RNAs across exon-exon junctions. spliRNAs are expressed in (A,B) early embryonic development and (C,D) cultured S2 and KC cells.
FIGURE 15. The density of C. elegans small RNAs across exon-exon junctions. spliRNAs are expressed in (A) embryonic worms, and (B-E) developing larvae. (F) This pattern is consistent in a mixed stage library.
FIGURE 16. The density of Drosophila small RNAs across exon-exon junctions in various anatomical regions. spliRNAs are expressed in adult (A) male and (C) female heads but are not apparent in (B) male or (D) female bodies. spliRNAs are weakly expressed compared to background in (E) imaginal discs.
FIGURE 17. The density of Amphimedon queenslandica (marine sponge) small RNAs across exon-exon junctions. Although spliRNAs are expressed in both (A) embryonic and (B) adult sponge, their relative expression compared to background is ~2 fold higher in embryo.
FIGURE 18. The relationship between spliRNA and gene expression. (A) Genes with spliRNAs are significantly more highly expressed than those without in THP-1 cells. (B) Genes with spliRNAs are generally more highly expressed than those without throughout Drosophila early embryonic development. **p-value < 2.2e-16 and **p-value < 0.05 by Wilcoxon rank sum test with continuity correction.
FIGURE 19. The density of small RNAs from Arabidopsis and two budding yeast (S. cerevisiae and S. castellii) across exon-exon junctions. spliRNA expression is not evident in (A) Arabidopsis or (B,C) budding yeast.
FIGURE 20. spliRNA sequence logos. Sequence logos of (A) 18 nt, (B) 17 nt, and (C) 16 nt nuclear-localized THP-1 spliRNAs are shown.
FIGURE 21. The correlation between GRO-seq density and spliRNAs. GRO-seq mappings were obtained from NCBI GEO (see Methods) and their density was plotted against all Refseq exons (excluding annotations on the haplotype, random, or mitochondrial chromosomes), or against the subset of exons that show spliRNAs in the THP-1 nuclear dataset. The spliRNA average density shown is derived from the THP-1 nuclear dataset. A local minimum of GRO-seq density correlates with the presence of spliRNAs, consistent with a model of spliRNA biogenesis dependent on RNAPII backtracking and TFIIS-mediated cleavage.
FIGURE 22. The relationship between spliRNAs and downstream intron size. The size distribution of all introns in the human knownGene dataset, up to 1.2kb, is shown as blue bars. The size distribution of introns immediately upstream or downstream of an exon with a THP-1 spliRNA is shown in orange and green, respectively. Short introns are twofold more common immediately downstream of exons with spliRNAs in THP-1 cells.
FIGURE 23. The relationship between spliRNAs and exon size. The size distribution of all exons in the human knownGene dataset, up to 900 bp, is shown as red bars. The size distribution of exons associated with spliRNAs is shown as blue bars. spliRNAs are ~2-fold less abundant at very short exons, which do not have positioned nucleosomes.
FIGURE 24. Small RNAs from wild type S. cerevisiae exhibit characteristics similar to tiRNAs and spliRNAs. (A) S. cerevisiae small RNA average density is plotted with respect to the average density of nucleosomes downstream of all annotated Refgene transcription start sites. Peak small RNA density coincides with local nucleosome density minima. (B) Consistent with previous analysis, S. cerevisiae, which lack the miRNA and siRNA pathways, have a distribution of small RNAs that peaks at ~17-18 nucleotides. These small RNAs exhibit a random distribution of 5' end nucleotides. (C) Like tiRNAs and spliRNAs, S. cerevisiae small RNAs show a purine 3' nucleotide bias.
FIGURE 25. KEGG pathway enrichment analysis for genes with spliRNAs in normal adjacent tissue.
FIGURE 26. KEGG pathway enrichment analysis for genes with spliRNAs in DCIS tissue.
FIGURE 27. Splice-site RNA expression comparison per gene in the normal adjacent (i.e., control) versus the DCIS (i.e., cancer) tissues. Note the number of genes with differentially highly expressed spliRNAs in the DCIS tissue.
FIGURE 28. Genes with more than twofold greater spliRNA expression in the DCIS sample are significantly associated with breast cancer pathways in GeneSigDB.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
To investigate splice-site associated small RNAs in detail, the present inventors analysed the relationship between 5' splice-sites and small RNAs present in deep sequencing libraries from human and mouse cell lines, mouse primary granulocyte nuclei, a variety of nematode worm Caenorhabditis elegans (C. elegans), fruit fly Drosophila melanogaster (D. melanogaster) and marine sponge Amphimedon queenslandica (A. queenslandica) primary tissues.
The present invention arises from the finding of a novel class of "splice-site" RNA molecules (spliRNAs) that may be associated with gene regulatory mechanisms including, but not limited to, mRNA splicing, transcription, epigenetic modification, epigenetic regulation, chromatin modification and nucleosome positioning. The present inventors have surprisingly found that these spliRNAs may be distinguished from any previously identified class of small RNAs based on their location in the genome and the features they are associated with. In a particular embodiment, the spliRNA may be characterized as comprising a nucleotide sequence that (i) corresponds to a nucleotide sequence that is located at or near a 5' splice-site of an internal exon of a genomic DNA sequence, and (ii) corresponds to a 3' nucleotide sequence of the internal exon of the genomic DNA sequence.
It will be appreciated that these small RNA molecules exhibit different characteristics to the small non-coding RNA molecules (e.g. , miRNAs, piRNAs and tiRNAs) previously identified. The present invention is based on the inventors' identification of spliRNA molecules, the manipulation of these spliRNAs and the use of spliRNAs to characterize their role and function in cells. The invention also concerns methods and compositions for identifying spliRNAs, arrays comprising spliRNAs (spliRNA arrays) and use of spliRNAs for diagnostic, therapeutic and prognostic applications in mammals, particularly humans.
For the purposes of this invention, by "isolated" is meant present in an environment removed from a natural state or otherwise subjected to human manipulation. Isolated material may be substantially or essentially free from components that normally accompany it in its natural state, or may be manipulated so as to be in an artificial state together with components that normally accompany it in its natural state. The term "isolated" also encompasses terms such as "enriched", "purified", "synthetic" and/or "recombinant".
The term "nucleic acid" as used herein designates single- or double-stranded mRNA, RNA, cRNA, RNAi and DNA inclusive of cDNA and genomic DNA. Nucleic acids may comprise naturally-occurring nucleotides or synthetic, modified or derivatized bases (e.g., inosine, methylinosine, pseudouridine, methylcytosine etc). Nucleic acids may also comprise chemical moieties coupled thereto. Examples of chemical moieties include, but are not limited to, biotin, locked nucleic acids (LNAs), peptide nucleic acids (PNAs), cholesterol, 2'-O-methyl, Morpholino, and fluorophores such as HEX, FAM, Fluorescein and FITC.
According to a first aspect, the invention provides an isolated substantially single-stranded RNA molecule (referred to herein as a "splice-site RNA" or "spliRNA") that comprises a nucleotide sequence comprising no more than 26 contiguous nucleotides that correspond to a genomic DNA sequence associated with gene regulation, wherein said nucleotide sequence comprises a nucleotide sequence corresponding to a 3' nucleotide sequence of an internal exon of said genomic DNA sequence.
In this context, "corresponding to" and "corresponds to" means that the spliRNA molecule has a nucleotide sequence of, or a sequence complementary to, a genomic DNA nucleotide sequence. It will be appreciated that this definition should take into account that RNA uses a U instead of a T, as found in DNA.
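By way of non-limiting illustration only, the "corresponds to" relationship defined above may be sketched in Python. The function names (`dna_to_rna`, `reverse_complement_rna`, `corresponds_to`) are hypothetical and do not appear in the specification; the sketch assumes exact, ungapped matching against a single genomic strand written 5' to 3'.

```python
# Illustrative sketch only: all names here are hypothetical helpers.

# Complement table in the RNA alphabet.
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def dna_to_rna(dna: str) -> str:
    """Write a DNA sequence in the RNA alphabet (T becomes U)."""
    return dna.upper().replace("T", "U")


def reverse_complement_rna(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return rna.upper().translate(_COMPLEMENT)[::-1]


def corresponds_to(rna: str, genomic_dna: str) -> bool:
    """True if the RNA has the sequence of, or a sequence complementary to,
    some stretch of the given genomic strand (exact match assumed)."""
    rna = rna.upper()
    strand = dna_to_rna(genomic_dna)
    return rna in strand or reverse_complement_rna(rna) in strand
```

Under these assumptions, an RNA read is first compared with the genomic strand rewritten with U in place of T, and then its reverse complement is compared, which covers the "complementary to" branch of the definition.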
Typically, the spliRNA comprises a nucleotide sequence that corresponds to a sense strand of the internal exon of the genomic DNA sequence.
In one preferred form, said spliRNA molecule terminates in a 3' nucleotide sequence that corresponds to the 3' end of the internal exon of the genomic DNA sequence.
In another preferred form, said isolated RNA molecule consists of a nucleotide sequence that corresponds to the 3' end of the internal exon of the genomic DNA sequence.
In one preferred form, the spliRNA molecule comprises a nucleotide sequence comprising 14-26 contiguous nucleotides.
In another preferred form, the spliRNA molecule comprises a nucleotide sequence comprising 14-20 contiguous nucleotides.
In yet another preferred form, the spliRNA molecule comprises a nucleotide sequence comprising 16-18 contiguous nucleotides.
In still yet another preferred form, the spliRNA molecule comprises a nucleotide sequence comprising 17 or 18 contiguous nucleotides.
It will be appreciated that the length of the spliRNA molecule may vary depending on the genome that it is located in and/or derived from. For example, a spliRNA molecule of a human genome may comprise on average 17 or 18 nucleotides while a spliRNA molecule of a different eukaryotic genome (e.g. , mouse and Drosophila) may comprise on average 17 nucleotides, although without limitation thereto.
Typically, the isolated RNA molecule comprises a nucleotide sequence that corresponds to a nucleotide sequence that is located at or near a 5' splice site of the internal exon of the genomic DNA sequence.
Preferably, the isolated RNA molecule consists of a nucleotide sequence that corresponds to a nucleotide sequence that is located at the 5' splice site of the internal exon of the genomic DNA sequence.
Typically, the spliRNA does not encode a peptide or a protein encoded by a genome. Accordingly, the spliRNA comprises a nucleotide sequence that is referred to herein as "non-coding".
The spliRNAs are typically not linked to the pathways that produce miRNAs, siRNAs and piRNAs. Thus, in contrast to other small non-coding RNA molecules (e.g., miRNAs, siRNAs and piRNAs), spliRNAs typically do not require Dicer and/or Dgcr8 for their processing and/or production.
It will be appreciated that the spliRNAs are expressed in most tissues and developmental stages in eukaryotes such as, but not limited to, humans, mice, Drosophila and C. elegans. In some cases, the spliRNAs are enriched, increased or otherwise more prevalent in actively differentiating and/or undifferentiated tissues. For example, the expression of a spliRNA may be higher in a Drosophila head compared to in a Drosophila body. It will also be appreciated that the expression of the spliRNA may be upregulated, induced, elevated or otherwise increased in an embryonic sponge compared to in an adult sponge (e.g., the marine sponge A. queenslandica). It will also be appreciated that genes associated with spliRNAs may be more highly expressed compared to genes that lack or have a low abundance of spliRNAs.
While in one embodiment the spliRNA molecule has a nucleotide sequence transcribed from the corresponding DNA sequence, it will be appreciated that said spliRNA molecule may be chemically-synthesized de novo, rather than transcribed from a DNA sequence.
Chemical synthesis of RNA is well known in the art. Non-limiting examples include RNA synthesis using TOM amidite chemistry, 2-cyanoethoxymethyl (CEM), a 2'-hydroxyl protecting group, and fast oligonucleotide deprotecting groups.
As hereinbefore described, the nucleotide sequence of a spliRNA molecule is typically GC rich. By this is meant that the percent GC content of the nucleotide sequence is substantially greater than the average GC content of the genome from which the spliRNA is derived. This GC content also differs from that of miRNAs.
Preferably, the GC content of spliRNAs is greater than about 50-70%, or greater than about 51-60%.
Typically, the GC content of spliRNAs is greater than about 51%, greater than about 52%, greater than about 53%, or greater than about 54% compared to about 50% for miRNAs and compared to about 70% for tiRNAs.
It will be appreciated that this comparison is organism dependent hence the actual GC content will vary for spliRNAs of each different organism.
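As a non-limiting illustration, the percent GC content referred to above may be computed as the number of G and C residues divided by the sequence length. The helper name `gc_content` and the example sequence in the usage note are hypothetical and are not taken from the sequence listing.

```python
# Illustrative sketch only: gc_content is a hypothetical helper.

def gc_content(seq: str) -> float:
    """Return the percentage of G and C residues in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc = sum(base in "GC" for base in seq)
    return 100.0 * gc / len(seq)
```

For example, `gc_content("GGCAUCCGAGGCUACAG") > 54.0` would test a made-up 17-nucleotide sequence against the "greater than about 54%" figure given above. Because the comparison is organism dependent, any such threshold should be chosen relative to the genome concerned.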
In some embodiments, a spliRNA may be associated with the maintenance and/or initiation of nucleosome positioning.
In certain embodiments, a spliRNA may be located at a region of a genome with (i) RNA polymerase II binding and/or (ii) transcription elongation factor IIS (TFIIS) activity.
In some cases, spliRNA expression may be at least partly upregulated, elevated or otherwise increased at exons located upstream of short introns.
Thus, spliRNA expression at an exon located upstream of an intron that comprises 60-120 nucleotides or less may be, for example, 25%, 50%, 75%, 100%, 150%, 200%, 250%, 300%, 350%, 400%, 450% and up to about 500% increased compared to spliRNA expression at an exon located upstream of an intron that comprises 121 nucleotides or more.
In some cases spliRNA expression may be at least partly reduced, decreased or otherwise down-regulated at short exons. Consequently, spliRNA expression may be, for example, 25%, 50%, 75%, 100%, 150%, 200%, 250%, 300%, 350%, 400%, 450% and up to about 500% reduced at an exon comprising 60 nucleotides or less compared to spliRNA expression at an exon comprising 61 nucleotides or more.
Typically, although not exclusively, spliRNA molecules do not form secondary structures, such as stem and loop structures. Accordingly, spliRNA molecules are substantially free of internal base-pairing. In this context, by "substantially free" is meant fewer than 3, 2 or 1 internal base pairs.
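The specification does not state how internal base pairs are to be counted. Purely as an assumed, non-limiting sketch, one conservative screen is the Nussinov maximum-pairing recurrence, which returns the largest number of nested base pairs a sequence could form. The names `max_base_pairs` and `substantially_free_of_pairing`, the inclusion of G-U wobble pairs, and the minimum hairpin loop of three nucleotides are all assumptions of this sketch.

```python
# Illustrative sketch only. Nussinov-style dynamic programming is swapped in
# here as an assumed screen; it is not a method described in the specification.

# Watson-Crick pairs plus the G-U wobble pair (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(rna: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs, with at least `min_loop`
    unpaired nucleotides enclosed by any hairpin."""
    rna = rna.upper().replace("T", "U")
    n = len(rna)
    if n == 0:
        return 0
    # best[i][j] holds the maximum pair count within rna[i..j].
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]  # case: position i is left unpaired
            for k in range(i + min_loop + 1, j + 1):
                if (rna[i], rna[k]) in PAIRS:  # case: i pairs with k
                    inside = best[i + 1][k - 1]
                    outside = best[k + 1][j] if k < j else 0
                    score = max(score, 1 + inside + outside)
            best[i][j] = score
    return best[0][n - 1]


def substantially_free_of_pairing(rna: str, limit: int = 3) -> bool:
    """True if fewer than `limit` internal base pairs are possible."""
    return max_base_pairs(rna) < limit
```

Because this recurrence maximizes pair count without regard to thermodynamic stability, it overestimates the pairing actually adopted in solution; a sequence that passes this screen is unlikely to form a stem-loop, but a sequence that fails it may still be unstructured.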
Therefore, in one particularly preferred embodiment, the invention provides an isolated substantially single-stranded spliRNA molecule, wherein said isolated spliRNA molecule comprises a nucleotide sequence that:
(i) consists of 17 or 18 contiguous nucleotides that correspond to a sense strand of an internal exon of a mammalian genomic DNA sequence located at a 5' splice site;
(ii) corresponds to a 3' end of the internal exon of the mammalian genomic DNA sequence;
(iii) comprises a GC content greater than 54%; and
(iv) is substantially free of internal base-pairing.
Preferably, the mammalian genomic sequence is of, or obtainable from, a human genome.
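By way of illustration only, criteria (i) and (iii) above could be screened computationally as in the following minimal sketch. The function names and the example sequence are hypothetical and are not taken from the sequence listing; criteria (ii) and (iv) require genome annotation and structure prediction, respectively, and are not checked here.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in an RNA or DNA sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def meets_length_and_gc_criteria(seq: str) -> bool:
    """True if the sequence is 17 or 18 nt long and has a GC content above 54%."""
    return len(seq) in (17, 18) and gc_fraction(seq) > 0.54


# Hypothetical 17-nt sequence used only to exercise the functions.
example = "GGCUGCAGGAGCUGAAG"
passes = meets_length_and_gc_criteria(example)
```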
Non-limiting examples of the isolated spliRNA molecules of the invention are set forth in SEQ ID NOs: 1-6884 (Fig. 1 (human)) and SEQ ID NOs: 6885-16,898 (Fig. 2A-C (mouse, C. elegans and Drosophila)).
Typically, although not exclusively, the isolated spliRNA molecule comprises a nucleotide sequence that is located in, or obtainable from, a cell nucleus. It will also be appreciated that the invention contemplates nucleic acid molecules (e.g., RNA or DNA) complementary to or at least partly complementary to the spliRNA molecules of the invention. Complementary or at least partly complementary nucleic acid molecules may be in DNA or RNA form.
By "at least partly complementary" is meant having at least 60%, at least 70%, at least 75%, at least 80%, at least 90%, or at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with a nucleotide sequence of a spliRNA molecule.
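By way of illustration only, one way to score partial complementarity is to compare a candidate molecule, position by position, with the exact reverse complement of the spliRNA. The following minimal sketch assumes an ungapped comparison of equal-length sequences, which is an assumption of this example rather than a requirement of the definition above; the function names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an RNA sequence (DNA input is read as RNA, T -> U)."""
    rna = seq.upper().replace("T", "U")
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def percent_complementarity(spli_rna: str, candidate: str) -> float:
    """Percentage of positions at which candidate matches the perfect reverse complement."""
    target = reverse_complement(spli_rna)
    query = candidate.upper().replace("T", "U")
    if len(query) != len(target):
        raise ValueError("this sketch only compares sequences of equal length")
    matches = sum(a == b for a, b in zip(query, target))
    return 100.0 * matches / len(target)
```

Under this scoring, a candidate would count as "at least partly complementary" at the lowest stated threshold if the returned value is 60 or more.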
The invention also provides a modified spliRNA molecule.
A modified spliRNA may be altered by, complexed with, labeled with or otherwise covalently or non-covalently coupled to one or more other chemical entities. In some embodiments, the chemical entity may be bonded, linked or otherwise attached directly to the spliRNA, or it may be bonded, linked or otherwise attached to the spliRNA via a linking group (e.g., a spacer).
Examples of such chemical entities include, but are not limited to, incorporation of modified bases (e.g., inosine, methyl inosine, pseudouridine and morpholino), sugars and other carbohydrates such as 2'-O-methyl and locked nucleic acids (LNA), amino groups and peptides (e.g., peptide nucleic acids (PNA)), biotin, cholesterol, fluorophores (e.g., FITC, Fluorescein, Rhodamine, HEX, FAM, TET and Oregon Green), radionuclides and metals, although without limitation thereto (Fabani and Gait, 2008; You et al., 2006; Summerton and Weller, 1997). A more complete list of possible chemical modifications can be found at http://www.oligos.com/ModificationsList.htm.
In one particular embodiment, the modified spliRNA is an "antisense inhibitor". By "antisense inhibitor" is meant a nucleic acid sequence that is either complementary to or at least partly complementary to the spliRNA molecule (Dias and Stein, 2002; Kurreck, 2003; Sahu et al., 2007). The antisense inhibitor pairs with the spliRNA and interferes with interactions such as, but not limited to, spliRNA-mRNA and spliRNA-DNA interactions. Sequence-specific inhibition of small RNA function has previously been demonstrated both in vitro (Meister et al., 2004; Hutvagner et al., 2004) and in vivo (Krützfeldt et al., 2005).
In another particular embodiment, the modified spliRNA is a "point mutant". By "point mutant" is meant a spliRNA molecule where 1 or 2 nucleotides have been removed, substituted or otherwise altered. Point mutants of spliRNAs or their targets can be employed to study the function of spliRNAs in disease or to increase the affinity of spliRNAs to variant targets. Small RNA molecules involved in disease processes, including spliRNAs, may have "seed-sequences". By "seed-sequences" is meant nucleic acid sequences that comprise 2-7 nucleotides and are involved in target recognition (Lewis et al., 2003; Lewis et al., 2005). Increasing the mismatch in these sequences is predicted to significantly decrease the gene regulation function of spliRNAs. This approach may be applicable for partial inhibition of spliRNA targets.
In yet another particular embodiment, the modified spliRNA molecule is a "spliRNA sponge". By "spliRNA sponge" is meant a genetically encoded competitive spliRNA inhibitor that may be stably expressed in a cell, such as a eukaryotic cell. The spliRNA sponge binds to the spliRNA, thereby preventing it from binding its mRNA target, in a technique called "sponging". The spliRNA sponges may be produced using methods such as the ones described in Cohen, 2009, Ebert et al., 2007, Hammond, 2007 and Rooij et al., 2008. It will be appreciated that a spliRNA sponge may bind to, soak up and/or inhibit a specific spliRNA and/or a family of spliRNAs.
In still yet another particular embodiment, the modified spliRNA is a "spliRNA mimic". A "spliRNA mimic" is a single-stranded RNA oligonucleotide that is complementary to or at least partly complementary to the spliRNA. The spliRNA mimic may inactivate pathological spliRNAs through complementary base-pairing. It will also be appreciated that chemical modification to LNA, PNA or morpholino and conjugation to cholesterol may stabilize the spliRNA mimic molecule and facilitate delivery of single-stranded RNA molecules to targets following intravenous administration (Rooij and Olson, 2007).
The invention also provides a fragment of a spliRNA of the invention. By "fragment" is meant a portion, domain, region or sub-sequence of a spliRNA molecule which comprises one or more structural and/or functional characteristics of a spliRNA molecule. By way of example only, a fragment may comprise at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16 or at least 17 nucleotides of a spliRNA molecule.
It will be appreciated that the spliRNA molecules can be chemically modified to facilitate penetration into cells. Examples of such modifications include, but are not limited to, conjugation to cholesterol, Morpholino, 2'-O-methyl, PNA or LNA (Partridge et al., 1996; Corey and Abrams, 2001; os et al., 2003).
Modified spliRNA molecules also include "variants" of the spliRNA molecules of the invention. Variants include RNA or DNA molecules comprising a nucleotide sequence at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identical to a nucleotide sequence of a spliRNA molecule such as described in Fig. 1 and Fig. 2. Such variants may include one or more point mutations, nucleotide substitutions, deletions or additions.
According to another aspect, there is provided a genetic construct comprising or encoding one or a plurality of the same or different spliRNA molecules, modified spliRNA molecules, at least partly complementary DNA or RNA molecules, or fragments thereof.
It will be appreciated that said spliRNA molecules may be oriented in tandem repeats or with multiple copies of each spliRNA sequence.
As used herein, a "genetic construct" is any artificially constructed nucleic acid molecule comprising heterologous nucleotide sequences.
A genetic construct is typically in DNA form, such as a phage, plasmid, cosmid, artificial chromosome (e.g., a YAC or BAC), although without limitation thereto. The genetic construct suitably comprises one or more additional nucleotide sequences, such as for assisting propagation and/or selection of bacterial or other cells transformed or transfected with the genetic construct.
In one particular embodiment, the genetic construct is a DNA expression construct that comprises one or more regulatory sequences that facilitate transcription of one or more spliRNA molecules, modified spliRNA molecules or fragments thereof. Such regulatory sequences may include promoters, enhancers, polyadenylation sequences and splice donor/acceptor sites, although without limitation thereto.
Suitable promoters may be selected according to the cell or organism in which the spliRNA molecule is to be expressed. Promoters may be selected to facilitate constitutive, conditional, tissue-specific, inducible or repressible expression as is well understood in the art. Examples of promoters are T7, SP6, SV40, PolIII, U6, H1 and 7S, although without limitation thereto.
It will be appreciated that the spliRNA molecule may be provided as an encoding DNA sequence in an expression construct that, when transcribed, produces the spliRNA molecule as a transcript.
It will also be appreciated that spliRNA molecules appear to be a hitherto unknown form of small, single stranded RNA molecules that occur throughout evolution. Accordingly, spliRNA molecules may be isolated, identified, purified or otherwise obtained from a number of different organisms.
Preferably, the organism is a eukaryote.
More preferably, the organism is a metazoan inclusive of all multi-celled animals ranging from marine sponge to insects and vertebrates.
Even more preferably, the organism is a vertebrate, inclusive of mammals, avians such as chickens and ducks and aquaculture species such as fish, although without limitation thereto.
Even more preferably, the organism is a mammal.
Mammals include humans, livestock such as horses, pigs, cows and sheep, domestic animals such as cats and dogs, although without limitation thereto.
In further aspects, the invention therefore provides methods of identifying, purifying or otherwise obtaining a spliRNA molecule.
Broadly, such methods may include analysis of nucleic acid samples obtained from an organism, and/or bioinformatic analysis of genome sequence information.
Preferably, the nucleic acid samples are derived from the genome of a eukaryote. More preferably, the nucleic acid samples are derived from the genome of a metazoan inclusive of marine sponge, insects and vertebrates.
Even more preferably, the nucleic acid samples are derived from the genome of a vertebrate, inclusive of mammals, avians such as chickens and ducks and aquaculture species such as fish, although without limitation thereto.
Even more preferably, the nucleic acid samples are derived from the genome of a mammal.
Mammals include humans, livestock such as horses, pigs, cows and sheep, domestic animals such as cats and dogs, although without limitation thereto.
Preferably, methods for analyzing a nucleic acid sample to identify a spliRNA include "deep sequencing" and mapping strategies that consider exon-exon and/or exon-intron boundaries and multi-mapping deep sequencing reads. Examples of specific deep sequencing technologies employed for the identification of exon boundaries and spliRNAs include, but are not limited to, 454™-, Helicos-, PacBio-, Solexa/Illumina- and SOLiD-sequencing.
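By way of illustration only, a mapping strategy that considers exon boundaries might count reads whose 3' end coincides with the 3' end of an annotated exon, as in the following minimal sketch. The coordinate convention (0-based, half-open), the data structures and the 17-18 nt length filter are assumptions of this example, and multi-mapping reads are not handled.

```python
from collections import Counter
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


def three_prime_end(iv: Interval) -> int:
    """Genomic coordinate of the last transcribed base of the interval."""
    return iv.end - 1 if iv.strand == "+" else iv.start


def count_reads_at_exon_ends(exons, reads):
    """Count 17-18 nt reads whose 3' end coincides with an exon 3' end on the same strand."""
    exon_by_end = {(e.chrom, e.strand, three_prime_end(e)): e for e in exons}
    counts = Counter()
    for read in reads:
        exon = exon_by_end.get((read.chrom, read.strand, three_prime_end(read)))
        if exon is not None and 17 <= read.end - read.start <= 18:
            counts[exon] += 1
    return counts


# Hypothetical exons and read alignments used only to exercise the function.
exons = [Interval("chr1", 100, 200, "+"), Interval("chr1", 500, 600, "-")]
reads = [
    Interval("chr1", 182, 200, "+"),  # 18 nt, ends at the plus-strand exon 3' end
    Interval("chr1", 500, 517, "-"),  # 17 nt, ends at the minus-strand exon 3' end
    Interval("chr1", 150, 168, "+"),  # internal to the exon, not counted
]
counts = count_reads_at_exon_ends(exons, reads)
```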
In particular embodiments that relate to bioinformatic analyses of genome sequence information, the invention provides a computer-readable storage medium or device encoded with structural information of one or more spliRNA molecules. The structural information may be nucleotide sequence, sequence length, GC content and/or proximity to a 5' splice site of an internal exon, although without limitation thereto.
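By way of illustration only, the structural information listed above could be held in a simple record such as the following minimal sketch. The class name, field names and example values are hypothetical; sequence length and GC content are derived from the stored sequence rather than stored separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpliRNARecord:
    seq_id: str                      # hypothetical identifier, e.g. a SEQ ID NO label
    sequence: str                    # nucleotide sequence
    distance_to_5p_splice_site: int  # nt between the RNA 3' end and the 5' splice site

    @property
    def length(self) -> int:
        """Sequence length in nucleotides."""
        return len(self.sequence)

    @property
    def gc_content(self) -> float:
        """Fraction of G and C bases in the sequence."""
        seq = self.sequence.upper()
        return sum(base in "GC" for base in seq) / len(seq)


# Hypothetical record used only to exercise the class.
record = SpliRNARecord("example-1", "GGCUGCAGGAGCUGAAG", 0)
```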
A computer-readable storage medium may have computer readable program code components stored thereon for programming a computer (e.g., any device comprising a processor) to perform a method as described herein. Examples of such computer-readable storage media include, but are not limited to, a hard disk, a CD-ROM, an optical storage device, a magnetic storage device, a ROM (Read Only Memory), a PROM (Programmable Read Only Memory), an EPROM (Erasable Programmable Read Only Memory), an EEPROM (Electrically Erasable Programmable Read Only Memory) and a Flash memory. Further, it is expected that one having ordinary skill in the art, notwithstanding possibly significant effort and many design choices motivated by, for example, available time, current technology, and economic considerations, when guided by the concepts and principles disclosed herein will be readily capable of implementing the invention by generating necessary software instructions, programs and/or integrated circuits (ICs) with minimal experimentation.
Typically, the computer-readable storage medium or device is part of a computer or computer network capable of interrogating, searching or querying a genome sequence database.
In one example, a bioinformatic method may utilize a high performance computing station which houses a local mirror of the UCSC Genome Browser.
One further aspect of the invention provides antibodies which bind, recognize and/or have been raised against a spliRNA of the invention, inclusive of fragments and modified spliRNA molecules.
Antibodies may be monoclonal or polyclonal. Antibodies also include antibody fragments such as Fc fragments, Fab and F(ab')2 fragments, diabodies and ScFv fragments. Antibodies may be made in a suitable production animal such as a mouse, rat, rabbit, sheep, chicken or goat.
The invention also contemplates recombinant methods of producing antibodies and antibody fragments. For example, antibodies to RNA molecules have been produced by a method utilizing a synthetic phage display library approach to select RNA-binding antibody fragments (Ye et al, 2008).
As is well understood in the art, antibodies may be conjugated with labels selected from a group including an enzyme, a fluorophore, a chemiluminescent molecule, biotin, radioisotope or other label.
Examples of suitable enzyme labels useful in the present invention include alkaline phosphatase, horseradish peroxidase, luciferase, β-galactosidase, glucose oxidase, lysozyme, malate dehydrogenase and the like. The enzyme label may be used alone or in combination with a second enzyme in solution or with a suitable chromogenic or chemiluminescent substrate.
Examples of chromogens include diaminobenzidine (DAB), permanent red, 3-ethylbenzthiazoline sulfonic acid (ABTS), 5-bromo-4-chloro-3-indolyl phosphate (BCIP), nitro blue tetrazolium (NBT), 3,3',5,5'-tetramethyl benzidine (TMB) and 4-chloro-1-naphthol (4-CN), although without limitation thereto. A non-limiting example of a chemiluminescent substrate is Luminol™, which is oxidized in the presence of horseradish peroxidase and hydrogen peroxide to form an excited state product (3-aminophthalate).
Fluorophores may be fluorescein isothiocyanate (FITC), tetramethylrhodamine isothiocyanate (TRITC), allophycocyanin (APC), Texas Red (TR), Cy5 or R-Phycoerythrin (RPE), although without limitation thereto.
Radioisotope labels may include 123I, 131I, 51Cr and 99Tc, although without limitation thereto.
Other antibody labels that may be useful include colloidal gold particles and digoxigenin.
In other aspects, the invention provides a method of identifying a spliRNA expression profile as a quantitative or qualitative indicator or measure of gene regulation. These methods may be particularly, although not exclusively, relevant to diagnosis of diseases and conditions associated with differential gene regulation.
In one particular embodiment, said spliRNA expression profile is an indicator and/or measure of mRNA splicing activity.
In another particular embodiment, said spliRNA expression profile is an indicator of particular exon inclusion as a result of mRNA splicing activity.
In another particular embodiment, said spliRNA expression profile is an indicator and/or measure of gene transcriptional activity.
In yet another particular embodiment, said spliRNA expression profile is an indicator and/or measure of epigenetic modulatory and/or regulatory activity.
In still another particular embodiment, said spliRNA expression profile is an indicator and/or measure of chromatin modification activity.
In still yet another particular embodiment, said spliRNA expression profile is an indicator and/or measure of nucleosome positioning activity.
In one embodiment, the method uses a "nucleic acid array" (spliRNA array).
By "nucleic acid array" is meant a plurality of nucleic acids, preferably ranging in size from 10, 15, 20 or 50 bp to 250, 500, 700 or 900 kb, immobilized, affixed or otherwise mounted to a substrate or solid support. Typically, each of the plurality of nucleic acids has been placed at a defined location, either by spotting or direct synthesis. In array analysis, a nucleic acid-containing sample is labeled and allowed to hybridize with the plurality of nucleic acids on the array. Nucleic acids attached to arrays are referred to as "targets" whereas the labelled nucleic acids comprising the sample are called "probes". Based on the amount of probe hybridized to each target spot, information is gained about the specific nucleic acid composition of the sample. The major advantage of gene arrays is that they can provide information on thousands of targets in a single experiment and are most often used to monitor gene expression levels and "differential expression".
"Differential expression" indicates whether the level of a particular spliRNA in a sample is higher or lower than the level of that particular spliRNA in a normal or reference sample.
The physical area occupied by each sample on a nucleic acid array is usually 50-200 μm in diameter; thus nucleic acid samples representing entire genomes, ranging from 3,000-32,000 genes, may be packaged onto one solid support. Depending on the type of array, the arrayed nucleic acids may be composed of oligonucleotides, PCR products or cDNA vectors or purified inserts. The sequences may represent entire genomes and may include both known and unknown sequences or may be collections of sequences such as miRNAs. Using array analysis, the expression profiles of normal and diseased tissues, treated and untreated cell cultures, developmental stages of an organism or tissue, and different tissues can be compared.
In one embodiment, gene profiling, such as but not limited to using a spliRNA array, is used to identify mRNAs whose expression shows a positive or inverse correlation with the expression of a specific spliRNA.
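By way of illustration only, a comparison of a sample against a normal or reference sample is often summarised as a log2 fold change. The following minimal sketch assumes a pseudocount of 1 and a two-fold threshold, both of which are choices of this example and are not specified in the text; the function names are hypothetical.

```python
import math


def log2_fold_change(sample: float, reference: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of sample to reference signal, with a pseudocount to avoid division by zero."""
    return math.log2((sample + pseudocount) / (reference + pseudocount))


def classify(sample: float, reference: float, threshold: float = 1.0) -> str:
    """Label a spliRNA as higher, lower or unchanged relative to the reference sample."""
    lfc = log2_fold_change(sample, reference)
    if lfc >= threshold:
        return "higher"
    if lfc <= -threshold:
        return "lower"
    return "unchanged"
```

For example, a signal of 7 in the sample against 1 in the reference gives a log2 fold change of 2 under these assumptions and would be labelled "higher".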
It will be appreciated that an absence of spliRNA expression could correlate with a presence of mRNA expression, or vice versa. Alternatively, a presence of spliRNA expression could correlate with a presence of mRNA expression or an absence of spliRNA expression could correlate with an absence of mRNA expression. Furthermore, a level of spliRNA expression could correlate with a level of mRNA expression, whether directly or inversely. It will be appreciated that a level of expression may be measured as a quantitative or a relative expression level.
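By way of illustration only, a direct or inverse relationship between spliRNA and mRNA expression levels across samples could be quantified with a Pearson correlation coefficient, as in the following minimal sketch. The choice of Pearson correlation and the expression values are assumptions of this example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists of expression values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical expression levels of one spliRNA and two mRNAs across four samples.
spli_rna = [1.0, 2.0, 3.0, 4.0]
mrna_direct = [2.0, 4.0, 6.0, 8.0]   # rises with the spliRNA
mrna_inverse = [8.0, 6.0, 4.0, 2.0]  # falls as the spliRNA rises

r_direct = pearson(spli_rna, mrna_direct)
r_inverse = pearson(spli_rna, mrna_inverse)
```

A coefficient near +1 indicates a direct correlation and a coefficient near -1 an inverse correlation.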
In another embodiment, gene profiling allows the identification of regulators of disease processes and potential therapeutic targets.
Examples of diseases and conditions that show differential gene regulation include but are not limited to Crohn's disease, Alzheimer's disease, Parkinson's disease, schizophrenia, infertility, rheumatoid arthritis, myocardial infarction, diabetes, congenital developmental disorders, coronary heart disease, and cancer such as breast cancer, lymphoma, leukemia, colorectal cancer, gastric cancer, ovarian cancer, aggressive metastatic brain cancer, and pituitary tumors (McKaig et al., 2003; Grünblatt et al., 2007; Liang et al., 2008; Lübke et al., 2008; Ridker, 2007; Zecchini et al., 2008; Perrin et al., 2007; Zumodio et al., 2008).
It will be appreciated that said gene regulation may refer to aberrant gene transcription, aberrant mRNA splicing, aberrant epigenetic modification, aberrant epigenetic regulation, aberrant chromatin modification and/or aberrant nucleosome positioning.
Further, spliRNAs may be associated with aberrant regulatory activity of oncogenes or tumor suppressors (Zhang et al., 2006) and may therefore become useful biomarkers for cancer diagnostics.
It will be appreciated that said aberrant regulatory activity may in some embodiments refer to activities such as transcription, mRNA splicing, epigenetic modification, epigenetic regulation, chromatin modification and/or nucleosome positioning.
In one particular embodiment, the spliRNAs may be associated with oncogenes such as myc*, Bcl-2 and -3*, myb*, mdm2*, mdmx*, and ras.
In another particular embodiment, the spliRNAs may be associated with a tumour suppressor gene, for example, p21 and/or p53 (see, e.g., Sotos-Reyes & Recillas-Targa, 2010).
In another particular embodiment, the spliRNAs may be linked to aberrant mRNA splicing of genes associated with diseases and conditions, such as Dnmt1* and Dnmt3* associated with cancer progression, and APP and beta amyloid in Alzheimer's disease. In yet another particular embodiment, the spliRNAs may be linked to aberrant epigenetic modification and/or regulation of genes associated with various cancers, α-Thalassaemia, and Prader-Willi, ATR-X, Fragile X, ICF, Angelman's, and Rett syndromes.
Other methods of the invention, including but not limited to the herein mentioned spliRNA array, relate to diagnostic applications of the claimed nucleic acid molecules. For example, spliRNAs may be detected in biological samples in order to determine and classify certain cell types or tissue types or spliRNA-associated pathogenic disorders which are characterized by differential expression of spliRNA molecules or spliRNA molecule patterns. Further, the developmental stage of cells, organs and/or tissues may be classified by determining spatial and/or temporal expression patterns of spliRNA molecules.
In another aspect, the invention provides a method of treating a disease or condition in an animal, said method including the step of administering to the animal a therapeutic agent selected from the group consisting of:
(i) an isolated spliRNA molecule;
(ii) a fragment of the isolated spliRNA molecule;
(iii) a modified spliRNA molecule;
(iv) an at least partly complementary RNA or DNA molecule; and/or
(v) an antibody that binds any one of (i)-(iv);
to thereby treat said disease or condition.
Accordingly, the aforementioned therapeutic agents may be suitable for prophylaxis and/or therapy of animals, including mammals such as humans. For example, the therapeutic agents may be used to treat diseases, conditions, developmental processes and/or disorders associated with developmental dysfunctions. Certain spliRNAs may function as tumour-suppressors and thus expression or delivery of these spliRNAs or "spliRNA mimics" to tumor cells may provide therapeutic efficacy.
In one embodiment, the use of chemically modified spliRNAs to target either a specific spliRNA or to disrupt the binding of a spliRNA and its specific mRNA target in vivo may provide a potentially effective means of inactivating pathological spliRNAs.
Alternatively, spliRNAs may be administered to potentiate the effects of natural spliRNAs by promoting the expression of beneficial gene products such as tumour suppressor proteins (Rooij and Olson, 2007).
Therapeutic agents may be delivered to an animal in the form of a pharmaceutical composition comprising a pharmaceutically acceptable carrier, diluent or excipient.
Accordingly, the invention provides a pharmaceutical composition comprising a therapeutic agent selected from the group consisting of:
(i) an isolated spliRNA molecule;
(ii) a fragment of the isolated spliRNA molecule;
(iii) a modified spliRNA molecule;
(iv) an at least partly complementary RNA or DNA molecule; and/or
(v) an antibody that binds any one of (i)-(iv);
and a pharmaceutically acceptable carrier, diluent or excipient.
By "pharmaceutically-acceptable carrier, diluent or excipient" is meant a solid or liquid filler, diluent or encapsulating substance that may be safely used in systemic administration. This includes carriers, diluents or excipients suitable for veterinary use.
Depending upon the particular route of administration, a variety of carriers well known in the art may be used. These carriers may be selected from a group including sugars, starches, cellulose and its derivatives, malt, gelatine, talc, calcium sulfate, vegetable oils, synthetic oils, polyols, alginic acid, phosphate buffered solutions, emulsifiers, isotonic saline and salts such as mineral acid salts including hydrochlorides, bromides and sulfates, organic acids such as acetates, propionates and malonates and pyrogen-free water.
A useful reference describing pharmaceutically acceptable carriers, diluents and excipients is Remington's Pharmaceutical Sciences (Mack Publishing Co. N.J. USA, 1991). Any safe route of administration may be employed for providing a patient with the composition of the invention. For example, oral, rectal, parenteral, sublingual, buccal, intravenous, intra-articular, intra-muscular, intra-dermal, subcutaneous, inhalational, intraocular, intraperitoneal, intracerebroventricular, transdermal and the like may be employed. Intra-muscular and subcutaneous injection is appropriate, for example, for administration of immunotherapeutic compositions, proteinaceous vaccines and nucleic acid vaccines. In the case of gene therapy, which contemplates the use of electroporation or liposomal transfection into tissues, the drug may be transfected into cells together with the DNA.
Dosage forms include tablets, dispersions, suspensions, injections, solutions, syrups, troches, capsules, suppositories, aerosols, transdermal patches and the like. These dosage forms may also include injecting or implanting controlled releasing devices designed specifically for this purpose or other forms of implants modified to act additionally in this fashion. Controlled release of the therapeutic agent may be achieved by coating the same, for example, with hydrophobic polymers including acrylic resins, waxes, higher aliphatic alcohols, polylactic and polyglycolic acids and certain cellulose derivatives such as hydroxypropylmethyl cellulose. In addition, the controlled release may be achieved by using other polymer matrices, liposomes and/or microspheres.
Compositions of the present invention suitable for oral or parenteral administration may be presented as discrete units such as capsules, sachets or tablets each containing a pre-determined amount of one or more therapeutic agents of the invention, as a powder or granules or as a solution or a suspension in an aqueous liquid, a non-aqueous liquid, an oil-in-water emulsion or a water-in-oil liquid emulsion. Such compositions may be prepared by any of the methods of pharmacy but all methods include the step of bringing into association one or more agents as described above with the carrier which constitutes one or more necessary ingredients. In general, the compositions are prepared by uniformly and intimately admixing the agents of the invention with liquid carriers or finely divided solid carriers or both, and then, if necessary, shaping the product into the desired presentation. The above compositions may be administered in a manner compatible with the dosage formulation, and in such amount as is pharmaceutically-effective. The dose administered to a patient, in the context of the present invention, should be sufficient to achieve a beneficial response in a patient over an appropriate period of time. The quantity of agent(s) to be administered may depend on the subject to be treated, inclusive of the age, sex, weight and general health condition thereof, factors that will depend on the judgement of the practitioner.
Methods and compositions may be used for treating diseases or conditions in any animal. Animals include and encompass fish, avians (e.g., chickens and other poultry) and mammals inclusive of humans, livestock, domestic pets and performance animals (e.g., racehorses), although without limitation thereto.
So that the invention may be readily understood and put into practical effect, reference is made to the following non-limiting examples.
EXAMPLES
Example 1 - Nuclear-localized tiny RNAs are associated with transcription initiation and splice sites in metazoans
Methods
THP-1 RNA isolation
THP-1 cells were grown in suspension culture according to previously published methods (Taft et al., 2009 (I); Suzuki et al., 2009) and harvested. Growth media was aspirated and the pellets were resuspended and washed twice in equivalent volumes of ice-cold PBS. The cells were then split into two equal volumes for the extraction of total RNA and the extraction of nuclear and cytoplasmic RNA. Total RNA was extracted using TRIzol (Invitrogen), according to the manufacturer's instructions.
Nuclear and cytoplasmic RNA was isolated as previously described (Hwang et al., 2007), except that each wash was carried out using 1 ml of wash buffer and tween-40 was substituted for tween-20 in the final wash. Additionally, to ensure complete clearing of any intact cells or nuclei, the cytoplasmic fraction was subjected to an additional centrifugation step at 1000 x g for 5 minutes at 4°C, after which the supernatant was transferred to clean tubes for RNA extraction. To validate the integrity of the cytoplasmic fraction, an aliquot was aspirated and inspected under an inverted light microscope to confirm that it was free from intact cells or nuclei. The nuclear fraction was similarly confirmed free from intact cells under an inverted light microscope by comparing an aliquot of nuclei after the final wash step to an aliquot of intact cells. RNA was extracted using TRIzol (Invitrogen) according to the manufacturer's instructions and the resulting RNA pellets were resuspended in equal volumes of ultra-pure water to obtain cell-equivalent concentrations.
Validation of THP-1 nuclear and cytoplasmic RNA fractionation
Seven RNA species were assessed by qPCR on cDNA prepared from total, cytoplasmic, and nuclear RNA fractions (for a list of targets, primers and results, see Table 5 and Fig. 5). A total of 6 μl from each cell-equivalent fraction was used (0.336 μg from the nuclear fraction, 1.74 μg from the cytoplasmic fraction and 3 μg from the total fraction). Each 6 μl fraction was DNase treated using TURBO DNase (Ambion) according to the manufacturer's instructions and then reverse transcribed using the Superscript III Reverse Transcriptase kit (Invitrogen). Reverse transcription was primed using random hexamers, which were added at a concentration of 250 ng per 5 μg RNA. RT negative reactions were carried out in parallel. PCR amplicons were generated from each of the qPCR targets, extracted from a 2% agarose gel using the Wizard SV Gel and PCR Clean-Up System kit (Promega) and quantified using a Nanodrop spectrophotometer. Each amplicon was then diluted to a concentration of 2 ng/μl and used as a template for generating qPCR standard curves as a 1/10 serial dilution series.
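By way of illustration only, the arithmetic implied by the quantities above can be sketched as follows. The number of dilution points is not stated in the text and is an assumption of this example, as are the function names.

```python
# RNA mass (μg) contained in the 6 μl taken from each cell-equivalent fraction.
FRACTION_UG = {"nuclear": 0.336, "cytoplasmic": 1.74, "total": 3.0}
VOLUME_UL = 6.0


def concentration_ug_per_ul(rna_ug: float, volume_ul: float = VOLUME_UL) -> float:
    """RNA concentration of a fraction in μg/μl."""
    return rna_ug / volume_ul


def hexamer_ng(rna_ug: float, ng_per_5_ug: float = 250.0) -> float:
    """Random hexamer mass (ng) at 250 ng per 5 μg of RNA."""
    return rna_ug * ng_per_5_ug / 5.0


def serial_dilution(start_ng_per_ul: float = 2.0, factor: float = 10.0, points: int = 5):
    """Standard-curve concentrations for a 1/10 serial dilution (number of points assumed)."""
    return [start_ng_per_ul / factor ** i for i in range(points)]


hexamers = {name: hexamer_ng(ug) for name, ug in FRACTION_UG.items()}
standards = serial_dilution()
```

Under these figures the total fraction is at 0.5 μg/μl and would receive 150 ng of random hexamers, and the standard curve starts at 2 ng/μl and falls ten-fold at each step.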
Quantitative PCR of the cDNA targets was carried out using the SYBR Green PCR Master Mix (Applied Biosystems) in 20 μl reactions, using 2 μl of cDNA, 4 μl of primers (2 μM), 4 μl water and 10 μl 2X master mix per reaction. The reactions were carried out in 96 well plates on an Applied Biosystems 7500 QPCR System in triplicate and also included both RT negative controls and water controls. Mean cDNA concentrations and standard deviations were assessed using the Applied Biosystems Sequence Detection Software v1.4.
Northern blots were completed by EDC cross-linking as previously described (Pall et al., 2007). Successful separation of cytoplasmic and nuclear THP-1 RNA fractions was assessed using probes against two snoRNAs and two tRNAs (Table 6, Fig. 6). Nuclear enrichment of miR-15 was assessed using a probe spanning the 5' 16 nucleotides (Table 6), which facilitated detection of both miR-15a and miR-15b without any increase in background. MicroRNA-16 was detected using a northern probe spanning its entire length (Table 6).
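By way of illustration only, the composition of the 20 μl qPCR reaction described above can be checked, and scaled to a pooled mix, as in the following minimal sketch. The 10% pipetting overage and the function name are assumptions of this example and are not part of the described method.

```python
# Per-reaction volumes (μl) as stated for the qPCR.
REACTION_UL = {"cDNA": 2.0, "primers_2uM": 4.0, "water": 4.0, "master_mix_2x": 10.0}

total_ul = sum(REACTION_UL.values())

# Final primer concentration (μM) after dilution of the 2 μM stock into the reaction.
final_primer_uM = REACTION_UL["primers_2uM"] * 2.0 / total_ul


def pooled_mix_volumes(n_reactions: int, overage: float = 0.1) -> dict:
    """Volumes (μl) of the shared components for n reactions; cDNA is added per well."""
    scale = n_reactions * (1.0 + overage)
    return {name: vol * scale for name, vol in REACTION_UL.items() if name != "cDNA"}
```

The stated volumes sum to 20 μl, the 2X master mix is diluted to 1X, and the primers are present at a final concentration of 0.4 μM.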
THP-1 small RNA deep sequencing
Prior to deep sequencing 2.4 x l O'6 pmol and 2.4 x 10"7 pmol of three 21mer and three 18mer synthetic RNA spike-ins with 5' phosphate and 3 ' hydroxyl groups, respectively, were added to the cytoplasmic, nuclear, and total RNA pools in equal concentrations (Table 2), Spike-ins were designed so that' they would not map to the human genome. Deep sequencing of the THP-1 small RNA samples was performed by GeneWorks (Adelaide, Australia) on the Hlumina GAIL Library preparation and sequencing was completed according to the manufacturer's instructions with one modification. Sample isolation from the PAGE gel after adaptor ligation was performed with a modified set of size markers to facilitate sequencing of small RNAs greater than or equal to 15 nt, allowing detection and quantification of tiRNAs and other very small RNA species.
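By way of illustration only, the spike-in amounts can be converted to molecule counts and used to derive a scaling factor from sequencing reads, as in the following minimal sketch. The read count and the normalisation function are hypothetical and are not part of the described method.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def pmol_to_molecules(pmol: float) -> float:
    """Number of molecules in a given amount of substance expressed in picomoles."""
    return pmol * 1e-12 * AVOGADRO


def molecules_per_read(spike_in_pmol: float, spike_in_reads: int) -> float:
    """Hypothetical scaling factor: input spike-in molecules represented by one read."""
    return pmol_to_molecules(spike_in_pmol) / spike_in_reads


high = pmol_to_molecules(2.4e-6)  # about 1.4 million molecules
low = pmol_to_molecules(2.4e-7)   # about 0.14 million molecules
```

A factor of this kind could, under the stated assumptions, be used to place read counts from the cytoplasmic, nuclear and total libraries on a common scale.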
Mouse granulocyte nuclei preparation, isolation and small RNA sequencing
C57BL/6J mice were obtained from the Animal Resource Centre (Perth, Australia), with all animal experiments performed in a pathogen-free facility according to national and institutional guidelines. Bone marrow was harvested from the femur, tibia and spine using a mortar and pestle in PBS supplemented with 2% fetal calf serum (FCS) as previously described (Holst et al., 2006). Bone marrow cells were passed through a 70 μm filter to ensure a single-cell suspension, and incubated with lineage-specific antibodies (B220, CD19, CD3, Sca-1; BioLegend) conjugated to biotin, together with anti-Gr-1-fluorescein isothiocyanate (FITC; BioLegend), anti-cKit-phycoerythrin (PE; Becton Dickinson), anti-CD34-Alexafluor 647 (eBioscience) and anti-CD16/32-PerCP/Cy5.5 (Becton Dickinson) antibodies. Cells were washed twice in PBS with 2% FCS and incubated with streptavidin-APC-Cy7 (BioLegend). Control stains for FITC, PE, PerCP/Cy5.5, Alexafluor 647 and APC-Cy7 were used to determine compensation settings and gating for each population. Mature granulocytes were purified as shown previously (Guibal et al., 2009), using fluorescence-activated cell sorting (FACS) on a Becton Dickinson Aria II. After purification, a small sample of the cells was reanalyzed for purity by flow cytometry, and a separate sample was stained using May-Grünwald Giemsa following a cytospin. Purification of the nucleus from sorted cells was carried out using the PARIS kit (Ambion) with modifications to minimise RNA degradation. To confirm the purity of the nuclear fraction, SDS-PAGE and Western blots were carried out on whole-cell and nuclear lysates to detect the known cytoplasmic protein GAPDH and the ubiquitous protein β-actin, and thereby verify the absence of cytoplasmic contamination of the nuclear fraction. RNA was extracted from the nuclear fraction using Trizol prior to deep sequencing.
Deep sequencing of nuclear small RNAs was performed by GeneWorks (Adelaide, Australia) on the Illumina GAII. Library preparation and sequencing were completed according to the manufacturers' instructions.
Other small RNA deep sequencing datasets
Small RNA datasets from mouse (Babiarz et al., 2008), Drosophila (Chung et al., 2008), C. elegans (Batista et al., 2008), A. queenslandica (Grimson et al., 2008) and budding yeast (Drinnenberg et al., 2009) were obtained from the NCBI Gene Expression Omnibus (NCBI GEO). See Table 7 for a complete list of the datasets and their corresponding identifiers. Human GRO-seq data (Core et al., 2008) were obtained through NCBI GEO (GSE13518). The summary 'aligned' BED files provided by the authors, and available at the GEO website, were used for all analyses. Arabidopsis small RNA datasets were obtained from the Arabidopsis SBS database, available at http://mpss.udel.edu/at_sbs/ (Nakano, 2006), and pooled.
Reference genome and annotation sources
Human (hg18, NCBI Build 36.1), mouse (mm9, NCBI Build 37), Drosophila (dm3, BDGP Release 5), C. elegans (ce6, WS190), and S. cerevisiae (sacCer2, SGD June 2008) genome sequences, and gene and genome feature annotations, were obtained from a local mirror of the UCSC genome browser (Kuhn et al., 2009). Human and mouse Refseq genes were obtained from the respective refGene databases. Drosophila Flybase, C. elegans Sanger, and S. cerevisiae SGD gene annotations were obtained from the dm3.flyBaseGene, ce6.sangerGene, and sacCer2.sgdGene databases, respectively. A. queenslandica contig sequences and gene annotations were obtained from the University of Queensland sponge genome sequence repository. The S. castellii genome sequence and gene annotations were obtained from the Yeast Gene Order Database, available at http://wolfe.gen.tcd.ie/ygob/ (Byrne & Wolfe, 2005). We used the Arabidopsis TAIR8 genome sequence (ftp://ftp.arabidopsis.org/home/tair/Sequences/whole_chromosomes/) and the TAIR8 Ensembl gene annotations (Poole, 2007).
CD4+ T-cell nucleosome modification data (Barski et al., 2007; Wang et al., 2008) were downloaded directly from the authors' website, and are available at http://dir.nhlbi.nih.gov/papers/lmi/epigenomes/hgtcell.aspx and http://dir.nhlbi.nih.gov/... The summary BED files provided by the authors were used as the basis for all analyses (see below for more details). Control CD4+ T-cell nucleosome datasets were obtained from the NCBI Sequence Read Archive (SRR000711 - SRR000720), and processed to obtain nucleosome-length fragments as described previously (Nahkuri et al., 2009) (more below). S. cerevisiae combined H3 and H4 nucleosome data were obtained from http://atlas.bx.psu.edu/yeast-maps/yeast-index.html (Mavrich et al., 2008).
Bioinformatic analyses
All bioinformatic analyses were done on a local high-performance computer that houses a mirror of the UCSC Genome Browser. For most analyses we used a suite of in-house AWK, Perl, and Python scripts or backend tools inherent to the UCSC mirror. Small RNA datasets, raw CD4+ nucleosome data, and S. cerevisiae H3 and H4 nucleosome data were mapped to the appropriate genome using ZOOM (Lin et al., 2008). Small RNA, GRO-seq, chromatin modification and nucleosome density distributions were computed by converting mapped tag positions (i.e., BED coordinates) to genome-wide wiggle density plots and averaging these densities across all loci of interest (e.g., Refgene TSSs) using a set of in-house Perl scripts.
CD4+ T-cell nucleosome data (Barski et al., 2007; Wang et al., 2008) were processed to facilitate high-resolution bioinformatic queries. The signals from the plus and minus strands associated with the same nucleosome are typically ~150 bp apart, because the sequence tags are derived from the ends of the strands rather than over their whole length. Therefore, to obtain accurate ChIP-seq nucleosome profiles from the publicly available deep sequencing data, we extended the genomic matches of all uniquely mapping tags in silico in the 3' direction so that they reached a total length of 150 nt, consistent with the expected length of nucleosome-associated DNA, as described previously (Nahkuri et al., 2009; Schmid & Bucher, 2007). To compute the 'wiggle profile' we summed the distribution of all tags across the genome downstream of the TSS, and then computed the average based on the number of TSSs queried. This resolves into a distinct single curve representing peak nucleosome density. Pol II ChIP-seq fragments were also summed across all genes with tiRNAs and, like the nucleosome data, resolved to a single high-resolution curve.
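The tag extension and TSS-averaged profile described above can be sketched as follows. This is an illustrative reconstruction rather than the in-house Perl scripts: it assumes BED-style 0-based, half-open coordinates, a single chromosome, and hypothetical function names.

```python
from collections import defaultdict

FRAGMENT_LENGTH = 150  # expected length of nucleosome-associated DNA (from the text)


def extend_tag(start, end, strand, length=FRAGMENT_LENGTH):
    """Extend a mapped tag in its own 3' direction to a total of `length` nt."""
    if strand == "+":
        return start, start + length
    # Minus-strand tags grow towards lower genomic coordinates.
    return max(0, end - length), end


def coverage(tags, length=FRAGMENT_LENGTH):
    """Per-base depth {position: count} after extending every tag."""
    depth = defaultdict(int)
    for start, end, strand in tags:
        ext_start, ext_end = extend_tag(start, end, strand, length)
        for pos in range(ext_start, ext_end):
            depth[pos] += 1
    return depth


def average_profile(depth, tss_list, upstream, downstream):
    """Average depth at each offset from the TSS, oriented by gene strand.

    tss_list holds (position, strand) pairs; the sum at each offset is
    divided by the number of TSSs queried.
    """
    offsets = list(range(-upstream, downstream + 1))
    profile = []
    for offset in offsets:
        total = 0
        for pos, strand in tss_list:
            genomic = pos + offset if strand == "+" else pos - offset
            total += depth.get(genomic, 0)
        profile.append(total / len(tss_list))
    return offsets, profile
```

A dictionary keyed by position keeps the sketch short; a real genome-wide run would more likely use fixed-size arrays per chromosome.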
The abundances of the THP-1 nuclear and cytoplasmic small RNA datasets were normalized by the relative expression of spike-ins 2 and 6 (Table 2). Bioinformatic queries against spike-ins were performed without mismatches to ensure accurate quantification and normalization. Identification and analysis of THP-1 nuclear tiRNAs was performed as previously described (Taft et al., 2009 (I)). Briefly, tags that mapped to known small RNA loci, repeat elements, or other potential confounding features were removed, and small RNAs with a modal length of ~18 nt that map sense and proximal (generally -60 to +120 nt) to TSSs (defined by Refgene annotations) were identified. Analysis of the expression of genes with tiRNAs derived from the THP-1 nuclear, cytoplasmic, or total RNA fractions was accomplished using gene expression data from undifferentiated THP-1 cells (http://fantom.gsc.riken.jp/4/) (Suzuki et al., 2009) as described previously (Taft et al., 2009 (I)). Refgenes with high tiRNA abundance (>8) or low tiRNA abundance (1) were obtained and regions -60 to +300 relative to the TSS were assessed for chromatin mark densities, as described above. 'Unannotated' 18mers were identified after eliminating all canonical tiRNAs, and then further filtering to exclude those proximal to any knownGene TSS or within a knownGene boundary (i.e., within the bounds defined by the transcription start and stop sites). Enrichments at chromatin marks were assessed using loci with chromatin mark tag densities (relative scores) two standard deviations higher than the mean for that mark across the entire genome. These loci were then collapsed down to single unique entities using the UCSC backend tool featureBits. Loci located near TSSs (within 200 nt) or that mapped to known small RNA annotations were excluded from the analysis.
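The positional criteria for tiRNA candidates can be expressed compactly. The sketch below is an illustration under stated assumptions, not the published pipeline: coordinates are BED-style (0-based, half-open), the -60 to +120 window is applied to the 5' end of the tag (the text does not say which end is used), and the function names are hypothetical.

```python
from collections import Counter

TSS_WINDOW = (-60, 120)  # general tiRNA window relative to the TSS (from the text)


def offset_from_tss(tag, tss):
    """Offset of the tag's 5' end from the TSS, or None if the tag is antisense.

    tag is (start, end, strand); tss is (position, strand).
    """
    start, end, strand = tag
    tss_pos, tss_strand = tss
    if strand != tss_strand:
        return None
    if strand == "+":
        return start - tss_pos
    # On the minus strand the 5' end is the highest coordinate of the tag.
    return tss_pos - (end - 1)


def is_tirna_candidate(tag, tss, window=TSS_WINDOW):
    """True if the tag is sense to the gene and its 5' end lies in the window."""
    offset = offset_from_tss(tag, tss)
    return offset is not None and window[0] <= offset <= window[1]


def modal_length(tags):
    """Most frequent tag length in a collection of (start, end, strand) tags."""
    lengths = Counter(end - start for start, end, _ in tags)
    return lengths.most_common(1)[0][0]
```

The modal length is a property of the whole population of candidate tags, so it is computed after the per-tag positional filter rather than used as a per-tag cutoff.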
The relative enrichment of nuclear small RNAs at each chromatin mark or protein binding site was assessed using an in-house bootstrapping program over 1000 iterations. All chromatin marks and binding sites were high-confidence: the loci examined were required to have ChIP-seq scores greater than 2 standard deviations above the mean for that sample/chromatin mark. Bootstrap control experiments at 50, 100, and 500 iterations showed no significant difference in enrichment profiles, suggesting that small RNA enrichments at specific chromatin marks are robust.
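The text does not describe the resampling scheme of the in-house program, so the following is only a sketch of one common approach: keep loci scoring more than two standard deviations above the mean, then compare the observed number of small RNAs falling in those loci with the number obtained when the same number of positions is drawn uniformly at random. The uniform null model, the single linear genome, and all names are assumptions made for illustration.

```python
import random
import statistics


def high_confidence(loci):
    """Keep (start, end) for loci whose score exceeds mean + 2 SD.

    loci is a list of (start, end, score) tuples.
    """
    scores = [score for _, _, score in loci]
    cutoff = statistics.mean(scores) + 2 * statistics.pstdev(scores)
    return [(start, end) for start, end, score in loci if score > cutoff]


def count_hits(positions, intervals):
    """Number of positions that fall inside any half-open interval."""
    return sum(any(a <= p < b for a, b in intervals) for p in positions)


def enrichment(positions, intervals, genome_length, iterations=1000, seed=0):
    """Fold enrichment and empirical p-value against a uniform random null."""
    rng = random.Random(seed)
    observed = count_hits(positions, intervals)
    null_counts = []
    for _ in range(iterations):
        shuffled = [rng.randrange(genome_length) for _ in positions]
        null_counts.append(count_hits(shuffled, intervals))
    expected = statistics.mean(null_counts)
    # Add-one correction keeps the empirical p-value strictly positive.
    p_value = (1 + sum(c >= observed for c in null_counts)) / (iterations + 1)
    fold = observed / expected if expected > 0 else float("inf")
    return fold, p_value
```

Re-running `enrichment` with `iterations` set to 50, 100 and 500 mirrors the control experiments mentioned above: stable fold values across those settings indicate that the estimate does not depend on the number of iterations.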
Splice site RNAs are defined as small RNAs less than or equal to 26 nt, dominantly 17-18 nt, whose 3' ends are precisely coincident with any exonic splice donor site (i.e., the 3' end of an exon), including donor sites at both non-protein-coding and canonical protein-coding genes. Splice site RNAs also include small RNAs less than or equal to 26 nt, dominantly 17-18 nt, whose 3' ends map to the 3' end of internal exons, that is, exons that are bounded on both sides by an intron and/or exons that are exclusively protein encoding. To ensure the observed spliRNA enrichment was not influenced by the failure to map tags across splice site junctions, small RNAs were mapped to both the genome and a library of splice site junctions for each organism. Dips of small RNAs just across the splice site in some organisms may reflect poor gene annotations (i.e., incorrectly annotated or missed exons). Analysis of the expression of genes with spliRNAs in human and Drosophila was accomplished using gene expression data from undifferentiated THP-1 cells (http://fantom.gsc.riken.jp/4/) (Suzuki et al., 2009) and a Drosophila developmental time course (Arbeitman et al., 2002), as described previously (Taft et al., 2009 (I)). Sequence logos were generated using Weblogo (Crooks et al., 2004). To examine the association of spliRNAs with alternative and constitutive exons, UCSC knownGene exon annotations were used to derive the splicing status of exons with spliRNAs versus exons without spliRNAs in the same genes. The prevalence of four different alternative splicing events (see Table 4) in both data sets was assessed, and the statistical significance of the observed difference was calculated using Fisher's exact test. The analysis was performed using in-house C and Perl programs that are available on request.
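The positional definition of a splice site RNA can be sketched as a simple predicate. This is an illustration only, not the in-house C and Perl programs: it assumes BED-style 0-based, half-open coordinates, requires the tag to lie on the same strand as the exon, and uses hypothetical function names.

```python
MAX_SPLIRNA_LENGTH = 26  # upper length bound from the definition above


def three_prime_base(start, end, strand):
    """Genomic coordinate of the 3'-most base of a feature."""
    return end - 1 if strand == "+" else start


def is_splirna(tag, exon):
    """True if the tag is <= 26 nt, sense to the exon, and shares its 3' end.

    tag and exon are (start, end, strand) tuples; the 3' end of the exon
    is taken to be its splice donor site.
    """
    t_start, t_end, t_strand = tag
    if t_end - t_start > MAX_SPLIRNA_LENGTH:
        return False
    if t_strand != exon[2]:
        return False
    return three_prime_base(*tag) == three_prime_base(*exon)


def internal_exons(exons):
    """Exons bounded by an intron on both sides.

    exons must be listed in transcription order for one transcript.
    """
    return exons[1:-1]
```

Mapping against a library of splice junctions, as described above, would be handled upstream of this check; the predicate only tests coordinates that have already been assigned to the genome.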
We used annotations from miRBase Version 12 (Griffiths-Jones et al., 2008) to assess THP-1 nuclear and cytoplasmic microRNA expression. To ensure accurate miRNA expression values we included both uniquely mapping and multimapping tags in our analysis. However, multimapping tags were only analysed if all of their mapping positions were within miRNA loci. Relative microRNA expression was calculated as the sum of the normalized abundance of all tags that mapped to any particular pre-miRNA. We defined microRNA-offset RNAs as any RNA tag that covered the extreme 5' or 3' end of a pre-miRNA annotation.
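The per-precursor summation described above can be sketched as follows. This is a minimal illustration with hypothetical names and made-up loci; it assumes that a tag is counted only when every one of its mapping positions lies inside an annotated pre-miRNA, and that its full normalized abundance is credited to each precursor it maps to (the text does not state how multimapping abundance is apportioned).

```python
from collections import defaultdict


def locus_for_hit(hit, pre_mirnas):
    """Name of the pre-miRNA that fully contains the hit, or None.

    hit and annotation values are (chrom, start, end, strand) tuples.
    """
    chrom, start, end, strand = hit
    for name, (m_chrom, m_start, m_end, m_strand) in pre_mirnas.items():
        if (chrom, strand) == (m_chrom, m_strand) and m_start <= start and end <= m_end:
            return name
    return None


def mirna_expression(tags, pre_mirnas):
    """Sum normalized tag abundance per pre-miRNA.

    tags is a list of (hits, abundance) pairs, where hits lists every
    genomic position the tag maps to.
    """
    totals = defaultdict(float)
    for hits, abundance in tags:
        loci = [locus_for_hit(hit, pre_mirnas) for hit in hits]
        if any(locus is None for locus in loci):
            continue  # at least one mapping position lies outside miRNA loci
        for name in set(loci):
            totals[name] += abundance
    return dict(totals)


# Hypothetical annotations and tags, for illustration only.
example_loci = {"mir-A": ("chr1", 100, 180, "+"), "mir-B": ("chr2", 500, 580, "+")}
example_tags = [
    ([("chr1", 110, 132, "+")], 10.0),                          # unique
    ([("chr1", 150, 172, "+"), ("chr2", 550, 572, "+")], 4.0),  # multimaps within miRNAs
    ([("chr1", 110, 132, "+"), ("chr3", 10, 32, "+")], 99.0),   # also maps elsewhere
]
example_expression = mirna_expression(example_tags, example_loci)
```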
Results
Investigation of human Refseq exon boundaries revealed more than 5,000 THP-1 genes with small RNAs whose 3' termini map precisely to the splice donor site (i.e., the 3' end of the exon). These small RNAs are approximately 35-fold enriched in the nuclear deep sequencing library (Fig. 4a,b) and are present at internal exons regardless of gene length or exon number (Table 3). These splice-site RNAs (spliRNAs) are detectable using mapping strategies that consider exon-exon or exon-intron boundaries and multi-mapping deep sequencing reads (Figs. 7-9).
We also found that spliRNAs are nuclear localized in primary mouse granulocyte nuclei (Figs. 10 and 11), and are detectable in a diverse set of evolutionarily distant animals. Splice-site RNAs are expressed in mouse embryonic stem (ES) cells (Fig. 4c), a wide range of Drosophila melanogaster (Fig. 4d) and Caenorhabditis elegans (Fig. 4e) tissues, and in one of the most basal multicellular animals, the marine sponge Amphimedon queenslandica (Fig. 4f). They have a modal length of 17-18 nt in human THP-1 cells, and a modal length of 17 nt in all other species examined. Their expression is not affected by the loss of Dicer or DGCR8 in mouse ES cells or in C. elegans germline mutants (Figs. 12 and 13), indicating that spliRNA biogenesis is not intimately connected with the pathways that produce miRNAs or siRNAs. Indeed, with few exceptions, spliRNAs are expressed in most tissues and developmental stages in D. melanogaster and C. elegans (Figs. 14 and 15). However, spliRNAs are more enriched relative to background in Drosophila heads than in bodies, are almost undetectable in imaginal discs, and are less abundant in adult sponge than in embryo (Figs. 16 and 17), suggesting that spliRNAs may be connected with high gene expression in actively proliferating or undifferentiated tissues. Indeed, THP-1 and Drosophila genes with spliRNAs are more highly expressed than those without (Fig. 18). To investigate whether spliRNAs are present outside the animal kingdom, small RNA distributions at splice donor sites in the flowering plant Arabidopsis thaliana and the budding yeasts Saccharomyces castellii and Saccharomyces cerevisiae were investigated. No evidence of spliRNAs was detected in yeast or plants (Fig. 19).
Overall, spliRNAs are weakly expressed (the median abundance in THP-1 nuclei is 1) and show a strong enrichment for 3'-terminal guanines, which is, however, likely driven by the consensus splice site sequence (Fig. 20). Additionally, although spliRNAs are statistically more common at constitutive splice sites, we also observed a mild but statistically significant enrichment of spliRNAs at alternative first exons (Table 4). To query the relationship between RNAPII activity and spliRNAs we examined the recently described GRO-seq dataset (Core et al., 2008), which captures the position, amount and orientation of transcriptionally engaged RNA polymerases. We found a local GRO-seq minimum at the splice donor site (Fig. 21), which aligns with the position of spliRNAs and may be consistent with a model of spliRNA biogenesis dependent on cleavage of the 3' end of the nascent transcript. Consistent with this hypothesis, short introns were twofold enriched downstream of exons expressing spliRNAs, which could promote RNAPII pausing and backtracking due to the proximity of the downstream exon-associated nucleosome (Fig. 22), and spliRNAs are approximately 2-fold less frequent at exons < 60 nt (Fig. 23), which generally lack positioned nucleosomes (Anderson et al., 2009).
Discussion
Taken together, these data suggest that there is a wide diversity of small RNAs localized to, and abundant in, the metazoan nucleus. We propose that many of these species are involved in regulating epigenomic modifications and transcription. Transcription initiation RNAs and spliRNAs may have a common origin and a common function, possibly associated with the positioning of nucleosomes. If this is so, our preferred hypothesis is that this is an evolved capacity of RNAPII backtracking and TFIIS activity (Taft et al., 2009 (II)) that allows efferent signals to be produced in parallel with transcription elongation to mark the position for future reference. Indeed, 3.1% of THP-1 genes with spliRNAs also have tiRNAs. However, two alternative, but not mutually exclusive, possibilities are that spliRNAs are linked to, or are by-products of, splicing or result from post-transcriptional cleavage of longer capped RNAs (Fejes-Toth et al., 2009).
The absence of tiRNAs and spliRNAs in yeast and plants may reflect different systems of nucleosome positioning, chromatin marking, or the criteria used to define these small RNA classes. For example, small RNAs derived from wild type S. cerevisiae (Drinnenberg et al., 2009), which lacks RNAi, are dominantly ~17-18 nt, have a 3'-terminal purine (i.e., adenine) bias, and are phased such that local small RNA maxima coincide with minima of nucleosome density (Fig. 24). Therefore, although these small RNAs do not meet the criteria we have used to define tiRNAs and spliRNAs in metazoans, they exhibit many similar characteristics, suggesting that very small RNAs are a basal feature within the eukaryotic lineage that may have been co-opted to specific genomic positions, and into specific roles, in animals.
Example 2 - Splice site RNAs differentiate normal and breast cancer tissue
To test whether spliRNAs are differentially expressed in, and therefore diagnostic of, cancer, we assessed small RNA deep sequencing data obtained from normal adjacent and ductal carcinoma in situ (DCIS, stage 2) human breast tissue from a single patient. These data show that spliRNAs are highly expressed in both normal and DCIS tissue, but that their expression nonetheless differs significantly between the two. Indeed, a Gene Ontology enrichment analysis showed that while genes with spliRNAs in the normal adjacent tissue are strongly enriched for terms associated with terminal differentiation (Table 8), including three morphogenesis-associated Gene Ontology terms, these terms are completely absent in the breast carcinoma sample (Table 9). Likewise, KEGG pathway analysis revealed that enrichment for the Cell Cycle and Focal Adhesion pathways was specific to genes with spliRNAs in the normal adjacent tissue (Fig. 25). These enrichments were lost in DCIS tissue, and replaced with enrichments for the spliceosome and the RNA degradation pathway (Fig. 26).
Additionally, a differential expression analysis showed that a subset of more than 170 genes had spliRNA expression values twofold or more up-regulated in the DCIS sample compared to normal (Fig. 27). Querying GeneSigDB revealed that these genes are significantly associated with gene expression signatures previously linked to breast cancer, specifically those with ERCC2 and BRCA1 mutations (Fig. 28). Likewise, querying the GSEA MSigDB revealed significant overlap of genes with highly expressed spliRNAs in the DCIS sample with gene sets previously associated with a number of breast cancer subtypes and responses to treatments (Table 10). Overall, these data indicate that spliRNAs are differentially expressed between normal and cancer tissue, and are as good a diagnostic as much larger gene sets. This strongly suggests that spliRNAs may be a better diagnostic and prognostic indicator than traditional gene profiling.
Throughout this specification, the aim has been to describe the preferred embodiments of the invention without limiting the invention to any one embodiment or specific collection of features. Various changes and modifications may be made to the embodiments described and illustrated herein without departing from the broad spirit and scope of the invention.
All computer programs, algorithms, patent and scientific literature referred to in this specification are incorporated herein by reference in their entirety.
TABLES
Table 1. THP-1 small RNA deep sequencing library statistics
                                                                       Cytoplasmic  Nuclear   Total
Total reads                                                            6226629      6579778   5820445
Total reads minus 5' and 3' adaptor contamination                      6057957      6270629   5089263
% of reads that map uniquely w/o mismatches                            23           24        23
% of reads that map to the genome any number of times w/o mismatches   58.1         67        58
Average Illumina GAII v1.0 quality scores*                             36.3         32.8      36.4
*Quality scores are the average of all quality values given in the Illumina FASTQ files for each library.
Table 2. Synthetic RNA spike-ins and library normalization
Spike-in                 Cytoplasmic   Nuclear     Total RNA   Fold Change
                         abundance     abundance   abundance   (Nuc v. Cyto)
ATCAGCTGTTATAAGCCGGCC    3             12          14          -
AGTAACTCTAGCGGCTTAGTC    1836          2741        3010        1.49
AACCTATGGTTGCGCTACGAC    6             14          25          -
ATCGTGCAATCGCGCATA       2             4           12          -
ACTCTATACGCGGTACGA       10            15          37          -
ATACGTCGACACGGTTCA       263           476         580         1.8
Average Fold Change (Nuc v. Cyto): 1.645
All values listed above are total sequence counts (i.e., total sequence abundance in a given library).
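The fold changes in Table 2 can be reproduced from the counts for spike-ins 2 and 6. The sketch below takes the first two count columns as the cytoplasmic and nuclear abundances, an assumption that is consistent with the listed fold changes (2741/1836 and 476/263). Dividing nuclear counts by the average factor is one plausible way to apply the normalization and is not stated in the text; the function names are hypothetical.

```python
# (cytoplasmic count, nuclear count) for spike-ins 2 and 6, read from Table 2.
SPIKE_INS = {
    "AGTAACTCTAGCGGCTTAGTC": (1836, 2741),
    "ATACGTCGACACGGTTCA": (263, 476),
}


def fold_changes(spike_ins):
    """Nuclear versus cytoplasmic fold change for each spike-in."""
    return {seq: nuc / cyto for seq, (cyto, nuc) in spike_ins.items()}


def average_fold_change(spike_ins):
    """Mean of the per-spike-in fold changes."""
    values = list(fold_changes(spike_ins).values())
    return sum(values) / len(values)


def normalize_nuclear(count, factor):
    """Assumed normalization: rescale a nuclear tag count to the cytoplasmic library."""
    return count / factor


factor = average_fold_change(SPIKE_INS)
```

The unrounded ratios average to about 1.651, slightly above the tabulated 1.645, which is the mean of the rounded values 1.49 and 1.8.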
Table 3. Number of unique spliRNAs per exon in human THP-1 cells
Exons in   Exon position
gene       1st  2nd  3rd  4th  5th  6th  7th  8th  9th  10th 11th 12th 13th 14th 15th 16th 17th 18th 19th
two        56   -    -    -    -    -    -    -    -    -    -    -    -    -    -    -    -    -    -
three      50   89   -    -    -    -    -    -    -    -    -    -    -    -    -    -    -    -    -
four       49   77   76   -    -    -    -    -    -    -    -    -    -    -    -    -    -    -    -
five       67   77   93   89   -    -    -    -    -    -    -    -    -    -    -    -    -    -    -
six        77   95   101  98   94   -    -    -    -    -    -    -    -    -    -    -    -    -    -
seven      45   79   75   102  86   85   -    -    -    -    -    -    -    -    -    -    -    -    -
eight      48   61   76   68   96   121  79   -    -    -    -    -    -    -    -    -    -    -    -
nine       45   59   71   80   66   85   85   81   -    -    -    -    -    -    -    -    -    -    -
ten        63   80   66   101  78   81   51   82   75   -    -    -    -    -    -    -    -    -    -
eleven     52   77   78   63   59   81   69   68   83   55   -    -    -    -    -    -    -    -    -
twelve     31   57   72   69   47   41   51   62   59   53   47   -    -    -    -    -    -    -    -
thirteen   40   36   71   73   51   49   59   39   52   52   40   54   -    -    -    -    -    -    -
fourteen   26   43   34   47   47   33   42   39   38   41   39   38   38   -    -    -    -    -    -
fifteen    28   37   30   39   55   49   30   58   39   46   43   29   28   30   -    -    -    -    -
sixteen    23   16   20   33   53   27   65   33   57   27   21   32   27   36   23   -    -    -    -
seventeen  13   18   11   24   39   24   29   35   27   30   28   25   33   31   29   14   -    -    -
eighteen   14   21   15   20   33   21   29   25   19   18   36   28   31   14   30   18   16   -    -
nineteen   13   14   21   21   25   19   30   19   29   10   31   28   32   18   23   19   17   10   -
twenty     13   23   7    12   24   8    22   26   21   12   14   24   28   24   17   20   10   25   11
*Note: Human Refseq genes with various numbers of exons are shown as rows. For example, 'three' refers to genes with three exons. Columns show the number of spliRNAs (i.e., small RNAs that map to 5' splice sites) counted at each exon in that gene. SpliRNAs may be counted more than once since they may map to exons present in more than one mRNA isoform. SpliRNAs are not preferentially positioned at any one exon, but do appear to be generally enriched at internal exons. Overall, 7.6% of exons within genes with spliRNAs show evidence of spliRNA expression.
Table 4. Splice-site RNA occurrence by exon type
Figure imgf000044_0001
Table 5. qRT-PCR primers
Table 6. Northern blot probes
Target            Probe sequence (5' to 3')
SNORD77           CATCAGACAGATAGTACATCTCTTCATGAT
SNORA19           AAGCAGGTCAATGAAATGTGC
tRNA Lys (AAG)    TCTACCGACTGAGCTAGCCG
tRNA His (CAY)    CTAACCACTATACGATCACGGC
miR-16            CGCCAATATTTACGTGCTGCTA
miR-15            GTAAACCATGATGTGC
Table 7. GEO small RNA deep sequencing datasets
GEO identifier   Library identifier   Description
Mus musculus (mouse)
GSE12521         GSM314552            WT ES cells
                 GSM314553            Dicer-/- ES cells
                 GSM314557            DGCR8-/- ES cells
Drosophila melanogaster (fruit fly)
GSE11624         GSM286613            0-1h embryo
                 GSM286605            2-6h embryo
                 GSM286606            6-10h embryo
                 GSM240749            Female heads
                 GSM286603            Female bodies
                 GSM286601            Male heads
                 GSM286602            Male bodies
Figure imgf000045_0001
                 GSM297745            L3 larvae
                 GSM297746            L4 larvae
                 GSM297749            Glp-4 mutant
                 GSM297750            Mixed stage
                 GSM297751            Adult
                 GSM297752            Prg-1 mutant
                 GSM297753            Fog-2 mutant
Amphimedon queenslandica
GSE12578         GSM315551            Embryo
                 GSM315553            Adult
Budding yeast
GSE17872         GSM447740            Wild type Saccharomyces castellii
                 GSM447747            Wild type Saccharomyces cerevisiae
Table 8. Gene Ontology enrichment analysis - normal adjacent tissue
GO ID        Description                                      P value
GO:0003159   morphogenesis of an endothelium                  0.006
GO:0015803   branched-chain aliphatic amino acid transport    0.006
GO:0015820   leucine transport                                0.006
GO:0015827   tryptophan transport                             0.006
GO:0060356   leucine import                                   0.006
GO:0060440   trachea formation                                0.006
GO:0061047   branching involved in lung morphogenesis         0.006
GO:0061154   endothelial tube morphogenesis                   0.006
GO:0030388   fructose 1,6-bisphosphate metabolic process      0.005
GO:0015801   aromatic amino acid transport                    0.015
GO:0014009   glial cell proliferation                         0.042
GO:0070062   extracellular vesicular exosome                  0.042
GO:0050775   positive regulation of dendrite morphogenesis    0.04
GO:0048593   camera-type eye morphogenesis                    0.009
GO:0016278   lysine N-methyltransferase activity              0.006
GO:0016279   protein-lysine N-methyltransferase activity      0.006
GO:0018024   histone-lysine N-methyltransferase activity      0.006
GO:0016571   histone methylation                              0.011
GO:0042054   histone methyltransferase activity               0.01
GO:0004003   ATP-dependent DNA helicase activity              0.011
GO:0051291   protein heterooligomerization                    0
GO:0008094   DNA-dependent ATPase activity                    0.01
GO:0003678   DNA helicase activity                            0.045
GO:0004386   helicase activity                                0
GO:0016568   chromatin modification                           0
GO:0016887   ATPase activity                                  0.011
GO:0005524   ATP binding                                      0.042
GO:0032559   adenyl ribonucleotide binding                    0.049
Table 9. Gene Ontology enrichment analysis - breast carcinoma sample
GO ID        Description                                      P value
GO:0015803   branched-chain aliphatic amino acid transport    0
GO:0015820   leucine transport                                0
GO:0015827   tryptophan transport                             0
GO:0060356   leucine import                                   0
GO:0015801   aromatic amino acid transport                    0.004
GO:0001786   phosphatidylserine binding                       0.003
GO:0005544   calcium-dependent phospholipid binding           0.015
GO:0004386   helicase activity                                0
GO:0008026   ATP-dependent helicase activity                  0
GO:0070035   purine NTP-dependent helicase activity           0
GO:0042623   ATPase activity, coupled                         0
GO:0016887   ATPase activity                                  0
GO:0005524   ATP binding                                      0.005
GO:0032559   adenyl ribonucleotide binding                    0.007
GO:0030554   adenyl nucleotide binding                        0.015
Table 10. GSEA MSigDB query
Description                                                                                                     P value
Genes down-regulated in non-metastatic breast cancer tumors having type 1 amplification in the 20q13 region.    1.24E-05
Genes down-regulated in PaCa44 and CFPAC1 cells (pancreatic cancer) after treatment with decitabine.            1.54E-05
Genes down-regulated by ESRRA [Gene ID=2101] only.                                                              2.36E-05
DNA repair genes whose promoters contain putative ZNF143 [Gene ID=7702] binding sites.                          2.97E-05
Upregulated by induction of exogenous BRCA1 in EcR-293 cells.                                                   5.22E-05
Genes regulated by ESRRA [Gene ID=2101] in MCF-7 cells (breast cancer).                                         5.60E-05
Genes up-regulated in hepatoblastoma samples compared to normal liver tissue.                                   6.11E-05
REFERENCES
Anderson, R. et al., Genome Res 19: 1732-1741 (2009).
Arbeitman, M.N. et al., Science 297: 2270-2275 (2002).
Babiarz, J.E. et al., Genes Dev 22: 2773-2785 (2008).
Barski, A. et al., Cell 129: 823-837 (2007).
Batista, P.J. et al., Mol Cell 31: 67-78 (2008).
Byrne, K.P. & Wolfe, K.H., Genome Res 15: 1456-1461 (2005).
Chu, C.Y. & Rana, T.M., J Cell Physiol 213: 412 (2007).
Chung, W.J. et al., Curr Biol 18: 795-802 (2008).
Core, L.J. et al., Science 322: 1845-1848 (2008).
Crooks, G.E. et al., Genome Res 14: 1188-1190 (2004).
Drinnenberg, I.A. et al., Science 326: 544-550 (2009).
Fejes-Toth, K. et al., Nature 457: 1028-1032 (2009).
Ghildiyal, M. & Zamore, P.D., Nat Rev Genet 10: 94-108 (2009).
Griffiths-Jones, S. et al., Nucleic Acids Res 36: D154-158 (2008).
Grimson, A. et al., Nature 455: 1193-1197 (2008).
Guibal, F.C. et al., Blood 114: 5415-5425 (2009).
Holst, J. et al., Nat Protoc 1: 406-417 (2006).
Hwang, H.W. et al., Science 315: 97-100 (2007).
Kuhn, R.M. et al., Nucleic Acids Res 37: D755-761 (2009).
Lin, H. et al., Bioinformatics 24: 2431-2437 (2008).
Malone, C. & Hannon, G., Cell 136: 656-668 (2009).
Mattick, J.S. & Makunin, I.V., Hum Mol Genet 14: R121 (2005).
Mavrich, T.N. et al., Genome Res 18: 1073-1083 (2008).
Nahkuri, S. et al., Cell Cycle 8: 3420-3424 (2009).
Nakano, M., Nucl Acids Res 34: D731-735 (2006).
Pall, G. et al., Nucl Acids Res 35: e60 (2007).
Perrin et al., Schizophrenia Bulletin 33: 1270-1273 (2007).
Pillai, R.S. et al., Trends Cell Biol 17: 118 (2007).
Poole, R.L., Methods Mol Biol 406: 179-212 (2007).
Schmid, C.D. & Bucher, P., Cell 131: 831-832; author reply 832-833 (2007).
Seila, A.C. et al., Science 322: 1849-1851 (2008).
Sotos-Reyes & Recillas-Targa, Oncogene (published online 25 January 2010).
Suzuki, H. et al., Nat Genet 41: 553-562 (2009).
Taft, R.J. et al. (I), Nat Genet 41: 572-578 (2009).
Taft, R.J. et al. (II), Cell Cycle 8: 2332-8 (2009).
van Rooij, E. & Olson, E.N., J Clin Invest 111: 2369 (2007).
Vasudevan, S. et al., Science 318: 1931 (2007).
Wang, Z. et al., Nat Genet 40: 897-903 (2008).
Zhang, B. et al., Dev Biol 302: 1 (2007).
Zumodio, N. et al., Reproduction 136: 131-146 (2008).

Claims

1. An isolated substantially single-stranded RNA molecule, wherein said isolated RNA molecule comprises a nucleotide sequence:
(i) comprising no more than 26 contiguous nucleotides that correspond to a genomic DNA sequence associated with gene regulation; and
(ii) comprising a nucleotide sequence corresponding to a 3' nucleotide sequence of an internal exon of the genomic DNA sequence.
2. The isolated RNA molecule of Claim 1, wherein said isolated RNA molecule comprises a nucleotide sequence that corresponds to a sense strand of the internal exon of the genomic DNA sequence.
3. The isolated RNA molecule of Claim 1 or 2, wherein said isolated RNA molecule terminates in a 3' nucleotide sequence that corresponds to the 3' end of the internal exon of the genomic DNA sequence.
4. The isolated RNA molecule of any one of Claims 1-3, wherein said isolated RNA molecule consists of a nucleotide sequence that corresponds to the 3' end of the internal exon of the genomic DNA sequence.
5. The isolated RNA molecule of any one of Claims 1-4, wherein said isolated RNA molecule comprises a nucleotide sequence that corresponds to a nucleotide sequence that is located at or near a 5' splice site of the internal exon of the genomic DNA sequence.
6. The isolated RNA molecule of Claim 5, wherein said isolated RNA molecule consists of a nucleotide sequence that corresponds to a nucleotide sequence that is located at the 5' splice site of the internal exon of the genomic DNA sequence.
7. The isolated RNA molecule of any one of Claims 1-6, wherein said isolated RNA molecule consists of 14-26 contiguous nucleotides.
8. The isolated RNA molecule of Claim 7, wherein said isolated RNA molecule consists of 14-20 contiguous nucleotides.
9. The isolated RNA molecule of Claim 8, wherein said isolated RNA molecule consists of 16-18 contiguous nucleotides.
10. The isolated RNA molecule of Claim 9, wherein said isolated RNA molecule consists of 17 or 18 nucleotides.
11. The isolated RNA molecule of any one of Claims 1-10, wherein said isolated RNA molecule comprises a nucleotide sequence that is located in or obtainable from a cell nucleus.
12. The isolated RNA molecule of any one of Claims 1-11, wherein the genomic DNA sequence is of or obtainable from a eukaryote.
13. The isolated RNA molecule of any one of Claims 1-12, wherein the genomic DNA sequence is of or obtainable from a metazoan.
14. The isolated RNA molecule of any one of Claims 1-13, wherein the genomic DNA sequence is of or obtainable from a vertebrate or a mammal.
15. The isolated RNA molecule of any one of Claims 1-14, wherein the genomic DNA sequence is of or obtainable from a human.
16. The isolated RNA molecule of any one of Claims 1-15, wherein the isolated RNA molecule comprises a nucleotide sequence that is GC enriched, preferably comprising a GC content of >54%.
17. An isolated substantially single-stranded RNA molecule comprising a nucleotide sequence:
(i) consisting of 17 or 18 nucleotides that correspond to a sense strand of an internal exon of a mammalian genomic DNA sequence located at a 5' splice site;
(ii) corresponding to a 3' end of the internal exon of the mammalian genomic DNA sequence;
(iii) comprising a GC content of >54%; and
(iv) that is substantially free of internal base-pairing.
18. The isolated RNA molecule of any one of Claims 1-17, wherein said isolated RNA molecule comprises a nucleotide sequence selected from any one of the nucleotide sequences set forth in SEQ ID NOs: 1 to 16,898, or a nucleotide sequence at least partly complementary thereto.
A modified RNA molecule comprising the isolated RNA molecule any one' of Claims 1-18, or a nucleotide sequence at least 70% identical thereto.
The modified RNA molecule of Claim 1 comprising a chemical entity selected from the group consisting of: a modified base, a carbohydrate, a peptide, a biotin, a cholesterol molecule, a fluorophore, a radionuclide, and a metal.
The modified RNA molecule of Claim 19 comprising a chemical modification selected from the group consisting of an LNA, a PNA, a 2' -methyl, and a morpholino.
The modified RNA molecule, of Claim 19, wherein said modified RNA molecule is selected from the group consisting of an antisense inhibitor, a point mutant, a spliRNA mimic, and a spliRNA sponge.
A fragment of the isolated RNA molecule of any one of Claims 1 - 18, wherein said fragment comprises at least 5 nucleotides of said isolated RNA molecule.
24. A genetic construct comprising or encoding one or a plurality of:
(i) the isolated RNA molecule of any one of Claims 1-18;
(ii) the modified RNA molecule of any one of Claims 19-22; or
(iii) the fragment of Claim 23.
25. The genetic construct of Claim 24, wherein said genetic construct is an expression construct comprising a DNA sequence complementary to one or a plurality of the isolated RNA molecules of any one of Claims 1-18, the modified RNA molecule of any one of Claims 19-22, or the fragment of Claim 23, operably linked or connected to one or more regulatory nucleotide sequences.
26. The genetic construct of Claim 24 or the expression construct of Claim 25, wherein said genetic construct or said expression construct is selected from the group consisting of a phage, a plasmid, a cosmid, and an artificial chromosome.
27. A host cell comprising the genetic construct of Claim 24 or 26, or the expression construct of Claim 25 or 26.
28. A method of identifying the isolated RNA molecule of any one of Claims 1-18, or the fragment of Claim 23, said method including the step of isolating one or more of the isolated RNA molecules from a nucleic acid sample.
29. The method of Claim 28, wherein said nucleic acid sample is of or obtainable from a human.
30. A method of identifying one or more of the isolated RNA molecules of any one of Claims 1-18, or the fragment of Claim 23, said method including the step of identifying a genomic DNA sequence which is complementary to the nucleotide sequence of said one or more isolated RNA molecules.
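The identifying step of Claim 30 can be pictured as a search of a genomic DNA strand for a stretch complementary to the RNA. The following is a minimal sketch assuming exact, full-length complementarity and a single DNA strand supplied 5' to 3'; the claim itself does not prescribe any particular algorithm, and the function names and toy sequences are hypothetical.

```python
# Map each RNA base to its complementary DNA base.
_RNA_TO_COMPLEMENTARY_DNA = str.maketrans("ACGU", "TGCA")


def complementary_dna(rna: str) -> str:
    """Return the DNA strand, written 5'->3', that is complementary to an RNA."""
    return rna.upper().translate(_RNA_TO_COMPLEMENTARY_DNA)[::-1]


def find_complementary_sites(rna: str, genomic_dna: str) -> list:
    """Return 0-based start positions where the DNA strand is complementary to the RNA."""
    target = complementary_dna(rna)
    dna = genomic_dna.upper()
    hits = []
    start = dna.find(target)
    while start != -1:
        hits.append(start)
        start = dna.find(target, start + 1)  # allow overlapping matches
    return hits


# Toy sequences for illustration only.
print(complementary_dna("GCCGAU"))
print(find_complementary_sites("GCCGAU", "TTATCGGCAA"))
```

In practice, short reads of this kind are usually mapped with a dedicated aligner that tolerates mismatches and searches both strands; the exact-match scan here is only meant to make the notion of complementarity concrete.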
31. The method of Claim 30, wherein said genomic DNA sequence is of or obtainable from a human.
32. A method of identifying a regulatory region in a genome, said method including the step of identifying one or more of the isolated RNA molecules of any one of Claims 1-18, or the fragment of Claim 23, to thereby identify said regulatory region.
33. The method of Claim 32, wherein said regulatory region is associated with one or more regulatory activities selected from the group consisting of mRNA splicing, transcription, epigenetic modification, epigenetic regulation, chromatin modification and nucleosome positioning.
34. The method of Claim 32 or 33, wherein said genome is of a human.
35. The method of any one of Claims 28-34, wherein said method is undertaken using a deep sequencing technology selected from the group consisting of 454™-, Helicos-, PacBio-, Solexa/Illumina- and SOLiD- sequencing.
36. A computer-readable storage medium or device encoded with data corresponding to one or more of:
(i) the isolated RNA molecule of any one of Claims 1-18;
(ii) the modified RNA molecule of any one of Claims 19-22; or
(iii) the fragment of Claim 23; and
(iv) an at least partly complementary RNA or DNA molecule of the isolated RNA molecule of any one of Claims 1-18 or the fragment of Claim 23.
37. A method of determining whether a mammal has, or is predisposed to, a disease or condition associated with one or more regulatory regions of a genome, said method including the step of determining whether said mammal comprises one or more of the isolated RNA molecules of any one of Claims 1-18, or the fragment of Claim 23, wherein the or each nucleotide sequence of said one or more isolated RNA molecules or said fragment corresponds to a genomic DNA sequence associated with said disease or condition.
38. The method of Claim 37, wherein said regulatory region is associated with one or more regulatory activities selected from the group consisting of mRNA splicing, transcription, epigenetic modification, epigenetic regulation, chromatin modification and nucleosome positioning.
39. The method of Claim 37 or 38, wherein said mammal is a human.
40. A nucleic acid array comprising a plurality of the isolated RNA molecules of any one of Claims 1-18, the modified RNA molecule of any one of Claims 19-22, the fragment of Claim 23, or one or more isolated nucleic acids respectively complementary thereto, immobilized, affixed or otherwise mounted to a substrate.
41. An antibody which binds:
(i) the isolated RNA molecule of any one of Claims 1-18;
(ii) the modified RNA molecule of any one of Claims 19-22; or
(iii) the fragment of Claim 23; and
(iv) an at least partly complementary RNA or DNA molecule of the isolated RNA molecule of any one of Claims 1-18 or the fragment of Claim 23.
42. A kit comprising one or more of the isolated RNA molecules of any one of Claims 1-18, the modified RNA molecule of any one of Claims 19-22, the fragment of Claim 23, or one or more isolated nucleic acids respectively complementary thereto, and/or the antibody of Claim 41, and one or more detection reagents.
43. A method of treating a disease or condition in a mammal, said method including the step of administering to the mammal a therapeutic agent selected from the group consisting of:
(i) the isolated RNA molecule of any one of Claims 1-18;
(ii) the modified RNA molecule of any one of Claims 19-22;
(iii) the fragment of Claim 23;
(iv) an at least partly complementary RNA or DNA molecule of the isolated RNA molecule of any one of Claims 1-18 or the fragment of Claim 23; and/or
(v) the antibody of Claim 41;
to thereby treat said disease or condition.
44. The method of Claim 43, wherein said disease or condition is associated with aberrant regulation of one or more activities selected from the group consisting of transcription, mRNA splicing, epigenetic modification, epigenetic regulation, chromatin modification and nucleosome positioning.
45. The method of Claim 43 or 44, wherein the mammal is a human.
46. A pharmaceutical composition comprising a therapeutic agent selected from the group consisting of:
(i) the isolated RNA molecule of any one of Claims 1-18;
(ii) the modified RNA molecule of any one of Claims 19-22;
(iii) the fragment of Claim 23;
(iv) an at least partly complementary RNA or DNA molecule of the isolated RNA molecule of any one of Claims 1-18 or the fragment of Claim 23; and/or
(v) the antibody of Claim 41;
and a pharmaceutically acceptable carrier, diluent or excipient.
PCT/AU2011/000380 2010-04-01 2011-04-01 Small rna molecules and methods of use WO2011120101A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US31998810P 2010-04-01 2010-04-01
US61/319,988 2010-04-01

Publications (1)

Publication Number Publication Date
WO2011120101A1 (en) 2011-10-06

Family

ID=44711238

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU2011/000380 WO2011120101A1 (en) 2010-04-01 2011-04-01 Small rna molecules and methods of use

Country Status (1)

Country Link
WO (1) WO2011120101A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3645724A4 (en) * 2017-06-27 2021-07-21 Agency for Science, Technology and Research Antisense oligonucleotides for modulating the function of a t cell
WO2023044412A1 (en) * 2021-09-17 2023-03-23 Ionis Pharmaceuticals, Inc. Compounds and methods for reducing dnm1l or drp1 expression

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009124341A1 (en) * 2008-04-07 2009-10-15 The University Of Queensland Rna molecules and uses thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
AFFYMETRIX ENCODE TRANSCRIPTOME PROJECT: "'Post-transcriptional processing generates a diversity of 5'- modified long and short RNAs'", NATURE, vol. 457, no. 7232, 2009, pages 1028 - 1032 *
TAFT, R. J. ET AL.: "Nuclear-localized tiny RNAs are associated with transcription initiation and splice sites in metazoans", NATURE STRUCTURAL AND MOLECULAR BIOLOGY, vol. 17, no. 8, August 2010 (2010-08-01), pages 1030 - 1035 *

Similar Documents

Publication Publication Date Title
Khorkova et al. Basic biology and therapeutic implications of lncRNA
Juanchich et al. Characterization of an extensive rainbow trout miRNA transcriptome by next generation sequencing
Sun et al. From discovery to function: the expanding roles of long noncoding RNAs in physiology and disease
Agliano et al. Long noncoding RNAs in host–pathogen interactions
Rajasethupathy et al. A role for neuronal piRNAs in the epigenetic control of memory-related synaptic plasticity
Li et al. Deep sequencing analysis of small non-coding RNAs reveals the diversity of microRNAs and piRNAs in the human epididymis
Miao et al. Genome-wide analysis reveals the differential regulations of mRNAs and miRNAs in Dorset and Small Tail Han sheep muscles
Zhu et al. Identification of common carp (Cyprinus carpio) microRNAs and microRNA-related SNPs
US20110263687A1 (en) Rna molecules and uses thereof
Chiang et al. Shrimp Dscam and its cytoplasmic tail splicing activator serine/arginine (SR)-rich protein B52 were both induced after white spot syndrome virus challenge
Huang et al. Circular RNA profiling reveals an abundant circEch1 that promotes myogenesis and differentiation of bovine skeletal muscle
Wang et al. Identification and characterization of long non-coding RNAs in subcutaneous adipose tissue from castrated and intact full-sib pair Huainan male pigs
Li et al. Profiling and functional analysis of circular RNAs in porcine fast and slow muscles
Tan et al. Deep parallel sequencing reveals conserved and novel miRNAs in gill and hepatopancreas of giant freshwater prawn
Tang et al. CNN3 is regulated by microRNA-1 during muscle development in pigs
Li et al. PIWI-mediated control of tissue-specific transposons is essential for somatic cell differentiation
Zheng et al. Pm-miR-133 hosting in one potential lncRNA regulates RhoA expression in pearl oyster Pinctada martensii
Broadwell et al. Myosin 7b is a regulatory long noncoding RNA (lncMYH7b) in the human heart
Chang et al. Characterization and comparative analysis of microRNAs in the rice pest Sogatella furcifera
Luo et al. MicroRNA-1 expression and function in Hyalomma Anatolicum anatolicum (Acari: ixodidae) ticks
Gong et al. Genome-wide identification and characterization of conserved and novel microRNAs in grass carp (Ctenopharyngodon idella) by deep sequencing
Yu et al. Comparative analysis of microRNA expression profiles of adult Schistosoma japonicum isolated from water buffalo and yellow cattle
JP2022533236A (en) Gene silencing mediator
WO2011120101A1 (en) Small rna molecules and methods of use
Han et al. Comparative analysis of microRNA in schistosomula isolated from non-permissive host and susceptible host

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11761843

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11761843

Country of ref document: EP

Kind code of ref document: A1