WO2016059187A1 - Method of capturing and identifying novel rnas - Google Patents

Method of capturing and identifying novel RNAs

Info

Publication number
WO2016059187A1
Authority
WO
WIPO (PCT)
Prior art keywords
rna
rnas
ndsrna
capture probe
sample
Prior art date
Application number
PCT/EP2015/073949
Other languages
French (fr)
Inventor
Hinrich Gronemeyer
Maximiliano PORTAL
Original Assignee
Universite De Strasbourg
Centre National De La Recherche Scientifique
Institut National De La Sante Et De La Recherche Medicale
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Universite De Strasbourg, Centre National De La Recherche Scientifique, Institut National De La Sante Et De La Recherche Medicale
Publication of WO2016059187A1

Classifications

    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6844Nucleic acid amplification reactions
    • C12Q1/6853Nucleic acid amplification reactions using modified primers or templates
    • C12Q1/6855Ligating adaptors
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay

Definitions

  • the present invention relates to a method for identifying double-stranded RNAs in a sample.
  • Background of the invention
  • RNA transcripts arising from unexplored regions within the genome, leading to the discovery of novel regulatory paradigms 1-5.
  • the transcriptional profile of hundreds of large genomic regions displaying disease-associated markers such as translocations, chromosomal rearrangements and single nucleotide polymorphisms (SNPs) remains largely unexplored 6-8.
  • Mercer et al. 4 describe a method for targeted sequencing of the human transcriptome that reveals its deep complexity.
  • Ng et al. 31 describe the targeted capture and massive parallel sequencing of human exomes. These reports provide insights on exome and transcriptome complexity.
  • the present inventors have described a new class of double-stranded RNAs with potential regulatory functions, referred to as natural double-stranded RNAs (ndsRNAs).
  • ndsRNAs are double-stranded structures that resist single-stranded RNA-specific RNAse, are sensitive to double-stranded RNA-specific RNAse, and escape processing by members of the known siRNA/miRNA pathways.
  • the inventors herein propose a method and a bioinformatics tool to identify or define the nucleotide sequences of ndsRNAs. The method proposed herein allows the rapid and genome-wide identification of ndsRNAs in a cell or tissue. It is based on the capture of ndsRNAs followed by the identification of all captured molecules by a sequencing step, in particular with massive parallel next generation sequencing (NGS).
  • a first object of the invention relates to a method for identifying double-stranded RNAs (more particularly ndsRNAs) from cell/tissue samples, comprising:
  • RNA sequencing the captured RNAs and determining the presence or the absence of double-stranded RNAs (such as ndsRNAs) in the sample.
  • the capture probe comprises:
  • the capture probe, and RNAs ligated to it may be purified using selective means.
  • the moiety which can be selectively bound by adapted means is a modified nucleotide (alternatively referred to as a “tagged nucleotide” in the following) such as a biotinylated nucleotide, for example a biotinylated thymine.
  • the said moiety is a peptide moiety which may be used to purify the capture probe and any RNA ligated with means having selective affinity to such peptide moiety, for example an antibody specifically binding said peptide moiety.
  • the capture probe comprises, in this order, a first nucleic acid sequence with a 5'-phosphate end, a moiety which can be selectively bound by adapted means, and a second nucleic acid sequence having a 3'-OH end.
  • the first and second nucleic acid sequences have nucleotides complementary to one another which can form a stem.
  • the length of this stem can vary but can be of 1-10 base pairs, such as 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 base pairs, for example.
  • the first and second nucleic acid sequences each have two complementary nucleotides, thus being able to form two base pairs in the stem.
  • the first and second nucleic acid sequence comprise independently from 1 to 20 nucleotides, in particular from 1 to 10, in particular from 2 to 6 nucleotides, such as 2, 3, 4, 5 or 6 nucleotides.
  • the first and second nucleic acid sequences are complementary one with the other on their entire length.
  • the first and second nucleic acid sequences are only partly complementary, with the 5'-phosphate and 3'-OH ends of the first and the second nucleic acid sequences, respectively, being free.
  • These free “overhangs” may be of 1, 2, 3 or 4 nucleotides or more, in particular of 2 nucleotides.
  • the moiety which can be selectively bound by adapted means is a third nucleic acid sequence which can form a loop and is tagged by inclusion of modified nucleotides such as biotinylated nucleotides (for example one or more biotinylated T, A, C or U) or non natural biotinylated bases that can be incorporated into the capture probe by chemical synthesis.
  • the third nucleotide sequence may be of variable length, but may comprise in particular from 2 to 20 nucleotides, such as 2, 3, 4, 5, 6, 7, 8, 9 or 10 nucleotides, independently of the size of the first and/or second nucleic acid sequences.
  • the moiety which can be selectively bound by adapted means is a peptide or protein provided between the first and second nucleic acid sequences.
  • the capture probe comprises, from 5' to 3', a first, a second and a third nucleic acid sequences, wherein
  • said sequences allow formation of a hairpin with the first and third nucleic acid sequences being complementary (either entirely or partly as provided above) and efficiently forming the stem of the hairpin;
  • the second nucleic acid sequence forms the loop of the hairpin and is tagged such that it can be purified using tag-specific purification.
  • the capture probe is an oligonucleotide capture probe, in particular an RNA-based oligonucleotide capture probe, capable of ligating preferentially RNA molecules with 5' and 3' ends in close proximity; wherein the capture probe comprises, from 5' to 3', a first, a second and a third nucleic acid sequences,
  • said sequences allow formation of a hairpin with the first and third nucleic acid sequences being complementary and efficiently forming the stem of the hairpin; and b) the second nucleic acid sequence forms the loop of the hairpin and is tagged such that it can be purified using tag-specific purification.
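The hairpin layout described above (a stem formed by the first and third sequences, a tagged loop formed by the second, and optional free overhangs) can be checked with a short script. The following is a minimal sketch, not the inventors' tool: the function names and the example sequences are hypothetical, only Watson-Crick pairs are counted (no G-U wobble), and the tag in the loop is not modelled.

```python
# Minimal sketch (hypothetical helper names): inspect a candidate hairpin
# capture probe written 5'->3' as first + loop + third.

# Watson-Crick RNA pairs only; G-U wobble pairs are deliberately ignored.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def normalize(seq: str) -> str:
    """Upper-case the sequence and read DNA-style T as U."""
    return seq.upper().replace("T", "U")


def stem_length(first: str, third: str) -> int:
    """Count contiguous base pairs starting at the loop-proximal end.

    The 3' end of `first` pairs with the 5' end of `third`, so `first`
    is walked backwards while `third` is walked forwards.
    """
    n = 0
    for a, b in zip(reversed(normalize(first)), normalize(third)):
        if (a, b) not in PAIRS:
            break
        n += 1
    return n


def describe_hairpin(first: str, loop: str, third: str) -> dict:
    """Summarize stem, loop and free overhang sizes of the probe."""
    n = stem_length(first, third)
    return {
        "stem_bp": n,
        "loop_nt": len(loop),
        "overhang_5p": len(first) - n,  # free 5'-phosphate side
        "overhang_3p": len(third) - n,  # free 3'-OH side
    }


if __name__ == "__main__":
    # Hypothetical sequences, not the probe formula of the patent.
    print(describe_hairpin("GGCA", "UUUU", "UGAA"))
```

With the hypothetical sequences used here, the summary corresponds to a two-base-pair stem with two-nucleotide overhangs on each free end, which matches one of the variants listed above.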
  • the second nucleic acid sequence is tagged (or labeled) with one or more biotinylated nucleotides, for example with biotinylated dT or any other biotinylated nucleotide.
  • the capture probe is selected from
  • the capture probe comprises a first and a second nucleic acid sequences, wherein said sequences are complementary and wherein the protein is provided between said first and second nucleic acid sequences;
  • any other capture probe comprising a stem comprised of two complementary nucleic acid sequences linked with any other compound that can be used to selectively purify the capture probe and RNA linked to it, such as an antibody or antibody fragment.
  • the capture probe is of the formula
  • X is a nucleotide adapted to the selective purification of the capture probe (such as a modified nucleotide such as a biotinylated nucleotide).
  • the capture probe is of the formula
  • r indicates that the corresponding nucleotide is a ribonucleotide.
  • Another object of the invention is a method of identifying double stranded RNA (such as ndsRNA) molecules in a sample comprising:
  • a) capturing (or purifying) RNAs with an oligonucleotide capture probe as described above;
  • Another object of the invention relates to a method of identifying or characterizing a specific defined double-stranded RNA (in particular a ndsRNA) or a specific pattern or the entire spectrum of ndsRNAs present in a RNA sample, such as for example the RNAs extracted from a physiologically normal or a pathological specimen, comprising application of the above method to the RNAs contained in said sample, thereby identifying or characterizing individual or patterns/signatures of double-stranded RNAs (in particular ndsRNAs) contained therein.
  • a further object of the invention relates to a method for identifying a marker associated with a phenotype or cell function, or for identifying a target for the treatment of a disease or for identifying a biomarker indicative of a disease or condition.
  • This method comprises the identification of individual or of a specific pattern/set of double-stranded RNAs (such as ndsRNAs), or determining the expression profile of double-stranded RNAs (e.g. ndsRNAs) in a sample of interest, thereby associating individual or patterns of the double-stranded RNAs (such as ndsRNAs) identified to a phenotype, cell function, or disease.
  • the sample of interest may correspond to a cancer cell and double-stranded RNA (e.g. ndsRNA) expression profiling allows the identification of double-stranded RNA-based (such as ndsRNA-based) biomarkers, or signatures composed of several/multiple double-stranded RNA (e.g. ndsRNA) expression patterns, including the presence or absence of double-stranded RNA(s) (e.g. ndsRNA(s)) which correlates with the pathology, indicative of this cancer, or the identification of a double-stranded RNA (such as a ndsRNA) which could be the target of a treatment (for example by increasing or decreasing the expression of said double-stranded RNA, for example a ndsRNA).
  • Figure 1 identification of RAM-derived RNAs.
  • GAPDH is depicted as an RNAse treatment control, (f), stsRT-qPCR of Class III and IV long RNAs and small RNAs in PLB985-total RNA from whole cell (Input), nuclei (Nuclei) or cytoplasmic extracts (Cytoplasm).
  • let-7c and U49 snoRNA are cytoplasmic and nuclear controls, respectively. Opposite bars correspond to matching sense/antisense pairs.
  • Histogram represents mean RNA levels as arbitrary units (A.U.) +/- SD of one representative experiment out of three independent biological replicates, (g), stsRT-qPCRs of RAM-derived Class III (from left to right: ndsRNA3, 4, 5, 6, 8 and 9), IV long (from left to right: ndsRNA1 and 2) and small (from left to right class III: sRNA 5, 6, 7, 8, 10 and 11 and IV: sRNA 1, 2, 3 and 4) RNAs in BJELR cells. Opposite bars correspond to matching sense/antisense pairs. RNA levels are shown as arbitrary units (A.U.) +/- SD. Genome positions are shown in Table 1.
  • Figure 2 nds-2a establishes specific RNA-protein interactions and displays cell cycle-dependent subcellular localization,
  • (a) Electrophoretic mobility shift assay performed with radioactively labelled (*) nds-2a/2e incubated with BJELR nuclear extract (N.E). Specificity was confirmed by competition with non-radioactive nds-2a/2e.
  • (b) Gene ontology enrichment analysis performed for nds-2a associated proteins identified by mass spectrometry,
  • Immobilized nds-2a/2e were incubated with N.E.
  • RNA levels as arbitrary units (A.U.) +/- SD of one experiment out of three biological replicates
  • Immobilized nds-2a was incubated with RAN, RCC1, RANGAP1 or RANBP2 siRNA-depleted nuclear extracts and the nds-2a protein-bound fraction was analysed for RAN, RCC1, RANGAP1 and RANBP2 proteins by Western blot.
  • N.E. depletion was controlled by Western blot and is depicted as input
  • Interphase BJ cells were processed for nds-2a RNA-FISH (red channel) coupled to immunocytochemistry for RAN, RANGAP1 and RANBP2 (green channel) and analysed by confocal microscopy.
  • α-Tubulin (α-Tub) and DAPI staining are shown to delineate the cell.
  • Merge and enlarged images are depicted, (g), nds-2a and RAN, RANGAP1 and RANBP2 localization was analyzed by confocal microscopy in metaphase BJ cells.
  • DAPI staining delineates the metaphase plate.
  • Merge and enlarged images are depicted, (h), Interphase BJ cells were processed for nds-2a RNA-FISH to detect either nds-2a forward (2a Fw, red channel) or reverse (2a Rv, green channel) strands.
  • α-Tubulin (α-Tub) and DAPI staining are shown to delineate the cell.
  • Merge images are depicted, (i), nds-2a and α-Tubulin (α-Tub) localization was analyzed by confocal microscopy in metaphase BJ cells. Merge and enlarged images (Inset) are depicted.
  • Figure 3 nds-2a overexpression leads to a range of mitotic defects and pronounced changes in nuclear shape, (a), Diagram of pBI-nds-2a plasmid and overexpressed SP6-tagged nds-2a (AB and CD), (b), RNAse ONE protection assays followed by stsRT-PCR of overexpressed SP6-tagged ndsRNA-2a.
  • GAPDH is depicted as RNAse treatment control, (c-f), HeLa cells overexpressing nds-2a variants (AB or CD) were processed for immunocytochemistry against α-Tubulin (α-Tub, red channel), counterstained with DAPI (blue channel) to delineate the cellular contour and analyzed by confocal microscopy. Empty plasmid was used as control. Histone H1-GFP (H1-GFP, green channel) expressing plasmid was co-transfected to monitor transfection efficiency (Fig. 13a). The number of bi/multinucleated cells, number of chromatin bridges and cells displaying abnormal nuclear shape was determined by a double blind analysis.
  • Figure 4 Genome-wide expression of ndsRNAs.
  • (a) Pie chart representing percentages of ndsRNAs mapping to exons, introns or intergenic regions
  • (b) Pie chart representing ndsRNAs compared to fRNAdb annotated features. Different RNA species contained in fRNAdb are color-coded
  • (c) Retinoic acid-modulated ndsRNA and nds-derived small RNA in PLB985 cells are depicted
  • (d) RNA levels of forward (Fw) and reverse (Rv) strands of modulated ndsRNAs from RA or vehicle-treated cells was determined by stsRT-qPCR. ICAM1 and PRC mRNAs are shown as controls.
  • Results are expressed as arbitrary units (A.U.) +/- SD of one out of two biological replicates. Genome positions are shown in Table 2.
  • Figure 5 Identification and validation of RAM-derived RNAs.
  • (a) RNA capture approach,
  • (b) Pie chart representing the relative distribution of Class I-IV RAM-derived transcripts,
  • (c-f) Schematic representation and validation of single-stranded Class I-II and natural double-stranded (ndsRNAs) Class III-IV long RNAs, as well as small RNAs from Class II-IV transcripts by stsRT-PCR.
  • Figure 6 RAM-derived RNAs in PLB985 and BJELR cells, (a-b), RNAse ONE or RNAse III protection assays followed by stsRT-PCR of Class III-IV RNA duplexes and Class I-II transcripts.
  • GAPDH is depicted as an RNAse treatment control, (c), Detection by stsRT-PCR of RAM-derived Class II-IV long and small RNAs in RAS-transformed foreskin fibroblast (BJELR).
  • Figure 7 RAM-derived small RNAs are neither products of pervasive transcription nor byproducts of the canonical miRNA machinery, (a), Determination of Drosha, Dicer, AGO2 and EXOSC3 knock-down efficiency by RT-qPCR in BJELR cells. mRNA levels are expressed relative to untreated mock samples. Histogram represents mean RNA levels as arbitrary units (A.U.) +/- SD of one experiment out of three biological replicates, (b), Determination of Drosha, Dicer, AGO2 and EXOSC3 knock-down efficiency by Western blot. Actin levels are shown as loading control, (c), Analysis of the impact of Drosha, Dicer, AGO2 and EXOSC3 knock-down on the levels of previously reported miRNAs.
  • d-g Quantification of Class II and nds-derived sRNAs as well as miR-93 control upon Drosha (d), Dicer (e), AGO2 (f) or EXOSC3 (g) knock down in BJELR cells.
  • the expression level for each sRNA in scramble siRNAs-transfected (scr) samples was arbitrarily set to 1. Values upon knock down are expressed relative to the scr. Histogram represents mean RNA levels as arbitrary units (A.U.) +/- SD of one experiment out of three biological replicates.
  • Figure 8 RAM-derived small RNAs are loaded into AGO2. (a-b), Silver staining and Western blot for argonaute-2 (AGO2) in whole cell lysates (input) and AGO2-immunoprecipitated material (AGO2 IP). AGO2, heavy (hc) and light (lc) chains of the IP-antibody are indicated, (c-d), Detection by stsRT-qPCR of Class II and nds-derived sRNAs in RNA extracted from whole cell lysates (input) or AGO2-immunoprecipitated material from PLB985 cells. let-7c and U6 snoRNA are included as AGO2-loaded positive and negative controls. Histogram represents mean RNA levels as arbitrary units (A.U.) +/- SD of one experiment out of two biological replicates.
  • Figure 9 RAM-derived long RNAs are neither modulated upon exosome depletion nor by the miRNA pathway, (a-d), Quantification of Class II (RNA1: B2, RNA2: B3, RNA3: B4, RNA4: B6, RNA5: B7 and RNA6: B8), Class III (ndsRNA3: C1/C2, ndsRNA4: C3/C4, ndsRNA5: C5/C6, ndsRNA6: C7/C8, ndsRNA7: C9/C10, ndsRNA8: C11/C12, ndsRNA9: C13/C14) and Class IV (ndsRNA1: D1/D2 and ndsRNA2: D3/D4) long RNA precursors upon Drosha (a), Dicer (b), AGO2 (c) or EXOSC3 (d) knock down in BJELR cells.
  • RNA levels were arbitrarily set to 1. Values upon specific knock-down are expressed relative to the scr sample. Histogram represents mean RNA levels as arbitrary units (A.U.) +/- SD of one experiment out of three biological replicates.
  • Figure 10 ndsRNAs establish specific protein interactions, (a), Nuclear proteins interacting with nds-2a are revealed by SDS-PAGE followed by silver staining, (b-c), Gene ontology enrichment analysis of nds-2a and nds-2e interacting proteins identified by mass spectrometry, (d), Peptide coverage of nds-2a interacting RAN, RCC1, RANGAP1 and RANBP2 identified by mass spectrometry.
  • Figure 11 nds-2a binds RAN and RCC1 in vitro and in vivo
  • (a) The specificity of RAN, RCC1, RANGAP1 and RANBP2 antibodies was tested by a siRNA-based approach, α-Tubulin (α-Tub) is shown as a loading control
  • (b) RAN, RCC1, RANGAP1 and RANBP2 proteins were immunoprecipitated with specific antibodies and their corresponding levels were analysed by Western blot to determine immunoprecipitation efficacy
  • BJELR cells were sorted according to cell cycle phases (G1, S and G2-M) by flow cytometry. Panels show a normal cell cycle and the enrichment of the FACS sorted populations in the indicated phases of the cell cycle, (e), Sorted cells from d were analyzed for nds-2a forward (2a Fw) and reverse (2a Rv) strand levels by stsRT-qPCR. Histogram represents mean RNA levels as arbitrary units (A.U.) +/- SD of one experiment out of two biological replicates.
  • Figure 12 nds-2a localization in interphase cells, (a), Interphase BJ cells were processed for nds-2a RNA-FISH to detect either nds-2a forward (2a Fw, red channel) or reverse (2a Rv, green channel) strands. α-Tubulin (α-Tub) and DAPI staining are shown to delineate the cell. Merge images are depicted.
  • Figure 13 Cell cycle profile of nds-2a transfected cells, (a), HeLa cells were cotransfected with the indicated plasmids and transfection efficiency was calculated as the percentage of cells displaying a positive labelling for GFP (upper right). Non-transfected cells were used as an autofluorescence control, (b), Cell cycle profile of transfected HeLa cells with indicated plasmid pairs. Percentages of cells in each phase of the cell cycle are depicted.
  • Figure 14 Quality controls of global PLB985-derived stsRNA-Seq libraries, (a-b), Intensity correlation analysis of technical replicates at 20 nt resolution of long fragmented RNAs (50-70 nt) and naturally occurring small RNAs (18-30 nt) from global stsRNA-Seq libraries. Results are displayed as log2 of original values.
  • Pearson's correlation coefficient values are shown, (c-e), Screenshots of long stsRNA-Seq showing the profile obtained for known genes (HSP90B1 in forward (Fw) strand and C12orf73 in reverse (Rv) strand), lincRNAs (chr6: 141071891-141249602; Rv strand) and several SNAR RNA precursors and their corresponding small RNAs (SNAR-A3; Rv strand; small stsRNA-Seq).
  • Figure 15 Modulation of ndsRNA levels by Retinoic Acid
  • (a) Flow cytometry analysis of PLB985 cells treated either with vehicle (EtOH) or retinoic acid (RA). Percentage of differentiated cells was determined by CD11c/CD14 immunolabelling. Fluorescent background signal was assessed by labeling with fluorescently labeled non-specific isotypic antibodies
  • (b) Representation of the RNA levels of forward (Fw) and reverse (Rv) strands of randomly selected ndsRNA transcripts in PLB985 samples treated with RA or vehicle determined by stsRT-qPCR. Histogram represents mean RNA levels as arbitrary units (A.U.) +/- SD of one experiment out of two biological replicates. Genome positions for the displayed examples are shown in Table 2.
  • Figure 16 schematic representation of an embodiment of the ndsRNA capture and identification method of the invention. Isolated nuclei are incubated with a biotin-labelled ndsRNA-specific adaptor oligonucleotide (or capture probe) to capture ndsRNAs in presence of an RNA ligase. Total RNA is incubated with streptavidin beads and recovered ligated products are subjected to strand-specific RNA-seq library construction. Libraries are sequenced on a HiSeq2000/2500 Illumina sequencer and data are analyzed bioinformatically.
  • Figure 17 validation of the capture of ndsRNAs. The proximity ligation assay proposed herein captures nds-2a (a), nds-2e (b) and additional already validated ndsRNAs (c).
  • Figure 18 example of re-constructed molecules from PLA analysis for ndsRNAs after RNAse optimization.
  • a single adaptor (bases underlined) is identified for a given single reconstructed molecule showing that RNAse optimization may be used to achieve optimal results.
  • a particular embodiment of the methods and uses of the invention is directed to the capture, identification or sequencing of a ndsRNA.
  • ndsRNAs were first described by the inventors in Portal et al. 34, EP14305822 and PCT/EP2015/062179 (which are incorporated by reference).
  • ndsRNAs are double-stranded structures that resist single-stranded RNA-specific RNAse, are sensitive to double-stranded RNA-specific RNAse, and escape processing by members of the known siRNA/miRNA pathways. In a particular variant, they correspond to non-coding sequences.
  • the size of the ndsRNA is of at least 50 nucleotides, at least 60, at least 70, at least 80, at least 90, at least 100, at least 150, at least 200 or even at least 250 nucleotides.
  • the test RNA (i.e. the RNA from which the capture is to be performed) may be obtained from any sample of interest.
  • the sample of interest may be a cell, a cell culture or a tissue from a subject, for example an animal subject, in particular a mammal or non-mammal animal, in particular from a human subject.
  • the sample may also correspond to nuclei isolated from such cells or tissues.
  • the sample may be a cell or tissue sample from a patient in a diseased state.
  • the present invention permits the identification and characterization of ndsRNAs involved in the disease or indicative of the pathology.
  • RNAs are isolated using methods well known in the art (Molecular Cloning: A Laboratory Manual, Third Edition. J. Sambrook, D. Russell). In particular, kits are readily available to those skilled in the art for performing total RNA extraction from a cell or tissue sample.
  • RNAs are extracted and purified using methods well known in the art.
  • RNAs are purified using Trizol, as is well known to those skilled in molecular biology.
  • Before the capture (or purification) step, ribosomal RNA may be depleted from the RNA extract such that the RNA transcripts are enriched regardless of their polyadenylation status or the presence of a 5'-cap structure.
  • kits for depleting ribosomal RNA include the RiboMinus kit from Invitrogen.
  • single stranded RNA species may be depleted by using a RNase that only degrades single stranded RNAs, such as RNase ONE® from Promega.
  • single-stranded RNA-specific RNAse treatment may be implemented either before or after contacting the capture probe with the RNA extract, in particular before.
  • the RNA extract may be treated with a DNAse.
  • the RNA extract may be assessed to determine whether it is free from genomic DNA. For example, quantitative PCR amplification of a short region (e.g. of about 100 bp) from a single copy gene may be implemented.
  • the topoisomerase (DNA) III Alpha (TOP3A) single copy gene may be assessed with the forward primer of SEQ ID NO: 3 (5'-TCATCTGTATGGCCAGGTAGG-3') and the reverse primer of SEQ ID NO: 4 (5'-GGAACCTTTAGGTTGTTAACAGTTG-3').
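Basic properties of the two TOP3A primers quoted above can be inspected before running the genomic DNA check. The snippet below is only an illustrative sketch: the primer sequences are the ones given in the text with the extraction spaces removed, while the helper names and the choice of properties (length, GC fraction, reverse complement) are assumptions made here, not part of the described method.

```python
# Illustrative sketch: basic properties of the TOP3A qPCR primers
# (SEQ ID NO: 3 and SEQ ID NO: 4 as quoted in the text).

FORWARD_PRIMER = "TCATCTGTATGGCCAGGTAGG"      # SEQ ID NO: 3
REVERSE_PRIMER = "GGAACCTTTAGGTTGTTAACAGTTG"  # SEQ ID NO: 4

_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_DNA_COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    for name, primer in (("forward", FORWARD_PRIMER), ("reverse", REVERSE_PRIMER)):
        print(name, len(primer), round(gc_fraction(primer), 3), reverse_complement(primer))
```

The reverse complement of the reverse primer is the sequence expected on the same strand as the forward primer, which is convenient when locating the amplicon in a reference sequence.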
  • the RNA extract may be treated with a DNAse, such as the TURBO DNAse, followed by a new RNA extraction as described above, such as a Trizol extraction.
  • the captured RNAs correspond to a small RNA fraction of RNAs 18 to 30 nucleotide long.
  • long RNAs are captured, preferably after fragmentation (e.g. chemical or enzymatic fragmentation) to 50-250, such as 50-150, 50-100 or 50-80 nucleotide long RNAs, such as 50-70, 60-80 or 75-100 nucleotide long RNAs.
  • fragmented long RNAs are 50-70 or 75-100 nucleotides long, or the fragmented long RNAs may be up to 100, up to 150, up to 200 or even up to 250 nucleotides long.
  • RNA fragmentation of long RNAs can be carried out with divalent cation-based cleavage such as zinc-mediated RNA fragmentation, for example with a zinc based RNA fragmentation reagent such as "RNA fragmentation reagent®" from Ambion.
  • Both small and fragmented long RNA fractions may be purified according to methods well known in the art. In an embodiment, these fractions are purified after migration on denaturing polyacrylamide gel electrophoresis (e.g. PAGE-urea).
  • both small RNAs and long fragmented RNAs are captured in a separate or simultaneous reaction, preferably in separate reactions.
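The two size fractions described above (small RNAs of 18 to 30 nucleotides and fragmented long RNAs of 50 to 250 nucleotides) are separated physically on a gel, but the same windows can be applied in silico, for example when sorting sequences by length before analysis. The following is a small sketch under that assumption; the function names and the decision to discard sequences outside both windows are choices made here for illustration.

```python
# Sketch: assign sequences to the size fractions named in the text.
from typing import Dict, Iterable, List, Optional

SMALL_RANGE = (18, 30)           # small RNA fraction, in nucleotides
LONG_FRAGMENT_RANGE = (50, 250)  # broadest fragmented long RNA window


def size_fraction(rna: str) -> Optional[str]:
    """Return 'small', 'long_fragment' or None for a sequence."""
    n = len(rna)
    if SMALL_RANGE[0] <= n <= SMALL_RANGE[1]:
        return "small"
    if LONG_FRAGMENT_RANGE[0] <= n <= LONG_FRAGMENT_RANGE[1]:
        return "long_fragment"
    return None  # outside both windows: not retained in this sketch


def split_fractions(rnas: Iterable[str]) -> Dict[str, List[str]]:
    """Group sequences by fraction so each can be processed separately."""
    fractions: Dict[str, List[str]] = {"small": [], "long_fragment": []}
    for rna in rnas:
        label = size_fraction(rna)
        if label is not None:
            fractions[label].append(rna)
    return fractions
```

Keeping the two groups in separate lists mirrors the stated preference for capturing small and fragmented long RNAs in separate reactions.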
  • RNA capture e.g. PAGE-urea
  • RNA molecules in particular the small fraction RNAs and/or the long fraction RNAs, in particular the fragmented long fraction RNAs described above, are incubated with the capture probe.
  • a treatment with a RNAse specific for single-stranded RNA molecules may be done before or after incubation of the RNA molecules with the capture probe.
  • the present invention is based on the idea that both strands of a double stranded RNA molecule could be ligated by an exogenous closed hairpin oligonucleotide adapter.
  • the capture probe may be any molecule that may capture double-stranded RNA species and which may be purified according to a specific moiety present in said adapter. By taking advantage of this specific moiety (or tag) present in the adapter, captured molecules are enriched and further processed according to the steps indicated below.
  • the capture probe comprises, from 5' to 3', a first, a second and a third nucleic acid sequences, wherein
  • said sequences allow formation of a hairpin with the first and third nucleic acid sequences being complementary and efficiently forming the stem of the hairpin;
  • the second nucleic acid sequence forms the loop of the hairpin and is tagged such that it can be purified using tag-specific purification.
  • the capture probe is an oligonucleotide capture probe, in particular an RNA-based oligonucleotide capture probe, capable of ligating preferentially RNA molecules with 5' and 3' ends in close proximity; wherein the capture probe comprises, from 5' to 3', a first, a second and a third nucleic acid sequences,
  • the third nucleic acid sequence is tagged with one or more biotinylated nucleotides, for example with biotinylated dT or any other biotinylated nucleotide.
  • the capture probe is selected from
  • the capture probe comprises a first and a second nucleic acid sequences, wherein said sequences are complementary and wherein the protein is provided between said first and second nucleic acid sequences;
  • any other capture probe comprising a stem comprised of two complementary nucleic acid sequences linked with any other compound that can be used to selectively purify the capture probe and RNA linked to it such as an antibody or antibody fragment.
  • the capture probe is of the formula
  • X is a moiety adapted to the selective purification of the capture probe (such as modified nucleotides such as biotinylated nucleotides or a protein moiety).
  • the capture probe is of the formula
  • the capture probe is an oligonucleotide and the method comprises incubating the isolated and/or purified RNA molecules with the oligonucleotide capture probe and a RNA ligase (e.g. T4 RNA ligase), under conditions allowing ligation of the RNAs with the capture probe.
  • Captured RNAs, i.e. RNAs ligated to the capture probe, are then recovered by selective recognition of the label present in the third nucleic acid sequence.
  • RNA/capture probe complexes are recovered using a support grafted with streptavidin such as streptavidin (magnetic) beads, columns, etc.
  • the capture probe comprises a moiety which is a protein, methods based on affinity for this protein, for example with antibodies recognizing this protein bound on a support, are implemented. Bound RNA is then eluted from the support.
  • RNA sequencing over dsRNAs captured by the capture probe in particular a capture probe such a double stranded RNA (e.g. a ndsRNA) capture oligonucleotide.
  • Potential double stranded RNAs, and most particularly ndsRNAs are identified by de-novo transcript reconstruction (i.e. with no genome alignment) followed by a search of the double-stranded RNA (e.g. ndsRNA) capture oligonucleotide sequence in the reconstructed transcripts. Finally the transcripts containing the capture oligonucleotide sequence are split before and after the capture oligonucleotide to test overlapping in opposite strands.
  • ndsRNAs are identified when transcripts overlap on opposite strands
  • polarity of the double stranded RNA may be retrieved based on the presence of the capture oligonucleotide present in the ligated species.
  • Other sequencing methods are known in the art that preserve directional information, for example using distinguishable adapters for different ends of RNA. Representative methods include the ligation of a 3'-adapter and a 5'-adapter, the sequences of which are known and different, to the captured RNAs.
  • a method for preparing a RNA sample for further sequencing may include:
  • RNAs are dephosphorylated using a phosphatase (e.g. Antarctic phosphatase);
  • RNA ligase such as T4 RNA ligase
  • RNA molecules are 5'-phosphorylated
  • RNA molecules are purified, for example by size separating them with denaturing gel electrophoresis and recovery by gel excision according to their expected size followed by RNA precipitation;
  • a 5'-adapter is ligated to the RNAs purified in the preceding step.
  • the 5'- and 3'-ligated RNA molecules may be purified by denaturing gel electrophoresis and recovered by gel excision according to their expected size.
  • kits available for preparing a RNA with 3'-, 5' adapters include the DGE small RNA library kit from Illumina.
  • the RNAs are then reverse transcribed with specific primers, for example with primers specific of the 3' and 5' adapters and the obtained cDNA may be amplified by PCR.
  • the invention may implement ligation-mediated reverse transcription followed by PCR amplification.
  • multiplexed library preparation can be implemented by using indexed primers during a PCR-based library amplification, allowing multiple samples to be sequenced in parallel in a single sequencing run.
  • the sequencing read length is at least 75, at least 100, at least 150, at least 200 or even at least 250 nucleotides (such as 75-100 nucleotide long reads), to improve reconstruction accuracy and retrieve ndsRNA molecules with better confidence.
  • RNA sequences may first be preprocessed, in particular with a computer program, to identify the capture oligonucleotide sequence contained within reads, thereby generating datasets. These datasets may then be used to reconstruct transcripts de novo. Transcript reconstruction may be done using a computer program such as Trinity (Grabherr MG et al., Nature Biotech. 2011) or Scripture (Guttman M. et al., Nature Biotech. 2010). Since transcript polarity is conserved during the construction of the RNA-Sequencing library, the retrieved sequencing reads are 5'-3' oriented. Therefore, single and potential double stranded RNA transcripts can be reconstructed and identified.
  • transcripts containing the capture oligonucleotide sequence may be identified and processed further. The sequences before and after the capture oligonucleotide are split into two different transcripts and mapped independently, for example using the Bowtie aligner (Langmead et al. "Ultrafast and memory-efficient alignment of short DNA sequences to the human genome". Genome Biology, 2009). Afterwards, transcript overlap on opposite strands is identified, for example by using a computer program such as BedTools (Quinlan AR and Hall IM. "BEDTools: a flexible suite of utilities for comparing genomic features". Bioinformatics, 2010), and the identified regions are extracted for further analysis.
  • a double-stranded RNA (such as a ndsRNA) is identified on the basis of transcript overlap on opposite strands between the sequences found before and after the capture oligonucleotide sequence, hence experimentally validating the double stranded RNA.
  • the presence of a double-stranded RNA such as a ndsRNA can further be validated by strand specific reverse transcription followed by PCR or qPCR for each of the strands in the ndsRNA and their corresponding small RNAs.
  • RNAse specific for single stranded RNA species such as RNAse ONE can be used to further support the double stranded nature of the molecule (e.g. a ndsRNA) identified.
  • the invention relates to a method for identifying double stranded RNAs (such as ndsRNAs) in a sample comprising:
  • RNAs double stranded RNAs (such as ndsRNAs) using a capture probe as described above;
  • denaturing the captured RNAs;
  • the invention provides a method for identifying double stranded RNAs (such as ndsRNAs) in a sample, comprising:
  • the invention also relates to double stranded RNAs (such as ndsRNAs), in particular a ndsRNA identified according to the method described above.
  • the ndsRNA is ndsRNA-2a or ndsRNA-2e.
  • the inventors have shown that ndsRNA-2a is involved in a mitosis-specific RAN-containing complex, showing its potential involvement in mitosis.
  • ndsRNA-2a and ndsRNA-2e sequences are shown in SEQ ID NO: 1 and SEQ ID NO: 2, respectively.
  • SEQ ID NO: 1
  • ndsRNAs have a functional role in the cell.
  • ndsRNA-2a interacts with major mitotic components involved in fundamental aspects of cell physiology ranging from nuclear import/export to spindle assembly and mitotic progression.
  • overexpression of ndsRNA-2a leads to a range of mitotic defects and a pronounced change in nuclear shape highlighting its role in cell cycle progression.
  • the inventors have shown that a subset of globally expressed ndsRNAs is modulated upon retinoic acid treatment, demonstrating that the novel RNAs are regulated by cellular cues and participate in a plethora of regulatory systems.
  • an object of the invention is also a method for identifying a marker associated with a phenotype or cell function, or for identifying a target for the treatment of a disease, or for identifying a biomarker indicative of a disease or condition.
  • This method comprises the identification of ndsRNAs, or determining the expression profile of ndsRNAs in a sample of interest, thereby associating the ndsRNAs identified to a phenotype, cell function, or disease.
  • the sample of interest may correspond to a cancer cell and ndsRNA expression profiling allows the identification of biomarkers indicative of this cancer or the identification of a ndsRNA which could be the target of a treatment (for example by increasing or decreasing the expression of said ndsRNA).
  • the invention also relates to double stranded RNAs (such as ndsRNAs) that may be identified by the method described in the preceding paragraph.
  • the invention further relates to a method for identifying the function of a ndsRNA, wherein said ndsRNA is either introduced or depleted in a cell, tissue, organ or organism (in particular a mammal organism, more particularly a non-human organism) and phenotypic or functional changes occurring after said introduction or depletion are determined.
  • Representative changes searched include, for example, changes in the cell cycle, cell shape, induction of apoptosis, induction of cell differentiation, induction of a sensitivity or resistance to a therapeutic molecule, etc.
  • the characterization of a ndsRNA may also comprise the identification of binding partners of said ndsRNA, in particular of binding proteins, for example using a mass spectrometry (MS) analysis, as provided in the examples.
  • the binding partners are identified using a biotinylated RNA which is incubated with a protein sample, for example a whole cell extract, a nuclear extract, a cytoplasmic extract or with a protein produced in vitro, the biotinylated ndsRNA:protein complex is then captured on a streptavidin covered support and then protein analysis is performed, in particular using a MS analysis.
  • the invention also relates to an oligonucleotide capture probe as described above.
  • the invention further relates to a kit comprising an oligonucleotide capture probe as described above.
  • the kit may comprise any buffer or material used in the implementation of the above methods.
  • the kit of the invention may comprise a ligase and/or a capture support (for example streptavidin beads if the capture probe is labeled with biotin) and/or any buffer useful in the practice of the invention.
  • the kit may comprise instructions for the user to follow for implementing the invention.
  • PLB985 cells were grown in RPMI medium supplemented with 25 mM HEPES, 10% FCS and glutamine.
  • BJ and BJELR cells were grown in DMEM/M199 1:4 (1 g/l glucose) supplemented with 10% FCS.
  • HeLa cells were grown in DMEM (1 g/l glucose), 5% FCS supplemented with glutamine.
  • Total RNA was extracted using Trizol (Invitrogen) according to the manufacturer's instructions.
  • RNA fragmentation
  • Ribominus RNA (Invitrogen) was fragmented in zinc-based RNA fragmentation reagent (Ambion) for 6 min at 70°C, separated by PAGE/urea, and 50-70 nt RNA was recovered by overnight elution in NaCl.
  • BACs (RP11-3O20, RP11-44018, RP11-770K21, RP11-588B17) covering the RAM region were obtained from Children's Hospital Oakland Research Institute (CHORI). Briefly, 200 ng of an equimolar mix of BAC DNA was sonicated for 37 cycles (10 sec on, 50 sec off, amplitude 30%) in a Vibra-Cell apparatus (Bioblock Scientific) in lysis buffer (50 mM HEPES pH 7.5, 140 mM NaCl, 1% Triton X-100, 0.1% Na-Deoxycholate, supplemented with protease inhibitors). Sonicated BAC DNAs were size-separated by agarose gel electrophoresis and the 200-300 bp band was purified on a QIAquick column (Qiagen).
  • Traps were generated by using MEGAPrime random primer labelling system (Amersham) with 250 ng of BAC DNA library and 5' Biotin-random primers. After Klenow extension, BAC DNA was removed from the Traps by Dpn I treatment.
  • Specific RAM-region traps were generated by random priming of 200-300 bp purified sonicated BACs (RP11-44018, RP11-588B17, RP11-770K21 and RP11-3O20) using 5'-biotinylated primers.
  • Long RNA fraction was prepared by chemical fragmentation of total RNA (50-70 nt, Ambion) and further purified by PAGE, whereas the small RNA fraction (18-30 nt) was directly prepared by PAGE.
  • RNA traps were incubated with the traps in Binding buffer (0.5 M NaCl, 0.01 M Tris-HCl pH 7.5, 0.5% SDS, 0.1 mM EDTA) at 62°C (small RNAs) or 68°C (long RNAs) overnight. RNA traps were further recovered using magnetic streptavidin beads, RNA eluted into Elution buffer (0.01 M Tris-HCl pH 7.5, 1 mM EDTA) and further purified by PAGE according to previous size selection.
  • RNAs were dephosphorylated with 5 units of Antarctic phosphatase (NEB), separated by PAGE/Urea gel electrophoresis and purified by gel excision according to prior size selection.
  • 3' RNA Adapter (Illumina) was ligated to purified fragments with T4 RNA ligase for 6 h at 20°C followed by an overnight incubation at 4°C. Ligated fragments were size separated by PAGE/Urea and purified according to prior size selection.
  • 3'-ligated RNAs were further 5'-phosphorylated by PNK treatment for 1 h at 37°C, size selected by PAGE/Urea and purified.
  • 5' RNA Adapter (Illumina) was ligated to 3'-ligated RNA fragments with T4 RNA ligase for 6 h at 20°C and further incubated overnight at 4°C. 5'-3' adapter-ligated RNA was separated by PAGE/Urea and purified between 70-90 nt for the small RNA fraction and between 100-130 nt for the long RNA fraction. Reverse transcription was performed by using Superscript II (Invitrogen) with specific primers for 1 h at 44°C. Final amplification was performed by 15 cycles of PCR amplification using Phusion DNA polymerase (Finnzymes). Library quality and ligation steps were assessed when possible by Agilent Bioanalyzer.
  • stsRNA-Seq analysis
  • Strand-specific RNA-Seq data was analyzed by custom scripts to remove low quality reads, reads shorter than 21 nt in the case of small RNA libraries and shorter than 35 for long RNA libraries, adapter contamination and empty reads.
  • Final datasets were aligned using Bowtie Aligner allowing up to 2 mismatches to either map the reads to the RAM region or to the human genome (hg19) according to the experimental setup. For each experiment analysed, ~24M reads were uniquely mapped to hg19 for the long RNA datasets whereas for the small RNA datasets the number of unique aligned reads was ~12M. Aligned reads were further processed for strand specificity and wig files generated for visualization.
  • Correlation analysis
  • Intensity correlation analysis was performed at 20 nt resolution with a custom pipeline for the RAM region and for global analysis.
  • a second correlation analysis was performed by a binary analysis indicating whether transcripts were present or not in a determined window by using a custom pipeline.
  • Class I-IV transcripts were validated by strand-specific reverse transcription followed by PCR. Specific primers were designed with T7 promoter overhanging bases in order to establish reverse transcription orientation. Small RNA determination was assessed with custom Taqman primers (Applied Biosystems) and relative expression levels were determined by reverse transcription followed by real time PCR.
  • 5x10^6 PLB cells were centrifuged at 700 rpm for 7 min, washed twice with ice-cold PBS and lysed with the lysis buffer of the microRNA isolation kit, human AGO2 (Wako chemicals), and immunoprecipitation was performed following the manufacturer's instructions. Eluted samples were further processed for RNA purification by Trizol (Invitrogen) extraction. Small RNA determination was assessed by custom Taqman primers (Applied Biosystems). Specific primers for U6 RNA, U49 snoRNA and let-7c miRNA were used according to the manufacturer's instructions (Applied Biosystems). Immunoprecipitated AGO2 levels were assessed by SDS-PAGE followed by silver staining and Western blot against AGO2.
  • Transient transfection
  • Transient transfection in BJELR cells was done following standard reverse transfection protocols using lipofectamine RNAiMAX (Invitrogen).
  • ON-target plus smart pools for knocking down Dicer 1 (L-003483-00), Drosha (L-016996-00), AGO2 (L-004639-00) and EXOSC3 (L-03195501) as well as scramble negative control (D-001210-01-05) were purchased from Dharmacon and used at a final concentration of 10 mM. Samples were collected 72 h post-transfection. Knock down efficiency was controlled by RT-qPCR (using customized primers, sequences are available upon request) and western blot assays.
  • BJELR cells (80% confluence) were rinsed twice with ice-cold PBS, collected in PBS and recovered in 1x hypotonic buffer (Cellytic Nuclear Xtract - Sigma) supplemented with 1 mM DTT and protease inhibitors, incubated 15 minutes on ice, vortexed in the presence of Igepal, spun down at 11000 rpm and the supernatant conserved as the Cytoplasmic fraction. Immediately after, the pellet was resuspended in Extraction buffer (Cellytic Nuclear Xtract) supplemented with 1 mM DTT and protease inhibitors and incubated 30 minutes in a thermomixer at 1400 rpm at 8°C (Nuclear fraction). All obtained fractions were aliquoted, flash-frozen and stored at -80°C.
  • ndsRNA electrophoretic mobility shift assay (EMSA)
  • ndsRNA-2a and ndsRNA-2e sequences were cloned into pGEM-T easy vector and further PCR amplified using T7 tagged oligonucleotides from both flanks (Expand High Fidelity - Roche). PCR products were in vitro transcribed (MegaScript RNAi - Ambion) and 5'-radioactively labelled by polynucleotide kinase (PNK - Promega).
  • Nuclear extract was incubated with radiolabelled ndsRNA-2a/2e (~25 fmol/reaction) in DBD buffer (10 mM Tris-HCl pH 8, 0.1 mM EDTA pH 8, 0.4 mM DTT, 5% Glycerol, tRNA, supplemented with NaCl according to experimental setup) for 15 min at room temperature prior to native PAGE. Competition was achieved by addition of non-radioactive ndsRNA-2a or ndsRNA-2e (50 fmol/200 fmol range). When purified recombinant proteins were used, 120 ng of RAN or RCC1 (Origene Technologies) were incubated with ndsRNA-2a or ndsRNA-2e as described above.
  • ndsRNA in vitro binding assay
  • ndsRNA-2a and ndsRNA-2e were PCR amplified, in vitro transcribed with MegaScript RNAi kit (Ambion) and 3'-biotinylated using RNA 3'-biotinylation kit (Pierce) according to the manufacturer's instructions. Biotinylated ndsRNAs were immobilized in 5 mM Tris-HCl pH 7.5, 0.5 mM EDTA and 1 M NaCl to MyOne streptavidin magnetic beads (Invitrogen) for 1 h at 22°C in a thermomixer prior to nuclear extract incubation.
  • Nuclear extract was prepared as described above but subjected to two rounds of pre-clearing with MyOne magnetic beads in 1x DBD buffer supplemented with tRNA and NaCl prior to interaction with immobilized ndsRNAs.
  • Final N.E./ndsRNA incubation was performed at 22°C in a thermomixer for 15 min and further washed 3 times with DBD buffer supplemented with NaCl and NP-40 at room temperature.
  • magnetic beads were recovered in Laemmli buffer, boiled for 10 min and separated by SDS-PAGE or eluted in 1 M NaCl prior to Liquid Chromatography followed by Mass Spectrometry (LC-MS/MS). Protein composition was evaluated by silver staining or Western blot when appropriate.
  • RNA Fluorescence in-situ hybridization coupled to immunocytochemistry
  • ndsRNA-2a sequence was PCR amplified and in vitro transcribed with T7 RNA polymerase using Chromatide Alexa Fluor 546-14-UTP or Chromatide Alexa Fluor 488-5-UTP as a source of UTP. Fluorescently labeled forward or reverse ndsRNA-2a strands were Trizol purified and stored at -80°C. For immunofluorescence analysis, BJ and HeLa cells were grown on round coverslips and treated according to each experimental setup.
  • Cells were fixed with 3% paraformaldehyde, 4% sucrose in 10 mM PBS for 10 min, permeabilized with 0.25% Triton X-100 in 10 mM PBS for 10 min, and then blocked for 1 h in 1% BSA in 10 mM PBS (Blocking buffer). Coverslips were incubated overnight at 4 °C in blocking buffer containing RAN (#4462, Cell Signaling), RCC1 (#5134, Cell Signaling), RANGAP1 (ab92360, Abcam) or RANBP2 (ab64276, Abcam) antibodies. Cells were washed twice in 10 mM PBS, 0.1% Tween 20, and incubated with secondary antibody Alexa 488 (Molecular Probes).
  • ndsRNA-2a strand specific probes were hybridized in hybridization buffer (2x SSC, 20% dextran sulfate, and 1 mg/mL BSA) overnight at 37 °C in a thermomixer humid chamber, washed twice with 2x SSC in 50% formamide, twice with 2x SSC, and counterstained with DAPI. Finally coverslips were mounted in ProLong Antifade (Molecular Probes), and visualized on a confocal laser-scanning microscope SP2-MP (Leica).
  • ndsRNA-2 overexpression
  • pTREG-bi plasmid (Clontech) bearing an inducible bidirectional promoter was modified in order to express a 5' (termed CD) or 3' (termed AB) SP6-tagged version of ndsRNA-2a.
  • HeLa Tet-ON 3G cells were AB/CD transfected, treated with doxycycline 12 h post-transfection and collected in Trizol for RNA analysis or processed for immunocytochemistry 24 h later.
  • Overexpression and double-stranded nature of exogenous ndsRNA-2a was confirmed by reverse transcription using T7 flagged primers that recognize the SP6 tag followed by PCR on RNAse ONE treated samples.
  • HeLa cells were co-transfected with ndsRNA-2a overexpressing plasmids (AB/CD) and histone H1-GFP (AB/CD:H1-GFP 4:1 ratio) to control in-well efficiency of transfection.
  • Cells were fixed, permeabilized and stained for α-tubulin as previously described.
  • the number of cells displaying an abnormal nuclear morphology, chromatin bridges and bi/multinuclei was determined by double blind analysis in 3 independent experiments (3000 cells counted for each condition).
  • RNA transcripts arising from unexplored regions within the genome, leading to the discovery of novel regulatory paradigms 1-5.
  • disease-associated markers such as translocations, chromosomal rearrangements and single nucleotide polymorphisms (SNPs) remains largely unexplored 6-8.
  • One of those regions encompasses ~500 kb on chromosome 8 (RAM region: 130,269,750-130,744,812), is critically involved in retinoic acid induced differentiation 9 and contains multiple disease susceptibility SNPs 10-17.
  • RNAs map on both sense and antisense strands from the RAM region.
  • sense-antisense RNA pairs coexist within the same cell and generate stable long natural double-strand RNA (ndsRNA).
  • ndsRNAs are mainly localized in the nucleus and establish specific interactions with nuclear components.
  • ndsRNA-2a interacts with the mitotic RAN/RANGAP1-SUMO1/RANBP2 complex in a RAN-dependent manner and displays differential nuclear localization throughout the cell cycle.
  • ndsRNA-2a overexpression leads to a range of mitotic defects and a pronounced change in nuclear shape highlighting its involvement in cell cycle progression.
  • global strand-specific RNA sequencing shows that ndsRNA signatures are interspersed genome-wide and reveals that ndsRNA molecules are modulated upon cellular cues. Taken together this study reveals ndsRNAs as novel members of the natural RNA-repertoire in human cells that are involved in a plethora of regulatory processes.
  • RNA capture approach (Fig. 5a) coupled to a customized strand-specific RNA sequencing protocol (stsRNA-Seq).
  • This technology permits the concomitant identification of long (>50 nt) and small (18-30 nt) RNAs using a single experimental protocol.
  • DNA traps obtained from random priming of BAC DNA covering the RAM region were hybridized to either the naturally occurring small RNA fraction (18-30 nt) or chemically fragmented and size selected (50-70 nt) total RNA from human leukemic PLB985 cells.
  • RNA profiling revealed a plethora of RNAs mapping to either strand of the RAM region including 437 long (Fig. 1a, upper panel; Fig. 5a) and 630 small RNAs (Fig. 1a, lower panel). Importantly, ~90% of the identified transcripts were detected in three independent experiments indicating a high level of reproducibility among biological and technical replicates (Fig. 1b; see material and methods). Bioinformatics analysis between transcripts from the long and small RNA datasets identified four different RNA classes (Fig. 1c, Class I-IV).
  • Class I comprises 'classical' transcripts from the long RNA fraction mapping on either forward (Fw) or reverse (Rv) strands, that overlap neither with long RNAs on the opposite strand nor with small RNAs (Fig. 1c and Fig. 5c, Class I).
  • Class II transcripts are long RNA molecules mapping to one strand that overlap with a small RNA (sRNA; Fig. 1c and Fig. 5d).
  • the existence of Classes III and IV was unexpected as these RNAs correspond to overlapping long transcripts from both strands and represent ~22% of all mapped RNAs (Fig. 1c and Fig. 5b, e-f).
  • ndsRNAs are natural components of human cells
  • the validation of 11/11 Class III-IV long complementary RNAs prompted us to analyze whether these molecules exist as double-stranded RNA within the cell. If these overlapping transcripts exist as double-strand RNA they should be resistant to an RNAse displaying single-strand specificity (RNAse ONE). Indeed, when total RNA from PLB985 cells was subjected to RNAse ONE treatment, Class III-IV transcripts were protected from RNAse degradation (Fig. 1d and Fig. 6a). In contrast, single strand Class I-II and GAPDH transcripts were not protected.
  • RNAse III double-strand RNA specificity
  • ndsRNAs are predominantly nuclear (Fig. 1f, upper row), whereas their corresponding small RNAs are located either exclusively in the nucleus or in both nuclear and cytoplasmic fractions (Fig. 1f, lower panel).
  • the nuclear localization of ndsRNAs prompted us to analyze whether these molecules interact with nuclear proteins. Therefore we performed electrophoretic mobility shift assays with 2 previously identified radioactively labeled ndsRNAs (nds-2a and nds-2e) and nuclear extract obtained from BJELR cells. The results indicated that both ndsRNAs specifically interact with nuclear proteins (Fig. 2a).
  • nds-2a binds a mitosis-specific RAN containing complex
  • Fig. 2b; Fig. 10b-c
  • RAN, RCC1, RANGAP1 and RANBP2 were found to interact with nds-2a (Fig. 10d; note the high peptide coverage).
  • these partners are major mitotic components involved in fundamental aspects of cell
  • nds-2a was biotin-immobilized, incubated with nuclear extract and the presence of these 4 proteins in the bound fraction was confirmed by Western blot. None of the analyzed proteins was detected when nds-2e was used as bait, supporting that nds-2a binds selectively to these components of the mitotic machinery (Fig. 2c). Importantly, we observed that only the mitosis-specific sumoylated form of RANGAP1 (RANGAP1-SUMO1) was present in the complex with nds-2a.
  • RAN, RCC1, RANGAP1 and RANBP2 were immunoprecipitated from BJELR cells and the coprecipitated fraction was evaluated for nds-2a presence by stsRT-qPCR (Fig. 2d and Fig. 11b). Both nds-2a forward and reverse strands were enriched in the coprecipitated material, supporting that nds-2a interacts with members of the RAN complex in vivo.
  • RAN, RCC1, RANGAP1 or RANBP2-depleted nuclear extracts were used to perform in vitro interaction assays in the presence of biotin-labeled nds-2a.
  • the interaction of nds-2a with the RAN complex was abrogated in RAN-depleted nuclear extracts, since the absence of RAN impaired the detection of RANGAP1 or RANBP2. In contrast, RCC1 binding remained unaffected, thus revealing a RAN-independent interaction.
  • when nuclear extracts depleted of RCC1 were used, the RAN/RANGAP1/RANBP2 interaction was unaffected.
  • RNA-FISH RNA-Fluorescence In Situ Hybridization
  • nds-2a is a functionally important component of the mitotic machinery and highlights its biological relevance in a complex biological setting such as mitotic progression.
  • ndsRNAs are modulated by cellular cues and represent a novel class of RNA
  • ndsRNA-seq libraries demonstrated that ndsRNAs are expressed throughout the entire genome and are more abundant in intergenic regions than in exons/introns (Fig. 4a and Fig. 14). Moreover, detailed database exploration showed that ndsRNAs map within different RNA classes (Fig. 4b) suggesting that ndsRNAs are not restricted to any previously described transcript family, but rather represent a novel class of RNAs interspersed within the human genome. Notably, we observed that a subset of globally expressed ndsRNAs is modulated upon retinoic acid treatment (Fig. 4c-d and Fig. 15) supporting the notion that these novel molecules are regulated by cellular cues and might participate in a plethora of regulatory systems. Genome positions for the displayed examples are shown in Table 2.
  • ndsRNAs map to interspersed elements along the genome indicating that they correspond to a new class of RNAs.
  • ndsRNAs were merely sRNA precursors
  • ndsRNAs establish specific RNA-protein interactions suggesting that these molecules serve diverse functions within the cell.
  • nds-2a displays differential localization throughout the cell cycle, interacts with the mitotic RAN/RANGAP1-SUMO1/RANBP2 complex and localizes within the mitotic spindle, supporting its biological relevance.
  • nds-2a overexpression leads to a range of mitotic defects and a pronounced change in nuclear shape highlighting its role in cell cycle progression. All in all, our work expands the already complex RNA catalog and demonstrates that ndsRNAs play fundamental roles in cellular physiology.
  • This example describes a method for capturing and identifying ndsRNAs from isolated nuclei. Isolated nuclei are incubated with a biotin-labelled ndsRNA-specific adaptor oligonucleotide (or capture probe) to capture ndsRNAs in the presence of an RNA ligase.
  • RNA is incubated with streptavidin beads and recovered ligated products are subjected to strand-specific RNA-seq library construction.
  • Libraries are sequenced on a HiSeq 2000 Illumina sequencer and the data are analyzed bioinformatically.
  • This method allows retrieval of sequence information that is used to identify ndsRNAs.
  • the method of the invention is used to identify tumorigenesis-related ndsRNAs.
  • ndsRNA profile specific for each step of the BJ stepwise tumorigenesis model 18 is obtained.
  • This in vitro cellular model recapitulates the basic events necessary for cellular transformation. Briefly, normal primary human cells are transformed in a stepwise manner by the introduction of the catalytic subunit of telomerase (hTERT), the early region of the SV40 virus (SV40 ER) and the activated allele of H-ras (H-rasV12).
  • This example presents a validation of the method of the invention as effective in retrieving ndsRNAs.
  • the proximity ligation assay i.e. the method of the invention implementing the hairpin capture probe as described above
  • RNAseONE - treatment in order to augment the number of adaptor containing reads prior to de-novo RNA reconstruction.
  • RNAseONE single stranded specificity
  • Circular RNAs are a large class of animal RNAs with regulatory potency. Nature 495, 333-8 (2013).
  • RNA exosome depletion reveals transcription upstream of active human promoters. Science 322, 1851-4 (2008).
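
The read clean-up described in the list above for the strand-specific RNA-Seq analysis (removal of empty reads, low quality reads, adapter contamination, and reads shorter than 21 nt for small RNA libraries or 35 nt for long RNA libraries) can be illustrated with a minimal Python sketch. This is not the inventors' custom script: the function names, the adapter sequence passed in and the mean-quality cut-off of 20 are assumptions chosen for illustration; only the two length thresholds come from the text.

```python
# Minimum read lengths stated in the text for each library type.
MIN_LEN = {"small": 21, "long": 35}


def trim_adapter(read: str, adapter: str) -> str:
    """Cut the read at the first full occurrence of the adapter (hypothetical sequence)."""
    i = read.find(adapter)
    return read if i < 0 else read[:i]


def keep_read(seq: str, quals: list, library: str, min_mean_q: float = 20.0) -> bool:
    """Return True if a trimmed read passes the filters.

    min_mean_q is an assumed cut-off; the source does not define 'low quality'.
    """
    if not seq:                                  # empty read
        return False
    if len(seq) < MIN_LEN[library]:              # too short for this library type
        return False
    if sum(quals) / len(quals) < min_mean_q:     # low mean base quality
        return False
    return True


if __name__ == "__main__":
    read = trim_adapter("ACGT" * 6 + "TGGAATTC", "TGGAATTC")  # adapter is a placeholder
    print(len(read), keep_read(read, [30] * len(read), "small"))
```

Trimming before the length check means that a read consisting only of adapter becomes empty and is dropped by the first filter.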

Abstract

The present invention relates to a method for identifying a novel class of RNAs from a sample.

Description

METHOD OF CAPTURING AND IDENTIFYING NOVEL RNAs
The present invention relates to a method for identifying double stranded RNAs in a sample.
Background of the invention
Recent advances in high-throughput sequencing technologies have unveiled an enormous diversity of RNA transcripts arising from unexplored regions within the genome, leading to the discovery of novel regulatory paradigms 1-5. However, the transcriptional profile of hundreds of large genomic regions displaying disease-associated markers such as translocations, chromosomal rearrangements and single nucleotide polymorphisms (SNPs) remains largely unexplored 6-8.
Mercer et al.4 describe a method for targeted sequencing of the human transcriptome that reveals its deep complexity. Ng et al.31 describe the targeted capture and massive parallel sequencing of human exomes. These reports provide insights on exome and transcriptome complexity. Since then, in Portal et al.34, EP14305822 and PCT/EP2015/062179 (which are incorporated by reference), the present inventors have described a new class of double-stranded RNAs with potential regulatory functions, referred to as natural double-stranded RNAs (ndsRNAs).
However, there is still a need for a more detailed analysis of the transcriptome and for tools that would allow easy and universal exploration thereof for the identification of novel targets and/or novel RNA species.
Summary of the invention
The inventors have unexpectedly discovered a new class of RNAs, hereinafter referred to as natural double-stranded RNAs (or ndsRNAs). These ndsRNAs are double-stranded structures that resist single-stranded RNA-specific RNAse, are sensitive to double-stranded RNA-specific RNAse, and escape processing by members of the known siRNA/miRNA pathways. The inventors herein propose a method and a bioinformatics tool to identify or define the nucleotide sequences of ndsRNAs. The method proposed herein allows the rapid and genome-wide identification of ndsRNAs in a cell or tissue. It is based on the capture of ndsRNAs followed by the identification of all captured molecules by a sequencing step, in particular with massively parallel next generation sequencing (NGS).
Accordingly, a first object of the invention relates to a method for identifying double-stranded RNAs (more particularly ndsRNAs) from cell/tissue samples, comprising:
- generating covalent bonds between the RNAs present in an extract from said sample and a capture probe;
- sequencing the captured RNAs and determining the presence or the absence of double-stranded RNAs (such as ndsRNAs) in the sample.
In a particular embodiment, the capture probe comprises:
- at least two nucleic acid sequences, each being able to be ligated to a RNA present in a sample; and
- a moiety which can be selectively bound by adapted means.
The capture probe and the RNAs ligated to it may be purified using selective means. In a particular embodiment, the moiety which can be selectively bound by adapted means is a modified nucleotide (alternatively referred to as a "tagged nucleotide" in the following) such as a biotinylated nucleotide, for example a biotinylated thymine. In another embodiment, the said moiety is a peptide moiety which may be used to purify the capture probe and any RNA ligated to it with means having selective affinity for such peptide moiety, for example an antibody specifically binding said peptide moiety.
In a particular embodiment, the capture probe comprises, in this order, a first nucleic acid sequence with a 5'-phosphate end, a moiety which can be selectively bound by adapted means, and a second nucleic acid sequence having a 3'-OH end. The first and second nucleic acid sequences have nucleotides that are complementary to one another and can form a stem. The length of this stem can vary but can be of 1-10 base pairs, such as 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 base pairs, for example. In a particular embodiment, the first and second nucleic acid sequences each have two complementary nucleotides, thus being able to form two base pairs in the stem.
In a particular embodiment, the first and second nucleic acid sequence comprise independently from 1 to 20 nucleotides, in particular from 1 to 10, in particular from 2 to 6 nucleotides, such as 2, 3, 4, 5 or 6 nucleotides.
In a particular embodiment, the first and second nucleic acid sequences are complementary to one another over their entire length. In another embodiment, which is preferred if more flexibility is desired for the capture of RNAs in the sample of interest, the first and second nucleic acid sequences are only partly complementary, with the 5'-phosphate and 3'-OH ends of the first and the second nucleic acid sequences, respectively, being free. These free "overhangs" may be of 1, 2, 3 or 4 nucleotides or more, in particular of 2 nucleotides. In a particular embodiment, the moiety which can be selectively bound by adapted means is a third nucleic acid sequence which can form a loop and is tagged by inclusion of modified nucleotides such as biotinylated nucleotides (for example one or more biotinylated T, A, C or U) or non-natural biotinylated bases that can be incorporated into the capture probe by chemical synthesis. The third nucleic acid sequence may be of variable length, but may comprise in particular from 2 to 20 nucleotides, such as 2, 3, 4, 5, 6, 7, 8, 9 or 10 nucleotides, independently of the size of the first and/or second nucleic acid sequences.
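The relationship between stem length and free overhangs described above can be illustrated with a minimal Python sketch. The function name `stem_and_overhangs` and the two arm sequences are hypothetical and are not taken from the application; only Watson-Crick pairs are counted, which is a simplifying assumption.

```python
# Illustrative only: arm sequences and function name are hypothetical.
# Only Watson-Crick RNA pairs are counted (no G-U wobble), by assumption.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def stem_and_overhangs(first_arm, second_arm):
    """Return (stem_bp, five_prime_overhang, three_prime_overhang).

    first_arm  : 5' arm of the probe, written 5'->3' (carries the 5'-phosphate)
    second_arm : 3' arm of the probe, written 5'->3' (carries the 3'-OH)
    The stem is counted outwards from the loop: the 3' end of the first arm
    faces the 5' end of the second arm.
    """
    stem_bp = 0
    for a, b in zip(reversed(first_arm), second_arm):
        if (a, b) not in WATSON_CRICK:
            break
        stem_bp += 1
    return stem_bp, len(first_arm) - stem_bp, len(second_arm) - stem_bp


# Partly complementary arms: two base pairs and two free nucleotides per end.
print(stem_and_overhangs("GGAC", "GUAA"))
# Fully complementary arms: no overhang.
print(stem_and_overhangs("GGAC", "GUCC"))
```

With the hypothetical arms `GGAC` and `GUAA`, the sketch reports a two-base-pair stem with two-nucleotide overhangs at both ends, which corresponds to the partly complementary configuration discussed above.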
In another embodiment, the moiety which can be selectively bound by adapted means is a peptide or protein provided between the first and second nucleic acid sequences.
In a particular embodiment, the capture probe comprises, from 5' to 3', a first, a second and a third nucleic acid sequences, wherein
a) said sequences allow formation of a hairpin with the first and third nucleic acid sequences being complementary (either entirely or partly as provided above) and efficiently forming the stem of the hairpin; and
b) the second nucleic acid sequence forms the loop of the hairpin and is tagged such that it can be purified using tag-specific purification.
According to this embodiment, purification of captured RNAs is done using a method specific for the label (or tag) present in the second nucleic acid sequence. In a further particular embodiment, the capture probe is an oligonucleotide capture probe, such as an RNA-based oligonucleotide capture probe, capable of ligating preferentially RNA molecules with 5' and 3' ends in close proximity; wherein the capture probe comprises, from 5' to 3', a first, a second and a third nucleic acid sequences,
wherein a) said sequences allow formation of a hairpin with the first and third nucleic acid sequences being complementary and efficiently forming the stem of the hairpin; and b) the second nucleic acid sequence forms the loop of the hairpin and is tagged such that it can be purified using tag-specific purification.
In a particular embodiment, the second nucleic acid sequence is tagged (or labeled) with one or more biotinylated nucleotides, for example with biotinylated dT or any other biotinylated nucleotide. In a further embodiment, the capture probe is selected from
- DNA oligonucleotides;
- RNA oligonucleotides;
- mixed DNA-RNA oligonucleotides;
- locked nucleotide-containing oligonucleotides or oligonucleotides of any other adapted chemistry;
- a protein bridged capture probe, wherein the capture probe comprises a first and a second nucleic acid sequences, wherein said sequences are complementary and wherein the protein is provided between said first and second nucleic acid sequences;
- any other capture probe comprising a stem comprised of two complementary nucleic acid sequences linked with any other compound that can be used to selectively purify the capture probe and RNA linked to it, such as an antibody or antibody fragment.
In a particular embodiment, the capture probe is of the formula
5' GGAC/X/ACGG(U or T)AA 3',
wherein X is a nucleotide adapted to the selective purification of the capture probe (such as a modified nucleotide such as a biotinylated nucleotide).
In a further specific embodiment, the capture probe is of the formula
5' Phos/rGrGrArC/X/rArCrGrGrUrArA 3', more specifically 5' Phos/rGrGrArC/BiotdT/rArCrGrGrUrArA 3'
wherein r indicates that the corresponding nucleotide is a ribonucleotide.
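As an illustration of how the probe formula above may be read, the following Python sketch parses the slash-delimited notation into a base sequence and a tag position. The function name `parse_probe` is hypothetical, and the interpretation of "Phos" as a 5'-phosphate and "BiotdT" as a biotinylated deoxythymidine is an assumption based on the surrounding description.

```python
import re


def parse_probe(notation):
    """Parse a slash-delimited probe string such as
    'Phos/rGrGrArC/BiotdT/rArCrGrGrUrArA'.

    Assumed token meanings (not defined formally in the text):
      'Phos'    -> 5'-phosphate
      'BiotdT'  -> one biotinylated deoxythymidine (the tag)
      'rN...'   -> a run of ribonucleotides, each prefixed by 'r'
    """
    bases = []
    tag_positions = []
    five_prime_phosphate = False
    for token in notation.split("/"):
        if token == "Phos":
            five_prime_phosphate = True
        elif token == "BiotdT":
            tag_positions.append(len(bases))  # 0-based index of the tagged base
            bases.append("T")
        elif re.fullmatch(r"(?:r[ACGU])+", token):
            bases.extend(token[1::2])  # keep every letter that follows an 'r'
        else:
            raise ValueError(f"unrecognized token: {token!r}")
    return {
        "sequence": "".join(bases),
        "tag_positions": tag_positions,
        "five_prime_phosphate": five_prime_phosphate,
    }


probe = parse_probe("Phos/rGrGrArC/BiotdT/rArCrGrGrUrArA")
print(probe["sequence"], probe["tag_positions"], probe["five_prime_phosphate"])
```

The parsed sequence is 12 nucleotides long, with the single tagged position sitting between the 5' and 3' ribonucleotide segments.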
Another object of the invention is a method of identifying double stranded RNA (such as ndsRNA) molecules in a sample comprising:
a) capturing (or purifying) RNAs with an oligonucleotide capture probe as described above; and
b) sequencing the captured RNAs and predicting double-stranded RNA sequences.
Furthermore, another object of the invention relates to a method of identifying or characterizing a specific defined double-stranded RNA (in particular a ndsRNA) or a specific pattern or the entire spectrum of ndsRNAs present in a RNA sample, such as for example the RNAs extracted from a physiologically normal or a pathological specimen, comprising application of the above method to the RNAs contained in said sample, thereby identifying or characterizing individual or patterns/signatures of double-stranded RNAs (in particular ndsRNAs) contained therein.
A further object of the invention relates to a method for identifying a marker associated with a phenotype or cell function, or for identifying a target for the treatment of a disease or for identifying a biomarker indicative of a disease or condition. This method comprises the identification of individual or of a specific pattern/set of double-stranded RNAs (such as ndsRNAs), or determining the expression profile of double-stranded RNAs (e.g. ndsRNAs) in a sample of interest, thereby associating individual or patterns of the double-stranded RNAs (such as ndsRNAs) identified to a phenotype, cell function, or disease. For example, the sample of interest may correspond to a cancer cell and double-stranded RNA (e.g. ndsRNA) expression profiling allows the identification of double-stranded RNA-based (such as ndsRNA-based) biomarkers, or signatures composed of several/multiple double-stranded RNA (e.g. ndsRNA) expression patterns, including the presence or absence of double-stranded RNA(s) (e.g. ndsRNA(s)) which correlates with the pathology, indicative of this cancer, or the identification of a double-stranded RNA (such as a ndsRNA) which could be the target of a treatment (for example by increasing or decreasing the expression of said double-stranded RNA, for example a ndsRNA).
Brief description of the drawings
Figure 1: identification of RAM-derived RNAs. (a) Transcript profile of RAM-derived long and small libraries. (b) Intensity correlation analysis and Pearson's correlation coefficient values (R) of technical replicates. (c) Screenshots of Class I-IV long and small RNA sequencing data. (d-e) RNAse ONE or RNAse III protection assays followed by stsRT-PCR of Class III-IV ndsRNAs and Class I-II transcripts. GAPDH is depicted as an RNAse treatment control. (f) stsRT-qPCR of Class III and IV long RNAs and small RNAs in PLB985 total RNA from whole cell (Input), nuclei (Nuclei) or cytoplasmic extracts (Cytoplasm). let-7c and U49 snoRNA are cytoplasmic and nuclear controls, respectively. Opposite bars correspond to matching sense/antisense pairs. Histogram represents mean RNA levels as arbitrary units (A.U.) +/- SD of one representative experiment out of three independent biological replicates. (g) stsRT-qPCRs of RAM-derived Class III (from left to right: ndsRNA3, 4, 5, 6, 8 and 9) and IV long (from left to right: ndsRNA1 and 2) and small (from left to right, class III: sRNA 5, 6, 7, 8, 10 and 11; class IV: sRNA 1, 2, 3 and 4) RNAs in BJELR cells. Opposite bars correspond to matching sense/antisense pairs. RNA levels are shown as arbitrary units (A.U.) +/- SD. Genome positions are shown in Table 1.
Figure 2: nds-2a establishes specific RNA-protein interactions and displays cell cycle-dependent subcellular localization. (a) Electrophoretic mobility shift assay performed with radioactively labelled (*) nds-2a/2e incubated with BJELR nuclear extract (N.E.). Specificity was confirmed by competition with non-radioactive nds-2a/2e. (b) Gene ontology enrichment analysis performed for nds-2a associated proteins identified by mass spectrometry. (c) Immobilized nds-2a/2e were incubated with N.E. and the presence of RAN, RCC1, RANGAP1 and RANBP2 in the ndsRNA-bound fraction was analysed by Western blot. (d) RAN, RCC1, RANGAP1 and RANBP2 proteins were immunoprecipitated and the levels of nds-2a forward (2a Fw) and reverse (2a Rv) strands in each fraction were analysed by stsRT-qPCR. Histogram represents mean RNA levels as arbitrary units (A.U.) +/- SD of one experiment out of three biological replicates. (e) Immobilized nds-2a was incubated with RAN, RCC1, RANGAP1 or RANBP2 siRNA-depleted nuclear extracts and the nds-2a protein-bound fraction was analysed for RAN, RCC1, RANGAP1 and RANBP2 proteins by Western blot. N.E. depletion was controlled by Western blot and is depicted as input. (f) Interphase BJ cells were processed for nds-2a RNA-FISH (red channel) coupled to immunocytochemistry for RAN, RANGAP1 and RANBP2 (green channel) and analysed by confocal microscopy. α-Tubulin (α-Tub) and DAPI staining are shown to delineate the cell. Merge and enlarged images (Inset) are depicted. (g) nds-2a and RAN, RANGAP1 and RANBP2 localization was analyzed by confocal microscopy in metaphase BJ cells. DAPI staining delineates the metaphase plate. Merge and enlarged images (Inset) are depicted. (h) Interphase BJ cells were processed for nds-2a RNA-FISH to detect either nds-2a forward (2a Fw, red channel) or reverse (2a Rv, green channel) strands. α-Tubulin (α-Tub) and DAPI staining are shown to delineate the cell. Merge images are depicted. (i) nds-2a and α-Tubulin (α-Tub) localization was analyzed by confocal microscopy in metaphase BJ cells. Merge and enlarged images (Inset) are depicted.
Figure 3: nds-2a overexpression leads to a range of mitotic defects and pronounced changes in nuclear shape. (a) Diagram of pBI-nds-2a plasmid and overexpressed SP6-tagged nds-2a (AB and CD). (b) RNAse ONE protection assays followed by stsRT-PCR of overexpressed SP6-tagged ndsRNA-2a. GAPDH is depicted as RNAse treatment control. (c-f) HeLa cells overexpressing nds-2a variants (AB or CD) were processed for immunocytochemistry against α-Tubulin (α-Tub, red channel), counterstained with DAPI (blue channel) to delineate the cellular contour and analyzed by confocal microscopy. Empty plasmid was used as control. Histone H1-GFP (H1-GFP, green channel) expressing plasmid was co-transfected to monitor transfection efficiency (Fig. 13a). The number of bi/multinucleated cells, number of chromatin bridges and cells displaying abnormal nuclear shape was determined by a double-blind analysis. Data displayed corresponds to one representative experiment out of 3 independent biological replicates. Total number of cells in G2-M phase was determined by flow cytometry for each condition and arbitrarily set to 100% (Fig. 13b). Percentage of abnormal cells per treatment was calculated relative to G2-M cells. Representative images for each condition are depicted in e and f. Bi/multinucleated cells in e and cells displaying abnormal nuclear shape in f are indicated by arrowheads.
Figure 4: Genome-wide expression of ndsRNAs. (a) Pie chart representing percentages of ndsRNAs mapping to exons, introns or intergenic regions. (b) Pie chart representing ndsRNAs compared to fRNAdb annotated features. Different RNA species contained in fRNAdb are color-coded. (c) Retinoic acid-modulated ndsRNA and nds-derived small RNA in PLB985 cells are depicted. (d) RNA levels of forward (Fw) and reverse (Rv) strands of modulated ndsRNAs from RA or vehicle-treated cells were determined by stsRT-qPCR. ICAM1 and PRC mRNAs are shown as controls. Results are expressed as arbitrary units (A.U.) +/- SD of one out of two biological replicates. Genome positions are shown in Table 2.
Figure 5: Identification and validation of RAM-derived RNAs. (a) RNA capture approach. (b) Pie chart representing the relative distribution of Class I-IV RAM-derived transcripts. (c-f) Schematic representation and validation of single-stranded Class I-II and natural double-stranded (ndsRNAs) Class III-IV long RNAs, as well as small RNAs from Class II-IV transcripts by stsRT-PCR.
Figure 6: RAM-derived RNAs in PLB985 and BJELR cells. (a-b) RNAse ONE or RNAse III protection assays followed by stsRT-PCR of Class III-IV RNA duplexes and Class I-II transcripts. GAPDH is depicted as an RNAse treatment control. (c) Detection by stsRT-PCR of RAM-derived Class II-IV long and small RNAs in RAS-transformed foreskin fibroblast (BJELR).
Figure 7: RAM-derived small RNAs are neither products of pervasive transcription nor byproducts of the canonical miRNA machinery. (a) Determination of Drosha, Dicer, AGO2 and EXOSC3 knock-down efficiency by RT-qPCR in BJELR cells. mRNA levels are expressed relative to untreated mock samples. Histogram represents mean RNA levels as arbitrary units (A.U.) +/- SD of one experiment out of three biological replicates. (b) Determination of Drosha, Dicer, AGO2 and EXOSC3 knock-down efficiency by Western blot. Actin levels are shown as loading control. (c) Analysis of the impact of Drosha, Dicer, AGO2 and EXOSC3 knock-down on the levels of previously reported miRNAs. (d-g) Quantification of Class II and nds-derived sRNAs as well as miR-93 control upon Drosha (d), Dicer (e), AGO2 (f) or EXOSC3 (g) knock-down in BJELR cells. The expression level for each sRNA in scramble siRNA-transfected (scr) samples was arbitrarily set to 1. Values upon knock-down are expressed relative to the scr. Histogram represents mean RNA levels as arbitrary units (A.U.) +/- SD of one experiment out of three biological replicates.
Figure 8: RAM-derived small RNAs are loaded into AGO2. (a-b) Silver staining and Western blot for argonaute-2 (AGO2) in whole cell lysates (input) and AGO2-immunoprecipitated material (AGO2 IP). AGO2, heavy (hc) and light (lc) chains of the IP antibody are indicated. (c-d) Detection by stsRT-qPCR of Class II and nds-derived sRNAs in RNA extracted from whole cell lysates (input) or AGO2-immunoprecipitated material from PLB985 cells. let-7c and U6 snoRNA are included as AGO2-loaded positive and negative controls. Histogram represents mean RNA levels as arbitrary units (A.U.) +/- SD of one experiment out of two biological replicates.
Figure 9: RAM-derived long RNAs are neither modulated upon exosome depletion nor by the miRNA pathway. (a-d) Quantification of Class II (RNA1: B2, RNA2: B3, RNA3: B4, RNA4: B6, RNA5: B7 and RNA6: B8), Class III (ndsRNA3: C1/C2, ndsRNA4: C3/C4, ndsRNA5: C5/C6, ndsRNA6: C7/C8, ndsRNA7: C9/C10, ndsRNA8: C11/C12, ndsRNA9: C13/C14) and Class IV (ndsRNA1: D1/D2 and ndsRNA2: D3/D4) long RNA precursors upon Drosha (a), Dicer (b), AGO2 (c) or EXOSC3 (d) knock-down in BJELR cells. The expression level of each RNA in scramble siRNA-transfected (scr) samples was arbitrarily set to 1. Values upon specific knock-down are expressed relative to the scr sample. Histogram represents mean RNA levels as arbitrary units (A.U.) +/- SD of one experiment out of three biological replicates.
Figure 10: ndsRNAs establish specific protein interactions. (a) Nuclear proteins interacting with nds-2a are revealed by SDS-PAGE followed by silver staining. (b-c) Gene ontology enrichment analysis of nds-2a and nds-2e interacting proteins identified by mass spectrometry. (d) Peptide coverage of nds-2a interacting RAN, RCC1, RANGAP1 and RANBP2 identified by mass spectrometry.
Figure 11: nds-2a binds RAN and RCC1 in vitro and in vivo. (a) The specificity of RAN, RCC1, RANGAP1 and RANBP2 antibodies was tested by a siRNA-based approach. α-Tubulin (α-Tub) is shown as a loading control. (b) RAN, RCC1, RANGAP1 and RANBP2 proteins were immunoprecipitated with specific antibodies and their corresponding levels were analysed by Western blot to determine immunoprecipitation efficacy. (c) Electrophoretic mobility shift assay performed with radioactively labelled (*) nds-2a/2e incubated with increasing concentration of recombinant RAN or RCC1 (RAN/RCC1 = 120 ng and RAN++/RCC1++ = 240 ng). (d) BJELR cells were sorted according to cell cycle phases (G1, S and G2-M) by flow cytometry. Panels show a normal cell cycle and the enrichment of the FACS-sorted populations in the indicated phases of the cell cycle. (e) Sorted cells from d were analyzed for nds-2a forward (2a Fw) and reverse (2a Rv) strand levels by stsRT-qPCR. Histogram represents mean RNA levels as arbitrary units (A.U.) +/- SD of one experiment out of two biological replicates.
Figure 12: nds-2a localization in interphase cells. (a) Interphase BJ cells were processed for nds-2a RNA-FISH to detect either nds-2a forward (2a Fw, red channel) or reverse (2a Rv, green channel) strands. α-Tubulin (α-Tub) and DAPI staining are shown to delineate the cell. Merge images are depicted. Note that no signal for nds-2a is retrieved when no initial heat denaturation is applied to the slides before hybridization. (b) Interphase BJ cells were processed for nds-2a RNA-FISH (red channel) coupled to immunocytochemistry for RAN, RANGAP1 and RANBP2 (green channel) and analysed by confocal microscopy. α-Tubulin (α-Tub) and DAPI staining are shown to delineate the cell. Merge images are depicted. Note that no signal for nds-2a is retrieved when no initial heat denaturation is applied to the slides before hybridization. (c-d) Interphase and metaphase BJ cells were processed for nds-2a RNA-FISH (red channel) coupled to immunocytochemistry for RCC1 (green channel) and analysed by confocal microscopy. α-Tubulin (α-Tub) and DAPI staining are shown to delineate the cell. Merge and enlarged (Inset) images are depicted. (e) Interphase HeLa cells were processed for nds-2a RNA-FISH (red channel) coupled to immunocytochemistry for RAN, RCC1, RANGAP1 and RANBP2 (green channel) and analysed by confocal microscopy. α-Tubulin (α-Tub) and DAPI staining are shown to delineate the cell. Merge and enlarged (Inset) images are depicted.
Figure 13: Cell cycle profile of nds-2a transfected cells. (a) HeLa cells were cotransfected with the indicated plasmids and transfection efficiency was calculated as the percentage of cells displaying a positive labelling for GFP (upper right). Non-transfected cells were used as an autofluorescence control. (b) Cell cycle profile of HeLa cells transfected with the indicated plasmid pairs. The percentage of cells in each phase of the cell cycle is depicted.
Figure 14: Quality controls of global PLB985-derived stsRNA-Seq libraries. (a-b) Intensity correlation analysis of technical replicates at 20 nt resolution of long fragmented RNAs (50-70 nt) and naturally occurring small RNAs (18-30 nt) from global stsRNA-Seq libraries. Results are displayed as log2 of original values. Pearson's correlation coefficient values (R) are shown. (c-e) Screenshots of long stsRNA-Seq showing the profile obtained for known genes (HSP90B1 in forward (Fw) strand and C12orf73 in reverse (Rv) strand), lincRNAs (chr6: 141071891-141249602; Rv strand) and several SNAR RNA precursors and their corresponding small RNAs (SNAR-A3; Rv strand; small stsRNA-Seq).
Figure 15: Modulation of ndsRNA levels by retinoic acid. (a) Flow cytometry analysis of PLB985 cells treated either with vehicle (ETOH) or retinoic acid (RA). Percentage of differentiated cells was determined by CD11c/CD14 immunolabelling. Fluorescent background signal was assessed by labeling with fluorescently labeled non-specific isotypic antibodies. (b) Representation of the RNA levels of forward (Fw) and reverse (Rv) strands of randomly selected ndsRNA transcripts in PLB985 samples treated with RA or vehicle, determined by stsRT-qPCR. Histogram represents mean RNA levels as arbitrary units (A.U.) +/- SD of one experiment out of two biological replicates. Genome positions for the displayed examples are shown in Table 2.
Figure 16: schematic representation of an embodiment of the ndsRNA capture and identification method of the invention. Isolated nuclei are incubated with a biotin-labelled ndsRNA-specific adaptor oligonucleotide (or capture probe) to capture ndsRNAs in the presence of an RNA ligase. Total RNA is incubated with streptavidin beads and recovered ligated products are subjected to strand-specific RNA-seq library construction. Libraries are sequenced on a HiSeq2000/2500 Illumina sequencer and data are analyzed bioinformatically.
Figure 17: validation of the capture of ndsRNAs. The proximity ligation assay proposed herein captures nds-2a (a), nds-2e (b) and additional already validated ndsRNAs (c).
Figure 18: example of reconstructed molecules from PLA analysis for ndsRNAs after RNAse optimization. A single adaptor (bases underlined) is identified for a given single reconstructed molecule, showing that RNAse optimization may be used to achieve optimal results.
Detailed description
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. In the present application, the terms "label" and "tag" are used interchangeably.
In the present application, a particular embodiment of the methods and uses of the invention is directed to the capture, identification or sequencing of a ndsRNA. ndsRNAs were first described by the inventors in Portal et al.34, EP14305822 and PCT/EP2015/062179 (which are incorporated by reference). ndsRNAs are double-stranded structures that resist single-stranded RNA-specific RNAse, are sensitive to double-stranded RNA-specific RNAse, and escape processing by members of the known siRNA/miRNA pathways. In a particular variant, they correspond to non-coding sequences. Furthermore, in a particular embodiment, the size of the ndsRNA is of at least 50 nucleotides, at least 60, at least 70, at least 80, at least 90, at least 100, at least 150, at least 200 or even at least 250 nucleotides.
RNA source and isolation
The test RNA (i.e. the RNA from which the capture is to be performed) may be obtained from any sample of interest. The sample of interest may be a cell, a cell culture or a tissue from a subject, for example an animal subject, in particular a mammal or non-mammal animal, in particular from a human subject. The sample may also correspond to nuclei isolated from such cells or tissues. For example, the sample may be a cell or tissue sample from a patient in a diseased state. In this case, the present invention permits the identification and characterization of ndsRNAs involved in the disease or indicative of the pathology. RNAs are isolated using methods well known in the art (Molecular Cloning: A Laboratory Manual, Third Edition. J. Sambrook, D. Russell). In particular, kits are readily available to those skilled in the art for performing total RNA extraction from a cell or tissue sample.
The RNAs are extracted and purified using methods well known in the art. In particular, RNAs are purified using Trizol, as is well known to those skilled in molecular biology.
Before the capture (or purification) step, ribosomal RNA may be depleted from the RNA extract such that the RNA transcripts are enriched regardless of their polyadenylation status or the presence of a 5'-cap structure. Commercially available kits for depleting ribosomal RNA include the RiboMinus kit from Invitrogen.
In addition, single-stranded RNA species may be depleted by using a RNase that only degrades single-stranded RNAs, such as RNase ONE® from Promega. The single-stranded RNA-specific RNAse treatment may be implemented either before or after contacting the capture probe with the RNA extract, in particular before.
In a particular embodiment, the RNA extract may be treated with a DNAse. In a further particular embodiment, the RNA extract may be assessed to determine whether it is free from genomic DNA. For example, quantitative PCR amplification of a short region (e.g. of about 100 bp) from a single copy gene may be implemented. For example, the topoisomerase (DNA) III Alpha (TOP3A) single copy gene may be assessed with the forward primer of SEQ ID NO:3 (5'-TCATCTGTATGGCCAGGTAGG-3') and the reverse primer of SEQ ID NO:4 (5'-GGAACCTTTAGGTTGTTAACAGTTG-3'). If genomic DNA contamination is detected, the RNA extract may be treated with a DNAse, such as the TURBO DNAse, followed by a new RNA extraction as described above, such as a Trizol extraction.
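As a quick sanity check on the two TOP3A primers quoted above, the short Python sketch below reports their length and GC fraction. The primer sequences are those of SEQ ID NO:3 and SEQ ID NO:4 with the spacing of the printed text removed; the helper name `primer_stats` is hypothetical and the sketch is not part of the described method.

```python
# Primer sequences as quoted in the text (whitespace removed).
# The helper name and dictionary keys are illustrative assumptions.
PRIMERS = {
    "SEQ ID NO:3 (forward)": "TCATCTGTATGGCCAGGTAGG",
    "SEQ ID NO:4 (reverse)": "GGAACCTTTAGGTTGTTAACAGTTG",
}


def primer_stats(seq):
    """Return (length, number of G/C bases, GC fraction) for a DNA primer."""
    gc = sum(1 for base in seq if base in "GC")
    return len(seq), gc, gc / len(seq)


for name, seq in PRIMERS.items():
    length, gc, fraction = primer_stats(seq)
    print(f"{name}: {length} nt, {gc} G/C bases, GC = {fraction:.0%}")
```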
In a particular embodiment, the captured RNAs correspond to a small RNA fraction of RNAs 18 to 30 nucleotides long. In another embodiment, long RNAs are captured, preferably after fragmentation (e.g. chemical or enzymatic fragmentation) to 50-250, such as 50-150, 50-100 or 50-80 nucleotide long RNAs, such as 50-70, 60-80 or 75-100 nucleotide long RNAs. In a particular embodiment, fragmented long RNAs are 50-70 or 75-100 nucleotides long, or the fragmented long RNAs may be up to 100, up to 150, up to 200 or even up to 250 nucleotides long. Fragmentation of long RNAs can be carried out with divalent cation-based cleavage such as zinc-mediated RNA fragmentation, for example with a zinc-based RNA fragmentation reagent such as "RNA fragmentation reagent®" from Ambion. Both small and fragmented long RNA fractions may be purified according to methods well known in the art. In an embodiment, these fractions are purified after migration on denaturing polyacrylamide gel electrophoresis (e.g. PAGE-urea). In a particular embodiment, both small RNAs and long fragmented RNAs are captured in a separate or simultaneous reaction, preferably in separate reactions.
RNA capture
The isolated and/or purified RNA molecules, in particular the small fraction RNAs and/or the long fraction RNAs, in particular the fragmented long fraction RNAs described above, are incubated with the capture probe. As mentioned above, a treatment with a RNAse specific for single-stranded RNA molecules may be done before or after incubation of the RNA molecules with the capture probe.
The present invention is based on the idea that both strands of a double stranded RNA molecule could be ligated by an exogenous closed hairpin oligonucleotide adapter. As indicated above, the capture probe may be any molecule that may capture double-stranded RNA species and which may be purified according to a specific moiety present in said adapter. By taking advantage of this specific moiety (or tag) present in the adapter, captured molecules are enriched and further processed according to the steps indicated below. In a particular embodiment, the capture probe comprises, from 5' to 3', a first, a second and a third nucleic acid sequences, wherein
a) said sequences allow formation of a hairpin with the first and third nucleic acid sequences being complementary and efficiently forming the stem of the hairpin; and
b) the second nucleic acid sequence forms the loop of the hairpin and is tagged such that it can be purified using tag-specific purification.
According to this embodiment, purification of captured RNAs is done using a method specific for the tag present in the second nucleic acid sequence. In a further particular embodiment, the capture probe is an oligonucleotide capture probe, in particular an RNA-based oligonucleotide capture probe, capable of ligating preferentially RNA molecules with 5' and 3' ends in close proximity; wherein the capture probe comprises, from 5' to 3', a first, a second and a third nucleic acid sequences,
wherein a) said sequences allow formation of a hairpin with the first and third nucleic acid sequences being complementary and efficiently forming the stem of the hairpin; and b) the second nucleic acid sequence forms the loop of the hairpin and is tagged such that it can be purified using tag-specific purification. In a particular embodiment, the second nucleic acid sequence is tagged with one or more biotinylated nucleotides, for example with biotinylated dT or any other biotinylated nucleotide.
In a further embodiment, the capture probe is selected from
- DNA oligonucleotides;
- RNA oligonucleotides;
- mixed DNA-RNA oligonucleotides;
- locked nucleotide-containing oligonucleotides or oligonucleotides of any other adapted chemistry;
- a protein bridged capture probe, wherein the capture probe comprises a first and a second nucleic acid sequences, wherein said sequences are complementary and wherein the protein is provided between said first and second nucleic acid sequences;
- any other capture probe comprising a stem comprised of two complementary nucleic acid sequences linked with any other compound that can be used to selectively purify the capture probe and RNA linked to it such as an antibody or antibody fragment.
In a particular embodiment, the capture probe is of the formula
5' GGAC/X/ACGG(U or T)AA 3',
wherein X is a moiety adapted to the selective purification of the capture probe (such as modified nucleotides such as biotinylated nucleotides or a protein moiety).
In a further specific embodiment, the capture probe is of the formula
5' Phos/rGrGrArC/X/rArCrGrGrUrArA 3',
more specifically 5' Phos/rGrGrArC/BiotdT/rArCrGrGrUrArA 3' wherein r indicates that the corresponding nucleotide is a ribonucleotide.
In a particular embodiment, the capture probe is an oligonucleotide and the method comprises incubating the isolated and/or purified RNA molecules with the oligonucleotide capture probe and an RNA ligase (for example T4 RNA ligase), under conditions allowing ligation of the RNAs with the capture probe. Captured RNAs, i.e. RNAs ligated to the capture probe, are then recovered by selective recognition of the label present in the third nucleic acid sequence. In particular, in case of a biotinylated capture probe, RNA/capture probe complexes are recovered using a support grafted with streptavidin such as streptavidin (magnetic) beads, columns, etc. If the capture probe comprises a moiety which is a protein, methods based on affinity for this protein, for example with antibodies recognizing this protein bound on a support, are implemented. Bound RNA is then eluted from the support.
Sequencing
The present invention relies on the implementation of RNA sequencing over dsRNAs captured by the capture probe, in particular a capture probe such as a double stranded RNA (e.g. a ndsRNA) capture oligonucleotide. Potential double stranded RNAs, and most particularly ndsRNAs, are identified by de-novo transcript reconstruction (i.e. with no genome alignment) followed by a search of the double-stranded RNA (e.g. ndsRNA) capture oligonucleotide sequence in the reconstructed transcripts. Finally, the transcripts containing the capture oligonucleotide sequence are split before and after the capture oligonucleotide to test overlapping in opposite strands. As is explained in more detail below, potential double-stranded RNAs (e.g. ndsRNAs) are identified when transcript overlaps in opposite strands are detected.
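As a hedged illustration of the splitting step described above, the following minimal Python sketch locates a capture oligonucleotide sequence in a reconstructed transcript and returns the two flanking sequences. The function name `split_at_capture` and the cDNA spelling `GGACTACGGTAA` (biotinylated dT read as T, U written as T) are assumptions made for illustration only; a practical implementation would probably also tolerate mismatches and multiple occurrences.

```python
# Assumed cDNA-space spelling of a capture probe of the form
# 5' GGAC/BiotdT/ACGGUAA 3' (illustrative assumption, not from the examples).
CAPTURE = "GGACTACGGTAA"


def split_at_capture(transcript, capture=CAPTURE):
    """Return (before, after) around the first capture match, or None.

    The transcript is upper-cased and U is rewritten as T so that RNA and
    cDNA spellings are treated alike.
    """
    seq = transcript.upper().replace("U", "T")
    pos = seq.find(capture)
    if pos == -1:
        return None  # transcript does not contain the capture sequence
    return seq[:pos], seq[pos + len(capture):]


if __name__ == "__main__":
    # Hypothetical reconstructed transcript: flank + capture + flank.
    print(split_at_capture("AAAC" + CAPTURE + "GTTT"))
```

Each returned half would then be handled as an independent transcript in the downstream mapping and overlap test.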
Thanks to the method of the invention, polarity of the double stranded RNA may be retrieved based on the presence of the capture oligonucleotide present in the ligated species. Other sequencing methods are known in the art that preserve directional information, for example using distinguishable adapters for different ends of RNA. Representative methods include the ligation of a 3'-adapter and a 5'-adapter, the sequences of which are known and different, to the captured RNAs. As described in the examples, a method for preparing a RNA sample for further sequencing may include:
- ligating a 3'-adapter to the RNAs;
- optionally, purifying (e.g. on a PAGE/urea device) the 3'-ligated RNAs;
- ligating a 5'-adapter to the RNAs;
- purifying the ligated RNAs.
More particularly, the following steps may be implemented for preparing a RNA sample for further sequencing:
- the 3'-ends of the captured RNAs are dephosphorylated using a phosphatase (e.g. Antarctic phosphatase);
- a 3'-adapter is ligated to the RNAs using an RNA ligase such as T4 RNA ligase;
- the RNA molecules are 5'-phosphorylated;
- 3'-adapter ligated RNA molecules are purified, for example by size separating them with denaturing gel electrophoresis and recovery by gel excision according to their expected size followed by RNA precipitation;
- a 5'-adapter is ligated to the RNAs purified in the preceding step.
Following this last step, the 5'- and 3'-ligated RNA molecules may be purified by denaturing gel electrophoresis and recovered by gel excision according to their expected size.
Representative commercial kits available for preparing an RNA with 3' and 5' adapters include the DGE small RNA library kit from Illumina. The RNAs are then reverse transcribed with specific primers, for example with primers specific for the 3' and 5' adapters, and the obtained cDNA may be amplified by PCR. In particular, the invention may implement ligation-mediated reverse transcription followed by PCR amplification. In a particular embodiment of the invention, at this step of the method, multiplexed library preparation can be implemented by using indexed primers during a PCR-based library amplification, allowing multiple samples to be sequenced in parallel in a single sequencing run.
Sequencing is then performed according to methods well known in the art, using a sequencing apparatus. In a particular embodiment, the sequencing procedure generates reads of at least 75, at least 100, at least 150, at least 200 or even at least 250 nucleotides (such as 75-100 nucleotide long reads) to improve reconstruction accuracy and retrieve ndsRNA molecules with better confidence.
Data processing
Sequence datasets are then processed.
RNA sequences may first be preprocessed, in particular with a computer program, to identify the capture oligonucleotide sequence contained within reads, thereby generating datasets. These datasets may then be used to de-novo reconstruct transcripts. Transcript reconstruction may be done using a computer program such as Trinity (Grabherr MG, et al. Nature Biotech. 2011) or Scripture (Guttman M. et al. Nature Biotech. 2010). Since transcript polarity is conserved during the construction of the RNA-Sequencing library, the retrieved sequencing reads are 5'-3' oriented. Therefore, single and potential double stranded RNA transcripts can be reconstructed and identified. Afterwards, reconstructed transcripts containing the capture oligonucleotide sequence may be identified and processed further. Sequences before and after the capture oligonucleotide are split into two different transcripts and mapped independently, for example using the Bowtie Aligner program (Langmead et al. "Ultrafast and memory-efficient alignment of short DNA sequences to the human genome". Genome Biology, 2009). Afterwards, transcript overlap in opposite strands is identified, for example by using a computer program such as BedTools (Quinlan AR and Hall IM. "BEDTools: a flexible suite of utilities for comparing genomic features". Bioinformatics, 2010), and identified regions are extracted for further analysis. According to an embodiment, a double-stranded RNA (such as a ndsRNA) is identified on the basis of identification of transcript overlap in opposite strands of the transcripts identified as sequences before and after the sequence of the capture oligonucleotide, hence experimentally validating double stranded RNA. The presence of a double-stranded RNA such as a ndsRNA can further be validated by strand specific reverse transcription followed by PCR or qPCR for each of the strands in the ndsRNA and their corresponding small RNAs.
Furthermore, an RNAse specific for single stranded RNA species, such as RNAse ONE, can be used to further support the double stranded nature of the molecule (e.g. a ndsRNA) identified.
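The opposite-strand overlap test described in this section can be sketched, under stated assumptions, as a simple interval comparison. In the minimal Python example below, each half of a split transcript is assumed to have already been mapped to a single genomic hit with BED-style coordinates (0-based start, exclusive end); the `Hit` type, the function name and the coordinates in the usage example are hypothetical and serve only to illustrate the logic, not to reproduce the BedTools-based analysis.

```python
from typing import NamedTuple, Optional, Tuple


class Hit(NamedTuple):
    """One mapped half-transcript (hypothetical record layout)."""
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


def opposite_strand_overlap(a: Hit, b: Hit) -> Optional[Tuple[int, int]]:
    """Return the overlapping interval if a and b lie on the same
    chromosome but on opposite strands; otherwise return None."""
    if a.chrom != b.chrom or a.strand == b.strand:
        return None
    start, end = max(a.start, b.start), min(a.end, b.end)
    return (start, end) if start < end else None


if __name__ == "__main__":
    before = Hit("chr8", 100, 200, "+")  # sequence before the capture oligo
    after = Hit("chr8", 150, 260, "-")   # sequence after the capture oligo
    print(opposite_strand_overlap(before, after))
```

A non-empty result for the two halves of one captured transcript would flag that transcript as a candidate double-stranded RNA for further validation.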
In a specific embodiment, the invention relates to a method for identifying double stranded RNAs (such as ndsRNAs) in a sample comprising:
- capturing double stranded RNAs (such as ndsRNAs) using a capture probe as described above;
- denaturing the captured RNAs;
- reverse transcribing the captured RNAs thereby producing cDNAs;
- sequencing the cDNAs obtained in the previous step;
- analysing the sequences, thereby identifying double stranded RNAs (such as ndsRNAs).
In a particular embodiment, the invention provides a method for identifying double stranded RNAs (such as ndsRNAs) in a sample, comprising:
- capturing RNAs using a capture probe as described above;
- sequencing the captured RNAs or a cDNA reverse transcribed from the captured RNAs;
- preprocessing the thereby obtained sequences to identify the capture probe oligonucleotide sequence within the reads;
- reconstructing and identifying single-stranded and potential double-stranded RNA transcripts;
- mapping the sequences before and after the capture probe oligonucleotide sequence; and
- detecting transcript overlap in opposite strands, thereby identifying double stranded RNAs (such as ndsRNAs).
The invention also relates to double stranded RNAs (such as ndsRNAs), in particular a ndsRNA identified according to the method described above. In a particular embodiment, the ndsRNA is ndsRNA-2a or ndsRNA-2e. In particular, the inventors have shown that ndsRNA-2a is involved in a mitosis-specific RAN containing complex, showing its potential involvement in mitosis. ndsRNA-2a and ndsRNA-2e sequences are shown in SEQ ID NO:1 and SEQ ID NO:2, respectively.
SEQ ID NO:1:
AUGCCUGUAAUCCCAGCCACUUGGGAGGCUGAGGCAGGAAAAUUGCUUGAACC
CAGGAGGCAGAGGUUGCAGUGAGCCAAGAUCACGCCACUGCACUCCAGCCUGG
GCAACAGAGCAAGACUCCAUCUCAAAAAAAG
SEQ ID NO:2:
ACAGAACAGAGGCCUCAGAAAUAACACCACACAUCUACAACCACCUGAUCUUU
GACAAACCUGACAAAAACAAGCACUGGGGAAAGGAUUCCCUAUUUAAUAAAUG
GUGCUGGGAAAACUG
Methods and uses implementing ndsRNAs
The inventors have shown that ndsRNAs have a functional role in the cell. For example, it is herein demonstrated that ndsRNA-2a interacts with major mitotic components involved in fundamental aspects of cell physiology ranging from nuclear import/export to spindle assembly and mitotic progression. In addition, overexpression of ndsRNA-2a leads to a range of mitotic defects and a pronounced change in nuclear shape, highlighting its role in cell cycle progression. Furthermore, the inventors have shown that a subset of globally expressed ndsRNAs is modulated upon retinoic acid treatment, demonstrating that the novel RNAs are regulated by cellular cues and participate in a plethora of regulatory systems.
Therefore, an object of the invention is also a method for identifying a marker associated with a phenotype or cell function, or for identifying a target for the treatment of a disease or for identifying a biomarker indicative of a disease or condition. This method comprises the identification of ndsRNAs, or determining the expression profile of ndsRNAs in a sample of interest, thereby associating the ndsRNAs identified to a phenotype, cell function, or disease.
For example, the sample of interest may correspond to a cancer cell and ndsRNA expression profiling allows the identification of biomarkers indicative of this cancer or the identification of a ndsRNA which could be the target of a treatment (for example by increasing or decreasing the expression of said ndsRNA).
The invention also relates to double stranded RNAs (such as ndsRNAs) that may be identified thanks to the method described in the preceding paragraph.
In addition, the invention further relates to a method for identifying the function of a ndsRNA, wherein said ndsRNA is either introduced or depleted in a cell, tissue, organ or organism (in particular a mammal organism, more particularly a non-human organism) and phenotypic or functional changes occurring after said introduction or depletion are determined. Representative changes searched include, for example, changes in the cell cycle, cell shape, induction of apoptosis, induction of cell differentiation, induction of a sensitivity or resistance to a therapeutic molecule, etc. The characterization of a ndsRNA may also comprise the identification of binding partners of said ndsRNA, in particular of binding proteins, for example using a mass spectrometry (MS) analysis, as provided in the examples. In an embodiment, the binding partners are identified using a biotinylated RNA which is incubated with a protein sample, for example a whole cell extract, a nuclear extract, a cytoplasmic extract or with a protein produced in vitro, the biotinylated ndsRNA:protein complex is then captured on a streptavidin covered support and then protein analysis is performed, in particular using a MS analysis.
The invention also relates to an oligonucleotide capture probe as described above.
The invention further relates to a kit comprising an oligonucleotide capture probe as described above. The kit may comprise any buffer or material used in the implementation of the above methods. For example, the kit of the invention may comprise a ligase and/or a capture support (for example streptavidin beads if the capture probe is labeled with biotin) and/or any buffer useful in the practice of the invention. In addition, the kit may comprise instructions for the user to follow for implementing the invention.
The following examples are given for purposes of illustration and not by way of limitation.
Examples
EXAMPLE 1
Material and methods
Cell culture and total RNA preparation
PLB985 cells were grown in RPMI medium supplemented with 25 mM HEPES, 10% FCS and glutamine. BJ and BJELR cells were grown in DMEM/M199 1:4 (1 g/l glucose) supplemented with 10% FCS. HeLa cells were grown in DMEM (1 g/l glucose), 5% FCS supplemented with glutamine. Total RNA was extracted using Trizol (Invitrogen) according to the manufacturer's instructions.
RNA fragmentation
Ribominus RNA (Invitrogen) was fragmented in zinc-based RNA fragmentation reagent (Ambion) for 6 min at 70°C, separated by PAGE/urea and 50-70 nt RNA was recovered by NaCl overnight elution.
BAC DNA library
BACs (RP11-3O20, RP11-44018, RP11-770K21, RP11-588B17) covering the RAM region were obtained from Children's Hospital Oakland Research Institute (CHORI). Briefly, 200 ng of an equimolar mix of BAC DNA was sonicated for 37 cycles (10 sec ON, 50 sec OFF, amplitude 30%) in a Vibra-Cell apparatus (Bioblock Scientific) in lysis buffer (50 mM HEPES pH 7.5, 140 mM NaCl, 1% Triton X-100, 0.1% Na-Deoxycholate, supplemented with protease inhibitors). Sonicated BAC DNAs were size-separated by agarose gel electrophoresis and the 200-300 bp band was purified by QIAquick column (Qiagen).
TRAPs
Traps (or baits) were generated by using MEGAPrime random primer labelling system (Amersham) with 250 ng of BAC DNA library and 5' Biotin-random primers. After Klenow extension, BAC DNA was removed from the Traps by Dpn I treatment.
RNA capture
Specific RAM-region traps were generated by random priming of 200-300 bp purified sonicated BACs (RP11-44018, RP11-588B17, RP11-770K21 and RP11-3O20) using 5'-biotinylated primers. Long RNA fraction was prepared by chemical fragmentation of total RNA (50-70 nt, Ambion) and further purified by PAGE, whereas the small RNA fraction (18-30 nt) was directly prepared by PAGE. Both fractions were incubated with the traps in Binding buffer (0.5 M NaCl, 0.01 M Tris-HCl pH 7.5, 0.5% SDS, 0.1 mM EDTA) at 62°C (small RNAs) or 68°C (long RNAs) overnight. RNA traps were further recovered using magnetic streptavidin beads, RNA eluted into Elution buffer (0.01 M Tris-HCl pH 7.5, 1 mM EDTA) and further purified by PAGE according to previous size selection.
Strand-Specific RNA-Seq
Eluted RNAs were dephosphorylated with 5 units of Antarctic phosphatase (NEB), separated by PAGE/Urea gel electrophoresis and purified by gel excision according to prior size selection. 3' RNA Adapter (Illumina) was ligated to purified fragments with T4 RNA ligase for 6 h at 20°C followed by an overnight incubation at 4°C. Ligated fragments were size separated by PAGE/Urea and purified according to prior size selection. 3'-ligated RNAs were further 5'-phosphorylated by PNK treatment for 1 h at 37°C, size selected by PAGE/Urea and purified. 5' RNA Adapter (Illumina) was ligated to 3'-ligated RNA fragments with T4 RNA ligase for 6 h at 20°C and further incubated overnight at 4°C. 5'-3' adapter-ligated RNA was separated by PAGE/Urea and purified between 70-90 nt for the small RNA fraction and between 100-130 nt for the long RNA fraction. Reverse transcription was performed by using Superscript II (Invitrogen) with specific primers for 1 h at 44°C. Final amplification was performed by 15 cycles of PCR amplification using Phusion DNA polymerase (Finnzymes). Library quality and ligation steps were assessed when possible by Agilent Bioanalyzer.
stsRNA-Seq analysis
Strand-specific RNA-Seq data was analyzed by custom scripts to remove low quality reads, reads shorter than 21 nt in the case of small RNA libraries and shorter than 35 nt for long RNA libraries, adapter contamination and empty reads. Final datasets were aligned using Bowtie Aligner allowing up to 2 mismatches to either map the reads to the RAM region or to the human genome (hg19) according to the experimental setup. For each experiment analysed, ~24M reads were uniquely mapped to hg19 for the long RNA datasets whereas for the small RNA datasets the number of uniquely aligned reads was ~12M. Aligned reads were further processed for strand specificity and wig files generated for visualization.
Correlation analysis
Intensity correlation analysis was performed at 20 nt resolution with a custom pipeline for the RAM region and for global analysis. To correlate reproducibility between experiments, a second correlation analysis was performed by a binary analysis indicating whether transcripts were present or not in a determined window by using a custom pipeline.
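The two analyses described above (intensity correlation at 20 nt resolution and a binary present/absent comparison per window) can be sketched as follows. This is a minimal Python illustration and not the custom pipeline itself: the use of a Pearson coefficient, the summing of per-base coverage into windows, the function names and the toy coverage vectors are all assumptions.

```python
import math


def bin_signal(per_base, width=20):
    """Sum per-base coverage into consecutive windows of `width` nt."""
    return [sum(per_base[i:i + width]) for i in range(0, len(per_base), width)]


def pearson(x, y):
    """Pearson correlation of two equally long vectors.

    Raises ZeroDivisionError if either vector has zero variance.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def binary_agreement(x, y):
    """Fraction of windows in which both experiments agree on whether
    signal is present (> 0) or absent."""
    return sum((a > 0) == (b > 0) for a, b in zip(x, y)) / len(x)


if __name__ == "__main__":
    # Toy per-base coverage for two hypothetical replicates (60 nt each).
    rep1 = bin_signal([1] * 20 + [0] * 20 + [3] * 20)
    rep2 = bin_signal([2] * 20 + [0] * 20 + [6] * 20)
    print(rep1, rep2, pearson(rep1, rep2), binary_agreement(rep1, rep2))
```

The binary score ignores signal intensity, so it complements the intensity correlation when replicates differ mainly in sequencing depth.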
Identified transcript validation
Class I-IV transcripts were validated by strand-specific reverse transcription followed by PCR. Specific primers were designed with T7 promoter overhanging bases in order to establish reverse transcription orientation. Small RNA determination was assessed with custom Taqman primers (Applied Biosystems) and relative expression levels were determined by reverse transcription followed by real time PCR.
AGO2 immunoprecipitation and bound small RNA analysis
5x10^6 PLB cells were centrifuged at 700 rpm for 7 min, washed twice with ice cold PBS and lysed with microRNA isolation kit, human AGO2 (Wako chemicals) lysis buffer and immunoprecipitation was performed following manufacturer's instructions. Eluted samples were further processed for RNA purification by Trizol (Invitrogen) extraction. Small RNA determination was assessed by custom Taqman primers (Applied Biosystems). Specific primers for U6 RNA, U49 snoRNA and let-7c miRNA were used according to manufacturer's instructions (Applied Biosystems). Immunoprecipitated AGO2 levels were assessed by SDS-PAGE followed by silver staining and Western blot against AGO2.
Transient transfection
Transient transfection in BJELR cells was done following standard reverse transfection protocols using lipofectamine RNAiMAX (Invitrogen). ON-target plus smart pools for knocking down Dicer 1 (L-003483-00), Drosha (L-016996-00), AGO2 (L-004639-00) and EXOSC3 (L-03195501) as well as scramble negative control (D-001210-01-05) were purchased from Dharmacon and used at a final concentration of 10 mM. Samples were collected 72 h post-transfection. Knock down efficiency was controlled by RT-qPCR (using customized primers, sequences are available upon request) and western blot assays. Western blots were performed following standard protocols; mouse polyclonal to EXOSC3 (ab88859), mouse monoclonal to AGO2 (ab57133) and mouse monoclonal to Dicer 1 (ab14601) were purchased from Abcam. Rabbit monoclonal antibody against Drosha (D28B1) was purchased from Cell Signalling. Goat polyclonal to actin (c-11, sc-1615) was purchased from Santa Cruz Biotechnology.
Nuclear/Cytoplasmic fractionation
BJELR cells (80% confluence) were rinsed twice with ice-cold PBS, collected in PBS and recovered in 1x hypotonic buffer (Cellytic Nuclear Xtract - Sigma) supplemented with 1 mM DTT and protease inhibitors, incubated 15 minutes on ice, vortexed in the presence of Igepal, spun down at 11000 rpm and the supernatant conserved as the Cytoplasmic fraction. Immediately after, the pellet was resuspended in Extraction buffer (Cellytic Nuclear Xtract) supplemented with 1 mM DTT and protease inhibitors and incubated 30 minutes in a thermomixer at 1400 rpm at 8°C (Nuclear fraction). All obtained fractions were aliquoted, flash-frozen and stored at -80°C.
ndsRNA electrophoretic mobility shift assay (EMSA)
ndsRNA-2a and ndsRNA-2e sequences were cloned into pGEM-T easy vector and further PCR amplified using T7 tagged oligonucleotides from both flanks (Expand High Fidelity - Roche). PCR products were in vitro transcribed (MegaScript RNAi - Ambion) and 5'-radioactively labelled by Poly Nucleotide Kinase (PNK - Promega). Nuclear extract was incubated with radiolabelled ndsRNA-2a/2e (~25 fmol/reaction) in DBD buffer (10 mM Tris-HCl pH 8, 0.1 mM EDTA pH 8, 0.4 mM DTT, 5% Glycerol, tRNA, supplemented with NaCl according to experimental setup) for 15 min at room temperature prior to native PAGE. Competition was achieved by addition of non-radioactive ndsRNA-2a or ndsRNA-2e (50 fmol/200 fmol range). When purified recombinant proteins were used, 120 ng of RAN or RCC1 (Origene Technologies) were incubated with ndsRNA-2a or ndsRNA-2e as described above.
ndsRNA in vitro binding assay
ndsRNA-2a and ndsRNA-2e were PCR amplified, in vitro transcribed with MegaScript RNAi kit (Ambion) and 3'-biotinylated using RNA 3'-biotinylation kit (Pierce) according to the manufacturer's instructions. Biotinylated ndsRNAs were immobilized in 5 mM Tris-HCl pH 7.5, 0.5 mM EDTA and 1 M NaCl to "my ONE" streptavidin magnetic beads (Invitrogen) for 1 h at 22°C in a thermomixer prior to nuclear extract incubation. Nuclear extract was prepared as described above but subjected to two rounds of pre-clearing with "my ONE" magnetic beads in 1x DBD buffer supplemented with tRNA and NaCl prior to interaction with immobilized ndsRNAs. Final N.E./ndsRNA incubation was performed at 22°C in a thermomixer for 15 min and beads were further washed 3 times with DBD buffer supplemented with NaCl and NP-40 at room temperature. Finally, magnetic beads were recovered in Laemmli buffer, boiled for 10 min and separated by SDS-PAGE or eluted in 1 M NaCl prior to Liquid Chromatography followed by Mass Spectrometry (LC-MS/MS). Protein composition was evaluated by silver staining or Western blot when appropriate.
RNA Fluorescence in-situ hybridization coupled to immunocytochemistry
ndsRNA-2a sequence was PCR amplified and in vitro transcribed with T7 RNA polymerase using Chromatide Alexa Fluor 546-14-UTP or Chromatide Alexa Fluor 488-5-UTP as a source of UTP. Fluorescently labeled forward or reverse ndsRNA-2a strands were Trizol purified and stored at -80°C. For immunofluorescence analysis, BJ and HeLa cells were grown on round coverslips and treated according to each experimental setup. Cells were fixed with 3% paraformaldehyde, 4% sucrose in 10 mM PBS for 10 min, permeabilized with 0.25% Triton X-100 in 10 mM PBS for 10 min, and then blocked for 1 h in 1% BSA in 10 mM PBS (Blocking buffer). Coverslips were incubated overnight at 4°C in blocking buffer containing RAN (#4462, Cell Signalling), RCC1 (#5134, Cell Signalling), RANGAP1 (ab92360, Abcam) or RANBP2 (ab64276, Abcam) antibodies. Cells were washed twice in 10 mM PBS, 0.1% Tween 20, and incubated with secondary antibody Alexa 488 (Molecular Probes). Cells were crosslinked again and washed twice with 2x SSC, 50% formamide before hybridization. ndsRNA-2a strand specific probes were hybridized in hybridization buffer (2x SSC, 20% dextran sulfate, and 1 mg/mL BSA) overnight at 37°C in a thermomixer humid chamber, washed twice with 2x SSC in 50% formamide, twice with 2x SSC, and counterstained with DAPI. Finally, coverslips were mounted in ProLong Antifade (Molecular Probes), and visualized on a confocal laser-scanning microscope SP2-MP (Leica).
ndsRNA-2 overexpression
pTREG-bi plasmid (Clontech) bearing an inducible bidirectional promoter was modified in order to express a 5' (termed CD) or 3' (termed AB) SP6-tagged version of ndsRNA-2a. HeLa Tet-ON 3G cells were AB/CD transfected, treated with doxycycline 12 h post-transfection and collected in Trizol for RNA analysis or processed for immunocytochemistry 24 h later.
Overexpression and double-stranded nature of exogenous ndsRNA-2a was confirmed by reverse transcription using T7 flagged primers that recognize the SP6 tag followed by PCR on RNAse ONE treated samples. For phenotypic analysis HeLa cells were co-transfected with ndsRNA-2a overexpressing plasmids (AB/CD) and histone H1-GFP (AB/CD:H1-GFP 4:1 ratio) to control in-well efficiency of transfection. Cells were fixed, permeabilized and stained for α-tubulin as previously described. The number of cells displaying an abnormal nuclear morphology, chromatin bridges and bi/multinuclei were determined by double blind analysis in 3 independent experiments (3000 cells counted for each condition).
Results
Recent advances in high-throughput sequencing technologies have revealed an enormous diversity of RNA transcripts arising from unexplored regions within the genome, leading to the discovery of novel regulatory paradigms (refs. 1-5). However, the transcriptional profile of hundreds of large genomic regions displaying disease-associated markers such as translocations, chromosomal rearrangements and single nucleotide polymorphisms (SNPs) remains largely unexplored (refs. 6-8). One of those regions encompasses ~500 kb on chromosome 8 (RAM region: 130,269,750-130,744,812), is critically involved in retinoic acid induced differentiation (ref. 9) and contains multiple disease susceptibility SNPs (refs. 10-17). By using a novel RNA capture approach followed by strand-specific RNA sequencing, we demonstrate that a plethora of RNAs map on both sense and antisense strands from the RAM region. Importantly, we unequivocally demonstrate that sense-antisense RNA pairs coexist within the same cell and generate stable long natural double-strand RNA (ndsRNA). Moreover, we show that ndsRNAs are mainly localized in the nucleus and establish specific interactions with nuclear components. Particularly, we demonstrate that ndsRNA-2a interacts with the mitotic RAN/RANGAP1-SUMO1/RANBP2 complex in a RAN-dependent manner and displays differential nuclear localization throughout the cell cycle. Importantly, ndsRNA-2a overexpression leads to a range of mitotic defects and a pronounced change in nuclear shape, highlighting its involvement in cell cycle progression. Finally, global strand-specific RNA sequencing shows that ndsRNA signatures are interspersed genome-wide and reveals that ndsRNA molecules are modulated upon cellular cues. Taken together, this study reveals ndsRNAs as novel members of the natural RNA-repertoire in human cells that are involved in a plethora of regulatory processes.
RNA capture and strand-specific RNA-Sequencing unveils ndsRNAs
To generate a comprehensive view of the transcripts originating from the RAM region we developed an RNA capture approach (Fig. 5a) coupled to a customized strand-specific RNA sequencing protocol (stsRNA-Seq). This technology permits the concomitant identification of long (>50 nt) and small (18-30 nt) RNAs using a single experimental protocol. Briefly, DNA traps obtained from random priming of BAC DNA covering the RAM region were hybridized to either the naturally occurring small RNA fraction (18-30 nt) or chemically fragmented and size selected (50-70 nt) total RNA from human leukemic PLB985 cells. Captured RNAs (long and small) were recovered, subjected to stsRNA-Seq and reads were aligned to the RAM region. Unexpectedly, RNA profiling revealed a plethora of RNAs mapping to either strand of the RAM region including 437 long (Fig. 1a, upper panel; Fig. 5a) and 630 small RNAs (Fig. 1a, lower panel). Importantly, ~90% of the identified transcripts were detected in three independent experiments, indicating a high level of reproducibility among biological and technical replicates (Fig. 1b; see material and methods). Bioinformatics analysis between transcripts from the long and small RNA datasets identified four different RNA classes (Fig. 1c, Class I-IV). Class I comprises 'classical' transcripts from the long RNA fraction mapping on either forward (Fw) or reverse (Rv) strands, that overlap neither with long RNAs on the opposite strand nor with small RNAs (Fig. 1c and Fig. 5c, Class I). Class II transcripts are long RNA molecules mapping to one strand and overlapping with a small RNA (sRNA; Fig. 1c and Fig. 5d). The existence of Classes III and IV was unexpected as these RNAs correspond to overlapping long transcripts from both strands and represent ~22% of all mapped RNAs (Fig. 1c and Fig. 5b, e-f). Within the overlapping region we detected either a single small RNA mapping to one of the strands or two complementary small RNAs originating from both strands.
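The class definitions given above can be expressed, as a hedged sketch, by a small interval-based classifier. In the Python example below, each transcript is assumed to be a `(start, end, strand)` tuple with half-open coordinates; the function names and toy coordinates are hypothetical. Because the text does not state here which small RNA configuration corresponds to Class III and which to Class IV, the sketch deliberately reports both as "III/IV".

```python
def overlaps(a, b):
    """True if half-open intervals a and b share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def classify(long_rna, long_rnas, small_rnas):
    """Assign a long transcript to Class I, II or III/IV.

    long_rna, and every element of long_rnas and small_rnas, is a
    (start, end, strand) tuple (hypothetical layout).
    """
    antisense_long = any(
        other[2] != long_rna[2] and overlaps(long_rna, other)
        for other in long_rnas
    )
    if antisense_long:
        return "III/IV"  # overlapping long transcripts from both strands
    has_small = any(overlaps(long_rna, s) for s in small_rnas)
    return "II" if has_small else "I"


if __name__ == "__main__":
    longs = [(0, 100, "+"), (50, 150, "-"), (300, 400, "+"), (500, 600, "-")]
    smalls = [(320, 340, "+")]
    for t in longs:
        print(t, classify(t, longs, smalls))
```

A fuller version would additionally record which strands carry small RNAs inside the overlapping region, which is the feature that separates the two overlapping classes.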
Importantly, 36/40 randomly selected class I-IV long transcripts and 17/17 small class II-IV RNAs were validated using strand-specific reverse transcription followed by quantitative PCR (stsRT-PCR; Fig. 5c-f and Table 1), revealing the high confidence of the capture protocol for RNA discovery. Furthermore, the expression of Class III and IV long and sRNAs was confirmed in an unrelated transformed cell line (BJELR, Fig. 1g and Fig. 6c) (ref. 18). Notably, the levels of Class I-IV validated transcripts were not modified upon exosome depletion, indicating that these molecules are neither products of pervasive transcription nor byproducts of RNA degradation (Fig. 7a-b, 7g and 9d) (refs. 19-21). Moreover, even though nds-derived sRNAs are loaded into AGO2 (Fig. 8), neither their levels nor those of their corresponding long precursors rely on any of the small RNA biogenesis pathways described to date (refs. 22-25) (Fig. 7a-f and 9a-c).
Table 1:
Figure imgf000030_0001
ndsRNAs are natural components of human cells
The validation of 11/11 Class III-IV long complementary RNAs prompted us to analyze whether these molecules exist as double-stranded RNA within the cell. If these overlapping transcripts exist as double-strand RNA they should be resistant against an RNAse displaying single-strand specificity (RNAse ONE). Indeed, when total RNA from PLB985 cells was subjected to RNAse ONE treatment, Class III-IV transcripts were protected from RNAse degradation (Fig. 1d and Fig. 6a). In contrast, single strand Class I-II and GAPDH transcripts were not protected. Furthermore, when total RNA was incubated with an RNAse displaying double-strand RNA specificity (RNAse III), all Class III/IV transcripts were degraded (Fig. 1e, Fig. 6b), supporting the double-stranded nature of these RNA molecules in human cells.
ndsRNAs establish specific interactions with nuclear components
To gain insight into the function of the identified ndsRNAs we analyzed their subcellular localization. Interestingly, Class III/IV ndsRNAs are predominantly nuclear (Fig. 1f, upper row), whereas their corresponding small RNAs are located either exclusively in the nucleus or in both nuclear and cytoplasmic fractions (Fig. 1f, lower panel). The nuclear localization of ndsRNAs prompted us to analyze whether these molecules interact with nuclear proteins. Therefore we performed electrophoretic mobility shift assays with 2 previously identified radioactively labeled ndsRNAs (nds-2a and nds-2e) and nuclear extract obtained from BJELR cells. The results indicated that both ndsRNAs specifically interact with nuclear proteins (Fig. 2a). To identify binding partner(s), immobilized biotinylated nds-2a/2e were incubated with nuclear extract. Analysis of ndsRNA-bound proteins by Liquid Chromatography followed by Mass Spectrometry (LC-MS/MS) indicated that nds-2a and nds-2e bind different proteins/complexes, suggesting that ndsRNAs establish specific interactions within the cell (Fig. 10a-c).
nds-2a binds a mitosis-specific RAN containing complex
Gene ontology analysis performed over nds-2a binding proteins shows enrichment of mitosis-related proteins (Fig. 2b; Fig. 10b-c). Particularly, RAN, RCC1, RANGAP1 and RANBP2 were found to interact with nds-2a (Fig. 10d; note the high peptide coverage). Importantly, these partners are major mitotic components involved in fundamental aspects of cell physiology ranging from nuclear import/export to spindle assembly and mitotic progression (ref. 26). The interaction of nds-2a with RAN, RCC1, RANGAP1 and RANBP2 was validated by two complementary approaches. In the first one, nds-2a was biotin-immobilized, incubated with nuclear extract and the presence of these 4 proteins in the bound fraction was confirmed by Western blot. None of the analyzed proteins was detected when nds-2e was used as bait, supporting that nds-2a binds selectively to these components of the mitotic machinery (Fig. 2c). Importantly, we observed that only the mitosis-specific sumoylated form of RANGAP1 (RANGAP1-SUMO1) (ref. 27) was present in the complex with nds-2a. In the second approach, RAN, RCC1, RANGAP1 and RANBP2 were immunoprecipitated from BJELR cells and the coprecipitated fraction was evaluated for nds-2a presence by stsRT-qPCR (Fig. 2d and Fig. 11b). Both nds-2a forward and reverse strands were enriched in the coprecipitated material, supporting that nds-2a interacts with members of the RAN complex in vivo. In order to discriminate whether nds-2a interacts with a single or several proteins of the RAN complex, RAN, RCC1, RANGAP1 or RANBP2-depleted nuclear extracts were used to perform in vitro interaction assays in the presence of biotin-labeled nds-2a. The interaction of nds-2a with the RAN complex was abrogated in RAN depleted nuclear extracts, since the absence of RAN impaired the detection of RANGAP1 or RANBP2. In contrast, RCC1 binding remained unaffected, thus revealing RAN-independent interaction. When nuclear extracts depleted for RCC1 were used, RAN/RANGAP1/RANBP2 interaction was unaffected. Finally, depletion of RANBP2 affected neither nds-2a-RAN nor nds-2a-RCC1 interaction (Fig. 2e). Altogether, these data indicate that RAN and RCC1 interact with nds-2a, suggesting that at least 2 different nds-2a/protein complexes exist (nds-2a/RAN/RANGAP1-SUMO1/RANBP2 and nds-2a/RCC1). To further support this notion we evaluated the ability of recombinant purified RAN or RCC1 to bind to nds-2a. Indeed, RAN and RCC1 bound directly to nds-2a while they failed to bind to nds-2e, thus indicating that RAN and RCC1 contain nds-2a-specific binding surfaces (Fig. 11c).
Altered nds-2a levels result in mitotic defects
Since the data indicate that nds-2a binds a mitosis-specific species of RANGAP1 (Fig. 2c and 2e) [27] and since nds-2a levels increase along cell cycle progression (G1-S-G2/M, Fig. 11d-e), we investigated the dynamics of the intracellular localization of nds-2a throughout the cell cycle. For this we performed RNA-Fluorescence In Situ Hybridization (RNA-FISH) for nds-2a (Fig. 2h and Fig. 12a-b) and coupled it to immunocytochemistry for RAN, RCC1, RANGAP1 and RANBP2 using cycling normal mesenchymal fibroblasts (BJ) and HeLa cells. Confocal microscopy showed that whereas nds-2a colocalizes with RAN and RCC1 in interphase nuclei, no such colocalization was observed with RANGAP1 or RANBP2 under the same conditions (Fig. 2f and Fig. 12c and 12e). In contrast, when metaphase cells were analyzed, nds-2a colocalized with RAN, RANGAP1 and RANBP2 in the cellular periphery and in particular structures along the mitotic spindle (Fig. 2g and 2i and Fig. 12d), reinforcing the notion that this particular ndsRNA participates in mitosis-related events. To support this hypothesis, we generated a set of nds-2a overexpressing plasmids bearing an SP6 sequence tag located at either the 3' (termed "AB") or the 5' (termed "CD") end of nds-2a (Fig. 3a), transfected HeLa cells and verified the double-stranded nature of the overexpressed molecule (Fig. 3b). Importantly, double-blind confocal examination revealed that the number of cells displaying mitotic defects (chromatin "bridges" and bi/multinucleated cells) was increased in AB/CD transfected cells compared to controls (Fig. 3c-d). Furthermore, the number of nuclei displaying abnormal shapes in the entire population was also increased in nds-2a overexpressing cells (Fig. 3d-f). Collectively, these data indicate that nds-2a is a functionally important component of the mitotic machinery and highlight its biological relevance in a complex biological setting such as mitotic progression.
ndsRNAs are modulated by cellular cues and represent a novel class of RNA
Global stsRNA-seq libraries demonstrated that ndsRNAs are expressed throughout the entire genome and are more abundant in intergenic regions than in exons/introns (Fig. 4a and Fig. 14). Moreover, detailed database exploration showed that ndsRNAs map within different RNA classes (Fig. 4b), suggesting that ndsRNAs are not restricted to any previously described transcript family, but rather represent a novel class of RNAs interspersed within the human genome. Notably, we observed that a subset of globally expressed ndsRNAs is modulated upon retinoic acid treatment (Fig. 4c-d and Fig. 15), supporting the notion that these novel molecules are regulated by cellular cues and might participate in a plethora of regulatory systems. Genome positions for the displayed examples are shown in Table 2.
Table 2: genome positions of the displayed ndsRNA examples (provided as image imgf000034_0001 in the original publication).
Discussion
The increasing number of non-coding RNA species identified indicates that the transcriptional landscape in higher eukaryotes is much more complex than originally anticipated [1-5]. However, the biological role of the large majority of these molecules remains elusive. In this work, we unequivocally demonstrate that sense-antisense transcripts coexist within the cell and generate stable nuclear double-stranded RNAs. Contrary to previous reports suggesting that overlapping sense/antisense RNA expression is restricted to pseudogenes or repetitive elements in restricted biological scenarios [28-30], we show that ndsRNAs map to interspersed elements along the genome, indicating that they correspond to a new class of RNAs. Interestingly, although our initial evidence suggested that ndsRNAs were merely sRNA precursors, we demonstrate that ndsRNAs establish specific RNA-protein interactions, suggesting that these molecules serve diverse functions within the cell. In particular, we provide evidence that nds-2a displays differential localization throughout the cell cycle, interacts with the mitotic RAN/RANGAP1-SUMO1/RANBP2 complex and localizes within the mitotic spindle, supporting its biological relevance. Importantly, nds-2a overexpression leads to a range of mitotic defects and a pronounced change in nuclear shape, highlighting its role in cell cycle progression. All in all, our work expands the already complex RNA catalog and demonstrates that ndsRNAs play fundamental roles in cellular physiology.
EXAMPLE 2
This example describes a method for capturing and identifying ndsRNAs from isolated nuclei. Isolated nuclei are incubated with a biotin-labelled, ndsRNA-specific adaptor oligonucleotide (or capture probe) in the presence of an RNA ligase to capture ndsRNAs.
Then, total RNA is incubated with streptavidin beads, and the recovered ligated products are subjected to strand-specific RNA-seq library construction. Libraries are sequenced on an Illumina HiSeq 2000 sequencer and the data are analyzed bioinformatically.
This method allows retrieval of sequence information that is used to identify ndsRNAs.
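As an illustration of the final identification step, the following minimal sketch flags pairs of reconstructed transcripts that lie on opposite strands of the same chromosome and overlap, which is the criterion used here to call a candidate double stranded RNA. The `Transcript` record, its field names, the half-open coordinate convention and the `min_overlap` parameter are assumptions made for this example only; they are not part of the described protocol.

```python
from typing import NamedTuple


class Transcript(NamedTuple):
    """Hypothetical record for one reconstructed, genome-mapped transcript."""
    name: str
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


def overlap_length(a: Transcript, b: Transcript) -> int:
    """Number of shared positions; 0 if on different chromosomes or disjoint."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def opposite_strand_pairs(transcripts, min_overlap=1):
    """Return (plus_name, minus_name, overlap) for every sense/antisense pair
    overlapping by at least min_overlap nucleotides."""
    plus = [t for t in transcripts if t.strand == "+"]
    minus = [t for t in transcripts if t.strand == "-"]
    pairs = []
    for p in plus:
        for m in minus:
            n = overlap_length(p, m)
            if n >= min_overlap:
                pairs.append((p.name, m.name, n))
    return pairs
```

The all-against-all comparison is quadratic and is chosen for readability; for genome-wide data an interval index or a sorted sweep would be a more practical design.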
EXAMPLE 3
The method of the invention is used to identify tumorigenesis-related ndsRNAs.
An ndsRNA profile specific for each step of the BJ stepwise tumorigenesis model [18] is obtained. This in vitro cellular model recapitulates the basic events necessary for cellular transformation. Briefly, normal primary human cells are transformed in a stepwise manner by the introduction of the catalytic subunit of telomerase (hTERT), the early region of the SV40 virus (SV40 ER) and the activated allele of H-ras (H-rasV12). This cellular system allows comparative analysis of cancer cells and their normal progenitors. After performing a genome-wide ndsRNA capture and sequencing on this model system as provided above, we identify a signature of differentially expressed ndsRNAs associated with the immortalization (early marker) and transformation process (tumor marker). Once the profiles are generated, genome-wide ndsRNA analysis is performed bioinformatically.
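The comparison between two steps of the model can be pictured with the following minimal sketch, which normalizes per-ndsRNA read counts to counts per million and reports a log2 fold change. It is only an assumption-laden illustration: the function names, the pseudocount and the fold-change threshold are hypothetical, and an actual analysis would rely on replicates and a statistical test rather than on a fixed threshold.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts so that each library sums to one million."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: value * 1e6 / total for name, value in counts.items()}


def log2_fold_changes(reference, condition, pseudocount=1.0):
    """log2(condition / reference) per ndsRNA, on CPM-normalized counts.
    The pseudocount (an assumption) avoids division by zero."""
    ref = counts_per_million(reference)
    cond = counts_per_million(condition)
    names = sorted(set(ref) | set(cond))
    return {
        name: math.log2(
            (cond.get(name, 0.0) + pseudocount) / (ref.get(name, 0.0) + pseudocount)
        )
        for name in names
    }


def signature(reference, condition, threshold=1.0):
    """Keep ndsRNAs whose absolute log2 fold change reaches the threshold."""
    changes = log2_fold_changes(reference, condition)
    return {name: lfc for name, lfc in changes.items() if abs(lfc) >= threshold}
```

Called with the counts of the normal progenitor as `reference` and those of a later step as `condition`, `signature` returns the candidate markers with positive values for increased and negative values for decreased ndsRNAs.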
EXAMPLE 4
Proof-of-principle for ndsRNA Proximity Ligation assay
This example presents a validation of the method of the invention as effective in retrieving ndsRNAs. We performed a series of pilot experiments in which we carried out the proximity ligation assay (i.e. the method of the invention implementing the hairpin capture probe as described above) in isolated nuclei and determined the levels of individual known ndsRNAs using primers specifically designed for each of them [32]. As can be observed, using this method we were able to successfully retrieve several ndsRNAs (Fig. 17a and b [32]; and Fig. 17c, known ndsRNAs validated within the context of the grant), suggesting that the method can be used to perform large scale ndsRNA identification in complex systems (cellular differentiation/transformation, etc.).
Extending ndsRNA Proximity Ligation assay to next generation sequencing platforms
As demonstrated, our assay retrieved many individual ndsRNAs, prompting us to extend our analysis to next generation sequencing. Hence, we performed the assay in BJELR cells (tumor cell line corresponding to the last step of the stepwise tumorigenesis system) under stringent conditions and proceeded to the construction of an RNA sequencing library using our customized protocol (described in EP14305822 and PCT/EP2015/062179). After sequencing, the retrieved data showed that out of 100M reads, approximately 170,000 reads contained the adaptor sequence at least once. Therefore, we proceeded to de novo RNA reconstruction using the Trinity pipeline [33]. In order to optimize the procedure, library complexity was reduced (for example by implementing an RNAse treatment, e.g. RNAseONE) in order to increase the number of adaptor-containing reads prior to de novo RNA reconstruction. In doing so, we were able to retrieve isolated reads mapping to known ndsRNAs [32], suggesting that ndsRNA molecules are contained within the analyzed libraries.
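The read-level screening described above can be sketched as follows: each read is searched for the adaptor on either strand, reads are tallied by the number of adaptor copies they carry, and a read with exactly one forward-strand copy is split into the sequences before and after the adaptor. The `ADAPTOR` string is an assumption, obtained by writing the capture probe in DNA letters (U replaced by T, the biotinylated dT written as T); exact matching without mismatches is a further simplification.

```python
from collections import Counter

# Assumed DNA-space rendering of the capture probe; not taken from the protocol.
ADAPTOR = "GGACTACGGTAA"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string (A, C, G, T, N)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_adaptor(read, adaptor=ADAPTOR):
    """Count exact, non-overlapping adaptor copies on both strands."""
    read = read.upper()
    return read.count(adaptor) + read.count(revcomp(adaptor))


def classify_reads(reads, adaptor=ADAPTOR):
    """Tally reads as 'none', 'single' or 'multiple' adaptor-containing."""
    tally = Counter()
    for read in reads:
        n = count_adaptor(read, adaptor)
        tally["none" if n == 0 else "single" if n == 1 else "multiple"] += 1
    return tally


def split_at_adaptor(read, adaptor=ADAPTOR):
    """Return (before, after) flanks for a read with exactly one
    forward-strand adaptor copy, otherwise None."""
    read = read.upper()
    if read.count(adaptor) != 1:
        return None
    before, after = read.split(adaptor)
    return before, after
```

In practice, a tolerant matcher allowing mismatches would probably be preferred for sequencing data; exact matching is used here only to keep the sketch short.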
RNAse optimization
In order to limit the number of reads that do not contain the adaptor sequence and the number of molecules that bear adaptor/adaptor sequences, we used an RNAse displaying single-strand specificity (e.g. RNAseONE), as most of the aforementioned potentially unwanted reads should be single stranded. Hence, we included a single step of RNAseONE treatment in our experimental pipeline. The RNAseONE treatment significantly improved the detection of single-adaptor-containing reads (approximately 160,000 reads). Moreover, approximately 1,100 molecules bearing a single adaptor sequence were reconstructed (Fig. 18). Most of the reconstructed molecules could be further processed for ndsRNA retrieval.
References
1. Guttman, M. & Rinn, J.L. Modular regulatory principles of large non-coding RNAs. Nature 482, 339-46 (2012).
2. Guil, S. et al. Intronic RNAs mediate EZH2 regulation of epigenetic targets. Nat Struct Mol Biol 19, 664-70 (2012).
3. Memczak, S. et al. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature 495, 333-8 (2013).
4. Mercer, T.R. et al. Targeted RNA sequencing reveals the deep complexity of the human transcriptome. Nat Biotechnol 30, 99-104 (2012).
5. Mercer, T.R. & Mattick, J.S. Structure and function of long noncoding RNAs in epigenetic regulation. Nat Struct Mol Biol 20, 300-7 (2013).
6. Fletcher, O. & Houlston, R.S. Architecture of inherited susceptibility to common cancer. Nat Rev Cancer 10, 353-61 (2010).
7. Greenman, C. et al. Patterns of somatic mutation in human cancer genomes. Nature 446, 153-8 (2007).
8. Wood, L.D. et al. The genomic landscapes of human breast and colorectal cancers. Science 318, 1108-13 (2007).
9. Yin, W., Rossin, A., Clifford, J.L. & Gronemeyer, H. Co-resistance to retinoic acid and TRAIL by insertion mutagenesis into RAM. Oncogene 25, 3735-44 (2006).
10. Kiemeney, L.A. et al. Sequence variant on 8q24 confers susceptibility to urinary bladder cancer. Nat Genet 40, 1307-12 (2008).
11. Radtke, I. et al. Genomic analysis reveals few genetic alterations in pediatric acute myeloid leukemia. Proc Natl Acad Sci USA 106, 12944-9 (2009).
12. Rafiq, M.A. et al. Mapping of three novel loci for non-syndromic autosomal recessive mental retardation (NS-ARMR) in consanguineous families from Pakistan. Clin Genet 78, 478-83 (2010).
13. Schoemaker, M.J. et al. Interaction between 5 genetic variants and allergy in glioma risk. Am J Epidemiol 171, 1165-73 (2010).
14. Shete, S. et al. Genome-wide association study identifies five susceptibility loci for glioma. Nat Genet 41, 899-904 (2009).
15. Simon, M. et al. Genetic risk profiles identify different molecular etiologies for glioma. Clin Cancer Res 16, 5252-9 (2010).
16. Tomlinson, I. et al. A genome-wide association scan of tag SNPs identifies a susceptibility variant for colorectal cancer at 8q24.21. Nat Genet 39, 984-8 (2007).
17. Jenkins, R.B. et al. A low-frequency variant at 8q24.21 is strongly associated with risk of oligodendroglial tumors and astrocytomas with IDH1 or IDH2 mutation. Nat Genet 44, 1122-5 (2012).
18. Hahn, W.C. et al. Creation of human tumour cells with defined genetic elements. Nature 400, 464-8 (1999).
19. Lykke-Andersen, S., Brodersen, D.E. & Jensen, T.H. Origins and activities of the eukaryotic exosome. J Cell Sci 122, 1487-94 (2009).
20. Preker, P. et al. RNA exosome depletion reveals transcription upstream of active human promoters. Science 322, 1851-4 (2008).
21. Belostotsky, D. Exosome complex and pervasive transcription in eukaryotic genomes. Curr Opin Cell Biol 21, 352-8 (2009).
22. Yang, J.S. & Lai, E.C. Alternative miRNA biogenesis pathways and the interpretation of core miRNA pathway mutants. Mol Cell 43, 892-903 (2011).
23. Rana, T.M. Illuminating the silence: understanding the structure and function of small RNAs. Nat Rev Mol Cell Biol 8, 23-36 (2007).
24. Czech, B. & Hannon, G.J. Small RNA sorting: matchmaking for Argonautes. Nat Rev Genet 12, 19-31 (2011).
25. Djuranovic, S., Nahvi, A. & Green, R. A parsimonious model for gene regulation by miRNAs. Science 331, 550-3 (2011).
26. Clarke, P.R. & Zhang, C. Spatial and temporal coordination of mitosis by Ran GTPase. Nat Rev Mol Cell Biol 9, 464-77 (2008).
27. Joseph, J., Tan, S.H., Karpova, T.S., McNally, J.G. & Dasso, M. SUMO-1 targets RanGAP1 to kinetochores and mitotic spindles. J Cell Biol 156, 595-602 (2002).
28. Katayama, S. et al. Antisense transcription in the mammalian transcriptome. Science 309, 1564-6 (2005).
29. Tam, O.H. et al. Pseudogene-derived small interfering RNAs regulate gene expression in mouse oocytes. Nature 453, 534-8 (2008).
30. Watanabe, T. et al. Endogenous siRNAs from naturally formed dsRNAs regulate transcripts in mouse oocytes. Nature 453, 539-43 (2008).
31. Ng, S.B. et al. Targeted capture and massively parallel sequencing of 12 human exomes. Nature 461, 272-76 (2009).
32. Portal, M.M., Pavet, V., Erb, C. & Gronemeyer, H. Human cells contain natural double-stranded RNAs with potential regulatory functions. Nat Struct Mol Biol 22, 89-97, doi:10.1038/nsmb.2934 (2015).
33. Grabherr, M.G. et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol 29, 644-652, doi:10.1038/nbt.1883 (2011).
34. Portal, M.M., Pavet, V., Erb, C. & Gronemeyer, H. Human cells contain natural double-stranded RNAs with potential regulatory functions. Nat Struct Mol Biol 22, 89-97, doi:10.1038/nsmb.2934 (2015).

Claims

1. A method for identifying double stranded RNAs from a sample, comprising:
- contacting said sample with an adapted capture probe; and
- sequencing the captured RNAs, thereby identifying double stranded RNAs in the sample.
2. The method according to claim 1, wherein the capture probe comprises a first and a second nucleic acid sequence, wherein the first and second nucleic acid sequences are complementary one with the other, and wherein the capture probe further comprises a moiety which may be used for purifying said capture probe bound alone or bound to a double stranded RNA.
3. A method for purifying double stranded RNAs from a sample, comprising:
- generating covalent bonds, in the presence of a RNA ligase, between a RNA extract from said sample and an RNA-based oligonucleotide capture probe capable of ligating RNA molecules in close proximity;
wherein the capture probe comprises, from 5' to 3', a first, a second and a third nucleic acid sequences, wherein a) said sequences allow formation of a hairpin with the first and third nucleic acid sequences being complementary one to the other and forming the stem of the hairpin; and b) the second nucleic acid sequence forms the loop of the hairpin and is labeled, such as with one or more biotinylated nucleotides, so it can be purified using label-specific purification means; and
- purifying the captured RNAs using purification method specific of the label present in the third nucleic acid sequence.
4. The method according to any one of claims 1 to 4, wherein the capture probe is of the formula
5' Phos/rGrGrArC/BiotdT/rArCrGrGrUrArA 3'.
5. A method of identifying double stranded RNA molecules in a sample, the method comprising the steps of:
a) capturing RNAs from a sample using the method of any one of claims 3 to 4; and b) sequencing the captured RNAs.
6. The method according to any one of claims 1 to 5, wherein the captured RNAs correspond to a long and/or a fragmented and size selected long RNA fraction of RNAs 50 to 250 nucleotide long.
7. The method according to any one of the preceding claims, wherein the captured RNAs are modified by ligating 5'- and 3'-adapter sequences on said captured RNAs, such that RNA polarity information may be retrieved during the sequencing step.
8. The method according to any one of claims 1 to 7, wherein single stranded RNA species are depleted using a RNAse that only degrades single stranded RNAs.
9. A method for identifying a double stranded RNA in a RNA sample, comprising implementing the method according to any one of claims 1 to 8 and thereby identifying one or more long natural double stranded RNAs in said sample.
10. The method according to claim 9, wherein a double stranded RNA is identified by de novo transcript reconstruction followed by a search of the capture oligonucleotide sequence in a reconstructed transcript, and wherein a double stranded RNA is identified when transcript overlaps in opposite strands are detected.
11. The method according to any one of claims 5 to 10, comprising:
- capturing RNAs using a capture probe as described above;
- sequencing the captured RNAs or a cDNA reverse transcribed from the captured RNAs;
- preprocessing the thereby obtained sequences to identify the capture probe oligonucleotide sequence within the reads;
- reconstructing and identifying single-stranded and potential double-stranded RNA transcripts;
- mapping for sequences before and after the capture probe oligonucleotide sequence; and
- detecting transcript overlap in opposite strands, thereby identifying double stranded RNAs.
12. The method according to any one of claims 1 to 11, wherein the double stranded RNA is a natural double stranded RNA (ndsRNA).
13. A ndsRNA identified according to the method of any one of claims 1 to 12.
14. A ndsRNA selected from ndsRNA-2a and ndsRNA-2e, the sequence of which is shown in SEQ ID NO: 1 and SEQ ID NO: 2, respectively.
15. A method for identifying a marker associated with a phenotype or cell function, or for identifying a target for the treatment of a disease or for identifying a biomarker indicative of a disease or condition, wherein the method comprises identifying the presence or absence of one or more ndsRNAs, or determining a change in the expression profile of one or more ndsRNAs in a sample of interest by implementing the method of any one of claims 1 to 12, thereby associating the ndsRNAs identified to a phenotype, a cell function, or a disease, wherein in a particular embodiment, the sample of interest is from a cancer cell and ndsRNA expression profiling allows the identification of biomarkers indicative of this cancer or the identification of one or more ndsRNA which could be the target of a treatment of said cancer.
16. A method for the functional characterization of a ndsRNA, wherein
- said ndsRNA is introduced or depleted in a cell, tissue, organ or organism and a phenotypic or functional change occurring after said introduction or depletion is determined, for example a modification of the cell cycle, the cell shape, induction of apoptosis, induction of cell differentiation and induction of a sensitivity or resistance to a therapeutic molecule; or
- the binding partners, in particular protein binding partners, of said ndsRNA are identified.
17. An oligonucleotide capture probe capable of ligating exclusively double stranded RNA molecules;
wherein the capture probe comprises, from 5' to 3', a first, a second and a third nucleic acid sequences, wherein a) said sequences allow formation of a hairpin with the first and third nucleic acid sequences being complementary one to the other and forming the stem of the hairpin; and b) the second nucleic acid sequence forms the loop of the hairpin and is labeled, such as with one or more biotinylated nucleotides, so it can be purified using label-specific purification means.
18. The oligonucleotide capture probe according to claim 17, wherein the capture probe is of the formula
5' Phos/rGrGrArC/BiotdT/rArCrGrGrUrArA 3'.
PCT/EP2015/073949 2014-10-16 2015-10-15 Method of capturing and identifying novel rnas WO2016059187A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP14306648 2014-10-16
EP14306648.8 2014-10-16

Publications (1)

Publication Number Publication Date
WO2016059187A1 true WO2016059187A1 (en) 2016-04-21

Family

ID=51799063

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2015/073949 WO2016059187A1 (en) 2014-10-16 2015-10-15 Method of capturing and identifying novel rnas

Country Status (1)

Country Link
WO (1) WO2016059187A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3784798A4 (en) * 2018-05-23 2022-01-12 Pacific Biosciences Of California, Inc. Enrichment of dna comprising target sequence of interest

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5945312A (en) * 1996-04-15 1999-08-31 University Of Southern California Synthesis of fluorophore-labeled DNA
WO2000001846A2 (en) * 1998-07-03 2000-01-13 Devgen N.V. Characterisation of gene function using double stranded rna inhibition
WO2009072972A1 (en) * 2007-12-03 2009-06-11 Karolinska Institutet Innovations Ab A method for enzymatic joining of a dsrna adapter to a dsrna molecule
EP2388325A1 (en) * 2010-05-20 2011-11-23 RLP AgroScience GmbH Method for isolating small RNA-molecules using HC-Pro protein
US20130323725A1 (en) * 2012-06-01 2013-12-05 Agilent Technologies, Inc. Target enrichment and labeling for multi-kilobase dna

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MITCHELL GUTTMAN ET AL: "Modular regulatory principles of large non-coding RNAs", NATURE, vol. 482, no. 7385, 15 February 2012 (2012-02-15), pages 339 - 346, XP055152365, ISSN: 0028-0836, DOI: 10.1038/nature10887 *
TOSHIAKI WATANABE ET AL: "Endogenous siRNAs from naturally formed dsRNAs regulate transcripts in mouse oocytes", NATURE, vol. 453, no. 7194, 10 April 2008 (2008-04-10), pages 539 - 543, XP055153025, ISSN: 0028-0836, DOI: 10.1038/nature06908 *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15783981

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15783981

Country of ref document: EP

Kind code of ref document: A1