WO2015181397A1 - METHOD OF SEQUENCING AND IDENTIFYING RNAs - Google Patents

METHOD OF SEQUENCING AND IDENTIFYING RNAs

Info

Publication number
WO2015181397A1
Authority
WO
WIPO (PCT)
Prior art keywords
rna
rnas
ndsrna
sample
cell
Prior art date
Application number
PCT/EP2015/062179
Other languages
French (fr)
Inventor
Hinrich Gronemeyer
Maximiliano PORTAL
Original Assignee
Universite De Strasbourg
Centre National De La Recherche Scientifique
INSERM (Institut National de la Santé et de la Recherche Médicale)
Priority date
Filing date
Publication date
Application filed by Universite De Strasbourg, Centre National De La Recherche Scientifique, INSERM (Institut National de la Santé et de la Recherche Médicale)
Publication of WO2015181397A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/178: Oligonucleotides characterized by their use miRNA, siRNA or ncRNA

Definitions

  • the present invention relates to a method for sequencing and identifying novel classes of RNAs from a sample. More particularly, we herein describe the discovery and characterization of natural double-stranded RNAs (ndsRNAs).
  • Mercer et al. 4 describe a method for targeted sequencing of the human transcriptome that reveals its deep complexity.
  • Ng et al. 31 describe the targeted capture and massive parallel sequencing of human exomes.
  • An object of the invention is a method of sequencing RNA molecules in a sample comprising: a) capturing RNAs with a bait nucleic acid or nucleic acid set;
  • ndsRNAs represent another object of the present invention.
  • Another object of the invention relates to a method of identifying or characterizing a ndsRNA in a RNA sample, comprising implementing the above method of sequencing on the RNAs contained in said sample, thereby identifying or characterizing ndsRNAs contained therein.
  • A further object of the invention relates to a method for identifying a marker associated with a phenotype or cell function, or for identifying a target for the treatment of a disease or for identifying a biomarker indicative of a disease or condition.
  • This method comprises the identification of ndsRNAs, or determining the expression profile of ndsRNAs in a sample of interest, thereby associating the ndsRNAs identified to a phenotype, cell function, or disease.
  • the sample of interest may correspond to a cancer cell and ndsRNA expression profiling allows the identification of biomarkers indicative of this cancer or the identification of a ndsRNA which could be the target of a treatment (for example by increasing or decreasing the expression of said ndsRNA).
  • Figure 1 identification of RAM-derived RNAs.
  • GAPDH is depicted as an RNAse treatment control. (f) stsRT-qPCR of Class III and IV long RNAs and small RNAs in PLB985-total RNA from whole cell (Input), nuclei (Nuclei) or cytoplasmic extracts (Cytoplasm).
  • let-7c and U49 snoRNA are cytoplasmic and nuclear controls, respectively. Opposite bars correspond to matching sense/antisense pairs.
  • Histogram represents mean RNA levels as arbitrary units (A.U.) +/- SD of one representative experiment out of three independent biological replicates. (g) stsRT-qPCRs of RAM-derived Class III (from left to right: ndsRNA3, 4, 5, 6, 8 and 9), IV long (from left to right: ndsRNA1 and 2) and small (from left to right class III: sRNA 5, 6, 7, 8, 10 and 11 and IV: sRNA 1, 2, 3 and 4) RNAs in BJELR cells. Opposite bars correspond to matching sense/antisense pairs. RNA levels are shown as arbitrary units (A.U.) +/- SD. Genome positions are shown in Table 1.
  • Figure 2 nds-2a establishes specific RNA-protein interactions and displays cell cycle-dependent subcellular localization.
  • (a) Electrophoretic mobility shift assay performed with radioactively labelled (*) nds-2a/2e incubated with BJELR nuclear extract (N.E). Specificity was confirmed by competition with non-radioactive nds-2a/2e.
  • (b) Gene ontology enrichment analysis performed for nds-2a associated proteins identified by mass spectrometry,
  • Immobilized nds-2a/2e were incubated with N.E.
  • RNA levels as arbitrary units (A.U.) +/- SD of one experiment out of three biological replicates
  • Immobilized nds-2a was incubated with RAN, RCC1, RANGAP1 or RANBP2 siRNA-depleted nuclear extracts and the nds-2a protein bound fraction was analysed for RAN, RCC1, RANGAP1 and RANBP2 proteins by Western blot.
  • N.E. depletion was controlled by Western blot and is depicted as input.
  • Interphase BJ cells were processed for nds-2a RNA-FISH (red channel) coupled to immunocytochemistry for RAN, RANGAP1 and RANBP2 (green channel) and analysed by confocal microscopy.
  • α-Tubulin (α-Tub) and DAPI staining are shown to delineate the cell.
  • Merge and enlarged images are depicted. (g) nds-2a and RAN, RANGAP1 and RANBP2 localization was analyzed by confocal microscopy in metaphase BJ cells.
  • DAPI staining delineates the metaphase plate.
  • Merge and enlarged images are depicted. (h) Interphase BJ cells were processed for nds-2a RNA-FISH to detect either nds-2a forward (2a Fw, red channel) or reverse (2a Rv, green channel) strands.
  • α-Tubulin (α-Tub) and DAPI staining are shown to delineate the cell.
  • Merge images are depicted. (i) nds-2a and α-Tubulin (α-Tub) localization was analyzed by confocal microscopy in metaphase BJ cells. Merge and enlarged images (Inset) are depicted.
  • Figure 3 nds-2a overexpression leads to a range of mitotic defects and pronounced changes in nuclear shape. (a) Diagram of pBI-nds-2a plasmid and overexpressed SP6-tagged nds-2a (AB and CD). (b) RNAse ONE protection assays followed by stsRT-PCR of overexpressed SP6-tagged ndsRNA-2a.
  • GAPDH is depicted as RNAse treatment control. (c-f) HeLa cells overexpressing nds-2a variants (AB or CD) were processed for immunocytochemistry against α-Tubulin (α-Tub, red channel), counterstained with DAPI (blue channel) to delineate the cellular contour and analyzed by confocal microscopy. Empty plasmid was used as control. Histone H1-GFP (H1-GFP, green channel) expressing plasmid was co-transfected to monitor transfection efficiency (Fig. 13a). The number of bi/multinucleated cells, number of chromatin bridges and cells displaying abnormal nuclear shape were determined by a double-blind analysis.
  • Figure 4 Genome-wide expression of ndsRNAs.
  • (a) Pie chart representing percentages of ndsRNAs mapping to exons, introns or intergenic regions
  • (b) Pie chart representing ndsRNAs compared to fRNAdb annotated features. Different RNA species contained in fRNAdb are color-coded
  • (c) Retinoic acid-modulated ndsRNA and nds-derived small RNA in PLB985 cells are depicted
  • (d) RNA levels of forward (Fw) and reverse (Rv) strands of modulated ndsRNAs from RA or vehicle-treated cells were determined by stsRT-qPCR. ICAM1 and PRC mRNAs are shown as controls.
  • Results are expressed as arbitrary units (A.U.) +/- SD of one out of two biological replicas. Genome positions are shown in Table 2.
  • Figure 5 Identification and validation of RAM-derived RNAs.
  • (a) RNA capture approach,
  • (b) Pie chart representing the relative distribution of Class I-IV RAM-derived transcripts,
  • (c-f ) Schematic representation and validation of single-stranded Class I-II and natural double-stranded (ndsRNAs) Class III-IV long RNAs, as well as small RNAs from Class II-IV transcripts by stsRT-PCR.
  • Figure 6 RAM-derived RNAs in PLB985 and BJELR cells. (a-b) RNAse ONE or RNAse III protection assays followed by stsRT-PCR of Class III-IV RNA duplexes and Class I-II transcripts.
  • GAPDH is depicted as an RNAse treatment control. (c) Detection by stsRT-PCR of RAM-derived Class II-IV long and small RNAs in RAS-transformed foreskin fibroblast (BJELR).
  • Figure 7 RAM-derived small RNAs are neither products of pervasive transcription nor byproducts of the canonical miRNA machinery. (a) Determination of Drosha, Dicer, AGO2 and EXOSC3 knock-down efficiency by RT-qPCR in BJELR cells. mRNA levels are expressed relative to untreated mock samples. Histogram represents mean RNA levels as arbitrary units (A.U.) +/- SD of one experiment out of three biological replicates. (b) Determination of Drosha, Dicer, AGO2 and EXOSC3 knock-down efficiency by Western blot. Actin levels are shown as loading control. (c) Analysis of the impact of Drosha, Dicer, AGO2 and EXOSC3 knock-down on the levels of previously reported miRNAs.
  • (d-g) Quantification of Class II and nds-derived sRNAs as well as miR-93 control upon Drosha (d), Dicer (e), AGO2 (f) or EXOSC3 (g) knock down in BJELR cells.
  • The expression level for each sRNA in scramble siRNAs-transfected (scr) samples was arbitrarily set to 1. Values upon knock down are expressed relative to the scr. Histogram represents mean RNA levels as arbitrary units (A.U.) +/- SD of one experiment out of three biological replicates.
  • Figure 8 RAM-derived small RNAs are loaded into AGO2. (a-b) Silver staining and Western blot for argonaute-2 (AGO2) in whole cell lysates (input) and AGO2-immunoprecipitated material (AGO2 IP). AGO2, heavy (hc) and light (lc) chains of the IP-antibody are indicated. (c-d) Detection by stsRT-qPCR of Class II and nds-derived sRNAs in RNA extracted from whole cell lysates (input) or AGO2-immunoprecipitated material from PLB985 cells. let-7c and U6 snoRNA are included as AGO2-loaded positive and negative controls. Histogram represents mean RNA levels as arbitrary units (A.U.) +/- SD of one experiment out of two biological replicates.
  • Figure 9 RAM-derived long RNAs are neither modulated upon exosome depletion nor by the miRNA pathway. (a-d) Quantification of Class II (RNA1: B2, RNA2: B3, RNA3: B4, RNA4: B6, RNA5: B7 and RNA6: B8), Class III (ndsRNA3: C1/C2, ndsRNA4: C3/C4, ndsRNA5: C5/C6, ndsRNA6: C7/C8, ndsRNA7: C9/C10, ndsRNA8: C11/C12, ndsRNA9: C13/C14) and Class IV (ndsRNA1: D1/D2 and ndsRNA2: D3/D4) long RNA precursors upon Drosha (a), Dicer (b), AGO2 (c) or EXOSC3 (d) knock down in BJELR cells.
  • RNA levels were arbitrarily set to 1. Values upon specific knock-down are expressed relative to the scr sample. Histogram represents mean RNA levels as arbitrary units (A.U.) +/- SD of one experiment out of three biological replicates.
  • Figure 10 ndsRNAs establish specific protein interactions. (a) Nuclear proteins interacting with nds-2a are revealed by SDS-PAGE followed by silver staining. (b-c) Gene ontology enrichment analysis of nds-2a and nds-2e interacting proteins identified by mass spectrometry. (d) Peptide coverage of nds-2a interacting RAN, RCC1, RANGAP1 and RANBP2 identified by mass spectrometry.
  • Figure 11 nds-2a binds RAN and RCC1 in vitro and in vivo
  • (a) The specificity of RAN, RCC1, RANGAP1 and RANBP2 antibodies was tested by a siRNA-based approach. α-Tubulin (α-Tub) is shown as a loading control
  • (b) RAN, RCC1, RANGAP1 and RANBP2 proteins were immunoprecipitated with specific antibodies and their corresponding levels were analysed by Western blot to determine immunoprecipitation efficacy
  • BJELR cells were sorted according to cell cycle phases (G1, S and G2-M) by flow cytometry. Panels show a normal cell cycle and the enrichment of the FACS sorted populations in the indicated phases of the cell cycle. (e) Sorted cells from (d) were analyzed for nds-2a forward (2a Fw) and reverse (2a Rv) strand levels by stsRT-qPCR. Histogram represents mean RNA levels as arbitrary units (A.U.) +/- SD of one experiment out of two biological replicates.
  • Figure 12 nds-2a localization in interphase cells. (a) Interphase BJ cells were processed for nds-2a RNA-FISH to detect either nds-2a forward (2a Fw, red channel) or reverse (2a Rv, green channel) strands. α-Tubulin (α-Tub) and DAPI staining are shown to delineate the cell. Merge images are depicted.
  • Figure 13 Cell cycle profile of nds-2a transfected cells. (a) HeLa cells were cotransfected with the indicated plasmids and transfection efficiency was calculated as the percentage of cells displaying a positive labelling for GFP (upper right). Non-transfected cells were used as an autofluorescence control. (b) Cell cycle profile of HeLa cells transfected with the indicated plasmid pairs. The percentage of cells in each phase of the cell cycle is depicted.
  • Figure 14 Quality controls of global PLB985-derived stsRNA-Seq libraries. (a-b) Intensity correlation analysis of technical replicates at 20 nt resolution of long fragmented RNAs (50-70 nt) and naturally occurring small RNAs (18-30 nt) from global stsRNA-Seq libraries. Results are displayed as log2 of original values.
  • Pearson's correlation coefficient values are shown. (c-e) Screenshots of long stsRNA-Seq showing the profile obtained for known genes (HSP90B1 in forward (Fw) strand and C12orf73 in reverse (Rv) strand), lincRNAs (chr6:141071891-141249602; Rv strand) and several SNAR RNA precursors and their corresponding small RNAs (SNAR-A3; Rv strand; small stsRNA-Seq).
  • Figure 15 Modulation of ndsRNA levels by Retinoic Acid
  • (a) Flow cytometry analysis of PLB985 cells treated either with vehicle (ETOH) or retinoic acid (RA). Percentage of differentiated cells was determined by CD11c/CD14 immunolabelling. Fluorescent background signal was assessed by labeling with fluorescently labeled non-specific isotypic antibodies
  • (b) Representation of the RNA levels of forward (Fw) and reverse (Rv) strands of randomly selected ndsRNAs transcripts in PLB985 samples treated with RA or vehicle determined by stsRT-qPCR. Histogram represents mean RNA levels as arbitrary units (A.U.) +/- SD of one experiment out of two biological replicates. Genome positions for the displayed examples are shown in Table 2.
  • Figure 16 Schematic overview of a particular embodiment of the invention, representing its modular workflow.
  • Figure 17 Experimental workflow and timing of a particular embodiment of the method of the invention.
  • the invention relates to a method of sequencing RNA molecules in a sample, the method comprising the steps of:
  • a) capturing RNAs from a sample with a bait nucleic acid or nucleic acid set
  • The proposed method allows the concomitant identification of long (i.e. more than 50 nucleotides) and short (i.e. 18 to 30 nucleotides) RNAs using a single experimental protocol. This method also allows identifying and characterizing a new class of RNAs, i.e. ndsRNAs.

Generation of capture sequences

  • the method of the present invention is based on the direct hybridization of DNA traps, in particular randomly generated DNA traps such as (5')-biotinylated randomly generated DNA traps generated from a template of interest, such as one or more Bacterial Artificial Chromosomes (BAC) covering the full extent of a genomic region of interest, with either the naturally occurring small RNA fraction (such as miRNA, endosiRNAs, piRNAs, etc.) and/or the total RNA repertoire of a sample of interest.
  • the bait nucleic acid or nucleic acid set may thus be specific of a region of interest.
  • Regions of interest may be selected on the basis of previously collected biological/clinical data such as the presence of single nucleotide polymorphisms associated with one or several diseases, translocation/amplification/integration hot-spots or a known phenotype.
  • the region of interest may also correspond to mitochondrial DNA, virus-specific (DNA or RNA) genome or pathogen genome.
  • the genomic region of interest may also correspond to non-coding DNA.
  • the template used for generating the bait(s) contains or consists of a genomic DNA region of interest fragmented to a controlled size, for example into fragments 250-300 nucleotides long, by any fragmentation means available to those skilled in the art, for example by mechanical fragmentation means. The fragments are then collected, in particular after separation by electrophoresis and purification by gel excision, for instance.
  • the bait is derived from a DNA template carrying a genomic DNA region of interest such as a BAC, a PAC, a cosmid or a mini-chromosome.
  • the genomic region of interest may be covered by a subset of contiguous BAC DNAs that fully cover the selected genomic region of interest.
  • suitable BACs may be identified by using the UCSC browser (http://genome.ucsc.edu) 32.
  • baits may be obtained from single gene cDNAs or cDNA libraries in order to enhance coverage of the selected open reading frames to monitor gene mutations, differential promoter usage or identify novel isoforms.
  • Baits may be obtained from enzymatic or chemical fragmentation of the template DNA or, preferably, from polymerization reactions from the template using random primers, in particular using as a template one or more BACs.
  • random primers are used that are modified so that they can be easily purified.
  • the random primers are biotinylated primers. Thanks to this specific embodiment, the resulting baits are biotinylated DNA fragments that may be purified, either alone or as a complex with a complementary RNA by implementing the biotin/streptavidin specific interaction.
  • the step of generating capture sequences comprises the production of randomly generated 5'-biotinylated DNA traps from BAC DNAs covering the genome region of interest whose transcripts are to be captured. Polymerization is performed using a polymerase, for example Klenow polymerase, for strand extension from the random primers. Bait size is controlled by the length of the template used, preferably 250-300 nucleotides long as described above, which will generate baits no longer than the template used. To avoid undesired capture events, template DNA of bacterial origin, e.g. BAC DNA, may be removed using a restriction enzyme, such as the DpnI enzyme, that selectively cleaves the template DNA and leaves the generated baits intact.
  • the step of generating capture sequences comprises:
  • the test RNA (i.e. the RNA from which the capture is to be performed) may be obtained from any sample of interest.
  • the sample of interest may be a cell, a cell culture or a tissue from a subject, for example an animal subject, in particular a mammal or non-mammal animal, in particular from a human subject.
  • the sample may be a cell or tissue sample from a patient in a diseased state.
  • the present invention permits the identification and characterization of ndsRNAs involved in the disease or indicative of the pathology.
  • RNAs are isolated using methods well known in the art (Molecular Cloning: A Laboratory Manual, Third Edition. J. Sambrook, D. Russell). In particular, kits are readily available to those skilled in the art for performing total RNA extraction from a cell or tissue sample.
  • RNAs are extracted and purified using methods well known in the art.
  • RNAs are purified using Trizol, as is well known to those skilled in molecular biology.
  • ribosomal RNA may be depleted from the RNA extract such that the RNA transcripts are enriched regardless of their polyadenylation status or the presence of a 5'-cap structure (in Figures 16 and 17, the term Ribominus is used for generally describing this ribosomal RNA depletion step).
  • Commercially available kits for depleting ribosomal RNA include the RiboMinus kit from Invitrogen.
  • single-stranded RNA species may be depleted by using an RNase that only degrades single-stranded RNAs, such as RNase ONE® from Promega.
  • the RNA extract may be assessed to determine whether it is free from genomic DNA. For example, quantitative PCR amplification of a short region (e.g. of about 100 bp) from a single copy gene may be implemented.
  • the topoisomerase (DNA) III Alpha (TOP3A) single copy gene may be assessed with the forward primer of SEQ ID NO:8 (5'-TCATCTGTATGGCCAGGTAGG-3') and the reverse primer of SEQ ID NO:9 (5'-GGAACCTTTAGGTTGTTAACAGTTG-3').
  • the RNA extract may be treated with a DNAse, such as the TURBO DNAse, followed by a new RNA extraction as described above, such as a Trizol extraction.
  • the captured RNAs correspond to a small RNA fraction of RNAs 18 to 30 nucleotide long.
  • long RNAs are captured, preferably after fragmentation (e.g. chemical or enzymatic fragmentation) to 50-80 nucleotide long RNAs, such as 50-70 or 60-80 nucleotide long RNAs.
  • fragmented long RNAs are 50-70 nucleotide long. Fragmentation of long RNAs can be carried out with divalent cation-based cleavage such as zinc-mediated RNA fragmentation, for example with a zinc based RNA fragmentation reagent such as "RNA fragmentation reagent®" from Ambion.
  • Both small and fragmented long RNA fractions may be purified according to methods well known in the art. In an embodiment, these fractions are purified after migration on denaturing polyacrylamide gel electrophoresis (e.g. PAGE-urea).
  • both small RNAs and long fragmented RNAs are captured in a separate or simultaneous reaction, preferably in separate reactions. In a specific embodiment, the small and long fragmented RNA fractions are captured in separate reactions.
  • RNA-capture protocols described to date require conversion of the input RNA material into a pre-amplified RNA-Seq library (first, second strand cDNA synthesis and PCR) prior to capture.
  • the method of the present invention streams its RNA input directly onto the DNA traps, in particular the 5'-biotinylated DNA traps, providing several methodological advantages.
  • the RNA isolation step comprises:
  • i. gel-purifying a size-selected population of small RNAs, 18-30 nucleotides long, from a sample;
  • ii. depleting ribosomal RNA from the RNA sample;
  • RNA molecules in particular the small fraction RNAs and the fragmented long RNAs described above, are incubated, either together or independently, with the baits generated from the template of interest under conditions allowing hybridization of the RNAs with the baits.
  • a small RNA fraction (18-30 nucleotide long) and a zinc-mediated fragmented gel-purified (50-80, such as 50-70 or 60-80 nucleotide long), ribosomal RNA-depleted, RNA fraction are independently mixed with 5'-biotinylated DNA traps for in-solution hybridization/RNA capture.
  • Capturing RNA molecules using DNA traps bears a kinetic advantage by favoring DNA/RNA hybrid formation over DNA/DNA or RNA/RNA hybridization.
  • high hybridization temperatures and stringent washing conditions may be used to increase capture-specificity and reduce off-target hybridization.
  • one of skill in the art will adapt these conditions, in particular hybridization temperature, depending on RNA length or when a known bias in nucleotide content of the targeted RNA molecule is considered a major caveat.
  • RNAs i.e. RNAs forming a complex with baits
  • the baits are labeled random primers
  • the bait/RNA complexes are recovered implementing a selective recognition of said label.
  • bait/RNA complexes are recovered using a support grafted with streptavidin such as streptavidin (magnetic) beads, columns, etc.
  • RNAs may be captured using nucleic acid (e.g. DNA) arrays. Non-specifically bound material may be removed by washing the support, such as by sequential washings of increased stringency. Bound RNA is then eluted from the support.
  • dot-blot assays may be performed. Briefly, the material that was used as template (such as BAC DNA) for the generation of DNA traps (e.g. 5'-biotinylated DNA traps) may be spotted onto a nitrocellulose membrane. Captured RNAs are then radiolabeled and used as probes for dot-blot hybridization. Spotting in parallel BAC DNAs covering a non-related genomic region and which expresses known RNA targets may serve as negative control. This approach can be used by one skilled in the art to optimize hybridization temperatures and washing conditions during the initial experimental setup. The capture step implemented herein is used for capturing RNA molecules from a RNA extract.
  • the method of the invention allows the capture of both small and long species of RNAs which allows the determination of precursor/product reactions from a long RNA into a biologically processed small RNA.
  • long ndsRNAs co-exist with shorter ndsRNAs which are identical in sequence with a part of the longer ndsRNAs. Accordingly, it is believed that, in a precursor/product point of view, the small ndsRNAs are generated from the long ndsRNAs.
  • the present invention relies on the implementation of strand specific RNA sequencing.
  • Strand specific RNA sequencing allows retrieval of the polarity of the transcript which in turn allows identifying double stranded RNA, if any, in an RNA sample.
  • potential ndsRNAs are identified when transcript overlaps in opposite strands are detected.
  • RNA sample for further sequencing may include:
  • optionally, purifying (e.g. on a PAGE/urea device) the 3'-ligated RNAs of step (1);
  • the method of the invention implements the following steps for preparing a RNA sample for further sequencing:
  • RNAs are dephosphorylated using a phosphatase (e.g. Antarctic phosphatase);
  • RNA ligase such as T4 RNA ligase
  • RNA molecules are 5'-phosphorylated
  • RNA molecules are purified, for example by size separating them with denaturing gel electrophoresis and recovery by gel excision according to their expected size followed by RNA precipitation;
  • RNAs purified in the preceding step is ligated to the RNAs purified in the preceding step.
  • the 5'- and 3'-ligated RNA molecules may be purified by denaturing gel electrophoresis and recovered by gel excision according to their expected size.
  • kits available for preparing an RNA with 3'- and 5'-adapters include the DGE small RNA library kit from Illumina.
  • the 3'-adapter is an RNA adapter, in particular that having the sequence shown in SEQ ID NO: 3:
  • 3ddC denotes 3'-dideoxycytidine.
  • the 5'-adapter is an RNA adapter, in particular that having the sequence shown in SEQ ID NO: 4:
  • 5InvddT denotes dideoxythymidine covalently linked to the remainder of the adaptor in an inverted orientation.
  • RNA oligonucleotide of known sequence bearing the described terminal modifications 3ddC for the 3'-adaptor; 5InvddT for the 5'-adaptor
  • the RNAs are then reverse transcribed with specific primers, for example with primers specific of the 3' and 5' adapters and the obtained cDNA may be amplified by PCR.
  • the invention may implement ligation-mediated reverse transcription followed by PCR amplification.
  • multiplexed library preparation can be implemented by using indexed primers during a PCR-based library amplification, allowing multiple samples to be sequenced in parallel in a single sequencing run.
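  • As an illustration of how reads from a multiplexed run might be assigned back to their samples, the following is a minimal sketch. The index sequences, sample names and the `demultiplex` helper are hypothetical placeholders and are not taken from the present description; only exact index matches are considered.

```python
# Minimal demultiplexing sketch. The index sequences and sample names below
# are hypothetical placeholders, not the indexed primers of the description.
INDEX_TO_SAMPLE = {"ACGTAC": "sample_A", "TGCATG": "sample_B"}


def demultiplex(reads, index_to_sample):
    """Group (index_seq, read_seq) pairs by sample using exact index matches.

    Reads whose index is not in index_to_sample go to "undetermined".
    """
    bins = {sample: [] for sample in index_to_sample.values()}
    bins["undetermined"] = []
    for index_seq, read_seq in reads:
        sample = index_to_sample.get(index_seq, "undetermined")
        bins[sample].append(read_seq)
    return bins


example_reads = [
    ("ACGTAC", "read1"),
    ("TGCATG", "read2"),
    ("ACGTAC", "read3"),
    ("GGGGGG", "read4"),  # unknown index
]
example_bins = demultiplex(example_reads, INDEX_TO_SAMPLE)
```

  • Exact matching is the simplest choice; tolerating one mismatch in the index would be a possible refinement when index sequences are sufficiently distinct.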
  • a reverse transcription primer is used, having the sequence shown in SEQ ID NO:5: 5'-CAAGCAGAAGACGGCATACGA-3'.
  • the primers used for PCR amplification implemented after reverse transcription are those shown in SEQ ID NO: 6 (5'-
  • Small and long RNA sequences may first be preprocessed with a computer program to remove the 5'- and/or 3'-adapter sequence, in particular the 3'-adapter from the reads and, then, adaptor free datasets may be aligned to the genomic region of interest. Alignment may be done using a computer program such as the Bowtie Aligner program (Langmead et al. "Ultrafast and memory efficient alignment of short DNA sequences to the human genome". Genome Biology, 2009). Since transcript polarity is conserved during the construction of the RNA-Sequencing library, the retrieved sequencing reads are 5'-3' oriented. Therefore, single and potential double stranded RNA transcripts can be reconstructed and identified.
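  • The 3'-adapter removal step could be sketched as follows. This is a simplified illustration assuming exact matching only; the adapter sequence is an arbitrary placeholder (not SEQ ID NO: 3) and `trim_3p_adapter` is a hypothetical helper, not the program actually used.

```python
# Sketch of 3'-adapter removal by exact matching. ADAPTER is an arbitrary
# placeholder sequence, not the 3'-adapter of SEQ ID NO: 3.
ADAPTER = "AAACCCGGGTTT"


def trim_3p_adapter(read, adapter, min_overlap=5):
    """Remove a 3'-adapter, or a partial adapter at the read end, from a read."""
    # Case 1: the complete adapter occurs in the read; keep what precedes it.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends with the first k bases of the adapter
    # (the sequencing cycle count ran out inside the adapter).
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[: len(read) - k]
    # No adapter detected: return the read unchanged.
    return read


insert = "ACGTACGTAGCTAGCTAGGA"
full = trim_3p_adapter(insert + ADAPTER, ADAPTER)
partial = trim_3p_adapter(insert + ADAPTER[:7], ADAPTER)
untouched = trim_3p_adapter(insert, ADAPTER)
```

  • The `min_overlap` threshold is an assumption of this sketch: it avoids trimming reads whose last few bases match the adapter start by chance. Real sequencing data would usually also call for mismatch tolerance.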
  • Transcript overlap on opposite strands is identified by using a computer program such as BedTools (Quinlan AR and Hall IM. "BEDTools: a flexible suite of utilities for comparing genomic features". Bioinformatics, 2009) and identified regions are extracted for further analysis.
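  • The detection of transcript overlap on opposite strands can be illustrated with a small, self-contained sketch. It is not the BedTools implementation; the `Transcript` type, the coordinate convention (0-based, end-exclusive) and the example transcripts are assumptions made for illustration.

```python
from typing import List, NamedTuple, Tuple


class Transcript(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive (assumed convention)
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"
    name: str


def opposite_strand_overlaps(
    transcripts: List[Transcript], min_overlap: int = 1
) -> List[Tuple[str, int, int, str, str]]:
    """Return (chrom, start, end, plus_name, minus_name) for every pair of
    transcripts on opposite strands sharing at least min_overlap bases."""
    plus = [t for t in transcripts if t.strand == "+"]
    minus = [t for t in transcripts if t.strand == "-"]
    hits = []
    for p in plus:
        for m in minus:
            if p.chrom != m.chrom:
                continue
            start = max(p.start, m.start)
            end = min(p.end, m.end)
            if end - start >= min_overlap:
                hits.append((p.chrom, start, end, p.name, m.name))
    return sorted(hits)


example = [
    Transcript("chr1", 100, 400, "+", "fwd1"),
    Transcript("chr1", 250, 600, "-", "rev1"),
    Transcript("chr1", 700, 900, "+", "fwd2"),
    Transcript("chr2", 100, 400, "-", "rev2"),
]
candidates = opposite_strand_overlaps(example)
# candidates == [("chr1", 250, 400, "fwd1", "rev1")]
```

  • The all-against-all comparison is quadratic and only suited to a restricted region of interest; a genome-wide analysis would rely on sorted intervals or a dedicated tool.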
  • the TEQC R package may be used to assess the capture performance 33 .
  • the Tophat2/Cufflinks pipeline may be used to map and reconstruct transcripts from ribosomal RNA-depleted, Zn2+ fragmentation-generated long RNA libraries 34,35.
  • small RNA-derived reads are preferably mapped using Bowtie aligner under stringent mapping conditions (no mismatches allowed; Langmead et al., cited supra).
  • RNA libraries constructed from the small RNA fraction may be pre-processed to remove the 3'-adapter.
  • small RNA-derived data will contain sequences of variable length that can be readily mapped.
  • reads of less than 21 nucleotides may be removed from small RNA-obtained data, for increasing mapping accuracy and limiting the number of false-positive alignments.
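  • A length filter of this kind is straightforward; the following minimal sketch assumes that trimmed reads are held as plain strings and uses a hypothetical `filter_by_length` helper.

```python
def filter_by_length(reads, min_len=21):
    """Keep only reads of at least min_len nucleotides (shorter reads dropped)."""
    return [read for read in reads if len(read) >= min_len]


trimmed_reads = ["A" * 18, "C" * 21, "G" * 25]
kept_reads = filter_by_length(trimmed_reads)
```

  • With the default threshold, the 18-nucleotide read is discarded and the 21- and 25-nucleotide reads are retained.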
  • RNA-sequencing library preparation method described above can be used under many different experimental circumstances, as global libraries can be readily prepared and the data obtained for either small or long RNA datasets provide high confidence and reproducibility when mapped to a global reference. Moreover, these datasets can be readily used for quantification purposes with a high level of reproducibility and used as an alternative to commercial solutions.
  • ndsRNA identification may be achieved through the correlation of the identified regions with small RNAs present in the small RNA datasets. Hence ndsRNA definition is based on the identification of transcript overlap in opposite strands for the long RNA datasets that correlates with the presence of small RNAs within their sequence (precursor-product relationship).
  • ndsRNAs are identified using bioinformatics tools as long double-stranded RNAs. ndsRNAs differ from canonical small RNA pathways such as miRNA in that canonical small RNA pathways generate small RNAs from a single-stranded RNA precursor, whereas ndsRNAs are processed from a double-stranded RNA expressed from opposite strands.
  • RNAse ONE or RNAse III can be used to further support the double stranded nature of the ndsRNA identified.
  • the invention relates also to a ndsRNA, in particular a ndsRNA identified according to the method described above.
  • the ndsRNA is ndsRNA-2a or ndsRNA-2e.
  • the inventors have shown that ndsRNA-2a is involved in a mitosis-specific RAN containing complex, showing its potential involvement in mitosis.
  • ndsRNA-2a and ndsRNA-2e sequences are shown in SEQ ID NO:1 and SEQ ID NO:2, respectively.
  • ndsRNAs have a functional role in the cell.
  • ndsRNA-2a interacts with major mitotic components involved in fundamental aspects of cell physiology ranging from nuclear import/export to spindle assembly and mitotic progression.
  • overexpression of ndsRNA-2a leads to a range of mitotic defects and a pronounced change in nuclear shape highlighting its role in cell cycle progression.
  • ndsRNAs are modulated upon retinoic acid treatment, demonstrating that the novel RNAs are regulated by cellular cues and participate in a plethora of regulatory systems.
  • an object of the invention is also a method for identifying a marker associated with a phenotype or cell function, or for identifying a target for the treatment of a disease or for identifying a biomarker indicative of a disease or condition.
  • This method comprises the identification of ndsRNAs, or determining the expression profile of ndsRNAs in a sample of interest, thereby associating the ndsRNAs identified to a phenotype, cell function, or disease.
  • the sample of interest may correspond to a cancer cell and ndsRNA expression profiling allows the identification of biomarkers indicative of this cancer or the identification of a ndsRNA which could be the target of a treatment (for example by increasing or decreasing the expression of said ndsRNA).
  • the invention also relates to a ndsRNA that may be identified thanks to the method described in the preceding paragraph.
  • the invention further relates to a method for identifying the function of a ndsRNA, wherein said ndsRNA is either introduced or depleted in a cell, tissue, organ or organism (in particular a mammal organism, more particularly a non-human organism) and phenotypic or functional changes occurring after said introduction or depletion are determined.
  • Representative changes searched include, for example, changes in the cell cycle, cell shape, induction of apoptosis, induction of cell differentiation, induction of a sensitivity or resistance to a therapeutic molecule, etc.
  • the characterization of a ndsRNA may also comprise the identification of binding partners of said ndsRNA, in particular of binding proteins, for example using a mass spectrometry (MS) analysis, as provided in the examples.
  • the binding partners are identified using a biotinylated RNA which is incubated with a protein sample, for example a whole cell extract, a nuclear extract, a cytoplasmic extract or with a protein produced in vitro, the biotinylated ndsRNA:protein complex is then captured on a streptavidin covered support and then protein analysis is performed, in particular using a MS analysis.
  • PLB985 cells were grown in RPMI medium supplemented with 25 mM HEPES, 10% FCS and glutamine.
  • BJ and BJELR cells were grown in DMEM/M199 1:4 (1 g/l glucose) supplemented with 10% FCS.
  • HeLa cells were grown in DMEM (1 g/l glucose), 5% FCS supplemented with glutamine.
  • Total RNA was extracted using Trizol (Invitrogen) according to manufacturer instructions.
  • RNA fragmentation: Ribominus RNA (Invitrogen) was fragmented in zinc-based RNA fragmentation reagent (Ambion) for 6 min at 70°C, separated by PAGE/urea and 50-70 nt RNA was recovered by overnight elution in NaCl.
  • BACs (RP11-3O20, RP11-44018, RP11-770K21, RP11-588B17) covering the RAM region were obtained from Children's Hospital Oakland Research Institute (CHORI). Briefly, 200 ng of an equimolar mix of BAC DNA was sonicated for 37 cycles (10 sec on, 50 sec off, amplitude 30%) in a Vibra-Cell apparatus (Bioblock Scientific) in lysis buffer (50 mM HEPES pH 7.5, 140 mM NaCl, 1% Triton X-100, 0.1% Na-deoxycholate, supplemented with protease inhibitors). Sonicated BAC DNAs were size-separated by agarose gel electrophoresis and the 200-300 bp band was purified by QIAquick column (Qiagen).
  • Traps: Traps were generated by using MEGAPrime random primer labelling system (Amersham) with 250 ng of BAC DNA library and 5' biotin-random primers. After Klenow extension, BAC DNA was removed from the traps by Dpn I treatment.
  • RNA capture: Specific RAM-region traps were generated by random priming of 200-300 bp purified sonicated BACs (RP11-44018, RP11-588B17, RP11-770K21 and RP11-3O20) using 5'-biotinylated primers.
  • Long RNA fraction was prepared by chemical fragmentation of total RNA (50-70 nt, Ambion) and further purified by PAGE, whereas the small RNA fraction (18-30 nt) was directly prepared by PAGE.
  • RNAs were incubated with the traps in Binding buffer (0.5 M NaCl, 0.01 M Tris-HCl pH 7.5, 0.5% SDS, 0.1 mM EDTA) at 62°C (small RNAs) or 68°C (long RNAs) overnight. RNA-bound traps were then recovered using magnetic streptavidin beads, and the RNA was eluted into Elution buffer (0.01 M Tris-HCl pH 7.5, 1 mM EDTA) and further purified by PAGE according to the previous size selection.
  • RNAs were dephosphorylated with 5 units of Antarctic phosphatase (NEB), separated by PAGE/Urea gel electrophoresis and purified by gel excision according to prior size selection.
  • 3' RNA Adapter (Illumina) was ligated to purified fragments with T4 RNA ligase during 6 h at 20°C followed by an overnight incubation at 4°C. Ligated fragments were size separated by PAGE/Urea and purified according to prior size selection.
  • 3'-ligated RNAs were further 5'-phosphorylated by PNK treatment for 1 h at 37°C, size selected by PAGE/Urea and purified.
  • 5' RNA Adapter (Illumina) was ligated to 3'-ligated RNA fragments with T4 RNA ligase for 6 h at 20°C and further incubated overnight at 4°C. 5'-3' adapter-ligated RNA was separated by PAGE/Urea and purified between 70-90 nt for the small RNA fraction and between 100-130 nt for the long RNA fraction. Reverse transcription was performed by using Superscript II (Invitrogen) with specific primers for 1 h at 44°C. Final amplification was performed by 15 cycles of PCR amplification using Phusion DNA polymerase (Finnzymes). Library quality and ligation steps were assessed when possible by Agilent Bioanalyzer.
  • RNA-Seq: Strand-specific RNA-Seq data were analyzed by custom scripts to remove low-quality reads, reads shorter than 21 nt in the case of small RNA libraries and shorter than 35 nt for long RNA libraries, adapter contamination and empty reads.
  • Intensity correlation analysis was performed at 20 nt resolution with a custom pipeline for the RAM region and for global analysis.
  • a second correlation analysis was performed by a binary analysis indicating whether transcripts were present or not in a determined window by using a custom pipeline.
  • Class I-IV transcripts were validated by strand-specific reverse transcription followed by PCR. Specific primers were designed with T7 promoter overhanging bases in order to establish reverse transcription orientation. Small RNA determination was assessed with custom Taqman primers (Applied Biosystems) and relative expression levels were determined by reverse transcription followed by real time PCR.
  • 5x10^6 PLB cells were centrifuged at 700 rpm for 7 min, washed twice with ice-cold PBS and lysed with microRNA isolation kit, human AGO2 (Wako chemicals) lysis buffer and immunoprecipitation was performed following manufacturer's instructions. Eluted samples were further processed for RNA purification by Trizol (Invitrogen) extraction. Small RNA determination was assessed by custom Taqman primers (Applied Biosystems). Specific primers for U6 RNA, U49 snoRNA and let-7c miRNA were used according to manufacturer's instructions (Applied Biosystems). Immunoprecipitated AGO2 levels were assessed by SDS-PAGE followed by silver staining and Western blot against AGO2.
  • Transient transfection in BJELR cells was done following standard reverse transfection protocols using lipofectamine RNAiMAX (Invitrogen).
  • ON-target plus smart pools for knocking down Dicer 1 (L-003483-00), Drosha (L-016996-00), AGO2 (L-004639-00) and EXOSC3 (L-03195501) as well as scramble negative control (D-001210-01-05) were purchased from Dharmacon and used at a final concentration of 10 mM. Samples were collected 72 h post-transfection. Knock-down efficiency was controlled by RT-qPCR (using customized primers, sequences are available upon request) and Western blot assays.
  • BJELR cells (80% confluence) were rinsed twice with ice-cold PBS, collected in PBS and recovered in 1x hypotonic buffer (Cellytic Nuclear Xtract - Sigma) supplemented with 1 mM DTT and protease inhibitors, incubated 15 minutes on ice, vortexed in the presence of Igepal, spun down at 11000 rpm and the supernatant conserved as the Cytoplasmic fraction. Immediately after, the pellet was resuspended in Extraction buffer (Cellytic Nuclear Xtract) supplemented with 1 mM DTT and protease inhibitors and incubated 30 minutes in a thermomixer at 1400 rpm at 8°C (Nuclear fraction). All obtained fractions were aliquoted, flash-frozen and stored at -80°C.
  • ndsRNA electrophoretic mobility shift assay (EMSA)
  • ndsRNA-2a and ndsRNA-2e sequences were cloned into pGEM-T easy vector and further PCR amplified using T7 tagged oligonucleotides from both flanks (Expand High Fidelity - Roche). PCR products were in vitro transcribed (MegaScript RNAi - Ambion) and 5'-radioactively labelled by Poly Nucleotide Kinase (PNK - Promega).
  • Nuclear extract was incubated with radiolabelled ndsRNA-2a/2e (~25 fmol/reaction) in DBD buffer (10 mM Tris-HCl pH 8, 0.1 mM EDTA pH 8, 0.4 mM DTT, 5% Glycerol, tRNA, supplemented with NaCl according to experimental setup) for 15 min at room temperature prior to native PAGE. Competition was achieved by addition of non-radioactive ndsRNA-2a or ndsRNA-2e (50 fmol/200 fmol range). When purified recombinant proteins were used, 120 ng of RAN or RCC1 (Origene Technologies) were incubated with ndsRNA-2a or ndsRNA-2e as described above.
  • ndsRNA in vitro binding assay: ndsRNA-2a and ndsRNA-2e were PCR amplified, in vitro transcribed with MegaScript RNAi kit (Ambion) and 3'-biotinylated using RNA 3'-biotinylation kit (Pierce) according to the manufacturer's instructions. Biotinylated ndsRNAs were immobilized in 5 mM Tris-HCl pH 7.5, 0.5 mM EDTA and 1 M NaCl to "my ONE" streptavidin magnetic beads (Invitrogen) for 1 h at 22°C in a thermomixer prior to nuclear extract incubation.
  • Nuclear extract was prepared as described above but subjected to two rounds of pre-clearing with "my ONE" magnetic beads in 1x DBD buffer supplemented with tRNA and NaCl prior to interaction with immobilized ndsRNAs.
  • Final N.E./ndsRNA incubation was performed at 22°C in a thermomixer for 15 min and further washed 3 times with DBD buffer supplemented with NaCl and NP-40 at room temperature.
  • magnetic beads were recovered in Laemmli buffer, boiled for 10 min and separated by SDS-PAGE or eluted in 1 M NaCl prior to Liquid Chromatography followed by Mass Spectrometry (LC-MS/MS). Protein composition was evaluated by silver staining or Western blot when appropriate.
  • RNA Fluorescence in situ hybridization coupled to immunocytochemistry: ndsRNA-2a sequence was PCR amplified and in vitro transcribed with T7 RNA polymerase using Chromatide Alexa Fluor 546-14-UTP or Chromatide Alexa Fluor 488-5-UTP as a source of UTP. Fluorescently labeled forward or reverse ndsRNA-2a strands were Trizol purified and stored at -80°C. For immunofluorescence analysis, BJ and HeLa cells were grown on round coverslips and treated according to each experimental setup.
  • Cells were fixed with 3% paraformaldehyde, 4% sucrose in 10 mM PBS for 10 min, permeabilized with 0.25% Triton X-100 in 10 mM PBS for 10 min, and then blocked for 1 h in 1% BSA in 10 mM PBS (Blocking buffer). Coverslips were incubated overnight at 4 °C in blocking buffer containing RAN (#4462, Cell Signalling), RCC1 (#5134, Cell Signalling), RANGAP1 (ab92360, Abcam) or RANBP2 (ab64276, Abcam) antibodies. Cells were washed twice in 10 mM PBS, 0.1% Tween 20, and incubated with secondary antibody Alexa 488 (Molecular Probes).
  • ndsRNA-2a strand-specific probes were hybridized in hybridization buffer (2× SSC, 20% dextran sulfate, and 1 mg/mL BSA) overnight at 37 °C in a thermomixer humid chamber, washed twice with 2× SSC in 50% formamide, twice with 2× SSC, and counterstained with DAPI. Finally, coverslips were mounted in ProLong Antifade (Molecular Probes), and visualized on a confocal laser-scanning microscope SP2-MP (Leica).
  • ndsRNA-2 overexpression: pTREG-bi plasmid (Clontech) bearing an inducible bidirectional promoter was modified in order to express a 5' (termed CD) or 3' (termed AB) SP6-tagged version of ndsRNA-2a.
  • HeLa Tet-ON 3G cells were AB/CD transfected, treated with doxycycline 12 h post-transfection and collected in Trizol for RNA analysis or processed for immunocytochemistry 24 h later.
  • Overexpression and double-stranded nature of exogenous ndsRNA-2a was confirmed by reverse transcription using T7 flagged primers that recognize the SP6 tag followed by PCR on RNAse ONE treated samples.
  • HeLa cells were co-transfected with ndsRNA-2a overexpressing plasmids (AB/CD) and histone H1-GFP (AB/CD:H1-GFP 4:1 ratio) to control in-well efficiency of transfection.
  • Cells were fixed, permeabilized and stained for α-tubulin as previously described.
  • the number of cells displaying an abnormal nuclear morphology, chromatin bridges and bi/multinuclei was determined by double-blind analysis in 3 independent experiments (3000 cells counted for each condition).
  • RNA transcripts arising from unexplored regions within the genome, leading to the discovery of novel regulatory paradigms 1-5.
  • disease-associated markers such as translocations, chromosomal rearrangements and single nucleotide polymorphisms (SNPs) remains largely unexplored 6-8.
  • One of those regions encompasses ~500 kb on chromosome 8 (RAM region: 130,269,750-130,744,812), is critically involved in retinoic acid induced differentiation 9 and contains multiple disease susceptibility SNPs 10-17.
  • RNAs map on both sense and antisense strands from the RAM region.
  • sense-antisense RNA pairs coexist within the same cell and generate stable long natural double-strand RNA (ndsRNA).
  • ndsRNAs are mainly localized in the nucleus and establish specific interactions with nuclear components.
  • ndsRNA-2a interacts with the mitotic RAN/RANGAP1-SUMO1/RANBP2 complex in a RAN-dependent manner and displays differential nuclear localization throughout the cell cycle.
  • ndsRNA-2a overexpression leads to a range of mitotic defects and a pronounced change in nuclear shape highlighting its involvement in cell cycle progression.
  • global strand-specific RNA sequencing shows that ndsRNA signatures are interspersed genome-wide and reveals that ndsRNA molecules are modulated upon cellular cues. Taken together, this study reveals ndsRNAs as novel members of the natural RNA repertoire in human cells that are involved in a plethora of regulatory processes.
  • RNA capture approach (Fig. 5a) coupled to a customized strand-specific RNA sequencing protocol (stsRNA-Seq).
  • This technology permits the concomitant identification of long (>50 nt) and small (18-30 nt) RNAs using a single experimental protocol.
  • DNA traps obtained from random priming of BAC DNA covering the RAM region were hybridized to either the naturally occurring small RNA fraction (18-30 nt) or chemically fragmented and size selected (50-70 nt) total RNA from human leukemic PLB985 cells.
  • RNA profiling revealed a plethora of RNAs mapping to either strand of the RAM region including 437 long (Fig. 1a, upper panel; Fig. 5a) and 630 small RNAs (Fig. 1a, lower panel). Importantly, ~90% of the identified transcripts were detected in three independent experiments indicating a high level of reproducibility among biological and technical replicates (Fig. 1b; see material and methods). Bioinformatics analysis between transcripts from the long and small RNA datasets identified four different RNA classes (Fig. 1c, Class I-IV).
  • Class I comprises 'classical' transcripts from the long RNA fraction mapping on either forward (Fw) or reverse (Rv) strands, that overlap neither with long RNAs on the opposite strand nor with small RNAs (Fig. 1c and Fig. 5c, Class I).
  • Class II transcripts are long RNA molecules that map to one strand and overlap with a small RNA (sRNA; Fig. 1c and Fig. 5d).
  • the existence of Classes III and IV was unexpected as these RNAs correspond to overlapping long transcripts from both strands and represent ~22% of all mapped RNAs (Fig. 1c and Fig. 5b, e-f).
  • ndsRNAs are natural components of human cells
  • the validation of 11/11 Class III-IV long complementary RNAs prompted us to analyze whether these molecules exist as double-stranded RNA within the cell. If these overlapping transcripts exist as double-stranded RNA they should be resistant to an RNAse displaying single-strand specificity (RNAse ONE). Indeed, when total RNA from PLB985 cells was subjected to RNAse ONE treatment, Class III-IV transcripts were protected from RNAse degradation (Fig. 1d and Fig. 6a). In contrast, single-stranded Class I-II and GAPDH transcripts were not protected.
  • RNAse III double-strand RNA specificity
  • ndsRNAs are predominantly nuclear (Fig. 1f, upper row), whereas their corresponding small RNAs are located either exclusively in the nucleus or in both nuclear and cytoplasmic fractions (Fig. 1f, lower panel).
  • the nuclear localization of ndsRNAs prompted us to analyze whether these molecules interact with nuclear proteins. Therefore we performed electrophoretic mobility shift assays with 2 previously identified radioactively labeled ndsRNAs (nds-2a and nds-2e) and nuclear extract obtained from BJELR cells. The results indicated that both ndsRNAs specifically interact with nuclear proteins (Fig. 2a).
  • nds-2a binds a mitosis-specific RAN containing complex
  • Fig. 2b; Fig. 10b-c
  • RAN, RCC1, RANGAP1 and RANBP2 were found to interact with nds-2a (Fig. 10d; note the high peptide coverage).
  • these partners are major mitotic components involved in fundamental aspects of cell
  • nds-2a was biotin-immobilized, incubated with nuclear extract and the presence of these 4 proteins in the bound fraction was confirmed by Western blot. None of the analyzed proteins was detected when nds-2e was used as bait, supporting that nds-2a binds selectively to these components of the mitotic machinery (Fig. 2c). Importantly, we observed that only the mitosis-specific sumoylated form of RANGAP1 (RANGAP1-SUMO1) was present in the complex with nds-2a.
  • RAN, RCC1, RANGAP1 and RANBP2 were immunoprecipitated from BJELR cells and the coprecipitated fraction was evaluated for nds-2a presence by stsRT-qPCR (Fig. 2d and Fig. 11b). Both nds-2a forward and reverse strands were enriched in the coprecipitated material, supporting that nds-2a interacts with members of the RAN complex in vivo.
  • RAN, RCC1, RANGAP1 or RANBP2-depleted nuclear extracts were used to perform in vitro interaction assays in the presence of biotin-labeled nds-2a.
  • the interaction of nds-2a with the RAN complex was abrogated in RAN-depleted nuclear extracts, since the absence of RAN impaired the detection of RANGAP1 or RANBP2. In contrast, RCC1 binding remained unaffected, thus revealing a RAN-independent interaction.
  • when nuclear extracts depleted for RCC1 were used, RAN/RANGAP1/RANBP2 interaction was unaffected.
  • RNA-FISH RNA-Fluorescence In Situ Hybridization
  • nds-2a is a functionally important component of the mitotic machinery and highlights its biological relevance in a complex biological setting such as mitotic progression.
  • ndsRNAs are modulated by cellular cues and represent a novel class of RNA
  • ndsRNA-seq libraries demonstrated that ndsRNAs are expressed throughout the entire genome and are more abundant in intergenic regions than in exons/introns (Fig. 4a and Fig. 14). Moreover, detailed database exploration showed that ndsRNAs map within different RNA classes (Fig. 4b) suggesting that ndsRNAs are not restricted to any previously described transcript family, but rather represent a novel class of RNAs interspersed within the human genome. Notably, we observed that a subset of globally expressed ndsRNAs is modulated upon retinoic acid treatment (Fig. 4c-d and Fig.15) supporting the notion that these novel molecules are regulated by cellular cues and might participate in a plethora of regulatory systems. Genome positions for the displayed examples are shown in Table 2.
  • ndsRNAs map to interspersed elements along the genome indicating that they correspond to a new class of RNAs.
  • ndsRNAs were merely sRNA precursors
  • ndsRNAs establish specific RNA-protein interactions suggesting that these molecules serve diverse functions within the cell.
  • nds-2a displays differential localization throughout the cell cycle, interacts with the mitotic RAN/RANGAP1-SUMO1/RANBP2 complex and localizes within the mitotic spindle, supporting its biological relevance.
  • nds-2a overexpression leads to a range of mitotic defects and a pronounced change in nuclear shape highlighting its role in cell cycle progression. All in all, our work expands the already complex RNA catalog and demonstrates that ndsRNAs play fundamental roles in cellular physiology.
  • Circular RNAs are a large class of animal RNAs with regulatory potency. Nature 495, 333-8 (2013).
  • RNA exosome depletion reveals transcription upstream of active human promoters. Science 322, 1851-4 (2008).
  • Hummel, M et al. TEQC an R package for quality control in target capture experiments.
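The read pre-processing described in the list above (removal of the 3'-adapter, then discarding reads shorter than 21 nt for small RNA libraries or 35 nt for long RNA libraries) can be illustrated with a minimal Python sketch. It is not the custom script used by the inventors: the function names, the exact-match trimming strategy, the `min_overlap` parameter and the adapter sequence are all assumptions made for illustration only.

```python
# Illustrative sketch only: exact-match 3'-adapter trimming followed by a
# minimum-length filter. Names and the adapter sequence are hypothetical.

def trim_3p_adapter(read: str, adapter: str, min_overlap: int = 5) -> str:
    """Return the read with the 3'-adapter (or a truncated adapter) removed."""
    idx = read.find(adapter)
    if idx != -1:
        # Full adapter present: keep only the insert upstream of it.
        return read[:idx]
    # Adapter cut off by the end of the read: look for the longest read
    # suffix (at least min_overlap nt) that equals an adapter prefix.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


def preprocess(reads, adapter, min_len):
    """Trim every read and drop inserts shorter than min_len nucleotides."""
    kept = []
    for read in reads:
        insert = trim_3p_adapter(read, adapter)
        if len(insert) >= min_len:
            kept.append(insert)
    return kept


# Hypothetical adapter; a real run would use the adapter of the library kit.
EXAMPLE_ADAPTER = "TCGTATGCCG"
SMALL_RNA_MIN_LEN = 21  # threshold stated for small RNA libraries
LONG_RNA_MIN_LEN = 35   # threshold stated for long RNA libraries
```

The trimmed reads would then be passed to an aligner such as Bowtie; mismatch-tolerant adapter matching, quality filtering and FASTQ parsing are deliberately left out of this sketch.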

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Organic Chemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Biotechnology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Analytical Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention relates to a method for sequencing and identifying novel classes of RNAs from a sample.

Description

METHOD OF SEQUENCING AND IDENTIFYING RNAs
The present invention relates to a method for sequencing and identifying novel classes of RNAs from a sample. More particularly, we herein describe the discovery and characterization of natural double-stranded RNAs (ndsRNAs).
Background of the invention
Recent advances in high-throughput sequencing technologies have unveiled an enormous diversity of RNA transcripts arising from unexplored regions within the genome, leading to the discovery of novel regulatory paradigms1-5. However, the transcriptional profile of hundreds of large genomic regions displaying disease-associated markers such as translocations, chromosomal rearrangements and single nucleotide polymorphisms (SNPs) remains largely unexplored6-8.
Mercer et al.4 describe a method for targeted sequencing of the human transcriptome that reveals its deep complexity. Ng et al.31 describe the targeted capture and massive parallel sequencing of human exomes. These reports provide insights on exome and transcriptome complexity. However, there is still a need for a more detailed analysis of the transcriptome and for tools that would allow easy and universal exploration thereof for the identification of novel targets and/or novel RNA species.
Summary of the invention
An object of the invention is a method of sequencing RNA molecules in a sample comprising: a) capturing RNAs with a bait nucleic acid or nucleic acid set;
b) sequencing the captured RNAs and predicting double-stranded RNA sequences. The capture and sequencing steps are defined in greater detail below.
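The prediction part of step b) can be sketched in Python under the definition used in this document: long transcripts that overlap on opposite strands, with at least one small RNA mapping inside the overlapping region (precursor-product relationship). The sketch below is not the inventors' pipeline, which relies on tools such as BedTools; the `Transcript` type, the function names and the half-open coordinate convention are assumptions for illustration.

```python
# Illustrative sketch only: candidate ndsRNA regions from mapped intervals.
from typing import List, NamedTuple, Tuple


class Transcript(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive (assumed convention)
    end: int     # exclusive
    strand: str  # '+' or '-'


Region = Tuple[str, int, int]


def opposite_strand_overlaps(long_transcripts: List[Transcript]) -> List[Region]:
    """Regions where a '+' and a '-' long transcript overlap on one chromosome."""
    plus = [t for t in long_transcripts if t.strand == "+"]
    minus = [t for t in long_transcripts if t.strand == "-"]
    regions = []
    for p in plus:
        for m in minus:
            if p.chrom != m.chrom:
                continue
            start, end = max(p.start, m.start), min(p.end, m.end)
            if start < end:  # non-empty intersection
                regions.append((p.chrom, start, end))
    return regions


def ndsrna_candidates(long_transcripts: List[Transcript],
                      small_rnas: List[Transcript]) -> List[Region]:
    """Keep overlap regions that fully contain at least one small RNA."""
    candidates = []
    for chrom, start, end in opposite_strand_overlaps(long_transcripts):
        if any(s.chrom == chrom and start <= s.start and s.end <= end
               for s in small_rnas):
            candidates.append((chrom, start, end))
    return candidates
```

The pairwise comparison is quadratic in the number of transcripts, which is acceptable for a captured region but would be replaced by a sorted-interval or BedTools-style intersection for genome-wide data. Requiring full containment of the small RNA is one possible reading of "within their sequence"; a partial-overlap criterion is equally conceivable.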
The sequencing method of the invention has unexpectedly allowed the discovery of a new class of RNAs hereinafter referred to as natural double-stranded RNAs (or ndsRNAs). Accordingly, ndsRNAs represent another object of the present invention. Furthermore, another object of the invention relates to a method of identifying or characterizing a ndsRNA in an RNA sample, comprising implementing the above method of sequencing on the RNAs contained in said sample thereby identifying or characterizing ndsRNAs contained therein.
A further object of the invention relates to a method for identifying a marker associated with a phenotype or cell function, or for identifying a target for the treatment of a disease or for identifying a biomarker indicative of a disease or condition. This method comprises the identification of ndsRNAs, or determining the expression profile of ndsRNAs in a sample of interest, thereby associating the ndsRNAs identified to a phenotype, cell function, or disease. For example, the sample of interest may correspond to a cancer cell and ndsRNA expression profiling allows the identification of biomarkers indicative of this cancer or the identification of a ndsRNA which could be the target of a treatment (for example by increasing or decreasing the expression of said ndsRNA).
Brief description of the drawings
Figure 1: Identification of RAM-derived RNAs. (a) Transcript profile of RAM-derived long and small libraries. (b) Intensity correlation analysis and Pearson's correlation coefficient values (R) of technical replicates. (c) Screenshots of Class I-IV long and small RNA sequencing data. (d-e) RNAse ONE or RNAse III protection assays followed by stsRT-PCR of Class III-IV ndsRNAs and Class I-II transcripts. GAPDH is depicted as an RNAse treatment control. (f) stsRT-qPCR of Class III and IV long RNAs and small RNAs in PLB985-total RNA from whole cell (Input), nuclei (Nuclei) or cytoplasmic extracts (Cytoplasm). let-7c and U49 snoRNA are cytoplasmic and nuclear controls, respectively. Opposite bars correspond to matching sense/antisense pairs. Histogram represents mean RNA levels as arbitrary units (A.U.) +/- SD of one representative experiment out of three independent biological replicates. (g) stsRT-qPCRs of RAM-derived Class III (from left to right: ndsRNA3, 4, 5, 6, 8 and 9), IV long (from left to right: ndsRNA1 and 2) and small (from left to right class III: sRNA 5, 6, 7, 8, 10 and 11 and IV: sRNA 1, 2, 3 and 4) RNAs in BJELR cells. Opposite bars correspond to matching sense/antisense pairs. RNA levels are shown as arbitrary units (A.U.) +/- SD. Genome positions are shown in Table 1.
Figure 2: nds-2a establishes specific RNA-protein interactions and displays cell cycle-dependent subcellular localization. (a) Electrophoretic mobility shift assay performed with radioactively labelled (*) nds-2a/2e incubated with BJELR nuclear extract (N.E.). Specificity was confirmed by competition with non-radioactive nds-2a/2e. (b) Gene ontology enrichment analysis performed for nds-2a associated proteins identified by mass spectrometry. (c) Immobilized nds-2a/2e were incubated with N.E. and the presence of RAN, RCC1, RANGAP1 and RANBP2 in the ndsRNA-bound fraction was analysed by Western blot. (d) RAN, RCC1, RANGAP1 and RANBP2 proteins were immunoprecipitated and the levels of nds-2a forward (2a Fw) and reverse (2a Rv) strands in each fraction were analysed by stsRT-qPCR. Histogram represents mean RNA levels as arbitrary units (A.U.) +/- SD of one experiment out of three biological replicates. (e) Immobilized nds-2a was incubated with RAN, RCC1, RANGAP1 or RANBP2 siRNA-depleted nuclear extracts and the nds-2a protein bound fraction was analysed for RAN, RCC1, RANGAP1 and RANBP2 proteins by Western blot. N.E. depletion was controlled by Western blot and is depicted as input. (f) Interphase BJ cells were processed for nds-2a RNA-FISH (red channel) coupled to immunocytochemistry for RAN, RANGAP1 and RANBP2 (green channel) and analysed by confocal microscopy. α-Tubulin (α-Tub) and DAPI staining are shown to delineate the cell. Merge and enlarged images (Inset) are depicted. (g) nds-2a and RAN, RANGAP1 and RANBP2 localization was analyzed by confocal microscopy in metaphase BJ cells. DAPI staining delineates the metaphase plate. Merge and enlarged images (Inset) are depicted. (h) Interphase BJ cells were processed for nds-2a RNA-FISH to detect either nds-2a forward (2a Fw, red channel) or reverse (2a Rv, green channel) strands. α-Tubulin (α-Tub) and DAPI staining are shown to delineate the cell. Merge images are depicted. (i) nds-2a and α-Tubulin (α-Tub) localization was analyzed by confocal microscopy in metaphase BJ cells. Merge and enlarged images (Inset) are depicted.
Figure 3: nds-2a overexpression leads to a range of mitotic defects and pronounced changes in nuclear shape. (a) Diagram of pBI-nds-2a plasmid and overexpressed SP6-tagged nds-2a (AB and CD). (b) RNAse ONE protection assays followed by stsRT-PCR of overexpressed SP6-tagged ndsRNA-2a. GAPDH is depicted as RNAse treatment control. (c-f) HeLa cells overexpressing nds-2a variants (AB or CD) were processed for immunocytochemistry against α-Tubulin (α-Tub, red channel), counterstained with DAPI (blue channel) to delineate the cellular contour and analyzed by confocal microscopy. Empty plasmid was used as control. Histone H1-GFP (H1-GFP, green channel) expressing plasmid was co-transfected to monitor transfection efficiency (Fig. 13a). The number of bi/multinucleated cells, number of chromatin bridges and cells displaying abnormal nuclear shape was determined by a double-blind analysis. Data displayed correspond to one representative experiment out of 3 independent biological replicates. Total number of cells in G2-M phase was determined by flow cytometry for each condition and arbitrarily set to 100% (Fig. 13b). Percentage of abnormal cells per treatment was calculated relative to G2-M cells. Representative images for each condition are depicted in e and f. Bi/multinucleated cells in e and cells displaying abnormal nuclear shape in f are indicated by arrowheads.
Figure 4: Genome-wide expression of ndsRNAs. (a) Pie chart representing percentages of ndsRNAs mapping to exons, introns or intergenic regions. (b) Pie chart representing ndsRNAs compared to fRNAdb annotated features. Different RNA species contained in fRNAdb are color-coded. (c) Retinoic acid-modulated ndsRNA and nds-derived small RNA in PLB985 cells are depicted. (d) RNA levels of forward (Fw) and reverse (Rv) strands of modulated ndsRNAs from RA or vehicle-treated cells were determined by stsRT-qPCR. ICAM1 and PRC mRNAs are shown as controls. Results are expressed as arbitrary units (A.U.) +/- SD of one out of two biological replicates. Genome positions are shown in Table 2.
Figure 5: Identification and validation of RAM-derived RNAs. (a) RNA capture approach. (b) Pie chart representing the relative distribution of Class I-IV RAM-derived transcripts. (c-f) Schematic representation and validation of single-stranded Class I-II and natural double-stranded (ndsRNAs) Class III-IV long RNAs, as well as small RNAs from Class II-IV transcripts by stsRT-PCR.
Figure 6: RAM-derived RNAs in PLB985 and BJELR cells. (a-b) RNAse ONE or RNAse III protection assays followed by stsRT-PCR of Class III-IV RNA duplexes and Class I-II transcripts. GAPDH is depicted as an RNAse treatment control. (c) Detection by stsRT-PCR of RAM-derived Class II-IV long and small RNAs in RAS-transformed foreskin fibroblasts (BJELR).
Figure 7: RAM-derived small RNAs are neither products of pervasive transcription nor byproducts of the canonical miRNA machinery. (a) Determination of Drosha, Dicer, AGO2 and EXOSC3 knock-down efficiency by RT-qPCR in BJELR cells. mRNA levels are expressed relative to untreated mock samples. Histogram represents mean RNA levels as arbitrary units (A.U.) +/- SD of one experiment out of three biological replicates. (b) Determination of Drosha, Dicer, AGO2 and EXOSC3 knock-down efficiency by Western blot. Actin levels are shown as loading control. (c) Analysis of the impact of Drosha, Dicer, AGO2 and EXOSC3 knock-down on the levels of previously reported miRNAs. (d-g) Quantification of Class II and nds-derived sRNAs as well as miR-93 control upon Drosha (d), Dicer (e), AGO2 (f) or EXOSC3 (g) knock-down in BJELR cells. The expression level for each sRNA in scramble siRNA-transfected (scr) samples was arbitrarily set to 1. Values upon knock-down are expressed relative to the scr sample. Histogram represents mean RNA levels as arbitrary units (A.U.) +/- SD of one experiment out of three biological replicates.
Figure 8: RAM-derived small RNAs are loaded into AGO2. (a-b) Silver staining and Western blot for argonaute-2 (AGO2) in whole cell lysates (input) and AGO2-immunoprecipitated material (AGO2 IP). AGO2, heavy (hc) and light (lc) chains of the IP-antibody are indicated. (c-d) Detection by stsRT-qPCR of Class II and nds-derived sRNAs in RNA extracted from whole cell lysates (input) or AGO2-immunoprecipitated material from PLB985 cells. let-7c and U6 snoRNA are included as AGO2-loaded positive and negative controls. Histogram represents mean RNA levels as arbitrary units (A.U.) +/- SD of one experiment out of two biological replicates.
Figure 9: RAM-derived long RNAs are neither modulated upon exosome depletion nor by the miRNA pathway. (a-d) Quantification of Class II (RNA1: B2, RNA2: B3, RNA3: B4, RNA4: B6, RNA5: B7 and RNA6: B8), Class III (ndsRNA3: C1/C2, ndsRNA4: C3/C4, ndsRNA5: C5/C6, ndsRNA6: C7/C8, ndsRNA7: C9/C10, ndsRNA8: C11/C12, ndsRNA9: C13/C14) and Class IV (ndsRNA1: D1/D2 and ndsRNA2: D3/D4) long RNA precursors upon Drosha (a), Dicer (b), AGO2 (c) or EXOSC3 (d) knock-down in BJELR cells. The expression level of each RNA in scramble siRNA-transfected (scr) samples was arbitrarily set to 1. Values upon specific knock-down are expressed relative to the scr sample. Histogram represents mean RNA levels as arbitrary units (A.U.) +/- SD of one experiment out of three biological replicates.
Figure 10: ndsRNAs establish specific protein interactions. (a) Nuclear proteins interacting with nds-2a are revealed by SDS-PAGE followed by silver staining. (b-c) Gene ontology enrichment analysis of nds-2a and nds-2e interacting proteins identified by mass spectrometry. (d) Peptide coverage of nds-2a interacting RAN, RCC1, RANGAP1 and RANBP2 identified by mass spectrometry.
Figure 11: nds-2a binds RAN and RCC1 in vitro and in vivo. (a) The specificity of RAN, RCC1, RANGAP1 and RANBP2 antibodies was tested by an siRNA-based approach. α-Tubulin (α-Tub) is shown as a loading control. (b) RAN, RCC1, RANGAP1 and RANBP2 proteins were immunoprecipitated with specific antibodies and their corresponding levels were analysed by Western blot to determine immunoprecipitation efficacy. (c) Electrophoretic mobility shift assay performed with radioactively labelled (*) nds-2a/2e incubated with increasing concentrations of recombinant RAN or RCC1 (RAN/RCC1 = 120 ng and RAN++/RCC1++ = 240 ng). (d) BJELR cells were sorted according to cell cycle phases (G1, S and G2-M) by flow cytometry. Panels show a normal cell cycle and the enrichment of the FACS-sorted populations in the indicated phases of the cell cycle. (e) Sorted cells from d were analyzed for nds-2a forward (2a Fw) and reverse (2a Rv) strand levels by stsRT-qPCR. Histogram represents mean RNA levels as arbitrary units (A.U.) +/- SD of one experiment out of two biological replicates.
Figure 12: nds-2a localization in interphase cells. (a) Interphase BJ cells were processed for nds-2a RNA-FISH to detect either nds-2a forward (2a Fw, red channel) or reverse (2a Rv, green channel) strands. α-Tubulin (α-Tub) and DAPI staining are shown to delineate the cell. Merge images are depicted. Note that no signal for nds-2a is retrieved when no initial heat denaturation is applied to the slides before hybridization. (b) Interphase BJ cells were processed for nds-2a RNA-FISH (red channel) coupled to immunocytochemistry for RAN, RANGAP1 and RANBP2 (green channel) and analysed by confocal microscopy. α-Tubulin (α-Tub) and DAPI staining are shown to delineate the cell. Merge images are depicted. Note that no signal for nds-2a is retrieved when no initial heat denaturation is applied to the slides before hybridization. (c-d) Interphase and metaphase BJ cells were processed for nds-2a RNA-FISH (red channel) coupled to immunocytochemistry for RCC1 (green channel) and analysed by confocal microscopy. α-Tubulin (α-Tub) and DAPI staining are shown to delineate the cell. Merge and enlarged (Inset) images are depicted. (e) Interphase HeLa cells were processed for nds-2a RNA-FISH (red channel) coupled to immunocytochemistry for RAN, RCC1, RANGAP1 and RANBP2 (green channel) and analysed by confocal microscopy. α-Tubulin (α-Tub) and DAPI staining are shown to delineate the cell. Merge and enlarged (Inset) images are depicted.
Figure 13: Cell cycle profile of nds-2a transfected cells. (a) HeLa cells were cotransfected with the indicated plasmids and transfection efficiency was calculated as the percentage of cells displaying a positive labelling for GFP (upper right). Non-transfected cells were used as an autofluorescence control. (b) Cell cycle profile of HeLa cells transfected with the indicated plasmid pairs. The percentage of cells in each phase of the cell cycle is depicted.
Figure 14: Quality controls of global PLB985-derived stsRNA-Seq libraries. (a-b) Intensity correlation analysis of technical replicates at 20 nt resolution of long fragmented RNAs (50-70 nt) and naturally occurring small RNAs (18-30 nt) from global stsRNA-Seq libraries. Results are displayed as log2 of original values. Pearson's correlation coefficient values (R) are shown. (c-e) Screenshots of long stsRNA-Seq showing the profile obtained for known genes (HSP90B1 in forward (Fw) strand and c12orf73 in reverse (Rv) strand), lincRNAs (chr6: 141071891-141249602; Rv strand) and several SNAR RNA precursors and their corresponding small RNAs (SNAR-A3; Rv strand; small stsRNA-Seq).
Figure 15: Modulation of ndsRNA levels by Retinoic Acid. (a) Flow cytometry analysis of PLB985 cells treated either with vehicle (ETOH) or retinoic acid (RA). Percentage of differentiated cells was determined by CD11c/CD14 immunolabelling. Fluorescent background signal was assessed by labeling with fluorescently labeled non-specific isotypic antibodies. (b) Representation of the RNA levels of forward (Fw) and reverse (Rv) strands of randomly selected ndsRNA transcripts in PLB985 samples treated with RA or vehicle determined by stsRT-qPCR. Histogram represents mean RNA levels as arbitrary units (A.U.) +/- SD of one experiment out of two biological replicates. Genome positions for the displayed examples are shown in Table 2.
Figure 16: Schematic overview of a particular embodiment of the invention, representing its modular workflow.
Figure 17: Experimental workflow and timing of a particular embodiment of the method of the invention.
Detailed description
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The invention relates to a method of sequencing RNA molecules in a sample, the method comprising the steps of:
a) capturing RNAs from a sample with a bait nucleic acid or nucleic acid set;
b) sequencing the captured RNAs using a strand specific sequencing method.
The proposed method allows the concomitant identification of long (i.e. more than 50 nucleotides) and short (i.e. 18 to 30 nucleotides) RNAs using a single experimental protocol. This method also allows identifying and characterizing a new class of RNAs, i.e. ndsRNAs.
Generation of capture sequences
The method of the present invention is based on the direct hybridization of DNA traps, in particular randomly generated DNA traps such as (5')-biotinylated randomly generated DNA traps generated from a template of interest, such as one or more Bacterial Artificial Chromosomes (BAC) covering the full extent of a genomic region of interest, with either the naturally occurring small RNA fraction (such as miRNA, endosiRNAs, piRNAs, etc.) and/or the total RNA repertoire of a sample of interest. The bait nucleic acid or nucleic acid set may thus be specific for a region of interest. Regions of interest may be selected on the basis of previously collected biological/clinical data such as the presence of single nucleotide polymorphisms associated with one or several diseases, translocation/amplification/integration hot-spots or a known phenotype. The region of interest may also correspond to mitochondrial DNA, a virus-specific (DNA or RNA) genome or a pathogen genome. The genomic region of interest may also correspond to non-coding DNA. The template used for generating the bait(s) (or traps, or capture sequences) contains or consists of a genomic DNA region of interest fragmented to a controlled size, for example into fragments 250-300 nucleotides long, by any fragmentation means available to those skilled in the art, for example by mechanical fragmentation means. The fragments are then collected, in particular after separation by electrophoresis and purification by gel excision, for instance.
In a particular embodiment, the bait is derived from a DNA template carrying a genomic DNA region of interest such as a BAC, a PAC, a cosmid or a mini-chromosome. In a particular embodiment, the genomic region of interest may be covered by a subset of contiguous BAC DNAs that fully cover the selected genomic region of interest. These sequences are readily available to those skilled in the art. For example, suitable BACs may be identified by using the UCSC browser (http://genome.ucsc.edu; ref. 32). In addition, baits may be obtained from single gene cDNAs or cDNA libraries in order to enhance coverage of the selected open reading frames to monitor gene mutations, differential promoter usage or identify novel isoforms.
Baits may be obtained from enzymatic or chemical fragmentation of the template DNA or, preferably, from polymerization reactions from the template using random primers, in particular using as a template one or more BACs. In particular, random primers are used that are modified so that they can be easily purified. In a specific embodiment, the random primers are biotinylated primers. Thanks to this specific embodiment, the resulting baits are biotinylated DNA fragments that may be purified, either alone or as a complex with a complementary RNA, by implementing the biotin/streptavidin specific interaction. According to a particular embodiment, the step of generating capture sequences comprises the production of randomly generated 5'-biotinylated DNA traps from BAC DNAs covering the genome region of interest whose transcripts are to be captured. Polymerization is performed using a polymerase, for example Klenow polymerase, for strand extension from the random primers. Bait size is controlled by the length of the template used, preferably 250-300 nucleotides long as described above, which will generate baits no longer than the template used. To avoid undesired capture events, template DNA of bacterial origin, e.g. BAC DNA, may be removed using a restriction enzyme, such as the DpnI enzyme, that selectively cleaves the template DNA and leaves the generated baits intact.
In a particular embodiment of the invention, the step of generating capture sequences comprises:
- providing one or more BACs covering a genomic region of interest;
- generating 5'-biotinylated DNA baits by amplifying said BAC(s) using biotinylated random primers;
- degrading methylated bacterial-derived BAC DNA using the DpnI restriction enzyme.
RNA source and isolation
The test RNA (i.e. the RNA from which the capture is to be performed) may be obtained from any sample of interest. The sample of interest may be a cell, a cell culture or a tissue from a subject, for example an animal subject, in particular a mammal or non-mammal animal, in particular from a human subject. For example, the sample may be a cell or tissue sample from a patient in a diseased state. In this case, the present invention permits the identification and characterization of ndsRNAs involved in the disease or indicative of the pathology. RNAs are isolated using methods well known in the art (Molecular Cloning: A Laboratory Manual, Third Edition. J. Sambrook, D. Russell). In particular, kits are readily available to those skilled in the art for performing total RNA extraction from a cell or tissue sample.
The RNAs are extracted and purified using methods well known in the art. In particular, RNAs are purified using Trizol, as is well known to those skilled in molecular biology. Before the capture step, ribosomal RNA may be depleted from the RNA extract such that the RNA transcripts are enriched regardless of their polyadenylation status or the presence of a 5'-cap structure (in figures 16 and 17, the term Ribominus is used for generally describing this ribosomal RNA depletion step). Commercially available kits for depleting ribosomal RNA include the RiboMinus kit from Invitrogen. In addition, single stranded RNA species may be depleted by using an RNase that only degrades single stranded RNAs, such as RNase ONE® from Promega.
Furthermore, in a particular embodiment, the RNA extract may be assessed to determine whether it is free from genomic DNA. For example, quantitative PCR amplification of a short region (e.g. of about 100 bp) from a single copy gene may be implemented. For example, the topoisomerase (DNA) III Alpha (TOP3A) single copy gene may be assessed with the forward primer of SEQ ID NO: 8 (5'-TCATCTGTATGGCCAGGTAGG-3') and the reverse primer of SEQ ID NO: 9 (5'-GGAACCTTTAGGTTGTTAACAGTTG-3'). If genomic DNA contamination is detected, the RNA extract may be treated with a DNAse, such as the TURBO DNAse, followed by a new RNA extraction as described above, such as a Trizol extraction.
In a particular embodiment, the captured RNAs correspond to a small RNA fraction of RNAs 18 to 30 nucleotide long. In another embodiment, long RNAs are captured, preferably after fragmentation (e.g. chemical or enzymatic fragmentation) to 50-80 nucleotide long RNAs, such as 50-70 or 60-80 nucleotide long RNAs. In a particular embodiment, fragmented long RNAs are 50-70 nucleotide long. Fragmentation of long RNAs can be carried out with divalent cation-based cleavage such as zinc-mediated RNA fragmentation, for example with a zinc based RNA fragmentation reagent such as "RNA fragmentation reagent®" from Ambion. Both small and fragmented long RNA fractions may be purified according to methods well known in the art. In an embodiment, these fractions are purified after migration on denaturing polyacrylamide gel electrophoresis (e.g. PAGE-urea). In a particular embodiment, both small RNAs and long fragmented RNAs are captured in a separate or simultaneous reaction, preferably in separate reactions. In a specific embodiment, the small and long fragmented RNA fractions are captured in separate reactions.
Size selection of long RNAs through gel purification allows the analysis of RNA molecules that would otherwise be size excluded by standard methodologies. Notably, many RNA/cDNA fragmentation methods have been used for RNA-Seq library preparation. However, these techniques are usually biased by structure or sequence specificity, an issue that is significantly reduced in Zn2+-mediated RNA fragmentation. Therefore, the latter supports a robust and accurate transcript assembly, a critical issue when dealing with rare RNA species. Notably, RNA-capture protocols described to date require conversion of the input RNA material into a pre-amplified RNA-Seq library (first, second strand cDNA synthesis and PCR) prior to capture. Considering the dynamic range of transcript abundance, high-copy RNA molecules compete with rare transcripts during pre-capture RNA-Seq library preparation and thus may generate a positive bias towards the most abundant molecules at the expense of low-copy RNAs/fragments. To prevent this issue and reduce sample manipulation prior to capture, the method of the present invention streams its RNA input directly onto the DNA traps, in particular the 5'-biotinylated DNA traps, providing several methodological advantages. For instance, it diminishes positive bias towards high-copy transcripts, it prevents the use of input material containing additional exogenous nucleotide sequences (adapters), which can affect hybridization efficiency, and it precludes the risk of spurious DNA-dependent synthesis arising during pre-capture library amplification, which may affect strand-specificity.
In a particular embodiment of the invention, the RNA isolation step comprises:
i. gel-purifying a size-selected small 18-30 nucleotide long RNA population from a sample;
ii. depleting ribosomal RNA from the RNA sample;
iii. zinc-mediated fragmenting a long RNA population to 50-80 nucleotide long (e.g. 50-70 or 60-80 nucleotide long) RNAs from the ribosomal RNA depleted RNA sample of step ii;
iv. gel purifying said fragmented RNAs.
RNA capture
The isolated and/or purified RNA molecules, in particular the small fraction RNAs and the fragmented long RNAs described above, are incubated, either together or independently, with the baits generated from the template of interest under conditions allowing hybridization of the RNAs with the baits. In a particular embodiment, a small RNA fraction (18-30 nucleotide long) and a zinc-mediated fragmented gel-purified (50-80, such as 50-70 or 60-80 nucleotide long), ribosomal RNA-depleted, RNA fraction are independently mixed with 5'-biotinylated DNA traps for in-solution hybridization/RNA capture.
In-solution hybridization of RNA molecules using DNA traps bears a kinetic advantage by favoring DNA/RNA hybrid formation over DNA/DNA or RNA/RNA hybridization. Under this premise, high hybridization temperatures and stringent washing conditions may be used to increase capture-specificity and reduce off-target hybridization. Based on the length of the RNAs to be captured, in a non-limiting illustrative embodiment, one may use 60°C for capturing small RNAs (18-30 nucleotide long) and 68°C for the Zn2+ fragmented gel-purified (e.g. 50-80, such as 50-70 or 60-80 nucleotide long), ribosomal RNA-depleted, RNA pool. Of course, one of skill in the art will adapt these conditions, in particular hybridization temperature, depending on RNA length or when a known bias in nucleotide content of the targeted RNA molecule is considered a major caveat.
Captured RNAs, i.e. RNAs forming a complex with baits, are then recovered. For example, if the baits are labeled random primers, the bait/RNA complexes are recovered implementing a selective recognition of said label. In particular, in case of biotinylated baits (e.g. biotinylated random primers), bait/RNA complexes are recovered using a support grafted with streptavidin such as streptavidin (magnetic) beads, columns, etc. Alternatively, RNAs may be captured using nucleic acid (e.g. DNA) arrays. Non-specifically bound material may be removed by washing the support, such as by sequential washings of increased stringency. Bound RNA is then eluted from the support.
To control whether the capture process was successful, dot-blot assays may be performed. Briefly, the material that was used as template (such as BAC DNA) for the generation of DNA traps (e.g. 5'-biotinylated DNA traps) may be spotted onto a nitrocellulose membrane. Captured RNAs are then radiolabeled and used as probes for dot-blot hybridization. Spotting in parallel BAC DNAs covering a non-related genomic region and which expresses known RNA targets may serve as negative control. This approach can be used by one skilled in the art to optimize hybridization temperatures and washing conditions during the initial experimental setup. The capture step implemented herein is used for capturing RNA molecules from an RNA extract. This is advantageous in that it is not necessary to carry out a cDNA synthesis step before sequencing, contrary to what is classically required. In addition, the method of the invention allows the capture of both small and long species of RNAs, which allows the determination of precursor/product relationships from a long RNA into a biologically processed small RNA. Without intending to be bound by any theory, the inventors have observed that long ndsRNAs co-exist with shorter ndsRNAs which are identical in sequence with a part of the longer ndsRNAs. Accordingly, it is believed that, from a precursor/product point of view, the small ndsRNAs are generated from the long ndsRNAs.
Sequencing
The present invention relies on the implementation of strand specific RNA sequencing. Strand specific RNA sequencing allows retrieval of the polarity of the transcript which in turn allows identifying double stranded RNA, if any, in an RNA sample. As is explained in more details below, potential ndsRNAs are identified when transcript overlaps in opposite strands are detected.
Several sequencing methods are known in the art that preserve directional information, for example using distinguishable adapters for different ends of RNA. Representative methods include the ligation of a 3'-adapter and a 5'-adapter, the sequences of which are known and different, to the captured RNAs. As described in the examples, a method for preparing an RNA sample for further sequencing may include:
(1) ligating a 3'-adapter to the captured RNAs;
(2) optionally, purifying (e.g. on a PAGE/urea device) the 3'-ligated RNAs of step (1);
(3) ligating a 5'-adapter to the RNAs of step (1) or (2);
(4) purifying the ligated RNAs of step (3).
More particularly, the method of the invention implements the following steps for preparing a RNA sample for further sequencing:
- the 3'-ends of the captured RNAs are dephosphorylated using a phosphatase (e.g. Antarctic phosphatase);
- a 3'-adapter is ligated to the RNAs using an RNA ligase such as T4 RNA ligase;
- the RNA molecules are 5'-phosphorylated;
- 3'-adapter ligated RNA molecules are purified, for example by size separating them with denaturing gel electrophoresis and recovery by gel excision according to their expected size followed by RNA precipitation;
- a 5'-adapter is ligated to the RNAs purified in the preceding step.
Following this last step, the 5'- and 3'-ligated RNA molecules may be purified by denaturing gel electrophoresis and recovered by gel excision according to their expected size.
Representative commercial kits available for preparing an RNA with 3'- and 5'-adapters include the DGE small RNA library kit from Illumina. In a particular embodiment, the 3'-adapter is an RNA adapter, in particular that having the sequence shown in SEQ ID NO: 3:
5'-(5Phos)rUrCrGrUrArUrGrCrCrGrUrCrUrUrCrUrGrCrUrUrGrUrU(3ddC)-3'
wherein 3ddC denotes 3'-dideoxycytidine.
In a particular embodiment, the 5'-adapter is an RNA adapter, in particular that having the sequence shown in SEQ ID NO: 4:
5'-(5InvddT)rGrUrUrCrArGrArGrUrUrCrUrArCrArGrUrCrCrGrArCrGrArUrC-3'
wherein 5InvddT denotes dideoxythymidine covalently linked to the remainder of the adaptor in an inverted orientation.
However, any RNA oligonucleotide of known sequence bearing the described terminal modifications (3ddC for the 3'-adaptor; 5InvddT for the 5'-adaptor) can be used as a potential adapter for the practice of the present invention. The RNAs are then reverse transcribed with specific primers, for example with primers specific for the 3' and 5' adapters, and the obtained cDNA may be amplified by PCR. In particular, the invention may implement ligation-mediated reverse transcription followed by PCR amplification. In a particular embodiment of the invention, at this step of the method, multiplexed library preparation can be implemented by using indexed primers during a PCR-based library amplification, allowing multiple samples to be sequenced in parallel in a single sequencing run.
In a particular embodiment, a reverse transcription primer is used, having the sequence shown in SEQ ID NO: 5: 5'-CAAGCAGAAGACGGCATACGA-3'. In a further particular embodiment, the primers used for PCR amplification implemented after reverse transcription are those shown in SEQ ID NO: 6 (5'-AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGA-3') and SEQ ID NO: 7 (5'-CAAGCAGAAGACGGCATACGA-3'). Sequencing is then performed according to methods well known in the art, using a sequencing apparatus.
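As an illustration only, the relationship between the adapters and the primers listed above can be checked with a short script. This is a minimal sketch, not part of the described method: the helper names (`plain_rna`, `reverse_complement`, `shared_length`) are hypothetical, and the sequences are simply copied from SEQ ID NO: 3 to 6 as given in the text. It checks that the reverse transcription primer is complementary to the start of the 3'-adapter and that the 3' end of the SEQ ID NO: 6 primer reads into the 5'-adapter sequence.

```python
import re

# Sequences copied from the text (SEQ ID NO: 3, 4, 5 and 6).
ADAPTER_3P = "(5Phos)rUrCrGrUrArUrGrCrCrGrUrCrUrUrCrUrGrCrUrUrGrUrU(3ddC)"
ADAPTER_5P = "(5InvddT)rGrUrUrCrArGrArGrUrUrCrUrArCrArGrUrCrCrGrArCrGrArUrC"
RT_PRIMER = "CAAGCAGAAGACGGCATACGA"
PCR_FWD = "AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGA"


def plain_rna(notation):
    """Drop parenthesised terminal modifications and the 'r' prefixes."""
    without_mods = re.sub(r"\([^)]*\)", "", notation)
    return "".join(re.findall(r"r([ACGU])", without_mods))


def to_dna(rna):
    """Write an RNA sequence in the DNA alphabet (U becomes T)."""
    return rna.replace("U", "T")


def reverse_complement(dna):
    """Reverse complement of a DNA sequence."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def shared_length(primer, adapter):
    """Longest suffix of the primer that equals a prefix of the adapter."""
    for k in range(min(len(primer), len(adapter)), 0, -1):
        if primer[-k:] == adapter[:k]:
            return k
    return 0


adapter3_dna = to_dna(plain_rna(ADAPTER_3P))
adapter5_dna = to_dna(plain_rna(ADAPTER_5P))

# The RT primer should anneal to the beginning of the ligated 3'-adapter.
rt_primer_anneals = reverse_complement(RT_PRIMER) == adapter3_dna[:len(RT_PRIMER)]

# The PCR primer should end with the beginning of the 5'-adapter sequence.
pcr_overlap = shared_length(PCR_FWD, adapter5_dna)

print(rt_primer_anneals, pcr_overlap)
```

A check of this kind may be useful when substituting other adapters bearing the same terminal modifications, since the primers then have to be redesigned accordingly.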
Data processing
Sequence datasets are then processed.
Small and long RNA sequences may first be preprocessed with a computer program to remove the 5'- and/or 3'-adapter sequence, in particular the 3'-adapter, from the reads and, then, adaptor-free datasets may be aligned to the genomic region of interest. Alignment may be done using a computer program such as the Bowtie Aligner program (Langmead et al. "Ultrafast and memory-efficient alignment of short DNA sequences to the human genome". Genome Biology, 2009). Since transcript polarity is conserved during the construction of the RNA-Sequencing library, the retrieved sequencing reads are 5'-3' oriented. Therefore, single and potential double stranded RNA transcripts can be reconstructed and identified. Afterwards, transcript overlap in opposite strands is identified by using a computer program such as BedTools (Quinlan AR and Hall IM. "BEDTools: a flexible suite of utilities for comparing genomic features". Bioinformatics, 2010) and identified regions are extracted for further analysis. In a particular embodiment, the TEQC R package may be used to assess the capture performance (ref. 33). Furthermore, the Tophat2/Cufflinks pipeline may be used to map and reconstruct transcripts from ribosomal RNA-depleted, Zn2+ fragmentation-generated long RNA libraries (refs. 34, 35). On the other hand, in another embodiment, small RNA-derived reads are preferably mapped using Bowtie aligner under stringent mapping conditions (no mismatches allowed; Langmead et al., cited supra).
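The identification of transcript overlap in opposite strands can be sketched as follows. This is an illustrative simplification and not the BedTools implementation: the `Transcript` record, the function name and the `min_overlap` parameter are assumptions made for the example, and coordinates are taken to be 0-based and half-open.

```python
from collections import namedtuple

# Hypothetical record for a reconstructed transcript (0-based, half-open).
Transcript = namedtuple("Transcript", "chrom start end strand")


def opposite_strand_overlaps(transcripts, min_overlap=1):
    """Return (chrom, start, end) regions covered on both strands."""
    plus = [t for t in transcripts if t.strand == "+"]
    minus = [t for t in transcripts if t.strand == "-"]
    regions = []
    for p in plus:
        for m in minus:
            if p.chrom != m.chrom:
                continue
            start = max(p.start, m.start)
            end = min(p.end, m.end)
            # Keep the shared interval only if it is long enough.
            if end - start >= min_overlap:
                regions.append((p.chrom, start, end))
    return sorted(regions)


example = [
    Transcript("chr1", 100, 300, "+"),
    Transcript("chr1", 250, 400, "-"),  # overlaps the first one antisense
    Transcript("chr1", 500, 600, "+"),  # no antisense partner
    Transcript("chr2", 100, 300, "-"),  # different chromosome
]
candidates = opposite_strand_overlaps(example)
print(candidates)
```

The pairwise comparison is quadratic and only suitable for a captured region of limited size; dedicated interval tools are preferable for genome-wide datasets.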
In a particular embodiment, RNA libraries constructed from the small RNA fraction may be pre-processed to remove the 3'-adapter. In this case, after adapter trimming, small RNA-derived data will contain sequences of variable length that can be readily mapped. In addition, reads of less than 21 nucleotides may be removed from small RNA-obtained data, for increasing mapping accuracy and limiting the number of false-positive alignments.
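A minimal sketch of this pre-processing is given below, assuming exact matching of the adapter (no sequencing errors) and reads written in the DNA alphabet. The function names, the `min_match` parameter and the example reads are hypothetical; the adapter is the SEQ ID NO: 3 sequence written as DNA, and the 21-nucleotide cut-off is the one mentioned above.

```python
# 3'-adapter of SEQ ID NO: 3 written in the DNA alphabet.
ADAPTER_3P_DNA = "TCGTATGCCGTCTTCTGCTTGTT"


def trim_3p_adapter(read, adapter=ADAPTER_3P_DNA, min_match=6):
    """Cut the read at the first position where the rest of the read
    matches the start of the adapter (full or truncated adapter)."""
    for i in range(len(read) - min_match + 1):
        tail = read[i:]
        n = min(len(tail), len(adapter))
        if tail[:n] == adapter[:n]:
            return read[:i]
    return read  # no adapter found, read left untouched


def preprocess_small_rna_reads(reads, min_len=21):
    """Trim the 3'-adapter, then drop reads shorter than min_len."""
    trimmed = (trim_3p_adapter(r) for r in reads)
    return [r for r in trimmed if len(r) >= min_len]


# Made-up reads: a 22 nt insert and an 18 nt insert, both followed by adapter.
reads = [
    "TGAGGTAGTAGGTTGTATGGTT" + ADAPTER_3P_DNA[:14],
    "ACGTACGTACGTACGTAC" + ADAPTER_3P_DNA,
]
kept = preprocess_small_rna_reads(reads)
print(kept)
```

Only the 22-nucleotide insert is retained in this example; the 18-nucleotide insert is trimmed correctly but falls below the length cut-off.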
Advantageously, the RNA-sequencing library preparation method described above can be used under many different experimental circumstances, as global libraries can be readily prepared and obtained data either for small or long RNA datasets provide high confidence and reproducibility when mapped to a global reference. Moreover, these datasets can be readily used for quantification purposes with a high level of reproducibility and used as an alternative to commercial solutions. Finally, ndsRNA identification may be achieved through the correlation of the identified regions with small RNAs present in the small RNA datasets. Hence ndsRNA definition is based on the identification of transcript overlap in opposite strands for the long RNA datasets that correlates with the presence of small RNAs within their sequence (precursor-product relationship). ndsRNAs are identified using bioinformatics tools as long double stranded RNAs. ndsRNAs differ from canonical small RNA pathways such as miRNA in that canonical small RNA pathways generate small RNAs from a single stranded RNA precursor, whereas ndsRNAs are processed from a double stranded RNA expressed from opposite strands.
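The precursor-product criterion can be illustrated by a simple filter that keeps only those opposite-strand overlap regions that contain mapped small RNAs. This is a sketch under stated assumptions: regions are given as (chrom, start, end) tuples, small RNA alignments as (chrom, start, end, strand) tuples, coordinates are 0-based and half-open, and the function name and the `min_small` threshold are hypothetical.

```python
def regions_with_small_rnas(regions, small_rnas, min_small=1):
    """Keep overlap regions that fully contain at least min_small small RNAs.

    Returns a list of (region, contained_small_rnas) pairs.
    """
    supported = []
    for chrom, start, end in regions:
        inside = [
            s for s in small_rnas
            if s[0] == chrom and s[1] >= start and s[2] <= end
        ]
        if len(inside) >= min_small:
            supported.append(((chrom, start, end), inside))
    return supported


# Made-up coordinates for illustration.
overlap_regions = [("chr1", 250, 300), ("chr1", 900, 1000)]
small_rna_hits = [
    ("chr1", 260, 282, "+"),  # lies inside the first region
    ("chr1", 700, 722, "-"),  # lies outside every region
]
nds_candidates = regions_with_small_rnas(overlap_regions, small_rna_hits)
print(nds_candidates)
```

Regions passing such a filter are only candidates; their double stranded nature would still need experimental validation.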
The presence of a ndsRNA can further be validated by strand specific reverse transcription followed by PCR or qPCR for each of the strands in the ndsRNA and their corresponding small RNAs. Furthermore RNAse ONE or RNAse III can be used to further support the double stranded nature of the ndsRNA identified.
The invention also relates to a ndsRNA, in particular a ndsRNA identified according to the method described above. In a particular embodiment, the ndsRNA is ndsRNA-2a or ndsRNA-2e. In particular, the inventors have shown that ndsRNA-2a is involved in a mitosis-specific RAN-containing complex, showing its potential involvement in mitosis. ndsRNA-2a and ndsRNA-2e sequences are shown in SEQ ID NO: 1 and SEQ ID NO: 2, respectively.
SEQ ID NO: 1:
AUGCCUGUAAUCCCAGCCACUUGGGAGGCUGAGGCAGGAAAAUUGCUUGAACC CAGGAGGCAGAGGUUGCAGUGAGCCAAGAUCACGCCACUGCACUCCAGCCUGG GCAACAGAGCAAGACUCCAUCUCAAAAAAAG
SEQ ID NO:2:
ACAGAACAGAGGCCUCAGAAAUAACACCACACAUCUACAACCACCUGAUCUUU GACAAACCUGACAAAAACAAGCACUGGGGAAAGGAUUCCCUAUUUAAUAAAUG GUGCUGGGAAAACUG
Methods and uses implementing ndsRNAs
The inventors have shown that ndsRNAs have a functional role in the cell. For example, it is herein demonstrated that ndsRNA-2a interacts with major mitotic components involved in fundamental aspects of cell physiology ranging from nuclear import/export to spindle assembly and mitotic progression. In addition, overexpression of ndsRNA-2a leads to a range of mitotic defects and a pronounced change in nuclear shape, highlighting its role in cell cycle progression.
Furthermore, the inventors have shown that a subset of globally expressed ndsRNAs is modulated upon retinoic acid treatment, demonstrating that the novel RNAs are regulated by cellular cues and participate in a plethora of regulatory systems.
Therefore, an object of the invention is also a method for identifying a marker associated with a phenotype or cell function, or for identifying a target for the treatment of a disease or for identifying a biomarker indicative of a disease or condition. This method comprises the identification of ndsRNAs, or determining the expression profile of ndsRNAs in a sample of interest, thereby associating the ndsRNAs identified to a phenotype, cell function, or disease. For example, the sample of interest may correspond to a cancer cell and ndsRNA expression profiling allows the identification of biomarkers indicative of this cancer or the identification of a ndsRNA which could be the target of a treatment (for example by increasing or decreasing the expression of said ndsRNA).
The invention also relates to a ndsRNA that may be identified thanks to the method described in the preceding paragraph.
In addition, the invention further relates to a method for identifying the function of a ndsRNA, wherein said ndsRNA is either introduced or depleted in a cell, tissue, organ or organism (in particular a mammal organism, more particularly a non-human organism) and phenotypic or functional changes occurring after said introduction or depletion are determined. Representative changes searched include, for example, changes in the cell cycle, cell shape, induction of apoptosis, induction of cell differentiation, induction of a sensitivity or resistance to a therapeutic molecule, etc. The characterization of a ndsRNA may also comprise the identification of binding partners of said ndsRNA, in particular of binding proteins, for example using a mass spectrometry (MS) analysis, as provided in the examples. In an embodiment, the binding partners are identified using a biotinylated RNA which is incubated with a protein sample, for example a whole cell extract, a nuclear extract, a cytoplasmic extract or with a protein produced in vitro, the biotinylated ndsRNA:protein complex is then captured on a streptavidin covered support and then protein analysis is performed, in particular using a MS analysis.
The following examples are given for purposes of illustration and not by way of limitation.
Examples
Material and methods
Cell culture and total RNA preparation
PLB985 cells were grown in RPMI medium supplemented with 25 mM HEPES, 10% FCS and glutamine. BJ and BJELR cells were grown in DMEM/M199 1:4 (1 g/l glucose) supplemented with 10% FCS. HeLa cells were grown in DMEM (1 g/l glucose), 5% FCS supplemented with glutamine. Total RNA was extracted using Trizol (Invitrogen) according to the manufacturer's instructions.
RNA fragmentation
Ribominus RNA (Invitrogen) was fragmented in zinc-based RNA fragmentation reagent (Ambion) for 6 min at 70°C and separated by PAGE/urea, and 50-70 nt RNA was recovered by overnight NaCl elution.
BAC DNA library
BACs (RP11-3O20, RP11-44018, RP11-770K21, RP11-588B17) covering the RAM region were obtained from Children's Hospital Oakland Research Institute (CHORI). Briefly, 200 ng of an equimolar mix of BAC DNA was sonicated for 37 cycles (10 sec on, 50 sec off, amplitude 30%) in a Vibra-Cell apparatus (Bioblock Scientific) in lysis buffer (50 mM HEPES pH 7.5, 140 mM NaCl, 1% Triton X-100, 0.1% Na-deoxycholate, supplemented with protease inhibitors). Sonicated BAC DNAs were size-separated by agarose gel electrophoresis and the 200-300 bp band was purified on a QIAquick column (Qiagen).
TRAPs
Traps (or baits) were generated using the MEGAPrime random primer labelling system (Amersham) with 250 ng of BAC DNA library and 5'-biotin random primers. After Klenow extension, BAC DNA was removed from the traps by DpnI treatment.
RNA capture
Specific RAM-region traps were generated by random priming of 200-300 bp purified sonicated BACs (RP11-44018, RP11-588B17, RP11-770K21 and RP11-3O20) using 5'-biotinylated primers. The long RNA fraction was prepared by chemical fragmentation of total RNA (50-70 nt, Ambion) and further purified by PAGE, whereas the small RNA fraction (18-30 nt) was directly prepared by PAGE. Both fractions were incubated with the traps in Binding buffer (0.5 M NaCl, 0.01 M Tris-HCl pH 7.5, 0.5% SDS, 0.1 mM EDTA) at 62°C (small RNAs) or 68°C (long RNAs) overnight. RNA traps were recovered using magnetic streptavidin beads, and RNA was eluted into Elution buffer (0.01 M Tris-HCl pH 7.5, 1 mM EDTA) and further purified by PAGE according to the previous size selection.
Strand-Specific RNA-Seq
Eluted RNAs were dephosphorylated with 5 units of Antarctic phosphatase (NEB), separated by PAGE/Urea gel electrophoresis and purified by gel excision according to prior size selection. 3' RNA Adapter (Illumina) was ligated to purified fragments with T4 RNA ligase for 6 h at 20°C followed by an overnight incubation at 4°C. Ligated fragments were size separated by PAGE/Urea and purified according to prior size selection. 3'-ligated RNAs were further 5'-phosphorylated by PNK treatment for 1 h at 37°C, size selected by PAGE/Urea and purified. 5' RNA Adapter (Illumina) was ligated to 3'-ligated RNA fragments with T4 RNA ligase for 6 h at 20°C and further incubated overnight at 4°C. 5'-3' adapter-ligated RNA was separated by PAGE/Urea and purified between 70-90 nt for the small RNA fraction and between 100-130 nt for the long RNA fraction. Reverse transcription was performed using Superscript II (Invitrogen) with specific primers for 1 h at 44°C. Final amplification was performed by 15 cycles of PCR amplification using Phusion DNA polymerase (Finnzymes). Library quality and ligation steps were assessed when possible by Agilent Bioanalyzer.
stsRNA-Seq analysis
Strand-specific RNA-Seq data were analyzed by custom scripts to remove low quality reads, reads shorter than 21 nt in the case of small RNA libraries and shorter than 35 nt for long RNA libraries, adapter contamination and empty reads. Final datasets were aligned using Bowtie Aligner allowing up to 2 mismatches to map the reads either to the RAM region or to the human genome (hg19) according to the experimental setup. For each experiment analysed, ~24M reads were uniquely mapped to hg19 for the long RNA datasets, whereas for the small RNA datasets the number of uniquely aligned reads was ~12M. Aligned reads were further processed for strand specificity and wig files were generated for visualization.
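The read-filtering step of the stsRNA-Seq analysis can be illustrated with a minimal Python sketch. It is not the custom scripts used by the inventors. The length thresholds (21 nt and 35 nt) come from the text above; the `Read` type, the mean-quality cutoff of 20, the Phred+33 encoding, and the choice to discard (rather than trim) reads containing the adapter are assumptions made for illustration, and the adapter sequence is supplied by the caller.

```python
from typing import Iterable, Iterator, NamedTuple


class Read(NamedTuple):
    """Hypothetical in-memory representation of one sequenced read."""
    name: str
    seq: str
    qual: str  # assumed Phred+33 encoded quality string


# Minimum read lengths stated in the text for each library type.
MIN_LENGTH = {"small": 21, "long": 35}


def mean_quality(qual: str) -> float:
    """Mean Phred score of a Phred+33 quality string (0.0 if empty)."""
    if not qual:
        return 0.0
    return sum(ord(c) - 33 for c in qual) / len(qual)


def filter_reads(
    reads: Iterable[Read],
    library: str,
    adapter: str,
    min_mean_quality: float = 20.0,  # assumed cutoff, not given in the text
) -> Iterator[Read]:
    """Yield reads that pass the four filters described in the text."""
    min_len = MIN_LENGTH[library]
    for read in reads:
        if not read.seq:                     # empty read
            continue
        if adapter in read.seq:              # adapter contamination (assumed: discard)
            continue
        if len(read.seq) < min_len:          # too short for this library type
            continue
        if mean_quality(read.qual) < min_mean_quality:  # low quality
            continue
        yield read
```

Reads passing this filter would then be handed to the aligner; the alignment itself is outside the scope of this sketch.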
Correlation analysis
Intensity correlation analysis was performed at 20 nt resolution with a custom pipeline for the RAM region and for global analysis. To correlate reproducibility between experiments, a second correlation analysis was performed by a binary analysis indicating whether transcripts were present or not in a determined window by using a custom pipeline.
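As an illustration of the two correlation analyses, the sketch below bins per-nucleotide coverage into 20 nt windows, computes a Pearson correlation between two replicates, and computes a binary presence/absence agreement. The custom pipeline itself is not disclosed in the text: the use of Pearson correlation, the presence threshold, and the definition of agreement as the fraction of windows with identical calls are assumptions, and all function names are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def bin_coverage(coverage: Sequence[float], bin_size: int = 20) -> List[float]:
    """Sum per-nucleotide coverage into consecutive windows of bin_size nt."""
    return [sum(coverage[i:i + bin_size]) for i in range(0, len(coverage), bin_size)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long signals."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant signal")
    return sxy / sqrt(sxx * syy)


def binary_agreement(
    x: Sequence[float], y: Sequence[float], threshold: float = 0.0
) -> float:
    """Fraction of windows in which both replicates make the same
    present/absent call (signal > threshold counts as present)."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    same = sum((a > threshold) == (b > threshold) for a, b in zip(x, y))
    return same / len(x)
```

The intensity-based measure is sensitive to differences in expression level between replicates, whereas the binary measure only asks whether a transcript is detected in a window, which is why the two are reported separately.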
Identified transcript validation
Class I-IV transcripts were validated by strand-specific reverse transcription followed by PCR. Specific primers were designed with T7 promoter overhanging bases in order to establish reverse transcription orientation. Small RNA determination was assessed with custom Taqman primers (Applied Biosystems) and relative expression levels were determined by reverse transcription followed by real time PCR.
AGO2 immunoprecipitation and bound small RNA analysis
5 x 10^6 PLB cells were centrifuged at 700 rpm for 7 min, washed twice with ice-cold PBS and lysed with microRNA isolation kit, human AGO2 (Wako chemicals) lysis buffer, and immunoprecipitation was performed following the manufacturer's instructions. Eluted samples were further processed for RNA purification by Trizol (Invitrogen) extraction. Small RNA determination was assessed by custom Taqman primers (Applied Biosystems). Specific primers for U6 RNA, U49 snoRNA and let-7c miRNA were used according to the manufacturer's instructions (Applied Biosystems). Immunoprecipitated AGO2 levels were assessed by SDS-PAGE followed by silver staining and Western blot against AGO2.
Transient transfection
Transient transfection in BJELR cells was done following standard reverse transfection protocols using lipofectamine RNAiMAX (Invitrogen). ON-target plus smart pools for knocking down Dicer 1 (L-003483-00), Drosha (L-016996-00), AGO2 (L-004639-00) and EXOSC3 (L-03195501) as well as scramble negative control (D-001210-01-05) were purchased from Dharmacon and used at a final concentration of 10 mM. Samples were collected 72 h post-transfection. Knock-down efficiency was controlled by RT-qPCR (using customized primers, sequences are available upon request) and Western blot assays. Western blots were performed following standard protocols; mouse polyclonal to EXOSC3 (ab88859), mouse monoclonal to AGO2 (ab57133) and mouse monoclonal to Dicer 1 (ab14601) were purchased from Abcam. Rabbit monoclonal antibody against Drosha (D28B1) was purchased from Cell Signalling. Goat polyclonal to actin (c-11, sc-1615) was purchased from Santa Cruz Biotechnology.
Nuclear/Cytoplasmic fractionation
BJELR cells (80% confluence) were rinsed twice with ice-cold PBS, collected in PBS and recovered in 1x hypotonic buffer (Cellytic Nuclear Xtract - Sigma) supplemented with 1 mM DTT and protease inhibitors, incubated 15 minutes on ice, vortexed in the presence of Igepal and spun down at 11000 rpm, and the supernatant was conserved as the Cytoplasmic fraction. Immediately after, the pellet was resuspended in Extraction buffer (Cellytic Nuclear Xtract) supplemented with 1 mM DTT and protease inhibitors and incubated 30 minutes in a thermomixer at 1400 rpm at 8°C (Nuclear fraction). All obtained fractions were aliquoted, flash-frozen and stored at -80°C.
ndsRNA electrophoretic mobility shift assay (EMSA)
NdsRNA-2a and ndsRNA-2e sequences were cloned into pGEM-T easy vector and further PCR amplified using T7-tagged oligonucleotides from both flanks (Expand High Fidelity - Roche). PCR products were in vitro transcribed (MegaScript RNAi - Ambion) and 5'-radioactively labelled by polynucleotide kinase (PNK - Promega). Nuclear extract was incubated with radiolabelled ndsRNA-2a/2e (~25 fmol/reaction) in DBD buffer (10 mM Tris-HCl pH 8, 0.1 mM EDTA pH 8, 0.4 mM DTT, 5% glycerol, tRNA, supplemented with NaCl according to experimental setup) for 15 min at room temperature prior to native PAGE. Competition was achieved by addition of non-radioactive ndsRNA-2a or ndsRNA-2e (50 fmol/200 fmol range). When purified recombinant proteins were used, 120 ng of RAN or RCC1 (Origene Technologies) were incubated with ndsRNA-2a or ndsRNA-2e as described above.
ndsRNA in vitro binding assay
ndsRNA-2a and ndsRNA-2e were PCR amplified, in vitro transcribed with MegaScript RNAi kit (Ambion) and 3'-biotinylated using RNA 3'-biotinylation kit (Pierce) according to the manufacturer's instructions. Biotinylated ndsRNAs were immobilized in 5 mM Tris-HCl pH 7.5, 0.5 mM EDTA and 1 M NaCl to "my ONE" streptavidin magnetic beads (Invitrogen) for 1 h at 22°C in a thermomixer prior to nuclear extract incubation. Nuclear extract was prepared as described above but subjected to two rounds of pre-clearing with "my ONE" magnetic beads in 1x DBD buffer supplemented with tRNA and NaCl prior to interaction with immobilized ndsRNAs. Final N.E./ndsRNA incubation was performed at 22°C in a thermomixer for 15 min, followed by 3 washes with DBD buffer supplemented with NaCl and NP-40 at room temperature. Finally, magnetic beads were recovered in Laemmli buffer, boiled for 10 min and separated by SDS-PAGE, or eluted in 1 M NaCl prior to Liquid Chromatography followed by Mass Spectrometry (LC-MS/MS). Protein composition was evaluated by silver staining or Western blot when appropriate.
RNA Fluorescence in-situ hybridization coupled to immunocytochemistry
ndsRNA-2a sequence was PCR amplified and in vitro transcribed with T7 RNA polymerase using Chromatide Alexa Fluor 546-14-UTP or Chromatide Alexa Fluor 488-5-UTP as a source of UTP. Fluorescently labeled forward or reverse ndsRNA-2a strands were Trizol purified and stored at -80°C. For immunofluorescence analysis, BJ and HeLa cells were grown on round coverslips and treated according to each experimental setup. Cells were fixed with 3% paraformaldehyde, 4% sucrose in 10 mM PBS for 10 min, permeabilized with 0.25% Triton X-100 in 10 mM PBS for 10 min, and then blocked for 1 h in 1% BSA in 10 mM PBS (Blocking buffer). Coverslips were incubated overnight at 4°C in blocking buffer containing RAN (#4462, Cell Signalling), RCC1 (#5134, Cell Signalling), RANGAP1 (ab92360, Abcam) or RANBP2 (ab64276, Abcam) antibodies. Cells were washed twice in 10 mM PBS, 0.1% Tween 20, and incubated with secondary antibody Alexa 488 (Molecular Probes). Cells were crosslinked again and washed twice with 2x SSC, 50% formamide before hybridization. ndsRNA-2a strand-specific probes were hybridized in hybridization buffer (2x SSC, 20% dextran sulfate, and 1 mg/mL BSA) overnight at 37°C in a thermomixer humid chamber, washed twice with 2x SSC in 50% formamide, twice with 2x SSC, and counterstained with DAPI. Finally, coverslips were mounted in ProLong Antifade (Molecular Probes) and visualized on a confocal laser-scanning microscope SP2-MP (Leica).
ndsRNA-2 overexpression
pTREG-bi plasmid (Clontech) bearing an inducible bidirectional promoter was modified in order to express a 5' (termed CD) or 3' (termed AB) SP6-tagged version of ndsRNA-2a. HeLa Tet-ON 3G cells were AB/CD transfected, treated with doxycycline 12 h post-transfection and collected in Trizol for RNA analysis or processed for immunocytochemistry 24 h later.
Overexpression and the double-stranded nature of exogenous ndsRNA-2a were confirmed by reverse transcription using T7-flagged primers that recognize the SP6 tag, followed by PCR on RNAse ONE treated samples. For phenotypic analysis, HeLa cells were co-transfected with ndsRNA-2a overexpressing plasmids (AB/CD) and histone H1-GFP (AB/CD:H1-GFP 4:1 ratio) to control in-well efficiency of transfection. Cells were fixed, permeabilized and stained for α-tubulin as previously described. The numbers of cells displaying an abnormal nuclear morphology, chromatin bridges and bi/multinuclei were determined by double-blind analysis in 3 independent experiments (3000 cells counted for each condition).
Results
Recent advances in high-throughput sequencing technologies have revealed an enormous diversity of RNA transcripts arising from unexplored regions within the genome, leading to the discovery of novel regulatory paradigms (1-5). However, the transcriptional profile of hundreds of large genomic regions displaying disease-associated markers such as translocations, chromosomal rearrangements and single nucleotide polymorphisms (SNPs) remains largely unexplored (6-8). One of those regions encompasses ~500 kb on chromosome 8 (RAM region: 130,269,750-130,744,812), is critically involved in retinoic acid induced differentiation (9) and contains multiple disease susceptibility SNPs (10-17). By using a novel RNA capture approach followed by strand-specific RNA sequencing, we demonstrate that a plethora of RNAs map on both sense and antisense strands from the RAM region. Importantly, we unequivocally demonstrate that sense-antisense RNA pairs coexist within the same cell and generate stable long natural double-strand RNA (ndsRNA). Moreover, we show that ndsRNAs are mainly localized in the nucleus and establish specific interactions with nuclear components. Particularly, we demonstrate that ndsRNA-2a interacts with the mitotic RAN/RANGAP1-SUMO1/RANBP2 complex in a RAN-dependent manner and displays differential nuclear localization throughout the cell cycle. Importantly, ndsRNA-2a overexpression leads to a range of mitotic defects and a pronounced change in nuclear shape, highlighting its involvement in cell cycle progression. Finally, global strand-specific RNA sequencing shows that ndsRNA signatures are interspersed genome-wide and reveals that ndsRNA molecules are modulated upon cellular cues. Taken together, this study reveals ndsRNAs as novel members of the natural RNA repertoire in human cells that are involved in a plethora of regulatory processes.
RNA capture and strand-specific RNA-Sequencing unveils ndsRNAs
To generate a comprehensive view of the transcripts originating from the RAM region we developed an RNA capture approach (Fig. 5a) coupled to a customized strand-specific RNA sequencing protocol (stsRNA-Seq). This technology permits the concomitant identification of long (>50 nt) and small (18-30 nt) RNAs using a single experimental protocol. Briefly, DNA traps obtained from random priming of BAC DNA covering the RAM region were hybridized to either the naturally occurring small RNA fraction (18-30 nt) or chemically fragmented and size selected (50-70 nt) total RNA from human leukemic PLB985 cells. Captured RNAs (long and small) were recovered, subjected to stsRNA-Seq and reads were aligned to the RAM region. Unexpectedly, RNA profiling revealed a plethora of RNAs mapping to either strand of the RAM region including 437 long (Fig. 1a, upper panel; Fig. 5a) and 630 small RNAs (Fig. 1a, lower panel). Importantly, ~90% of the identified transcripts were detected in three independent experiments, indicating a high level of reproducibility among biological and technical replicates (Fig. 1b; see material and methods). Bioinformatics analysis between transcripts from the long and small RNA datasets identified four different RNA classes (Fig. 1c, Class I-IV). Class I comprises 'classical' transcripts from the long RNA fraction mapping on either forward (Fw) or reverse (Rv) strands that overlap neither with long RNAs on the opposite strand nor with small RNAs (Fig. 1c and Fig. 5c, Class I). Class II transcripts are long RNA molecules that map to one strand and overlap with a small RNA (sRNA; Fig. 1c and Fig. 5d). The existence of Classes III and IV was unexpected as these RNAs correspond to overlapping long transcripts from both strands and represent ~22% of all mapped RNAs (Fig. 1c and Fig. 5b, e-f). Within the overlapping region we detected either a single small RNA mapping to one of the strands or two complementary small RNAs originating from both strands.
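The class definitions above amount to an interval-overlap test between long and small transcripts, which can be sketched in Python as follows. This is an illustrative reading of the text, not the bioinformatics analysis actually used. In particular, the text does not state which of Classes III and IV carries one small RNA strand and which carries two; the assignment below (III = small RNA on a single strand, IV = complementary small RNAs on both strands) is an assumption, as are the `Transcript` type, the half-open coordinates, and the "unclassified" fallback for sense/antisense pairs without any small RNA in the overlap.

```python
from typing import List, NamedTuple


class Transcript(NamedTuple):
    """Hypothetical mapped transcript; coordinates are half-open [start, end)."""
    start: int
    end: int
    strand: str  # "+" (forward) or "-" (reverse)


def overlaps(a: Transcript, b: Transcript) -> bool:
    """True if the two intervals share at least one nucleotide."""
    return a.start < b.end and b.start < a.end


def classify(
    long_rna: Transcript,
    long_rnas: List[Transcript],
    small_rnas: List[Transcript],
) -> str:
    """Assign a long transcript to Class I-IV following the text above."""
    # Long transcripts on the opposite strand that overlap this one.
    partners = [
        t for t in long_rnas
        if t.strand != long_rna.strand and overlaps(long_rna, t)
    ]
    if not partners:
        # No antisense long RNA: Class II if any small RNA overlaps, else Class I.
        has_small = any(overlaps(long_rna, s) for s in small_rnas)
        return "II" if has_small else "I"

    # Collect the strands of small RNAs lying in the sense/antisense overlap.
    small_strands = set()
    for partner in partners:
        region = Transcript(
            max(long_rna.start, partner.start),
            min(long_rna.end, partner.end),
            ".",
        )
        for s in small_rnas:
            if overlaps(region, s):
                small_strands.add(s.strand)

    if len(small_strands) == 1:
        return "III"  # assumed: small RNA on one strand only
    if len(small_strands) == 2:
        return "IV"   # assumed: complementary small RNAs on both strands
    return "unclassified"  # case not described in the text
```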
Importantly, 36/40 randomly selected class I-IV long transcripts and 17/17 small class II-IV RNAs were validated using strand-specific reverse transcription followed by quantitative PCR (stsRT-PCR; Fig. 5c-f and Table 1), revealing the high confidence of the capture protocol for RNA discovery. Furthermore, the expression of Class III and IV long and sRNAs was confirmed in an unrelated transformed cell line (BJELR, Fig. 1g and Fig. 6c) (18). Notably, the levels of Class I-IV validated transcripts were not modified upon exosome depletion, indicating that these molecules are neither products of pervasive transcription nor byproducts of RNA degradation (Fig. 7a-b, 7g and 9d) (19-21). Moreover, even though nds-derived sRNAs are loaded into AGO2 (Fig. 8), neither their levels nor those of their corresponding long precursors rely on any of the small RNA biogenesis pathways described to date (Fig. 7a-f and 9a-c) (22-25).
Table 1:
[Table reproduced as image imgf000027_0001 in the original publication]
ndsRNAs are natural components of human cells
The validation of 11/11 Class III-IV long complementary RNAs prompted us to analyze whether these molecules exist as double-stranded RNA within the cell. If these overlapping transcripts exist as double-strand RNA they should be resistant to an RNAse displaying single-strand specificity (RNAse ONE). Indeed, when total RNA from PLB985 cells was subjected to RNAse ONE treatment, Class III-IV transcripts were protected from RNAse degradation (Fig. 1d and Fig. 6a). In contrast, single-strand Class I-II and GAPDH transcripts were not protected. Furthermore, when total RNA was incubated with an RNAse displaying double-strand RNA specificity (RNAse III), all Class III/IV transcripts were degraded (Fig. 1e, Fig. 6b), supporting the double-stranded nature of these RNA molecules in human cells.
ndsRNAs establish specific interactions with nuclear components
To gain insight into the function of the identified ndsRNAs we analyzed their subcellular localization. Interestingly, Class III/IV ndsRNAs are predominantly nuclear (Fig. 1f, upper row), whereas their corresponding small RNAs are located either exclusively in the nucleus or in both nuclear and cytoplasmic fractions (Fig. 1f, lower panel). The nuclear localization of ndsRNAs prompted us to analyze whether these molecules interact with nuclear proteins. Therefore we performed electrophoretic mobility shift assays with 2 previously identified radioactively labeled ndsRNAs (nds-2a and nds-2e) and nuclear extract obtained from BJELR cells. The results indicated that both ndsRNAs specifically interact with nuclear proteins (Fig. 2a). To identify binding partner(s), immobilized biotinylated nds-2a/2e were incubated with nuclear extract. Analysis of ndsRNA-bound proteins by Liquid Chromatography followed by Mass Spectrometry (LC-MS/MS) indicated that nds-2a and nds-2e bind different proteins/complexes, suggesting that ndsRNAs establish specific interactions within the cell (Fig. 10a-c).
nds-2a binds a mitosis-specific RAN containing complex
Gene ontology analysis performed over nds-2a binding proteins shows enrichment of mitosis-related proteins (Fig. 2b; Fig. 10b-c). Particularly, RAN, RCC1, RANGAP1 and RANBP2 were found to interact with nds-2a (Fig. 10d; note the high peptide coverage). Importantly, these partners are major mitotic components involved in fundamental aspects of cell physiology ranging from nuclear import/export to spindle assembly and mitotic progression (26). The interaction of nds-2a with RAN, RCC1, RANGAP1 and RANBP2 was validated by two complementary approaches. In the first one, nds-2a was biotin-immobilized, incubated with nuclear extract and the presence of these 4 proteins in the bound fraction was confirmed by Western blot. None of the analyzed proteins was detected when nds-2e was used as bait, supporting that nds-2a binds selectively to these components of the mitotic machinery (Fig. 2c). Importantly, we observed that only the mitosis-specific sumoylated form of RANGAP1 (RANGAP1-SUMO1) (27) was present in the complex with nds-2a. In the second approach, RAN, RCC1, RANGAP1 and RANBP2 were immunoprecipitated from BJELR cells and the coprecipitated fraction was evaluated for nds-2a presence by stsRT-qPCR (Fig. 2d and Fig. 11b). Both nds-2a forward and reverse strands were enriched in the coprecipitated material, supporting that nds-2a interacts with members of the RAN complex in vivo. In order to discriminate whether nds-2a interacts with a single or several proteins of the RAN complex, RAN, RCC1, RANGAP1 or RANBP2-depleted nuclear extracts were used to perform in vitro interaction assays in the presence of biotin-labeled nds-2a. The interaction of nds-2a with the RAN complex was abrogated in RAN-depleted nuclear extracts, since the absence of RAN impaired the detection of RANGAP1 or RANBP2. In contrast, RCC1 binding remained unaffected, thus revealing a RAN-independent interaction. When nuclear extracts depleted for RCC1 were used, RAN/RANGAP1/RANBP2 interaction was unaffected. Finally, depletion of RANBP2 affected neither nds-2a-RAN nor nds-2a-RCC1 interaction (Fig. 2e). Altogether, these data indicate that RAN and RCC1 interact with nds-2a, suggesting that at least 2 different nds-2a/protein complexes exist (nds-2a/RAN/RANGAP1-SUMO1/RANBP2 and nds-2a/RCC1). To further support this notion we evaluated the ability of recombinant purified RAN or RCC1 to bind to nds-2a. Indeed, RAN and RCC1 bound directly to nds-2a while they failed to bind to nds-2e, thus indicating that RAN and RCC1 contain nds-2a-specific binding surfaces (Fig. 11c).
Altered nds-2a levels result in mitotic defects
Since the data indicate that nds-2a binds a mitosis-specific species of RANGAP1 (Fig. 2c and 2e) (27) and since nds-2a levels increase along cell cycle progression (G1-S-G2/M, Fig. 11d-e), we investigated the dynamics of the intracellular localization of nds-2a throughout the cell cycle. For this we performed RNA-Fluorescence In Situ Hybridization (RNA-FISH) for nds-2a (Fig. 2h and Fig. 12a-b) and coupled it to immunocytochemistry for RAN, RCC1, RANGAP1 and RANBP2 using cycling normal mesenchymal fibroblasts (BJ) and HeLa cells. Confocal microscopy showed that whereas nds-2a colocalizes with RAN and RCC1 in interphase nuclei, no such colocalization was observed with RANGAP1 or RANBP2 in the same conditions (Fig. 2f and Fig. 12c and 12e). In contrast, when metaphase cells were analyzed, nds-2a colocalized with RAN, RANGAP1 and RANBP2 in the cellular periphery and particular structures along the mitotic spindle (Fig. 2g and 2i and Fig. 12d), reinforcing the notion that this particular ndsRNA participates in mitosis-related events. To support this hypothesis, we generated a set of nds-2a overexpressing plasmids bearing a SP6 sequence tag located either in the 3' (termed "AB") or 5' (termed "CD") end of nds-2a (Fig. 3a), transfected HeLa cells and verified the double-strand nature of the overexpressed molecule (Fig. 3b). Importantly, double-blind confocal examination revealed that the number of cells displaying mitotic defects (chromatin "bridges" and bi/multinucleated cells) was increased in AB/CD transfected cells compared to controls (Fig. 3c-d). Furthermore, the number of nuclei displaying abnormal shapes in the entire population was also increased in nds-2a overexpressing cells (Fig. 3d-f). Collectively, these data indicate that nds-2a is a functionally important component of the mitotic machinery and highlight its biological relevance in a complex biological setting such as mitotic progression.
ndsRNAs are modulated by cellular cues and represent a novel class of RNA
Global stsRNA-seq libraries demonstrated that ndsRNAs are expressed throughout the entire genome and are more abundant in intergenic regions than in exons/introns (Fig. 4a and Fig. 14). Moreover, detailed database exploration showed that ndsRNAs map within different RNA classes (Fig. 4b), suggesting that ndsRNAs are not restricted to any previously described transcript family, but rather represent a novel class of RNAs interspersed within the human genome. Notably, we observed that a subset of globally expressed ndsRNAs is modulated upon retinoic acid treatment (Fig. 4c-d and Fig. 15), supporting the notion that these novel molecules are regulated by cellular cues and might participate in a plethora of regulatory systems. Genome positions for the displayed examples are shown in Table 2.
Table 2:
[Table reproduced as image imgf000031_0001 in the original publication]
Discussion
The increasing number of non-coding RNA species identified indicates that the transcriptional landscape in higher eukaryotes is much more complex than originally anticipated (1-5). However, the biological role for the large majority of these molecules remains elusive. In this work, we unequivocally demonstrate that sense-antisense transcripts coexist within the cell and generate stable nuclear double-stranded RNAs. Contrary to previous reports suggesting that overlapping sense/antisense RNA expression is restricted to pseudogenes or repetitive elements in restricted biological scenarios (28-30), we show that ndsRNAs map to interspersed elements along the genome, indicating that they correspond to a new class of RNAs. Interestingly, although our initial evidence suggested that ndsRNAs were merely sRNA precursors, we demonstrate that ndsRNAs establish specific RNA-protein interactions, suggesting that these molecules serve diverse functions within the cell. Particularly, we provide evidence that nds-2a displays differential localization throughout the cell cycle, interacts with the mitotic RAN/RANGAP1-SUMO1/RANBP2 complex and localizes within the mitotic spindle, supporting its biological relevance. Importantly, nds-2a overexpression leads to a range of mitotic defects and a pronounced change in nuclear shape, highlighting its role in cell cycle progression. All in all, our work expands the already complex RNA catalog and demonstrates that ndsRNAs play fundamental roles in cellular physiology.
References
1. Guttman, M. & Rinn, J.L. Modular regulatory principles of large non-coding RNAs. Nature 482, 339-46 (2012).
2. Guil, S. et al. Intronic RNAs mediate EZH2 regulation of epigenetic targets. Nat Struct Mol Biol 19, 664-70 (2012).
3. Memczak, S. et al. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature 495, 333-8 (2013).
4. Mercer, T.R. et al. Targeted RNA sequencing reveals the deep complexity of the human transcriptome. Nat Biotechnol 30, 99-104 (2012).
5. Mercer, T.R. & Mattick, J.S. Structure and function of long noncoding RNAs in epigenetic regulation. Nat Struct Mol Biol 20, 300-7 (2013).
6. Fletcher, O. & Houlston, R.S. Architecture of inherited susceptibility to common cancer. Nat Rev Cancer 10, 353-61 (2010).
7. Greenman, C. et al. Patterns of somatic mutation in human cancer genomes. Nature 446, 153-8 (2007).
8. Wood, L.D. et al. The genomic landscapes of human breast and colorectal cancers. Science 318, 1108-13 (2007).
9. Yin, W., Rossin, A., Clifford, J.L. & Gronemeyer, H. Co-resistance to retinoic acid and TRAIL by insertion mutagenesis into RAM. Oncogene 25, 3735-44 (2006).
10. Kiemeney, L.A. et al. Sequence variant on 8q24 confers susceptibility to urinary bladder cancer. Nat Genet 40, 1307-12 (2008).
11. Radtke, I. et al. Genomic analysis reveals few genetic alterations in pediatric acute myeloid leukemia. Proc Natl Acad Sci USA 106, 12944-9 (2009).
12. Rafiq, M.A. et al. Mapping of three novel loci for non-syndromic autosomal recessive mental retardation (NS-ARMR) in consanguineous families from Pakistan. Clin Genet 78, 478-83 (2010).
13. Schoemaker, M.J. et al. Interaction between 5 genetic variants and allergy in glioma risk. Am J Epidemiol 171, 1165-73 (2010).
14. Shete, S. et al. Genome-wide association study identifies five susceptibility loci for glioma. Nat Genet 41, 899-904 (2009).
15. Simon, M. et al. Genetic risk profiles identify different molecular etiologies for glioma. Clin Cancer Res 16, 5252-9 (2010).
16. Tomlinson, I. et al. A genome-wide association scan of tag SNPs identifies a susceptibility variant for colorectal cancer at 8q24.21. Nat Genet 39, 984-8 (2007).
17. Jenkins, R.B. et al. A low-frequency variant at 8q24.21 is strongly associated with risk of oligodendroglial tumors and astrocytomas with IDH1 or IDH2 mutation. Nat Genet 44, 1122-5 (2012).
18. Hahn, W.C. et al. Creation of human tumour cells with defined genetic elements. Nature 400, 464-8 (1999).
19. Lykke-Andersen, S., Brodersen, D.E. & Jensen, T.H. Origins and activities of the eukaryotic exosome. J Cell Sci 122, 1487-94 (2009).
20. Preker, P. et al. RNA exosome depletion reveals transcription upstream of active human promoters. Science 322, 1851-4 (2008).
21. Belostotsky, D. Exosome complex and pervasive transcription in eukaryotic genomes. Curr Opin Cell Biol 21, 352-8 (2009).
22. Yang, J.S. & Lai, E.C. Alternative miRNA biogenesis pathways and the interpretation of core miRNA pathway mutants. Mol Cell 43, 892-903 (2011).
23. Rana, T.M. Illuminating the silence: understanding the structure and function of small RNAs. Nat Rev Mol Cell Biol 8, 23-36 (2007).
24. Czech, B. & Hannon, G.J. Small RNA sorting: matchmaking for Argonautes. Nat Rev Genet 12, 19-31 (2011).
25. Djuranovic, S., Nahvi, A. & Green, R. A parsimonious model for gene regulation by miRNAs. Science 331, 550-3 (2011).
26. Clarke, P.R. & Zhang, C. Spatial and temporal coordination of mitosis by Ran GTPase. Nat Rev Mol Cell Biol 9, 464-77 (2008).
27. Joseph, J., Tan, S.H., Karpova, T.S., McNally, J.G. & Dasso, M. SUMO-1 targets RanGAP1 to kinetochores and mitotic spindles. J Cell Biol 156, 595-602 (2002).
28. Katayama, S. et al. Antisense transcription in the mammalian transcriptome. Science 309, 1564-6 (2005).
29. Tam, O.H. et al. Pseudogene-derived small interfering RNAs regulate gene expression in mouse oocytes. Nature 453, 534-8 (2008).
30. Watanabe, T. et al. Endogenous siRNAs from naturally formed dsRNAs regulate transcripts in mouse oocytes. Nature 453, 539-43 (2008).
31. Ng SB. et al. Targeted capture and massively parallel sequencing of 12 human exomes. Nature 461, 272-76 (2009).
32. Karolchik, D. et al. The UCSC Table Browser data retrieval tool. Nucleic acids research 32, D493-496, doi:10.1093/nar/gkh103 (2004).
33. Hummel, M. et al. TEQC: an R package for quality control in target capture experiments. Bioinformatics 27, 1316-1317, doi:10.1093/bioinformatics/btr122 (2011).
34. Trapnell, C. et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature biotechnology 28, 511-515, doi:10.1038/nbt.1621 (2010).
35. Kim, D. et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome biology 14, R36, doi:10.1186/gb-2013-14-4-r36 (2013).

Claims

1. A method of sequencing RNA molecules in a sample, the method comprising the steps of: a) capturing RNAs from a sample with a bait nucleic acid or nucleic acid set; and b) sequencing the captured RNAs using a strand-specific sequencing method.
2. The method according to claim 1, wherein step a) comprises capturing RNAs with a bait nucleic acid or nucleic acid set derived from a genomic DNA of interest.
3. The method according to claim 1 or 2, wherein the bait nucleic acid or nucleic acid set is obtained from random priming of said genomic DNA.
4. The method according to any one of the preceding claims, wherein the genomic DNA corresponds to the whole genome of a cell, or is contained in a vector such as a BAC, a PAC, a plasmid, a cosmid or a mini-chromosome.
5. The method according to any one of claims 1 to 4, wherein the RNA capture is carried out with capture sequences generated according to the following steps:
- providing one or more BACs covering a genomic region of interest;
- generating 5'-biotinylated DNA baits by amplifying said BAC(s) using biotinylated random primers;
- degrading methylated, bacterially derived BAC DNA using the DpnI restriction enzyme.
6. The method according to any one of the preceding claims, wherein the captured RNAs correspond to a small RNA fraction of RNAs 18 to 30 nucleotides long and/or a fragmented and size-selected long RNA fraction of RNAs 50 to 80, in particular 50 to 70 or 60 to 80, nucleotides long.
7. The method according to any one of claims 1 to 6, wherein RNAs are isolated according to the following steps:
i. gel-purifying a size-selected small RNA population, 18-30 nucleotides long, from a sample;
ii. depleting ribosomal RNA from the RNA sample;
iii. fragmenting, by zinc-mediated fragmentation, a long RNA population from the ribosomal RNA-depleted RNA sample of step ii into RNAs 50-80 nucleotides long (e.g. 50-70 or 60-80 nucleotides long);
iv. gel-purifying said fragmented RNAs.
8. The method according to any one of the preceding claims, wherein the captured RNAs are modified by ligating 5'- and 3'-adapter sequences on said captured RNAs, such that RNA polarity information may be retrieved during the sequencing step.
9. The method according to claim 8, wherein the method implements the following steps for preparing a RNA sample for further sequencing:
(1) ligating a 3'-adapter to the captured RNAs;
(2) optionally, purifying (e.g. on a PAGE/urea device) the 3'-ligated RNAs of step (1);
(3) ligating a 5'-adapter to the RNAs of step (1) or (2);
(4) purifying the ligated RNAs of step (3).
10. The method according to claim 9, wherein the method implements the following steps for preparing a RNA sample for further sequencing:
- the 3'-ends of the captured RNAs are dephosphorylated using a phosphatase (e.g. Antarctic phosphatase);
- a 3'-adapter is ligated to the RNAs using an RNA ligase such as T4 RNA ligase;
- the RNA molecules are 5'-phosphorylated;
- 3'-adapter-ligated RNA molecules are purified, for example by size separating them with denaturing gel electrophoresis and recovery by gel excision according to their expected size followed by RNA precipitation;
- a 5'-adapter is ligated to the RNAs purified in the preceding step.
Following this last step, the 5'- and 3'-ligated RNA molecules may be purified by denaturing gel electrophoresis and recovered by gel excision according to their expected size.
11. The method according to any one of claims 1 to 10, comprising reverse transcribing the captured RNAs and then submitting the produced cDNA to sequencing.
12. A method for identifying a natural double stranded RNA (ndsRNA) in a RNA sample, comprising implementing the method according to any one of claims 1 to 11 and thereby identifying one or more long natural double stranded RNAs in said sample.
13. The method according to claim 12, wherein a ndsRNA is identified when overlapping sequences are identified, i.e. when transcript overlaps on opposite strands are detected.
14. A ndsRNA identified according to the method of claim 12 or 13.
15. A ndsRNA selected from ndsRNA-2a and ndsRNA-2e, the sequence of which is shown in SEQ ID NO:1 and SEQ ID NO:2, respectively.
16. A method for identifying a marker associated with a phenotype or cell function, or for identifying a target for the treatment of a disease or for identifying a biomarker indicative of a disease or condition, wherein the method comprises identifying the presence or absence of one or more ndsRNAs, or determining a change in the expression profile of one or more ndsRNAs in a sample of interest by implementing the method of claim 12 or 13, thereby associating the ndsRNAs identified to a phenotype, a cell function, or a disease.
17. The method according to claim 16, wherein the sample of interest is from a cancer cell and ndsRNA expression profiling allows the identification of biomarkers indicative of this cancer or the identification of one or more ndsRNAs which could be the target of a treatment of said cancer.
18. A method for the functional characterization of a ndsRNA, wherein said ndsRNA is introduced or depleted in a cell, tissue, organ or organism and a phenotypic or functional change occurring after said introduction or depletion is determined.
19. The method according to claim 18, wherein the phenotypic or functional change is selected from a modification of the cell cycle, the cell shape, induction of apoptosis, induction of cell differentiation and induction of a sensitivity or resistance to a therapeutic molecule.
20. A method for the functional characterization of a ndsRNA, comprising the identification of the binding partners, in particular protein binding partners, of said ndsRNA.
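The identification criterion of claim 13 (transcript overlaps detected on opposite strands) can be illustrated computationally. The following is only a minimal sketch under stated assumptions, not the procedure used by the applicants: the claims do not specify any algorithm, and the `Transcript` record, the 0-based half-open coordinates, the `min_overlap` threshold and the function name `opposite_strand_overlaps` are all hypothetical choices made for illustration. It assumes strand-specific reads have already been assembled into transcripts with known genomic coordinates.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Transcript:
    # Hypothetical record for an assembled, strand-resolved transcript.
    name: str
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


def opposite_strand_overlaps(
    transcripts: List[Transcript], min_overlap: int = 1
) -> List[Tuple[str, str, int, int]]:
    """Return (plus_name, minus_name, overlap_start, overlap_end) for every
    pair of transcripts on the same chromosome but opposite strands whose
    genomic intervals share at least `min_overlap` bases."""
    plus = [t for t in transcripts if t.strand == "+"]
    minus = [t for t in transcripts if t.strand == "-"]
    hits = []
    # Simple all-against-all comparison; adequate for a captured region,
    # an interval index would be preferable genome-wide.
    for p in plus:
        for m in minus:
            if p.chrom != m.chrom:
                continue
            start = max(p.start, m.start)
            end = min(p.end, m.end)
            if end - start >= min_overlap:
                hits.append((p.name, m.name, start, end))
    return hits


# Illustrative coordinates only (not taken from the application).
example = [
    Transcript("tA", "chr1", 100, 400, "+"),
    Transcript("tB", "chr1", 300, 600, "-"),
    Transcript("tC", "chr1", 700, 900, "-"),
    Transcript("tD", "chr2", 100, 400, "-"),
]
print(opposite_strand_overlaps(example))
# prints [('tA', 'tB', 300, 400)]
```

In this sketch, each reported pair marks a candidate region in which sense and antisense transcripts could, in principle, form a double-stranded RNA; whether such a duplex actually exists in the sample would still require experimental confirmation.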
PCT/EP2015/062179 2014-05-30 2015-06-01 METHOD OF SEQUENCING AND IDENTIFYING RNAs WO2015181397A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP14305822.0 2014-05-30
EP14305822 2014-05-30

Publications (1)

Publication Number Publication Date
WO2015181397A1 (en) 2015-12-03

Family

ID=50933115

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2015/062179 WO2015181397A1 (en) 2014-05-30 2015-06-01 METHOD OF SEQUENCING AND IDENTIFYING RNAs

Country Status (1)

Country Link
WO (1) WO2015181397A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009099602A1 (en) * 2008-02-04 2009-08-13 Massachusetts Institute Of Technology Selection of nucleic acids by solution hybridization to oligonucleotide baits
US20100035249A1 (en) * 2008-08-05 2010-02-11 Kabushiki Kaisha Dnaform Rna sequencing and analysis using solid support
WO2011097528A1 (en) * 2010-02-05 2011-08-11 Institute For Systems Biology Methods and compositions for profiling rna molecules

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
FATIH OZSOLAK ET AL: "Direct RNA sequencing", NATURE, vol. 461, no. 7265, 23 September 2009 (2009-09-23), pages 814 - 818, XP055127674, ISSN: 0028-0836, DOI: 10.1038/nature08390 *
FATIH OZSOLAK ET AL: "RNA sequencing: advances, challenges and opportunities", NATURE REVIEWS GENETICS, vol. 12, no. 2, 1 February 2011 (2011-02-01), pages 87 - 98, XP055152859, ISSN: 1471-0056, DOI: 10.1038/nrg2934 *
GUNTER MEISTER ET AL: "Mechanisms of gene silencing by double-stranded RNA", NATURE, vol. 431, no. 7006, 16 September 2004 (2004-09-16), pages 343 - 349, XP055153799, ISSN: 0028-0836, DOI: 10.1038/nature02873 *
OLIVER H. TAM ET AL: "Pseudogene-derived small interfering RNAs regulate gene expression in mouse oocytes", NATURE, vol. 453, no. 7194, 22 May 2008 (2008-05-22), pages 534 - 538, XP055153023, ISSN: 0028-0836, DOI: 10.1038/nature06904 *
REYNOLDS A ET AL: "Rational siRNA design for RNA interference", NATURE BIOTECHNOLOGY, NATURE PUBLISHING GROUP, NEW YORK, NY, US, vol. 22, no. 3, 1 March 2004 (2004-03-01), pages 326 - 330, XP002311429, ISSN: 1087-0156, DOI: 10.1038/NBT936 *
T. D. HARRIS ET AL: "Single-Molecule DNA Sequencing of a Viral Genome", SCIENCE, vol. 320, no. 5872, 4 April 2008 (2008-04-04), pages 106 - 109, XP055072779, ISSN: 0036-8075, DOI: 10.1126/science.1150427 *
TOSHIAKI WATANABE ET AL: "Endogenous siRNAs from naturally formed dsRNAs regulate transcripts in mouse oocytes", NATURE, vol. 453, no. 7194, 10 April 2008 (2008-04-10), pages 539 - 543, XP055153025, ISSN: 0028-0836, DOI: 10.1038/nature06908 *
ZHONG WANG ET AL: "RNA-Seq: a revolutionary tool for transcriptomics", NATURE REVIEWS GENETICS, vol. 10, no. 1, 1 January 2009 (2009-01-01), pages 57 - 63, XP055152757, ISSN: 1471-0056, DOI: 10.1038/nrg2484 *

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 15726600; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 15726600; Country of ref document: EP; Kind code of ref document: A1