WO2013135910A1 - Method for identifying the sequence of poly(A)+RNA that physically interacts with protein - Google Patents

Method for identifying the sequence of poly(A)+RNA that physically interacts with protein

Info

Publication number
WO2013135910A1
Authority
WO
WIPO (PCT)
Prior art keywords
rna
protein
poly
sequence
binding
Prior art date
Application number
PCT/EP2013/055569
Other languages
English (en)
Inventor
Markus Landthaler
Mathias MUNSCHAUER
Alexander BALTZ
Original Assignee
Max-Delbrück-Centrum für Molekulare Medizin
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Max-Delbrück-Centrum für Molekulare Medizin
Priority to US14/385,501 (US20150045237A1)
Priority to EP13714874.8A (EP2825890A1)
Publication of WO2013135910A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • C12Q1/6874 Methods for sequencing involving nucleic acid arrays, e.g. sequencing by hybridisation
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6804 Nucleic acid analysis using immunogens
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806 Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813 Hybridisation assays
    • C12Q1/6827 Hybridisation assays for detection of mutation or polymorphism
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G01N33/6803 General methods of protein analysis not limited to specific proteins or families of proteins
    • G01N33/6842 Proteomic analysis of subsets of protein mixtures with reduced complexity, e.g. membrane proteins, phosphoproteins, organelle proteins
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/11 Antisense
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/13 Decoys
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/30 Chemical structure
    • C12N2310/33 Chemical structure of the base
    • C12N2310/333 Modified A
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/30 Chemical structure
    • C12N2310/33 Chemical structure of the base
    • C12N2310/335 Modified T or U
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2320/00 Applications; Uses
    • C12N2320/10 Applications; Uses in screening processes
    • C12N2320/11 Applications; Uses in screening processes for the determination of target sites, i.e. of active nucleic acids
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2535/00 Reactions characterised by the assay type for determining the identity of a nucleotide base or a sequence of oligonucleotides
    • C12Q2535/101 Sanger sequencing method, i.e. oligonucleotide sequencing using primer elongation and dideoxynucleotides as chain terminators
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/136 Screening for pharmacological compounds

Definitions

  • the invention relates to an in vitro method for identifying the sequence of one or more poly(A)+RNA molecules that physically interact with protein.
  • the present invention provides a method to define the protein-bound transcriptome regions under any given cellular condition, such as a disease condition or after treatment with any given substance, drug, or other cellular perturbation.
  • the invention also relates to an anti-sense oligonucleotide targeted against the sequence of a poly(A)+RNA molecule identified using the method, a method for identification of a drug target and a method for the identification of one or more biomarkers, preferably for identification of a panel of biomarkers, for any given medical condition, comprising the method of the invention.
  • the present invention relates in a preferred embodiment to a photoreactive nucleoside-enhanced UV-crosslinking and oligo(dT) affinity purification approach to globally map the sites of protein-poly(A)+RNA interactions in mammalian cells and other animal cell culture systems.
  • Protein occupancy profiling on poly(A)+RNA by next-generation sequencing of protein-crosslinked RNA fragments using the method of the present invention provides a transcriptome-wide view of the interaction sites of the mRNA-bound proteome and reveals widespread binding of proteins to 5' and 3' untranslated regions (UTRs) as well as coding regions of messenger RNAs (mRNAs).
  • the invention therefore relates to an in vitro method for identifying the sequence of one or more poly(A)+RNA molecules that physically interact with protein, comprising formation of covalently linked poly(A)+RNA-protein complexes via cross-linking, isolation of poly(A)+RNA-protein complexes by binding of poly(A)+RNA-protein complexes with oligo(dT) oligonucleotides, ribonuclease treatment and removal of unbound poly(A)+RNA, followed by removal of total protein, and identification of poly(A)+RNA sequences, preferably by cDNA library preparation and sequencing.
  • Protein-mRNA interactions are fundamental to core biological processes, such as mRNA splicing, localization, degradation and translation.
  • nascent mRNAs associate with proteins to form messenger ribonucleoprotein (mRNP) complexes that mediate and regulate most aspects of mRNA metabolism and function.
  • mRNP complexes consist of a dynamically changing repertoire of proteins that define the processing, cellular localization, as well as the decay and translation rate of specific mRNAs.
  • RNA-binding proteins (de Lima Morais et al., 2011)
  • IRP1: iron-regulatory protein 1
  • A prerequisite for our understanding of the function of RNA-interacting proteins is a systematic identification of their binding sites and the definition of their RNA targets.
  • Current genomic approaches use UV crosslinking and immunoprecipitation (CLIP) of mRNA-RBP complexes in combination with next generation sequencing to identify RBP binding sites (Konig et al., 2010; Licatalosi et al., 2008).
  • CLIP: UV crosslinking and immunoprecipitation
  • PAR-CLIP employs the photoreactive thionucleosides, 4-thiouridine and 6- thioguanosine, to increase the crosslinking efficiency between protein and RNA and to provide near nucleotide resolution of the RNA-binding site (Hafner et al., 2010).
  • This approach is, however, limited to particular proteins, as it relies on IP-based approaches that pull down essentially only those RNA molecules that interact with a given protein of interest.
  • Protein occupancy profiling on mRNA reveals detailed information on which RNA sequences are bound by protein, showing for example that large stretches in 3' UTRs are covered by the mRNA-bound proteome, with numerous binding sites in regions harboring disease-associated nucleotide polymorphisms.
  • the present invention relates in a preferred embodiment to a photoreactive nucleoside-enhanced UV-crosslinking and oligo(dT) affinity purification approach to globally map the sites of protein-mRNA interactions in mammalian cells and other animal cell culture systems.
  • Protein occupancy profiling on poly(A)+RNA by "next-generation" sequencing of protein-crosslinked RNA fragments using the method of the present invention provides a transcriptome-wide view of the interaction sites of the mRNA-bound proteome and reveals widespread binding of proteins to coding sequences and 5' and 3' untranslated regions (UTRs) of mRNAs.
  • the present invention provides a method to define the protein-bound transcriptome under any given cellular condition, such as disease condition or after treatment with any given substance, drug, or other cellular perturbation.
  • the invention therefore relates to an in vitro method for identifying the sequence of one or more poly(A)+RNA molecules that physically interact with protein, comprising: a) formation of poly(A)+RNA-protein complexes via cross-linking, b) isolation of poly(A)+RNA-protein complexes by binding of poly(A)+RNA-protein complexes with poly(A)+RNA-binding oligonucleotides, preferably oligo(dT) oligonucleotides, and removal of unbound poly(A)+RNA, followed by c) removal of total protein, and d) identification of poly(A)+RNA sequences.
  • Although poly(A)+RNA isolation methods are known as such in the art, the combination of poly(A)+RNA isolation, preferably via oligo(dT) oligonucleotides, with subsequent deep sequencing represents a technically challenging procedure. Simple combination of known methods for poly(A)+RNA isolation and subsequent sequencing of isolated material is not technically feasible. The combination of approaches applied in the present invention required the inventors to overcome significant compatibility issues, which ultimately led to unexpectedly positive outcomes.
  • the invention is therefore characterised by the removal of unbound poly(A)+RNA, preferably after RNA isolation and before removal of total protein. Without this additional RNA-removal step in the method of the present invention, analysis of the bound RNA molecules is technically impossible due to the interfering high background RNA observed by "next-generation sequencing".
  • the method of the present invention is characterised in that the cross-linking is carried out by UV irradiation of cells treated with photoreactive nucleosides, such as 4-thiouridine and/or 6-thioguanosine.
  • the method of the present invention is characterised in that the cross-linking is carried out by a) introducing a photoreactive nucleoside into living cells wherein the living cells incorporate the photoreactive nucleoside into RNA transcripts during transcription thereby producing modified RNA transcripts and b) irradiating said cells at a wavelength significantly absorbed by the photoreactive
  • the wavelength is preferably greater than 300 nm.
  • Photoreactive nucleosides, such as 4-thiouridine and/or 6-thioguanosine, provide a particularly effective method for cross-linking.
  • the subsequent mutation induced by the incorporation of a photoreactive nucleoside that has been cross-linked to protein enables effective sequencing and comparison to sequence databases to identify protein interaction sites in a fast and efficient manner, effectively enabling "next-generation" sequencing to be applied in genome-wide analyses.
  • the method of the present invention is characterised in that the isolation of poly(A)+RNA-protein complexes is carried out using oligo(dT) oligonucleotides attached to a solid support material, preferably by a) forming a soluble extract of the cells, b) addition of poly(A)+RNA-binding oligonucleotides, preferably oligo(dT) oligonucleotides, attached to a solid support material to said extract, c) washing the RNA-protein complexes that are bound to said poly(A)+RNA-binding oligonucleotides, preferably oligo(dT) oligonucleotides, attached to a solid support material under denaturing conditions, and d) treating the extract with a nuclease, thereby removing unbound poly(A)+RNA.
  • a solid support enables simple separation of bound and unbound material.
  • solid-support mediated isolation is compatible with high throughput analysis and enables the analysis of multiple samples in parallel without extra experimental burden.
  • the method of the present invention is characterised in that unbound poly(A)+RNA is removed via a) treatment with one or more RNA-hydrolyzing enzymes, such as RNase and/or benzonase, more preferably RNase I, as it exhibits no nucleotide bias for RNA degradation, thereby providing unbiased and efficient removal of unwanted or interfering RNA, b) precipitation of protein-poly(A)+RNA complexes, preferably by ammonium sulphate precipitation and/or other protein precipitation methods such as ethanol (EtOH) precipitation, and/or c) separation according to size, such as by gel electrophoresis, preferably by SDS-PAGE and subsequent transfer of protein-RNA complexes to nitrocellulose.
  • RNA-hydrolyzing enzymes and/or precipitation methods may be applied.
  • the most preferred method is the use of ammonium sulphate, or other effectively similar means for precipitation of protein-RNA complexes, in combination with electrophoresis and transfer of said complexes to nitrocellulose before analysis.
  • Protein-RNA complexes are therefore enriched by ammonium sulphate precipitation and then separated by SDS-PAGE, before being blotted onto nitrocellulose.
  • RNA can be extracted from the nitrocellulose membrane by proteinase treatment and nucleic acid purification, for example by phenol/chloroform extraction.
  • Ammonium sulphate precipitation is preferred over other methods of concentrating proteins, as it efficiently precipitates proteins, while nucleic acids remain largely soluble.
  • protein bound RNA fragments are enriched in the precipitate and background RNA is further removed by transfer of separated protein-RNA complexes to nitrocellulose, which specifically retains proteins but not free RNA.
  • Alternative protein precipitation methods can be applied, but the inventors observed a surprising and beneficial reduced level of background RNA when using ammonium sulphate precipitation, in comparison to other methods.
  • the method of the present invention is characterised in that total protein is removed via protease treatment, such as proteinase K treatment. Proteinase K is a highly processive enzyme without any amino acid sequence bias and provides a suitable method for releasing bound RNA.
  • the method of the present invention is characterised in that poly(A)+RNA sequences are identified via cloning poly(A)+RNA molecules into cDNA libraries followed by sequencing of said libraries.
  • the method of the present invention is characterised in that the identification of a sequence of a poly(A)+RNA molecule that physically interacts with protein is determined by a) identification of a mutation in the sequence of said poly(A)+RNA molecule by sequencing of the purified protein-bound poly(A)+RNA molecules and comparison of said sequence to a reference sequence, b) whereby the mutation is preferably defined as replacement of a deoxythymidine of the reference sequence by a deoxycytidine, or replacement of a deoxyguanine of the reference sequence by a deoxyadenine in the cDNA of the protein-crosslinked purified poly(A)+RNA molecule of 4-thiouridine and 6-thioguanine labelled cells, respectively, and c) the sequence of the binding site extends either side of the mutation for at least 1 nucleotide, preferably from 1 to 20 nucleotides.
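The mutation-based site call described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the read is taken to be already aligned to the reference without gaps at a known offset, and the names `DIAGNOSTIC`, `find_conversions` and `binding_window` are hypothetical helpers, not part of the disclosed method.

```python
from typing import List, Tuple

# Diagnostic cDNA substitutions (reference base, read base) named in the text:
# T -> C for 4-thiouridine labelling, G -> A for 6-thioguanosine labelling.
# The dictionary keys are illustrative labels only.
DIAGNOSTIC = {"4SU": ("T", "C"), "6SG": ("G", "A")}


def find_conversions(reference: str, read: str, offset: int, label: str = "4SU") -> List[int]:
    """Return reference positions where an ungapped, aligned read shows the diagnostic conversion."""
    ref_base, read_base = DIAGNOSTIC[label]
    hits = []
    for i, base in enumerate(read):
        pos = offset + i
        if pos < len(reference) and reference[pos] == ref_base and base == read_base:
            hits.append(pos)
    return hits


def binding_window(reference: str, pos: int, flank: int = 20) -> Tuple[int, int, str]:
    """Return (start, end, sequence) for a window of `flank` nt either side of `pos`.

    The flank is restricted to 1..20 nt, the preferred range given in the text;
    the window is clipped at the ends of the reference.
    """
    if not 1 <= flank <= 20:
        raise ValueError("flank must be between 1 and 20 nucleotides")
    start = max(0, pos - flank)
    end = min(len(reference), pos + flank + 1)
    return start, end, reference[start:end]
```

A real analysis would take alignments from a read mapper and would have to separate crosslink-induced conversions from sequencing errors and genomic polymorphisms, which this sketch does not attempt.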
  • the method of the present invention is characterised in that the protein-interaction site is a protein-coding transcript or non-coding transcript.
  • a further aspect of the invention relates to a kit for identifying a protein-interaction site on poly(A)+RNA transcripts, the kit comprising: a) a thiouridine and/or thioguanosine analog and/or thiouridine and/or thioguanosine analog-supplemented tissue culture medium, b) reagents for removal of unbound RNA, such as reagents for the precipitation of RNA-protein complexes, c) reagents for oligo(dT) affinity purification, d) reagents for protein precipitation, and e) adapters and primers for small RNA cloning.
  • a further aspect of the invention relates to one or more anti-sense oligonucleotides targeted against the sequence of a poly(A)+RNA molecule identified using the method of any of the preceding claims, preferably for use as a medicament, more preferably for the treatment of a medical disorder associated with physical interaction between a protein and said poly(A)+RNA sequence.
  • As the method of the invention enables identification of protein-bound RNA sequences, in particular those sequences bound specifically according to disease state or cell type, the generation of anti-sense oligonucleotides that bind potentially protein-bound RNA sequences represents one aspect of the invention.
  • RNA sequence identified by the present invention into a pharmaceutical composition, preferably with a pharmaceutically relevant carrier, such as are known in the art, requires no undue or inventive effort by a skilled person and is therefore a further aspect of the present invention.
  • oligonucleotide is targeted against a sequence of a poly(A)+RNA molecule comprising a single nucleotide polymorphism (SNP) provided in Figures 40 and 41 and Table S7 as a medicament for the treatment of a medical disorder associated with said SNP, such as those disorders disclosed in Table S7.
  • Table S7 discloses specific sequences which are characterised by disease-associated SNPs and are (when in RNA form) bound by RNA-binding proteins, implying that these sequences are targets for anti-sense-based targeting approaches. For example, gain-of-function SNPs that lead to disease could be countered by targeting said sequences with anti-sense oligonucleotides, leading to reduced expression of said SNP-containing genes and thereby preventing development of said disease.
  • the oligonucleotide of the present invention is characterised in that the oligonucleotide binding to the poly(A)+RNA molecule results in changes in expression of the protein for which the poly(A)+RNA molecule codes, either by ribosome disruption, regulation of translation and/or RNA degradation induced by blockage of the binding site of RNA-interacting proteins using anti-sense oligonucleotides. Modulation of splicing may also be achieved by the oligonucleotide of the present invention.
  • a further aspect of the invention relates to a method for identification of a drug target comprising the method according to any one of the preceding claims, whereby a protein-bound sequence of poly(A)+RNA molecule identified via the method of the preceding claims represents a drug target for treatment with anti-sense oligonucleotides that bind the protein interaction site on the poly(A)+RNA molecule.
  • a further aspect of the invention relates to a method for optimizing a therapeutic antisense oligonucleotide by using the method as described herein, whereby the sequence of said oligonucleotide is modified according to the protein-binding characteristics of the poly(A)+RNA target molecule, as identified using the method described herein.
  • a significant number of anti-sense molecules are in clinical development and many may bind regions of an RNA template that are also bound by protein.
  • the specific sequence of the RNA molecule that binds protein can be determined, thereby enabling modification of the anti-sense molecule as desired, either to bind a protein-binding site or to avoid one.
  • a further aspect of the invention relates to a method for the identification of one or more biomarkers, preferably for identification of a panel or collection of biomarkers, for any given medical condition comprising the method according to any one of the preceding claims, whereby a) the method is carried out on samples obtained from healthy subjects and affected
  • protein-bound sequences of poly(A)+RNA molecules are identified as biomarkers for the medical condition when the presence, extent and/or quantity of protein-binding at the protein-bound sequence of said poly(A)+RNA molecule is significantly different between the two samples.
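As a rough illustration of the comparison between the two samples, the sketch below screens sites by effect size only. Everything in it is an assumption made for illustration: the per-site counts, the per-million normalisation, the pseudocount, the log2 fold-change cut-off, and the function names. The text requires a significant difference but does not specify a statistical test, and a real analysis would need replicates and a proper test.

```python
import math
from typing import Dict, List


def log2_fold_changes(healthy: Dict[str, int], affected: Dict[str, int],
                      pseudocount: float = 1.0) -> Dict[str, float]:
    """Per-site log2(affected / healthy) after counts-per-million normalisation."""
    total_h = sum(healthy.values()) or 1
    total_a = sum(affected.values()) or 1
    result = {}
    for site in sorted(set(healthy) | set(affected)):
        h = healthy.get(site, 0) / total_h * 1e6 + pseudocount
        a = affected.get(site, 0) / total_a * 1e6 + pseudocount
        result[site] = math.log2(a / h)
    return result


def candidate_biomarkers(healthy: Dict[str, int], affected: Dict[str, int],
                         min_abs_log2fc: float = 1.0) -> List[str]:
    """Sites whose occupancy differs by at least the (assumed) log2 fold-change cut-off."""
    changes = log2_fold_changes(healthy, affected)
    return [site for site, lfc in changes.items() if abs(lfc) >= min_abs_log2fc]
```

The output is a list of candidate sites for follow-up, not a validated biomarker panel.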
  • the cloning and sequencing are carried out as follows: a) the RNA of isolated cross-linked complexes is reverse-transcribed, thereby generating cDNA transcripts with one mutation wherein the photoreactive nucleoside is transcribed to a mismatched deoxynucleoside; b) cDNA transcripts are amplified, thereby generating amplicons; c) nucleotide sequences of the amplicons having at least 15 nucleotides are determined; d) sequences of the amplicons are aligned against a reference sequence; and e) sequences of the amplicons aligned against the reference sequence are analysed so as to identify the binding site, wherein each amplicon sequence having a mutation resulting from the introduction of the photoreactive nucleoside is considered to be a valid amplicon comprising at least a portion of a binding site on the RNA transcript, enabling single-nucleotide resolution of crosslinking sites.
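Steps c) to e) can be sketched as a small filter over sequenced amplicons. The code below is an assumption-laden toy: it uses a brute-force ungapped alignment with a hypothetical mismatch limit instead of a real read mapper, and the function names are illustrative. Only the 15-nucleotide minimum length and the rule that a valid amplicon carries the diagnostic mutation come from the text.

```python
from typing import List, Optional, Tuple

MIN_LENGTH = 15  # step c): only amplicon sequences of at least 15 nt are considered


def best_ungapped_offset(reference: str, read: str, max_mismatches: int = 2) -> Optional[int]:
    """Toy aligner: first offset with the fewest mismatches, or None if above the limit."""
    best, best_mm = None, max_mismatches + 1
    for off in range(len(reference) - len(read) + 1):
        mm = sum(1 for a, b in zip(reference[off:off + len(read)], read) if a != b)
        if mm < best_mm:
            best, best_mm = off, mm
    return best


def valid_amplicons(reference: str, reads: List[str],
                    ref_base: str = "T", read_base: str = "C") -> List[Tuple[str, int, List[int]]]:
    """Return (read, offset, conversion positions) for reads carrying the diagnostic mutation."""
    valid = []
    for read in reads:
        if len(read) < MIN_LENGTH:
            continue  # too short to be determined reliably
        off = best_ungapped_offset(reference, read)
        if off is None:
            continue  # no acceptable alignment to the reference
        sites = [off + i for i, base in enumerate(read)
                 if reference[off + i] == ref_base and base == read_base]
        if sites:
            valid.append((read, off, sites))
    return valid
```

Reads that align but carry no conversion are dropped here, which mirrors the idea of separating crosslinked fragments from background RNA.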
  • the identification of the sequence further comprises determining the sequence of a consensus motif, wherein the determination comprises using the mutation as an anchor and comparing the sequence surrounding the mutation to the reference sequence, wherein the mutation is within a sequence window that includes the mutation plus at least one nucleotide on either side of the mutation. In one embodiment the identification of the sequence is characterized in that the sequence window includes one to twenty nucleotides on either side of the mutation. One nucleotide downstream and one upstream would make a 3 nt recognition sequence. Such a sequence region could be sufficient for binding and is therefore relevant for the present invention. In one embodiment the identification of the sequence is characterized in that the mutation is at the center of the sequence window.
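A minimal sketch of the anchor-and-window idea follows, assuming the mutation positions on the reference are already known. The per-column majority vote used for the consensus is an illustrative simplification, not the motif-finding procedure of the disclosure, and both function names are hypothetical.

```python
from collections import Counter
from typing import List


def centered_windows(reference: str, sites: List[int], flank: int = 1) -> List[str]:
    """Windows of `flank` nt either side of each mutation site, with the site at the center.

    Sites too close to either end of the reference for a full window are skipped.
    """
    windows = []
    for pos in sites:
        if pos - flank >= 0 and pos + flank < len(reference):
            windows.append(reference[pos - flank:pos + flank + 1])
    return windows


def consensus(windows: List[str]) -> str:
    """Most frequent base per column; ties are resolved alphabetically."""
    if not windows:
        return ""
    columns = []
    for i in range(len(windows[0])):
        counts = Counter(window[i] for window in windows)
        columns.append(max(sorted(counts), key=counts.get))
    return "".join(columns)
```

With `flank=1` each window is the 3 nt recognition sequence mentioned above; larger flanks up to 20 give the wider windows of the other embodiment.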
  • the identification of the sequence is characterized in that the reference sequence is a genomic sequence.
  • the identification of the sequence is characterized in that the genomic sequence is a sequence that produced the RNA transcript.
  • the identification of the sequence is characterized in that the reference sequence is a synthetic RNA sequence.
  • the identification of the sequence is characterized in that the reference sequence is derived from an expressed sequence tag database. In one embodiment the identification of the sequence further comprises identifying a feature required for interaction of the protein-interaction site.
  • the identification of the sequence is characterized in that aligning the sequences of the amplicons comprises determining which amplicons have a mutation wherein a deoxythymidine or deoxyguanine of the reference sequence is replaced by a deoxycytidine or deoxyadenine, respectively, in the amplicons.
  • the photoreactive nucleoside is a thiouridine analog.
  • the thiouridine analog is 2-thiouridine; 4-thiouridine; or 2,4-dithiouridine.
  • the thiouridine analog is substituted at the 5 and/or 6 position with substituents selected from the group consisting of methyl, ethyl, halo, nitro, NR1R2 and OR3, wherein R1, R2 and R3 independently represent hydrogen, methyl or ethyl.
  • the photoreactive nucleoside is a thioguanosine analog.
  • the thioguanosine analog is 6-thioguanosine.
  • a further aspect of the invention relates to an in vitro method for identifying one or more proteins that physically interact with poly(A)+RNA, comprising: formation of poly(A)+RNA-protein complexes via cross-linking, binding and purification of poly(A)+RNA-protein complexes using poly(A)+RNA-binding oligonucleotides, preferably oligo(dT) oligonucleotides, removal of total RNA, and identification of proteins via mass spectrometry.
  • the proteins are separated by gel electrophoresis and/or enzymatically digested into peptide fragments, preferably with trypsin, and subsequently analysed via mass spectrometry, whereby protein identity is derived from comparing measured peptide mass to predicted peptide mass from a database.
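The comparison of measured and predicted peptide masses can be illustrated with an in-silico tryptic digest. The sketch assumes the common cleavage rule for trypsin (after K or R, not before P), no missed cleavages, unmodified residues and neutral monoisotopic masses. The mass tolerance and all function names are hypothetical, and real identifications rely on dedicated search engines rather than a lookup like this.

```python
from typing import Dict, List

# Monoisotopic residue masses in Da for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water per peptide (free N- and C-terminus)


def tryptic_peptides(protein: str) -> List[str]:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[residue] for residue in peptide) + WATER


def match_counts(measured: List[float], database: Dict[str, str],
                 tol_da: float = 0.01) -> Dict[str, int]:
    """Number of predicted tryptic peptides per protein matched by a measured mass."""
    counts = {}
    for name, sequence in database.items():
        predicted = [peptide_mass(p) for p in tryptic_peptides(sequence)]
        counts[name] = sum(1 for m in predicted
                           if any(abs(m - x) <= tol_da for x in measured))
    return counts
```

Proteins with the most matched peptides would be the strongest candidates in this simplified scheme.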
  • the method is characterised in that quantitative mass spectrometry is performed using SILAC, whereby a control sample is obtained from cells grown in culture medium comprising a suitable SILAC isotope that exhibits a different mass from the isotope in the medium of the cells used to obtain the sample to be analysed.
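The SILAC readout reduces to a per-protein intensity ratio between the two isotope channels. The sketch below assumes that summed intensities per protein are already available for the sample and control channels; the input layout, the function name and the protein labels in the usage example are hypothetical.

```python
import math
from typing import Dict


def silac_log2_ratios(sample: Dict[str, float], control: Dict[str, float]) -> Dict[str, float]:
    """log2(sample / control) for proteins quantified in both isotope channels."""
    ratios = {}
    for protein, s in sample.items():
        c = control.get(protein)
        if c and s > 0:  # skip proteins missing or zero in either channel
            ratios[protein] = math.log2(s / c)
    return ratios


# Usage with made-up intensities: protein_A is enriched 8-fold over the control.
example = silac_log2_ratios({"protein_A": 8.0, "protein_B": 1.0},
                            {"protein_A": 1.0, "protein_B": 1.0})
```

A positive value indicates enrichment in the analysed sample relative to the control; any threshold for calling a protein RNA-bound would have to be chosen and justified separately.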
  • a further aspect of the invention is therefore a poly(A)+RNA-interacting protein selected from Table S2, in particular the sub-group of Table S2 as a medicament or drug target, preferably for the treatment of a medical disorder associated with physical interaction between said protein and a poly(A)+RNA molecule.
  • the inventors utilise the fact that a photoreactive nucleoside undergoes a structural change upon crosslinking to protein, and is subsequently identified as a mutation in cDNA that is prepared from the modified mRNA.
  • This effect, together with the sequencing of cDNA and comparison of sequences to reference sequences, is disclosed in detail in WO 2010/014636, which is hereby incorporated by reference in its entirety.
  • the mutated cDNA can be analyzed by exploiting the mutation, thereby providing a means of distinguishing UV-crosslinked target sites from background RNA fragments that were captured but not initially crosslinked to the moiety. Such an analysis dramatically increases the recovery of target sites that were crosslinked, reduces the risk of scoring false positives of target sites, and allows for extraction of sequence information of the target site.
  • a protein that "physically interacts" or "binds" with the RNA refers to any substantially proteinaceous entity that binds to an RNA protein binding site.
  • proteins include, but are not limited to, proteins, protein complexes, or portions or fragments thereof, including protein domains, regions, sections and the like. Proteins include one or more RNA-binding proteins (RBP), RNA-associated proteins or combinations thereof.
  • a protein complex may comprise, for example, nucleic acid components in ribonucleoprotein complexes (RNP), e.g., miRNA, piRNA, siRNA, endo-siRNA, snoRNA, snRNA, tRNA, rRNA, ncRNA, lncRNA or combinations thereof.
  • RNA guides and participates in target RNA binding.
  • Protein complexes may also include RNA helicases, e.g. MOV10, and proteins containing nuclease motifs, e.g. SND1.
  • a "protein binding site" or "interaction site" refers to that portion, region, position or location of an RNA transcript in which at least one interaction with a protein occurs. Such interaction may include at least one direct interaction between a nucleotide of the RNA transcript and an amino acid of the protein.
  • a binding site or sites of an RNA transcript may be found at a structured or unstructured region of the RNA transcript. It is also contemplated that more than one binding site may exist for any one RNA transcript. Further, binding sites of RNA transcripts may involve non-contiguous nucleotides of the RNA transcript. Such binding sites are contemplated when structure, such as, for example, a stem loop, is involved in binding.
  • a "photoreactive nucleoside" refers to a modified nucleoside that contains a photochromophore and is capable of photocrosslinking with a protein.
  • the photoreactive group will absorb light in a spectrum of the wavelength that is not absorbed by the protein or the non-modified portions of the RNA.
  • the "living cell or cells" may be part of a cell culture, a cell extract, cell line, whole tissue, a whole organ, tissue extract, or tissue sample, such as, for example, a biopsy or progenitor cells as from bone marrow or stem cells.
  • the living cell can be from a healthy source or from a diseased source, such as, for example, a tumor, a tumor cell, a cell mass, diseased tissue, tumor cell extract, a pre-cancerous lesion, polyp, or cyst or taken from fluids of such sources.
  • the cells can be any kind of cells, for example, cells from bacteria and yeast, animals, especially mammalian cells, and plants.
  • RNA transcripts have been produced, or at a time at which transcription should have produced transcripts within the living cell or cells, the living cell or cells comprising the modified RNA transcripts are then irradiated.
  • the irradiation is at a wavelength which is significantly absorbed by the
  • the minimum wavelength can be 300 nm, preferably 320 nm, and more preferably 340 nm.
  • the maximum wavelength can be 410 nm, preferably 390 nm, and more preferably 380 nm. Any combination of minimum and maximum wavelength values can be used to describe a suitable range.
  • the optimal wavelength is approximately 330 nm for a thiouridine analog.
  • the optimal wavelength for a thioguanosine analog is approximately 310 nm.
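
The nested wavelength ranges above can be expressed as a small lookup. This is only a sketch: the text allows any combination of the listed minimum and maximum values, and the pairing of each minimum with the maximum of the same preference level, as well as the names `RANGES_NM` and `matching_ranges`, are assumptions made here for illustration.

```python
from typing import Dict, List, Tuple

# (minimum, maximum) in nm, pairing the values of equal preference level (assumed pairing).
RANGES_NM: Dict[str, Tuple[float, float]] = {
    "broad": (300, 410),
    "preferred": (320, 390),
    "more preferred": (340, 380),
}


def matching_ranges(wavelength_nm: float) -> List[str]:
    """Return the names of all ranges that contain the given irradiation wavelength."""
    return [name for name, (lo, hi) in RANGES_NM.items() if lo <= wavelength_nm <= hi]


if __name__ == "__main__":
    for wl in (310, 330, 365):
        print(wl, matching_ranges(wl))
```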
  • Irradiation forms covalent cross-links between the modified RNA transcript and a protein located close enough to said modified RNA transcript to undergo cross-linking.
  • the part or parts of a modified RNA transcript that are in close enough contact with a protein to have undergone cross-linking can be considered binding sites.
  • binding sites are covalently cross-linked to binding proteins.
  • Covalent cross-linking allows the use, in some embodiments of the present invention, of rigorous purification schemes, such as, for example, oligo(dT) oligonucleotide purification and separating complexes by SDS-PAGE.
  • the covalent bond enables partial cleavage of RNA molecules without affecting their protein binding by the use of nucleases.
  • the modified RNA transcripts, or portions thereof, which are not covalently cross-linked upon irradiation to one or more binding proteins are removed.
  • cross-linked segments or "RNA-protein complexes”
  • cross-linked segments include the portion of the modified transcript that comprises the binding site as well as at least the portion of the protein that was subject to cross linking.
  • the binding site therefore contains at least one photoreactive nucleoside through which the binding site is cross-linked to the protein.
  • the complexes also may include additional nucleotides of the modified RNA transcript that are not bound to the binding moiety.
  • the cross-linked segments are then isolated.
  • the preferred isolation method relates to isolation of poly(A)+ RNA-protein complexes using oligo(dT) oligonucleotides attached to a solid support material, preferably by forming a soluble extract of the cells, addition of poly(A)+RNA-binding antisense oligonucleotides attached to a solid support material to said extract, washing the RNA-protein complexes that are bound to said poly(A)+RNA-binding antisense oligonucleotides attached to a solid support material, and treating the extract with a nuclease thereby removing unbound poly(A)+RNA.
  • a "poly(A)+RNA molecule" is to be understood as any RNA molecule that comprises a poly(A) sequence attached to it.
  • the poly(A) sequence is commonly known as a tail that consists of multiple adenosine monophosphates; in other words, it is a stretch of RNA that has adenine bases.
  • polyadenylation is part of the process that produces mature messenger RNA (mRNA) for translation.
  • RNA-protein complexes are treated with a ribonuclease.
  • the nuclease trims the regions of the modified transcripts that are not cross-linked to binding proteins. It is contemplated, in one embodiment, that the nuclease would remove, or trim, the entire portion of a modified transcript that is not cross-linked to a binding moiety. However, since trimming can occur in various places on a modified RNA transcript which are not cross-linked to binding proteins, the population of "cross-linked segments" may include "cross-linked segments" with various species of "flanking segments".
  • the nuclease is ribonuclease I (Escherichia coli). Ribonuclease I preferentially hydrolyzes single-stranded RNA to nucleoside 3'-monophosphates via nucleoside 2', 3'-cyclic monophosphate intermediates.
  • Protein-RNA complexes are preferably enriched by ammonium sulphate precipitation and separated by electrophoresis, preferably SDS-PAGE, and blotted onto nitrocellulose to further remove non-crosslinked RNA.
  • Precipitation is known in the art for enriching proteins.
  • the present invention encompasses any precipitation method which leads to effective precipitation of RNA-protein complexes, and therefore preferably encompasses any given protein precipitation method.
  • Common protocols relate to acetone/TCA precipitation, chloroform/methanol, ammonium sulphate or ethanol precipitation. Further examples are given below.
  • Precipitation serves to concentrate and fractionate the target product from various contaminants.
  • the underlying mechanism of precipitation is to alter the solvation potential of the solvent and thus lower the solubility of the solute by addition of a reagent.
  • the solubility of proteins in aqueous buffers depends on the distribution of hydrophilic and hydrophobic amino acid residues on the protein's surface.
  • Hydrophobic residues predominantly occur in the globular protein core, but some exist in patches on the surface. Proteins that have high hydrophobic amino acid content on the surface have low solubility in an aqueous solvent. Charged and polar surface residues interact with ionic groups in the solvent and increase solubility. Knowledge of amino acid composition of a protein will aid in determining an ideal precipitation solvent and method. Salting out is the most common method used to precipitate a target protein. Addition of a neutral salt, such as ammonium sulphate, compresses the solvation layer and increases protein-protein interactions. As the salt concentration of a solution is increased, the charges on the surface of the protein interact with the salt, not the water, and the protein falls out of solution (precipitates).
  • the isoelectric point (pI) is the pH of a solution at which the net primary charge of a protein becomes zero. At a solution pH that is above the pI, the surface of the protein is predominantly negatively charged and therefore like-charged molecules will exhibit repulsive forces. Likewise, at a solution pH that is below the pI, the surface of the protein is predominantly positively charged and repulsion between proteins occurs. However, at the pI the negative and positive charges cancel, repulsive electrostatic forces are reduced and the attraction forces predominate.
  • the attraction forces will cause aggregation and precipitation.
  • the pI of most proteins is in the pH range of 4-6.
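
The relationship between pH, net charge and the isoelectric point can be illustrated with a rough Henderson-Hasselbalch estimate. This is a sketch under stated assumptions: the pKa values below are illustrative textbook-style numbers (published scales differ), the function names are hypothetical, and the model ignores structure and local environment, so it is not a substitute for measurement.

```python
from typing import Dict

# Assumed, illustrative pKa values; different published scales give different numbers.
PKA_POSITIVE: Dict[str, float] = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE: Dict[str, float] = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM = 8.6
PKA_C_TERM = 3.6


def net_charge(sequence: str, ph: float) -> float:
    """Estimate the net charge of a peptide at a given pH."""
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))    # protonated N-terminus
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))   # deprotonated C-terminus
    for residue in sequence.upper():
        if residue in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[residue]))
        elif residue in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[residue] - ph))
    return charge


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Find the pH at which the estimated net charge is zero, by bisection."""
    lo, hi = 0.0, 14.0
    # The estimated net charge decreases monotonically with pH, so bisection applies.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(sequence, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    for peptide in ("DDDEEE", "ACDEKR", "KKKRRR"):
        print(peptide, round(isoelectric_point(peptide), 2))
```

Below the estimated pI the computed charge is positive and above it negative, which mirrors the repulsion argument made in the text.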
  • Mineral acids such as hydrochloric and sulfuric acid are used as precipitants.
  • Addition of miscible solvents such as ethanol or methanol to a solution may cause proteins in the solution to precipitate.
  • the solvation layer around the protein will decrease as the organic solvent progressively displaces water from the protein surface and binds it in hydration layers around the organic solvent molecules.
  • the binding proteins are removed from the "isolated cross-linked segments" to generate "isolated segments."
  • the protein components of the binding proteins are removed by digesting the binding proteins with a protease.
  • digestion is effected by Proteinase K or a homologous enzyme. Proteinase K is capable of efficiently digesting protein binding proteins, liberating RNA and yielding RNA products.
  • proteases or their homologues include: Aspartyl proteases, caspases, thiol proteases, Insulinase family proteases, zinc binding proteases, Cytosol Aminopeptidase family proteases, Zinc carboxypeptidases, Neutral Zinc Metallopeptidases, extracellular matrix
  • Methionine aminopeptidases, Serine Carboxypeptidases, Cathepsins, Subtilases, Proteasome A-type Proteases, Proteasome B-type Proteases, Trypsin Family Serine Proteases, Subtilase Family Serine Proteases, Peptidases, and Ubiquitin carboxyl-terminal hydrolases.
  • the "isolated cross-linked segments" and/or the "isolated segments" are then reverse transcribed to generate cDNA transcripts.
  • the isolated cross-linked segments i.e., the segments to which a whole or partial binding moiety is attached.
  • the introduction of the photoreactive nucleoside yields a mutation in the cDNA transcript when the isolated crosslinked segment is reverse transcribed.
  • the thiouridine analog is reverse transcribed to a deoxyguanosine instead of the deoxyadenosine that is normally incorporated into the reverse transcribed cDNA by Watson-Crick base pairing.
  • the thioguanosine analog is reverse transcribed to a deoxythymidine instead of the deoxycytidine normally incorporated by Watson-Crick base-pairing. Therefore, the mutation within the cDNA transcript is located within a binding site.
  • the cDNA transcripts are then amplified, thereby generating cDNA amplicons.
  • when the thiouridine analog is reverse transcribed to produce the mutation of a deoxyguanosine instead of the deoxyadenosine, as described above, the respective cDNA transcripts, when amplified, will include a mutation wherein the expected deoxythymidine is replaced with a deoxycytidine in the amplicons.
  • when the thioguanosine analog is reverse transcribed to produce the mutation of a deoxythymidine instead of the deoxycytidine, as described above, the respective cDNA transcripts, when amplified, will include a mutation wherein the expected deoxyguanosine is replaced by a deoxyadenosine in the amplicons.
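
The link between the base misincorporated into the first-strand cDNA and the mutation read in the amplicons is simply base complementarity, which can be checked with a short sketch. The dictionary and function names are hypothetical and introduced here only for illustration.

```python
from typing import Dict, Tuple

COMPLEMENT: Dict[str, str] = {"A": "T", "C": "G", "G": "C", "T": "A"}

# First-strand cDNA base: (normally incorporated, incorporated opposite the analog),
# as stated in the text for each class of photoreactive nucleoside.
CDNA_CHANGE: Dict[str, Tuple[str, str]] = {
    "thiouridine": ("A", "G"),
    "thioguanosine": ("C", "T"),
}


def sense_strand_change(analog: str) -> Tuple[str, str]:
    """Return (expected base, observed base) as read on the sense strand of the amplicon."""
    expected, observed = CDNA_CHANGE[analog]
    return COMPLEMENT[expected], COMPLEMENT[observed]


if __name__ == "__main__":
    for analog in CDNA_CHANGE:
        ref, obs = sense_strand_change(analog)
        print(f"{analog}: {ref} to {obs}")
```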
  • the reverse transcription and amplification can be performed by methods known in the art. For example, the reverse transcription to generate cDNA transcripts and amplification can be achieved using linker ligation and RT-PCR thereby generating amplified cDNA transcripts.
  • first, synthetic oligonucleotide adapters of known sequence are ligated to the 3' and 5' ends of the small RNA pool using T4 RNA ligases.
  • the adapters introduce primer-binding sites for reverse transcription and PCR amplification.
  • the small RNA pool typically comprises contaminants resulting from the nuclease digests of very abundant transcripts and non-coding RNAs such as ribosomal RNAs. If desired, non-palindromic restriction sites present within the adapter/primer sequences can be used for generation of concatamers to increase the read length for conventional sequencing or longer size range 454 sequencing.
  • the attachment, or joining, of the adapter sequence to the "isolated cross-linked segments" and/or the “isolated segments” can be done in a variety of ways.
  • the adapter sequence can be attached either at the 3' or 5' ends, or in an internal position of "isolated cross-linked segments" and/or the "isolated segments.”
  • precautions can be taken to prevent circularization of 5' phosphate/3' hydroxyl small RNAs during adapter ligation.
  • chemically pre-adenylated 3' adapter deoxyoligonucleotides which are blocked at their 3' ends to avoid their circularization, can be used.
  • the use of pre-adenylated adapters eliminates the need for ATP during ligation, and thus minimizes the problem of adenylation of the pool RNA 5' phosphate that leads to circularization.
  • a truncated form of T4 RNA ligase 2, Rnl2(1-249), or an improved mutant, Rnl2(1-249)K227Q, can be used to minimize adenylate transfer from the 3' adapter 5' phosphate to the 5' phosphate of the small RNA pool and subsequent pool RNA circularization. See also International Patent Application No. PCT/US2008/001227, published as WO 2008/094599, which is incorporated herein by reference in its entirety.
  • the length of the adapter sequences will vary. In a preferred embodiment, adapter sequences range from about 6 to about 500 nucleotides in length, preferably from about 8 to about 100, and most preferably from about 10 to about 25 nucleotides in length.
  • the cDNA amplicons are then sequenced. The sequencing can be performed by any known means. In a preferred embodiment, the sequencing method will generate sequences of amplicons of at least about 20 nucleotides in length.
  • the amplicons can be sequenced using the "Illumina" massive parallel sequencing platform or other similar sequencing methods which yield 30 million sequences of 32, 36, 72 or 100 nucleotides in length per library and sequencing reaction.
  • Solexa/Illumina sequencing can also be carried out conveniently at a smaller scale processing a larger sample number, i.e. yielding about 1.5-150 million reads per sample. The larger sets are obtained if a full sequencing plate is used.
  • RNA cDNA libraries Tuschl, Identification of microRNAs and other small regulatory RNAs using cDNA library sequencing, Methods, 2008, 44:3-12.
  • the amplicons can be sequenced using pyrosequencing (454 sequencing, Roche), which provides up to 400,000 sequences of up to 250 nt in length for a single read. Data management and sequence analysis from small RNA cDNA libraries is best carried out in collaboration with an experienced computational biology laboratory.
  • the amplicons are then assessed in order to identify those that include the portion of the RNA transcript that binds to the binding moiety in vivo.
  • first, unique sequences are identified and counted.
  • the amplicons are filtered to remove irrelevant sequences (i.e., irrelevant amplicons).
  • the amplicon sequences can be filtered in accordance with any or all of the following rules:
  • the selected amplicons should have sufficient length to enable identification by means of sequencing or hybridization.
  • the selected amplicons should not have highly repetitive portion(s) within their sequence.
  • the selected amplicons should avoid sequences that may interfere with the manipulation of RNA and DNA while performing the invention (e.g. they should not have recognition sites for restriction endonucleases used during the manipulation process).
  • the amplicons are narrowed to those more likely to include the portion of the RNA transcript that binds to the binding moiety in vivo.
  • amplicons which are shorter than a certain number are removed, for example, less than 20 nucleotides or less than 15 nucleotides.
  • amplicons that do not map to a portion of the reference sequence being studied and/or amplicons that do not map to a portion of a known RNA sequence can be removed.
  • amplicons which contain highly repetitive portion(s) within their sequence (e.g., many multiples of TATA or GCGC) can be removed. Such sequences are referred to as "low entropic sequences".
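
The length and low-entropy filters described above can be sketched as follows. The text does not prescribe how "low entropic" is measured; using the Shannon entropy of the single-nucleotide composition, the 1.5-bit cutoff and the function names are all assumptions made for illustration. The 20-nucleotide minimum is one of the example values given in the text.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits) of the nucleotide composition of a sequence."""
    if not seq:
        return 0.0
    counts = Counter(seq.upper())
    total = len(seq)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def keep_amplicon(seq: str, maps_to_reference: bool = True,
                  min_length: int = 20, min_entropy_bits: float = 1.5) -> bool:
    """Apply the three filters: minimum length, mappability, and sequence complexity."""
    if len(seq) < min_length:
        return False               # too short to identify reliably
    if not maps_to_reference:
        return False               # does not map to the reference being studied
    return shannon_entropy(seq) >= min_entropy_bits  # drop repeats such as TATATA...


if __name__ == "__main__":
    for s in ("TATATATATATATATATATATA", "ACGTACGTACGTACGTACGTACGT", "ACGTACGT"):
        print(s, keep_amplicon(s))
```

Whether an amplicon maps to the reference would normally be decided by a read aligner; here it is passed in as a flag to keep the sketch self-contained.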
  • a "reference sequence" refers to any known sequence with which to compare an amplicon sequence.
  • the reference sequence may be derived from a genomic sequence, a transcriptome sequence, an expressed sequence tags (EST) database, a sequence from which the RNA transcript was extracted, a known sequence library, a synthetic nucleotide sequence, a randomized RNA sequence, or a known RNA sequence.
  • the human genomic sequence is being studied.
  • the amplicons with overlapping sequences are "clustered."
  • Clustering refers to grouping together and aligning overlapping sequences.
  • the quantities of amplicons in a particular cluster are then counted. For example, overlapping amplicon sequences, which differ by length simply because of a different point of digestion by a nuclease, can be counted as a cluster. In another embodiment, aligning sequences occurs without narrowing down the amplicons in quantity before analyzing the amplicons.
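
Clustering of overlapping amplicons and counting per cluster can be sketched as an interval merge over aligned coordinates. The representation of an aligned amplicon as a `(reference name, start, end)` tuple with a half-open interval, and the function name `cluster_reads`, are assumptions for this illustration.

```python
from typing import Dict, List, Tuple, Union

Read = Tuple[str, int, int]  # (transcript or chromosome, start, end), half-open interval
Cluster = Dict[str, Union[str, int]]


def cluster_reads(reads: List[Read]) -> List[Cluster]:
    """Group overlapping aligned amplicons into clusters and count the members."""
    clusters: List[Cluster] = []
    for chrom, start, end in sorted(reads):
        last = clusters[-1] if clusters else None
        if last is not None and last["chrom"] == chrom and start < last["end"]:
            # Overlaps the current cluster: extend it and count the amplicon.
            last["end"] = max(last["end"], end)
            last["count"] += 1
        else:
            clusters.append({"chrom": chrom, "start": start, "end": end, "count": 1})
    return clusters


if __name__ == "__main__":
    example = [("chr1", 100, 130), ("chr1", 120, 150), ("chr1", 200, 230),
               ("chr2", 100, 130), ("chr1", 105, 125)]
    for c in cluster_reads(example):
        print(c)
```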
  • Noise is the low frequency amplicon counts that are due to random degradation or RNA turnover products present as background in cross-linked RNA recovered from IP or gels.
  • noise is detected by the absence of a deoxythymidine to deoxycytidine mutation when using a thiouridine analog, such as 4-thiouridine, as the photoreactive nucleoside or by the absence of a deoxyguanosine to deoxyadenosine mutation when using a thioguanosine analog, e.g., 6-thioguanosine, as the photoreactive nucleoside.
  • Noise can also be detected by the absence of very sharp "peaks" at any given transcript. Noise is seen as a random distribution of amplicons along a transcript without characteristic mutations.
  • aligning the sequences of the amplicons includes determining which amplicons have a mutation (preferably, a mismatch mutation) when compared to the reference sequence.
  • aligning the sequences of the amplicons may include determining which amplicons have a mutation wherein a deoxythymidine of the reference sequence is replaced by a deoxycytidine in the amplicons, when a thiouridine analog, such as 4-thiouridine, is used as the photoreactive nucleoside.
  • aligning the sequences of the amplicons may include determining which amplicons have a mutation wherein a deoxyguanosine of the reference sequence is replaced by a deoxyadenosine in the amplicons when using a thioguanosine analog, e.g., 6-thioguanosine, as photoreactive nucleoside.
  • such amplicons that are determined to have a mismatch mutation when compared to the reference sequence are considered "valid amplicons."
  • aligning the sequences of the amplicons includes determining which amplicons have at least one mismatch mutation when compared to the reference sequence.
  • the step of aligning the sequences of the amplicons includes determining which amplicons have only one mismatch mutation when compared to the reference sequence.
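
Detecting amplicons with exactly one diagnostic mismatch can be sketched as a position-by-position comparison. The sketch assumes the amplicon has already been aligned to the reference without gaps, so both strings have equal length; the function names are hypothetical.

```python
from typing import List, Tuple


def mismatches(read: str, reference: str) -> List[Tuple[int, str, str]]:
    """List (position, reference base, read base) for every mismatching position."""
    if len(read) != len(reference):
        raise ValueError("read and reference must be aligned to equal length")
    pairs = zip(reference.upper(), read.upper())
    return [(i, ref, obs) for i, (ref, obs) in enumerate(pairs) if ref != obs]


def has_single_diagnostic_mismatch(read: str, reference: str,
                                   ref_base: str = "T", read_base: str = "C") -> bool:
    """True if the only mismatch is the expected conversion (default: T to C for 4-thiouridine)."""
    found = mismatches(read, reference)
    return len(found) == 1 and found[0][1:] == (ref_base, read_base)


if __name__ == "__main__":
    print(mismatches("ACGTCGCA", "ACGTTGCA"))
    print(has_single_diagnostic_mismatch("ACGTCGCA", "ACGTTGCA"))
```

For a thioguanosine analog the same check would be called with `ref_base="G"` and `read_base="A"`.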
  • a "mismatch" as used herein refers to a nucleic acid base at a specific position on an amplicon that differs from the nucleic acid base at the aligned position of the reference sequence.
  • the mismatch can be Adenosine, Guanosine, or Cytosine.
  • the mismatch between the amplicon and reference sequence may be due to deletions, insertions, substitutions, or frameshift mutations in the amplicon or reference sequence.
  • the sequences of the amplicons are then analyzed to determine the specific location on an RNA transcript that a given binding moiety binds in vivo, i.e., to determine the binding site. In this method, the amplicons are further narrowed down to find "valid amplicons."
  • a "valid amplicon" as used herein refers to an amplicon that is not noise, as described above.
  • a "valid amplicon" includes those having a mutation resulting from the introduction of the photoreactive nucleoside. For example, one method by which to find "valid amplicons" is to use the deoxythymidine to deoxycytidine mutation.
  • Another method by which to find "valid amplicons" is to use the deoxyguanosine to deoxyadenosine mutation.
  • Clustered amplicons with only a single mutation with respect to the "reference sequence," i.e., the deoxyguanosine to deoxyadenosine mutation, are located. It is considered that the mutation occurred upon reverse transcription, as described above. Such amplicons are also considered to be "valid."
  • these "valid amplicons" are assessed in view of the total number of sequences that aligned to the region at issue, i.e., the total amplicons in a particular cluster.
  • the total number of aligned sequences includes those sequences that have the mutation and those that do not have the mutation.
  • the quantity of total aligned amplicons, i.e., the total amplicons in a particular cluster. For example, a low percentage (e.g., 1% to 49%) is adequate to demonstrate a "valid amplicon" if the total quantity of aligned sequences is large (20 amplicons or more); and a high percentage (e.g., 50% to 100%) is adequate to demonstrate a "valid amplicon" if the total quantity of aligned sequences is small (19 amplicons or fewer). At least 10% of the sequences have to show the mutation to indicate a "valid amplicon."
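
The cluster-size-dependent rule above can be written as a small decision function. This is one possible reading of the text, not a definitive implementation: the default for large clusters uses the "at least 10%" statement (the text also quotes 1% as the low end of the range), and the function and parameter names are hypothetical.

```python
def cluster_is_valid(mutated: int, total: int, large_cluster: int = 20,
                     min_fraction_large: float = 0.10,
                     min_fraction_small: float = 0.50) -> bool:
    """Decide whether the fraction of mutated amplicons in a cluster is adequate."""
    if total <= 0 or not 0 <= mutated <= total:
        raise ValueError("need 0 <= mutated <= total and total > 0")
    fraction = mutated / total
    # Large clusters (20 or more amplicons) may pass with a low fraction;
    # small clusters (19 or fewer) need a high fraction.
    threshold = min_fraction_large if total >= large_cluster else min_fraction_small
    return fraction >= threshold


if __name__ == "__main__":
    for mutated, total in ((3, 25), (1, 25), (5, 10), (4, 10)):
        print(mutated, total, cluster_is_valid(mutated, total))
```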
  • "valid amplicons" are further analyzed in view of the "reference sequence" to determine the presence of a consensus motif or sequence within a binding site.
  • the binding site can be part of coding transcript or non-coding transcript of RNA.
  • the deoxythymidine to deoxycytidine mutation and/or the deoxyguanosine to deoxyadenosine mutation in the amplicon are used as an anchor for comparing the sequence surrounding the mutation to the "reference sequence." Such surrounding sequence is termed "sequence window."
  • the "sequence window" includes the mutation plus at least one nucleotide on either side of the mutation.
  • the number of nucleotides on either side of the mutation ranges from about 5 to about 20 nucleotides.
  • the mutation is at the center of the sequence window.
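
Extracting a sequence window anchored on the mutated position can be sketched as a simple slice of the reference. The function name and the default flank of 10 nucleotides (within the 5 to 20 range mentioned above) are assumptions; near the ends of the reference the window is truncated, so the mutation is then no longer exactly centered.

```python
from typing import Tuple


def sequence_window(reference: str, mutation_pos: int, flank: int = 10) -> Tuple[str, int]:
    """Return the window around a 0-based mutation position and the mutation's offset in it."""
    if not 0 <= mutation_pos < len(reference):
        raise IndexError("mutation position lies outside the reference")
    start = max(0, mutation_pos - flank)
    end = min(len(reference), mutation_pos + flank + 1)
    return reference[start:end], mutation_pos - start


if __name__ == "__main__":
    window, offset = sequence_window("ACGTACGTACGT", 5, flank=2)
    print(window, offset, window[offset])
```

The extracted windows are the kind of input that would then be passed to a motif search.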
  • Sequence identity and/or similarity is determined using standard techniques known in the art, including, but not limited to, the local sequence identity algorithm of Smith & Waterman, Adv. Appl. Math., 2:482 (1981), by the sequence identity alignment algorithm of Needleman & Wunsch, J. Mol. Biol., 48:443 (1970), by the search for similarity method of Pearson & Lipman, Proc. Natl. Acad. Sci.
  • motif searches are conducted for the extracted sequences by computational means known in the art.
  • methods used in conducting motif searches include CONSENSUS, multiple expectation maximization for motif elicitations (MEME) program, Gibbs sampling, PhyloGibbs sampling, Motif Discovery scan program (MDScan), or AlignACE (Roth, F. P., Hughes, J. D., Estep, P. W. & Church, G. M. Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation. Nat Biotechnol 16, 939-45 (1998)).
  • the MEME program finds conserved ungapped short motifs within a group of related, unaligned sequences (Bailey and Gribskov, 1998, J Comput Biol, 5:211-21).
  • MDScan for example, is used to identify sequence motifs from a set of identified genomic regions (Liu X S et al. (2002) Nat. Biotechnol., 20(8):835-9).
  • more than one algorithm may be used to identify motifs for the extracted sequences.
  • the analysis of the amplicon sequences can further include identifying a feature required for interaction of the binding site and the binding moiety.
  • evaluation of the consensus sequence of the binding site can reveal a structure, such as a stem loop, that may be required or involved in binding to the binding moiety.
  • the consensus motif of the binding site can be utilized for various clinical or research applications.
  • the binding site can be sequenced using patient DNA to identify mutations, deletions or insertions that may link a genetic alteration in an important, regulatory RNA segment to a disease condition. It is known that RNA binding proteins are essential regulators of proteins by binding to coding and non-coding RNAs and regulating their transcription, modification, splicing, nuclear export, transport and translation.
  • an RNA-binding protein known to affect the stability or translation of a gene can be utilized as a drug target for the regulation of the targets of the gene.
  • FIG. 1 Illustration of the experimental setup to identify the mRNA-bound proteome and its occupancy profile on RNA.
  • Transcripts were labeled with photoreactive nucleosides and proteins were crosslinked to RNA by 365 nm UV-irradiation.
  • mRNP complexes were isolated by cell lysis and oligo(dT)-precipitation under denaturing conditions.
  • mRNPs were eluted from the beads, nuclease-treated and analyzed by quantitative mass-spectrometry.
  • To identify the protein binding pattern on RNA, mRNPs were RNAse I treated, followed by proteinase K digest to remove RNA-bound proteins. RNA molecules were converted into a cDNA library and next-generation sequenced.
  • Figure 2 SDS-PAGE analysis of proteins crosslinked to polyadenylated RNA.
  • HEK293 cells were grown in medium supplemented with 4SU and/or 6SG and UV-irradiated at 365 nm. Cells were lysed using denaturing conditions and protein-mRNA complexes were isolated by oligo(dT)-precipitation. Protein-RNA complexes were eluted from oligo(dT) beads, treated with RNAse I, separated on a SDS gradient gel and visualized by silver-staining.
  • Figure 3 GAPDH mRNA depletion. qRT-PCR analysis of GAPDH mRNA in supernatants (SN1 to SN4) after each round of oligo(dT) bead precipitation (four in total) compared to GAPDH mRNA in extract before precipitations (input) shown as percent of input.
  • the error bars display the calculated maximum and minimum expression levels that represent the standard error of the mean expression level with a 95% confidence interval.
  • Figure 4 Western Blot analysis of FLAG/HA-tagged RNA-binding proteins QUAKING (QKI) and ARGONAUTE 2 (AGO2/EIF2C2) in input extract (I), supernatant after precipitation (S), and oligo(dT)-purified material (P) of UV-crosslinked and non-crosslinked cells.
  • Figure 5 Read count distribution over different RNA types. mRNA was purified either from total TRIzol-extracted RNA by a single oligo(dT) precipitation (mRNA seq), or by four rounds of oligo(dT) precipitation from cellular extract of UV-irradiated and non-irradiated cells (4SU+6SG UV and 4SU+6SG no UV, respectively). Crosslinked proteins were removed by Proteinase K digest prior to RNA analysis by next-generation sequencing of recovered RNA. The read count distribution over different RNA classes (mRNA, rRNA and other) was inferred by multiplying the FPKM values with the respective length of the longest transcript of a given gene.
  • Figure 6 4SU and 6SG-containing RNA was purified from oligo(dT) precipitated RNA of non-crosslinked cells by biotinylation and streptavidin-pulldown (4SU+6SG purified RNA) and analyzed by next-generation sequencing. The diagonal is shown as a yellow line for each pairwise comparison, whereas a LOESS regression line is shown in red.
  • FIG. 7 Summary of proteomic experiments. In two replicates the proteomic composition of oligo(dT)-precipitates was analyzed for "light" labeled crosslinked cells (experiments L1 and L2) and one experiment for "heavy" labeled crosslinked cells (H1). The overlap of identified proteins in different experiments is shown in the Venn diagram. Table indicates the number of identified and quantified proteins, as determined by SILAC ratios of proteins in each experiment.
  • Figure 8 Comparison of the log2 fold changes (LFC) of "heavy" to "light" SILAC ratios (H/L) of proteins quantified in biological replicates L1 and L2. Previously known RNA-binding proteins are indicated in green and known contaminants in red.
  • Figure 9 As in Figure 8. Proteins quantified in L1 plotted against proteins quantified in label swap experiment H1.
  • Figure 11 Overview of identified mRNA-interacting proteins. Number of identified proteins belonging to different functional categories.
  • Figure 12 Overview of identified mRNA-interacting proteins. Median relative number of protein molecules belonging to different functional categories, shown as box plots. Protein amounts were calculated as the sum of all peptide peak intensities divided by the number of theoretically observable tryptic peptides (Schwanhäusser et al., 2011). The median is shown as horizontal line and the surrounding box defines the upper and lower quartile. The sample range is defined by the whiskers, while dots indicate potential outliers.
  • Figure 13 Overview of identified mRNA-interacting proteins. Overlap of identified mRNA binders with proteins present in spliceosome and nucleolus.
  • Figure 14 Overview of identified mRNA-interacting proteins. Number of identified proteins with specific RNA-binding domains (dark grey) was compared to respective number of RNA-binding domain containing proteins in expressed HEK293 proteome (light grey).
  • Figure 15. Validation of RNA-binding activity of candidate mRNA binders. RNA-binding activity of candidate mRNA binders was determined by PAR-CLIP. Protein-RNA complexes were separated by SDS-PAGE and blotted onto nitrocellulose membrane. Western analysis using an anti-HA antibody confirmed the correct size and equal loading of the IPed protein. Phosphorimaging indicated efficient radioactive labeling of covalently bound nucleic acid in the mRNP complex. The assay was performed at least twice for each protein. Representative results are shown. CAPRIN1, HNRNPD, HNRNPR, HNRNPU and MYEF2 served as positive controls.
  • FIG. 18 PAR-CLIP analysis of candidate RNA-binding proteins. Distribution of mRNA binding sites based on PAR-CLIP sequence clusters for the indicated proteins are shown. Absolute number and percentage distribution of sequence clusters in different transcript regions are indicated.
  • FIG. 19 PAR-CLIP analysis of candidate RNA-binding proteins. PAR-CLIP sequence coverage along transcript regions is shown for ALKBH5 and C22orf28.
  • Figure 20 PAR-CLIP analysis of candidate RNA-binding proteins. Genome browser view of spliced and unspliced XBP1 transcript isoforms. Putative C22orf28 binding sites flanking the XBP1 intron are indicated in dark grey.
  • FIG. 21 Specific T-C transitions in protein occupancy profiling sequencing reads. Specific mismatches in aligned sequence reads demonstrate efficient protein-RNA crosslinking. The frequency of nucleotide mismatches in occupancy profiling reads aligned to human genome is shown for library 1. T-C mismatches are the signature of efficient crosslinking of 4SU-labeled RNA to protein.
  • Figure 22 Specific T-C transitions in protein occupancy profiling sequencing reads. Specific mismatches in aligned sequence reads demonstrate efficient protein-RNA crosslinking. The frequency of nucleotide mismatches in occupancy profiling reads aligned to human genome is shown for library 2. T-C mismatches are the signature of efficient crosslinking of 4SU-labeled RNA to protein.
  • Figure 23 Mapping of protein occupancy profiling sequence reads. Distribution of mapped sequence reads to different RNA types for library 1.
  • Figure 24 Mapping of protein occupancy profiling sequence reads. Distribution of mapped sequence reads to different RNA types for library 2.
  • Figure 25 Comparison of exonic to intronic read counts in protein occupancy profiling libraries (related to Fig. 6). Comparison of exon versus intron read count for occupancy libraries 1 and 2.
  • Figure 26 Correlation of protein occupancy profiling sequence coverage between two libraries. Density of transcript-wise rank correlation coefficients based on sequence coverage of two protein occupancy profiling libraries between corresponding (black solid line) and unrelated (grey dashed line) transcripts. A sliding window approach was used to compare sequence coverage over entire transcripts. Solid vertical lines indicate medians, dashed vertical lines the 5% and 95% quantiles, respectively.
  • FIG. 27 Correlation of protein occupancy profiling sequence coverage between two libraries.
  • Figure 29 Correlation of position-specific number of T-C transitions in two protein-occupancy profiling libraries. Scatterplot of absolute numbers of position-specific T-C transition events for all T positions inside transcripts, which showed at least 2 transitions in one of the two replicates. The solid line indicates the best linear fit. Pearson correlation coefficient is indicated.
  • FIG. 30 Detailed view of occupancy profile on EEF2 gene.
  • Browser view of genomic region encoding EEF2 gene Human genome 18.
  • Track A shows consensus T-C transition profile (number of T-C transitions).
  • Track B shows consensus sequence coverage.
  • Track C shows Phastcon conservation of placental mammals.
  • FIG 31 Detailed view of occupancy profile on EEF2 3'UTR.
  • Browser view of genomic region encoding 3'UTR of EEF2 gene Human genome 18.
  • Track A shows consensus T-C transition profile (number of T-C transitions).
  • Track B shows consensus sequence coverage.
  • Track C shows Phastcon conservation of placental mammals.
  • FIG 32 Detailed view of occupancy profile in 3'UTRs (related to Fig. 11). Browser view of genomic region encoding 3'UTR of EEF2 (Human genome 18). Tracks A and B show T-C transition profiles (number of T-C transitions) for libraries 1 and 2, respectively. Tracks C and D show sequence coverage for libraries 1 and 2, respectively. Track E shows Phastcon conservation of placental mammals.
  • Figure 33 Detailed view of occupancy profile on CBX3 3'UTR. Browser view of genomic region encoding 3'UTR of CBX3 gene (Human genome 18). Track A shows consensus T-C transition profile (number of T-C transitions). Track B shows consensus sequence coverage. Track C shows Phastcon conservation of placental mammals.
  • Figure 34. TP53 3'UTR. Browser view of genomic region encoding 3'UTR of TP53 gene (Human genome 18).
  • Track A shows consensus T-C transition profile (number of T-C transitions).
  • Track B shows consensus sequence coverage.
  • Track C shows Phastcon conservation of placental mammals.
  • Track D shows binding sites of individual RNA binding proteins. Black boxes indicate experimentally verified binding sites of RNA binding proteins HuR and RBM38. White boxes indicate binding sites of HuR identified by PAR-CLIP.
  • FIG. 35 T-C transition probability around microRNA target sites. Probability of observing T-C transitions around miRNA binding sites in protein occupancy profiling data. microRNA target sites are indicated by bold black line.
  • FIG. 36 T-C transition probability around microRNA target sites. Probability of observing T-C transitions around miRNA binding sites in AGO PAR-CLIP data. microRNA target sites are indicated by bold black line.
  • FIG. 37 T-C transition density on different transcript regions. Relative density of T-C transitions along different transcript regions, observed in protein occupancy profiles. Thin black line indicates entire transcript, thick black line indicates 5'UTR, dashed grey line indicates CDS, thick grey line indicates 3'UTR.
  • Figure 38 Number of crosslinking sites observed in 3'UTRs compared to number of available thymidines. Number of 3'UTR uridine positions with indicated number of consensus T-C transitions.
  • Figure 39 Conservation of crosslinked thymidines in protein occupancy profiles. Comparison of PhyloP score of 3mer sequences centered on crosslinked T (dashed grey line) to random non-crosslinked 3mers (black line) is shown. The p-value indicates the significance of the difference of the PhyloP score distribution between crosslinked and control regions as given by a two-sample Kolmogorov-Smirnov test.
  • Figure 40 Detailed view of occupancy profile around trait/disease-associated SNP rs9299.
  • Track A shows consensus T-C transition profile (number of T-C transitions).
  • Track B shows consensus sequence coverage.
  • rs9299 black box below track B
  • Track C shows Phastcon conservation of placental mammals.
  • Figure 41 Detailed view of occupancy profile around trait/disease-associated SNP rs8321.
  • Track A shows consensus T-C transition profile (number of T-C transitions).
  • Track B shows consensus sequence coverage.
  • rs8321 black box below track B
  • Track C shows Phastcon conservation of placental mammals.
  • FIG 42 Detailed view of protein occupancy on ACTB 3'UTR in HEK293 and MCF7 cells.
  • Tracks A and B show T-C transition profiles in HEK293 and MCF7 cells, respectively.
  • Tracks C and D show sequence coverage in HEK293 and MCF7 cells.
  • Track E shows Phastcon conservation of placental mammals.
  • Bottom panel shows zoom into a 50 nt region within the 3'UTR of ACTB.
  • Tracks F and G show T-C transition profiles for HEK293 and MCF7 in zoom in region.
  • Figure 43 Detailed view of protein occupancy on ACTB 3'UTR in HEK293 and MCF7 cells.
  • Tracks A and B show T-C transition profiles in HEK293 and MCF7 cells, respectively.
  • Tracks C and D show sequence coverage in HEK293 and MCF7 cells.
  • Track E shows Phastcon conservation of placental mammals.
  • Bottom panel shows zoom into a 20 nt region within the 3'UTR of ACTB.
  • Tracks F and G show T-C transition profiles for HEK293 and MCF7 in zoom in region.
  • FIG 44 Detailed view of protein occupancy on Smg7 3'UTR in undifferentiated and differentiated mouse embryonic stem (ES) cells.
  • Tracks A and B show T-C transition profiles in undifferentiated and differentiated mouse ES cells, respectively.
  • Tracks C and D show sequence coverage in undifferentiated and differentiated mouse ES cells, respectively.
  • Track E shows Phastcon conservation of placental mammals.
  • Bottom panel shows zoom into a 100 nt region within the 3'UTR of Smg7.
  • Tracks F and G show T-C transition profiles in undifferentiated and differentiated mouse ES cells, respectively.
  • Protein-denaturing conditions during the purification ensure a stringent isolation of proteins in direct contact with mRNA through covalent bonds and thus enable the identification of the mRNA-interacting proteins by mass spectrometry (Figure 1).
  • 4SU-labeled RNA crosslinked to proteins can readily be identified by characteristic T to C transitions in cDNA (Hafner et al., 2010), providing a way to globally identify the RNA binding sites of the mRNA-bound proteome (Figure 1).
  • We initially tested this approach by purifying protein-mRNA complexes using magnetic oligo(dT) beads from UV-irradiated and non-irradiated intact human embryonic kidney (HEK) 293 cells after growth in medium supplemented with or without 4SU and 6SG.
  • As expected, when probing the oligo(dT) precipitate for the presence of known RNA-binding proteins by Western analysis, we were able to detect the heterogeneous nuclear ribonucleoprotein K (HNRNPK).
  • The Argonaute protein AGO2/EIF2C2 was not detectable after a single oligo(dT) pull-down, likely due to the insufficient precipitation of mRNAs and/or incomplete capture of mRNAs with shortened poly(A) tails, like microRNA/AGO-targeted mRNAs.
  • The GAPDH transcript is abundant and targeted by AGO proteins (Hafner et al., 2010; Kishore et al., 2011).
  • Figure 3 shows that only about 70 % of this transcript was depleted in the supernatant when compared to input RNA.
  • Three additional consecutive pull-downs from the same extract reduced the amount of GAPDH mRNA in the supernatant to about 5% (Figure 3).
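  • As a rough worked calculation of the depletion figures quoted above (about 30% of GAPDH mRNA left after the first pull-down, about 5% left after three more), the following sketch estimates the average fraction retained per additional round. It assumes, purely for illustration, that each additional round removes a constant fraction of what remains; the function name is hypothetical and not part of the described method.

```python
# Fractions of GAPDH mRNA left in the supernatant, taken from the text above.
remaining_after_round_1 = 0.30  # about 70% depleted by the first pull-down
remaining_after_round_4 = 0.05  # about 5% left after three more pull-downs


def mean_retained_per_round(start: float, end: float, rounds: int) -> float:
    """Geometric-mean fraction left behind per round between two measurements.

    Assumption: every round leaves the same fraction of its input uncaptured.
    """
    if start <= 0 or end <= 0 or rounds <= 0:
        raise ValueError("fractions and round count must be positive")
    return (end / start) ** (1.0 / rounds)


retained = mean_retained_per_round(
    remaining_after_round_1, remaining_after_round_4, 3
)
print(f"mean fraction retained per additional round: {retained:.2f}")
print(f"mean fraction captured per additional round: {1 - retained:.2f}")
```

  • Under this constant-fraction assumption, each additional round captures a little under half of the remaining transcript, which is less than the first round and is in line with the rationale for pooling several consecutive purifications.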
  • A Western analysis of the pooled eluates of four oligo(dT) purifications validated the presence of AGO2 protein (Figure 4) as well as the RNA-binding protein QUAKING (QKI), indicating that a single or multiple consecutive oligo(dT) purifications are required to precipitate crosslinked AGO protein.
  • RNA removal such as via enzymatic digestion and/or precipitation followed by SDS-PAGE and transfer to nitrocellulose membranes, enabled a significant increase in sensitivity of the RNA sequences bound by protein.
  • RBPs the small nuclear ribonucleoprotein polypeptide E (SNRPE), the U3 RNA-binding protein PDCD11, ELAVL3, RBM16, PA2G4, and RBPMS.
  • RNA-interacting proteins present in complexes that influence surveillance and translation of spliced mRNAs.
  • EIF4A1, EIF4B, EIF4E, EIF4G1, and EIF4H all of which are present in the translation initiation complex.
  • the complete set of 21 HNRNP proteins, which have diverse functions in mRNA processing and transport, were discovered in this analysis.
  • the identified mRNA binders only partially overlapped with sets of proteins found in nuclear RNA-containing structures. 99 out of 172 proteins detected in spliceosomal B and C complexes (Bessonov et al., 2008), were observed to interact with mRNA (Figure 13).
  • 243 identified mRNA interactors were also found in the nucleolus proteome (Andersen et al., 2005) (Figure 13).
  • In addition to the expected mRNA-interacting proteins, we identified 267 proteins (Table S2), which have not been previously annotated as RNA-binding (Fig. 3A, others). 80% of these proteins were detected in at least 2 out of 3 proteomic analyses and about 50% were observed in all three pull-downs (Table S2).
  • We applied an adaptation of a multiple association network integration algorithm (Mostafavi et al., 2008) to predict proteins with RNA-binding function, using gene ontology data, Interpro and Pfam domain data, gene coexpression, protein-protein interaction, and structural similarity data (Drew et al., 2011).
  • RNA-binders that use new or highly divergent RNA-binding domains that occupy novel regions of the known protein association networks, Table S2).
  • Some of our discoveries include proteins that are functionally annotated as transcription factors (JUN, NXF1), protein kinases (FASTKD1, FASTKD2, FASTKD5), DNA repair proteins (XRCC5, XRCC6 and PRKDC), an oxygenase (ALKBH5), a ubiquitin-specific protease (USP10), and a phosphatase (DUSP14). Additionally, several proteins encoded by uncharacterized open reading frames (C1orf35, C16orf80, C11orf31, C9orf114, C19orf47) were observed to be RNA binding.
  • Over-representation of nucleic acid binding domains
  • HMG-box a.21.1
  • SSRP1 structure specific recognition protein 1
  • the "Winged Helix” DNA-binding protein is present in a number of RNA helicases.
  • the AlbA-like fold was found in POP7 and in C9orf23. Notably, the AlbA-like superfamily had already previously been suggested to be involved in RNA binding (Aravind et al., 2003).
  • FIG. 13 shows that the majority of RNA-binding domain-containing proteins theoretically expressed in HEK293 were identified by our analyses.
  • RNA-bound proteome connects posttranscriptional regulation to DNA-related processes
  • the PPI sub-network for members linked to the term "response to DNA damage” (GO ID 6974) has been generated (not shown).
  • Central to this network are XRCC6/Ku70, XRCC5/Ku80, and the DNA-activated protein kinase (PRKDC). These proteins were identified in each of the three proteomics analyses. Besides their role in DNA double strand break repair and recombination, the proteins have been shown to interact with RNA structures, such as the RNA-stem loop region in yeast telomerase TLC1 and the RNA-component of human telomerase (hTR) (Ting et al., 2005).
  • XRCC6 had been suggested to bind internal ribosomal entry site (IRES) elements and is likely involved in the regulation of IRES-mediated mRNA translation (Silvera et al., 2006).
  • XRCC6 harbors a DNA/RNA-binding SAP domain, which was a significantly over-represented domain in the mRNA-bound proteome (Table 1).
  • To validate the RNA-binding activity of a subset of the identified proteins, we applied a crosslinking-immunoprecipitation (CLIP) assay.
  • As positive controls in this CLIP assay, we used five RNA-binding proteins: CAPRIN1 (Shiina et al., 2005), HNRNPD/AUF1 (Knapinska et al., 2011), HNRNPR (Hassfeld et al., 1998), HNRNPU (Kiledjian and Dreyfuss, 1992), as well as MYEF2, which is a transcriptional repressor (Haas et al., 1995) with an RNA recognition motif (RRM) domain.
  • AKAP8L, FAM98A, USP10, SART1, YTHDF2, and ZC3H7B were previously found to be present in complexes containing RNA-binding proteins, suggesting that these proteins themselves can interact with RNA.
  • crosslinked proteins possess enzymatic activities: ALKBH5 (2-oxoglutarate oxygenase), C22orf28 (RNA ligase), CSNK1E (kinase), MKRN2 (ubiquitin ligase), PRDX1 (peroxidase), and USP10 (ubiquitin thioesterase).
  • RNA-binding proteins have been implicated in transcriptional regulation either by inhibition of histone deacetylases (KIAA1967) or by acting as transcription factor (BTF3, MYBBP1A, and EDF1). Since EDF1 encodes a prokaryotic-type helix-turn-helix motif, suggesting that this protein may function in DNA binding, we further examined the nature of the crosslinked nucleic acid. When we incubated the immunoprecipitate with RNAse I, but not with DNAse I, the radioactive signal of the ribonuclease-treated complex was reduced, indicating that EDF1 was crosslinked to RNA. In addition, our data indicated that two proteins, C17orf85 and IFIT5, whose molecular functions are unknown, were crosslinked to RNA.
  • RNA-binding sites of several novel mRNA interactors
  • To confirm that a subset of our newly identified RNA binders indeed bind mRNA transcripts and to identify their binding sites at high resolution, we applied photoactivatable-ribonucleoside-enhanced crosslinking and immunoprecipitation (PAR-CLIP) in combination with next generation sequencing (Hafner et al., 2010).
  • crosslinking of 4SU-labeled RNA to proteins leads to specific T to C transition events in cDNA sequences, marking the protein binding site on the target RNA (Hafner et al., 2010).
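  • The T to C signature described above can be illustrated with a minimal sketch that tallies substitutions between reference segments and aligned reads and reports the share that are T to C. The function names and the toy alignments are hypothetical and made up for illustration; the sketch assumes gap-free alignments written in cDNA letters and is not the analysis pipeline used for the reported data.

```python
from collections import Counter


def count_mismatches(alignments):
    """Tally reference>read substitutions over gap-free alignments.

    `alignments` is an iterable of (reference_segment, read) string pairs of
    equal length, both written in cDNA (DNA) letters.
    """
    counts = Counter()
    for ref, read in alignments:
        if len(ref) != len(read):
            raise ValueError("reference segment and read must have equal length")
        for r, q in zip(ref.upper(), read.upper()):
            # Count only unambiguous base substitutions.
            if r != q and r in "ACGT" and q in "ACGT":
                counts[f"{r}>{q}"] += 1
    return counts


def tc_fraction(counts):
    """Share of all observed substitutions that are T>C."""
    total = sum(counts.values())
    return counts["T>C"] / total if total else 0.0


# Toy data (invented): three aligned reads with four substitutions in total.
toy_alignments = [
    ("ACGTTGCA", "ACGCTGCA"),
    ("TTAGGCAT", "TCAGGCAC"),
    ("GGATCCAA", "GGATCCAT"),
]
toy_counts = count_mismatches(toy_alignments)
print(dict(toy_counts), tc_fraction(toy_counts))
```

  • A high T>C share relative to the other eleven substitution types is what the mismatch-frequency plots for the occupancy libraries are meant to show.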
  • ZC3H7B was previously shown to form a ternary complex with the translation initiation factor EIF4G and the Rotavirus nonstructural protein NSP3 in virus infected cells (Vitour et al., 2004).
  • ALKBH5 is a 2-oxoglutarate-dependent oxygenase and a direct target of hypoxia-inducible factor 1α (HIF-1α) (Thalhammer et al., 2011).
  • the ALKBH5 and C17orf85 binding sites were preferentially distributed to the distal 5' region of CDSs (Figure 20).
  • C22orf28, also known as HSPC117, is the essential subunit of a human tRNA splicing ligase complex (Popow et al., 2011).
  • a closer inspection of the C22orf28 target transcripts revealed that the ligase contacts the X-box binding protein 1 (XBP1) mRNA.
  • two of the C22orf28 RNA binding sites in XBP1 are flanking an intron (Figure 20), which is removed by endoplasmic reticulum stress-induced unconventional cytoplasmic splicing (Yoshida et al., 2001).
  • Our findings suggest that the protein is the elusive ligase in this enzyme-mediated splicing event.
  • Protein occupancy profiling on mRNA reveals widespread binding to 3'UTRs
  • Figure 26 shows the density distribution of rank correlation coefficients for corresponding transcripts in both experiments (median 0.712) compared to the correlation of randomly selected unrelated transcripts (median 0.015).
  • We compared the median coverage over entire transcripts (median of all windows for each transcript) between replicate experiments (Figure 27) and obtained a rank correlation coefficient of 0.984, suggesting a high degree of similarity between replicate experiments, both in coverage signal for individual transcript regions and overall transcript sequence coverage.
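  • The replicate comparison described above (sliding windows over a transcript, then a rank correlation between the two libraries) can be sketched as follows. The window size, step, function names and coverage values are assumptions chosen for illustration, since the text does not state them; the rank correlation shown is Spearman's coefficient with average ranks for ties.

```python
def window_means(coverage, size, step):
    """Mean coverage in sliding windows of `size` positions, moved by `step`."""
    return [
        sum(coverage[i:i + size]) / size
        for i in range(0, len(coverage) - size + 1, step)
    ]


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns NaN when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented per-nucleotide coverage of one transcript in two libraries.
library_1 = [0, 2, 5, 9, 12, 8, 3, 1, 0, 4, 7, 10]
library_2 = [1, 3, 6, 8, 14, 7, 2, 1, 1, 3, 9, 11]
rho = spearman(window_means(library_1, 4, 2), window_means(library_2, 4, 2))
print(f"transcript-wise rank correlation: {rho:.3f}")
```

  • Repeating this per transcript gives the distribution of coefficients for corresponding transcripts; pairing each transcript with an unrelated one gives the control distribution.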
  • TASs trait/disease-associated SNPs
  • rs9299 and rs8321 are TASs that are located in the 3'UTRs of HOXB5 and ZNRD1, respectively. rs9299 has been reported to be linked to childhood obesity (Bradfield et al., 2012), while rs8321 was described to be associated with AIDS progression (Limou et al., 2009).
  • the present method is associated with unexpected advantages and delivers novel results in light of the prior art.
  • Differential protein occupancy profiling in human MCF7 breast cancer cells and HEK293 human embryonic kidney cells has been carried out using the method of the present invention.
  • changes can be observed in particular regions, indicating potentially relevant functional consequences on (m)RNA processing.
  • the present invention also offers an unbiased search for differentially occupied regions, via crosslinking by RNA-binding proteins rather than ribosomes.
  • Differential protein occupancy profiling has also been carried out in undifferentiated and differentiated mouse embryonic stem cells.
  • the method provides an analysis of the role of cis-regulatory RNA sequence elements and trans-acting RNA-binding proteins (RBP) that effect post-transcriptional regulation in the context of self-renewal and cell fate decisions.
  • a protein occupancy profiling approach of present invention enables determination of differentially bound regions in undifferentiated and differentiated mouse embryonic stem cells (see Figure 44).
  • the observations obtained by this approach shed light on mechanisms by which RNA-protein interactions provide the highly selective control of basic cellular processes needed for development and differentiation.
  • the knowledge of critical RNA-based network modules might facilitate the development of more rational pluripotent cell- based differentiation strategies for treating diseases.
  • SILAC-based proteomics allowed us to quantify the enrichment of proteins in oligo(dT) precipitations from UV-irradiated cells relative to a non-irradiated control population. After applying stringent enrichment cutoff criteria, we ended up with a list of 801 proteins highly enriched in oligo(dT) precipitations from UV-irradiated cells.
  • RNA in the oligo(dT) precipitate and RNA crosslinked to the co-purified proteins showed that the majority of transcripts were derived from protein-coding genes. Close to 90% of the identified proteins were observed in at least two mRNA pulldowns of crosslinked cells compared to those of non-crosslinked cells. As expected, a majority, about 70%, of the mRNA binders were proteins previously described to interact with RNA based on their function as RNA-binding proteins, helicases, nucleases and RNA-modifying enzymes. In addition to known RNA-binding domains, our analyses on the enrichment of structural folds and domains revealed several unexpected structures among the identified mRNA binders.
  • KIAA1967, also known as Deleted in Breast Cancer 1 (DBC1), was initially identified as an inhibitor of the histone deacetylase SIRT1 (Kim et al., 2008). Recent work showed that KIAA1967 and SIRT1 play reciprocal roles as major regulators of estrogen receptor α activity (Ji Yu et al., 2011). Initial PAR-CLIP results showed that KIAA1967 directly interacts with mRNA sequences (unpublished). Another new RNA binder is the Myb-binding protein 1a (MYBBP1A).
  • MYBBP1A interacts with and regulates the activity of several transcription factors, including c-Myb (Favier and Gonda, 1994), and NFKB (Owen et al., 2007).
  • EDF1, also identified as RNA-binding, interacts with the basic leucine zipper proteins ATF1, c-Jun, and c-Fos, and acts as a transcriptional coactivator (Kabe et al., 1999). It is presently unknown by what mechanism these proteins modulate transcription and whether the RNA binding function is required for this activity.
  • ALKBH5, found only in vertebrates, possibly functions in oxidative RNA demethylation, since it shows similarity to the Escherichia coli DNA-methylation repair enzyme AlkB and possesses 2-oxoglutarate oxygenase activity (Thalhammer et al., 2011).
  • our set of mRNA binders also included the methyltransferase, NSUN2.
  • Despite its narrow substrate range, catalyzing a 5-methylcytosine modification on tRNAs, NSUN2 might have a broader role in mRNA modification as evidenced by a recent finding of widespread occurrence of 5-methylcytosine in human mRNA (Squires et al., 2012).
  • RNA modifications might be more prevalent in mRNA than anticipated. Further experiments are needed to examine the RNA substrates of these enzymes and their impact on posttranscriptional regulation. Complementing the identification of the mRNA-bound proteome, we were able to determine the mRNA regions that can crosslink to proteins in HEK293 cells. To our knowledge this is the first time that transcriptome-wide protein binding patterns on mRNAs are being reported. One of the most interesting outcomes was that, during the life cycle of an mRNA molecule, widespread regions of the 3'UTRs provide sites for RNA-binding proteins.
  • transcripts are generally bound and regulated by multiple RNA-interacting proteins (Keene, 2007).
  • the combinatorial assembly of cis-regulatory factors which takes place in a spatial and time-resolved manner, determines the fate of an mRNA molecule.
  • Untranslated regions of protein-coding transcripts seem to provide ample sequence elements for proteins to bind and to function in the regulation of mRNA biogenesis, localization, decay and translation.
  • comprehensive high resolution mapping of protein-RNA interactions using different CLIP approaches led to the discovery of sites of protein-RNA interactions that control distinct posttranscriptional processes.
  • the presented protein occupancy profile on mRNA narrows the genomic sequence search space for cis-regulatory elements in untranslated mRNA regions. As our data indicated, the identification of occupied mRNA sites will be very valuable for the examination of rapidly emerging data on genetic variation between individuals. Some polymorphic variations within a population possibly contribute to complex traits and diseases by impacting posttranscriptional and/or translational regulation of gene expression.
  • the identification of the mRNA-bound proteome and its occupancy profile on protein-coding transcripts offers a systems-wide view on the protein-mRNA interactome, describing its components and the RNA sites of interactions. Using this approach in the future will greatly contribute to a better understanding of cellular functions of mRNP complexes with the goal to elucidate the posttranscriptional regulatory code that defines growth, differentiation and disease.
  • Oligonucleotides, plasmids and antibodies All oligonucleotides, plasmids and antibodies are described in the Supplemental Information. Plasmids are made available through Addgene (www.addgene.com).
  • Human embryonic kidney (HEK) 293 cell lines that allow stable inducible expression of His/FLAG/HA-tagged proteins were generated using the Flp-In System (Invitrogen). For mass spectrometry, cells were grown in SILAC medium as described in (Ong et al., 2002).
  • mRNA was isolated from TRIzol extracted total RNA using oligo(dT) Dynabeads (Invitrogen) as recommended by the manufacturer or by direct precipitation from cell lysates as described for the isolation of mRNA-bound proteins. 4SU- and 6SG-containing RNA was further isolated from non-crosslinked RNA by biotinylation followed by streptavidin-pulldown as described in (Dolken et al., 2008) and as below. The cDNA libraries were generated from each RNA precipitation following the protocol provided by Illumina and the libraries were sequenced on an Illumina GAII by a 1×36 bp run.
  • HEK 293 cells were grown for 16 hr in medium supplemented with 4-thiouridine and 6-thioguanosine to final concentrations of 200 µM each.
  • An additional labeling pulse with 100 µM of each photoreactive nucleoside was applied 2 hr prior to UV-irradiation to ensure the labeling of short-lived transcripts.
  • Living cells grown on light SILAC medium were irradiated with 365 nm UV light (0.2 J/cm2), whereas the control cells, grown on heavy SILAC medium, were not UV-crosslinked (experiments L1 and L2).
  • In the label swap experiment (experiment H1), the cells grown on heavy SILAC medium were crosslinked and the cells grown on light SILAC medium were used as control.
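  • Because the crosslinked population carries the light label in experiments L1 and L2 but the heavy label in the label swap experiment H1, ratios have to be oriented before they can be compared. The sketch below shows one way to do this. It assumes that ratios are reported as heavy over light; the function and dictionary names are hypothetical.

```python
import math

# SILAC channel that holds the UV-crosslinked cells in each experiment,
# following the experimental design described above.
CROSSLINKED_CHANNEL = {"L1": "light", "L2": "light", "H1": "heavy"}


def log2_enrichment(experiment: str, heavy_over_light: float) -> float:
    """Return log2(crosslinked / control) from a heavy/light ratio.

    Assumption: the input ratio is heavy divided by light.
    """
    if heavy_over_light <= 0:
        raise ValueError("ratio must be positive")
    log_ratio = math.log2(heavy_over_light)
    # Flip the sign when the crosslinked cells carry the light label.
    if CROSSLINKED_CHANNEL[experiment] == "heavy":
        return log_ratio
    return -log_ratio


# A protein four-fold enriched after crosslinking looks like H/L = 0.25 in
# L1 and H/L = 4 in the label swap, and maps to the same value in both.
print(log2_enrichment("L1", 0.25), log2_enrichment("H1", 4.0))
```

  • A genuine crosslink-dependent enrichment should point in the same direction after orientation in all three experiments, whereas a contaminant tied to one label would flip sign in the label swap.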
  • lysis/binding buffer 100 mM Tris-HCl, pH 7.5, 500 mM LiCl, 10 mM EDTA pH 8.0, 1% (w/v) LiDS, 5 mM EDTA, 5 mM DTT, Complete Mini EDTA-free protease inhibitor (Roche). Oligo(dT) beads were added to cell extract and incubated for 1 hr at room temperature on a rotating wheel. The supernatant was saved for further precipitation rounds.
  • RNAse I (10 U/ml) and benzonase (125 U/ml) for 3 hr at 37°C in elution buffer containing 1 mM MgCl2.
  • Cells stably expressing His/FLAG/HA-tagged proteins were labeled with 100 µM 4SU, UV-irradiated and lysed in NP-40 lysis buffer. 4SU-labeled non-irradiated cells were used as control.
  • Immunoprecipitation was carried out with anti-FLAG magnetic beads (Sigma). Beads were treated with Calf Intestinal Phosphatase and 5'-end-labeled using T4 polynucleotide kinase. The crosslinked protein-RNA complexes were resolved on 4%-12% NuPAGE gel (Invitrogen), and the corresponding protein-RNA complexes were analyzed by phosphorimaging and Western blotting.
  • The PAR-CLIP protocol was performed as described in (Hafner et al., 2010). In brief, cells were labeled with 4-thiouridine, UV-irradiated and lysed. After immunoprecipitation, the protein-RNA complex was radiolabeled and separated on SDS-PAGE. The protein-RNA complex was visualized by
  • Flp-In HEK293 cells were grown in medium supplemented with 200 µM 4SU 16 h prior to crosslinking.
  • Harvested cells were resuspended in 10 pellet volumes of lysis/binding buffer (100 mM Tris-HCl pH 7.5, 500 mM LiCl, 10 mM EDTA pH 8.0, 1% LiDS, 5 mM dithiothreitol (DTT)).
  • Oligo(dT)25 Dynabeads purification was performed as described above. Protein-RNA complexes were TCA precipitated and RNAse I treated. Following RNAse I treatment protein-RNA complexes were precipitated by ammonium sulfate precipitation.
  • Small RNA cloning adapters
  • 5' adapter rGrUrUrCrArGrArGrUrUrCrUrArCrArGrUrCrGrArCrGrArUrCrGrArCrGrArUrC (SEQ ID NO. 1)
  • 3' barcoded adapters (barcode is underlined)
  • NBC1 AppTCTAAAATCGTATGCCGTCTTCTGCTTG-InvdT (SEQ ID NO. 2)
  • NBC2 AppTCTCCCATCGTATGCCGTCTTCTGCTTG-InvdT (SEQ ID NO. 3)
  • NBC3 AppTCTGGGATCGTATGCCGTCTTCTGCTTG-InvdT (SEQ ID NO. 4)
  • NBC4 AppTCTTTTATCGTATGCCGTCTTCTGCTTG-InvdT (SEQ ID NO. 5)
  • NBC5 AppTCTCACGTCGTATGCCGTCTTCTGCTTG-InvdT (SEQ ID NO. 6)
  • NBC6 AppTCTCCATTCGTATGCCGTCTTCTGCTTG-InvdT (SEQ ID NO. 7)
  • NBC7 AppTCTCGTATCGTATGCCGTCTTCTGCTTG-InvdT (SEQ ID NO. 8)
  • NBC8 AppTCTCTGCTCGTATGCCGTCTTCTGCTTG-InvdT (SEQ ID NO. 9)
  • cDNA amplification (restriction sites are underlined)
  • ALKBH5 AppTCTCCATTCGTATGCCGTCTTCTGCTTG-lnvdT
  • RNU61 5'-GTGCTCGCTTCGGCAGC (SEQ ID NO. 14); 5'-TGGAACGCTTCACGAATTTGC (SEQ ID NO. 15)
  • GAPDH 5'-AGCCACATCGCTCAGACAC (SEQ ID NO. 16); 5'-GCCCAATACGACCAAATCC (SEQ ID NO. 17)
  • C22orf28 5'-TCAAGACTATCTGAAGGGAATGG (SEQ ID NO. 18); 5'-CAGGGGTTGTGTTGAAGACC (SEQ ID NO. 19)
  • Plasmids pDONR vectors were largely obtained from the ORFeome project.
  • pENTR constructs were generated by PCR amplification of the respective coding sequences (CDS) from HEK293 cDNA followed by restriction digest and ligation into pENTR4 (Invitrogen).
  • pDONR and pENTR vectors carrying CDS were recombined into the pFRT/TO/His/FLAG/HA-DEST destination vector (Invitrogen) using GATEWAY LR recombinase (Invitrogen) according to the manufacturer's protocol to allow for doxycycline-inducible expression of stably transfected His/FLAG/HA-tagged protein in Flp-In T-REx HEK293 cells (Invitrogen) from the inducible TO/CMV promoter.
  • HEK293 T-REx Flp-In cells were grown in D-MEM high glucose with 10% (v/v) fetal bovine serum, 1% (v/v) 2 mM L-glutamine, 1% (v/v) 10,000 U/ml penicillin/10,000 µg/ml streptomycin, 100 µg/ml zeocin and 15 µg/ml blasticidin.
  • Cell lines stably expressing His/FLAG/HA-tagged proteins were generated by co-transfection of pFRT/TO/His/FLAG/HA constructs with pOG44 (Invitrogen). Cells were selected by exchanging zeocin with 100 µg/ml hygromycin. Expression of epitope-tagged proteins was induced by addition of 200 ng/ml doxycycline 15 to 20 h before crosslinking. The expression of His/FLAG/HA-tagged proteins was assessed by Western analysis using a mouse anti-HA.11 monoclonal antibody (Covance).
  • SILAC medium
  • For quantitative proteomics, cells were grown in SILAC medium as described in (Ong et al., 2002). Briefly, Dulbecco's Modified Eagle's Medium (DMEM) Glutamax lacking arginine and lysine (a custom preparation from Gibco) supplemented with 10% dialyzed fetal bovine serum (dFBS, Gibco) was used. Heavy (H) and light (L) SILAC media were prepared by adding 84 mg/l 13C6 15N4 L-arginine plus 146 mg/l 13C6 15N2 L-lysine or the corresponding non-labeled amino acids (Sigma), respectively. Labeled amino acids were purchased from Sigma Isotec.
  • Mass spectrometry
  • oligo(dT) precipitated protein-RNA complexes for mass spectrometry analysis using in-gel digestion
  • mRNA-bound proteins were isolated as described in experimental procedures and separated on a NuPAGE Novex 4 to 12% gradient gel (Invitrogen) using reducing conditions. Proteins were fixed in fixative solution (50% methanol (v/v), 10% acetic acid (w/v)) and stained afterwards with the Colloidal Blue staining Kit (Invitrogen). Gel lanes were cut into 12 gel slices which were individually subjected to reduction, alkylation and in-gel digestion with sequence grade modified trypsin (Promega) according to standard protocols (Shevchenko et al., 2006). After in-gel digestion, peptides were extracted and desalted using StageTips (Rappsilber et al., 2007) prior to analysis by mass spectrometry.
  • Reversed-phase liquid chromatography was performed employing an Eksigent NanoLC-1D Plus system using self-made fritless C18 microcolumns (Ishihama et al., 2002) (75 µm ID packed with ReproSil-Pur C18-AQ 3-µm resin, Dr. Maisch GmbH) connected on-line to the electrospray ion source (Proxeon) of an LTQ-Orbitrap Velos mass spectrometer (Thermo Fisher).
  • Peptide samples were loaded onto the column with a flow rate of 250 nl/min followed by sample elution at a flow rate of 200 nl/min with a 10 to 60 % acetonitrile gradient over 6 h in 0.5% acetic acid.
  • the LTQ-Orbitrap Velos instrument was operated in the data dependent mode (DDA) with a full scan in the Orbitrap followed by up to 20 consecutive MS/MS scans in the LTQ.
  • Former target ions selected for MS/MS were dynamically excluded for 60 s. Total cycle time for one full scan plus up to 20 MS/MS scans was approximately 2 s.
  • The Quant.exe module extracts, recalibrates and quantifies isotope clusters and SILAC doublets in the raw data files (medium labels: Arg6 and Lys4; heavy labels: Arg10 and Lys8; maximum of three labeled amino acids per peptide; polymer detection enabled; top 6 MS/MS peaks per 100 Da).
  • Generated peak lists (msm-files) were submitted to a MASCOT search engine (version 2.2, MatrixScience) and searched against the IPI human database (v. 3.72) supplemented with common contaminants (e.g. trypsin, BSA). The database was modified in-house to obtain a
  • Protein ratios were calculated from the median of all normalized peptide ratios using only unique peptides or peptides assigned to the respective protein groups with the highest number of peptides ("Occam's razor" peptides). Only protein groups with at least two SILAC counts (peptide ratios) were kept for further analysis.
  • SILAC proteomics data analysis
  • Fold changes were computed by MaxQuant (Cox and Mann, 2008) for proteins, and for protein groups in case of ambiguities. We considered only fold changes that were supported by at least three measured peptide ratios in a single experiment or three measured peptide ratios over all three experiments (L1, L2 and H1).
  • The quantified protein groups were associated with NCBI Reference Sequence (RefSeq) protein IDs by BLASTing the leading protein of each protein group against the human protein database.
  • The MaxQuant software computes protein intensities as the sum of all identified peptide intensities (maximum detector peak intensities of the peptide elution profile, including all peaks in the isotope cluster). Protein intensities were divided by the number of theoretically observable peptides (calculated by in silico protein digestion with a PERL script; all fully tryptic peptides between 6 and 30 amino acids were counted, while missed cleavages were neglected). "iBAQ intensities" correlate well with absolute protein abundance and can therefore be used for comparison of protein levels within the experiment (Schwanhäusser et al., 2011).
  • RNA-binding protein validation assays and PAR-CLIP
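  • The iBAQ normalization described above (summed peptide intensity divided by the number of theoretically observable tryptic peptides of 6 to 30 amino acids, with no missed cleavages) can be sketched as follows. This is a minimal Python illustration, not the PERL script used in the study; the function names are hypothetical, and the cleavage rule (cut after every K or R, ignoring any proline exception) is an assumption.

```python
def tryptic_peptides(sequence):
    """Split a protein sequence after every K or R (no missed cleavages).

    Assumption: the simple rule used here ignores the proline exception.
    """
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def count_observable_peptides(sequence, min_len=6, max_len=30):
    """Count fully tryptic peptides whose length lies in [min_len, max_len]."""
    return sum(min_len <= len(p) <= max_len for p in tryptic_peptides(sequence))


def ibaq(summed_intensity, sequence):
    """Divide the summed peptide intensity by the observable peptide count."""
    n = count_observable_peptides(sequence)
    if n == 0:
        raise ValueError("protein has no theoretically observable peptides")
    return summed_intensity / n
```

    For example, a protein that yields two peptides in the 6 to 30 residue window and has a summed intensity of 100 receives an iBAQ value of 50.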
  • The growth medium was removed completely while cells were still attached to the plates.
  • Cells were irradiated on ice with 365 nm UV light (0.2 J/cm2) in a Stratalinker 2400 (Stratagene) equipped with light bulbs for the appropriate wavelength. Cells were scraped off with a rubber policeman in 2 ml PBS per plate and collected by centrifugation at 800 × g for 4 min.
  • The pellets of cells crosslinked with 365 nm UV light were resuspended in 3 cell pellet volumes of NP40 lysis buffer (50 mM Tris-HCl, pH 7.5, 140 mM LiCl, 2 mM EDTA, pH 8.0, 0.5% (v/v) NP40, 0.5 mM DTT, complete EDTA-free protease inhibitor cocktail (Roche)) and incubated on ice for 10 min.
  • The typical scale of such an experiment was 3 ml of cell pellet.
  • The cell lysate was cleared by centrifugation at 13,000 × g.
  • IP wash buffer: 50 mM HEPES-KOH, pH 7.5, 300 mM KCl, 0.05% (v/v) NP40, 0.5 mM DTT, complete EDTA-free protease inhibitor cocktail (Roche)
  • RNase T1 (Fermentas)
  • Beads were washed 3 times with 1 ml of high-salt wash buffer (50 mM HEPES-KOH, pH 7.5, 500 mM KCl, 0.05% (v/v) NP40, 0.5 mM DTT, complete EDTA-free protease inhibitor cocktail (Roche)) and resuspended in two bead volumes of dephosphorylation buffer (50 mM Tris-HCl, pH 7.9, 100 mM NaCl, 10 mM MgCl2, 1 mM DTT). Calf intestinal alkaline phosphatase (CIP) was added to obtain a final concentration of 0.5 U/µl, and the suspension was incubated for 60 min at 37°C.
  • Beads were washed twice with 1 ml of phosphatase wash buffer (50 mM Tris-HCl, pH 7.5, 20 mM EGTA, 0.5% (v/v) NP40) and twice with 1 ml of polynucleotide kinase (PNK) buffer (50 mM Tris-HCl, pH 7.5, 50 mM NaCl, 10 mM MgCl2, 5 mM DTT). Beads were resuspended in one original bead volume of PNK buffer.
  • Radiolabeling of RNA segments crosslinked to immunoprecipitated proteins: To the bead suspension described above, γ-32P-ATP was added to a final concentration of 0.25 µCi/µl and T4 PNK to 1 U/µl in one original bead volume. The suspension was incubated for 30 min at 37°C. Thereafter, nonradioactive ATP was added to obtain a final concentration of 100 µM and the incubation was continued for another 5 min at 37°C.
  • The magnetic beads were then washed 5 times with 800 µl of PNK buffer and resuspended in 20 µl of SDS-PAGE loading buffer (10% glycerol (v/v), 50 mM Tris-HCl, pH 6.8, 2 mM EDTA, 2% SDS (w/v), 100 mM DTT, 0.1% bromophenol blue).
  • Protein IP was performed according to the RNA-binding protein validation assay protocol up to the γ-32P-ATP RNA-labeling step. After radiolabeling, the beads were washed twice with PNK buffer and resuspended in PNK buffer. The sample was divided into three aliquots and incubated at 37°C for 30 min with either RNase I (0.1 U/µl) or DNase I (0.1 U/µl). A control sample was incubated at 37°C without addition of nucleases. After incubation, the beads were washed 5 times with 800 µl of PNK buffer and resuspended in 20 µl of SDS-PAGE loading buffer.
  • SDS-PAGE and Western Blotting
  • The FLAG bead suspension was incubated for 5 min at 95°C and vortexed.
  • The magnetic beads were separated on a magnetic separator and the supernatant was used for SDS-PAGE.
  • The gel was analyzed by phosphorimaging.
  • The protein-RNA complexes were blotted onto a nitrocellulose membrane (Hybond™ ECL™, GE Healthcare) and analyzed by phosphorimaging, followed by incubation with anti-HA.11 antibody and then HRP-conjugated secondary anti-mouse IgG antibody; the tagged protein was visualized using the Amersham™ ECL™ (GE Healthcare) western blot detection reagent.
  • The radioactive RNA-protein complex migrating at the expected molecular weight of the target protein was excised from the gel and electroeluted in a D-Tube Dialyzer Midi (Novagen) in 800 µl SDS running buffer according to the manufacturer's instructions.
  • The recovered RNA was carried through a cDNA library preparation protocol originally described for cloning of small regulatory RNAs (Dölken et al., 2008; Hafner et al., 2008).
  • The first ligation step was carried out with a 3' barcoded adapter (see under oligonucleotides) in a 20 µl reaction volume using 10.5 µl of the recovered RNA.
  • The PAR-CLIP libraries were sequenced on an Illumina Genome Analyzer (GAII) and HiSeq using a 1 × 50 bp single-read protocol.
  • PAR-CLIP computational analysis: Illumina PAR-CLIP cDNA sequencing reads were aligned to the human genome assembly (hg18), allowing for up to one mismatch, insertion or deletion. Only uniquely mapping reads were retained.
  • The number of T to C or G to A mismatches served as a crosslink score.
  • We also assigned a quality score based on the number and positions of distinct reads contributing to the cluster.
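  • As an illustration of the crosslink score mentioned above, the following minimal Python sketch counts T-to-C and G-to-A mismatches in reads aligned to a reference segment. It is not the pipeline used in the study: the function name, the input layout (offset plus read sequence, aligned without insertions or deletions and already oriented like the reference) and the example data are assumptions.

```python
def crosslink_score(reference, reads, conversions=(("T", "C"), ("G", "A"))):
    """Count diagnostic conversions in gapless alignments.

    reference   -- reference sequence of the cluster region
    reads       -- iterable of (offset, read_sequence); offset is the 0-based
                   start of the read on the reference (hypothetical layout)
    conversions -- (reference_base, read_base) pairs counted as crosslink events
    """
    wanted = set(conversions)
    score = 0
    for offset, read in reads:
        for i, base in enumerate(read):
            if (reference[offset + i], base) in wanted:
                score += 1
    return score
```

    With the reference `ATTGCA` and the two reads `(0, "ATCG")` and `(2, "TACA")`, one T-to-C and one G-to-A event are counted, giving a score of 2.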
  • NP40 lysis buffer: 50 mM HEPES-KOH at pH 7.4, 150 mM KCl, 2 mM EDTA, 0.5% (v/v) NP40, 0.5 mM DTT, complete EDTA-free protease inhibitor cocktail
  • Lysates were cleared by centrifugation (16,000 RCF, 4°C, 15 min).
  • IP wash buffer (50 mM HEPES-KOH at pH 7.4, 150 mM KCl, 2 mM EDTA, 0.05% (v/v) NP40, 0.5 mM DTT, complete EDTA-free protease inhibitor cocktail), resuspended in one original volume of Proteinase K solution (200 mM Tris-HCl at pH 7.5, 300 mM NaCl, 25 mM EDTA, 2% (w/v) SDS, 0.6 mg/ml Proteinase K) and incubated for 20 min at 65°C. RNA was phenol-chloroform extracted and ethanol-precipitated.
  • Single stranded cDNAs were synthesized from total RNA with an 18nt oligo(dT) primer using
  • Flp-In HEK293 cells were grown in medium (D-MEM high glucose with 10% (v/v) fetal bovine serum, 1% (v/v) 2 mM L-glutamine, 1% (v/v) 10,000 U/ml penicillin/10,000 µg/ml streptomycin) supplemented with 200 µM 4SU 16 h prior to harvest.
  • UV crosslinking: Culture medium was removed and cells were irradiated on ice with 365 nm UV light (0.2 J/cm2) in a Stratalinker 2400 (Stratagene) equipped with light bulbs for the appropriate wavelength.
  • Dynabeads Oligo(dT)25 were briefly washed in lysis/binding buffer, resuspended in the appropriate volume of lysate and incubated for 1 h at room temperature on a rotating wheel. Following incubation, the supernatant was removed and stored on ice for multiple rounds of mRNA hybridization. Beads were washed 2 times in 1 lysate volume of lysis/binding buffer, followed by 3 washes in 1 lysate volume of NP40 washing buffer (50 mM Tris pH 7.5, 140 mM LiCl, 2 mM EDTA, 0.5% NP40, 0.5 mM DTT).
  • RNase I (Ambion, 100 U)
  • Protein-RNA complexes were precipitated by ammonium sulfate precipitation.
  • RNA Kinase (NEB, 100 U, 20 min, 37°C) and 0.2 µCi/µl γ-32P-ATP (NEG). Radiolabeled RNA was again phenol/chloroform extracted and recovered by ethanol precipitation. Subsequent small RNA cloning and adapter ligations were performed as described previously (Hafner et al., 2010).
  • HEK293 total RNA was extracted using TRIzol reagent (Invitrogen) following the manufacturer's instructions. Briefly, HEK293 cells grown in SILAC medium were harvested as described previously and the pellet was immediately suspended in TRIzol reagent and homogenized. 1 ml chloroform was added to 5 ml TRIzol solution, vigorously mixed and incubated at room temperature. After centrifugation (13,000 × g, 5 min, 4°C) the aqueous phase was transferred to a fresh RNase-free tube and 1 volume of ROTI® phenol/chloroform/isoamyl alcohol (25/24/1, v/v) was added.
  • RNA oligo(dT) purification from 4SU and 6SG labeled UV-irradiated cells ("UV"): mRNA was isolated as described before for the isolation of mRNA-bound proteins, starting from UV-irradiated cells.
  • Protein-RNA complexes were proteinase K digested in proteinase K reaction buffer (800 mM GuHCl, 50 mM EDTA, 5% Tween 10, 0.5% Triton-X 100) for 3 h at 55°C with a final proteinase K concentration of 2 mg/ml.
  • The RNA was recovered by acidic phenol/chloroform extraction and ethanol precipitation and resuspended in nuclease-free water.
  • RNA obtained by the four precipitation methods described above was analyzed by next-generation sequencing.
  • cDNA libraries were prepared from the recovered RNA, following the mRNA sequencing protocol provided by Illumina. Briefly, poly(A) RNA was fragmented using 5x fragmentation buffer (200 mM Tris-acetate, pH 8.1, 500 mM potassium acetate, 150 mM magnesium acetate) by heating at 94°C for 3.5 min. After ethanol precipitation, first- and second-strand cDNA synthesis was performed with random hexamer primers.
  • cDNA fragments were end-repaired using T4 polymerase, T4 PNK and Klenow DNA polymerase, and a protruding "A" base was added to the 3' ends of the DNA fragments for ligation with Illumina adaptors with "T" overhangs.
  • cDNA in the size range of 200 +/- 25 bp was selected for PCR amplification and sequenced on an Illumina GAII or HiSeq for 1 × 36 bp (single-end sequencing protocol) according to the manufacturer's instructions.
  • RNA preparation and enrichment analysis: We computed transcript abundance estimates (FPKM values) using cufflinks (version 1.03; cite) with options --frag-bias-correct and --multi-read-correct. The course of RNA preparation was monitored using pairwise scatter plots of these FPKM values. The read count distribution over the different RNA classes (mRNA, rRNA and other) was inferred by multiplying the FPKM values by the length of the longest transcript of a given gene.
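  • The inference of the read distribution over RNA classes from FPKM values can be sketched as follows. Because FPKM is normalized by transcript length, multiplying it by the length of the longest transcript gives a quantity proportional to the read count; summing this per class and dividing by the total gives the class share. The function name and the input layout are hypothetical.

```python
def read_share_by_class(genes):
    """Estimate the fraction of reads per RNA class.

    genes -- iterable of (rna_class, fpkm, longest_transcript_length)
    Returns a dict mapping each class to its share of FPKM * length.
    """
    totals = {}
    for rna_class, fpkm, length in genes:
        totals[rna_class] = totals.get(rna_class, 0.0) + fpkm * length
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("all FPKM values are zero")
    return {rna_class: value / grand_total for rna_class, value in totals.items()}
```

    For instance, two mRNA genes contributing 10,000 each and one rRNA gene contributing 30,000 yield shares of 0.4 and 0.6.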
  • The BAM file of the merged protein occupancy profiling libraries was used to determine the per-base coverage. This per-base coverage was normalized by the maximal read coverage of the region of interest. We employed the coverageBed tool (Quinlan and Hall, 2010) to compute profiles for individual exons. These profiles were stitched together, and relative positions were computed after normalizing for transcript length by discretizing coordinates into 100 bins for each transcript.
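  • A minimal sketch of the profile normalization described above: per-base coverage of a stitched transcript is divided by its maximum and then averaged into a fixed number of bins, so that transcripts of different lengths become comparable. The function name is hypothetical, and averaging within a bin is an assumption; the source only states that coordinates were discretized into 100 bins.

```python
def binned_profile(coverage, n_bins=100):
    """Scale coverage to its maximum and average it into n_bins relative bins."""
    if not coverage:
        raise ValueError("coverage is empty")
    peak = max(coverage)
    if peak == 0:
        return [0.0] * n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    length = len(coverage)
    for position, depth in enumerate(coverage):
        # Map the absolute position to a relative bin index in [0, n_bins).
        bin_index = min(position * n_bins // length, n_bins - 1)
        sums[bin_index] += depth / peak
        counts[bin_index] += 1
    # Bins without any position (transcript shorter than n_bins) are set to 0.
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]
```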
  • Protein occupancy profiling short reads were generated with a strand-specific protocol. We separated all reads by strand and generated two strand-specific mpileup files with samtools 0.1.18 (Li et al., 2009). These files were subsequently passed to custom PERL scripts to produce a separate bedgraph file for each strand (Watson/Crick). Bedgraph files were loaded into our local UCSC hg18 genome browser instance for visualization. Additionally, a single bedgraph file for strand-specific T-to-C conversions was produced in a similar manner. T-to-C conversion sites were only included in the final file if at least two conversion events were observed.
  • RNA-binding proteins were queried against the Proteome Folding Project database (PFP) (Drew et al., 2011), a database of protein structure and domain boundary predictions spanning >100 complete genomes.
  • This database provided SCOP superfamily classifications derived from sequence similarity (PSI-BLAST), fold recognition and Rosetta de novo structure prediction for RNA-binding proteins (and their close homologs in other species in the database).
  • SCOP classifications discovered via PDB-Blast, FFAS03, and de novo structure prediction (with a 0.8 confidence threshold) were used for fold enrichment analysis. From these sets of SCOP superfamilies, an enrichment analysis as described in Drew et al. was performed.
  • Pfam and InterPro domain enrichment: We carried out an enrichment analysis similar to the SCOP enrichment to determine Pfam functional families and InterPro signatures (IPRs) that are overrepresented in each of the quantification sets.
  • The Pfam families and InterPro signatures found in the novel human RNA-binding sequences formed the enrichment sets for Pfam and InterPro enrichment analysis, respectively, while the family and signature hits for the non-redundant human protein sequences as a whole formed the background sets for each analysis.
  • We split the RNA-binding proteins by quantification group, compiled Pfam family and InterPro signature sets for all groups, and ran enrichment analysis (as described for SCOP folds above) against the background of Pfam and InterPro results for our set of non-redundant human protein sequences.
  • RNA binding Predictions for the GO Molecular Function term RNA binding, along with first-generation child terms of RNA binding, were calculated using our implementation of the GeneMANIA algorithm of (Mostafavi et al., 2008) modified as described below.
  • The GeneMANIA algorithm was chosen because of its strong performance in the MouseFunc function prediction competition (Pena-Castillo et al., 2008) and its computational tractability, which allowed us to quickly run predictions on our large set of 49,518 non-redundant human sequences.
  • The GeneMANIA algorithm is a form of Gaussian-field label propagation that operates on a functional association network whose edges define the affinity between genes given a functional context, generated as a weighted combination of a number of association networks.
  • Each node of the graph is a gene which may be previously known to have the function in question, known to not possess that function, or may be unlabeled (here we focus on RNA-binding, its child GO-functions, and DNA-binding).
  • The network edges are generated by an optimization step that maximizes the functional similarity inherent in a set of heterogeneous data-types in the presence of the known labels (the weights on the influence of each network type are learned from a training set of already annotated proteins, separately for each function label we try to predict, as described in (Mostafavi et al., 2008)).
  • Discriminant thresholds are chosen to assign predictions to unlabeled sequences.
  • Gene expression data was obtained from two assays: HG-U133_Plus_2, and U133AAofAv2, which combined have a total of 368 cell types/conditions.
  • The data for each assay was normalized individually using the Affy library in R, and the resulting two expression vectors for each gene were concatenated into one vector. Since expression data is collected at the gene level, we had to map our sequences to gene names that appeared in the two assays. The network was then created from the pairwise Pearson correlation coefficient (PCC) of the expression vectors.
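  • The construction of a co-expression network from pairwise Pearson correlation coefficients can be sketched as follows. This is a plain-Python illustration under the assumption that each gene is already represented by one concatenated expression vector of equal length; the function names are hypothetical, and setting the correlation of a constant vector to 0 is a design choice of this sketch.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant vector: correlation undefined, treated as 0 here
    return covariance / math.sqrt(var_x * var_y)


def coexpression_network(expression):
    """Return {(gene_a, gene_b): PCC} for every unordered gene pair."""
    genes = sorted(expression)
    return {
        (a, b): pearson(expression[a], expression[b])
        for i, a in enumerate(genes)
        for b in genes[i + 1:]
    }
```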
  • Protein-protein interaction data was collected from the BioNetBuilder project (Avila-Campillo et al., 2007). The network was left as a binary network, with a 1 if two proteins interacted and a 0 otherwise.
  • This database consisted of the proteomes of six organisms: human, mouse, Caenorhabditis elegans, Escherichia coli, Saccharomyces cerevisiae, and Arabidopsis thaliana.
  • The best 250 BLAST results were filtered to retain sequences with at least 50% identity over 80% of the sequence length.
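  • The homolog filter described above can be sketched as follows. The dictionary keys, the function name and the interpretation of "80% sequence length" as alignment length divided by query length are assumptions made for illustration; the hits are assumed to be sorted best-first.

```python
def filter_homologs(hits, query_length, max_hits=250,
                    min_identity=50.0, min_coverage=0.8):
    """Keep hits among the top max_hits that pass identity and coverage cutoffs.

    hits -- list of dicts with the hypothetical keys 'id', 'percent_identity'
            and 'alignment_length', ordered from best to worst
    """
    kept = []
    for hit in hits[:max_hits]:
        coverage = hit["alignment_length"] / query_length
        if hit["percent_identity"] >= min_identity and coverage >= min_coverage:
            kept.append(hit)
    return kept
```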
  • A best homolog with astral structural coverage was chosen to represent the source human sequence, where the best homolog was considered to be the best BLAST match with the best structural coverage.
  • Structural coverage was computed by considering all domains of a homolog protein. Each domain is either covered wholly or partially by an astral structure, or not annotated with structure. Structural coverage is the average over the number of domains of the proportion of each domain that is covered by astral structure (for domains without astral structure annotation, the proportion covered is 0). Domains are annotated with astral structures by matching the regions of domains assigned to PDB structures via the Ginzu pipeline (Drew et al., 2011) to the regions of those PDB structures covered by astral structures. Each domain-to-astral structure annotation was scored with a percentage of sequence-space overlap.
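  • The structural coverage defined above (the mean, over all domains, of the fraction of each domain covered by structure, with unannotated domains counting as 0) can be sketched as follows; the function name and input layout are hypothetical.

```python
def structural_coverage(domains):
    """Average covered fraction over all domains of a protein.

    domains -- list of (domain_length, covered_residues); covered_residues is 0
               for domains without structure annotation
    """
    if not domains:
        return 0.0
    fractions = [min(covered, length) / length for length, covered in domains]
    return sum(fractions) / len(fractions)
```

    A protein with one fully covered domain, one half-covered domain and one unannotated domain has a structural coverage of (1 + 0.5 + 0) / 3 = 0.5.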
  • Each source human protein is represented by a list of covering astral structures that can be considered in a protein-vs-protein comparison based on structural similarity. From conservative choosing of homolog proteins based on sequence identity and high structural coverage, we were left with roughly 23,000 proteins to compare. Prior to this analysis, we computed the structural similarity of all astral structures to each other by MCM (mammoth confidence metric) score, described in (Drew et al., 2011). With these pre-computed structure similarities, we calculated the all-vs.-all homolog protein structure matrix for these 23,000 proteins, keeping only the 100 most structurally similar proteins for each source protein.
  • Structural similarity between two proteins was computed as the sum of the maximum pairwise score between each structure representing each protein averaged over the total number of domains in both proteins. If the similarity score of a source and target protein was in the best 100 scores for that source protein, the score for the pair was added to the structure all-vs.-all matrix. This effort was extremely computationally demanding (23,000 by 23,000 sets of operations), and so was split into 500 parts and run on a compute cluster.
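  • One possible reading of the protein-level similarity and the top-100 filter described above is sketched below. The symmetric best-match sum, the function names and the `mcm` callback (standing in for a lookup of pre-computed structure-vs-structure scores) are assumptions; the source does not specify the computation at this level of detail.

```python
import heapq


def protein_similarity(structs_a, structs_b, n_domains_a, n_domains_b, mcm):
    """Sum of best pairwise scores in both directions, averaged over all domains.

    mcm -- function (structure_a, structure_b) -> pre-computed similarity score
    """
    if not structs_a or not structs_b:
        return 0.0
    best_for_a = sum(max(mcm(a, b) for b in structs_b) for a in structs_a)
    best_for_b = sum(max(mcm(a, b) for a in structs_a) for b in structs_b)
    return (best_for_a + best_for_b) / (n_domains_a + n_domains_b)


def top_k_neighbors(scores, k=100):
    """Keep the k highest-scoring targets for one source protein.

    scores -- dict mapping target protein -> similarity score
    """
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])
```

    Keeping only the top k entries per source protein turns the dense 23,000 by 23,000 matrix into a sparse one with at most k entries per row.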
  • Ω and t are the positive-positive and positive-negative pair weight matrix and the target vector described in (Mostafavi et al., 2008).
  • Positive examples were chosen as any sequence annotated as having the function in question, or with any child of the function in question, and negative examples were sequences with GO molecular function annotations that were not the function in question or a child of the function in question.
  • Each network, and the final composite network, was normalized as in (Mostafavi et al., 2008). Unlike in GeneMANIA, where each node in the network was a gene, in our network nodes represent sequences, and as some data-types contain information at the sequence level and others at the gene level, the coverage of each data-type is not consistent. Additionally, some data-types are simply more comprehensive, such as InterPro, which returned results for 38,396 sequences, compared to Pfam, which only returned hits for 35,082 (Table X.X shows the coverage of each data-type). Because the objective function rewards low similarity for negative example pairs, a data-type with less coverage and therefore more sparsity will get an unfair weight boost in the final network.
  • $$f^* = \arg\min_f \sum_i (f_i - y_i)^2 + \sum_i \sum_j w_{ij} (f_i - f_j)^2$$
  • Here the w's are the weights from the combined network and y is a bias vector representing the prior knowledge about positive and negative examples, and the prior belief about unlabeled sequences, as in (Mostafavi et al., 2008).
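  • To illustrate Gaussian-field label propagation with a bias vector y and network weights w, the following sketch minimizes an objective of the form sum_i (f_i - y_i)^2 + sum_i sum_j w_ij (f_i - f_j)^2 by Gauss-Seidel iteration. Setting the derivative with respect to f_i to zero for symmetric weights gives the update f_i = (y_i + 2 sum_j w_ij f_j) / (1 + 2 sum_j w_ij). This is a didactic sketch, not the implementation used in the study; the function name and the dictionary-based network layout are assumptions.

```python
def propagate_labels(weights, bias, n_iter=200, tol=1e-10):
    """Iteratively solve for the discriminant scores f.

    weights -- dict: node -> {neighbor: weight}, assumed symmetric
    bias    -- dict: node -> y value (prior label information)
    """
    scores = dict(bias)  # start from the bias vector
    for _ in range(n_iter):
        max_change = 0.0
        for node, y in bias.items():
            neighbors = weights.get(node, {})
            degree = sum(neighbors.values())
            # Each unordered pair appears twice in the double sum, hence the 2.
            pull = 2.0 * sum(w * scores[nb] for nb, w in neighbors.items())
            new_score = (y + pull) / (1.0 + 2.0 * degree)
            max_change = max(max_change, abs(new_score - scores[node]))
            scores[node] = new_score
        if max_change < tol:
            break
    return scores
```

    For two nodes joined by an edge of weight 1 with bias values 1 and 0, the scores converge to 0.6 and 0.4: the network term pulls the two nodes together while the bias term keeps them apart.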
  • Threshold values for the discriminant were obtained through k-fold cross-validation. For each of the k calculations, the known labels are dropped on a random leave-out set of size 49,518/k, which contains the same proportion of positive, negative, and unlabeled sequences as the entire set. The discriminant threshold is then varied until the desired precision level is met on the leave-out set, and the recall value for the discriminant threshold is noted. If the desired precision level is unattainable for any discriminant threshold value, then that particular cross-validation run is not counted in the final totals.
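  • The threshold selection for one cross-validation fold can be sketched as follows: the discriminant threshold is lowered step by step, and the lowest threshold that still meets the desired precision is reported together with its recall; folds in which the precision is unattainable return nothing and are skipped when averaging. The function names and data layout are hypothetical, and choosing the lowest qualifying threshold (maximal recall) is an assumption.

```python
def threshold_for_precision(scored, target_precision):
    """Return (threshold, recall) for the lowest threshold meeting the target.

    scored -- list of (discriminant_value, is_positive) for the leave-out set
    Returns None if no threshold reaches the target precision.
    """
    total_positives = sum(1 for _, positive in scored if positive)
    if total_positives == 0:
        return None
    ranked = sorted(scored, key=lambda item: item[0], reverse=True)
    best = None
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Include every example tied at this discriminant value.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        if true_pos / (true_pos + false_pos) >= target_precision:
            best = (threshold, true_pos / total_positives)
    return best


def mean_threshold(fold_results):
    """Average the thresholds of all folds in which the target was attainable."""
    usable = [result[0] for result in fold_results if result is not None]
    return sum(usable) / len(usable) if usable else None
```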
  • The discriminant threshold value for a given precision is calculated as the average of the values for all of the cross-validation tests.
  • Table XN.1 shows the recall values at each precision for the different function labels.
  • RNA binding child terms averaged over different levels of functional specificity.
  • function labels with fewer than 10 gene products in the human genome, as our focus is on a general functional term "RNA binding" here, but provide statistics obtained from RNA binding children in the other specificity levels used in mouseFunc: [11-30], [31-100], and [101-300]).
  • HumMania shows strong performance across all specificity levels, often outperforming the methods of Guan and Mostafavi. Of course, this is not a fair comparison, as predictions were done on different organisms, with different base data sets, at different points in time, and for humMania only on a subset of RNA binding-related terms.
  • The goal of this benchmarking is not to demonstrate the superiority of our algorithm over another, but rather to illustrate that our algorithm performs comparably to the current state of the art.
  • The performance of our algorithm and other state-of-the-art algorithms suggests that the RNA-binders that we could not predict are unlikely to be accurately discovered or predicted by any prediction algorithm, and thus represent new RNA-binders (RNA-binders that have novel interactions, structures, domains, and sequence families).
  • Table XN.3 shows the performance of humMania on the RNA-binding term itself, compared to the performance of Guan and Mostafavi in mouseFunc on that same GO function term. HumMania outperforms these methods in terms of precision at all but the lowest and highest recall values, and exhibits the top AUC score and recall at 1% false positive rate. It is worth noting that the count of annotated RNA binders is much higher in our data compared to the count in mouseFunc (1214 in our data, and a specificity range of [101-300] in mouseFunc), which might contribute to the enhanced performance of our algorithm.
  • RNA-binder association network Networks used for function prediction were output in SIF format, prior to combining networks for function prediction. For each RNA-binder the top 100 (or fewer) network edges for each network type were loaded into Cytoscape (Cline et al., 2007). Previous RNA-binding function annotations and the number of times each RNA-binder was seen (in 1 , 2 or 3 experiments) were loaded as node attributes. The network used to generate all network diagrams is available as raw network formats (.sif, .eda and .noa) and Cytoscape files (.cys) as supplemental files. Several Cytoscape plugins (Avila-Campillo et al.,
  • RNA-binders identified in this study using Cytoscape to analyze the connectivity between them.
  • Protein-protein interaction data was gathered from the iRefIndex database consolidation, via the iRefScape Cytoscape plugin (Razick, 2011). Data was obtained for the list of RNA-binders (examining only intra-list interactions), as well as for a background network to use as a control. This background network consisted of a theoretical set of expressed proteins deduced from mRNA sequencing data, which make up approximately 95% of the total cellular mRNA molecules.
  • The transcripts were mapped to unique gene symbols, with any of the RNA-binding list members that did not appear in the background added to it manually, creating a final HEK293 interactome of ~6400 genes.
  • 50 random subsets of this background network were chosen, each the same size as the RNA-binder list, and their clustering coefficients, average degrees, and characteristic path lengths averaged.
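  • The random-subset control described above can be sketched for one of the three statistics, the average degree; clustering coefficients and characteristic path lengths would be averaged in the same way. The function names, the edge-list layout (unique undirected edges) and the fixed random seed are assumptions of this sketch.

```python
import random


def average_degree(edges, nodes):
    """Average degree of the subnetwork induced by the given nodes.

    edges -- iterable of unique undirected edges (a, b)
    """
    selected = set(nodes)
    if not selected:
        return 0.0
    induced = sum(1 for a, b in edges if a in selected and b in selected and a != b)
    return 2.0 * induced / len(selected)


def background_average_degree(edges, background_nodes, subset_size,
                              n_subsets=50, seed=0):
    """Mean average degree over random background subsets of equal size."""
    rng = random.Random(seed)  # seeded for reproducibility
    pool = sorted(background_nodes)
    values = [
        average_degree(edges, rng.sample(pool, subset_size))
        for _ in range(n_subsets)
    ]
    return sum(values) / len(values)
```

    Comparing the value obtained for the RNA-binder list with the background mean indicates whether the RNA-binders are more densely connected than expected for a random set of expressed proteins of the same size.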
  • RNA-binding domains:
  • DUF1220, NBPF10, PF06758, IPR010630, 5.22e-23, 2.1837
  • zf-NF-X1, NFX1, PF01422, IPR000967, 0.0049, 2.7584
  • NP_002902 NP_055309 NP_004719 NP_054737 NP_060060 NP_055642 NP_775930 NP_065757 NP_005443 NP_006744
  • NP_624311 NP_006833 NP_005828 NP_001036100 NP_277028
  • NP_004930 NP_115700 NP_001139373 NP_115727 NP_006378
  • NP_056360 NP_001035526 NP_006436 NP_061874 NP_055323 NP_036473 NP_076971 NP_001011 NP_115487 NP_001135757
  • NP_001136402 NP_005849 NP_055642 NP_112223 NP_056311
  • API5 PFRT/TO/HIS/FLAG/HA-API5 positive positive positive positive
  • mRNA polyadenylate-binding protein gene isolation and sequencing and identification of a
  • hnRNP R heterogeneous nuclear ribonucleoprotein R
  • RNA regulons coordination of post-transcriptional events. Nat Rev Genet 8, 533-543.
  • DBC1 is a negative regulator of SIRT1. Nature 451, 583-586.
  • RNA-binding protein HuR enhances p53 translation in response to ultraviolet light irradiation. Proc Natl Acad Sci U S A 100, 8354-8359.
  • GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function. Genome Biol 9 Suppl 1, S4.
  • MYBBP1a is a novel repressor of NF-kappaB. J Mol Biol 366, 725-736.
  • HSPC117 is the essential subunit of a human tRNA splicing ligase complex. Science 331, 760-764.
  • RNA-binding proteins in yeast indicates dual functions for many enzymes.
  • Human AlkB homologue 5 is a nuclear 2-oxoglutarate dependent oxygenase and a direct target of hypoxia-inducible factor 1alpha (HIF-1alpha).
  • RoXaN, a novel cellular protein containing TPR, LD, and zinc finger motifs, forms a ternary complex with eukaryotic initiation factor 4G and rotavirus NSP3. J Virol 78, 3851-3862.
  • XBP1 mRNA is induced by ATF6 and spliced by IRE1 in response to ER stress to produce a highly active transcription factor.
  • BioNetBuilder2.0 bringing systems biology to chicken and other model organisms. BMC Genomics 10 Suppl 2, S6.

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Organic Chemistry (AREA)
  • Molecular Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Genetics & Genomics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Physics & Mathematics (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Analytical Chemistry (AREA)
  • Immunology (AREA)
  • General Engineering & Computer Science (AREA)
  • Biochemistry (AREA)
  • Microbiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Hematology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Urology & Nephrology (AREA)
  • Pathology (AREA)
  • Plant Pathology (AREA)
  • Cell Biology (AREA)
  • Food Science & Technology (AREA)
  • Medicinal Chemistry (AREA)
  • General Physics & Mathematics (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention relates to an in vitro method for identifying the sequence of one or more poly(A)+RNA molecules that physically interact with proteins. The present invention relates to a method for defining the protein-bound transcriptome in any given cellular condition, such as a disease state or after treatment with a given substance or drug, or another cellular perturbation. The invention also relates to a method for identifying a drug target and a method for identifying one or more biomarkers, preferably a panel of biomarkers, for any given medical condition, comprising the method according to the invention.
PCT/EP2013/055569 2012-03-16 2013-03-18 Method for identification of the sequence of poly(A)+RNA that physically interacts with protein WO2013135910A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US14/385,501 US20150045237A1 (en) 2012-03-16 2013-03-18 Method for identification of the sequence of poly(a)+rna that physically interacts with protein
EP13714874.8A EP2825890A1 (fr) 2012-03-16 2013-03-18 Method for identification of the sequence of poly(A)+RNA that physically interacts with protein

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
EP12159836.1 2012-03-16
EP12159836 2012-03-16
EP12175269 2012-07-06
EP12175269.5 2012-07-06

Publications (1)

Publication Number Publication Date
WO2013135910A1 true WO2013135910A1 (fr) 2013-09-19

Family

ID=48050672

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2013/055569 WO2013135910A1 (fr) 2012-03-16 2013-03-18 Method for identification of the sequence of poly(A)+RNA that physically interacts with protein

Country Status (3)

Country Link
US (1) US20150045237A1 (fr)
EP (1) EP2825890A1 (fr)
WO (1) WO2013135910A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107109698A (zh) * 2014-09-22 2017-08-29 加利福尼亚大学董事会 RNA stitch sequencing: an assay for direct mapping of RNA:RNA interactions in cells

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011143659A2 (fr) * 2010-05-14 2011-11-17 Fluidigm Corporation Nucleic acid isolation methods
US9828600B2 (en) * 2013-09-20 2017-11-28 University Of Massachusetts Compositions and methods for constructing cDNA libraries that allow for mapping the 5′ and 3′ ends of RNAs
US9896683B2 (en) * 2014-07-30 2018-02-20 University Of Massachusetts Isolating circulating microRNA (miRNA)
US11441169B2 (en) 2016-06-17 2022-09-13 Ludwig Institute For Cancer Research Ltd Methods of small-RNA transcriptome sequencing and applications thereof
CN110010196B (zh) * 2019-03-19 2020-11-06 北京工业大学 A gene similarity search method based on heterogeneous networks
CA3147297A1 (fr) * 2019-07-16 2021-01-21 Meliolabs Inc. Methods and devices for single-cell digital high-resolution melt
CN115497555B (zh) * 2022-08-16 2024-01-05 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Multi-species protein function prediction method, apparatus, device and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008094599A2 (en) 2007-01-30 2008-08-07 Rockefeller University, The Modified RNA ligase for efficient 3' modification of RNA
WO2010014636A1 (en) 2008-07-28 2010-02-04 Rockefeller University Methods for identifying RNA segments bound by RNA-binding proteins or ribonucleoprotein complexes
US20110076676A1 (en) 2003-10-23 2011-03-31 The Rockefeller University Method of Purifying RNA Binding Protein-RNA Complexes
US20110269647A1 (en) 2010-04-28 2011-11-03 Medical Research Council Method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110076676A1 (en) 2003-10-23 2011-03-31 The Rockefeller University Method of Purifying RNA Binding Protein-RNA Complexes
WO2008094599A2 (en) 2007-01-30 2008-08-07 Rockefeller University, The Modified RNA ligase for efficient 3' modification of RNA
WO2010014636A1 (en) 2008-07-28 2010-02-04 Rockefeller University Methods for identifying RNA segments bound by RNA-binding proteins or ribonucleoprotein complexes
US20110287412A1 (en) 2008-07-28 2011-11-24 Rockefeller University Methods for Identifying RNA Segments Bound by RNA-Binding Proteins or Ribonucleoprotein Complexes
US20110269647A1 (en) 2010-04-28 2011-11-03 Medical Research Council Method

Non-Patent Citations (96)

* Cited by examiner, † Cited by third party
Title
ADAM, S.A.; NAKAGAWA, T.; SWANSON, M.S.; WOODRUFF, T.K.; DREYFUSS, G.: "mRNA polyadenylate-binding protein: gene isolation and sequencing and identification of a ribonucleoprotein consensus sequence.", MOL CELL BIOL, vol. 6, 1986, pages 2932 - 2943
ANDERSEN, J.S.; LAM, Y.W.; LEUNG, A.K.; ONG, S.E.; LYON, C.E.; LAMOND, A.I.; MANN, M.: "Nucleolar proteome dynamics", NATURE, vol. 433, 2005, pages 77 - 83, XP002570014, DOI: doi:10.1038/nature03207
ARAVIND, L.; IYER, L.M.; ANANTHARAMAN, V.: "The two faces of Alba: the evolutionary connection between proteins participating in chromatin structure and RNA metabolism", GENOME BIOL, vol. 4, 2003, pages R64, XP021012754, DOI: doi:10.1186/gb-2003-4-10-r64
ASCANO, M.; HAFNER, M.; CEKAN, P.; GERSTBERGER, S.; TUSCHL, T.: "Identification of RNA-protein interaction networks using PAR-CLIP", WILEY INTERDISCIP REV RNA, 2011
AVILA-CAMPILLO, I.; DREW, K.; LIN, J.; REISS, D.J.; BONNEAU, R.: "BioNetBuilder: automatic integration of biological networks.", BIOINFORMATICS, vol. 23, 2007, pages 392 - 393
BAILEY; GRIBSKOV, J COMPUT BIOL, vol. 5, 1998, pages 211 - 21
BALTZ, A.G. ET AL.: "The mRNA-Bound Proteome and Its Global Occupancy Profile on Protein-Coding Transcripts", MOL CELL, vol. 46, no. 5, 8 June 2012 (2012-06-08), pages 674 - 690, XP002683323, ISSN: 1097-2765, DOI: doi:10.1016/j.molcel.2012.05.021 *
BESSONOV, S.; ANOKHINA, M.; WILL, C.L.; URLAUB, H.; LUHRMANN, R.: "Isolation of an active step I spliceosome and composition of its RNP core", NATURE, vol. 452, 2008, pages 846 - 850
BRENNER, S.E.; KOEHL, P.; LEVITT, M.: "The ASTRAL compendium for protein structure and sequence analysis.", NUCLEIC ACIDS RES, vol. 28, 2000, pages 254 - 256, XP002963343, DOI: doi:10.1093/nar/28.1.254
CHOI, Y.D.; DREYFUSS, G.: "Isolation of the heterogeneous nuclear RNA-ribonucleoprotein complex (hnRNP): a unique supramolecular assembly", PROC NATL ACAD SCI U S A, vol. 81, 1984, pages 7471 - 7475
CLINE, M.S.; SMOOT, M.; CERAMI, E.; KUCHINSKY, A.; LANDYS, N.; WORKMAN, C.; CHRISTMAS, R.; AVILA-CAMPILO, I.; CREECH, M.; GROSS, B: "Integration of biological networks and gene expression data using Cytoscape", NAT PROTOC, vol. 2, 2007, pages 2366 - 2382, XP055163055, DOI: doi:10.1038/nprot.2007.324
COX, J.; MANN, M.: "MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification", NAT BIOTECHNOL, vol. 26, 2008, pages 1367 - 1372
DENHEZ, F.; LAFYATIS, R.: "Conservation of regulated alternative splicing and identification of functional domains in vertebrate homologs to the Drosophila splicing regulator, suppressor-of-white-apricot.", J BIOL CHEM, vol. 269, 1994, pages 16170 - 16179
DEVEREUX ET AL., NUCL. ACID RES., vol. 12, 1984, pages 387 - 395
DOLKEN, L.; RUZSICS, Z.; RADLE, B.; FRIEDEL, C.C.; ZIMMER, R.; MAGES, J.; HOFFMANN, R.; DICKINSON, P.; FORSTER, T.; GHAZAL, P. ET: "High-resolution gene expression profiling for simultaneous kinetic parameter analysis of RNA synthesis and decay", RNA, vol. 14, 2008, pages 1959 - 1972
DREW, K.; WINTERS, P.; BUTTERFOSS, G.L.; BERSTIS, V.; UPLINGER, K.; ARMSTRONG, J.; RIFFLE, M.; SCHWEIGHOFER, E.; BOVERMANN, B.; GO: "The Proteome Folding Project: proteome-scale prediction of structure and function", GENOME RES, vol. 21, 2011, pages 1981 - 1994
ELIAS, J.E.; GYGI, S.P.: "Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry", NAT METHODS, vol. 4, 2007, pages 207 - 214
FABIAN SCHMIDT ET AL: "A proteomic analysis of oligo(dT)-bound mRNP containing oxidative stress-induced Arabidopsis thaliana RNA-binding proteins ATGRP7 and ATGRP8", MOLECULAR BIOLOGY REPORTS ; AN INTERNATIONAL JOURNAL ON MOLECULAR AND CELLULAR BIOLOGY, KLUWER ACADEMIC PUBLISHERS, DO, vol. 37, no. 2, 12 August 2009 (2009-08-12), pages 839 - 845, XP019773030, ISSN: 1573-4978 *
FAVIER, D.; GONDA, T.J.: "Detection of proteins that bind to the leucine zipper motif of c-Myb", ONCOGENE, vol. 9, 1994, pages 305 - 311
GREENBERG, J.R.: "Ultraviolet light-induced crosslinking of mRNA to proteins", NUCLEIC ACIDS RES, vol. 6, 1979, pages 715 - 732
HAAS, S.; STEPLEWSKI, A.; SIRACUSA, L.D.; AMINI, S.; KHALILI, K.: "Identification of a sequence-specific single-stranded DNA binding protein that suppresses transcription of the mouse myelin basic protein gene", J BIOL CHEM, vol. 270, 1995, pages 12503 - 12510, XP001165662
HAFNER, M.; LANDGRAF, P.; LUDWIG, J.; RICE, A.; OJO, T.; LIN, C.; HOLOCH, D.; LIM, C.; TUSCHL, T.: "Identification of microRNAs and other small regulatory RNAs using cDNA library sequencing.", METHODS, vol. 44, 2008, pages 3 - 12, XP022398574, DOI: doi:10.1016/j.ymeth.2007.09.009
HAFNER, M.; LANDTHALER, M.; BURGER, L.; KHORSHID, M.; HAUSSER, J.; BERNINGER, P.; ROTHBALLER, A.; ASCANO, M., JR.; JUNGKAMP, A.C.;: "Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP", CELL, vol. 141, 2010, pages 129 - 141, XP055293783, DOI: doi:10.1016/j.cell.2010.03.009
HASSFELD, W.; CHAN, E.K.; MATHISON, D.A.; PORTMAN, D.; DREYFUSS, G.; STEINER, G.; TAN, E.M.: "Molecular definition of heterogeneous nuclear ribonucleoprotein R (hnRNP R) using autoimmune antibody: immunological relationship with hnRNP P", NUCLEIC ACIDS RES, vol. 26, 1998, pages 439 - 445, XP002254074, DOI: doi:10.1093/nar/26.2.439
HINDORFF, L.A.; SETHUPATHY, P.; JUNKINS, H.A.; RAMOS, E.M.; MEHTA, J.P.; COLLINS, F.S.; MANOLIO, T.A.: "Potential etiologic and functional implications of genome-wide association loci for human diseases and traits", PROC NATL ACAD SCI U S A, vol. 106, 2009, pages 9362 - 9367, XP008150157, DOI: doi:10.1073/pnas.0903103106
HUNTER, S.; JONES, P.; MITCHELL, A.; APWEILER, R.; ATTWOOD, T.K.; BATEMAN, A.; BERNARD, T.; BINNS, D.; BORK, P.; BURGE, S. ET AL.: "InterPro in 2011: new developments in the family and domain prediction database", NUCLEIC ACIDS RES, vol. 40, 2011, pages D306 - D312
ISHIHAMA, Y.; RAPPSILBER, J.; ANDERSEN, J.S.; MANN, M.: "Microcolumns with self-assembled particle frits for proteomics.", JOURNAL OF CHROMATOGRAPHY, vol. 979, 2002, pages 233 - 239, XP004392262, DOI: doi:10.1016/S0021-9673(02)01402-4
JACKSON, R.J.; HELLEN, C.U.; PESTOVA, T.V.: "The mechanism of eukaryotic translation initiation and principles of its regulation", NAT REV MOL CELL BIOL, vol. 11, 2010, pages 113 - 127, XP009135819, DOI: doi:10.1038/nrm2838
JI YU, E.; KIM, S.H.; HEO, K.; OU, C.Y.; STALLCUP, M.R.; KIM, J.H.: "Reciprocal roles of DBC1 and SIRT1 in regulating estrogen receptor α activity and co-activator synergy", NUCLEIC ACIDS RES, vol. 39, 2011, pages 6932 - 6943
KABE, Y.; GOTO, M.; SHIMA, D.; IMAI, T.; WADA, T.; MOROHASHI, K.; SHIRAKAWA, M.; HIROSE, S.; HANDA, H.: "The role of human MBF1 as a transcriptional coactivator", J BIOL CHEM, vol. 274, 1999, pages 34196 - 34202
KATHIRESAN, S.; MELANDER, O.; GUIDUCCI, C.; SURTI, A.; BURTT, N.P.; RIEDER, M.J.; COOPER, G.M.; ROOS, C.; VOIGHT, B.F.; HAVULINNA,: "Six new loci associated with blood low-density lipoprotein cholesterol, high-density lipoprotein cholesterol or triglycerides in humans", NAT GENET, vol. 40, 2008, pages 189 - 197, XP002516887, DOI: doi:10.1038/NG.75
KATHIRESAN, S.; WILLER, C.J.; PELOSO, G.M.; DEMISSIE, S.; MUSUNURU, K.; SCHADT, E.E.; KAPLAN, L.; BENNETT, D.; LI, Y.; TANAKA, T.: "Common variants at 30 loci contribute to polygenic dyslipidemia.", NAT GENET, vol. 41, 2009, pages 56 - 65, XP008161409, DOI: doi:10.1038/ng.291
KEENE, J.D.: "RNA regulons: coordination of post-transcriptional events", NAT REV GENET, vol. 8, 2007, pages 533 - 543
KENNEDY, M.C.; MENDE-MUELLER, L.; BLONDIN, G.A.; BEINERT, H.: "Purification and characterization of cytosolic aconitase from beef liver and its relationship to the iron-responsive element binding protein.", PROC NATL ACAD SCI U S A, vol. 89, 1992, pages 11730 - 11734, XP001318615
KILEDJIAN, M.; DREYFUSS, G.: "Primary structure and binding activity of the hnRNP U protein: binding RNA through RGG box", EMBO J, vol. 11, 1992, pages 2655 - 2664, XP001108945
KIM, J.E.; CHEN, J.; LOU, Z.: "DBC1 is a negative regulator of SIRT1", NATURE, vol. 451, 2008, pages 583 - 586
KISHORE, S.; JASKIEWICZ, L.; BURGER, L.; HAUSSER, J.; KHORSHID, M.; ZAVOLAN, M.: "A quantitative analysis of CLIP methods for identifying binding sites of RNA-binding proteins", NAT METHODS, vol. 8, 2011, pages 559 - 564
KNAPINSKA, A.M.; GRATACOS, F.M.; KRAUSE, C.D.; HERNANDEZ, K.; JENSEN, A.G.; BRADLEY, J.J.; WU, X.; PESTKA, S.; BREWER, G.: "Chaperone Hsp27 modulates AUF1 proteolysis and AU-rich element-mediated mRNA degradation", MOL CELL BIOL, vol. 31, 2011, pages 1419 - 1431
KONIECZKA, J.H.; DREW, K.; PINE, A.; BELASCO, K.; DAVEY, S.; YATSKIEVYCH, T.A.; BONNEAU, R.; ANTIN, P.B.: "BioNetBuilder2.0: bringing systems biology to chicken and other model organisms.", BMC GENOMICS, vol. 10, no. 2, 2009, pages S6, XP021056264, DOI: doi:10.1186/1471-2164-10-S2-S6
KONIG, J.; ZARNACK, K.; ROT, G.; CURK, T.; KAYIKCI, M.; ZUPAN, B.; TURNER, D.J; LUSCOMBE, N.M.; ULE, J.: "iCLIP reveals the function of hnRNP particles in splicing at individual nucleotide resolution", NAT STRUCT MOL BIOL, vol. 17, 2010, pages 909 - 915, XP055368440, DOI: doi:10.1038/nsmb.1838
LE HIR, H.; ANDERSEN, G.R.: "Structural insights into the exon junction complex.", CURR OPIN STRUCT BIOL, vol. 18, 2008, pages 112 - 119, XP022485999, DOI: doi:10.1016/j.sbi.2007.11.002
LEBEDEVA, S.; JENS, M.; THEIL, K.; SCHWANHAUSSER, B.; SELBACH, M.; LANDTHALER, M.; RAJEWSKY, N.: "Transcriptome-wide Analysis of Regulatory Interactions of the RNA-Binding Protein HuR.", MOL CELL, vol. 43, 2011, pages 340 - 352, XP028276489, DOI: doi:10.1016/j.molcel.2011.06.008
LEE, I.; HONG, W.: "RAP--a putative RNA-binding domain.", TRENDS BIOCHEM SCI, vol. 29, 2004, pages 567 - 570, XP004610107, DOI: doi:10.1016/j.tibs.2004.09.005
LI, H.; HANDSAKER, B.; WYSOKER, A.; FENNELL, T.; RUAN, J.; HOMER, N.; MARTH, G.; ABECASIS, G.; DURBIN, R.: "The Sequence Alignment/Map format and SAMtools", BIOINFORMATICS, vol. 25, 2009, pages 2078 - 2079, XP055229864, DOI: doi:10.1093/bioinformatics/btp352
LINDBERG, U.; SUNDQUIST, B.: "Isolation of messenger ribonucleoproteins from mammalian cells", J MOL BIOL, vol. 86, 1974, pages 451 - 468, XP024012775, DOI: doi:10.1016/0022-2836(74)90030-8
LINDBLAD-TOH, K.; GARBER, M.; ZUK, O.; LIN, M.F.; PARKER, B.J.; WASHIETL, S.; KHERADPOUR, P.; ERNST, J.; JORDAN, G.; MAUCELI, E. E: "A high-resolution map of human evolutionary constraint using 29 mammals", NATURE, vol. 478, 2011, pages 476 - 482
LIU X S ET AL., NAT. BIOTECHNOL., vol. 20, no. 8, 2002, pages 835 - 9
MARTIN, K.C.; EPHRUSSI, A.: "mRNA localization: gene expression in the spatial dimension", CELL, vol. 136, 2009, pages 719 - 730
MAZAN-MAMCZARZ, K.; GALBAN, S.; LOPEZ DE SILANES, I.; MARTINDALE, J.L.; ATASOY, U.; KEENE, J.D.; GOROSPE, M.: "RNA-binding protein HuR enhances p53 translation in response to ultraviolet light irradiation", PROC NATL ACAD SCI U S A, vol. 100, 2003, pages 8354 - 8359
MCCARTHY, F.M.; WANG, N.; MAGEE, G.B.; NANDURI, B; LAWRENCE, M.L.; CAMON, E.B.; BARRELL, D.G.; HILL, D.P.; DOLAN, M.E.; WILLIAMS,: "AgBase: a functional genomics resource for agriculture.", BMC GENOMICS, vol. 7, 2006, pages 229, XP021022207, DOI: doi:10.1186/1471-2164-7-229
MILEK, M.; WYLER, E.; LANDTHALER, M.: "Transcriptome-wide analysis of protein-RNA interactions using high-throughput sequencing.", SEMIN CELL DEV BIOL., 2011
MOORE, M.J.; PROUDFOOT, N.J.: "Pre-mRNA processing reaches back to transcription and ahead to translation", CELL, vol. 136, 2009, pages 688 - 700
MOSTAFAVI, S.; RAY, D.; WARDE-FARLEY, D.; GROUIOS, C.; MORRIS, Q.: "GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function", GENOME BIOL, vol. 9, no. 1, 2008, pages S4, XP021041667
NAGARAJ, N.; WISNIEWSKI, J.R.; GEIGER, T.; COX, J.; KIRCHER, M.; KELSO, J.; PAABO, S.; MANN, M.: "Deep proteome and transcriptome mapping of a human cancer cell line", MOL SYST BIOL, vol. 7, 2011, pages 548
NEEDLEMAN; WUNSCH, J. MOL. BIOL., vol. 48, 1970, pages 443
NILSEN, T.W.; GRAVELEY, B.R.: "Expansion of the eukaryotic proteome by alternative splicing", NATURE, vol. 463, 2010, pages 457 - 463
ONG, S.E.; BLAGOEV, B.; KRATCHMAROVA, I.; KRISTENSEN, D.B.; STEEN, H.; PANDEY, A.; MANN, M.: "Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics", MOL CELL PROTEOMICS, vol. 1, 2002, pages 376 - 386, XP009020302, DOI: doi:10.1074/mcp.M200025-MCP200
OWEN, H.R.; ELSER, M.; CHEUNG, E.; GERSBACH, M.; KRAUS, W.L.; HOTTIGER, M.O.: "MYBBP1a is a novel repressor of NF-kappaB.", J MOL BIOL, vol. 366, 2007, pages 725 - 736, XP026268823, DOI: doi:10.1016/j.jmb.2006.11.099
P. BERNINGER; D. GAIDATZIS; E. VAN NIMWEGEN; M. ZAVOLAN: "Computational analysis of small RNA cloning data", METHODS, vol. 44, 2008, pages 13 - 21, XP022398575, DOI: doi:10.1016/j.ymeth.2007.10.002
PEARSON; LIPMAN, PROC. NATL. ACAD. SCI. U.S.A., vol. 85, 1988, pages 2444
PENA-CASTILLO, L.; TASAN, M.; MYERS, C.L.; LEE, H.; JOSHI, T.; ZHANG, C.; GUAN, Y.; LEONE, M.; PAGNANI, A.; KIM, W.K. ET AL.: "A critical assessment of Mus musculus gene function prediction using integrated genomic evidence", GENOME BIOL, vol. 9, no. 1, 2008, pages S2, XP021041665
POLLARD, K.S.; HUBISZ, M.J.; ROSENBLOOM, K.R.; SIEPEL, A.: "Detection of nonneutral substitution rates on mammalian phylogenies", GENOME RES, vol. 20, 2010, pages 110 - 121
POPOW, J.; ENGLERT, M.; WEITZER, S.; SCHLEIFFER, A.; MIERZWA, B.; MECHTLER, K.; TROWITZSCH, S.; WILL, C.L.; LUHRMANN, R.; SOLL, D.: "HSPC117 is the essential subunit of a human tRNA splicing ligase complex", SCIENCE, vol. 331, 2011, pages 760 - 764, XP007917547, DOI: doi:10.1126/science.1197847
QUENAULT, T.; LITHGOW, T.; TRAVEN, A.: "PUF proteins: repression, activation and mRNA localization.", TRENDS CELL BIOL, vol. 21, 2011, pages 104 - 112, XP028131672, DOI: doi:10.1016/j.tcb.2010.09.013
QUINLAN, A.R.; HALL, I.M.: "BEDTools: a flexible suite of utilities for comparing genomic features.", BIOINFORMATICS, vol. 26, 2010, pages 841 - 842, XP055307411, DOI: doi:10.1093/bioinformatics/btq033
RAPPSILBER, J.; MANN, M.; ISHIHAMA, Y.: "Protocol for micro-purification, enrichment, pre-fractionation and storage of peptides for proteomics using StageTips", NAT PROTOC, vol. 2, 2007, pages 1896 - 1906
ROTH, F. P.; HUGHES, J. D.; ESTEP, P. W.; CHURCH, G. M.: "Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation", NAT BIOTECHNOL, vol. 16, 1998, pages 939 - 45, XP002153325, DOI: doi:10.1038/nbt1098-939
SCHERRER, T.; MITTAL, N.; JANGA, S.C.; GERBER, A.P.: "A screen for RNA-binding proteins in yeast indicates dual functions for many enzymes.", PLOS ONE, vol. 5, 2010, pages E15499
SCHMIDT, F.; MARNEF, A.; CHEUNG, M-K.; WILSON, I.; HANCOCK, J.; STAIGER, D.; LADOMERY, M.: "A proteomic analysis of oligo(dT)-bound mRNP containing oxidative stress-induced Arabidopsis thaliana RNA-binding proteins ATGRP7 and ATGRP8.", MOL. BIOL. REP, vol. 37, 2010, pages 839 - 845, XP019773030
SCHWANHAUSSER, B.; BUSSE, D.; LI, N.; DITTMAR, G.; SCHUCHHARDT, J.; WOLF, J.; CHEN, W.; SELBACH, M.: "Global quantification of mammalian gene expression control.", NATURE, vol. 473, 2011, pages 337 - 342, XP055059077, DOI: doi:10.1038/nature10098
SETYONO, B.; GREENBERG, J.R.: "Proteins associated with poly(A) and other regions of mRNA and hnRNA molecules as investigated by crosslinking", CELL, vol. 24, 1981, pages 775 - 783, XP027463052, DOI: doi:10.1016/0092-8674(81)90103-3
SHANNON, P.T.; REISS, D.J.; BONNEAU, R.; BALIGA, N.S.: "The Gaggle: an open-source software system for integrating bioinformatics software and data sources.", BMC BIOINFORMATICS, vol. 7, 2006, pages 176, XP021013681, DOI: doi:10.1186/1471-2105-7-176
SHEVCHENKO, A.; TOMAS, H.; HAVLIS, J.; OLSEN, J.V.; MANN, M.: "In-gel digestion for mass spectrometric characterization of proteins and proteomes", NAT PROTOC, vol. 1, 2006, pages 2856 - 2860
SHIINA, N.; SHINKURA, K.; TOKUNAGA, M.: "A novel RNA-binding protein in neuronal RNA granules: regulatory machinery for local translation.", J NEUROSCI, vol. 25, 2005, pages 4420 - 4434
SILVERA, D.; KOLOTEVA-LEVINE, N.; BURMA, S.; ELROY-STEIN, O.: "Effect of Ku proteins on IRES-mediated translation.", BIOL CELL, vol. 98, 2006, pages 353 - 361
SMITH; WATERMAN, ADV. APPL. MATH., vol. 2, 1981, pages 482
SQUIRES, J.E.; PATEL, H.R.; NOUSCH, M.; SIBBRITT, T.; HUMPHREYS, D.T.; PARKER, B.J.; SUTER, C.M.; PREISS, T.: "Widespread occurrence of 5-methylcytosine in human coding and non-coding RNA", NUCLEIC ACIDS RES., 2012
THALHAMMER, A.; BENCOKOVA, Z.; POOLE, R.; LOENARZ, C.; ADAM, J.; O'FLAHERTY, L.; SCHODEL, J.; MOLE, D.; GIASLAKIOTIS, K.; SCHOFIEL: "Human AlkB homologue 5 is a nuclear 2-oxoglutarate dependent oxygenase and a direct target of hypoxia-inducible factor 1alpha (HIF-1alpha).", PLOS ONE, vol. 6, 2011, pages E16210
TING, N.S.; YU, Y.; POHORELIC, B.; LEES-MILLER, S.P.; BEATTIE, T.L.: "Human Ku70/80 interacts directly with hTR, the RNA component of human telomerase", NUCLEIC ACIDS RES, vol. 33, 2005, pages 2090 - 2098
TRAPNELL, C.; PACHTER, L.; SALZBERG, S.L.: "TopHat: discovering splice junctions with RNA-Seq.", BIOINFORMATICS, vol. 25, 2009, pages 1105 - 1111
TSVETANOVA, N.G.; KLASS, D.M.; SALZMAN, J.; BROWN, P.O.: "Proteome-wide search reveals unexpected RNA-binding proteins in Saccharomyces cerevisiae", PLOS ONE, 2010, pages 5
ULE ET AL., SCIENCE, 2003
ULE, J.; JENSEN, K.B.; RUGGIU, M.; MELE, A.; ULE, A.; DARNELL, R.B.: "CLIP identifies Nova-regulated RNA networks in the brain", SCIENCE, vol. 302, 2003, pages 1212 - 1215
VITOUR, D.; LINDENBAUM, P.; VENDE, P.; BECKER, M.M.; PONCET, D.: "RoXaN, a novel cellular protein containing TPR, LD, and zinc finger motifs, forms a ternary complex with eukaryotic initiation factor 4G and rotavirus NSP3", J VIROL, vol. 78, 2004, pages 3851 - 3862
VOGEL, C.; ABREU RDE, S.; KO, D.; LE, S.Y.; SHAPIRO, B.A.; BURNS, S.C.; SANDHU, D.; BOUTZ, D.R.; MARCOTTE, E.M.; PENALVA, L.O.: "Sequence signatures and mRNA concentration can explain two-thirds of protein abundance variation in a human cell line", MOL SYST BIOL, vol. 6, 2010, pages 400, XP055261348, DOI: doi:10.1038/msb.2010.59
WAGENMAKERS, A.J.; REINDERS, R.J.; VAN VENROOIJ, W.J.: "Cross-linking of mRNA to proteins by irradiation of intact cells with ultraviolet light.", EUR J BIOCHEM, vol. 112, 1980, pages 323 - 330
WANG, E.T.; SANDBERG, R.; LUO, S.; KHREBTUKOVA, I.; ZHANG, L.; MAYR, C.; KINGSMORE, S.F.; SCHROTH, G.P.; BURGE, C.B.: "Alternative isoform regulation in human tissue transcriptomes", NATURE, vol. 456, 2008, pages 470 - 476
WOZNIAK, M.; TIURYN, J.; DUTKOWSKI, J.: "MODEVO: exploring modularity and evolution of protein interaction networks.", BIOINFORMATICS, vol. 26, 2010, pages 1790 - 1791
YOSHIDA, H.; MATSUI, T.; YAMAMOTO, A.; OKADA, T.; MORI, K.: "XBP1 mRNA is induced by ATF6 and spliced by IRE1 in response to ER stress to produce a highly active transcription factor", CELL, vol. 107, 2001, pages 881 - 891
ZHANG, J.; CHO, S.J.; SHU, L.; YAN, W.; GUERRERO, T.; KENT, M.; SKORUPSKI, K.; CHEN, H.; CHEN, X.: "Translational repression of p53 by RNPC1, a p53 target overexpressed in lymphomas", GENES DEV, vol. 25, 2011, pages 1528 - 1543
ZOU, T.; MAZAN-MAMCZARZ, K.; RAO, J.N.; LIU, L.; MARASA, B.S.; ZHANG, A.H.; XIAO, L.; PULLMANN, R.; GOROSPE, M.; WANG, J.Y.: "Polyamine depletion increases cytoplasmic levels of RNA-binding protein HuR leading to stabilization of nucleophosmin and p53 mRNAs", J BIOL CHEM, vol. 281, 2006, pages 19387 - 19394

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107109698A (en) * 2014-09-22 2017-08-29 The Regents of the University of California RNA stitch sequencing: an assay for direct mapping of RNA:RNA interactions in cells
EP3198063A4 (en) * 2014-09-22 2018-05-02 The Regents of the University of California RNA stitch sequencing: an assay for direct mapping of RNA:RNA interactions in cells
CN107109698B (en) * 2014-09-22 2021-07-20 The Regents of the University of California RNA stitch sequencing: an assay for direct mapping of RNA:RNA interactions in cells

Also Published As

Publication number Publication date
US20150045237A1 (en) 2015-02-12
EP2825890A1 (fr) 2015-01-21

Similar Documents

Publication Publication Date Title
Nielsen et al. Best practice standards for circular RNA research
Jathar et al. Technological developments in lncRNA biology
Ke et al. m6A mRNA modifications are deposited in nascent pre-mRNA and are not required for splicing but do specify cytoplasmic turnover
US20150045237A1 (en) Method for identification of the sequence of poly(a)+rna that physically interacts with protein
Hafner et al. Identification of mRNAs bound and regulated by human LIN28 proteins and molecular requirements for RNA recognition
Hafner et al. Genome-wide identification of miRNA targets by PAR-CLIP
US8841073B2 (en) Methods for identifying RNA segments bound by RNA-binding proteins or ribonucleoprotein complexes
Tang et al. Alternative polyadenylation by sequential activation of distal and proximal PolyA sites
Jaskiewicz et al. Argonaute CLIP–a method to identify in vivo targets of miRNAs
Liu et al. circFL-seq reveals full-length circular RNAs with rolling circular reverse transcription and nanopore sequencing
CN107109698B (en) RNA stitch sequencing: an assay for direct mapping of RNA:RNA interactions in cells
Roos et al. Mutations in cis that affect mRNA synthesis, processing and translation
Gorbovytska et al. Enhancer RNAs stimulate Pol II pause release by harnessing multivalent interactions to NELF
US11859248B2 (en) Nucleic acid modification and identification method
Solé et al. The use of circRNAs as biomarkers of cancer
Zinnall et al. HDLBP binds ER-targeted mRNAs by multivalent interactions to promote protein synthesis of transmembrane and secreted proteins
Wolin et al. SPIDR: a highly multiplexed method for mapping RNA-protein interactions uncovers a potential mechanism for selective translational suppression upon cellular stress
US20110269647A1 (en) Method
Class et al. Patent application title: METHOD FOR IDENTIFICATION OF THE SEQUENCE OF POLY (A)+ RNA THAT PHYSICALLY INTERACTS WITH PROTEIN Inventors: Markus Landthaler (Berlin, DE) Mathias Munschauer (Berlin, DE) Alexander Baltz (Berlin, DE)
Wang et al. Capture, amplification, and global profiling of microRNAs from low quantities of whole cell lysate
Ayub et al. Useful methods to study epigenetic marks: DNA methylation, histone modifications, chromatin structure, and noncoding RNAs
Surani Transcription Start Site selection within a single cluster and G quadruplex structures: a novel mechanism regulating gene expression
Monteiro Martins Bioinformatics analysis of multi-omics data elucidates U2 snRNP function in transcription
Kargapolova A novel crosslinking and immunoprecipitation method reveals the function of CSTF2tau in alternative processing of snRNAs
Oo et al. Long non-coding RNAs direct the SWI/SNF complex to cell-specific enhancers

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 13714874

Country of ref document: EP

Kind code of ref document: A1

WWE WIPO information: entry into national phase

Ref document number: 14385501

Country of ref document: US

WWE WIPO information: entry into national phase

Ref document number: 2013714874

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE