WO2018112336A1 - Systems and methods for DNA-guided RNA cleavage - Google Patents

Systems and methods for DNA-guided RNA cleavage

Info

Publication number
WO2018112336A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
rna
sequence
guide
argonaute
Prior art date
Application number
PCT/US2017/066664
Other languages
French (fr)
Inventor
Kotaro NAKANISHI
Daniel DAYEH
Original Assignee
Ohio State Innovation Foundation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ohio State Innovation Foundation
Publication of WO2018112336A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/11: DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113: Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C: CHEMISTRY; METALLURGY
    • C07: ORGANIC CHEMISTRY
    • C07K: PEPTIDES
    • C07K14/00: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/37: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from fungi
    • C07K14/39: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from fungi from yeasts
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/63: Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79: Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/80: Vectors or expression systems specially adapted for eukaryotic hosts for fungi
    • C12N15/81: Vectors or expression systems specially adapted for eukaryotic hosts for fungi for yeasts
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6844: Nucleic acid amplification reactions
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00: Structure or type of the nucleic acid
    • C12N2310/10: Type of nucleic acid
    • C12N2310/14: Type of nucleic acid interfering N.A.

Definitions

  • the present invention relates generally to compositions, systems, and methods for cleaving RNA molecules.
  • RISC RNA-induced silencing complex
  • the loaded RNAs, pre-organized in the nucleic acid-binding channel, serve as guides to facilitate base pairing with targets. It is well known that the RISCs open their bilobal structures to widen the intervening channel during the transition from the nucleation step to the propagation step of guide-target duplex formation and cleave the targets only when their sequence perfectly matches the guide.
  • the significance of the proteinaceous part of RISC for this step has not been studied well due to the difficulty of making suitable constructs.
  • current methods involving the RISC complex in attenuating gene expression require an RNA oligonucleotide to facilitate target RNA cleavage.
  • HH ribozymes and ASREs are limited by the need to re-engineer the RNA-recognition motif for each unique target of interest, and DNAzymes depend on multiple cycles of selective evolution to achieve catalysis against desired targets.
  • RNA design is dependent on prior knowledge of any secondary structural features that the RNA may exhibit.
  • Chemical probing methods and enzymatic strategies using RNase H have allowed researchers to gain insights into which regions of RNA are unpaired or exposed to solvent and may serve as candidate target sites for enzymatic cleavage, antisense oligonucleotide or small-interfering RNA design.
  • a yeast Argonaute polypeptide can utilize single-stranded DNA as a guide molecule for cleaving target RNAs.
  • a DNA-guided RNA cleavage system comprising: a yeast Argonaute polypeptide; and
  • a heterologous, single-stranded oligonucleotide guide molecule
  • the single-stranded oligonucleotide guide molecule is a DNA oligonucleotide that is complementary to a target RNA sequence.
  • RNA sequence comprising: binding to a target RNA sequence a complex comprising:
  • a heterologous, single-stranded oligonucleotide guide molecule
  • the single-stranded oligonucleotide guide molecule is a DNA oligonucleotide that is complementary to the target RNA sequence
  • Argonaute polypeptide:guide molecule complex cleaves the target RNA sequence.
  • a method for attenuating expression of a target gene in a cell comprising:
  • introducing into the cell a yeast Argonaute polypeptide
  • ssDNA single stranded DNA
  • a method for attenuating expression of a target gene in a cell comprising:
  • a complex comprising: a yeast Argonaute polypeptide and a single stranded DNA (ssDNA) in an amount sufficient to attenuate expression of the target gene; wherein the ssDNA comprises a nucleotide sequence that is complementary to a nucleotide sequence of the target gene.
  • the yeast Argonaute polypeptide is from Vanderwaltozyma polyspora (also known as Kluyveromyces polysporus). In one embodiment, the yeast Argonaute polypeptide is selected from SEQ ID NO:31, SEQ ID NO:32, or SEQ ID NO:33. In one embodiment, the yeast Argonaute polypeptide is SEQ ID NO:32. In some embodiments, the yeast Argonaute polypeptide has at least 60% (for example, at least 60%, at least 65%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 99%) identity to SEQ ID NO:31, SEQ ID NO:32, or SEQ ID NO:33.
  • the yeast Argonaute polypeptide has at least 60% (for example, at least 60%, at least 65%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 99%) identity to SEQ ID NO:32.
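The identity thresholds above can be illustrated with a minimal Python sketch. It assumes the two polypeptide sequences have already been aligned to equal length with `-` as the gap character, and it counts identity over all alignment columns; the source does not specify an alignment method or denominator, and the function names and example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue in both sequences."""
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be non-empty and pre-aligned to equal length")
    # A column counts as identical only if both residues match and neither is a gap.
    matches = sum(
        a == b and a != "-"
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
    )
    return 100.0 * matches / len(aligned_a)


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float = 60.0) -> bool:
    """True if the pair reaches the given identity threshold (60% by default)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

With the toy alignment `MKT-A` versus `MKSLA`, three of five columns are identical, so the pair sits exactly at the 60% threshold.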
  • the single-stranded oligonucleotide guide molecule is about 12 to about 45 nucleotides. In some embodiments, the single-stranded oligonucleotide guide molecule is about 12, about 13, about 14, about 15, about 16, about 17, about 18, about 19, about 20, about 21, about 22, about 23, about 24, about 25, about 26, about 27, about 28, about 29, about 30, about 31, about 32, about 33, about 34, about 35, about 36, about 37, about 38, about 39, about 40, about 41, about 42, about 43, about 44, or about 45 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is about 12 to about 30 nucleotides.
  • the single-stranded oligonucleotide guide molecule is about 14 to about 26 nucleotides. In some embodiments, the single-stranded oligonucleotide guide molecule is about 21 to about 25 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is about 23 nucleotides.
  • the target RNA sequence is from a mammal. In one embodiment, the target RNA sequence is from a human. In one embodiment, the target RNA sequence is from a virus. In one embodiment, the target RNA sequence is from a pathogen. In one embodiment, the target RNA sequence is from a bacterium. In one embodiment, the target RNA sequence is from a prokaryotic cell. In one embodiment, the target RNA sequence is from a eukaryotic cell. Further disclosed herein are systems and methods for detecting nuclease accessibility sites in an RNA sequence.
  • a yeast Argonaute protein can utilize single-stranded DNA as a guide molecule for, among other applications, high-throughput identification and targeting of accessible regions of highly-structured RNAs.
  • Complexes referred to as a DNA-induced slicing complex; or "DISC"
  • DISC DNA-induced slicing complex
  • RISC single-stranded RNA
  • a method of detecting nuclease accessibility sites in an RNA sequence comprising a) binding to a target RNA sequence a complex comprising a yeast Argonaute polypeptide and a first single-stranded DNA oligonucleotide guide molecule, wherein the single-stranded DNA oligonucleotide guide molecule is complementary to the target RNA sequence; b) cleaving the target RNA sequence with the Argonaute polypeptide:guide complex to form an RNA cleavage product; c) detecting the RNA cleavage product; and d) determining a nuclease accessibility site based on the RNA cleavage product.
  • a method of high-throughput detection of nuclease accessibility sites comprising a) assaying a target RNA sequence with two or more Argonaute polypeptide: guide complexes, wherein each complex comprises a yeast Argonaute polypeptide and a single-stranded DNA oligonucleotide guide molecule from a library of single-stranded DNA oligonucleotide guide molecules, wherein each single-stranded DNA oligonucleotide guide molecule is complementary to a portion of the target RNA sequence; b) cleaving the target RNA sequence with the Argonaute polypeptide: guide complexes to form at least one RNA cleavage product; c) detecting the at least one RNA cleavage product; and d) determining a nuclease accessibility site based on the at least one RNA cleavage product.
  • a DNA-guided RNA cleavage system for high-throughput detection of nuclease accessibility sites, the system comprising a first complex comprising a first yeast Argonaute polypeptide and a first single-stranded DNA oligonucleotide guide molecule; and a second complex comprising a second yeast Argonaute polypeptide and a second single-stranded DNA oligonucleotide guide molecule; wherein the first and second single-stranded DNA oligonucleotide guide molecules are not identical and are complementary to a target RNA sequence.
  • kits comprising a vector comprising: a nucleic acid sequence encoding a yeast Argonaute polypeptide operably linked to a promoter; an RNA-dependent DNA polymerase; a set of buffered RNA cleavage reagents; and a set of buffered reverse transcription reagents.
  • FIGS. 1A-1H show cleavage activity of mini-AGO.
  • FIG. 1a Domain architectures of K. polysporus Argonaute (wildtype, Ago1; and truncated, AGO) and its miniature Argonaute (mini-AGO) as well as the previously crystallized construct of Neurospora crassa QDE-2 C-terminal lobe (PDB accession code 2YHA).
  • FIG. 1b Sequence alignment of conserved RxxxGxxG (Argonaute clade) and GxxG (PIWI clade) motifs in the N domain of Argonaute family proteins.
  • FIG. 1c Nuclease-sensitivity of co-purifying nucleic acid. Polynucleotides were extracted from indicated samples, end-labelled, and either untreated or incubated with RNase (R) or DNase (D) before analysis by denaturing PAGE with a hydrolyzed marker (nt, nucleotide).
  • R RNase
  • D DNase
  • FIG. 1e Analysis of cleavage product length. Products generated as in Fig. 1d were resolved on a sequencing gel alongside a hydrolyzed marker.
  • FIG. 1f Schematic of target (top) and guide (bottom) strands. RNA or DNA guides matched with either 5' capped RNA or 5' phosphorylated DNA targets used in Fig. 1g and Fig. 1h.
  • FIG. 1g Cleavage activity of a DNA-induced silencing complex (DISC).
  • DISC DNA-induced silencing complex
  • FIG. 1h Cleavage activity of a mini-DISC (DNA-programmed mini-AGO).
  • Cleavage assays were performed as in Fig. 1g except with mini-AGO.
  • FIGS. 2A-2D show the recognition of the seed region and catalytic assembly.
  • FIG. 2a Crystal structure of mini-AGO with N (cyan), L2 (grey), MID (orange), and PIWI (green) domains in ribbon representation. The bound RNA is drawn as a stick model (red).
  • FIG. 2b Interaction of the RxxxGxxG motif with the PIWI domain. Hydrogen bonds are shown as dashed lines. Residues in the N (cyan) and PIWI (green) domains are drawn as stick models. Water molecules are shown as red spheres.
  • FIG. 2c The miRNA seed is recognized by the MID and PIWI domain residues along the sugar phosphate backbone.
  • FIG. 2d Catalytically active conformation of mini-AGO. Superposition of AGO from K. polysporus (white) and mini-AGO (green) reveals a fully assembled active site with a plugged-in glutamate finger.
  • FIGS. 3A-3D show the reconstitution of in vitro RNAi by mini-AGO.
  • FIG. 3a
  • FIG. 3b In vitro execution of all stages in the RNAi pathway. Unlabeled siRNA duplexes were pre-incubated with either AGO or mini-AGO. The complex was then incubated with the cap-labelled, matched target RNAs.
  • FIG. 3c Matched and mismatched targets. Both targets (top) were cap-labeled (red). The mismatched target contained a dinucleotide mismatch at t10 and t11 (blue) against miR-20a guide (bottom).
  • FIGS. 4A-4E show the discrimination of guide:target pairs between AGO and mini-AGO.
  • FIG. 4a, FIG. 4b Cleavage of the perfectly matched target with guide RNAs of different lengths. Either of the guide RNAs (10, 11, 12, 13, 14, 16, 23 nt) was loaded into AGO (a) and mini-AGO (b). Relative cleavage was calculated using Equation 2.
  • FIG. 4c, FIG. 4d Cleavage of the mismatched target with guide RNAs of different lengths. The target was the same as in Fig. 4a and 4b except for the t10-t11 step mismatches. The same guides used in Fig. 4a and 4b were loaded into AGO (c) and mini-AGO (d). Relative cleavage was calculated using Equation 2.
  • FIGS. 5A-5D show the design of mini-AGO.
  • FIG. 5a Bilobal structure of AGO from
  • FIG. 5b Extended β-strands in the PIWI domain. The color codes are the same as in (a).
  • FIG. 5c Strategy for designing a mini-AGO construct. Catalytic and conserved RxxxGxxG residues are circled and labelled in red and cyan, respectively (amino acid residues are abbreviated as follows: D, aspartate; E, glutamate; G, glycine; R, arginine).
  • FIG. 5d Amino acid sequence and secondary structure of mini-AGO segment located at the interface of the N-terminal and C-terminal lobes. Conserved RxxxGxxG motif is underscored.
  • FIG. 6 shows contribution of conserved RxxxGxxG motif to stability. Effect of point mutations to the RxxxGxxG motif on the solubility of the C-terminal-lobe construct. After lysis, the soluble (S) and precipitated (P) fractions were separated by centrifugation and resolved by SDS-PAGE. The bands of SUMO-tag fused mini-AGO are indicated with an arrowhead.
  • FIGS. 7A-7E show RNAs co-purified and crystallized with mini-AGO.
  • FIG. 7a Profile of size-exclusion chromatography of mini-AGO. Absorbance values at 254 and 280 nm are colored in red and blue, respectively.
  • FIG. 7b, FIG. 7c Fo-Fc omit map contoured at 2.5 σ around the bound guide RNA. The omit map is shown with the ribbon model of mini-AGO (wheat) (b) and with the final RNA model (red) (c).
  • RNAs were extracted from mini-AGO crystals, 5' end-labelled, and resolved by denaturing PAGE alongside RNAs of known length.
  • FIGS. 8A-8G show reconstitution of in vitro RNAi by mini-AGO.
  • FIG. 8a Schematic of duplex loading, passenger cleavage, and target recognition and cleavage by mini-AGO.
  • FIG. 8b Preparation of siRNA duplex used in passenger strand cleavage assays. The 5'-end label of the passenger strand is colored in red. Annealed siRNA duplex was resolved on 20% native PAGE alongside 23-nt single-stranded passenger. Gel was visualized by phosphorimaging.
  • FIG. 8c, FIG. 8d Passenger strand cleavage by AGO (c) or mini-AGO (d) from Figure 3a were quantified and plotted using Equation 1.
  • FIG. 8e Preparation of unlabeled siRNA duplex used in siRNA-mediated target RNA cleavage assays. Annealed siRNA duplex was resolved as in (b). Gel was visualized by SybrGold staining.
  • FIGS. 9A-9B show recognition of mismatches at the cleavage site.
  • FIG. 9a, FIG. 9b, Guide-mediated mismatched target cleavage by AGO (a) and mini-AGO (b) from Figure 3d were quantified.
  • FIGS. 10A-10D show cleavage of targets guided by atypically short guides.
  • FIG. 10a Schematic of miR-20a RNA guides trimmed at their 3' ends used for guide-mediated cleavage assays shown in Figure 4a-d.
  • FIG. 10b Schematic of programming mini-AGO with ssRNA guides before adding the 60-nt target strand. Cap-label shown as yellow circle.
  • FIG. 10c Secondary structure prediction and free energy calculation of two single-stranded RNAs with guide (red) and target (blue). For clarity, the first nucleotide of the guide is not shown.
  • FIG. 10d Model of guide:target pairing on mini-AGO. Guide and target colored as in (c). Stable and unstable base pairs between guide and target are shown as black solid lines and dashed grey lines, respectively.
  • FIGS. 11A-11D show DISC-mediated RNA cleavage activity.
  • FIG. 11a Schematic of cleavage assay. AGOAexN was programmed with either a 5' monophosphorylated gRNA or gDNA followed by addition of a perfectly (100%) complementary RNA or DNA target (yellow circle indicates ³²P-phosphate).
  • FIG. 11b Combinations of RNA and DNA guide:target pairs assayed for AGOAexN cleavage activity. Bottom strand (guide); top strand (target); yellow (p) indicates ³²P-radiolabel on target. Complete 60-nt target sequences are shown in Table 1.
  • RNA or DNA target cleavage activity of AGOAexN programmed with a gRNA or gDNA. Cleavage activity was plotted relative to RNA target cleavage when AGOAexN was loaded with gRNA. Average of three experiments is shown as a bar with individual replicates plotted as circles. Boxed inset shows expanded view of low-level DNA target cleavage.
  • FIG. 11d Mismatch sensitivity of gRNA- or gDNA-dependent RNA cleavage.
  • the matched RNA target was the same as that used in FIG. 11b.
  • the mismatched RNA target included an unpaired dinucleotide (bold) pairing to the guide positions 10 and 11.
  • AGOAexN programmed with either gRNA or gDNA was incubated with the 5' cap-labeled matched or mismatched RNA target. The reaction was resolved on 16% denaturing PAGE.
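The geometry of the dinucleotide mismatch can be sketched in Python. Because guide and target pair antiparallel, guide position k (counted from the guide 5' end) sits opposite target-site index n - k (0-based, from the target 5' end) when the site has the same length n as the guide. The substitution rule used here, swapping each target base for its Watson-Crick partner, is an assumption chosen because it always breaks the pair; the source does not state which bases were used, and the function name is hypothetical.

```python
def mismatch_target_site(site_rna: str, guide_positions: tuple[int, ...] = (10, 11)) -> str:
    """Return the target site with mismatches opposite the given guide positions.

    Assumes site_rna is written 5'->3', is fully complementary to the guide,
    and has the same length as the guide.
    """
    swap = {"A": "U", "U": "A", "G": "C", "C": "G"}
    bases = list(site_rna.upper())
    n = len(bases)
    for k in guide_positions:
        if not 1 <= k <= n:
            raise ValueError(f"guide position {k} is outside the {n}-nt site")
        index = n - k  # target index opposite guide position k
        bases[index] = swap[bases[index]]
    return "".join(bases)
```

For a 23-nt site, guide positions 10 and 11 map to target indices 13 and 12, so only those two adjacent nucleotides change.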
  • FIGS. 12A-12F show cleavage of highly-structured viral RNA by DISC.
  • FIG. 12a Predicted secondary structure of HIV-1 ΔDIS 5'UTR. The position of each 23-nt gDNA-targeted sequence is indicated along with the corresponding gDNA# in parentheses (Table 3). Segments are colored in alternating black and purple for clarity. Shaded circles highlight the 3' nt of each target segment that does not pair to the gDNA (FIG. 12c).
  • Triangles in FIG. 12a indicate cleavage sites on the RNA and coloring of the triangles reflects cleavage site reactivity, as shown in the scale to the right side of FIG. 12b.
  • FIG. 12b Results of cleavage assays using the gDNAs complementary to each 23-nt segment shown in FIG. 12a. Averages of three independent experiments are shown as bars and individual replicates are plotted as circles. Inset shows expanded view of low-level cleavage by gDNA1-3. Color scale bar indicates reactivity, which is grouped into quartiles based on percent target cleaved (Q1, 0-12.5%; Q2, 12.5-25%; Q3, 25-37.5%; Q4, 37.5-50%).
  • FIG. 12c Schematic of guide:target pairs used in mismatch assay. gDNA-4, -6, -8 and -11 served as representatives from each quartile.
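The reactivity grouping in FIG. 12b can be written as a small Python helper. The bin edges come from the caption; how a value that falls exactly on an edge is assigned is not stated, so this sketch assumes it goes to the lower quartile. The function name is hypothetical.

```python
def reactivity_quartile(percent_cleaved: float) -> str:
    """Map percent target cleaved (0-50%) to the quartile label used for FIG. 12b."""
    if not 0.0 <= percent_cleaved <= 50.0:
        raise ValueError("the scale covers 0-50% target cleaved")
    # Upper edge of each bin; assumption: edge values fall in the lower bin.
    bins = [(12.5, "Q1"), (25.0, "Q2"), (37.5, "Q3"), (50.0, "Q4")]
    for upper_edge, label in bins:
        if percent_cleaved <= upper_edge:
            return label
    raise AssertionError("unreachable: value already range-checked")
```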
  • FIGS. 13A-13C show high-throughput mapping of accessible sites on HIV-1 RNA.
  • FIG. 13a Schematic of steps involved in batch-cleavage by DISC on HIV-1 ΔDIS 5'UTR RNA substrate followed by RT/PE and capillary electrophoresis analysis. Refer to Materials and Methods for detailed experimental procedure.
  • FIG. 13b Electropherogram of arbitrary reactivity units of assorted DISC-mediated cleavage with gDNA-2 through -12. The data were analyzed by RiboCAT. Traces from three independent experiments show consistency and reproducibility of the method. gDNA# used for cleavage is shown above each peak.
  • FIG. 13c Trace showing average of three independent experiments.
  • FIGS. 14A-14C show gDNA-dependent RNA cleavage by a truncated K. polysporus AGO1 variant.
  • FIG. 14a Domain architecture of yeast AGO. The four conserved domains and two linker regions: N (cyan), Linker 1 (black line), PAZ (violet), Linker 2 (black line), MID (orange), and PIWI (green).
  • a truncated K. polysporus AGO1 variant lacking the first 206 residues was used, which retains RNAi activity comparable to that of wild-type (WT) AGO1.
  • FIG. 14b, FIG. 14c In vitro RNA cleavage by DISC.
  • AGOAexN (500 nM) was mixed with increasing amounts of miR-20a-derived gDNA before adding 1 nM 5' end-labeled 60-nt target.
  • a representative gel is shown in (b). Cleavage products were plotted as a function of gDNA concentration in (c). Data points represent the average of three independent experiments with error bars representing S.D.
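Summarizing triplicate measurements as an average with S.D. error bars can be sketched with the Python standard library. The source does not say whether a sample or population standard deviation was used; this sketch assumes the sample form (`statistics.stdev`), and the function name and replicate values are hypothetical.

```python
from statistics import mean, stdev


def summarize_replicates(values: list[float]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) for replicate measurements."""
    if len(values) < 2:
        raise ValueError("at least two replicates are needed for a standard deviation")
    return mean(values), stdev(values)


# Hypothetical percent-cleaved values from three independent experiments.
average, sd = summarize_replicates([10.0, 12.0, 14.0])
```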
  • FIGS. 15A-15C show HIV-1 5' UTR substrate.
  • FIG. 15a Predicted secondary structure of WT HIV-1 5'UTR RNA (nt 1-356) based on SHAPE analysis. Dimerization Initiation Signal (DIS) is shown in red (nt 256-264).
  • FIG. 15b Predicted secondary structure of HIV-1 ΔDIS 5'UTR used in this study; DIS is replaced with a GAGA tetraloop. Residue numbering throughout this study follows the mutated ΔDIS 5'UTR construct.
  • FIG. 15c Evaluation of HIV-1 ΔDIS 5'UTR sample homogeneity. After folding (see Materials and Methods), ³²P-labeled HIV-1 ΔDIS 5'UTR was resolved on 6% native PAGE supplemented with 1 mM MgCl2.
  • FIG. 16 shows the workflow to generate gDNAs for systematic analysis with assorted DISCs.
  • the sequence of target RNA (HIV-1 ΔDIS 5'UTR) is used as the input.
  • Target RNA is first converted from RNA to DNA followed by generation of the reverse complement, which is divided into 23-nt fragments from its 5' end.
  • Each gDNA 5' nt is changed to T, as previously reported for human AGO2.
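The FIG. 16 workflow maps directly onto a few string operations. The following Python sketch follows the stated steps (RNA to DNA, reverse complement, 23-nt tiling from the 5' end, 5' nucleotide set to T). How a trailing fragment shorter than 23 nt is handled is not stated in the source; this sketch assumes it is discarded. The function name and the test sequence are hypothetical.

```python
def design_gdnas(target_rna: str, guide_len: int = 23) -> list[str]:
    """Tile the reverse complement of an RNA target into DNA guides (5'->3')."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    # Step 1: write the RNA target in the DNA alphabet (U -> T).
    target_dna = target_rna.upper().replace("U", "T")
    # Step 2: generate the reverse complement, read 5' to 3'.
    revcomp = "".join(complement[base] for base in reversed(target_dna))
    # Step 3: split into consecutive guide_len-nt fragments from the 5' end.
    # Assumption: a trailing fragment shorter than guide_len is discarded.
    fragments = [
        revcomp[start:start + guide_len]
        for start in range(0, len(revcomp) - guide_len + 1, guide_len)
    ]
    # Step 4: set the 5' nucleotide of every guide to T.
    return ["T" + fragment[1:] for fragment in fragments]


# Hypothetical 48-nt target; yields two 23-nt guides, with 2 nt left untiled.
guides = design_gdnas("AUGC" * 12)
```

Because the 5' nucleotide is forced to T, the 3'-terminal nucleotide of each target segment is not necessarily paired with its guide.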
  • FIG. 17 shows a representative gel of HIV-1 ΔDIS 5'UTR cleavage by DISC. (a) Denaturing PAGE (8%) showing resolved cleavage products depicted in FIG. 12b alongside an RNA marker. (b) 8% denaturing PAGE showing results of dinucleotide mismatch assay used in FIG. 12d.
  • FIGS. 18A-18B show gDNAs using unstructured miR-20a-derived RNA target.
  • FIG. 18A The gDNA 5' nucleotide was analyzed by altering the identity of the 5' nt to T, A, G or C. Cleavage percentage in the endpoint assay indicated that gDNAs with a 5' T should be used for gDNA design.
  • FIG. 18B gDNA length was investigated for the unstructured miR-20a target by truncating or extending the base-paired region between the guide and target strands. All gDNAs perfectly match the RNA target. gDNAs tested range from 15-25 nt in length.
  • FIGS. 19A-19D show gDNAs using structured HIV-1 ΔDIS 5'UTR RNA target.
  • FIG. 19A gDNAs were designed at 20-25 nt in length to target two sites on the HIV-1 ΔDIS 5'UTR target at sites #6 and #8. Representative gel of cleavage assay showing substrates resolved from cleavage products by denaturing urea PAGE (8%).
  • FIG. 19B Schematic showing partial secondary structure of HIV-1 ΔDIS 5'UTR and sites targeted by gDNAs #6 and #8. Color scale bar indicates cleavage reactivity grouped into 12.5% windows.
  • FIG. 19C, FIG. 19D Quantified data showing cleavage percentages shown in Fig. 2A for gDNA#6 (FIG.
  • FIGS. 20A-20B show cleavage assay comparing activity by DISC and RNase H against unstructured miR-20a RNA target and structured HIV-1 ΔDIS 5'UTR RNA target.
  • FIG. 20A Quantified data of cleavage of unstructured miR-20a target by DISC (solid circles) and RNase H (open triangles). Black bar represents the mean of three independent experiments. Circles and triangles represent individual replicates.
  • FIG. 20B Quantified data of cleavage of structured HIV-1 ΔDIS 5'UTR RNA target by DISC (solid circles) and RNase H (open triangles). Black bar represents the mean of three independent experiments. Circles and triangles represent individual replicates. The results indicate that DISC is able to access and cleave structured regions of RNA that RNase H is unable to cleave.
  • a yeast Argonaute protein can utilize single-stranded DNA as a guide molecule for cleaving target RNAs. Also disclosed herein are systems and methods for detecting nuclease accessibility sites in an RNA sequence. The inventors have further shown that a yeast Argonaute protein can utilize single-stranded DNA as a guide molecule for, among other applications, high-throughput identification and targeting of accessible regions of highly-structured RNAs.
  • the articles “a,” “an,” and “the” mean “at least one,” unless the context in which the article is used clearly indicates otherwise.
  • the term “about” as used herein when referring to a measurable value such as an amount, a percentage, and the like, is meant to encompass variations of ±20%, ±10%, ±5%, or ±1% from the measurable value.
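As a worked illustration of this definition, the Python sketch below checks whether a measured value lies within a chosen percentage of a nominal value. The tolerance levels (20%, 10%, 5%, 1%) are the ones listed in the definition of “about”; the function name and example numbers are hypothetical.

```python
def within_about(value: float, nominal: float, tolerance_pct: float = 20.0) -> bool:
    """True if value deviates from nominal by no more than tolerance_pct percent."""
    return abs(value - nominal) <= abs(nominal) * tolerance_pct / 100.0
```

For example, with a nominal guide length of 23 nucleotides and a 10% tolerance, the allowed deviation is 2.3 nucleotides, so 25 qualifies and 26 does not.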
  • the terms “may,” “optionally,” and “may optionally” are used interchangeably and are meant to include cases in which the condition occurs as well as cases in which the condition does not occur.
  • the statement that a composition "optionally includes a second component” is meant to include cases in which the composition includes a second component as well as cases in which the composition does not include a second component.
  • nucleic acid means a polymer composed of nucleotides, e.g. deoxyribonucleotides or ribonucleotides.
  • ribonucleic acid and "RNA” as used herein mean a polymer composed of ribonucleotides.
  • deoxyribonucleic acid and "DNA” as used herein mean a polymer composed of deoxyribonucleotides.
  • oligonucleotide denotes single- or double-stranded nucleotide multimers of from about 2 to up to about 100 nucleotides in length.
  • Suitable oligonucleotides may be prepared by the phosphoramidite method described by Beaucage and Carruthers, Tetrahedron Lett., 22: 1859-1862 (1981), or by the triester method according to Matteucci, et al., J. Am. Chem. Soc., 103:3185 (1981), both incorporated herein by reference, or by other chemical methods using either a commercial automated oligonucleotide synthesizer or VLSIPS™ technology.
  • When oligonucleotides are referred to as “double-stranded,” it is understood by those of skill in the art that a pair of oligonucleotides exist in a hydrogen-bonded, helical array typically associated with, for example, DNA.
  • The term “double-stranded,” as used herein, is also meant to refer to those forms which include such structural features as bulges and loops, described more fully in such biochemistry texts as Stryer, Biochemistry, Third Ed., (1988), incorporated herein by reference for all purposes.
  • a single-stranded oligonucleotide can exist as a linear molecule without any hydrogen-bonded nucleotides, or can fold three-dimensionally to form hydrogen bonds between individual nucleotides along the single stranded oligonucleotide.
  • polynucleotide refers to a single or double stranded polymer composed of nucleotide monomers.
  • Polynucleotides can be any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof.
  • Polynucleotides may have any three-dimensional structure, and may perform any function, known or unknown.
  • The following are non-limiting examples of polynucleotides: a gene or gene fragment, exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers.
  • a polynucleotide may comprise modified nucleotides, such as methylated nucleotides and nucleotide analogs. If present, modifications to the nucleotide structure may be imparted before or after assembly of the polymer.
  • the sequence of nucleotides may be interrupted by non-nucleotide components.
  • a polynucleotide may be further modified after polymerization, such as by conjugation with a labeling component.
  • a polynucleotide is composed of a specific sequence of four nucleotide bases: adenine (A); cytosine (C); guanine (G); thymine (T); and uracil (U) for thymine (T) when the polynucleotide is RNA.
  • the term "polynucleotide sequence" is the alphabetical representation of a polynucleotide molecule.
  • the polynucleotide is composed of nucleotide monomers of generally greater than 100 nucleotides in length and up to about 8,000 or more nucleotides in length.
  • polypeptide refers to a compound made up of a single chain of D- or L-amino acids or a mixture of D- and L-amino acids joined by peptide bonds.
  • complementary refers to the topological compatibility or matching together of interacting surfaces of two molecules (e.g., a probe molecule and its target, particularly a DNA guide molecule and a target RNA molecule).
  • the two molecules (e.g., a target and its probe) can be described as complementary, and furthermore, the contact surface characteristics are complementary to each other.
  • the two molecules are complementary if they have sufficiently compatible nucleotide base-pairs such that the two molecules can hybridize.
  • nucleotide molecules (e.g., nucleotides, oligonucleotides, polynucleotides, modified nucleotides, etc.) which have 100% complementarity (e.g., each nucleotide in a sequence of one molecule is the nucleotide base-pair complement of an adjacent nucleotide in a sequence of the second molecule, in sequential order) as well as two or more nucleotide molecules which have less than 100% complementarity but which hybridize under the conditions of the methods disclosed herein.
  • hybridization or “hybridizes” refers to a process of establishing a non-covalent, sequence-specific interaction between two or more complementary strands of nucleic acids into a single hybrid, which in the case of two strands is referred to as a duplex.
  • anneal refers to the process by which a single-stranded nucleic acid sequence pairs by hydrogen bonds to a complementary sequence, forming a double-stranded nucleic acid sequence, including the reformation (renaturation) of complementary strands that were separated by heat (thermally denatured).
  • melting refers to the denaturation of a double-stranded nucleic acid sequence due to high temperatures, resulting in the separation of the double strand into two single strands by breaking the hydrogen bonds between the strands.
  • Target refers to a molecule that has an affinity for a given probe. Targets may be naturally-occurring or man-made molecules. Also, they can be employed in their unaltered state or as aggregates with other species.
  • promoter refers to a region or sequence determinants located upstream or downstream from the start of transcription and which are involved in recognition and binding of RNA polymerase and other proteins to initiate transcription. Promoters need not be of bacterial origin, for example, promoters derived from viruses or from other organisms can be used in the compositions, systems, or methods described herein.
  • regulatory element is intended to include promoters, enhancers, internal ribosomal entry sites (IRES), and other expression control elements (e.g. transcription termination signals, such as polyadenylation signals and poly-U sequences).
  • Regulatory elements include those that direct constitutive expression of a nucleotide sequence in many types of host cell and those that direct expression of the nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences).
  • a tissue-specific promoter may direct expression primarily in a desired tissue of interest, such as muscle, neuron, bone, skin, blood, specific organs (e.g. liver, pancreas), or particular cell types (e.g. lymphocytes).
  • a vector comprises one or more pol III promoters (e.g., 1, 2, 3, 4, 5, or more pol III promoters), one or more pol II promoters (e.g., 1, 2, 3, 4, 5, or more pol II promoters), one or more pol I promoters (e.g., 1, 2, 3, 4, 5, or more pol I promoters), or combinations thereof.
  • pol III promoters include, but are not limited to, U6 and H1 promoters.
  • pol II promoters include, but are not limited to, the retroviral Rous sarcoma virus (RSV) LTR promoter (optionally with the RSV enhancer), the cytomegalovirus (CMV) promoter (optionally with the CMV enhancer) [see, e.g., Boshart et al., Cell, 41:521-530 (1985)], the SV40 promoter, the dihydrofolate reductase promoter, the β-actin promoter, the phosphoglycerol kinase (PGK) promoter, and the EF1α promoter.
  • enhancer elements such as WPRE; CMV enhancers; the R-U5' segment in LTR of HTLV-I (Mol. Cell. Biol., Vol. 8(1), p. 466-472, 1988); SV40 enhancer; and the intron sequence between exons 2 and 3 of rabbit β-globin (Proc. Natl. Acad. Sci. USA., Vol. 78(3), p. 1527-31, 1981).
  • recombinant refers to a human manipulated nucleic acid (e.g. polynucleotide) or a copy or complement of a human manipulated nucleic acid (e.g. polynucleotide), or if in reference to a protein (i.e, a "recombinant protein"), a protein encoded by a recombinant nucleic acid (e.g. polynucleotide).
  • a recombinant expression cassette comprising a promoter operably linked to a second nucleic acid (e.g. polynucleotide) may include a promoter that is heterologous to the second nucleic acid (e.g.
  • a recombinant expression cassette may comprise nucleic acids (e.g. polynucleotides) combined in such a way that the nucleic acids (e.g. polynucleotides) are extremely unlikely to be found in nature.
  • human manipulated restriction sites or plasmid vector sequences may flank or separate the promoter from the second nucleic acid (e.g. polynucleotide).
  • an expression cassette refers to a nucleic acid construct, which when introduced into a host cell, results in transcription and/or translation of a RNA or polypeptide, respectively.
  • an expression cassette comprising a promoter operably linked to a second nucleic acid may include a promoter that is heterologous to the second nucleic acid (e.g. polynucleotide) as the result of human manipulation (e.g., by methods described in Sambrook et al, Molecular Cloning— A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y., (1989) or Current Protocols in Molecular Biology Volumes 1 -3, John Wiley & Sons, Inc. (1994-1998)).
  • an expression cassette comprising a terminator (or termination sequence) operably linked to a second nucleic acid may include a terminator that is heterologous to the second nucleic acid (e.g. polynucleotide) as the result of human manipulation.
  • the expression cassette comprises a promoter operably linked to a second nucleic acid (e.g. polynucleotide) and a terminator operably linked to the second nucleic acid (e.g. polynucleotide) as the result of human manipulation.
  • the expression cassette comprises an endogenous promoter.
  • the expression cassette comprises an endogenous terminator.
  • the expression cassette comprises a synthetic (or non-natural) promoter.
  • the expression cassette comprises a synthetic (or non-natural) terminator.
  • nucleic acids or polypeptide sequences refer to two or more sequences or subsequences that are the same or have a specified percentage of amino acid residues or nucleotides that are the same (i.e., about 60% identity, preferably 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or higher identity over a specified region when compared and aligned for maximum correspondence over a comparison window or designated region) as measured using a BLAST or BLAST 2.0 sequence comparison algorithms with default parameters described below, or by manual alignment and
  • sequences are then said to be “substantially identical.”
  • This definition also refers to, or may be applied to, the complement of a test sequence.
  • the definition also includes sequences that have deletions and/or additions, as well as those that have substitutions.
  • the preferred algorithms can account for gaps and the like.
  • identity exists over a region that is at least about 10 amino acids or 20 nucleotides in length, or more preferably over a region that is 10-50 amino acids or 20-50 nucleotides in length.
  • percent (%) amino acid sequence identity is defined as the percentage of amino acids in a candidate sequence that are identical to the amino acids in a reference sequence, after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent sequence identity.
  • Alignment for purposes of determining percent sequence identity can be achieved in various ways that are within the skill in the art, for instance, using publicly available computer software such as BLAST, BLAST-2, ALIGN, ALIGN-2 or Megalign (DNASTAR) software. Appropriate parameters for measuring alignment, including any algorithms needed to achieve maximal alignment over the full-length of the sequences being compared can be determined by known methods.
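The percent identity definition above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned (for example, by one of the programs named above) and are supplied as equal-length strings with `-` marking gaps. The function name `percent_identity` and the choice of the candidate's residue count as the denominator are assumptions made for this illustration, not part of the disclosure.

```python
def percent_identity(aligned_candidate: str, aligned_reference: str, gap: str = "-") -> float:
    """Percent of candidate residues identical to the aligned reference residue.

    Both inputs are rows of one pairwise alignment (equal length, gaps as `gap`).
    The denominator (number of candidate residues) is an assumption of this sketch;
    other conventions divide by alignment length or by reference length.
    """
    if len(aligned_candidate) != len(aligned_reference):
        raise ValueError("aligned sequences must have the same length")
    candidate_residues = sum(c != gap for c in aligned_candidate)
    if candidate_residues == 0:
        raise ValueError("candidate contains no residues")
    identical = sum(
        c == r and c != gap
        for c, r in zip(aligned_candidate, aligned_reference)
    )
    return 100.0 * identical / candidate_residues


# Hypothetical aligned fragments: 5 of the 6 candidate residues match.
identity = percent_identity("MKT-LLV", "MKTALIV")
```

A threshold such as "at least 60% identity" would then be checked by comparing the returned value against 60.0.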
  • sequence comparisons typically one sequence acts as a reference sequence, to which test sequences are compared.
  • test and reference sequences are entered into a computer, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated.
  • Preferably, default program parameters can be used, or alternative parameters can be designated.
  • sequence comparison algorithm then calculates the percent sequence identities for the test sequences relative to the reference sequence, based on the program parameters.
  • HSPs high scoring sequence pairs
  • T is referred to as the neighborhood word score threshold (Altschul et al. (1990) J. Mol. Biol. 215:403-410). These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always >0) and N (penalty score for mismatching residues; always <0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score.
  • Extension of the word hits in each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached.
  • the BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment.
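The three halting conditions for word-hit extension can be sketched as follows. This is a simplified, ungapped, one-directional illustration for nucleotide sequences, not the BLAST implementation itself. The function name `extend_right` and the default values chosen for M, N, and X are assumptions made for the example.

```python
def extend_right(query: str, subject: str, q_start: int, s_start: int,
                 match: int = 1, mismatch: int = -3, x_drop: int = 5):
    """Extend an ungapped hit to the right from (q_start, s_start).

    match    -> M, reward for a matching pair (must be > 0)
    mismatch -> N, penalty for a mismatching pair (must be < 0)
    x_drop   -> X, allowed fall-off from the best score seen so far
    Returns (best_score, best_length) of the highest-scoring extension.
    """
    score = best_score = best_length = 0
    offset = 0
    # Halting condition 3: stop when the end of either sequence is reached.
    while q_start + offset < len(query) and s_start + offset < len(subject):
        same = query[q_start + offset] == subject[s_start + offset]
        score += match if same else mismatch
        offset += 1
        if score > best_score:
            best_score, best_length = score, offset
        # Halting condition 1: score fell off by X from its maximum.
        # Halting condition 2: cumulative score went to zero or below.
        elif best_score - score >= x_drop or score <= 0:
            break
    return best_score, best_length


# Seven matches followed by mismatches: extension stops once the drop reaches X.
result = extend_right("ACGTACGTTT", "ACGTACGAAA", 0, 0)
```

A full extension would run the same loop leftward from the seed and add the two scores. Larger values of X let the extension tolerate more mismatches before stopping, which is one way X trades speed for sensitivity.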
  • the BLASTP program uses as defaults a wordlength of 3, and expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff and Henikoff (1989) Proc.
  • BLAST algorithm also performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin and Altschul (1993) Proc. Natl. Acad. Sci. USA 90:5873-5877).
  • P(N) the smallest sum probability
  • a nucleic acid is considered similar to a reference sequence if the smallest sum probability in a comparison of the test nucleic acid to the reference nucleic acid is less than about 0.2, more preferably less than about 0.01.
  • codon optimized, as it refers to genes or coding regions of nucleic acid molecules for the transformation of various hosts, refers to the alteration of codons in the gene or coding regions of polynucleic acid molecules to reflect the typical codon usage of a selected organism without altering the polypeptide encoded by the DNA. Such optimization includes replacing at least one, or more than one, or a significant number, of codons with one or more codons that are more frequently used in the genes of that selected organism.
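As a minimal sketch of codon optimization as defined above, the example below swaps each codon for a synonymous codon that a selected host is assumed to prefer, then confirms that the encoded polypeptide is unchanged. The `PREFERRED_CODON` table is hypothetical and covers only a few amino acids. A real workflow would use a measured codon usage table for the chosen organism.

```python
# Partial standard genetic code (only the codons needed for this example).
CODON_TO_AA = {
    "ATG": "M",
    "TTA": "L", "TTG": "L", "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L",
    "AAA": "K", "AAG": "K",
    "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G",
    "TAA": "*", "TAG": "*", "TGA": "*",
}

# Hypothetical preferred codon per amino acid for an unnamed host (assumption).
PREFERRED_CODON = {"M": "ATG", "L": "CTG", "K": "AAG", "G": "GGC"}


def split_codons(cds: str):
    """Split a coding sequence into codons; length must be a multiple of 3."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3].upper() for i in range(0, len(cds), 3)]


def translate(cds: str) -> str:
    """Translate using the partial table above (raises KeyError on other codons)."""
    return "".join(CODON_TO_AA[c] for c in split_codons(cds))


def codon_optimize(cds: str) -> str:
    """Replace each codon with the host-preferred synonym; stop codons are kept."""
    optimized = []
    for codon in split_codons(cds):
        amino_acid = CODON_TO_AA[codon]
        optimized.append(PREFERRED_CODON.get(amino_acid, codon))
    return "".join(optimized)


original = "ATGTTAAAAGGTTAG"
optimized = codon_optimize(original)
# The polypeptide encoded by the DNA must not change.
assert translate(optimized) == translate(original)
```

Here three of the five codons are replaced while the translation stays `MLKG*`.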
  • Nucleic acid is "operably linked" when it is placed into a functional relationship with another nucleic acid sequence.
  • DNA for a presequence or secretory leader is operably linked to DNA for a polypeptide if it is expressed as a preprotein that participates in the secretion of the polypeptide;
  • a promoter or enhancer is operably linked to a coding sequence if it affects the transcription of the sequence; or
  • a ribosome binding site is operably linked to a coding sequence if it is positioned so as to facilitate translation.
  • "operably linked” means that the DNA sequences being linked are near each other, and, in the case of a secretory leader, contiguous and in reading phase.
  • operably linked nucleic acids do not have to be contiguous. Linking is accomplished by ligation at convenient restriction sites. If such sites do not exist, the synthetic oligonucleotide adaptors or linkers are used in accordance with conventional practice.
  • a promoter is operably linked with a coding sequence when it is capable of affecting (e.g. modulating relative to the absence of the promoter) the expression of a protein from that coding sequence (i.e., the coding sequence is under the transcriptional control of the promoter).
  • nucleobase refers to the part of a nucleotide that bears the Watson/Crick base-pairing functionality.
  • the most common naturally-occurring nucleobases, adenine (A), guanine (G), uracil (U), cytosine (C), and thymine (T) bear the hydrogen-bonding functionality that binds one nucleic acid strand to another in a sequence specific manner.
  • a “subject” (or a “host”) is meant an individual.
  • the "subject” can include, for example, domesticated animals, such as cats, dogs, etc., livestock (e.g., cattle, horses, pigs, sheep, goats, etc.), laboratory animals (e.g., mouse, rabbit, rat, guinea pig, etc.) mammals, non-human mammals, primates, non-human primates, rodents, birds, reptiles, amphibians, fish, and any other animal.
  • the subject can be a mammal such as a primate or a human.
  • a polynucleotide sequence is "heterologous" to a second polynucleotide sequence if it originates from a foreign species, or, if from the same species, is modified by human action from its original form.
  • a promoter operably linked to a heterologous coding sequence refers to a coding sequence from a species different from that from which the promoter was derived, or, if from the same species, a coding sequence which is different from naturally occurring allelic variants.
  • the phrase "selectively (or specifically) hybridizes to” refers to the binding, duplexing, or hybridizing of a molecule only to a particular nucleotide sequence with a higher affinity, e.g., under more stringent conditions, than to other nucleotide sequences (e.g., total cellular or library DNA or RNA).
  • stringent hybridization conditions refers to conditions under which a probe will hybridize to its target subsequence, typically in a complex mixture of nucleic acids, but to no other sequences. Stringent conditions are sequence-dependent and will be different in different circumstances. Longer sequences hybridize specifically at higher temperatures. An extensive guide to the hybridization of nucleic acids is found in Tijssen, Techniques in Biochemistry and Molecular Biology: Hybridization with Nucleic Probes, "Overview of principles of hybridization and the strategy of nucleic acid assays" (1993). Generally, stringent conditions are selected to be about 5-10° C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH.
  • the Tm is the temperature (under defined ionic strength, pH, and nucleic acid concentration) at which 50% of the probes complementary to the target hybridize to the target sequence at equilibrium (as the target sequences are present in excess, at Tm, 50% of the probes are occupied at equilibrium).
  • Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide.
  • a positive signal is at least two times background, preferably 10 times background hybridization.
  • Exemplary stringent hybridization conditions can be as follows: 50% formamide, 5xSSC, and 1% SDS, incubating at 42° C, or, 5xSSC, 1% SDS, incubating at 65° C, with wash in 0.2xSSC, and 0.1% SDS at 65° C.
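The relationship between Tm and a stringent temperature can be sketched with a rough estimate. The example below uses the Wallace rule (2 °C per A/T and 4 °C per G/C), which is a textbook approximation for short oligonucleotides and is not a method described in this document. It ignores ionic strength, pH, formamide, and concentration, all of which the definitions above treat as relevant. The function names are hypothetical.

```python
def wallace_tm(oligo: str) -> int:
    """Rough Tm estimate in degrees C for a short oligonucleotide (Wallace rule).

    Tm ~ 2 * (A + T) + 4 * (G + C). U is counted with A/T so that RNA
    sequences can also be passed in. This is an approximation only.
    """
    sequence = oligo.upper()
    at_count = sequence.count("A") + sequence.count("T") + sequence.count("U")
    gc_count = sequence.count("G") + sequence.count("C")
    return 2 * at_count + 4 * gc_count


def stringent_window(oligo: str):
    """Temperature range about 5-10 degrees C below the estimated Tm."""
    tm = wallace_tm(oligo)
    return tm - 10, tm - 5


# Hypothetical 16-nt probe with 8 A/T and 8 G/C positions.
probe = "ATGCATGCATGCATGC"
tm_estimate = wallace_tm(probe)
low, high = stringent_window(probe)
```

For the exemplary conditions listed above (formamide, SSC, SDS), an empirically measured Tm or a salt-corrected model would be needed instead of this estimate.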
  • Nucleic acids that do not hybridize to each other under stringent conditions are still substantially identical if the polypeptides which they encode are substantially identical. This occurs, for example, when a copy of a nucleic acid is created using the maximum codon degeneracy permitted by the genetic code. In such cases, the nucleic acids typically hybridize under moderately stringent hybridization conditions.
  • Exemplary "moderately stringent hybridization conditions" include a hybridization in a buffer of 40% formamide, 1 M NaCl, 1% SDS at 37° C, and a wash in 1xSSC at 45° C. A positive hybridization is at least twice background. Those of ordinary skill will readily recognize that alternative hybridization and wash conditions can be utilized to provide conditions of similar stringency.
  • a group of amino acids having aliphatic side chains is glycine, alanine, valine, leucine, and isoleucine; a group of amino acids having aliphatic-hydroxyl side chains is serine and threonine; a group of amino acids having amide-containing side chains is asparagine and glutamine; a group of amino acids having aromatic side chains is phenylalanine, tyrosine, and tryptophan; a group of amino acids having basic side chains is lysine, arginine, and histidine; and a group of amino acids having sulfur-containing side chains is cysteine and methionine.
  • Exemplary conservative amino acids substitution groups are: valine-leucine-isoleucine, phenylalanine-tyrosine, lysine-arginine, alanine-valine, aspartic acid-glutamic acid, and asparagine-glutamine.
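The exemplary conservative substitution groups listed above can be encoded directly. The sketch below uses one-letter amino acid codes and a hypothetical helper, `is_conservative`, that reports whether a substitution stays within one of those groups. Treating an unchanged residue as "not a substitution" is a choice made for this illustration.

```python
# Exemplary conservative substitution groups from the text, as one-letter codes:
# valine-leucine-isoleucine, phenylalanine-tyrosine, lysine-arginine,
# alanine-valine, aspartic acid-glutamic acid, asparagine-glutamine.
CONSERVATIVE_GROUPS = [
    {"V", "L", "I"},
    {"F", "Y"},
    {"K", "R"},
    {"A", "V"},
    {"D", "E"},
    {"N", "Q"},
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if replacing `original` with `replacement` stays within one group."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # identical residues are not a substitution
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


# Leucine -> isoleucine is within a group; lysine -> glutamic acid is not.
examples = [is_conservative("L", "I"), is_conservative("K", "E")]
```

Note that valine belongs to two groups, so alanine to valine counts as conservative while alanine to leucine does not.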
  • a DNA-guided RNA cleavage system comprising: a yeast Argonaute polypeptide; and
  • a heterologous, single-stranded oligonucleotide guide molecule
  • the single-stranded oligonucleotide guide molecule is a DNA oligonucleotide that is complementary to a target RNA sequence.
  • the yeast Argonaute polypeptide is from Vanderwaltozyma polyspora (also known as Kluyveromyces polysporus). In one embodiment, the yeast Argonaute polypeptide is selected from SEQ ID NO:31, SEQ ID NO:32, or SEQ ID NO:33. In one embodiment, the yeast Argonaute polypeptide is SEQ ID NO:31. In one embodiment, the yeast Argonaute polypeptide is SEQ ID NO:32. In one embodiment, the yeast Argonaute polypeptide is SEQ ID NO:33.
  • the yeast Argonaute polypeptide has at least 60% (for example, at least 60%, at least 65%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 99%) identity to SEQ ID NO:31, SEQ ID NO:32, or SEQ ID NO:33. In some embodiments, the yeast Argonaute polypeptide has at least 60% (for example, at least 60%, at least 65%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 99%) identity to SEQ ID NO:31.
  • the yeast Argonaute polypeptide has at least 60% (for example, at least 60%, at least 65%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 99%) identity to SEQ ID NO:32. In some embodiments, the yeast Argonaute polypeptide has at least 60% (for example, at least 60%, at least 65%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 99%) identity to SEQ ID NO:33.
  • the single-stranded oligonucleotide guide molecule is about 12 to about 45 nucleotides. In some embodiments, the single-stranded oligonucleotide guide molecule is about 12, about 13, about 14, about 15, about 16, about 17, about 18, about 19, about 20, about 21, about 22, about 23, about 24, about 25, about 26, about 27, about 28, about 29, about 30, about 31, about 32, about 33, about 34, about 35, about 36, about 37, about 38, about 39, about 40, about 41, about 42, about 43, about 44, or about 45 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is about 12 to about 30 nucleotides.
  • the single-stranded oligonucleotide guide molecule is about 14 to about 26 nucleotides. In some embodiments, the single-stranded oligonucleotide guide molecule is about 21 to about 25 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is about 21 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is about 22 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is about 23 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is about 24 nucleotides.
  • the single-stranded oligonucleotide guide molecule is about 25 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is 21 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is 22 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is 23 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is 24 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is 25 nucleotides.
  • the target RNA sequence is from a mammal. In one embodiment, the target RNA sequence is from a human. In one embodiment, the DNA encoding a yeast Argonaute polypeptide is encoded by SEQ ID NO:34, SEQ ID NO:35, or SEQ ID NO:36. In one embodiment, the DNA encoding a yeast Argonaute polypeptide is encoded by SEQ ID NO:34. In one embodiment, the DNA encoding a yeast Argonaute polypeptide is encoded by SEQ ID NO:35. In one embodiment, the DNA encoding a yeast Argonaute polypeptide is encoded by SEQ ID NO:36.
  • the DNA encoding a yeast Argonaute polypeptide is encoded by a nucleic acid which is at least 60%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97.5%, or at least 99% identical to SEQ ID NO:34, SEQ ID NO: 35, or SEQ ID NO:36.
  • the DNA encoding a yeast Argonaute polypeptide is encoded by a nucleic acid which is at least 60%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97.5%, or at least 99% identical to SEQ ID NO:34.
  • the DNA encoding a yeast Argonaute polypeptide is encoded by a nucleic acid which is at least 60%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97.5%, or at least 99% identical to SEQ ID NO:35. In one embodiment, the DNA encoding a yeast Argonaute polypeptide is encoded by a nucleic acid which is at least 60%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97.5%, or at least 99% identical to SEQ ID NO:36.
  • the single-stranded oligonucleotide guide molecule (for example, ssDNA) has at least one chemically modified nucleotide.
  • modified nucleotides may confer increased stability, decreased off-target effects, and/or reduced toxicity, as compared to a ssDNA not having the chemically modified nucleotide.
  • the at least one chemically modified nucleotide comprises a chemically modified nucleobase, a chemically modified ribose, a chemically modified phosphodiester linkage, or a combination thereof.
  • the chemically modified nucleobase is selected from 5-formylcytidine (5fC), 5-methylcytidine (5meC), 5-methoxycytidine (5moC), 5-hydroxycytidine (5hoC), 5-hydroxymethylcytidine (5hmC), 5-formyluridine (5fU), 5-methyluridine (5-meU), 5-methoxyuridine (5moU), 5-carboxymethylesteruridine (5camU), pseudouridine (Ψ), N1-methylpseudouridine (me1Ψ), N6-methyladenosine (me6A), or thienoguanosine (thG).
  • the chemically modified ribose is selected from 2'-O-methyl (2'-O-Me), 2'-Fluoro (2'-F), 2'-deoxy-2'-fluoro-beta-D-arabino-nucleic acid (2'F-ANA), 4'-S, 4'-SFANA, 2'-azido, UNA, 2'-O-methoxy-ethyl (2'-O-ME), 2'-O-Allyl, 2'-O-Ethylamine, 2'-O-Cyanoethyl, Locked nucleic acid (LNA), Methylene-cLNA, N-MeO-amino BNA, or N-MeO-aminooxy BNA.
  • the chemically modified phosphodiester linkage is selected from Phosphorothioate (PS), Boranophosphate, phosphodithioate (PS2), 3',5'-amide, N3'-phosphoramidate (NP), Phosphodiester (PO), or 2',5'-phosphodiester (2',5'-PO).
  • a guide ssDNA sequence is any polynucleotide sequence having sufficient complementarity with a target polynucleotide sequence to hybridize with the target sequence and direct cleavage of the target sequence.
  • the degree of complementarity between a guide ssDNA sequence and its corresponding RNA target sequence, when optimally aligned using a suitable alignment algorithm is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more.
  • the guide ssDNA is perfectly complementary (has perfect complementarity) with its corresponding RNA target sequence, when optimally aligned using a suitable alignment algorithm.
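The guide and target relationship described above can be sketched in a few lines. The example derives a perfectly complementary ssDNA guide from a target RNA site and scores the degree of complementarity of an arbitrary guide against that site. It assumes a simple ungapped, antiparallel pairing with Watson-Crick pairs only, which is a simplification of "optimally aligned using a suitable alignment algorithm". The function names and sequences are hypothetical.

```python
# Watson-Crick pairing between a DNA guide base and an RNA target base.
DNA_TO_RNA_PAIR = {"A": "U", "C": "G", "G": "C", "T": "A"}
RNA_TO_DNA_PAIR = {"A": "T", "C": "G", "G": "C", "U": "A"}


def design_guide(target_rna: str) -> str:
    """Return the ssDNA guide (5'->3') perfectly complementary to target_rna (5'->3')."""
    return "".join(RNA_TO_DNA_PAIR[base] for base in reversed(target_rna.upper()))


def percent_complementarity(guide_dna: str, target_rna: str) -> float:
    """Percent of guide positions that base-pair with the target.

    The guide is paired antiparallel to the target with no gaps; both
    sequences must be the same length in this simplified sketch.
    """
    if len(guide_dna) != len(target_rna):
        raise ValueError("guide and target site must be the same length")
    paired = sum(
        DNA_TO_RNA_PAIR[g] == t
        for g, t in zip(guide_dna.upper(), reversed(target_rna.upper()))
    )
    return 100.0 * paired / len(guide_dna)


# Hypothetical 23-nt target site and its perfectly complementary guide.
site = "AUGGCUAGCUAGGCUAAGCUAGC"
guide = design_guide(site)
score = percent_complementarity(guide, site)  # perfect complementarity
```

A guide with mismatches would return a lower value, which can be compared against thresholds such as 90% or 95%.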
  • RNA sequence comprising: binding to a target RNA sequence a complex comprising:
  • a heterologous, single-stranded oligonucleotide guide molecule
  • the single-stranded oligonucleotide guide molecule is a DNA oligonucleotide that is complementary to the target RNA sequence
  • Argonaute polypeptide:guide molecule complex cleaves the target RNA sequence.
  • a method for attenuating expression of a target gene in a cell comprising:
  • introducing into the cell a yeast Argonaute polypeptide
  • ssDNA single stranded DNA
  • the yeast Argonaute polypeptide is from Vanderwaltozyma polyspora (also known as Kluyveromyces polysporus). In one embodiment, the yeast Argonaute polypeptide is selected from SEQ ID NO:31, SEQ ID NO:32, or SEQ ID NO:33. In one embodiment, the yeast Argonaute polypeptide is SEQ ID NO:31. In one embodiment, the yeast Argonaute polypeptide is SEQ ID NO:32. In one embodiment, the yeast Argonaute polypeptide is SEQ ID NO:33.
  • the yeast Argonaute polypeptide has at least 60% (for example, at least 60%, at least 65%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 99%) identity to SEQ ID NO:31, SEQ ID NO:32, or SEQ ID NO:33. In some embodiments, the yeast Argonaute polypeptide has at least 60% (for example, at least 60%, at least 65%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 99%) identity to SEQ ID NO:31.
  • the yeast Argonaute polypeptide has at least 60% (for example, at least 60%, at least 65%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 99%) identity to SEQ ID NO:32. In some embodiments, the yeast Argonaute polypeptide has at least 60% (for example, at least 60%, at least 65%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 99%) identity to SEQ ID NO:33.
  • the single-stranded oligonucleotide guide molecule is about 12 to about 45 nucleotides. In some embodiments, the single-stranded oligonucleotide guide molecule is about 12, about 13, about 14, about 15, about 16, about 17, about 18, about 19, about 20, about 21, about 22, about 23, about 24, about 25, about 26, about 27, about 28, about 29, about 30, about 31, about 32, about 33, about 34, about 35, about 36, about 37, about 38, about 39, about 40, about 41, about 42, about 43, about 44, or about 45 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is about 12 to about 30 nucleotides.
  • the single-stranded oligonucleotide guide molecule is about 14 to about 26 nucleotides. In some embodiments, the single-stranded oligonucleotide guide molecule is about 21 to about 25 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is about 21 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is about 22 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is about 23 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is about 24 nucleotides.
  • the single-stranded oligonucleotide guide molecule is about 25 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is 21 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is 22 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is 23 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is 24 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is 25 nucleotides.
  • the target RNA sequence is from a mammal. In one embodiment, the target RNA sequence is from a human. In one embodiment, the target RNA sequence is from a virus. In one embodiment, the target RNA sequence is from a pathogen. In one embodiment, the target RNA sequence is from a bacterium. In one embodiment, the target RNA sequence is from a prokaryotic cell. In one embodiment, the target RNA sequence is from a eukaryotic cell.
  • a method of detecting a target RNA in a sample comprising:
  • the complex comprises:
  • a heterologous, single-stranded oligonucleotide guide molecule
  • the single-stranded oligonucleotide guide molecule is a DNA oligonucleotide that is complementary to a target RNA sequence
  • Argonaute polypeptide:guide molecule complex cleaves the target RNA sequence
  • kits comprising:
  • a vector comprising a nucleotide sequence encoding a yeast Argonaute polypeptide operably linked to a promoter;
  • a heterologous, single-stranded oligonucleotide guide molecule
  • the single-stranded oligonucleotide guide molecule is a DNA oligonucleotide that is complementary to a target RNA sequence.
  • Non-limiting examples of vectors that can be used to introduce expression vectors that encode Argonaute in various cell types include: a nucleic acid vector (e.g., a plasmid vector) encoding Argonaute can be delivered directly to bacterial cells or cultured cells (e.g., mammalian cells) by electroporation; a nucleic acid vector (e.g., a plasmid vector) encoding Argonaute can be delivered directly to bacterial cells by chemical transformation; a viral vector (e.g., a retroviral vector, adenoviral vector, an adeno associated viral vector, an alphavirus vector, a vaccinia viral vector, a herpes viral vector, etc., as are known in the art) comprising a nucleotide sequence encoding Argonaute can be used to deliver Argonaute to cells (e.g., mammalian cells); a baculovirus expression system can be used to deliver Argonaute to insect cells; Agro
  • the gene sequence (for example, of a gene expressing Argonaute) may be codon optimized, without changing the resulting polypeptide sequence.
  • the codon optimization includes replacing at least one, or more than one, or a significant number, of codons with one or more codons that are more frequently used in various organisms. In some embodiments, the codon optimization increases expression of the optimized gene sequence.
  • a DNA-guided RNA cleavage system for high-throughput detection of nuclease accessibility sites, the system comprising a first complex comprising a first yeast Argonaute polypeptide and a first single-stranded DNA oligonucleotide guide molecule; and a second complex comprising a second yeast Argonaute polypeptide and a second single-stranded DNA oligonucleotide guide molecule; wherein the first and second single-stranded DNA oligonucleotide guide molecules are not identical and are complementary to a target RNA sequence.
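One way to obtain several non-identical guides that are each complementary to the same target RNA, as the system above requires, is to tile windows along the target and take the reverse complement of each window. The sketch below illustrates that idea. The tiling strategy, the default 23-nucleotide guide length, the step size, and the function name `tile_guides` are assumptions made for this example and are not stated in the source.

```python
RNA_TO_DNA_PAIR = {"A": "T", "C": "G", "G": "C", "U": "A"}


def tile_guides(target_rna: str, guide_len: int = 23, step: int = 1):
    """Return (start, guide) pairs covering target_rna with overlapping windows.

    Each guide is the ssDNA reverse complement (5'->3') of the RNA window
    beginning at `start` (0-based) and spanning `guide_len` nucleotides.
    """
    if guide_len <= 0 or step <= 0:
        raise ValueError("guide_len and step must be positive")
    target = target_rna.upper()
    guides = []
    for start in range(0, len(target) - guide_len + 1, step):
        window = target[start:start + guide_len]
        guide = "".join(RNA_TO_DNA_PAIR[base] for base in reversed(window))
        guides.append((start, guide))
    return guides


# Hypothetical 25-nt target: three overlapping 23-nt sites, one guide per site.
target = "AUGGCUAGCUAGGCUAAGCUAGCAU"
library = tile_guides(target)
first_start, first_guide = library[0]
```

Each guide could then be loaded into its own Argonaute complex, and the cleavage observed per window would indicate which sites on the target are accessible.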
  • the Argonaute polypeptide is from a yeast. In some embodiments, the Argonaute polypeptide is from Vanderwaltozyma polyspora (also known as Kluyveromyces polysporus). Additional non-limiting examples of yeast Argonaute polypeptides can be from additional yeast species of the genus Kluyveromyces: K. aestuari, K. africanus, K. bacillisporus , K. blattae, K. dobzhanskii, K. hubeiensis, K. lactis, K. lodderae, K. marxianus, K. nonfermentans, K. piceae, K. sinensis, K. thermotolerans, K.
  • yeast Argonaute polypeptides can be from Yarrowia lipolytica, Pichia pastoris, Candida vulgaris, Saccharomyces castellii, or Schizosaccharomyces pombe.
  • the Argonaute polypeptide is from a eukaryote. In some embodiments, the Argonaute polypeptide is from a mammal. In some embodiments, the Argonaute polypeptide is from a primate. In some embodiments, the Argonaute polypeptide is from a human (for example, hAGO1, hAGO2, hAGO3, or hAGO4). The number of Argonaute family members (genes) ranges from one in Schizosaccharomyces pombe to twenty-seven in Caenorhabditis elegans.
  • Argonaute proteins are found in Homo sapiens (8), Rattus norvegicus (8), Rattus norvegicus (8), Drosophila melanogaster (5), Arabidopsis thaliana (10), and Neurospora crassa (2). (Hock, J and G Meister. Genome Biology 2008 9:210). Argonautes are key components of RISC in mammals, fungi, worms, protozoans and plants (M.A. Carmell et al, Nat. Struct. Mol. Biol. 1 1, 214 (2004)).
  • the Argonaute polypeptide is a full length Argonaute polypeptide. In some embodiments, the Argonaute polypeptide comprises a portion of the Argonaute protein.
  • the Argonaute polypeptide is a wild-type sequence. In one embodiment, the Argonaute polypeptide is a sequence with at least one mutation. In one embodiment, the Argonaute polypeptide comprises an amino acid sequence that is different from a naturally-occurring Argonaute polypeptide.
  • system and methods may comprise additional polypeptides in addition to the Argonaute polypeptide.
  • additional components of the RISC complex may be present.
  • the yeast Argonaute polypeptide is from Vanderwaltozyma polyspora (also known as Kluyveromyces polysporus). In one embodiment, the yeast Argonaute polypeptide is selected from SEQ ID NO:31, SEQ ID NO:32, or SEQ ID NO:33. In one embodiment, the yeast Argonaute polypeptide is SEQ ID NO:31. In one embodiment, the yeast Argonaute polypeptide is SEQ ID NO:32. In one embodiment, the yeast Argonaute polypeptide is SEQ ID NO:33.
  • the yeast Argonaute polypeptide has at least 60% (for example, at least 60%, at least 65%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 99%) identity to SEQ ID NO:31, SEQ ID NO:32, or SEQ ID NO:33. In some embodiments, the yeast Argonaute polypeptide has at least 60% (for example, at least 60%, at least 65%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 99%) identity to SEQ ID NO:31.
  • the yeast Argonaute polypeptide has at least 60% (for example, at least 60%, at least 65%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 99%) identity to SEQ ID NO:32. In some embodiments, the yeast Argonaute polypeptide has at least 60% (for example, at least 60%, at least 65%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 99%) identity to SEQ ID NO:33.
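The percent identity thresholds recited above can be checked with a short sketch. It assumes the two sequences have already been aligned (equal length, with `-` marking gaps) and counts identity over all alignment columns; other conventions for the denominator exist, so this is one possible definition rather than the one used in this disclosure. The toy sequences are hypothetical and are not SEQ ID NO:31 to 33.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns in which both sequences have a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment has no comparable columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float = 60.0) -> bool:
    """Return True if the aligned pair reaches the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical aligned toy fragments, for illustration only.
identity = percent_identity("MKT-LV", "MKSALV")
```

Here four of six columns match, so the pair passes a 60% threshold but not a 95% threshold.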
  • the first yeast Argonaute polypeptide can be the same as the second yeast Argonaute polypeptide. However, in some embodiments, the first yeast Argonaute polypeptide can be a different Argonaute polypeptide compared to the second yeast Argonaute polypeptide.
  • the first single-stranded oligonucleotide guide molecule (occasionally referred to herein as a first "ssDNA guide molecule" or "gDNA") is about 12 to about 45 nucleotides.
  • the first ssDNA guide molecule is about 12, about 13, about 14, about 15, about 16, about 17, about 18, about 19, about 20, about 21, about 22, about 23, about 24, about 25, about 26, about 27, about 28, about 29, about 30, about 31, about 32, about 33, about 34, about 35, about 36, about 37, about 38, about 39, about 40, about 41, about 42, about 43, about 44, or about 45 nucleotides.
  • the first ssDNA guide molecule is about 12 to about 30 nucleotides.
  • the first ssDNA guide molecule is about 14 to about 26 nucleotides. In some embodiments, the first ssDNA guide molecule is about 21 to about 25 nucleotides. In some embodiments, the first ssDNA guide molecule is about 21 nucleotides. In some embodiments, the first ssDNA guide molecule is about 22 nucleotides. In some embodiments, the first ssDNA guide molecule is about 23 nucleotides. In some embodiments, the first ssDNA guide molecule is about 24 nucleotides. In some embodiments, the first ssDNA guide molecule is about 25 nucleotides.
  • the second single-stranded oligonucleotide guide molecule (occasionally referred to herein as a second "ssDNA guide molecule" or "gDNA") is about 12 to about 45 nucleotides.
  • the second ssDNA guide molecule is about 12, about 13, about 14, about 15, about 16, about 17, about 18, about 19, about 20, about 21, about 22, about 23, about 24, about 25, about 26, about 27, about 28, about 29, about 30, about 31, about 32, about 33, about 34, about 35, about 36, about 37, about 38, about 39, about 40, about 41, about 42, about 43, about 44, or about 45 nucleotides.
  • the second ssDNA guide molecule is about 12 to about 30 nucleotides.
  • the second ssDNA guide molecule is about 14 to about 26 nucleotides. In some embodiments, the second ssDNA guide molecule is about 21 to about 25 nucleotides. In some embodiments, the second ssDNA guide molecule is about 21 nucleotides. In some embodiments, the second ssDNA guide molecule is about 22 nucleotides. In some embodiments, the second ssDNA guide molecule is about 23 nucleotides. In some embodiments, the second ssDNA guide molecule is about 24 nucleotides. In some embodiments, the second ssDNA guide molecule is about 25 nucleotides.
  • the first and/or second ssDNA guide molecule is heterologous to the genomic DNA of a biological cell. In some embodiments, the first and/or second ssDNA guide molecule is homologous to the genomic DNA of a biological cell.
  • the yeast Argonaute polypeptide and the single-stranded DNA oligonucleotide guide molecule are associated by any one or more intermolecular forces, such that a stable polypeptide-oligonucleotide complex has the capacity to bind to and cleave an RNA molecule (occasionally referred to herein as an "Argonaute polypeptide:guide complex").
  • the intermolecular force(s) binding the yeast Argonaute polypeptide and the single-stranded DNA oligonucleotide guide molecule together can be any one, or combination, of intermolecular binding forces, for example covalent, ionic, ion-dipole, dipole, London dispersion, van der Waals, hydrogen bonding forces and/or hydrophobic interaction.
  • the complex can contain other bound molecules, for instance, amino acids, proteins, nucleotides, polynucleotides, small molecules, lipids, carbohydrates, etc., so long as the complex retains the capacity to bind to and cleave an RNA molecule.
  • the complex contains other bound molecules typically present in a DISC complex or RISC complex.
  • DISC: DNA-induced slicing complex
  • RISC: RNA-induced silencing complex
  • the first and second single-stranded DNA oligonucleotide guide molecules are not identical. That is, the first and second ssDNA guide molecules do not contain the exact same oligonucleotide sequence.
  • the first and second single-stranded DNA oligonucleotide guide molecules are complementary to a target RNA sequence.
  • the target RNA sequence in some embodiments, is one continuous RNA molecule.
  • the target RNA sequence can be a target RNA sequence on different RNA molecules.
  • the target can be an isolated and/or purified RNA molecule, or can be mixed with other molecules (e.g., one or more additional RNA molecules).
  • the target RNA sequence can be mixed with cellular components, as in the case of crude extracts of cellular RNAs.
  • the target RNA sequence can be comprised within a cell.
  • the first and second ssDNA guide molecules bind target RNA sequences which are not identical. However, depending on the nucleotides of a ssDNA guide molecule which hybridize with the target RNA sequence, the first and second ssDNA guide molecules can bind overlapping or even identical target RNA sequences.
  • the DNA-guided RNA cleavage system for high-throughput detection of nuclease accessibility sites comprises more than two Argonaute polypeptide:guide complexes.
  • the DNA-guided RNA cleavage system comprises three, four, five, six, seven, eight, nine, ten, or more Argonaute polypeptide:guide complexes.
  • the high-throughput nature of the system can allow large numbers of Argonaute polypeptide:guide complexes to be used in the methods described herein.
  • the ssDNA guide molecules are not identical.
  • the DNA-guided RNA cleavage system comprises two or more Argonaute polypeptide:guide complexes comprising a library of single-stranded DNA oligonucleotide guide molecules.
  • the library can be designed randomly, or be based on intentional selection of DNA sequences.
  • the library can be used to form a collection of separately provided complexes (e.g., each ssDNA guide molecule is separately bound to an Argonaute polypeptide in a separate reaction mixture).
  • the library can be used to form a mixture of complexes (e.g., each ssDNA guide molecule is bound to an Argonaute polypeptide in a single mixture).
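An intentionally designed library of the kind described above could, for example, tile non-identical guides along the target RNA. The following is a minimal sketch under stated assumptions: each guide is the perfect DNA reverse complement of a window of the target, the 21-nucleotide default falls within the guide lengths recited herein, and the step size and toy target sequence are hypothetical choices.

```python
# DNA base that pairs with each RNA base.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def guide_for_site(rna_site: str) -> str:
    """Return the ssDNA guide (5' to 3') that is perfectly complementary to an RNA site."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(rna_site))


def tile_guide_library(target_rna: str, guide_length: int = 21, step: int = 3):
    """Build a list of (start_index, guide) pairs tiled along the target RNA.

    start_index is the 0-based position of the window's 5' end in the target.
    """
    if guide_length > len(target_rna):
        raise ValueError("target RNA is shorter than the guide length")
    library = []
    for start in range(0, len(target_rna) - guide_length + 1, step):
        site = target_rna[start:start + guide_length]
        library.append((start, guide_for_site(site)))
    return library


# Hypothetical 30-nucleotide target, for illustration only.
target = "GGAUCCGAUUACGCUAGCAUCGGAUAACGU"
library = tile_guide_library(target, guide_length=21, step=3)
```

Each entry can then be complexed with an Argonaute polypeptide either in a separate reaction mixture or in a single mixture, as described above.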
  • the target RNA sequence is not particularly limited and can be synthetic or natural.
  • a natural target RNA sequence can be from any biological cell or any organism.
  • the target RNA sequence is from a mammal.
  • the target RNA sequence is from a human.
  • the target RNA sequence is from a virus.
  • the target RNA sequence is from a pathogen.
  • the target RNA sequence is from a bacterium.
  • the target RNA sequence is from a prokaryotic cell.
  • the target RNA sequence is from a eukaryotic cell.
  • the target RNA is a 5'UTR RNA.
  • the target RNA is a genomic RNA (e.g., a viral genomic RNA), or a portion thereof.
  • the target RNA is from HIV-1 or Zika virus.
  • the target RNA is from a cell which expresses long noncoding RNAs (lncRNAs), for example and without limitation MALAT1 or XIST, or a cell which expresses lncRNAs (e.g., MALAT1 or XIST) at high levels.
  • the lncRNA is from a cancer cell or tumor.
  • the target RNA, in some embodiments, can range in length from about 10 nucleotides to about 100,000 nucleotides, from about 100 nucleotides to about 50,000 nucleotides, from about 300 nucleotides to about 10,000 nucleotides, or from about 500 nucleotides to about 5,000 nucleotides.
  • the target RNA can range in length from any of the above minimums to any of the above maximum nucleotide lengths (e.g., from about 10 nucleotides to about 10,000 nucleotides, or from about 300 nucleotides to about 100,000 nucleotides).
  • the target RNA, in some embodiments, can have a length of at least 10 nucleotides, at least 50 nucleotides, at least 100 nucleotides, at least 200 nucleotides, at least 300 nucleotides, at least 500 nucleotides, at least 1,000 nucleotides, at least 2,500 nucleotides, at least 5,000 nucleotides, at least 7,000 nucleotides, at least 10,000 nucleotides or more.
  • the target RNA sequence can be analyzed by a computer-based or internet-based program which analyzes, predicts, and/or models nucleotide structure (e.g., the folding structure of a single-stranded RNA molecule).
  • Structural modeling can, in some instances, aid in selecting single-stranded DNA oligonucleotide guide molecules.
  • ssDNA guide molecules which are complementary to RNA sequences having unpaired nucleotides can be selected, as they may be predicted, in some instances, to have improved binding kinetics with an Argonaute polypeptide:guide complex.
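The structure-aided selection described above can be sketched as a ranking of candidate target sites by their fraction of unpaired nucleotides. The sketch assumes that a predicted secondary structure is already available in dot-bracket notation (`.` for an unpaired nucleotide, brackets for paired nucleotides) from whatever modeling program is used; the function name, the window length, and the toy structure are hypothetical.

```python
def rank_guide_sites(structure: str, guide_length: int):
    """Rank target windows by predicted unpaired fraction, highest first.

    Returns (start_index, unpaired_fraction) pairs, where start_index is the
    0-based 5' end of the window in the target RNA.
    """
    if guide_length <= 0 or guide_length > len(structure):
        raise ValueError("guide_length must be between 1 and the structure length")
    ranked = []
    for start in range(len(structure) - guide_length + 1):
        window = structure[start:start + guide_length]
        ranked.append((start, window.count(".") / guide_length))
    # Most accessible windows first; ties are broken by position.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Hypothetical structure: a 12-nucleotide hairpin followed by an 8-nucleotide unpaired tail.
toy_structure = "((((....))))........"
ranked_sites = rank_guide_sites(toy_structure, guide_length=8)
best_start, best_fraction = ranked_sites[0]
```

A ranking of this kind only prioritizes candidates; accessibility is ultimately determined experimentally from the RNA cleavage products.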
  • the DNA encoding a yeast Argonaute polypeptide is encoded by SEQ ID NO:34, SEQ ID NO:35, or SEQ ID NO:36. In some embodiments, the DNA encoding a yeast Argonaute polypeptide is encoded by SEQ ID NO: 35. In some embodiments, the DNA encoding a yeast Argonaute polypeptide is encoded by a nucleic acid sequence which is at least 60%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97.5%, or at least 99% identical to SEQ ID NO:34, SEQ ID NO:35, or SEQ ID NO:36.
  • the DNA encoding a yeast Argonaute polypeptide is encoded by a nucleic acid sequence which is at least 60%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97.5%, or at least 99% identical to SEQ ID NO:35.
  • the first and/or second single-stranded oligonucleotide guide molecule (for example, ssDNA) has at least one chemically modified nucleotide.
  • modified nucleotides may confer increased stability, decreased off-target effects, and/or reduced toxicity, as compared to a ssDNA not having the chemically modified nucleotide.
  • the at least one chemically modified nucleotide comprises a chemically modified nucleobase, a chemically modified ribose, a chemically modified phosphodiester linkage, or a combination thereof.
  • the chemically modified nucleobase is selected from 5-formylcytidine (5fC), 5-methylcytidine (5meC), 5-methoxycytidine (5moC), 5-hydroxycytidine (5hoC), 5-hydroxymethylcytidine (5hmC), 5-formyluridine (5fU), 5-methyluridine (5-meU), 5-methoxyuridine (5moU), 5-carboxymethylesteruridine (5camU), pseudouridine (Ψ), N1-methylpseudouridine (me1Ψ), N6-methyladenosine (me6A), or thienoguanosine (thG).
  • the chemically modified ribose is selected from 2'-O-methyl (2'-O-Me), 2'-Fluoro (2'-F), 2'-deoxy-2'-fluoro-beta-D-arabino-nucleic acid (2'F-ANA), 4'-S, 4'-SFANA, 2'-azido, UNA, 2'-O-methoxy-ethyl (2'-O-ME), 2'-O-Allyl, 2'-O-Ethylamine, 2'-O-Cyanoethyl, Locked nucleic acid (LNA), Methylene-cLNA, N-MeO-amino BNA, or N-MeO-aminooxy BNA.
  • the chemically modified phosphodiester linkage is selected from Phosphorothioate (PS), Boranophosphate, phosphodithioate (PS2), 3',5'-amide, N3'-phosphoramidate (NP), Phosphodiester (PO), or 2',5'-phosphodiester (2',5'-PO).
  • a guide ssDNA sequence is any polynucleotide sequence having sufficient complementarity with a target polynucleotide sequence to hybridize with the target sequence and direct cleavage of the target sequence.
  • the degree of complementarity between a guide ssDNA sequence and its corresponding RNA target sequence, when optimally aligned using a suitable alignment algorithm, is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more.
  • the guide ssDNA is perfectly complementary (has perfect complementarity) with its corresponding RNA target sequence, when optimally aligned using a suitable alignment algorithm.
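The degree of complementarity described above can be estimated with a simplified sketch. It assumes an ungapped, full-length antiparallel pairing of a guide and a target site of equal length and counts only Watson-Crick DNA:RNA pairs; a general alignment algorithm, as contemplated above, could also handle gaps and offsets. The sequences are hypothetical.

```python
# DNA base that pairs with each RNA base (Watson-Crick pairs only).
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def guide_for_site(rna_site: str) -> str:
    """Return the ssDNA guide (5' to 3') that is perfectly complementary to an RNA site."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(rna_site))


def percent_complementarity(guide_dna: str, rna_site: str) -> float:
    """Percent of positions at which the guide pairs with the antiparallel RNA site."""
    if len(guide_dna) != len(rna_site):
        raise ValueError("this sketch requires guide and site of equal length")
    paired = sum(
        1
        for dna_base, rna_base in zip(guide_dna, reversed(rna_site))
        if RNA_TO_DNA_COMPLEMENT[rna_base] == dna_base
    )
    return 100.0 * paired / len(guide_dna)


# Hypothetical 21-nucleotide target site and guides, for illustration only.
site = "AUGGCUAGCUAGGCUAACGUA"
perfect_guide = guide_for_site(site)
# Introduce a single mismatch at the first guide position.
mismatched_guide = ("A" if perfect_guide[0] != "A" else "C") + perfect_guide[1:]
```

With these inputs the perfect guide scores 100% and the single-mismatch guide scores 20 of 21 positions.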
  • a method of detecting nuclease accessibility sites in an RNA sequence comprising a) binding to a target RNA sequence a complex comprising a yeast Argonaute polypeptide and a first single-stranded DNA oligonucleotide guide molecule, wherein the single-stranded DNA oligonucleotide guide molecule is complementary to the target RNA sequence; b) cleaving the target RNA sequence with the Argonaute polypeptide:guide complex to form an RNA cleavage product; c) detecting the RNA cleavage product; and d) determining a nuclease accessibility site based on the RNA cleavage product.
  • the complex comprising a yeast Argonaute polypeptide and a first ssDNA guide molecule, wherein the ssDNA guide molecule is complementary to the target RNA sequence, can be any Argonaute polypeptide:guide complex described herein.
  • the binding step a) comprises binding to a target RNA sequence a second complex comprising a yeast Argonaute polypeptide and a second ssDNA guide molecule, wherein the second ssDNA guide molecule is complementary to the target RNA sequence.
  • the first and second ssDNA guide molecules are not identical. That is, the first and second ssDNA guide molecules do not contain the exact same oligonucleotide sequence.
  • binding the second Argonaute polypeptide:guide complex can be performed in a separate reaction compared to binding the first Argonaute polypeptide:guide complex. In some embodiments, binding the second Argonaute polypeptide:guide complex can be performed in a prior, contemporaneous, or subsequent reaction compared to binding the first Argonaute polypeptide:guide complex. As such, in some embodiments, the method can be a high throughput method. In some embodiments, binding the first and second Argonaute polypeptide:guide complexes can occur in an assay (e.g., in a 96-well or 384-well microtiter plate).
  • the target RNA may be cleaved in two (or more) locations if the first and second Argonaute polypeptide:guide complexes bind to and cleave the target RNA at different locations.
  • the separate reaction mixtures would contain different RNA cleavage products.
  • binding the second Argonaute polypeptide:guide complex can be performed in the same reaction mixture as the binding of the first Argonaute polypeptide:guide complex.
  • the target RNA may be cleaved in two (or more) locations if the first and second Argonaute polypeptide:guide complexes bind to and cleave the target RNA at different locations.
  • the resultant different RNA cleavage products will be present in the same reaction mixture.
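When two or more complexes cleave the same target RNA in one reaction mixture, the expected cleavage products can be enumerated as in the sketch below. It assumes complete cleavage at every site, with each site given as the 1-based position of the nucleotide immediately 5' of the cut; partial cleavage would yield additional, longer products. The positions and length are hypothetical.

```python
def fragment_lengths(rna_length: int, cut_after: list) -> list:
    """Lengths of the fragments, 5' to 3', produced by complete cleavage.

    cut_after lists 1-based positions; a cut falls between position p and p + 1.
    """
    cuts = sorted(set(cut_after))
    if any(position < 1 or position >= rna_length for position in cuts):
        raise ValueError("each cut must fall inside the RNA")
    bounds = [0] + cuts + [rna_length]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical 1,000-nucleotide target cleaved after positions 320 and 700.
fragments = fragment_lengths(1000, [700, 320])
```

For these inputs the sketch gives fragments of 320, 380, and 300 nucleotides.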
  • the binding step a) comprises binding to a target RNA sequence a third complex comprising a yeast Argonaute polypeptide and a third single-stranded DNA oligonucleotide guide molecule (hereinafter, a "third Argonaute polypeptide:guide complex”), wherein the third single-stranded DNA oligonucleotide guide molecule is complementary to the target RNA sequence.
  • the binding step a) comprises binding to a target RNA sequence a fourth, fifth, sixth, seventh, eighth, ninth, tenth, or more Argonaute polypeptide:guide complexes.
  • the number of Argonaute polypeptide:guide complexes which can be bound to the target RNA is not particularly limited.
  • the binding step a) comprises binding to a target RNA sequence a library of Argonaute polypeptide:guide complexes.
  • the cleaving step b) is performed by a nuclease.
  • the nuclease comprises the Argonaute polypeptide:guide complex. After binding a target RNA sequence, the Argonaute polypeptide:guide complex can, in some embodiments, cleave the target RNA sequence.
  • the detecting step c) detects whether the target RNA sequence was cleaved by detecting an RNA cleavage product.
  • the detecting step c) comprises reverse transcribing the RNA cleavage product to form a cDNA reverse transcript.
  • the detecting step c) comprises amplifying or extending a cDNA reverse transcript.
  • the RNA cleavage product can be reverse transcribed and extended by reverse transcription polymerase-chain reaction (RT-PCR)-coupled primer extension.
  • Extension of the RNA cleavage product is ideally performed by binding a DNA primer complementary to the 3' end of the target RNA, then extending the primer along the RNA, using an RNA-dependent DNA polymerase, to the site of cleavage. Repetitive cycling of the RT-PCR-primer extension process amplifies the cDNA reverse transcripts.
  • the cDNA reverse transcript is separated based on size. Separating cDNA reverse transcripts (e.g., separating based on size) can aid in detecting and distinguishing the RNA cleavage products (via detecting and distinguishing the cDNA reverse transcripts). In some embodiments, the cDNA reverse transcript is separated by capillary electrophoresis.
  • Step d) requires determining a nuclease accessibility site based on the RNA cleavage product.
  • Nuclease accessibility sites can be determined by analyzing the RNA cleavage product or cDNA reverse transcript. In some embodiments, the nuclease accessibility site is determined by determining the 3' nucleotide in the cDNA reverse transcript. In some embodiments, the nuclease accessibility site is determined by determining the size of the cDNA reverse transcript, for example by electrophoresis, particularly capillary electrophoresis. In some or further embodiments, the nuclease accessibility site is determined based on the sequence of the RNA cleavage product, for example by sequencing the cDNA reverse transcript.
  • step d) determines only one nuclease accessibility site in the target RNA sequence. In some embodiments, more than one nuclease accessibility site is determined in the target RNA sequence. For example, two, three, four, or a plurality of nuclease accessibility sites are determined in the target RNA sequence.
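Determining a site from the size of the cDNA reverse transcript reduces to simple arithmetic, sketched below. It assumes that the measured cDNA length includes the primer, that the primer's 5' end pairs with a known 1-based position of the target RNA (`primer_end`, a hypothetical parameter name), and that extension stops at the 5' end of the cleaved fragment.

```python
from typing import Optional, Tuple


def cleavage_site_from_cdna(primer_end: int, cdna_length: int) -> Optional[Tuple[int, int]]:
    """Infer the cleaved bond from the length of a primer-extension product.

    primer_end: 1-based RNA position paired with the 5' end of the primer.
    cdna_length: length of the cDNA reverse transcript, primer included.
    Returns the pair of RNA positions flanking the cut, or None when the
    cDNA runs to the 5' end of the RNA (no cleavage detected).
    """
    if cdna_length < 1 or cdna_length > primer_end:
        raise ValueError("cDNA length is inconsistent with the primer position")
    # The cDNA copies RNA positions first_nt .. primer_end.
    first_nt = primer_end - cdna_length + 1
    if first_nt == 1:
        return None
    return (first_nt - 1, first_nt)


# Hypothetical values: primer pairs up to position 500, cDNA of 180 nucleotides.
site = cleavage_site_from_cdna(primer_end=500, cdna_length=180)
```

Under these assumptions a 180-nucleotide product places the cut between positions 320 and 321 of the target RNA.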
  • a method of high-throughput detection of nuclease accessibility sites comprising a) assaying a target RNA sequence with two or more Argonaute polypeptide:guide complexes, wherein each complex comprises a yeast Argonaute polypeptide and a single-stranded DNA oligonucleotide guide molecule from a library of single-stranded DNA oligonucleotide guide molecules, wherein each single-stranded DNA oligonucleotide guide molecule is complementary to a portion of the target RNA sequence; b) cleaving the target RNA sequence with the Argonaute polypeptide:guide complexes to form at least one RNA cleavage product; c) detecting the at least one RNA cleavage product; and d) determining a nuclease accessibility site based on the at least one RNA cleavage product.
  • Each complex comprising a yeast Argonaute polypeptide and a ssDNA guide molecule from a library of ssDNA guide molecules, wherein each ssDNA guide molecule is complementary to a portion of the target RNA sequence can be any Argonaute polypeptide:guide complex described herein.
  • the assaying step a) requires two or more Argonaute polypeptide:guide complexes, wherein each complex comprises a ssDNA guide molecule from a library of ssDNA guide molecules.
  • a "library" of single-stranded DNA oligonucleotide guide molecules means two or more (e.g., three, four, five, or a plurality) of single-stranded DNA oligonucleotide guide molecules.
  • Each ssDNA guide molecule forms a separate complex with the yeast Argonaute polypeptide. As such, the method is a high throughput method.
  • the single-stranded DNA oligonucleotide guide molecules in the library are not identical. That is, any given two or more (e.g., first and second) ssDNA guide molecules do not contain the exact same oligonucleotide sequence.
  • assaying a target RNA sequence with two or more Argonaute polypeptide:guide complexes can be performed in separate reactions. In other words, the target RNA sequence is assayed separately with each of the two or more Argonaute polypeptide:guide complexes. In some embodiments, assaying a target RNA sequence with two or more Argonaute polypeptide:guide complexes can be performed in prior, contemporaneous, or subsequent reactions. In some embodiments, assaying a target RNA sequence with two or more Argonaute polypeptide:guide complexes can occur in a 96-well or 384-well microtiter plate.
  • the target RNA may be cleaved in two (or more) locations if the two or more Argonaute polypeptide:guide complexes bind to and cleave the target RNA at different locations.
  • the separate reaction mixtures would contain different RNA cleavage products.
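Where each complex is assayed in a separate reaction in a microtiter plate, the bookkeeping can be sketched as an assignment of guides to wells. The row-by-row fill order, the function name, and the guide labels are hypothetical; only the 96-well and 384-well formats come from the description above.

```python
from string import ascii_uppercase

# Rows and columns of the two plate formats named in the description.
PLATE_FORMATS = {96: (8, 12), 384: (16, 24)}


def assign_wells(guides: list, plate_size: int = 96) -> dict:
    """Map well identifiers (e.g., 'A1') to guides, filling the plate row by row."""
    if plate_size not in PLATE_FORMATS:
        raise ValueError("plate_size must be 96 or 384")
    rows, cols = PLATE_FORMATS[plate_size]
    if len(guides) > rows * cols:
        raise ValueError("more guides than wells on the plate")
    layout = {}
    for index, guide in enumerate(guides):
        row, col = divmod(index, cols)
        layout[f"{ascii_uppercase[row]}{col + 1}"] = guide
    return layout


# Hypothetical guide labels, for illustration only.
layout = assign_wells([f"guide_{i}" for i in range(13)], plate_size=96)
```

Recording the well of each guide allows every RNA cleavage product to be traced back to the ssDNA guide molecule that produced it.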
  • assaying a target RNA sequence with two or more Argonaute polypeptide:guide complexes can be performed in the same reaction mixture.
  • the target RNA sequence is assayed together with each of the two or more Argonaute polypeptide:guide complexes in a mixture.
  • the target RNA may be cleaved in two (or more) locations if the two or more Argonaute polypeptide:guide complexes bind to and cleave the target RNA at different locations.
  • the resultant different RNA cleavage products will be present in the same reaction mixture.
  • the library of ssDNA guide molecules comprises at least three, four, five, six, seven, eight, nine, ten, or more single-stranded DNA oligonucleotide guide molecules.
  • the assay step a) comprises assaying a target RNA sequence with at least three, four, five, six, seven, eight, nine, ten, or more Argonaute polypeptide:guide complexes.
  • the number of Argonaute polypeptide:guide complexes which can be bound to the target RNA is not particularly limited.
  • the primary limiting feature of the number of Argonaute polypeptide:guide complexes which can be bound to the target RNA is the number of distinct ssDNA guide molecules which can bind to the target RNA.
  • the cleaving step b) is performed by a nuclease.
  • the nuclease comprises the Argonaute polypeptide:guide complex. After binding a target RNA sequence, the Argonaute polypeptide:guide complex can, in some embodiments, cleave the target RNA sequence.
  • the detecting step c) detects whether the target RNA sequence was cleaved by detecting an RNA cleavage product.
  • the detecting step c) comprises reverse transcribing the RNA cleavage product to form a cDNA reverse transcript.
  • the detecting step c) comprises amplifying or extending a cDNA reverse transcript.
  • the RNA cleavage product can be reverse transcribed and extended by reverse transcription polymerase-chain reaction (RT-PCR)-coupled primer extension.
  • Extension of the RNA cleavage product is ideally performed by binding a DNA primer complementary to the 3' end of the target RNA, then extending the primer along the RNA, using an RNA-dependent DNA polymerase, to the site of cleavage. Repetitive cycling of the RT-PCR-primer extension process amplifies the cDNA reverse transcripts.
  • the cDNA reverse transcript is separated based on size. Separating cDNA reverse transcripts (e.g., separating based on size) can aid in detecting and distinguishing the RNA cleavage products (via detecting and distinguishing the cDNA reverse transcripts). In some embodiments, the cDNA reverse transcript is separated by capillary electrophoresis.
  • Step d) requires determining a nuclease accessibility site based on the RNA cleavage product.
  • Nuclease accessibility sites can be determined by analyzing the RNA cleavage product or cDNA reverse transcript. In some embodiments, the nuclease accessibility site is determined by determining the 3' nucleotide in the cDNA reverse transcript. In some embodiments, the nuclease accessibility site is determined by determining the size of the cDNA reverse transcript, for example by electrophoresis, particularly capillary electrophoresis. In some or further embodiments, the nuclease accessibility site is determined based on the sequence of the RNA cleavage product, for example by sequencing the cDNA reverse transcript.
  • step d) determines only one nuclease accessibility site in the target RNA sequence. In some embodiments, more than one nuclease accessibility site is determined in the target RNA sequence. For example, two, three, four, or a plurality of nuclease accessibility sites are determined in the target RNA sequence.
  • a method of detecting sites for gene expression attenuation in a cell comprising: a) introducing into a biological cell a yeast Argonaute polypeptide and a library of single-stranded DNA oligonucleotide guide molecules, wherein each single-stranded DNA oligonucleotide guide molecule is complementary to a target RNA molecule; b) cleaving the target RNA sequence with the Argonaute polypeptide:guide complexes to form at least one RNA cleavage product; c) detecting the at least one RNA cleavage product; and d) determining a nuclease accessibility site based on the at least one RNA cleavage product.
  • the biological cell can be any biological cell containing RNA.
  • the biological cell is a mammalian cell.
  • the biological cell is a human cell.
  • a method of attenuating expression of a target gene in a cell comprising a) introducing into a biological cell a yeast Argonaute polypeptide and a library of single-stranded DNA oligonucleotide guide molecules, wherein each single-stranded DNA oligonucleotide guide molecule is complementary to a target RNA molecule; and b) cleaving the target RNA sequence with the Argonaute polypeptide:guide complexes, wherein cleaving the target RNA sequence attenuates the expression of the target gene.
  • the biological cell can be any biological cell containing RNA.
  • the biological cell is a mammalian cell.
  • the biological cell is a human cell.
  • a method of mapping nuclease accessibility sites in an RNA sequence comprising a) binding to a target RNA sequence a complex comprising a yeast Argonaute polypeptide and a first single-stranded DNA oligonucleotide guide molecule, wherein the single-stranded DNA oligonucleotide guide molecule is complementary to the target RNA sequence; b) cleaving the target RNA sequence with the Argonaute polypeptide:guide complex to form an RNA cleavage product; c) detecting the RNA cleavage product; and d) mapping a nuclease accessibility site based on the RNA cleavage product.
  • a kit comprising a vector comprising a nucleic acid sequence encoding a yeast Argonaute polypeptide operably linked to a promoter; an RNA-dependent DNA polymerase; a set of buffered RNA cleavage reagents; and a set of buffered reverse transcription reagents.
  • the Argonaute polypeptide is from a yeast.
  • the Argonaute polypeptide is from Vanderwaltozyma polyspora (also known as Kluyveromyces polysporus).
  • the Argonaute polypeptide is from a eukaryote.
  • the Argonaute polypeptide is from a mammal. In some embodiments, the Argonaute polypeptide is from a primate. In some embodiments, the Argonaute polypeptide is from a human (for example, hAGO1, hAGO2, hAGO3, or hAGO4).
  • the library of ssDNA guide molecules comprises ssDNA guide molecules which are not identical. That is, any given two ssDNA guide molecules do not contain the exact same oligonucleotide sequence.
  • the library of ssDNA guide molecules can be complementary to a target RNA sequence.
  • the target RNA sequence in some embodiments, is one continuous RNA molecule.
  • the target RNA sequence can be a target RNA sequence on different RNA molecules.
  • the first and second ssDNA guide molecules bind target RNA sequences which are not identical. However, depending on the nucleotides of a ssDNA guide molecule which hybridize with the target RNA sequence, the first and second ssDNA guide molecules can bind overlapping or even identical target RNA sequences.
  • Non-limiting examples of approaches that can be used to introduce expression vectors that encode Argonaute into various cell types include: a nucleic acid vector (e.g., a plasmid vector) encoding Argonaute can be delivered directly to bacterial cells or cultured cells (e.g., mammalian cells) by electroporation; a nucleic acid vector (e.g., a plasmid vector) encoding Argonaute can be delivered directly to bacterial cells by chemical transformation; a viral vector (e.g., a retroviral vector, adenoviral vector, an adeno associated viral vector, an alphavirus vector, a vaccinia viral vector, a herpes viral vector, etc., as are known in the art) comprising a nucleotide sequence encoding Argonaute can be used to deliver Argonaute to cells (e.g., mammalian cells); a baculovirus expression system can be used to deliver Argonaute to insect cells; Agro
  • the gene sequence (for example, of a gene expressing Argonaute) may be codon optimized, without changing the resulting polypeptide sequence.
  • the codon optimization includes replacing at least one, or more than one, or a significant number, of codons with one or more codons that are more frequently used in various organisms. In some embodiments, the codon optimization increases expression of the optimized gene sequence.
  • MicroRNAs are the regulatory small RNAs that control gene expression by inhibition of translation or degradation of messenger RNAs (mRNAs) containing a complementary sequence.
  • To degrade the target mRNAs, miRNAs need to be loaded onto Argonaute (AGO) proteins, forming a ribonucleoprotein complex called the RNA-induced silencing complex (RISC).
  • a complex of an AGO and a guide strand alone is referred to as 'the mature RISC' or simply 'RISC'.
  • the same complex is also called 'the RISC core' in the context when the RISC stands for a huge complex including many components required for translational repression and/or deadenylation.
  • the bound guide strand takes the RISC to the target mRNAs, which often possess the sequence complementarity to the guide in the 3' untranslated region (3' UTR).
  • the AGO proteins belong to the PIWI protein superfamily, defined by the presence of a PIWI (P element-induced wimpy testis) domain.
  • PIWI: P element-induced wimpy testis
  • eAGOs: eukaryotic Argonautes
  • N: N-terminal
  • PAZ: PIWI-Argonaute-Zwille
  • MID: middle
  • Many prokaryotic genomes also feature ago genes.
  • Long prokaryotic Argonaute proteins (pAGOs) encompass the same domains as eAGOs, whereas short pAGOs consist of only the MID and PIWI domains.
  • the term “Argonaute” refers to a protein which mediates RNA cleavage and has an amino acid sequence at least 60 percent identical, and more preferably at least 75, 85, 90 or 95 percent identical to SEQ ID NO: 31.
  • the term “yeast Argonaute” refers to a protein, from a yeast, which mediates RNA cleavage and has an amino acid sequence at least 60 percent identical, and more preferably at least 75, 85, 90 or 95 percent identical to SEQ ID NO: 31.
  • the Argonaute polypeptide is from a yeast. In some embodiments, the Argonaute polypeptide is from Vanderwaltozyma polyspora (also known as
  • yeast Argonaute polypeptides can be from additional yeast species of the genus Kluyveromyces: K. aestuari,;
  • yeast Argonaute polypeptides can be from Yarrowia lipolytica, Pichia pastoris, Candida vulgaris,
  • Saccharomyces castellii or Schizosaccharomyces pombe.
  • the Argonaute polypeptide is from a eukaryote. In some embodiments, the Argonaute polypeptide is from a mammal. In some embodiments, the Argonaute polypeptide is from a primate. In some embodiments, the Argonaute polypeptide is from a human (for example, hAGO1, hAGO2, hAGO3, or hAGO4). The number of Argonaute family members (genes) ranges from one in Schizosaccharomyces pombe to twenty-seven in Caenorhabditis elegans.
  • Argonautes are key components of RISC in mammals, fungi, worms, protozoans and plants (M.A. Carmell et al, Nat. Struct. Mol. Biol. 11, 214 (2004)).
  • RNA molecule comprising:
  • RNA sequence binding to a target RNA sequence comprising:
  • oligonucleotide guide molecule a heterologous, single-stranded oligonucleotide guide molecule
  • the single-stranded oligonucleotide guide molecule is a DNA oligonucleotide that is complementary to the target RNA sequence
  • Argonaute polypeptide:guide molecule complex cleaves the target RNA sequence.
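The complementarity requirement above can be sketched in a few lines of Python: a DNA guide, written 5' to 3', is the reverse complement of the RNA target site, written 5' to 3'. The function name and the example target site are illustrative assumptions; the 5' monophosphate on the guide is a chemical modification and is not represented in the string.

```python
# Sketch: derive a single-stranded DNA guide complementary to an RNA target site.
# RNA base -> complementary DNA base (Watson-Crick pairing).
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def design_dna_guide(target_rna: str) -> str:
    """Return the DNA guide (5'->3') complementary to an RNA site (5'->3')."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(target_rna.upper()))


# Hypothetical 14-nt RNA target site used only for illustration.
target_site = "CUAUAAGCACUUUA"
guide = design_dna_guide(target_site)
print(guide)
```

Because the guide is DNA, uracil in the target pairs with adenine in the guide, and adenine in the target pairs with thymine rather than uracil.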
  • the Argonaute polypeptide is a full length Argonaute polypeptide. In some embodiments, the Argonaute polypeptide comprises a portion of the Argonaute protein. In some embodiments, disclosed herein is a truncated Argonaute polypeptide termed "miniature Argonaute (mini-AGO)". In some embodiments, disclosed herein is an Argonaute polypeptide comprising SEQ ID NO:33. In some embodiments, the Argonaute polypeptide is isolated and/or purified.
  • the Argonaute polypeptide is a wild-type sequence. In one embodiment, the Argonaute polypeptide is a sequence with at least one mutation. In one embodiment, the Argonaute polypeptide comprises an amino acid sequence that is different from a naturally-occurring Argonaute polypeptide.
  • the Argonaute polypeptide is selected from SEQ ID NO:31, SEQ ID NO:32, or SEQ ID NO:33. In one embodiment, the Argonaute polypeptide is SEQ ID NO:31. In one embodiment, the Argonaute polypeptide is SEQ ID NO:32. In one embodiment, the Argonaute polypeptide is SEQ ID NO: 33.
  • system and methods may comprise additional polypeptides in addition to the Argonaute polypeptide.
  • additional components of the RISC complex may be present.
  • RNA-induced silencing complex RISC (Meister, G. Nat Rev Genet 14 5 447-459 (2013); Nakanishi, K. Wiley Interdiscip Rev RNA 7, 637-660 (2016); Hammond, S. M., et al. Science 293, 1146-1150 (2001)).
  • RISC RNA-induced silencing complex
  • the loaded RNAs pre-organized in the nucleic acid-binding channel (Elkayam, E. et al. Cell 150, 100-110 (2012); Faehnle, C. R., et al. Cell Rep 3, 1901-1909 (2013); Nakanishi, K. et al.
  • mini-AGO a yeast Argonaute C-terminal lobe
  • RNAi RNA interference
  • the two lobes are connected by two strands, ⁇ in the N-domain and ⁇ 20 in the L2 linker domain, both of which are part of an extended ⁇ -sheet of the PIWI domain (Fig. 5a,b).
  • Preceding the ⁇ , a conserved RxxxGxxG (R, arginine; and G, glycine) sequence motif sews through the PIWI domain, significantly stabilizing the C-terminal lobe (Fig. lb and Fig. 6).
  • ssDNAs 5'-monophosphorylated single-stranded DNAs
  • the recombinant protein was incubated with a synthetic 5' phosphorylated ssDNA of the genomic sequence of miR-20a (Fig. 1f), followed by addition of the cap-labeled 60-nt matched RNA target.
  • the deoxyribonucleoprotein complex cleaved the RNA target (Fig.
  • yeast Argonaute can use either DNA or RNA as guides to cleave only RNAs. This is not consistent with the substrate specificities of prokaryotic Argonaute proteins, which exclusively use a DNA or RNA guide to target both DNA and RNA (Table 1) (Swarts, D. C. et al. Nature 507, 258-261 (2014); Kaya, E. et al. Proc Natl Acad Sci USA 113, 4057-4062 (2016); Olovnikov, I., et al. Mol Cell 51, 594-605 (2013)).
  • a Values in parentheses are for highest-resolution shell.
  • the structure showed a clear electron density map of the bound RNA whose 5' nucleotide was captured at the interface between the MID and PIWI domains while the remainder ran along the exposed nucleic acid-binding channel, as do the AGO-bound guide nucleotides 1-7 (g1-g7) (Fig. 2c and Fig. 7b-d) (Elkayam, E. et al. Cell 150, 100-110 (2012); Faehnle, C. R., et al. Cell Rep 3, 1901-1909 (2013); Nakanishi, K. et al. Cell Rep 3, 1893-1900 (2013); Nakanishi, K., et al.
  • mini-AGO is a competent construct in terms of guide-dependent target cleavage, which raised the question as to whether mini-AGO can load an siRNA duplex, cleave and discard the passenger strand, and recognize and cleave target RNAs, as natural Argonaute proteins do physiologically.
  • each stage was tested in vitro comparing to AGO (Fig. 8a).
  • An siRNA duplex in which one strand corresponded exactly to miR-20a (Fig. 8b) was incubated with either AGO or mini-AGO.
  • Mini-AGO cleaved the 5'-end labeled passenger strand of the miR-20a siRNA at the expected position, as did AGO (Fig. 3a, and Fig.
  • mini-AGO was pre-incubated with an unlabeled miR-20a siRNA (Fig. 8e), followed by addition of a cap-labelled target RNA containing a sequence perfectly matched to the miR-20a guide.
  • mini-AGO generated a cleavage product of expected size, as did AGO, diagnostic of RNAi activity (Fig. 3b, and Fig. 8f,g).
  • target cleavage occurs only when there is extensive base pairing between the two strands (Martinez, J., et al. Cell 110, 563-574 (2002); Hutvagner, G. & Zamore, P. D. Science 297, 2056-2060 (2002)).
  • target RNAs whose bases break Watson-Crick pairing to the guide strand at g10 and g11, termed the t10-t11 step, were poor substrates for AGO (Fig. 3c,d left, and Fig. 9a) as previously shown (Nakanishi, K., et al.
  • mini-AGO was able to efficiently cleave the target including the t10-t11 mismatches (Fig. 3c,d right, and Fig. 9b).
  • This result indicates that the N-terminal lobe is essential to modulate target cleavage in response to mismatches, which is another important feature of catalytically active Argonaute proteins.
  • the tapered channel may serve as a physical barrier to check the base complementarity between g9-g12 and t9-t12 prior to target cleavage between t10 and t11.
  • miR-20a variants trimmed at their 3' ends into different sizes (10, 11, 12, 13, 14, 16, or 23 nt) were loaded into either AGO or mini-AGO, followed by addition of the cap-labelled matched target (Fig. 10a, b).
  • the 12-nt guide promoted the onset of target cleavage by the AGO-RISC (Fig. 4a), indicating that a minimum of 12 nt of guide is required to widen the tapered channel by base pairing with the bound target.
  • RISCs revealed the solvent-exposed g2-g4 of Argonaute-bound guide RNAs (Elkayam, E. et al. Cell 150, 100-110 (2012); Nakanishi, K., et al. Nature 486, 368-374 (2012); Wang, Y., et al. Nature 456, 209-213 (2008); Schirle, N. T. & MacRae, I. J. Science 336, 1037-1040 (2012); Schirle, N. T., et al. Science 346, 608-613 (2014)) from which the unidirectional base pairing nucleates and propagates towards the 3' end of the guide (Yao, C., et al. Mol Cell 59, 125-132 (2015)).
  • mini-AGO uses g2-g4 as the primary seed as well. Since the 14-nt guide catalyzed cleavage almost as efficiently as the 23-nt guide (Fig. 4b), systematic dinucleotide mismatches were made on the 14-nt guide to evaluate how the mismatches affect seed-dependent target cleavage. Either matched or mismatched guides were loaded into AGO and mini-AGO, followed by addition of the cap-labeled 60-nt target. Two-nt mismatches within the g2-g4 window affected the 14-nt guide-dependent target cleavage by AGO (Fig. 4e).
  • mini-AGO enabled mini-AGO to cleave the target as efficiently as the perfectly matched guide, indicating that g2-g4 no longer served as a guide in the absence of the N-terminal lobe.
  • This result suggests that making all the bases of the bound RNA accessible to target RNAs simultaneously results in shortening the requirement for nucleotides capable of serving as a guide.
  • Mini-AGO was also more tolerant to 2-nt mismatches within the g8-g11 window than AGO (Fig. 4e), supporting the aforementioned gatekeeper model of the tapered channel.
  • DNA encoding the designed mini-AGO from K. polysporus Argonaute was generated by first amplifying the MID-PIWI lobe from K. polysporus Ago1 using Primer Set I (FW1: GACATT TTGACAGGTTCAGGTAGAGTACCATCTCGTATTCTAGATGCCCC (SEQ ID NO: 1) & RV1: GCGCGC
  • the gene was cloned into a modified pRSF Duet vector (Novagen) containing an amino-terminal Ulp1-cleavable His6-SUMO tag. Mini-AGO was overexpressed in E. coli BL21 (DE3) Rosetta2 (Novagen).
  • Cell extract was prepared by homogenization in Buffer A (10 mM phosphate buffer pH 7.3, 2 M NaCl, 25 mM imidazole, 10 mM β-mercaptoethanol, 1 mM phenylmethylsulfonyl fluoride) and clarified by centrifugation. The supernatant was loaded onto a nickel column (GE Healthcare), washed with Buffer A, and eluted with a linear gradient to 100% Buffer B (10 mM phosphate buffer pH 7.3, 1 M NaCl, 750 mM imidazole, 10 mM β-mercaptoethanol).
  • Buffer A 10 mM phosphate buffer pH 7.3, 2 M NaCl, 25 mM imidazole, 10 mM β-mercaptoethanol, 1 mM phenylmethylsulfonyl fluoride
  • Fractions containing mini-AGO were mixed with Ulp1 protease and dialyzed overnight against Buffer C (10 mM phosphate buffer pH 7.3, 500 mM NaCl, 20 mM imidazole, 10 mM β-mercaptoethanol) and the digested protein was loaded onto a nickel column (GE Healthcare) to remove the cleaved His6-SUMO tag.
  • Buffer C 10 mM phosphate buffer pH 7.3, 500 mM NaCl, 20 mM imidazole, 10 mM β-mercaptoethanol
  • the flow-through sample containing mini-AGO was dialyzed against Buffer D (10 mM phosphate buffer pH 7.3, 10 mM β-mercaptoethanol), loaded onto a SP column (GE Healthcare), and eluted with a linear gradient to 70% Buffer E (10 mM phosphate buffer pH 7.3, 2 M NaCl, 10 mM β-mercaptoethanol).
  • Buffer E 10 mM phosphate buffer pH 7.3, 2 M NaCl, 10 mM β-mercaptoethanol
  • Fractions containing mini-AGO were dialyzed against Buffer D, loaded onto a MonoQ column (GE Healthcare), and eluted with a linear gradient to 100% Buffer E.
  • Mini-AGO was again dialyzed against Buffer D and loaded onto a MonoS column (GE Healthcare) and eluted over a linear gradient to 14% Buffer E.
  • the eluted protein was dialyzed against Buffer F (10 mM Tris-HCl pH 7.5, 200 mM NaCl, 5 mM DTT), concentrated by ultrafiltration, and loaded onto a HiLoad 16/600 Superdex 200 column (GE Healthcare) equilibrated with Buffer F. Purified mini-AGO was concentrated to approximately 40 mg mL⁻¹, as measured by Bradford assay (Bio-Rad), and stored at -80 °C.
  • Buffer F 10 mM Tris-HCl pH 7.5, 200 mM NaCl, 5 mM DTT
  • Polynucleotides were extracted from either AGO, mini-AGO, or water (for mock) by phenol:chloroform and dephosphorylated with Alkaline Phosphatase (Roche) by incubation at 37 °C for 30 minutes. Reactions were quenched by the addition of EDTA to a final concentration of 10 mM followed by inactivation of phosphatase by incubation at 70 °C for 30 minutes. Prior to 5' labelling, samples were supplemented with 10 mM MgCl2.
  • Point mutations Arg227, Gly231, or Gly234 were introduced by PCR-based mutagenesis to generate vectors encoding mutant mini-AGO.
  • the mutants were overexpressed in E. coli BL21 (DE3) Rosetta2 (Novagen). After ultrasonication, the cell lysate was centrifuged to separate the soluble fraction from the pellet. The pellet was resuspended in original volume using Buffer A. Representative samples of the supernatant and pellet for each construct were resolved by SDS-PAGE.
  • RNA and DNA oligonucleotides used in this study is provided (Tables 3, 4, 5, and 6).
  • 5' phosphorylated guide RNAs were chemically synthesized (Dharmacon), deprotected, and gel-purified.
  • DNA guides were chemically synthesized (Sigma Aldrich), 5' phosphorylated using OptiKinase (Affymetrix), and gel-purified.
  • the sequences encoding target RNAs were cloned into pUC19 vector and transcribed in vitro using T7 RNA polymerase.
  • DNase-treated transcripts were gel-purified, capped using the ScriptCap m7G Capping System (CellScript) either with GTP for unlabeled targets or with [α-32P]GTP (3000 Ci mmol⁻¹) for cap-labelled target RNAs, and gel-purified again.
  • DNA target was chemically synthesized (Sigma Aldrich), 5' end-labelled with OptiKinase (Affymetrix) and [γ-32P]ATP (3000 Ci mmol⁻¹) before gel purification.
  • RNA was phosphorylated using OptiKinase (Affymetrix) either with ATP for unlabeled passenger strands or with [γ-32P]ATP (3000 Ci mmol⁻¹) for 5'-32P-labelled passenger strands.
  • OptiKinase Affymetrix
  • siRNA duplexes were prepared as described previously (Nakanishi, K., et al. Nature 486, 368-374 (2012)).
  • miR-20a g10g11: 5'p UAAAGUGCUCCUAG (SEQ ID NO: 23); miR-20a g11g12: 5'p UAAAGUGCUUCCAG (SEQ ID NO: 24); miR-20a g12g13: 5'p UAAAGUGCUUACCG (SEQ ID NO: 25); miR-20a g13g14: 5'p UAAAGUGCUUAUCU (SEQ ID NO: 26)
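As a quick consistency check on the dinucleotide-mismatch guides listed above, the following Python sketch reports which guide positions differ from a matched 14-nt guide. The matched sequence `MATCHED_14NT` is an assumption (the first 14 nt of miR-20a, which is not spelled out in this passage), and the function name is illustrative.

```python
# Sketch: locate mismatched guide positions (1-based, counted from the 5' end).
# Assumed matched 14-nt miR-20a guide; not stated explicitly in the text above.
MATCHED_14NT = "UAAAGUGCUUAUAG"

# Variant guides as listed (SEQ ID NOs: 23-26).
VARIANTS = {
    "g10g11": "UAAAGUGCUCCUAG",
    "g11g12": "UAAAGUGCUUCCAG",
    "g12g13": "UAAAGUGCUUACCG",
    "g13g14": "UAAAGUGCUUAUCU",
}


def mismatch_positions(reference: str, variant: str) -> list[int]:
    """Return 1-based positions where the variant differs from the reference."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    return [i + 1 for i, (r, v) in enumerate(zip(reference, variant)) if r != v]


for name, seq in VARIANTS.items():
    print(name, mismatch_positions(MATCHED_14NT, seq))
```

Under that assumption, each variant differs from the matched guide at exactly the two positions named in its label.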
  • siRNA-mediated target cleavage For siRNA-mediated target cleavage (shown in Figure 3b), 1 ⁇ of either AGO or mini-AGO was pre-incubated with an unlabeled siRNA duplex at 30 °C for 30 minutes to allow for passenger strand cleavage and RISC maturation followed by addition of 10 nM cap-labelled target RNA. Reactions were quenched at indicated time points by addition of formamide loading buffer.
  • AGO or mini-AGO was pre-incubated with a single-stranded synthetic guide RNA at 25 °C for 30 minutes before addition of cap-labelled target RNAs at 30 °C for 20 minutes. Reactions were quenched with formamide loading dye, resolved by 16% denaturing PAGE, and visualized by phosphorimaging. Gels were quantified by ImageQuant (GE Healthcare). Cleavage assays using either DNA guides or DNA targets were performed similarly.
  • $R_k = 100 \times \dfrac{C_k/(C_k + U_k)}{C_{23}/(C_{23} + U_{23})}$, where $C_k$ and $U_k$ are the intensities of the cleaved and uncleaved bands, respectively.
  • $R_k' = 100 \times \dfrac{C_k/(C_k + U_k)}{C_{14}/(C_{14} + U_{14})}$, where $C$ and $U$ are the intensities of the cleaved and uncleaved bands, respectively. Equation 4.
  • $C_{\mathrm{mis}}$ and $U_{\mathrm{mis}}$ are the intensities of the cleaved and uncleaved bands derived from the t10-t11 mismatch target, respectively, while $C_{\mathrm{match}}$ and $U_{\mathrm{match}}$ are the intensities of the cleaved and uncleaved bands derived from the match target, respectively.
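The relative cleavage expressions above reduce to a ratio of two cleaved fractions. A minimal Python sketch, using hypothetical band intensities (the numbers are not from the disclosure), could look like this:

```python
# Sketch: relative cleavage R_k = 100 * [C_k/(C_k+U_k)] / [C_ref/(C_ref+U_ref)],
# where the reference is the 23-nt (or 14-nt) guide reaction.

def fraction_cleaved(cleaved: float, uncleaved: float) -> float:
    """Fraction of target cleaved, from band intensities."""
    total = cleaved + uncleaved
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return cleaved / total


def relative_cleavage(c_k: float, u_k: float, c_ref: float, u_ref: float) -> float:
    """Cleavage by guide k as a percentage of the reference guide."""
    return 100.0 * fraction_cleaved(c_k, u_k) / fraction_cleaved(c_ref, u_ref)


# Hypothetical intensities: guide k cleaves 25% of target, reference cleaves 50%.
r_k = relative_cleavage(c_k=25.0, u_k=75.0, c_ref=50.0, u_ref=50.0)
print(r_k)
```

Normalizing to the reference lane this way makes values comparable across gels, since each lane is first reduced to its own cleaved fraction.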
  • RNA-free form that can load any synthetic ssDNA guides.
  • a homogenous DISC i.e. a complex of Argonaute protein loaded with ssDNA
  • QMC quadruplex magnesium connection
  • the purified AGO or mini-AGO is incubated with a 5' monophosphorylated ssDNA whose 3' end is covalently connected with a half of QMC, followed by fishing only the programmed DISC using the counterpart of QMC (see Figure 1 of Kankia, B. Sci. Rep. 5: 12996 (2015)).
  • programmed AGO and/or mini-AGO is used to cleave a viral RNA sequence (HIV RNA).
  • the TAT Trans-activator of transcription
  • Different variations (lengths) of the TAT peptide can be used to deliver the DISC to cells.
  • In vivo RNA cleavage is evaluated by either of two methods: Northern blot analysis using 5' radiolabeled DNA probes complementary to the RNA of interest, with detection by phosphorimaging; or Western blot analysis of the downstream protein levels, which correlate with RNA cleavage.
  • Other methods for delivering vectors, nucleic acids, proteins, or compositions to cells are known in the art (for example, viral vectors, lipid particles, etc.)
  • RNA-induced silencing complex contains a ss-gRNA that exposes only three 5' nucleotides at positions 2-4 to solvent to scan target RNAs.
  • the target specificity of RISC relies only on the sequence of the gRNA with no requirement for the target sequence. Therefore, the scanning mechanism of RISC is able to search for accessible nucleotides of highly-structured RNAs without any sequence requirements.
  • gDNAs DNA guides
  • AGO2 is a catalytically active RNase in the presence of a gDNA.
  • DISC DNA-induced slicing complex
  • AGOAexN normally uses a 5' monophosphorylated 23-nucleotide (nt) gRNA to cleave complementary RNA targets.
  • nt monophosphorylated 23 -nucleotide
  • the recombinant protein was loaded with miR-20a-derived gRNA or gDNA, followed by addition of either a complementary RNA or DNA target (FIGs. 11a and 11b).
  • AGOAexN bound with gDNA was able to cleave target RNA almost as efficiently as with the canonical gRNA (FIG. 11c and FIG. 14), demonstrating that gDNA can activate yeast AGO as a functional DISC.
  • RNA packaging domain a variant of the 352-nt RNA derived from the human immunodeficiency virus type 1 (HIV-1) 5' untranslated region (UTR) was used as a target RNA.
  • Structured sub-domains include the transactivation response (TAR; nt 1-57) element, poly(A)denylation signal (poly(A); nt 58-104), primer-binding site (PBS; nt 125-223), and genomic RNA packaging domain (Psi; nt 228-334) (FIG. 15). Numerous studies highlight the functional importance of each of these domains for viral replication.
  • the structurally characterized dimerization initiation signal (DIS) mutant called ADIS (FIG. 15) was used to reduce technical complications associated with RNA dimerization.
  • ADIS dimerization initiation signal
  • 14 gDNAs gDNAl to gDNA14
  • the 14 gDNAs were designed to generate cleavage products in 23-nt increments (FIGs. 12a and 16).
  • Each of the different sites on the target RNA ADIS 5'UTR complementary to the 14 gDNAs were first individually targeted by simply changing the gDNA in separate cleavage reactions.
  • AGOAexN and a single gDNA (one of the 14 gDNAs) were pre-incubated to form the DISC followed by addition of the 32 P end-labeled ADIS 5'UTR substrate (FIG. 15). Reactions were quenched and cleavage products were resolved by denaturing PAGE. Cleavage by DISC was detected at all sites, albeit to different extents (FIGs. 12b and 17).
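To illustrate how tiled guides give cleavage products in regular increments, the Python sketch below lays out consecutive 23-nt target windows and computes the length of the 5' fragment each cut would leave. It assumes that DISC cleaves between t10 and t11, counting target nucleotides from the one paired with the guide's 5' nucleotide, as described for AGO earlier in the text. The window coordinates are hypothetical and do not reproduce the workflow of FIG. 16.

```python
# Sketch: tile guide-binding windows along a target and predict 5' fragment lengths.
# Assumption: cleavage occurs between t10 and t11, where t1 is the target
# nucleotide paired with g1 (i.e. the 3'-most nucleotide of the window).

def five_prime_fragment_length(window_end: int) -> int:
    """Length of the 5' cleavage fragment for a window ending at window_end.

    Positions are 1-based along the target, 5' to 3'. t1 = window_end,
    t10 = window_end - 9, t11 = window_end - 10, so the cut leaves
    nucleotides 1..(window_end - 10) in the 5' fragment.
    """
    return window_end - 10


def tile_sites(first_start: int, guide_len: int, step: int, n_sites: int):
    """Return (start, end, 5' fragment length) for each tiled window."""
    sites = []
    for i in range(n_sites):
        start = first_start + i * step
        end = start + guide_len - 1
        sites.append((start, end, five_prime_fragment_length(end)))
    return sites


# Hypothetical layout: 23-nt guides tiled every 23 nt from position 1.
for start, end, frag in tile_sites(first_start=1, guide_len=23, step=23, n_sites=3):
    print(f"window {start}-{end}: 5' fragment of {frag} nt")
```

With a step equal to the guide length, successive predicted fragments differ by exactly 23 nt, which is what makes the products easy to tell apart on a denaturing gel.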
  • DISC cleaved other sites predicted to be in base-paired regions (as determined by SHAPE analysis) more efficiently, such as those targeted by gDNA-6, -8, -9, and -10.
  • gDNA-4 Q1
  • gDNA-8 Q2
  • gDNA-11 Q3
  • gDNA-6 Q4
  • Dinucleotide mismatches were introduced in the selected gDNAs at the two DNA nucleotide positions complementary to the two cleavage site ADIS 5'UTR RNA nucleotides (FIG. 12c), thereby creating "mismatched" variants of each selected gDNA.
  • DISC bound with "matched” having 100% complementarity to target ADIS 5'UTR RNA
  • gDNA-6 showed 25-fold higher specificity towards the Q4 site in the PBS domain compared with the mismatched variant (FIGs. 12d and 17).
  • DISC bound with mismatched gDNA-4 displayed low cleavage activity in the poly(A) loop, similar to DISC bound with matched gDNA-4, the low cleavage efficiency (Q1) representative.
  • Q1 cleavage efficiency
  • DISCs containing mismatched gDNA-8 and gDNA-11 did not display any detectable cleavage against the Q2 and Q3 sites, respectively (FIG. 12d).
  • DISC can be directed to a given specific sequence of a target RNA by including a gDNA complementary to the specific RNA sequence.
  • the gDNA "programs" the DISC to be specific for a sequence of the target RNA.
  • DISC (i) can be readily programmed to target different sequences without modifying the catalytic machinery, (ii) retains high specificity towards its intended target sites, and (iii) possesses no target site sequence limitations. However, to map accessible sites on long RNAs, a high-throughput approach is desirable.
  • the combined DISCs generated only four cleavage products, all of which migrated at lengths that matched those generated by separate reactions using individual DISC-gDNA combinations.
  • the fact that cleaved products did not undergo multiple cuts by different DISCs in the same reaction mixture demonstrates that the cleavage displayed single-hit kinetics.
  • FIG. 13a Another requirement for mapping accessible sites in a high-throughput manner is accurate read-out of the cleavage sites generated by multiple DISCs.
  • RT/PE reverse-transcription/primer extension analysis
  • FIG. 13a DISCs were assembled with 11 of the 14 gDNAs used in FIG. 12a spanning nucleotides 24-276, and the 11 DISCs were mixed together in a single mixture. Noise associated with large peaks corresponding to the primer and full-length product limits the applicability of this technique at the 5' and 3' termini; thus, 3 gDNAs were intentionally excluded.
  • An unlabeled HIV-1 ADIS 5'UTR substrate was added to the mixture to initiate cleavage.
  • RNA pool containing all cleavage products was used to template RT reactions using 23-nt long fluorophore-labeled primers.
  • the extended primers were subjected to capillary electrophoresis and analyzed by RiboCAT software to assign peaks and identify DISC-mediated cleavage sites.
  • the output of the analysis provided a trace of peak intensities corresponding to programmed cleavage sites, revealing accessible regions.
  • the RT/PE assay detected DISC-generated cleavage products across the HIV-1 ADIS 5'UTR substrate in 23-nt increments (FIG. 13b and 13c).
  • RNA substrates of unknown structure such as long non-coding RNAs or full-length genomic viral RNAs.
  • AGOAexN Ilel251
  • RNA and DNA oligonucleotides and polynucleotides used in this study is provided (Tables 8-11).
  • miR-20a-derived 5' phosphorylated gRNAs were chemically synthesized (Dharmacon), deprotected, and gel-purified.
  • 5' phosphorylated gDNAs were chemically synthesized (Sigma Aldrich). The sequences encoding target RNAs were cloned into a pUC19 vector and transcribed in vitro using T7 RNA polymerase.
  • DNase I-treated transcripts were gel-purified (10% polyacrylamide, 8 M urea, 1x TBE), capped using the ScriptCap m7G Capping System (CellScript) either with GTP for unlabeled targets or with [α-32P]GTP
  • DNA target was chemically synthesized (Sigma Aldrich), 5' end-labeled with T4 PNK (ThermoFisher) and [γ-32P]ATP (3000 Ci mmol⁻¹) before gel purification. Unlabeled nucleic acid concentrations were quantified by spectrophotometry at 260 nm and calculated using the molar extinction coefficient. All extinction coefficients for substrates synthesized by commercial vendors were calculated and provided by the manufacturer. The extinction coefficient used for capped miR-20a RNA targets is 587,900 L mol⁻¹ cm⁻¹.
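The concentration calculation from absorbance at 260 nm follows the Beer-Lambert law, c = A / (ε · l). A minimal Python sketch using the extinction coefficient quoted above is shown below; the absorbance reading and the 1-cm path length are hypothetical values chosen for illustration.

```python
# Sketch: nucleic acid concentration from A260 via the Beer-Lambert law.

def molar_concentration(a260: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Return concentration in mol/L, given absorbance, extinction
    coefficient (L mol^-1 cm^-1), and optical path length (cm)."""
    if epsilon <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    return a260 / (epsilon * path_cm)


# Extinction coefficient for the capped miR-20a RNA target, from the text.
EPSILON_CAPPED_MIR20A_TARGET = 587_900  # L mol^-1 cm^-1

# Hypothetical reading: A260 = 0.5879 in a 1-cm cuvette.
conc_molar = molar_concentration(0.5879, EPSILON_CAPPED_MIR20A_TARGET)
conc_nanomolar = conc_molar * 1e9
print(f"{conc_nanomolar:.1f} nM")
```

For this hypothetical reading the result is about 1 µM; any dilution made before the measurement would have to be multiplied back in.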
  • HIV-1 ADIS 5'UTR variant used in this study contained a stable GAGA tetraloop sequence in place of the dimerization initiation signal (DIS) (FIG. 15 and Table 8). HIV-1
  • ADIS 5'UTR was in vitro transcribed from a FokI-digested pUC18 vector with an upstream hammerhead ribozyme using T7 RNA polymerase.
  • DNase I-treated transcripts were gel-purified by denaturing PAGE (7% polyacrylamide, 8 M urea, 1x TBE) and visualized by UV shadowing.
  • RNA was eluted from the gel in elution buffer (500 mM ammonium acetate, 1 mM EDTA, 0.1% (w/v) SDS), ethanol precipitated, resuspended in MilliQ water, and quantified by UV absorbance at 260 nm using an extinction coefficient for quantification of 3,243,098 (L / mole*cm).
  • elution buffer 500 mM ammonium acetate, 1 mM EDTA, 0.1% (w/v) SDS
  • ethanol precipitated resuspended in MilliQ water
  • the RNA was 5' end-labeled with T4 PNK (ThermoFisher) and [γ-32P]ATP (3000 Ci mmol⁻¹) for labeled substrate or with ATP for unlabeled substrate.
  • gDNAs for experiments targeting the HIV-1 ADIS 5'UTR sequence were generated by following the workflow outlined in FIG. 16.
  • the designed gDNAs were chemically synthesized with 5' monophosphates (Sigma Aldrich), resuspended in MilliQ water, and quantified by UV absorbance at 260 nm using extinction coefficients provided by manufacturer. miR-20a-mediated cleavage assays
  • AGOAexN co-purifies with bound endogenous E. coli RNA14
  • optimal guide:protein concentrations were approximated to identify an appropriate amount of gDNA to mix with AGOAexN for biochemical assays.
  • AGOAexN (500 nM) was pre-incubated with increasing amounts of gDNA (0-100 nM) for 30 min at 25 °C followed by addition of cap-labeled miR-20a-derived target (1 nM) and shifting the temperature to 30 °C for 20 min. Reactions were quenched with formamide loading buffer and resolved by 16% denaturing PAGE (8 M urea, 1x TBE). Gels were visualized by phosphorimaging and quantified by ImageQuant (GE Healthcare).
  • AGOAexN 500 nM AGOAexN were used. AGOAexN was pre-incubated with gRNA or gDNA followed by addition of either perfectly matched cap-labeled RNA target (1 nM) or the same target but with a dinucleotide mismatch at the cleavage site. Products were resolved on 16% denaturing PAGE and gels were visualized by phosphorimaging.
  • HIV-1 ADIS 5'UTR substrate was prepared by mixing unlabeled HIV-1 ADIS 5'UTR substrate (10 nM) and trace amounts of 32P-end-labeled HIV-1 ADIS 5'UTR in 50 mM HEPES (pH 7.5). The sample was heated at 80 °C for 2 min followed by incubation at 60 °C for 4 min. MgCl2 was added to a final concentration of 10 mM and the sample was transferred to 37 °C for 6 min followed by incubation on ice for at least 30 min. Sample homogeneity was checked by 6% native PAGE (1x TB, 1 mM MgCl2) at 4 °C (FIG. 15c).
  • Cleavage assays using a mixture of gDNAs were performed similarly to the individually guided cleavage assays except that equimolar amounts of each of the selected gDNAs were pre-mixed together before adding to the reaction mixture.
  • the mixture was pre-incubated at 25 °C for 30 min to form a mixture of DISCs that would recognize different regions of the HIV-1 ADIS 5'UTR substrate. After DISC-formation, 5'-labeled HIV-1 ADIS
  • 5'UTR was added to the mixture (final concentration 1 nM) and 3-µL aliquots were removed at indicated time-points (0-60 min) and quenched with formamide dye. Products were resolved by 8% denaturing PAGE alongside an RNA marker.
  • HIV-1 ADIS 5'UTR final concentration 25 nM
  • HIV-1 ADIS 5'UTR cleavage was performed at 30 °C for 60 min. The higher concentration was used based on earlier observations that 2.5 - 5 picomoles RNA template was optimal to prime reverse transcription during the primer extension steps of the assay. After 60 min, reactions were quenched and extracted by the addition of phenol pH 6.6
  • RNA was ethanol precipitated in the presence of glycogen (2 µg) and stored as a pellet at -20 °C. Control reactions were performed similarly except either AGOAexN [AGO(-)], gDNAs [gDNA(-)], or both [AGO(-)/gDNA(-)] were excluded to identify capillary electrophoresis peaks resulting from degradation of transcript or background.
  • RNA pellets were resuspended in 9 µL MilliQ water, annealed with 2 µL of 5 µM NED™-labeled primer, and extended using SuperScript III reverse transcriptase following the manufacturer's protocol (Invitrogen) in a total reaction volume of 20 µL.
  • Remaining RNA was digested by adding 1 µL of 4 M NaOH and heating to 95 °C for 3 min. The reactions were then neutralized with 2 µL of 2 M HCl. For each sample, 3 µL of neutralized reaction was added to 17 µL MilliQ water and ethanol precipitated with 10 µg glycogen.
  • the reactivity values were scaled based on the average of the lowest 20% of peak areas in the gDNA(-) background control and then normalized by subtracting the gDNA(-) background from each and dividing the resulting values by the average of the top 10% of reactivity values. Averaged data represents the average of three independent experiments.
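The last two steps of the normalization described above (background subtraction, then division by the mean of the top 10% of values) can be sketched in Python as follows. This is one reading of the text under stated assumptions: the initial scaling against the lowest 20% of the gDNA(-) peak areas is taken as already applied to the inputs, and the peak areas are hypothetical.

```python
# Sketch: background-subtract peak areas and normalize to the top 10% mean.
# Assumes `sample` and `background` are already scaled to each other and
# aligned position by position (the scaling step is not implemented here).

def normalize_reactivities(sample, background, top_fraction=0.10):
    """Subtract gDNA(-) background, then divide by the mean of the top values."""
    if len(sample) != len(background):
        raise ValueError("sample and background must have equal length")
    diffs = [s - b for s, b in zip(sample, background)]
    n_top = max(1, round(len(diffs) * top_fraction))
    top_mean = sum(sorted(diffs, reverse=True)[:n_top]) / n_top
    if top_mean == 0:
        raise ValueError("top values average to zero; cannot normalize")
    return [d / top_mean for d in diffs]


# Hypothetical peak areas for ten positions.
sample = [10, 20, 30, 40, 50, 60, 70, 80, 90, 110]
background = [10] * 10
normalized = normalize_reactivities(sample, background)
print(normalized)
```

Dividing by the mean of the strongest peaks puts replicate experiments on a common scale, which is what allows three independent experiments to be averaged.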
  • DISC-accessible sites may be an Achilles' heel of target RNAs. Identification of these sites can provide a new therapeutic strategy aimed at targeting RNA-based diseases such as AIDS, hepatitis C, ZIKA, microcephaly, cancer, and others.
  • RNA-based diseases such as AIDS, hepatitis C, ZIKA, microcephaly, cancer, and others.
  • gDNA 5' nucleotide sequence was analyzed by altering the identity of the 5 ' nt to T, A, G or C (FIG. 18 A). Cleavage percentage in the endpoint assay indicated that gDNAs with a 5' should be used for gDNA design.
  • gDNA length was analyzed for the unstructured miR-20a target by truncating or extending the base-paired region between the guide and target strands. All gDNAs perfectly match the RNA target and were from 15-25 nt in length. The longer sequences (closer to 25 nt) produced higher levels of cleavage (FIG. 18B).
  • gDNAs were analyzed using the structured HIV-1 ADIS 5'UTR RNA target (FIG. 19A-19D). gDNAs were designed at 20-25 nt in length to target two sites on the HIV-1 ADIS 5'UTR target at sites #6 and #8 (FIG. 19B). Quantified data showed that lengths of 23 and/or 24 nt appeared to provide the best cleavage at site #6, while lengths of 22 and 23 nt appeared best for site #8. Finally, cleavage assays were performed to compare activity by DISC and RNase H against the unstructured miR-20a RNA target and the structured HIV-1 ADIS 5'UTR RNA target (FIG. 20A-20B).
  • While the quantified data showed that RNase H cleaved the unstructured miR-20a sequence slightly better, the cleavage of the structured HIV-1 ADIS 5'UTR RNA target by DISC was superior to that of RNase H. The results indicate that DISC is able to access and cleave structured regions of RNA that RNase H is unable to cleave.
  • WT HIV-1 ADIS 5'UTR (SEQ ID NO: 37):
  • HIV-1 ADIS 5'UTR SEQ ID NO: 38
  • the target region refers to the nucleotides, in order from 5' to 3', of the HIV-1 ADIS 5'UTR.
  • the sequences of the gDNAs are complementary to the listed target regions.
  • the 5' product length refers to the number of nucleotides expected after primer extension of the cleavage product.
  • the sequences, target region, and 5' product lengths are as described in Table 9.
  • the gDNA # refers to the "mismatched" (mm) sequences of the selected quartile representatives.
  • Wild-type KpAGO amino acid sequence corresponds to NCBI code:
  • KpAGO 207-1251 used in this disclosure is composed of the following amino acid sequence, which includes an N-terminal serine leftover after enzymatic tag-cleavage by Ulp1:
  • Miniature-AGO used in this disclosure is composed of the following amino acid sequence, which includes an N-terminal serine leftover after enzymatic tag-cleavage by Ulp1:
  • the nucleotide sequence encoding the polypeptide for KpAGO 207-1251 has the sequence below, which includes the codon for the N-terminal serine leftover after enzymatic Ulp1 tag-cleavage.

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Engineering & Computer Science (AREA)
  • Organic Chemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • Biotechnology (AREA)
  • Molecular Biology (AREA)
  • Mycology (AREA)
  • Microbiology (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Physics & Mathematics (AREA)
  • Plant Pathology (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Gastroenterology & Hepatology (AREA)
  • Medicinal Chemistry (AREA)
  • Immunology (AREA)
  • Analytical Chemistry (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention relates generally to compositions, systems, and methods for cleaving RNA molecules.

Description

SYSTEMS AND METHODS FOR DNA-GUIDED RNA CLEAVAGE
CROSS REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of U.S. Provisional Patent Application Serial No. 62/435,272 filed December 16, 2016 and U.S. Provisional Patent Application Serial No. 62/580,642 filed November 2, 2017, which are expressly incorporated herein by reference.
FIELD
The present invention relates generally to compositions, systems, and methods for cleaving RNA molecules.
BACKGROUND
MicroRNAs and small interfering RNAs (siRNAs) are incorporated into Argonaute proteins to assemble the RNA-induced silencing complex, RISC. The loaded RNAs, pre-organized in the nucleic acid-binding channel, serve as guides to facilitate base pairing with targets. It is well known that the RISCs open their bilobal structures to widen the intervening channel during the transition from nucleation to propagation steps of guide-target duplex formation and cleave the targets only when their sequence perfectly matches the guide. However, the significance of the proteinaceous part of RISC for this step has not been studied well due to the difficulty of making suitable constructs. Also, current methods involving the RISC complex in attenuating gene expression require an RNA oligonucleotide to facilitate target RNA cleavage.
In addition, scientists have yet to identify a sequence-specific endoribonuclease. To meet the demand for analogous RNA restriction enzymes, researchers have made great strides in modifying existing ribozymes and in developing artificial RNA-targeting nucleases with defined target sites. Previous approaches have employed hammerhead (HH) ribozymes, catalytic DNAzymes, and artificial site-specific RNA endonucleases (ASREs) capable of RNA recognition and cleavage. However, HH ribozymes and ASREs are limited by the need to re-engineer the RNA-recognition motif for each unique target of interest, and DNAzymes depend on multiple cycles of selective evolution to achieve catalysis against desired targets. Furthermore, target-accessibility is a prerequisite to RNA recognition, and thus, nuclease design is dependent on prior knowledge of any secondary structural features that the RNA may exhibit. Chemical probing methods and enzymatic strategies using RNase H have allowed researchers to gain insights into which regions of RNA are unpaired or exposed to solvent and may serve as candidate target sites for enzymatic cleavage, antisense oligonucleotide, or small-interfering RNA design.
While all these methods collectively hold promise to guide the design and development of site-directed RNases, their preparative strategies and experimental execution are costly, labor-intensive, and frequently restrict the targetable pool to single-stranded regions. Consequently, the scientific community still lacks a programmable RNA restriction enzyme that overcomes the shortcomings of existing technologies and is amenable to the high-throughput identification of cleavage-accessible sites on RNAs with complex secondary structures.
The systems and methods disclosed herein address these and other needs.
SUMMARY
Disclosed herein are methods for cleaving target RNAs using a yeast Argonaute polypeptide and a single stranded DNA as a guide sequence. The inventors have shown that a yeast Argonaute protein can utilize single-stranded DNA as a guide molecule for cleaving target RNAs.
In one aspect, disclosed herein is a DNA-guided RNA cleavage system comprising:
a yeast Argonaute polypeptide; and
a heterologous, single-stranded oligonucleotide guide molecule;
wherein the single-stranded oligonucleotide guide molecule is a DNA oligonucleotide that is complementary to a target RNA sequence.
In one aspect, provided herein is a method for cleaving an RNA molecule, comprising:
binding to a target RNA sequence a complex comprising:
a yeast Argonaute polypeptide; and
a heterologous, single-stranded oligonucleotide guide molecule;
wherein the single-stranded oligonucleotide guide molecule is a DNA oligonucleotide that is complementary to the target RNA sequence; and
wherein the Argonaute polypeptide:guide molecule complex cleaves the target RNA sequence.
In another aspect, disclosed herein is a method for attenuating expression of a target gene in a cell, comprising:
introducing into the cell a yeast Argonaute polypeptide; and
introducing into the cell a single stranded DNA (ssDNA) in an amount sufficient to attenuate expression of the target gene; wherein the ssDNA comprises a nucleotide sequence that is complementary to a nucleotide sequence of the target gene.
In another aspect, disclosed herein is a method for attenuating expression of a target gene in a cell, comprising:
introducing into the cell a complex comprising: a yeast Argonaute polypeptide and a single stranded DNA (ssDNA) in an amount sufficient to attenuate expression of the target gene; wherein the ssDNA comprises a nucleotide sequence that is complementary to a nucleotide sequence of the target gene.
In one embodiment, the yeast Argonaute polypeptide is from Vanderwaltozyma polyspora (also known as Kluyveromyces polysporus). In one embodiment, the yeast Argonaute polypeptide is selected from SEQ ID NO:31, SEQ ID NO:32, or SEQ ID NO:33. In one embodiment, the yeast Argonaute polypeptide is SEQ ID NO:32. In some embodiments, the yeast Argonaute polypeptide has at least 60% (for example, at least 60%, at least 65%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 99%) identity to SEQ ID NO:31, SEQ ID NO:32, or SEQ ID NO:33. In some embodiments, the yeast Argonaute polypeptide has at least 60% (for example, at least 60%, at least 65%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 99%) identity to SEQ ID NO:32.
In one embodiment, the single-stranded oligonucleotide guide molecule is about 12 to about 45 nucleotides. In some embodiments, the single-stranded oligonucleotide guide molecule is about 12, about 13, about 14, about 15, about 16, about 17, about 18, about 19, about 20, about 21, about 22, about 23, about 24, about 25, about 26, about 27, about 28, about 29, about 30, about 31, about 32, about 33, about 34, about 35, about 36, about 37, about 38, about 39, about 40, about 41, about 42, about 43, about 44, or about 45 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is about 12 to about 30 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is about 14 to about 26 nucleotides. In some embodiments, the single-stranded oligonucleotide guide molecule is about 21 to about 25 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is about 23 nucleotides.
In one embodiment, the target RNA sequence is from a mammal. In one embodiment, the target RNA sequence is from a human. In one embodiment, the target RNA sequence is from a virus. In one embodiment, the target RNA sequence is from a pathogen. In one embodiment, the target RNA sequence is from a bacterium. In one embodiment, the target RNA sequence is from a prokaryotic cell. In one embodiment, the target RNA sequence is from a eukaryotic cell.
Further disclosed herein are systems and methods for detecting nuclease accessibility sites in an RNA sequence. The inventors have shown that a yeast Argonaute protein can utilize single-stranded DNA as a guide molecule for, among other applications, high-throughput identification and targeting of accessible regions of highly-structured RNAs. Complexes (referred to as a DNA-induced slicing complex; or "DISC") containing an Argonaute protein which utilize single-stranded DNA as a guide molecule have advantages over complexes containing an Argonaute protein which utilize single-stranded RNA ("RISC") due to the increased stability and significantly lower cost of DNA over RNA, making large-scale high-throughput applications more feasible.
In one aspect, disclosed herein is a method of detecting nuclease accessibility sites in an RNA sequence, the method comprising a) binding to a target RNA sequence a complex comprising a yeast Argonaute polypeptide and a first single-stranded DNA oligonucleotide guide molecule, wherein the single-stranded DNA oligonucleotide guide molecule is complementary to the target RNA sequence; b) cleaving the target RNA sequence with the Argonaute polypeptide:guide complex to form an RNA cleavage product; c) detecting the RNA cleavage product; and d) determining a nuclease accessibility site based on the RNA cleavage product.
In another aspect, disclosed herein is a method of high-throughput detection of nuclease accessibility sites, the method comprising a) assaying a target RNA sequence with two or more Argonaute polypeptide: guide complexes, wherein each complex comprises a yeast Argonaute polypeptide and a single-stranded DNA oligonucleotide guide molecule from a library of single-stranded DNA oligonucleotide guide molecules, wherein each single-stranded DNA oligonucleotide guide molecule is complementary to a portion of the target RNA sequence; b) cleaving the target RNA sequence with the Argonaute polypeptide: guide complexes to form at least one RNA cleavage product; c) detecting the at least one RNA cleavage product; and d) determining a nuclease accessibility site based on the at least one RNA cleavage product.
In a further aspect, disclosed herein is a DNA-guided RNA cleavage system for high-throughput detection of nuclease accessibility sites, the system comprising a first complex comprising a first yeast Argonaute polypeptide and a first single-stranded DNA oligonucleotide guide molecule; and a second complex comprising a second yeast Argonaute polypeptide and a second single-stranded DNA oligonucleotide guide molecule; wherein the first and second single-stranded DNA oligonucleotide guide molecules are not identical and are complementary to a target RNA sequence.
In another aspect, disclosed herein is a kit comprising a vector comprising: a nucleic acid sequence encoding a yeast Argonaute polypeptide operably linked to a promoter; an RNA-dependent DNA polymerase; a set of buffered RNA cleavage reagents; and a set of buffered reverse transcription reagents.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying figures, which are incorporated in and constitute a part of this specification, illustrate several aspects described below.
FIGS. 1A-1H show cleavage activity of mini-AGO. FIG. 1a, Domain architectures of K. polysporus Argonaute (wildtype, Ago1; and truncated, AGO) and its miniature Argonaute (mini-AGO) as well as the previously crystallized construct of Neurospora crassa QDE-2 C-terminal lobe (PDB accession code 2YHA). FIG. 1b, Sequence alignment of conserved RxxxGxxG (Argonaute clade) and GxxG (PIWI clade) motifs in the N domain of Argonaute family proteins. Kluyveromyces polysporus (Kp), Neurospora crassa (Nc), Schizosaccharomyces pombe (Sp), Homo sapiens (Hs), Arabidopsis thaliana (At), Drosophila melanogaster (Dm), Bombyx mori (Bm), Mus musculus (Mm). FIG. 1c, Nuclease-sensitivity of co-purifying nucleic acid. Polynucleotides were extracted from indicated samples, end-labelled, and either untreated or incubated with RNase (R) or DNase (D) before analysis by denaturing PAGE with a hydrolyzed marker (nt, nucleotide). FIG. 1d, Cleavage activity of mini-AGO. AGO and mini-AGO were pre-incubated with miR-20a, and mixed with cap-labelled (red) target RNAs matching the guide strand. Reactions were resolved by denaturing PAGE alongside RNA of known length. FIG. 1e, Analysis of cleavage product length. Products generated as in Fig. 1d were resolved on a sequencing gel alongside a hydrolyzed marker. FIG. 1f, Schematic of target (top) and guide (bottom) strands. RNA or DNA guides matched with either 5' capped RNA or 5' phosphorylated DNA targets used in Fig. 1g and Fig. 1h. FIG. 1g, Cleavage activity of a DNA-induced silencing complex (DISC). Cleavage assays were performed as in Fig. 1d except that AGO was pre-incubated with either a 5' phosphorylated RNA or DNA guide before addition of end-labelled RNA or DNA targets depicted in Fig. 1f. Data represent mean ± standard deviation (s.d.) (n=3). FIG. 1h, Cleavage activity of a mini-DISC (DNA-programmed mini-AGO). Cleavage assays were performed as in Fig. 1g except with mini-AGO.
FIGS. 2A-2D show the recognition of the seed region and catalytic assembly. FIG. 2a, Crystal structure of mini-AGO with N (cyan), L2 (grey), MID (orange), and PIWI (green) domains in ribbon representation. The bound RNA is drawn as a stick model (red). FIG. 2b, Interaction of the RxxxGxxG motif with the PIWI domain. Hydrogen bonds shown as dashed lines. Residues in the N- (cyan) and PIWI (green) domains are drawn as stick models. Water molecules are shown as red spheres. FIG. 2c, MiRNA seed is recognized by the MID and PIWI domain residues along the sugar phosphate backbone. 5' binding pocket in the MID domain anchors phosphate group at position 1. Hydrogen bonds are indicated by dashed lines. Color codes of atoms are as follows: phosphate, yellow; sulfur, orange; nitrogen, blue; oxygen, red; 2'-oxygen, white. FIG. 2d, Catalytically active conformation of mini-AGO. Superposition of AGO from K. polysporus (white) or mini-AGO (green) reveals fully assembled active site with plugged-in glutamate finger.
FIGS. 3A-3D show the reconstitution of in vitro RNAi by mini-AGO. FIG. 3a, Passenger strand cleavage assays. SiRNA duplexes labelled at the 5' end of the passenger were incubated with either AGO or mini-AGO. FIG. 3b, In vitro execution of all stages in the RNAi pathway. Unlabeled siRNA duplexes were pre-incubated with either AGO or mini-AGO. The complex was then incubated with the cap-labelled, matched target RNAs. FIG. 3c, Matched and mismatched targets. Both targets (top) were cap-labeled (red). The mismatched target contained a dinucleotide mismatch at t10 and t11 (blue) against miR-20a guide (bottom). FIG. 3d, Tolerance of target cleavage by mini-AGO to mismatches. AGO and mini-AGO were programmed with single-stranded miR-20a and incubated with the matched or mismatched target shown in Figure 3c. All reactions were separated by 16% denaturing PAGE.
FIGS. 4A-4E show the discrimination of guide:target pairs between AGO and mini-AGO. FIG. 4a, FIG. 4b, Cleavage of the perfectly matched target with guide RNAs of different lengths. Either of the guide RNAs (10, 11, 12, 13, 14, 16, 23 nt) was loaded into AGO (a) and mini-AGO (b). Relative cleavage was calculated using Equation 2. FIG. 4c, FIG. 4d, Cleavage of the mismatched target with guide RNAs of different length. The target was the same in Fig. 4a and 4b except for the t10-t11 step mismatches. The same guides used in Fig. 4a and 4b were loaded into AGO (c) and mini-AGO (d). Relative cleavage was calculated using Equation 2. FIG. 4e, Reliance on the different seed regions for target cleavage by AGO and mini-AGO. Mismatched dinucleotides on the 14-nt guide are indicated in boxes. Relative cleavage was calculated using Equation 3. Data represent mean ± s.d. (n=3).
FIGS. 5A-5D show the design of mini-AGO. FIG. 5a, Bilobal structure of AGO from K. polysporus (PDB accession code 4F1N). The C-terminal lobe is composed of the MID (orange) and PIWI (green) domains along with β1 (cyan) and β20 (dark grey). For clarity, the N-terminal lobe is colored in light grey. FIG. 5b, Extended β-strands in the PIWI domain. The color codes are the same as in (a). FIG. 5c, Strategy for designing a mini-AGO construct. Catalytic and conserved RxxxGxxG residues are circled and labelled in red and cyan, respectively (amino acid residues are abbreviated as follows: D, aspartate; E, glutamate; G, glycine; R, arginine). FIG. 5d, Amino acid sequence and secondary structure of mini-AGO segment located at the interface of the N-terminal and C-terminal lobes. Conserved RxxxGxxG motif is underscored.
FIG. 6 shows contribution of conserved RxxxGxxG motif to stability. Effect of point mutations to the RxxxGxxG motif on the solubility of the C-terminal-lobe construct. After lysis, the soluble (S) and precipitated (P) fractions were separated by centrifugation and resolved by SDS-PAGE. The bands of SUMO-tag fused mini-AGO are indicated with an arrowhead.
FIGS. 7A-7E show RNAs co-purified and crystallized with mini-AGO. FIG. 7a, Profile of size-exclusion chromatography of mini-AGO. Absorbance values at 254 and 280 nm are colored in red and blue, respectively. FIG. 7b, FIG. 7c, Fo-Fc omit map contoured at 2.5σ around the bound guide RNA. The omit map is shown with the ribbon model of mini-AGO (wheat) (b) and with the final RNA model (red) (c). FIG. 7d, Result of superposing mini-AGO and AGO shows alignment of mini-AGO-bound guide RNA nucleotides 2-7 (red) on AGO-bound ones 2-8 (white, PDB accession code 4F1N). FIG. 7e, RNAs were extracted from mini-AGO crystals, 5' end-labelled, and resolved by denaturing PAGE alongside RNAs of known length.
FIGS. 8A-8G show reconstitution of in vitro RNAi by mini-AGO. FIG. 8a, Schematic of duplex loading, passenger cleavage, and target recognition and cleavage by mini-AGO. FIG. 8b, Preparation of siRNA duplex used in passenger strand cleavage assays. The 5'-end label of the passenger strand is colored in red. Annealed siRNA duplex was resolved on 20% native PAGE alongside 23-nt single-stranded passenger. Gel was visualized by phosphorimaging. FIG. 8c, FIG. 8d, Passenger strand cleavage by AGO (c) or mini-AGO (d) from Figure 3a were quantified and plotted using Equation 1. Dashed line shows amount of the residual passenger strand after duplex formation seen in (b) (2.6%). FIG. 8e, Preparation of unlabeled siRNA duplex used in siRNA-mediated target RNA cleavage assays. Annealed siRNA duplex was resolved as in (b). Gel was visualized by SybrGold staining. FIG. 8f, FIG. 8g, SiRNA-mediated target RNA cleavage by AGO (f) or mini-AGO (g) from Figure 3b was quantified and plotted using Equation 1. Data represent mean ± s.d. (n=3).
FIGS. 9A-9B show recognition of mismatches at the cleavage site. FIG. 9a, FIG. 9b, Guide-mediated mismatched target cleavage by AGO (a) and mini-AGO (b) from Figure 3d were quantified. Cleavage percentages were calculated using Equation 4. Data represent mean ± s.d. (n=3).
FIGS. 10A-10D show cleavage of targets guided by atypically short guides. FIG. 10a, Schematic of miR-20a RNA guides trimmed at their 3' ends used for guide-mediated cleavage assays shown in Figure 4a-d. FIG. 10b, Schematic of programming mini-AGO with ssRNA guides before adding the 60-nt target strand. Cap-label shown as yellow circle. FIG. 10c, Secondary structure prediction and free energy calculation of two single-stranded RNAs with guide (red) and target (blue). For clarity, the first nucleotide of the guide is not shown. FIG. 10d, Model of guide:target pairing on mini-AGO. Guide and target colored as in (c). Stable and unstable base pairs between guide and target are shown as black solid lines and dashed grey lines, respectively.
FIGS. 11A-11D show DISC-mediated RNA cleavage activity. (FIG. 11a) Schematic of cleavage assay. AGOΔexN was programmed with either a 5' monophosphorylated gRNA or gDNA followed by addition of a perfectly (100%) complementary RNA or DNA target (yellow circle indicates 32P-phosphate). (FIG. 11b) Combinations of RNA and DNA guide:target pairs assayed for AGOΔexN cleavage activity. Bottom strand (guide); top strand (target); yellow (p) indicates 32P-radiolabel on target. Complete 60-nt target sequences are shown in Table 1. (FIG. 11c) RNA or DNA target cleavage activity of AGOΔexN programmed with a gRNA or gDNA. Cleavage activity was plotted relative to RNA target cleavage when AGOΔexN was loaded with gRNA. Average of three experiments is shown as a bar with individual replicates plotted as circles. Boxed inset shows expanded view of low level DNA target cleavage. (FIG. 11d) Mismatch sensitivity of gRNA- or gDNA-dependent RNA cleavage. The matched RNA target was the same as that used in FIG. 11b. The mismatched RNA target included an unpaired dinucleotide (bold) pairing to the guide positions 10 and 11. AGOΔexN programmed with either gRNA or gDNA was incubated with the 5' cap-labeled matched or mismatched RNA target. The reaction was resolved on 16% denaturing PAGE.
FIGS. 12A-12F show cleavage of highly-structured viral RNA by DISC. (FIG. 12a) Predicted secondary structure of HIV-1 ΔDIS 5'UTR. The position of each 23-nt gDNA-targeted sequence is indicated along with the corresponding gDNA# in parentheses (Table 3). Segments are colored in alternating black and purple for clarity. Shaded circles highlight the 3' nt of each target segment that does not pair to the gDNA (FIG. 12c). Triangles in FIG. 12a indicate cleavage sites on the RNA and coloring of the triangles reflects cleavage site reactivity, as shown in the scale to the right side of FIG. 12b. (FIG. 12b) Results of cleavage assays using the gDNAs complementary to each 23-nt segment shown in FIG. 12a. Averages of three independent experiments are shown as bars and individual replicates are plotted as circles. Inset shows expanded view of low-level cleavage by gDNA1-3. Color scale bar indicates reactivity, which is grouped into quartiles based on percent target cleaved (Q1, 0-12.5%; Q2, 12.5-25%; Q3, 25-37.5%; Q4, 37.5-50%). (FIG. 12c) Schematic of guide:target pairs used in mismatch assay. gDNA-4, -6, -8 and -11 served as representatives from each quartile. Perfectly matched guide:target pairs in the nucleotide sequences at the left side of FIG. 12c are shown alongside dinucleotide mismatched counterparts in the nucleotide sequences at the right side of FIG. 12c. Mutated nucleotides at positions 10 and 11 are shown in bold and colored black on gDNA of mismatched pairs. (FIG. 12d) Sensitivity of DISC to mismatches. Cleavage of matched or mismatched target using gDNA-4, -6, -8 and -11 was quantified as in FIG. 12b. (FIG. 12e, FIG. 12f) Schematic (e) and a gel image (f) of HIV-1 ΔDIS 5'UTR cleavage by a mixture of gDNA-4, -6, -8 and -11 (lanes 7-13). The individual gDNA reactions are run in lanes 3-6.
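As a hedged illustration of the reactivity scale described for FIG. 12b, the following Python sketch assigns a percent-cleaved value to one of the four quartiles. The function name, the assignment of boundary values to the lower quartile, and the rejection of values outside 0-50% are assumptions made for illustration; they are not specified in this disclosure.

```python
def reactivity_quartile(percent_cleaved: float) -> str:
    """Map percent target cleaved onto the quartile labels Q1-Q4.

    Quartile ranges follow the legend of FIG. 12b:
    Q1, 0-12.5%; Q2, 12.5-25%; Q3, 25-37.5%; Q4, 37.5-50%.
    Assumption: a value exactly on a boundary is placed in the lower quartile.
    """
    if percent_cleaved < 0.0:
        raise ValueError("percent cleaved cannot be negative")
    upper_bounds = [(12.5, "Q1"), (25.0, "Q2"), (37.5, "Q3"), (50.0, "Q4")]
    for upper, label in upper_bounds:
        if percent_cleaved <= upper:
            return label
    # Assumption: the scale stops at 50%, so larger values are rejected.
    raise ValueError("value lies outside the 0-50% reactivity scale")
```

For example, a guide that cleaves 30% of the target would fall into Q3 under these assumptions.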
FIGS. 13A-13C show high-throughput mapping of accessible sites on HIV-1 RNA. (FIG. 13a) Schematic of steps involved in batch-cleavage by DISC on HIV-1 ΔDIS 5'UTR RNA substrate followed by RT/PE and capillary electrophoresis analysis. Refer to Materials and Methods for detailed experimental procedure. (FIG. 13b) Electropherogram of arbitrary reactivity units of assorted DISC-mediated cleavage with gDNA-2 through -12. The data were analyzed by RiboCAT. Traces from three independent experiments show consistency and reproducibility of the method. gDNA# used for cleavage is shown above each peak. (FIG. 13c) Trace showing average of three independent experiments.
FIGS. 14A-14C show gDNA-dependent RNA cleavage by a truncated K. polysporus AGO1 variant. (FIG. 14a) Domain architecture of yeast AGO. The four conserved domains and two linker regions: N (cyan), Linker 1 (black line), PAZ (violet), Linker 2 (black line), MID (orange), and PIWI (green). In the present study, a truncated K. polysporus AGO1 variant lacking the first 206 residues (AGOΔexN) was used which retains comparable RNAi activity as wild-type (WT) AGO1. (FIG. 14b, FIG. 14c) In vitro RNA cleavage by DISC. AGOΔexN (500 nM) was mixed with increasing amounts of miR-20a-derived gDNA before adding 1 nM 5' end-labeled 60-nt target. A representative gel is shown in (b). Cleavage products were plotted as a function of gDNA concentration in (c). Data points represent the average of three independent experiments with error bars representing S.D.
FIGS. 15A-15C show HIV-1 5' UTR substrate. (FIG. 15a) Predicted secondary structure of WT HIV-1 5'UTR RNA (nt 1-356) based on SHAPE analysis. Dimerization Initiation Signal (DIS) is shown in red (nt 256-264). (FIG. 15b) Predicted secondary structure of HIV-1 ΔDIS 5'UTR used in this study; DIS is replaced with a GAGA tetraloop. Residue numbering throughout this study follows the mutated ΔDIS 5'UTR construct. (FIG. 15c) Evaluation of HIV-1 ΔDIS 5'UTR sample homogeneity. After folding (see Materials and Methods), 32P-labeled HIV-1 ΔDIS 5'UTR was resolved on 6% native PAGE supplemented with 1 mM MgCl2.
FIG. 16 shows the workflow to generate gDNAs for systematic analysis with assorted DISCs. The sequence of target RNA (HIV-1 ΔDIS 5'UTR) is used as the input. Target RNA is first converted from RNA to DNA followed by generation of the reverse complement, which is divided into 23-nt fragments from its 5' end. Each gDNA 5' nt is changed to T, as previously reported for human AGO2.
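The workflow described for FIG. 16 can be sketched in Python as follows. This is a minimal illustration only: the function name and parameters are hypothetical, and dropping a trailing fragment shorter than the guide length is an assumption, since the handling of leftover nucleotides is not stated here.

```python
def design_gdnas(target_rna: str, length: int = 23) -> list[str]:
    """Tile the reverse complement of an RNA target into DNA guide sequences.

    All sequences are written 5' to 3'.
    """
    # Step 1: convert the RNA sequence to the DNA alphabet (U -> T).
    dna = target_rna.upper().replace("U", "T")

    # Step 2: generate the reverse complement of the DNA sequence.
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    revcomp = "".join(complement[base] for base in reversed(dna))

    # Step 3: divide the reverse complement into fragments from its 5' end.
    # Assumption: an incomplete fragment at the 3' end is discarded.
    fragments = [
        revcomp[start:start + length]
        for start in range(0, len(revcomp) - length + 1, length)
    ]

    # Step 4: change the 5' nucleotide of each guide to T.
    return ["T" + fragment[1:] for fragment in fragments]
```

With a toy 10-nt input and 5-nt guides, `design_gdnas("AUGGCCAUUG", length=5)` yields `["TAATG", "TCCAT"]`. Because the 5' nucleotide is always set to T, the 3' nucleotide of a target segment is not necessarily paired to its guide.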
FIG. 17 shows a representative gel of HIV-1 ΔDIS 5'UTR cleavage by DISC. (a) Denaturing PAGE (8%) showing resolved cleavage products depicted in FIG. 12b alongside an RNA marker. (b) 8% denaturing PAGE showing results of dinucleotide mismatch assay used in FIG. 12d.
FIGS. 18A-18B show gDNAs using unstructured miR-20a-derived RNA target. FIG. 18A. gDNA 5' nucleotide sequence was analyzed by altering the identity of the 5' nt to T, A, G or C. Cleavage percentage in the endpoint assay indicated that gDNAs with a 5' T should be used for gDNA design. FIG. 18B. gDNA length was investigated for the unstructured miR-20a target by truncating or extending the base-paired region between the guide and target strands. All gDNAs perfectly match the RNA target. gDNAs tested range from 15-25 nt in length.
FIGS. 19A-19D show gDNAs using structured HIV-1 ΔDIS 5'UTR RNA target. FIG. 19A. gDNAs were designed at 20-25 nt in length to target two sites on the HIV-1 ΔDIS 5'UTR target at sites #6 and #8. Representative gel of cleavage assay showing substrates resolved from cleavage products by denaturing urea PAGE (8%). FIG. 19B. Schematic showing partial secondary structure of HIV-1 ΔDIS 5'UTR and sites targeted by gDNAs #6 and #8. Color scale bar indicates cleavage reactivity grouped into 12.5% windows. FIG. 19C, FIG. 19D. Quantified data showing cleavage percentages shown in Fig. 2A for gDNA#6 (FIG. 19C) and gDNA#8 (FIG. 19D). Black bar represents the mean of three independent experiments and gray dots indicate cleavage percentage of each replicate. Student's t-test P-values are indicated. 23 nt gDNAs with 5' T were designed for gDNAs targeting HIV-1 ΔDIS 5'UTR.
FIGS. 20A-20B show cleavage assay comparing activity by DISC and RNase H against unstructured miR-20a RNA target and structured HIV-1 ΔDIS 5'UTR RNA target. FIG. 20A. Quantified data of cleavage of unstructured miR-20a target by DISC (solid circles) and RNase H (open triangles). Black bar represents the mean of three independent experiments. Circles and triangles represent individual replicates. FIG. 20B. Quantified data of cleavage of structured HIV-1 ΔDIS 5'UTR RNA target by DISC (solid circles) and RNase H (open triangles). Black bar represents the mean of three independent experiments. Circles and triangles represent individual replicates. The results indicate that DISC is able to access and cleave structured regions of RNA that RNase H is unable to cleave.
DETAILED DESCRIPTION
Disclosed herein are methods for cleaving target RNAs using a yeast Argonaute polypeptide and a single stranded DNA as a guide sequence. The inventors have shown that a yeast Argonaute protein can utilize single-stranded DNA as a guide molecule for cleaving target RNAs. Also disclosed herein are systems and methods for detecting nuclease accessibility sites in an RNA sequence. The inventors have further shown that a yeast Argonaute protein can utilize single-stranded DNA as a guide molecule for, among other applications, high-throughput identification and targeting of accessible regions of highly-structured RNAs. Complexes (referred to as a DNA-induced slicing complex; or "DISC") containing an Argonaute protein which utilize single-stranded DNA as a guide molecule have advantages over complexes containing an Argonaute protein which utilize single-stranded RNA ("RISC") due to the increased stability and significantly lower cost of DNA over RNA, making large-scale high-throughput applications more feasible.
Reference will now be made in detail to the embodiments of the invention, examples of which are illustrated in the drawings and the examples. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood to one of ordinary skill in the art to which this invention belongs. The following definitions are provided for the full understanding of terms used in this specification.
Terminology
As used herein, the articles "a," "an," and "the" mean "at least one," unless the context in which the article is used clearly indicates otherwise. The term "about" as used herein when referring to a measurable value such as an amount, a percentage, and the like, is meant to encompass variations of ±20%, ±10%, ±5%, or ±1% from the measurable value.
As used herein, the terms "may," "optionally," and "may optionally" are used interchangeably and are meant to include cases in which the condition occurs as well as cases in which the condition does not occur. Thus, for example, the statement that a composition "optionally includes a second component" is meant to include cases in which the composition includes a second component as well as cases in which the composition does not include a second component.
The term "comprising" and variations thereof as used herein is used synonymously with the terms "including," "containing," and variations thereof and are open, non-limiting terms. Although the terms "comprising," "including," and "containing" have been used herein to describe various embodiments, the terms "consisting essentially of" and "consisting of" can be used in place of "comprising," "including," and "containing" to provide for more specific embodiments and are also disclosed.
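To make the arithmetic behind the term "about" concrete, the following Python sketch checks whether a measured value lies within a chosen percentage of a stated value. The function name and the default of 20% (the widest variation listed above) are illustrative assumptions, not part of the definition.

```python
def is_about(value: float, stated: float, tolerance_percent: float = 20.0) -> bool:
    """Return True if value lies within +/- tolerance_percent of the stated value."""
    # Allowed deviation is a percentage of the stated (reference) value.
    allowed = abs(stated) * tolerance_percent / 100.0
    return abs(value - stated) <= allowed
```

For instance, under a ±10% reading, 25 is within "about 23" (allowed deviation 2.3) whereas 26 is not; under a ±20% reading (allowed deviation 4.6), 27 is within "about 23".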
The term "nucleic acid" as used herein means a polymer composed of nucleotides, e.g. deoxyribonucleotides or ribonucleotides.
The terms "ribonucleic acid" and "RNA" as used herein mean a polymer composed of ribonucleotides.
The terms "deoxyribonucleic acid" and "DNA" as used herein mean a polymer composed of deoxyribonucleotides.
The term "oligonucleotide" denotes single- or double-stranded nucleotide multimers of from about 2 to up to about 100 nucleotides in length. Suitable oligonucleotides may be prepared by the phosphoramidite method described by Beaucage and Carruthers, Tetrahedron Lett., 22:1859-1862 (1981), or by the triester method according to Matteucci, et al., J. Am. Chem. Soc., 103:3185 (1981), both incorporated herein by reference, or by other chemical methods using either a commercial automated oligonucleotide synthesizer or VLSIPS™ technology. When oligonucleotides are referred to as "double-stranded," it is understood by those of skill in the art that a pair of oligonucleotides exist in a hydrogen-bonded, helical array typically associated with, for example, DNA. In addition to the 100% complementary form of double-stranded oligonucleotides, the term "double-stranded," as used herein is also meant to refer to those forms which include such structural features as bulges and loops, described more fully in such biochemistry texts as Stryer, Biochemistry, Third Ed., (1988), incorporated herein by reference for all purposes. A single-stranded oligonucleotide can exist as a linear molecule without any hydrogen-bonded nucleotides, or can fold three-dimensionally to form hydrogen bonds between individual nucleotides along the single stranded oligonucleotide.
The term "polynucleotide" refers to a single or double stranded polymer composed of nucleotide monomers. Polynucleotides can be any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof. Polynucleotides may have any three-dimensional structure, and may perform any function, known or unknown. The following are non-limiting examples of polynucleotides: a gene or gene fragment, exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers. A polynucleotide may comprise modified nucleotides, such as methylated nucleotides and nucleotide analogs. If present, modifications to the nucleotide structure may be imparted before or after assembly of the polymer. The sequence of nucleotides may be interrupted by non-nucleotide components. A polynucleotide may be further modified after polymerization, such as by conjugation with a labeling component. A polynucleotide is composed of a specific sequence of four nucleotide bases: adenine (A); cytosine (C); guanine (G); thymine (T); and uracil (U) for thymine (T) when the polynucleotide is RNA. Thus, the term "polynucleotide sequence" is the alphabetical representation of a polynucleotide molecule. In some embodiments, the polynucleotide is composed of nucleotide monomers of generally greater than 100 nucleotides in length and up to about 8,000 or more nucleotides in length.
The term "polypeptide" refers to a compound made up of a single chain of D- or L- amino acids or a mixture of D- and L-amino acids joined by peptide bonds.
The term "complementary" or "complementarity" refers to the topological compatibility or matching together of interacting surfaces of two molecules (e.g., a probe molecule and its target, particularly a DNA guide molecule and a target RNA molecule). Thus, the two molecules (e.g., target and its probe) can be described as complementary, and furthermore, the contact surface characteristics are complementary to each other. In the case of nucleotides or polynucleotides (e.g., DNA or RNA), the two molecules are complementary if they have sufficiently compatible nucleotide base-pairs such that the two molecules can hybridize. The term "complementary," as it relates to nucleotide molecules (e.g., nucleotides, oligonucleotides, polynucleotides, modified nucleotides, etc.), is intended to include two or more nucleotide molecules which have 100% complementarity (e.g., each nucleotide in a sequence of one molecule is the nucleotide base-pair complement of an adjacent nucleotide in a sequence of the second molecule, in sequential order) as well as two or more nucleotide molecules which have less than 100% complementarity but which hybridize under the conditions of the methods disclosed herein.
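By way of illustration only, the degree of complementarity between a DNA guide and an RNA target site can be scored as in the following minimal Python sketch. The function name, the gap-free antiparallel alignment of equal-length strands, and the restriction to Watson-Crick pairs are assumptions made for this example and are not part of the disclosure.

```python
# Watson-Crick pairs between a DNA guide base and an RNA target base.
DNA_RNA_PAIRS = {("A", "U"), ("T", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(guide_dna: str, target_rna: str) -> float:
    """Percent of guide positions that base-pair with the target.

    Both sequences are written 5'->3'. The strands are aligned antiparallel
    and without gaps, so they must have the same length (an assumption of
    this sketch).
    """
    guide = guide_dna.upper()
    target = target_rna.upper()
    if not guide or len(guide) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    # Reverse the target so that guide position i faces its antiparallel partner.
    paired = sum((g, t) in DNA_RNA_PAIRS for g, t in zip(guide, reversed(target)))
    return 100.0 * paired / len(guide)
```

For instance, the guide 5'-TACG-3' is fully complementary to the target 5'-CGUA-3', whereas a single mismatch in a four-nucleotide duplex lowers the score to 75%.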
The term "hybridization" or "hybridizes" refers to a process of establishing a non-covalent, sequence-specific interaction between two or more complementary strands of nucleic acids into a single hybrid, which in the case of two strands is referred to as a duplex.
The term "anneal" refers to the process by which a single-stranded nucleic acid sequence pairs by hydrogen bonds to a complementary sequence, forming a double-stranded nucleic acid sequence, including the reformation (renaturation) of complementary strands that were separated by heat (thermally denatured).
The term "melting" refers to the denaturation of a double-stranded nucleic acid sequence due to high temperatures, resulting in the separation of the double strand into two single strands by breaking the hydrogen bonds between the strands.
The term "target" refers to a molecule that has an affinity for a given probe. Targets may be naturally-occurring or man-made molecules. Also, they can be employed in their unaltered state or as aggregates with other species.
The term "promoter" or "regulatory element" refers to a region, or to sequence determinants, located upstream or downstream from the start of transcription and which are involved in recognition and binding of RNA polymerase and other proteins to initiate transcription. Promoters need not be of bacterial origin; for example, promoters derived from viruses or from other organisms can be used in the compositions, systems, or methods described herein. The term "regulatory element" is intended to include promoters, enhancers, internal ribosomal entry sites (IRES), and other expression control elements (e.g., transcription termination signals, such as polyadenylation signals and poly-U sequences). Such regulatory elements are described, for example, in Goeddel, Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif. (1990). Regulatory elements include those that direct constitutive expression of a nucleotide sequence in many types of host cell and those that direct expression of the nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences). A tissue-specific promoter may direct expression primarily in a desired tissue of interest, such as muscle, neuron, bone, skin, blood, specific organs (e.g., liver, pancreas), or particular cell types (e.g., lymphocytes). Regulatory elements may also direct expression in a temporal-dependent manner, such as in a cell-cycle dependent or developmental stage-dependent manner, which may or may not also be tissue or cell-type specific. In some embodiments, a vector comprises one or more pol III promoters (e.g., 1, 2, 3, 4, 5, or more pol III promoters), one or more pol II promoters (e.g., 1, 2, 3, 4, 5, or more pol II promoters), one or more pol I promoters (e.g., 1, 2, 3, 4, 5, or more pol I promoters), or combinations thereof. Examples of pol III promoters include, but are not limited to, U6 and H1 promoters.
Examples of pol II promoters include, but are not limited to, the retroviral Rous sarcoma virus (RSV) LTR promoter (optionally with the RSV enhancer), the cytomegalovirus (CMV) promoter (optionally with the CMV enhancer) [see, e.g., Boshart et al., Cell, 41:521-530 (1985)], the SV40 promoter, the dihydrofolate reductase promoter, the β-actin promoter, the phosphoglycerate kinase (PGK) promoter, and the EF1α promoter. Also encompassed by the term "regulatory element" are enhancer elements, such as WPRE; CMV enhancers; the R-U5' segment in LTR of HTLV-I (Mol. Cell. Biol., Vol. 8(1), p. 466-472, 1988); SV40 enhancer; and the intron sequence between exons 2 and 3 of rabbit β-globin (Proc. Natl. Acad. Sci. USA., Vol. 78(3), p. 1527-31, 1981). It will be appreciated by those skilled in the art that the design of the expression vector can depend on such factors as the choice of the host cell to be transformed, the level of expression desired, etc.
The term "recombinant" refers to a human manipulated nucleic acid (e.g. polynucleotide) or a copy or complement of a human manipulated nucleic acid (e.g. polynucleotide), or if in reference to a protein (i.e, a "recombinant protein"), a protein encoded by a recombinant nucleic acid (e.g. polynucleotide). In embodiments, a recombinant expression cassette comprising a promoter operably linked to a second nucleic acid (e.g. polynucleotide) may include a promoter that is heterologous to the second nucleic acid (e.g. polynucleotide) as the result of human manipulation (e.g., by methods described in Sambrook et al, Molecular Cloning— A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y., (1989) or Current Protocols in Molecular Biology Volumes 1-3, John Wiley & Sons, Inc. (1994-1998)). In another example, a recombinant expression cassette may comprise nucleic acids (e.g. polynucleotides) combined in such a way that the nucleic acids (e.g. polynucleotides) are extremely unlikely to be found in nature. For instance, human manipulated restriction sites or plasmid vector sequences may flank or separate the promoter from the second nucleic acid (e.g. polynucleotide). One of skill will recognize that nucleic acids (e.g. polynucleotides) can be manipulated in many ways and are not limited to the examples above.
The term "expression cassette" refers to a nucleic acid construct, which when introduced into a host cell, results in transcription and/or translation of a RNA or polypeptide, respectively. In embodiments, an expression cassette comprising a promoter operably linked to a second nucleic acid (e.g. polynucleotide) may include a promoter that is heterologous to the second nucleic acid (e.g. polynucleotide) as the result of human manipulation (e.g., by methods described in Sambrook et al, Molecular Cloning— A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y., (1989) or Current Protocols in Molecular Biology Volumes 1 -3, John Wiley & Sons, Inc. (1994-1998)). In some embodiments, an expression cassette comprising a terminator (or termination sequence) operably linked to a second nucleic acid (e.g. polynucleotide) may include a terminator that is heterologous to the second nucleic acid (e.g. polynucleotide) as the result of human manipulation. In some embodiments, the expression cassette comprises a promoter operably linked to a second nucleic acid (e.g. polynucleotide) and a terminator operably linked to the second nucleic acid (e.g. polynucleotide) as the result of human manipulation. In some embodiments, the expression cassette comprises an endogenous promoter. In some embodiments, the expression cassette comprises an endogenous terminator. In some embodiments, the expression cassette comprises a synthetic (or non-natural) promoter. In some embodiments, the expression cassette comprises a synthetic (or non-natural) terminator.
The terms "identical" or percent "identity," in the context of two or more nucleic acids or polypeptide sequences, refer to two or more sequences or subsequences that are the same or have a specified percentage of amino acid residues or nucleotides that are the same (i.e., about 60% identity, preferably 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or higher identity over a specified region when compared and aligned for maximum correspondence over a comparison window or designated region) as measured using a BLAST or BLAST 2.0 sequence comparison algorithm with default parameters described below, or by manual alignment and visual inspection (see, e.g., NCBI web site or the like). Such sequences are then said to be "substantially identical." This definition also refers to, or may be applied to, the complement of a test sequence. The definition also includes sequences that have deletions and/or additions, as well as those that have substitutions. As described below, the preferred algorithms can account for gaps and the like. Preferably, identity exists over a region that is at least about 10 amino acids or 20 nucleotides in length, or more preferably over a region that is 10-50 amino acids or 20-50 nucleotides in length. As used herein, percent (%) amino acid sequence identity is defined as the percentage of amino acids in a candidate sequence that are identical to the amino acids in a reference sequence, after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent sequence identity. Alignment for purposes of determining percent sequence identity can be achieved in various ways that are within the skill in the art, for instance, using publicly available computer software such as BLAST, BLAST-2, ALIGN, ALIGN-2 or Megalign (DNASTAR) software.
Appropriate parameters for measuring alignment, including any algorithms needed to achieve maximal alignment over the full-length of the sequences being compared can be determined by known methods.
For sequence comparisons, typically one sequence acts as a reference sequence, to which test sequences are compared. When using a sequence comparison algorithm, test and reference sequences are entered into a computer, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated. Preferably, default program parameters can be used, or alternative parameters can be designated. The sequence comparison algorithm then calculates the percent sequence identities for the test sequences relative to the reference sequence, based on the program parameters.
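As a non-limiting illustration of the final step described above, the following Python sketch computes percent identity from two sequences that have already been aligned (for example, by one of the programs named above). The function name, the use of "-" as the gap character, and the choice of denominator (all alignment columns that are not gaps in both sequences) are assumptions of this sketch; alignment programs differ in how they define the denominator.

```python
def percent_identity(aligned_ref: str, aligned_test: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns that are gaps in both sequences are ignored. A column counts as a
    match only when both sequences carry the same non-gap residue.
    """
    if len(aligned_ref) != len(aligned_test):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (r, t)
        for r, t in zip(aligned_ref.upper(), aligned_test.upper())
        if not (r == gap and t == gap)
    ]
    if not columns:
        raise ValueError("alignment contains no comparable columns")
    matches = sum(r == t and r != gap for r, t in columns)
    return 100.0 * matches / len(columns)
```

With the aligned pair "ACGT-A" and "ACCTGA", four of six columns match, giving roughly 66.7% identity under this definition.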
Examples of algorithms that are suitable for determining percent sequence identity and sequence similarity are the BLAST and BLAST 2.0 algorithms, which are described in Altschul et al. (1990) J. Mol. Biol. 215:403-410, and Altschul et al. (1997) Nuc. Acids Res. 25:3389-3402, respectively. Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/). This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold (Altschul et al. (1990) J. Mol. Biol. 215:403-410). These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always >0) and N (penalty score for mismatching residues; always <0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached. The BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences) uses as defaults a wordlength (W) of 11, an expectation (E) of 10, M=5, N=-4 and a comparison of both strands.
For amino acid sequences, the BLASTP program uses as defaults a wordlength of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff and Henikoff (1992) Proc. Natl. Acad. Sci. USA 89:10915), alignments (B) of 50, expectation (E) of 10, M=5, N=-4, and a comparison of both strands. The BLAST algorithm also performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin and Altschul (1993) Proc. Natl. Acad. Sci. USA 90:5873-5877). One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance. For example, a nucleic acid is considered similar to a reference sequence if the smallest sum probability in a comparison of the test nucleic acid to the reference nucleic acid is less than about 0.2, more preferably less than about 0.01.
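The three halting conditions for word-hit extension described above can be sketched, for nucleotide sequences and in one direction only, as follows. This is a simplified illustration and not the BLAST implementation: the function name, the gap-free rightward extension, and the default drop-off value X=20 are assumptions of this sketch, while M=5 and N=-4 are the values recited above.

```python
def extend_right(query, subject, q_start, s_start, match=5, mismatch=-4, x_drop=20):
    """Ungapped rightward extension of a word hit.

    Returns (best_score, best_length), where best_length is the number of
    aligned positions that gave the maximum cumulative score.
    """
    score = 0
    best_score = 0
    best_length = 0
    offset = 0
    # Halting condition 3: stop when the end of either sequence is reached.
    while q_start + offset < len(query) and s_start + offset < len(subject):
        same = query[q_start + offset] == subject[s_start + offset]
        score += match if same else mismatch
        offset += 1
        if score > best_score:
            best_score, best_length = score, offset
        # Halting condition 1: score fell off by X from its maximum.
        # Halting condition 2: cumulative score went to zero or below.
        if best_score - score >= x_drop or score <= 0:
            break
    return best_score, best_length
```

A full seed-and-extend search would also extend leftward from the word hit and add the score of the seed word itself; those steps are omitted here for brevity.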
The phrase "codon optimized" as it refers to genes or coding regions of nucleic acid molecules for the transformation of various hosts, refers to the alteration of codons in the gene or coding regions of polynucleic acid molecules to reflect the typical codon usage of a selected organism without altering the polypeptide encoded by the DNA. Such optimization includes replacing at least one, or more than one, or a significant number, of codons with one or more codons that are more frequently used in the genes of that selected organism.
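Purely as an illustration of the codon replacement described above, the following Python sketch swaps each codon of a coding sequence for a preferred synonymous codon and confirms that the encoded polypeptide is unchanged. The function names and the small preferred-codon dictionary used in the usage note are hypothetical and do not represent measured codon usage of any organism; the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate a DNA coding sequence whose length is a multiple of three."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def codon_optimize(cds: str, preferred: dict) -> str:
    """Replace codons with the preferred synonymous codon, where one is given.

    `preferred` maps a one-letter amino acid code to a codon (hypothetical
    host-specific data supplied by the caller).
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of three")
    optimized = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        amino_acid = CODON_TABLE[codon]
        new_codon = preferred.get(amino_acid, codon)
        # Guard against a table entry that would change the polypeptide.
        if CODON_TABLE[new_codon] != amino_acid:
            raise ValueError(f"{new_codon} does not encode {amino_acid}")
        optimized.append(new_codon)
    return "".join(optimized)
```

For example, with the hypothetical preference {"L": "CTG", "K": "AAG"}, the sequence ATGTTAAAATAA becomes ATGCTGAAGTAA, and both translate to the same polypeptide (M-L-K followed by a stop codon).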
Nucleic acid is "operably linked" when it is placed into a functional relationship with another nucleic acid sequence. For example, DNA for a presequence or secretory leader is operably linked to DNA for a polypeptide if it is expressed as a preprotein that participates in the secretion of the polypeptide; a promoter or enhancer is operably linked to a coding sequence if it affects the transcription of the sequence; or a ribosome binding site is operably linked to a coding sequence if it is positioned so as to facilitate translation. Generally, "operably linked" means that the DNA sequences being linked are near each other, and, in the case of a secretory leader, contiguous and in reading phase. However, operably linked nucleic acids (e.g. enhancers and coding sequences) do not have to be contiguous. Linking is accomplished by ligation at convenient restriction sites. If such sites do not exist, the synthetic oligonucleotide adaptors or linkers are used in accordance with conventional practice. In embodiments, a promoter is operably linked with a coding sequence when it is capable of affecting (e.g. modulating relative to the absence of the promoter) the expression of a protein from that coding sequence (i.e., the coding sequence is under the transcriptional control of the promoter).
The term "nucleobase" refers to the part of a nucleotide that bears the Watson/Crick base-pairing functionality. The most common naturally-occurring nucleobases, adenine (A), guanine (G), uracil (U), cytosine (C), and thymine (T) bear the hydrogen-bonding functionality that binds one nucleic acid strand to another in a sequence specific manner.
As used throughout, by a "subject" (or a "host") is meant an individual. Thus, the "subject" can include, for example, domesticated animals, such as cats, dogs, etc., livestock (e.g., cattle, horses, pigs, sheep, goats, etc.), laboratory animals (e.g., mouse, rabbit, rat, guinea pig, etc.), mammals, non-human mammals, primates, non-human primates, rodents, birds, reptiles, amphibians, fish, and any other animal. The subject can be a mammal such as a primate or a human.
A polynucleotide sequence is "heterologous" to a second polynucleotide sequence if it originates from a foreign species, or, if from the same species, is modified by human action from its original form. For example, a promoter operably linked to a heterologous coding sequence refers to a coding sequence from a species different from that from which the promoter was derived, or, if from the same species, a coding sequence which is different from naturally occurring allelic variants.
The phrase "selectively (or specifically) hybridizes to" refers to the binding, duplexing, or hybridizing of a molecule only to a particular nucleotide sequence with a higher affinity, e.g., under more stringent conditions, than to other nucleotide sequences (e.g., total cellular or library DNA or RNA).
The phrase "stringent hybridization conditions" refers to conditions under which a probe will hybridize to its target subsequence, typically in a complex mixture of nucleic acids, but to no other sequences. Stringent conditions are sequence-dependent and will be different in different circumstances. Longer sequences hybridize specifically at higher temperatures. An extensive guide to the hybridization of nucleic acids is found in Tijssen, Techniques in Biochemistry and Molecular Biology: Hybridization with Nucleic Acid Probes, "Overview of principles of hybridization and the strategy of nucleic acid assays" (1993). Generally, stringent conditions are selected to be about 5-10° C. lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength, pH, and nucleic acid concentration) at which 50% of the probes complementary to the target hybridize to the target sequence at equilibrium (as the target sequences are present in excess, at Tm, 50% of the probes are occupied at equilibrium). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide. For selective or specific hybridization, a positive signal is at least two times background, preferably 10 times background hybridization. Exemplary stringent hybridization conditions can be as follows: 50% formamide, 5xSSC, and 1% SDS, incubating at 42° C., or, 5xSSC, 1% SDS, incubating at 65° C., with wash in 0.2xSSC, and 0.1% SDS at 65° C.
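The relationship between Tm and a stringent hybridization temperature stated above (about 5-10° C. below Tm) can be illustrated with the following Python sketch. The Tm estimate used here is the common rule of thumb of 2° C. per A or T and 4° C. per G or C, which is only a rough approximation for short DNA oligonucleotides and is not a method recited in this disclosure; it ignores ionic strength, pH, formamide, and nucleic acid concentration. All function names are hypothetical.

```python
def rule_of_thumb_tm(oligo: str) -> float:
    """Rough Tm estimate in degrees C for a short DNA oligonucleotide.

    Uses 2 degrees per A/T and 4 degrees per G/C (an approximation that is
    assumed here for illustration only).
    """
    seq = oligo.upper()
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    if not seq or at_count + gc_count != len(seq):
        raise ValueError("oligo must be a non-empty string of A, C, G and T")
    return 2.0 * at_count + 4.0 * gc_count


def stringent_window(tm_celsius: float, low_offset: float = 5.0, high_offset: float = 10.0):
    """Temperature range lying 5-10 degrees C below the given Tm, as (low, high)."""
    return (tm_celsius - high_offset, tm_celsius - low_offset)
```

For a 12-mer containing three each of A, C, G and T, the estimate is 36° C., so the corresponding window would be 26-31° C. A measured or more carefully modeled Tm can be passed to stringent_window directly.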
Nucleic acids that do not hybridize to each other under stringent conditions are still substantially identical if the polypeptides which they encode are substantially identical. This occurs, for example, when a copy of a nucleic acid is created using the maximum codon degeneracy permitted by the genetic code. In such cases, the nucleic acids typically hybridize under moderately stringent hybridization conditions. Exemplary "moderately stringent hybridization conditions" include a hybridization in a buffer of 40% formamide, 1 M NaCl, 1% SDS at 37° C., and a wash in 1xSSC at 45° C. A positive hybridization is at least twice background. Those of ordinary skill will readily recognize that alternative hybridization and wash conditions can be utilized to provide conditions of similar stringency. Additional guidelines for determining hybridization parameters are provided in numerous references, e.g., Current Protocols in Molecular Biology, ed. Ausubel, et al. One of skill will recognize that these values can be appropriately adjusted to determine corresponding identity of proteins encoded by two nucleotide sequences by taking into account codon degeneracy, amino acid similarity, reading frame positioning and the like. Polypeptides which are "substantially similar" share sequences as noted above except that residue positions which are not identical may differ by conservative amino acid changes. Conservative amino acid substitutions refer to the interchangeability of residues having similar side chains.
For example, a group of amino acids having aliphatic side chains is glycine, alanine, valine, leucine, and isoleucine; a group of amino acids having aliphatic-hydroxyl side chains is serine and threonine; a group of amino acids having amide-containing side chains is asparagine and glutamine; a group of amino acids having aromatic side chains is phenylalanine, tyrosine, and tryptophan; a group of amino acids having basic side chains is lysine, arginine, and histidine; and a group of amino acids having sulfur-containing side chains is cysteine and methionine. Exemplary conservative amino acids substitution groups are: valine-leucine-isoleucine, phenylalanine-tyrosine, lysine-arginine, alanine-valine, aspartic acid-glutamic acid, and asparagine-glutamine.
DNA-Guided RNA Cleavage Systems
In one aspect, disclosed herein is a DNA-guided RNA cleavage system comprising: a yeast Argonaute polypeptide; and
a heterologous, single-stranded oligonucleotide guide molecule;
wherein the single-stranded oligonucleotide guide molecule is a DNA oligonucleotide that is complementary to a target RNA sequence.
In one embodiment, the yeast Argonaute polypeptide is from Vanderwaltozyma polyspora (also known as Kluyveromyces polysporus). In one embodiment, the yeast Argonaute polypeptide is selected from SEQ ID NO:31, SEQ ID NO:32, or SEQ ID NO:33. In one embodiment, the yeast Argonaute polypeptide is SEQ ID NO:31. In one embodiment, the yeast Argonaute polypeptide is SEQ ID NO:32. In one embodiment, the yeast Argonaute polypeptide is SEQ ID NO:33.
In some embodiments, the yeast Argonaute polypeptide has at least 60% (for example, at least 60%, at least 65%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 99%) identity to SEQ ID NO:31, SEQ ID NO:32, or SEQ ID NO:33. In some embodiments, the yeast Argonaute polypeptide has at least 60% (for example, at least 60%, at least 65%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 99%) identity to SEQ ID NO:31. In some embodiments, the yeast Argonaute polypeptide has at least 60% (for example, at least 60%, at least 65%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 99%) identity to SEQ ID NO:32. In some embodiments, the yeast Argonaute polypeptide has at least 60% (for example, at least 60%, at least 65%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 99%) identity to SEQ ID NO:33.
In one embodiment, the single-stranded oligonucleotide guide molecule is about 12 to about 45 nucleotides. In some embodiments, the single-stranded oligonucleotide guide molecule is about 12, about 13, about 14, about 15, about 16, about 17, about 18, about 19, about 20, about 21, about 22, about 23, about 24, about 25, about 26, about 27, about 28, about 29, about 30, about 31, about 32, about 33, about 34, about 35, about 36, about 37, about 38, about 39, about 40, about 41, about 42, about 43, about 44, or about 45 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is about 12 to about 30 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is about 14 to about 26 nucleotides. In some embodiments, the single-stranded oligonucleotide guide molecule is about 21 to about 25 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is about 21 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is about 22 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is about 23 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is about 24 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is about 25 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is 21 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is 22 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is 23 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is 24 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is 25 nucleotides.
In one embodiment, the target RNA sequence is from a mammal. In one embodiment, the target RNA sequence is from a human. In one embodiment, the DNA encoding a yeast Argonaute polypeptide is encoded by SEQ ID NO:34, SEQ ID NO:35, or SEQ ID NO:36. In one embodiment, the DNA encoding a yeast Argonaute polypeptide is encoded by SEQ ID NO:34. In one embodiment, the DNA encoding a yeast Argonaute polypeptide is encoded by SEQ ID NO:35. In one embodiment, the DNA encoding a yeast Argonaute polypeptide is encoded by SEQ ID NO:36.
In one embodiment, the DNA encoding a yeast Argonaute polypeptide is encoded by a nucleic acid which is at least 60%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97.5%, or at least 99% identical to SEQ ID NO:34, SEQ ID NO: 35, or SEQ ID NO:36. In one embodiment, the DNA encoding a yeast Argonaute polypeptide is encoded by a nucleic acid which is at least 60%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97.5%, or at least 99% identical to SEQ ID NO:34. In one embodiment, the DNA encoding a yeast Argonaute polypeptide is encoded by a nucleic acid which is at least 60%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97.5%, or at least 99% identical to SEQ ID NO:35. In one embodiment, the DNA encoding a yeast Argonaute polypeptide is encoded by a nucleic acid which is at least 60%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97.5%, or at least 99% identical to SEQ ID NO:36.
In one embodiment, the single-stranded oligonucleotide guide molecule (for example, ssDNA) has at least one chemically modified nucleotide. These modified nucleotides may confer increased stability, decreased off-target effects, and/or reduced toxicity, as compared to a ssDNA not having the chemically modified nucleotide.
In some embodiments, the at least one chemically modified nucleotide comprises a chemically modified nucleobase, a chemically modified ribose, a chemically modified phosphodiester linkage, or a combination thereof.
In one embodiment, the chemically modified nucleobase is selected from 5-formylcytidine (5fC), 5-methylcytidine (5meC), 5-methoxycytidine (5moC), 5-hydroxycytidine (5hoC), 5-hydroxymethylcytidine (5hmC), 5-formyluridine (5fU), 5-methyluridine (5-meU), 5-methoxyuridine (5moU), 5-carboxymethylesteruridine (5camU), pseudouridine (Ψ), N1-methylpseudouridine (me1Ψ), N6-methyladenosine (me6A), or thienoguanosine (thG).
In one embodiment, the chemically modified ribose is selected from 2'-O-methyl (2'-O-Me), 2'-Fluoro (2'-F), 2'-deoxy-2'-fluoro-beta-D-arabino-nucleic acid (2'F-ANA), 4'-S, 4'-SFANA, 2'-azido, UNA, 2'-O-methoxy-ethyl (2'-O-ME), 2'-O-Allyl, 2'-O-Ethylamine, 2'-O-Cyanoethyl, Locked nucleic acid (LNA), Methylene-cLNA, N-MeO-amino BNA, or N-MeO-aminooxy BNA.
In one embodiment, the chemically modified phosphodiester linkage is selected from Phosphorothioate (PS), Boranophosphate, phosphodithioate (PS2), 3',5'-amide, N3'-phosphoramidate (NP), Phosphodiester (PO), or 2',5'-phosphodiester (2',5'-PO).
In general, a guide ssDNA sequence is any polynucleotide sequence having sufficient complementarity with a target polynucleotide sequence to hybridize with the target sequence and direct cleavage of the target sequence. In some embodiments, the degree of complementarity between a guide ssDNA sequence and its corresponding RNA target sequence, when optimally aligned using a suitable alignment algorithm, is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more. In some embodiments, the guide ssDNA is perfectly complementary (has perfect complementarity) with its corresponding RNA target sequence, when optimally aligned using a suitable alignment algorithm.
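As a non-limiting sketch of how a perfectly complementary guide ssDNA could be derived from a chosen target RNA site, the following Python function returns the DNA reverse complement of the site and checks that its length falls within the about 12 to about 45 nucleotide range recited above. The function name and the example site are hypothetical, and the sketch does not address other design considerations such as target accessibility or off-target matches.

```python
# Complement of each RNA target base, expressed as a DNA base.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def design_guide(target_rna_site: str, min_len: int = 12, max_len: int = 45) -> str:
    """Return a guide ssDNA (5'->3') perfectly complementary to the RNA site (5'->3')."""
    site = target_rna_site.upper()
    if not min_len <= len(site) <= max_len:
        raise ValueError(f"site length must be between {min_len} and {max_len} nucleotides")
    unknown = set(site) - set(RNA_TO_DNA_COMPLEMENT)
    if unknown:
        raise ValueError(f"unexpected characters in RNA site: {sorted(unknown)}")
    # Read the site 3'->5' and complement each base to obtain the guide 5'->3'.
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(site))
```

For the hypothetical 21-nucleotide site 5'-AUGGCUAGCUAGCUAGCUAGC-3', the function returns the 21-nucleotide guide 5'-GCTAGCTAGCTAGCTAGCCAT-3'.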
Methods
In one aspect, provided herein is a method for cleaving an RNA molecule, comprising: binding to a target RNA sequence a complex comprising:
a yeast Argonaute polypeptide; and
a heterologous, single-stranded oligonucleotide guide molecule;
wherein the single-stranded oligonucleotide guide molecule is a DNA oligonucleotide that is complementary to the target RNA sequence; and
wherein the Argonaute polypeptide:guide molecule complex cleaves the target RNA sequence.
In another aspect, disclosed herein is a method for attenuating expression of a target gene in a cell, comprising:
introducing into the cell a yeast Argonaute polypeptide; and
introducing into the cell a single stranded DNA (ssDNA) in an amount sufficient to attenuate expression of the target gene, wherein the ssDNA comprises a nucleotide sequence that is complementary to a nucleotide sequence of the target gene.
In one embodiment, the yeast Argonaute polypeptide is from Vanderwaltozyma polyspora (also known as Kluyveromyces polysporus). In one embodiment, the yeast Argonaute polypeptide is selected from SEQ ID NO:31, SEQ ID NO:32, or SEQ ID NO:33. In one embodiment, the yeast Argonaute polypeptide is SEQ ID NO:31. In one embodiment, the yeast Argonaute polypeptide is SEQ ID NO:32. In one embodiment, the yeast Argonaute polypeptide is SEQ ID NO:33.
In some embodiments, the yeast Argonaute polypeptide has at least 60% (for example, at least 60%, at least 65%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 99%) identity to SEQ ID NO:31, SEQ ID NO:32, or SEQ ID NO:33. In some embodiments, the yeast Argonaute polypeptide has at least 60% (for example, at least 60%, at least 65%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 99%) identity to SEQ ID NO:31. In some embodiments, the yeast Argonaute polypeptide has at least 60% (for example, at least 60%, at least 65%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 99%) identity to SEQ ID NO:32. In some embodiments, the yeast Argonaute polypeptide has at least 60% (for example, at least 60%, at least 65%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 99%) identity to SEQ ID NO:33.
In one embodiment, the single-stranded oligonucleotide guide molecule is about 12 to about 45 nucleotides. In some embodiments, the single-stranded oligonucleotide guide molecule is about 12, about 13, about 14, about 15, about 16, about 17, about 18, about 19, about 20, about 21, about 22, about 23, about 24, about 25, about 26, about 27, about 28, about 29, about 30, about 31, about 32, about 33, about 34, about 35, about 36, about 37, about 38, about 39, about 40, about 41, about 42, about 43, about 44, or about 45 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is about 12 to about 30 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is about 14 to about 26 nucleotides. In some embodiments, the single-stranded oligonucleotide guide molecule is about 21 to about 25 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is about 21 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is about 22 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is about 23 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is about 24 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is about 25 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is 21 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is 22 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is 23 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is 24 nucleotides. In one embodiment, the single-stranded oligonucleotide guide molecule is 25 nucleotides.
In one embodiment, the target RNA sequence is from a mammal. In one embodiment, the target RNA sequence is from a human. In one embodiment, the target RNA sequence is from a virus. In one embodiment, the target RNA sequence is from a pathogen. In one embodiment, the target RNA sequence is from a bacterium. In one embodiment, the target RNA sequence is from a prokaryotic cell. In one embodiment, the target RNA sequence is from a eukaryotic cell.
In an additional embodiment, disclosed herein is a method of detecting a target RNA in a sample, comprising:
contacting the sample with a complex, wherein the complex comprises:
a yeast Argonaute polypeptide; and
a heterologous, single-stranded oligonucleotide guide molecule;
wherein the single-stranded oligonucleotide guide molecule is a DNA oligonucleotide that is complementary to a target RNA sequence; and
wherein the Argonaute polypeptide:guide molecule complex cleaves the target RNA sequence; and
detecting a cleavage product of the target RNA, thereby detecting the target RNA in the sample.
Kits
In one aspect, disclosed herein is a kit comprising:
a vector comprising a nucleotide sequence encoding a yeast Argonaute polypeptide operably linked to a promoter; and
a heterologous, single-stranded oligonucleotide guide molecule;
wherein the single-stranded oligonucleotide guide molecule is a DNA oligonucleotide that is complementary to a target RNA sequence.
Non-limiting examples of approaches that can be used to introduce expression vectors encoding Argonaute into various cell types include the following: a nucleic acid vector (e.g., a plasmid vector) encoding Argonaute can be delivered directly to bacterial cells or cultured cells (e.g., mammalian cells) by electroporation; a nucleic acid vector (e.g., a plasmid vector) encoding Argonaute can be delivered directly to bacterial cells by chemical transformation; a viral vector (e.g., a retroviral vector, an adenoviral vector, an adeno-associated viral vector, an alphavirus vector, a vaccinia viral vector, a herpes viral vector, etc., as are known in the art) comprising a nucleotide sequence encoding Argonaute can be used to deliver Argonaute to cells (e.g., mammalian cells); a baculovirus expression system can be used to deliver Argonaute to insect cells; Agrobacterium-mediated delivery can be employed in plants; and/or lipid-mediated delivery (e.g., lipofectamine, oligofectamine) can also be employed for mammalian cells. In some embodiments, the gene sequence (for example, of a gene expressing Argonaute) may be codon optimized, without changing the resulting polypeptide sequence. In some embodiments, the codon optimization includes replacing at least one, or more than one, or a significant number, of codons with one or more codons that are more frequently used in the intended host organism. In some embodiments, the codon optimization increases expression of the optimized gene sequence.
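By way of non-limiting illustration, synonymous codon replacement of the kind described above can be sketched as follows. The tables `CODON_TO_AA` and `PREFERRED_CODON` are hypothetical, deliberately truncated examples introduced only for this sketch; an actual codon usage table is organism specific and would cover all 64 codons. The sketch is not the procedure used to obtain any sequence disclosed herein.

```python
# Hypothetical, truncated tables for illustration only (not a full genetic code).
CODON_TO_AA = {
    "ATG": "M", "AAA": "K", "AAG": "K", "GAA": "E", "GAG": "E",
    "CTG": "L", "TTA": "L", "CTA": "L", "TAA": "*", "TGA": "*",
}
# Assumed "more frequently used" codon per amino acid in some host of interest.
PREFERRED_CODON = {"M": "ATG", "K": "AAA", "E": "GAA", "L": "CTG", "*": "TAA"}


def translate(cds):
    """Translate a coding sequence (length divisible by 3) codon by codon."""
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def codon_optimize(cds):
    """Replace each codon with the preferred synonymous codon.

    The encoded polypeptide is unchanged because every replacement codon
    is looked up by the amino acid that the original codon encodes.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    optimized = []
    for i in range(0, len(cds), 3):
        amino_acid = CODON_TO_AA[cds[i:i + 3]]
        optimized.append(PREFERRED_CODON[amino_acid])
    return "".join(optimized)


original = "ATGAAGGAGTTATGA"
optimized = codon_optimize(original)
# The nucleotide sequence changes while the translation stays the same.
assert translate(optimized) == translate(original)
```

In this sketch the check at the end confirms the property stated above: the codons differ, but the resulting polypeptide sequence does not.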
High-Throughput DNA-Guided RNA Cleavage Systems
In one aspect, disclosed herein is a DNA-guided RNA cleavage system for high-throughput detection of nuclease accessibility sites, the system comprising a first complex comprising a first yeast Argonaute polypeptide and a first single-stranded DNA oligonucleotide guide molecule; and a second complex comprising a second yeast Argonaute polypeptide and a second single-stranded DNA oligonucleotide guide molecule; wherein the first and second single-stranded DNA oligonucleotide guide molecules are not identical and are complementary to a target RNA sequence.
In some embodiments, the Argonaute polypeptide is from a yeast. In some embodiments, the Argonaute polypeptide is from Vanderwaltozyma polyspora (also known as Kluyveromyces polysporus). Additional non-limiting examples of yeast Argonaute polypeptides can be from additional yeast species of the genus Kluyveromyces: K. aestuari, K. africanus, K. bacillisporus, K. blattae, K. dobzhanskii, K. hubeiensis, K. lactis, K. lodderae, K. marxianus, K. nonfermentans, K. piceae, K. sinensis, K. thermotolerans, K. waltii, K. wickerhamii, or K. yarrowii. Additional non-limiting examples of yeast Argonaute polypeptides can be from Yarrowia lipolytica, Pichia pastoris, Candida vulgaris, Saccharomyces castellii, or Schizosaccharomyces pombe.
In some embodiments, the Argonaute polypeptide is from a eukaryote. In some embodiments, the Argonaute polypeptide is from a mammal. In some embodiments, the Argonaute polypeptide is from a primate. In some embodiments, the Argonaute polypeptide is from a human (for example, hAGO1, hAGO2, hAGO3, or hAGO4). The number of Argonaute family members (genes) ranges from one in Schizosaccharomyces pombe to twenty-seven in Caenorhabditis elegans. Other non-limiting examples of eukaryotic Argonaute proteins are found in Homo sapiens (8), Rattus norvegicus (8), Drosophila melanogaster (5), Arabidopsis thaliana (10), and Neurospora crassa (2). (Hock, J and G Meister. Genome Biology 2008 9:210). Argonautes are key components of RISC in mammals, fungi, worms, protozoans and plants (M.A. Carmell et al, Nat. Struct. Mol. Biol. 11, 214 (2004)).
In some embodiments, the Argonaute polypeptide is a full length Argonaute polypeptide. In some embodiments, the Argonaute polypeptide comprises a portion of the Argonaute protein.
In one embodiment, the Argonaute polypeptide is a wild-type sequence. In one embodiment, the Argonaute polypeptide is a sequence with at least one mutation. In one embodiment, the Argonaute polypeptide comprises an amino acid sequence that is different from a naturally-occurring Argonaute polypeptide.
In some embodiments, the system and methods may comprise additional polypeptides in addition to the Argonaute polypeptide. For example, additional components of the RISC complex may be present.
In one embodiment, the yeast Argonaute polypeptide is from Vanderwaltozyma polyspora (also known as Kluyveromyces polysporus). In one embodiment, the yeast Argonaute polypeptide is selected from SEQ ID NO:31, SEQ ID NO:32, or SEQ ID NO:33. In one embodiment, the yeast Argonaute polypeptide is SEQ ID NO:31. In one embodiment, the yeast Argonaute polypeptide is SEQ ID NO:32. In one embodiment, the yeast Argonaute polypeptide is SEQ ID NO:33.
In some embodiments, the yeast Argonaute polypeptide has at least 60% (for example, at least 60%, at least 65%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 99%) identity to SEQ ID NO:31, SEQ ID NO:32, or SEQ ID NO:33. In some embodiments, the yeast Argonaute polypeptide has at least 60% (for example, at least 60%, at least 65%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 99%) identity to SEQ ID NO:31. In some embodiments, the yeast Argonaute polypeptide has at least 60% (for example, at least 60%, at least 65%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 99%) identity to SEQ ID NO:32. In some embodiments, the yeast Argonaute polypeptide has at least 60% (for example, at least 60%, at least 65%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 99%) identity to SEQ ID NO:33.
The first yeast Argonaute polypeptide can be the same as the second yeast Argonaute polypeptide. However, in some embodiments, the first yeast Argonaute polypeptide can be a different Argonaute polypeptide compared to the second yeast Argonaute polypeptide.
In some embodiments, the first single-stranded oligonucleotide guide molecule (occasionally referred to herein as a first "ssDNA guide molecule" or "gDNA") is about 12 to about 45 nucleotides. In some embodiments, the first ssDNA guide molecule is about 12, about 13, about 14, about 15, about 16, about 17, about 18, about 19, about 20, about 21, about 22, about 23, about 24, about 25, about 26, about 27, about 28, about 29, about 30, about 31, about 32, about 33, about 34, about 35, about 36, about 37, about 38, about 39, about 40, about 41, about 42, about 43, about 44, or about 45 nucleotides. In some embodiments, the first ssDNA guide molecule is about 12 to about 30 nucleotides. In some embodiments, the first ssDNA guide molecule is about 14 to about 26 nucleotides. In some embodiments, the first ssDNA guide molecule is about 21 to about 25 nucleotides. In some embodiments, the first ssDNA guide molecule is about 21 nucleotides. In some embodiments, the first ssDNA guide molecule is about 22 nucleotides. In some embodiments, the first ssDNA guide molecule is about 23 nucleotides. In some embodiments, the first ssDNA guide molecule is about 24 nucleotides. In some embodiments, the first ssDNA guide molecule is about 25 nucleotides.
In some embodiments, the second single-stranded oligonucleotide guide molecule (occasionally referred to herein as a second "ssDNA guide molecule" or "gDNA") is about 12 to about 45 nucleotides. In some embodiments, the second ssDNA guide molecule is about 12, about 13, about 14, about 15, about 16, about 17, about 18, about 19, about 20, about 21, about 22, about 23, about 24, about 25, about 26, about 27, about 28, about 29, about 30, about 31, about 32, about 33, about 34, about 35, about 36, about 37, about 38, about 39, about 40, about 41, about 42, about 43, about 44, or about 45 nucleotides. In some embodiments, the second ssDNA guide molecule is about 12 to about 30 nucleotides. In some embodiments, the second ssDNA guide molecule is about 14 to about 26 nucleotides. In some embodiments, the second ssDNA guide molecule is about 21 to about 25 nucleotides. In some embodiments, the second ssDNA guide molecule is about 21 nucleotides. In some embodiments, the second ssDNA guide molecule is about 22 nucleotides. In some embodiments, the second ssDNA guide molecule is about 23 nucleotides. In some embodiments, the second ssDNA guide molecule is about 24 nucleotides. In some embodiments, the second ssDNA guide molecule is about 25 nucleotides.
In some embodiments, the first and/or second ssDNA guide molecule is heterologous to the genomic DNA of a biological cell. In some embodiments, the first and/or second ssDNA guide molecule is homologous to the genomic DNA of a biological cell.
By "complex," it is meant that the yeast Argonaute polypeptide and the single-stranded DNA oligonucleotide guide molecule are associated by any one or more intermolecular forces, such that a stable polypeptide-oligonucleotide complex has the capacity to bind to and cleave an RNA molecule (occasionally referred to herein as an "Argonaute polypeptide:guide complex"). The intermolecular force(s) binding the yeast Argonaute polypeptide and the single-stranded DNA oligonucleotide guide molecule together can be any one, or combination, of intermolecular binding forces, for example covalent, ionic, ion-dipole, dipole, London dispersion, van der Waals, hydrogen bonding forces and/or hydrophobic interaction. The complex can contain other bound molecules, for instance, amino acids, proteins, nucleotides, polynucleotides, small molecules, lipids, carbohydrates, etc., so long as the complex retains the capacity to bind to and cleave an RNA molecule. In some embodiments, the complex contains other bound molecules typically present in a DISC complex or RISC complex. As used herein, "DNA-induced slicing complex," "DISC," and "complex" (as it refers to a DNA-containing complex) can be used interchangeably.
The first and second single-stranded DNA oligonucleotide guide molecules are not identical. That is, the first and second ssDNA guide molecules do not contain the exact same oligonucleotide sequence.
The first and second single-stranded DNA oligonucleotide guide molecules are complementary to a target RNA sequence. The target RNA sequence, in some embodiments, is one continuous RNA molecule. Alternatively, the target RNA sequence can be a target RNA sequence on different RNA molecules. The target can be an isolated and/or purified RNA molecule, or can be mixed with other molecules (e.g., one or more additional RNA molecules). Optionally, the target RNA sequence can be mixed with cellular components, as in the case of crude extracts of cellular RNAs. In some embodiments, the target RNA sequence can be comprised within a cell.
Typically, the first and second ssDNA guide molecules bind target RNA sequences which are not identical. However, depending on the nucleotides of a ssDNA guide molecule which hybridize with the target RNA sequence, the first and second ssDNA guide molecules can bind overlapping or even identical target RNA sequences.
Optionally, the DNA-guided RNA cleavage system for high-throughput detection of nuclease accessibility sites comprises more than two Argonaute polypeptide:guide complexes. In some embodiments, the DNA-guided RNA cleavage system comprises three, four, five, six, seven, eight, nine, ten, or more Argonaute polypeptide:guide complexes. Indeed, the high-throughput nature of the system can allow large numbers of Argonaute polypeptide:guide complexes to be used in the methods described herein. In each of the more than two Argonaute polypeptide:guide complexes, the ssDNA guide molecules are not identical. One way to provide multiple Argonaute polypeptide:guide complexes each comprising a different ssDNA guide molecule is to bind a library of single-stranded DNA oligonucleotide guide molecules with Argonaute polypeptides. Thus, in some embodiments, the DNA-guided RNA cleavage system comprises two or more Argonaute polypeptide:guide complexes comprising a library of single-stranded DNA oligonucleotide guide molecules. The library can be designed randomly, or be based on intentional selection of DNA sequences. The library can be used to form a collection of separately provided complexes (e.g., each ssDNA guide molecule is separately bound to an Argonaute polypeptide in a separate reaction mixture). Alternatively, the library can be used to form a mixture of complexes (e.g., each ssDNA guide molecule is bound to an Argonaute polypeptide in a single mixture).
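By way of non-limiting illustration, one possible form of "intentional selection" is to tile the target RNA with overlapping windows and to take, for each window, the complementary DNA sequence as a guide. The sketch below assumes this tiling strategy, a fixed guide length, and strict Watson-Crick complementarity; the function names `guide_for_site` and `tiled_guide_library` and their parameters are hypothetical and are introduced only for this example.

```python
# DNA base that pairs with each RNA base (Watson-Crick pairing only).
_RNA_TO_DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}


def guide_for_site(target_rna, start, length):
    """Return the DNA guide (written 5'->3') complementary to
    target_rna[start:start + length] (0-based start, RNA written 5'->3')."""
    site = target_rna[start:start + length].upper()
    if start < 0 or len(site) != length:
        raise ValueError("site does not fit inside the target RNA")
    # Complementary strands are antiparallel, so the site is read in reverse.
    return "".join(_RNA_TO_DNA_COMPLEMENT[base] for base in reversed(site))


def tiled_guide_library(target_rna, length=23, step=1):
    """Map each window start position to a guide of the given length."""
    if not 12 <= length <= 45:
        raise ValueError("this sketch restricts guides to 12-45 nucleotides")
    return {
        start: guide_for_site(target_rna, start, length)
        for start in range(0, len(target_rna) - length + 1, step)
    }


library = tiled_guide_library("AUGCAUGCAUGCAUG", length=12, step=1)
for start, guide in library.items():
    print(start, guide)
```

Each guide in such a library could then be bound to an Argonaute polypeptide separately or in a single mixture, as described above. For a target containing repeated sequence, different windows can yield identical guides, which would need to be removed for the guides in the library to be non-identical.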
The target RNA sequence is not particularly limited and can be synthetic or natural. A natural target RNA sequence can be from any biological cell or any organism. In some embodiments, the target RNA sequence is from a mammal. In some embodiments, the target RNA sequence is from a human. In one embodiment, the target RNA sequence is from a virus. In one embodiment, the target RNA sequence is from a pathogen. In one embodiment, the target RNA sequence is from a bacterium. In one embodiment, the target RNA sequence is from a prokaryotic cell. In one embodiment, the target RNA sequence is from a eukaryotic cell. In some embodiments, the target RNA is a 5'UTR RNA. In some embodiments, the target RNA is a genomic RNA (e.g., a viral genomic RNA), or a portion thereof. In some embodiments, the target RNA is from HIV-1 or Zika virus. In some embodiments, the target RNA is from a cell which expresses long non-coding RNAs (lncRNAs), for example and without limitation MALAT1 or XIST, or a cell which expresses lncRNAs (e.g., MALAT1 or XIST) at high levels. In some embodiments, the lncRNA is from a cancer cell or tumor.
The target RNA, in some embodiments, can range in length from about 10 nucleotides to about 100,000 nucleotides, from about 100 nucleotides to about 50,000 nucleotides, from about 300 nucleotides to about 10,000 nucleotides, or from about 500 nucleotides to about 5,000 nucleotides. The target RNA can range in length from any of the above minimums to any of the above maximum nucleotide lengths (e.g., from 10 nucleotides to about 10,000 nucleotides, or from about 300 nucleotides to about 100,000 nucleotides). The target RNA, in some embodiments, can have a length of at least 10 nucleotides, at least 50 nucleotides, at least 100 nucleotides, at least 200 nucleotides, at least 300 nucleotides, at least 500 nucleotides, at least 1,000 nucleotides, at least 2,500 nucleotides, at least 5,000 nucleotides, at least 7,000 nucleotides, at least 10,000 nucleotides or more.
In some embodiments, the target RNA sequence can be analyzed by a computer-based or internet-based program which analyzes, predicts, and/or models nucleotide structure (e.g., the folding structure of a single-stranded RNA molecule). Structural modeling can, in some instances, aid in selecting single-stranded DNA oligonucleotide guide molecules. For example, ssDNA guide molecules which are complementary to RNA sequences having unpaired nucleotides can be selected, as they may be predicted, in some instances, to have improved binding kinetics with an Argonaute polypeptide:guide complex.
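By way of non-limiting illustration, candidate guide sites can be ranked by how many nucleotides in each site are predicted to be unpaired. The sketch below assumes that a structure prediction program has already produced a dot-bracket string for the target RNA (one character per nucleotide, with "." marking an unpaired nucleotide); the window length, the `min_unpaired` threshold, and the function names are hypothetical choices made for this example, and the fraction of unpaired nucleotides is only a rough proxy for accessibility.

```python
def unpaired_fraction(dot_bracket, start, length):
    """Fraction of nucleotides predicted unpaired in one candidate site."""
    window = dot_bracket[start:start + length]
    if start < 0 or len(window) != length:
        raise ValueError("window does not fit inside the structure string")
    return window.count(".") / length


def rank_sites_by_accessibility(dot_bracket, length=23, min_unpaired=0.5):
    """Return (start, unpaired_fraction) pairs at or above the threshold,
    with the most unpaired sites listed first."""
    sites = []
    for start in range(len(dot_bracket) - length + 1):
        fraction = unpaired_fraction(dot_bracket, start, length)
        if fraction >= min_unpaired:
            sites.append((start, fraction))
    # sorted() is stable, so ties keep their 5'->3' order.
    return sorted(sites, key=lambda site: site[1], reverse=True)


# Toy structure: a 4-bp hairpin followed by an 8-nt single-stranded tail.
structure = "((((....))))........"
print(rank_sites_by_accessibility(structure, length=8, min_unpaired=0.75))
```

Guides complementary to the top-ranked sites could then be prioritized, with the prediction subsequently checked against the experimentally determined nuclease accessibility sites.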
In some embodiments, the DNA encoding a yeast Argonaute polypeptide is encoded by SEQ ID NO:34, SEQ ID NO:35, or SEQ ID NO:36. In some embodiments, the DNA encoding a yeast Argonaute polypeptide is encoded by SEQ ID NO:35. In some embodiments, the DNA encoding a yeast Argonaute polypeptide is encoded by a nucleic acid sequence which is at least 60%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97.5%, or at least 99% identical to SEQ ID NO:34, SEQ ID NO:35, or SEQ ID NO:36. In some embodiments, the DNA encoding a yeast Argonaute polypeptide is encoded by a nucleic acid sequence which is at least 60%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97.5%, or at least 99% identical to SEQ ID NO:35.
In some embodiments, the first and/or second single-stranded oligonucleotide guide molecule (for example, ssDNA) has at least one chemically modified nucleotide. These modified nucleotides may confer increased stability, decreased off-target effects, and/or reduced toxicity, as compared to a ssDNA not having the chemically modified nucleotide.
In some embodiments, the at least one chemically modified nucleotide comprises a chemically modified nucleobase, a chemically modified ribose, a chemically modified phosphodiester linkage, or a combination thereof.
In some embodiments, the chemically modified nucleobase is selected from 5-formylcytidine (5fC), 5-methylcytidine (5meC), 5-methoxycytidine (5moC), 5-hydroxycytidine (5hoC), 5-hydroxymethylcytidine (5hmC), 5-formyluridine (5fU), 5-methyluridine (5-meU), 5-methoxyuridine (5moU), 5-carboxymethylesteruridine (5camU), pseudouridine (Ψ), N1-methylpseudouridine (me1Ψ), N6-methyladenosine (me6A), or thienoguanosine (thG).
In one embodiment, the chemically modified ribose is selected from 2'-O-methyl (2'-O-Me), 2'-Fluoro (2'-F), 2'-deoxy-2'-fluoro-beta-D-arabino-nucleic acid (2'F-ANA), 4'-S, 4'-SFANA, 2'-azido, UNA, 2'-O-methoxy-ethyl (2'-O-ME), 2'-O-Allyl, 2'-O-Ethylamine, 2'-O-Cyanoethyl, Locked nucleic acid (LNA), Methylene-cLNA, N-MeO-amino BNA, or N-MeO-aminooxy BNA.
In one embodiment, the chemically modified phosphodiester linkage is selected from Phosphorothioate (PS), Boranophosphate, phosphodithioate (PS2), 3',5'-amide, N3'-phosphoramidate (NP), Phosphodiester (PO), or 2',5'-phosphodiester (2',5'-PO).
In general, a guide ssDNA sequence is any polynucleotide sequence having sufficient complementarity with a target polynucleotide sequence to hybridize with the target sequence and direct cleavage of the target sequence. In some embodiments, the degree of complementarity between a guide ssDNA sequence and its corresponding RNA target sequence, when optimally aligned using a suitable alignment algorithm, is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more. In some embodiments, the guide ssDNA is perfectly complementary (has perfect complementarity) with its corresponding RNA target sequence, when optimally aligned using a suitable alignment algorithm.
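By way of non-limiting illustration, the degree of complementarity between a guide ssDNA and an RNA target site can be computed as sketched below. The sketch makes simplifying assumptions that are not requirements of the disclosure: the guide and the site have equal length, they are already aligned end to end in antiparallel orientation with no gaps (so no alignment algorithm is run), and only Watson-Crick pairs (A:U, T:A, G:C, C:G) are counted. The function name `percent_complementarity` is hypothetical.

```python
# RNA base that forms a Watson-Crick pair with each DNA base.
_DNA_PAIRS_WITH_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def percent_complementarity(guide_dna, target_rna_site):
    """Percentage of guide positions that pair with the target site.

    Both sequences are written 5'->3'. Because the strands are antiparallel,
    guide position i (counted from the 5' end) faces the site position
    counted from the 3' end, so the site is traversed in reverse.
    """
    guide = guide_dna.upper()
    site = target_rna_site.upper()
    if not guide or len(guide) != len(site):
        raise ValueError("guide and site must be non-empty and equal in length")
    paired = sum(
        1
        for dna_base, rna_base in zip(guide, reversed(site))
        if _DNA_PAIRS_WITH_RNA.get(dna_base) == rna_base
    )
    return 100.0 * paired / len(guide)


print(percent_complementarity("GCAT", "AUGC"))  # perfectly complementary
print(percent_complementarity("GCAA", "AUGC"))  # one mismatch in four
```

A value of 100 corresponds to the perfectly complementary case described above; lower values indicate one or more mismatched positions.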
High-Throughput Methods
In one aspect, provided herein is a method of detecting nuclease accessibility sites in an RNA sequence, the method comprising a) binding to a target RNA sequence a complex comprising a yeast Argonaute polypeptide and a first single-stranded DNA oligonucleotide guide molecule, wherein the single-stranded DNA oligonucleotide guide molecule is complementary to the target RNA sequence; b) cleaving the target RNA sequence with the Argonaute polypeptide:guide complex to form an RNA cleavage product; c) detecting the RNA cleavage product; and d) determining a nuclease accessibility site based on the RNA cleavage product.
The complex comprising a yeast Argonaute polypeptide and a first ssDNA guide molecule, wherein the ssDNA guide molecule is complementary to the target RNA sequence, can be any Argonaute polypeptide:guide complex described herein.
In some embodiments, the binding step a) comprises binding to a target RNA sequence a second complex comprising a yeast Argonaute polypeptide and a second ssDNA guide molecule, wherein the second ssDNA guide molecule is complementary to the target RNA sequence. In some embodiments, the first and second ssDNA guide molecules are not identical. That is, the first and second ssDNA guide molecules do not contain the exact same oligonucleotide sequence.
In some or further embodiments, binding the second Argonaute polypeptide:guide complex can be performed in a separate reaction compared to binding the first Argonaute polypeptide:guide complex. In some embodiments, binding the second Argonaute polypeptide:guide complex can be performed in a prior, contemporaneous, or subsequent reaction compared to binding the first Argonaute polypeptide:guide complex. As such, in some embodiments, the method can be a high throughput method. In some embodiments, binding the first and second Argonaute polypeptide:guide complexes can occur in an assay (e.g., in a 96-well or 384-well microtiter plate). In such embodiments, the target RNA may be cleaved in two (or more) locations if the first and second Argonaute polypeptide:guide complexes bind to and cleave the target RNA at different locations. Thus, the separate reaction mixtures would contain different RNA cleavage products.
Alternatively, in some or further embodiments, binding the second Argonaute polypeptide:guide complex can be performed in the same reaction mixture as the binding of the first Argonaute polypeptide:guide complex. In such embodiments and as discussed above, the target RNA may be cleaved in two (or more) locations if the first and second Argonaute polypeptide:guide complexes bind to and cleave the target RNA at different locations. However, the resultant different RNA cleavage products will be present in the same reaction mixture.
In some embodiments, the binding step a) comprises binding to a target RNA sequence a third complex comprising a yeast Argonaute polypeptide and a third single-stranded DNA oligonucleotide guide molecule (hereinafter, a "third Argonaute polypeptide:guide complex"), wherein the third single-stranded DNA oligonucleotide guide molecule is complementary to the target RNA sequence. In some embodiments, the binding step a) comprises binding to a target RNA sequence a fourth, fifth, sixth, seventh, eighth, ninth, tenth, or more Argonaute polypeptide:guide complexes. The number of Argonaute polypeptide:guide complexes which can be bound to the target RNA is not particularly limited. The primary limiting feature of the number of Argonaute polypeptide:guide complexes which can be bound to the target RNA is the number of distinct ssDNA guide molecules which can bind to the target RNA. Thus, in some embodiments, the binding step a) comprises binding to a target RNA sequence a library of Argonaute polypeptide:guide complexes.
The cleaving step b) is performed by a nuclease. The nuclease comprises the Argonaute polypeptide:guide complex. After binding a target RNA sequence, the Argonaute polypeptide:guide complex can, in some embodiments, cleave the target RNA sequence.
The detecting step c) detects whether the target RNA sequence was cleaved by detecting an RNA cleavage product. In some embodiments, the detecting step c) comprises reverse transcribing the RNA cleavage product to form a cDNA reverse transcript. In some or further embodiments, the detecting step c) comprises amplifying or extending a cDNA reverse transcript. In some embodiments, the RNA cleavage product can be reverse transcribed and extended by reverse transcription polymerase-chain reaction (RT-PCR)-coupled primer extension. Extension of the RNA cleavage product is ideally performed by binding a DNA primer complementary to the 3' end of the target RNA, then extending the primer along the RNA, using an RNA-dependent DNA polymerase, to the site of cleavage. Repetitive cycling of the RT-PCR-primer extension process amplifies the cDNA reverse transcripts.
In some or further embodiments, the cDNA reverse transcript is separated based on size. Separating cDNA reverse transcripts (e.g., separating based on size) can aid in detecting and distinguishing the RNA cleavage products (via detecting and distinguishing the cDNA reverse transcripts). In some embodiments, the cDNA reverse transcript is separated by capillary electrophoresis.
Step d) requires determining a nuclease accessibility site based on the RNA cleavage product. Nuclease accessibility sites can be determined by analyzing the RNA cleavage product or cDNA reverse transcript. In some embodiments, the nuclease accessibility site is determined by determining the 3' nucleotide in the cDNA reverse transcript. In some embodiments, the nuclease accessibility site is determined by determining the size of the cDNA reverse transcript, for example by electrophoresis, particularly capillary electrophoresis. In some or further embodiments, the nuclease accessibility site is determined based on the sequence of the RNA cleavage product, for example by sequencing the cDNA reverse transcript.
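By way of non-limiting illustration, the position of a cleavage site can be inferred from the size of the cDNA reverse transcript as sketched below. The sketch assumes that the primer anneals so that its 5' end pairs with a known target RNA position `primer_end` (1-based, counted from the 5' end of the RNA), that extension stops exactly at the 5' end of the 3' cleavage fragment, and that the measured cDNA length includes the primer and carries no untemplated nucleotides. Under those assumptions a cDNA of length n implies cleavage between RNA positions `primer_end - n` and `primer_end - n + 1`. The function and parameter names are hypothetical.

```python
def cleavage_site_from_cdna(cdna_length, primer_end):
    """Infer the cleaved phosphodiester bond from a cDNA length.

    Returns (upstream, downstream): the 1-based RNA positions flanking the
    cleavage site, or None when the cDNA runs to the RNA 5' end, which
    indicates an uncleaved template.
    """
    if not 0 < cdna_length <= primer_end:
        raise ValueError("cDNA length must be between 1 and primer_end")
    upstream = primer_end - cdna_length
    if upstream == 0:
        return None  # full-length extension product, no cleavage detected
    return (upstream, upstream + 1)


def map_cleavage_sites(cdna_lengths, primer_end):
    """Apply the inference to several detected cDNA sizes (e.g., peaks)."""
    sites = (cleavage_site_from_cdna(n, primer_end) for n in cdna_lengths)
    return sorted(site for site in sites if site is not None)


# Primer 5' end pairs with RNA position 300; three cDNA sizes were detected.
print(map_cleavage_sites([120, 300, 45], primer_end=300))
```

In practice, sizes measured by capillary electrophoresis carry some uncertainty, so a mapping of this kind would typically be calibrated against size standards or confirmed by sequencing the cDNA reverse transcript.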
In some embodiments, step d) determines only one nuclease accessibility site in the target RNA sequence. In some embodiments, more than one nuclease accessibility site is determined in the target RNA sequence. For example, two, three, four, or a plurality of nuclease accessibility sites are determined in the target RNA sequence.
In another aspect, provided herein is a method of high-throughput detection of nuclease accessibility sites, the method comprising a) assaying a target RNA sequence with two or more Argonaute polypeptide:guide complexes, wherein each complex comprises a yeast Argonaute polypeptide and a single-stranded DNA oligonucleotide guide molecule from a library of single-stranded DNA oligonucleotide guide molecules, wherein each single-stranded DNA oligonucleotide guide molecule is complementary to a portion of the target RNA sequence; b) cleaving the target RNA sequence with the Argonaute polypeptide:guide complexes to form at least one RNA cleavage product; c) detecting the at least one RNA cleavage product; and d) determining a nuclease accessibility site based on the at least one RNA cleavage product.
Each complex comprising a yeast Argonaute polypeptide and a ssDNA guide molecule from a library of ssDNA guide molecules, wherein each ssDNA guide molecule is complementary to a portion of the target RNA sequence, can be any Argonaute polypeptide:guide complex described herein. The assaying step a) requires two or more Argonaute polypeptide:guide complexes, wherein each complex comprises a ssDNA guide molecule from a library of ssDNA guide molecules. As used herein, a "library" of single-stranded DNA oligonucleotide guide molecules means two or more (e.g., three, four, five, or a plurality of) single-stranded DNA oligonucleotide guide molecules. Each ssDNA guide molecule forms a separate complex with the yeast Argonaute polypeptide. As such, the method is a high throughput method.
In some embodiments, the single-stranded DNA oligonucleotide guide molecules in the library are not identical. That is, any given two or more (e.g., first and second) ssDNA guide molecules do not contain the exact same oligonucleotide sequence.
In some or further embodiments, assaying a target RNA sequence with two or more Argonaute polypeptide:guide complexes can be performed in separate reactions. In other words, the target RNA sequence is assayed separately with each of the two or more Argonaute polypeptide:guide complexes. In some embodiments, assaying a target RNA sequence with two or more Argonaute polypeptide:guide complexes can be performed in prior, contemporaneous, or subsequent reactions. In some embodiments, assaying a target RNA sequence with two or more Argonaute polypeptide:guide complexes can occur in a 96-well or 384-well microtiter plate. In such embodiments, the target RNA may be cleaved in two (or more) locations if the two or more Argonaute polypeptide:guide complexes bind to and cleave the target RNA at different locations. Thus, the separate reaction mixtures would contain different RNA cleavage products.
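By way of non-limiting illustration, when each Argonaute polypeptide:guide complex is assayed in a separate reaction, the guides of a library can be assigned to the wells of a microtiter plate as sketched below, so that each detected cleavage product can be traced back to one guide. The row-by-row fill order, the guide names, and the function names are hypothetical choices made for this example.

```python
import string

# Rows x columns for the two plate formats mentioned above.
_PLATE_SHAPES = {96: (8, 12), 384: (16, 24)}


def plate_wells(n_wells=96):
    """List well labels row by row, e.g. A1..A12, B1.. for a 96-well plate."""
    rows, columns = _PLATE_SHAPES[n_wells]
    return [
        f"{string.ascii_uppercase[row]}{column}"
        for row in range(rows)
        for column in range(1, columns + 1)
    ]


def assign_guides_to_wells(guide_names, n_wells=96):
    """Give each guide its own well, in library order."""
    wells = plate_wells(n_wells)
    if len(guide_names) > len(wells):
        raise ValueError("library does not fit on a single plate")
    return dict(zip(wells, guide_names))


layout = assign_guides_to_wells(["guide_001", "guide_002", "guide_003"])
for well, guide in layout.items():
    print(well, guide)
```

Keeping such a layout alongside the detection results allows the RNA cleavage products found in each well to be attributed to a single ssDNA guide molecule.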
Alternatively, in some or further embodiments, assaying a target RNA sequence with two or more Argonaute polypeptide:guide complexes can be performed in the same reaction mixture. In other words, the target RNA sequence is assayed together with each of the two or more Argonaute polypeptide:guide complexes in a mixture. In such embodiments and as discussed above, the target RNA may be cleaved in two (or more) locations if the two or more Argonaute polypeptide:guide complexes bind to and cleave the target RNA at different locations. However, the resultant different RNA cleavage products will be present in the same reaction mixture.
In some embodiments, the library of ssDNA guide molecules comprises at least three, four, five, six, seven, eight, nine, ten, or more single-stranded DNA oligonucleotide guide molecules. Thus, in some embodiments, the assay step a) comprises assaying a target RNA sequence with at least three, four, five, six, seven, eight, nine, ten, or more Argonaute polypeptide:guide complexes. The number of Argonaute polypeptide:guide complexes which can be bound to the target RNA is not particularly limited. The primary limiting feature of the number of Argonaute polypeptide:guide complexes which can be bound to the target RNA is the number of distinct ssDNA guide molecules which can bind to the target RNA.
The cleaving step b) is performed by a nuclease. The nuclease comprises the Argonaute polypeptide:guide complex. After binding a target RNA sequence, the Argonaute polypeptide:guide complex can, in some embodiments, cleave the target RNA sequence.
The detecting step c) detects whether the target RNA sequence was cleaved by detecting an RNA cleavage product. In some embodiments, the detecting step c) comprises reverse transcribing the RNA cleavage product to form a cDNA reverse transcript. In some or further embodiments, the detecting step c) comprises amplifying or extending a cDNA reverse transcript. In some embodiments, the RNA cleavage product can be reverse transcribed and extended by reverse transcription polymerase-chain reaction (RT-PCR)-coupled primer extension. Extension of the RNA cleavage product is ideally performed by binding a DNA primer complementary to the 3' end of the target RNA, then extending the primer along the RNA, using an RNA-dependent DNA polymerase, to the site of cleavage. Repetitive cycling of the RT-PCR-primer extension process amplifies the cDNA reverse transcripts.
In some embodiments, the cDNA reverse transcript is separated based on size. Separating cDNA reverse transcripts (e.g., separating based on size) can aid in detecting and distinguishing the RNA cleavage products (via detecting and distinguishing the cDNA reverse transcripts). In some embodiments, the cDNA reverse transcript is separated by capillary electrophoresis.
Step d) requires determining a nuclease accessibility site based on the RNA cleavage product. Nuclease accessibility sites can be determined by analyzing the RNA cleavage product or cDNA reverse transcript. In some embodiments, the nuclease accessibility site is determined by determining the 3' nucleotide in the cDNA reverse transcript. In some embodiments, the nuclease accessibility site is determined by determining the size of the cDNA reverse transcript, for example by electrophoresis, particularly capillary electrophoresis. In some or further embodiments, the nuclease accessibility site is determined based on the sequence of the RNA cleavage product, for example by sequencing the cDNA reverse transcript.
In some embodiments, step d) determines only one nuclease accessibility site in the target RNA sequence. In some embodiments, more than one nuclease accessibility site is determined in the target RNA sequence. For example, two, three, four, or a plurality of nuclease accessibility sites are determined in the target RNA sequence.
In another aspect, provided herein is a method of detecting sites for gene expression attenuation in a cell, the method comprising: a) introducing into a biological cell a yeast Argonaute polypeptide and a library of single-stranded DNA oligonucleotide guide molecules, wherein each single-stranded DNA oligonucleotide guide molecule is complementary to a target RNA molecule; b) cleaving the target RNA sequence with the Argonaute polypeptide:guide complexes to form at least one RNA cleavage product; c) detecting the at least one RNA cleavage product; and d) determining a nuclease accessibility site based on the at least one RNA cleavage product. The biological cell can be any biological cell containing RNA. In some embodiments, the biological cell is a mammalian cell. In some embodiments, the biological cell is a human cell.
In another aspect, provided herein is a method of attenuating expression of a target gene in a cell, the method comprising a) introducing into a biological cell a yeast Argonaute polypeptide and a library of single-stranded DNA oligonucleotide guide molecules, wherein each single-stranded DNA oligonucleotide guide molecule is complementary to a target RNA molecule; and b) cleaving the target RNA sequence with the Argonaute polypeptide:guide complexes, wherein cleaving the target RNA sequence attenuates the expression of the target gene. The biological cell can be any biological cell containing RNA. In some embodiments, the biological cell is a mammalian cell. In some embodiments, the biological cell is a human cell.
In another aspect, provided herein is a method of mapping nuclease accessibility sites in an RNA sequence, the method comprising a) binding to a target RNA sequence a complex comprising a yeast Argonaute polypeptide and a first single-stranded DNA oligonucleotide guide molecule, wherein the single-stranded DNA oligonucleotide guide molecule is complementary to the target RNA sequence; b) cleaving the target RNA sequence with the Argonaute polypeptide:guide complex to form an RNA cleavage product; c) detecting the RNA cleavage product; and d) mapping a nuclease accessibility site based on the RNA cleavage product.
High-Throughput Kits
In one aspect, disclosed herein is a kit comprising a vector comprising a nucleic acid sequence encoding a yeast Argonaute polypeptide operably linked to a promoter; an RNA-dependent DNA polymerase; a set of buffered RNA cleavage reagents; and a set of buffered reverse transcription reagents. In some embodiments, the Argonaute polypeptide is from a yeast. In some embodiments, the Argonaute polypeptide is from Vanderwaltozyma polyspora (also known as Kluyveromyces polysporus). In some embodiments, the Argonaute polypeptide is from a eukaryote. In some embodiments, the Argonaute polypeptide is from a mammal. In some embodiments, the Argonaute polypeptide is from a primate. In some embodiments, the Argonaute polypeptide is from a human (for example, hAGO1, hAGO2, hAGO3, or hAGO4).
In some embodiments, the library of ssDNA guide molecules comprises ssDNA guide molecules which are not identical. That is, any given two ssDNA guide molecules do not contain the exact same oligonucleotide sequence.
The library of ssDNA guide molecules can be complementary to a target RNA sequence. The target RNA sequence, in some embodiments, is one continuous RNA molecule. Alternatively, the target RNA sequence can be a target RNA sequence on different RNA molecules. Typically, the first and second ssDNA guide molecules bind target RNA sequences which are not identical. However, depending on the nucleotides of a ssDNA guide molecule which hybridize with the target RNA sequence, the first and second ssDNA guide molecules can bind overlapping or even identical target RNA sequences.
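As an illustration of such a library, the following minimal sketch tiles ssDNA guides along a target RNA so that each guide is the DNA reverse complement of one window of the target. The window length, step size, and function names are assumptions chosen for illustration and are not specified by the disclosure.

```python
from typing import Dict

# DNA base that pairs with each RNA base of the target
RNA_TO_DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}


def guide_for_site(target_rna: str, start: int, length: int) -> str:
    """Return the ssDNA guide (5'->3') complementary to target_rna[start:start+length]."""
    site = target_rna[start:start + length]
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(site))


def tile_guides(target_rna: str, length: int = 23, step: int = 1) -> Dict[int, str]:
    """Build a guide for every window of the target, keyed by 0-based window start."""
    return {
        start: guide_for_site(target_rna, start, length)
        for start in range(0, len(target_rna) - length + 1, step)
    }
```

Overlapping windows (a small `step`) give guides that bind overlapping target sequences, while a `step` equal to `length` gives guides that bind adjacent, non-overlapping sequences.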
Non-limiting examples of approaches that can be used to introduce expression vectors encoding Argonaute into various cell types include: a nucleic acid vector (e.g., a plasmid vector) encoding Argonaute can be delivered directly to bacterial cells or cultured cells (e.g., mammalian cells) by electroporation; a nucleic acid vector (e.g., a plasmid vector) encoding Argonaute can be delivered directly to bacterial cells by chemical transformation; a viral vector (e.g., a retroviral vector, adenoviral vector, an adeno-associated viral vector, an alphavirus vector, a vaccinia viral vector, a herpes viral vector, etc., as are known in the art) comprising a nucleotide sequence encoding Argonaute can be used to deliver Argonaute to cells (e.g., mammalian cells); a baculovirus expression system can be used to deliver Argonaute to insect cells; Agrobacterium-mediated delivery can be employed in plants; and/or lipid-mediated delivery (e.g., lipofectamine, oligofectamine) can also be employed for mammalian cells.
In some embodiments, the gene sequence (for example, of a gene expressing Argonaute) may be codon optimized, without changing the resulting polypeptide sequence. In some embodiments, the codon optimization includes replacing at least one, or more than one, or a significant number, of codons with one or more codons that are more frequently used in various organisms. In some embodiments, the codon optimization increases expression of the optimized gene sequence.
Argonaute Proteins
MicroRNAs (miRNAs) are the regulatory small RNAs that control gene expression by inhibition of translation or degradation of messenger RNAs (mRNAs) containing a complementary sequence. To degrade the target mRNAs, miRNAs need to be loaded onto Argonaute (AGO) proteins, forming a ribonucleoprotein complex called the RNA-induced silencing complex (RISC). A complex of an AGO and a guide strand alone is referred to as 'the mature RISC' or simply 'RISC'. The same complex is also called 'the RISC core' in the context when the RISC stands for a huge complex including many components required for translational repression and/or deadenylation. The bound guide strand takes the RISC to the target mRNAs, which often possess the sequence complementarity to the guide in the 3' untranslated region (3' UTR).
The AGO proteins belong to the PIWI protein superfamily, defined by the presence of a PIWI (P element-induced wimpy testis) domain. In addition, all eukaryotic Argonautes (eAGOs) feature an N (N-terminal) domain, a PAZ (PIWI-Argonaute-Zwille) domain and a MID (middle) domain, along with two domain linkers, L1 and L2. Many prokaryotic genomes also feature ago genes. Long prokaryotic Argonaute proteins (pAGOs) encompass the same domains as eAGOs, whereas short pAGOs consist of only the MID and PIWI domains.
As used herein, the term "Argonaute" refers to a protein which mediates RNA cleavage and has an amino acid sequence at least 60 percent identical, and more preferably at least 75, 85, 90 or 95 percent identical to SEQ ID NO: 31. As used herein, the term "yeast Argonaute" refers to a protein, from a yeast, which mediates RNA cleavage and has an amino acid sequence at least 60 percent identical, and more preferably at least 75, 85, 90 or 95 percent identical to SEQ ID NO: 31.
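The percent-identity criterion can be illustrated with a minimal sketch. It assumes the candidate sequence has already been aligned to the reference (for example, SEQ ID NO: 31) by an external alignment tool, and it assumes identity is counted over all alignment columns; the disclosure does not state which denominator is intended, and other conventions exist. The function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns in which both sequences carry the same residue.

    Both inputs must be rows of the same pairwise alignment (equal length).
    Columns containing a gap never count as identical.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("inputs must be non-empty rows of one alignment")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * matches / len(aligned_a)


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float = 60.0) -> bool:
    """True when the aligned pair reaches the stated percent-identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

The default threshold of 60 mirrors the lowest value recited above; the stricter values (75, 85, 90, or 95) can be passed in the same way.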
In some embodiments, the Argonaute polypeptide is from a yeast. In some embodiments, the Argonaute polypeptide is from Vanderwaltozyma polyspora (also known as Kluyveromyces polysporus). Additional non-limiting examples of yeast Argonaute polypeptides can be from additional yeast species of the genus Kluyveromyces: K. aestuarii, K. africanus, K. bacillisporus, K. blattae, K. dobzhanskii, K. hubeiensis, K. lactis, K. lodderae, K. marxianus, K. nonfermentans, K. piceae, K. sinensis, K. thermotolerans, K. waltii, K. wickerhamii, or K. yarrowii. Additional non-limiting examples of yeast Argonaute polypeptides can be from Yarrowia lipolytica, Pichia pastoris, Candida vulgaris, Saccharomyces castellii, or Schizosaccharomyces pombe.
In some embodiments, the Argonaute polypeptide is from a eukaryote. In some embodiments, the Argonaute polypeptide is from a mammal. In some embodiments, the Argonaute polypeptide is from a primate. In some embodiments, the Argonaute polypeptide is from a human (for example, hAGO1, hAGO2, hAGO3, or hAGO4). The number of Argonaute family members (genes) ranges from one in Schizosaccharomyces pombe to twenty-seven in Caenorhabditis elegans. Other non-limiting examples of eukaryotic Argonaute proteins are found in Homo sapiens (8), Rattus norvegicus (8), Drosophila melanogaster (5), Arabidopsis thaliana (10), and Neurospora crassa (2). (Hock, J and G Meister. Genome Biology 2008 9:210). Argonautes are key components of RISC in mammals, fungi, worms, protozoans and plants (M.A. Carmell et al, Nat. Struct. Mol. Biol. 11, 214 (2004)).
In some aspects, provided herein is a method for cleaving an RNA molecule, comprising:
binding to a target RNA sequence a complex comprising:
an Argonaute polypeptide; and
a heterologous, single-stranded oligonucleotide guide molecule;
wherein the single-stranded oligonucleotide guide molecule is a DNA oligonucleotide that is complementary to the target RNA sequence; and
wherein the Argonaute polypeptide:guide molecule complex cleaves the target RNA sequence.
In some embodiments, the Argonaute polypeptide is a full length Argonaute polypeptide. In some embodiments, the Argonaute polypeptide comprises a portion of the Argonaute protein. In some embodiments, disclosed herein is a truncated Argonaute polypeptide termed "miniature Argonaute (mini-AGO)". In some embodiments, disclosed herein is an Argonaute polypeptide comprising SEQ ID NO:33. In some embodiments, the Argonaute polypeptide is isolated and/or purified.
In one embodiment, the Argonaute polypeptide is a wild-type sequence. In one embodiment, the Argonaute polypeptide is a sequence with at least one mutation. In one embodiment, the Argonaute polypeptide comprises an amino acid sequence that is different from a naturally-occurring Argonaute polypeptide.
In one embodiment, the Argonaute polypeptide is selected from SEQ ID NO:31, SEQ ID NO:32, or SEQ ID NO:33. In one embodiment, the Argonaute polypeptide is SEQ ID NO:31. In one embodiment, the Argonaute polypeptide is SEQ ID NO:32. In one embodiment, the Argonaute polypeptide is SEQ ID NO: 33.
In some embodiments, the system and methods may comprise additional polypeptides in addition to the Argonaute polypeptide. For example, additional components of the RISC complex may be present.
EXAMPLES
The following examples are set forth below to illustrate the compositions, systems, vectors, methods, and results according to the disclosed subject matter. These examples are not intended to be inclusive of all aspects of the subject matter disclosed herein, but rather to illustrate representative methods and results. These examples are not intended to exclude equivalents and variations of the present invention which are apparent to one skilled in the art.
Example 1. DNA-Guided RNA Cleavage by Yeast Argonaute
MicroRNAs and small interfering RNAs (siRNAs) are incorporated into Argonaute proteins to assemble the RNA-induced silencing complex, RISC (Meister, G. Nat Rev Genet 14, 447-459 (2013); Nakanishi, K. Wiley Interdiscip Rev RNA 7, 637-660 (2016); Hammond, S. M., et al. Science 293, 1146-1150 (2001)). The loaded RNAs, pre-organized in the nucleic acid-binding channel (Elkayam, E. et al. Cell 150, 100-110 (2012); Faehnle, C. R., et al. Cell Rep 3, 1901-1909 (2013); Nakanishi, K. et al. Cell Rep 3, 1893-1900 (2013); Nakanishi, K., et al. Nature 486, 368-374 (2012); Wang, Y., et al. Nature 456, 209-213 (2008); Schirle, N. T. & MacRae, I. J. Science 336, 1037-1040 (2012)), serve as guides to facilitate base pairing with targets (Parker, J. S., et al. Mol Cell 33, 204-214 (2009)). It is well known that the RISCs open their bilobal structures to widen the intervening channel during the transition from nucleation to propagation steps of guide-target duplex formation (Wang, Y. et al. Nature 461, 754-761 (2009); Schirle, N. T., et al. Science 346, 608-613 (2014)) and cleave the targets only when their sequence perfectly matches the guide (Bartel, D. P. Cell 136, 215-233 (2009); Martinez, J., et al. Cell 110, 563-574 (2002); Hutvagner, G. & Zamore, P. D. Science 297, 2056-2060 (2002)). However, the significance of the proteinaceous part of RISC for this step has not been studied well due to the difficulty of making suitable constructs.
In this example, the structure-based design of a yeast Argonaute C-terminal lobe termed "miniature Argonaute (mini-AGO)" is disclosed. Despite lacking half of the molecule, mini-AGO is capable of duplex loading, passenger ejection, and even guide-dependent target cleavage. The crystal structure shows that mini-AGO mirrors its full-length counterpart in terms of recognizing the seed of the guide strand and completing the composite catalytic tetrad. These observations demonstrate that binding of only the 5'-end seven nucleotides to the C-terminal lobe assembles and activates eukaryotic RISCs for gene silencing, unlike their prokaryotic counterparts, which remain inactive until extensive propagation of the guide-target duplex (Wang, Y. et al. Nature 461, 754-761 (2009)). It was also found that even with such similarities to its full-length counterpart, mini-AGO cannot modulate target cleavage in response to mismatches. These results suggest that the channel between the N- and C-terminal lobes of Argonaute proteins functions as a gatekeeper that scrutinizes whether the incoming targets retain sufficient base complementarity to the guide. The resultant widening of the tapered channel serves as a prerequisite to passing the gate in order to reach the catalytic site. This is the first evidence that the proteinaceous component of RISC is indeed involved in target recognition as the judge of cleavage.
To understand how the N- and C-terminal lobes cooperate to recognize guide and target strands, it is important to study the sole function of each lobe. To this end, a C-terminal construct was designed based on the crystal structure of yeast Vanderwaltozyma polyspora Argonaute (AGO in Fig. 1a, Protein Data Bank accession code 4F1N), which lacks an extended N-terminal region but retains comparable RNA interference (RNAi) activity (Nakanishi, K., et al. Nature 486, 368-374 (2012)). The two lobes are connected by two strands, β1 in the N-domain and β20 in the L2 linker domain, both of which are part of an extended β-sheet of the PIWI domain (Fig. 5a,b). Preceding β1, a conserved RxxxGxxG (R, arginine; and G, glycine) sequence motif sews through the PIWI domain, significantly stabilizing the C-terminal lobe (Fig. 1b and Fig. 6). These fragments, which were missing in the previously reported C-terminal constructs (Boland, A., et al. Proc Natl Acad Sci USA 108, 10466-10471 (2011); Hur, J. K., et al. J Biol Chem 288, 7829-7840 (2013)), were fused with the MID and PIWI domains to make a stable C-terminal lobe (Fig. 1a and Fig. 5c,d). The recombinant protein was expressed in Escherichia coli, and co-purified with tightly bound nucleic acids that were resistant to DNase but not to RNase (Fig. 1c and Fig. 7a), indicating that the C-terminal construct alone can autonomously load endogenous cellular RNAs during overexpression, like AGO (Nakanishi, K., et al. Nature 486, 368-374 (2012)). A small population of the purified protein, however, remained RNA-free and could be loaded with synthetic miR-20a guide, evidenced by the cleavage of a cap-labelled 60-nt target RNA harboring a perfectly matched sequence at the same position as did AGO (Fig. 1d,e). These results indicate that the designed C-terminal construct retains the activity to cleave RNA targets in an RNA guide-dependent manner. Therefore, this construct is hereafter referred to as miniature Argonaute (mini-AGO).
The RISC assembly with RNAs over DNAs in E. coli occurred presumably due to the circumstance in which very little 5'-monophosphorylated single-stranded DNAs (ssDNAs), if any, exist in the cytoplasm, though the genomic DNA is not spatially separated from transcripts. To test if AGO can be loaded with DNA as a guide, the recombinant protein was incubated with a synthetic 5' phosphorylated ssDNA of the genomic sequence of miR-20a (Fig. 1f), followed by addition of the cap-labeled 60-nt matched RNA target. As a result, the deoxyribonucleoprotein complex cleaved the RNA target (Fig. 1g), suggesting that eukaryotic Argonaute proteins can assemble a functional DNA-induced silencing complex (DISC) in vitro. Neither AGO-RISC nor AGO-DISC, however, cleaved the DNA target. Similarly, mini-AGO did not slice the DNA target regardless of the type of guide but it cleaved the RNA target when programmed with the DNA guide (Fig. 1h). Switching the guide from ssDNA to ssRNA increased the RNA cleavage efficiencies of AGO and mini-AGO 1.18- and 1.32-fold, respectively (Fig. 1g,h). These results indicate that the requirements for target specificity reside in the C-terminal lobe. In addition, the results demonstrate that yeast Argonaute can use either DNA or RNA as guides to cleave only RNAs. This is not consistent with the substrate specificities of prokaryotic Argonaute proteins, which exclusively use a DNA or RNA guide to target both DNA and RNA (Table 1) (Swarts, D. C. et al. Nature 507, 258-261 (2014); Kaya, E. et al. Proc Natl Acad Sci USA 113, 4057-4062 (2016); Olovnikov, I., et al. Mol Cell 51, 594-605 (2013)).
Consistency of the product lengths between AGO and mini-AGO (Fig. 1e) was a strong indication that both treat guides in the same manner. To validate this idea, the 2.1 Å crystal structure of mini-AGO with co-purified RNAs was determined (Fig. 2a and Table 2), which also proved the significance of the RxxxGxxG motif (Fig. 2b).
Table 1. Differences in the substrate specificities among Argonaute proteins
Columns: organism; guide (RNA, DNA); target (RNA, DNA); cleavage; reference
Kluyveromyces polysporus    / / /    this work
                            / / /    this work
Thermus thermophilus        /        18
                            / / /    18
Marinitoga piezophila       / / /
                            / / /
Rhodobacter sphaeroides     / /      20
Table 2. Data collection and refinement statistics (molecular replacement)
mini-AGO in complex with guide RNA
Data collection
Space group    P2
Cell dimensions
    a, b, c (Å)    119.6, 85.6, 127.9
    α, β, γ (°)    90.0, 89.8, 90.0
Resolution (Å)    50.00-2.10 (2.14-2.10)a
Rsym    15.0 (58.4)
I/σI    19.4 (2.9)
Completeness (%)    99.9 (99.4)
Redundancy    3.8 (3.7)
Refinement
Resolution (Å)    47.13-2.10 (2.14-2.10)
No. reflections    150,594
Rwork / Rfree (%)    15.76 / 20.58 (19.29 / 25.1
No. atoms
    Protein    16180
    RNA    628
    Water    1423
B factors
    Protein    32.5
    RNA    43.5
    Water    40.7
R.m.s. deviations
    Bond lengths (Å)    0.008
    Bond angles (°)    0.914
Number of crystals for each structure should be noted in footnote.
a Values in parentheses are for highest-resolution shell.
The structure showed a clear electron density map of the bound RNA whose 5' nucleotide was captured at the interface between the MID and PIWI domains while the remainder ran along the exposed nucleic acid-binding channel, as do the AGO-bound guide nucleotides 1-7 (g1-g7) (Fig. 2c and Fig. 7b-d) (Elkayam, E. et al. Cell 150, 100-110 (2012); Faehnle, C. R., et al. Cell Rep 3, 1901-1909 (2013); Nakanishi, K. et al. Cell Rep 3, 1893-1900 (2013); Nakanishi, K., et al. Nature 486, 368-374 (2012); Schirle, N. T. & MacRae, I. J. Science 336, 1037-1040 (2012)). Although the Fo-Fc map revealed continuous electron density of the 7-nt guide RNA (Fig. 7b,c), the extracted RNAs from the purified mini-AGO and crystal were longer than 7 nt (Fig. 1c and Fig. 7e), indicating that the bound guide RNAs post g7 are free to move in solution. Nevertheless, mini-AGO cleaved the target between t10 and t11, which pair to the guide strand at g10 and g11, respectively, in a guide-dependent manner (Fig. 1e). Therefore, base pairing after g7 is propagated independent of the N-terminal lobe and properly orients the target strand so that positions t10 and t11 are arranged in the vicinity of the catalytic site. The current structure also showed the rearrangement of the conserved glutamate finger that completes the catalytic tetrad (Fig. 2d), a hallmark of the catalytically active conformation (Nakanishi, K., et al. Nature 486, 368-374 (2012)). These observations demonstrate that binding of the 5'-terminal seven nucleotides to the C-terminal lobe is sufficient to drive eukaryotic Argonaute proteins to be assembled and activated for RNA cleavage without the aid of the N-terminal lobe. This highlights the difference in the activation mechanism from their prokaryotic counterparts that require an extensive propagation of guide-target base pairing towards the 3' end of the guide (Wang, Y. et al. Nature 461, 754-761 (2009)).
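The position of the scissile bond relative to the guide can be illustrated with a minimal sketch. It assumes a fully complementary site, numbers target nucleotides so that t1 pairs with g1 (making t1 the 3'-most nucleotide of the site), and reports the length of the 5' fragment that would be detected from a cap-labelled target. The function names and any sequences passed to them are hypothetical.

```python
from typing import Optional

# RNA base on the target that pairs with each guide base (DNA or RNA guide)
PAIRS_WITH = {"A": "U", "C": "G", "G": "C", "U": "A", "T": "A"}


def target_site(guide: str) -> str:
    """Return the RNA site (5'->3') fully complementary to a DNA or RNA guide."""
    return "".join(PAIRS_WITH[base] for base in reversed(guide.upper()))


def five_prime_product_length(target_rna: str, guide: str) -> Optional[int]:
    """Length of the 5' fragment after cleavage between t10 and t11.

    Returns None if the target carries no perfectly matched site.
    """
    if len(guide) < 11:
        raise ValueError("guide must reach position g11")
    site = target_site(guide)
    start = target_rna.find(site)
    if start < 0:
        return None
    # t_k sits at 0-based index start + len(site) - k, so the 5' fragment
    # ends with t11 and contains start + len(site) - 10 nucleotides.
    return start + len(site) - 10
```

Because the count is taken from the 3' end of the site, the predicted product length shifts with the position of the site in the target but not with any guide nucleotides beyond g11.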
Structural and functional data have supported that mini-AGO is a competent construct in terms of guide-dependent target cleavage, which raised the question as to whether mini-AGO can load an siRNA duplex, cleave and discard the passenger strand, and recognize and cleave target RNAs, as natural Argonaute proteins do physiologically. To this end, each stage was tested in vitro in comparison with AGO (Fig. 8a). An siRNA duplex in which one strand corresponded exactly to miR-20a (Fig. 8b) was incubated with either AGO or mini-AGO. Mini-AGO cleaved the 5'-end labeled passenger strand of the miR-20a siRNA at the expected position, as did AGO (Fig. 3a, and Fig. 8c,d). To examine passenger ejection and subsequent target recognition and cleavage, mini-AGO was pre-incubated with an unlabeled miR-20a siRNA (Fig. 8e), followed by addition of a cap-labelled target RNA containing a sequence perfectly matched to the miR-20a guide. As a result, mini-AGO generated a cleavage product of expected size, as did AGO, diagnostic of RNAi activity (Fig. 3b, and Fig. 8f,g). These data offer evidence that mini-AGO retains the identity of catalytically active Argonaute proteins.
What then is the role of the N-terminal lobe? In the case of slicer-dependent Argonaute proteins like human Argonaute2 as well as yeast Argonaute, target cleavage occurs only when there is extensive base pairing between the two strands (Martinez, J., et al. Cell 110, 563-574 (2002); Hutvagner, G. & Zamore, P. D. Science 297, 2056-2060 (2002)). Consistent with this, target RNAs whose bases break Watson-Crick pairing to the guide strand at g10 and g11, termed the t10-t11 step, were poor substrates for AGO (Fig. 3c,d left, and Fig. 9a) as previously shown (Nakanishi, K., et al. Nature 486, 368-374 (2012)). In contrast, mini-AGO was able to efficiently cleave the target including the t10-t11 mismatches (Fig. 3c,d right, and Fig. 9b). This result indicates that the N-terminal lobe is essential to modulate target cleavage in response to mismatches, which is another important feature of catalytically active Argonaute proteins.
Previous structural studies of human Argonaute2 and Thermus thermophilus Argonaute showed that the valley between the two lobes is wider when the loaded guide base-pairs with the incoming target RNA (Wang, Y., et al. Nature 456, 209-213 (2008); Wang, Y. et al. Nature 461, 754-761 (2009); Schirle, N. T., et al. Science 346, 608-613 (2014)). The present data show that loss of the N-terminal lobe makes target cleavage tolerant of the t10-t11 mismatch (Fig. 3d). The tapered channel may serve as a physical barrier to check the base complementarity between g9-g12 and t9-t12 prior to target cleavage between t10 and t11. To test this idea, miR-20a variants trimmed at their 3' ends into different sizes (10, 11, 12, 13, 14, 16, or 23 nt) were loaded into either AGO or mini-AGO, followed by addition of the cap-labelled matched target (Fig. 10a,b). The 12-nt guide promoted the onset of target cleavage by the AGO-RISC (Fig. 4a), indicating that a minimum of 12 nt of guide is required to widen the tapered channel by base pairing with the bound target. This target cleavage is enhanced by the propagation of base pairing beyond g12 (Fig. 4a). In contrast, the 12-nt guide enabled mini-AGO to promote ~30% cleavage of the matched target relative to cleavage guided by a 23-nt miR-20a (Fig. 4b). Moreover, mini-AGO was able to cleave the target almost as efficiently using a 14-nt guide as it does with a 23-nt guide (Fig. 4b). This difference between AGO and mini-AGO strongly supports the idea that the tapered channel serves as a gatekeeper of target cleavage. The difference was even more apparent for a t10-t11 mismatched target, which impaired the target cleavage by AGO regardless of the length of guide (Fig. 4c). In the absence of the tapered channel, however, targets were cleaved by mini-AGO as long as the guide and target strands formed at least 3 nt of continuous base pairs after the mismatched site (Fig. 4d and Fig. 10c,d). These results indicate that eukaryotic Argonaute proteins exploit the tapered channel to gauge the base complementarity before deciding if the target is suitable for cleavage or not.
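The 3'-trimmed guide series used in this experiment can be sketched as follows. The lengths are those recited above; the function name is hypothetical, and any guide sequence supplied by the user is a placeholder rather than the miR-20a sequence itself.

```python
from typing import Dict, Iterable

# Guide lengths (nt) tested in the trimming experiment described above
TRIMMED_LENGTHS = (10, 11, 12, 13, 14, 16, 23)


def trim_3prime(guide: str, lengths: Iterable[int] = TRIMMED_LENGTHS) -> Dict[int, str]:
    """Return 3'-truncated variants of a guide, keyed by length.

    Every variant keeps the guide's 5' end (g1 onward), so the seed is
    unchanged and only the extent of 3' base pairing differs.
    """
    variants = {}
    for n in lengths:
        if not 0 < n <= len(guide):
            raise ValueError(f"length {n} is outside 1..{len(guide)}")
        variants[n] = guide[:n]
    return variants
```

Each variant of length n can pair with the target only through position gn, which is what allows the onset of cleavage to be attributed to pairing at a particular guide position.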
The structure of RISCs revealed the solvent-exposed g2-g4 of Argonaute-bound guide RNAs (Elkayam, E. et al. Cell 150, 100-110 (2012); Nakanishi, K., et al. Nature 486, 368-374 (2012); Wang, Y., et al. Nature 456, 209-213 (2008); Schirle, N. T. & MacRae, I. J. Science 336, 1037-1040 (2012); Schirle, N. T., et al. Science 346, 608-613 (2014)) from which the unidirectional base pairing nucleates and propagates towards the 3' end of the guide (Yao, C, et al. Mol Cell 59, 125-132 (2015)). It was next examined whether mini-AGO uses g2-g4 as the primary seed as well. Since the 14-nt guide catalyzed cleavage almost as efficiently as the 23-nt guide (Fig. 4b), systematic dinucleotide mismatches were made on the 14-nt guide to evaluate how the mismatches affect seed-dependent target cleavage. Either matched or mismatched guides were loaded into AGO and mini-AGO, followed by addition of the cap-labeled 60-nt target. Two-nt mismatches within the g2-g4 window affected the 14-nt guide-dependent target cleavage by AGO (Fig. 4e). However, the same guides enabled mini-AGO to cleave the target as efficiently as the perfectly matched guide, indicating that g2-g4 no longer served as a guide in the absence of the N-terminal lobe. This result suggests that making all the bases of the bound RNA accessible to target RNAs simultaneously results in shortening the requirement for nucleotides capable of serving as a guide. Mini-AGO was also more tolerant to 2-nt mismatches within the g8-g11 window than AGO (Fig. 4e), supporting the aforementioned gatekeeper model of the tapered channel.
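A systematic dinucleotide mismatch series of this kind can be sketched as follows. The sketch assumes each mismatch is made by replacing two adjacent guide bases with their Watson-Crick complements, which guarantees a mismatch against a target matched to the original guide; the disclosure does not state which substitutions or window spacing were used, so the substitution rule, the `step` parameter, and the function name are assumptions.

```python
from typing import Dict, Optional, Tuple

# Swapping a guide base for its complement mismatches it against a target
# base that paired with the original guide base.
COMPLEMENT_RNA = {"A": "U", "U": "A", "C": "G", "G": "C"}


def dinucleotide_mismatches(
    guide: str, first: int = 2, last: Optional[int] = None, step: int = 1
) -> Dict[Tuple[int, int], str]:
    """Return guide variants mismatched at adjacent positions (g, g+1).

    Positions are 1-based guide positions; windows start at `first`,
    advance by `step`, and never extend past `last` (default: guide length).
    """
    last = len(guide) if last is None else last
    variants = {}
    for g in range(first, last, step):
        bases = list(guide)
        for pos in (g, g + 1):
            bases[pos - 1] = COMPLEMENT_RNA[bases[pos - 1]]
        variants[(g, g + 1)] = "".join(bases)
    return variants
```

With `step=1` the windows overlap (g2-g3, g3-g4, and so on); with `step=2` they tile the guide without overlap.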
It has been thought that Argonaute merely serves as a scaffold to pre-organize the loaded guide in order to facilitate base pairing with targets (Parker, J. S., et al. Mol Cell 33, 204-214 (2009); Ma, J. B. et al. Nature 434, 666-670 (2005)). Meanwhile, target recognition by RISC has been defined solely on the base pairing between guide and target strands (Bartel, D. P. Cell 136, 215-233 (2009); Hutvagner, G. & Zamore, P. D. Science 297, 2056-2060 (2002)). The data demonstrate that the proteinaceous component of RISC indeed scrutinizes the base complementarity as the judge of target cleavage. A recent study using a single-molecule approach proposed that Argonaute proteins reshape the binding properties of the loaded guide (Salomon, W. E., et al. Cell 162, 84-95 (2015)). These results suggest that the two lobes of Argonaute proteins and the loaded guide build a 'composite target-binding channel', implying a possibility that the target specificity of even the same guide is different, for example, among four human Argonaute proteins due to the unique structures of their composite channels (Hauptmann, J., et al. RNA 20, 1532-1538 (2014); Schurmann, N., et al. Nat Struct Mol Biol 20, 818-826 (2013); Hauptmann, J. et al. Nat Struct Mol Biol 20, 814-817 (2013)).
Materials and methods
Cloning, expression, and purification of mini-AGO.
DNA encoding the designed mini-AGO from K. polysporus Argonaute was generated by first amplifying the MID-PIWI lobe from K. polysporus Ago1 using Primer Set I (FW1: GACATTTTGACAGGTTCAGGTAGAGTACCATCTCGTATTCTAGATGCCCC (SEQ ID NO:1) & RV1: GCGCGCCTCGAGTCAAATGTAATACATTACGGATTTAATGTTATCG (SEQ ID NO:2)) followed by a second PCR amplification using Primer Set II (FWII: GCGCGCGGATCCATCTATAAAGTTGAAAATAGACATGATTATGGTACTAAAGGTACTAAAGTTGAC (SEQ ID NO:3) & RVII: GGTACTCTACCTGAACCTGTCAAAATGTCAACTTTAGTACCTTTAGTACCATAATCATGTC (SEQ ID NO:4)) to fuse two Ago1 fragments, Ile221-Thr241 and Arg728-Ile1251, with a Gly-Ser-Gly linker. The gene was cloned into a modified pRSF Duet vector (Novagen) containing an amino-terminal Ulp1-cleavable His6-SUMO tag. Mini-AGO was overexpressed in E. coli BL21 (DE3) Rosetta2 (Novagen). Cell extract was prepared by homogenization in Buffer A (10 mM phosphate buffer pH 7.3, 2 M NaCl, 25 mM imidazole, 10 mM β-mercaptoethanol, 1 mM phenylmethylsulfonyl fluoride) and clarified by centrifugation. The supernatant was loaded onto a nickel column (GE Healthcare), washed with Buffer A, and eluted with a linear gradient to 100% Buffer B (10 mM phosphate buffer pH 7.3, 1 M NaCl, 750 mM imidazole, 10 mM β-mercaptoethanol). Fractions containing mini-AGO were mixed with Ulp1 protease and dialyzed overnight against Buffer C (10 mM phosphate buffer pH 7.3, 500 mM NaCl, 20 mM imidazole, 10 mM β-mercaptoethanol) and the digested protein was loaded onto a nickel column (GE Healthcare) to remove the cleaved His6-SUMO tag. The flow-through sample containing mini-AGO was dialyzed against Buffer D (10 mM phosphate buffer pH 7.3, 10 mM β-mercaptoethanol), loaded onto a SP column (GE Healthcare), and eluted with a linear gradient to 70% Buffer E (10 mM phosphate buffer pH 7.3, 2 M NaCl, 10 mM β-mercaptoethanol). Fractions containing mini-AGO were dialyzed against Buffer D, loaded onto a MonoQ column (GE Healthcare), and eluted with a linear gradient to 100% Buffer E. Mini-AGO was again dialyzed against Buffer D and loaded onto a MonoS column (GE Healthcare) and eluted over a linear gradient to 14% Buffer E. The eluted protein was dialyzed against Buffer F (10 mM Tris-HCl pH 7.5, 200 mM NaCl, 5 mM DTT), concentrated by ultrafiltration, and loaded onto a HiLoad 16/600 Superdex 200 column (GE Healthcare) equilibrated with Buffer F. Purified mini-AGO was concentrated to approximately 40 mg/mL as measured by Bradford Assay (Bio-Rad), and stored at -80 °C.
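The two-step PCR fusion described above relies on the reverse primer of Primer Set II overlapping the forward primers. A minimal sketch of a check for that overlap follows. The primer strings are the sequences recited above written without the spacing introduced by text extraction, which is an assumption about the intended sequences; the helper names are hypothetical.

```python
# Primer sequences as recited above (5'->3'), with extraction spacing removed
FW1 = "GACATTTTGACAGGTTCAGGTAGAGTACCATCTCGTATTCTAGATGCCCC"
FWII = "GCGCGCGGATCCATCTATAAAGTTGAAAATAGACATGATTATGGTACTAAAGGTACTAAAGTTGAC"
RVII = "GGTACTCTACCTGAACCTGTCAAAATGTCAACTTTAGTACCTTTAGTACCATAATCATGTC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def shared_overlap(upstream: str, downstream: str) -> str:
    """Longest suffix of `upstream` that is also a prefix of `downstream`."""
    for k in range(min(len(upstream), len(downstream)), 0, -1):
        if upstream.endswith(downstream[:k]):
            return downstream[:k]
    return ""


# Sense-strand view of RVII; its 5' part should match the 3' end of FWII
# and its 3' part should match the 5' end of FW1.
rvii_sense = reverse_complement(RVII)
overlap_with_fwii = shared_overlap(FWII, rvii_sense)
overlap_with_fw1 = shared_overlap(rvii_sense, FW1)
```

Under these assumptions, the sense-strand view of RVII bridges the 3' end of FWII and the 5' end of FW1, and the bridging region contains GGTTCAGGT, which reads as Gly-Ser-Gly if it lies in frame.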
Structure determination and refinement
Initial protein crystals were obtained by sitting-drop vapor-diffusion at 20 °C in 100 mM sodium citrate pH 5.5 and 15% PEG6000 and optimized in 100 mM sodium citrate pH 5.5 and 18% PEG4000. Crystals were soaked in collection buffer containing 1.1-fold reservoir buffer and cryoprotected with 25% glycerol. Diffraction data sets were collected at the NE-CAT beamlines (Advanced Photon Source, Chicago) at 0.97918 Å wavelength and processed with HKL2000 (Otwinowski, Z. & Minor, W. Method Enzymol 276, 307-326 (1997)). Data collection and refinement statistics are listed in Table 2. Molecular replacement was performed with PHASER (McCoy, A. J. et al. J Appl Crystallogr 40, 658-674 (2007)) using the amino acids 728-1251 of the previously determined structure of K. polysporus Argonaute (AGO, PDB accession code 4F1N) as the search model for the MID-PIWI lobe. The remainder of the model (Ile221-Thr241-Gly-Ser-Gly) was built manually with COOT (Emsley, P., et al. Acta Crystallogr D Biol Crystallogr 66, 486-501 (2010)), and then the entire model was improved with iterative cycles of refinement with Phenix (Adams, P. D. et al. Acta Crystallogr D Biol Crystallogr 66, 213-221 (2010)). Ramachandran plot analysis was performed by PROCHECK (CCP4) (Winn, M. D. et al. Acta Crystallogr D Biol Crystallogr 67, 235-242 (2011)) and showed 90.3% and 9.8% of the protein residues in the favored and allowed regions, respectively, with no residues in disallowed regions. OMIT maps were generated by Phenix (Adams, P. D. et al. Acta Crystallogr D Biol Crystallogr 66, 213-221 (2010)), and hydrogen-bond interactions were identified by CONTACT (CCP4) (Winn, M. D. et al. Acta Crystallogr D Biol Crystallogr 67, 235-242 (2011)). All figures of structures were generated using PyMol (Schrodinger, LLC. The AxPyMOL Molecular Graphics Plugin for Microsoft PowerPoint, Version 1.8 (2015)).
Analysis of copurifying nucleic acid
Polynucleotides were extracted from either AGO, mini-AGO, or water (for mock) by phenol:chloroform extraction and dephosphorylated with Alkaline Phosphatase (Roche) by incubation at 37 °C for 30 minutes. Reactions were quenched by the addition of EDTA to a final concentration of 10 mM, followed by inactivation of the phosphatase by incubation at 70 °C for 30 minutes. Prior to 5' labelling, samples were supplemented with 10 mM MgCl2. 5' end-labelling reactions were performed in a 30 μL reaction containing 3 μL heat-inactivated dephosphorylation reaction, 3 μL 10× OptiKinase buffer (USB), 2 μL OptiKinase (USB), and 0.5 μL [γ-32P]ATP (3,000 Ci mmol⁻¹). End-labelling reactions were incubated at 37 °C for 40 minutes before aliquoting into three equal volumes and treating with either RNase A (USB), RQ-1 RNase-free DNase I (Promega), or neither, for 20 minutes at 37 °C. Samples were resolved by 16% denaturing PAGE alongside a base-hydrolyzed 45-nt polyuridine ladder. Gels were visualized by phosphorimaging (Typhoon, GE Healthcare). Analysis of RNA that crystallized with mini-AGO was performed similarly except that RNA was extracted from ~70 crystals. Each crystal was individually harvested, rinsed three times in crystallization buffer (100 mM sodium citrate pH 5.5 and 18% PEG4000), and then dissolved in water prior to dephosphorylation and end-labelling.
Expression of RxxxGxxG point mutants for solubility assay
Point mutations at Arg227, Gly231, or Gly234 were introduced by PCR-based mutagenesis to generate vectors encoding mutant mini-AGO. The mutants were overexpressed in E. coli BL21 (DE3) Rosetta2 (Novagen). After ultrasonication, the cell lysate was centrifuged to separate the soluble fraction from the pellet. The pellet was resuspended in the original volume using Buffer A. Representative samples of the supernatant and pellet for each construct were resolved by SDS-PAGE.
Expression and purification of K. polysporus AGO
K. polysporus Argonaute Thr207-Ile1251, AGO, was expressed and purified as previously described⁷. The concentration of purified AGO was determined by Bradford assay (Bio-Rad) and stock aliquots were stored at -80 °C.
Substrate preparation
A list of RNA and DNA oligonucleotides used in this study is provided (Tables 3, 4, 5, and 6). 5' phosphorylated guide RNAs were chemically synthesized (Dharmacon), deprotected, and gel-purified. DNA guides were chemically synthesized (Sigma Aldrich), 5' phosphorylated using OptiKinase (Affymetrix), and gel-purified. The sequences encoding target RNAs were cloned into pUC19 vector and transcribed in vitro using T7 RNA polymerase. DNase-treated transcripts were gel-purified, capped using the ScriptCap m7G Capping System (CellScript) either with GTP for unlabeled targets or with [α-32P]GTP (3000 Ci mmol⁻¹) for cap-labelled target RNAs, and gel-purified again. DNA target was chemically synthesized (Sigma Aldrich) and 5' end-labelled with OptiKinase (Affymetrix) and [γ-32P]ATP (3000 Ci mmol⁻¹) before gel purification. For passenger strand cleavage assays, 5'-OH RNA was phosphorylated using OptiKinase (Affymetrix) either with ATP for unlabeled passenger strands or with [γ-32P]ATP (3000 Ci mmol⁻¹) for 5'-32P labelled passenger strands. siRNA duplexes were prepared as described previously (Nakanishi, K., et al. Nature 486, 368-374 (2012)).
Table 3. List of guide RNA of varying length
Figure imgf000052_0001
Table 5. List of 14-nt guide RNA used for dinucleotide mismatch assay
(Bases mismatched to the target are underscored and shown in bold).
Figure imgf000052_0002
miR-20a g10g11 5'p UAAAGUGCUCCUAG (SEQ ID NO: 23)
miR-20a g11g12 5'p UAAAGUGCUUCCAG (SEQ ID NO: 24)
miR-20a g12g13 5'p UAAAGUGCUUACCG (SEQ ID NO: 25)
miR-20a g13g14 5'p UAAAGUGCUUAUCU (SEQ ID NO: 26)
Table 6. List of passengers and targets
(Bases across from the guide g10-g11 are underscored).
miR-20a 23-nt passenger 5'p ACCUGCACUAUAAGCACUUUAAG (SEQ ID NO: 27)
miR-20a 60-nt perfect match RNA target
5'm7GpppGGGAGAAACAAAAAUACCUACCUGCACUAUAAGCACUUUACCAUCUCAAACUUACUCAGA (SEQ ID NO: 28)
miR-20a 60-nt 10'-11' mismatch RNA target
5'm7GpppGGGAGAAACAAAAAUACCUACCUGCACUAAUAGCACUUUACCAUCUCAAACUUACUCAGA (SEQ ID NO: 29)
miR-20a 60-nt perfect match DNA target
5'pGGGAGAAACAAAAATACCTACCTGCACTATAAGCACTTTACCATCTCAAACTTACTCAGA (SEQ ID NO: 30)
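As an illustration of how the guides in Table 5 relate to the targets in Table 6, the following minimal sketch locates the target window that pairs with a guide and reports which guide positions (counted from the guide 5' end) are mismatched. The sequences are copied from SEQ ID NOs: 23, 28, and 29 without the cap; the function names `reverse_complement` and `best_site` are hypothetical helpers introduced only for this example, and simple Watson-Crick pairing is assumed (G:U wobble pairs are counted as mismatches).

```python
# Sequences copied from Tables 5 and 6 (cap omitted); helper names are hypothetical.
TARGET_MATCH = "GGGAGAAACAAAAAUACCUACCUGCACUAUAAGCACUUUACCAUCUCAAACUUACUCAGA"
TARGET_MISMATCH = "GGGAGAAACAAAAAUACCUACCUGCACUAAUAGCACUUUACCAUCUCAAACUUACUCAGA"
GUIDE_G10G11 = "UAAAGUGCUCCUAG"  # SEQ ID NO: 23

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence (Watson-Crick only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def best_site(guide, target):
    """Find the target window with the fewest mismatches to the guide.

    Returns (start, mismatches): start is the 0-based index of the window in
    the target, and mismatches lists 1-based guide positions that do not pair.
    """
    n = len(guide)
    best = None
    for start in range(len(target) - n + 1):
        # A perfectly matched guide equals the reverse complement of its site.
        ideal_guide = reverse_complement(target[start:start + n])
        mismatches = [i + 1 for i, (g, p) in enumerate(zip(guide, ideal_guide)) if g != p]
        if best is None or len(mismatches) < len(best[1]):
            best = (start, mismatches)
    return best


if __name__ == "__main__":
    print(best_site(GUIDE_G10G11, TARGET_MATCH))
```

Under these assumptions, the g10g11 guide pairs with the perfect match target everywhere except at guide positions 10 and 11, which is the dinucleotide mismatch the table describes.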
AGO Cleavage Assays
For all biochemical assays, stock AGO and mini-AGO were diluted and stored in dilution buffer (Buffer F + 0.5 mg mL⁻¹ Ultrapure BSA (Ambion)) at -80 °C. All assays were performed in 1× reaction buffer (20 mM Tris-HCl pH 7.5, 150 mM NaCl, 1 mM MgCl2, 1 mM DTT, 5% glycerol), 0.05 mg mL⁻¹ BSA (Ambion), and 4 U RiboLock RNase Inhibitor (Thermo Scientific). For all guide-mediated cleavage assays, 1 μM of either AGO or mini-AGO was mixed with 50 nM guide RNA and incubated at 25 °C for 30 minutes to form the RISC. Cleavage was initiated by adding 1 μL 5'-capped target RNA (final concentration, 10 nM) with trace amounts of 32P-cap-labelled target in a 10 μL reaction and incubated at 30 °C for 20 minutes before quenching with 10 μL formamide loading buffer (95% formamide, 18 mM EDTA, 0.025% sodium dodecyl sulfate, 0.025% bromophenol blue, 0.025% xylene cyanol). For time-course reactions, 10 μL reactions were prepared similarly except that 3 μL aliquots were removed at the indicated time points and quenched by addition of 10 μL formamide loading buffer. For passenger strand cleavage assays (shown in Figure 3a), 1 μL of 10 nM 32P-passenger-strand-labelled siRNA duplex was added to a 9 μL mixture containing 1× reaction buffer and 1 μM either AGO or mini-AGO. Reactions were quenched at the indicated time points by addition of formamide loading buffer. For siRNA-mediated target cleavage (shown in Figure 3b), 1 μM of either AGO or mini-AGO was pre-incubated with an unlabeled siRNA duplex at 30 °C for 30 minutes to allow for passenger strand cleavage and RISC maturation, followed by addition of 10 nM cap-labelled target RNA. Reactions were quenched at the indicated time points by addition of formamide loading buffer. For all guide-mediated cleavage assays, AGO or mini-AGO was pre-incubated with a single-stranded synthetic guide RNA at 25 °C for 30 minutes before addition of cap-labelled target RNAs at 30 °C for 20 minutes.
Reactions were quenched with formamide loading dye, resolved by 16% denaturing PAGE, and visualized by phosphorimaging. Gels were quantified by ImageQuant (GE Healthcare). Cleavage assays using either DNA guides or DNA targets were performed similarly.
All cleavage percentages were calculated using the equations listed below and averaged over three independent experiments.
Equation 1.
Cleavage by k-nt guide was calculated using the following equation:
Pk = 100 × [Ck/(Ck + Uk)]
Equation 2.
The relative cleavage by k-nt guide was calculated using the following equation:
Rk = 100 × [Ck/(Ck + Uk)]/[C23/(C23 + U23)]
where Ck and Uk are the intensities of the cleaved and uncleaved bands, respectively.
Equation 3.
The relative cleavage by k-nt guide was calculated using the following equation:
Rk' = 100 × [Ck/(Ck + Uk)]/[C14/(C14 + U14)]
where C and U are the intensities of the cleaved and uncleaved bands, respectively.
Equation 4.
The relative cleavage percentage of the t10-t11 mismatch target for incubation of j min was calculated using the following equation:
Figure imgf000055_0001
where Cmis and Umis are the intensities of the cleaved and uncleaved bands derived from the t10-t11 mismatch target, respectively, while Cmatch,j and Umatch,j are the intensities of the cleaved and uncleaved bands derived from the match target, respectively.
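Equations 1 to 3 can be expressed as a short calculation. The sketch below is illustrative only: the function names and the example band intensities are hypothetical, and the reference pair stands for the 23-nt guide (Equation 2) or the 14-nt guide (Equation 3).

```python
from statistics import mean


def percent_cleavage(cleaved, uncleaved):
    """Equation 1: P = 100 x C / (C + U) for one lane."""
    total = cleaved + uncleaved
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return 100.0 * cleaved / total


def relative_cleavage(cleaved, uncleaved, ref_cleaved, ref_uncleaved):
    """Equations 2 and 3: cleavage by a k-nt guide relative to a reference guide."""
    reference = percent_cleavage(ref_cleaved, ref_uncleaved)
    if reference == 0:
        raise ValueError("reference guide shows no cleavage")
    return 100.0 * percent_cleavage(cleaved, uncleaved) / reference


if __name__ == "__main__":
    # Hypothetical intensities (cleaved, uncleaved) from three independent experiments.
    replicates = [(30.0, 70.0), (28.0, 72.0), (33.0, 67.0)]
    print(mean(percent_cleavage(c, u) for c, u in replicates))
```

Averaging is applied to the per-experiment percentages, matching the statement that values were averaged over three independent experiments.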
Free energy values for guide-target pairs and base pairing schematics shown in Figure 10c were calculated with RNAstructure (Reuter, J. S. & Mathews, D. H. BMC Bioinformatics 11, 129 (2010)).
References cited in this example
1 Meister, G. Argonaute proteins: functional insights and emerging roles. Nat Rev Genet 14, 447-459, doi: 10.1038/nrg3462 (2013).
2 Nakanishi, K. Anatomy of RISC: how do small RNAs and chaperones activate Argonaute proteins? Wiley Interdiscip Rev RNA 7, 637-660, doi: 10.1002/wrna.1356 (2016).
3 Hammond, S. M., Boettcher, S., Caudy, A. A., Kobayashi, R. & Hannon, G. J. Argonaute2, a link between genetic and biochemical analyses of RNAi. Science 293, 1146-1150, doi: 10.1126/science.1064023 (2001).
4 Elkayam, E. et al. The structure of human argonaute-2 in complex with miR-20a. Cell 150, 100-110, doi: 10.1016/j.cell.2012.05.017 (2012).
5 Faehnle, C. R., Elkayam, E., Haase, A. D., Hannon, G. J. & Joshua-Tor, L. The making of a slicer: activation of human Argonaute-1. Cell Rep 3, 1901-1909, doi: 10.1016/j.celrep.2013.05.033 (2013).
6 Nakanishi, K. et al. Eukaryote-specific insertion elements control human ARGONAUTE slicer activity. Cell Rep 3, 1893-1900, doi: 10.1016/j.celrep.2013.06.010 (2013).
7 Nakanishi, K., Weinberg, D. E., Bartel, D. P. & Patel, D. J. Structure of yeast Argonaute with guide RNA. Nature 486, 368-374, doi: 10.1038/nature11211 (2012).
8 Wang, Y., Sheng, G., Juranek, S., Tuschl, T. & Patel, D. J. Structure of the guide-strand-containing argonaute silencing complex. Nature 456, 209-213, doi: 10.1038/nature07315 (2008).
9 Schirle, N. T. & MacRae, I. J. The crystal structure of human Argonaute2. Science 336, 1037-1040, doi: 10.1126/science.1221551 (2012).
10 Parker, J. S., Parizotto, E. A., Wang, M., Roe, S. M. & Barford, D. Enhancement of the seed-target recognition step in RNA silencing by a PIWI/MID domain protein. Mol Cell 33, 204-214, doi: 10.1016/j.molcel.2008.12.012 (2009).
11 Wang, Y. et al. Nucleation, propagation and cleavage of target RNAs in Ago silencing complexes. Nature 461, 754-761, doi: 10.1038/nature08434 (2009).
12 Schirle, N. T., Sheu-Gruttadauria, J. & MacRae, I. J. Structural basis for microRNA targeting. Science 346, 608-613, doi: 10.1126/science.1258040 (2014).
13 Bartel, D. P. MicroRNAs: target recognition and regulatory functions. Cell 136, 215-233, doi: 10.1016/j.cell.2009.01.002 (2009).
14 Martinez, J., Patkaniowska, A., Urlaub, H., Luhrmann, R. & Tuschl, T. Single-stranded antisense siRNAs guide target RNA cleavage in RNAi. Cell 110, 563-574 (2002).
15 Hutvagner, G. & Zamore, P. D. A microRNA in a multiple-turnover RNAi enzyme complex. Science 297, 2056-2060, doi: 10.1126/science.1073827 (2002).
16 Boland, A., Huntzinger, E., Schmidt, S., Izaurralde, E. & Weichenrieder, O. Crystal structure of the MID-PIWI lobe of a eukaryotic Argonaute protein. Proc Natl Acad Sci USA 108, 10466-10471, doi: 10.1073/pnas.1103946108 (2011).
17 Hur, J. K., Zinchenko, M. K., Djuranovic, S. & Green, R. Regulation of Argonaute slicer activity by guide RNA 3' end interactions with the N-terminal lobe. J Biol Chem 288, 7829-7840, doi: 10.1074/jbc.M112.441030 (2013).
18 Swarts, D. C. et al. DNA-guided DNA interference by a prokaryotic Argonaute. Nature 507, 258-261, doi: 10.1038/nature12971 (2014).
19 Kaya, E. et al. A bacterial Argonaute with noncanonical guide RNA specificity. Proc Natl Acad Sci USA 113, 4057-4062, doi: 10.1073/pnas.1524385113 (2016).
20 Olovnikov, I., Chan, K., Sachidanandam, R., Newman, D. K. & Aravin, A. A. Bacterial argonaute samples the transcriptome to identify foreign DNA. Mol Cell 51, 594-605, doi: 10.1016/j.molcel.2013.08.014 (2013).
21 Yao, C., Sasaki, H. M., Ueda, T., Tomari, Y. & Tadakuma, H. Single-Molecule Analysis of the Target Cleavage Reaction by the Drosophila RNAi Enzyme Complex. Mol Cell 59, 125-132, doi: 10.1016/j.molcel.2015.05.015 (2015).
22 Ma, J. B. et al. Structural basis for 5'-end-specific recognition of guide RNA by the A. fulgidus Piwi protein. Nature 434, 666-670, doi: 10.1038/nature03514 (2005).
23 Salomon, W. E., Jolly, S. M., Moore, M. J., Zamore, P. D. & Serebrov, V. Single-Molecule Imaging Reveals that Argonaute Reshapes the Binding Properties of Its Nucleic Acid Guides. Cell 162, 84-95, doi: 10.1016/j.cell.2015.06.029 (2015).
24 Hauptmann, J., Kater, L., Löffler, P., Merkl, R. & Meister, G. Generation of catalytic human Ago4 identifies structural elements important for RNA cleavage. RNA 20, 1532-1538, doi: 10.1261/rna.045203.114 (2014).
25 Schurmann, N., Trabuco, L. G., Bender, C., Russell, R. B. & Grimm, D. Molecular dissection of human Argonaute proteins by DNA shuffling. Nat Struct Mol Biol 20, 818-826, doi: 10.1038/nsmb.2607 (2013).
26 Hauptmann, J. et al. Turning catalytically inactive human Argonaute proteins into active slicer enzymes. Nat Struct Mol Biol 20, 814-817, doi: 10.1038/nsmb.2577 (2013).
27 Otwinowski, Z. & Minor, W. Processing of X-ray diffraction data collected in oscillation mode. Method Enzymol 276, 307-326, doi: 10.1016/S0076-6879(97)76066-X (1997).
28 McCoy, A. J. et al. Phaser crystallographic software. J Appl Crystallogr 40, 658-674, doi: 10.1107/S0021889807021206 (2007).
29 Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. Features and development of Coot. Acta Crystallogr D Biol Crystallogr 66, 486-501, doi: 10.1107/S0907444910007493 (2010).
30 Adams, P. D. et al. PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr D Biol Crystallogr 66, 213-221, doi: 10.1107/S0907444909052925 (2010).
31 Winn, M. D. et al. Overview of the CCP4 suite and current developments. Acta Crystallogr D Biol Crystallogr 67, 235-242, doi: 10.1107/S0907444910045749 (2011).
32 Schrodinger, LLC. The AxPyMOL Molecular Graphics Plugin for Microsoft PowerPoint, Version 1.8 (2015).
33 Reuter, J. S. & Mathews, D. H. RNAstructure: software for RNA secondary structure prediction and analysis. BMC Bioinformatics 11, 129, doi: 10.1186/1471-2105-11-129 (2010).
Example 2. In Vivo RNA Cleavage
AGO and/or mini-AGO purified from E. coli cells is already loaded with cellular endogenous RNAs, while a small population of the purified protein remains in an RNA-free form that can load any synthetic ssDNA guide. To purify a homogeneous DISC (i.e., a complex of Argonaute protein loaded with ssDNA), a quadruplex magnesium connection (QMC) method is used (see Kankia, B. Sci. Rep. 5: 12996 (2015)). Two DNA fragments use their intermolecular quadruplex to bind to each other. The purified AGO or mini-AGO is incubated with a 5' monophosphorylated ssDNA whose 3' end is covalently connected to one half of the QMC, followed by fishing out only the programmed DISC using the counterpart of the QMC (see Figure 1 of Kankia, B. Sci. Rep. 5: 12996 (2015)). In one example, programmed AGO and/or mini-AGO is used to cleave a viral RNA sequence (HIV RNA).
To deliver DISC to live cells, the TAT (trans-activator of transcription) peptide from the HIV-1 virus is used. Different variations (lengths) of the TAT peptide can be used to deliver the DISC to cells. RNA cleavage in vivo is evaluated by either of two methods: Northern blot analysis using 5' radiolabeled DNA probes complementary to the RNA of interest, with detection by phosphorimaging; or Western blot analysis, in which RNA cleavage is inferred from the downstream levels of the encoded protein. Other methods for delivering vectors, nucleic acids, proteins, or compositions to cells are known in the art (for example, viral vectors, lipid particles, etc.).
Example 3. High Throughput DNA-Guided RNA Cleavage by Yeast Argonaute
Native ribonucleoprotein cellular defense systems, such as Argonaute (AGO) and clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9, exploit the base-pairing capability of single-stranded (ss) RNAs to recognize and cleave target nucleic acids. Their highly specific guide RNA (gRNA)-dependent cleavage has been applied to RNA interference (RNAi), genome editing, and programmable enzymatic restriction. The effector complex of AGO, termed RNA-induced silencing complex (RISC), contains a ss-gRNA that exposes only three 5' nucleotides at positions 2-4 to solvent to scan target RNAs. Once the three nucleotides are fully base paired to a complementary sequence, the remainder of the guide base pairs with the target. In contrast to CRISPR/Cas9, which requires a 3- to 6-nt protospacer adjacent motif in the target DNA, the target specificity of RISC relies only on the sequence of the gRNA with no requirement for the target sequence. Therefore, the scanning mechanism of RISC is able to search for accessible nucleotides of highly-structured RNAs without any sequence requirements.
Some prokaryotic AGOs use DNA guides (gDNAs) for their native activity, which has led to the recent development of DNA-guided programmable DNA-endonucleases. A recent study reported that human AGO2 is a catalytically active RNase in the presence of a gDNA. Such a DNA-induced slicing complex (DISC) has advantages over RISCs due to the increased stability and significantly lower cost of DNA over RNA, making large-scale high-throughput applications more feasible. Here, the capability of a previously characterized variant of the budding yeast K. polysporus AGO1, AGOΔexN (FIG. 14), to function as part of a DISC is shown. AGOΔexN normally uses a 5' monophosphorylated 23-nucleotide (nt) gRNA to cleave complementary RNA targets. To test if gDNA could activate AGOΔexN to cleave RNA or DNA targets, the recombinant protein was loaded with miR-20a-derived gRNA or gDNA, followed by addition of either a complementary RNA or DNA target (FIGs. 11a and 11b). AGOΔexN bound with gDNA was able to cleave target RNA almost as efficiently as with the canonical gRNA (FIG. 11c and FIG. 14), demonstrating that gDNA can activate yeast AGO as a functional DISC. Accuracy of DNA-guided RNA cleavage was validated by introducing a dinucleotide mismatch at a non-permissive site in the target strand, which is known to inhibit cleavage by yeast AGO. Cleavage of the mismatched target by DISC was not detected, whereas RISC showed minor but detectable cleavage (FIG. 11d). These results highlight the specificity of DISC against RNA targets, as well as the superiority of gDNA over gRNA in terms of guide-dependent RNA cleavage.
Towards developing a DISC-based approach capable of identifying and targeting accessible regions of highly-structured RNAs in a high-throughput manner, a variant of the 352-nt RNA derived from the human immunodeficiency virus type 1 (HIV-1) 5' untranslated region (UTR) was used as a target RNA. Structured sub-domains include the transactivation response (TAR; nt 1-57) element, poly(A)denylation signal (poly(A); nt 58-104), primer-binding site (PBS; nt 125-223), and genomic RNA packaging domain (Psi; nt 228-334) (FIG. 15). Numerous studies highlight the functional importance of each of these domains for viral replication. The structurally characterized dimerization initiation signal (DIS) mutant called ΔDIS (FIG. 15) was used to reduce technical complications associated with RNA dimerization. To survey accessible sites, 14 gDNAs (gDNA1 to gDNA14) were systematically designed end-to-end to span the ΔDIS 5'UTR. The 14 gDNAs were designed to generate cleavage products in 23-nt increments (FIGs. 12a and 16).
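The end-to-end tiling described above can be sketched as follows. This is a hedged illustration and not the workflow of FIG. 16: it assumes that gDNA-1 begins at nucleotide 1 and that each guide covers a consecutive 23-nt window, which is consistent with gDNA-2 through gDNA-12 spanning positions 24-276 as stated in the Methods. The function names and the short example sequence are hypothetical.

```python
# Assumed layout: guide k covers target nucleotides 23*(k-1)+1 .. 23*k (1-based).
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def tile_windows(n_guides=14, width=23):
    """Return (guide number, first nt, last nt) for consecutive windows."""
    return [(k, (k - 1) * width + 1, k * width) for k in range(1, n_guides + 1)]


def guide_dna_for(target_rna, first, last):
    """Return the DNA guide (5'->3') complementary to target nt first..last."""
    if first < 1 or last > len(target_rna) or first > last:
        raise ValueError("window lies outside the target")
    site = target_rna[first - 1:last]
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(site))


if __name__ == "__main__":
    for number, first, last in tile_windows():
        print(f"gDNA-{number}: nt {first}-{last}")
    # Hypothetical 10-nt RNA used only to show the complement step.
    print(guide_dna_for("GGGAGAAACA", 1, 4))
```

With 14 windows of 23 nt, the tiled region ends at nucleotide 322 of the 352-nt transcript, so under this assumption the 3'-terminal 30 nt are not covered by any guide.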
Each of the different sites on the target RNA ΔDIS 5'UTR complementary to the 14 gDNAs was first individually targeted by simply changing the gDNA in separate cleavage reactions. AGOΔexN and a single gDNA (one of the 14 gDNAs) were pre-incubated to form the DISC, followed by addition of the 32P end-labeled ΔDIS 5'UTR substrate (FIG. 15). Reactions were quenched and cleavage products were resolved by denaturing PAGE. Cleavage by DISC was detected at all sites, albeit to different extents (FIGs. 12b and 17). Cleavage efficiencies were categorized into quartiles (Q1-Q4) based on their detectable levels of cleavage in 12.5% windows, with the most poorly cleaved sites in Q1 (lowest cleavage) and the most efficiently targeted domains in Q4 (highest cleavage) (FIG. 12b). Previous structural studies using small-angle X-ray scattering showed that the TAR/poly(A) stem loops form a stable co-axially stacked helix, which is consistent with results herein showing that DISC cleaved TAR and poly(A) sites relatively poorly. In contrast, DISC cleaved other sites predicted to be in base-paired regions (as determined by SHAPE analysis) more efficiently, such as those targeted by gDNA-6, -8, -9, and -10. These data demonstrate that DISC-accessible sites on structured RNAs do not necessarily correlate with those predicted to be base paired from computational or other experimental approaches.
To test the specificity of DISC-mediated cleavage, one representative gDNA was selected from each quartile. The selected gDNAs were gDNA-4 (Q1), gDNA-8 (Q2), gDNA-11 (Q3), and gDNA-6 (Q4). Dinucleotide mismatches were introduced in the selected gDNAs at the two DNA nucleotide positions complementary to the two cleavage-site nucleotides of the ΔDIS 5'UTR RNA (FIG. 12c), thereby creating "mismatched" variants of each selected gDNA. DISC bound with "matched" gDNA-6 (having 100% complementarity to the target ΔDIS 5'UTR RNA) showed 25-fold higher specificity towards the Q4 site in the PBS domain compared with the mismatched variant (FIGs. 12d and 17). DISC bound with mismatched gDNA-4 displayed low cleavage activity in the poly(A) loop, similar to DISC bound with matched gDNA-4, the low cleavage efficiency (Q1) representative. Finally, DISCs containing mismatched gDNA-8 and gDNA-11 did not display any detectable cleavage against the Q2 and Q3 sites, respectively (FIG. 12d). These experiments underscore the utility of DISC as a highly sequence-specific "programmable" RNase. By programmable, it is meant that DISC can be directed to a given specific sequence of a target RNA by including a gDNA complementary to the specific RNA sequence. Thus, the gDNA "programs" the DISC to be specific for a sequence of the target RNA.
These biochemical data show that DISC (i) can be readily programmed to target different sequences without modifying the catalytic machinery, (ii) retains high specificity towards its intended target sites, and (iii) possesses no target site sequence limitations. However, to map accessible sites on long RNAs, a high-throughput approach is desirable.
To determine if multiple gDNAs can be used in a single reaction without losing cleavage activity or target site specificity, it was first tested whether DISC-mediated cleavage events affect each other. Equimolar amounts of gDNA-4, -6, -8, and -11 (one representative from each quartile) were pre-incubated with AGOΔexN to assemble four DISCs in a single mixture, each targeting a different region of the HIV-1 ΔDIS 5'UTR (FIG. 12e). The 32P end-labeled RNA substrate was added to the assembly, and formation of cleavage products was monitored over time (FIG. 12f). The combined DISCs generated only four cleavage products, all of which migrated at lengths that matched those generated by separate reactions using individual DISC-gDNA combinations. The fact that cleaved products did not undergo multiple cuts by different DISCs in the same reaction mixture demonstrates that the cleavage displayed single-hit kinetics.
Another requirement for mapping accessible sites in a high-throughput manner is accurate read-out of the cleavage sites generated by multiple DISCs. To this end, reverse-transcription/primer extension analysis (RT/PE) was employed (FIG. 13a). DISCs were assembled with 11 of the 14 gDNAs used in FIG. 12a, spanning nucleotides 24-276, and the 11 DISCs were mixed together in a single mixture. Noise associated with large peaks corresponding to the primer and full-length product limits the applicability of this technique at the 5' and 3' termini; thus, 3 gDNAs were intentionally excluded. An unlabeled HIV-1 ΔDIS 5'UTR substrate was added to the mixture to initiate cleavage. Following a 60-min incubation, reactions were quenched and worked up as described in the Methods. The total RNA pool containing all cleavage products was used to template RT reactions using 23-nt long fluorophore-labeled primers. The extended primers were subjected to capillary electrophoresis and analyzed by RiboCAT software to assign peaks and identify DISC-mediated cleavage sites. The output of the analysis provided a trace of peak intensities corresponding to programmed cleavage sites, revealing accessible regions. In a single experiment, the RT/PE assay detected DISC-generated cleavage products across the HIV-1 ΔDIS 5'UTR substrate in 23-nt increments (FIGs. 13b and 13c). The relative reactivity at each site was generally consistent with PAGE analysis (FIG. 12b). Overall, these results show that TAR is the least accessible domain of the HIV-1 5'UTR and that PBS and the 5' end of Psi are the most accessible. PBS and Psi are likely to be more dynamic than the highly stable TAR element, facilitating DISC-bound gDNA annealing and base-pairing even in predicted double-stranded regions.
In summary, developed herein is a high-throughput method for the identification of accessible sites within highly-structured RNAs using DISCs. The success of this combinatorial approach, the single-hit kinetics of mixtures of DISCs, and the read-out by RT/PE facilitate mapping accessible sites of long (e.g., kilobase-length) RNA substrates of unknown structure, such as long non-coding RNAs or full-length genomic viral RNAs.
Materials and Methods
Expression and purification of K. polysporus AGOΔexN
A recombinant protein encompassing Thr207-Ile1251 of K. polysporus AGO (AGOΔexN) was expressed and purified as previously described. The concentration of purified AGOΔexN was determined by Bradford assay (Bio-Rad) and stock aliquots were stored at -80 °C.
Preparation of miR-20a-derived substrates
A list of RNA and DNA oligonucleotides and polynucleotides used in this study is provided (Tables 8-11). miR-20a-derived 5' phosphorylated gRNAs were chemically synthesized (Dharmacon), deprotected, and gel-purified. 5' phosphorylated gDNAs were chemically synthesized (Sigma Aldrich). The sequences encoding target RNAs were cloned into a pUC19 vector and transcribed in vitro using T7 RNA polymerase. DNase I-treated transcripts were gel-purified (10% polyacrylamide, 8 M urea, 1× TBE), capped using the ScriptCap m7G Capping System (CellScript) either with GTP for unlabeled targets or with [α-32P]GTP (3000 Ci mmol⁻¹) for cap-labeled targets, and gel-purified again. DNA target was chemically synthesized (Sigma Aldrich) and 5' end-labeled with T4 PNK (ThermoFisher) and [γ-32P]ATP (3000 Ci mmol⁻¹) before gel purification. Unlabeled nucleic acid concentrations were quantified by spectrophotometry at 260 nm and calculated using the molar extinction coefficient. All extinction coefficients for substrates synthesized by commercial vendors were calculated and provided by the manufacturer. The extinction coefficient used for capped miR-20a RNA targets is 587,900 L mol⁻¹ cm⁻¹.
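The concentration calculation from absorbance at 260 nm follows the Beer-Lambert law, c = A260 / (ε × l). The sketch below is illustrative only: the function name, the 1 cm path length, and the example absorbance are assumptions, while the extinction coefficient is the value stated above for capped miR-20a RNA targets.

```python
# Extinction coefficient stated above for capped miR-20a RNA targets (L mol^-1 cm^-1).
EPSILON_MIR20A_TARGET = 587_900.0


def molar_concentration(a260, epsilon, path_cm=1.0, dilution_factor=1.0):
    """Beer-Lambert law: c = A / (epsilon * l), corrected for any dilution.

    Returns the concentration of the undiluted stock in mol/L.
    """
    if epsilon <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    return a260 / (epsilon * path_cm) * dilution_factor


if __name__ == "__main__":
    # Hypothetical reading: A260 = 0.5879 in a 1 cm cuvette, no dilution.
    conc_molar = molar_concentration(0.5879, EPSILON_MIR20A_TARGET)
    print(f"{conc_molar * 1e6:.2f} uM")
```

The same function applies to the HIV-1 5'UTR transcript or to the gDNAs when called with the extinction coefficient given for each substrate.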
Preparation of HIV-1 ΔDIS 5'UTR transcript
The HIV-1 ΔDIS 5'UTR variant used in this study contained a stable GAGA tetraloop sequence in place of the dimerization initiation signal (DIS) (FIG. 15 and Table 8). HIV-1 ΔDIS 5'UTR was in vitro transcribed from a FokI-digested pUC18 vector with an upstream hammerhead ribozyme using T7 RNA polymerase. DNase I-treated transcripts were gel-purified by denaturing PAGE (7% polyacrylamide, 8 M urea, 1× TBE) and visualized by UV shadowing. The RNA was eluted from the gel in elution buffer (500 mM ammonium acetate, 1 mM EDTA, 0.1% (w/v) SDS), ethanol precipitated, resuspended in MilliQ water, and quantified by UV absorbance at 260 nm using an extinction coefficient of 3,243,098 L mol⁻¹ cm⁻¹. The RNA was 5' end-labeled with T4 PNK (ThermoFisher) and [γ-32P]ATP (3000 Ci mmol⁻¹) for labeled substrate or with ATP for unlabeled substrate.
Design of gDNAs for HIV-1 5' UTR cleavage
gDNAs for experiments targeting the HIV-1 ΔDIS 5'UTR sequence (Tables 10-11) were generated by following the workflow outlined in FIG. 16. The designed gDNAs were chemically synthesized with 5' monophosphates (Sigma Aldrich), resuspended in MilliQ water, and quantified by UV absorbance at 260 nm using extinction coefficients provided by the manufacturer.
miR-20a-mediated cleavage assays
Stock AGOΔexN was diluted and stored in dilution buffer (10 mM Tris-HCl pH 7.5, 200 mM NaCl, 5 mM DTT, 0.5 mg/mL Ultrapure BSA (Ambion)) at -80 °C. All assays were performed in 1× reaction buffer (20 mM Tris-HCl pH 7.5, 150 mM NaCl, 1 mM MgCl2, 1 mM DTT, 5% glycerol), 0.05 mg/mL BSA (Ambion), and 0.4 U/μL RiboLock RNase Inhibitor (ThermoFisher). 1 μM of AGOΔexN was mixed with 50 nM gRNA or gDNA and incubated at 25 °C for 30 min to form the RISC or DISC, respectively. Cleavage was initiated by adding 1 μL 5'-capped target RNA (final concentration, 25 nM) and trace amounts of 32P-cap-labeled target in a 10 μL reaction and incubating at 30 °C for 20 min. For DNA targets, 32P-end-labeled targets were added to a final concentration of 25 nM. Reactions were quenched with 10 μL formamide loading buffer (95% formamide, 18 mM EDTA, 0.025% sodium dodecyl sulfate, 0.025% bromophenol blue, 0.025% xylene cyanol). Products were resolved by 16% denaturing PAGE and visualized by phosphorimaging. Gels were quantified by ImageQuant (GE Healthcare). All cleavage percentages were calculated using Equation 5, plotted using Equation 6, and averaged over three independent experiments.
Equation 5.
Cleavage of nucleic acid target by either RISC or DISC given as percent:
target, guide
Figure imgf000063_0001
where P is the percent cleavage of either target RNA or target DNA by RISC or DISC, and Ic and h indicate the band intensities of the 5' cleavage product and intact substrate, respectively.
Equation 6.
Relative percent target cleavage (shown in FIG. 11c) of a given guide:target pair relative to the gRNA-dependent RNA cleavage.
Figure imgf000063_0002
Identification of appropriate guide:AGOΔexN ratio
Since AGOΔexN co-purifies with bound endogenous E. coli RNA¹⁴, optimal guide:protein concentrations were approximated to identify an appropriate amount of gDNA to mix with AGOΔexN for biochemical assays. AGOΔexN (500 nM) was pre-incubated with increasing amounts of gDNA (0-100 nM) for 30 min at 25 °C, followed by addition of cap-labeled miR-20a-derived target (1 nM) and shifting the temperature to 30 °C for 20 min. Reactions were quenched with formamide loading buffer and resolved by 16% denaturing PAGE (8 M urea, 1× TBE). Gels were visualized by phosphorimaging and quantified by ImageQuant (GE Healthcare). All cleavage percentages were calculated using Equation 5 and averaged over three independent experiments. Cleavage percentage saturated at 70 ± 3% (FIGs. 14b and 14c), suggesting that only ~2-5% of purified AGOΔexN was free of nucleic acid and could be loaded with gDNA. 500 nM AGOΔexN and 10 nM gDNA were used for the remainder of the gel-based assays.
Validation of DISC-specificity on unstructured targets
For biochemical assays shown in FIG. 11d, specificity and fidelity of DISC were determined by performing in vitro cleavage assays as in FIG. 11c, except that 10 nM gDNA and 500 nM AGOΔexN were used. AGOΔexN was pre-incubated with gRNA or gDNA, followed by addition of either perfectly matched cap-labeled RNA target (1 nM) or the same target but with a dinucleotide mismatch at the cleavage site. Products were resolved on 16% denaturing PAGE and gels were visualized by phosphorimaging.
HIV-1 ΔDIS 5'UTR cleavage assays with individual gDNA
All cleavage assays were performed in 1× reaction buffer. HIV-1 ΔDIS 5'UTR substrate was prepared by mixing unlabeled HIV-1 ΔDIS 5'UTR substrate (10 nM) and trace amounts of 32P-end-labeled HIV-1 ΔDIS 5'UTR in 50 mM HEPES (pH 7.5). The sample was heated at 80 °C for two min, followed by incubation at 60 °C for four min. MgCl2 was added to a final concentration of 10 mM and the sample was transferred to 37 °C for 6 min, followed by incubation on ice for at least 30 min. Sample homogeneity was checked by 6% native PAGE (1× TB, 1 mM MgCl2) at 4 °C (FIG. 15c). For DISC formation and HIV-1 ΔDIS 5'UTR cleavage, AGOΔexN (500 nM) was pre-mixed with gDNA (10 nM) for 30 min at 25 °C in a 9 μL reaction, followed by addition of 1 μL of cap-labeled HIV-1 ΔDIS 5'UTR substrate (final concentration 1 nM) and shifting the temperature to 30 °C for 20 min. Reactions were quenched with formamide dye and products were resolved by 8% denaturing PAGE (8 M urea, 1× TBE). The dinucleotide mismatch assay was performed in the same manner. All gels were visualized by phosphorimaging and quantified by ImageQuant (GE Healthcare) using Equation 7.
Equation 7.
HIV-1 ' UTR cleavage by
Figure imgf000065_0001
HIV-1 ADIS 5'UTR cleavage assays with mixture of gDNAs
Cleavage assays using a mixture of gDNAs were performed similarly to the individually guided cleavage assays except that equimolar amounts of each of the selected gDNAs were pre-mixed together before adding to the reaction mixture. The mixture was pre-incubated at 25 °C for 30 min to form a mixture of DISCs that would recognize different regions of the HIV-1 ADIS 5'UTR substrate. After DISC formation, 5'-labeled HIV-1 ADIS 5'UTR was added to the mixture (final concentration 1 nM) and 3-μL aliquots were removed at the indicated time points (0-60 min) and quenched with formamide dye. Products were resolved by 8% denaturing PAGE alongside an RNA marker.
Generation of cleavage products and high-throughput analysis
Cleavage reactions using unlabeled HIV-1 ADIS 5'UTR substrate were performed to generate DISC-mediated products for analysis by a reverse-transcription and primer-extension reaction primed by 5'-NED™ (ThermoFisher) fluorophore-labeled oligonucleotides. The cleavage reaction was performed similarly to that of FIG. 12f, with some modifications. Eleven gDNAs targeting the HIV-1 ADIS 5'UTR at 23-nt increments between positions 24-276 (gDNA-2 through -12 from Table 9) were pre-mixed at equimolar concentrations. AGOAexN (500 nM) and the gDNA mixture (20 nM) were pre-incubated at 25 °C for 30 min in 1x reaction buffer in a 97.5 μL reaction. Following DISC formation, 2.5 μL of folded, unlabeled HIV-1 ADIS 5'UTR (final concentration 25 nM) was added, bringing the total volume to 100 μL. HIV-1 ADIS 5'UTR cleavage was performed at 30 °C for 60 min. The higher concentration was used based on earlier observations that 2.5-5 picomoles of RNA template was optimal to prime reverse transcription during the primer extension steps of the assay. After 60 min, reactions were quenched and extracted by the addition of phenol pH 6.6 (1/2 volume) and chloroform (1/2 volume). 50 μL MilliQ water was added to the organic phase and residual RNA was subjected to a second round of phenol/chloroform extraction. RNA was ethanol precipitated in the presence of glycogen (2 μg) and stored as a pellet at -20 °C. Control reactions were performed similarly except that either AGOAexN [AGO(-)], gDNAs [gDNA(-)], or both [AGO(-)/gDNA(-)] were excluded to identify capillary electrophoresis peaks resulting from degradation of transcript or background. RNA pellets were resuspended in 9 μL MilliQ water, annealed with 2 μL of 5 μM NED™-labeled primer and extended using SuperScript III reverse transcriptase following the manufacturer's protocol (Invitrogen) in a total reaction volume of 20 μL. Remaining RNA was digested by adding 1 μL of 4 M NaOH and heating to 95 °C for 3 min. The reactions were then neutralized with 2 μL of 2 M HCl. For each sample, 3 μL of neutralized reaction was added to 17 μL MilliQ water and ethanol precipitated with 10 μg glycogen. To facilitate sequence alignment, Sanger-style sequencing reactions were also performed using the same NED™-labeled primer with the transcription template plasmid using the Thermo Sequenase Cycle Sequencing Kit (ThermoFisher) per the manufacturer's protocol. All reaction and sequencing pellets were resuspended in formamide supplemented with GeneScan™ 600 LIZ® Size Standard (Applied Biosystems) and resolved using a 3730 DNA Analyzer (Applied Biosystems and the Plant Microbe Genomics Facility at The Ohio State University). The resulting electropherograms were analyzed using RiboCAT. All control data were compared for consistency to ensure the absence of systematic error. For each dataset, the reactivity values were scaled based on the average of the lowest 20% of peak areas in the gDNA(-) background control and then normalized by subtracting the gDNA(-) background from each and dividing the resulting values by the average of the top 10% of reactivity values.
Averaged data represents the average of three independent experiments.
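The scaling and normalization described above can be sketched as follows. This is one possible reading of the procedure, offered as an assumption rather than as the RiboCAT workflow: the sample trace is rescaled so that the mean of its lowest 20% of peak areas equals the mean of the lowest 20% of the gDNA(-) control, the control is then subtracted peak by peak, and the differences are divided by the mean of their top 10%. The function names and the peak areas are hypothetical.

```python
def mean_of_fraction(values, fraction, highest=False):
    """Average of the lowest (or highest) `fraction` of the values."""
    ordered = sorted(values, reverse=highest)
    count = max(1, int(len(ordered) * fraction))
    return sum(ordered[:count]) / count

def normalize_reactivity(sample, background):
    """Scale, background-subtract and normalize per-position peak areas."""
    if len(sample) != len(background):
        raise ValueError("sample and background must cover the same positions")
    low_sample = mean_of_fraction(sample, 0.20)
    if low_sample <= 0:
        raise ValueError("lowest 20% of sample peaks must have a positive mean")
    # Step 1 (assumed interpretation): match the quiet baseline of both traces.
    scale = mean_of_fraction(background, 0.20) / low_sample
    scaled = [value * scale for value in sample]
    # Step 2: subtract the gDNA(-) background at each position.
    subtracted = [s - b for s, b in zip(scaled, background)]
    # Step 3: divide by the average of the top 10% of subtracted values.
    top = mean_of_fraction(subtracted, 0.10, highest=True)
    if top <= 0:
        raise ValueError("no signal above background")
    return [value / top for value in subtracted]

# Hypothetical peak areas at ten positions.
background = [10.0] * 10
sample = [10.0] * 8 + [60.0, 110.0]
normalized = normalize_reactivity(sample, background)
print(normalized)
```

In this toy input the strongest site normalizes to 1.0 and a site with half the background-subtracted signal to 0.5, while positions that match the control stay at 0.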
DISC-accessible sites may be an Achilles' heel of target RNAs. Identification of these sites can provide a new therapeutic strategy aimed at targeting RNA-based diseases such as AIDS, hepatitis C, Zika, microcephaly, cancer, and others.
Additional experiments were conducted to analyze gDNAs using the unstructured miR-20a-derived RNA target. The gDNA 5' nucleotide was analyzed by altering the identity of the 5' nt to T, A, G or C (FIG. 18A). Cleavage percentage in the endpoint assay indicated that gDNAs with a 5' should be used for gDNA design. Next, gDNA length was analyzed for the unstructured miR-20a target by truncating or extending the base-paired region between the guide and target strands. All gDNAs perfectly matched the RNA target and were 15-25 nt in length. The longer sequences (closer to 25 nt) produced higher levels of cleavage (FIG. 18B).
Next, gDNAs were analyzed using the structured HIV-1 ADIS 5'UTR RNA target (FIG. 19A-19D). gDNAs of 20-25 nt in length were designed to target two sites on the HIV-1 ADIS 5'UTR target, sites #6 and #8 (FIG. 19B). Quantified data showed that lengths of 23 and/or 24 nt appeared to provide the best cleavage at site #6, while lengths of 22 and 23 nt appeared best for site #8. Finally, cleavage assays were performed to compare the activity of DISC and RNase H against the unstructured miR-20a RNA target and the structured HIV-1 ADIS 5'UTR RNA target (FIG. 20A-20B). While the quantified data showed slightly better cleavage of the unstructured miR-20a sequence by RNase H, cleavage of the structured HIV-1 ADIS 5'UTR RNA target by DISC was superior to that by RNase H. The results indicate that DISC is able to access and cleave structured regions of RNA that RNase H is unable to cleave.
Table 7: miR-20a-derived sequences
Figure imgf000067_0001
In Table 7, bold and underlined dinucleotides in the target sequence indicate the two nucleotides between which the cleavage site lies. Cleavage occurs at the phosphodiester bond between the nt shown in bold across from g10 and g11 of the guide strand, counting from the 5' end of the guide.
Table 8: HIV-1 WT and ADIS 5'UTR Sequences
WT HIV-1 ADIS 5'UTR (SEQ ID NO: 37):
5'pGGGUCUCUCUGGUUAGACCAGAUCUGAGCCUGGGAGCUCUCUGGCUAACUAGGG
AACCCACUGCUUAAGCCUCAAUAAAGCUUGCCUUGAGUGCUCAAAGUAGUGUGUGC
CCGUCUGUUGUGUGACUCUGGUAACUAGAGAUCCCUCAGACCCUUUUAGUCAGUGU
GGAAAAUCUCUAGCAGUGGCGCCCGAACAGGGACUUGAAAGCGAAAGUAAAGCCAG
AGGAGAUCUCUCGACGCAGGACUCGGCUUGCUGAAGCGCGCACGGCAAGAGGCGAG
GGGCGGCGACUGGUGAGUACGCCAAAAAUUUUGACUAGCGGAGGCUAGAAGGAGAG
AGAUGGGUGCGAGAGCGUCGGUA
HIV-1 ADIS 5'UTR (SEQ ID NO: 38):
5'pGGGUCUCUCUGGUUAGACCAGAUCUGAGCCUGGGAGCUCUCUGGCUAACUAGGG AACCCACUGCUUAAGCCUCAAUAAAGCUUGCCUUGAGUGCUCAAAGUAGUGUGUGC CCGU
CUGUUGUGUGACUCUGGUAACUAGAGAUCCCUCAGACCCUUUUAGUCAGUGUGGAA
AAUCUCUAGCAGUGGCGCCCGAACAGGGACUUGAAAGCGAAAGUAAAGCCAGAGGA
GAUCUCUCGACGCAGGACUCGGCUUGCUGGAGACGGCAAGAGGCGAGGGGCGGCGA
CUGGUGAGUACGCCAAAAAUUUUGACUAGCGGAGGCUAGAAGGAGAGAGAUGGGU
GCGAGAGCGUCGGUA
Table 9: HIV-1 ADIS 5'UTR-targeting gDNAs
Figure imgf000068_0001
13 pTTTGGCGTACTCACCAGTCGCCG 277-299 289 SEQ ID NO: 51
14 pTCTAGCCTCCGCTAGTCAAAATT 300-322 312 SEQ ID NO: 52
In Table 9, the target region refers to the nucleotides, in order from 5' to 3', of the HIV-1 ADIS 5'UTR. The sequences of the gDNAs are complementary to the listed target regions. The 5' product length refers to the number of nucleotides expected after primer extension of the cleavage product.
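The relationship between a gDNA, its target region and the 5' product length can be sketched as below. The sketch assumes, as stated for Table 7, that cleavage occurs between the target nucleotides paired with g10 and g11 counted from the guide 5' end, so that the 5' product ends 10 nt upstream of the last nucleotide of the target region. The function names are hypothetical, the guides are written without the 5' phosphate "p", and `demo_target` is a synthetic stand-in (poly-A padding followed by the two target sites) chosen so that coordinates line up with rows 13 and 14 of Table 9; a real analysis would pass the full SEQ ID NO: 38 sequence instead.

```python
COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}

def target_site(guide_dna):
    """RNA sequence (5'->3') that a DNA guide base-pairs with."""
    return "".join(COMPLEMENT[base] for base in reversed(guide_dna.upper()))

def locate(guide_dna, target_rna):
    """Return (start, end, five_prime_product_length), 1-based, or None."""
    index = target_rna.upper().find(target_site(guide_dna))
    if index < 0:
        return None
    start = index + 1
    end = index + len(guide_dna)
    # g1 pairs with target position `end`, g10 with end - 9, g11 with end - 10.
    # Cleavage between those two positions leaves a 5' fragment of end - 10 nt.
    return start, end, end - 10

def tiled_regions(first, last, width):
    """Consecutive, non-overlapping target windows of `width` nt."""
    return [(s, s + width - 1) for s in range(first, last - width + 2, width)]

# Synthetic target: 276 nt of padding, then the sites for gDNA 13 and gDNA 14.
demo_target = "A" * 276 + "CGGCGACUGGUGAGUACGCCAAA" + "AAUUUUGACUAGCGGAGGCUAGA"

print(locate("TTTGGCGTACTCACCAGTCGCCG", demo_target))  # gDNA 13
print(locate("TCTAGCCTCCGCTAGTCAAAATT", demo_target))  # gDNA 14
print(tiled_regions(24, 276, 23))                      # eleven 23-nt windows
```

Under these assumptions the two guides map to regions 277-299 and 300-322 with 5' product lengths of 289 and 312, consistent with the listed rows, and tiling positions 24-276 in 23-nt steps gives eleven windows from 24-46 through 254-276.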
Table 10: HIV-1 ADIS 5'UTR-targeting gDNAs
Figure imgf000069_0001
In Table 10, the sequences, target region, and 5' product lengths are as described in Table 9. The gDNA # refers to the "mismatched" (mm) sequences of the selected quartile representatives.
References cited in this example
1 Choudhury, R., Tsai, Y. S., Dominguez, D., Wang, Y. & Wang, Z. Nat Commun 3, 1147, (2012).
2 Kharma, N. et al. Nucleic Acids Res 44, e39, (2016).
3 Silverman, S. K. Nucleic Acids Res 33, 6151-6163, (2005).
4 Weeks, K. M. Curr Opin Struct Biol 20, 295-304, (2010).
5 Kauffmann, A. D., Campagna, R. J., Bartels, C. B. & Childs-Disney, J. L. Nucleic Acids Res 37, e121, (2009).
6 Tafer, H. et al. Nat Biotechnol 26, 578-583, (2008).
7 Agrawal, N. et al. Microbiol Mol Biol Rev 67, 657-685 (2003).
8 Doudna, J. A. & Charpentier, E. Science 346, 1258096, (2014).
9 Enghiad, B. & Zhao, H. ACS Synth Biol 6, 752-757, (2017).
10 Kobayashi, H. & Tomari, Y. Biochim Biophys Acta 1859, 71-81, (2016).
11 Nakanishi, K. Wiley Interdiscip Rev RNA 7, 637-660, (2016).
12 O'Connell, M. R. et al. Nature 516, 263-266, (2014).
13 Willkomm, S., Zander, A., Grohmann, D. & Restle, T. PLoS One 11, e0164695, (2016).
14 Nakanishi, K., Weinberg, D. E., Bartel, D. P. & Patel, D. J. Nature 486, 368-374, (2012).
15 Jones, C. P., Cantara, W. A., Olson, E. D. & Musier-Forsyth, K. Proc Natl Acad Sci U S A 111, 3395-3400, (2014).
16 Lu, K., Heng, X. & Summers, M. F. J Mol Biol 410, 609-633, (2011).
17 Wilkinson, K. A. et al. PLoS Biol 6, e96, (2008).
18 Cantara, W. A., Hatterschide, J., Wu, W. & Musier-Forsyth, K. RNA 23, 240-249, (2017).
19 Smola, M. J. et al. Proc Natl Acad Sci U S A 113, 10322-10327, (2016).
20 Kuno, G. & Chang, G. J. Arch Virol 152, 687-696, (2007).
Example 4. Sequences
Wild-type KpAGO amino acid sequence corresponds to NCBI accession XP_001644461.1
SEQ ID NO: 31
MATLKPDTQIIAGNAAETEKPIVKKSSSSKPDAGDGEVESKPKKDKKKSKKENSDGKDET STAKKTKKSKNTKKSKDKTESSSTEKLDETSTSESPSEESSEKPKKSKKSKKKSSEGNEN VEANSENVKEKPEKKKKSKKSSKESTPETESENVETKSEKKKKSKKPKKSSKESTPEPNS STSTEQSSEAKSQSFGFKYSDKVFDLTEKTVDQPKEDTHAIYKVENRHDYGTKGTKVDIL TNHILLAVGNDVPTEKIDKELVPKLDGWWKTAFIFTYHIDFKPQQKGPPRRGKPVPPQEL SKPKKYELIEALLDEDEILYKYRDRIAFNGEDTIYSHVPLEEFTLFDGCWEVSNKQKKKV
VPGMGAPSNKASLQKKIDPELEEMVSQITLKFSGKVGLKDIYNDTTTQDTEVQESRMSAI DKTCLLSLLGAKFMSTDDLIFQVQGNKFFIFNNFAKAIPFQIGGYLLQGFTVSLTHVYGG VALNTVSVPAPFIKHTKYLPGDPRFKNNEKEQFTLMDWIIECYHQSKAIRDIRYNPKTAP PPSVKDLNYFVEKNTDISALLKGLKVYRPYINYSINKDGTPKPPRKRSSKGIVGFTRESA VSMRFNVLESSLKKNSAPKPNEKPININTIDYFKRKYDITLKYPDMKLVNLGGKNDVVPP ECLTIVPGQKLKGQIFDTKTYIDFSAIRPTEKFDLISRLSMPAIKRGLTDSEKEESSAPH NSAYQFMRVPSRILDAPVVQFKESTFEYKDKSYGTKHEESKGNWNMKGHQFISTPAKQVN LRAIFINNANTAPPASMESELDISMDKFASDVKQLGVDFNVSGKPILINQFGPPIKKFQG GGRGGRGGRGSRGGRGGRGAPSGPPTFETSPGEISLLNLLENIPSNTYILYVLRRGNDSA VYDRLKYITDLKFGALNSCVVWDNFKKNSIQYNSNVVMKMNLKLLGSNHSLSIENNKLLI DKESNLPILVLGSDVTHYPEKDQNSIASLVGSYDDKFTQFPGDYMLQDGPGEEIITNVGS LMLNRLKIYQKHNNGKLPTKIMYFRDGVSVDQFSQVVKIEVKSIKESVRKFGPQLNGGNK YDPPVTCIATVKRNQVRFIPIQENAKNEKGEEVAVQSMGNVMPGTVVDRGITSVAHFDFF IQSHQALKGTGVPCHYWCLYDENQSTSDYLQEICNNLCYIFGRSTTSVKVPAPVYYADLL CTRATCFFKAGFELNMAQAPKEKGSKDQPTVSKNVLLPQVNDNIKSVMYYI
KpAGO 207-1251 used in this disclosure is composed of the following amino acid sequence, which includes an N-terminal serine leftover after enzymatic tag-cleavage by Ulp1:
SEQ ID NO: 32
STEKTVDQPKEDTHAIYKVENRHDYGTKGTKVDILTNHILLAVGNDVPTEKIDKELVPKLDGWWKTAFIFT YHIDFKPQQKGPPRRGKPVPPQELSKPKKYELIEALLDEDEILYKYRDRIAFNGEDTIYSHVPLEEFTLFDGC WEVSNKQKKKVVPGMGAPSNKASLQKKIDPELEEMVSQITLKFSGKVGLKDIYNDTTTQDTEVQESRMS AIDKTCLLSLLGAKFMSTDDLIFQVQGNKFFIFNNFAKAIPFQIGGYLLQGFTVSLTHVYGGVALNTVSVPA PFIKHTKYLPGDPRFKNNEKEQFTLMDWIIECYHQSKAIRDIRYNPKTAPPPSVKDLNYFVEKNTDISALLK GLKVYRPYINYSINKDGTPKPPRKRSSKGIVG FTRESAVSMRFNVLESSLKKNSAPKPNEKPININTIDYFKR KYDITLKYPDMKLVNLGGKNDVVPPECLTIVPGQKLKGQIFDTKTYIDFSAIRPTEKFDLISRLSMPAIKRGL TDSEKEESSAPHNSAYQFMRVPSRILDAPVVQFKESTFEYKDKSYGTKHEESKGNWNMKGHQFISTPAK QVNLRAIFINNANTAPPASMESELDISMDKFASDVKQLGVDFNVSGKPILINQFGPPIKKFQGGGRGGR GGRGSRGGRGGRGAPSGPPTFETSPGEISLLNLLENIPSNTYILYVLRRGNDSAVYDRLKYITDLKFGALNS CVVWDNFKKNSIQYNSNVVMKMNLKLLGSNHSLSIENNKLLIDKESNLPILVLGSDVTHYPEKDQNSIASL VGSYDDKFTQFPGDYMLQDGPGEEIITNVGSLMLNRLKIYQKHNNGKLPTKIMYFRDGVSVDQFSQVVK IEVKSIKESVRKFGPQLNGGNKYDPPVTCIATVKRNQVRFIPIQENAKNEKGEEVAVQSMGNVMPGTVV DRGITSVAHFDFFIQSHQALKGTGVPCHYWCLYDENQSTSDYLQEICNNLCYIFGRSTTSVKVPAPVYYAD LLCTRATCFFKAGFELNMAQAPKEKGSKDQPTVSKNVLLPQVNDNIKSVMYYI
Miniature-AGO (mini-AGO) used in this disclosure is composed of the following amino acid sequence, which includes an N-terminal serine leftover after enzymatic tag-cleavage by Ulp1:
SEQ ID NO: 33
SIYKVENRHDYGTKGTKVDILTGSGRVPSRILDAPVVQFKESTFEYKDKSYGTKHEESKGNWNMKGHQFI STPAKQVNLRAIFINNANTAPPASMESELDISMDKFASDVKQLGVDFNVSGKPILINQFGPPIKKFQGGG RGGRGGRGSRGGRGGRGAPSGPPTFETSPGEISLLNLLENIPSNTYILYVLRRGNDSAVYDRLKYITDLKFG ALNSCVVWDNFKKNSIQYNSNVVMKMNLKLLGSNHSLSIENNKLLIDKESNLPILVLGSDVTHYPEKDQN SIASLVGSYDDKFTQFPGDYMLQDGPGEEIITNVGSLMLNRLKIYQKHNNGKLPTKIMYFRDGVSVDQFS QVVKIEVKSIKESVRKFGPQLNGGNKYDPPVTCIATVKRNQVRFIPIQENAKNEKGEEVAVQSMGNVMP GTVVDRGITSVAHFDFFIQSHQALKGTGVPCHYWCLYDENQSTSDYLQEICNNLCYIFGRSTTSVKVPAP VYYADLLCTRATCFFKAGFELNMAQAPKEKGSKDQPTVSKNVLLPQVNDNIKSVMYYI
The following DNA sequences were used to generate the following sequence identity table:
Wild type K. polysporus Argonaute gene found on NCBI corresponds to ID #
NW_001834651.1
SEQ ID NO: 34
ATGGCTACTTTAAAGCCTGATACCCAAATAATTGCTGGAAATGCAGCTGAGACCGAAAAACCTATTGTGA AGAAATCAAGTTCTTCAAAACCTGATGCTGGAGATGGTGAAGTTGAAAGTAAACCAAAGAAGGACAAGAA GAAATCCAAAAAGGAAAATTCCGATGGAAAGGATGAAACCAGTACTGCTAAAAAGACAAAAAAATCTAAA AACACAAAAAAATCCAAGGATAAAACAGAATCTTCTTCAACTGAGAAATTAGATGAGACATCCACAAGTG AAAGTCCATCTGAAGAAAGTTCAGAAAAACCTAAAAAATCCAAAAAATCCAAAAAGAAATCATCTGAAGG TAATGAAAATGTAGAAGCAAATTCTGAAAATGTTAAAGAAAAACCTGAAAAGAAGAAGAAATCAAAGAAA TCTTCAAAAGAATCTACTCCTGAAACAGAGTCTGAAAATGTAGAAACAAAATCTGAAAAGAAGAAGAAAT CAAAGAAACCAAAGAAATCTTCAAAAGAATCTACCCCTGAACCAAATTCTTCAACATCTACAGAACAATC TTCTGAAGCTAAATCTCAATCTTTCGGTTTCAAATATTCTGATAAAGTGTTTGACTTAACCGAGAAAACT GTAGATCAACCTAAGGAAGATACTCATGCAATCTATAAAGTTGAAAATAGACATGATTATGGTACTAAAG GTACTAAAGTTGACATTTTGACAAACCATATTTTGCTTGCGGTAGGTAACGATGTTCCTACTGAAAAAAT CGACAAAGAATTGGTTCCTAAATTAGATGGTTGGTGGAAAACTGCCTTCATCTTTACTTATCATATCGAT TTC AAACCCCAACAAAAAG G CCCTCCACGTAG AG G CAAACCAGTTCCACCACAG G AATTATCAAAG CCAA AGAAATATGAATTAATTGAAGCTTTACTGGATGAAGATGAAATCTTGTACAAGTATAGGGATCGTATCGC TTTCAATGGTGAAGATACTATTTATTCTCATGTGCCACTAGAAGAATTCACTTTGTTTGATGGTTGTTGG G AAGTCAGTAACAAG CAAAAG AAG AAG GTTGTTCCAG GTATG G GTG CTCCATCCAACAAAG CTTCTCTAC AAAAAAAAATAGATCCAGAATTAGAAGAAATGGTTTCTCAAATCACTTTGAAATTCAGTGGTAAAGTAGG CTTG AAAG ATATTTATAACG AC ACCACTACTCAAG ACACCG AAGTACAG G AAAGTAG AATGTCTG CAATC GATAAGACATGTTTGCTATCTTTATTAGGTGCTAAGTTCATGAGTACAGATGATTTGATTTTCCAAGTTC AAG GTAACAAGTTCTTTATTTTCAATAATTTTG CTAAAG CTATCCCATTCCAAATCG GTG GTTATTTGTT G CAG G GTTTTACTGTTTCATTAACTCATGTTTATG GTG GTGTCG CTTTAAACACTGTCAGTGTTCCTG CT CCATTTATTAAACATACCAAGTACTTGCCAGGTGATCCAAGATTTAAAAACAATGAAAAGGAACAATTTA CATTAATGGACTGGATCATTGAATGTTACCACCAATCTAAGGCTATAAGAGATATCAGATATAATCCAAA AACAGCTCCTCCACCATCAGTTAAAGATTTGAACTATTTTGTCGAAAAGAACACTGATATTTCAGCTTTG TTAAAGGGTTTAAAGGTTTACAGACCGTACATCAATTATAGTATCAATAAAGATGGTACCCCAAAACCAC CAAGAAAGAGATCTTCTAAGGGTATTGTTGGATTTACCCGTGAATCTGCGGTATCCATGAGGTTTAATGT TCTTGAAAGTAGTTTGAAGAAGAACAGTGCTCCAAAACCTAATGAAAAACCTATCAATATCAATACTATT 
GATTATTTCAAGAGGAAATATGACATTACTTTGAAATATCCTGATATGAAGTTAGTAAACTTGGGCGGTA AAAATG ATGTCGTTCCTCCTG AATGTTTG ACTATTGTG CCG G GTCAAAAATTG AAG G GTCAAATTTTTG A TACAAAAACTTATATCGACTTCAGTGCAATTAGACCAACTGAAAAGTTTGATTTAATCTCCAGGTTATCT ATG CCAG CTATAAAAAG AG G GTTAACTG ATTCTG AAAAG G AAG AATCATCAG CTCCTCACAATAGTG CTT ATCAATTTATGAGAGTACCATCTCGTATTCTAGATGCCCCAGTTGTTCAGTTCAAAGAATCTACCTTTGA ATATAAGGACAAGAGTTATGGAACTAAGCATGAAGAATCTAAAGGTAACTGGAACATGAAAGGTCACCAA TTTATTTCCACTCC AG CCAAACAG GTCAACTTAAG AG CAATATTTATTAATAATG CTAACACAG CTCCAC CAG CATCTATG G AAAGTG AACTTG ACATCTCTATG G ATA AATTCG CATCTG ATGTTAAACAATTAG GTGT GGACTTCAACGTATCAGGTAAACCAATTCTAATTAATCAATTTGGTCCCCCAATTAAGAAATTCCAAGGT GGTGGCCGTGGTGGCCGTGGTGGTCGTGGTAGCCGTGGTGGCCGTGGTGGCCGTGGTGCTCCATCTGGTC CTCCAACTTTCGAAACCTCTCCAGGTGAGATATCTTTGTTAAACTTATTAGAAAATATTCCAAGCAATAC CTATATTTTGTATGTATTGCGCCGTGGTAACGATTCTGCTGTTTATGATAGATTGAAATATATCACTGAT TTGAAATTTGGTGCATTGAATTCCTGTGTTGTTTGGGACAACTTCAAAAAGAATTCTATTCAATATAATT CCAATGTTGTTATGAAGATGAACTTGAAGTTATTAGGTAGTAACCACTCTCTATCTATTGAAAACAACAA ACTATTAATTGATAAGGAATCTAACTTGCCAATATTAGTGTTGGGTTCTGATGTGACACATTATCCTGAA AAGGATCAAAACTCTATTGCCTCGTTAGTAGGTTCATACGATGACAAATTTACCCAATTCCCTGGTGATT ACATGCTTCAAGATGGTCCAGGTGAAGAAATAATTACTAATGTCGGTTCATTAATGTTGAACAGATTAAA GATATATCAAAAACATAATAATGGTAAATTACCAACGAAGATCATGTACTTCAGAGATGGTGTTTCAGTT GACCAATTCTCTCAAGTTGTTAAGATTGAAGTTAAGTCTATTAAGGAATCAGTCCGTAAATTTGGTCCTC AATTAAATGGTGGTAACAAATACGATCCACCAGTTACATGTATTGCCACTGTCAAAAGAAATCAAGTCAG ATTTATTCCTATCCAAGAAAATGCGAAGAATGAAAAGGGTGAAGAAGTTGCTGTTCAATCCATGGGTAAT GTTATG CCAG GTACTGTTGTAG ACCGTG GTATCACATCTGTG G CACACTTTG ATTTCTTTATCCAATCTC ATCAAGCTTTGAAGGGTACTGGTGTCCCATGCCACTATTGGTGTCTATATGACGAAAACCAATCTACTTC TGACTACTTACAAGAAATATGTAACAACTTATGTTACATTTTCGGTAGATCTACTACCAGTGTAAAAGTC CCAG CCCCAGTATATTATG CCG ATTTGTTGTGTACCCGTG CTACATG CTTTTTCAAG G CAG GTTTTG AAC TTAATATG G CTCAAG CACCAAAG G AG AAG G GTTCTAAG G ATCAACCTACTGTTTCCAAG AATGTTCTATT ACCACAAGTTAACGATAACATTAAATCCGTAATGTATTACATTTGA
The nucleotide sequence encoding the polypeptide for KpAGO 207-1251 has the sequence below, which includes the codon for the N-terminal serine leftover after enzymatic Ulp1 tag-cleavage.
SEQ ID NO: 35
TCCACCGAG A A A ACTGTAG ATC A ACCTA AG G A AG ATACTC ATG C A ATCTATA A AGTTG A A A ATAG AC ATGATTATGGTACTAAAGGTACTAAAGTTGACATTTTGACAAACCATATTTTGCTTGCGGTAGGTAAC GATGTTCCTACTGAAAAAATCGACAAAGAATTGGTTCCTAAATTAGATGGTTGGTGGAAAACTGCCT TCATCTTTACTTATCATATCGATTTCAAACCCCAACAAAAAGGCCCTCCACGTAGAGGCAAACCAGTT CCACC AC AG G A ATT ATC A A AG CCA A AG A A AT ATG A ATTA ATTG A AG CTTTACTG G ATG AAG ATG A A A TCTTGTACAAGTATAGGGATCGTATCGCTTTCAATGGTGAAGATACTATTTATTCTCATGTGCCACTA GAAGAATTCACTTTGTTTGATGGTTGTTGGGAAGTCAGTAACAAGCAAAAGAAGAAGGTTGTTCCA GGTATGGGTGCTCCATCCAACAAAGCTTCTCTACAAAAAAAAATAGATCCAGAATTAGAAGAAATG GTTTCTCAAATCACTTTGAAATTCAGTGGTAAAGTAGGCTTGAAAGATATTTATAACGACACCACTAC TCAAGACACCGAAGTACAGGAAAGTAGAATGTCTGCAATCGATAAGACATGTTTGCTATCTTTATTA GGTGCTAAGTTCATGAGTACAGATGATTTGATTTTCCAAGTTCAAGGTAACAAGTTCTTTATTTTCAA TAATTTTGCTAAAGCTATCCCATTCCAAATCGGTGGTTATTTGTTGCAGGGTTTTACTGTTTCATTAAC TC ATGTTTATG GTGGTGTCG CTTTAAAC ACTGTCAGTGTTCCTG CTCCATTTATTAA ACATACCAAGTA CTTGCCAGGTGATCCAAGATTTAAAAACAATGAAAAGGAACAATTTACATTAATGGACTGGATCATT GAATGTTACCACCAATCTAAGGCTATAAGAGATATCAGATATAATCCAAAAACAGCTCCTCCACCATC AGTTA A AG ATTTG A ACTATTTTGTCG A A A AG A AC ACTG ATATTTC AG CTTTGTT A A AG G GTTTA A AG GTTTACAGACCGTACATCAATTATAGTATCAATAAAGATGGTACCCCAAAACCACCAAGAAAGAGAT CTTCTAAGGGTATTGTTGGATTTACCCGTGAATCTGCGGTATCCATGAGGTTTAATGTTCTTGAAAGT AGTTTG A AG A AG A AC AGTG CTCC A A A ACCTA ATG A A A A ACCTATC A ATATC A ATACTATTG ATTATTT CAAGAGGAAATATGACATTACTTTGAAATATCCTGATATGAAGTTAGTAAACTTGGGCGGTAAAAAT GATGTCGTTCCTCCTGAATGTTTGACTATTGTGCCGGGTCAAAAATTGAAGGGTCAAATTTTTGATAC AAAAACTTATATCGACTTCAGTGCAATTAGACCAACTGAAAAGTTTGATTTAATCTCCAGGTTATCTA TG CCAG CTATA AAAAG AG G GTTAACTG ATTCTG A AAAG G AAG AATCATCAG CTCCTCACAATAGTGC TTATCAATTTATGAGAGTACCATCTCGTATTCTAGATGCCCCAGTTGTTCAGTTCAAAGAATCTACCTT TG A ATATA AG G AC A AG AGTTATG G A ACTA AG C ATG A AG A ATCTA A AG GTA ACTG G A AC ATG AAAG GTCACCAATTTATTTCCACTCCAG CCAAACAG GTC A ACTTA AG AG CAATATTTATTAATAATG CTA AC AC AG CTCC ACC AG C ATCTATG G A A AGTG A ACTTG AC ATCTCTATG G ATA 
A ATTCG C ATCTG ATGTTA A ACAATTAGGTGTGGACTTCAACGTATCAGGTAAACCAATTCTAATTAATCAATTTGGTCCCCCAATTA AGAAATTCCAAGGTGGTGGCCGTGGTGGCCGTGGTGGTCGTGGTAGCCGTGGTGGCCGTGGTGGC CGTGGTGCTCCATCTGGTCCTCCAACTTTCGAAACCTCTCCAGGTGAGATATCTTTGTTAAACTTATTA G AA AATATTCCAAG CAATACCTATATTTTGTATGTATTG CGCCGTG GTAACG ATTCTG CTGTTTATG A TAGATTGAAATATATCACTGATTTGAAATTTGGTGCATTGAATTCCTGTGTTGTTTGGGACAACTTCA AAAAGAATTCTATTCAATATAATTCCAATGTTGTTATGAAGATGAACTTGAAGTTATTAGGTAGTAAC CACTCTCTATCTATTGAAAACAACAAACTATTAATTGATAAGGAATCTAACTTGCCAATATTAGTGTT GGGTTCTGATGTGACACATTATCCTGAAAAGGATCAAAACTCTATTGCCTCGTTAGTAGGTTCATAC GATGACAAATTTACCCAATTCCCTGGTGATTACATGCTTCAAGATGGTCCAGGTGAAGAAATAATTA CTAATGTCGGTTCATTAATGTTGAACAGATTAAAGATATATCAAAAACATAATAATGGTAAATTACCA ACGAAGATCATGTACTTCAGAGATGGTGTTTCAGTTGACCAATTCTCTCAAGTTGTTAAGATTGAAGT TAAGTCTATTAAGGAATCAGTCCGTAAATTTGGTCCTCAATTAAATGGTGGTAACAAATACGATCCAC C AGTTAC ATGTATTG CC ACTGTC A A A AG A A ATCA AGTC AG ATTTATTCCTATCC A AG A A A ATG CG A A GAATGAAAAGGGTGAAGAAGTTGCTGTTCAATCCATGGGTAATGTTATGCCAGGTACTGTTGTAGA CCGTG GTATCACATCTGTGG CACACTTTG ATTTCTTTATCCAATCTCATCAAG CTTTG AAGG GTACTG GTGTCCCATGCCACTATTGGTGTCTATATGACGAAAACCAATCTACTTCTGACTACTTACAAGAAATA TGTAACAACTTATGTTACATTTTCGGTAGATCTACTACCAGTGTAAAAGTCCCAGCCCCAGTATATTA TG CCG ATTTGTTGTGTACCCGTG CTACATG CTTTTTCAAGG CAG GTTTTG AACTTAATATG GCTCA AG CACCAAAGGAGAAGGGTTCTAAGGATCAACCTACTGTTTCCAAGAATGTTCTATTACCACAAGTTAA CGATAACATTAAATCCGTAATGTATTACATTTGA The nucleotide sequence encoding the polypeptide for miniature- AGO has the sequence below, which includes the codon for the N-terminal Serine leftover after enzymatic Ulpl tag- cleavage.
SEQ ID NO: 36
TCCATCTATAAAGTTGAAAATAGACATGATTATGGTACTAAAGGTACTAAAGTTGACATTTTGACAG GTTCAGGTAGAGTACCATCTCGTATTCTAGATGCCCCAGTTGTTCAGTTCAAAGAATCTACCTTTGAA TATAAGGACAAGAGTTATGGAACTAAGCATGAAGAATCTAAAGGTAACTGGAACATGAAAGGTCAC CAATTTATTTCCACTCCAG CCA AACAG GTCAACTTAAG AG CAATATTTATTAATAATGCTAACACAG C TCCACCAGCATCTATGGAAAGTGAACTTGACATCTCTATGGATAAATTCGCATCTGATGTTAAACAAT TAGGTGTGGACTTCAACGTATCAGGTAAACCAATTCTAATTAATCAATTTGGTCCCCCAATTAAGAAA TTCCAAGGTGGTGGCCGTGGTGGCCGTGGTGGTCGTGGTAGCCGTGGTGGCCGTGGTGGCCGTGG TGCTCCATCTGGTCCTCCAACTTTCGAAACCTCTCCAGGTGAGATATCTTTGTTAAACTTATTAGAAAA TATTCCAAGCAATACCTATATTTTGTATGTATTGCGCCGTGGTAACGATTCTGCTGTTTATGATAGATT G A A ATATATC ACTG ATTTG A A ATTTG GTG CATTG A ATTCCTGTGTTGTTTG G G ACA ACTTC A A A A AG A ATTCTATTCAATATAATTCCAATGTTGTTATGAAGATGAACTTGAAGTTATTAGGTAGTAACCACTCT CTATCTATTG A AAAC AACAAACTATTA ATTG ATAAG G A ATCTAACTTG CCAATATTAGTGTTGG GTTC TGATGTGACACATTATCCTGAAAAGGATCAAAACTCTATTGCCTCGTTAGTAGGTTCATACGATGAC A A ATTTACCCA ATTCCCTG GTG ATTAC ATG CTTC A AG ATG GTCC AG GTG A AG A A ATA ATTACTA ATGT CGGTTCATTAATGTTG AACAG ATTAAAGATATATCAAAAACATAATAATGGTAAATTACCAACGAAG ATCATGTACTTCAGAGATGGTGTTTCAGTTGACCAATTCTCTCAAGTTGTTAAGATTGAAGTTAAGTC TATTAAG G AATCAGTCCGTAAATTTG GTCCTCAATTA AATG GTG GTAACAA ATACG ATCCACCAGTTA CATGTATTGCCACTGTCAAAAGAAATCAAGTCAGATTTATTCCTATCCAAGAAAATGCGAAGAATGA AAAGGGTGAAGAAGTTGCTGTTCAATCCATGGGTAATGTTATGCCAGGTACTGTTGTAGACCGTGG TATCACATCTGTGG CACACTTTG ATTTCTTTATCCAATCTCATCAAG CTTTGAAG G GTACTGGTGTCCC ATGCCACTATTGGTGTCTATATGACGAAAACCAATCTACTTCTGACTACTTACAAGAAATATGTAACA ACTTATGTTACATTTTCGGTAGATCTACTACCAGTGTAAAAGTCCCAGCCCCAGTATATTATGCCGAT TTGTTGTGTACCCGTGCTACATG CTTTTTCAAG G CAG GTTTTG A ACTTAATATG GCTC AAG CACCAAA GGAGAAGGGTTCTAAGGATCAACCTACTGTTTCCAAGAATGTTCTATTACCACAAGTTAACGATAAC ATTAAATCCGTAATGTATTACATTTGA
Sequence alignments were performed using Clustal Omega: Sievers F, Wilm A, Dineen DG, Gibson TJ, Karplus K, Li W, Lopez R, McWilliam H, Remmert M, Soding J, Thompson JD, Higgins D. Molecular Systems Biology 7 Article number: 539 doi: 10.1038/msb.2011.75, http://www.ebi.ac.uk/Tools/msa/clustalo/
Alignment files were used as inputs for the calculation of sequence identity for KpAGO wt against KpAGO 207-1251 and mini-AGO. KpAGO 207-1251 sequence identity against mini-AGO was also calculated. All alignments were calculated for the amino acid sequence as well as the DNA sequence using the following online available software: Stothard P (2000) The Sequence Manipulation Suite: JavaScript programs for analyzing and formatting protein and DNA sequences. Biotechniques 28: 1102-04. http://www.bioinformatics.org/sms2/ident_sim.html
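A percent-identity calculation over a pairwise alignment can be sketched as follows. This is a minimal illustration under stated assumptions, not the Sequence Manipulation Suite implementation: columns in which exactly one sequence has a gap count toward the alignment length but not toward the identities, and columns in which both sequences have a gap are skipped. Other tools may treat gaps differently, so values can differ from those in Table 11. The function name and the short example alignment are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned columns with identical residues (gaps never match)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # column is empty in both sequences
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * identical / compared

# Toy alignment: 7 columns, one gap column and one mismatch, 5 identities.
identity = percent_identity("MKT-LLV", "MKTALIV")
print(f"{identity:.1f}% identity")
```

The same function applies to aligned nucleotide sequences, since it only compares characters column by column.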
Table 11. Sequence Comparisons
Figure imgf000076_0001
Unless defined otherwise, all technical and scientific terms used herein have the same meanings as commonly understood by one of skill in the art to which the disclosed invention belongs. Publications cited herein and the materials for which they are cited are specifically incorporated by reference.
Those skilled in the art will appreciate that numerous changes and modifications can be made to the preferred embodiments of the invention and that such changes and modifications can be made without departing from the spirit of the invention. It is, therefore, intended that the appended claims cover all such equivalent variations as fall within the true spirit and scope of the invention.

Claims

CLAIMS We claim:
1. A DNA-guided RNA cleavage system comprising:
a yeast Argonaute polypeptide; and
a heterologous, single-stranded oligonucleotide guide molecule;
wherein the single-stranded oligonucleotide guide molecule is a DNA oligonucleotide that is complementary to a target RNA sequence.
2. The system of claim 1, wherein the yeast Argonaute polypeptide is from Vanderwaltozyma polyspora.
3. The system of claim 1, wherein the yeast Argonaute polypeptide is SEQ ID NO: 32.
4. The system of any one of claims 1 to 3, wherein the single-stranded oligonucleotide guide molecule is about 12 to about 30 nucleotides.
5. The system of claim 4, wherein the single-stranded oligonucleotide guide molecule is about 14 to about 26 nucleotides.
6. The system of any one of claims 1 to 5, wherein the target RNA sequence is from a mammal.
7. The system of any one of claims 1 to 5, wherein the target RNA sequence is from a human.
8. A method for cleaving an RNA molecule, comprising:
binding to a target RNA sequence a complex comprising:
a yeast Argonaute polypeptide; and
a heterologous, single-stranded oligonucleotide guide molecule;
wherein the single-stranded oligonucleotide guide molecule is a DNA oligonucleotide that is complementary to the target RNA sequence; and
wherein the Argonaute polypeptide:guide molecule complex cleaves the target RNA sequence.
9. The method of claim 8, wherein the yeast Argonaute polypeptide is from Vanderwaltozyma polyspora.
10. The method of claim 8, wherein the yeast Argonaute polypeptide is SEQ ID NO: 32.
11. The method of any one of claims 8 to 10, wherein the single-stranded oligonucleotide guide molecule is about 12 to about 30 nucleotides.
12. The method of claim 11, wherein the single-stranded oligonucleotide guide molecule is about 14 to about 26 nucleotides.
13. The method of any one of claims 8 to 12, wherein the target RNA sequence is from a mammal.
14. The method of any one of claims 8 to 12, wherein the target RNA sequence is from a human.
15. A kit comprising:
a vector comprising a nucleotide sequence encoding a yeast Argonaute polypeptide operably linked to a promoter; and
a heterologous, single-stranded oligonucleotide guide molecule;
wherein the single-stranded oligonucleotide guide molecule is a DNA oligonucleotide that is complementary to a target RNA sequence.
16. The kit of claim 15, wherein the yeast Argonaute polypeptide is from Vanderwaltozyma polyspora.
17. The kit of claim 15, wherein the yeast Argonaute polypeptide is SEQ ID NO:32.
18. The kit of any one of claims 15 to 17, wherein the single-stranded oligonucleotide guide molecule is about 12 to about 30 nucleotides.
19. The kit of claim 18, wherein the single-stranded oligonucleotide guide molecule is about 14 to about 26 nucleotides.
20. The kit of any one of claims 15 to 19, wherein the target RNA sequence is from a mammal.
21. The kit of any one of claims 15 to 19, wherein the target RNA sequence is from a human.
22. A method for attenuating expression of a target gene in a cell, comprising:
introducing into the cell a yeast Argonaute polypeptide; and
introducing into the cell a single stranded DNA (ssDNA) in an amount sufficient to attenuate expression of the target gene, wherein the ssDNA comprises a nucleotide sequence that is complementary to a nucleotide sequence of the target gene.
23. The method of claim 22, wherein the yeast Argonaute polypeptide is from Vanderwaltozyma polyspora.
24. The method of claim 22, wherein the yeast Argonaute polypeptide is SEQ ID NO:32.
25. The method of any one of claims 22 to 24, wherein the single-stranded oligonucleotide guide molecule is about 12 to about 30 nucleotides.
26. The method of claim 25, wherein the single-stranded oligonucleotide guide molecule is about 14 to about 26 nucleotides.
27. The method of any one of claims 22 to 26, wherein the target RNA sequence is from a mammal.
28. The method of any one of claims 22 to 26, wherein the target RNA sequence is from a human.
29. A method of detecting nuclease accessibility sites in an RNA sequence, the method comprising:
a. binding to a target RNA sequence a complex comprising a yeast Argonaute polypeptide and a first single-stranded DNA oligonucleotide guide molecule, wherein the single-stranded DNA oligonucleotide guide molecule is complementary to the target RNA sequence;
b. cleaving the target RNA sequence with the Argonaute polypeptide:guide complex to form an RNA cleavage product;
c. detecting the RNA cleavage product; and
d. determining a nuclease accessibility site based on the RNA cleavage product.
30. The method of claim 29, wherein the detecting step c) comprises reverse transcribing the RNA cleavage product to form a cDNA reverse transcript, and amplifying the cDNA reverse transcript.
31. The method of claim 30, wherein the cDNA reverse transcript is separated based on size.
32. The method of claim 30 or claim 31, wherein the cDNA reverse transcript is separated by capillary electrophoresis.
33. The method of any one of claims 29 to 32, wherein the nuclease accessibility site is determined based on sequencing the cDNA reverse transcript.
34. The method of any one of claims 29 to 33, wherein the binding step a) further comprises binding to a target RNA sequence a second complex comprising a yeast Argonaute polypeptide and a second single-stranded DNA oligonucleotide guide molecule, wherein the second single-stranded DNA oligonucleotide guide molecule is complementary to the target RNA sequence.
35. The method of claim 34, wherein the first single-stranded oligonucleotide guide molecule and the second single-stranded oligonucleotide guide molecule are each from 12 to 45 nucleotides in length.
36. The method of any one of claims 29 to 35, wherein the yeast Argonaute polypeptide has at least 60% identity to a polypeptide sequence selected from SEQ ID NO:31, SEQ ID NO:32, and SEQ ID NO:33.
37. The method of any one of claims 29 to 36, wherein the target RNA sequence is one continuous RNA molecule.
38. The method of any one of claims 29 to 37, wherein the target RNA sequence comprises more than one different RNA molecule.
39. The method of any one of claims 29 to 38, wherein step d) determines more than one nuclease accessibility site in the target RNA sequence.
40. A method of high-throughput detection of nuclease accessibility sites, the method comprising:
a. assaying a target RNA sequence with two or more Argonaute polypeptide:guide complexes, wherein each complex comprises a yeast Argonaute polypeptide and a single-stranded DNA oligonucleotide guide molecule from a library of single-stranded DNA oligonucleotide guide molecules, wherein each single-stranded DNA oligonucleotide guide molecule is complementary to a portion of the target RNA sequence;
b. cleaving the target RNA sequence with the Argonaute polypeptide:guide complexes to form at least one RNA cleavage product;
c. detecting the at least one RNA cleavage product; and
d. determining a nuclease accessibility site based on the at least one RNA cleavage product.
41. The method of claim 40, wherein the target RNA sequence is assayed separately with each of the two or more Argonaute polypeptide:guide complexes.
42. The method of claim 40, wherein the target RNA sequence is assayed together with each of the two or more Argonaute polypeptide:guide complexes in a mixture.
43. The method of any one of claims 40 to 42, wherein five or more Argonaute polypeptide:guide complexes are assayed.
44. The method of any one of claims 40 to 43, wherein step d) determines more than one nuclease accessibility site in the target RNA sequence.
45. A DNA-guided RNA cleavage system for high-throughput detection of nuclease accessibility sites, the system comprising:
a first complex comprising a first yeast Argonaute polypeptide and a first single-stranded DNA oligonucleotide guide molecule; and
a second complex comprising a second yeast Argonaute polypeptide and a second single-stranded DNA oligonucleotide guide molecule; wherein the first and second single-stranded DNA oligonucleotide guide molecules are not identical and are complementary to a target RNA sequence.
46. A kit comprising a vector comprising a nucleic acid sequence encoding a yeast Argonaute polypeptide operably linked to a promoter; an RNA-dependent DNA polymerase; a set of buffered RNA cleavage reagents; and a set of buffered reverse transcription reagents.
PCT/US2017/066664 2016-12-16 2017-12-15 Systems and methods for DNA-guided RNA cleavage WO2018112336A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201662435272P 2016-12-16 2016-12-16
US62/435,272 2016-12-16
US201762580642P 2017-11-02 2017-11-02
US62/580,642 2017-11-02

Publications (1)

Publication Number Publication Date
WO2018112336A1 (en) 2018-06-21

Family

ID=62559595

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2017/066664 WO2018112336A1 (en) 2016-12-16 2017-12-15 Systems and methods for DNA-guided RNA cleavage

Country Status (1)

Country Link
WO (1) WO2018112336A1 (en)

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10323236B2 (en) 2011-07-22 2019-06-18 President And Fellows Of Harvard College Evaluation and improvement of nuclease cleavage specificity
US10465176B2 (en) 2013-12-12 2019-11-05 President And Fellows Of Harvard College Cas variants for gene editing
US10508298B2 (en) 2013-08-09 2019-12-17 President And Fellows Of Harvard College Methods for identifying a target site of a CAS9 nuclease
US10597679B2 (en) 2013-09-06 2020-03-24 President And Fellows Of Harvard College Switchable Cas9 nucleases and uses thereof
US10682410B2 (en) 2013-09-06 2020-06-16 President And Fellows Of Harvard College Delivery system for functional nucleases
US10704062B2 (en) 2014-07-30 2020-07-07 President And Fellows Of Harvard College CAS9 proteins including ligand-dependent inteins
US10745677B2 (en) 2016-12-23 2020-08-18 President And Fellows Of Harvard College Editing of CCR5 receptor gene to protect against HIV infection
US10858639B2 (en) 2013-09-06 2020-12-08 President And Fellows Of Harvard College CAS9 variants and uses thereof
US10947530B2 (en) 2016-08-03 2021-03-16 President And Fellows Of Harvard College Adenosine nucleobase editors and uses thereof
US11046948B2 (en) 2013-08-22 2021-06-29 President And Fellows Of Harvard College Engineered transcription activator-like effector (TALE) domains and uses thereof
US11214780B2 (en) 2015-10-23 2022-01-04 President And Fellows Of Harvard College Nucleobase editors and uses thereof
US11268082B2 (en) 2017-03-23 2022-03-08 President And Fellows Of Harvard College Nucleobase editors comprising nucleic acid programmable DNA binding proteins
US11306324B2 (en) 2016-10-14 2022-04-19 President And Fellows Of Harvard College AAV delivery of nucleobase editors
US11319532B2 (en) 2017-08-30 2022-05-03 President And Fellows Of Harvard College High efficiency base editors comprising Gam
US11447770B1 (en) 2019-03-19 2022-09-20 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
US11542496B2 (en) 2017-03-10 2023-01-03 President And Fellows Of Harvard College Cytosine to guanine base editor
US11542509B2 (en) 2016-08-24 2023-01-03 President And Fellows Of Harvard College Incorporation of unnatural amino acids into proteins using base editing
US11560566B2 (en) 2017-05-12 2023-01-24 President And Fellows Of Harvard College Aptazyme-embedded guide RNAs for use with CRISPR-Cas9 in genome editing and transcriptional activation
US11661590B2 (en) 2016-08-09 2023-05-30 President And Fellows Of Harvard College Programmable CAS9-recombinase fusion proteins and uses thereof
US11732274B2 (en) 2017-07-28 2023-08-22 President And Fellows Of Harvard College Methods and compositions for evolving base editors using phage-assisted continuous evolution (PACE)
US11795443B2 (en) 2017-10-16 2023-10-24 The Broad Institute, Inc. Uses of adenosine base editors
US11898179B2 (en) 2017-03-09 2024-02-13 President And Fellows Of Harvard College Suppression of pain by gene editing
WO2024033790A1 (en) * 2022-08-08 2024-02-15 Waters Technologies Corporation mRNA ANALYSIS USING RESTRICTION ENZYMES
US11912985B2 (en) 2020-05-08 2024-02-27 The Broad Institute, Inc. Methods and compositions for simultaneous editing of both strands of a target double-stranded nucleotide sequence

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6013447A (en) * 1997-11-21 2000-01-11 Innovir Laboratories, Inc. Random intracellular method for obtaining optimally active nucleic acid molecules
US20030165952A1 (en) * 2000-07-21 2003-09-04 Sten Linnarsson Method and an alggorithm for mrna expression analysis
US20160289734A1 (en) * 2015-04-03 2016-10-06 University Of Massachusetts Methods of using oligonucleotide-guided argonaute proteins
WO2016187583A1 (en) * 2015-05-21 2016-11-24 Cofactor Genomics, Inc. Methods for generating circular dna from circular rna

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GAO, F ET AL.: "DNA-Guided Genome Editing using the Natronobacterium gregoryi Argonaute", NATURE BIOTECHNOLOGY, vol. 35, no. 8, 2 May 2016 (2016-05-02), pages 1 - 7, XP055287398 *
NAKANISHI, K ET AL.: "Structure of Yeast Argonaute with Guide RNA", NATURE, vol. 486, no. 7403, 20 June 2012 (2012-06-20), pages 1 - 47, XP055511035 *

Cited By (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10323236B2 (en) 2011-07-22 2019-06-18 President And Fellows Of Harvard College Evaluation and improvement of nuclease cleavage specificity
US10954548B2 (en) 2013-08-09 2021-03-23 President And Fellows Of Harvard College Nuclease profiling system
US11920181B2 (en) 2013-08-09 2024-03-05 President And Fellows Of Harvard College Nuclease profiling system
US10508298B2 (en) 2013-08-09 2019-12-17 President And Fellows Of Harvard College Methods for identifying a target site of a CAS9 nuclease
US11046948B2 (en) 2013-08-22 2021-06-29 President And Fellows Of Harvard College Engineered transcription activator-like effector (TALE) domains and uses thereof
US10597679B2 (en) 2013-09-06 2020-03-24 President And Fellows Of Harvard College Switchable Cas9 nucleases and uses thereof
US10682410B2 (en) 2013-09-06 2020-06-16 President And Fellows Of Harvard College Delivery system for functional nucleases
US11299755B2 (en) 2013-09-06 2022-04-12 President And Fellows Of Harvard College Switchable CAS9 nucleases and uses thereof
US10858639B2 (en) 2013-09-06 2020-12-08 President And Fellows Of Harvard College CAS9 variants and uses thereof
US10912833B2 (en) 2013-09-06 2021-02-09 President And Fellows Of Harvard College Delivery of negatively charged proteins using cationic lipids
US11053481B2 (en) 2013-12-12 2021-07-06 President And Fellows Of Harvard College Fusions of Cas9 domains and nucleic acid-editing domains
US11124782B2 (en) 2013-12-12 2021-09-21 President And Fellows Of Harvard College Cas variants for gene editing
US10465176B2 (en) 2013-12-12 2019-11-05 President And Fellows Of Harvard College Cas variants for gene editing
US11578343B2 (en) 2014-07-30 2023-02-14 President And Fellows Of Harvard College CAS9 proteins including ligand-dependent inteins
US10704062B2 (en) 2014-07-30 2020-07-07 President And Fellows Of Harvard College CAS9 proteins including ligand-dependent inteins
US11214780B2 (en) 2015-10-23 2022-01-04 President And Fellows Of Harvard College Nucleobase editors and uses thereof
US11702651B2 (en) 2016-08-03 2023-07-18 President And Fellows Of Harvard College Adenosine nucleobase editors and uses thereof
US10947530B2 (en) 2016-08-03 2021-03-16 President And Fellows Of Harvard College Adenosine nucleobase editors and uses thereof
US11661590B2 (en) 2016-08-09 2023-05-30 President And Fellows Of Harvard College Programmable CAS9-recombinase fusion proteins and uses thereof
US11542509B2 (en) 2016-08-24 2023-01-03 President And Fellows Of Harvard College Incorporation of unnatural amino acids into proteins using base editing
US11306324B2 (en) 2016-10-14 2022-04-19 President And Fellows Of Harvard College AAV delivery of nucleobase editors
US10745677B2 (en) 2016-12-23 2020-08-18 President And Fellows Of Harvard College Editing of CCR5 receptor gene to protect against HIV infection
US11820969B2 (en) 2016-12-23 2023-11-21 President And Fellows Of Harvard College Editing of CCR2 receptor gene to protect against HIV infection
US11898179B2 (en) 2017-03-09 2024-02-13 President And Fellows Of Harvard College Suppression of pain by gene editing
US11542496B2 (en) 2017-03-10 2023-01-03 President And Fellows Of Harvard College Cytosine to guanine base editor
US11268082B2 (en) 2017-03-23 2022-03-08 President And Fellows Of Harvard College Nucleobase editors comprising nucleic acid programmable DNA binding proteins
US11560566B2 (en) 2017-05-12 2023-01-24 President And Fellows Of Harvard College Aptazyme-embedded guide RNAs for use with CRISPR-Cas9 in genome editing and transcriptional activation
US11732274B2 (en) 2017-07-28 2023-08-22 President And Fellows Of Harvard College Methods and compositions for evolving base editors using phage-assisted continuous evolution (PACE)
US11319532B2 (en) 2017-08-30 2022-05-03 President And Fellows Of Harvard College High efficiency base editors comprising Gam
US11932884B2 (en) 2017-08-30 2024-03-19 President And Fellows Of Harvard College High efficiency base editors comprising Gam
US11795443B2 (en) 2017-10-16 2023-10-24 The Broad Institute, Inc. Uses of adenosine base editors
US11795452B2 (en) 2019-03-19 2023-10-24 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
US11643652B2 (en) 2019-03-19 2023-05-09 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
US11447770B1 (en) 2019-03-19 2022-09-20 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
US11912985B2 (en) 2020-05-08 2024-02-27 The Broad Institute, Inc. Methods and compositions for simultaneous editing of both strands of a target double-stranded nucleotide sequence
WO2024033790A1 (en) * 2022-08-08 2024-02-15 Waters Technologies Corporation mRNA ANALYSIS USING RESTRICTION ENZYMES

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 17881544

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 17881544

Country of ref document: EP

Kind code of ref document: A1