WO2023250384A2 - Crispr-cas effector polypeptides and methods of use thereof - Google Patents

Crispr-cas effector polypeptides and methods of use thereof

Info

Publication number
WO2023250384A2
Authority
WO
WIPO (PCT)
Prior art keywords
crispr
polypeptide
cas effector
cell
activity
Prior art date
Application number
PCT/US2023/068823
Other languages
French (fr)
Other versions
WO2023250384A3 (en)
Inventor
Jennifer A. Doudna
Basem AL-SHAYEB
Original Assignee
The Regents Of The University Of California
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by The Regents Of The University Of California
Publication of WO2023250384A2
Publication of WO2023250384A3

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/87 Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90 Stable introduction of foreign DNA into chromosome
    • C12N15/902 Stable introduction of foreign DNA into chromosome using homologous recombination
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/87 Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90 Stable introduction of foreign DNA into chromosome
    • C12N15/902 Stable introduction of foreign DNA into chromosome using homologous recombination
    • C12N15/907 Stable introduction of foreign DNA into chromosome using homologous recombination in mammalian cells
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]

Definitions

  • a Sequence Listing is provided herewith as a Sequence Listing XML, “BERK-472WO_SeqList.xml” created on June 21, 2023 and having a size of 203,574 bytes.
  • the contents of the Sequence Listing XML are incorporated by reference herein in their entirety.
  • CRISPR-Cas systems comprise a CRISPR-associated (Cas) effector polypeptide and a guide nucleic acid.
  • CRISPR-Cas systems can bind to and modify a targeted nucleic acid.
  • the programmable nature of these CRISPR-Cas effector systems has facilitated their use as a versatile technology for use in, e.g., gene editing.
  • the present disclosure provides CRISPR-Cas effector polypeptides that are referred to herein as “Cas12L” polypeptides, “Casλ” polypeptides, or “Cas-lambda” polypeptides.
  • the present disclosure provides a nucleic acid encoding a Casλ polypeptide of the present disclosure.
  • the present disclosure provides methods of modifying a target nucleic acid using a Casλ polypeptide.
  • FIG. 1 is a phylogenetic tree comparing various Casλ (Cas-lambda) sequences, with CasPhi as an outgroup.
  • FIG. 2A-2E (A) Cryo-EM maps of the Casλ-guide RNA-DNA complex in two 90°-rotated orientations.
  • (D) Close-up views of the residues predicted to be responsible for recognition of the seed and low mismatch tolerance regions observed in (Fig 6F).
  • FIG. 3A-3E depict (A) Schematic representation of the Casλ-gRNA-DNA complex. Disordered linkers are shown as dotted lines. Insets for protein-DNA interactions are shown in (Fig. S5). (B) Cryo-EM maps of the Casλ-guide RNA-DNA complex. The target strand is shown in cyan and the non-target strand is shown in magenta. (C) Cryo-EM-based model of guide RNA-target DNA complex. (D) Schematic of the domain organization and secondary structure of Casλ. (E) Hierarchical clustering dendrogram of different repeats with their predicted secondary structures. Casλ can still cleave ssDNA in trans with guide RNAs consisting of non-cognate repeats that are divergent at the sequence level.
  • FIG. 4 depicts fluorescence output using oligonucleotide activators with mismatches at each respective position along the target DNA. “0” indicates no mismatches (control).
  • FIG. 5A-5M provide amino acid sequences of Cas-lambda polypeptides. Protein sequences, top to bottom: SEQ ID Nos.:7-19. Repeat sequences, top to bottom: SEQ ID Nos.:20-26.
  • FIG. 6 provides the nucleotide sequence of a UBQ10 promoter (SEQ ID NO:5).
  • FIG. 7A-7S provides an alignment of amino acid sequences of various CasL proteins.
  • FIG. 7 also provides a consensus sequence (Top (“Cas12L_1_257905508”) to Bottom (“CasL_61”): SEQ ID NOs.:27-68).
  • FIG. 8A-8D Diversity of CRISPR-encoding phages and the hosts they predate.
  • FIG. 8A An illustration of the mechanism of CRISPR interference as an anti-viral system used by bacteria and bacteriophages.
  • FIG. 8B Protein-clustering network analysis based on the number of shared protein clusters between the CRISPR-encoding phages in this study and RefSeq phages. The plot is composed of viral clusters where each node represents a phage genome, and each edge is the hypergeometric similarity between genomes based on shared protein clusters.
  • FIG. 9A-9E Diversity of phage-encoded CRISPR systems highlights anti-phage capability.
  • FIG. 9A Phage CRISPR spacers target other mobile genetic elements across bacterial phyla to abrogate superinfection via diverse mechanisms.
  • FIG. 9B- FIG. 9E Graphical illustrations of representative phage CRISPR loci harboring known and novel subtypes and their proposed mechanisms and functions as determined via spacer targeting and protein sequence analysis. Special consideration is given to phages carrying multiple loci.
  • FIG. 10A-10C Diversity of Class 2 CRISPR-Cas systems on phage and phage-like genomes.
  • FIG. 10A Maximum likelihood phylogenetic tree of phage encoded and bacterially encoded type II nucleases and respective predicted ancestral IscB nucleases. Bootstrap and approximate likelihood-ratio test values > 90 are denoted on the branches, and the bootstrap support percentages at branch points are shown in numbers. Bottom illustration of genomic CRISPR-Cas loci of type II and representative type V systems previously employed in genome editing applications. (FIG.
  • FIG. 11A-11H Casλ processes its own crRNA and cleaves dsDNA.
  • FIG. 11A Casλ1 from huge Mahaphages displays a unique crRNA hairpin compared to known Cas12 enzymes, and is reminiscent of stem-loop 1 of the engineered SpyCas9 single gRNA (sgRNA).
  • FIG. 11B Casλ repeats uniquely display highly conserved nucleotide sequences at the 5', 3', and center of the RNA.
  • FIG. 11C 5' radiolabeling of crRNAs indicates that Casλ1 uniquely processes its own crRNA in the spacer region (or 3' end).
  • FIG. 11D Processing of the repeat-spacer-repeat pre-crRNA substrate occurs similarly to (C) in the spacer region and does not occur in the absence of Mg2+, indicating a role for the RuvC in the processing mechanism.
  • FIG. 11E Casλ with targeting or non-targeting guides validates its capacity to cleave DNA flanking experimentally determined PAMs in E. coli at different dilutions.
  • FIG. 11F Cleavage assay targeting dsDNA for mapping of the cleavage structure.
  • FIG. 11G Scheme illustrating the DNA cleavage pattern.
  • FIG. 12A-12D Casλ RNPs are functional for editing endogenous genes in human, Arabidopsis, and wheat cells with large deletion profiles.
  • FIG. 12A Indel efficiency using Casλ and Cas12a RNPs with identical spacers targeting VEGF, and Casλ RNPs targeting EMX1 genes in HEK293T cells, and a schematic of the in vitro model of DNA cleavage outcomes following DNA cleavage by Casλ.
  • FIG. 12B and FIG. 12C (B) Indel efficiencies in Arabidopsis thaliana protoplasts show significantly higher levels of editing than previously achieved by CasΦ for the same PDS3 gene, and (C) in wheat protoplasts targeting the disease resistance gene Snn5.
  • FIG. 12D Indel profiles generated by Casλ RNP administration show primarily large deletions, and little change without Casλ.
  • FIG. 13A-13F Structure of Casλ-gRNA-DNA complex.
  • FIG. 13A Schematic representation of the Casλ-gRNA-DNA complex. Disordered linkers are shown as dotted lines. Insets for protein-DNA interactions are shown in FIG. 18.
  • FIG. 13B Cryo-EM maps of the Casλ-guide RNA-DNA complex. The target strand is shown in cyan and the non-target strand is shown in magenta.
  • FIG. 13C Cryo-EM-based model of guide RNA-target DNA complex.
  • FIG. 13D Schematic of the domain organization and secondary structure of Casλ.
  • FIG. 13E Hierarchical clustering dendrogram of different repeats with their predicted secondary structures. Casλ can still cleave ssDNA in trans with guide RNAs consisting of non-cognate repeats that are divergent at the sequence level.
  • FIG. 13F Fluorescence output using oligonucleotide activators with mismatches at each respective position along the target DNA. “0” indicates no mismatches (control). Insets relating to protein-DNA interactions are shown in FIG. 19.
  • FIG. 14A-14B Sequence similarity of phage-encoded CRISPR-Cas systems, related to FIG. 10.
  • FIG. 14A Phage type III Cas10. Alignment of Cas10 effectors from Huge Phages with those sourced from bacterial genomes.
  • Phage type III Cas10 are predicted to cleave DNA via Cas10 HD nuclease, but lack the residues required for the Palm domain to generate cyclic oligonucleotide signaling molecules.
  • FIG. 14B Alignment of Cas7 proteins from phages with those sourced from bacterial genomes. Phage type III Cas7’s have conserved motifs that are predicted to cleave RNA.
  • FIG. 15A-15D Structure of phage-encoded Cas9-like systems and comparison of type I-X CRISPR arrays, related to Figs. 1 and 2.
  • FIG. 15A Predicted domain organization for hypercompact phage-encoded Cas9-like systems.
  • FIG. 15B Predicted models for Cas9-like phage-encoded systems.
  • FIG. 15C Comparison of type I-X and CasΦ-encoded Biggiephages recovered across a four-year time frame using Mauve, with the CRISPR repeat locations denoted in blue. Identical sequences at the nucleotide level are shown in green, with differences shown in brown or red.
  • FIG. 15D Phage type I-X CRISPR arrays from metagenomes sampled from the same site over the span of four years show remarkably stable arrays.
  • FIG. 16A-16D Divergent properties of Casλ, related to FIG. 11.
  • FIG. 16A Casλ remote homolog searches across public databases led to poor hits and no similarity to known CRISPR-Cas proteins, where only poor hits (green-black) were observed in one RuvC motif.
  • FIG. 16B Comparison of crRNA repeat similarity across orthologs.
  • FIG. 16C Comparison of protein similarity across orthologs.
  • FIG. 16D A time-series experiment incubating Casλ with 5’ radiolabeled crRNAs with the product run on a 20% Urea PAGE gel supports the finding that Casλ uniquely processes its own single crRNA in the spacer region (or 3’ end).
  • FIG. 17A-17C Casλ PAM specificity and comparison with other Cas ortholog trans-cleavage and indel profiles, related to FIG. 11.
  • FIG. 17A The most depleted 5’ PAMs resulting from the PAM depletion assay, indicating DNA recognition and cleavage preferences for Casλ1.
  • FIG. 17B DNAse alert trans-cleavage assay with the same molarities of Cas12a, Casλ, and CasΦ targeting the same ssDNA activator.
  • FIG. 17C Casλ indel profile in HEK293T cells compared to AsCas12a. Guide 107 targets the antisense strand, while guide 109 targets the sense strand of VEGFa.
  • FIG. 18A-18H Cryo-EM workflow, related to FIG. 13.
  • FIG. 18A Map generation pipeline in cryoSPARC.
  • FIG. 18B-FIG. 18D (B) Representative 2D class averages of the final set of particles, (C) the corresponding 3D maps resulting from ab initio reconstruction, and further (D) from heterogeneous refinement.
  • FIG. 18E Local resolution map as calculated in cryoSPARC v3.3.
  • FIG. 18F Orientation distribution of the final set of refined particles.
  • FIG. 18G and FIG. 18H (G) Gold standard, and (H) map versus model FSC curves of the model refined to the LocSpiral map and plotted with the final cryoSPARC sharp experimental map.
  • FIG. 19A-19E Structure of Casλ ternary complex, related to FIG. 13.
  • FIG. 19A Cryo-EM maps of the Casλ-guide RNA-DNA complex in two 90°-rotated orientations.
  • FIG. 19B Cartoon representation of the Casλ-gRNA-DNA complex. Insets highlight residues N102, S253, N254 predicted to be responsible for PAM recognition. Hydrogen bonds are shown as dashed lines.
  • FIG. 19C Model of guide RNA-target DNA complex, with insets highlighting residues conserved across the protein family that are predicted to be interacting with the RNA.
  • FIG. 19D Close-up views of the residues predicted to be responsible for recognition of the seed and low mismatch tolerance regions observed in (FIG. 13F).
  • FIG. 19E Direct comparison of Casλ and CasΦ (PDB-ID: 7LYS) with a dashed bubble highlighting the Casλ TSL domain. Differences in RecI (Blue) can also be observed between the two proteins.
  • FIG. 20 Structural comparison of Cas12 orthologs, related to FIG. 13. Structural comparison of all DNA-targeting Cas12’s in order of increasing RNP size: CasΦ (7LYS, ref. 26), CasX (6NY3, ref. 27), Cas12i (6W5C, ref. 50), Cas12a (5XUS, ref. 51), Cas12b (5WTI, ref. 52), Cas12f (7C7L, ref. 53).
  • FIG. 21A-21F Trans-cleavage assay, related to FIG. 17.
  • FIG. 21A-FIG. 21F Trans-cleavage assays conducted with RNase Alert reporter substrate at decreasing RNP concentrations
  • FIG. 21A-FIG. 21C for binary and ternary complexes of Casλ
  • FIG. 21D PolyU RNA reporter substrates, and testing cell viability assays with cells expressing Casλ in conjunction with (FIG. 21E) targeting and (FIG. 21F) non-targeting guides.
  • Heterologous means a nucleotide sequence or an amino acid sequence that is not found in the native nucleic acid or protein, respectively.
  • a heterologous polypeptide comprises an amino acid sequence from a protein other than the CRISPR-Cas effector polypeptide.
  • a CRISPR-Cas effector polypeptide can be fused to an active domain from a non-CRISPR-Cas effector polypeptide; the sequence of the active domain can be considered a heterologous polypeptide (it is heterologous to the CRISPR-Cas effector polypeptide).
  • a heterologous guide nucleotide sequence (present in a targeting segment) that can hybridize with a target nucleotide sequence (target region) of a target nucleic acid is a nucleotide sequence that is not found in nature in a guide nucleic acid together with a binding segment that can bind to a CRISPR-Cas effector polypeptide of the present disclosure.
  • a heterologous target nucleotide sequence (present in a heterologous targeting segment) is from a different source than a binding nucleotide sequence (present in a binding segment) that can bind to a CRISPR-Cas effector polypeptide of the present disclosure.
  • a guide nucleic acid may comprise a guide nucleotide sequence (present in a targeting segment) that can hybridize with a target nucleotide sequence present in a eukaryotic target nucleic acid.
  • a guide nucleic acid of the present disclosure can be generated by human intervention and can comprise a nucleotide sequence not found in a naturally-occurring guide nucleic acid.
  • “naturally-occurring,” as applied to a nucleic acid, a cell, a protein, or an organism, refers to a nucleic acid, cell, protein, or organism that is found in nature.
  • “polynucleotide” and “nucleic acid,” used interchangeably herein, refer to a polymeric form of nucleotides of any length, either ribonucleotides or deoxynucleotides or combinations thereof. Thus, this term includes, but is not limited to, single-, double-, or multi-stranded DNA or RNA, genomic DNA, cDNA, DNA-RNA hybrids, or a polymer comprising purine and pyrimidine bases or other natural, chemically or biochemically modified, non-natural, or derivatized nucleotide bases.
  • “polynucleotide” and “nucleic acid” should be understood to include, as applicable to the embodiment being described, single-stranded (such as sense or antisense) and double-stranded polynucleotides.
  • polypeptide refers to a polymeric form of amino acids of any length, which can include genetically coded and non-genetically coded amino acids, chemically or biochemically modified or derivatized amino acids, and polypeptides having modified peptide backbones.
  • the term includes fusion proteins, including, but not limited to, fusion proteins with a heterologous amino acid sequence.
  • Polypeptides as described herein also include polypeptides having various amino acid additions, deletions, or substitutions relative to the native amino acid sequence of a polypeptide of the present disclosure.
  • polypeptides that are homologs of a polypeptide of the present disclosure contain non-conservative changes of certain amino acids relative to the native sequence of a polypeptide of the present disclosure.
  • polypeptides that are homologs of a polypeptide of the present disclosure contain conservative changes of certain amino acids relative to the native sequence of a polypeptide of the present disclosure, and thus may be referred to as conservatively modified variants.
  • a conservatively modified variant may include individual substitutions, deletions or additions to a polypeptide sequence which result in the substitution of an amino acid with a chemically similar amino acid. Conservative substitution tables providing functionally similar amino acids are well-known in the art. Such conservatively modified variants are in addition to and do not exclude polymorphic variants, interspecies homologs, and alleles of the disclosure.
  • the following eight groups contain amino acids that are conservative substitutions for one another: 1) Alanine (A), Glycine (G); 2) Aspartic acid (D), Glutamic acid (E); 3) Asparagine (N), Glutamine (Q); 4) Arginine (R), Lysine (K); 5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V); 6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W); 7) Serine (S), Threonine (T); and 8) Cysteine (C), Methionine (M) (see, e.g., Creighton, Proteins (1984)).
  • a modification of an amino acid to produce a chemically similar amino acid may be referred to as an analogous amino acid.
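The eight groups listed above can be encoded directly as a lookup. The following is a minimal sketch, not part of the disclosure: the names `CONSERVATIVE_GROUPS` and `is_conservative` are illustrative, and the rule that two residues count as a conservative substitution when they share at least one group is an assumption about how the list would be applied (methionine appears in both group 5 and group 8).

```python
# Groups as listed in the text (one-letter amino acid codes).
CONSERVATIVE_GROUPS = [
    {"A", "G"},            # 1) Alanine, Glycine
    {"D", "E"},            # 2) Aspartic acid, Glutamic acid
    {"N", "Q"},            # 3) Asparagine, Glutamine
    {"R", "K"},            # 4) Arginine, Lysine
    {"I", "L", "M", "V"},  # 5) Isoleucine, Leucine, Methionine, Valine
    {"F", "Y", "W"},       # 6) Phenylalanine, Tyrosine, Tryptophan
    {"S", "T"},            # 7) Serine, Threonine
    {"C", "M"},            # 8) Cysteine, Methionine
]


def is_conservative(original: str, replacement: str) -> bool:
    """Return True if the two residues differ and share at least one group.

    Assumption (not stated in the source): sharing any single group is
    sufficient, so M pairs with I/L/V (group 5) and with C (group 8).
    """
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # identical residues are not a substitution
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)
```

Under this reading, `is_conservative("D", "E")` holds, while a cysteine-to-leucine change does not, because C and L never appear in the same group.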
  • a polynucleotide or polypeptide has a certain percent “sequence identity” to another polynucleotide or polypeptide, meaning that, when aligned, that percentage of bases or amino acids are the same, and in the same relative position, when comparing the two sequences. Sequence similarity can be determined in a number of different manners. To determine sequence identity, sequences can be aligned using the methods and computer programs, including BLAST, available over the world wide web at ncbi.nlm.nih.gov/BLAST. See, e.g., Altschul et al. (1990), J. Mol. Biol. 215:403-10.
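As a rough illustration of the definition above, percent identity can be computed from two sequences that have already been aligned (for example by BLAST). This is a sketch under stated assumptions: the function name `percent_identity` is illustrative, gaps are written as `-`, and the denominator is the number of alignment columns excluding columns where both sequences have a gap. Alignment tools differ in which denominator they report, so values may not match a given program exactly.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same residue.

    Both inputs must be pre-aligned and of equal length, with '-' marking gaps.
    Assumption: columns where both sequences are gapped are ignored; all other
    columns (including single-gap columns) count toward the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # column carries no information
        compared += 1
        if x == y:
            matches += 1  # same base or amino acid in the same relative position
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared
```

For instance, `percent_identity("AC-T", "ACGT")` counts three matching columns out of four.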
  • another alignment algorithm is FASTA, available in the Genetics Computing Group (GCG) package, from Madison, Wisconsin, USA, a wholly owned subsidiary of Oxford Molecular Group, Inc.
  • Other techniques for alignment are described in Methods in Enzymology, vol. 266: Computer Methods for Macromolecular Sequence Analysis (1996), ed. Doolittle, Academic Press, Inc., a division of Harcourt Brace & Co., San Diego, California, USA.
  • of particular interest are alignment programs that permit gaps in the sequence.
  • Smith-Waterman is one type of algorithm that permits gaps in sequence alignments. See Meth. Mol. Biol. 70: 173-187 (1997).
  • the GAP program using the Needleman and Wunsch alignment method can be utilized to align sequences. See J. Mol. Biol. 48: 443-453 (1970).
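To make the idea of a gapped global alignment concrete, here is a minimal sketch of the Needleman and Wunsch dynamic-programming method. It is not the GAP program: the scoring values (`match=1`, `mismatch=-1`, `gap=-1`) and the linear gap penalty are assumptions chosen for brevity, whereas production tools typically use substitution matrices and affine gap penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b).

    Illustrative scoring only: linear gap penalty, fixed match/mismatch scores.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

Smith-Waterman differs mainly in clamping cell scores at zero and tracing back from the highest-scoring cell, which yields a local rather than a global alignment.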
  • Recombinant means that a particular nucleic acid (DNA or RNA) is the product of various combinations of cloning, restriction, and/or ligation steps resulting in a construct having a structural coding or non-coding sequence distinguishable from endogenous nucleic acids found in natural systems.
  • DNA sequences encoding the structural coding sequence can be assembled from cDNA fragments and short oligonucleotide linkers, or from a series of synthetic oligonucleotides, to provide a synthetic nucleic acid which is capable of being expressed from a recombinant transcriptional unit contained in a cell or in a cell-free transcription and translation system.
  • sequences can be provided in the form of an open reading frame uninterrupted by internal non-translated sequences, or introns, which are typically present in eukaryotic genes.
  • Genomic DNA comprising the relevant sequences can also be used in the formation of a recombinant gene or transcriptional unit. Sequences of non-translated DNA may be present 5’ or 3’ from the open reading frame, where such sequences do not interfere with manipulation or expression of the coding regions, and may indeed act to modulate production of a desired product by various mechanisms (see “DNA regulatory sequences”, below).
  • the term “recombinant” polypeptide refers to a polypeptide which is not naturally occurring, e.g., is made by the artificial combination of two otherwise separated segments of amino acid sequence through human intervention. Thus, e.g., a polypeptide that comprises a heterologous amino acid sequence is recombinant.
  • the term “recombinant” polynucleotide or “recombinant” nucleic acid refers to one which is not naturally occurring, e.g., is made by the artificial combination of two otherwise separated segments of sequence through human intervention.
  • This artificial combination is often accomplished by either chemical synthesis means, or by the artificial manipulation of isolated segments of nucleic acids, e.g., by genetic engineering techniques. Such is usually done to replace a codon with a redundant codon encoding the same or a conservative amino acid, while typically introducing or removing a sequence recognition site. Alternatively, it is performed to join together nucleic acid segments of desired functions to generate a desired combination of functions.
  • DNA regulatory sequences refer to transcriptional and translational control sequences, such as promoters, enhancers, polyadenylation signals, terminators, protein degradation signals, and the like, that provide for and/or regulate expression of a coding sequence and/or production of an encoded polypeptide in a host cell.
  • transformation is used interchangeably herein with “genetic modification” and refers to a permanent or transient genetic change induced in a cell following introduction of new nucleic acid (e.g., DNA exogenous to the cell) into the cell.
  • Genetic change (“modification”) can be accomplished either by incorporation of the new nucleic acid into the genome of the host cell, or by transient or stable maintenance of the new nucleic acid as an episomal element.
  • a permanent genetic change is generally achieved by introduction of new DNA into the genome of the cell.
  • In prokaryotic cells, permanent changes can be introduced into the chromosome or via extrachromosomal elements such as plasmids and expression vectors, which may contain one or more selectable markers to aid in their maintenance in the recombinant host cell.
  • Suitable methods of genetic modification include viral infection, transfection, conjugation, protoplast fusion, electroporation, particle gun technology, calcium phosphate precipitation, direct microinjection, and the like.
  • the choice of method is generally dependent on the type of cell being transformed and the circumstances under which the transformation is taking place (i.e., in vitro, ex vivo, or in vivo). A general discussion of these methods can be found in Ausubel et al., Short Protocols in Molecular Biology, 3rd ed., Wiley & Sons, 1995.
  • “Operably linked” refers to a juxtaposition wherein the components so described are in a relationship permitting them to function in their intended manner.
  • a promoter is operably linked to a coding sequence if the promoter affects its transcription or expression.
  • heterologous promoter and “heterologous control regions” refer to promoters and other control regions that are not normally associated with a particular nucleic acid in nature.
  • a “transcriptional control region heterologous to a coding region” is a transcriptional control region that is not normally associated with the coding region in nature.
  • the “about” used in reference to the lower amount of the range means that the lower amount includes an amount that is 10% lower than the lower amount of the range
  • “about” used in reference to the higher amount of the range means that the higher amount includes an amount 10% higher than the higher amount of the range.
  • “from about 100 to about 1000” means that the range extends from 90 to 1100.
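The arithmetic behind this definition of “about” is small enough to sketch. The function name `about_range` and the `tolerance` parameter are illustrative and do not appear in the disclosure; the 10% figure is the one given in the text.

```python
def about_range(lower: float, upper: float, tolerance: float = 0.10):
    """Expand a stated range per the definition of "about" in the text.

    The lower bound is reduced by `tolerance` (10%) of itself and the upper
    bound is increased by `tolerance` of itself.
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return lower * (1 - tolerance), upper * (1 + tolerance)
```

Calling `about_range(100, 1000)` reproduces the worked example in the text, extending the range to 90 through 1100 (up to floating-point rounding).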
  • “A and/or B” is intended to include both A and B; A or B; A (alone); and B (alone).
  • the term “and/or” as used in a phrase such as “A, B, and/or C” is intended to encompass each of the following embodiments: A, B, and C; A, B, or C; A or C; A or B; B or C; A and C; A and B; B and C; A (alone); B (alone); and C (alone).
  • “isolated” and “purified” refer to a material that is removed from at least one component with which it is naturally associated (e.g., removed from its original environment).
  • “isolated,” when used in reference to an isolated protein, refers to a protein that has been removed from the culture medium of the host cell that expressed the protein. As such, an isolated protein is free of extraneous or unwanted compounds (e.g., nucleic acids, native bacterial or other proteins, etc.).
  • aspects and embodiments of the present disclosure described herein include “comprising,” “consisting,” and “consisting essentially of” aspects and embodiments.
  • the present disclosure provides CRISPR-Cas effector polypeptides that are referred to herein as “Cas12L” polypeptides, “Casλ” polypeptides, or “Cas-lambda” polypeptides.
  • the present disclosure provides a nucleic acid encoding a Casλ polypeptide of the present disclosure.
  • the present disclosure provides methods of modifying a target nucleic acid using a Casλ polypeptide.
  • a Cas12L polypeptide of the present disclosure is capable of forming a ribonucleoprotein (RNP) complex by binding to or otherwise interacting with a guide nucleic acid (e.g., a guide RNA (gRNA)).
  • the Cas12L-gRNA ribonucleoprotein complex is capable of being targeted to a target nucleic acid via base pairing between the guide RNA and a target nucleotide sequence in the target nucleic acid that is complementary to the sequence of the guide RNA.
  • the guide RNA thus provides the specificity for targeting a particular target nucleic acid.
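The base-pairing logic described above can be illustrated with a toy search for sites complementary to a guide sequence. This is a simplified sketch and not the disclosed method: it considers only perfect Watson-Crick complementarity on a single DNA strand and ignores PAM requirements, mismatch tolerance, and the repeat portion of the guide. The function names and example sequences are hypothetical.

```python
# RNA base -> complementary DNA base (Watson-Crick pairing).
RNA_TO_DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}


def complementary_dna(guide_rna: str) -> str:
    """Return the DNA sequence, written 5'->3', that base-pairs with the guide.

    Pairing is antiparallel, so the guide is read in reverse while each base
    is replaced by its complement.
    """
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(guide_rna.upper()))


def find_target_sites(guide_rna: str, target_strand: str) -> list:
    """Return 0-based start positions on target_strand (5'->3' DNA) that are
    perfectly complementary to guide_rna. Overlapping sites are reported."""
    site = complementary_dna(guide_rna)
    strand = target_strand.upper()
    hits = []
    start = strand.find(site)
    while start != -1:
        hits.append(start)
        start = strand.find(site, start + 1)
    return hits
```

With the made-up guide `GACU`, the complementary DNA site is `AGTC`, so searching the strand `TTAGTCAA` reports a single site starting at position 2.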
  • once the Cas12L-gRNA ribonucleoprotein complex has come into association with a target nucleic acid by virtue of the targeting of the RNP complex to that target nucleic acid by the guide RNA, the Cas12L protein is able to bind to the target nucleic acid.
  • the Cas12L polypeptide will modify the target nucleic acid.
  • the modification comprises homology-directed repair (HDR).
  • the modification comprises non-homologous end joining (NHEJ).
  • a Cas12L polypeptide is a fusion polypeptide comprising: i) a Cas12L polypeptide; and ii) one or more heterologous polypeptides. In some cases, the heterologous polypeptide modifies the target nucleic acid, or a polypeptide associated with the target nucleic acid.
  • the present disclosure provides nucleic acid-guided (e.g., RNA-guided) CRISPR-Cas effector polypeptides for use in CRISPR-based targeting systems in cells (e.g., eukaryotic cells), where the CRISPR-Cas systems provide for modification (“editing”) of a target nucleic acid and/or modification of a polypeptide associated with a target nucleic acid.
  • the present disclosure provides Cas12L polypeptides for use in CRISPR-based targeting systems in plants.
  • Provided herein are Cas12L polypeptides, nucleic acids encoding the same, compositions containing the same, and methods of using the same to, e.g., edit a target nucleic acid.
  • the present disclosure provides ribonucleoprotein complexes containing a Cas12L polypeptide and a guide RNA which may be used to, e.g., edit a target nucleic acid.
  • also provided herein are guide RNAs that can bind or otherwise interact with Cas12L polypeptides, nucleic acids encoding the same, compositions containing the same, and methods of using the same to, e.g., edit a target nucleic acid.
  • the present disclosure provides methods of modifying a target nucleic acid in a eukaryotic cell.
  • the methods comprise contacting the target nucleic acid in the eukaryotic cell with: a) a Cas12L polypeptide; and b) a Cas12L guide nucleic acid.
  • the contacting is carried out at a temperature of from about 25°C to about 40°C (e.g., from about 25°C to about 28°C, from about 28°C to about 30°C, from about 28°C to about 32°C, from about 30°C to about 32°C, from about 30°C to about 37°C, from about 32°C to about 34°C, from about 30°C to about 34°C, from about 34°C to about 37°C, or from about 37°C to about 40°C).
  • modification of a target nucleic acid does not substantially occur at a temperature of less than 28°C.
  • modification of a target nucleic acid does not substantially occur at a temperature of from about 17°C to about 25°C, or from about 25°C to about 28°C. In some cases, modification of a target nucleic acid occurs, if at all, at less than 75%, less than 50%, less than 25%, less than 10%, or less than 5%, of the extent to which the modification of the target nucleic acid occurs when the modification is conducted at 32°C.
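The comparison against the 32°C reference described above amounts to a ratio test. The sketch below is illustrative only: the function names are hypothetical, the inputs are assumed to be editing extents measured in the same units (for example, percent of cells modified), and the threshold values are the ones listed in the text.

```python
# Fractions of the 32 °C extent named in the text.
THRESHOLDS = (0.75, 0.50, 0.25, 0.10, 0.05)


def relative_extent(extent_at_temp: float, extent_at_32c: float) -> float:
    """Modification at the test temperature as a fraction of that at 32 °C."""
    if extent_at_32c <= 0:
        raise ValueError("reference extent at 32 °C must be positive")
    return extent_at_temp / extent_at_32c


def thresholds_met(extent_at_temp: float, extent_at_32c: float) -> list:
    """Return the listed thresholds that the relative extent falls below."""
    ratio = relative_extent(extent_at_temp, extent_at_32c)
    return [t for t in THRESHOLDS if ratio < t]
```

For example, with made-up measurements of 8% editing at a lower temperature and 40% at 32°C, the relative extent is 0.2, which falls below the 75%, 50%, and 25% thresholds but not the 10% or 5% thresholds.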
  • each containing the target nucleic acid at least 50%, at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, of the cells would, following contact at 32 °C with a Casl2L polypeptide and a Casl2L guide nucleic acid, contain a modification of the target nucleic acid, which modification was effected by the Casl2L polypeptide (together with the Casl2L guide nucleic acid); while, if the contacting was carried out at a temperature of less than 28°C (e.g., from 17°C to 28°C, from 25°C to 28°C, or from 17°C to 25°C), less than 50%, less than 45%, less than 40%, less than 35%, less than 30%, less than 25%, less than 20%, less than 15%, less than 10%, or less than 5%, of the euk
  • a target nucleic acid can be present in any of a variety of eukaryotic cells; i.e., a method of the present disclosure can be carried out in a variety of eukaryotic cells.
  • eukaryotic cells in which a method of the present disclosure can be carried out include, e.g., a plant cell, an insect cell, an arthropod cell, a mammalian cell, a fish cell, a fungal cell, a yeast cell, an amphibian cell, and an avian cell.
  • Suitable cells include cells of members of the kingdom Protista, including, but not limited to, algae (e.g., green algae, red algae, glaucophytes, cyanobacteria); fungus-like members of Protista, e.g., slime molds, water molds, etc.; animal-like members of Protista, e.g., flagellates (e.g., Euglena), amoeboids (e.g., amoeba), sporozoans (e.g, Apicomplexa, Myxozoa, Microsporidia), and ciliates (e.g., Paramecium).
  • Suitable cells include cells of members of the kingdom Fungi, including, but not limited to, members of any of the phyla: Basidiomycota (club fungi; e.g., members of Agaricus, Amanita, Boletus, Cantharellus, etc.); Ascomycota (sac fungi, including, e.g., Saccharomyces); Mycophycophyta (lichens); Zygomycota (conjugation fungi); and Deuteromycota. Suitable cells include cells of members of the kingdom Plantae, including, but not limited to, members of any of the following divisions: Bryophyta (e.g., mosses), Anthocerotophyta (e.g., hornworts), Hepaticophyta (e.g., liverworts), Lycophyta (e.g., club mosses), Sphenophyta (e.g., horsetails), Psilophyta (e.g
  • Suitable cells include cells of members of the kingdom Animalia, including, but not limited to, members of any of the following phyla: Porifera (sponges); Placozoa; Orthonectida (parasites of marine invertebrates); Rhombozoa; Cnidaria (corals, anemones, jellyfish, sea pens, sea pansies, sea wasps); Ctenophora (comb jellies); Platyhelminthes (flatworms); Nemertina (ribbon worms); Gnathostomulida (jawed worms); Gastrotricha; Rotifera; Priapulida; Kinorhyncha; Loricifera; Acanthocephala; Entoprocta; Nematoda; Nematomorpha; Cycliophora; Mollusca (mollusks); Sipuncula (peanut worms); Annelida (segmented worms); Tardigrada (water bears); Onychophor
  • Suitable members of Chordata include any member of the following subphyla: Urochordata (sea squirts; including Ascidiacea, Thaliacea, and Larvacea); Cephalochordata (lancelets); Myxini (hagfish); and Vertebrata, where members of Vertebrata include, e.g., members of Petromyzontida (lampreys), Chondrichthyes (cartilaginous fish), Actinopterygii (ray-finned fish), Actinistia (coelacanths), Dipnoi (lungfish), Reptilia (reptiles, e.g., snakes, alligators, crocodiles, lizards, etc.), Aves
  • the cell is a unicellular organism in vitro. In some cases, the cell is obtained from a multicellular organism and is cultured as a unicellular entity in vitro. In some cases, the cell is present in a multicellular organism in vivo.
  • a eukaryotic cell e.g., a multicellular organism comprising the eukaryotic cell
  • a eukaryotic cell is modified to include a Casl2L polypeptide and a Casl2L guide nucleic acid, where temperature is used to control activity of the Casl2L polypeptide in the context of gene drive.
  • at a first temperature (e.g., from about 17°C to about 25°C, or from about 17°C to about 28°C), the gene drive does not occur.
  • at a second temperature (e.g., from about 25°C to about 40°C; e.g., from about 25°C to about 28°C, from about 28°C to about 30°C, from about 28°C to about 32°C, from about 30°C to about 32°C, from about 30°C to about 37°C, from about 32°C to about 34°C, from about 30°C to about 34°C, from about 34°C to about 37°C, or from about 37°C to about 40°C), gene drive occurs.
  • Such temperature-dependent activity can be used to control populations such as mosquitoes, fruit flies, and the like.
  • the present disclosure provides a method for modifying a target nucleic acid in a plant cell, the method including: a) introducing into a plant cell a Casl2L polypeptide and a guide RNA, and b) cultivating the plant cell under conditions whereby the Casl2L polypeptide and guide RNA are present as a complex that targets the target nucleic acid to generate a modification in the target nucleic acid.
  • the Casl2L polypeptide comprises an amino acid sequence having at least 50%, at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid identity to the amino acid sequence depicted in any one of FIG. 5A-5M.
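  • As an illustration of the identity thresholds recited above, the following is a minimal sketch of how percent amino acid identity might be computed from a pair of pre-aligned sequences. It assumes one common convention (identical residues divided by the number of aligned columns in which neither sequence has a gap); other conventions exist, and this passage does not state which one applies. The function names and the toy sequences are hypothetical and are not taken from FIG. 5A-5M.

```python
from typing import List

# Identity thresholds recited in the text (percent).
IDENTITY_THRESHOLDS = (50, 60, 70, 80, 85, 90, 95, 98, 99, 100)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: inputs are already aligned (equal length, '-' marks a gap).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * matches / compared


def thresholds_met(identity: float) -> List[int]:
    """Return every recited 'at least N%' threshold that the identity satisfies."""
    return [t for t in IDENTITY_THRESHOLDS if identity >= t]


# Hypothetical toy alignment, for illustration only.
identity = percent_identity("MKTA-YIAK", "MKSAQYI-K")
print(round(identity, 1), thresholds_met(identity))
```

  • In practice the alignment itself would come from a dedicated alignment tool; the sketch covers only the counting step.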
  • the Casl2L polypeptide includes one or more nuclear localization signals (NLS).
  • at least one of the one or more nuclear localization signals is an SV40-type NLS.
  • the Casl2L polypeptide and the guide RNA are encoded in one or more recombinant nucleic acids in the plant cell; i.e., a recombinant nucleic acid comprising a nucleotide sequence encoding the Casl2L polypeptide and/or the guide RNA.
  • one or more of the recombinant nucleic acids include at least one intron.
  • the nucleotide sequence encoding the Casl2L polypeptide and/or the nucleotide sequence encoding the guide RNA is operably linked to a promoter that is functional in plants. In some cases, the promoter is a UBQ10 promoter.
  • the UBQ10 promoter includes a nucleic acid sequence that is at least 80% identical to SEQ ID NO:1.
  • expression of the guide RNA is driven by an RNA Polymerase II promoter (i.e., the nucleotide sequence encoding the guide RNA is operably linked to an RNA Polymerase II (“Pol II”) promoter).
  • the Pol II promoter is a CmYLCV promoter or a 2x35S promoter.
  • the promoter comprises a nucleic acid sequence that is at least 80% identical to SEQ ID NO:2 or SEQ ID NO:3.
  • the plant cell is cultivated at a temperature in the range of about 23°C to about 37°C. In some embodiments that may be combined with any of the preceding embodiments, the plant cell is cultivated at a temperature in the range of about 20°C to about 25°C. In some embodiments that may be combined with any of the preceding embodiments, the modification includes a deletion of one or more nucleotides in the target nucleic acid. In some embodiments that may be combined with any of the preceding embodiments, the deletion includes deletion of 3-15 nucleotides in the target nucleic acid. In some cases, the deletion includes deletion of 9 nucleotides in the target nucleic acid.
  • the target nucleic acid sequence is located in a region of repressive chromatin. In some embodiments that may be combined with any of the preceding embodiments, the target nucleic acid sequence is located in a region of open chromatin. In some cases, the modification includes an insertion of one or more nucleotides in the target nucleic acid. In some cases, the modification includes a combination of insertions of one or more nucleotides into, and deletions of one or more nucleotides from, the target nucleic acid.
  • the modification may include a combination of insertions and deletions of 3-15 nucleotides in the target nucleic acid.
  • the guide RNA is recombinantly fused to a ribozyme.
  • the plant cell comprises a genetic background that exhibits reduced susceptibility to transgene silencing.
  • the present disclosure provides a recombinant vector including a nucleic acid sequence that includes a promoter that is functional in plants and that encodes a Casl2L polypeptide and a guide RNA.
  • the Casl2L polypeptide comprises an amino acid sequence having at least 50%, at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid identity to the amino acid sequence depicted in any one of FIG. 5A-5M.
  • the Casl2L polypeptide includes a nuclear localization signal (NLS).
  • the nuclear localization signal is an SV40-type NLS.
  • the nucleic acid sequence includes at least one intron.
  • the promoter is a UBQ10 promoter.
  • the UBQ10 promoter includes a nucleic acid sequence that is at least 80% identical to SEQ ID NO:1.
  • expression of the guide RNA is driven by an RNA Polymerase II promoter.
  • the RNA Polymerase II promoter is a CmYLCV promoter or a 2x35S promoter.
  • the promoter comprises a nucleic acid sequence that is at least 80% identical to SEQ ID NO:2 or SEQ ID NO:3.
  • the guide RNA is recombinantly fused to a ribozyme.
  • the present disclosure provides a plant cell including a Casl2L polypeptide and a guide RNA, wherein the Cas12L polypeptide and guide RNA are capable of existing in a complex that targets a target nucleic acid to generate a modification in the target nucleic acid.
  • the Cas12L polypeptide includes an amino acid sequence having at least 50%, at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid identity to the amino acid sequence depicted in any one of FIG. 5A-5M.
  • the Casl2L polypeptide includes a nuclear localization signal (NLS).
  • the nuclear localization signal is an SV40-type NLS.
  • the Casl2L polypeptide and guide RNA are encoded from one or more recombinant nucleic acids in the plant cell.
  • one or more of the recombinant nucleic acids include at least one intron.
  • one or more of the recombinant nucleic acids include a promoter that is functional in plants. In some cases, the promoter is a UBQ10 promoter.
  • the UBQ10 promoter includes a nucleic acid sequence that is at least 80% identical to SEQ ID NO:1.
  • expression of the guide RNA is driven by an RNA Polymerase II promoter.
  • the RNA Polymerase II promoter is a CmYLCV promoter or a 2x35S promoter.
  • the promoter comprises a nucleic acid sequence that is at least 80% identical to SEQ ID NO:2 or SEQ ID NO:3.
  • the plant cell is cultivated at a temperature in the range of about 23°C to about 37°C.
  • the plant cell is cultivated at a temperature in the range of about 20°C to about 25 °C.
  • the modification includes a deletion of one or more nucleotides in the target nucleic acid.
  • the deletion includes deletion of 3-15 nucleotides in the target nucleic acid.
  • the deletion includes deletion of 9 nucleotides in the target nucleic acid.
  • the modification includes an insertion of one or more nucleotides into the target nucleic acid (e.g., an insertion of from 3 to 15 nucleotides).
  • the modification includes a combination of an insertion of one or more nucleotides into, and a deletion of one or more nucleotides from, the target nucleic acid.
  • the target nucleic acid sequence is located in a region of repressive chromatin.
  • the target nucleic acid sequence is located in a region of open chromatin.
  • the guide RNA is recombinantly fused to a ribozyme.
  • the plant cell comprises a genetic background that exhibits reduced susceptibility to transgene silencing.
  • the present disclosure provides a plant including a plant cell of any one of the preceding embodiments, wherein the plant includes a modified nucleic acid.
  • the modification includes a deletion of one or more nucleotides in the nucleic acid.
  • the deletion includes deletion of 3-15 nucleotides. In some embodiments, the deletion includes deletion of 9 nucleotides.
  • the modification includes an insertion of one or more nucleotides into the target nucleic acid (e.g., an insertion of from 3 to 15 nucleotides). In some cases, the modification includes a combination of an insertion of one or more nucleotides into, and a deletion of one or more nucleotides from, the target nucleic acid.
  • the present disclosure provides a progeny plant of the plant of any one of the preceding embodiments, wherein the progeny plant includes a modified nucleic acid.
  • the modification includes a deletion of one or more nucleotides in the nucleic acid.
  • the deletion includes deletion of 3-15 nucleotides.
  • the deletion includes deletion of 9 nucleotides.
  • the modification includes an insertion of one or more nucleotides into the target nucleic acid (e.g., an insertion of from 3 to 15 nucleotides).
  • the modification includes a combination of an insertion of one or more nucleotides into, and a deletion of one or more nucleotides from, the target nucleic acid.
  • a method of the present disclosure comprises: a) contacting a target nucleic acid in a plant cell with: i) a Cas12L polypeptide; and ii) a Cas12L guide nucleic acid; b) maintaining the plant cell for a first period of time at a first temperature of from about 17°C to about 25°C, wherein the target nucleic acid is substantially not modified by the Cas12L polypeptide; and c) maintaining the plant cell for a second period of time at a second temperature of from about 25°C to about 37°C, wherein the target nucleic acid is modified by the Cas12L polypeptide.
  • a method of the present disclosure comprises: a) contacting a target nucleic acid in a plant cell with: i) a Cas12L polypeptide; and ii) a Cas12L guide nucleic acid; b) maintaining the plant cell for a first period of time at a first temperature of from about 25°C to about 37°C (or from about 25°C to about 40°C), wherein the target nucleic acid is modified by the Cas12L polypeptide; and c) maintaining the plant cell for a second period of time at a second temperature of from about 17°C to about 25°C, wherein the target nucleic acid is substantially not modified by the Cas12L polypeptide.
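  • The two-phase methods above can be sketched as a small lookup that labels each phase of a cultivation schedule with the editing outcome the text describes for that temperature. This is an illustrative sketch only: the `Phase` class, the function names, the example schedule, and the choice to treat 25°C as a sharp boundary are all assumptions (the text says "about" and the two ranges meet at 25°C).

```python
from dataclasses import dataclass
from typing import List

# Temperature windows (deg C) quoted in the text. Using 25 as a sharp
# boundary is an assumption; the text says "about" for every endpoint.
FIRST_RANGE_C = (17.0, 25.0)   # target substantially not modified
SECOND_RANGE_C = (25.0, 40.0)  # target modified (text: 25-37, or 25-40)


@dataclass(frozen=True)
class Phase:
    """One step of a hypothetical cultivation schedule."""
    hours: float
    temperature_c: float


def expected_state(temperature_c: float) -> str:
    """Map a temperature to the outcome the text describes for it."""
    if FIRST_RANGE_C[0] <= temperature_c < FIRST_RANGE_C[1]:
        return "substantially not modified"
    if SECOND_RANGE_C[0] <= temperature_c <= SECOND_RANGE_C[1]:
        return "modified"
    return "outside described ranges"


def annotate(schedule: List[Phase]) -> List[str]:
    """Label each phase of a schedule with its expected editing outcome."""
    return [
        f"{p.hours:g} h at {p.temperature_c:g} C: {expected_state(p.temperature_c)}"
        for p in schedule
    ]


# Hypothetical schedule: hold cool first, then shift warm to permit editing.
for line in annotate([Phase(48, 22), Phase(24, 32)]):
    print(line)
```

  • Reversing the order of the two phases gives the second variant of the method, in which editing occurs first and is then substantially halted by lowering the temperature.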
  • the modification results in repression of expression of a target nucleic acid (e.g., silencing of a target nucleic acid).
  • the modification is deletion of all or a portion of a target nucleic acid.
  • the modification includes an insertion of one or more nucleotides into the target nucleic acid.
  • the modification includes a combination of an insertion of one or more nucleotides into, and a deletion of one or more nucleotides from, the target nucleic acid.
  • the modification results in expression of a target nucleic acid.
  • the modification results in expression of a target nucleic acid, where the target nucleic acid is an endogenous plant nucleic acid. In some cases, the modification results in expression of a target nucleic acid, where the target nucleic acid is heterologous to the plant cell (e.g., the target nucleic acid is a transgene or an exogenous nucleic acid).
  • the modification results in repression of expression of a target nucleic acid (e.g., silencing of a target nucleic acid).
  • the modification results in repression of expression of a gene product in a pigment production pathway that provides for a change in color of a flower, a bract, a leaf, or another plant part.
  • Pigment production pathway gene products include those involved in an anthocyanin synthesis pathway (e.g., anthocyanin-5-acyltransferase; chalcone synthase; chalcone isomerase; flavanone 3-hydroxylase; flavonoid 3’-hydroxylase; flavonoid 3’,5’-hydroxylase; flavonoid 3-O-glucosyltransferase; anthocyanidin synthase; any of a variety of enzymes that modify anthocyanidin, such as glucosyltransferases, acyltransferases, and methyltransferases; and the like; see, e.g., Liu et al. (2016) Front. Chem.
  • a betalain synthesis pathway (e.g., dihydroxyphenylalanine (DOPA) 4,5-dioxygenase; cyclic-DOPA 5-O-glucosyltransferase; and the like)
  • a carotenoid synthesis pathway (see, e.g., Tanaka et al. (2008) Plant J. 54:733).
  • at a first temperature (e.g., a temperature of from about 17°C to about 25°C), the bract of a poinsettia is green.
  • at a second temperature (e.g., a temperature of from about 28°C to about 37°C, or from about 28°C to about 40°C), the bract of the poinsettia is red.
  • the target nucleic acid comprises a nucleotide sequence encoding a pigment production pathway enzyme.
  • the target nucleic acid is not modified by the Casl2L polypeptide; thus, the plant or the plant part will contain the pigment produced as a result of activity of the pigment production pathway.
  • the target nucleic acid is modified by the Casl2L polypeptide; thus, the plant or the plant part lacks the pigment that would normally be produced by action of the pigment production pathway.
  • the target nucleic acid is an endogenous nucleic acid or a transgene encoding a negative regulator of a pigment production pathway.
  • the target nucleic acid is not modified by the Casl2L polypeptide; thus, the pigment production pathway is blocked by the negative regulator and the pigment is not produced.
  • the target nucleic acid is modified by the Casl2L polypeptide, thus allowing the pigment production pathway to function and change of the color of the plant or the plant part.
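  • The pigment examples above follow a simple two-input logic: whether the target encodes a pathway enzyme or a negative regulator, and whether the temperature permits modification. A minimal sketch of that logic is given below. It rests on several assumptions not stated in the text: that modification abolishes the function of the target, that the plant is held at a single temperature (editing, once made, is not undone by cooling), and that temperatures between the two quoted windows are left undetermined. All names are hypothetical.

```python
from enum import Enum
from typing import Optional


class TargetRole(Enum):
    PATHWAY_ENZYME = "pigment production pathway enzyme"
    NEGATIVE_REGULATOR = "negative regulator of the pathway"


def target_modified(temperature_c: float) -> Optional[bool]:
    """Whether the text describes the target as modified at this temperature.

    Returns None between or outside the quoted windows (17-25 and 28-40 deg C).
    """
    if 17.0 <= temperature_c <= 25.0:
        return False
    if 28.0 <= temperature_c <= 40.0:
        return True
    return None


def pigment_produced(role: TargetRole, temperature_c: float) -> Optional[bool]:
    """Expected pigment outcome, assuming modification abolishes target function."""
    modified = target_modified(temperature_c)
    if modified is None:
        return None
    if role is TargetRole.PATHWAY_ENZYME:
        # Intact enzyme -> pigment; disrupted enzyme -> no pigment.
        return not modified
    # Intact negative regulator blocks the pathway; disrupting it releases it.
    return modified


for role in TargetRole:
    for temp in (20.0, 32.0):
        print(role.value, temp, pigment_produced(role, temp))
```

  • The same two-input pattern applies to the ripening, resistance, and fertility examples elsewhere in the description, with the phenotype substituted for pigment.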
  • Target nucleic acids include, e.g., Colorless non-ripening (CNR), nonripening (NOR), ripening inhibitor (RIN), DNA demethylase-2 (DML2), and ethylene insensitive-3 (EIN3). See, e.g., Wang et al. (2002) Plant Cell 14 Suppl: S131.
  • a first temperature e.g., a temperature of from about 17°C to about 25°C
  • a second temperature e.g., a temperature of from about 28°C to about 37°C
  • the target nucleic acid is a nucleic acid in a fruit, where the nucleic acid comprises a nucleotide sequence encoding an ethylene production pathway enzyme or signaling pathway polypeptide.
  • the target nucleic acid comprises a nucleotide sequence encoding an ethylene production pathway enzyme or signaling pathway polypeptide.
  • the target nucleic acid is not modified by the Casl2L polypeptide; thus, the fruit continues the ripening process.
  • the target nucleic acid is modified by the Casl2L polypeptide; thus, the ripening process in the fruit is slowed down.
  • the target nucleic acid is an endogenous nucleic acid or a transgene encoding a negative regulator of ethylene production or signaling pathway.
  • the target nucleic acid is not modified by the Casl2L polypeptide; thus, the production or signaling of ethylene is blocked, resulting in slower ripening of the fruit.
  • the target nucleic acid is modified by the Casl2L polypeptide, thus allowing the fruit to ripen.
  • the modification results in expression of a transgene that confers resistance to insects or disease (e.g., a fungal disease, a bacterial disease), where the expression of such transgene occurs at a second temperature (e.g., a temperature of from about 28°C to about 37°C) and does not substantially occur at a first temperature (e.g., a temperature of from about 17°C to about 25°C).
  • the transgene is a plant disease resistance gene. Plant defenses are often activated by specific interaction between the product of a disease resistance gene in the plant and the product of a corresponding avirulence (Avr) gene in the pathogen.
  • a plant can be genetically modified with a transgene that confers resistance to specific pathogen strains.
  • the tomato Cf-9 gene confers resistance to Cladosporium fulvum
  • the tomato Pto gene confers resistance to Pseudomonas syringae
  • the Arabidopsis RPS2 gene confers resistance to Pseudomonas syringae; and the like.
  • a plant that is genetically modified with a transgene, and that is “resistant” to a disease-causing pathogen is one that is more resistant (e.g., at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, or at least 80% more resistant) to the disease-causing pathogen as compared to the wild type plant (a plant of the same species that does not comprise the transgene).
  • the transgene is a nucleic acid comprising a nucleotide sequence encoding a Bacillus thuringiensis (Bt) polypeptide, a derivative thereof, or a synthetic polypeptide modeled after a Bt polypeptide.
  • Examples of Bt polypeptides include a Bt delta-endotoxin polypeptide.
  • the transgene comprises a nucleotide sequence encoding a pesticidal polypeptide, where non-limiting examples of such pesticidal polypeptides include, e.g., insecticidal proteins from Pseudomonas sp. such as PSEEN3174 (Monalysin (2011) PLoS Pathogens 7:1-13); insecticidal proteins from Photorhabdus sp.
  • a PIP-1 polypeptide; an AfIP-1A and/or AfIP-1B polypeptide; a PHI-4 polypeptide; a PIP-47 polypeptide; a PIP-72 polypeptide; a PtIP-50 polypeptide; a PtIP-65 polypeptide; a PtIP-83 polypeptide; a PtIP-96 polypeptide; a delta-endotoxin such as a Cry1, Cry2, Cry3, Cry4, Cry5, Cry6, Cry7, Cry8, Cry9, Cry10, Cry11, Cry12, Cry13, Cry14, Cry15, Cry16, Cry17, Cry18, Cry19, Cry20, Cry21, Cry22, Cry23, Cry24, Cry25, Cry26, Cry27, Cry28, Cry29, Cry30, Cry31, Cry32, Cry33, Cry34, Cry35, Cry36,
  • thuringiensis; a Cry1A polypeptide (see, e.g., U.S. Patent Nos. 5,880,275 and 7,858,849); a DIG-3 polypeptide (see, e.g., U.S. Pat. Nos. 8,304,604 and 8,304,605); a DIG-11 polypeptide (see, e.g., U.S. Pat. Nos. 8,304,604 and 8,304,605); a Cry1B polypeptide; a Cry1C polypeptide; a Cry1F polypeptide; a Cry2 polypeptide (see, e.g., U.S. Pat. No.
  • a Cry3A polypeptide; a Cry4 polypeptide; a Cry5 polypeptide; a Cry6 polypeptide; a Cry8 polypeptide; a Cry9 polypeptide; a Cry46 protein; a Cry51 protein; a Cry binary toxin; a TIC901 or related toxin; an AXMI-027, AXMI-036, or AXMI-038 polypeptide (see, e.g., U.S. Pat. No. 8,236,757); a vegetative insecticidal protein (Vip; see, e.g., Gupta et al. (2021) Front. Microbiol.
  • the transgene is a nucleic acid comprising a nucleotide sequence encoding an insect-specific polypeptide that, upon expression, disrupts the physiology of the affected pest; where such polypeptides include, e.g., an insect diuretic hormone receptor, an allatostatin, and the like.
  • the transgene is a nucleic acid comprising a nucleotide sequence encoding an enzyme involved in the modification, including the post-translational modification, of a biologically active molecule; for example, a glycolytic enzyme, a proteolytic enzyme, a lipolytic enzyme, a nuclease, a cyclase, a transaminase, an esterase, a hydrolase, a phosphatase, a kinase, a phosphorylase, a polymerase, an elastase, a chitinase, or a glucanase.
  • the modification can result in expression of a transgene, where the transgene is a nucleic acid comprising a nucleotide sequence encoding a lectin, where the nucleotide sequence is operably linked to a plant-specific promoter, e.g., a phloem-specific promoter, or the like.
  • the modification can result in expression of a transgene, where the transgene is a nucleic acid comprising a nucleotide sequence encoding an ω-ACTX-Hv1a toxin (Hvt) (a component of the venom of the Australian funnel web spider Hadronyche versuta (Khan et al. (2006) Transgenic Res.
  • the modification can result in expression of a transgene, where the transgene is a nucleic acid comprising a nucleotide sequence encoding a lectin and a nucleotide sequence encoding Hvt.
  • a transgene can confer broad-spectrum resistance against lepidopteran (e.g., Helicoverpa armigera and Spodoptera litura) and hemipteran (e.g., Myzus persicae, Phenacoccus solenopsis, and Bemisia tabaci) insect pests. See, e.g., Rauf et al. (2019) Nature Scientific Reports 9:6745.
  • the modification results in increased expression of an endogenous plant gene product that has insecticidal activity.
  • endogenous plant proteins include, e.g., lectins, ribosome-inactivating proteins, enzyme inhibitors, arcelins, chitinases, ureases, and modified storage proteins. See, e.g., Carlini and Grossi-de-Sa (2002) Toxicon. 40:1515.
  • the modification results in increased expression of an endogenous jasmonic acid pathway protein.
  • a transgene can be a nucleic acid comprising a nucleotide sequence encoding an enzyme that cleaves a protein of a plant pathogen.
  • a transgene can be a nucleic acid comprising a nucleotide sequence encoding a plant apoplastic subtilisin-like protease, such as tomato P69B, which is able to cleave a secreted protein PC2 from the potato late blight pathogen Phytophthora infestans, thus triggering downstream immune responses. See, e.g., Wang et al. (2021) New Phytol. 229:3424.
  • a transgene can be a nucleic acid comprising a nucleotide sequence encoding an inhibitory RNA, such as a microRNA or a long double-stranded RNA, that inhibits an RNA of a plant pathogen.
  • a transgene can be a nucleic acid comprising a nucleotide sequence encoding TAS1c-siR483 and TAS2-siR453, which target the RNA produced by the BC1G_10728, BC1G_10508, and BC1G_08464 genes of the fungal pathogen Botrytis cinerea. See, e.g., Cai et al. (2016) Science 360:1126.
  • the target nucleic acid comprises a nucleotide sequence encoding a polypeptide that provides for resistance to a disease (caused by a plant pathogen such as a fungus or a bacterium) or for resistance to an insect (e.g., an insect that causes plant pathology).
  • at a first temperature of from about 17°C to about 25°C, the target nucleic acid is not modified by the Cas12L polypeptide; thus, the plant is resistant to the fungus, bacterium, or insect.
  • the target nucleic acid is modified by the Casl2L polypeptide; thus, the plant is susceptible to the fungus, bacterium, or insect.
  • the target nucleic acid is an endogenous nucleic acid or a transgene comprising a nucleotide sequence encoding a negative regulator of a disease resistance or insect resistance gene or pathway.
  • the target nucleic acid is not modified by the Casl2L polypeptide; thus, the plant is susceptible to the fungus, bacterium, or insect.
  • the target nucleic acid is modified by the Casl2L polypeptide; thus, the polypeptide that provides for resistance is produced and the plant is resistant to the fungus, bacterium, or insect.
  • the modification results in expression of a transgene that confers resistance to an herbicide.
  • the transgene is a nucleic acid comprising a nucleotide sequence encoding a polypeptide that confers resistance to an herbicide, such as an imidazolinone or a sulfonylurea, that inhibits the growing point or meristem; such polypeptides include, e.g., a mutant ALS or a mutant AHAS enzyme.
  • the transgene is a nucleic acid comprising a nucleotide sequence encoding a polypeptide that confers resistance to glyphosate, e.g., where resistance can be conferred by a mutant 5-enolpyruvyl-3-phosphoshikimate synthase (EPSP) gene.
  • the modification controls male sterility/fertility.
  • examples include, e.g., a transgene that is a nucleic acid comprising a nucleotide sequence encoding barstar (an inhibitor of barnase), e.g., where the nucleotide sequence is operably linked to an anther-specific promoter or a pollen-specific promoter (see, e.g., Roque et al. (2019) Front. Plant Sci. 10:819); a transgene that is a nucleic acid comprising a nucleotide sequence encoding barnase (Paul et al., (1992) Plant Mol. Biol. 19:611-622); and the like.
  • Another example includes a transgene encoding a deacetylase gene under the control of a tapetum-specific promoter.
  • Other male sterility genes include, e.g., MAC1, EMS1, and GNE2 (Sorensen et al. (2002) Plant J. 29:581-594). Further examples of male sterility genes include CMS-D2-2, CMS-hir, CMS-D8, CMS-D4, and CMS-C1.
  • the target nucleic acid comprises a nucleotide sequence that encodes a male reproductive pathway polypeptide.
  • at a first temperature of from about 17°C to about 25°C, the target nucleic acid is not modified by the Cas12L polypeptide; thus, the plant is fertile.
  • at a second temperature of from about 28°C to about 37°C or from about 28°C to about 40°C, the target nucleic acid is modified by the Cas12L polypeptide; thus, the plant is male sterile.
  • the target nucleic acid is an endogenous nucleic acid or a transgene comprising a nucleotide sequence encoding a negative regulator of the male reproductive pathway.
  • at a first temperature of from about 17°C to about 25°C, the target nucleic acid is not modified by the Cas12L polypeptide; thus, the male reproductive pathway is blocked, resulting in a male sterile phenotype.
  • at a second temperature of from about 28°C to about 37°C or from about 28°C to about 40°C, the target nucleic acid is modified by the Cas12L polypeptide; thus, the male reproductive pathway is allowed to function and the plant is fertile.
  • a Casl2L polypeptide can be targeted to a specific target nucleic acid to modify the target nucleic acid.
  • Casl2L is targeted to a target nucleic acid based on its association/complex with a guide RNA that is able to hybridize with the particular target nucleotide sequence in the target nucleic acid.
  • the guide RNA provides the targeting functionality to target a particular target nucleotide sequence in a target nucleic acid.
  • Various types of nucleic acids may be targeted to, e.g., modulate their expression, as will be readily apparent to one of skill in the art.
  • Certain aspects of the present disclosure relate to targeting a target nucleic acid with a Casl2L polypeptide such that the Casl2L polypeptide is able to enact enzymatic activity at the target nucleic acid.
  • a Casl2L polypeptide/gRNA complex is targeted to a target nucleic acid and introduces an edit/modification into the target nucleic acid.
  • the edit/modification is to introduce a single-stranded break or a double-stranded break into the nucleic acid backbone of the target nucleic acid.
  • a target site generally refers to a location of a target nucleic acid that is capable of being bound by a Casl2L/gRNA complex and subjected to the activity of a Casl2L polypeptide or variant thereof.
  • the target site may include both the nucleotide sequence hybridized with a guide RNA as well as at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 nucleotides or more on the 3’ side, the 5’ side, or both the 3’ and 5’ side of the nucleotide sequence in the target nucleic acid that is hybridized with a guide RNA.
  • the target site may contain at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 125, at least 150, at least 175, or at least 200 or more nucleotides.
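  • The definition of a target site above (the guide-hybridized sequence plus a number of flanking nucleotides on the 5’ side, the 3’ side, or both) can be sketched as a coordinate calculation. The sketch below is an assumption-laden illustration: coordinates are 0-based and end-exclusive, 5’ and 3’ are taken relative to the strand on which the coordinates are given, the window is clipped at the ends of the sequence, and the 20-nucleotide hybridized region in the example is hypothetical. The function name is not from the source.

```python
from typing import Tuple


def target_site_window(
    hyb_start: int,
    hyb_end: int,
    flank: int,
    seq_len: int,
    side: str = "both",
) -> Tuple[int, int]:
    """Return (start, end) of a target site: hybridized region plus flanks.

    Coordinates are 0-based and end-exclusive. `side` selects which flank
    to add: "5prime", "3prime", or "both".
    """
    if not (0 <= hyb_start < hyb_end <= seq_len):
        raise ValueError("hybridized region must lie within the sequence")
    if flank < 0:
        raise ValueError("flank must be non-negative")
    if side not in ("5prime", "3prime", "both"):
        raise ValueError("side must be '5prime', '3prime', or 'both'")

    start = hyb_start - flank if side in ("5prime", "both") else hyb_start
    end = hyb_end + flank if side in ("3prime", "both") else hyb_end
    # Clip to the sequence boundaries.
    return max(0, start), min(seq_len, end)


# Hypothetical 20-nt hybridized region with 10-nt flanks on both sides.
start, end = target_site_window(100, 120, flank=10, seq_len=1000)
print(start, end, end - start)
```

  • With these example values the window spans 40 nucleotides, which satisfies the "at least 30" and "at least 40" size options listed above.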
  • a Casl2L polypeptide is targeted to a particular locus.
  • a locus generally refers to a specific position on a chromosome or other nucleic acid molecule.
  • a locus may contain, for example, a polynucleotide that encodes a protein or an RNA.
  • a locus may also contain, for example, a non-coding RNA, a gene, a promoter, a 5’ untranslated region (UTR), an exon, an intron, a 3’ UTR, or combinations thereof.
  • a locus may contain a coding region for a gene.
  • a Casl2L polypeptide is targeted to a gene.
  • a gene generally refers to a polynucleotide that encodes a gene product (for example, a polypeptide or a noncoding RNA).
  • a gene may contain a promoter, an enhancer sequence, a leader sequence, a transcriptional start site, a transcriptional stop site, a polyadenylation site, one or more exons, one or more introns, a 5’ UTR, a 3’ UTR, or combinations thereof.
  • a gene sequence may contain a polynucleotide sequence encoding a promoter, an enhancer sequence, a leader sequence, a transcriptional start site, a transcriptional stop site, a polyadenylation site, one or more exons, one or more introns, a 5’ UTR, a 3’ UTR, or combinations thereof.
  • the target nucleic acid sequence may be located within the coding region of a target gene or upstream or downstream thereof. Moreover, the target nucleic acid sequence may reside endogenously in a target gene or may be inserted into the gene, e.g., heterologous, for example, using techniques such as homologous recombination.
  • a target gene of the present disclosure can be operably linked to a control region, such as a promoter, that contains a sequence that can be recognized by a guide RNA of the present disclosure such that a Casl2L polypeptide may be targeted to that sequence.
  • the target sequence may be a promoter or other regulatory region.
  • the target nucleic acid sequence may be located in a region of chromatin.
  • the target nucleic acid sequence to be edited by a Cas12L polypeptide may be in a region of open chromatin or similar region of DNA that is generally accessible to transcriptional machinery. Regions of open chromatin may be characterized by nucleosome depletion, nucleosome disruption, accessibility to transcriptional machinery, and/or a transcriptionally active state. Regions of open chromatin will be readily understood and identifiable by one of skill in the art.
  • Editing a target nucleic acid sequence that is in a region of open chromatin may result in improved editing efficiency by the Cas12L polypeptide as compared to a corresponding control nucleic acid sequence (e.g., one that is present in a region of more closed, repressive, and/or transcriptionally inactive chromatin).
  • Target genes or nucleic acid regions to be edited by a Cas12L polypeptide of the present disclosure will be readily apparent to those of skill in the art depending on the particular application and/or purpose.
  • genes with particular agricultural importance may be edited/modified according to the methods of the present disclosure.
  • Exemplary genes to be edited/modified may include, for example, those involved in light perception (e.g., PHYB, etc.); those involved in the circadian clock (e.g., CCA1, LHY, etc.); those involved in flowering time (e.g., CO, FT, etc.); those involved in meristem size (e.g., WUS, CLV3, etc.); those involved in plant architecture (e.g., S, SP, TFL1, SFT, etc.); those involved in ripening (e.g., genes in the ethylene production pathway); those involved in flower color; those involved in bract color; and those involved in embryogenesis, chromatin structure, stress response, growth and development, etc.
  • the target nucleic acid is one that provides for resistance to an antimicrobial agent.
  • antimicrobial agents include penicillin, a cephalosporin, a monobactam, a carbapenem, a macrolide, an aminoglycoside, a quinolone, a sulfonamide, a tetracycline, a glycopeptide, a lipoglycopeptide, an oxazolidinone, a rifamycin, a tuberactinomycin, chloramphenicol, metronidazole, tinidazole, nitrofurantoin, teicoplanin, telavancin, linezolid, cycloserine 2, bacitracin, polymyxin B, viomycin, and capreomycin.
  • the target nucleic acid is one that provides for resistance to an antifungal agent, where examples of antifungal agents include an allylamine, an imidazole, a triazole, a thiazole, a polyene, and an echinocandin.
  • the target nucleic acid is one that provides for resistance to an insecticidal agent.
  • insecticidal agents include a chloronicotinyl, a neonicotinoid, a carbamate, an organophosphate, a pyrethroid, an oxadiazine, a spinosyn, a cyclodiene, an organochlorine, a fiprole, a mectin, a diacylhydrazine, a benzoylurea, an organotin, a pyrrole, a dinitroterpenol, a METI, a tetronic acid, a tetramic acid, and a pthalamide.
  • the target nucleic acid provides for resistance to a plant pathogen.
  • the plant pathogen is a bacterium, a fungus, a parasitic insect, a parasitic nematode, or a parasitic protozoan.
  • the target nucleic acid is endogenous to the plant where the expression of one or more genes is modulated according to the methods described herein.
  • the target nucleic acid is a transgene of interest that has been inserted into a plant. Suitable target nucleic acids will be readily apparent to one of skill in the art depending on the particular need or outcome.
  • the target nucleic acid sequence may be in e.g. a region of euchromatin (e.g. highly expressed gene), or the target nucleic acid sequence may be in a region of heterochromatin (e.g. centromere DNA).
  • the target nucleic acid may be in a region of repressive chromatin.
  • Repressive chromatin generally refers to regions of chromatin where transcription is repressed or otherwise generally transcriptionally inactive.
  • Exemplary regions of repressive chromatin include, for example, regions with repressive DNA methylation, compact chromatin, and/or no transcription.
  • a Cas12L polypeptide can be used to create mutations in plants that result in reduced or silenced expression of a target gene.
  • a Cas12L polypeptide can be used to create functional “overexpression” mutations in a plant by releasing repression of the target gene expression as a consequence of a modification that results in transcriptional activation of the target nucleic acid. Release of gene expression repression, which may lead to activation of gene expression, may be of a structural gene, e.g., one encoding a protein having for example enzymatic activity, or of a regulatory gene, e.g., one encoding a protein that in turn regulates expression of a structural gene.
  • a Cas12L polypeptide can be used to control an endogenous biosynthetic pathway in a plant cell. In some cases, a Cas12L polypeptide can be used to control a heterologous biosynthetic pathway in a plant cell.
  • biosynthetic pathways that can be controlled using a Cas12L polypeptide (together with a Cas12L guide nucleic acid) include, e.g., biosynthetic pathways involved in psychoactive alkaloid production (e.g., for reducing opium production by Papaver somniferum); biosynthetic pathways for production of cannabidiol; biosynthetic pathways for production of tetrahydrocannabinol; a phytic acid production pathway; and the like.
  • a Cas12L polypeptide is used to control an endogenous glucosinolate production pathway.
  • the Cas12L polypeptide inhibits an endogenous glucosinolate production pathway, but only at a higher temperature (e.g., from about 25 °C to about 32 °C), and only just prior to (e.g., one week, two weeks, or three weeks before) harvest of a vegetable intended for human consumption, where the vegetable is produced by the plant.
  • Cas12L polypeptides and their use in facilitating the editing/modification of a target nucleic acid.
  • Cas12L polypeptides generally function as RNA-guided DNA-binding proteins.
  • Cas12L polypeptides may have endonuclease activity which can facilitate modification/editing of a target nucleic acid.
  • a Cas12L polypeptide (this term is used interchangeably with the term “Cas12L protein”) can bind and/or modify (e.g., cleave, nick, methylate, demethylate, etc.) a target nucleic acid and/or a polypeptide associated with target nucleic acid (e.g., methylation or acetylation of a histone tail) (e.g., in some cases, the Cas12L protein includes a fusion partner with an activity, and in some cases, the Cas12L protein provides nuclease activity).
  • the Cas12L protein is a naturally-occurring protein (e.g., naturally occurs in bacteriophage).
  • the Casl2L protein is not a naturally-occurring polypeptide (e.g., the Cas12L protein is a variant Cas12L protein, a fusion Cas12L protein, and the like).
  • Assays to determine whether a given protein interacts with a Cas12L guide RNA can be any convenient binding assay that tests for binding between a protein and a nucleic acid. Suitable binding assays (e.g., gel shift assays) will be known to one of ordinary skill in the art (e.g., assays that include adding a Cas12L guide RNA and a protein to a target nucleic acid).
  • Assays to determine whether a protein has an activity can be any convenient assay (e.g., any convenient nucleic acid cleavage assay that tests for nucleic acid cleavage). Suitable assays (e.g., cleavage assays) will be known to one of ordinary skill in the art.
  • a naturally occurring Cas12L protein functions as an endonuclease that catalyzes a double strand break at a specific sequence in a targeted double stranded DNA (dsDNA).
  • the sequence specificity is provided by the associated guide RNA, which hybridizes to a target sequence within the target DNA.
  • the naturally occurring Cas12L guide RNA is a crRNA, where the crRNA includes (i) a guide sequence that hybridizes to a target sequence in the target DNA and (ii) a protein binding segment which includes a stem-loop (hairpin - dsRNA duplex) that binds to the Cas12L protein.
  • a Cas12L polypeptide suitable for use in a subject method and/or composition is (or is derived from) a naturally occurring (wild type) protein. Examples of naturally occurring Cas12L proteins are depicted in FIG. 5A-5M.
  • a Cas12L protein (of the subject compositions and/or methods) includes an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with any one of the Cas12L amino acid sequences depicted in FIG. 5A-5M.
  • a Cas12L protein (of the subject compositions and/or methods) has more sequence identity to the amino acid sequence depicted in any one of FIG. 5A-5M than to any of the following: Cas12a proteins, Cas12b proteins, Cas12c proteins, Cas12d proteins, Cas12e proteins, Cas12g proteins, Cas12h proteins, and Cas12i proteins.
  • a Cas12L protein (of the subject compositions and/or methods) includes an amino acid sequence having a RuvC domain (which includes the RuvC-I, RuvC-II, and RuvC-III domains) that has more sequence identity to the RuvC domain of any of the Cas12L amino acid sequences depicted in FIG. 5A-5M than to the RuvC domain of any of the following: Cas12a proteins, Cas12b proteins, Cas12c proteins, Cas12d proteins, Cas12e proteins, Cas12g proteins, Cas12h proteins, and Cas12i proteins.
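The identity thresholds above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned by some external tool (gaps written as "-"), and it assumes identity is counted over all aligned columns; the source does not state which convention applies here, so the function names and the denominator choice are illustrative only.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: identical, non-gap columns are counted as matches and the
    denominator is the number of columns that are not gap-vs-gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Drop columns where both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float) -> bool:
    """True when the pair reaches the given threshold (e.g., 20.0 or 85.0)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, `meets_identity_threshold("MKT-LV", "MKTALV", 80.0)` compares six aligned columns, five of which are identical.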
  • FIG. 5A provides the locations of active site residues present in RuvC domains of the CasL polypeptide designated “CasL_56.”
  • active site residues of CasL_56 are amino acid residues 336, 523, and 676.
  • Corresponding active site residues of other CasL polypeptides presented in FIG. 5A-5M can be determined by those skilled in the art. See, e.g., the bold and underlined residues in FIG. 5B and FIG. 5C.
  • a CasL protein of the present disclosure includes an Asn at position 102 (N102) of the CasL polypeptides depicted in FIG. 5A-5C, or corresponding positions in the CasL polypeptide of FIG. 5D-5M.
  • substitution of N102 with another amino acid can modify the PAM requirement. For example, substitution of N102 with Q, S, E, T, or D could expand the PAM from R(-1) to C, T, or N.
  • a CasL protein of the present disclosure includes amino acids that interact directly with the RNA nucleobases (Q452, N510), and/or amino acids that interact directly with the RNA phosphate backbone to stabilize the guide (S451, K598, E444, N445, K503, Y619) (where the amino acid numbering is based on the numbering of the amino acid sequence depicted in FIG. 5B), or corresponding positions in the amino acid sequence depicted in any one of FIG. 5A or FIG. 5C-5M. For example, corresponding positions in the amino acid sequence depicted in FIG. 5C are shown in bold.
  • a CasL protein of the present disclosure has a domain structure as shown in FIG. 3D.
  • a CasL protein comprises: i) an OBD domain of about 27 amino acids in length at the N-terminus of the protein; ii) a REC I domain from amino acids 28-54; iii) a PID domain from amino acids 55-113; iv) a REC I domain from amino acids 114-245; v) an OBD domain from amino acids 246 to 321; vi) a first RuvC domain from amino acids 322 to 350; vii) a REC II domain from amino acids 351 to 387; viii) a second RuvC domain from amino acids 388 to 396; ix) a REC II domain from amino acids 397-522; x) a third RuvC domain from amino acids 523 to 640; xi) a TSL domain from amino acids 641 to 678; xii)
  • a Cas12L protein (of the subject compositions and/or methods) includes an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the RuvC domain (which includes the RuvC-I, RuvC-II, and RuvC-III domains) of any one of the Cas12L amino acid sequences depicted in FIG. 5A-5M.
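As a rough illustration, the domain boundaries listed above can be held in a small lookup table. This is only a sketch: the first OBD segment is assumed to span residues 1-27 (the text says "about 27 amino acids"), the list stops at item xi because the source text is cut off after it, and the names `DOMAINS` and `domain_at` are hypothetical.

```python
from typing import Optional

# (label, first residue, last residue), 1-based and inclusive.
# Assumption: the N-terminal OBD segment is taken as residues 1-27.
DOMAINS = [
    ("OBD", 1, 27),
    ("REC I", 28, 54),
    ("PID", 55, 113),
    ("REC I", 114, 245),
    ("OBD", 246, 321),
    ("RuvC (first)", 322, 350),
    ("REC II", 351, 387),
    ("RuvC (second)", 388, 396),
    ("REC II", 397, 522),
    ("RuvC (third)", 523, 640),
    ("TSL", 641, 678),
]


def domain_at(position: int) -> Optional[str]:
    """Return the label of the listed domain containing a 1-based residue
    position, or None if the position falls outside the listed ranges."""
    for label, start, end in DOMAINS:
        if start <= position <= end:
            return label
    return None
```

A call such as `domain_at(60)` looks up which listed segment contains residue 60; positions beyond residue 678 return `None` because the remaining items are not available in the text.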
  • a Cas12L protein (of the subject compositions and/or methods) includes an amino acid sequence having 70% or more sequence identity (e.g., 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the RuvC domain (which includes the RuvC-I, RuvC-II, and RuvC-III domains) of any one of the Cas12L amino acid sequences depicted in FIG. 5A-5M.
  • a Cas12L protein (of the subject compositions and/or methods) includes the RuvC domain (which includes the RuvC-I, RuvC-II, and RuvC-III domains) of any one of the Cas12L amino acid sequences depicted in FIG. 5A-5M.
  • a guide RNA that binds a Cas12L polypeptide includes a nucleotide sequence depicted in any one of FIG. 5A-5M (where “T” is replaced with “U”) (or in some cases the reverse complement of same).
  • the guide RNA comprises the nucleotide sequence (N)nX or the reverse complement of same, where N is any nucleotide, n is an integer from 15 to 30 (e.g., from 15 to 20, from 17 to 25, from 17 to 22, from 18 to 22, from 18 to 20, from 20 to 25, or from 25 to 30), and X is any one of the nucleotide sequences depicted in any one of FIG. 5A-5M (or in some cases the reverse complement of same).
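The (N)nX layout described above can be sketched as a small helper that joins a variable stretch of n nucleotides to a repeat sequence X, converting "T" to "U" and enforcing the 15 to 30 range. The function names and the `repeat_first` switch (which gives the alternative X(N)n arrangement) are illustrative assumptions, not part of the disclosure.

```python
import re

_RNA_ONLY = re.compile(r"[ACGU]+")


def to_rna(seq: str) -> str:
    """Upper-case a sequence and replace T with U."""
    return seq.upper().replace("T", "U")


def build_guide(variable_region: str, repeat: str, *,
                repeat_first: bool = False,
                min_n: int = 15, max_n: int = 30) -> str:
    """Assemble a guide as (N)nX, or X(N)n when repeat_first is True.

    variable_region is the (N)n stretch; repeat is the sequence X.
    """
    n_stretch = to_rna(variable_region)
    x = to_rna(repeat)
    for name, seq in (("variable_region", n_stretch), ("repeat", x)):
        if not _RNA_ONLY.fullmatch(seq):
            raise ValueError(f"{name} must contain only A, C, G, U/T")
    if not min_n <= len(n_stretch) <= max_n:
        raise ValueError(f"n must be between {min_n} and {max_n}")
    return x + n_stretch if repeat_first else n_stretch + x
```

Narrower ranges from the text (e.g., 17 to 22) can be enforced by passing `min_n=17, max_n=22`.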
  • a guide RNA that binds a Cas12L polypeptide includes a nucleotide sequence (a repeat sequence; or protein-binding sequence) of the following consensus sequence: WAUUGUUGUARMWNYYWUUUURUAWGGWKURAACAAC (SEQ ID NO:69), where W is A or U; R is G or A; M is A or C; N is A, G, C, or U; Y is U or C; and K is G or U.
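One way to work with the consensus above is to translate its degenerate codes into a regular expression. The sketch below uses only the code definitions given in the text (W, R, M, N, Y, K); the helper names are hypothetical, and a full-length match is required, so a repeat that lacks the first consensus position will not match as written.

```python
import re

# Degenerate codes exactly as defined in the text, in RNA letters.
DEGENERATE_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "W": "[AU]", "R": "[GA]", "M": "[AC]",
    "N": "[AGCU]", "Y": "[UC]", "K": "[GU]",
}

CONSENSUS_SEQ_ID_69 = "WAUUGUUGUARMWNYYWUUUURUAWGGWKURAACAAC"


def consensus_to_regex(consensus: str) -> "re.Pattern[str]":
    """Compile a degenerate consensus into a position-by-position regex."""
    return re.compile("".join(DEGENERATE_RNA[c] for c in consensus.upper()))


def matches_consensus(seq: str, consensus: str = CONSENSUS_SEQ_ID_69) -> bool:
    """True when seq (DNA or RNA letters) matches the consensus end to end."""
    rna = seq.upper().replace("T", "U")
    return consensus_to_regex(consensus).fullmatch(rna) is not None
```

Under these definitions, the 37-nucleotide repeats of SEQ ID NO:72 and SEQ ID NO:79 match the consensus position by position, while the 36-nucleotide repeats would need the first consensus position to be treated as optional.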
  • a guide RNA can comprise a protein-binding segment comprising the nucleotide sequence: AUUGUUGAAAUAGUACUUUUAUAGUCUAUAUACAAC (SEQ ID NO:70).
  • a guide RNA that binds a CasL polypeptide can comprise a protein-binding segment comprising the nucleotide sequence: AUUGUUGUAACAUCUAUUUUGUAAGGUGUAAACAAC (SEQ ID NO:71).
  • a guide RNA that binds a CasL polypeptide can comprise a protein-binding segment comprising the nucleotide sequence: UAUUGUUGUAACUCUUAUUUUGUAUGGAGUAAACAAC (SEQ ID NO:72).
  • a guide RNA that binds a CasL polypeptide can comprise a protein-binding segment comprising the nucleotide sequence:
  • a guide RNA that binds a CasL polypeptide can comprise a protein-binding segment comprising the nucleotide sequence: AUUGUUGUAACUCUUAUUUUGUAUGGAGUAAACAAC (SEQ ID NO:74).
  • a guide RNA that binds a CasL polypeptide can comprise a protein-binding segment comprising the nucleotide sequence: AUUGUUGUAACUUUUAUUUUGUAUGGAGUAAACAAC (SEQ ID NO:75).
  • a guide RNA that binds a CasL polypeptide can comprise a protein-binding segment comprising the nucleotide sequence:
  • a Cas12L polypeptide of the present disclosure can form a complex (a ribonucleoprotein (RNP) complex) with a guide RNA comprising a protein-binding segment described herein.
  • a guide RNA that binds a CasL polypeptide can comprise a protein-binding segment comprising the nucleotide sequence: AAUGUUGUAGAUGCCUUUUUAUAAGGAUUAAACAAC (SEQ ID NO:77).
  • a guide RNA that binds a CasL polypeptide can comprise a protein-binding segment comprising the nucleotide sequence: AAUGUUGUAGAUACCUUUUUGUAAGGAUUGAACAAC (SEQ ID NO:78).
  • a guide RNA that binds a CasL polypeptide can comprise a protein-binding segment comprising the nucleotide sequence: UAUUGUUGUAGAUACCUUUUUGUAAGGAUUAAACAAC (SEQ ID NO:79).
  • a guide RNA that binds a CasL polypeptide can comprise a protein-binding segment comprising the nucleotide sequence: AUUGUUGUAGAUACCUUUUUGUAAGGAUUGAACAAC (SEQ ID NO: 80).
  • a guide RNA that binds a CasL polypeptide can comprise a protein-binding segment comprising the nucleotide sequence: AUUGUUGUAAUACUAUUUUUGUAAAGUAUAAACAAC (SEQ ID NO:81).
  • a guide RNA that binds a CasL polypeptide can comprise a protein-binding segment comprising the nucleotide sequence: AUUGUUGUAAUACACUUUUUAUAAGGUAUGAACAAC (SEQ ID NO:82).
  • the repeat region of a CasLambda guide RNA shares conserved secondary structures across homologs.
  • the repeat region can include palindromic regions that can form stem and stem-loop structures.
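A crude way to look for such palindromic regions is to test whether the 3' end of a repeat is the reverse complement of a segment near its 5' end. The sketch below considers Watson-Crick pairs only (no G-U wobble) and is not a secondary-structure prediction; the function names, the 12-nucleotide search window, and the 10-nucleotide cap are arbitrary assumptions.

```python
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def revcomp_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence (Watson-Crick pairs only)."""
    return seq.upper().translate(_RNA_COMPLEMENT)[::-1]


def terminal_stem_length(repeat: str, max_len: int = 10,
                         search_window: int = 12) -> int:
    """Length of the longest 3'-terminal segment whose reverse complement
    occurs within the first `search_window` nucleotides of the repeat."""
    repeat = repeat.upper()
    head = repeat[:search_window]
    best = 0
    for k in range(1, max_len + 1):
        if k > len(repeat) - search_window:
            break  # keep the 3' segment from overlapping the 5' window
        if revcomp_rna(repeat[-k:]) in head:
            best = k
        else:
            break
    return best


# Repeat of SEQ ID NO:74 as given in the text.
REPEAT_SEQ_ID_74 = "AUUGUUGUAACUCUUAUUUUGUAUGGAGUAAACAAC"
```

For the repeat of SEQ ID NO:74, the 3'-terminal ACAAC is the reverse complement of GUUGU near the 5' end, which is the kind of pairing that could contribute to a stem.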
  • a guide RNA that binds a Cas12L polypeptide includes a nucleotide sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the nucleotide sequence depicted in any one of FIG. 5A-5M (or in some cases the reverse complement of same).
  • the guide RNA comprises the nucleotide sequence (N)nX or the reverse complement of same, where N is any nucleotide, n is an integer from 15 to 30 (e.g., from 15 to 20, from 17 to 25, from 17 to 22, from 18 to 22, from 18 to 20, from 20 to 25, or from 25 to 30), and X is a nucleotide sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the nucleotide sequence depicted in any one of FIG. 5A-5M.
  • a guide RNA that binds a Cas12L polypeptide includes a nucleotide sequence having 85% or more sequence identity (e.g., 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the nucleotide sequence depicted in any one of FIG. 5A-5M (or in some cases the reverse complement of same).
  • the guide RNA comprises the nucleotide sequence (N)nX or the reverse complement of same, where N is any nucleotide, n is an integer from 15 to 30 (e.g., from 15 to 20, from 17 to 25, from 17 to 22, from 18 to 22, from 18 to 20, from 20 to 25, or from 25 to 30), and X is a nucleotide sequence having 85% or more sequence identity (e.g., 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with nucleotide sequence depicted in any one of FIG. 5A-5M.
  • a guide RNA that binds a Cas12L polypeptide includes a nucleotide sequence depicted in any one of FIG. 5A-5M (or in some cases the reverse complement of same).
  • the guide RNA comprises the nucleotide sequence X(N)n, where N is any nucleotide, n is an integer from 15 to 30 (e.g., from 15 to 20, from 17 to 25, from 17 to 22, from 18 to 22, from 18 to 20, from 20 to 25, or from 25 to 30), and X is the nucleotide sequence depicted in any one of FIG. 5A-5M (or in some cases the reverse complement of same).
  • a guide RNA that binds a Cas12L polypeptide includes a nucleotide sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the nucleotide sequence depicted in any one of FIG. 5A-5M (or in some cases the reverse complement of same).
  • the guide RNA comprises the nucleotide sequence X(N)n, where N is any nucleotide, n is an integer from 15 to 30 (e.g., from 15 to 20, from 17 to 25, from 17 to 22, from 18 to 22, from 18 to 20, from 20 to 25, or from 25 to 30), and X is a nucleotide sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the nucleotide sequence depicted in any one of FIG. 5A-5M.
  • a guide RNA that binds a Cas12L polypeptide includes a nucleotide sequence having 85% or more sequence identity (e.g., 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the nucleotide sequence depicted in any one of FIG. 5A-5M (or in some cases the reverse complement of same).
  • the guide RNA comprises the nucleotide sequence X(N)n, where N is any nucleotide, n is an integer from 15 to 30 (e.g., from 15 to 20, from 17 to 25, from 17 to 22, from 18 to 22, from 18 to 20, from 20 to 25, or from 25 to 30), and X is a nucleotide sequence having 85% or more sequence identity (e.g., 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the nucleotide sequence depicted in any one of FIG. 5A-5M.
  • FIG. 5A
  • a Cas12L protein (of the subject compositions and/or methods) includes an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG.
  • a Cas12L protein includes an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5A.
  • a Cas12L protein includes an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG.
  • a Cas12L protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5A.
  • a Cas12L protein includes an amino acid sequence having the Cas12L amino acid sequence depicted in FIG. 5A.
  • a Cas12L protein includes an amino acid sequence having the Cas12L protein sequence depicted in FIG. 5A, with the exception that the sequence includes an amino acid substitution (e.g., 1, 2, or 3 amino acid substitutions) that reduces the naturally occurring catalytic activity of the protein.
  • the Cas12L polypeptide has a length of from 700 amino acids (aa) to 750 aa (e.g., from 700 aa to 725 aa, from 725 aa to 735 aa, from 735 aa to 740 aa, or from 740 aa to 750 aa). In some cases, the Cas12L polypeptide has a length of 735 amino acids.
  • a guide RNA that binds a Cas12L polypeptide includes the following nucleotide sequence: ATTGTTGTAGATACCTTTTTATAAGGTTTGAACAAC (SEQ ID NO:83) or the reverse complement of same.
  • the guide RNA comprises the nucleotide sequence (N)nATTGTTGTAGATACCTTTTTATAAGGTTTGAACAAC (SEQ ID NO:84) or the reverse complement of same, where N is any nucleotide and n is an integer from 15 to 30 (e.g., from 15 to 20, from 17 to 25, from 17 to 22, from 18 to 22, from 18 to 20, from 20 to 25, or from 25 to 30).
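Since the repeat is recited together with "the reverse complement of same", a short sketch of that operation may help. It treats the sequence as DNA letters, as written in the text; the function and constant names are illustrative only.

```python
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Repeat sequence as written in the text (SEQ ID NO:83), in DNA letters.
REPEAT_SEQ_ID_83 = "ATTGTTGTAGATACCTTTTTATAAGGTTTGAACAAC"


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    dna = dna.upper()
    if set(dna) - set("ACGT"):
        raise ValueError("sequence must contain only A, C, G, T")
    return dna.translate(_DNA_COMPLEMENT)[::-1]


def to_rna(dna: str) -> str:
    """Write a DNA-letter sequence in RNA letters (T replaced with U)."""
    return dna.upper().replace("T", "U")
```

Calling `to_rna(reverse_complement(REPEAT_SEQ_ID_83))` gives the reverse-complement form in RNA letters; applying `reverse_complement` twice returns the original sequence.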
  • a Cas12L protein (of the subject compositions and/or methods) includes an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG.
  • a Cas12L protein includes an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5B.
  • a Cas12L protein includes an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG.
  • a Cas12L protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5B.
  • a Cas12L protein includes an amino acid sequence having the Cas12L amino acid sequence depicted in FIG. 5B.
  • a Cas12L protein includes an amino acid sequence having the Cas12L protein sequence depicted in FIG. 5B, with the exception that the sequence includes an amino acid substitution (e.g., 1, 2, or 3 amino acid substitutions) that reduces the naturally occurring catalytic activity of the protein.
  • the Cas12L polypeptide has a length of from 730 amino acids (aa) to 775 aa (e.g., from 730 aa to 740 aa, from 740 aa to 750 aa, or from 750 aa to 775 aa). In some cases, the Cas12L polypeptide has a length of 746 amino acids.
  • a guide RNA that binds a Cas12L polypeptide includes the following nucleotide sequence: ATTGTTGTAACTCTTATTTTGTATGGAGTAAACAAC (SEQ ID NO:85) or the reverse complement of same.
  • the guide RNA comprises the nucleotide sequence (N)nATTGTTGTAACTCTTATTTTGTATGGAGTAAACAAC (SEQ ID NO:86) or the reverse complement of same, where N is any nucleotide and n is an integer from 15 to 30 (e.g., from 15 to 20, from 17 to 25, from 17 to 22, from 18 to 22, from 18 to 20, from 20 to 25, or from 25 to 30).
  • a Cas12L protein (of the subject compositions and/or methods) includes an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG.
  • a Cas12L protein includes an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5C.
  • a Cas12L protein includes an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG.
  • a Cas12L protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5C.
  • a Cas12L protein includes an amino acid sequence having the Cas12L amino acid sequence depicted in FIG. 5C.
  • a Cas12L protein includes an amino acid sequence having the Cas12L protein sequence depicted in FIG. 5C, with the exception that the sequence includes an amino acid substitution (e.g., 1, 2, or 3 amino acid substitutions) that reduces the naturally occurring catalytic activity of the protein.
  • the Cas12L polypeptide has a length of from 730 amino acids (aa) to 775 aa (e.g., from 730 aa to 740 aa, from 740 aa to 750 aa, or from 750 aa to 775 aa). In some cases, the Cas12L polypeptide has a length of 746 amino acids.
  • a guide RNA that binds a Cas12L polypeptide includes the following nucleotide sequence: ATTGTTGTAACTCTTATTTTGTATGGAGTAAACAAC (SEQ ID NO:85) or the reverse complement of same.
  • the guide RNA comprises the nucleotide sequence (N)nATTGTTGTAACTCTTATTTTGTATGGAGTAAACAAC (SEQ ID NO:86) or the reverse complement of same, where N is any nucleotide and n is an integer from 15 to 30 (e.g., from 15 to 20, from 17 to 25, from 17 to 22, from 18 to 22, from 18 to 20, from 20 to 25, or from 25 to 30).
  • a Cas12L protein (of the subject compositions and/or methods) includes an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG.
  • a Cas12L protein includes an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5D.
  • a Cas12L protein includes an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG.
  • a Cas12L protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5D.
  • a Cas12L protein includes an amino acid sequence having the Cas12L amino acid sequence depicted in FIG. 5D.
  • a Cas12L protein includes an amino acid sequence having the Cas12L protein sequence depicted in FIG. 5D, with the exception that the sequence includes an amino acid substitution (e.g., 1, 2, or 3 amino acid substitutions) that reduces the naturally occurring catalytic activity of the protein.
  • the Cas12L polypeptide has a length of from 800 amino acids (aa) to 875 aa (e.g., from 800 aa to 825 aa, from 825 aa to 850 aa, or from 850 aa to 875 aa). In some cases, the Cas12L polypeptide has a length of 828 amino acids.
  • a guide RNA that binds a Cas12L polypeptide includes the following nucleotide sequence: ACTGTTGGTTATCCTAATTTTATGGGAATACACAAC (SEQ ID NO:87) or the reverse complement of same.
  • the guide RNA comprises the nucleotide sequence (N)nACTGTTGGTTATCCTAATTTTATGGGAATACACAAC (SEQ ID NO:88) or the reverse complement of same, where N is any nucleotide and n is an integer from 15 to 30 (e.g., from 15 to 20, from 17 to 25, from 17 to 22, from 18 to 22, from 18 to 20, from 20 to 25, or from 25 to 30).
  • a Casl2L protein (of the subject compositions and/or methods) includes an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG.
  • a Casl2L protein includes an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG. 5E.
  • a Casl2L protein includes an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG.
  • a Casl2L protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG. 5E.
  • a Casl2L protein includes an amino acid sequence having the Casl2L amino acid sequence depicted in FIG. 5E.
  • a Cas12L protein includes an amino acid sequence having the Cas12L protein sequence depicted in FIG. 5E, with the exception that the sequence includes an amino acid substitution (e.g., 1, 2, or 3 amino acid substitutions) that reduces the naturally occurring catalytic activity of the protein.
  • the Casl2L polypeptide has a length of from 800 amino acids (aa) to 875 aa, e.g., from 800 aa to 825 aa, from 825 aa to 850 aa, or from 850 aa to 875 aa). In some cases, the Casl2L polypeptide has a length of 828 amino acids.
  • a guide RNA that binds a Cas12L polypeptide (e.g., a Cas12L polypeptide comprising an amino acid sequence having 20% or more, 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% amino acid sequence identity to the Cas12L amino acid sequence depicted in FIG. 5E) includes the following nucleotide sequence: ACTGTTGGTTATCCTAATTTTATGGGAATACACAAC (SEQ ID NO: 87) or the reverse complement of same.
  • the guide RNA comprises the nucleotide sequence (N)nACTGTTGGTTATCCTAATTTTATGGGAATACACAAC (SEQ ID NO:88) or the reverse complement of same, where N is any nucleotide and n is an integer from 15 to 30, e.g., from 15 to 20, from 17 to 25, from 17 to 22, from 18 to 22, from 18 to 20, from 20 to 25, or from 25 to 30).
  • a Casl2L protein (of the subject compositions and/or methods) comprises an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG.
  • a Casl2L protein comprises an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG. 5F.
  • a Casl2L protein comprises an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG.
  • a Casl2L protein comprises an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG. 5F.
  • a Casl2L protein comprises the Casl2L amino acid sequence depicted in FIG. 5F.
  • a Casl2L protein comprises an amino acid sequence having the Casl2L protein sequence depicted in FIG. 5F, with the exception that the sequence includes an amino acid substitution (e.g., 1, 2, or 3 amino acid substitutions) that reduces the naturally occurring catalytic activity of the protein.
  • the Cas12L polypeptide has a length of from 800 amino acids (aa) to 875 aa (e.g., from 800 aa to 825 aa, from 825 aa to 850 aa, or from 850 aa to 875 aa). In some cases, the Cas12L polypeptide has a length of 828 amino acids.
  • a guide RNA that binds a Casl2L polypeptide includes the following nucleotide sequence: ATTGTTGGTTATCCTAATTTTATAGGAATACACAAC (SEQ ID NO: 89) or the reverse complement of same.
  • the guide RNA comprises the nucleotide sequence (N)nATTGTTGGTTATCCTAATTTTATAGGAATACACAAC (SEQ ID NO:90) or the reverse complement of same, where N is any nucleotide and n is an integer from 15 to 30, e.g., from 15 to 20, from 17 to 25, from 17 to 22, from 18 to 22, from 18 to 20, from 20 to 25, or from 25 to 30).
  • a Casl2L protein (of the subject compositions and/or methods) includes an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG.
  • a Casl2L protein includes an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG. 5G.
  • a Casl2L protein includes an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG.
  • a Casl2L protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG. 5G.
  • a Casl2L protein includes an amino acid sequence having the Casl2L amino acid sequence depicted in FIG. 5G.
  • a Cas12L protein includes an amino acid sequence having the Casl2L protein sequence depicted in FIG. 5G, with the exception that the sequence includes an amino acid substitution (e.g., 1, 2, or 3 amino acid substitutions) that reduces the naturally occurring catalytic activity of the protein.
  • the Casl2L polypeptide has a length of from 800 amino acids (aa) to 875 aa, e.g., from 800 aa to 825 aa, from 825 aa to 850 aa, or from 850 aa to 875 aa). In some cases, the Casl2L polypeptide has a length of 827 amino acids.
  • a guide RNA that binds a Casl2L polypeptide includes the following nucleotide sequence: ACTGTTGGAGTACTTAATTTTATGGGTATTCACAAC (SEQ ID NO:91) or the reverse complement of same.
  • the guide RNA comprises the nucleotide sequence (N)nACTGTTGGAGTACTTAATTTTATGGGTATTCACAAC (SEQ ID NO:92) or the reverse complement of same, where N is any nucleotide and n is an integer from 15 to 30, e.g., from 15 to 20, from 17 to 25, from 17 to 22, from 18 to 22, from 18 to 20, from 20 to 25, or from 25 to 30).
  • a Casl2L protein (of the subject compositions and/or methods) includes an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG.
  • a Casl2L protein includes an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG. 5H.
  • a Casl2L protein includes an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG.
  • a Casl2L protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG. 5H.
  • a Casl2L protein includes an amino acid sequence having the Casl2L amino acid sequence depicted in FIG. 5H.
  • a Casl2L protein includes an amino acid sequence having the Casl2L protein sequence depicted in FIG. 5H, with the exception that the sequence includes an amino acid substitution (e.g., 1, 2, or 3 amino acid substitutions) that reduces the naturally occurring catalytic activity of the protein.
  • the Cas12L polypeptide has a length of from 700 amino acids (aa) to 750 aa, e.g., from 700 aa to 725 aa, from 725 aa to 735 aa, from 735 aa to 740 aa, or from 740 aa to 750 aa). In some cases, the Casl2L polypeptide has a length of 738 amino acids.
  • a guide RNA that binds a Casl2L polypeptide includes the following nucleotide sequence: ACTGTTGGAGTACTTAATTTTATGGGTATTCACAAC (SEQ ID NO:91) or the reverse complement of same.
  • the guide RNA comprises the nucleotide sequence (N)nACTGTTGGAGTACTTAATTTTATGGGTATTCACAAC (SEQ ID NO:92) or the reverse complement of same, where N is any nucleotide and n is an integer from 15 to 30, e.g., from 15 to 20, from 17 to 25, from 17 to 22, from 18 to 22, from 18 to 20, from 20 to 25, or from 25 to 30).
  • FIG. 5I
  • a Casl2L protein (of the subject compositions and/or methods) includes an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG.
  • a Cas12L protein includes an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5I.
  • a Cas12L protein includes an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5I.
  • a Cas12L protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5I.
  • a Cas12L protein includes an amino acid sequence having the Cas12L amino acid sequence depicted in FIG. 5I.
  • a Cas12L protein includes an amino acid sequence having the Cas12L protein sequence depicted in FIG. 5I, with the exception that the sequence includes an amino acid substitution (e.g., 1, 2, or 3 amino acid substitutions) that reduces the naturally occurring catalytic activity of the protein.
  • the Casl2L polypeptide has a length of from 740 amino acids (aa) to 800 aa, e.g., from 740 aa to 750 aa, from 750 aa to 775 aa, or from 775 aa to 800 aa). In some cases, the Casl2L polypeptide has a length of 767 amino acids.
  • a guide RNA that binds a Casl2L polypeptide includes the following nucleotide sequence: ACTGTTGGAGTACTTAATTTTATGGGTATTCACAAC (SEQ ID NO:91) or the reverse complement of same.
  • the guide RNA comprises the nucleotide sequence (N)nACTGTTGGAGTACTTAATTTTATGGGTATTCACAAC (SEQ ID NO:92) or the reverse complement of same, where N is any nucleotide and n is an integer from 15 to 30 (e.g., from 15 to 20, from 17 to 25, from 17 to 22, from 18 to 22, from 18 to 20, from 20 to 25, or from 25 to 30).
  • a Casl2L protein (of the subject compositions and/or methods) includes an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG.
  • a Casl2L protein includes an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG. 5J.
  • a Casl2L protein includes an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG.
  • a Cas12L protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5J.
  • a Casl2L protein includes an amino acid sequence having the Casl2L amino acid sequence depicted in FIG. 5J.
  • a Casl2L protein includes an amino acid sequence having the Casl2L protein sequence depicted in FIG. 5J, with the exception that the sequence includes an amino acid substitution (e.g., 1, 2, or 3 amino acid substitutions) that reduces the naturally occurring catalytic activity of the protein.
  • the Casl2L polypeptide has a length of from 740 amino acids (aa) to 800 aa, e.g., from 740 aa to 750 aa, from 750 aa to 775 aa, or from 775 aa to 800 aa). In some cases, the Casl2L polypeptide has a length of 767 amino acids.
  • a guide RNA that binds a Casl2L polypeptide includes the following nucleotide sequence: ACTGTTGGAGTACTTAATTTTATGGGTATTCACAAC (SEQ ID NO:91) or the reverse complement of same.
  • the guide RNA comprises the nucleotide sequence (N)nACTGTTGGAGTACTTAATTTTATGGGTATTCACAAC (SEQ ID NO:92) or the reverse complement of same, where N is any nucleotide and n is an integer from 15 to 30, e.g., from 15 to 20, from 17 to 25, from 17 to 22, from 18 to 22, from 18 to 20, from 20 to 25, or from 25 to 30).
  • a Casl2L protein (of the subject compositions and/or methods) comprises an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG.
  • a Casl2L protein comprises an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG. 5K.
  • a Casl2L protein comprises an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG.
  • a Casl2L protein comprises an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG. 5K.
  • a Cas12L protein comprises an amino acid sequence having the Cas12L amino acid sequence depicted in FIG. 5K.
  • a Casl2L protein comprises an amino acid sequence having the Casl2L protein sequence depicted in FIG. 5K, with the exception that the sequence includes an amino acid substitution (e.g., 1, 2, or 3 amino acid substitutions) that reduces the naturally occurring catalytic activity of the protein.
  • the Casl2L polypeptide has a length of from 600 amino acids (aa) to 700 aa, e.g., from 600 aa to 625 aa, from 625 aa to 650 aa, from 650 aa to 675 aa, or from 675 aa to 700 aa). In some cases, the Casl2L polypeptide has a length of 638 amino acids.
  • a guide RNA that binds a Cas12L polypeptide (e.g., a Cas12L polypeptide comprising an amino acid sequence having 20% or more, 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% amino acid sequence identity to the Cas12L amino acid sequence depicted in FIG. 5K) includes the following nucleotide sequence: CTTGTTGTACATACTCTTTTATAGGTATTAAACAAC (SEQ ID NO:93) or the reverse complement of same.
  • the guide RNA comprises the nucleotide sequence (N)nCTTGTTGTACATACTCTTTTATAGGTATTAAACAAC (SEQ ID NO:94) or the reverse complement of same, where N is any nucleotide and n is an integer from 15 to 30, e.g., from 15 to 20, from 17 to 25, from 17 to 22, from 18 to 22, from 18 to 20, from 20 to 25, or from 25 to 30).
  • FIG. 5L
  • a Casl2L protein (of the subject compositions and/or methods) includes a contiguous stretch of about 92 amino acids having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG.
  • a Casl2L protein includes a contiguous stretch of about 92 amino acids having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG. 5L.
  • a Casl2L protein includes a contiguous stretch of about 92 amino acids having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG. 5L.
  • a Casl2L protein includes a contiguous stretch of about 92 amino acids having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG. 5L.
  • a Casl2L protein includes a contiguous stretch of about 92 amino acids having the Casl2L amino acid sequence depicted in FIG. 5L. In some cases, a Casl2L protein includes a contiguous stretch of about 92 amino acids having the Casl2L protein sequence depicted in FIG. 5L, with the exception that the sequence includes an amino acid substitution (e.g., 1, 2, or 3 amino acid substitutions) that reduces the naturally occurring catalytic activity of the protein.
  • the Casl2L polypeptide has a length of from 700 amino acids (aa) to 800 aa, e.g., from 700 aa to 725 aa, from 725 aa to 750 aa, from 750 aa to 775 aa, or from 775 aa to 800 aa). In some cases, the Casl2L polypeptide has a length of from 725 amino acids to 775 amino acids. In some cases, the Casl2L polypeptide has a length of 754 amino acids.
  • a guide RNA that binds a Cas12L polypeptide (e.g., a Cas12L polypeptide comprising an amino acid sequence having 20% or more, 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% amino acid sequence identity to the Cas12L amino acid sequence depicted in FIG. 5L) includes the following nucleotide sequence: CTTGTTGTACATACTCTTTTATAGGTATTAAACAAC (SEQ ID NO:93) or the reverse complement of same.
  • the guide RNA comprises the nucleotide sequence (N)nCTTGTTGTACATACTCTTTTATAGGTATTAAACAAC (SEQ ID NO:94) or the reverse complement of same, where N is any nucleotide and n is an integer from 15 to 30, e.g., from 15 to 20, from 17 to 25, from 17 to 22, from 18 to 22, from 18 to 20, from 20 to 25, or from 25 to 30).
  • a Casl2L protein (of the subject compositions and/or methods) includes an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG.
  • a Casl2L protein includes an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG. 5M.
  • a Casl2L protein includes an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG.
  • a Casl2L protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG. 5M.
  • a Casl2L protein includes an amino acid sequence having the Casl2L amino acid sequence depicted in FIG. 5M.
  • a Casl2L protein includes an amino acid sequence having the Casl2L protein sequence depicted in FIG. 5M, with the exception that the sequence includes an amino acid substitution (e.g., 1, 2, or 3 amino acid substitutions) that reduces the naturally occurring catalytic activity of the protein.
  • the Casl2L polypeptide has a length of from 700 amino acids (aa) to 800 aa, e.g., from 700 aa to 725 aa, from 725 aa to 750 aa, from 750 aa to 775 aa, or from 775 aa to 800 aa). In some cases, the Casl2L polypeptide has a length of 746 amino acids.
  • a guide RNA that binds a Casl2L polypeptide includes the following nucleotide sequence: ATTGTTGTAACTCTTATTTTGTATGGAGTAAACAAC (SEQ ID NO:85) or the reverse complement of same.
  • the guide RNA comprises the nucleotide sequence (N)nATTGTTGTAACTCTTATTTTGTATGGAGTAAACAAC (SEQ ID NO: 86) or the reverse complement of same, where N is any nucleotide and n is an integer from 15 to 30, e.g., from 15 to 20, from 17 to 25, from 17 to 22, from 18 to 22, from 18 to 20, from 20 to 25, or from 25 to 30).
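The (N)n-plus-repeat arrangement and the "reverse complement of same" alternative recited above can be illustrated with a minimal Python sketch. It uses the repeat given as SEQ ID NO:85 and keeps the DNA alphabet in which the sequences are written here. The names `build_guide`, `reverse_complement`, and `spacer` (for the (N)n segment) are hypothetical labels chosen for illustration and do not appear in the disclosure.

```python
# Illustrative sketch only; names are assumptions, not terms from the disclosure.
REPEAT = "ATTGTTGTAACTCTTATTTTGTATGGAGTAAACAAC"  # SEQ ID NO:85 as written above

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA-alphabet sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def build_guide(spacer: str, repeat: str = REPEAT) -> str:
    """Join an (N)n segment to the repeat, in the order written in the text."""
    spacer = spacer.upper()
    if not 15 <= len(spacer) <= 30:
        raise ValueError("n must be an integer from 15 to 30")
    if set(spacer) - set("ACGTN"):
        raise ValueError("spacer may contain only A, C, G, T, or N")
    return spacer + repeat


# A 20-nucleotide (N)n segment followed by the repeat, and its reverse complement.
guide = build_guide("N" * 20)
guide_rc = reverse_complement(guide)
```

The length check mirrors the recited range for n (15 to 30); the narrower sub-ranges listed in the text could be enforced the same way.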
  • a variant Casl2L protein has an amino acid sequence that is different by at least one amino acid (e.g., has a deletion, insertion, substitution, fusion) when compared to the amino acid sequence of the corresponding wild type Casl2L protein, e.g., when compared to the Casl2L amino acid sequence depicted in any one of FIG. 5A-5M.
  • a Casl2L variant comprises from 1 amino acid substitution to 10 amino acid substitutions compared to the Casl2L amino acid sequence depicted in any one of FIG. 5A-5M.
  • a Casl2L variant comprises from 1 amino acid substitution to 10 amino acid substitutions in the RuvC domain, compared to the Casl2L amino acid sequence depicted in any one of FIG. 5A-5M.
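As a rough illustration of how a substitution count and a percent sequence identity relate, the following Python sketch compares two sequences of equal length position by position. It is an assumption-laden simplification: the text does not specify an alignment method, and a real comparison would use an alignment that also handles insertions and deletions. The sequences shown are toy strings, not sequences from FIG. 5A-5M, and the function names are hypothetical.

```python
def count_substitutions(reference: str, variant: str) -> int:
    """Count positions that differ between two equal-length sequences."""
    if len(reference) != len(variant):
        raise ValueError("substitution-only comparison needs equal lengths")
    return sum(1 for a, b in zip(reference, variant) if a != b)


def percent_identity(reference: str, variant: str) -> float:
    """Percent of positions that are identical (no gaps considered)."""
    if not reference:
        raise ValueError("reference sequence is empty")
    matches = len(reference) - count_substitutions(reference, variant)
    return 100.0 * matches / len(reference)


# Toy 10-residue example (hypothetical sequences, for illustration only).
toy_reference = "MKDLEAVRGD"
toy_variant = "MKALEAVRGD"  # one substitution at position 3 (D to A)
n_subs = count_substitutions(toy_reference, toy_variant)
identity = percent_identity(toy_reference, toy_variant)
```

Under this simplification, a variant with 1 to 10 substitutions relative to a reference of about 800 amino acids would retain well over 98% identity.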
  • the Casl2L protein is a variant Casl2L protein, e.g., mutated relative to the naturally occurring catalytically active sequence, and exhibits reduced cleavage activity (e.g., exhibits 90%, or less, 80% or less, 70% or less, 60% or less, 50% or less, 40% or less, or 30% or less cleavage activity) when compared to the corresponding naturally occurring sequence.
  • such a variant Cas12L protein is a catalytically ‘dead’ protein (has substantially no cleavage activity) and can be referred to as a ‘dCas12L’.
  • the variant Cas12L protein is a nickase (cleaves only one strand of a double stranded target nucleic acid, e.g., a double stranded target DNA).
  • a Cas12L protein (in some cases a Cas12L protein with wild type cleavage activity and in some cases a variant Cas12L with reduced cleavage activity, e.g., a dCas12L or a nickase Cas12L) is fused to a heterologous polypeptide that has an activity of interest (e.g., a catalytic activity of interest) to form a fusion protein (a fusion Cas12L protein).
  • a variant Casl2L polypeptide comprises a substitution of one or more of D336, E523, and D676 based on the amino acid numbering of the amino acid sequence depicted in FIG. 5A, or corresponding amino acids of a Casl2L polypeptide depicted in any one of FIG. 5B-5M, where the variant Casl2L polypeptide exhibits reduced catalytic activity compared to a control Casl2L polypeptide that does not include the substitutions.
  • “corresponding amino acids” are shown in bold and underlining in FIG. 5B and FIG. 5C.
  • a variant Cas12L polypeptide comprises a D336A substitution, i.e., D336, based on the amino acid numbering of the amino acid sequence depicted in FIG. 5A, or a corresponding amino acid of a Cas12L polypeptide depicted in any one of FIG. 5B-5M, is replaced with an Ala.
  • a variant Casl2L polypeptide comprises an E523A substitution, i.e., E523, based on the amino acid numbering of the amino acid sequence depicted in FIG. 5A, or a corresponding amino acid of a Casl2L polypeptide depicted in any one of FIG.
  • a variant Cas12L polypeptide comprises a D676A substitution, i.e., D676, based on the amino acid numbering of the amino acid sequence depicted in FIG. 5A, or a corresponding amino acid of a Cas12L polypeptide depicted in any one of FIG. 5B-5M, is replaced with an Ala.
  • a variant Cas12L polypeptide comprises D336A, E523A, and D676A substitutions, i.e., each of D336, E523, and D676, based on the amino acid numbering of the amino acid sequence depicted in FIG.
  • a variant Casl2L polypeptide comprises a substitution of the Asn at position 102 (N102) of the CasL polypeptides depicted in FIG. 5A-5C, or corresponding positions in the CasL polypeptide of FIG. 5D-5M.
  • Substitution of N102 with another amino acid can modify the PAM requirement. For example, substitution of N102 with Q, S, E, T, or D could expand the PAM from R(-1) to C, T, or N.
  • a variant Cas12L polypeptide comprises a substitution of the Asn at position 102 (N102) of the CasL polypeptides depicted in FIG. 5A-5C, or corresponding positions in the CasL polypeptide of FIG. 5D-5M, with Gln.
  • a variant Casl2L polypeptide comprises a substitution of the Asn at position 102 (N102) of the CasL polypeptides depicted in FIG. 5A-5C, or corresponding positions in the CasL polypeptide of FIG. 5D-5M, with Ser.
  • a variant Casl2L polypeptide comprises a substitution of the Asn at position 102 (N102) of the CasL polypeptides depicted in FIG. 5A-5C, or corresponding positions in the CasL polypeptide of FIG. 5D-5M, with Glu.
  • a variant Casl2L polypeptide comprises a substitution of the Asn at position 102 (N102) of the CasL polypeptides depicted in FIG. 5A-5C, or corresponding positions in the CasL polypeptide of FIG. 5D-5M, with Asp.
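The substitution notation used above (e.g., D336A, or N102 replaced with Gln, which would be written N102Q) names the original residue, its 1-based position, and the replacement residue. A minimal Python sketch of applying such substitutions to a sequence string follows. The function name `apply_substitutions` is hypothetical, and the toy sequence stands in for the FIG. 5A sequence, which is not reproduced in this section; positions in the toy example are therefore not the positions recited in the text.

```python
import re

# One-letter original residue, 1-based position, one-letter replacement.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(seq: str, substitutions: list[str]) -> str:
    """Apply substitutions such as 'D3A' to seq, checking the original residue."""
    residues = list(seq)
    for sub in substitutions:
        match = _SUBSTITUTION.match(sub)
        if match is None:
            raise ValueError(f"unrecognized substitution: {sub!r}")
        original, position, replacement = (
            match.group(1), int(match.group(2)), match.group(3))
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if seq[position - 1] != original:
            raise ValueError(
                f"expected {original} at position {position}, "
                f"found {seq[position - 1]}")
        residues[position - 1] = replacement
    return "".join(residues)


# Toy sequence (hypothetical): Asn at position 2, Asp at 3, Glu at 5.
toy = "MNDKE"
catalytic_variant = apply_substitutions(toy, ["D3A", "E5A"])
pam_variant = apply_substitutions(toy, ["N2Q"])
```

Checking the original residue before replacing it guards against numbering mismatches, which matters here because the recited positions are defined relative to FIG. 5A and must be mapped to corresponding positions in the other sequences.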
  • a Cas12L protein (in some cases a Cas12L protein with wild type cleavage activity and in some cases a variant Cas12L with reduced cleavage activity, e.g., a dCas12L or a nickase Cas12L) is fused to a heterologous polypeptide that has an activity of interest (e.g., a catalytic activity of interest) to form a fusion protein.
  • a heterologous polypeptide to which a Casl2L protein can be fused is referred to herein as a “fusion partner.”
  • the fusion partner can modulate transcription (e.g., inhibit transcription, increase transcription) of a target DNA.
  • the fusion partner is a protein (or a domain from a protein) that inhibits transcription (e.g., a transcriptional repressor, a protein that functions via recruitment of transcription inhibitor proteins, modification of target DNA such as methylation, recruitment of a DNA modifier, modulation of histones associated with target DNA, recruitment of a histone modifier such as those that modify acetylation and/or methylation of histones, and the like).
  • the fusion partner is a protein (or a domain from a protein) that increases transcription (e.g., a transcription activator, a protein that acts via recruitment of transcription activator proteins, modification of target DNA such as demethylation, recruitment of a DNA modifier, modulation of histones associated with target DNA, recruitment of a histone modifier such as those that modify acetylation and/or methylation of histones, and the like).
  • the fusion partner is a reverse transcriptase.
  • the fusion partner is a base editor.
  • the fusion partner (heterologous polypeptide) is a deaminase.
  • a fusion Casl2L protein includes a heterologous polypeptide that has enzymatic activity that modifies a target nucleic acid (e.g., nuclease activity, methyltransferase activity, demethylase activity, DNA repair activity, DNA damage activity, deamination activity, dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer forming activity, integrase activity, transposase activity, recombinase activity, polymerase activity, ligase activity, helicase activity, photolyase activity or glycosylase activity).
  • a fusion Casl2L protein includes a heterologous polypeptide that has enzymatic activity that modifies a polypeptide (e.g., a histone) associated with a target nucleic acid (e.g., methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity or demyristoylation activity).
  • proteins (or fragments thereof) that can be used to increase transcription include but are not limited to: transcriptional activators such as VP16, VP64, VP48, VP160, p65 subdomain (e.g., from NFkB), and activation domain of EDLL and/or TAL activation domain (e.g., for activity in plants); histone lysine methyltransferases such as SET1A, SET1B, MLL1 to 5, ASH1, SYMD2, NSD1, and the like; histone lysine demethylases such as JHDM2a/b, UTX, JMJD3, and the like; histone acetyltransferases such as GCN5, PCAF, CBP, p300, TAF1, TIP60/PLIP, MOZ/MYST3, MORF/MYST4, SRC1, ACTR, P160, CLOCK, and the like; and DNA demethylases such as Ten-Eleven Translocation (TET) di
  • KOX1 repression domain; the Mad mSIN3 interaction domain (SID); the ERF repressor domain (ERD), the SRDX repression domain (e.g., for repression in plants), and the like; histone lysine methyltransferases such as Pr-SET7/8, SUV4-20H1, RIZ1, and the like; histone lysine demethylases such as JMJD2A/JHDM3A, JMJD2B, JMJD2C/GASC1, JMJD2D, JARID1A/RBP2, JARID1B/PLU-1, JARID1C/SMCX, JARID1D/SMCY, and the like; histone lysine deacetylases such as HDAC1, HDAC2, HDAC3, HDAC8, HDAC4, HDAC5, HDAC7, HDAC9, SIRT1, SIRT2, HDAC11, and the like; DNA methylases such as HhaI DNA m5c-methyltransferase (M
  • the fusion partner has enzymatic activity that modifies the target nucleic acid (e.g., ssRNA, dsRNA, ssDNA, dsDNA).
  • examples of enzymatic activity that can be provided by the fusion partner include but are not limited to: nuclease activity such as that provided by a restriction enzyme (e.g., FokI nuclease), methyltransferase activity such as that provided by a methyltransferase (e.g., HhaI DNA m5c-methyltransferase (M.HhaI), DNA methyltransferase 1 (DNMT1), DNA methyltransferase 3a (DNMT3a), DNA methyltransferase 3b (DNMT3b), MET1, DRM3 (plants), ZMET2, CMT1, CMT2 (plants), and the like); demethylase activity such as that provided by a demethylase (e.g., Ten-Eleven
  • the fusion partner has enzymatic activity that modifies a protein associated with the target nucleic acid (e.g., ssRNA, dsRNA, ssDNA, dsDNA) (e.g., a histone, an RNA binding protein, a DNA binding protein, and the like).
  • enzymatic activity that modifies a protein associated with a target nucleic acid
  • methyltransferase activity such as that provided by a histone methyltransferase (HMT) (e.g., suppressor of variegation 3-9 homolog 1 (SUV39H1, also known as KMT1A), euchromatic histone lysine methyltransferase 2 (G9A, also known as KMT1C and EHMT2), SUV39H2, ESET/SETDB1, and the like, SET1A, SET1B, MLL1 to 5, ASH1, SMYD2, NSD1, DOT1L, Pr-SET7/8, SUV4-20H1, EZH2, RIZ1), demethylase activity such as that provided by a histone demethylase (e.g., Lysine Demethylase 1A (KDM1A also known as LSD1),
  • Suitable fusion partners include a dihydrofolate reductase (DHFR) destabilization domain (e.g., to generate a chemically controllable fusion Cas12L protein), and a chloroplast transit peptide.
  • DHFR dihydrofolate reductase
  • Suitable chloroplast transit peptides include, but are not limited to:
  • MAALTTSQLATSATGFGIADRSAPSSLLRHGFQGLKPRSPAGGDATSLSVTTSARATPKQQRSVQRGSRRFPSVVVC (SEQ ID NO: 102);
  • MESLAATSVFAPSRVAVPAARALVRAGTVVPTRRTSSTSGTSGVKCSAAVTPQASPVISRSAAAA (SEQ ID NO: 104);
  • MGAAATSMQSLKFSNRLVPPSRRLSPVPNNVTCNNLPKSAAPVRTVKCCASSWNSTINGAAATTNGASAASS (SEQ ID NO: 105);
  • a Cas12L fusion polypeptide of the present disclosure comprises: a) a Cas12L polypeptide of the present disclosure; and b) a chloroplast transit peptide.
  • a Cas12L polypeptide/guide RNA complex can be targeted to the chloroplast. In some cases, this targeting may be achieved by the presence of an N-terminal extension, called a chloroplast transit peptide (CTP) or plastid transit peptide.
  • CTP chloroplast transit peptide
  • Chromosomal transgenes from bacterial sources must have a sequence encoding a CTP sequence fused to a sequence encoding an expressed polypeptide if the expressed polypeptide is to be compartmentalized in the plant plastid (e.g., chloroplast). Accordingly, localization of an exogenous polypeptide to a chloroplast is often accomplished by means of operably linking a polynucleotide sequence encoding a CTP sequence to the 5' region of a polynucleotide encoding the exogenous polypeptide. The CTP is removed in a processing step during translocation into the plastid.
  • Processing efficiency may, however, be affected by the amino acid sequence of the CTP and nearby sequences at the amino terminus (NH2 terminus) of the peptide.
  • Other options for targeting to the chloroplast which have been described are the maize cab-m7 signal sequence (U.S. Pat. No. 7,022,896, WO 97/41228), a pea glutathione reductase signal sequence (WO 97/41228), and the CTP described in US2009029861.
  • a Cas12L fusion polypeptide of the present disclosure can comprise: a) a Cas12L polypeptide of the present disclosure; and b) an endosomal escape peptide.
  • an endosomal escape polypeptide comprises the amino acid sequence GLFXALLXLLXSLWXLLLXA (SEQ ID NO: 106), wherein each X is independently selected from lysine, histidine, and arginine.
  • an endosomal escape polypeptide comprises the amino acid sequence GLFHALLHLLHSLWHLLLHA (SEQ ID NO: 107).
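
The degenerate endosomal escape sequence above (SEQ ID NO: 106) can be read as a template in which each X is one of K, H, or R. The following is a minimal sketch, not part of the disclosure, of how such a template might be checked against a candidate sequence or expanded into its concrete variants; the function names are illustrative assumptions.

```python
import re
from itertools import product

# Template and allowed residues as recited for SEQ ID NO: 106.
TEMPLATE = "GLFXALLXLLXSLWXLLLXA"
ALLOWED = "KHR"  # lysine, histidine, arginine


def template_to_regex(template=TEMPLATE, allowed=ALLOWED):
    """Turn each X into a character class; all other residues are literal."""
    pattern = "".join(f"[{allowed}]" if aa == "X" else aa for aa in template)
    return re.compile(pattern)


def matches_template(seq, template=TEMPLATE, allowed=ALLOWED):
    """True if seq is a full-length instance of the template."""
    return template_to_regex(template, allowed).fullmatch(seq) is not None


def enumerate_variants(template=TEMPLATE, allowed=ALLOWED):
    """Yield every concrete sequence obtained by filling each X independently."""
    n_x = template.count("X")
    for combo in product(allowed, repeat=n_x):
        fill = iter(combo)
        yield "".join(next(fill) if aa == "X" else aa for aa in template)
```

Under this reading, the all-histidine sequence of SEQ ID NO: 107 is one of the 3^5 = 243 possible instances of the template.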
  • heterologous polypeptides include, but are not limited to, a polypeptide that directly and/or indirectly provides for increased transcription and/or translation of a target nucleic acid (e.g., a transcription activator or a fragment thereof, a protein or fragment thereof that recruits a transcription activator, a small molecule/drug-responsive transcription and/or translation regulator, a translation-regulating protein, etc.).
  • heterologous polypeptides to accomplish increased or decreased transcription include transcription activator and transcription repressor domains.
  • a fusion Cas12L polypeptide is targeted by the guide nucleic acid (guide RNA) to a specific location (i.e., sequence) in the target nucleic acid and exerts locus-specific regulation such as blocking RNA polymerase binding to a promoter (which selectively inhibits transcription activator function), and/or modifying the local chromatin status (e.g., when a fusion sequence is used that modifies the target nucleic acid or modifies a polypeptide associated with the target nucleic acid).
  • the changes are transient (e.g., transcription repression or activation).
  • the changes are inheritable (e.g., when epigenetic modifications are made to the target nucleic acid or to proteins associated with the target nucleic acid, e.g., nucleosomal histones).
  • heterologous polypeptides for use when targeting ssRNA target nucleic acids include (but are not limited to): splicing factors (e.g., RS domains); protein translation components (e.g., translation initiation, elongation, and/or release factors; e.g., eIF4G); RNA methylases; RNA editing enzymes (e.g., RNA deaminases, e.g., adenosine deaminase acting on RNA (ADAR), including A to I and/or C to U editing enzymes); helicases; RNA-binding proteins; and the like. It is understood that a heterologous polypeptide can include the entire protein or in some cases can include a fragment of the protein (e.g., a functional domain).
  • the heterologous polypeptide of a subject fusion Cas12L polypeptide can be any domain capable of interacting with ssRNA (which, for the purposes of this disclosure, includes intramolecular and/or intermolecular secondary structures, e.g., double-stranded RNA duplexes such as hairpins, stem-loops, etc.), whether transiently or irreversibly, directly or indirectly, including but not limited to an effector domain selected from the group comprising: endonucleases (for example RNase III, the CRR22 DYW domain, Dicer, and PIN (PilT N-terminus) domains from proteins such as SMG5 and SMG6); proteins and protein domains responsible for stimulating RNA cleavage (for example CPSF, CstF, CFIm and CFIIm); exonucleases (for example XRN-1 or Exonuclease T); deadenylases (for example HNT3); proteins and protein domains responsible for nonsense
  • the effector domain may be selected from the group comprising Endonucleases; proteins and protein domains capable of stimulating RNA cleavage; Exonucleases; Deadenylases; proteins and protein domains having nonsense mediated RNA decay activity; proteins and protein domains capable of stabilizing RNA; proteins and protein domains capable of repressing translation; proteins and protein domains capable of stimulating translation; proteins and protein domains capable of modulating translation (e.g., translation factors such as initiation factors, elongation factors, release factors, etc., e.g., eIF4G); proteins and protein domains capable of polyadenylation of RNA; proteins and protein domains capable of polyuridinylation of RNA; proteins and protein domains having RNA localization activity; proteins and protein domains capable of nuclear retention of RNA; proteins and protein domains having RNA nuclear export activity; proteins and protein domains capable of repression of RNA splicing; proteins and protein domains capable of stimulation of RNA splicing; proteins and protein domain
  • RNA splicing factors that can be used (in whole or as fragments thereof) as heterologous polypeptides for a fusion Cas12L polypeptide have modular organization, with separate sequence-specific RNA binding modules and splicing effector domains.
  • members of the Serine/Arginine-rich (SR) protein family contain N-terminal RNA recognition motifs (RRMs) that bind to exonic splicing enhancers (ESEs) in pre-mRNAs and C-terminal RS domains that promote exon inclusion.
  • RRMs N-terminal RNA recognition motifs
  • ESEs exonic splicing enhancers
  • the hnRNP protein hnRNP A1 binds to exonic splicing silencers (ESSs) through its RRM domains and inhibits exon inclusion through a C-terminal glycine-rich domain.
  • Some splicing factors can regulate alternative use of splice site (ss) by binding to regulatory sequences between the two alternative sites.
  • ss splice site
  • ASF/SF2 can recognize ESEs and promote the use of intron proximal sites.
  • hnRNP A1 can bind to ESSs and shift splicing towards the use of intron distal sites.
  • One application for such factors is to generate ESFs that modulate alternative splicing of endogenous genes, particularly disease associated genes.
  • Bcl-x pre-mRNA produces two splicing isoforms with two alternative 5' splice sites to encode proteins of opposite functions.
  • the long splicing isoform Bcl-xL is a potent apoptosis inhibitor expressed in long-lived postmitotic cells and is up-regulated in many cancer cells, protecting cells against apoptotic signals.
  • the short isoform Bcl-xS is a pro-apoptotic isoform and expressed at high levels in cells with a high turnover rate (e.g., developing lymphocytes).
  • the ratio of the two Bcl-x splicing isoforms is regulated by multiple cis-elements that are located in either the core exon region or the exon extension region (i.e., between the two alternative 5' splice sites).
  • WO2010075303, which is hereby incorporated by reference in its entirety.
  • fusion partners include, but are not limited to, proteins (or fragments thereof) that are boundary elements (e.g., CTCF), proteins and fragments thereof that provide periphery recruitment (e.g., Lamin A, Lamin B, etc.), and protein docking elements (e.g., FKBP/FRB, Pil1/Aby1, etc.).
  • a subject fusion Cas12L polypeptide comprises: i) a Cas12L polypeptide of the present disclosure; and ii) one or more heterologous polypeptides (one or more “fusion partners”), where at least one of the one or more heterologous polypeptides is a nuclease.
  • Suitable nucleases include, but are not limited to, a homing nuclease polypeptide; a FokI polypeptide; a transcription activator-like effector nuclease (TALEN) polypeptide; a MegaTAL polypeptide; a meganuclease polypeptide; a zinc finger nuclease (ZFN); an ARCUS nuclease; and the like.
  • the meganuclease can be engineered from a LAGLIDADG homing endonuclease (LHE).
  • a megaTAL polypeptide can comprise a TALE DNA binding domain and an engineered meganuclease.
  • a subject fusion Cas12L polypeptide comprises: i) a Cas12L polypeptide of the present disclosure; and ii) one or more heterologous polypeptides, where at least one of the one or more heterologous polypeptides is a reverse transcriptase polypeptide.
  • the Cas12L polypeptide is catalytically inactive.
  • Suitable reverse transcriptases include, e.g., a murine leukemia virus reverse transcriptase; a Rous sarcoma virus reverse transcriptase; a human immunodeficiency virus type I reverse transcriptase; a Moloney murine leukemia virus reverse transcriptase; and the like.
  • a Cas12L fusion polypeptide of the present disclosure comprises: i) a Cas12L polypeptide of the present disclosure; and ii) one or more heterologous polypeptides, where at least one of the one or more heterologous polypeptides is a base editor.
  • Suitable base editors include, e.g., an adenosine deaminase; a cytidine deaminase (e.g., an activation-induced cytidine deaminase (AID), APOBEC3G, and the like); and the like.
  • a suitable adenosine deaminase is any enzyme that is capable of deaminating adenosine in DNA.
  • the deaminase is a TadA deaminase.
  • a suitable adenosine deaminase comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following amino acid sequence: MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMA LRQGGLVMQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHP GMNHRVEITEGILADECAALLSDFFRMRRQEIKAQKKAQSSTD (SEQ ID NO: 108).
  • a suitable adenosine deaminase comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following amino acid sequence: MRRAFITGVFFLSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRH DPTAHAEIMALRQGGLVMQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAG SLMDVLHHPGMNHRVEITEGILADECAALLSDFFRMRRQEIKAQKKAQSSTD (SEQ ID NO: 109).
  • a suitable adenosine deaminase comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following Staphylococcus aureus TadA amino acid sequence: MGSHMTNDIYFMTLAIEEAKKAAQLGEVPIGAIITKDDEVIARAHNLRETLQQPTAHAEHIAIERA AKVLGSWRLEGCTLYVTLEPCVMCAGTIVMSRIPRVVYGADDPKGGCSGSLMNLLQQSNFNHR AIVDKGVLKEACSTLLTTFFKNLRANKKSTN (SEQ ID NO: 110)
  • a suitable adenosine deaminase comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following Bacillus subtilis TadA amino acid sequence:
  • a suitable adenosine deaminase comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following Salmonella typhimurium TadA:
  • a suitable adenosine deaminase comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following Shewanella putrefaciens TadA amino acid sequence:
  • a suitable adenosine deaminase comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following Haemophilus influenzae F3031 TadA amino acid sequence:
  • a suitable adenosine deaminase comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following Caulobacter crescentus TadA amino acid sequence:
  • a suitable adenosine deaminase comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following Geobacter sulfurreducens TadA amino acid sequence: MSSLKKTPIRDDAYWMGKAIREAAKAAARDEVPIGAVIVRDGAVIGRGHNLREGSNDPSAHAE MIAIRQAARRSANWRLTGATLYVTLEPCLMCMGAIILARLERVVFGCYDPKGGAAGSLYDLSAD PRLNHQVRLSPGVCQEECGTMLSDFFRD
  • the cytidine deaminase is a deaminase from the apolipoprotein B mRNA-editing complex (APOBEC) family of deaminases.
  • the APOBEC family deaminase is selected from the group consisting of APOBEC1 deaminase, APOBEC2 deaminase, APOBEC3A deaminase, APOBEC3B deaminase, APOBEC3C deaminase, APOBEC3D deaminase, APOBEC3F deaminase, APOBEC3G deaminase, and APOBEC3H deaminase.
  • the cytidine deaminase is an activation induced deaminase (AID).
  • a suitable cytidine deaminase comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following amino acid sequence: MDSLLMNRRKFLYQFKNVRWAKGRRETYLCYVVKRRDSATSFSLDFGYLRNKNGCHVELLFLR YISDWDLDPGRCYRVTWFTSWSPCYDCARHVADFLRGNPNLSLRIFTARLYFCEDRKAEPEGLR RLHRAGVQIAIMTFKDYFYCWNTFVENHERTFKAWEGLHENSVRLSRQLRRILLPLYEVDDLRD AFRTLGL (SEQ ID NO: 117)
  • a suitable cytidine deaminase is an AID and comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following amino acid sequence: MDSLLMNRRKFLYQFKNVRWAKGRRETYLCYVVKRRDSATSFSLDFGYLRNKNGCHVELLFLR YISDWDLDPGRCYRVTWFTSWSPCYDCARHVADFLRGNPNLSLRIFTARLYFCEDRKAEPEGLR RLHRAGVQIAIMTFKENHERTFKAWEGLHENSVRLSRQLRRILLPLYEVDDLRD AFRTLGL (SEQ ID NO: 118).
  • a suitable cytidine deaminase is an AID and comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following amino acid sequence: MDSLLMNRRKFLYQFKNVRWAKGRRETYLCYVVKRRDSATSFSLDFGYLRNKNGCHVELLFLR YISDWDLDPGRCYRVTWFTSWSPCYDCARHVADFLRGNPNLSLRIFTARLYFCEDRKAEPEGLR RLHRAGVQIAIMTFKDYFYCWNTFVENHERTFKAWEGLHENSVRLSRQLRRILLPLYEVDDLRD AFRTLGL (SEQ ID NO: 117).
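
The deaminase passages above recite a ladder of percent-identity thresholds (at least 80% through 100%). As an illustration only, the sketch below computes percent identity for two sequences that have already been aligned (gaps written as `-`) and reports which of the recited thresholds are met. The convention used here (identical columns divided by the number of aligned columns) is an assumption; this section does not state how identity is to be calculated, and alignment itself would be done with a separate tool.

```python
# Thresholds recited in the passages above.
THRESHOLDS = (80, 85, 90, 95, 98, 99, 100)


def percent_identity(aligned_a, aligned_b):
    """Percent identity over the columns of a pre-computed pairwise alignment.

    Assumed convention: identical, non-gap columns divided by all columns
    that are not gap-vs-gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


def thresholds_met(pid, thresholds=THRESHOLDS):
    """Return the recited thresholds satisfied by a percent-identity value."""
    return [t for t in thresholds if pid >= t]
```

For example, two aligned 10-residue fragments that differ at one position give 90% identity, which satisfies the 80%, 85%, and 90% thresholds but not the higher ones.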
  • a Cas12L fusion polypeptide of the present disclosure comprises: i) a Cas12L polypeptide of the present disclosure; and ii) one or more heterologous polypeptides, where at least one of the one or more heterologous polypeptides is a transcription factor.
  • a transcription factor can include: i) a DNA binding domain; and ii) a transcription activator.
  • a transcription factor can include: i) a DNA binding domain; and ii) a transcription repressor.
  • Suitable transcription factors include polypeptides that include a transcription activator or a transcription repressor domain (e.g., the Krüppel-associated box (KRAB or SKD); the Mad mSIN3 interaction domain (SID); the ERF repressor domain (ERD), etc.); zinc-finger-based artificial transcription factors (see, e.g., Sera (2009) Adv. Drug Deliv. 61:513); TALE-based artificial transcription factors (see, e.g., Liu et al. (2013) Nat. Rev. Genetics 14:781); and the like.
  • the transcription factor comprises a VP64 polypeptide (transcriptional activation).
  • the transcription factor comprises a Krüppel-associated box (KRAB) polypeptide (transcriptional repression).
  • the transcription factor comprises a Mad mSIN3 interaction domain (SID) polypeptide (transcriptional repression).
  • the transcription factor comprises an ERF repressor domain (ERD) polypeptide (transcriptional repression).
  • the transcription factor is a transcriptional activator, where the transcriptional activator is GAL4-VP16.
  • a Cas12L fusion polypeptide of the present disclosure comprises: i) a Cas12L polypeptide of the present disclosure; and ii) one or more heterologous polypeptides, where at least one of the one or more heterologous polypeptides is a recombinase.
  • Suitable recombinases include, e.g., a Cre recombinase; a Hin recombinase; a Tre recombinase; a FLP recombinase; and the like.
  • examples of various additional suitable heterologous polypeptides (or fragments thereof) for a subject fusion Cas12L polypeptide include, but are not limited to, those described in the following applications (which publications are related to other CRISPR endonucleases such as Cas9, but the described fusion partners can also be used with Cas12L instead): PCT patent applications: WO2010075303, WO2012068627, and WO2013155555, and can be found, for example, in U.S.
  • a heterologous polypeptide (a fusion partner) provides for subcellular localization, i.e., the heterologous polypeptide contains a subcellular localization sequence (e.g., a nuclear localization signal (NLS) for targeting to the nucleus, a sequence to keep the fusion protein out of the nucleus, e.g., a nuclear export sequence (NES), a sequence to keep the fusion protein retained in the cytoplasm, a mitochondrial localization signal for targeting to the mitochondria, a chloroplast localization signal for targeting to a chloroplast, an ER retention signal, and the like).
  • a Cas12L fusion polypeptide does not include an NLS so that the protein is not targeted to the nucleus (which can be advantageous, e.g., when the target nucleic acid is an RNA that is present in the cytosol).
  • the heterologous polypeptide can provide a tag (i.e., the heterologous polypeptide is a detectable label) for ease of tracking and/or purification (e.g., a fluorescent protein, e.g., green fluorescent protein (GFP), yellow fluorescent protein (YFP), red fluorescent protein (RFP), cyan fluorescent protein (CFP), mCherry, tdTomato, and the like; a histidine tag, e.g., a 6XHis tag; a hemagglutinin (HA) tag; a FLAG tag; a Myc tag; and the like).
  • a Cas12L protein (e.g., a wild type Cas12L protein, a variant Cas12L protein, a fusion Cas12L protein, a dCas12L protein, and the like) includes (is fused to) a nuclear localization signal (NLS) (e.g., in some cases 2 or more, 3 or more, 4 or more, or 5 or more NLSs).
  • NLS nuclear localization signal
  • a Cas12L polypeptide includes one or more NLSs (e.g., 2 or more, 3 or more, 4 or more, or 5 or more NLSs).
  • one or more NLSs (2 or more, 3 or more, 4 or more, or 5 or more NLSs) are positioned at or near (e.g., within 50 amino acids of) the N-terminus and/or the C-terminus. In some cases, one or more NLSs (2 or more, 3 or more, 4 or more, or 5 or more NLSs) are positioned at or near (e.g., within 50 amino acids of) the N-terminus. In some cases, one or more NLSs (2 or more, 3 or more, 4 or more, or 5 or more NLSs) are positioned at or near (e.g., within 50 amino acids of) the C-terminus.
  • one or more NLSs (3 or more, 4 or more, or 5 or more NLSs) are positioned at or near (e.g., within 50 amino acids of) both the N-terminus and the C-terminus. In some cases, an NLS is positioned at the N-terminus and an NLS is positioned at the C-terminus.
  • a Cas12L protein (e.g., a wild type Cas12L protein, a variant Cas12L protein, a fusion Cas12L protein, a dCas12L protein, and the like) includes (is fused to) between 1 and 10 NLSs (e.g., 1-9, 1-8, 1-7, 1-6, 1-5, 2-10, 2-9, 2-8, 2-7, 2-6, or 2-5 NLSs).
  • a Cas12L protein (e.g., a wild type Cas12L protein, a variant Cas12L protein, a fusion Cas12L protein, a dCas12L protein, and the like) includes (is fused to) between 2 and 5 NLSs (e.g., 2-4, or 2-3 NLSs).
  • Non-limiting examples of NLSs include an NLS sequence derived from: the NLS of the SV40 virus large T-antigen, having the amino acid sequence PKKKRKV (SEQ ID NO: 119); the NLS from nucleoplasmin (e.g., the nucleoplasmin bipartite NLS with the sequence KRPAATKKAGQAKKKK (SEQ ID NO: 120)); the c-myc NLS having the amino acid sequence PAAKRVKLD (SEQ ID NO: 121) or RQRRNELKRSP (SEQ ID NO: 122); the hRNPA1 M9 NLS having the sequence NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO: 123); the sequence RMRIZFKNKGKDTAELRRRRVEVSVELRKAKKDEQILKRRNV (SEQ ID NO: 124) of the IBB domain from importin-alpha; the sequences VSRKRPRP (SEQ ID NO: 119);
  • The NLS (or multiple NLSs) are of sufficient strength to drive accumulation of the Cas12L protein in a detectable amount in the nucleus of a eukaryotic cell. Detection of accumulation in the nucleus may be performed by any suitable technique. For example, a detectable marker may be fused to the Cas12L protein such that location within a cell may be visualized. Cell nuclei may also be isolated from cells, the contents of which may then be analyzed by any suitable process for detecting protein, such as immunohistochemistry, Western blot, or enzyme activity assay. Accumulation in the nucleus may also be determined indirectly.
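
The NLS placement language above ("at or near (e.g., within 50 amino acids of) the N-terminus and/or the C-terminus") can be checked mechanically for a candidate construct. The sketch below is illustrative only: it searches for the SV40 NLS recited above (PKKKRKV) and counts copies near each terminus. Treating "within 50 amino acids" as "the motif lies entirely inside the first 50 residues" (N-terminus) or "the motif starts inside the last 50 residues" (C-terminus) is an assumption, as are the function names and the poly-alanine test construct.

```python
SV40_NLS = "PKKKRKV"  # SV40 large T-antigen NLS recited above
WINDOW = 50           # "within 50 amino acids of" a terminus


def find_motif(protein, motif):
    """Return the 0-based start index of every (possibly overlapping) motif hit."""
    positions = []
    start = protein.find(motif)
    while start != -1:
        positions.append(start)
        start = protein.find(motif, start + 1)
    return positions


def count_terminal_nls(protein, motif=SV40_NLS, window=WINDOW):
    """Count motif copies near the N-terminus and near the C-terminus."""
    near_n = near_c = 0
    for start in find_motif(protein, motif):
        end = start + len(motif)
        if end <= window:                   # motif ends within the first `window` residues
            near_n += 1
        if len(protein) - start <= window:  # motif starts within the last `window` residues
            near_c += 1
    return {"n_terminal": near_n, "c_terminal": near_c}


# Hypothetical construct: NLS + placeholder body + NLS.
construct = SV40_NLS + "A" * 300 + SV40_NLS
print(count_terminal_nls(construct))
```

A construct shorter than the window would count the same NLS copy at both termini under this convention, which may or may not be the intended reading.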
  • a Cas12L fusion polypeptide includes a “Protein Transduction Domain” or PTD (also known as a CPP - cell penetrating peptide), which refers to a polypeptide, polynucleotide, carbohydrate, or organic or inorganic compound that facilitates traversing a lipid bilayer, micelle, cell membrane, organelle membrane, or vesicle membrane.
  • PTD Protein Transduction Domain
  • a PTD attached to another molecule which can range from a small polar molecule to a large macromolecule and/or a nanoparticle, facilitates the molecule traversing a membrane, for example going from extracellular space to intracellular space, or cytosol to within an organelle.
  • a PTD is covalently linked to the amino terminus of a polypeptide (e.g., linked to a wild type Cas12L polypeptide to generate a fusion protein, or linked to a variant Cas12L protein such as a dCas12L, nickase Cas12L, or fusion Cas12L protein, to generate a fusion protein).
  • a PTD is covalently linked to the carboxyl terminus of a polypeptide (e.g., linked to a wild type Cas12L to generate a fusion protein, or linked to a variant Cas12L protein such as a dCas12L, nickase Cas12L, or fusion Cas12L protein to generate a fusion protein).
  • the PTD is inserted internally in the Cas12L fusion polypeptide (i.e., is not at the N- or C-terminus of the Cas12L fusion polypeptide) at a suitable insertion site.
  • a subject Cas12L fusion polypeptide includes (is conjugated to, is fused to) one or more PTDs (e.g., two or more, three or more, four or more PTDs).
  • a PTD includes a nuclear localization signal (NLS) (e.g., in some cases 2 or more, 3 or more, 4 or more, or 5 or more NLSs).
  • a Cas12L fusion polypeptide includes one or more NLSs (e.g., 2 or more, 3 or more, 4 or more, or 5 or more NLSs).
  • a PTD is covalently linked to a nucleic acid (e.g., a Cas12L guide nucleic acid, a polynucleotide encoding a Cas12L guide nucleic acid, a polynucleotide encoding a Cas12L fusion polypeptide, a donor polynucleotide, etc.).
  • PTDs include but are not limited to a minimal undecapeptide protein transduction domain (corresponding to residues 47-57 of HIV-1 TAT comprising YGRKKRRQRRR; SEQ ID NO: 135); a polyarginine sequence comprising a number of arginines sufficient to direct entry into a cell (e.g., 3, 4, 5, 6, 7, 8, 9, 10, or 10-50 arginines); a VP22 domain (Zender et al. (2002) Cancer Gene Ther. 9(6):489-96); a Drosophila Antennapedia protein transduction domain (Noguchi et al. (2003) Diabetes 52(7): 1732-1737); a truncated human calcitonin peptide (Trehin et al. (2004) Pharm. Research 21:1248-1256); polylysine (Wender et al. (2000) Proc. Natl. Acad. Sci. USA 97:13003-13008);
  • RRQRRTSKLMKR (SEQ ID NO: 136); Transportan GWTLNSAGYLLGKINLKALAALAKKIL (SEQ ID NO: 137); KALAWEAKLAKALAKALAKHLAKALAKALKCEA (SEQ ID NO: 138); and RQIKIWFQNRRMKWKK (SEQ ID NO: 139).
  • Exemplary PTDs include but are not limited to, YGRKKRRQRRR (SEQ ID NO: 135), RKKRRQRRR (SEQ ID NO: 140); an arginine homopolymer of from 3 arginine residues to 50 arginine residues;
  • Exemplary PTD domain amino acid sequences include, but are not limited to, any of the following: YGRKKRRQRRR (SEQ ID NO: 135); RKKRRQRR (SEQ ID NO: 141); YARAAARQARA (SEQ ID NO: 142); THRLPRRRRRR (SEQ ID NO: 143); and GGRRARRRRRR (SEQ ID NO: 144).
  • the PTD is an activatable CPP (ACPP) (Aguilera et al. (2009) Integr Biol (Camb) June; 1(5-6):371-381).
  • ACPPs comprise a polycationic CPP (e.g., Arg9 or “R9”) connected via a cleavable linker to a matching polyanion (e.g., Glu9 or “E9”), which reduces the net charge to nearly zero and thereby inhibits adhesion and uptake into cells.
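
The charge-masking idea behind an ACPP (a polycation paired with a matching polyanion so that the net charge is nearly zero until the linker is cleaved) can be illustrated with a simple residue count. The sketch below is a rough approximation under stated assumptions: K and R are counted as +1, D and E as -1, and histidine and the free termini are ignored. The linker sequence and the polyanion-linker-polycation order are hypothetical placeholders, since neither is specified in this section.

```python
POSITIVE = set("KR")  # counted as +1 each (assumption: histidine ignored)
NEGATIVE = set("DE")  # counted as -1 each


def net_charge(seq):
    """Approximate net charge from side chains only (termini ignored)."""
    return sum((aa in POSITIVE) - (aa in NEGATIVE) for aa in seq)


def build_acpp(polycation, linker, polyanion):
    """Assemble polyanion-linker-polycation; the order is an illustrative assumption."""
    return polyanion + linker + polycation


R9 = "R" * 9
E9 = "E" * 9
PLACEHOLDER_LINKER = "GGSG"  # hypothetical neutral stand-in for a cleavable linker

intact = build_acpp(R9, PLACEHOLDER_LINKER, E9)
print(net_charge(R9))      # charge of the free polycation
print(net_charge(intact))  # charge of the intact, masked construct
```

Under these assumptions the free R9 peptide carries a charge of +9, while the intact E9-linker-R9 construct is neutral, consistent with the description that pairing reduces the net charge to nearly zero.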
  • Linkers (e.g., for fusion partners)
  • a subject Cas12L protein can be fused to a fusion partner via a linker polypeptide (e.g., one or more linker polypeptides).
  • the linker polypeptide may have any of a variety of amino acid sequences. Proteins can be joined by a spacer peptide, generally of a flexible nature, although other chemical linkages are not excluded. Suitable linkers include polypeptides of between 4 amino acids and 40 amino acids in length, or between 4 amino acids and 25 amino acids in length. These linkers can be produced by using synthetic, linker-encoding oligonucleotides to couple the proteins, or can be encoded by a nucleic acid sequence encoding the fusion protein.
  • Peptide linkers with a degree of flexibility can be used.
  • the linking peptides may have virtually any amino acid sequence, bearing in mind that the preferred linkers will have a sequence that results in a generally flexible peptide.
  • small amino acids, such as glycine and alanine, are of use in creating a flexible peptide. The creation of such sequences is routine to those of skill in the art.
  • a variety of different linkers are commercially available and are considered suitable for use.
  • linker polypeptides include glycine polymers (G)n, glycine-serine polymers (including, for example, (GS)n (SEQ ID NO: 145) and (GGGGS)n (SEQ ID NO: 146), where n is an integer from 1 to 10), glycine-alanine polymers, and alanine-serine polymers.
  • Exemplary linkers can comprise amino acid sequences including, but not limited to, GGSG (SEQ ID NO: 147), GGSGG (SEQ ID NO: 148), GSGSG (SEQ ID NO: 149), GSGGG (SEQ ID NO: 150), GGGSG (SEQ ID NO: 151), GSSSG (SEQ ID NO: 152), and the like.
  • a variety of shorter or longer linker regions are known in the art, for example corresponding to a series of glycine residues, a series of adjacent glycine-serine dipeptides, a series of adjacent glycine-glycine-serine tripeptides, or known linkers from other proteins.
  • a flexible linker may include, for example, the amino acid sequence: SSGPPPGTG (SEQ ID NO: 153) and variants thereof.
  • a rigid linker may include, for example, the amino acid sequence: AEAAAKEAAAKA (SEQ ID NO: 154) and variants thereof.
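
The repeat-unit linkers described above, (GS)n and (GGGGS)n with n from 1 to 10, and the stated length ranges (4 to 40 or 4 to 25 amino acids) lend themselves to a short worked example. The sketch below is illustrative only; the helper names and the placeholder protein strings are assumptions, and the fusion order shown is just one possible arrangement.

```python
def repeat_linker(unit, n):
    """Build a (unit)n linker, with n restricted to the recited range 1-10."""
    if not 1 <= n <= 10:
        raise ValueError("n must be an integer from 1 to 10")
    return unit * n


def within_length(linker, lo=4, hi=40):
    """Check a linker against a recited length range (default 4-40 residues)."""
    return lo <= len(linker) <= hi


def fuse(protein, linker, partner):
    """Join protein and fusion partner through a linker (one illustrative order)."""
    return protein + linker + partner


g4s_x3 = repeat_linker("GGGGS", 3)
print(g4s_x3, within_length(g4s_x3), within_length(g4s_x3, hi=25))

# Placeholder strings stand in for real polypeptide sequences.
print(fuse("<Cas12L>", g4s_x3, "<partner>"))
```

Note that not every combination of unit and n falls inside the recited length ranges: (GS)1 is only 2 residues long, and (GGGGS)10 is 50 residues long.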
  • a Cas12L polypeptide may contain one or more tags that allow for, e.g., purification and/or detection of the recombinant polypeptide.
  • Various tags may be used herein and are well-known to those of skill in the art.
  • Exemplary tags may include hemagglutinin (HA), glutathione-S-transferase (GST), FLAG, maltose-binding protein (MBP), etc., and multiple copies of one or more tags may be present in a Cas12L polypeptide.
  • a Cas12L polypeptide may contain one or more reporters that allow for, e.g., visualization and/or detection of the Cas12L polypeptide.
  • a reporter polypeptide encodes a protein that may be readily detectable due to its biochemical characteristics such as, for example, enzymatic activity or chemifluorescent features. Reporter polypeptides may be detected in a number of ways depending on the characteristics of the particular reporter. For example, a reporter polypeptide may be detected by its ability to generate a detectable signal (e.g. fluorescence), by its ability to form a detectable product, etc.
  • Various reporters may be used herein and are well-known to those of skill in the art.
  • Exemplary reporters may include a green fluorescent protein (GFP), a yellow fluorescent protein (YFP), a cyan fluorescent protein, GUS, mCherry, luciferase, etc., and multiple copies of one or more tags may be present in a recombinant polypeptide.
  • a Cas12L polypeptide may contain one or more polypeptide domains that serve a particular purpose depending on the particular goal/need.
  • a Cas12L polypeptide may contain a GB1 polypeptide.
  • a Cas12L polypeptide may contain translocation sequences that target the polypeptide to a particular cellular compartment or area. Suitable features will be readily apparent to those of skill in the art.
  • a Cas12L protein binds to target DNA at a target sequence defined by the region of complementarity between the DNA-targeting RNA and the target DNA.
  • site-specific binding (and/or cleavage) of a double stranded target DNA occurs at locations determined by both (i) base-pairing complementarity between the guide RNA and the target DNA; and (ii) a short motif [referred to as the protospacer adjacent motif (PAM)] in the target DNA.
  • the PAM for a Cas12L protein is immediately 5’ of the target sequence of the non-complementary strand of the target DNA (the complementary strand: (i) hybridizes to the guide sequence of the guide RNA, while the non-complementary strand does not directly hybridize with the guide RNA; and (ii) is the reverse complement of the non-complementary strand).
  • Cas12L proteins may be advantageous to use in the various provided methods in order to capitalize on various enzymatic characteristics of the different Cas12L proteins (e.g., for different PAM sequence preferences; for increased or decreased enzymatic activity; for an increased or decreased level of cellular toxicity; to change the balance between NHEJ, homology-directed repair, single strand breaks, double strand breaks, etc.; to take advantage of a short total sequence; and the like).
  • Cas12L proteins from different species may require different PAM sequences in the target DNA.
  • a Cas12L polypeptide of the present disclosure can be reprogrammed (by complexing with a guide RNA) to cleave any sequence of a target nucleic acid (e.g., a target DNA) that is complementary to the targeting segment of the guide RNA, where the PAM is present on the 5’ end of the target (e.g., a T-rich PAM for CasXl); additional RNA components are not required for the formation of functional effectors in vivo.
  • a PAM sequence is a T-rich sequence (e.g., TTR, where R is a purine).
  • a PAM sequence is TTA.
  • a PAM sequence is TTG.
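The PAM rule described above (a T-rich PAM such as TTR located immediately 5’ of the target sequence on the non-complementary strand) can be illustrated with a minimal sketch. The function names, the default 20-nt target length, and the example sequence are assumptions made for illustration; they are not taken from the disclosure.

```python
# Minimal sketch (illustrative assumptions, not the disclosure's method):
# scan one DNA strand, treated as the non-complementary strand, for a TTR PAM
# (R = A or G) and report the target sequence immediately 3' of each PAM.

PURINES = {"A", "G"}  # IUPAC "R"


def find_target_sites(non_complementary_strand, guide_len=20):
    """Return (pam_start, pam, target_sequence) tuples for each TTR PAM
    that is followed by at least guide_len nucleotides."""
    seq = non_complementary_strand.upper()
    sites = []
    for i in range(len(seq) - 3 - guide_len + 1):
        pam = seq[i:i + 3]
        if pam[:2] == "TT" and pam[2] in PURINES:
            sites.append((i, pam, seq[i + 3:i + 3 + guide_len]))
    return sites


def spacer_rna(target_sequence):
    """The guide sequence hybridizes to the complementary strand, so it reads
    the same as the non-complementary strand with U in place of T."""
    return target_sequence.upper().replace("T", "U")


if __name__ == "__main__":
    example = "GGTTAACGTACGTACGTACGTACGTCC"  # hypothetical sequence
    for start, pam, target in find_target_sites(example):
        print(start, pam, target, spacer_rna(target))
```

Scanning the reverse complement of the same sequence in the same way would cover sites for which the opposite strand serves as the non-complementary strand.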
  • a nucleic acid that binds to a Cas12L protein, forming a ribonucleoprotein complex (RNP), and targets the complex to a specific location within a target nucleic acid (e.g., a target DNA) is referred to herein as a “Cas12L guide RNA” or simply as a “guide RNA.” It is to be understood that in some cases, a hybrid DNA/RNA can be made such that a Cas12L guide RNA includes DNA bases in addition to RNA bases, but the term “Cas12L guide RNA” is still used to encompass such a molecule herein.
  • a Cas12L guide RNA can be said to include two segments, a targeting segment and a protein-binding segment.
  • the protein-binding segment is also referred to herein as the “constant region” of the guide RNA.
  • the targeting segment of a Cas12L guide RNA includes a nucleotide sequence (a guide sequence) that is complementary to (and therefore hybridizes with) a specific sequence (a target site) within a target nucleic acid (e.g., a target dsDNA, a target ssRNA, a target ssDNA, the complementary strand of a double stranded target DNA, etc.).
  • the protein-binding segment interacts with (binds to) a Cas12L polypeptide.
  • the protein-binding segment of a subject Cas12L guide RNA can include two complementary stretches of nucleotides that hybridize to one another to form a double stranded RNA duplex (dsRNA duplex).
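  • Site-specific binding and/or cleavage of a target nucleic acid (e.g., genomic DNA, dsDNA, RNA, etc.) can occur at locations (e.g., a target sequence of a target locus) determined by base-pairing complementarity between the guide sequence of the Cas12L guide RNA and the target nucleic acid.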
  • a Cas12L guide RNA and a Cas12L protein form a complex (e.g., bind via non-covalent interactions).
  • the Cas12L guide RNA provides target specificity to the complex by including a targeting segment, which includes a guide sequence (a nucleotide sequence that is complementary to a sequence of a target nucleic acid).
  • the Cas12L protein of the complex provides the site-specific activity (e.g., cleavage activity provided by the Cas12L protein and/or an activity provided by the fusion partner in the case of a fusion Cas12L protein).
  • the Cas12L protein is guided to a target nucleic acid sequence (e.g. a target sequence) by virtue of its association with the Cas12L guide RNA.
  • the “guide sequence” (also referred to as the “targeting sequence”) of a Cas12L guide RNA can be modified so that the Cas12L guide RNA can target a Cas12L protein (e.g., a naturally occurring Cas12L protein, a fusion Cas12L polypeptide, and the like) to any desired sequence of any desired target nucleic acid, with the exception (e.g., as described herein) that the PAM sequence can be taken into account.
  • a Cas12L guide RNA can have a guide sequence with complementarity to (e.g., can hybridize to) a sequence in a nucleic acid in a eukaryotic cell, e.g., a viral nucleic acid, a eukaryotic nucleic acid (e.g., a eukaryotic chromosome, chromosomal sequence, a eukaryotic RNA, etc.), and the like.
  • a subject Cas12L guide RNA includes a guide sequence (i.e., a targeting sequence), which is a nucleotide sequence that is complementary to a sequence (a target site) in a target nucleic acid.
  • the guide sequence of a Cas12L guide RNA can interact with a target nucleic acid (e.g., double stranded DNA (dsDNA), single stranded DNA (ssDNA), single stranded RNA (ssRNA), or double stranded RNA (dsRNA)) in a sequence-specific manner via hybridization (i.e., base pairing).
  • the guide sequence of a Cas12L guide RNA can be modified (e.g., by genetic engineering)/designed to hybridize to any desired target sequence (e.g., while taking the PAM into account, e.g., when targeting a dsDNA target) within a target nucleic acid (e.g., a eukaryotic target nucleic acid such as genomic DNA).
  • the percent complementarity between the guide sequence and the target site of the target nucleic acid is 60% or more (e.g., 65% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%). In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 80% or more (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%).
  • the percent complementarity between the guide sequence and the target site of the target nucleic acid is 90% or more (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100%). In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 100%.
  • the percent complementarity between the guide sequence and the target site of the target nucleic acid is 100% over the seven contiguous 3’-most nucleotides of the target site of the target nucleic acid.
  • the percent complementarity between the guide sequence and the target site of the target nucleic acid is 60% or more (e.g., 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 17 or more (e.g., 18 or more, 19 or more, 20 or more, 21 or more, 22 or more) contiguous nucleotides.
  • the percent complementarity between the guide sequence and the target site of the target nucleic acid is 80% or more (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 17 or more (e.g., 18 or more, 19 or more, 20 or more, 21 or more, 22 or more) contiguous nucleotides.
  • the percent complementarity between the guide sequence and the target site of the target nucleic acid is 90% or more (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 17 or more (e.g., 18 or more, 19 or more, 20 or more, 21 or more, 22 or more) contiguous nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 100% over 17 or more (e.g., 18 or more, 19 or more, 20 or more, 21 or more, 22 or more) contiguous nucleotides.
  • the percent complementarity between the guide sequence and the target site of the target nucleic acid is 60% or more (e.g., 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 19 or more (e.g., 20 or more, 21 or more, 22 or more) contiguous nucleotides.
  • the percent complementarity between the guide sequence and the target site of the target nucleic acid is 80% or more (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 19 or more (e.g., 20 or more, 21 or more, 22 or more) contiguous nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 90% or more (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 19 or more (e.g., 20 or more, 21 or more, 22 or more) contiguous nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 100% over 19 or more (e.g., 20 or more, 21 or more, 22 or more) contiguous nucleotides.
  • the percent complementarity between the guide sequence and the target site of the target nucleic acid is 60% or more (e.g., 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 17-25 contiguous nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 80% or more (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 17-25 contiguous nucleotides.
  • the percent complementarity between the guide sequence and the target site of the target nucleic acid is 90% or more (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 17-25 contiguous nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 100% over 17-25 contiguous nucleotides.
  • the percent complementarity between the guide sequence and the target site of the target nucleic acid is 60% or more (e.g., 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 19-25 contiguous nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 80% or more (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 19-25 contiguous nucleotides.
  • the percent complementarity between the guide sequence and the target site of the target nucleic acid is 90% or more (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 19-25 contiguous nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 100% over 19-25 contiguous nucleotides.
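The percent-complementarity thresholds above, stated either over the whole guide sequence or over a run of contiguous nucleotides, can be made concrete with a short sketch. It assumes strict Watson-Crick pairing between an RNA guide and a DNA target strand of equal length, with no gaps or wobble pairs; the function names and the default 17-nt window are illustrative choices rather than definitions from the disclosure.

```python
# Minimal sketch, assuming strict Watson-Crick RNA:DNA pairing and an
# ungapped, equal-length alignment. All names are hypothetical.

# Allowed pairs written as (guide RNA base, target DNA base).
PAIRS = {("A", "T"), ("U", "A"), ("G", "C"), ("C", "G")}


def aligned_matches(guide_rna, target_strand):
    """Pair guide position i (read 5'->3') with the antiparallel position of
    the target strand (also written 5'->3') and flag each pair as matched."""
    if not guide_rna or len(guide_rna) != len(target_strand):
        raise ValueError("guide and target site must be non-empty and equal in length")
    return [(g, t) in PAIRS
            for g, t in zip(guide_rna.upper(), reversed(target_strand.upper()))]


def percent_complementarity(guide_rna, target_strand):
    """Percent of guide positions that base-pair with the target site."""
    matches = aligned_matches(guide_rna, target_strand)
    return 100.0 * sum(matches) / len(matches)


def best_window_percent(guide_rna, target_strand, window=17):
    """Highest percent complementarity over any run of `window` contiguous
    aligned nucleotides."""
    matches = aligned_matches(guide_rna, target_strand)
    if window > len(matches):
        raise ValueError("window is longer than the guide sequence")
    return max(100.0 * sum(matches[i:i + window]) / window
               for i in range(len(matches) - window + 1))
```

A guide would meet, for example, the "90% or more over 17 or more contiguous nucleotides" condition when `best_window_percent(guide, target, 17)` returns at least 90.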
  • the guide sequence has a length in a range of from 17-30 nucleotides (nt) (e.g., from 17-25, 17-22, 17-20, 19-30, 19-25, 19-22, 19-20, 20-30, 20-25, or 20-22 nt). In some cases, the guide sequence has a length in a range of from 17-25 nucleotides (nt) (e.g., from 17-22, 17-20, 19-25, 19-22, 19-20, 20-25, or 20-22 nt).
  • the guide sequence has a length of 17 or more nt (e.g., 18 or more, 19 or more, 20 or more, 21 or more, or 22 or more nt; 19 nt, 20 nt, 21 nt, 22 nt, 23 nt, 24 nt, 25 nt, etc.). In some cases, the guide sequence has a length of 19 or more nt (e.g., 20 or more, 21 or more, or 22 or more nt; 19 nt, 20 nt, 21 nt, 22 nt, 23 nt, 24 nt, 25 nt, etc.). In some cases, the guide sequence has a length of 17 nt.
  • the guide sequence has a length of 18 nt. In some cases, the guide sequence has a length of 19 nt. In some cases, the guide sequence has a length of 20 nt. In some cases, the guide sequence has a length of 21 nt. In some cases, the guide sequence has a length of 22 nt. In some cases, the guide sequence has a length of 23 nt.
  • the guide sequence (also referred to as a “spacer sequence”) has a length of from 15 to 50 nucleotides (e.g., from 15 nucleotides (nt) to 20 nt, from 20 nt to 25 nt, from 25 nt to 30 nt, from 30 nt to 35 nt, from 35 nt to 40 nt, from 40 nt to 45 nt, or from 45 nt to 50 nt).
  • the protein-binding segment (the “constant region”) of a subject Cas12L guide RNA interacts with a Cas12L protein.
  • the Cas12L guide RNA guides the bound Cas12L protein to a specific nucleotide sequence within a target nucleic acid via the above-mentioned guide sequence.
  • the protein-binding segment of a Cas12L guide RNA can include two stretches of nucleotides that are complementary to one another and hybridize to form a double stranded RNA duplex (dsRNA duplex).
  • the protein-binding segment includes a dsRNA duplex.
  • the dsRNA duplex region includes a range of from 5-25 base pairs (bp) (e.g., from 5-22, 5-20, 5-18, 5-15, 5-12, 5-10, 5-8, 8-25, 8-22, 8-18, 8-15, 8-12, 12-25, 12-22, 12-18, 12-15, 13-25, 13-22, 13-18, 13-15, 14-25, 14-22, 14-18, 14-15, 15-25, 15-22, 15-18, 17-25, 17-22, or 17-18 bp, e.g., 5 bp, 6 bp, 7 bp, 8 bp, 9 bp, 10 bp, etc.).
  • the dsRNA duplex region includes a range of from 6-15 base pairs (bp) (e.g., from 6-12, 6-10, or 6-8 bp, e.g., 6 bp, 7 bp, 8 bp, 9 bp, 10 bp, etc.). In some cases, the duplex region includes 5 or more bp (e.g., 6 or more, 7 or more, or 8 or more bp). In some cases, the duplex region includes 6 or more bp (e.g., 7 or more, or 8 or more bp). In some cases, not all nucleotides of the duplex region are paired, and therefore the duplex forming region can include a bulge.
  • the term “bulge” herein is used to mean a stretch of nucleotides (which can be one nucleotide) that do not contribute to a double stranded duplex, but which are surrounded 5’ and 3’ by nucleotides that do contribute, and as such a bulge is considered part of the duplex region.
  • the dsRNA includes 1 or more bulges (e.g., 2 or more, 3 or more, 4 or more bulges).
  • the dsRNA duplex includes 2 or more bulges (e.g., 3 or more, 4 or more bulges).
  • the dsRNA duplex includes 1-5 bulges (e.g., 1-4, 1-3, 2-5, 2-4, or 2-3 bulges).
  • the stretches of nucleotides that hybridize to one another to form the dsRNA duplex have 70%-100% complementarity (e.g., 75%-100%, 80%-100%, 85%-100%, 90%-100%, 95%-100% complementarity) with one another.
  • the stretches of nucleotides that hybridize to one another to form the dsRNA duplex have 85%-100% complementarity (e.g., 90%-100%, 95%-100% complementarity) with one another. In some cases, the stretches of nucleotides that hybridize to one another to form the dsRNA duplex have 70%-95% complementarity (e.g., 75%-95%, 80%-95%, 85%-95%, 90%-95% complementarity) with one another.
  • the dsRNA duplex includes two stretches of nucleotides that have 70%-100% complementarity (e.g., 75%-100%, 80%-100%, 85%-100%, 90%-100%, 95%-100% complementarity) with one another.
  • the dsRNA duplex includes two stretches of nucleotides that have 85%-100% complementarity (e.g., 90%-100%, 95%-100% complementarity) with one another.
  • the dsRNA duplex includes two stretches of nucleotides that have 70%-95% complementarity (e.g., 75%-95%, 80%-95%, 85%-95%, 90%-95% complementarity) with one another.
  • the duplex region of a subject Cas12L guide RNA can include one or more (1, 2, 3, 4, 5, etc.) mutations relative to a naturally occurring duplex region. For example, in some cases a base pair can be maintained while the nucleotides contributing to the base pair from each segment can be different. In some cases, the duplex region of a subject Cas12L guide RNA includes more paired bases, fewer paired bases, a smaller bulge, a larger bulge, fewer bulges, more bulges, or any convenient combination thereof, as compared to a naturally occurring duplex region (of a naturally occurring Cas12L guide RNA).
  • Examples of Cas9 guide RNAs can be found in the art, and in some cases variations similar to those introduced into Cas9 guide RNAs can also be introduced into Cas12L guide RNAs of the present disclosure (e.g., mutations to the dsRNA duplex region, extension of the 5’ or 3’ end for added stability or to provide for interaction with another protein, and the like).
  • Jinek et al., Science. 2012 Aug 17;337(6096):816-21; Chylinski et al., RNA Biol. 2013 May;10(5):726-37; Ma et al., Biomed Res Int.
  • a Cas12L guide RNA can include a constant region having from 1 to 5 nucleotide substitutions compared to any one of the nucleotide sequences depicted in FIG. 5A-5M.
  • the nucleotide sequences can be combined with a spacer sequence (where the spacer sequence comprises a target nucleic acid-binding sequence (“guide sequence”)) of choice that is from 15 to 50 nucleotides (e.g., from 15 nucleotides (nt) to 20 nt, from 20 nt to 25 nt, from 25 nt to 30 nt, from 30 nt to 35 nt, from 35 nt to 40 nt, from 40 nt to 45 nt, or from 45 nt to 50 nt) in length.
  • the spacer sequence is 35-38 nucleotides in length.
  • any one of the nucleotide sequences (with T substituted with U) depicted in FIG. 5A-5M can be included in a guide RNA comprising (N)n-constant region, where N is any nucleotide and n is an integer from 15 to 50 (e.g., from 15 to 20, from 20 to 25, from 25 to 30, from 30 to 35, from 35 to 38, from 35 to 40, from 40 to 45, or from 45 to 50).
  • the constant region of a Cas12L guide RNA can comprise the nucleotide sequence: AUUGUUGUAACUCUUAUUUUGUAUGGAGUAAACAAC (SEQ ID NO:74).
  • the constant region of a Cas12L guide RNA can comprise the nucleotide sequence: AUUGUUGUAGACCUCUUUUUAUAAGGAUUGAACAAC (SEQ ID NO:76).
  • the constant region of a Cas12L guide RNA can comprise the nucleotide sequence: UAUUGUUGUAGAUACCUUUUGUAAGGAUUAAACAAC (SEQ ID NO:79).
  • the constant region of a Cas12L guide RNA can comprise the nucleotide sequence: AAUGUUGUAGAUGCCUUUUUAUAAGGAUUAAACAACUUG (SEQ ID NO: 156).
  • the constant region of a Cas12L guide RNA can comprise the nucleotide sequence: AUUGUUGAAAUAGUACUUUUAUAGUCUAUAUACAAC (SEQ ID NO:70).
  • the constant region of a Cas12L guide RNA can comprise the nucleotide sequence:
  • the constant region of a Cas12L guide RNA can comprise the nucleotide sequence: AUUGUUGUAACUUUUAUUUUGUAUGGAGUAAACAAC (SEQ ID NO:75).
  • the constant region of a Cas12L guide RNA can comprise the nucleotide sequence: AAUGUUGUAGAUACCUUUUUGUAAGGAUUGAACAAC (SEQ ID NO:78).
  • the constant region of a Cas12L guide RNA can comprise the nucleotide sequence: AUUGUUGUAAUACUAUUUUUGUAAAGUAUAAACAAC (SEQ ID NO:81).
  • the constant region of a Cas12L guide RNA can comprise the nucleotide sequence: AAUGUUGUAGAUGCCUUUUUAUAAGGAUUAAACAAC (SEQ ID NO:77).
  • the constant region of a Cas12L guide RNA can comprise the nucleotide sequence: AUUGUUGUAAUACACUUUUUAUAAGGUAUGAACAAC (SEQ ID NO:82).
  • the constant region of a Cas12L guide RNA can comprise the nucleotide sequence: AUUGUUGUAACAUCUAUUUUGUAAGGUAAACAAC (SEQ ID NO:71).
  • RNA comprising constant region-(N)n, where N is any nucleotide and n is an integer from 15 to 50 (e.g., from 15 to 20, from 20 to 25, from 25 to 30, from 30 to 35, from 35 to 38, from 35 to 40, from 40 to 45, or from 45 to 50).
  • a guide RNA can have the following nucleotide sequence: NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAUUGUUGUAACUCUUAUUUGUAU GGAGUAAACAAC (SEQ ID NO: 157) or in some cases the reverse complement, where N is any nucleotide, e.g., where the stretch of Ns includes a target nucleic acid-binding sequence.
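The two arrangements described above, (N)n-constant region and constant region-(N)n with n from 15 to 50, can be sketched as a simple string assembly. The constant region used below is the sequence listed as SEQ ID NO:74; the function names, the `spacer_first` flag, and the example spacer are hypothetical and serve only to illustrate the layout.

```python
# Minimal sketch of joining a spacer (guide sequence) to a constant region.
# The function and parameter names are assumptions for illustration only.

CONSTANT_REGION_SEQ_ID_74 = "AUUGUUGUAACUCUUAUUUUGUAUGGAGUAAACAAC"


def to_rna(seq):
    """Upper-case the sequence and substitute U for T."""
    return seq.upper().replace("T", "U")


def assemble_guide(spacer, constant_region=CONSTANT_REGION_SEQ_ID_74,
                   spacer_first=True):
    """Return (N)n-constant region when spacer_first is True,
    otherwise constant region-(N)n."""
    spacer = to_rna(spacer)
    constant = to_rna(constant_region)
    if set(spacer) - set("ACGU"):
        raise ValueError("spacer may contain only A, C, G, U (or T)")
    if not 15 <= len(spacer) <= 50:
        raise ValueError("spacer length must be from 15 to 50 nucleotides")
    return spacer + constant if spacer_first else constant + spacer


if __name__ == "__main__":
    spacer = "ACGTACGTACGTACGTACGT"  # hypothetical 20-nt guide sequence
    print(assemble_guide(spacer))
    print(assemble_guide(spacer, spacer_first=False))
```

Which of the two orientations applies to a given constant region is not decided by this sketch; the flag simply exposes both layouts named in the text.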
  • a nucleic acid that binds to a Cas12L protein, forming a nucleic acid/Cas12L polypeptide complex, and that targets the complex to a specific location within a target nucleic acid comprises ribonucleotides only, deoxyribonucleotides only, or a mixture of ribonucleotides and deoxyribonucleotides.
  • a guide polynucleotide comprises ribonucleotides only, and is referred to herein as a “guide RNA.” In some cases, a guide polynucleotide comprises deoxyribonucleotides only, and is referred to herein as a “guide DNA.” In some cases, a guide polynucleotide comprises both ribonucleotides and deoxyribonucleotides.
  • a guide polynucleotide can comprise combinations of ribonucleotide bases, deoxyribonucleotide bases, nucleotide analogs, modified nucleotides, and the like; and may further include naturally-occurring backbone residues and/or linkages and/or non-naturally-occurring backbone residues and/or linkages.
  • recombinant nucleic acids encode recombinant polypeptides of the present disclosure.
  • polynucleotide shall be generic to polydeoxyribonucleotides (containing 2-deoxy-D-ribose), to polyribonucleotides (containing D-ribose), to any other type of polynucleotide that is an N-glycoside of a purine or pyrimidine base, and to other polymers containing non-nucleotidic backbones, provided that the polymers contain nucleobases in a configuration that allows for base pairing and base stacking, as found in DNA and RNA.
  • nucleic acid sequence modifications for example, substitution of one or more of the naturally occurring nucleotides with an analog, and inter-nucleotide modifications.
  • symbols for nucleotides and polynucleotides are those recommended by the IUPAC-IUB Commission of Biochemical Nomenclature.
  • “Recombinant nucleic acid” or “heterologous nucleic acid” or “recombinant polynucleotide” as used herein refers to a polymer of nucleic acids wherein at least one of the following is true: (a) the sequence of nucleic acids is foreign to (i.e., not naturally found in) a given host cell; (b) the sequence may be naturally found in a given host cell, but in an unnatural (e.g., greater than expected) amount; or (c) the sequence of nucleic acids contains two or more subsequences that are not found in the same relationship to each other in nature.
  • a recombinant nucleic acid sequence will have two or more sequences from unrelated genes arranged to make a new functional nucleic acid.
  • the present disclosure describes the introduction of an expression vector into a plant cell, where the expression vector contains a nucleic acid sequence coding for a protein that is not normally found in a plant cell or contains a nucleic acid coding for a protein that is normally found in a plant cell but is under the control of different regulatory sequences. With reference to the plant cell’s genome, then, the nucleic acid sequence that codes for the protein is recombinant.
  • a protein that is referred to as recombinant may be encoded by a recombinant nucleic acid sequence which may be present in the plant cell.
  • Recombinant proteins of the present disclosure may also be exogenously supplied directly to host cells (e.g. plant cells).
  • the present disclosure provides one or more nucleic acids comprising one or more of: a donor polynucleotide sequence, a nucleotide sequence encoding a Cas12L polypeptide (e.g., a wild type Cas12L protein, a nickase Cas12L protein, a dCas12L protein, fusion Cas12L protein, and the like), a Cas12L guide RNA, and a nucleotide sequence encoding a Cas12L guide RNA.
  • the present disclosure provides a nucleic acid comprising a nucleotide sequence encoding a Cas12L fusion polypeptide.
  • the present disclosure provides a recombinant expression vector that comprises a nucleotide sequence encoding a Cas12L polypeptide.
  • the present disclosure provides a recombinant expression vector that comprises a nucleotide sequence encoding a Cas12L fusion polypeptide.
  • the present disclosure provides a recombinant expression vector that comprises: a) a nucleotide sequence encoding a Cas12L polypeptide; and b) a nucleotide sequence encoding a Cas12L guide RNA(s).
  • the present disclosure provides a recombinant expression vector that comprises: a) a nucleotide sequence encoding a Cas12L fusion polypeptide; and b) a nucleotide sequence encoding a Cas12L guide RNA(s).
  • the nucleotide sequence encoding the Cas12L protein and/or the nucleotide sequence encoding the Cas12L guide RNA is operably linked to a promoter that is operable in a cell type of choice (e.g., a prokaryotic cell, a eukaryotic cell, a plant cell, an animal cell, a mammalian cell, a primate cell, a rodent cell, a human cell, etc.).
  • a nucleotide sequence encoding a Cas12L polypeptide of the present disclosure is codon optimized. This type of optimization can entail a mutation of a Cas12L-encoding nucleotide sequence to mimic the codon preferences of the intended host organism or cell while encoding the same protein. Thus, the codons can be changed, but the encoded protein remains unchanged. For example, if the intended target cell was a human cell, a human codon-optimized Cas12L-encoding nucleotide sequence could be used.
  • if the intended host cell were a mouse cell, then a mouse codon-optimized Cas12L-encoding nucleotide sequence could be generated.
  • if the intended host cell were a plant cell, then a plant codon-optimized Cas12L-encoding nucleotide sequence could be generated.
  • if the intended host cell were an insect cell, then an insect codon-optimized Cas12L-encoding nucleotide sequence could be generated.
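The idea that codon optimization changes the codons while leaving the encoded protein unchanged can be shown with a small sketch. It uses the standard genetic code; the `PREFERRED` table is a hypothetical stand-in for a host's codon preferences (real preferences would come from a codon-usage table for the intended host), and the function names and example sequence are likewise illustrative.

```python
# Minimal sketch: swap each codon for a preferred synonymous codon and confirm
# that the translation is unchanged. PREFERRED is a made-up illustration, not
# measured codon usage for any organism.
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical preferred codon per amino acid (assumption for illustration).
PREFERRED = {"L": "CTG", "S": "AGC", "A": "GCC", "G": "GGC", "K": "AAG",
             "E": "GAG", "P": "CCC", "T": "ACC", "V": "GTG", "R": "AGA"}
# Guard: every preferred codon must encode the amino acid it is listed for.
assert all(CODON_TABLE[codon] == aa for aa, codon in PREFERRED.items())


def codons(cds):
    """Split a coding sequence into codons."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds):
    """Translate a DNA coding sequence; '*' marks a stop codon."""
    return "".join(CODON_TABLE[c] for c in codons(cds))


def codon_optimize(cds):
    """Replace each codon with the preferred synonymous codon, if one is listed."""
    return "".join(PREFERRED.get(CODON_TABLE[c], c) for c in codons(cds))


if __name__ == "__main__":
    original = "ATGTTAAGTGCAAAATAA"  # hypothetical short coding sequence
    optimized = codon_optimize(original)
    print(original, "->", optimized)
    print(translate(original) == translate(optimized))
```

Picking the single most preferred codon at every position is only one possible strategy; practical tools may also weigh GC content, repeats, and mRNA structure, none of which this sketch attempts.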
  • a nucleic acid of the present disclosure comprises a Cas12L polypeptide-encoding nucleotide sequence that is codon optimized for expression in a eukaryotic cell. In some cases, a nucleic acid of the present disclosure comprises a Cas12L polypeptide-encoding nucleotide sequence that is codon optimized for expression in an animal cell.
  • a nucleic acid of the present disclosure comprises a Cas12L polypeptide-encoding nucleotide sequence that is codon optimized for expression in a fungus cell. In some cases, a nucleic acid of the present disclosure comprises a Cas12L polypeptide-encoding nucleotide sequence that is codon optimized for expression in a plant cell. In some cases, a nucleic acid of the present disclosure comprises a Cas12L polypeptide-encoding nucleotide sequence that is codon optimized for expression in a monocotyledonous plant species.
  • a nucleic acid of the present disclosure comprises a Cas12L polypeptide-encoding nucleotide sequence that is codon optimized for expression in a dicotyledonous plant species. In some cases, a nucleic acid of the present disclosure comprises a Cas12L polypeptide-encoding nucleotide sequence that is codon optimized for expression in a gymnosperm plant species. In some cases, a nucleic acid of the present disclosure comprises a Cas12L polypeptide-encoding nucleotide sequence that is codon optimized for expression in an angiosperm plant species.
  • a nucleic acid of the present disclosure comprises a Cas12L polypeptide-encoding nucleotide sequence that is codon optimized for expression in a corn cell. In some cases, a nucleic acid of the present disclosure comprises a Cas12L polypeptide-encoding nucleotide sequence that is codon optimized for expression in a soybean cell. In some cases, a nucleic acid of the present disclosure comprises a Cas12L polypeptide-encoding nucleotide sequence that is codon optimized for expression in a rice cell. In some cases, a nucleic acid of the present disclosure comprises a Cas12L polypeptide-encoding nucleotide sequence that is codon optimized for expression in a wheat cell.
  • a nucleic acid of the present disclosure comprises a Cas12L polypeptide-encoding nucleotide sequence that is codon optimized for expression in a cotton cell. In some cases, a nucleic acid of the present disclosure comprises a Cas12L polypeptide-encoding nucleotide sequence that is codon optimized for expression in a sorghum cell. In some cases, a nucleic acid of the present disclosure comprises a Cas12L polypeptide-encoding nucleotide sequence that is codon optimized for expression in an alfalfa cell.
  • a nucleic acid of the present disclosure comprises a Cas12L polypeptide-encoding nucleotide sequence that is codon optimized for expression in a sugar cane cell. In some cases, a nucleic acid of the present disclosure comprises a Cas12L polypeptide-encoding nucleotide sequence that is codon optimized for expression in an Arabidopsis cell. In some cases, a nucleic acid of the present disclosure comprises a Cas12L polypeptide-encoding nucleotide sequence that is codon optimized for expression in a tomato cell.
  • a nucleic acid of the present disclosure comprises a Cas12L polypeptide-encoding nucleotide sequence that is codon optimized for expression in a cucumber cell. In some cases, a nucleic acid of the present disclosure comprises a Cas12L polypeptide-encoding nucleotide sequence that is codon optimized for expression in a potato cell. In some cases, a nucleic acid of the present disclosure comprises a Cas12L polypeptide-encoding nucleotide sequence that is codon optimized for expression in an algae cell.
  • the present disclosure provides one or more recombinant expression vectors that include (in different recombinant expression vectors in some cases, and in the same recombinant expression vector in some cases): (i) a nucleotide sequence of a donor template nucleic acid (where the donor template comprises a nucleotide sequence having homology to a target sequence of a target nucleic acid (e.g., a target genome)); (ii) a nucleotide sequence that encodes a Casl2L guide RNA that hybridizes to a target sequence of the target locus of the targeted genome (e.g., operably linked to a promoter that is operable in a target cell such as a eukaryotic cell); and (iii) a nucleotide sequence encoding a Casl2L protein (e.g., operably linked to a promoter that is operable in a target cell such as a eukaryotic cell).
  • the present disclosure provides one or more recombinant expression vectors that include (in different recombinant expression vectors in some cases, and in the same recombinant expression vector in some cases): (i) a nucleotide sequence of a donor template nucleic acid (where the donor template comprises a nucleotide sequence having homology to a target sequence of a target nucleic acid (e.g., a target genome)); and (ii) a nucleotide sequence that encodes a Casl2L guide RNA that hybridizes to a target sequence of the target locus of the targeted genome (e.g., operably linked to a promoter that is operable in a target cell such as a eukaryotic cell).
  • the present disclosure provides one or more recombinant expression vectors that include (in different recombinant expression vectors in some cases, and in the same recombinant expression vector in some cases): (i) a nucleotide sequence that encodes a Cas12L guide RNA that hybridizes to a target sequence of the target locus of the targeted genome (e.g., operably linked to a promoter that is operable in a target cell such as a eukaryotic cell); and (ii) a nucleotide sequence encoding a Cas12L protein (e.g., operably linked to a promoter that is operable in a target cell such as a eukaryotic cell).
  • Suitable expression vectors include viral expression vectors (e.g. viral vectors based on vaccinia virus; poliovirus; adenovirus (see, e.g., Li et al., Invest Ophthalmol Vis Sci 35:2543-2549, 1994; Borras et al., Gene Ther 6:515-524, 1999; Li and Davidson, PNAS 92:7700-7704, 1995; Sakamoto et al., Hum Gene Ther 5:1088-1097, 1999; WO 94/12649, WO 93/03769; WO 93/19191; WO 94/28938; WO 95/11984 and WO 95/00655); adeno-associated virus (AAV) (see, e.g., Ali et al., Hum Gene Ther 9:81-86, 1998, Flannery et al., PNAS 94:6916-6921, 1997; Bennett et al., Invest Ophthalmol Vis
  • SV40; herpes simplex virus
  • human immunodeficiency virus (see, e.g., Miyoshi et al., PNAS 94:10319-23, 1997; Takahashi et al., J Virol 73:7812-7816, 1999)
  • a retroviral vector (e.g., Murine Leukemia Virus, spleen necrosis virus, and vectors derived from retroviruses such as Rous Sarcoma Virus, Harvey Sarcoma Virus, avian leukosis virus, a lentivirus, human immunodeficiency virus, myeloproliferative sarcoma virus, and mammary tumor virus)
  • a recombinant expression vector of the present disclosure is a recombinant adeno-associated virus (AAV) vector.
  • a recombinant expression vector of the present disclosure is a recombinant lentivirus vector.
  • a recombinant expression vector of the present disclosure is a recombinant retroviral vector.
  • viral vectors based on Tobamoviruses, Potexviruses, Potyviruses, Tobraviruses, Tombusviruses, Geminiviruses, Bromoviruses, Carmoviruses, Alfamoviruses, or Cucumoviruses can be used. See, e.g., Peyret and Lomonossoff (2015) Plant Biotechnol. J. 13:1121.
  • Suitable Tobamovirus vectors include, for example, a tomato mosaic virus (ToMV) vector, a tobacco mosaic virus (TMV) vector, a tobacco mild green mosaic virus (TMGMV) vector, a pepper mild mottle virus (PMMoV) vector, a paprika mild mottle virus (PaMMV) vector, a cucumber green mottle mosaic virus (CGMMV) vector, a kyuri green mottle mosaic virus (KGMMV) vector, a hibiscus latent fort pierce virus (HLFPV) vector, an odontoglossum ringspot virus (ORSV) vector, a rehmannia mosaic virus (ReMV) vector, a Sammon's opuntia virus (SOV) vector, a wasabi mottle virus (WMoV) vector, a youcai mosaic virus (YoMV) vector, a sunn-hemp mosaic virus (SHMV) vector, and the like.
  • Suitable Potexvirus vectors include, for example, a potato virus X (PVX) vector, a potato aucuba mosaic virus (PAMV) vector, an Alstroemeria virus X (AlsVX) vector, a cactus virus X (CVX) vector, a Cymbidium mosaic virus (CymMV) vector, a hosta virus X (HVX) vector, a lily virus X (LVX) vector, a Narcissus mosaic virus (NMV) vector, a Nerine virus X (NVX) vector, a Plantago asiatica mosaic virus (PlAMV) vector, a strawberry mild yellow edge virus (SMYEV) vector, a tulip virus X (TVX) vector, a white clover mosaic virus (WClMV) vector, a bamboo mosaic virus (BaMV) vector, and the like.
  • Suitable Potyvirus vectors include, for example, a potato virus Y (PVY) vector, a bean common mosaic virus (BCMV) vector, a clover yellow vein virus (ClYVV) vector, an East Asian Passiflora virus (EAPV) vector, a Freesia mosaic virus (FreMV) vector, a Japanese yam mosaic virus (JYMV) vector, a lettuce mosaic virus (LMV) vector, a Maize dwarf mosaic virus (MDMV) vector, an onion yellow dwarf virus (OYDV) vector, a papaya ringspot virus (PRSV) vector, a pepper mottle virus (PepMoV) vector, a Perilla mottle virus (PerMoV) vector, a plum pox virus (PPV) vector, a potato virus A (PVA) vector, a sorghum mosaic virus (SrMV) vector, a soybean mosaic virus (SMV) vector, a sugarcane mosaic virus (SCMV) vector, a tulip mosaic virus (TulMV
  • Suitable Tobravirus vectors include, for example, a tobacco rattle virus (TRV) vector and the like.
  • Suitable Tombusvirus vectors include, for example, a tomato bushy stunt virus (TBSV) vector, an eggplant mottled crinkle virus (EMCV) vector, a grapevine Algerian latent virus (GALV) vector, and the like.
  • Suitable Cucumovirus vectors include, for example, a cucumber mosaic virus (CMV) vector, a peanut stunt virus (PSV) vector, a tomato aspermy virus (TAV) vector, and the like.
  • Suitable Bromovirus vectors include, for example, a brome mosaic virus (BMV) vector, a cowpea chlorotic mottle virus (CCMV) vector, and the like.
  • Suitable Carmovirus vectors include, for example, a carnation mottle virus (CarMV) vector, a melon necrotic spot virus (MNSV) vector, a pea stem necrotic virus (PSNV) vector, a turnip crinkle virus (TCV) vector, and the like.
  • Suitable Alfamovirus vectors include, for example, an alfalfa mosaic virus (AMV) vector, and the like.
  • any of a number of suitable transcription and translation control elements including constitutive and inducible promoters, transcription enhancer elements, transcription terminators, etc. may be used in the expression vector.
  • a nucleotide sequence encoding a Cas12L guide RNA is operably linked to a control element, e.g., a transcriptional control element, such as a promoter.
  • a nucleotide sequence encoding a Cas12L protein or a Cas12L fusion polypeptide is operably linked to a control element, e.g., a transcriptional control element, such as a promoter.
  • the transcriptional control element can be a promoter.
  • the promoter is a constitutively active promoter.
  • the promoter is a regulatable promoter.
  • the promoter is an inducible promoter.
  • the promoter is a tissue-specific promoter.
  • the promoter is a cell type-specific promoter.
  • the transcriptional control element is functional in a targeted cell type or targeted cell population.
  • the transcriptional control element can be functional in eukaryotic cells, e.g., hematopoietic stem cells (e.g., mobilized peripheral blood (mPB) CD34(+) cell, bone marrow (BM) CD34(+) cell, etc.).
  • eukaryotic promoters include EF1α, those from cytomegalovirus (CMV) immediate early, herpes simplex virus (HSV) thymidine kinase, early and late SV40, long terminal repeats (LTRs) from retrovirus, and mouse metallothionein-I.
  • the expression vector may also contain a ribosome binding site for translation initiation and a transcription terminator.
  • the expression vector may also include appropriate sequences for amplifying expression.
  • the expression vector may also include nucleotide sequences encoding protein tags (e.g., 6xHis tag, hemagglutinin tag, fluorescent protein, etc.) that can be fused to the Cas12L protein, thus resulting in a fusion Cas12L polypeptide.
  • a nucleotide sequence encoding a Cas12L guide RNA and/or a Cas12L fusion polypeptide is operably linked to an inducible promoter. In some embodiments, a nucleotide sequence encoding a Cas12L guide RNA and/or a Cas12L fusion protein is operably linked to a constitutive promoter.
  • a promoter can be a constitutively active promoter (i.e., a promoter that is constitutively in an active/“ON” state), it may be an inducible promoter (i.e., a promoter whose state, active/“ON” or inactive/“OFF”, is controlled by an external stimulus, e.g., the presence of a particular temperature, compound, or protein), it may be a spatially restricted promoter (i.e., transcriptional control element, enhancer, etc.) (e.g., tissue specific promoter, cell type specific promoter, etc.), and it may be a temporally restricted promoter (i.e., the promoter is in the “ON” state or “OFF” state during specific stages of embryonic development or during specific stages of a biological process, e.g., hair follicle cycle in mice).
  • Suitable promoters can be derived from viruses and can therefore be referred to as viral promoters, or they can be derived from any organism, including prokaryotic or eukaryotic organisms. Suitable promoters can be used to drive expression by any RNA polymerase (e.g., pol I, pol II, pol III).
  • Exemplary promoters include, but are not limited to, the SV40 early promoter; mouse mammary tumor virus long terminal repeat (LTR) promoter; adenovirus major late promoter (Ad MLP); a herpes simplex virus (HSV) promoter; a cytomegalovirus (CMV) promoter such as the CMV immediate early promoter region (CMVIE); a Rous sarcoma virus (RSV) promoter; a human U6 small nuclear promoter (U6) (Miyagishi et al., Nature Biotechnology 20, 497-500 (2002)); an enhanced U6 promoter (e.g., Xia et al., Nucleic Acids Res. 2003 Sep 1;31(17)); a human H1 promoter (H1); and the like.
  • a nucleotide sequence encoding a Cas12L guide RNA is operably linked to (under the control of) a promoter operable in a eukaryotic cell (e.g., a U6 promoter, an enhanced U6 promoter, an H1 promoter, and the like).
  • a nucleotide sequence encoding a Cas12L protein (e.g., a wild type Cas12L protein, a nickase Cas12L protein, a dCas12L protein, a fusion Cas12L protein and the like) is operably linked to a promoter operable in a eukaryotic cell (e.g., a CMV promoter, an EF1α promoter, an estrogen receptor-regulated promoter, and the like).
  • inducible promoters include, but are not limited to, T7 RNA polymerase promoter, T3 RNA polymerase promoter, isopropyl-beta-D-thiogalactopyranoside (IPTG)-regulated promoter, lactose-induced promoter, heat shock promoter, tetracycline-regulated promoter, steroid-regulated promoter, metal-regulated promoter, estrogen receptor-regulated promoter, etc.
  • Inducible promoters can therefore be regulated by molecules including, but not limited to, doxycycline; estrogen and/or an estrogen analog; IPTG; etc.
  • inducible promoters suitable for use include any inducible promoter described herein or known to one of ordinary skill in the art.
  • inducible promoters include, without limitation, chemically/biochemically-regulated and physically-regulated promoters such as alcohol-regulated promoters, tetracycline-regulated promoters (e.g., anhydrotetracycline (aTc)-responsive promoters and other tetracycline-responsive promoter systems, which include a tetracycline repressor protein (tetR), a tetracycline operator sequence (tetO) and a tetracycline transactivator fusion protein (tTA)), steroid-regulated promoters (e.g., promoters based on the rat glucocorticoid receptor, human estrogen receptor, moth ecdysone receptors, and promoters from the steroid/retinoid/thyroid receptor superfamily), metal-regulated promoter
  • the promoter is a spatially restricted promoter (i.e., cell type specific promoter, tissue specific promoter, etc.) such that in a multi-cellular organism, the promoter is active (i.e., “ON”) in a subset of specific cells.
  • Spatially restricted promoters may also be referred to as enhancers, transcriptional control elements, control sequences, etc. Any convenient spatially restricted promoter may be used as long as the promoter is functional in the targeted host cell (e.g., eukaryotic cell; prokaryotic cell).
  • the promoter is a reversible promoter. Suitable reversible promoters, including reversible inducible promoters, are known in the art.
  • Such reversible promoters may be isolated and derived from many organisms, e.g., eukaryotes and prokaryotes. Modification of reversible promoters derived from a first organism for use in a second organism, e.g., a first prokaryote and a second eukaryote, a first eukaryote and a second prokaryote, etc., is well known in the art.
  • Such reversible promoters, and systems based on such reversible promoters but also comprising additional control proteins, include, but are not limited to, alcohol regulated promoters (e.g., alcohol dehydrogenase I (alcA) gene promoter, promoters responsive to alcohol transactivator proteins (AlcR), etc.), tetracycline regulated promoters (e.g., promoter systems including Tet Activators, TetON, TetOFF, etc.), steroid regulated promoters (e.g., rat glucocorticoid receptor promoter systems, human estrogen receptor promoter systems, retinoid promoter systems, thyroid promoter systems, ecdysone promoter systems, mifepristone promoter systems, etc.), metal regulated promoters (e.g., metallothionein promoter systems, etc.), pathogenesis-related regulated promoters (e.g., salicylic acid regulated promoters, ethylene regulated promoter
  • RNA polymerase III (Pol III) promoters can be used to drive the expression of non-protein coding RNA molecules (e.g., guide RNAs).
  • a suitable promoter is a Pol III promoter.
  • a Pol III promoter is operably linked to a nucleotide sequence encoding a guide RNA (gRNA).
  • a Pol III promoter is operably linked to a nucleotide sequence encoding a single-guide RNA (sgRNA).
  • a Pol III promoter is operably linked to a nucleotide sequence encoding a CRISPR RNA (crRNA).
  • a Pol III promoter is operably linked to a nucleotide sequence encoding a tracrRNA.
  • Non-limiting examples of Pol III promoters include a U6 promoter, an H1 promoter, a 5S promoter, an Adenovirus 2 (Ad2) VAI promoter, a tRNA promoter, and a 7SK promoter. See, for example, Schramm and Hernandez (2002) Genes & Development 16:2593-2620.
  • a Pol III promoter is selected from the group consisting of a U6 promoter, an H1 promoter, a 5S promoter, an Adenovirus 2 (Ad2) VAI promoter, a tRNA promoter, and a 7SK promoter.
  • a guide RNA-encoding nucleotide sequence is operably linked to a promoter selected from the group consisting of a U6 promoter, an H1 promoter, a 5S promoter, an Adenovirus 2 (Ad2) VAI promoter, a tRNA promoter, and a 7SK promoter.
  • a single-guide RNA-encoding nucleotide sequence is operably linked to a promoter selected from the group consisting of a U6 promoter, an H1 promoter, a 5S promoter, an Adenovirus 2 (Ad2) VAI promoter, a tRNA promoter, and a 7SK promoter.
  • Examples describing a promoter that can be used herein in connection with expression in plants, plant tissues, and plant cells include, but are not limited to, promoters described in: U.S. Pat. No. 6,437,217 (maize RS81 promoter), U.S. Pat. No. 5,641,876 (rice actin promoter), U.S. Pat. No. 6,426,446 (maize RS324 promoter), U.S. Pat. No. 6,429,362 (maize PR-1 promoter), U.S. Pat. No. 6,232,526 (maize A3 promoter), U.S. Pat. No. 6,177,611 (constitutive maize promoters), U.S. Pat. Nos.
  • Suitable methods include, e.g., viral infection, transfection, lipofection, electroporation, calcium phosphate precipitation, polyethyleneimine (PEI)-mediated transfection, DEAE-dextran mediated transfection, liposome-mediated transfection, particle gun technology, direct microinjection, nanoparticle-mediated nucleic acid delivery, and the like.
  • introducing the recombinant expression vector into cells can occur in any culture media and under any culture conditions that promote the survival of the cells. Introducing the recombinant expression vector into a target cell can be carried out in vivo or ex vivo. Introducing the recombinant expression vector into a target cell can be carried out in vitro.
  • a Cas12L protein can be provided as RNA.
  • the RNA can be provided by direct chemical synthesis or may be transcribed in vitro from a DNA (e.g., encoding the Cas12L protein). Once synthesized, the RNA may be introduced into a cell by any of the well-known techniques for introducing nucleic acids into cells (e.g., microinjection, electroporation, transfection, etc.).
  • Nucleic acids may be provided to the cells using well-developed transfection techniques; see, e.g. Angel and Yanik (2010) PLoS ONE 5(7): e11756, and the commercially available TransMessenger® reagents from Qiagen, Stemfect™ RNA Transfection Kit from Stemgent, and TransIT®-mRNA Transfection Kit from Mirus Bio LLC. See also Beumer et al. (2008) PNAS 105(50): 19821-19826.
  • Vectors may be provided directly to a target host cell.
  • the cells are contacted with vectors comprising the subject nucleic acids (e.g., recombinant expression vectors having the donor template sequence and encoding the Cas12L guide RNA; recombinant expression vectors encoding the Cas12L protein; etc.) such that the vectors are taken up by the cells.
  • Methods for contacting cells with nucleic acid vectors that are plasmids, including electroporation, calcium chloride transfection, microinjection, and lipofection, are well known in the art.
  • cells can be contacted with viral particles comprising the subject viral expression vectors.
  • Retroviruses, for example, lentiviruses, are suitable for use in methods of the present disclosure.
  • Commonly used retroviral vectors are “defective”, i.e. unable to produce viral proteins required for productive infection. Rather, replication of the vector requires growth in a packaging cell line.
  • the retroviral nucleic acids comprising the nucleic acid are packaged into viral capsids by a packaging cell line.
  • Different packaging cell lines provide a different envelope protein (ecotropic, amphotropic or xenotropic) to be incorporated into the capsid, this envelope protein determining the specificity of the viral particle for the cells (ecotropic for murine and rat; amphotropic for most mammalian cell types including human, dog and mouse; and xenotropic for most mammalian cell types except murine cells).
  • the appropriate packaging cell line may be used to ensure that the cells are targeted by the packaged viral particles.
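The envelope-to-host-range relationship described above can be read as a small decision rule. The sketch below is only an illustration of that reading: the `TargetCell` type, its fields, and the function name are hypothetical, and because the text says "most" mammalian cell types, the output is a list of candidates to confirm experimentally rather than a guarantee of transduction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetCell:
    """Hypothetical description of the cell to be transduced."""
    species: str
    is_mammalian: bool
    is_murine: bool


def candidate_envelopes(cell: TargetCell) -> list:
    """List envelope classes whose stated host range covers the target cell."""
    candidates = []
    # Ecotropic: murine and rat cells.
    if cell.is_murine or cell.species == "rat":
        candidates.append("ecotropic")
    # Amphotropic: most mammalian cell types, including human, dog and mouse.
    if cell.is_mammalian:
        candidates.append("amphotropic")
    # Xenotropic: most mammalian cell types except murine cells.
    if cell.is_mammalian and not cell.is_murine:
        candidates.append("xenotropic")
    return candidates


if __name__ == "__main__":
    human = TargetCell(species="human", is_mammalian=True, is_murine=False)
    mouse = TargetCell(species="mouse", is_mammalian=True, is_murine=True)
    print(candidate_envelopes(human))  # amphotropic and xenotropic
    print(candidate_envelopes(mouse))  # ecotropic and amphotropic
```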
  • Methods of introducing subject expression vectors into packaging cell lines and of collecting the viral particles that are generated by the packaging lines are well known in the art. Nucleic acids can also be introduced by direct micro-injection (e.g., injection of RNA).
  • Vectors used for providing the nucleic acids encoding Cas12L guide RNA and/or a Cas12L polypeptide to a target host cell can include suitable promoters for driving the expression, that is, transcriptional activation, of the nucleic acid of interest.
  • the nucleic acid of interest will be operably linked to a promoter.
  • This may include ubiquitously acting promoters, for example, the CMV-β-actin promoter, or inducible promoters, such as promoters that are active in particular cell populations or that respond to the presence of drugs such as tetracycline.
  • vectors used for providing a nucleic acid encoding a Cas12L guide RNA and/or a Cas12L protein to a cell may include nucleic acid sequences that encode for selectable markers in the target cells, so as to identify cells that have taken up the Cas12L guide RNA and/or Cas12L protein.
  • a nucleic acid comprising a nucleotide sequence encoding a Cas12L polypeptide, or a Cas12L fusion polypeptide, is in some cases an RNA.
  • a Cas12L fusion protein can be introduced into cells as RNA. Methods of introducing RNA into cells are known in the art and may include, for example, direct injection, transfection, or any other method used for the introduction of DNA.
  • a Cas12L protein may instead be provided to cells as a polypeptide. Such a polypeptide may optionally be fused to a polypeptide domain that increases solubility of the product. The domain may be linked to the polypeptide through a defined protease cleavage site, e.g.
  • the linker may also include one or more flexible sequences, e.g. from 1 to 10 glycine residues.
  • the cleavage of the fusion protein is performed in a buffer that maintains solubility of the product, e.g. in the presence of from 0.5 to 2 M urea, in the presence of polypeptides and/or polynucleotides that increase solubility, and the like.
  • Domains of interest include endosomolytic domains, e.g. influenza HA domain; and other polypeptides that aid in production, e.g. IF2 domain, GST domain, GRPE domain, and the like.
  • the polypeptide may be formulated for improved stability.
  • the peptides may be PEGylated, where the polyethyleneoxy group provides for enhanced lifetime in the blood stream.
  • a Cas12L polypeptide of the present disclosure may be fused to a polypeptide permeant domain to promote uptake by the cell.
  • a number of permeant domains are known in the art and may be used in the non-integrating polypeptides of the present disclosure, including peptides, peptidomimetics, and non-peptide carriers.
  • a permeant peptide may be derived from the third alpha helix of Drosophila melanogaster transcription factor Antennapedia, referred to as penetratin, which comprises the amino acid sequence RQIKIWFQNRRMKWKK (SEQ ID NO:139).
  • the permeant peptide comprises the HIV-1 tat basic region amino acid sequence, which may include, for example, amino acids 49-57 of naturally-occurring tat protein.
  • Other permeant domains include poly-arginine motifs, for example, the region of amino acids 34-56 of HIV-1 rev protein, nona-arginine, octa-arginine, and the like.
  • the nona-arginine (R9) sequence is one of the more efficient protein transduction domains (PTDs) that have been characterized (Wender et al. 2000; Uemura et al. 2002).
  • the site at which the fusion is made may be selected in order to optimize the biological activity, secretion or binding characteristics of the polypeptide. The optimal site will be determined by routine experimentation.
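Because the fusion site is chosen by routine experimentation, one practical starting point is simply to enumerate candidate constructs in silico before testing them. The sketch below does this for the permeant peptides named above; the penetratin sequence is the one given in the text, while the cargo sequence, the function name, and the use of a short glycine linker between domains are assumptions made for illustration only.

```python
# Permeant peptides named in the text. Penetratin is the sequence given above;
# the poly-arginine peptides follow directly from their names.
PERMEANT_PEPTIDES = {
    "penetratin": "RQIKIWFQNRRMKWKK",
    "nona-arginine": "R" * 9,
    "octa-arginine": "R" * 8,
}


def candidate_fusions(cargo: str, linker_glycines: int = 3):
    """Yield (peptide name, fusion site, amino acid sequence) candidates.

    The glycine linker is an assumption of this sketch, not a requirement
    stated for permeant-domain fusions.
    """
    if not 0 <= linker_glycines <= 10:
        raise ValueError("linker_glycines must be between 0 and 10")
    linker = "G" * linker_glycines
    for name, peptide in PERMEANT_PEPTIDES.items():
        yield name, "N-terminal", peptide + linker + cargo
        yield name, "C-terminal", cargo + linker + peptide


if __name__ == "__main__":
    # HYPOTHETICAL placeholder cargo; a real polypeptide sequence would go here.
    placeholder_cargo = "MKTAYIAK"
    for name, site, sequence in candidate_fusions(placeholder_cargo):
        print(f"{name:14s} {site:10s} {sequence}")
```

Each candidate would still need to be expressed and assayed, since the sketch says nothing about activity, secretion, or binding.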
  • the target cell is a plant cell.
  • Numerous methods for transforming chromosomes or plastids in a plant cell with a recombinant nucleic acid are known in the art, which can be used according to methods of the present application to produce a transgenic plant cell and/or a transgenic plant. Any suitable method or technique for transformation of a plant cell known in the art can be used. Effective methods for transformation of plants include bacterially mediated transformation, such as Agrobacterium-mediated or Rhizobium-mediated transformation and microprojectile bombardment-mediated transformation.
  • a variety of methods are known in the art for transforming explants with a transformation vector via bacterially mediated transformation or microprojectile bombardment and then subsequently culturing, etc., those explants to regenerate or develop transgenic plants.
  • Other methods for plant transformation such as microinjection, electroporation, vacuum infiltration, pressure, sonication, silicon carbide fiber agitation, PEG-mediated transformation, etc., are also known in the art.
  • Transgenic plants produced by these transformation methods can be chimeric or non-chimeric for the transformation event depending on the methods and explants used.
  • Methods of transforming plant cells are well known by persons of ordinary skill in the art. For instance, specific instructions for transforming plant cells by microprojectile bombardment with particles coated with recombinant DNA (e.g., biolistic transformation) are found in U.S. Patent Nos. 5,550,318; 5,538,880; 6,160,208; 6,399,861; and 6,153,812, and Agrobacterium-mediated transformation is described in U.S. Patent Nos. 5,159,135; 5,824,877; 5,591,616; 6,384,301; 5,750,871; 5,463,174; and 5,188,958. Additional methods for transforming plants can be found in, for example, Compendium of Transgenic Crop Plants (2009) Blackwell Publishing. Any appropriate method known to those skilled in the art can be used to transform a plant cell with any of the nucleic acids provided herein.
  • a Cas12L polypeptide of the present disclosure may be produced in vitro or by eukaryotic cells or by prokaryotic cells, and it may be further processed by unfolding, e.g. heat denaturation, dithiothreitol reduction, etc. and may be further refolded, using methods known in the art.
  • Modifications of interest that do not alter primary sequence include chemical derivatization of polypeptides, e.g., acylation, acetylation, carboxylation, amidation, etc. Also included are modifications of glycosylation, e.g. those made by modifying the glycosylation patterns of a polypeptide during its synthesis and processing or in further processing steps; e.g. by exposing the polypeptide to enzymes which affect glycosylation, such as mammalian glycosylating or deglycosylating enzymes. Also embraced are sequences that have phosphorylated amino acid residues, e.g. phosphotyrosine, phosphoserine, or phosphothreonine.
  • Analogs of such polypeptides include those containing residues other than naturally occurring L-amino acids, e.g. D-amino acids or non-naturally occurring synthetic amino acids. D-amino acids may be substituted for some or all of the amino acid residues.
  • a Cas12L polypeptide of the present disclosure may be prepared by in vitro synthesis, using conventional methods as known in the art.
  • Various commercial synthetic apparatuses are available, for example, automated synthesizers by Applied Biosystems, Inc., Beckman, etc. By using synthesizers, naturally occurring amino acids may be substituted with unnatural amino acids. The particular sequence and the manner of preparation will be determined by convenience, economics, purity required, and the like.
  • cysteines can be used to make thioethers, histidines for linking to a metal ion complex, carboxyl groups for forming amides or esters, amino groups for forming amides, and the like.
  • a Cas12L polypeptide of the present disclosure may also be isolated and purified in accordance with conventional methods of recombinant synthesis.
  • a lysate may be prepared of the expression host and the lysate purified using high performance liquid chromatography (HPLC), exclusion chromatography, gel electrophoresis, affinity chromatography, or other purification technique.
  • the compositions which are used will comprise 20% or more by weight of the desired product, more usually 75% or more by weight, preferably 95% or more by weight, and for therapeutic purposes, usually 99.5% or more by weight, in relation to contaminants related to the method of preparation of the product and its purification. Usually, the percentages will be based upon total protein.
  • a Cas12L polypeptide, or a Cas12L fusion polypeptide, of the present disclosure is at least 80% pure, at least 85% pure, at least 90% pure, at least 95% pure, at least 98% pure, or at least 99% pure (e.g., free of contaminants, non-Cas12L proteins or other macromolecules, etc.).
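The purity figures above are percentages by weight, usually relative to total protein. As a minimal worked example, the sketch below computes that percentage from two measured masses and reports the highest listed threshold a preparation meets; the function names and the example masses are hypothetical.

```python
# Purity thresholds (percent) listed in the text.
PURITY_THRESHOLDS = (80, 85, 90, 95, 98, 99)


def percent_purity(target_mass_mg: float, total_protein_mass_mg: float) -> float:
    """Purity by weight: mass of the desired polypeptide over total protein mass."""
    if total_protein_mass_mg <= 0:
        raise ValueError("total protein mass must be positive")
    if not 0 <= target_mass_mg <= total_protein_mass_mg:
        raise ValueError("target mass must lie between 0 and the total protein mass")
    return 100.0 * target_mass_mg / total_protein_mass_mg


def highest_threshold_met(purity: float):
    """Return the highest listed threshold met by `purity`, or None if below 80%."""
    met = [t for t in PURITY_THRESHOLDS if purity >= t]
    return max(met) if met else None


if __name__ == "__main__":
    # HYPOTHETICAL measurement: 48 mg of desired polypeptide in 50 mg total protein.
    purity = percent_purity(48, 50)
    print(purity, highest_threshold_met(purity))
```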
  • the Cas12L guide RNA and/or the Cas12L polypeptide of the present disclosure and/or the donor template sequence, whether they be introduced as nucleic acids or polypeptides, are provided to the cells for about 30 minutes to about 24 hours, e.g., 1 hour, 1.5 hours, 2 hours, 2.5 hours, 3 hours, 3.5 hours, 4 hours, 5 hours, 6 hours, 7 hours, 8 hours, 12 hours, 16 hours, 18 hours, 20 hours, or any other period from about 30 minutes to about 24 hours, which may be repeated with a frequency of about every day to about every 4 days, e.g., every 1.5 days, every 2 days, every 3 days, or any other frequency from about every day to about every four days.
  • the agent(s) may be provided to the subject cells one or more times, e.g. one time, twice, three times, or more than three times, and the cells allowed to incubate with the agent(s) for some amount of time following each contacting event e.g. 16-24 hours, after which time the media is replaced with fresh media and the cells are cultured further.
  • the complexes may be provided simultaneously (e.g. as two polypeptides and/or nucleic acids), or delivered simultaneously. Alternatively, they may be provided consecutively, e.g. the targeting complex being provided first, followed by the second targeting complex, etc. or vice versa.
  • the DNA can be protected from damage and its entry into the cell facilitated, for example, by using lipoplexes and polyplexes.
  • a nucleic acid of the present disclosure can be covered with lipids in an organized structure like a micelle or a liposome.
  • the lipids may be anionic (negatively charged), neutral, or cationic (positively charged).
  • Lipoplexes that utilize cationic lipids have proven utility for gene transfer. Cationic lipids, due to their positive charge, naturally complex with the negatively charged DNA. Also as a result of their charge, they interact with the cell membrane.
  • the cationic lipids also protect against degradation of the DNA by the cell.
  • Complexes of polymers with DNA are called polyplexes. Most polyplexes consist of cationic polymers and their production is regulated by ionic interactions.
  • endosome-lytic agents to lyse the endosome that is made during endocytosis
  • polymers such as polyethylenimine have their own method of endosome disruption, as do chitosan and trimethylchitosan.
  • Dendrimers, highly branched macromolecules with a spherical shape, may also be used to genetically modify stem cells.
  • the surface of the dendrimer particle may be functionalized to alter its properties.
  • a cationic dendrimer i.e., one with a positive surface charge.
  • charge complementarity leads to a temporary association of the nucleic acid with the cationic dendrimer.
  • the dendrimer-nucleic acid complex can be taken up into a cell by endocytosis.
  • a nucleic acid of the disclosure includes an insertion site for a guide sequence of interest.
  • a nucleic acid can include an insertion site for a guide sequence of interest, where the insertion site is immediately adjacent to a nucleotide sequence encoding the portion of a Cas12L guide RNA that does not change when the guide sequence is changed to hybridize to a desired target sequence (e.g., sequences that contribute to the Cas12L binding aspect of the guide RNA, e.g., the sequences that contribute to the dsRNA duplex(es) of the Cas12L guide RNA; this portion of the guide RNA can also be referred to as the ‘scaffold’ or ‘constant region’ of the guide RNA).
  • a subject nucleic acid e.g., an expression vector
  • An insertion site is any nucleotide sequence used for the insertion of the desired sequence. “Insertion sites” for use with various technologies are known to those of ordinary skill in the art and any convenient insertion site can be used. An insertion site can be for any method for manipulating nucleic acid sequences.
  • the insertion site is a multiple cloning site (MCS) (e.g., a site including one or more restriction enzyme recognition sequences), a site for ligation independent cloning, a site for recombination based cloning (e.g., recombination based on att sites), a nucleotide sequence recognized by a CRISPR/Cas (e.g. Cas9) based technology, and the like.
  • MCS multiple cloning site
  • Cas CRISPR/Cas
  • An insertion site can be any desirable length, and the length can depend on the type of insertion site (e.g., on whether the site includes one or more restriction enzyme recognition sequences, and how many; on whether the site includes a target site for a CRISPR/Cas protein; etc.).
  • an insertion site of a subject nucleic acid is 3 or more nucleotides (nt) in length (e.g., 5 or more, 8 or more, 10 or more, 15 or more, 17 or more, 18 or more, 19 or more, 20 or more or 25 or more, or 30 or more nt in length).
  • an insertion site of a subject nucleic acid has a length in a range of from 2 to 50 nucleotides (nt) (e.g., from 2 to 40 nt, from 2 to 30 nt, from 2 to 25 nt, from 2 to 20 nt, from 5 to 50 nt, from 5 to 40 nt, from 5 to 30 nt, from 5 to 25 nt, from 5 to 20 nt, from 10 to 50 nt, from 10 to 40 nt, from 10 to 30 nt, from 10 to 25 nt, from 10 to 20 nt, from 17 to 50 nt, from 17 to 40 nt, from 17 to 30 nt, from 17 to 25 nt). In some cases, an insertion site of a subject nucleic acid has a length in a range of from 5 to 40 nt.
  • nt nucleotides
  • a subject nucleic acid e.g., a Cas12L guide RNA
  • has one or more modifications e.g., a base modification, a backbone modification, etc., to provide the nucleic acid with a new or enhanced feature (e.g., improved stability).
  • a nucleoside is a base-sugar combination. The base portion of the nucleoside is normally a heterocyclic base. The two most common classes of such heterocyclic bases are the purines and the pyrimidines.
  • Nucleotides are nucleosides that further include a phosphate group covalently linked to the sugar portion of the nucleoside.
  • the phosphate group can be linked to the 2', the 3', or the 5' hydroxyl moiety of the sugar.
  • the phosphate groups covalently link adjacent nucleosides to one another to form a linear polymeric compound.
  • the respective ends of this linear polymeric compound can be further joined to form a circular compound; however, linear compounds are suitable.
  • linear compounds may have internal nucleotide base complementarity and may therefore fold in a manner as to produce a fully or partially double-stranded compound.
  • the phosphate groups are commonly referred to as forming the internucleoside backbone of the oligonucleotide.
  • the normal linkage or backbone of RNA and DNA is a 3' to 5' phosphodiester linkage.
  • Suitable nucleic acid modifications include, but are not limited to: 2'-O-methyl modified nucleotides, 2' Fluoro modified nucleotides, locked nucleic acid (LNA) modified nucleotides, peptide nucleic acid (PNA) modified nucleotides, nucleotides with phosphorothioate linkages, and a 5' cap (e.g., a 7-methylguanylate cap (m7G)). Additional details and additional modifications are described below.
  • LNA locked nucleic acid
  • PNA peptide nucleic acid
  • a 2'-O-Methyl modified nucleotide (also referred to as 2'-O-Methyl RNA) is a naturally occurring modification of RNA found in tRNA and other small RNAs that arises as a post-transcriptional modification. Oligonucleotides can be directly synthesized that contain 2'-O-Methyl RNA. This modification increases Tm of RNA:RNA duplexes but results in only small changes in RNA:DNA stability. It is stable with respect to attack by single-stranded ribonucleases and is typically 5 to 10-fold less susceptible to DNases than DNA. It is commonly used in antisense oligos as a means to increase stability and binding affinity to the target message.
  • Fluoro modified nucleotides e.g., 2' Fluoro bases
  • Tm binding affinity
  • 2' Fluoro bases have a fluorine modified ribose which increases binding affinity (Tm) and also confers some relative nuclease resistance when compared to native RNA. These modifications can improve stability in serum or other biological fluids.
  • LNA bases have a modification to the ribose backbone that locks the base in the C3'-endo position, which favors RNA A-type helix duplex geometry. This modification significantly increases Tm and is also very nuclease resistant. Multiple LNA insertions can be placed in an oligo at any position except the 3'-end. Applications have been described ranging from antisense oligos to hybridization probes to SNP detection and allele specific PCR. Due to the large increase in Tm conferred by LNAs, they also can cause an increase in primer dimer formation as well as self-hairpin formation. In some cases, the number of LNAs incorporated into a single oligo is 10 bases or less.
  • the phosphorothioate (PS) bond (i.e., a phosphorothioate linkage) substitutes a sulfur atom for a non-bridging oxygen in the phosphate backbone of a nucleic acid (e.g., an oligo). This modification renders the internucleotide linkage resistant to nuclease degradation.
  • Phosphorothioate bonds can be introduced between the last 3-5 nucleotides at the 5'- or 3'-end of the oligo to inhibit exonuclease degradation. Including phosphorothioate bonds within the oligo (e.g., throughout the entire oligo) can help reduce attack by endonucleases as well.
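The end-protection pattern described above can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from the disclosure: the function name `add_ps_ends`, the parameter `n_links` (read here as the number of linkages modified at each end), and the use of `*` to mark a phosphorothioate linkage are all assumptions made for the example.

```python
def add_ps_ends(seq: str, n_links: int = 3) -> str:
    """Return seq with '*' marking phosphorothioate linkages at both ends.

    Hypothetical helper: '*' after a base means the linkage to the next
    base is a phosphorothioate bond. n_links linkages are marked at the
    5' end and n_links at the 3' end.
    """
    if n_links < 0:
        raise ValueError("n_links must be non-negative")
    last_link = len(seq) - 2  # index of the final internucleotide linkage
    out = []
    for i, base in enumerate(seq):
        out.append(base)
        if i > last_link:
            continue  # no linkage follows the 3'-terminal nucleotide
        if i < n_links or i > last_link - n_links:
            out.append("*")
    return "".join(out)


if __name__ == "__main__":
    print(add_ps_ends("ACGUACGUAC", 3))
```

Passing a value of `n_links` at least as large as the number of linkages marks every linkage, which corresponds to the fully phosphorothioated case mentioned above.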
  • a subject nucleic acid has one or more nucleotides that are 2'-O-Methyl modified nucleotides. In some embodiments, a subject nucleic acid has one or more 2’ Fluoro modified nucleotides. In some embodiments, a subject nucleic acid has one or more LNA bases. In some embodiments, a subject nucleic acid has one or more nucleotides that are linked by a phosphorothioate bond (i.e., the subject nucleic acid has one or more phosphorothioate linkages). In some embodiments, a subject nucleic acid has a 5’ cap (e.g., a 7-methylguanylate cap (m7G)).
  • m7G 7-methylguanylate cap
  • a subject nucleic acid has a combination of modified nucleotides.
  • a subject nucleic acid can have a 5’ cap (e.g., a 7-methylguanylate cap (m7G)) in addition to having one or more nucleotides with other modifications (e.g., a 2'-O-Methyl nucleotide and/or a 2’ Fluoro modified nucleotide and/or a LNA base and/or a phosphorothioate linkage).
  • nucleic acids e.g., a Cas12L guide RNA
  • suitable nucleic acids include nucleic acids containing modified backbones or non-natural internucleoside linkages.
  • Nucleic acids having modified backbones include those that retain a phosphorus atom in the backbone and those that do not have a phosphorus atom in the backbone.
  • Suitable modified oligonucleotide backbones containing a phosphorus atom therein include, for example, phosphorothioates, chiral phosphorothioates, phosphorodithioates, phosphotriesters, aminoalkylphosphotriesters, methyl and other alkyl phosphonates including 3'-alkylene phosphonates, 5'-alkylene phosphonates and chiral phosphonates, phosphinates, phosphoramidates including 3'-amino phosphoramidate and aminoalkylphosphoramidates, phosphorodiamidates, thionophosphoramidates, thionoalkylphosphonates, thionoalkylphosphotriesters, selenophosphates and boranophosphates having normal 3'-5' linkages, 2'-5' linked analogs of these, and those having inverted polarity wherein one or more internucleotide linkages is a 3' to 3', 5'
  • Suitable oligonucleotides having inverted polarity comprise a single 3' to 3' linkage at the 3'-most internucleotide linkage, i.e., a single inverted nucleoside residue which may be abasic (the nucleobase is missing or has a hydroxyl group in place thereof).
  • Various salts (such as, for example, potassium or sodium), mixed salts and free acid forms are also included.
  • MMI type internucleoside linkages are disclosed in the above referenced U.S.
  • nucleic acids having morpholino backbone structures as described in, e.g., U.S. Pat. No. 5,034,506.
  • a subject nucleic acid comprises a 6-membered morpholino ring in place of a ribose ring.
  • a phosphorodiamidate or other non-phosphodiester internucleoside linkage replaces a phosphodiester linkage.
  • Suitable modified polynucleotide backbones that do not include a phosphorus atom therein have backbones that are formed by short chain alkyl or cycloalkyl internucleoside linkages, mixed heteroatom and alkyl or cycloalkyl internucleoside linkages, or one or more short chain heteroatomic or heterocyclic internucleoside linkages.
  • morpholino linkages formed in part from the sugar portion of a nucleoside
  • siloxane backbones; sulfide, sulfoxide and sulfone backbones
  • formacetyl and thioformacetyl backbones; methylene formacetyl and thioformacetyl backbones
  • riboacetyl backbones; alkene containing backbones; sulfamate backbones; methyleneimino and methylenehydrazino backbones; sulfonate and sulfonamide backbones; amide backbones; and others having mixed N, O, S and CH2 component parts.
  • Mimetics formed in part from the sugar portion of a nucleoside
  • a subject nucleic acid can be a nucleic acid mimetic.
  • mimetic as it is applied to polynucleotides is intended to include polynucleotides wherein only the furanose ring or both the furanose ring and the internucleotide linkage are replaced with non-furanose groups; replacement of only the furanose ring is also referred to in the art as being a sugar surrogate.
  • the heterocyclic base moiety or a modified heterocyclic base moiety is maintained for hybridization with an appropriate target nucleic acid.
  • PNA peptide nucleic acid
  • the sugar-backbone of a polynucleotide is replaced with an amide containing backbone, in particular an aminoethylglycine backbone.
  • the nucleotides are retained and are bound directly or indirectly to aza nitrogen atoms of the amide portion of the backbone.
  • the backbone in PNA compounds is two or more linked aminoethylglycine units which gives PNA an amide containing backbone.
  • the heterocyclic base moieties are bound directly or indirectly to aza nitrogen atoms of the amide portion of the backbone.
  • Another class of polynucleotide mimetic that has been studied is based on linked morpholino units (morpholino nucleic acid) having heterocyclic bases attached to the morpholino ring.
  • a number of linking groups have been reported that link the morpholino monomeric units in a morpholino nucleic acid.
  • One class of linking groups has been selected to give a non-ionic oligomeric compound.
  • the nonionic morpholino-based oligomeric compounds are less likely to have undesired interactions with cellular proteins.
  • Morpholino-based polynucleotides are non-ionic mimics of oligonucleotides which are less likely to form undesired interactions with cellular proteins (Dwaine A.
  • Morpholino-based polynucleotides are disclosed in U.S. Pat. No. 5,034,506, the disclosure of which is incorporated herein by reference in its entirety. A variety of compounds within the morpholino class of polynucleotides have been prepared, having a variety of different linking groups joining the monomeric subunits.
  • CeNA cyclohexenyl nucleic acids
  • the furanose ring normally present in a DNA/RNA molecule is replaced with a cyclohexenyl ring.
  • CeNA DMT protected phosphoramidite monomers have been prepared and used for oligomeric compound synthesis following classical phosphoramidite chemistry.
  • Fully modified CeNA oligomeric compounds and oligonucleotides having specific positions modified with CeNA have been prepared and studied (see Wang et al., J. Am. Chem. Soc., 2000, 122, 8595-8602, the disclosure of which is incorporated herein by reference in its entirety).
  • In general, the incorporation of CeNA monomers into a DNA chain increases the stability of a DNA/RNA hybrid. CeNA oligoadenylates formed complexes with RNA and DNA complements with similar stability to the native complexes.
  • the study of incorporating CeNA structures into natural nucleic acid structures was shown by NMR and circular dichroism to proceed with easy conformational adaptation.
  • a further modification includes Locked Nucleic Acids (LNAs) in which the 2'-hydroxyl group is linked to the 4' carbon atom of the sugar ring thereby forming a 2'-C,4'-C-oxymethylene linkage thereby forming a bicyclic sugar moiety.
  • the linkage can be a methylene (-CH2-)n group bridging the 2' oxygen atom and the 4' carbon atom wherein n is 1 or 2 (Singh et al., Chem. Commun., 1998, 4, 455-456, the disclosure of which is incorporated herein by reference in its entirety).
  • Potent and nontoxic antisense oligonucleotides containing LNAs have been described (e.g., Wahlestedt et al., Proc. Natl. Acad. Sci. U.S.A., 2000, 97, 5633-5638, the disclosure of which is incorporated herein by reference in its entirety).
  • LNAs and preparation thereof are also described in WO 98/39352 and WO 99/14226, as well as U.S. applications 20120165514, 20100216983, 20090041809, 20060117410, 20040014959, 20020094555, and 20020086998, the disclosures of which are incorporated herein by reference in their entirety.
  • a subject nucleic acid can also include one or more substituted sugar moieties.
  • Suitable polynucleotides comprise a sugar substituent group selected from: OH; F; O-, S-, or N-alkyl; O-, S-, or N-alkenyl; O-, S- or N-alkynyl; or O-alkyl-O-alkyl, wherein the alkyl, alkenyl and alkynyl may be substituted or unsubstituted C1 to C10 alkyl or C2 to C10 alkenyl and alkynyl.
  • Suitable polynucleotides comprise a sugar substituent group selected from: C1 to C10 lower alkyl, substituted lower alkyl, alkenyl, alkynyl, alkaryl, aralkyl, O-alkaryl or O-aralkyl, SH, SCH3, OCN, Cl, Br, CN, CF3, OCF3, SOCH3, SO2CH3, ONO2, NO2, N3, NH2, heterocycloalkyl, heterocycloalkaryl, aminoalkylamino, polyalkylamino, substituted silyl, an RNA cleaving group, a reporter group, an intercalator, a group for improving the pharmacokinetic properties of an oligonucleotide, or a group for improving the pharmacodynamic properties of an oligonucleotide, and other substituents having similar properties.
  • a suitable modification includes 2'-methoxyethoxy (2'-O-CH2CH2OCH3, also known as 2'-O-(2-methoxyethyl) or 2'-MOE) (Martin et al., Helv. Chim. Acta, 1995, 78, 486-504, the disclosure of which is incorporated herein by reference in its entirety), i.e., an alkoxyalkoxy group.
  • a further suitable modification includes 2'-dimethylaminooxyethoxy, i.e., a O(CH2)2ON(CH3)2 group, also known as 2'-DMAOE, as described in examples hereinbelow, and 2'-dimethylaminoethoxyethoxy (also known in the art as 2'-O-dimethyl-amino-ethoxy-ethyl or 2'-DMAEOE), i.e., 2'-O-CH2-O-CH2-N(CH3)2.
  • Suitable sugar substituent groups include methoxy (-O-CH3), aminopropoxy (-O-CH2CH2CH2NH2), allyl (-CH2-CH=CH2), -O-allyl (-O-CH2-CH=CH2) and fluoro (F).
  • 2'-sugar substituent groups may be in the arabino (up) position or ribo (down) position.
  • a suitable 2'-arabino modification is 2'-F.
  • Similar modifications may also be made at other positions on the oligomeric compound, particularly the 3' position of the sugar on the 3' terminal nucleoside or in 2'-5' linked oligonucleotides and the 5' position of 5' terminal nucleotide.
  • Oligomeric compounds may also have sugar mimetics such as cyclobutyl moieties in place of the pentofuranosyl sugar.
  • a subject nucleic acid may also include nucleobase (often referred to in the art simply as “base”) modifications or substitutions.
  • nucleobases include the purine bases adenine (A) and guanine (G), and the pyrimidine bases thymine (T), cytosine (C) and uracil (U).
  • Modified nucleobases include other synthetic and natural nucleobases such as 5-methylcytosine (5-me-C), 5-hydroxymethyl cytosine, xanthine, hypoxanthine, 2-aminoadenine, 6-methyl and other alkyl derivatives of adenine and guanine, 2-propyl and other alkyl derivatives of adenine and guanine, 2-thiouracil, 2-thiothymine and 2-thiocytosine, 5-halouracil and cytosine, 5-propynyl (-C≡C-CH3) uracil and cytosine and other alkynyl derivatives of pyrimidine bases, 6-azo uracil, cytosine and thymine, 5-uracil (pseudouracil), 4-thiouracil, 8-halo, 8-amino, 8-thiol, 8-thioalkyl, 8-hydroxyl and other 8-substituted adenines and
  • nucleobases include tricyclic pyrimidines such as phenoxazine cytidine (1H-pyrimido(5,4-b)(1,4)benzoxazin-2(3H)-one), phenothiazine cytidine (1H-pyrimido(5,4-b)(1,4)benzothiazin-2(3H)-one), G-clamps such as a substituted phenoxazine cytidine (e.g.
  • Heterocyclic base moieties may also include those in which the purine or pyrimidine base is replaced with other heterocycles, for example 7-deaza-adenine, 7-deazaguanosine, 2-aminopyridine and 2-pyridone.
  • Further nucleobases include those disclosed in U.S. Pat. No. 3,687,808, those disclosed in The Concise Encyclopedia Of Polymer Science And Engineering, pages 858-859, Kroschwitz, J. I., ed. John Wiley & Sons, 1990, those disclosed by Englisch et al., Angewandte Chemie, International Edition, 1991, 30, 613, and those disclosed by Sanghvi, Y.
  • nucleobases are useful for increasing the binding affinity of an oligomeric compound.
  • These include 5-substituted pyrimidines, 6-azapyrimidines and N-2, N-6 and O-6 substituted purines, including 2-aminopropyladenine, 5-propynyluracil and 5-propynylcytosine. 5-methylcytosine substitutions have been shown to increase nucleic acid duplex stability by 0.6-1.2° C.
  • Another possible modification of a subject nucleic acid involves chemically linking to the polynucleotide one or more moieties or conjugates which enhance the activity, cellular distribution or cellular uptake of the oligonucleotide.
  • moieties or conjugates can include conjugate groups covalently bound to functional groups such as primary or secondary hydroxyl groups.
  • Conjugate groups include, but are not limited to, intercalators, reporter molecules, polyamines, polyamides, polyethylene glycols, polyethers, groups that enhance the pharmacodynamic properties of oligomers, and groups that enhance the pharmacokinetic properties of oligomers.
  • Suitable conjugate groups include, but are not limited to, cholesterols, lipids, phospholipids, biotin, phenazine, folate, phenanthridine, anthraquinone, acridine, fluoresceins, rhodamines, coumarins, and dyes.
  • Groups that enhance the pharmacodynamic properties include groups that improve uptake, enhance resistance to degradation, and/or strengthen sequence-specific hybridization with the target nucleic acid.
  • Groups that enhance the pharmacokinetic properties include groups that improve uptake, distribution, metabolism or excretion of a subject nucleic acid.
  • Conjugate moieties include but are not limited to lipid moieties such as a cholesterol moiety (Letsinger et al., Proc. Natl. Acad. Sci. USA, 1989, 86, 6553-6556), cholic acid (Manoharan et al., Bioorg. Med. Chem. Let., 1994, 4, 1053-1060), a thioether, e.g., hexyl-S-tritylthiol (Manoharan et al., Ann. N. Y. Acad. Sci., 1992, 660, 306-309; Manoharan et al., Bioorg. Med. Chem.
  • Acids Res., 1990, 18, 3777-3783 a polyamine or a polyethylene glycol chain (Manoharan et al., Nucleosides & Nucleotides, 1995, 14, 969-973), or adamantane acetic acid (Manoharan et al., Tetrahedron Lett., 1995, 36, 3651-3654), a palmityl moiety (Mishra et al., Biochim. Biophys. Acta, 1995, 1264, 229-237), or an octadecylamine or hexylamino-carbonyl-oxycholesterol moiety (Crooke et al., J. Pharmacol. Exp. Ther., 1996, 277, 923-937).
  • a conjugate may include a "Protein Transduction Domain” or PTD (also known as a CPP - cell penetrating peptide), which may refer to a polypeptide, polynucleotide, carbohydrate, or organic or inorganic compound that facilitates traversing a lipid bilayer, micelle, cell membrane, organelle membrane, or vesicle membrane.
  • PTD Protein Transduction Domain
  • a PTD attached to another molecule which can range from a small polar molecule to a large macromolecule and/or a nanoparticle, facilitates the molecule traversing a membrane, for example going from extracellular space to intracellular space, or cytosol to within an organelle (e.g., the nucleus).
  • a PTD is covalently linked to the 3’ end of an exogenous polynucleotide. In some cases, a PTD is covalently linked to the 5’ end of an exogenous polynucleotide.
  • Exemplary PTDs include but are not limited to a minimal undecapeptide protein transduction domain (corresponding to residues 47-57 of HIV-1 TAT comprising YGRKKRRQRRR; SEQ ID NO:135); a polyarginine sequence comprising a number of arginines sufficient to direct entry into a cell (e.g., 3, 4, 5, 6, 7, 8, 9, 10, or 10-50 arginines); a VP22 domain (Zender et al. (2002) Cancer Gene Ther.
  • Exemplary PTDs include but are not limited to, YGRKKRRQRRR (SEQ ID NO: 135), RKKRRQRRR (SEQ ID NO: 140); an arginine homopolymer of from 3 arginine residues to 50 arginine residues;
  • Exemplary PTD domain amino acid sequences include, but are not limited to, any of the following: YGRKKRRQRRR (SEQ ID NO:135); RKKRRQRR (SEQ ID NO:141); YARAAARQARA (SEQ ID NO:142); THRLPRRRRRR (SEQ ID NO:143); and GGRRARRRRRR (SEQ ID NO:144).
  • the PTD is an activatable CPP (ACPP) (Aguilera et al. (2009) Integr Biol (Camb) June; 1(5-6): 371-381).
  • ACPPs comprise a polycationic CPP (e.g., Arg9 or “R9”) connected via a cleavable linker to a matching polyanion (e.g., Glu9 or “E9”), which reduces the net charge to nearly zero and thereby inhibits adhesion and uptake into cells.
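The charge-cancellation idea behind an ACPP can be illustrated with a rough side-chain charge count. This is a simplified sketch under stated assumptions: the function name `net_charge` is hypothetical, only Arg/Lys (+1) and Asp/Glu (-1) side chains are counted, and termini, histidine, the linker, and pH-dependent effects are ignored.

```python
# Approximate side-chain charges near neutral pH (simplifying assumption).
SIDE_CHAIN_CHARGE = {"R": 1, "K": 1, "D": -1, "E": -1}


def net_charge(peptide: str) -> int:
    """Sum approximate side-chain charges for a one-letter peptide sequence."""
    return sum(SIDE_CHAIN_CHARGE.get(aa, 0) for aa in peptide.upper())


if __name__ == "__main__":
    r9 = "R" * 9  # polycationic CPP (Arg9)
    e9 = "E" * 9  # matching polyanion (Glu9)
    print("R9 alone:", net_charge(r9))
    print("R9 paired with E9:", net_charge(r9 + e9))
```

Under these assumptions the intact R9/E9 pair is close to neutral, while the R9 portion on its own carries a strongly positive charge once the linker is cleaved.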
  • Sequences of the polynucleotides of the present disclosure may be prepared by various suitable methods known in the art, including, for example, direct chemical synthesis or cloning.
  • formation of a polymer of nucleic acids typically involves sequential addition of 3'-blocked and 5'-blocked nucleotide monomers to the terminal 5'-hydroxyl group of a growing nucleotide chain, wherein each addition is effected by nucleophilic attack of the terminal 5'-hydroxyl group of the growing chain on the 3'-position of the added monomer, which is typically a phosphorus derivative, such as a phosphotriester, phosphoramidite, or the like.
  • the desired sequences may be isolated from natural sources by splitting DNA using appropriate restriction enzymes, separating the fragments using gel electrophoresis, and thereafter, recovering the desired polynucleotide sequence from the gel via techniques known to those of ordinary skill in the art, such as utilization of a polymerase chain reaction (PCR; e.g., U.S. Pat. No. 4,683,195).
  • PCR polymerase chain reaction
  • the nucleic acids employed in the methods and compositions described herein may be codon optimized relative to a parental template for expression in a particular host cell.
  • Cells differ in their usage of particular codons, and codon bias corresponds to relative abundance of particular tRNAs in a given cell type.
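Codon bias is usually quantified from codon frequencies in a coding sequence. The sketch below is illustrative only: the function name `codon_usage` and the toy input are assumptions, and a real codon-optimization workflow would compare such frequencies against a usage table for the chosen host cell.

```python
from collections import Counter


def codon_usage(cds: str) -> dict[str, float]:
    """Return the relative frequency of each codon in an in-frame coding sequence."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds), 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


if __name__ == "__main__":
    # Toy sequence: three alanine codons, two GCT and one GCC.
    for codon, freq in sorted(codon_usage("GCTGCTGCC").items()):
        print(codon, round(freq, 3))
```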
  • RNAs of the present disclosure relate to guide RNAs and their use in CRISPR-based targeting of a target nucleic acid.
  • Guide RNAs of the present disclosure are capable of binding or otherwise interacting with a Cas12L polypeptide to facilitate targeting of the Cas12L polypeptide to a target nucleic acid.
  • Suitable and exemplary guide RNAs are provided herein and design of such to target a particular nucleic acid will be readily apparent to one of skill in the art.
  • Guide RNAs may also be modified to improve the efficiency of their function in guiding Cas12L to a target nucleic acid.
  • Guide RNAs of the present disclosure contain a CRISPR RNA (crRNA) sequence, and the sequence of the crRNA is involved in conferring specificity to targeting a specific nucleic acid sequence.
  • crRNA CRISPR RNA
  • guide RNA molecules may be extended to include sites for the binding of RNA binding proteins.
  • multiple guide RNAs can be assembled into a pre-crRNA array that can be processed by the RuvC domain of Cas12L. This allows for multiplex editing, enabling simultaneous targeting of several sites.
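A pre-crRNA array of this kind can be pictured as alternating repeat and guide (spacer) segments. The following sketch assumes a repeat-then-spacer layout for each unit and uses placeholder sequences; the function name `build_pre_crrna_array`, the layout, and the sequences are illustrative assumptions and are not taken from the disclosure.

```python
def build_pre_crrna_array(repeat: str, spacers: list[str]) -> str:
    """Concatenate repeat+spacer units into a single pre-crRNA array string.

    Assumes each unit is written 5'->3' as repeat followed by spacer.
    """
    if not repeat:
        raise ValueError("repeat must be a non-empty sequence")
    if not spacers:
        raise ValueError("at least one spacer is required")
    return "".join(repeat + spacer for spacer in spacers)


if __name__ == "__main__":
    placeholder_repeat = "GUUU"           # placeholder, not a real repeat sequence
    placeholder_spacers = ["AAAA", "CCCC"]  # placeholder guide sequences
    print(build_pre_crrna_array(placeholder_repeat, placeholder_spacers))
```

Each repeat-spacer unit corresponds to one guide RNA after processing, so the number of spacers equals the number of sites targeted.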
  • a guide RNA contains both RNA and a repeat sequence that is composed of DNA.
  • a guide RNA may be an RNA-DNA hybrid molecule.
  • a guide RNA may be expressed in a variety of ways as will be apparent to one of skill in the art.
  • a gRNA may be expressed from a recombinant nucleic acid in vivo, from a recombinant nucleic acid in vitro, from a recombinant nucleic acid ex vivo, or can be chemically synthesized.
  • a guide RNA of the present disclosure may have various nucleotide lengths.
  • a guide RNA may contain, for example, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 110, at least 120, at least 130, at least 140, at least 150, at least 160, at least 170, at least 180 nucleotides, at least 190 nucleotides, or at least 200 nucleotides or more.
  • Longer guide RNAs may result in increased editing efficiency by Cas12L polypeptides.
  • a guide RNA of the present disclosure may hybridize with a particular nucleotide sequence on a target nucleic acid. This hybridization may be 100% complementary or it may be less than 100% complementary so long as the hybridization is sufficient to allow Cas12L to bind to or interact with the target nucleic acid.
  • a guide RNA may contain a nucleotide sequence that is, for example, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identical or complementary to the target nucleotide sequence in the target nucleic acid that is targeted by/to be hybridized with the guide RNA.
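Percent complementarity between a guide sequence and the strand it hybridizes with can be estimated with a simple position-by-position comparison. This is a minimal sketch, assuming equal-length, ungapped sequences that are both written 5' to 3' and counting only Watson-Crick pairs (G:U wobble pairs are not counted); the function name `percent_complementarity` is hypothetical.

```python
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def percent_complementarity(guide: str, target: str) -> float:
    """Percent of guide positions that Watson-Crick pair with the target strand.

    Both sequences are given 5'->3'; pairing is antiparallel, so guide
    position i is compared with target position len(target) - 1 - i.
    """
    g = guide.upper().replace("U", "T")
    t = target.upper().replace("U", "T")
    if not g or len(g) != len(t):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(
        1 for i, base in enumerate(g) if _COMPLEMENT.get(base) == t[len(t) - 1 - i]
    )
    return 100.0 * matches / len(g)


if __name__ == "__main__":
    print(percent_complementarity("AAGC", "GCTT"))  # fully complementary pair
    print(percent_complementarity("AAGC", "GCTA"))  # one mismatched position
```

A result can then be compared against whichever threshold from the list above (for example, at least 90%) is relevant to the application.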
  • increasing expression of a guide RNA may increase the editing efficiency of a target nucleic acid according to the methods of the present disclosure.
  • use of a Pol II promoter e.g. a CmYLCV promoter
  • a corresponding control promoter e.g. a Pol III promoter, such as a U6 promoter for example.
  • Use of a Pol II promoter to drive gRNA expression may increase the expression of the guide RNA by, for example, at least about 1%, at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, at least about 100%, at least about 125%, at least about 150%, at least about 175%, at least about 200%, at least about 225%, at least about 250%, at least about 275%, or at least about 300% or more as compared to a corresponding control (e.g. a U6 promoter).
  • a guide RNA of the present disclosure may be recombinantly fused with a ribozyme sequence to assist in gRNA processing.
  • exemplary ribozymes for use herein will be readily apparent to one of skill in the art.
  • Exemplary ribozymes may include, for example, a Hammerhead-type ribozyme and a hepatitis delta virus ribozyme.
  • Use of a ribozyme to assist in processing of guide RNAs may increase efficiency of editing of a target nucleic acid sequence by a Cas12L polypeptide of the present disclosure.
  • Use of a ribozyme fused to a gRNA may increase relative editing efficiency by, for example, at least about 1%, at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, at least about 100%, at least about 125%, at least about 150%, at least about 175%, at least about 200%, at least about 225%, at least about 250%, at least about 275%, or at least about 300% or more as compared to a corresponding control (e.g. a guide RNA that is expressed without the assistance of any additional processing machinery).
  • Phylogenetic trees may be created for a gene family by using a program such as CLUSTAL (Thompson et al. Nucleic Acids Res. 22: 4673-4680 (1994); Higgins et al. Methods Enzymol 266: 383- 402 (1996)) or MEGA (Tamura et al. Mol. Biol. & Evo. 24:1596-1599 (2007)).
  • Homologous sequences may also be identified by a reciprocal BLAST strategy. Evolutionary distances may be computed using the Poisson correction method (Zuckerkandl and Pauling, pp. 97-166 in Evolving Genes and Proteins, edited by V. Bryson and H.J. Vogel. Academic Press, New York (1965)). In addition, evolutionary information may be used to predict gene function. Functional predictions of genes can be greatly improved by focusing on how genes became similar in sequence (i.e. by evolutionary processes) rather than on the sequence similarity itself (Eisen, Genome Res. 8: 163-167 (1998)).
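The Poisson correction converts the observed proportion of differing amino acid sites, p, into an estimated number of substitutions per site, d = -ln(1 - p). The sketch below is illustrative: the function names `p_distance` and `poisson_distance` are assumptions, the input is taken to be a pair of already-aligned protein sequences, and columns containing a gap character are skipped.

```python
import math


def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences, skipping gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no comparable (ungapped) sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def poisson_distance(p: float) -> float:
    """Poisson-corrected distance d = -ln(1 - p) for a proportion 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return -math.log(1.0 - p)


if __name__ == "__main__":
    p = p_distance("MKVL", "MKIL")
    print(p, poisson_distance(p))
```

The correction grows faster than p itself, reflecting the possibility of multiple substitutions at the same site.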
  • consensus sequences can not only be used to define the sequences within each clade, but define the functions of these genes; genes within a clade may contain paralogous sequences, or orthologous sequences that share the same function (see also, for example, Mount, Bioinformatics: Sequence and Genome Analysis Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., page 543 (2001)).
  • Gapped BLAST in BLAST 2.0
  • Altschul et al. (1997) Nucleic Acids Res. 25:3389.
  • PSI-BLAST in BLAST 2.0
  • PSI-BLAST can be used to perform an iterated search that detects distant relationships between molecules. See Altschul et al. (1997) supra.
  • the default parameters of the respective programs e.g., BLASTN for nucleotide sequences, BLASTX for proteins
  • sequence identity refers to the percentage of residues that are identical in the same positions in the sequences being analyzed.
  • sequence similarity refers to the percentage of residues that have similar biophysical / biochemical characteristics in the same positions (e.g. charge, size, hydrophobicity) in the sequences being analyzed.
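  The two definitions above can be illustrated with a short sketch that scores a pairwise alignment column by column. It assumes the sequences are already aligned, ignores columns containing a gap, and uses a simplified grouping of amino acids by biochemical character; the grouping, the gap handling, and the function names are assumptions for illustration, and established tools use substitution matrices instead.

```python
# Illustrative (assumed) grouping of amino acids by similar character.
SIMILARITY_GROUPS = ["AVLIM", "FWY", "KRH", "DE", "STNQ", "G", "P", "C"]
GROUP_OF = {aa: i for i, group in enumerate(SIMILARITY_GROUPS) for aa in group}


def _ungapped_columns(aligned_a: str, aligned_b: str):
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    return columns


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of compared positions holding identical residues."""
    columns = _ungapped_columns(aligned_a, aligned_b)
    return 100.0 * sum(a == b for a, b in columns) / len(columns)


def percent_similarity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of compared positions holding identical or same-group residues."""
    columns = _ungapped_columns(aligned_a, aligned_b)
    similar = sum(
        a == b or (a in GROUP_OF and GROUP_OF.get(a) == GROUP_OF.get(b))
        for a, b in columns
    )
    return 100.0 * similar / len(columns)


if __name__ == "__main__":
    print(percent_identity("KAVL", "RAIL"), percent_similarity("KAVL", "RAIL"))
```

  Similarity is never lower than identity under this scheme, because every identical position also counts as similar.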
  • Computer implementations of these mathematical algorithms can be utilized for comparison of sequences to determine sequence identity and/or similarity.
  • Such implementations include, for example: CLUSTAL in the PC/Gene program (available from Intelligenetics, Mountain View, Calif.); the AlignX program, version 10.3.0 (Invitrogen, Carlsbad, CA) and GAP, BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Version 8 (available from Genetics Computer Group (GCG), 575 Science Drive, Madison, Wis., USA). Alignments using these programs can be performed using the default parameters.
  • the CLUSTAL program is well described by Higgins et al. Gene 73:237-244 (1988); Higgins et al. CABIOS 5:151-153 (1989); Corpet et al., Nucleic Acids Res.
  • Polynucleotides homologous to a reference sequence can be identified by hybridization to each other under stringent or under highly stringent conditions. Single stranded polynucleotides hybridize when they associate based on a variety of well characterized physical-chemical forces, such as hydrogen bonding, solvent exclusion, base stacking and the like.
  • the stringency of a hybridization reflects the degree of sequence identity of the nucleic acids involved, such that the higher the stringency, the more similar are the two polynucleotide strands. Stringency is influenced by a variety of factors, including temperature, salt concentration and composition, organic and non-organic additives, solvents, etc.
  • polynucleotide sequences that are capable of hybridizing to the disclosed polynucleotide sequences and fragments thereof under various conditions of stringency (see, for example, Wahl and Berger, Methods Enzymol. 152: 399-407 (1987); and Kimmel, Methods Enzymol. 152: 507-511 (1987)).
  • Full length cDNA, homologs, orthologs, and paralogs of polynucleotides of the present disclosure may be identified and isolated using well-known polynucleotide hybridization methods.
  • Hybridization experiments are generally conducted in a buffer of pH between 6.8 and 7.4, although the rate of hybridization is nearly independent of pH at ionic strengths likely to be used in the hybridization buffer (Anderson and Young (1985)(supra)).
  • one or more of the following may be used to reduce non-specific hybridization: sonicated salmon sperm DNA or another non-complementary DNA, bovine serum albumin, sodium pyrophosphate, sodium dodecylsulfate (SDS), polyvinyl-pyrrolidone, ficoll and Denhardt's solution.
  • Dextran sulfate and polyethylene glycol 6000 act to exclude DNA from solution, thus raising the effective probe DNA concentration and the hybridization signal within a given unit of time.
  • conditions of even greater stringency may be desirable or required to reduce non-specific and/or background hybridization. These conditions may be created with the use of higher temperature, lower ionic strength and higher concentration of a denaturing agent such as formamide.
  • Stringency conditions can be adjusted to screen for moderately similar fragments such as homologous sequences from distantly related organisms, or to highly similar fragments such as genes that duplicate functional enzymes from closely related organisms.
  • the stringency can be adjusted either during the hybridization step or in the post-hybridization washes.
  • Salt concentration, formamide concentration, hybridization temperature and probe lengths are variables that can be used to alter stringency.
  • high stringency is typically performed at Tm-5°C to Tm-20°C, moderate stringency at Tm-20°C to Tm-35°C and low stringency at Tm-35°C to Tm-50°C for duplexes >150 base pairs.
  • Hybridization may be performed at low to moderate stringency (25-50°C below Tm), followed by post-hybridization washes at increasing stringencies. Maximum rates of hybridization in solution are determined empirically to occur at Tm-25°C for DNA-DNA duplex and Tm-15°C for RNA-DNA duplex. Optionally, the degree of dissociation may be assessed after each wash step to determine the need for subsequent, higher stringency wash steps.
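  The temperature windows given above for duplexes longer than 150 base pairs can be expressed as a simple lookup on the offset between Tm and the hybridization temperature. The sketch below takes Tm as an input rather than computing it, and assigns a shared boundary (for example, exactly 20°C below Tm) to the more stringent class; that tie-breaking rule and the function name are assumptions for illustration.

```python
def classify_stringency(tm_celsius: float, hybridization_celsius: float) -> str:
    """Classify hybridization stringency from the offset below Tm.

    Windows follow the text: high = Tm-5 to Tm-20, moderate = Tm-20 to Tm-35,
    low = Tm-35 to Tm-50 (duplexes longer than 150 base pairs).
    """
    offset = tm_celsius - hybridization_celsius
    if 5 <= offset <= 20:
        return "high"
    if 20 < offset <= 35:
        return "moderate"
    if 35 < offset <= 50:
        return "low"
    return "outside listed ranges"


if __name__ == "__main__":
    # Hypothetical duplex with Tm = 80 degrees C, hybridized at three temperatures.
    for temperature in (70.0, 55.0, 35.0):
        print(temperature, classify_stringency(80.0, temperature))
```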
  • High stringency conditions may be used to select for nucleic acid sequences with high degrees of identity to the disclosed sequences.
  • An example of stringent hybridization conditions obtained in a filter-based method such as a Southern or northern blot for hybridization of complementary nucleic acids that have more than 100 complementary residues is about 5 °C to 20°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH.
  • Hybridization and wash conditions that may be used to bind and remove polynucleotides with less than the desired homology to the nucleic acid sequences or their complements of the present disclosure include, for example: 6X saline sodium citrate (SSC) and 1% sodium dodecyl sulfate (SDS) at 65°C; 50% formamide, 4X SSC at 42°C; 0.5X SSC to 2.0 X SSC, 0.1% SDS at 50°C to 65°C; or 0.1X SSC to 2X SSC, 0.1% SDS at 50°C - 65°C; with a first wash step of, for example, 10 minutes at about 42°C with about 20% (v/v) formamide in 0.1X SSC, and with, for example, a subsequent wash step with 0.2 X SSC and 0.1% SDS at 65°C for 10, 20 or 30 minutes.
  • a 20X solution of SSC is 3 M sodium chloride and 300 mM trisodium citrate, pH 7.0.
  • wash steps may be performed at a lower temperature, e.g., 50°C.
  • An example of a low stringency wash step employs a solution and conditions of at least 25°C in 30 mM NaCl, 3 mM trisodium citrate, and 0.1% SDS over 30 min. Greater stringency may be obtained at 42°C in 15 mM NaCl, with 1.5 mM trisodium citrate, and 0.1% SDS over 30 min. Wash procedures will generally employ at least two final wash steps. Additional variations on these conditions will be readily apparent to those skilled in the art (see, for example, US Patent Application No. 20010010913).
  • wash steps of even greater stringency including conditions of 65°C-68°C in a solution of 15 mM NaCl, 1.5 mM trisodium citrate, and 0.1% SDS, or about 0.2X SSC, 0.1% SDS at 65°C and washing twice, each wash step of 10, 20 or 30 min in duration, or about 0.1X SSC, 0.1% SDS at 65°C and washing twice for 10, 20 or 30 min.
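  The wash conditions above are given both as SSC strengths and as molar salt concentrations. Because a 20X SSC stock is 3 M sodium chloride and 300 mM trisodium citrate, the two notations can be converted with a simple proportion, as in the sketch below. The function names are illustrative, and the stock volume calculation assumes a simple dilution (C1·V1 = C2·V2).

```python
STOCK_FOLD = 20.0               # 20X SSC stock
STOCK_NACL_MM = 3000.0          # 3 M sodium chloride in the 20X stock
STOCK_CITRATE_MM = 300.0        # 300 mM trisodium citrate in the 20X stock


def ssc_components_mm(fold: float) -> tuple[float, float]:
    """Return (NaCl mM, trisodium citrate mM) for a given SSC strength, e.g. 0.2 for 0.2X."""
    if fold < 0:
        raise ValueError("SSC strength cannot be negative")
    scale = fold / STOCK_FOLD
    return STOCK_NACL_MM * scale, STOCK_CITRATE_MM * scale


def stock_volume_ml(target_fold: float, final_volume_ml: float) -> float:
    """Volume of 20X stock needed to prepare final_volume_ml at target_fold."""
    if not 0 <= target_fold <= STOCK_FOLD:
        raise ValueError("target strength must be between 0X and 20X")
    return final_volume_ml * target_fold / STOCK_FOLD


if __name__ == "__main__":
    for fold in (2.0, 0.2, 0.1):
        nacl, citrate = ssc_components_mm(fold)
        print(f"{fold}X SSC: {nacl:.1f} mM NaCl, {citrate:.2f} mM trisodium citrate")
```

  By this conversion, 0.2X SSC corresponds to about 30 mM NaCl with 3 mM trisodium citrate, and 0.1X SSC to about 15 mM NaCl with 1.5 mM trisodium citrate, matching the molar values quoted in the wash steps.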
  • Hybridization stringency may be increased further by using the same conditions as in the hybridization steps, with the wash temperature raised about 3 °C to about 5 °C, and stringency may be increased even further by using the same conditions except the wash temperature is raised about 6 °C to about 9 °C.
  • Recombinant nucleic acids and/or recombinant polypeptides of the present disclosure may be present in host cells (e.g. plant cells).
  • recombinant nucleic acids are present in an expression vector and may encode a recombinant polypeptide, and the expression vector may be present in host cells (e.g. plant cells).
  • recombinant nucleic acids and/or recombinant polypeptides are present in host cells (e.g. plant cells) via direct introduction into the cell (e.g. via RNPs).
  • the genes encoding the recombinant polypeptides in the plant cell may be heterologous to the plant cell.
  • the plant cell does not naturally produce one or more polypeptides of the present disclosure, and contains heterologous nucleic acid constructs capable of expressing one or more genes necessary for producing those molecules.
  • the plant cell does not naturally produce one or more polypeptides of the present disclosure, and is provided the one or more polypeptides through exogenous delivery of the polypeptides directly to the plant cell without the need to express a recombinant nucleic acid encoding the recombinant polypeptide in the plant cell.
  • Recombinant polypeptides of the present disclosure may be introduced into host cells (e.g. plant cells) via any suitable methods known in the art.
  • a Cas12L polypeptide can be exogenously added to plant cells and the plant cells are maintained under conditions such that the recombinant polypeptide is targeted (via a guide RNA) to one or more target nucleic acids to edit/modify the target nucleic acids in the plant cells.
  • a recombinant nucleic acid encoding a Cas12L polypeptide of the present disclosure can be expressed in plant cells and the plant cells are maintained under conditions such that the Cas12L polypeptide is targeted (via a guide RNA) to one or more target nucleic acids to edit/modify the target nucleic acids in the plant cells.
  • a Cas12L polypeptide of the present disclosure may be transiently expressed in a plant via viral infection of the plant, or by introducing a Cas12L polypeptide-encoding RNA into a plant to facilitate editing/modification of a target nucleic acid of interest.
  • TRV Tobacco rattle virus
  • a Cas12L polypeptide and a guide RNA may be exogenously and directly supplied to a plant cell as a ribonucleoprotein (RNP) complex.
  • RNP ribonucleoprotein
  • This particular form of delivery is useful for facilitating transgene-free editing in plants.
  • Modified guide RNAs which are resistant to nuclease digestion could also be used in this approach.
  • Transgene-free callus from plant cells provided with an RNP could be used to regenerate whole edited plants.
  • a recombinant nucleic acid encoding a recombinant polypeptide of the present disclosure can be expressed in a plant with any suitable plant expression vector.
  • Typical vectors useful for expression of recombinant nucleic acids in higher plants are well known in the art and include, for example, vectors derived from the tumor-inducing (Ti) plasmid of Agrobacterium tumefaciens (e.g., see Rogers et al., Meth. in Enzymol. (1987) 153:253-277). These vectors are plant integrating vectors in that on transformation, the vectors integrate a portion of vector DNA into the genome of the host plant.
  • Ti tumor-inducing
  • Exemplary A. tumefaciens vectors useful herein are plasmids pKYLX6 and pKYLX7 (e.g., see Schardl et al., Gene (1987) 61:1-11; and Berger et al., Proc. Natl. Acad. Sci. USA (1989) 86:8402-8406); and plasmid pBI 101.2 that is available from Clontech Laboratories, Inc. (Palo Alto, CA).
  • recombinant polypeptides of the present disclosure can be expressed as a fusion protein that is coupled to, for example, a maltose binding protein ("MBP"), glutathione S transferase (GST), hexahistidine, c-myc, or the FLAG epitope for ease of purification, monitoring expression, or monitoring cellular and subcellular localization.
  • MBP maltose binding protein
  • GST glutathione S transferase
  • a recombinant nucleic acid encoding a recombinant polypeptide of the present disclosure can be modified to improve expression of the recombinant protein in plants by using codon preference/codon optimization to target preferential expression in plant cells.
  • the recombinant nucleic acid is prepared or altered synthetically, advantage can be taken of known codon preferences of the intended plant host where the nucleic acid is to be expressed.
  • recombinant nucleic acids of the present disclosure can be modified to account for the specific codon preferences and GC content preferences of monocotyledons and dicotyledons, as these preferences have been shown to differ (Murray et al., Nucl. Acids Res. (1989) 17: 477-498).
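  Codon preference and GC content, as discussed above, can be inspected directly on a coding sequence before and after optimization. The sketch below is a minimal illustration that counts codon usage, overall GC content, and GC content at the third codon position; it assumes an in-frame DNA coding sequence, the toy sequence is not from the disclosure, and no host-specific codon table is implied.

```python
from collections import Counter


def codon_counts(cds: str) -> Counter:
    """Count each codon in an in-frame coding sequence."""
    cds = cds.upper()
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a positive multiple of 3")
    return Counter(cds[i:i + 3] for i in range(0, len(cds), 3))


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc3_content(cds: str) -> float:
    """Fraction of G and C bases at the third position of each codon."""
    cds = cds.upper()
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a positive multiple of 3")
    return gc_content(cds[2::3])


if __name__ == "__main__":
    toy_cds = "ATGGCCGCCTAA"  # toy sequence for illustration only
    print(codon_counts(toy_cds))
    print(gc_content(toy_cds), gc3_content(toy_cds))
```

  Comparing these summaries against the usage profile of the intended host is one simple way to check whether a synthetic sequence reflects that host's preferences.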
  • the present disclosure further provides expression vectors encoding recombinant polypeptides of the present disclosure.
  • a nucleic acid sequence coding for the desired recombinant nucleic acid of the present disclosure can be used to construct a recombinant expression vector which can be introduced into the desired host cell.
  • a recombinant expression vector will typically contain a nucleic acid encoding a recombinant protein of the present disclosure, operably linked to transcriptional initiation regulatory sequences which will direct the transcription of the nucleic acid in the intended host cell, such as tissues of a transformed plant.
  • Recombinant nucleic acids e.g. encoding recombinant polypeptides of the present disclosure may be expressed on multiple expression vectors or they may be expressed on a single expression vector.
  • plant expression vectors may include (1) a cloned gene under the transcriptional control of 5' and 3' regulatory sequences and (2) a dominant selectable marker.
  • Such plant expression vectors may also contain, if desired, a promoter regulatory region (e.g., one conferring inducible or constitutive, environmentally-regulated or developmentally-regulated expression, or cell- or tissue-specific/selective expression), a transcription initiation start site, a ribosome binding site, an RNA processing signal, a transcription termination site, and/or a polyadenylation signal.
  • expression of a nucleic acid of the present disclosure may be driven (in operable linkage) with a promoter (e.g. a promoter functional in plants or a plant-specific promoter).
  • a promoter generally refers to a DNA sequence that contains an RNA polymerase binding site, transcription start site, and/or TATA box and assists or promotes the transcription and expression of an associated transcribable polynucleotide sequence such as, for example, a gene.
  • a plant promoter, or functional fragment thereof can be employed to e.g. control the expression of a recombinant nucleic acid of the present disclosure in regenerated plants.
  • the selection of the promoter used in expression vectors will determine the spatial and temporal expression pattern of the recombinant nucleic acid in the modified plant, e.g., the nucleic acid encoding the recombinant polypeptide of the present disclosure is only expressed in the desired tissue or at a certain time in plant development or growth.
  • Certain promoters will express recombinant nucleic acids in all plant tissues and are active under most environmental conditions and states of development or cell differentiation (i.e., constitutive promoters).
  • promoters will express recombinant nucleic acids in specific cell types (such as leaf epidermal cells, mesophyll cells, root cortex cells) or in specific tissues or organs (roots, leaves or flowers, for example) and the selection will reflect the desired location of accumulation of the gene product.
  • the selected promoter may drive expression of the recombinant nucleic acid under various inducing conditions.
  • suitable constitutive promoters may include, for example, the core promoter of the Rsyn7, the core CaMV 35S promoter (Odell et al., Nature (1985) 313:810-812), CaMV 19S (Lawton et al., 1987), rice actin (Wang et al., 1992; U.S. Pat. No. 5,641,876; and McElroy et al., Plant Cell (1985) 2:163-171); ubiquitin (Christensen et al., Plant Mol. Biol. (1989)12:619-632; and Christensen et al., Plant Mol. Biol.
  • expression of a nucleic acid of the present disclosure may be driven (in operable linkage) with a UBQ10 promoter.
  • expression of a nucleic acid of the present disclosure may be driven (in operable linkage) with a promoter having a nucleic acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% nucleic acid sequence identity to the following UBQ10 promoter sequence:
  • a UBQ10 promoter comprises the following nucleotide sequence: CGACGAGTCAGTAATAAACGGCGTCAAAGTGGTTGCAGCCGGCACACACGAGTCGTGTTTA TCAACTCAAAGCACAAATACTTTTCCTCAACCTAAAAATAAGGCAATTAGCCAAAAACAACT TTGCGTGTAAACAACGCTCAATACACGTGTCATTTTATTATTAGCTATTGCTTCACCGCCTTA GCTTTCTCGTGACCTAGTCGTCCTCGTCTTTTCTTCTTCTATAAAACAATACCCAAAG AGCTCTCTTCTTCTTCACAATTCAGATTTCAATTTCTCAAAATCTTAAAAACTTTCTCTCTCAATTCT
  • expression of a nucleic acid of the present disclosure may be driven with a UBQ10 promoter (i.e., the nucleic acid is operably linked to a UBQ10 promoter) having a nucleic acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%.
  • Recombinant nucleic acids of the present disclosure may be expressed using an RNA Polymerase III (Pol III) promoter such as, for example, the U6 promoter or the H1 promoter (eLife 2013 2:e00471).
  • different Pol III promoters from three different Arabidopsis U6 genes, and their corresponding gene terminators (BMC Plant Biology 2014 14:327).
  • additional Pol III promoters could be utilized to, for example, simultaneously express many guide RNAs targeting many different locations in the genome.
  • the use of different Pol III promoters for each gRNA expression cassette may be desirable to reduce the chances of natural gene silencing that can occur when multiple copies of identical sequences are expressed in plants.
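  The idea of giving each gRNA expression cassette its own Pol III promoter, so that no promoter sequence is repeated, can be sketched as a simple one-to-one assignment. The promoter and guide names below are hypothetical placeholders, and the function is an illustration of the bookkeeping only, not a described construct design.

```python
def assign_distinct_promoters(guide_names: list[str], promoter_names: list[str]) -> dict[str, str]:
    """Pair each guide RNA cassette with a different promoter.

    Raises ValueError if there are not enough distinct promoters,
    since reusing one would place identical sequences in the construct.
    """
    distinct = list(dict.fromkeys(promoter_names))  # drop repeats, keep order
    if len(distinct) < len(guide_names):
        raise ValueError("not enough distinct promoters for the number of guide RNAs")
    return dict(zip(guide_names, distinct))


if __name__ == "__main__":
    # Hypothetical names for illustration only.
    guides = ["gRNA_1", "gRNA_2", "gRNA_3"]
    promoters = ["PolIII_promoter_A", "PolIII_promoter_B", "PolIII_promoter_C"]
    for guide, promoter in assign_distinct_promoters(guides, promoters).items():
        print(f"{promoter} -> {guide}")
```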
  • expression of a nucleic acid of the present disclosure may be driven (in operable linkage) with a U6 promoter.
  • expression of a nucleic acid of the present disclosure may be driven (in operable linkage) with a promoter having a nucleic acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% nucleic acid sequence identity to the following U6 promoter sequence: AAGGTCGGGCAGGAAGAGGGCCTATTTCCCATGATTCCTTCATATTTGCATATACGATACAA GGCTG
  • a U6 promoter can have the following nucleotide sequence: AAGGTCGGGCAGGAAGAGGGCCTATTTCCCATGATTCCTTCATATTTGCATATACGATACAA GGCTGTTAGAGAGATAATTAGAATTAATTTGACTGTAAACACAAAGATATTAGTACAAAATA CGTGACGTAGAAAGTAATAATTTCTTGGGTAGTTTGCAGTTTTAAAATTATGTTTTAAAATGG ACTATCATATGCTTACCGTAACTTGAAAGTATTTCGATTTCTTGGCTTTATATATCTTGTGGA AAGGACG (SEQ ID NO:4).
  • a nucleic acid comprises a nucleotide sequence that is operably linked to a U6 promoter having a nucleic acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% nucleic acid sequence identity to the following AtU626 promoter sequence:
  • Recombinant nucleic acids of the present disclosure may be expressed using an RNA Polymerase II (Pol II) promoter such as, for example, the CmYLCV promoter and the 35S promoter.
  • expression of a nucleic acid of the present disclosure may be driven (in operable linkage) with a CmYLCV promoter.
  • CmYLCV promoters are described in, e.g., WO 2001/073087; and Sahoo et al. (2016) Methods Mol. Biol. 1482:111.
  • expression of a nucleic acid of the present disclosure may be driven (in operable linkage) with a promoter having a nucleic acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100%, nucleic acid sequence identity to the following CmYLCV promoter nucleotide sequence:
  • a nucleic acid of the present disclosure may be driven (in operable linkage) with a Cauliflower mosaic virus 35S promoter (CaMV 35S promoter).
  • a nucleic acid of the present disclosure comprises a nucleotide sequence operably linked to a promoter having a nucleic acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100%, nucleic acid sequence identity to the following CaMV 35S promoter nucleotide sequence:
  • a CaMV 35S promoter has the following nucleotide sequence: GGTCAACATGGTGGAGCACGACACACTTGTCTACTCCAAAAATATCAAAGATACAGTCTCAG AAGACCAAAGGGCAATTGAGACTTTTCAACAAAGGGTAATATCCGGAAACCTCCTCGGATT CCATTGCCCAGCTATCTGTCACTTTATTGTGAAGATAGTGGAAAAGGAAGGTGGCTCCTACA AATGCCATCATTGCGATAAAGGAAAGGCCATCGTTGAAGATGCCTCTGCCGACAGTGGTCCC AAAGATGGACCCCCACCCACGAGGAGCATCGTGGAAAAAGAAGACGTTCCAACCACGTCTT CAAAGCAAGTGGATTGATGTGATAACATGGTGGAGCACGACACACTTGTCTACTCCAAAAA TATCAAAGATACAGTCTCAGAAGACCAAAGGGCAATTGATGATAACATGGTGGAGCACGACACACTTGTCTACTCCAAAAA TATCAAAGATACAGTCTCAGAAGACCAAAGGGCAATTGAG
  • expression of a nucleic acid of the present disclosure may be driven (in operable linkage) with a 2x35S promoter.
  • expression of a nucleic acid of the present disclosure may be driven (in operable linkage) with a promoter having a nucleic acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% nucleic acid sequence identity to the nucleic acid sequence of SEQ ID NOG.
  • tissue specific promoters may include, for example, the lectin promoter (Vodkin et al., 1983; Lindstrom et al., 1990), the corn alcohol dehydrogenase 1 promoter (Vogel et al., 1989; Dennis et al., 1984), the corn light harvesting complex promoter (Simpson, 1986; Bansal et al., 1992), the corn heat shock protein promoter (Odell et al., Nature (1985) 313:810-812; Rochester et al., 1986), the pea small subunit RuBP carboxylase promoter (Poulsen et al., 1986; Cashmore et al., 1983), the Ti plasmid mannopine synthase promoter (Langridge et al., 1989), the Ti plasmid nopaline synthase promoter (Langridge et al., 1989), the petunia chalcone isomerase promoter (Van Tun
  • the plant promoter can direct expression of a recombinant nucleic acid of the present disclosure in a specific tissue or may be otherwise under more precise environmental or developmental control.
  • promoters are referred to here as “inducible” promoters.
  • Environmental conditions that may affect transcription by inducible promoters include, for example, pathogen attack, anaerobic conditions, or the presence of light.
  • inducible promoters include, for example, the Adh1 promoter which is inducible by hypoxia or cold stress, the Hsp70 promoter which is inducible by heat stress, and the PPDK promoter which is inducible by light.
  • promoters under developmental control include, for example, promoters that initiate transcription only, or preferentially, in certain tissues, such as leaves, roots, fruit, seeds, or flowers.
  • An exemplary promoter is the anther specific promoter 5126 (U.S. Pat. Nos. 5,689,049 and 5,689,051).
  • the operation of a promoter may also vary depending on its location in the genome. Thus, an inducible promoter may become fully or partially constitutive in certain locations.
  • any combination of a constitutive or inducible promoter, and a non-tissue specific or tissue specific promoter may be used to control the expression of various recombinant polypeptides of the present disclosure.
  • the recombinant nucleic acids of the present disclosure and/or a vector housing a recombinant nucleic acid of the present disclosure may also contain a regulatory sequence that serves as a 3’ terminator sequence.
  • a terminator sequence generally refers to a nucleic acid sequence that marks the end of a gene or transcribable nucleic acid during transcription.
  • terminators that may be used in the recombinant nucleic acids of the present disclosure.
  • a recombinant nucleic acid of the present disclosure may contain a 3’ NOS terminator.
  • recombinant nucleic acids of the present disclosure contain a transcriptional termination site. Transcription termination sites may include, for example, OCS terminators, rbcS-E9 terminators, NOS terminators, HSP18.2 terminators, and poly-T terminators.
  • a nucleic acid of the present disclosure may contain a transcriptional termination site having a nucleic acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% nucleic acid sequence identity to the nucleic acid sequence of a 35S terminator, a HSP18 terminator, and/or an RbcS-E9 terminator.
  • Recombinant nucleic acids of the present disclosure may include one or more introns.
  • Introns may be included in e.g. recombinant nucleic acids being expressed on a vector in a host cell. The inclusion of one or more introns in a recombinant nucleic acid to be expressed may be particularly helpful to increase expression in plant cells.
  • Recombinant nucleic acids of the present disclosure may also contain selectable markers.
  • a selectable marker can be used to assist in the selection of transformed cells or tissue due to the presence of a selection agent, such as an antibiotic or herbicide, where the selectable marker gene provides tolerance or resistance to the selection agent.
  • the selection agent can bias or favor the survival, development, growth, proliferation, etc., of transformed cells expressing the selectable marker gene.
  • Selectable marker genes may include, for example, those conferring tolerance or resistance to antibiotics, such as kanamycin and paromomycin (nptII), hygromycin B (aph IV), streptomycin or spectinomycin (aadA) and gentamycin (aac3 and aacC4), or those conferring tolerance or resistance to herbicides such as glufosinate (bar or pat), dicamba (DMO) and glyphosate (aroA or Cp4-EPSPS).
  • Selectable marker genes which provide an ability to visually screen for transformants may also be used such as, for example, luciferase or green fluorescent protein (GFP), or a gene expressing a beta glucuronidase or uidA gene (GUS) for which various chromogenic substrates are known.
  • GFP green fluorescent protein
  • GUS beta glucuronidase or uidA gene
  • a nucleic acid molecule provided herein contains a selectable marker gene selected from the group consisting of nptII, aph IV, aadA, aac3, aacC4, bar, pat, DMO, EPSPS, aroA, luciferase, GFP, and GUS.
  • Certain aspects of the present disclosure relate to plants and plant cells that contain Cas12L polypeptides that are targeted to one or more target nucleic acids in the plant/plant cell in order to edit/modify the target nucleic acid.
  • a “plant” refers to any of various photosynthetic, eukaryotic multi-cellular organisms of the kingdom Plantae, characteristically producing embryos, containing chloroplasts, having cellulose cell walls and lacking locomotion.
  • a “plant” includes any plant or part of a plant at any stage of development, including seeds, suspension cultures, plant cells, embryos, meristematic regions, callus tissue, leaves, roots, shoots, gametophytes, sporophytes, pollen, microspores, and progeny thereof. Also included are cuttings, and cell or tissue cultures.
  • plant tissue includes, for example, whole plants, plant cells, plant organs, e.g., leaves, stems, roots, meristems, plant seeds, protoplasts, callus, cell cultures, and any groups of plant cells organized into structural and/or functional units.
  • Various plant cells may be used in the present disclosure so long as they remain viable after being transformed or otherwise modified to express recombinant nucleic acids or house recombinant polypeptides.
  • the plant cell is not adversely affected by the transduction of the necessary nucleic acid sequences, the subsequent expression of the proteins or the resulting intermediates.
  • a broad range of plant types may be modified to incorporate recombinant polypeptides and/or polynucleotides of the present disclosure.
  • Suitable plants that may be modified include both monocotyledonous (monocot) plants and dicotyledonous (dicot) plants.
  • suitable plants may include, for example, species of the Family Gramineae, including Sorghum bicolor and Zea mays; species of the genera: Cucurbita, Rosa, Vitis, Juglans, Fragaria, Lotus, Medicago, Onobrychis, Trifolium, Trigonella, Vigna, Citrus, Linum, Geranium, Manihot, Daucus, Arabidopsis, Brassica, Raphanus, Sinapis, Atropa, Capsicum, Datura, Hyoscyamus, Lycopersicon, Nicotiana, Solanum, Petunia, Digitalis, Majorana, Cichorium, Helianthus, Lactuca, Bromus, Asparagus, Antirrhinum, Hemerocallis, Nemesia, Pelargonium, Panicum, Pennisetum, Ranunculus, Senecio, Salpiglossis, Cucumis, Browallia, Glycine, Pisum, Phaseolus
  • plant cells may include, for example, those from corn (Zea mays), canola (Brassica napus, Brassica rapa ssp.), Brassica species useful as sources of seed oil, alfalfa (Medicago sativa), rice (Oryza sativa), rye (Secale cereale), sorghum (Sorghum bicolor, Sorghum vulgare), millet (e.g., pearl millet (Pennisetum glaucum), proso millet (Panicum miliaceum), foxtail millet (Setaria italica), finger millet (Eleusine coracana)), sunflower (Helianthus annuus), safflower (Carthamus tinctorius), wheat (Triticum aestivum), duckweed (Lemna), soybean (Glycine max), tobacco (Nicotiana tabacum), potato (Solanum tuberosum), peanuts (Arachis hypo
  • suitable vegetable plants may include, for example, tomatoes (Lycopersicon esculentum), lettuce (e.g., Lactuca sativa), green beans (Phaseolus vulgaris), lima beans (Phaseolus limensis), peas (Lathyrus spp.), and members of the genus Cucumis such as cucumber (C. sativus), cantaloupe (C. cantalupensis), and musk melon (C. melo).
  • Examples of suitable ornamental plants include azalea (Rhododendron spp.), hydrangea (Hydrangea macrophylla), hibiscus (Hibiscus rosa-sinensis), roses (Rosa spp.), tulips (Tulipa spp.), daffodils (Narcissus spp.), petunias (Petunia hybrida), carnation (Dianthus caryophyllus), poinsettia (Euphorbia pulcherrima), and chrysanthemum.
  • suitable conifer plants may include, for example, loblolly pine (Pinus taeda), slash pine (Pinus elliottii), ponderosa pine (Pinus ponderosa), lodgepole pine (Pinus contorta), Monterey pine (Pinus radiata), Douglas-fir (Pseudotsuga menziesii), Western hemlock (Tsuga canadensis), Sitka spruce (Picea glauca), redwood (Sequoia sempervirens), silver fir (Abies amabilis), balsam fir (Abies balsamea), Western red cedar (Thuja plicata), and Alaska yellow-cedar (Chamaecyparis nootkatensis).
  • leguminous plants may include, for example, guar, locust bean, fenugreek, soybean, garden beans, cowpea, mungbean, lima bean, fava bean, lentils, chickpea, peanuts (Arachis sp.), crown vetch (Vicia sp.), hairy vetch, adzuki bean, lupine (Lupinus sp.), trifolium, common bean (Phaseolus sp.), field bean (Pisum sp.), clover (Melilotus sp.), Lotus, trefoil, lens, and false indigo.
  • suitable forage and turf grass may include, for example, alfalfa (Medicago sp.), orchard grass, tall fescue, perennial ryegrass, creeping bent grass, and redtop.
  • suitable crop plants and model plants may include, for example, Arabidopsis, corn, rice, alfalfa, sunflower, canola, soybean, cotton, peanut, sorghum, wheat, tobacco, and lemna.
  • the plants and plant cells of the present disclosure may be genetically modified in that recombinant nucleic acids have been introduced into the plants, and as such the genetically modified plants and/or plant cells do not occur in nature.
  • a suitable plant of the present disclosure is e.g. one capable of expressing one or more nucleic acid constructs encoding one or more recombinant proteins.
  • the recombinant proteins encoded by the nucleic acids may be e.g. Cas12L polypeptides.
  • “transgenic plant” and “genetically modified plant” are used interchangeably and refer to a plant which contains within its genome a recombinant nucleic acid.
  • the recombinant nucleic acid is stably integrated within the genome such that the polynucleotide is passed on to successive generations.
  • the recombinant nucleic acid is transiently expressed in the plant.
  • the recombinant nucleic acid may be integrated into the genome alone or as part of a recombinant expression cassette.
  • Transgenic is used herein to include any cell, cell line, callus, tissue, plant part or plant, the genotype of which has been altered by the presence of exogenous nucleic acid including those transgenics initially so altered as well as those created by sexual crosses or asexual propagation from the initial transgenic.
  • Plant transformation protocols as well as protocols for introducing recombinant nucleic acids of the present disclosure into plants may vary depending on the type of plant or plant cell, e.g., monocot or dicot, targeted for transformation. Suitable methods of introducing recombinant nucleic acids of the present disclosure into plant cells and subsequent insertion into the plant genome include, for example, microinjection (Crossway et al., Biotechniques (1986) 4:320-334), electroporation (Riggs et al., Proc. Natl. Acad. Sci. USA (1986) 83:5602-5606), Agrobacterium-mediated transformation (U.S. Pat. No.
  • Targeting can be achieved by providing the recombinant protein with an appropriate targeting peptide sequence.
  • targeting peptides include, for example, secretory signal peptides (for secretion or cell wall or membrane targeting), plastid transit peptides, chloroplast transit peptides, mitochondrial target peptides, vacuole targeting peptides, nuclear targeting peptides, and the like (e.g., see Reiss et al., Mol. Gen. Genet.
  • Modified plants may be grown in accordance with conventional methods (e.g., see McCormick et al., Plant Cell Reports (1986) 81-84). These plants may then be grown, and pollinated with either the same transformed strain or different strains, with the resulting hybrid having the desired phenotypic characteristic. Two or more generations may be grown to ensure that the subject phenotypic characteristic is stably maintained and inherited, and then seeds harvested to ensure the desired phenotype or other property has been achieved.
  • the present disclosure also provides plants derived from plants having an edited/modified nucleic acid as a consequence of the methods of the present disclosure.
  • a plant having an edited/modified nucleic acid as a consequence of the methods of the present disclosure may be crossed with itself or with another plant to produce an F1 plant.
  • one or more of the resulting F1 plants may also have an edited/modified nucleic acid.
  • Progeny plants may also have an altered or modified phenotype as compared to a corresponding control plant.
  • the derived plants e.g. F1 or F2 plants resulting from or derived from crossing the plant having an edited/modified nucleic acid expression as a consequence of the methods of the present disclosure with another plant
  • the derived plants can be selected from a population of derived plants.
  • methods of selecting one or more of the derived plants that (i) lack recombinant nucleic acids, and (ii) have an edited/modified nucleic acid.
  • progeny plants as described herein do not necessarily need to contain a Cas12L polypeptide and/or a guide RNA in order to maintain the edit/modification to the target nucleic acid.
  • Plants with genetic backgrounds that are susceptible to transgene silencing may exhibit reduced Cas12L-mediated editing efficiency. It may thus be desirable, in some embodiments, to employ a genetic background that has reduced or eliminated susceptibility to transgene silencing. In some embodiments, employing a genetic background with reduced or eliminated susceptibility to transgene silencing may improve editing efficiency. Exemplary genetic backgrounds with reduced or eliminated susceptibility to transgene silencing will be readily apparent to one of skill in the art and include, for example, plants with mutations in RDR6 that reduce or eliminate RDR6 expression or function.
  • Conducting the methods of the present disclosure in a plant with a genetic background that reduces or eliminates susceptibility to transgene silencing may increase the relative editing efficiency of a target nucleic acid by, for example, at least about 1%, at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, at least about 100%, at least about 125%, at least about 150%, at least about 175%, at least about 200%, at least about 225%, at least about 250%, at least about 275%, or at least about 300% or more as compared to a corresponding control (e.g. a wild-type plant).
  • the present disclosure provides methods of modifying a target nucleic acid in a eukaryotic cell; the methods generally involve contacting a target nucleic acid in the eukaryotic cell with a Cas12L polypeptide of the present disclosure and a guide RNA.
  • the methods may further comprise use of a donor DNA.
  • Suitable eukaryotic cells include mammalian cells, plant cells, insect cells, arachnid cells, protozoan cells, fish cells, fungal cells, yeast cells, amphibian cells, reptile cells, and avian cells.
  • the eukaryotic cell is a plant cell.
  • Growing and/or cultivation conditions sufficient for the recombinant polypeptides and/or polynucleotides of the present disclosure to be expressed and/or maintained in the plant/plant cell and to be targeted to and edit/modify one or more target nucleic acids of the present disclosure are well known in the art and include any suitable growing conditions disclosed herein.
  • the plant is grown under conditions sufficient to express a Cas12L polypeptide of the present disclosure, and for the expressed Cas12L polypeptides to be localized to the nucleus of cells of the plant in order to be targeted to and edit/modify the target nucleic acids (if those target nucleic acids are present in the nucleus).
  • the conditions sufficient for the expression of the Cas12L polypeptide will depend on the promoter used to control the expression of the Cas12L polypeptide. For example, if an inducible promoter is utilized, expression of the recombinant polypeptide in a plant will require that the plant be grown in the presence of the inducer.
  • growing conditions sufficient for the recombinant polypeptides of the present disclosure to be expressed and/or maintained in the plant and to be targeted to one or more target nucleic acids to edit/modify the one or more target nucleic acids may vary depending on a number of factors (e.g. species of plant, use of inducible promoter, etc.). Suitable growing conditions may include, for example, ambient environmental conditions, standard laboratory conditions, standard greenhouse conditions, growth in long days under standard environmental conditions (e.g. 16 hours of light, 8 hours of dark), growth in 12 hour light: 12 hour dark day/night cycles, etc.
  • Plants and/or plant cells of the present disclosure housing a Cas12L polypeptide and a guide RNA may be maintained at a variety of temperatures. In general, the temperature should be sufficient for the Cas12L polypeptide and guide RNA to form, maintain, or otherwise be present as a complex that is able to target a target nucleic acid in order to edit/modify the target nucleic acids.
  • Exemplary growth/cultivation temperatures include, for example, at least about 20°C, at least about 21°C, at least about 22°C, at least about 23°C, at least about 24°C, at least about 25°C, at least about 26°C, at least about 27°C, at least about 28°C, at least about 29°C, at least about 30°C, at least about 31°C, at least about 32°C, at least about 33°C, at least about 34°C, at least about 35°C, at least about 36°C, at least about 37°C, at least about 38°C, at least about 39°C, or at least about 40°C.
  • Exemplary growth/cultivation temperatures include, for example, about 20°C to about 25°C, about 25°C to about 30°C, about 30°C to about 35°C, or about 35°C to about 40°C. Plants and plant cells may be maintained at a constant temperature throughout the duration of the growth and/or incubation period, or the temperature schedule can be adjusted at various points throughout the duration of the growth and/or incubation period as will be readily apparent to one of skill in the art depending on the particular growth and/or incubation purpose.
  • Various time frames may be used to observe editing/modification of a target nucleic acid according to the methods of the present disclosure. Plants and/or plant cells may be observed/assayed for editing/modification of a target nucleic acid after, for example, about 30 minutes, about 45 minutes, about 1 hour, about 2.5 hours, about 5 hours, about 7.5 hours, about 10 hours, about 15 hours, about 20 hours, about 1 day, about 5 days, about 10 days, about 15 days, about 20 days, about 25 days, about 30 days, about 35 days, about 40 days, about 45 days, about 50 days, or about 55 days or more after being cultivated/grown in conditions sufficient for a Cas12L polypeptide to facilitate editing/modification of a target nucleic acid.
  • Certain aspects of the present disclosure relate to editing or modifying a target nucleic acid using Cas12L polypeptides.
  • a Cas12L polypeptide is used to create a mutation in a target nucleic acid.
  • Mutation of a nucleic acid generally refers to an insertion, deletion, substitution, duplication, or inversion of one or more nucleotides in the nucleic acid as compared to a reference or control nucleotide sequence.
  • a Cas12L polypeptide of the present disclosure may induce a double-stranded break (DSB) at a target site of a nucleic acid sequence that is then repaired by the natural processes of either homologous recombination (HR) or non-homologous end-joining (NHEJ). Sequence modifications, such as for example insertions and deletions, can occur at the DSB locations via NHEJ repair. If two DSBs flanking one target region are created, the breaks can be repaired via NHEJ by reversing the orientation of the targeted DNA (also referred to as an “inversion”). HR can be used to integrate a donor nucleic acid sequence into a target site.
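The inversion outcome described above can be illustrated with a short Python sketch. This is a simplified model and not part of the disclosure: it assumes two idealized blunt cuts at hypothetical 0-based positions `cut1` and `cut2` on the top strand, and it ignores overhangs, indels, and any other repair outcome. The function names are illustrative only.

```python
# Simplified model (assumption): blunt double-stranded breaks at two
# positions, with the excised segment rejoined in the opposite orientation.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def invert_between_breaks(seq: str, cut1: int, cut2: int) -> str:
    """Model an inversion of the segment seq[cut1:cut2].

    On the top strand, an inverted segment reads as the reverse
    complement of the original segment; flanking sequence is unchanged.
    """
    if not 0 <= cut1 <= cut2 <= len(seq):
        raise ValueError("cut positions must satisfy 0 <= cut1 <= cut2 <= len(seq)")
    return seq[:cut1] + reverse_complement(seq[cut1:cut2]) + seq[cut2:]


if __name__ == "__main__":
    # The segment CCG between the two cuts becomes CGG after inversion.
    print(invert_between_breaks("AAACCGTTT", 3, 6))
```

Because the two strands are antiparallel, flipping a segment in place is equivalent to replacing it with its reverse complement on the strand being read, which is why the sketch does not simply reverse the string.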
  • a double-stranded break provided herein is repaired by NHEJ. In another aspect, a double-stranded break provided herein is repaired by HR.
  • a Cas12L polypeptide of the present disclosure may induce a double-stranded break with 5’ nucleotide overhangs at a target site of a nucleic acid sequence such that an exogenous DNA segment of interest can serve as the donor nucleic acid to be ligated into the target nucleic acid. The presence of 5’ nucleotide overhangs allows the insertion of the exogenous DNA to be directional.
  • a nucleic acid that encodes a polypeptide may be targeted and edited such that the modification to the nucleic acid results in a change to one or more codons in the encoded polypeptide.
  • the modification of the target nucleic acid may result in deletion of one or more codons in the encoded polypeptide.
  • a target nucleic acid of the present disclosure may be edited or modified in a variety of ways (e.g. deletion of nucleotides in the target nucleic acid) depending on the particular application as will be readily apparent to one of skill in the art.
  • a target nucleic acid subjected to the methods of the present disclosure may have an edit or modification of at least 1 nucleotide, at least 2 nucleotides, at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleot
  • a target nucleic acid of the present disclosure may have its expression decreased/downregulated as compared to a corresponding control nucleic acid.
  • a target nucleic acid of the present disclosure in a cell may have its expression decreased/downregulated by at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% as compared to a corresponding control.
  • a control may be a corresponding plant or plant cell that does not contain recombinant poly
  • a target nucleic acid may have its expression decreased/downregulated at least about 5%, at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or more, as compared to a corresponding control nucleic acid.
  • a control nucleic acid may be a corresponding nucleic acid from a plant or plant cell that does not contain a nucleic acid encoding a recombinant polypeptide of the present disclosure.
  • a target nucleic acid of the present disclosure may have its expression increased/upregulated/activated as compared to a corresponding control nucleic acid.
  • a target nucleic acid of the present disclosure in a cell may have its expression increased/upregulated/activated by at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, at least about 100% (or two-fold), at least 2.5-fold, at least 5-fold, at least 10-fold, at least 25-fold, at least 50-fold, at least 75-fold, at least 100-fold,
  • a target nucleic acid may have its expression increased/upregulated/activated at least about 1-fold, at least about 2-fold, at least about 3-fold, at least about 4-fold, at least about 5-fold, at least about 10-fold, at least about 15-fold, at least about 20-fold, at least about 25-fold, at least about 30-fold, at least about 40-fold, at least about 50-fold, at least about 75-fold, at least about 100-fold, at least about 150-fold, at least about 200-fold, at least about 300-fold, at least about 400-fold, at least about 500-fold, at least about 600-fold, at least about 700-fold, at least about 800-fold, at least about 900-fold, at least about 1,000-fold, at least about 1,250-fold, at least about 1,
  • a control nucleic acid may be a corresponding nucleic acid from a plant or plant cell that does not contain a nucleic acid encoding a recombinant polypeptide of the present disclosure.
  • Certain aspects of the present disclosure relate to increasing editing efficiency of Cas12L polypeptides of the present disclosure.
  • Editing frequency and efficiency, as well as methods of determining such, are well known in the art.
  • editing efficiency is evaluated by determining the observed quantity of a given target sequence that experienced an editing event (editing frequency) as compared to the total quantity of the target sequence observed (whether edited or unedited).
  • An increase in editing efficiency generally refers to an increase in the number of sequences experiencing an editing event (editing frequency) as compared to the total quantity of the target sequence observed (whether edited or unedited).
  • increases in editing efficiency are compared to corresponding controls in relative terms (relative editing efficiency). For example, if the absolute editing frequency in one condition is 0.5% and the absolute editing frequency in a second condition is 1%, the second condition represents a doubling of the absolute editing frequency relative to the first condition, or in other words, the second condition represents a 100% increase in relative editing efficiency as compared to the first condition.
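The arithmetic in the example above can be sketched in Python. This is a minimal illustration under stated assumptions, not a method taken from the disclosure: the function names and the read counts are hypothetical, and editing frequency is taken to be edited observations divided by total observations of the target sequence.

```python
def editing_frequency(edited: int, total: int) -> float:
    """Fraction of observed target sequences that carry an editing event."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= edited <= total:
        raise ValueError("edited must be between 0 and total")
    return edited / total


def relative_increase_percent(control_freq: float, test_freq: float) -> float:
    """Percent change in editing frequency of a test condition vs. a control."""
    if control_freq <= 0:
        raise ValueError("control frequency must be positive")
    return (test_freq - control_freq) / control_freq * 100.0


if __name__ == "__main__":
    # Hypothetical counts: 5 of 1,000 reads edited in the first condition,
    # 10 of 1,000 reads edited in the second condition.
    first = editing_frequency(5, 1000)    # 0.5% absolute editing frequency
    second = editing_frequency(10, 1000)  # 1% absolute editing frequency
    print(relative_increase_percent(first, second))
```

With these counts the absolute frequencies are 0.5% and 1%, and the relative increase is 100%, matching the doubling described in the text. A relative measure is undefined when the control frequency is zero, so the sketch rejects that case rather than returning a value.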
  • the frequency or efficiency of editing of a target nucleic acid of the present disclosure may vary.
  • the particular promoter used to drive gRNA expression may influence the editing efficiency of a target nucleic acid.
  • use of a Pol II promoter (e.g. a CmYLCV promoter) to drive gRNA expression may result in increased editing efficiency as compared to a corresponding control promoter (e.g. a Pol III promoter, such as a U6 promoter for example).
  • Use of a Pol II promoter to drive gRNA expression may increase the relative editing efficiency of a target nucleic acid by, for example, at least about 1%, at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, at least about 100%, at least about 125%, at least about 150%, at least about 175%, at least about 200%, at least about 225%, at least about 250%, at least about 275%, or at least about 300% or more as compared to a corresponding control (e.g. a U6 promoter).
  • Various conditions or variables described herein may improve editing efficiency of a Cas12L polypeptide as described herein (e.g. targeting a region of open chromatin for editing, use of a ribozyme in the gRNA targeting, performing editing in a plant genetic background that exhibits reduced transgene silencing, etc.) as compared to corresponding control conditions or variables.
  • Various conditions or variables described herein may increase the relative editing efficiency of a target nucleic acid by, for example, at least about 1%, at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, at least about 100%, at least about 125%, at least about 150%, at least about 175%, at least about 200%, at least about 225%, at least about 250%, at least about 275%, or at least about 300% or more as compared to a corresponding control condition or variable.
  • control conditions or variables will be readily apparent to one of skill in the art depending on the particular editing context.
  • the corresponding control may be as compared to a region of closed chromatin or heterochromatin, editing without the use of a ribozyme, and/or editing in a plant genetic background that exhibits relatively high transgene silencing.
  • control cell may be a cell that does not contain one or more of: (1) a Cas12L polypeptide, (2) a guide RNA, and/or (3) both a Cas12L polypeptide and a guide RNA.
  • qRT-PCR quantitative reverse transcription-polymerase chain reaction
  • kits comprising a polynucleotide, vector, cell, and/or composition described herein.
  • the kit further comprises a package insert comprising instructions for the use of the polynucleotide, vector, cell, and/or composition.
  • the article of manufacture or kit further comprises one or more buffers, e.g., for storing, transferring, or otherwise using the polynucleotide, vector, cell, and/or composition.
  • the kit further comprises one or more containers for storing the polynucleotide, vector, cell, and/or composition.
  • a composition comprising:
  • Aspect 2 The composition of aspect 1, wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having at least 80% amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 5A-5M.
  • Aspect 3 The composition of aspect 1, wherein the CRISPR-Cas effector guide RNA comprises a nucleotide sequence having 80%, 90%, 95%, 98%, 99%, or 100%, nucleotide sequence identity with any one of the nucleotide sequences depicted in FIG. 5A-5M, or is encoded by a nucleic acid comprising a nucleotide sequence having 80%, 90%, 95%, 98%, 99%, or 100%, nucleotide sequence identity with any one of the nucleotide sequences depicted in FIG. 5A-5M.
  • Aspect 4 The composition of any one of aspects 1-3, wherein the CRISPR-Cas effector polypeptide is fused to a nuclear localization signal (NLS).
  • Aspect 5 The composition of any one of aspects 1-4, wherein the composition comprises a lipid.
  • Aspect 6 The composition of any one of aspects 1-4, wherein a) and b) are within a liposome.
  • Aspect 7 The composition of any one of aspects 1-4, wherein a) and b) are within a particle.
  • Aspect 8 The composition of any one of aspects 1-7, comprising one or more of: a buffer, a nuclease inhibitor, and a protease inhibitor.
  • Aspect 9 The composition of any one of aspects 1-8, wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 96% or more identity to the amino acid sequence depicted in any one of FIG. 5A-5M.
  • Aspect 10 The composition of any one of aspects 1-9, wherein the CRISPR-Cas effector polypeptide is a nickase that can cleave only one strand of a double-stranded target nucleic acid molecule.
  • Aspect 11 The composition of any one of aspects 1-9, wherein the CRISPR-Cas effector polypeptide is a catalytically inactive CRISPR-Cas effector polypeptide (dCRISPR-Cas effector).
  • Aspect 12 The composition of any one of aspects 1-11, wherein the CRISPR-Cas effector polypeptide has a length of from 600 amino acids to 800 amino acids.
  • Aspect 13 The composition of any one of aspects 1-12, further comprising a DNA donor template.
  • Aspect 14 The composition of any one of aspects 1-13, wherein the CRISPR-Cas effector guide RNA is a single molecule.
  • Aspect 15 The composition of any one of aspects 1-14, wherein the CRISPR-Cas effector guide RNA comprises one or more of a base modification, a sugar modification, and a backbone modification.
  • a CRISPR-Cas effector fusion polypeptide comprising:
  • Aspect 17 The CRISPR-Cas effector fusion polypeptide of aspect 16, wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 80% or more identity to the amino acid sequence depicted in any one of FIG. 5A-5M.
  • Aspect 18 The CRISPR-Cas effector fusion polypeptide of aspect 16, wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 90% or more identity to the amino acid sequence depicted in any one of FIG. 5A-5M.
  • Aspect 19 The CRISPR-Cas effector fusion polypeptide of any one of aspects 16-18, wherein the CRISPR-Cas effector polypeptide is a nickase that can cleave only one strand of a double-stranded target nucleic acid molecule.
  • Aspect 20 The CRISPR-Cas effector fusion polypeptide of any one of aspects 16-18, wherein the CRISPR-Cas effector polypeptide is a catalytically inactive CRISPR-Cas effector polypeptide (dCRISPR-Cas effector).
  • Aspect 21 The CRISPR-Cas effector fusion polypeptide of any one of aspects 16-20, wherein the CRISPR-Cas effector polypeptide has a length of from 600 amino acids to 800 amino acids.
  • Aspect 22 The CRISPR-Cas effector fusion polypeptide of any one of aspects 16-21, wherein the heterologous polypeptide is fused to the N-terminus and/or the C-terminus of the CRISPR-Cas effector polypeptide.
  • Aspect 23 The CRISPR-Cas effector fusion polypeptide of any one of aspects 16-22, comprising a nuclear localization signal (NLS).
  • Aspect 24 The CRISPR-Cas effector fusion polypeptide of any one of aspects 16-23, wherein the heterologous polypeptide is a targeting polypeptide that provides for binding to a cell surface moiety on a target cell or target cell type.
  • Aspect 25 The CRISPR-Cas effector fusion polypeptide of any one of aspects 16-23, wherein the heterologous polypeptide exhibits enzymatic activity.
  • Aspect 26 The CRISPR-Cas effector fusion polypeptide of aspect 25, wherein the heterologous polypeptide exhibits one or more enzymatic activities selected from: nuclease activity, methyltransferase activity, demethylase activity, DNA repair activity, DNA damage activity, deamination activity, dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer forming activity, integrase activity, transposase activity, recombinase activity, polymerase activity, ligase activity, helicase activity, photolyase activity and glycosylase activity.
  • Aspect 27 The CRISPR-Cas effector fusion polypeptide of aspect 25, wherein the heterologous polypeptide exhibits one or more enzymatic activities selected from: reverse transcriptase activity, nuclease activity, methyltransferase activity, demethylase activity, deamination activity, depurination activity, integrase activity, transposase activity, and recombinase activity.
  • Aspect 28 The CRISPR-Cas effector fusion polypeptide of any one of aspects 16-23, wherein the heterologous polypeptide exhibits an enzymatic activity that modifies a target polypeptide associated with a target nucleic acid.
  • Aspect 29 The CRISPR-Cas effector fusion polypeptide of aspect 28, wherein the heterologous polypeptide exhibits histone modification activity.
  • Aspect 30 The CRISPR-Cas effector fusion polypeptide of aspect 28 or aspect 29, wherein the heterologous polypeptide exhibits one or more enzymatic activities selected from: methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity, demyristoylation activity, glycosylation activity, and deglycosylation activity.
  • Aspect 31 The CRISPR-Cas effector fusion polypeptide of aspect 30, wherein the heterologous polypeptide exhibits one or more enzymatic activities selected from: methyltransferase activity, demethylase activity, acetyltransferase activity, and deacetylase activity.
  • Aspect 32 The CRISPR-Cas effector fusion polypeptide of any one of aspects 16-23, wherein the heterologous polypeptide is an endosomal escape polypeptide.
  • Aspect 33 The CRISPR-Cas effector fusion polypeptide of any one of aspects 16-23, wherein the heterologous polypeptide is a protein that increases or decreases transcription.
  • Aspect 34 The CRISPR-Cas effector fusion polypeptide of aspect 33, wherein the heterologous polypeptide is a transcriptional repressor domain.
  • Aspect 35 The CRISPR-Cas effector fusion polypeptide of aspect 33, wherein the heterologous polypeptide is a transcriptional activation domain.
  • Aspect 36 The CRISPR-Cas effector fusion polypeptide of any one of aspects 16-23, wherein the heterologous polypeptide is a protein binding domain.
  • a nucleic acid comprising a nucleotide sequence encoding the CRISPR-Cas effector fusion polypeptide of any one of aspects 16-36.
  • Aspect 38 The nucleic acid of aspect 37, wherein the nucleotide sequence encoding the CRISPR-Cas effector fusion polypeptide is operably linked to a promoter.
  • Aspect 39 The nucleic acid of aspect 38, wherein the promoter is functional in an archaeal cell.
  • Aspect 40 The nucleic acid of aspect 38, wherein the promoter is functional in a eukaryotic cell.
  • Aspect 41 The nucleic acid of aspect 40, wherein the promoter is functional in one or more of: a plant cell, a fungal cell, an animal cell, a cell of an invertebrate, a fly cell, a cell of a vertebrate, a mammalian cell, a primate cell, a non-human primate cell, and a human cell.
  • Aspect 42 The nucleic acid of any one of aspects 39-41, wherein the promoter is one or more of: a constitutive promoter, an inducible promoter, a cell type-specific promoter, and a tissue-specific promoter.
  • Aspect 43 The nucleic acid of any one of aspects 38-42, wherein the nucleic acid is a recombinant expression vector.
  • Aspect 44 The nucleic acid of aspect 43, wherein the recombinant expression vector is a recombinant adenoassociated viral vector, a recombinant retroviral vector, or a recombinant lentiviral vector.
  • Aspect 45 The nucleic acid of aspect 39, wherein the promoter is functional in a prokaryotic cell.
  • Aspect 46 The nucleic acid of aspect 38, wherein the nucleic acid is an mRNA.
  • Aspect 47 One or more nucleic acids comprising:
  • the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 50% or more amino acid sequence identity to the amino acid sequence set forth in any one of FIG. 5A-5M.
  • Aspect 48 The one or more nucleic acids of aspect 47, wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 60% or more, or 75% or more, amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 5A-5M.
  • Aspect 49 The one or more nucleic acids of aspect 47, wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 85% or more amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 5A-5M.
  • Aspect 50 The one or more nucleic acids of any one of aspects 47-49, wherein the CRISPR-Cas effector guide RNA comprises a nucleotide sequence having 80% or more nucleotide sequence identity with any one of the nucleotide sequences set forth in FIG. 5A-5M; or is encoded by a nucleic acid comprising a nucleotide sequence having 80%, 90%, 95%, 98%, 99%, or 100%, nucleotide sequence identity with any one of the nucleotide sequences depicted in FIG. 5A-5M.
  • Aspect 51 The one or more nucleic acids of any one of aspects 47-50, wherein the CRISPR-Cas effector polypeptide is fused to a nuclear localization signal (NLS).
  • Aspect 52 The one or more nucleic acids of any one of aspects 47-51, wherein the nucleotide sequence encoding the CRISPR-Cas effector guide RNA is operably linked to a promoter.
  • Aspect 53 The one or more nucleic acids of any one of aspects 47-52, wherein the nucleotide sequence encoding the CRISPR-Cas effector polypeptide is operably linked to a promoter.
  • Aspect 54 The one or more nucleic acids of aspect 52 or aspect 53, wherein the promoter operably linked to the nucleotide sequence encoding the CRISPR-Cas effector guide RNA, and/or the promoter operably linked to the nucleotide sequence encoding the CRISPR-Cas effector polypeptide, is functional in a eukaryotic cell.
  • Aspect 55 The one or more nucleic acids of aspect 54, wherein the promoter is functional in one or more of: a plant cell, a fungal cell, an animal cell, a cell of an invertebrate, a fly cell, a cell of a vertebrate, a mammalian cell, a primate cell, a non-human primate cell, and a human cell.
  • Aspect 56 The one or more nucleic acids of any one of aspects 53-55, wherein the promoter is one or more of: a constitutive promoter, an inducible promoter, a cell type-specific promoter, and a tissue-specific promoter.
  • Aspect 57 The one or more nucleic acids of any one of aspects 47-56, wherein the one or more nucleic acids is one or more recombinant expression vectors.
  • Aspect 58 The one or more nucleic acids of aspect 57, wherein the one or more recombinant expression vectors are selected from: one or more adenoassociated viral vectors, one or more recombinant retroviral vectors, or one or more recombinant lentiviral vectors.
  • Aspect 59 The one or more nucleic acids of aspect 53, wherein the promoter is functional in a prokaryotic cell.
  • Aspect 60 A eukaryotic cell comprising one or more of:
  • Aspect 61 The eukaryotic cell of aspect 60, comprising the nucleic acid encoding the CRISPR-Cas effector polypeptide, wherein said nucleic acid is integrated into the genomic DNA of the cell.
  • Aspect 62 The eukaryotic cell of aspect 60 or aspect 61, wherein the eukaryotic cell is a plant cell, a mammalian cell, an insect cell, an arachnid cell, a fungal cell, a bird cell, a reptile cell, an amphibian cell, an invertebrate cell, a mouse cell, a rat cell, a primate cell, a non-human primate cell, or a human cell.
  • Aspect 63 A cell comprising a CRISPR-Cas effector fusion polypeptide, or a nucleic acid comprising a nucleotide sequence encoding the CRISPR-Cas effector fusion polypeptide.
  • Aspect 64 The cell of aspect 63, wherein the cell is a prokaryotic cell.
  • Aspect 65 The cell of aspect 63 or aspect 64, comprising the nucleic acid comprising a nucleotide sequence encoding the CRISPR-Cas effector fusion polypeptide, wherein said nucleic acid molecule is integrated into the genomic DNA of the cell.
  • Aspect 66 A method of modifying a target nucleic acid, the method comprising contacting the target nucleic acid with:
  • the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 50% or more amino acid sequence identity to the amino acid sequence set forth in any one of FIG. 5A-5M.
  • Aspect 67 The method of aspect 66, wherein said modification is cleavage of the target nucleic acid.
  • Aspect 68. The method of aspect 66 or aspect 67, wherein the target nucleic acid is selected from: double stranded DNA, single stranded DNA, RNA, genomic DNA, and extrachromosomal DNA.
  • Aspect 69. The method of any of aspects 66-68, wherein said contacting takes place in vitro outside of a cell.
  • Aspect 70 The method of any of aspects 66-68, wherein said contacting takes place inside of a cell in culture.
  • Aspect 71 The method of any of aspects 66-68, wherein said contacting takes place inside of a cell in vivo.
  • Aspect 72 The method of aspect 70 or aspect 71, wherein the cell is a eukaryotic cell.
  • Aspect 73 The method of aspect 72, wherein the cell is selected from: a plant cell, a fungal cell, a mammalian cell, a reptile cell, an insect cell, an avian cell, a fish cell, a parasite cell, an arthropod cell, a cell of an invertebrate, a cell of a vertebrate, a rodent cell, a mouse cell, a rat cell, a primate cell, a non-human primate cell, and a human cell.
  • Aspect 74 The method of aspect 70 or aspect 71, wherein the cell is a prokaryotic cell.
  • Aspect 75 The method of any one of aspects 66-74, wherein said contacting results in genome editing.
  • Aspect 76 The method of any one of aspects 66-75, wherein said contacting comprises: introducing into a cell: (a) the CRISPR-Cas effector polypeptide, or a nucleic acid comprising a nucleotide sequence encoding the CRISPR-Cas effector polypeptide, and (b) the CRISPR-Cas effector guide RNA, or a nucleic acid comprising a nucleotide sequence encoding the CRISPR-Cas effector guide RNA.
  • Aspect 77 The method of aspect 76, wherein said contacting further comprises: introducing a DNA donor template into the cell.
  • Aspect 78 The method of any one of aspects 66-77, wherein the CRISPR-Cas effector guide RNA comprises a nucleotide sequence having 80% or more nucleotide sequence identity with any one of the nucleotide sequences set forth in FIG. 5A-5M; or is encoded by a nucleic acid comprising a nucleotide sequence having 80%, 90%, 95%, 98%, 99%, or 100%, nucleotide sequence identity with any one of the nucleotide sequences depicted in FIG. 5A-5M.
  • Aspect 79 The method of any one of aspects 66-78, wherein the CRISPR-Cas effector polypeptide is fused to a nuclear localization signal.
  • Aspect 80 A method of modulating transcription from a target DNA, modifying a target nucleic acid, or modifying a protein associated with a target nucleic acid, the method comprising contacting the target nucleic acid with:
  • a) a CRISPR-Cas effector fusion polypeptide comprising a CRISPR-Cas effector polypeptide fused to a heterologous polypeptide; and
  • b) a CRISPR-Cas effector guide RNA comprising a guide sequence that hybridizes to a target sequence of the target nucleic acid
  • the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 50% or more amino acid sequence identity to the amino acid sequence set forth in any one of FIG. 5A-5M.
  • Aspect 81 The method of aspect 80, wherein the CRISPR-Cas effector guide RNA comprises a nucleotide sequence having 80% or more nucleotide sequence identity with any one of the crRNA sequences set forth in FIG. 5A-5M; or is encoded by a nucleic acid comprising a nucleotide sequence having 80%, 90%, 95%, 98%, 99%, or 100%, nucleotide sequence identity with any one of the nucleotide sequences depicted in FIG. 5A-5M.
  • Aspect 82 The method of aspect 80 or aspect 81, wherein the CRISPR-Cas effector fusion polypeptide comprises a nuclear localization signal.
  • Aspect 83 The method of any of aspects 80-82, wherein said modification is not cleavage of the target nucleic acid.
  • Aspect 84 The method of any of aspects 80-83, wherein the target nucleic acid is selected from: double stranded DNA, single stranded DNA, RNA, genomic DNA, and extrachromosomal DNA.
  • Aspect 85 The method of any of aspects 80-84, wherein said contacting takes place in vitro outside of a cell.
  • Aspect 86 The method of any of aspects 80-84, wherein said contacting takes place inside of a cell in culture.
  • Aspect 87 The method of any of aspects 80-84, wherein said contacting takes place inside of a cell in vivo.
  • Aspect 88 The method of aspect 86 or aspect 87, wherein the cell is a eukaryotic cell.
  • Aspect 89 The method of aspect 88, wherein the cell is selected from: a plant cell, a fungal cell, a mammalian cell, a reptile cell, an insect cell, an avian cell, a fish cell, a parasite cell, an arthropod cell, a cell of an invertebrate, a cell of a vertebrate, a rodent cell, a mouse cell, a rat cell, a primate cell, a non-human primate cell, and a human cell.
  • Aspect 90 The method of aspect 86 or aspect 87, wherein the cell is a prokaryotic cell.
  • Aspect 91 The method of any one of aspects 80-90, wherein said contacting comprises: introducing into a cell: (a) the CRISPR-Cas effector fusion polypeptide, or a nucleic acid comprising a nucleotide sequence encoding the CRISPR-Cas effector fusion polypeptide, and (b) the CRISPR-Cas effector guide RNA, or a nucleic acid comprising a nucleotide sequence encoding the CRISPR-Cas effector guide RNA.
  • Aspect 92 The method of any one of aspects 80-91, wherein the CRISPR-Cas effector polypeptide is a catalytically inactive CRISPR-Cas effector polypeptide (dCRISPR-Cas effector).
  • Aspect 93 The method of any one of aspects 80-92, wherein the CRISPR-Cas effector polypeptide has a length of from 275 amino acids to 465 amino acids.
  • Aspect 94 The method of any one of aspects 80-93, wherein the heterologous polypeptide exhibits an enzymatic activity.
  • Aspect 95 The method of aspect 94, wherein the heterologous polypeptide exhibits one or more enzymatic activities selected from: nuclease activity, methyltransferase activity, demethylase activity, DNA repair activity, DNA damage activity, deamination activity, dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer forming activity, integrase activity, transposase activity, recombinase activity, polymerase activity, ligase activity, helicase activity, photolyase activity and glycosylase activity.
  • Aspect 96 The method of aspect 94, wherein the heterologous polypeptide exhibits one or more enzymatic activities selected from: reverse transcriptase activity, nuclease activity, methyltransferase activity, demethylase activity, deamination activity, depurination activity, integrase activity, transposase activity, and recombinase activity.
  • Aspect 97 The method of any one of aspects 80-93, wherein the heterologous polypeptide exhibits an enzymatic activity that modifies a target polypeptide associated with a target nucleic acid.
  • Aspect 98 The method of aspect 97, wherein the heterologous polypeptide exhibits histone modification activity.
  • Aspect 99 The method of aspect 97 or aspect 98, wherein the heterologous polypeptide exhibits one or more enzymatic activities selected from: methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity, demyristoylation activity, glycosylation activity (e.g., from O-GlcNAc transferase), and deglycosylation activity.
  • Aspect 100 The method of aspect 99, wherein the heterologous polypeptide exhibits one or more enzymatic activities selected from: methyltransferase activity, demethylase activity, acetyltransferase activity, and deacetylase activity.
  • Aspect 101 The method of any one of aspects 80-93, wherein the heterologous polypeptide is a protein that increases or decreases transcription.
  • Aspect 102 The method of aspect 101, wherein the heterologous polypeptide is a transcriptional repressor domain.
  • Aspect 103 The method of aspect 101, wherein the heterologous polypeptide is a transcriptional activation domain.
  • Aspect 104 The method of any one of aspects 80-93, wherein the heterologous polypeptide is a protein binding domain.
  • Aspect 105 A transgenic, multicellular, non-human organism whose genome comprises a transgene comprising a nucleotide sequence encoding one or more of:
  • a) a CRISPR-Cas effector polypeptide,
  • the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 50% or more amino acid sequence identity to the amino acid sequence set forth in any one of FIG. 5A-5M.
  • Aspect 106 The transgenic, multicellular, non-human organism of aspect 105, wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 80% or more amino acid sequence identity to the amino acid sequence set forth in any one of FIG. 5A-5M.
  • Aspect 107 The transgenic, multicellular, non-human organism of aspect 105, wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 90% or more amino acid sequence identity to the amino acid sequence set forth in any one of FIG. 5A-5M.
  • Aspect 108 The transgenic, multicellular, non-human organism of any one of aspects 105-107, wherein the organism is a plant, a monocotyledon plant, a dicotyledon plant, an invertebrate animal, an insect, an arthropod, an arachnid, a parasite, a worm, a cnidarian, a vertebrate animal, a fish, a reptile, an amphibian, an ungulate, a bird, a pig, a horse, a sheep, a rodent, a mouse, a rat, or a non-human primate.
  • Aspect 109 A system comprising:
  • i) one or more recombinant expression vectors comprising: i) a nucleotide sequence encoding a CRISPR-Cas effector polypeptide; and ii) a nucleotide sequence encoding a CRISPR-Cas effector guide RNA;
  • j) one or more recombinant expression vectors comprising: i) a nucleotide sequence encoding a CRISPR-Cas effector polypeptide; ii) a nucleotide sequence encoding a CRISPR-Cas effector guide RNA; and iii) a DNA donor template;
  • k) one or more recombinant expression vectors comprising: i) a nucleotide sequence encoding a CRISPR-Cas effector fusion polypeptide; and ii) a nucleotide sequence encoding a CRISPR-Cas effector guide RNA; and
  • l) one or more recombinant expression vectors comprising: i) a nucleotide sequence encoding a CRISPR-Cas effector fusion polypeptide; ii) a nucleotide sequence encoding a CRISPR-Cas effector guide RNA; and iii) a DNA donor template,
  • the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 50% or more amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 5A-5M, and
  • the CRISPR-Cas effector fusion polypeptide is a CRISPR-Cas effector fusion polypeptide of any one of aspects 16-36.
  • Aspect 110 The CRISPR-Cas effector system of aspect 109, wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 80% or more amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 5A-5M.
  • Aspect 111 The CRISPR-Cas effector system of aspect 109, wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 90% or more amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 5A-5M.
  • Aspect 112. The CRISPR-Cas effector system of any of aspects 109-111, wherein the donor template nucleic acid has a length of from 8 nucleotides to 1000 nucleotides.
  • Aspect 113 The CRISPR-Cas effector system of any of aspects 109-111, wherein the donor template nucleic acid has a length of from 25 nucleotides to 500 nucleotides.
  • Aspect 114 A kit comprising the CRISPR-Cas effector system of any one of aspects 109-113.
  • Aspect 115 The kit of aspect 114, wherein the components of the kit are in the same container.
  • Aspect 116 The kit of aspect 114, wherein the components of the kit are in separate containers.
  • Aspect 117 A sterile container comprising the CRISPR-Cas effector system of any one of aspects 109-116.
  • Aspect 118 The sterile container of aspect 117, wherein the container is a syringe.
  • Aspect 119 An implantable device comprising the CRISPR-Cas effector system of any one of aspects 109-116.
  • Aspect 120 The implantable device of aspect 119, wherein the CRISPR-Cas effector system is within a matrix.
  • Aspect 121 The implantable device of aspect 119, wherein the CRISPR-Cas effector system is in a reservoir.
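Several of the aspects above are defined by a minimum percent sequence identity (for example, 50%, 80%, or 90% or more) to a reference sequence. The following is a minimal sketch of how such a threshold could be checked for two sequences that have already been aligned to equal length. The function names, the gap handling, and the choice of denominator (all alignment columns that are not gap-in-both) are illustrative assumptions and are not taken from this disclosure; a real comparison would first produce the alignment with an established alignment tool and would use whichever identity definition governs.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences.

    Assumption (hypothetical convention): '-' marks a gap, columns that are
    gaps in both sequences are ignored, and a gap opposite a residue counts
    as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 50.0) -> bool:
    """True if the aligned pair has at least `threshold` percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For instance, `meets_identity_threshold(query, reference, threshold=80.0)` would apply the 80% cutoff recited in some aspects, with `query` and `reference` being hypothetical aligned strings.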
  • Standard abbreviations may be used, e.g., bp, base pair(s); kb, kilobase(s); pl, picoliter(s); s or sec, second(s); min, minute(s); h or hr, hour(s); aa, amino acid(s); nt, nucleotide(s); i.m., intramuscular(ly); i.p., intraperitoneal(ly); s.c., subcutaneous(ly); and the like.
  • Casλ contains a Target Strand Loading (TSL) domain that likely functions to load the single-stranded DNA substrate into the active site.
  • The TSL sits in a position analogous to the “Nuc” domain that was incorrectly hypothesized in other type V CRISPR-Cas enzymes to be a second nuclease domain responsible for DNA cleavage.
  • Casλ also exhibits a distinct structure in the REC I domain compared to CasΦ (FIG. 2D).
  • The crRNA forms an unexpected shape that blankets the protein, with a divergent recognition lobe in Casλ that binds to distinct sequences and structural features of the guide RNA (FIG. 3C; FIG. 2A-2C). Specifically, possible interactions between primarily polar or charged residues within the REC II domain in Casλ with the conserved motifs of the crRNA hairpin were observed (FIG. 2C). These residues are conserved across the protein family and likely interact either directly with the RNA nucleobases (Q452, N510), or with the RNA phosphate backbone to stabilize the guide (S451, K596, E444, N445, K503, Y619) (FIG. 3C; FIG. 2C).
  • CRISPR-Cas proteins initiate the unwinding of target double-stranded DNA following PAM recognition.
  • This recognition is achieved via interactions with the OBD, REC I, and a five α-helical bundle referred to as the PAM-interacting domain (PID).
  • Residues within the three domains interact with the sugar-phosphate backbone of the target DNA (FIG. 2B) and, in some cases such as residue N102, make direct contact with the nucleobases.
  • The interaction between N102 and nucleobase G(-1) may explain the preference for purines in this position as opposed to pyrimidines since a pyrimidine substitution would result in a base that is too distant from the interacting asparagine (FIG.
  • CRISPR-Cas systems are host-encoded pathways that protect microbes from viral infection using an adaptive RNA-guided mechanism. As described herein, using genome-resolved metagenomics, it was discovered that CRISPR systems are also encoded in diverse bacteriophages, where they occur as divergent and hypercompact anti-viral systems. Bacteriophage-encoded CRISPR systems belong to all six known CRISPR-Cas types, though some lack crucial components, suggesting alternate functional roles or host complementation. Described are multiple new Cas9-like proteins and families related to type V CRISPR-Cas systems, including the Casλ RNA-guided nuclease family.
  • Casλ recognizes double-stranded DNA using a uniquely structured CRISPR RNA (crRNA).
  • The Casλ-RNA-DNA structure determined by cryoelectron microscopy reveals a compact bilobed architecture capable of inducing genome editing in mammalian, Arabidopsis, and hexaploid wheat cells.
  • CRISPR-Cas systems confer resistance in prokaryotes against invading extrachromosomal elements, including viruses and plasmids (FIG. 8A).
  • Microbes capture fragments of foreign genetic elements and incorporate them into their genomic CRISPR array using the Cas1-Cas2 integrase.
  • Subsequent transcription of the array creates CRISPR RNAs (crRNAs) that bind to and direct CRISPR-associated (Cas) nucleases to target complementary nucleic acids.
  • CRISPR-Cas systems include members of all six CRISPR types (types I-VI) as defined by bacterially encoded examples.
  • Evidence was found for new or alternative modes of nucleic acid interference involving phage-encoded type I, III, IV, and VI systems.
  • Phage and phage-like sequences result in a several-fold expansion of CRISPR-Cas9 and -Cas12 enzymes belonging to the type II and type V families that are widely deployed for genome editing applications.
  • Casλ was found to have robust biochemical activity as an RNA-guided double-stranded DNA (dsDNA) cutter.
  • The cryoelectron microscopy (cryo-EM)-determined molecular structure explains its use of a natural single-guide RNA for DNA binding, and cell-based experiments demonstrated robust endogenous genome editing activity in plant and human cells.
  • The compact architecture of Casλ and other phage-encoded CRISPR-Cas proteins will help facilitate vector-based and direct delivery into cells for wide-ranging biotechnological applications.
  • Phages encode CRISPR arrays, but few of these ( ⁇ 6%) include Cas effectors encoded nearby (FIG. 8D). In such situations, phages may produce their own guide RNAs but hijack the Cas effectors provided by their hosts. Consistent with this possibility, ⁇ 1% of phages encode only the Cas1-Cas2 integrase used for the acquisition of new spacers, but no other Cas enzymes. In some cases, phage-encoded Cas1 contained a fusion to another protein such as reverse transcriptase, suggesting the possibility of the acquisition of RNA protospacers into the phage array.
  • Phage-encoded CRISPR-Cas systems include all six known types but with phage-specific properties
  • CRISPR-Cas systems occur in phages, and relative to host-encoded systems, they have various unique properties associated with their existence within phage genomes. These include missing sequence integration or targeting machinery as mentioned above, modified type III and VI systems that mitigate the abortive infection mechanism, and spacers that target other mobile genetic elements.
  • RNA transcripts of other mobile elements such as phage tail proteins or transposases
  • The Cas10 protein converts ATP into a cyclic oligoadenylate (cOA) product, which allosterically activates an auxiliary Csm6 ribonuclease.
  • the activated Csm6 amplifies the immune response by degrading RNA transcripts indiscriminately, thereby destroying the invasive transcriptome or inducing host cell dormancy or death, aborting the phage infectious cycle.
  • The Cas10 subunit contains multiple mutations, hinting at an inability to produce cOA (FIG. 14), and Csm6 or a related CARF-domain ribonuclease is absent, similarly to archaeal Borg elements.
  • type III phage systems may be capable of cleaving key RNA transcripts and genomic DNA of competing mobile elements to interfere with their infectious cycle without activating abortive infection in which cOA signaling triggers trans-cleavage of transcripts in the host cell.
  • Class 2 CRISPR-Cas systems including types II, V, and VI, generally employ single-subunit RNA-guided, nucleic acid-targeting interference enzymes.
  • Cas9 a, b, c
  • Cas12 a, b, c, f, i
  • Miniature CRISPR-associated nucleases were identified in phages harboring both HNH and RuvC catalytic domains characteristic of Cas9. These miniature nucleases constitute phylogenetically distinct clades denoted as types II-x, -y, and -z (FIG. 10A). These systems lack the Cas1, Cas2, or Csn2 sequence acquisition machinery (Figs.
  • Bacteriophage genomes harbor an unusual enrichment of hypercompact type V effectors (Figs. 1D and 3B) compared to abundance in bacteria, including hundreds of variants comprising 44 protein families that are evolutionarily distant from previously reported and experimentally validated miniature type V CRISPR-Cas nucleases, including Cas12f and CasΦ (FIG. 10B).
  • CRISPR arrays associated with the type V families contained spacer sequences targeting competing dsDNA-based extrachromosomal elements that are predicted to infect the same host (FIG. 9). It was found in this work that in multiple related Biggiephages, miniature type V families including Casμ and CasΦ co-occurred with a type I system, termed here type I-X, of which only one example was known previously, bearing similarities to type I-C CRISPR systems but featuring a distinct helicase in place of the processive nuclease Cas3. Biggiephage genomes were recovered over a four-year time span, and remained identical save for their CRISPR arrays, which only exhibited minor differences (Figs. 15C and 15D).
  • the arrays of the type I-X system target the same circular extrachromosomal element, albeit with distinct spacers, as the array associated with co-occurring type V systems.
  • One such cryptic element harbored restriction enzymes and retron-based anti-phage defense systems that could limit Biggiephage infectivity, underscoring the dynamic nature of the evolutionary arms race between mobile elements in competition for host resources.
  • Type IV systems encoded in lytic phage genomes.
  • Type IV systems are predominantly found on plasmids, where their mechanisms of action are poorly understood and they sometimes lack a CRISPR array.
  • A type IV subtype is reported here that lacks the DinG hallmark gene and encodes in its place a CysH-like protein bearing limited similarity to non-CRISPR associated CysH phosphoadenosine 5'-phosphosulfate reductases.
  • the CRISPR array associated with this type IV-F system and a neighboring type V targets the type V Cas gene encoded in a competing cyanophage (FIG. 9).
  • Casλ is a divergent phage-specific CRISPR-Cas enzyme with a unique guide RNA
  • CRISPR arrays associated with Casλ contain spacer sequences complementary to dsDNA-based extrachromosomal elements predicted to infect the same Bacteroidetes host (FIG. 9).
  • CRISPR array transcripts consisting of repeats and spacer sequences acquired from previously encountered mobile genetic elements (MGEs)
  • The Casλ crRNA is predicted to form an elongated hairpin secondary structure not previously observed in guide RNAs associated with Cas12 (FIG. 11A).
  • crRNAs retain a similar predicted hairpin structure across the protein family (FIG. 16B).
  • Casλ crRNAs contain conserved sequences at their 5' and 3' ends and in the center of the RNA (FIG. 11B).
  • The Casλ-induced pre-crRNA processing yields a crRNA spacer sequence that is complementary to DNA target sites 14-17 nucleotides (nt) in length.
  • CRISPR-Cas systems target DNA sequences following or preceding a 2-5 bp Protospacer Adjacent Motif (PAM) for self-versus-non-self discrimination.
  • This assay demonstrated the ability of crRNA-guided Casλ to cleave dsDNA, without requirement for additional RNA components, and a TTR PAM sequence specificity (FIG. 17A).
  • Casλ with host genome-targeting guides showed a reduction in colony-forming units (as a proxy for cell viability) of multiple orders of magnitude, in comparison to a negative control of Casλ with a non-targeting guide (FIG. 11E).
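The TTR PAM specificity and the 14-17 nt target-site length described above suggest a simple way to enumerate candidate target sites in a DNA sequence. The sketch below is illustrative only: it assumes that the target site lies immediately 3' of the PAM on the scanned strand (the text above states only that CRISPR-Cas systems target sequences following or preceding a PAM), it scans a single strand, and the function name `find_target_sites` and its parameters are hypothetical.

```python
import re

# IUPAC nucleotide codes used in PAM strings; R denotes a purine (A or G).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def find_target_sites(dna: str, pam: str = "TTR", target_len: int = 17):
    """Return (pam_start, pam_sequence, target_sequence) tuples.

    Assumption: the target site immediately follows the PAM on the strand
    given. Only this strand is scanned; pass the reverse complement
    separately to cover the other strand.
    """
    dna = dna.upper()
    pam_pattern = "".join(IUPAC[base] for base in pam.upper())
    # A lookahead keeps overlapping PAM occurrences from being skipped.
    regex = re.compile(f"(?=({pam_pattern})([ACGT]{{{target_len}}}))")
    return [(m.start(), m.group(1), m.group(2)) for m in regex.finditer(dna)]
```

Calling `find_target_sites(sequence, target_len=14)` through `target_len=17` would cover the length range reported above; whether a listed site is actually cleaved would still need to be tested experimentally.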
  • Cas ribonucleoproteins induce genome editing in endogenous genes in human and plant cells
  • Casλ RNPs generated promising genome-editing outcomes compared to Cas12a, and in at least one case exceeded Cas12a insertion-deletion (indel) percentages (FIG. 12A).
  • Casλ exhibited editing efficiencies of up to 18% at the endogenous PDS3 gene (FIG. 12B), notably higher than observed previously using CasΦ.
  • the efficiency of editing was dependent on temperature, with no editing occurring at 23°C, an intermediate level of editing occurring at 28°C, and the highest level of editing occurring at 32°C.
  • The RuvC domain of Casλ is split into four parts across the C-terminal half of the protein, likely hindering reliable alignment and clustering with reported Cas12 systems (FIG. 13D).
  • The REC I and REC II domains are also segmented in the protein sequence, with the PAM-interacting domain wedged within REC I as opposed to the N terminus of the protein as seen in CasΦ, but similar to Cas12i.
  • In contrast to CasΦ, Casλ contains a Target Strand Loading (TSL) domain that likely functions to load the single-stranded DNA (ssDNA) substrate, in a position analogous to the “Nuc” domain that was incorrectly hypothesized in other type V CRISPR-Cas enzymes to be a second nuclease domain responsible for DNA cleavage.
  • Casλ also exhibits a distinct structure in the REC I domain compared to CasΦ (FIG. 19D).
  • The crRNA assumes a shape that blankets the protein, with a recognition lobe in Casλ that binds to distinct sequences and structural features of the guide RNA (Figs. 6C and 19A-19C). Specifically, possible interactions between primarily polar or charged residues within the REC II domain in Casλ with the conserved motifs of the crRNA hairpin were observed (Figs. 4B and 19C). These residues are conserved across the protein family and likely interact either directly with the RNA nucleobases (Q452, N510), or with the RNA phosphate backbone to stabilize the guide (S451, K496, E444, N445, K503, Y619) (Figs. 6C and 19C).
  • CRISPR-Cas proteins initiate the unwinding of target dsDNA following PAM recognition.
  • this recognition is achieved via interactions with the oligonucleotide-binding domain (OBD), REC I, and a five α-helical bundle referred to as the PAM-interacting domain (PID).
  • OBD oligonucleotide-binding domain
  • PID PAM-interacting domain
  • Residues within the three domains interact with the sugar-phosphate backbone of the target DNA (FIG. 19B) and, in some cases such as residue N102, interact directly with the nucleobases.
  • phage genomes are a natural reservoir of miniature single-effector CRISPR-Cas systems, including DNA-targeting type II and type V enzymes belonging to the Cas9 and Cas12 superfamilies.
  • Greek nomenclature was used here to indicate the phage origins of Casμ, CasΩ, and Casλ, extending the naming convention established by phage-encoded CasΦ.
  • the notable abundance of miniature Cas12-family enzymes in phages may reflect the size restriction of many phage genomes.
  • Because phages evolve quickly, they serve as important sources of new, divergent, or hypercompact CRISPR systems. Some of these, such as Casλ, bear sufficient sequence-level divergence to cluster separately from Cas12 and Cas9 systems and obscure a direct evolutionary relationship with known Cas superfamilies. Nonetheless, Casλ's structure, domain composition, and biochemical mechanism are similar to other type V enzymes. This finding implies that within phage genomes, distinct type V nucleases may have evolved multiple times from ancestral transposon-encoded TnpB families, which also function as RNA-guided nucleases.
  • the molecular structure of the Casλ-crRNA-dsDNA complex reported in this study illustrates possible convergent evolution of RNA-guided effectors, despite extreme sequence divergence and distinct ancestral protein origins.
  • the domain architecture of Casλ exhibits more segmentation and likely structural rearrangements than have been seen in other Cas12-family enzymes, with multiple functional domains split at the sequence level into separate segments that assemble during protein folding. This unique domain organization may explain the difficulty in accurately aligning Casλ to previously reported enzymes, despite overall structural similarity.
  • This segmented domain composition does not compromise genome editing activity, as shown, e.g., for Casλ-based editing of human, Arabidopsis, and wheat cells.
  • HEK293T cells were obtained from the University of California Berkeley Cell Culture Facility.
  • HEK293T cells were female in origin and grown in DMEM media (Corning) containing 10% fetal bovine serum (VWR) and 100 U/mL of penicillin-streptomycin (Gibco) at 37°C with 5% CO2.
  • VWR fetal bovine serum
  • Gibco penicillin-streptomycin
  • PDS3 gene-editing was tested in A. thaliana protoplasts isolated from the leaves of 4-week-old plants. Following RNP screening experiments, protoplasts were incubated in W5 solution (4 mM MES pH 5.7, 0.5 M mannitol, 20 mM KCl) at RT for 12 h, then moved to 37°C for 2.5 h, followed by a final incubation at room temperature for 48 h.
  • W5 solution 4 mM MES pH 5.7, 0.5 M mannitol, 20 mM KCl
  • CRISPR-RNA (crRNA) repeats from phage-encoded CRISPR loci were identified using MinCED (github.com/ctSkennerton/minced). The repeats were compared by generating pairwise similarity scores using the Needleman-Wunsch algorithm. A heatmap was built using the similarity score matrix and hierarchical clustering produced dendrograms that were overlaid onto the heatmap to delineate different clusters of repeats. The RNA structures were predicted with ViennaRNA (ref. 41).
  • PAM depletion assays were performed with plasmids containing the casλ protein-coding sequence as derived from metagenomics and a mini CRISPR targeting guide (pBAS18), or with plasmids that contained only the casλ gene and a non-targeting guide (pBAS12). Assays were performed as three individual biological replicates. Plasmids containing casλ and mini CRISPRs were transformed into E. coli BL21(DE3) (NEB). Subsequently, electrocompetent cells were prepared by ice-cold H2O and 10% glycerol washing. A plasmid library was constructed with 8 randomized nucleotides upstream of the (5') end of the target sequence.
  • Competent cells were transformed in triplicate by electroporation with 200 ng library plasmids (0.1 mm electroporation cuvettes (Bio-Rad) on a Micropulser electroporator (Bio-Rad)). After a 2 h recovery period, cells were plated on selective media and colony forming units were determined to ensure appropriate coverage of all possible combinations of the randomized 5' PAM region.
  • Strains were grown at 25 °C for 48 h on media containing appropriate antibiotics (either 100 µg/mL carbenicillin and 34 µg/mL chloramphenicol, or 100 µg/mL carbenicillin and 50 µg/mL kanamycin) and 0.05 mM isopropyl-β-D-thiogalactopyranoside (IPTG), or 200 nM anhydrotetracycline (aTc), depending on the vector to ensure propagation of plasmids and Casλ effector production. Subsequently, propagated plasmids were isolated using a QIAprep Spin Miniprep Kit (Qiagen).
  • appropriate antibiotics either 100 µg/mL carbenicillin and 34 µg/mL chloramphenicol, or 100 µg/mL carbenicillin and 50 µg/mL kanamycin
  • IPTG isopropyl-β-D-thiogalactopyranoside
  • a flp recombination assay was performed in E. coli to eliminate the Kanamycin resistance cassette from E. coli strains that contain GFP and RFP expression cassettes integrated into the genome. Individual colonies of the E. coliMAm were picked to inoculate three 5 mL (LB) starter cultures to prepare electrocompetent cells the following day. 100 mL (LB) main cultures were inoculated from the starter cultures and grown vigorously shaking at 37°C to an OD600 of 0.6-0.7 before preparation of electrocompetent cells by repeated ice-cold H2O and 10% glycerol washes.
  • Casλ vectors containing the codon-optimized casλ1 gene and a guide composed of its cognate repeat element and selected spacers targeting the GFP DNA within the resulting E. coliMAon strain (pBAS41, pBAS42, pBAS43, pBAS44) were subcloned from pBAS12.
  • Casλ vectors containing Casλ1 and a guide composed of a non-cognate repeat unit from casλ2 and a GFP-targeting spacer (TAGCATCACCTTCACCCTCTCCACGGACAG) (SEQ ID NO: 158) were also subcloned to form pBAS40.
  • the Casλ vectors and Casλ vectors with a non-targeting guide control plasmid were transformed into 25 µL of electrocompetent cells with 100 ng of plasmid via electroporation in 0.1 mm electroporation cuvettes (Bio-Rad) on a Micropulser electroporator (Bio-Rad). Cells were recovered in 1 mL recovery medium (Lucigen) shaking at 37°C for 1 h. Ten-fold dilution series were then prepared and 3.5 µL of the respective dilutions were spot-plated on LB-Agar containing the appropriate antibiotics and IPTG inducer. Plates were incubated overnight at 37°C and colonies were counted the following day to determine the transformation efficiency.
  • coli colonies that have grown in both cases even in undiluted samples is also indicative of possible trans-cleavage of nucleic acids (RNA or DNA), which can be used for diagnostic purposes by providing a sample containing the target nucleic acid with the Casλ RNP and a ssDNA fluorophore-quencher (ssDNA-FQ) reporter or RNA fluorophore-quencher (ssRNA-FQ) reporter molecule, generating a strong fluorescence signal in the presence of the target nucleic acid compared to a markedly lower fluorescence signal in its absence.
  • ssDNA-FQ ssDNA fluorophore-quencher
  • ssRNA-FQ RNA fluorophore-quencher
  • Casλ overexpression vectors containing a His-Tag were transformed into chemically competent E. coli BL21(DE3)-Star (QB3-Macrolab, UC Berkeley) and incubated overnight at 37°C on LB-Kan agar plates (50 µg/mL Kanamycin). Single colonies were picked to inoculate 50 mL (LB, Kanamycin 50 µg/mL) starter cultures which were incubated at 37°C shaking vigorously overnight.
  • the soluble fraction was loaded on a 5 mL Ni-NTA Superflow Cartridge (Qiagen) which had been pre-equilibrated in the same wash buffer. Bound proteins were washed with 20 column volumes (CV) wash buffer and subsequently eluted in 5 CV elution buffer (50 mM HEPES-Na pH 7.5 RT, 500 mM NaCl, 500 mM imidazole, 5% glycerol, and 0.5 mM TCEP).
  • the eluted proteins were concentrated to 1 mL before injection into a HiLoad 16/600 Superdex 200pg column (GE Healthcare) pre-equilibrated in size-exclusion chromatography buffer (20 mM HEPES-Na pH 7.5 RT, 500 mM NaCl, 5% glycerol, and 0.5 mM TCEP). Peak fractions were concentrated to 1 mL and concentrations were determined using a NanoDrop 8000 Spectrophotometer (Thermo Scientific). Proteins were purified at a constant temperature of 4°C and concentrated proteins were kept on ice to prevent aggregation, snap-frozen in liquid nitrogen, and stored at -80°C. SDS-PAGE gel electrophoresis of Casλ at varying stages of protein purification showed a protein size in line with computationally predicted values of 85 kDa.
  • RNA cleavage buffer containing 20 mM Tris-Cl (pH 7.5 at 37°C), 150 mM KCl, 5 mM MgCl2, 1 mM TCEP, and 5% (v/v) glycerol.
  • Pre-crRNA substrates were 5'-radiolabeled with T4 PNK (NEB) in the presence of γ-32P-ATP.
  • concentrations of Casλ and 32P-labeled pre-crRNA substrates were 100 and 3 nM, respectively.
  • RNA hydrolysis ladders were prepared by incubating RNA probes in 1X RNA Alkaline Hydrolysis Buffer (Invitrogen) at 95°C before the addition of 2x Quench Buffer.
  • crRNA oligonucleotides were synthesized by IDT and dissolved in DEPC-treated ddH2O to a concentration of 0.5 mM. Subsequently, the crRNA was heated to 65 °C for 3 min and allowed to cool down to room temperature. Casλ RNP complexes were reconstituted at a concentration of 10 µM by incubation of 10 µM Casλ and 12 µM crRNA for 10 min at RT in 2x cleavage buffer (20 mM Hepes-Na pH 7.5, 300 mM KCl, 10 mM MgCl2, 20% glycerol, 1 mM TCEP).
  • RNPs were aliquoted to a volume of 10 µL, flash-frozen in liquid nitrogen, and stored at -80°C. RNP aliquots were thawed on ice before experimental use. Substrates were 5'-end-labelled using T4-PNK (NEB) in the presence of 32P-γ-ATP. Oligonucleotide-duplex targets were generated by combining 32P-labelled and unlabelled complementary oligonucleotides in a 1:1.5 molar ratio.
  • Oligos were hybridized to a DNA-duplex concentration of 50 nM in hybridization buffer (10 mM Hepes-Na pH 7.5 RT, 150 mM NaCl), by heating for 5 min to 95°C and a slow cool down to RT in a heating block. Cleavage reactions were initiated by combining 200 nM RNP with 2 nM substrate in CB buffer and subsequently incubated at 37°C.
  • hybridization buffer 10 mM Hepes-Na pH 7.5 RT, 150 mM NaCl
  • DNA oligo activators were ordered from IDT to contain mismatches at each respective position (A→C, T→G, C→A, G→T).
  • Casλ RNPs were prepared as described above. Reactions were started by combining 100 nM RNP (100 nM Casλ, 120 nM crRNA), 100 nM DNase Alert (IDT) FQ probe, with and without activator ssDNA and with the addition of a non-targeting guide or activator control in cleavage buffer in a 384 well flat bottom black polystyrene assay plate (#3820, Corning).
  • the PAM binding assay was conducted using NEB 5-alpha Competent E. coli cells. Plasmids containing the type I-X system included a targeting or non-targeting guide downstream of T7 promoters. PAM library plasmids contained sfGFP under the control of an araBAD promoter; downstream of the promoter was a six-nucleotide variable region of potential PAM sequences, resulting in loss of sfGFP fluorescence for a successful PAM binding event. All cultures used 2xYT media and were supplemented with kanamycin and ampicillin as needed for plasmid maintenance. Cell densities were maintained at greater than 100x library coverage throughout the assay.
  • Transformations of type I-X systems with guide and library plasmids were conducted consecutively.
  • Type I-X systems with guides were transformed into NEB 5-alpha Competent E. coli cells following manufacturer’s instructions. Individual colonies were incubated at 37°C overnight at 250 RPM. Non-transformed cultures were included for library only and no plasmid controls. Cells were back diluted 1000x and cultured to ABS600 ~0.6, pelleted, washed 3 times with sterile water, and resuspended in 10% glycerol to make them electrocompetent.
  • Type I-X systems with guide, and non-transformed cultures were electroporated at 1800V with 100 ng of PAM library stock and recovered for 1 h in SOC media. Recovered cells were plated with appropriate antibiotics and incubated overnight. Plates were scraped, resuspended, and incubated at 37°C 250 RPM for 3 h. 25% glycerol stocks were stored at -80°C.
  • Targeting, non-targeting, and library only strains were individually prepared for next generation sequencing by first purifying plasmid DNA using a Qiagen HiSpeed Plasmid Maxi Kit. Plates were gently scraped and resuspended in ~50 mL 2xYT prior to pelleting. Concentrations were determined with a Nanodrop. In conjunction with the original naive PAM library stock control, PAM sequences were amplified using primers containing the 5' stub sequence GCTCTTCCGATCT (SEQ ID NO: 159). Samples were submitted to the Innovative Genomics Core for completion of library preparation and iSeq sequencing at greater than 100x library coverage.
  • RNPs were formed in the SF nucleofection buffer (Lonza) with 100 pmol protein and 120 pmol crRNA at 10 µM concentration for 10 min at RT. 78 pmol (1 µL) of IDT Cas12a electroporation enhancer was then added. HEK293T cells (University of California Berkeley Cell Culture Facility) were added in 10 µL SF nucleofection buffer at 200,000 cells per nucleofection. 21 µL reactions were loaded into cuvettes and electroporated with pulse code DS-150 in a 4D-nucleofector (Lonza). Nucleofections were performed in triplicate for each guide RNA tested.
  • Leaf strips in solution were vacuum infiltrated for 30 min in darkness and then incubated for 6 h shaking at 70 rpm. After the incubation, the enzyme/protoplast solution was diluted with equal volume of W5 solution (2 mM MES pH 5.7, 154 mM NaCl, 125 mM CaCl2, 5 mM KCl) and filtered through 40 µm cell strainers. Protoplasts were spun down at 80g for 3 min, then resuspended in 15 mL W5 solution and left to aggregate on ice for 60 min.
  • W5 solution 2 mM MES pH 5.7, 154 mM NaCl, 125 mM CaCl2, 5 mM KCl
  • MMG solution 4 mM MES pH 5.7, 0.4 M mannitol, 15 mM MgCl2
  • Casλ RNP complexes were reconstituted with 6 µM Casλ protein, purified as described, and 10 µM guide RNA assembled in RNP reconstitution buffer (20 mM Hepes-Na pH 7.5, 300 mM KCl, 10 mM MgCl2, 20% glycerol, 1 mM TCEP) and incubated for 20 min at 37°C. 25 µL of 6 µM assembled RNP were added to a 1.5 mL tube, then mixed with 200 µL protoplasts.
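By way of non-limiting illustration, the TTR PAM preference noted above (R being the IUPAC code for A or G) can be used to list candidate target sites on a DNA strand. The following Python sketch assumes a 5' PAM immediately followed by the protospacer, consistent with the randomized region being placed upstream of the target in the PAM depletion assay. The function name and the default protospacer length of 20 nucleotides are hypothetical choices made for the example and are not taken from the disclosure.

```python
import re

# IUPAC code R denotes a purine (A or G), so TTR expands to TTA or TTG.
# A lookahead is used so that overlapping PAM occurrences are all reported.
TTR_PAM = re.compile(r"(?=(TT[AG]))")


def find_ttr_pam_sites(seq, protospacer_len=20):
    """Return (pam_index, pam, protospacer) for each 5'-TTR-3' PAM on the
    given strand that is followed by a full-length protospacer.

    protospacer_len is an illustrative assumption, not a disclosed value.
    """
    seq = seq.upper()
    sites = []
    for match in TTR_PAM.finditer(seq):
        pam_index = match.start()
        start = pam_index + 3  # protospacer begins right after the 3-nt PAM
        protospacer = seq[start:start + protospacer_len]
        if len(protospacer) == protospacer_len:
            sites.append((pam_index, match.group(1), protospacer))
    return sites
```

Only the strand supplied is scanned; the reverse complement would need to be passed separately to cover both strands.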
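The mismatch-tolerance experiment described above uses activators carrying a single substitution at each position, following the scheme A→C, T→G, C→A, G→T. A minimal Python sketch of how such a panel could be enumerated is given below. The function name is hypothetical, and keying the perfectly matched control as position 0 follows the figure legend convention in which "0" indicates no mismatches.

```python
# Substitution scheme stated for the mismatch activators.
MISMATCH = {"A": "C", "T": "G", "C": "A", "G": "T"}


def single_mismatch_activators(target):
    """Map position -> activator sequence with one mismatch at that position.

    Position 0 is the unmodified control; positions are 1-based along the
    target sequence as written.
    """
    target = target.upper()
    variants = {0: target}
    for i, base in enumerate(target):
        variants[i + 1] = target[:i] + MISMATCH[base] + target[i + 1:]
    return variants
```

For a target of length n the panel contains n + 1 sequences, one per position plus the control.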
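The repeat comparison described above relies on pairwise Needleman-Wunsch similarity scores assembled into a matrix before clustering. The Python sketch below shows one way such a matrix could be computed. The scoring values (match +1, mismatch -1, linear gap -1) and the function names are assumptions chosen for the example; the disclosure does not state the parameters or software used for this step, and the hierarchical clustering itself is not shown.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of sequences a and b (linear gap penalty).

    Scoring parameters are illustrative assumptions. Only the previous row
    of the dynamic-programming table is kept, since only the score is needed.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def similarity_matrix(repeats):
    """Square matrix of pairwise global alignment scores for a list of repeats."""
    n = len(repeats)
    return [
        [needleman_wunsch_score(repeats[i], repeats[j]) for j in range(n)]
        for i in range(n)
    ]
```

The resulting matrix is symmetric under this scoring and could be passed to any standard hierarchical clustering routine to produce a dendrogram.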

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Zoology (AREA)
  • Organic Chemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Wood Science & Technology (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Biotechnology (AREA)
  • Microbiology (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Mycology (AREA)
  • Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Plant Pathology (AREA)
  • Medicinal Chemistry (AREA)
  • Cell Biology (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)
  • Peptides Or Proteins (AREA)

Abstract

The present disclosure provides CRISPR-Cas effector polypeptides that are referred to herein as "Cas12L" polypeptides, "Casλ" polypeptides, or "Cas-lambda" polypeptides. The present disclosure provides a nucleic acid encoding a Casλ polypeptide of the present disclosure. The present disclosure provides methods of modifying a target nucleic acid using a Casλ polypeptide.

Description

CRISPR-CAS EFFECTOR POLYPEPTIDES AND METHODS OF USE THEREOF
CROSS REFERENCE
[0001] This application claims benefit of U.S. Provisional Patent Application No. 63/354,590 filed June 22, 2022, which application is incorporated herein by reference in its entirety.
INCORPORATION BY REFERENCE OF SEQUENCE LISTING PROVIDED AS AN XML FILE
[0002] A Sequence Listing is provided herewith as a Sequence Listing XML, “BERK-472WO_SeqList.xml” created on June 21, 2023 and having a size of 203,574 bytes. The contents of the Sequence Listing XML are incorporated by reference herein in their entirety.
INTRODUCTION
[0003] Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-Cas systems comprise a CRISPR-associated (Cas) effector polypeptide and a guide nucleic acid. Such CRISPR-Cas systems can bind to and modify a targeted nucleic acid. The programmable nature of these CRISPR-Cas effector systems has facilitated their use as a versatile technology for use in, e.g., gene editing.
SUMMARY
[0004] The present disclosure provides CRISPR-Cas effector polypeptides that are referred to herein as “Cas12L” polypeptides, “Casλ” polypeptides, or “Cas-lambda” polypeptides. The present disclosure provides a nucleic acid encoding a Casλ polypeptide of the present disclosure. The present disclosure provides methods of modifying a target nucleic acid using a Casλ polypeptide.
BRIEF DESCRIPTION OF THE DRAWINGS
[0005] FIG. 1 is a phylogenetic tree comparing various Casλ (Cas-lambda) sequences, with CasPhi as an outgroup.
[0006] FIG. 2A-2E (A) Cryo-EM maps of the Casλ-guide RNA-DNA complex in two 90°-rotated orientations. (B) Cartoon representation of the Casλ-gRNA-DNA complex. Insets highlight residues predicted to be responsible for PAM recognition. Hydrogen bonds are shown as dashed lines. (C) Model of guide RNA-target DNA complex, with insets highlighting residues conserved across the protein family that are predicted to be interacting with the RNA. (D) Close-up views of the residues predicted to be responsible for recognition of the seed and low mismatch tolerance regions observed in (Fig 6F). (E) Direct comparison of Casλ and CasΦ with a dashed bubble highlighting the Casλ TSL domain. Differences in RecI (blue) can also be observed between the two proteins.
[0007] FIG. 3A-3E depict (A) Schematic representation of the Casλ-gRNA-DNA complex. Disordered linkers are shown as dotted lines. Insets for protein-DNA interactions are shown in (Fig. S5). (B) Cryo-EM maps of the Casλ-guide RNA-DNA complex. The target strand is shown in cyan and the non-target strand is shown in magenta. (C) Cryo-EM-based model of guide RNA-target DNA complex. (D) Schematic of the domain organization and secondary structure of Casλ. (E) Hierarchical clustering dendrogram of different repeats with their predicted secondary structures. Casλ can still cleave ssDNA in trans with guide RNAs consisting of non-cognate repeats that are divergent at the sequence level.
[0008] FIG. 4 depicts fluorescence output using oligonucleotide activators with mismatches at each respective position along the target DNA. “0” indicates no mismatches (control).
[0009] FIG. 5A-5M provide amino acid sequences of Cas-lambda polypeptides. Protein sequences, top to bottom: SEQ ID NOs.:7-19. Repeat sequences, top to bottom: SEQ ID NOs.:20-26.
[0010] FIG. 6 provides the nucleotide sequence of a UBQ10 promoter (SEQ ID NO:5).
[0011] FIG. 7A-7S provides an alignment of amino acid sequences of various CasL proteins. FIG. 7 also provides a consensus sequence (Top (“Cas12L_1_257905508”) to Bottom (“CasL_61”): SEQ ID NOs.:27-68).
[0012] FIG. 8A-8D Diversity of CRISPR-encoding phages and the hosts they predate. (FIG. 8A) An illustration of the mechanism of CRISPR interference as an anti-viral system used by bacteria and bacteriophages. (FIG. 8B) Protein-clustering network analysis based on the number of shared protein clusters between the CRISPR-encoding phages in this study and RefSeq phages. The plot is composed of viral clusters where each node represents a phage genome, and each edge is the hypergeometric similarity between genomes based on shared protein clusters. (FIG. 8C) Genome size distribution of circularized CRISPR-containing phages from this study (n = 152). (FIG. 8D) A heatmap showing the number of CRISPR phage genomes containing each CRISPR type with respect to major bacterial phyla which they infect. “Unknown” indicates CRISPR phages that could not be assigned to any of the known types. Phyla are organized in the plot from left to right based on CRISPR array abundance and are concordant with the number of bacterial genomes available for each phylum.
[0013] FIG. 9A-9E Diversity of phage-encoded CRISPR systems highlights anti-phage capability. (FIG. 9A) Phage CRISPR spacers target other mobile genetic elements across bacterial phyla to abrogate superinfection via diverse mechanisms. (FIG. 9B- FIG. 9E) Graphical illustrations of representative phage CRISPR loci harboring known and novel subtypes and their proposed mechanisms and functions as determined via spacer targeting and protein sequence analysis. Special consideration is given to phages carrying multiple loci.
[0014] FIG. 10A-10C Diversity of Class 2 CRISPR-Cas systems on phage and phage-like genomes. (FIG. 10A) Maximum likelihood phylogenetic tree of phage encoded and bacterially encoded type II nucleases and respective predicted ancestral IscB nucleases. Bootstrap and approximate likelihood-ratio test values > 90 are denoted on the branches, and the bootstrap support percentages at branch points are shown in numbers. Bottom illustration of genomic CRISPR-Cas loci of type II and representative type V systems previously employed in genome editing applications. (FIG. 10B) Maximum likelihood phylogenetic tree of phage-like (purple) and previously reported (teal) bacterially encoded type V nuclease clades and respective predicted ancestral TnpB nucleases. Outer ring denotes protein sizes with purple indicating previously reported or publicly available sequences and pink denoting systems from this study. (FIG. 10C) Maximum likelihood phylogenetic tree of phage and previously reported bacterially encoded type VI nucleases.
[0015] FIG. 11A-11H Casλ processes its own crRNA and cleaves dsDNA. (FIG. 11A) Casλ1 from huge Mahaphages displays a unique crRNA hairpin compared to known Cas12 enzymes, and is reminiscent of stem-loop 1 of the engineered SpyCas9 single gRNA (sgRNA). (FIG. 11B) Casλ repeats uniquely display highly conserved nucleotide sequences at the 5', 3', and center of the RNA. (FIG. 11C) 5' radiolabeling of crRNAs indicates that Casλ1 uniquely processes its own crRNA in the spacer region (or 3' end). OH-ladder enables the pre-crRNA processing sites (red triangles) to be derived. (FIG. 11D) Processing of the repeat-spacer-repeat pre-crRNA substrate occurs similarly to (C) in the spacer region and does not occur in the absence of Mg2+, indicating a role for the RuvC in the processing mechanism. (FIG. 11E) Casλ with targeting or non-targeting guides validates its capacity to cleave DNA flanking experimentally determined PAMs in E. coli at different dilutions. (FIG. 11F) Cleavage assay targeting dsDNA for mapping of the cleavage structure. (FIG. 11G) Scheme illustrating the DNA cleavage pattern. (FIG. 11H) Efficiency and kinetics of DNA cleavage of NTS and TS (n = 3 each, mean ± SD).
[0016] FIG. 12A-12D Casλ RNPs are functional for editing endogenous genes in human, Arabidopsis, and wheat cells with large deletion profiles. (FIG. 12A) Indel efficiency using Casλ and Cas12a RNPs with identical spacers targeting VEGF, and Casλ RNPs targeting EMX1 genes in HEK293T cells, and a schematic of the in vitro model of DNA cleavage outcomes following DNA cleavage by Casλ. (FIG. 12B and FIG. 12C) Indel efficiencies in Arabidopsis thaliana protoplasts show significantly higher levels of editing than previously achieved by CasΦ for the same PDS3 gene, and (C) in wheat protoplasts targeting the disease resistance gene Snn5. (FIG. 12D) Indel profiles generated by Casλ RNP administration show primarily large deletions, and little change without Casλ.
[0017] FIG. 13A-13F. Structure of Casλ-gRNA-DNA complex. (FIG. 13A) Schematic representation of the Casλ-gRNA-DNA complex. Disordered linkers are shown as dotted lines. Insets for protein-DNA interactions are shown in FIG. 18. (FIG. 13B) Cryo-EM maps of the Casλ-guide RNA-DNA complex. The target strand is shown in cyan and the non-target strand is shown in magenta. (FIG. 13C) Cryo-EM-based model of guide RNA-target DNA complex. (FIG. 13D) Schematic of the domain organization and secondary structure of Casλ. (FIG. 13E) Hierarchical clustering dendrogram of different repeats with their predicted secondary structures. Casλ can still cleave ssDNA in trans with guide RNAs consisting of non-cognate repeats that are divergent at the sequence level. (FIG. 13F) Fluorescence output using oligonucleotide activators with mismatches at each respective position along the target DNA. “0” indicates no mismatches (control). Insets relating to protein-DNA interactions are shown in FIG. 19.
[0018] FIG. 14A-14B Sequence similarity of phage-encoded CRISPR-Cas systems, related to FIG. 10. (FIG. 14A) Alignment of Cas10 effectors from Huge Phages with those sourced from bacterial genomes. Phage type III Cas10’s are predicted to cleave DNA via Cas10 HD nuclease, but lack the residues required for the Palm domain to generate cyclic oligonucleotide signaling molecules. (FIG. 14B) Alignment of Cas7 proteins from phages with those sourced from bacterial genomes. Phage type III Cas7’s have conserved motifs that are predicted to cleave RNA.
[0019] FIG. 15A-15D Structure of phage-encoded Cas9-like systems and comparison of type I-X CRISPR arrays, related to Figs. 1 and 2. (FIG. 15A) Predicted domain organization for hypercompact phage-encoded Cas9-like systems. (FIG. 15B) Predicted models for Cas9-like phage-encoded systems. (FIG. 15C) Comparison of type I-X and CasΦ-encoded Biggiephages recovered across a four-year time frame using Mauve, with the CRISPR repeat locations denoted in blue. Identical sequences at the nucleotide level are shown in green, with differences shown in brown or red. (FIG. 15D) Phage type I-X CRISPR arrays from metagenomes sampled from the same site over the span of four years show remarkably stable arrays.
[0020] FIG. 16A-16D Divergent properties of Casλ, related to FIG. 11. (FIG. 16A) Casλ remote homolog searches across public databases led to poor hits and no similarity to known CRISPR-Cas proteins, where only poor hits (green-black) were observed in one RuvC motif. (FIG. 16B) Comparison of crRNA repeat similarity across orthologs. (FIG. 16C) Comparison of protein similarity across orthologs. (FIG. 16D) A time-series experiment incubating Casλ with 5' radiolabeled crRNAs with the product run on a 20% Urea PAGE gel supports the finding that Casλ uniquely processes its own single crRNA in the spacer region (or 3' end).
[0021] FIG. 17A-17C Casλ PAM specificity and comparison with other Cas ortholog trans-cleavage and indel profiles, related to FIG. 11. (FIG. 17A) The most depleted 5' PAMs resulting from the PAM depletion assay, indicating DNA recognition and cleavage preferences for Casλ1. (FIG. 17B) DNase Alert trans-cleavage assay with the same molarities of Cas12a, Casλ, and CasΦ targeting the same ssDNA activator. (FIG. 17C) Casλ indel profile in HEK293T cells compared to AsCas12a. Guide 107 targets the antisense strand, while guide 109 targets the sense strand of VEGFa.
[0022] FIG. 18A-18H Cryo-EM workflow, related to FIG. 13. (FIG. 18A) Map generation pipeline in cryoSPARC. (FIG. 18B-FIG. 18D) Representative 2D class averages of the final set of particles, (C) the corresponding 3D maps resulting from ab initio reconstruction, and further (D) from heterogeneous refinement. (FIG. 18E) Local resolution map as calculated in cryoSPARC v3.3. (FIG. 18F) Orientation distribution of the final set of refined particles. (FIG. 18G and FIG. 18H) gold standard, and (H) map versus model FSC curves of the model refined to the LocSpiral map and plotted with the final cryoSPARC sharp experimental map.
[0023] FIG. 19A-19E Structure of Casλ ternary complex, related to FIG. 13. (FIG. 19A) Cryo-EM maps of the Casλ-guide RNA-DNA complex in two 90°-rotated orientations. (FIG. 19B) Cartoon representation of the Casλ-gRNA-DNA complex. Insets highlight residues N102, S253, N254 predicted to be responsible for PAM recognition. Hydrogen bonds are shown as dashed lines. (FIG. 19C) Model of guide RNA-target DNA complex, with insets highlighting residues conserved across the protein family that are predicted to be interacting with the RNA. (FIG. 19D) Close-up views of the residues predicted to be responsible for recognition of the seed and low mismatch tolerance regions observed in (FIG. 13F). (FIG. 19E) Direct comparison of Casλ and CasΦ (PDB-ID: 7LYS) with a dashed bubble highlighting the Casλ TSL domain. Differences in RecI (blue) can also be observed between the two proteins.
[0024] FIG. 20 Structural comparison of Cas12 orthologs, related to FIG. 13. Structural comparison of all DNA-targeting Cas12’s in order of increasing RNP size: CasΦ (7LYS; ref. 26), CasX (6NY3; ref. 27), Cas12i (6W5C; ref. 50), Cas12a (5XUS; ref. 51), Cas12b (5WTI; ref. 52), Cas12f (7C7L; ref. 53).
[0025] FIG. 21A-21F Trans-cleavage assay, related to FIG. 17. (FIG. 21A-FIG. 21F) Trans-cleavage assays conducted with RNase Alert reporter substrate at decreasing RNP concentrations (FIG. 21A-FIG. 21C) for binary and ternary complexes of Casλ, and with (FIG. 21D) PolyU RNA reporter substrates, and testing cell viability assays with cells expressing Casλ in conjunction with (FIG. 21E) targeting and (FIG. 21F) non-targeting guides.
DEFINITIONS
[0026] “Heterologous,” as used herein, means a nucleotide sequence or an amino acid sequence that is not found in the native nucleic acid or protein, respectively. For example, relative to a subject CRISPR-Cas effector polypeptide, a heterologous polypeptide comprises an amino acid sequence from a protein other than the CRISPR-Cas effector polypeptide. As another example, a CRISPR-Cas effector polypeptide can be fused to an active domain from a non-CRISPR-Cas effector polypeptide; the sequence of the active domain can be considered a heterologous polypeptide (it is heterologous to the CRISPR-Cas effector polypeptide). As another example, in a guide nucleic acid, a heterologous guide nucleotide sequence (present in a targeting segment) that can hybridize with a target nucleotide sequence (target region) of a target nucleic acid is a nucleotide sequence that is not found in nature in a guide nucleic acid together with a binding segment that can bind to a CRISPR-Cas effector polypeptide of the present disclosure. For example, in some cases, a heterologous target nucleotide sequence (present in a heterologous targeting segment) is from a different source than a binding nucleotide sequence (present in a binding segment) that can bind to a CRISPR-Cas effector polypeptide of the present disclosure. For example, a guide nucleic acid may comprise a guide nucleotide sequence (present in a targeting segment) that can hybridize with a target nucleotide sequence present in a eukaryotic target nucleic acid. A guide nucleic acid of the present disclosure can be generated by human intervention and can comprise a nucleotide sequence not found in a naturally-occurring guide nucleic acid.
[0027] The term “naturally-occurring” as used herein as applied to a nucleic acid, a protein, a cell, or an organism, refers to a nucleic acid, cell, protein, or organism that is found in nature.
[0028] The terms “polynucleotide” and “nucleic acid,” used interchangeably herein, refer to a polymeric form of nucleotides of any length, either ribonucleotides or deoxynucleotides or combinations thereof. Thus, this term includes, but is not limited to, single-, double-, or multi-stranded DNA or RNA, genomic DNA, cDNA, DNA-RNA hybrids, or a polymer comprising purine and pyrimidine bases or other natural, chemically or biochemically modified, non-natural, or derivatized nucleotide bases. The terms “polynucleotide” and “nucleic acid” should be understood to include, as applicable to the embodiment being described, single-stranded (such as sense or antisense) and double-stranded polynucleotides.
[0029] The terms "polypeptide," "peptide," and "protein," are used interchangeably herein and refer to a polymeric form of amino acids of any length, which can include genetically coded and non-genetically coded amino acids, chemically or biochemically modified or derivatized amino acids, and polypeptides having modified peptide backbones. The term includes fusion proteins, including, but not limited to, fusion proteins with a heterologous amino acid sequence.
[0030] Polypeptides as described herein also include polypeptides having various amino acid additions, deletions, or substitutions relative to the native amino acid sequence of a polypeptide of the present disclosure. In some embodiments, polypeptides that are homologs of a polypeptide of the present disclosure contain non-conservative changes of certain amino acids relative to the native sequence of a polypeptide of the present disclosure. In some embodiments, polypeptides that are homologs of a polypeptide of the present disclosure contain conservative changes of certain amino acids relative to the native sequence of a polypeptide of the present disclosure, and thus may be referred to as conservatively modified variants. A conservatively modified variant may include individual substitutions, deletions or additions to a polypeptide sequence which result in the substitution of an amino acid with a chemically similar amino acid. Conservative substitution tables providing functionally similar amino acids are well-known in the art. Such conservatively modified variants are in addition to and do not exclude polymorphic variants, interspecies homologs, and alleles of the disclosure. The following eight groups contain amino acids that are conservative substitutions for one another: 1) Alanine (A), Glycine (G); 2) Aspartic acid (D), Glutamic acid (E); 3) Asparagine (N), Glutamine (Q); 4) Arginine (R), Lysine (K); 5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V); 6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W); 7) Serine (S), Threonine (T); and 8) Cysteine (C), Methionine (M) (see, e.g., Creighton, Proteins (1984)). A modification of an amino acid to produce a chemically similar amino acid may be referred to as an analogous amino acid.
[0031] A polynucleotide or polypeptide has a certain percent “sequence identity” to another polynucleotide or polypeptide, meaning that, when aligned, that percentage of bases or amino acids are the same, and in the same relative position, when comparing the two sequences. Sequence similarity can be determined in a number of different manners. To determine sequence identity, sequences can be aligned using methods and computer programs, including BLAST, available over the world wide web at ncbi.nlm.nih.gov/BLAST. See, e.g., Altschul et al. (1990), J. Mol. Biol. 215:403-10. Another alignment algorithm is FASTA, available in the Genetics Computing Group (GCG) package, from Madison, Wisconsin, USA, a wholly owned subsidiary of Oxford Molecular Group, Inc. Other techniques for alignment are described in Methods in Enzymology, vol. 266: Computer Methods for Macromolecular Sequence Analysis (1996), ed. Doolittle, Academic Press, Inc., a division of Harcourt Brace & Co., San Diego, California, USA. Of particular interest are alignment programs that permit gaps in the sequence. The Smith-Waterman algorithm is one type of algorithm that permits gaps in sequence alignments. See Meth. Mol. Biol. 70: 173-187 (1997). Also, the GAP program using the Needleman and Wunsch alignment method can be utilized to align sequences. See J. Mol. Biol. 48: 443-453 (1970).
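As an informal illustration of the percent identity concept defined above, the following Python sketch counts identical positions in two sequences that have already been aligned (for example, by one of the programs named above). The function name `percent_identity`, the use of "-" as the gap character, and the choice to divide by the number of alignment columns in which at least one sequence has a residue are assumptions made for this example; alignment programs differ in how they define the denominator, and the disclosure does not prescribe one.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumption for this sketch: the denominator is the number of alignment
    columns in which at least one sequence has a residue, so a residue
    opposite a gap counts as a non-identical position.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # a column of two gaps carries no information
        compared += 1
        if a == b:
            matches += 1  # same residue in the same relative position

    if compared == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Four of six columns are identical.
    print(round(percent_identity("ACGT-A", "ACCTTA"), 1))
```

Under these assumptions, the two aligned six-column sequences in the usage example share four identical positions, which corresponds to about 66.7% identity.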
[0032] “Recombinant,” as used herein, means that a particular nucleic acid (DNA or RNA) is the product of various combinations of cloning, restriction, and/or ligation steps resulting in a construct having a structural coding or non-coding sequence distinguishable from endogenous nucleic acids found in natural systems. Generally, DNA sequences encoding the structural coding sequence can be assembled from cDNA fragments and short oligonucleotide linkers, or from a series of synthetic oligonucleotides, to provide a synthetic nucleic acid which is capable of being expressed from a recombinant transcriptional unit contained in a cell or in a cell-free transcription and translation system. Such sequences can be provided in the form of an open reading frame uninterrupted by internal non-translated sequences, or introns, which are typically present in eukaryotic genes. Genomic DNA comprising the relevant sequences can also be used in the formation of a recombinant gene or transcriptional unit. Sequences of non-translated DNA may be present 5’ or 3’ from the open reading frame, where such sequences do not interfere with manipulation or expression of the coding regions, and may indeed act to modulate production of a desired product by various mechanisms (see “DNA regulatory sequences”, below). Similarly, the term “recombinant” polypeptide refers to a polypeptide which is not naturally occurring, e.g., is made by the artificial combination of two otherwise separated segments of amino sequence through human intervention. Thus, e.g., a polypeptide that comprises a heterologous amino acid sequence is recombinant.
[0033] Thus, e.g., the term “recombinant” polynucleotide or “recombinant” nucleic acid refers to one which is not naturally occurring, e.g., is made by the artificial combination of two otherwise separated segments of sequence through human intervention. This artificial combination is often accomplished by either chemical synthesis means, or by the artificial manipulation of isolated segments of nucleic acids, e.g., by genetic engineering techniques. Such is usually done to replace a codon with a redundant codon encoding the same or a conservative amino acid, while typically introducing or removing a sequence recognition site. Alternatively, it is performed to join together nucleic acid segments of desired functions to generate a desired combination of functions.
[0034] The terms “DNA regulatory sequences,” “control elements,” and “regulatory elements,” used interchangeably herein, refer to transcriptional and translational control sequences, such as promoters, enhancers, polyadenylation signals, terminators, protein degradation signals, and the like, that provide for and/or regulate expression of a coding sequence and/or production of an encoded polypeptide in a host cell.
[0035] The term “transformation” is used interchangeably herein with “genetic modification” and refers to a permanent or transient genetic change induced in a cell following introduction of new nucleic acid (e.g., DNA exogenous to the cell) into the cell. Genetic change (“modification”) can be accomplished either by incorporation of the new nucleic acid into the genome of the host cell, or by transient or stable maintenance of the new nucleic acid as an episomal element. Where the cell is a eukaryotic cell, a permanent genetic change is generally achieved by introduction of new DNA into the genome of the cell. In prokaryotic cells, permanent changes can be introduced into the chromosome or via extrachromosomal elements such as plasmids and expression vectors, which may contain one or more selectable markers to aid in their maintenance in the recombinant host cell. Suitable methods of genetic modification include viral infection, transfection, conjugation, protoplast fusion, electroporation, particle gun technology, calcium phosphate precipitation, direct microinjection, and the like. The choice of method is generally dependent on the type of cell being transformed and the circumstances under which the transformation is taking place (i.e. in vitro, ex vivo, or in vivo). A general discussion of these methods can be found in Ausubel, et al, Short Protocols in Molecular Biology, 3rd ed., Wiley & Sons, 1995.
[0036] “Operably linked” refers to a juxtaposition wherein the components so described are in a relationship permitting them to function in their intended manner. For instance, a promoter is operably linked to a coding sequence if the promoter affects its transcription or expression. As used herein, the terms “heterologous promoter” and “heterologous control regions” refer to promoters and other control regions that are not normally associated with a particular nucleic acid in nature. For example, a “transcriptional control region heterologous to a coding region” is a transcriptional control region that is not normally associated with the coding region in nature.
[0037] The use of the terms “a,” “an,” and “the,” and similar referents in the context of describing the disclosure (especially in the context of the following claims) are to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. The terms “comprising,” “having,” “including,” and “containing” are to be construed as open-ended terms (i.e., meaning “including, but not limited to,”) unless otherwise noted. Recitation of ranges of values herein is merely intended to serve as a shorthand method of referring individually to each separate value falling within the range, unless otherwise indicated herein, and each separate value is incorporated into the specification as if it were individually recited herein. For example, if the range 10-15 is disclosed, then 11, 12, 13, and 14 are also disclosed. All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., “such as”) provided herein, is intended merely to better illuminate the embodiments of the disclosure and does not pose a limitation on the scope of the disclosure unless otherwise claimed. No language in the specification should be construed as indicating any non-claimed element as essential to the practice of the embodiments of the disclosure.
[0038] As used herein, the term “about” used in connection with an amount indicates that the amount can vary by 10% of the stated amount. For example, “about 100” means an amount of from 90-110. Where “about” is used in the context of a range, the “about” used in reference to the lower amount of the range means that the lower amount includes an amount that is 10% lower than the lower amount of the range, and “about” used in reference to the higher amount of the range means that the higher amount includes an amount 10% higher than the higher amount of the range. For example, “from about 100 to about 1000” means that the range extends from 90 to 1100.
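The arithmetic behind the definition of “about” can be sketched in a few lines of Python. The function names `about` and `about_range` are hypothetical and introduced only for this illustration; the 10% tolerance is the value stated in the definition. Exact fractions are used instead of floating-point numbers so that the computed bounds match the stated examples exactly.

```python
from fractions import Fraction

# The 10% variation stated in the definition of "about".
TOLERANCE = Fraction(1, 10)


def about(amount):
    """Return the (low, high) bounds implied by "about <amount>"."""
    amount = Fraction(amount)
    delta = abs(amount) * TOLERANCE
    return amount - delta, amount + delta


def about_range(lower, upper):
    """Bounds implied by "from about <lower> to about <upper>".

    The lower bound is 10% below the lower amount and the upper bound is
    10% above the higher amount, as described in the definition.
    """
    low, _ = about(lower)
    _, high = about(upper)
    return low, high


if __name__ == "__main__":
    print(about(100))              # bounds for "about 100"
    print(about_range(100, 1000))  # bounds for "from about 100 to about 1000"
```

With these helpers, `about(100)` yields bounds equal to 90 and 110, and `about_range(100, 1000)` yields bounds equal to 90 and 1100, matching the two examples given in the definition.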
[0039] The term “and/or” as used herein in a phrase such as “A and/or B” is intended to include both A and B; A or B; A (alone); and B (alone). Likewise, the term “and/or” as used herein in a phrase such as “A, B, and/or C” is intended to encompass each of the following embodiments: A, B, and C; A, B, or C; A or C; A or B; B or C; A and C; A and B; B and C; A (alone); B (alone); and C (alone).
[0040] The terms “isolated” and “purified” as used herein refer to a material that is removed from at least one component with which it is naturally associated (e.g., removed from its original environment). The term “isolated,” when used in reference to an isolated protein, refers to a protein that has been removed from the culture medium of the host cell that expressed the protein. As such, an isolated protein is free of extraneous or unwanted compounds (e.g., nucleic acids, native bacterial or other proteins, etc.).
[0041] It is understood that aspects and embodiments of the present disclosure described herein include “comprising,” “consisting,” and “consisting essentially of” aspects and embodiments.
[0042] It is to be understood that one, some, or all of the properties of the various embodiments described herein may be combined to form other embodiments of the present disclosure. These and other aspects of the present disclosure will become apparent to one of skill in the art. These and other embodiments of the present disclosure are further described by the detailed description that follows.
[0043] The techniques and procedures described or referenced herein are generally well understood and commonly employed using conventional methodology by those skilled in the art, such as, for example, the widely utilized methodologies described in Sambrook et al., Molecular Cloning: A Laboratory Manual 3d edition (2001) Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.; Current Protocols in Molecular Biology (F.M. Ausubel, et al. eds., (2003)); the series Methods in Enzymology (Academic Press, Inc.): PCR 2: A Practical Approach (M.J. MacPherson, B.D. Hames and G.R. Taylor eds. (1995)), Harlow and Lane, eds. (1988); Oligonucleotide Synthesis (M.J. Gait, ed., 1984); Methods in Molecular Biology, Humana Press; Cell Biology: A Laboratory Notebook (J.E. Celis, ed., 1998) Academic Press; Animal Cell Culture (R.I. Freshney, ed., 1987); Introduction to Cell and Tissue Culture (J.P. Mather and P.E. Roberts, 1998) Plenum Press; Cell and Tissue Culture: Laboratory Procedures (A. Doyle, J.B. Griffiths, and D.G. Newell, eds., 1993-8) J. Wiley and Sons; Gene Transfer Vectors for Mammalian Cells (J.M. Miller and M.P. Calos, eds., 1987); PCR: The Polymerase Chain
DETAILED DESCRIPTION
[0044] The present disclosure provides CRISPR-Cas effector polypeptides that are referred to herein as “Cas12L” polypeptides, “Casλ” polypeptides, or “Cas-lambda” polypeptides. The present disclosure provides a nucleic acid encoding a Casλ polypeptide of the present disclosure. The present disclosure provides methods of modifying a target nucleic acid using a Casλ polypeptide.
[0045] In general, a Cas12L polypeptide of the present disclosure is capable of forming a ribonucleoprotein (RNP) complex by binding to or otherwise interacting with a guide nucleic acid (e.g., a guide RNA (gRNA)). The Cas12L-gRNA ribonucleoprotein complex is capable of being targeted to a target nucleic acid via base pairing between the guide RNA and a target nucleotide sequence in the target nucleic acid that is complementary to the sequence of the guide RNA. The guide RNA thus provides the specificity for targeting a particular target nucleic acid. Once the Cas12L-gRNA ribonucleoprotein complex has come into association with a target nucleic acid by virtue of the targeting of the RNP complex to that target nucleic acid by the guide RNA, the Cas12L protein is able to bind to the target nucleic acid. In some cases, the Cas12L polypeptide will modify the target nucleic acid. In some cases, the modification comprises homology-directed repair (HDR). In some cases, the modification comprises non-homologous end joining (NHEJ). Where a Cas12L polypeptide is a fusion polypeptide comprising: i) a Cas12L polypeptide; and ii) one or more heterologous polypeptides, in some cases, the heterologous polypeptide modifies the target nucleic acid, or a polypeptide associated with the target nucleic acid.
[0046] Accordingly, the present disclosure provides nucleic acid-guided (e.g., RNA-guided) CRISPR-Cas effector polypeptides for use in CRISPR-based targeting systems in cells (e.g., eukaryotic cells), where the CRISPR-Cas systems provide for modification (“editing”) of a target nucleic acid and/or modification of a polypeptide associated with a target nucleic acid. As an example, the present disclosure provides Cas12L polypeptides for use in CRISPR-based targeting systems in plants. Provided herein are Cas12L polypeptides, nucleic acids encoding the same, compositions containing the same, and methods of using the same to, e.g., edit a target nucleic acid. The present disclosure provides ribonucleoprotein complexes containing a Cas12L polypeptide and a guide RNA which may be used to, e.g., edit a target nucleic acid. Provided herein are guide RNAs that can bind or otherwise interact with Cas12L polypeptides, nucleic acids encoding the same, compositions containing the same, and methods of using the same to, e.g., edit a target nucleic acid.
METHODS OF MODIFYING A TARGET NUCLEIC ACID IN A EUKARYOTIC CELL
[0047] The present disclosure provides methods of modifying a target nucleic acid in a eukaryotic cell. The methods comprise contacting the target nucleic acid in the eukaryotic cell with: a) a Cas12L polypeptide; and b) a Cas12L guide nucleic acid.
[0048] In some cases, the contacting is carried out at a temperature of from about 25°C to about 40°C (e.g., from about 25°C to about 28°C, from about 28°C to about 30°C, from about 28°C to about 32°C, from about 30°C to about 32°C, from about 30°C to about 37°C, from about 32°C to about 34°C, from about 30°C to about 34°C, from about 34°C to about 37°C, or from about 37°C to about 40°C).
[0049] In some cases, modification of a target nucleic acid does not substantially occur at a temperature of less than 28°C. For example, in some cases, modification of a target nucleic acid does not substantially occur at a temperature of from about 17°C to about 25°C, or from about 25°C to about 28°C. In some cases, modification of a target nucleic acid occurs, if at all, at less than 75%, less than 50%, less than 25%, less than 10%, or less than 5%, of the extent to which the modification of the target nucleic acid occurs when the modification is conducted at 32°C. For example, in a population of eukaryotic cells, each containing the target nucleic acid, at least 50%, at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, of the cells would, following contact at 32°C with a Cas12L polypeptide and a Cas12L guide nucleic acid, contain a modification of the target nucleic acid, which modification was effected by the Cas12L polypeptide (together with the Cas12L guide nucleic acid); while, if the contacting was carried out at a temperature of less than 28°C (e.g., from 17°C to 28°C, from 25°C to 28°C, or from 17°C to 25°C), less than 50%, less than 45%, less than 40%, less than 35%, less than 30%, less than 25%, less than 20%, less than 15%, less than 10%, or less than 5%, of the eukaryotic cells would contain a modification of the target nucleic acid.
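The temperature dependence described above can be pictured with a small Python sketch. This is only an illustration under simplifying assumptions: the function name `editing_regime` and the labels "restrictive", "permissive", and "outside stated range" are hypothetical, the 28°C boundary and the 40°C upper limit are taken as nominal values from the text, and the 10% latitude attached to “about” is ignored. It is not a model of enzyme behavior.

```python
# Nominal values taken from the text; treating them as sharp cut-offs is an
# assumption of this sketch, not a statement of the disclosure.
RESTRICTIVE_BELOW_C = 28.0  # modification "does not substantially occur" below this
UPPER_CONTACT_LIMIT_C = 40.0  # upper end of the stated contacting range


def editing_regime(temp_c: float) -> str:
    """Classify a temperature (in degrees Celsius) into an illustrative regime."""
    if temp_c < RESTRICTIVE_BELOW_C:
        return "restrictive"  # little or no modification expected
    if temp_c <= UPPER_CONTACT_LIMIT_C:
        return "permissive"  # modification expected to occur
    return "outside stated range"  # not addressed by the stated ranges


if __name__ == "__main__":
    for temperature in (22, 28, 32, 45):
        print(temperature, editing_regime(temperature))
```

Treating 28°C as a sharp boundary is a design simplification; the text describes a graded reduction in modification at lower temperatures, not an on/off switch.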
[0050] A target nucleic acid can be present in any of a variety of eukaryotic cells; i.e., a method of the present disclosure can be carried out in a variety of eukaryotic cells. Examples of eukaryotic cells in which a method of the present disclosure can be carried out include, e.g., a plant cell, an insect cell, an arthropod cell, a mammalian cell, a fish cell, a fungal cell, a yeast cell, an amphibian cell, and an avian cell. Suitable cells include cells of members of the kingdom Protista, including, but not limited to, algae (e.g., green algae, red algae, glaucophytes, cyanobacteria); fungus-like members of Protista, e.g., slime molds, water molds, etc.; animal-like members of Protista, e.g., flagellates (e.g., Euglena), amoeboids (e.g., amoeba), sporozoans (e.g., Apicomplexa, Myxozoa, Microsporidia), and ciliates (e.g., Paramecium). Suitable cells include cells of members of the kingdom Fungi, including, but not limited to, members of any of the phyla: Basidiomycota (club fungi; e.g., members of Agaricus, Amanita, Boletus, Cantharellus, etc.); Ascomycota (sac fungi, including, e.g., Saccharomyces); Mycophycophyta (lichens); Zygomycota (conjugation fungi); and Deuteromycota. Suitable cells include cells of members of the kingdom Plantae, including, but not limited to, members of any of the following divisions: Bryophyta (e.g., mosses), Anthocerotophyta (e.g., hornworts), Hepaticophyta (e.g., liverworts), Lycophyta (e.g., club mosses), Sphenophyta (e.g., horsetails), Psilophyta (e.g., whisk ferns), Ophioglossophyta, Pterophyta (e.g., ferns), Cycadophyta, Ginkgophyta, Pinophyta, Gnetophyta, and Magnoliophyta (e.g., flowering plants). Suitable cells include cells of members of the kingdom Animalia, including, but not limited to, members of any of the following phyla: Porifera (sponges); Placozoa; Orthonectida (parasites of marine invertebrates); Rhombozoa; Cnidaria (corals, anemones, jellyfish, sea pens, sea pansies, sea wasps); Ctenophora (comb jellies); Platyhelminthes (flatworms); Nemertina (ribbon worms); Gnathostomulida (jawed worms); Gastrotricha; Rotifera; Priapulida; Kinorhyncha; Loricifera; Acanthocephala; Entoprocta; Nematoda; Nematomorpha; Cycliophora; Mollusca (mollusks); Sipuncula (peanut worms); Annelida (segmented worms); Tardigrada (water bears); Onychophora (velvet worms); Arthropoda (including the subphyla: Chelicerata, Myriapoda, Hexapoda, and Crustacea, where the Chelicerata include, e.g., arachnids, Merostomata, and Pycnogonida, where the Myriapoda include, e.g., Chilopoda (centipedes), Diplopoda (millipedes), Pauropoda, and Symphyla, where the Hexapoda include insects, and where the Crustacea include shrimp, krill, barnacles, etc.); Phoronida; Ectoprocta (moss animals); Brachiopoda; Echinodermata (e.g., starfish, sea daisies, feather stars, sea urchins, sea cucumbers, brittle stars, brittle baskets, etc.); Chaetognatha (arrow worms); Hemichordata (acorn worms); and Chordata. Suitable members of Chordata include any member of the following subphyla: Urochordata (sea squirts; including Ascidiacea, Thaliacea, and Larvacea); Cephalochordata (lancelets); Myxini (hagfish); and Vertebrata, where members of Vertebrata include, e.g., members of Petromyzontida (lampreys), Chondrichthyes (cartilaginous fish), Actinopterygii (ray-finned fish), Actinistia (coelacanths), Dipnoi (lungfish), Reptilia (reptiles, e.g., snakes, alligators, crocodiles, lizards, etc.), Aves (birds); and Mammalia (mammals). Suitable plants include any monocotyledon and any dicotyledon.
[0051] In some cases, the cell is a unicellular organism in vitro. In some cases, the cell is obtained from a multicellular organism and is cultured as a unicellular entity in vitro. In some cases, the cell is present in a multicellular organism in vivo.
[0052] In some cases, a eukaryotic cell (e.g., a multicellular organism comprising the eukaryotic cell) is modified to include a Cas12L polypeptide and a Cas12L guide nucleic acid, where temperature is used to control activity of the Cas12L polypeptide in the context of gene drive. For example, at a first temperature (e.g., from about 17°C to about 25°C or from about 17°C to about 28°C), the gene drive does not occur. However, at a second temperature (e.g., from about 25°C to about 40°C (e.g., from about 25°C to about 28°C, from about 28°C to about 30°C, from about 28°C to about 32°C, from about 30°C to about 32°C, from about 30°C to about 37°C, from about 32°C to about 34°C, from about 30°C to about 34°C, from about 34°C to about 37°C, or from about 37°C to about 40°C)), gene drive occurs. Such temperature-dependent activity can be used to control populations such as mosquitoes, fruit flies, and the like.
Gene editing in plant cells
[0053] The following description applies to plant cells. However, similar temperature control of Cas12L-mediated gene editing can be carried out in any of a variety of eukaryotic cells.
[0054] In one aspect, the present disclosure provides a method for modifying a target nucleic acid in a plant cell, the method including: a) introducing into a plant cell a Cas12L polypeptide and a guide RNA, and b) cultivating the plant cell under conditions whereby the Cas12L polypeptide and guide RNA are present as a complex that targets the target nucleic acid to generate a modification in the target nucleic acid. In some cases, the Cas12L polypeptide comprises an amino acid sequence having at least 50%, at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid identity to the amino acid sequence depicted in any one of FIG. 5A-5M. In some cases, the Cas12L polypeptide includes one or more nuclear localization signals (NLS). In some cases, at least one of the one or more nuclear localization signals is an SV40-type NLS. In some cases, the Cas12L polypeptide and the guide RNA are encoded in one or more recombinant nucleic acids in the plant cell; i.e., a recombinant nucleic acid comprising a nucleotide sequence encoding the Cas12L polypeptide and/or the guide RNA. In some cases, one or more of the recombinant nucleic acids include at least one intron. In some cases, the nucleotide sequence encoding the Cas12L polypeptide and/or the nucleotide sequence encoding the guide RNA is operably linked to a promoter that is functional in plants. In some cases, the promoter is a UBQ10 promoter. In some cases, the UBQ10 promoter includes a nucleic acid sequence that is at least 80% identical to SEQ ID NO:1. In some embodiments that may be combined with any of the preceding embodiments, expression of the guide RNA is driven by an RNA Polymerase II promoter (i.e., the nucleotide sequence encoding the guide RNA is operably linked to an RNA Polymerase II (“Pol II”) promoter). In some cases, the Pol II promoter is a CmYLCV promoter or a 2x35S promoter. 
In some cases, the promoter comprises a nucleic acid sequence that is at least 80% identical to SEQ ID NO:2 or SEQ ID NO:3. In some cases that may be combined with any of the preceding embodiments, the plant cell is cultivated at a temperature in the range of about 23°C to about 37°C. In some embodiments that may be combined with any of the preceding embodiments, the plant cell is cultivated at a temperature in the range of about 20°C to about 25°C. In some embodiments that may be combined with any of the preceding embodiments, the modification includes a deletion of one or more nucleotides in the target nucleic acid. In some embodiments that may be combined with any of the preceding embodiments, the deletion includes deletion of 3-15 nucleotides in the target nucleic acid. In some cases, the deletion includes deletion of 9 nucleotides in the target nucleic acid. In some embodiments that may be combined with any of the preceding embodiments, the target nucleic acid sequence is located in a region of repressive chromatin. In some embodiments that may be combined with any of the preceding embodiments, the target nucleic acid sequence is located in a region of open chromatin. In some cases, the modification includes an insertion of one or more nucleotides in the target nucleic acid. In some cases, the modification includes a combination of insertions of one or more nucleotides into, and deletions of one or more nucleotides from, the target nucleic acid. In some embodiments that may be combined with any of the preceding embodiments, the modification may include a combination of insertions and deletions of 3-15 nucleotides in the target nucleic acid. In some embodiments that may be combined with any of the preceding embodiments, the guide RNA is recombinantly fused to a ribozyme. In some embodiments that may be combined with any of the preceding embodiments, the plant cell comprises a genetic background that exhibits reduced susceptibility to transgene silencing.
[0055] In another aspect, the present disclosure provides a recombinant vector including a nucleic acid sequence that includes a promoter that is functional in plants and that encodes a Cas12L polypeptide and a guide RNA. In some embodiments, the Cas12L polypeptide comprises an amino acid sequence having at least 50%, at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid identity to the amino acid sequence depicted in any one of FIG. 5A-5M. In some embodiments that may be combined with any of the preceding embodiments, the Cas12L polypeptide includes a nuclear localization signal (NLS). In some embodiments, the nuclear localization signal is an SV40-type NLS. In some embodiments that may be combined with any of the preceding embodiments, the nucleic acid sequence includes at least one intron. In some embodiments, the promoter is a UBQ10 promoter. In some embodiments, the UBQ10 promoter includes a nucleic acid sequence that is at least 80% identical to SEQ ID NO:1. In some embodiments that may be combined with any of the preceding embodiments, expression of the guide RNA is driven by an RNA Polymerase II promoter. In some embodiments, the RNA Polymerase II promoter is a CmYLCV promoter or a 2x35S promoter. In some embodiments, the promoter comprises a nucleic acid sequence that is at least 80% identical to SEQ ID NO:2 or SEQ ID NO:3. In some embodiments that may be combined with any of the preceding embodiments, the guide RNA is recombinantly fused to a ribozyme.
[0056] In another aspect, the present disclosure provides a plant cell including a Cas12L polypeptide and a guide RNA, wherein the Cas12L polypeptide and guide RNA are capable of existing in a complex that targets a target nucleic acid to generate a modification in the target nucleic acid. In some embodiments, the Cas12L polypeptide includes an amino acid sequence having at least 50%, at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid identity to the amino acid sequence depicted in any one of FIG. 5A-5M. In some embodiments that may be combined with any of the preceding embodiments, the Cas12L polypeptide includes a nuclear localization signal (NLS). In some embodiments, the nuclear localization signal is an SV40-type NLS. In some embodiments that may be combined with any of the preceding embodiments, the Cas12L polypeptide and guide RNA are encoded from one or more recombinant nucleic acids in the plant cell. In some embodiments, one or more of the recombinant nucleic acids include at least one intron. In some embodiments, one or more of the recombinant nucleic acids include a promoter that is functional in plants. In some cases, the promoter is a UBQ10 promoter. In some embodiments, the UBQ10 promoter includes a nucleic acid sequence that is at least 80% identical to SEQ ID NO:1. In some embodiments that may be combined with any of the preceding embodiments, expression of the guide RNA is driven by an RNA Polymerase II promoter. In some embodiments, the RNA Polymerase II promoter is a CmYLCV promoter or a 2x35S promoter. In some embodiments, the promoter comprises a nucleic acid sequence that is at least 80% identical to SEQ ID NO:2 or SEQ ID NO:3. In some embodiments that may be combined with any of the preceding embodiments, the plant cell is cultivated at a temperature in the range of about 23°C to about 37°C. 
In some embodiments that may be combined with any of the preceding embodiments, the plant cell is cultivated at a temperature in the range of about 20°C to about 25°C. In some embodiments that may be combined with any of the preceding embodiments, the modification includes a deletion of one or more nucleotides in the target nucleic acid. In some embodiments that may be combined with any of the preceding embodiments, the deletion includes deletion of 3-15 nucleotides in the target nucleic acid. In some embodiments, the deletion includes deletion of 9 nucleotides in the target nucleic acid. In some cases, the modification includes an insertion of one or more nucleotides into the target nucleic acid (e.g., an insertion of from 3 to 15 nucleotides). In some cases, the modification includes a combination of an insertion of one or more nucleotides into, and a deletion of one or more nucleotides from, the target nucleic acid. In some embodiments that may be combined with any of the preceding embodiments, the target nucleic acid sequence is located in a region of repressive chromatin. In some embodiments that may be combined with any of the preceding embodiments, the target nucleic acid sequence is located in a region of open chromatin. In some embodiments that may be combined with any of the preceding embodiments, the guide RNA is recombinantly fused to a ribozyme. In some embodiments that may be combined with any of the preceding embodiments, the plant cell comprises a genetic background that exhibits reduced susceptibility to transgene silencing.
[0057] In another aspect, the present disclosure provides a plant including a plant cell of any one of the preceding embodiments, wherein the plant includes a modified nucleic acid. In some embodiments, the modification includes a deletion of one or more nucleotides in the nucleic acid. In some embodiments that may be combined with any of the preceding embodiments, the deletion includes deletion of 3-15 nucleotides. 
In some embodiments, the deletion includes deletion of 9 nucleotides. In some cases, the modification includes an insertion of one or more nucleotides into the target nucleic acid (e.g., an insertion of from 3 to 15 nucleotides). In some cases, the modification includes a combination of an insertion of one or more nucleotides into, and a deletion of one or more nucleotides from, the target nucleic acid.
[0058] In another aspect, the present disclosure provides a progeny plant of the plant of any one of the preceding embodiments, wherein the progeny plant includes a modified nucleic acid. In some embodiments, the modification includes a deletion of one or more nucleotides in the nucleic acid. In some embodiments that may be combined with any of the preceding embodiments, the deletion includes deletion of 3-15 nucleotides. In some embodiments, the deletion includes deletion of 9 nucleotides. In some cases, the modification includes an insertion of one or more nucleotides into the target nucleic acid (e.g., an insertion of from 3 to 15 nucleotides). In some cases, the modification includes a combination of an insertion of one or more nucleotides into, and a deletion of one or more nucleotides from, the target nucleic acid.
[0059] The temperature difference in editing activity of a Cas12L polypeptide can be used to control the various functions and features of a plant cell, such as reproduction, flowering, flower color, ripening, disease resistance, pathogen resistance, and the like. For example, in some cases, a method of the present disclosure comprises: a) contacting a target nucleic acid in a plant cell with: i) a Cas12L polypeptide; and ii) a Cas12L guide nucleic acid; b) maintaining the plant cell for a first period of time at a first temperature of from about 17°C to about 25°C, wherein the target nucleic acid is substantially not modified by the Cas12L polypeptide; and c) maintaining the plant cell for a second period of time at a second temperature of from about 25°C to about 37°C, wherein the target nucleic acid is modified by the Cas12L polypeptide. As another example, in some cases, a method of the present disclosure comprises: a) contacting a target nucleic acid in a plant cell with: i) a Cas12L polypeptide; and ii) a Cas12L guide nucleic acid; b) maintaining the plant cell for a first period of time at a first temperature of from about 25°C to about 37°C (or from about 25°C to about 40°C), wherein the target nucleic acid is modified by the Cas12L polypeptide; and c) maintaining the plant cell for a second period of time at a second temperature of from about 17°C to about 25°C, wherein the target nucleic acid is substantially not modified by the Cas12L polypeptide.
[0060] In some cases, the modification results in repression of expression of a target nucleic acid (e.g., silencing of a target nucleic acid). In some cases, the modification is deletion of all or a portion of a target nucleic acid. In some cases, the modification includes an insertion of one or more nucleotides into the target nucleic acid. In some cases, the modification includes a combination of an insertion of one or more nucleotides into, and a deletion of one or more nucleotides from, the target nucleic acid. In some cases, the modification results in expression of a target nucleic acid. In some cases, the modification results in expression of a target nucleic acid, where the target nucleic acid is an endogenous plant nucleic acid. In some cases, the modification results in expression of a target nucleic acid, where the target nucleic acid is heterologous to the plant cell (e.g., the target nucleic acid is a transgene or an exogenous nucleic acid).
[0061] For example, where the modification results in repression of expression of a target nucleic acid (e.g., silencing of a target nucleic acid), the modification can result in repression of expression of a gene product in a pigment production pathway that provides for a change in color of a flower, a bract, a leaf, or another plant part. Pigment production pathway gene products include those involved in an anthocyanin synthesis pathway (e.g., anthocyanin-5-acyltransferase; chalcone synthase; chalcone isomerase; flavanone 3-hydroxylase; flavonoid 3’-hydroxylase; flavonoid 3’,5’-hydroxylase; flavonoid 3-O-glucosyltransferase; anthocyanidin synthase; any of a variety of enzymes that modify anthocyanidin, such as glucosyltransferases, acyltransferases, and methyltransferases; and the like; see, e.g., Liu et al. (2018) Front. Chem. 6:52); a betalain synthesis pathway (e.g., dihydroxyphenylalanine (DOPA) 4,5-dioxygenase; cyclic-DOPA 5-O-glucosyltransferase; and the like); a carotenoid synthesis pathway; and the like. See, e.g., Tanaka et al. (2008) Plant J. 54:733.
[0062] As one non-limiting example, at a first temperature (e.g., a temperature of from about 17°C to about 25°C), the bract of a poinsettia is green, and at a second temperature (e.g., a temperature of from about 28°C to about 37°C, or from about 28°C to about 40°C), the bract of the poinsettia is red.
[0063] In some cases, the target nucleic acid comprises a nucleotide sequence encoding a pigment production pathway enzyme. At a first temperature of from about 17°C to about 25°C, the target nucleic acid is not modified by the Cas12L polypeptide; thus, the plant or the plant part will contain the pigment produced as a result of activity of the pigment production pathway. At a second temperature of from about 25°C to about 37°C or from about 25°C to about 40°C, the target nucleic acid is modified by the Cas12L polypeptide; thus, the plant or the plant part lacks the pigment that would normally be produced by action of the pigment production pathway.
[0064] In other cases, the target nucleic acid is an endogenous nucleic acid or a transgene encoding a negative regulator of a pigment production pathway. At a first temperature of from about 17°C to about 25°C, the target nucleic acid is not modified by the Cas12L polypeptide; thus, the pigment production pathway is blocked by the negative regulator and the pigment is not produced. At a second temperature of from about 28°C to about 37°C or from about 28°C to about 40°C, the target nucleic acid is modified by the Cas12L polypeptide, thus allowing the pigment production pathway to function and changing the color of the plant or the plant part.
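The two pigment scenarios above (target encodes a pathway enzyme, or target encodes a negative regulator of the pathway) can be summarized as a small decision table. The following Python sketch is illustrative only: the names `TargetRole`, `target_is_modified`, and `pigment_produced` are hypothetical, the low range (17°C to 25°C) and high range (28°C to 40°C) are taken from the "about" ranges recited above and treated as hard bounds for simplicity, and temperatures outside those ranges are reported as undetermined rather than guessed.

```python
from enum import Enum
from typing import Optional, Tuple


class TargetRole(Enum):
    """Role of the gene product encoded by the target nucleic acid (hypothetical labels)."""
    PATHWAY_ENZYME = "pigment production pathway enzyme"
    NEGATIVE_REGULATOR = "negative regulator of the pigment pathway"


def target_is_modified(
    temp_c: float,
    low: Tuple[float, float] = (17.0, 25.0),   # assumed bounds for "substantially not modified"
    high: Tuple[float, float] = (28.0, 40.0),  # assumed bounds for "modified"
) -> Optional[bool]:
    """Return False in the low range, True in the high range, None otherwise."""
    if low[0] <= temp_c <= low[1]:
        return False
    if high[0] <= temp_c <= high[1]:
        return True
    return None  # the text does not say what happens here


def pigment_produced(role: TargetRole, temp_c: float) -> Optional[bool]:
    """Predict whether pigment is produced, following the two scenarios in the text."""
    modified = target_is_modified(temp_c)
    if modified is None:
        return None
    if role is TargetRole.PATHWAY_ENZYME:
        # Disrupting the enzyme removes the pigment.
        return not modified
    # Disrupting the negative regulator unblocks the pathway.
    return modified


if __name__ == "__main__":
    for role in TargetRole:
        for temp in (20.0, 30.0):
            print(role.name, temp, pigment_produced(role, temp))
```

The same table applies, with the phenotype renamed, to the ripening, resistance, and male fertility scenarios described elsewhere in this section, since each pairs a "direct target" case with a "negative regulator" case.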
[0065] In some cases, where the modification results in repression of expression of a target nucleic acid (e.g., silencing of a target nucleic acid), the modification results in repression of expression of a gene product involved in fruit ripening. Target nucleic acids include, e.g., Colorless non-ripening (CNR), nonripening (NOR), ripening inhibitor (RIN), DNA demethylase-2 (DML2), and ethylene insensitive-3 (EIN3). See, e.g., Wang et al. (2002) Plant Cell 14 Suppl:S131. As one non-limiting example, at a first temperature (e.g., a temperature of from about 17°C to about 25°C), the fruit of a plant is unripe, and at a second temperature (e.g., a temperature of from about 28°C to about 37°C), the fruit of the plant ripens.
[0066] In some cases, the target nucleic acid is a nucleic acid in a fruit, where the nucleic acid comprises a nucleotide sequence encoding an ethylene production pathway enzyme or signaling pathway polypeptide. At a first temperature of from about 17°C to about 25°C, the target nucleic acid is not modified by the Cas12L polypeptide; thus, the fruit continues the ripening process. At a second temperature of from about 28°C to about 37°C or from about 28°C to about 40°C, the target nucleic acid is modified by the Cas12L polypeptide; thus, the ripening process in the fruit is slowed down.
[0067] In other cases, the target nucleic acid is an endogenous nucleic acid or a transgene encoding a negative regulator of an ethylene production or signaling pathway. At a first temperature of from about 17°C to about 25°C, the target nucleic acid is not modified by the Cas12L polypeptide; thus, the production or signaling of ethylene is blocked, resulting in slower ripening of the fruit. At a second temperature of from about 28°C to about 37°C or from about 28°C to about 40°C, the target nucleic acid is modified by the Cas12L polypeptide, thus allowing the fruit to ripen.
[0068] As another example, the modification results in expression of a transgene that confers resistance to insects or disease (e.g., a fungal disease, a bacterial disease), where the expression of such transgene occurs at a second temperature (e.g., a temperature of from about 28°C to about 37°C) and does not substantially occur at a first temperature (e.g., a temperature of from about 17°C to about 25°C). In some cases, the transgene is a plant disease resistance gene. Plant defenses are often activated by specific interaction between the product of a disease resistance gene in the plant and the product of a corresponding avirulence (Avr) gene in the pathogen. A plant can be genetically modified with a transgene that confers resistance to specific pathogen strains. For example: i) the tomato Cf-9 gene confers resistance to Cladosporium fulvum; ii) the tomato Pto gene confers resistance to Pseudomonas syringae; iii) the Arabidopsis RPS2 gene confers resistance to Pseudomonas syringae; and the like. A plant that is genetically modified with a transgene, and that is “resistant” to a disease-causing pathogen, is one that is more resistant (e.g., at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, or at least 80% more resistant) to the disease-causing pathogen as compared to the wild type plant (a plant of the same species that does not comprise the transgene). In some cases, the transgene is a nucleic acid comprising a nucleotide sequence encoding a Bacillus thuringiensis (Bt) polypeptide, a derivative thereof, or a synthetic polypeptide modeled after a Bt polypeptide. Examples of suitable Bt polypeptides include a Bt delta-endotoxin polypeptide. In some cases, the transgene comprises a nucleotide sequence encoding a pesticidal polypeptide, where non-limiting examples of such pesticidal polypeptides include, e.g., insecticidal proteins from Pseudomonas sp. such as PSEEN3174 (Monalysin (2011) PLoS Pathogens 7:1-13); insecticidal proteins from Photorhabdus sp. and Xenorhabdus sp.; a PIP-1 polypeptide; an AfIP-1A and/or AfIP-1B polypeptide; a PHI-4 polypeptide; a PIP-47 polypeptide; a PIP-72 polypeptide; a PtIP-50 polypeptide; a PtIP-65 polypeptide; a PtIP-83 polypeptide; a PtIP-96 polypeptide; a delta-endotoxin such as a Cry1, Cry2, Cry3, Cry4, Cry5, Cry6, Cry7, Cry8, Cry9, Cry10, Cry11, Cry12, Cry13, Cry14, Cry15, Cry16, Cry17, Cry18, Cry19, Cry20, Cry21, Cry22, Cry23, Cry24, Cry25, Cry26, Cry27, Cry28, Cry29, Cry30, Cry31, Cry32, Cry33, Cry34, Cry35, Cry36, Cry37, Cry38, Cry39, Cry40, Cry41, Cry42, Cry43, Cry44, Cry45, Cry46, Cry47, Cry49, Cry51, or Cry55 class of delta-endotoxin genes of B. thuringiensis; a Cry1A polypeptide (see, e.g., U.S. Patent Nos. 5,880,275 and 7,858,849); a DIG-3 polypeptide (see, e.g., U.S. Pat. Nos. 8,304,604 and 8,304,605); a DIG-11 polypeptide (see, e.g., U.S. Pat. Nos. 8,304,604 and 8,304,605); a Cry1B polypeptide; a Cry1C polypeptide; a Cry1F polypeptide; a Cry2 polypeptide (see, e.g., U.S. Pat. No. 7,064,249); a Cry3A polypeptide; a Cry4 polypeptide; a Cry5 polypeptide; a Cry6 polypeptide; a Cry8 polypeptide; a Cry9 polypeptide; a Cry46 protein; a Cry51 protein; a Cry binary toxin; a TIC901 or related toxin; an AXMI-027, AXMI-036, or AXMI-038 polypeptide (see, e.g., U.S. Pat. No. 8,236,757); a vegetative insecticidal protein (Vip; see, e.g., Gupta et al. (2021) Front. Microbiol. 12:659736); and the like. In some cases, the transgene is a nucleic acid comprising a nucleotide sequence encoding an insect-specific polypeptide that, upon expression, disrupts the physiology of the affected pest; where such polypeptides include, e.g., an insect diuretic hormone receptor, an allatostatin, and the like. In some cases, the transgene is a nucleic acid comprising a nucleotide sequence encoding an enzyme involved in the modification, including the post-translational modification, of a biologically active molecule; for example, a glycolytic enzyme, a proteolytic enzyme, a lipolytic enzyme, a nuclease, a cyclase, a transaminase, an esterase, a hydrolase, a phosphatase, a kinase, a phosphorylase, a polymerase, an elastase, a chitinase, or a glucanase.
[0069] As an example, the modification can result in expression of a transgene, where the transgene is a nucleic acid comprising a nucleotide sequence encoding a lectin, where the nucleotide sequence is operably linked to a plant-specific promoter, e.g., a phloem-specific promoter, or the like. As another example, the modification can result in expression of a transgene, where the transgene is a nucleic acid comprising a nucleotide sequence encoding an ω-ACTX-Hv1a toxin (Hvt), a component of the venom of the Australian funnel web spider Hadronyche versuta (Khan et al. (2006) Transgenic Res. 15:349). As another example, the modification can result in expression of a transgene, where the transgene is a nucleic acid comprising a nucleotide sequence encoding a lectin and a nucleotide sequence encoding Hvt. Such a transgene can confer broad-spectrum resistance against lepidopteran (e.g., Helicoverpa armigera and Spodoptera litura) and hemipteran (e.g., Myzus persicae, Phenacoccus solenopsis, and Bemisia tabaci) insect pests. See, e.g., Rauf et al. (2019) Nature Scientific Reports 9:6745.
[0070] In some cases, the modification results in increased expression of an endogenous plant gene product that has insecticidal activity. Such endogenous plant proteins include, e.g., lectins, ribosome-inactivating proteins, enzyme inhibitors, arcelins, chitinases, ureases, and modified storage proteins. See, e.g., Carlini and Grossi-de-Sa (2002) Toxicon. 40:1515. For example, in some cases, the modification results in increased expression of an endogenous jasmonic acid pathway protein.
[0071] As another example, a transgene can be a nucleic acid comprising a nucleotide sequence encoding an enzyme that cleaves a protein of a plant pathogen. For example, a transgene can be a nucleic acid comprising a nucleotide sequence encoding a plant apoplastic subtilisin-like protease, such as tomato P69B, which is able to cleave a secreted protein PC2 from the potato late blight pathogen Phytophthora infestans, thus triggering downstream immune responses. See, e.g., Wang et al. (2021) New Phytol. 229:3424.
[0072] As another example, a transgene can be a nucleic acid comprising a nucleotide sequence encoding an inhibitory RNA, such as a microRNA or a long double-stranded RNA, that inhibits an RNA of a plant pathogen. For example, a transgene can be a nucleic acid comprising a nucleotide sequence encoding TAS1c-siR483 and TAS2-siR453, which target the RNA produced by the BC1G_10728, BC1G_10508, and BC1G_08464 genes of the fungal pathogen Botrytis cinerea. See, e.g., Cai et al. (2018) Science 360:1126.
[0073] In some cases, the target nucleic acid comprises a nucleotide sequence encoding a polypeptide that provides for resistance to a disease (caused by a plant pathogen such as a fungus or a bacterium) or for resistance to an insect (e.g., an insect that causes plant pathology). At a first temperature of from about 17°C to about 25°C, the target nucleic acid is not modified by the Cas12L polypeptide; thus, the plant is resistant to the fungus, bacterium, or insect. At a second temperature of from about 28°C to about 37°C or from about 28°C to about 40°C, the target nucleic acid is modified by the Cas12L polypeptide; thus, the plant is susceptible to the fungus, bacterium, or insect.
[0074] In other cases, the target nucleic acid is an endogenous nucleic acid or a transgene comprising a nucleotide sequence encoding a negative regulator of a disease resistance or insect resistance gene or pathway. At a first temperature of from about 17°C to about 25°C, the target nucleic acid is not modified by the Cas12L polypeptide; thus, the plant is susceptible to the fungus, bacterium, or insect. At a second temperature of from about 28°C to about 37°C or from about 28°C to about 40°C, the target nucleic acid is modified by the Cas12L polypeptide; thus, the polypeptide that provides for resistance is produced and the plant is resistant to the fungus, bacterium, or insect.
[0075] As another example, the modification results in expression of a transgene that confers resistance to an herbicide. For example, in some cases, the transgene is a nucleic acid comprising a nucleotide sequence encoding a polypeptide that confers resistance to an herbicide, such as an imidazolinone or a sulfonylurea, that inhibits the growing point or meristem; such polypeptides include, e.g., a mutant ALS or a mutant AHAS enzyme. As another example, in some cases, the transgene is a nucleic acid comprising a nucleotide sequence encoding a polypeptide that confers resistance to glyphosate, e.g., where resistance can be conferred by a mutant 5-enolpyruvylshikimate-3-phosphate (EPSP) synthase gene.
[0076] As another example, the modification controls male sterility/fertility. Examples include, e.g., a transgene that is a nucleic acid comprising a nucleotide sequence encoding barstar (an inhibitor of barnase), e.g., where the nucleotide sequence is operably linked to an anther-specific promoter or a pollen-specific promoter (see, e.g., Roque et al. (2019) Front. Plant Sci. 10:819); a transgene that is a nucleic acid comprising a nucleotide sequence encoding barnase (Paul et al. (1992) Plant Mol. Biol. 19:611-622); and the like. Another example includes a transgene encoding a deacetylase gene under the control of a tapetum-specific promoter. Other male sterility genes include, e.g., MAC1, EMS1, and GNE2 (Sorensen et al. (2002) Plant J. 29:581-594). Further examples of male sterility genes include CMS-D2-2, CMS-hir, CMS-D8, CMS-D4, and CMS-C1.
[0077] In some cases, the target nucleic acid comprises a nucleotide sequence that encodes a male reproductive pathway polypeptide. At a first temperature of from about 17°C to about 25°C, the target nucleic acid is not modified by the Cas12L polypeptide; thus, the plant is fertile. At a second temperature of from about 28°C to about 37°C or from about 28°C to about 40°C, the target nucleic acid is modified by the Cas12L polypeptide; thus, the plant is male sterile.
[0078] In other cases, the target nucleic acid is an endogenous nucleic acid or a transgene comprising a nucleotide sequence encoding a negative regulator of the male reproductive pathway. At a first temperature of from about 17°C to about 25°C, the target nucleic acid is not modified by the Cas12L polypeptide; thus, the male reproductive pathway is blocked, resulting in a male sterile phenotype. At a second temperature of from about 28°C to about 37°C or from about 28°C to about 40°C, the target nucleic acid is modified by the Cas12L polypeptide; thus, the male reproductive pathway is allowed to function and the plant is fertile.
Target Nucleic Acids
[0079] A Cas12L polypeptide can be targeted to a specific target nucleic acid to modify the target nucleic acid. As described above, Cas12L is targeted to a target nucleic acid based on its association/complex with a guide RNA that is able to hybridize with the particular target nucleotide sequence in the target nucleic acid. In this sense, the guide RNA provides the targeting functionality to target a particular target nucleotide sequence in a target nucleic acid. Various types of nucleic acids may be targeted, e.g., to modulate their expression, as will be readily apparent to one of skill in the art.
[0080] Certain aspects of the present disclosure relate to targeting a target nucleic acid with a Cas12L polypeptide such that the Cas12L polypeptide is able to enact enzymatic activity at the target nucleic acid. In some cases, a Cas12L polypeptide/gRNA complex is targeted to a target nucleic acid and introduces an edit/modification into the target nucleic acid. In some cases, the edit/modification is the introduction of a single-stranded break or a double-stranded break into the nucleic acid backbone of the target nucleic acid.
[0081] Certain aspects of the present disclosure relate to target sites on target nucleic acids. A target site generally refers to a location of a target nucleic acid that is capable of being bound by a Cas12L/gRNA complex and subjected to the activity of a Cas12L polypeptide or variant thereof. In some cases, the target site may include both the nucleotide sequence hybridized with a guide RNA as well as at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 nucleotides or more on the 3’ side, the 5’ side, or both the 3’ and 5’ side of the nucleotide sequence in the target nucleic acid that is hybridized with a guide RNA. In some embodiments, the target site may contain at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 125, at least 150, at least 175, or at least 200 or more nucleotides.
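The notion of a target site as the guide-hybridized sequence plus flanking nucleotides on the 5’ side, the 3’ side, or both can be illustrated with a short Python sketch. The function name `target_site` and its parameters are hypothetical, coordinates are assumed to be 0-based and half-open on a single strand, and flanks are simply truncated at the ends of the supplied sequence; none of this is prescribed by the text above.

```python
def target_site(sequence: str, start: int, end: int,
                flank_5p: int = 5, flank_3p: int = 5) -> str:
    """Return the guide-hybridized region [start, end) plus flanking nucleotides.

    start/end are 0-based, half-open coordinates on the given strand (an
    assumption of this sketch). Flanks are clipped at the sequence ends.
    """
    if not (0 <= start < end <= len(sequence)):
        raise ValueError("start/end must delimit a non-empty region inside the sequence")
    if flank_5p < 0 or flank_3p < 0:
        raise ValueError("flank lengths must be non-negative")
    left = max(0, start - flank_5p)
    right = min(len(sequence), end + flank_3p)
    return sequence[left:right]


if __name__ == "__main__":
    # Toy sequence: 10 nt upstream, a 20-nt hybridized region, 10 nt downstream.
    toy = "A" * 10 + "GATTACAGATTACAGATTAC" + "C" * 10
    site = target_site(toy, 10, 30, flank_5p=5, flank_3p=5)
    print(len(site), site)
```

With a 20-nucleotide hybridized region and 5-nucleotide flanks on both sides, the resulting target site is 30 nucleotides long, which matches the smallest total length recited above.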
[0082] In some cases, a Cas12L polypeptide is targeted to a particular locus. A locus generally refers to a specific position on a chromosome or other nucleic acid molecule. A locus may contain, for example, a polynucleotide that encodes a protein or an RNA. A locus may also contain, for example, a non-coding RNA, a gene, a promoter, a 5’ untranslated region (UTR), an exon, an intron, a 3’ UTR, or combinations thereof. In some cases, a locus may contain a coding region for a gene.
[0083] In some cases, a Cas12L polypeptide is targeted to a gene. A gene generally refers to a polynucleotide that encodes a gene product (for example, a polypeptide or a noncoding RNA). A gene may contain a promoter, an enhancer sequence, a leader sequence, a transcriptional start site, a transcriptional stop site, a polyadenylation site, one or more exons, one or more introns, a 5’ UTR, a 3’ UTR, or combinations thereof. A gene sequence may contain a polynucleotide sequence encoding a promoter, an enhancer sequence, a leader sequence, a transcriptional start site, a transcriptional stop site, a polyadenylation site, one or more exons, one or more introns, a 5’ UTR, a 3’ UTR, or combinations thereof.
[0084] The target nucleic acid sequence may be located within the coding region of a target gene or upstream or downstream thereof. Moreover, the target nucleic acid sequence may reside endogenously in a target gene or may be heterologous, e.g., inserted into the gene using techniques such as homologous recombination. For example, a target gene of the present disclosure can be operably linked to a control region, such as a promoter, that contains a sequence that can be recognized by a guide RNA of the present disclosure such that a Cas12L polypeptide may be targeted to that sequence. In some embodiments, the target sequence may be a promoter or other regulatory region.
[0085] The target nucleic acid sequence may be located in a region of chromatin. In some embodiments, the target nucleic acid sequence to be edited by a Cas12L polypeptide may be in a region of open chromatin or similar region of DNA that is generally accessible to transcriptional machinery. Regions of open chromatin may be characterized by nucleosome depletion, nucleosome disruption, accessibility to transcriptional machinery, and/or a transcriptionally active state. Regions of open chromatin will be readily understood and identifiable by one of skill in the art. Editing a target nucleic acid sequence that is in a region of open chromatin may result in improved editing efficiency by the Cas12L polypeptide as compared to a corresponding control nucleic acid sequence (e.g., one that is present in a region of more closed, repressive, and/or transcriptionally inactive chromatin).
[0086] Target genes or nucleic acid regions to be edited by a Cas12L polypeptide of the present disclosure will be readily apparent to those of skill in the art depending on the particular application and/or purpose. For example, genes with particular agricultural importance may be edited/modified according to the methods of the present disclosure. Exemplary genes to be edited/modified may include, for example, those involved in light perception (e.g., PHYB, etc.); those involved in the circadian clock (e.g., CCA1, LHY, etc.); those involved in flowering time (e.g., CO, FT, etc.); those involved in meristem size (e.g., WUS, CLV3, etc.); those involved in plant architecture (e.g., S, SP, TFL1, SFT, etc.); those involved in ripening (e.g., genes in the ethylene production pathway); those involved in flower color; those involved in bract color; and those involved in embryogenesis, chromatin structure, stress response, growth and development, etc.
[0087] In some cases, the target nucleic acid is one that provides for resistance to an antimicrobial agent. Examples of such antimicrobial agents include penicillin, a cephalosporin, a monobactam, a carbapenem, a macrolide, an aminoglycoside, a quinolone, a sulfonamide, a tetracycline, a glycopeptide, a lipoglycopeptide, an oxazolidinone, a rifamycin, a tuberactinomycin, chloramphenicol, metronidazole, tinidazole, nitrofurantoin, teicoplanin, telavancin, linezolid, cycloserine 2, bacitracin, polymyxin B, viomycin, and capreomycin. In some cases, the target nucleic acid is one that provides for resistance to an antifungal agent, where examples of antifungal agents include an allylamine, an imidazole, a triazole, a thiazole, a polyene, and an echinocandin. In some cases, the target nucleic acid is one that provides for resistance to an insecticidal agent, where examples of insecticidal agents include a chloronicotinyl, a neonicotinoid, a carbamate, an organophosphate, a pyrethroid, an oxadiazine, a spinosyn, a cyclodiene, an organochlorine, a fiprole, a mectin, a diacylhydrazine, a benzoylurea, an organotin, a pyrrole, a dinitroterpenol, a METI, a tetronic acid, a tetramic acid, and a phthalamide.
[0088] In some cases, the target nucleic acid provides for resistance to a plant pathogen. In some cases, the plant pathogen is a bacterium, a fungus, a parasitic insect, a parasitic nematode, or a parasitic protozoan.
[0089] In some cases, the target nucleic acid is endogenous to the plant where the expression of one or more genes is modulated according to the methods described herein. In some cases, the target nucleic acid is a transgene of interest that has been inserted into a plant. Suitable target nucleic acids will be readily apparent to one of skill in the art depending on the particular need or outcome. The target nucleic acid sequence may be in e.g. a region of euchromatin (e.g. highly expressed gene), or the target nucleic acid sequence may be in a region of heterochromatin (e.g. centromere DNA).
[0090] In some cases, the target nucleic acid may be in a region of repressive chromatin. Repressive chromatin generally refers to regions of chromatin where transcription is repressed or otherwise generally transcriptionally inactive. Exemplary regions of repressive chromatin include, for example, regions with repressive DNA methylation, compact chromatin, and/or no transcription.
[0091] In some cases, a Cas12L polypeptide can be used to create mutations in plants that result in reduced or silenced expression of a target gene. In some cases, a Cas12L polypeptide can be used to create functional “overexpression” mutations in a plant by releasing repression of the target gene expression as a consequence of a modification that results in transcriptional activation of the target nucleic acid. Release of gene expression repression, which may lead to activation of gene expression, may be of a structural gene, e.g., one encoding a protein having for example enzymatic activity, or of a regulatory gene, e.g., one encoding a protein that in turn regulates expression of a structural gene.
[0092] In some cases, a Cas12L polypeptide can be used to control an endogenous biosynthetic pathway in a plant cell. In some cases, a Cas12L polypeptide can be used to control a heterologous biosynthetic pathway in a plant cell. Examples of biosynthetic pathways that can be controlled using a Cas12L polypeptide (together with a Cas12L guide nucleic acid) include, e.g., biosynthetic pathways involved in psychoactive alkaloid production (e.g., for reducing opium production by Papaver somniferum); biosynthetic pathways for production of cannabidiol; biosynthetic pathways for production of tetrahydrocannabinol; a phytic acid production pathway; and the like.
[0093] In some cases, a Cas12L polypeptide is used to control an endogenous glucosinolate production pathway. In some cases, the Cas12L polypeptide inhibits an endogenous glucosinolate production pathway only at a higher temperature (e.g., from about 25°C to about 32°C), where such higher temperature is applied only just prior to (e.g., one week, two weeks, or three weeks prior to) harvest of a vegetable intended for human consumption, where the vegetable is produced by the plant.
Cas12L Effector Polypeptides
[0094] Certain aspects of the present disclosure relate to Cas12L polypeptides and their use in facilitating the editing/modification of a target nucleic acid. Cas12L polypeptides generally function as RNA-guided DNA-binding proteins. Cas12L polypeptides may have endonuclease activity which can facilitate modification/editing of a target nucleic acid.
[0095] A Cas12L polypeptide (this term is used interchangeably with the term “Cas12L protein”) can bind and/or modify (e.g., cleave, nick, methylate, demethylate, etc.) a target nucleic acid and/or a polypeptide associated with target nucleic acid (e.g., methylation or acetylation of a histone tail) (e.g., in some cases, the Cas12L protein includes a fusion partner with an activity, and in some cases, the Cas12L protein provides nuclease activity). In some cases, the Cas12L protein is a naturally-occurring protein (e.g., naturally occurs in bacteriophage). In other cases, the Cas12L protein is not a naturally-occurring polypeptide (e.g., the Cas12L protein is a variant Cas12L protein, a fusion Cas12L protein, and the like).
[0096] Assays to determine whether a given protein interacts with a Cas12L guide RNA can be any convenient binding assay that tests for binding between a protein and a nucleic acid. Suitable binding assays (e.g., gel shift assays) will be known to one of ordinary skill in the art (e.g., assays that include adding a Cas12L guide RNA and a protein to a target nucleic acid). Assays to determine whether a protein has an activity (e.g., to determine if the protein has nuclease activity that cleaves a target nucleic acid and/or some heterologous activity) can be any convenient assay (e.g., any convenient nucleic acid cleavage assay that tests for nucleic acid cleavage). Suitable assays (e.g., cleavage assays) will be known to one of ordinary skill in the art.
[0097] A naturally occurring Cas12L protein functions as an endonuclease that catalyzes a double strand break at a specific sequence in a targeted double stranded DNA (dsDNA). The sequence specificity is provided by the associated guide RNA, which hybridizes to a target sequence within the target DNA. The naturally occurring Cas12L guide RNA is a crRNA, where the crRNA includes (i) a guide sequence that hybridizes to a target sequence in the target DNA and (ii) a protein binding segment which includes a stem-loop (hairpin - dsRNA duplex) that binds to the Cas12L protein.
[0098] In some cases, a Casl2L polypeptide suitable for use in a subject method and/or composition is (or is derived from) a naturally occurring (wild type) protein. Examples of naturally occurring Casl2L proteins are depicted in FIG. 5A-5M. In some cases, a Casl2L protein (of the subject compositions and/or methods) includes an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with any one of the Casl2L amino acid sequences depicted in FIG. 5A-5M.
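As an illustration of how a percent identity value might be compared against the recited thresholds, the following Python sketch computes identity over a pre-computed pairwise alignment. It is a simplified sketch under stated assumptions: the two strings are assumed to be already aligned with "-" as the gap character, identity is taken as identical columns divided by all aligned columns (other conventions exist and give different values), and the names `percent_identity`, `thresholds_met`, and `IDENTITY_THRESHOLDS` are hypothetical. The short peptide strings in the example are toy data, not sequences from FIG. 5A-5M.

```python
# Thresholds recited in the text, in percent.
IDENTITY_THRESHOLDS = (20, 30, 40, 50, 60, 70, 80, 85, 90, 95, 97, 98, 99, 100)


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over a pairwise alignment (identical columns / aligned columns)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences have a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def thresholds_met(identity: float) -> list:
    """List every recited threshold that the identity value satisfies."""
    return [t for t in IDENTITY_THRESHOLDS if identity >= t]


if __name__ == "__main__":
    identity = percent_identity("MKTA-YI", "MKSAQYI")  # toy alignment
    print(round(identity, 2), thresholds_met(identity))
```

In practice the alignment itself would come from an established alignment tool, and the denominator convention should be stated alongside any reported identity value.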
[0099] In some cases, a Cas12L protein (of the subject compositions and/or methods) has more sequence identity to the amino acid sequence depicted in any one of FIG. 5A-5M than to any of the following: Cas12a proteins, Cas12b proteins, Cas12c proteins, Cas12d proteins, Cas12e proteins, Cas12g proteins, Cas12h proteins, and Cas12i proteins. In some cases, a Cas12L protein (of the subject compositions and/or methods) includes an amino acid sequence having a RuvC domain (which includes the RuvC-I, RuvC-II, and RuvC-III domains) that has more sequence identity to the RuvC domain of any of the Cas12L amino acid sequences depicted in FIG. 5A-5M than to the RuvC domain of any of the following: Cas12a proteins, Cas12b proteins, Cas12c proteins, Cas12d proteins, Cas12e proteins, Cas12g proteins, Cas12h proteins, and Cas12i proteins.
[0100] FIG. 5A provides the locations of active site residues present in RuvC domains of the CasL polypeptide designated “CasL_56.” For example, active site residues of CasL_56 (FIG. 5A) are amino acid residues 336, 523, and 676. Corresponding active site residues of other CasL polypeptides presented in FIG. 5A-5M can be determined by those skilled in the art. See, e.g., the bold and underlined residues in FIG. 5B and FIG. 5C.
[0101] In some cases, a CasL protein of the present disclosure includes an Asn at position 102 (N102) of the CasL polypeptides depicted in FIG. 5A-5C, or at the corresponding position in a CasL polypeptide of FIG. 5D-5M. Substitution of N102 with another amino acid can modify the PAM requirement. For example, substitution of N102 with Q, S, E, T, or D could expand the PAM from R(-1) to C, T, or N.
[0102] In some cases, a CasL protein of the present disclosure includes amino acids that interact directly with the RNA nucleobases (Q452, N510), and/or amino acids that interact directly with the RNA phosphate backbone to stabilize the guide (S451, K598, E444, N445, K503, Y619) (where the amino acid numbering is based on the numbering of the amino acid sequence depicted in FIG. 5B), or corresponding positions in the amino acid sequence depicted in any one of FIG. 5A or FIG. 5C-5M. For example, corresponding positions in the amino acid sequence depicted in FIG. 5C are shown in bold.
[0103] In some cases, a CasL protein of the present disclosure has a domain structure as shown in FIG. 3D. For example, in some cases, a CasL protein comprises: i) an OBD domain of about 27 amino acids in length at the N-terminus of the protein; ii) a REC I domain from amino acids 28 to 54; iii) a PID domain from amino acids 55 to 113; iv) a REC I domain from amino acids 114 to 245; v) an OBD domain from amino acids 246 to 321; vi) a first RuvC domain from amino acids 322 to 350; vii) a REC II domain from amino acids 351 to 387; viii) a second RuvC domain from amino acids 388 to 396; ix) a REC II domain from amino acids 397 to 522; x) a third RuvC domain from amino acids 523 to 640; xi) a TSL domain from amino acids 641 to 678; xii) a fourth RuvC domain from amino acids 679 to 724; and xiii) a TSL domain from amino acids 724 to 747 (e.g., of the consensus amino acid sequence depicted in FIG. 7; or the corresponding amino acids of a polypeptide depicted in any of FIG. 5A-5M). FIG. 7 provides an alignment of amino acid sequences of various CasL proteins, including the amino acid sequences set out in FIG. 5A-5M.
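The domain layout recited above can be written as a small lookup table. The following Python sketch is illustrative only: the `Domain` type and helper names are hypothetical, the first OBD segment is assumed to span residues 1 to 27 (the text says "about 27 amino acids"), and the coordinates are the consensus-numbering boundaries as listed, which need not coincide with the numbering of any individual sequence in FIG. 5A-5M.

```python
from typing import List, NamedTuple, Tuple


class Domain(NamedTuple):
    name: str
    start: int  # first residue (1-based, inclusive)
    end: int    # last residue (inclusive)


# Boundaries as listed in the text; the 1-27 span for the first OBD is an assumption.
DOMAINS = [
    Domain("OBD", 1, 27),
    Domain("REC I", 28, 54),
    Domain("PID", 55, 113),
    Domain("REC I", 114, 245),
    Domain("OBD", 246, 321),
    Domain("RuvC", 322, 350),
    Domain("REC II", 351, 387),
    Domain("RuvC", 388, 396),
    Domain("REC II", 397, 522),
    Domain("RuvC", 523, 640),
    Domain("TSL", 641, 678),
    Domain("RuvC", 679, 724),
    Domain("TSL", 724, 747),
]


def domains_at(position: int) -> List[str]:
    """Names of all listed domain segments that contain the given residue."""
    return [d.name for d in DOMAINS if d.start <= position <= d.end]


def boundary_issues(domains: List[Domain]) -> List[Tuple[str, int, int]]:
    """Report gaps or overlaps between consecutive segments."""
    issues = []
    for prev, nxt in zip(domains, domains[1:]):
        if nxt.start <= prev.end:
            issues.append(("overlap", nxt.start, prev.end))
        elif nxt.start > prev.end + 1:
            issues.append(("gap", prev.end + 1, nxt.start - 1))
    return issues


print(domains_at(100))
print(boundary_issues(DOMAINS))
```

With the boundaries as listed, the segments are contiguous except that residue 724 falls in both the fourth RuvC segment and the final TSL segment.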
[0104] In some cases, a Cas12L protein (of the subject compositions and/or methods) includes an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the RuvC domain (which includes the RuvC-I, RuvC-II, and RuvC-III domains) of any one of the Cas12L amino acid sequences depicted in FIG. 5A-5M. In some cases, a Cas12L protein (of the subject compositions and/or methods) includes an amino acid sequence having 70% or more sequence identity (e.g., 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the RuvC domain (which includes the RuvC-I, RuvC-II, and RuvC-III domains) of any one of the Cas12L amino acid sequences depicted in FIG. 5A-5M. In some cases, a Cas12L protein (of the subject compositions and/or methods) includes the RuvC domain (which includes the RuvC-I, RuvC-II, and RuvC-III domains) of any one of the Cas12L amino acid sequences depicted in FIG. 5A-5M.
[0105] In some cases, a guide RNA that binds a Cas12L polypeptide includes a nucleotide sequence depicted in any one of FIG. 5A-5M (where “T” is replaced with “U”) (or in some cases the reverse complement of same). In some cases, the guide RNA comprises the nucleotide sequence (N)nX or the reverse complement of same, where N is any nucleotide, n is an integer from 15 to 30 (e.g., from 15 to 20, from 17 to 25, from 17 to 22, from 18 to 22, from 18 to 20, from 20 to 25, or from 25 to 30), and X is any one of the nucleotide sequences depicted in any one of FIG. 5A-5M (or in some cases the reverse complement of same).
[0106] In some cases, a guide RNA that binds a Cas12L polypeptide includes a nucleotide sequence (a repeat sequence; or protein-binding sequence) of the following consensus sequence: WAUUGUUGUARMWNYYWUUUURUAWGGWKURAACAAC (SEQ ID NO:69), where W is A or U; R is G or A; M is A or C; N is A, G, C, or U; Y is U or C; and K is G or U. For example, a guide RNA can comprise a protein-binding segment comprising the nucleotide sequence: AUUGUUGAAAUAGUACUUUUAUAGUCUAUAUACAAC (SEQ ID NO:70). As another example, a guide RNA that binds a CasL polypeptide can comprise a protein-binding segment comprising the nucleotide sequence: AUUGUUGUAACAUCUAUUUUGUAAGGUGUAAACAAC (SEQ ID NO:71). As another example, a guide RNA that binds a CasL polypeptide can comprise a protein-binding segment comprising the nucleotide sequence: UAUUGUUGUAACUCUUAUUUUGUAUGGAGUAAACAAC (SEQ ID NO:72). As another example, a guide RNA that binds a CasL polypeptide can comprise a protein-binding segment comprising the nucleotide sequence: AAUUGUUGUAACUCUUAUUUUGUAUGGAGUAAACAAC (SEQ ID NO:73). As another example, a guide RNA that binds a CasL polypeptide can comprise a protein-binding segment comprising the nucleotide sequence: AUUGUUGUAACUCUUAUUUUGUAUGGAGUAAACAAC (SEQ ID NO:74). As another example, a guide RNA that binds a CasL polypeptide can comprise a protein-binding segment comprising the nucleotide sequence: AUUGUUGUAACUUUUAUUUUGUAUGGAGUAAACAAC (SEQ ID NO:75). As another example, a guide RNA that binds a CasL polypeptide can comprise a protein-binding segment comprising the nucleotide sequence: AUUGUUGUAGACCUCUUUUUAUAAGGAUUGAACAAC (SEQ ID NO:76). A Cas12L polypeptide of the present disclosure can form a complex (a ribonucleoprotein (RNP) complex) with a guide RNA comprising a protein-binding segment described herein.
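The degenerate symbols in the consensus sequence of SEQ ID NO:69 can be expanded mechanically. The following Python sketch, offered as an illustration under stated assumptions rather than as a method from the disclosure, translates the consensus into a regular expression using the symbol definitions given above (W, R, M, N, Y, K) and tests whether a candidate repeat of the same length conforms to it. The function names are hypothetical, and requiring a full-length, ungapped match is an assumption of this sketch.

```python
import re

# Symbol definitions as given in the text for SEQ ID NO:69.
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "W": "[AU]", "R": "[AG]", "M": "[AC]",
    "Y": "[CU]", "K": "[GU]", "N": "[ACGU]",
}

CONSENSUS = "WAUUGUUGUARMWNYYWUUUURUAWGGWKURAACAAC"  # SEQ ID NO:69


def consensus_to_regex(consensus: str) -> "re.Pattern[str]":
    """Expand each degenerate symbol into a character class."""
    return re.compile("".join(IUPAC_RNA[symbol] for symbol in consensus))


def matches_consensus(sequence: str, consensus: str = CONSENSUS) -> bool:
    """True if the sequence matches the consensus position by position.

    DNA-style input is accepted by replacing T with U first.
    """
    rna = sequence.upper().replace("T", "U")
    return consensus_to_regex(consensus).fullmatch(rna) is not None


seq_72 = "UAUUGUUGUAACUCUUAUUUUGUAUGGAGUAAACAAC"  # SEQ ID NO:72
print(matches_consensus(seq_72))
```

Because `fullmatch` requires the candidate to have the same length as the consensus (37 nucleotides), a repeat of a different length would first need to be aligned to the consensus; that step is outside the scope of this sketch.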
[0107] As another example, a guide RNA that binds a CasL polypeptide can comprise a protein-binding segment comprising the nucleotide sequence: AAUGUUGUAGAUGCCUUUUUAUAAGGAUUAAACAAC (SEQ ID NO:77). As another example, a guide RNA that binds a CasL polypeptide can comprise a protein-binding segment comprising the nucleotide sequence: AAUGUUGUAGAUACCUUUUUGUAAGGAUUGAACAAC (SEQ ID NO:78). As another example, a guide RNA that binds a CasL polypeptide can comprise a protein-binding segment comprising the nucleotide sequence: UAUUGUUGUAGAUACCUUUUUGUAAGGAUUAAACAAC (SEQ ID NO:79). As another example, a guide RNA that binds a CasL polypeptide can comprise a protein-binding segment comprising the nucleotide sequence: AUUGUUGUAGAUACCUUUUUGUAAGGAUUGAACAAC (SEQ ID NO:80). As another example, a guide RNA that binds a CasL polypeptide can comprise a protein-binding segment comprising the nucleotide sequence: AUUGUUGUAAUACUAUUUUUGUAAAGUAUAAACAAC (SEQ ID NO:81). As another example, a guide RNA that binds a CasL polypeptide can comprise a protein-binding segment comprising the nucleotide sequence: AUUGUUGUAAUACACUUUUUAUAAGGUAUGAACAAC (SEQ ID NO:82).
[0108] In addition to containing conserved sequence motifs in the repeat (protein-binding) regions, the repeat region of a CasLambda guide RNA shares conserved secondary structures across homologs. For example, the repeat region can include palindromic regions that can form stem and stem-loop structures.
[0109] In some cases, a guide RNA that binds a Cas12L polypeptide includes a nucleotide sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the nucleotide sequence depicted in any one of FIG. 5A-5M (or in some cases the reverse complement of same). In some cases, the guide RNA comprises the nucleotide sequence (N)nX or the reverse complement of same, where N is any nucleotide, n is an integer from 15 to 30 (e.g., from 15 to 20, from 17 to 25, from 17 to 22, from 18 to 22, from 18 to 20, from 20 to 25, or from 25 to 30), and X is a nucleotide sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the nucleotide sequence depicted in any one of FIG. 5A-5M.
[0110] In some cases, a guide RNA that binds a Cas12L polypeptide includes a nucleotide sequence having 85% or more sequence identity (e.g., 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the nucleotide sequence depicted in any one of FIG. 5A-5M (or in some cases the reverse complement of same). In some cases, the guide RNA comprises the nucleotide sequence (N)nX or the reverse complement of same, where N is any nucleotide, n is an integer from 15 to 30 (e.g., from 15 to 20, from 17 to 25, from 17 to 22, from 18 to 22, from 18 to 20, from 20 to 25, or from 25 to 30), and X is a nucleotide sequence having 85% or more sequence identity (e.g., 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the nucleotide sequence depicted in any one of FIG. 5A-5M.
[0111] In some cases, a guide RNA that binds a Cas12L polypeptide includes a nucleotide sequence depicted in any one of FIG. 5A-5M (or in some cases the reverse complement of same). In some cases, the guide RNA comprises the nucleotide sequence X(N)n, where N is any nucleotide, n is an integer from 15 to 30 (e.g., from 15 to 20, from 17 to 25, from 17 to 22, from 18 to 22, from 18 to 20, from 20 to 25, or from 25 to 30), and X is the nucleotide sequence depicted in any one of FIG. 5A-5M (or in some cases the reverse complement of same).
[0112] In some cases, a guide RNA that binds a Cas12L polypeptide includes a nucleotide sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the nucleotide sequence depicted in any one of FIG. 5A-5M (or in some cases the reverse complement of same). In some cases, the guide RNA comprises the nucleotide sequence X(N)n, where N is any nucleotide, n is an integer from 15 to 30 (e.g., from 15 to 20, from 17 to 25, from 17 to 22, from 18 to 22, from 18 to 20, from 20 to 25, or from 25 to 30), and X is a nucleotide sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the nucleotide sequence depicted in any one of FIG. 5A-5M.
[0113] In some cases, a guide RNA that binds a Cas12L polypeptide includes a nucleotide sequence having 85% or more sequence identity (e.g., 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the nucleotide sequence depicted in any one of FIG. 5A-5M (or in some cases the reverse complement of same). In some cases, the guide RNA comprises the nucleotide sequence X(N)n, where N is any nucleotide, n is an integer from 15 to 30 (e.g., from 15 to 20, from 17 to 25, from 17 to 22, from 18 to 22, from 18 to 20, from 20 to 25, or from 25 to 30), and X is a nucleotide sequence having 85% or more sequence identity (e.g., 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the nucleotide sequence depicted in any one of FIG. 5A-5M.
[0114] FIG. 5A
[0115] In some cases, a Cas12L protein (of the subject compositions and/or methods) includes an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5A and designated “Cas12L_56.” For example, in some cases, a Cas12L protein includes an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5A. In some cases, a Cas12L protein includes an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5A. In some cases, a Cas12L protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5A. In some cases, a Cas12L protein includes an amino acid sequence having the Cas12L amino acid sequence depicted in FIG. 5A. In some cases, a Cas12L protein includes an amino acid sequence having the Cas12L protein sequence depicted in FIG. 5A, with the exception that the sequence includes an amino acid substitution (e.g., 1, 2, or 3 amino acid substitutions) that reduces the naturally occurring catalytic activity of the protein. In some cases, the Cas12L polypeptide has a length of from 700 amino acids (aa) to 750 aa (e.g., from 700 aa to 725 aa, from 725 aa to 735 aa, from 735 aa to 740 aa, or from 740 aa to 750 aa). In some cases, the Cas12L polypeptide has a length of 735 amino acids. In some cases, a guide RNA that binds a Cas12L polypeptide (e.g., a Cas12L polypeptide comprising an amino acid sequence having 20% or more, 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% amino acid sequence identity to the Cas12L amino acid sequence depicted in FIG. 5A) includes the following nucleotide sequence: ATTGTTGTAGATACCTTTTTATAAGGTTTGAACAAC (SEQ ID NO:83) or the reverse complement of same. In some cases, the guide RNA comprises the nucleotide sequence (N)nATTGTTGTAGATACCTTTTTATAAGGTTTGAACAAC (SEQ ID NO:84) or the reverse complement of same, where N is any nucleotide and n is an integer from 15 to 30 (e.g., from 15 to 20, from 17 to 25, from 17 to 22, from 18 to 22, from 18 to 20, from 20 to 25, or from 25 to 30).
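As a minimal illustration of the (N)nX and X(N)n arrangements, the following Python sketch joins a guide (spacer) sequence of 15 to 30 nucleotides to the repeat of SEQ ID NO:83, written as RNA by replacing T with U. The function names and the 20-nucleotide spacer are hypothetical and chosen only for illustration; no particular target is implied, and the sketch does not attempt to model guide activity.

```python
# Complement table for RNA bases.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")

REPEAT_FIG_5A_DNA = "ATTGTTGTAGATACCTTTTTATAAGGTTTGAACAAC"  # SEQ ID NO:83


def to_rna(sequence: str) -> str:
    """Write a DNA-style sequence as RNA (T replaced with U)."""
    return sequence.upper().replace("T", "U")


def reverse_complement_rna(rna: str) -> str:
    """Reverse complement of an RNA sequence."""
    return rna.translate(RNA_COMPLEMENT)[::-1]


def build_guide(spacer: str, repeat_dna: str, repeat_first: bool = False) -> str:
    """Assemble (N)nX (spacer then repeat) or X(N)n (repeat then spacer)."""
    spacer_rna = to_rna(spacer)
    if not 15 <= len(spacer_rna) <= 30:
        raise ValueError("n must be an integer from 15 to 30")
    if set(spacer_rna) - set("ACGU"):
        raise ValueError("spacer contains characters other than A, C, G, U/T")
    repeat_rna = to_rna(repeat_dna)
    return repeat_rna + spacer_rna if repeat_first else spacer_rna + repeat_rna


# Hypothetical 20-nt spacer, for illustration only.
example_spacer = "GACUGACUGACUGACUGACU"
guide = build_guide(example_spacer, REPEAT_FIG_5A_DNA)
print(guide)
print(len(guide))
```

The `reverse_complement_rna` helper covers the "reverse complement of same" alternative recited in the text; whether the forward or reverse-complement form applies in a given case is not decided by this sketch.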
[0116] FIG. 5B
[0117] In some cases, a Casl2L protein (of the subject compositions and/or methods) includes an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG. 5B and designated “Casl2L_57.” For example, in some cases, a Casl2L protein includes an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG. 5B. In some cases, a Casl2L protein includes an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG. 5B. In some cases, a Casl2L protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG. 5B. In some cases, a Casl2L protein includes an amino acid sequence having the Casl2L amino acid sequence depicted in FIG. 5B. In some cases, a Casl2L protein includes an amino acid sequence having the Casl2L protein sequence depicted in FIG. 5B, with the exception that the sequence includes an amino acid substitution (e.g., 1, 2, or 3 amino acid substitutions) that reduces the naturally occurring catalytic activity of the protein. In some cases, the Casl2L polypeptide has a length of from 730 amino acids (aa) to 775 aa, e.g., from 730 aa to 740 aa, from 740 aa to 750 aa, or from 750 aa to 775 aa). In some cases, the Casl2L polypeptide has a length of 746 amino acids. 
In some cases, a guide RNA that binds a Casl2L polypeptide (e.g., a Casl2L polypeptide comprising an amino acid sequence having 20% or more, 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%, amino acid sequence identity to the Casl2L amino acid sequence depicted in FIG. 5B) includes the following nucleotide sequence: ATTGTTGTAACTCTTATTTTGTATGGAGTAAACAAC (SEQ ID NO: 85) or the reverse complement of same. In some cases, the guide RNA comprises the nucleotide sequence (N)nATTGTTGTAACTCTTATTTTGTATGGAGTAAACAAC (SEQ ID NO: 86) or the reverse complement of same, where N is any nucleotide and n is an integer from 15 to 30, e.g., from 15 to 20, from 17 to 25, from 17 to 22, from 18 to 22, from 18 to 20, from 20 to 25, or from 25 to 30).
[0118] FIG. 5C
[0119] In some cases, a Casl2L protein (of the subject compositions and/or methods) includes an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG. 5C and designated “Casl2L_58.” For example, in some cases, a Casl2L protein includes an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG. 5C. In some cases, a Casl2L protein includes an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG. 5C. In some cases, a Casl2L protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG. 5C. In some cases, a Casl2L protein includes an amino acid sequence having the Casl2L amino acid sequence depicted in FIG. 5C. In some cases, a Casl2L protein includes an amino acid sequence having the Casl2L protein sequence depicted in FIG. 5C, with the exception that the sequence includes an amino acid substitution (e.g., 1, 2, or 3 amino acid substitutions) that reduces the naturally occurring catalytic activity of the protein. In some cases, the Casl2L polypeptide has a length of from 730 amino acids (aa) to 775 aa, e.g., from 730 aa to 740 aa, from 740 aa to 750 aa, or from 750 aa to 775 aa). In some cases, the Casl2L polypeptide has a length of 746 amino acids. 
In some cases, a guide RNA that binds a Casl2L polypeptide (e.g., a Casl2L polypeptide comprising an amino acid sequence having 20% or more, 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%, amino acid sequence identity to the Casl2L amino acid sequence depicted in FIG. 5C) includes the following nucleotide sequence: ATTGTTGTAACTCTTATTTTGTATGGAGTAAACAAC (SEQ ID NO: 85) or the reverse complement of same. In some cases, the guide RNA comprises the nucleotide sequence (N)nATTGTTGTAACTCTTATTTTGTATGGAGTAAACAAC (SEQ ID NO: 86) or the reverse complement of same, where N is any nucleotide and n is an integer from 15 to 30, e.g., from 15 to 20, from 17 to 25, from 17 to 22, from 18 to 22, from 18 to 20, from 20 to 25, or from 25 to 30).
[0120] FIG. 5D
[0121] In some cases, a Casl2L protein (of the subject compositions and/or methods) includes an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG. 5D and designated “Casl2L_59.” For example, in some cases, a Casl2L protein includes an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG. 5D. In some cases, a Casl2L protein includes an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG. 5D. In some cases, a Casl2L protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG. 5D. In some cases, a Casl2L protein includes an amino acid sequence having the Casl2L amino acid sequence depicted in FIG. 5D. In some cases, a Casl2L protein includes an amino acid sequence having the Casl2L protein sequence depicted in FIG. 5D, with the exception that the sequence includes an amino acid substitution (e.g., 1, 2, or 3 amino acid substitutions) that reduces the naturally occurring catalytic activity of the protein. In some cases, the Casl2L polypeptide has a length of from 800 amino acids (aa) to 875 aa, e.g., from 800 aa to 825 aa, from 825 aa to 850 aa, or from 850 aa to 875 aa). In some cases, the Casl2L polypeptide has a length of 828 amino acids. 
In some cases, a guide RNA that binds a Casl2L polypeptide (e.g., a Casl2L polypeptide comprising an amino acid sequence having 20% or more, 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%, amino acid sequence identity to the Casl2L amino acid sequence depicted in FIG. 5D) includes the following nucleotide sequence: ACTGTTGGTTATCCTAATTTTATGGGAATACACAAC (SEQ ID NO: 87) or the reverse complement of same. In some cases, the guide RNA comprises the nucleotide sequence (N)nACTGTTGGTTATCCTAATTTTATGGGAATACACAAC (SEQ ID NO:88) or the reverse complement of same, where N is any nucleotide and n is an integer from 15 to 30, e.g., from 15 to 20, from 17 to 25, from 17 to 22, from 18 to 22, from 18 to 20, from 20 to 25, or from 25 to 30).
[0122] FIG. 5E
[0123] In some cases, a Cas12L protein (of the subject compositions and/or methods) includes an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5E and designated “Cas12L_60.” For example, in some cases, a Cas12L protein includes an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5E. In some cases, a Cas12L protein includes an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5E. In some cases, a Cas12L protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5E. In some cases, a Cas12L protein includes an amino acid sequence having the Cas12L amino acid sequence depicted in FIG. 5E. In some cases, a Cas12L protein includes an amino acid sequence having the Cas12L protein sequence depicted in FIG. 5E, with the exception that the sequence includes an amino acid substitution (e.g., 1, 2, or 3 amino acid substitutions) that reduces the naturally occurring catalytic activity of the protein. In some cases, the Cas12L polypeptide has a length of from 800 amino acids (aa) to 875 aa (e.g., from 800 aa to 825 aa, from 825 aa to 850 aa, or from 850 aa to 875 aa). In some cases, the Cas12L polypeptide has a length of 828 amino acids. 
In some cases, a guide RNA that binds a Casl2L polypeptide (e.g., a Casl2L polypeptide comprising an amino acid sequence having 20% or more, 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%, amino acid sequence identity to the Casl2L amino acid sequence depicted in FIG. 5E) includes the following nucleotide sequence: ACTGTTGGTTATCCTAATTTTATGGGAATACACAAC (SEQ ID NO: 87) or the reverse complement of same. In some cases, the guide RNA comprises the nucleotide sequence (N)nACTGTTGGTTATCCTAATTTTATGGGAATACACAAC (SEQ ID NO:88) or the reverse complement of same, where N is any nucleotide and n is an integer from 15 to 30, e.g., from 15 to 20, from 17 to 25, from 17 to 22, from 18 to 22, from 18 to 20, from 20 to 25, or from 25 to 30).
[0124] FIG. 5F
[0125] In some cases, a Cas12L protein (of the subject compositions and/or methods) comprises an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5F and designated “Cas12L_61.” For example, in some cases, a Cas12L protein comprises an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5F. In some cases, a Cas12L protein comprises an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5F. In some cases, a Cas12L protein comprises an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5F. In some cases, a Cas12L protein comprises the Cas12L amino acid sequence depicted in FIG. 5F. In some cases, a Cas12L protein comprises an amino acid sequence having the Cas12L protein sequence depicted in FIG. 5F, with the exception that the sequence includes an amino acid substitution (e.g., 1, 2, or 3 amino acid substitutions) that reduces the naturally occurring catalytic activity of the protein. In some cases, the Cas12L polypeptide has a length of from 800 amino acids (aa) to 875 aa (e.g., from 800 aa to 825 aa, from 825 aa to 850 aa, or from 850 aa to 875 aa). In some cases, the Cas12L polypeptide has a length of 828 amino acids. 
In some cases, a guide RNA that binds a Casl2L polypeptide (e.g., a Casl2L polypeptide comprising an amino acid sequence having 20% or more, 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%, amino acid sequence identity to the Casl2L amino acid sequence depicted in FIG. 5F) includes the following nucleotide sequence: ATTGTTGGTTATCCTAATTTTATAGGAATACACAAC (SEQ ID NO: 89) or the reverse complement of same. In some cases, the guide RNA comprises the nucleotide sequence (N)nATTGTTGGTTATCCTAATTTTATAGGAATACACAAC (SEQ ID NO:90) or the reverse complement of same, where N is any nucleotide and n is an integer from 15 to 30, e.g., from 15 to 20, from 17 to 25, from 17 to 22, from 18 to 22, from 18 to 20, from 20 to 25, or from 25 to 30).
[0126] FIG. 5G
[0127] In some cases, a Casl2L protein (of the subject compositions and/or methods) includes an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG. 5G and designated “Casl2L_62.” For example, in some cases, a Casl2L protein includes an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG. 5G. In some cases, a Casl2L protein includes an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG. 5G. In some cases, a Casl2L protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Casl2L amino acid sequence depicted in FIG. 5G. In some cases, a Casl2L protein includes an amino acid sequence having the Casl2L amino acid sequence depicted in FIG. 5G. In some cases, a Cas12L protein includes an amino acid sequence having the Casl2L protein sequence depicted in FIG. 5G, with the exception that the sequence includes an amino acid substitution (e.g., 1, 2, or 3 amino acid substitutions) that reduces the naturally occurring catalytic activity of the protein. In some cases, the Casl2L polypeptide has a length of from 800 amino acids (aa) to 875 aa, e.g., from 800 aa to 825 aa, from 825 aa to 850 aa, or from 850 aa to 875 aa). In some cases, the Casl2L polypeptide has a length of 827 amino acids. 
In some cases, a guide RNA that binds a Casl2L polypeptide (e.g., a Cas12L polypeptide comprising an amino acid sequence having 20% or more, 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%, amino acid sequence identity to the Casl2L amino acid sequence depicted in FIG. 5G) includes the following nucleotide sequence: ACTGTTGGAGTACTTAATTTTATGGGTATTCACAAC (SEQ ID NO:91) or the reverse complement of same. In some cases, the guide RNA comprises the nucleotide sequence (N)nACTGTTGGAGTACTTAATTTTATGGGTATTCACAAC (SEQ ID NO:92) or the reverse complement of same, where N is any nucleotide and n is an integer from 15 to 30, e.g., from 15 to 20, from 17 to 25, from 17 to 22, from 18 to 22, from 18 to 20, from 20 to 25, or from 25 to 30).
[0128] FIG. 5H
[0129] In some cases, a Cas12L protein (of the subject compositions and/or methods) includes an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5H and designated “Cas12L_63.” For example, in some cases, a Cas12L protein includes an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5H. In some cases, a Cas12L protein includes an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5H. In some cases, a Cas12L protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5H. In some cases, a Cas12L protein includes an amino acid sequence having the Cas12L amino acid sequence depicted in FIG. 5H. In some cases, a Cas12L protein includes an amino acid sequence having the Cas12L protein sequence depicted in FIG. 5H, with the exception that the sequence includes an amino acid substitution (e.g., 1, 2, or 3 amino acid substitutions) that reduces the naturally occurring catalytic activity of the protein. In some cases, the Cas12L polypeptide has a length of from 700 amino acids (aa) to 750 aa (e.g., from 700 aa to 725 aa, from 725 aa to 735 aa, from 735 aa to 740 aa, or from 740 aa to 750 aa). In some cases, the Cas12L polypeptide has a length of 738 amino acids.
In some cases, a guide RNA that binds a Cas12L polypeptide (e.g., a Cas12L polypeptide comprising an amino acid sequence having 20% or more, 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% amino acid sequence identity to the Cas12L amino acid sequence depicted in FIG. 5H) includes the following nucleotide sequence: ACTGTTGGAGTACTTAATTTTATGGGTATTCACAAC (SEQ ID NO:91) or the reverse complement of same. In some cases, the guide RNA comprises the nucleotide sequence (N)nACTGTTGGAGTACTTAATTTTATGGGTATTCACAAC (SEQ ID NO:92) or the reverse complement of same, where N is any nucleotide and n is an integer from 15 to 30 (e.g., from 15 to 20, from 17 to 25, from 17 to 22, from 18 to 22, from 18 to 20, from 20 to 25, or from 25 to 30).
[0130] FIG. 5I
[0131] In some cases, a Cas12L protein (of the subject compositions and/or methods) includes an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5I and designated “Cas12L_64.” For example, in some cases, a Cas12L protein includes an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5I. In some cases, a Cas12L protein includes an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5I. In some cases, a Cas12L protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5I. In some cases, a Cas12L protein includes an amino acid sequence having the Cas12L amino acid sequence depicted in FIG. 5I. In some cases, a Cas12L protein includes an amino acid sequence having the Cas12L protein sequence depicted in FIG. 5I, with the exception that the sequence includes an amino acid substitution (e.g., 1, 2, or 3 amino acid substitutions) that reduces the naturally occurring catalytic activity of the protein. In some cases, the Cas12L polypeptide has a length of from 740 amino acids (aa) to 800 aa (e.g., from 740 aa to 750 aa, from 750 aa to 775 aa, or from 775 aa to 800 aa). In some cases, the Cas12L polypeptide has a length of 767 amino acids.
In some cases, a guide RNA that binds a Cas12L polypeptide (e.g., a Cas12L polypeptide comprising an amino acid sequence having 20% or more, 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% amino acid sequence identity to the Cas12L amino acid sequence depicted in FIG. 5I) includes the following nucleotide sequence: ACTGTTGGAGTACTTAATTTTATGGGTATTCACAAC (SEQ ID NO:91) or the reverse complement of same. In some cases, the guide RNA comprises the nucleotide sequence (N)nACTGTTGGAGTACTTAATTTTATGGGTATTCACAAC (SEQ ID NO:92) or the reverse complement of same, where N is any nucleotide and n is an integer from 15 to 30 (e.g., from 15 to 20, from 17 to 25, from 17 to 22, from 18 to 22, from 18 to 20, from 20 to 25, or from 25 to 30).
[0132] FIG. 5J
[0133] In some cases, a Cas12L protein (of the subject compositions and/or methods) includes an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5J and designated “Cas12L_65.” For example, in some cases, a Cas12L protein includes an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5J. In some cases, a Cas12L protein includes an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5J. In some cases, a Cas12L protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5J. In some cases, a Cas12L protein includes an amino acid sequence having the Cas12L amino acid sequence depicted in FIG. 5J. In some cases, a Cas12L protein includes an amino acid sequence having the Cas12L protein sequence depicted in FIG. 5J, with the exception that the sequence includes an amino acid substitution (e.g., 1, 2, or 3 amino acid substitutions) that reduces the naturally occurring catalytic activity of the protein. In some cases, the Cas12L polypeptide has a length of from 740 amino acids (aa) to 800 aa (e.g., from 740 aa to 750 aa, from 750 aa to 775 aa, or from 775 aa to 800 aa). In some cases, the Cas12L polypeptide has a length of 767 amino acids.
In some cases, a guide RNA that binds a Cas12L polypeptide (e.g., a Cas12L polypeptide comprising an amino acid sequence having 20% or more, 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% amino acid sequence identity to the Cas12L amino acid sequence depicted in FIG. 5J) includes the following nucleotide sequence: ACTGTTGGAGTACTTAATTTTATGGGTATTCACAAC (SEQ ID NO:91) or the reverse complement of same. In some cases, the guide RNA comprises the nucleotide sequence (N)nACTGTTGGAGTACTTAATTTTATGGGTATTCACAAC (SEQ ID NO:92) or the reverse complement of same, where N is any nucleotide and n is an integer from 15 to 30 (e.g., from 15 to 20, from 17 to 25, from 17 to 22, from 18 to 22, from 18 to 20, from 20 to 25, or from 25 to 30).
[0134] FIG. 5K
[0135] In some cases, a Cas12L protein (of the subject compositions and/or methods) comprises an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5K and designated “Cas12L_66.” For example, in some cases, a Cas12L protein comprises an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5K. In some cases, a Cas12L protein comprises an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5K. In some cases, a Cas12L protein comprises an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5K. In some cases, a Cas12L protein comprises an amino acid sequence having the Cas12L amino acid sequence depicted in FIG. 5K. In some cases, a Cas12L protein comprises an amino acid sequence having the Cas12L protein sequence depicted in FIG. 5K, with the exception that the sequence includes an amino acid substitution (e.g., 1, 2, or 3 amino acid substitutions) that reduces the naturally occurring catalytic activity of the protein. In some cases, the Cas12L polypeptide has a length of from 600 amino acids (aa) to 700 aa (e.g., from 600 aa to 625 aa, from 625 aa to 650 aa, from 650 aa to 675 aa, or from 675 aa to 700 aa). In some cases, the Cas12L polypeptide has a length of 638 amino acids.
In some cases, a guide RNA that binds a Cas12L polypeptide (e.g., a Cas12L polypeptide comprising an amino acid sequence having 20% or more, 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% amino acid sequence identity to the Cas12L amino acid sequence depicted in FIG. 5K) includes the following nucleotide sequence: CTTGTTGTACATACTCTTTTATAGGTATTAAACAAC (SEQ ID NO:93) or the reverse complement of same. In some cases, the guide RNA comprises the nucleotide sequence (N)nCTTGTTGTACATACTCTTTTATAGGTATTAAACAAC (SEQ ID NO:94) or the reverse complement of same, where N is any nucleotide and n is an integer from 15 to 30 (e.g., from 15 to 20, from 17 to 25, from 17 to 22, from 18 to 22, from 18 to 20, from 20 to 25, or from 25 to 30).
[0136] FIG. 5L
[0137] In some cases, a Cas12L protein (of the subject compositions and/or methods) includes a contiguous stretch of about 92 amino acids having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5L and designated “Cas12L_67.” For example, in some cases, a Cas12L protein includes a contiguous stretch of about 92 amino acids having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5L. In some cases, a Cas12L protein includes a contiguous stretch of about 92 amino acids having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5L. In some cases, a Cas12L protein includes a contiguous stretch of about 92 amino acids having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5L. In some cases, a Cas12L protein includes a contiguous stretch of about 92 amino acids having the Cas12L amino acid sequence depicted in FIG. 5L. In some cases, a Cas12L protein includes a contiguous stretch of about 92 amino acids having the Cas12L protein sequence depicted in FIG. 5L, with the exception that the sequence includes an amino acid substitution (e.g., 1, 2, or 3 amino acid substitutions) that reduces the naturally occurring catalytic activity of the protein. In some cases, the Cas12L polypeptide has a length of from 700 amino acids (aa) to 800 aa (e.g., from 700 aa to 725 aa, from 725 aa to 750 aa, from 750 aa to 775 aa, or from 775 aa to 800 aa). In some cases, the Cas12L polypeptide has a length of from 725 amino acids to 775 amino acids. In some cases, the Cas12L polypeptide has a length of 754 amino acids.
In some cases, a guide RNA that binds a Cas12L polypeptide (e.g., a Cas12L polypeptide comprising an amino acid sequence having 20% or more, 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% amino acid sequence identity to the Cas12L amino acid sequence depicted in FIG. 5L) includes the following nucleotide sequence: CTTGTTGTACATACTCTTTTATAGGTATTAAACAAC (SEQ ID NO:93) or the reverse complement of same. In some cases, the guide RNA comprises the nucleotide sequence (N)nCTTGTTGTACATACTCTTTTATAGGTATTAAACAAC (SEQ ID NO:94) or the reverse complement of same, where N is any nucleotide and n is an integer from 15 to 30 (e.g., from 15 to 20, from 17 to 25, from 17 to 22, from 18 to 22, from 18 to 20, from 20 to 25, or from 25 to 30).
[0138] FIG. 5M
[0139] In some cases, a Cas12L protein (of the subject compositions and/or methods) includes an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5M and designated “Cas12L_68.” For example, in some cases, a Cas12L protein includes an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5M. In some cases, a Cas12L protein includes an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5M. In some cases, a Cas12L protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with the Cas12L amino acid sequence depicted in FIG. 5M. In some cases, a Cas12L protein includes an amino acid sequence having the Cas12L amino acid sequence depicted in FIG. 5M. In some cases, a Cas12L protein includes an amino acid sequence having the Cas12L protein sequence depicted in FIG. 5M, with the exception that the sequence includes an amino acid substitution (e.g., 1, 2, or 3 amino acid substitutions) that reduces the naturally occurring catalytic activity of the protein. In some cases, the Cas12L polypeptide has a length of from 700 amino acids (aa) to 800 aa (e.g., from 700 aa to 725 aa, from 725 aa to 750 aa, from 750 aa to 775 aa, or from 775 aa to 800 aa). In some cases, the Cas12L polypeptide has a length of 746 amino acids.
In some cases, a guide RNA that binds a Cas12L polypeptide (e.g., a Cas12L polypeptide comprising an amino acid sequence having 20% or more, 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% amino acid sequence identity to the Cas12L amino acid sequence depicted in FIG. 5M) includes the following nucleotide sequence: ATTGTTGTAACTCTTATTTTGTATGGAGTAAACAAC (SEQ ID NO:85) or the reverse complement of same. In some cases, the guide RNA comprises the nucleotide sequence (N)nATTGTTGTAACTCTTATTTTGTATGGAGTAAACAAC (SEQ ID NO:86) or the reverse complement of same, where N is any nucleotide and n is an integer from 15 to 30 (e.g., from 15 to 20, from 17 to 25, from 17 to 22, from 18 to 22, from 18 to 20, from 20 to 25, or from 25 to 30).
Cas12L Variants
[0140] A variant Cas12L protein has an amino acid sequence that is different by at least one amino acid (e.g., has a deletion, insertion, substitution, fusion) when compared to the amino acid sequence of the corresponding wild type Cas12L protein, e.g., when compared to the Cas12L amino acid sequence depicted in any one of FIG. 5A-5M. In some cases, a Cas12L variant comprises from 1 amino acid substitution to 10 amino acid substitutions compared to the Cas12L amino acid sequence depicted in any one of FIG. 5A-5M. In some cases, a Cas12L variant comprises from 1 amino acid substitution to 10 amino acid substitutions in the RuvC domain, compared to the Cas12L amino acid sequence depicted in any one of FIG. 5A-5M.
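As an illustration of the sequence-comparison language used above, the following Python sketch computes percent identity and counts substitutions between two sequences that have already been aligned. It is a sketch under stated assumptions: the function names are hypothetical, gaps are written as "-", and identity is computed over all alignment columns that are not gap-only, which is only one of several conventions in use; the disclosure does not specify this particular calculation here.

```python
# Illustrative sketch only; names and the identity convention are assumptions.

def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns, with '-' marking a gap.

    Columns where both sequences have a gap are ignored. A column counts as
    a match only when both sequences carry the same residue.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def count_substitutions(aligned_a: str, aligned_b: str) -> int:
    """Count columns where both sequences have a residue and the residues differ."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for a, b in zip(aligned_a, aligned_b)
               if a != "-" and b != "-" and a != b)


if __name__ == "__main__":
    # Toy sequences, not taken from any figure.
    print(percent_identity("MKTA", "MKSA"))
    print(count_substitutions("MKTA", "MKSA"))
```

Under this convention, a variant with 1 to 10 substitutions relative to a reference of several hundred residues would remain well above the 95% identity level.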
Variants - catalytic activity
[0141] In some cases, the Cas12L protein is a variant Cas12L protein, e.g., mutated relative to the naturally occurring catalytically active sequence, and exhibits reduced cleavage activity (e.g., exhibits 90% or less, 80% or less, 70% or less, 60% or less, 50% or less, 40% or less, or 30% or less cleavage activity) when compared to the corresponding naturally occurring sequence. In some cases, such a variant Cas12L protein is a catalytically ‘dead’ protein (has substantially no cleavage activity) and can be referred to as a ‘dCas12L.’ In some cases, the variant Cas12L protein is a nickase (cleaves only one strand of a double stranded target nucleic acid, e.g., a double stranded target DNA). As described in more detail herein, in some cases, a Cas12L protein (in some cases a Cas12L protein with wild type cleavage activity and in some cases a variant Cas12L with reduced cleavage activity, e.g., a dCas12L or a nickase Cas12L) is fused (conjugated) to a heterologous polypeptide that has an activity of interest (e.g., a catalytic activity of interest) to form a fusion protein (a fusion Cas12L protein).
[0142] For example, in some cases, a variant Cas12L polypeptide comprises a substitution of one or more of D336, E523, and D676 based on the amino acid numbering of the amino acid sequence depicted in FIG. 5A, or corresponding amino acids of a Cas12L polypeptide depicted in any one of FIG. 5B-5M, where the variant Cas12L polypeptide exhibits reduced catalytic activity compared to a control Cas12L polypeptide that does not include the substitutions. For example, “corresponding amino acids” are shown in bold and underlining in FIG. 5B and FIG. 5C. For example, in some cases, a variant Cas12L polypeptide comprises a D336A substitution, i.e., D336, based on the amino acid numbering of the amino acid sequence depicted in FIG. 5A, or a corresponding amino acid of a Cas12L polypeptide depicted in any one of FIG. 5B-5M, is replaced with an Ala. As another example, in some cases, a variant Cas12L polypeptide comprises an E523A substitution, i.e., E523, based on the amino acid numbering of the amino acid sequence depicted in FIG. 5A, or a corresponding amino acid of a Cas12L polypeptide depicted in any one of FIG. 5B-5M, is replaced with an Ala. As another example, in some cases, a variant Cas12L polypeptide comprises a D676A substitution, i.e., D676, based on the amino acid numbering of the amino acid sequence depicted in FIG. 5A, or a corresponding amino acid of a Cas12L polypeptide depicted in any one of FIG. 5B-5M, is replaced with an Ala. As another example, in some cases, a variant Cas12L polypeptide comprises D336A, E523A, and D676A substitutions, i.e., each of D336, E523, and D676, based on the amino acid numbering of the amino acid sequence depicted in FIG. 5A, or corresponding amino acids of a Cas12L polypeptide depicted in any one of FIG. 5B-5M, is replaced with an Ala.
[0143] In some cases, a variant Cas12L polypeptide comprises a substitution of the Asn at position 102 (N102) of the CasL polypeptides depicted in FIG. 5A-5C, or corresponding positions in the CasL polypeptide of FIG. 5D-5M. Substitution of the N102 with another amino acid can modify the PAM requirement. For example, substitution of N102 with Q, S, E, T, or D could expand the PAM from R(-1) to C, T, or N. In some cases, a variant Cas12L polypeptide comprises a substitution of the Asn at position 102 (N102) of the CasL polypeptides depicted in FIG. 5A-5C, or corresponding positions in the CasL polypeptide of FIG. 5D-5M, with Gln. In some cases, a variant Cas12L polypeptide comprises a substitution of the Asn at position 102 (N102) of the CasL polypeptides depicted in FIG. 5A-5C, or corresponding positions in the CasL polypeptide of FIG. 5D-5M, with Ser. In some cases, a variant Cas12L polypeptide comprises a substitution of the Asn at position 102 (N102) of the CasL polypeptides depicted in FIG. 5A-5C, or corresponding positions in the CasL polypeptide of FIG. 5D-5M, with Glu. In some cases, a variant Cas12L polypeptide comprises a substitution of the Asn at position 102 (N102) of the CasL polypeptides depicted in FIG. 5A-5C, or corresponding positions in the CasL polypeptide of FIG. 5D-5M, with Asp.
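The substitution notation used above (e.g., D336A: the Asp at position 336 replaced with Ala) can be made concrete with a small Python sketch. It is illustrative only: the function names are hypothetical, positions are treated as 1-based, and the demonstration uses a short toy sequence rather than any sequence from FIG. 5A-5M, which are not reproduced here.

```python
# Illustrative sketch only; function names are hypothetical and the demo
# sequence is a toy string, not a Cas12L sequence.
import re

_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitution(sequence: str, substitution: str) -> str:
    """Apply one substitution written as <wild type><1-based position><new>.

    Raises ValueError if the notation is malformed, the position is out of
    range, or the residue at that position is not the stated wild type.
    """
    match = _SUBSTITUTION.match(substitution)
    if match is None:
        raise ValueError(f"malformed substitution: {substitution!r}")
    wild_type, position, replacement = (
        match.group(1), int(match.group(2)), match.group(3))
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, "
            f"found {sequence[position - 1]}")
    return sequence[:position - 1] + replacement + sequence[position:]


def apply_substitutions(sequence: str, substitutions: list[str]) -> str:
    """Apply several substitutions; positions always refer to the original numbering."""
    for substitution in substitutions:
        sequence = apply_substitution(sequence, substitution)
    return sequence


if __name__ == "__main__":
    toy = "MDNEKD"
    print(apply_substitutions(toy, ["D2A", "E4A", "D6A"]))
```

Because substitutions do not change sequence length, applying them one after another keeps the original numbering valid. Mapping a FIG. 5A position to the "corresponding" position in another polypeptide would additionally require an alignment, which this sketch does not attempt.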
Variants - fusion Cas12L polypeptides
[0144] As noted above, in some cases, a Cas12L protein (in some cases a Cas12L protein with wild type cleavage activity and in some cases a variant Cas12L with reduced cleavage activity, e.g., a dCas12L or a nickase Cas12L) is fused (conjugated) to one or more heterologous polypeptides. In some cases, the heterologous polypeptide has an activity of interest (e.g., a catalytic activity of interest) to form a fusion protein. A heterologous polypeptide to which a Cas12L protein can be fused is referred to herein as a “fusion partner.”
[0145] In some cases, the fusion partner can modulate transcription (e.g., inhibit transcription, increase transcription) of a target DNA. For example, in some cases the fusion partner is a protein (or a domain from a protein) that inhibits transcription (e.g., a transcriptional repressor, a protein that functions via recruitment of transcription inhibitor proteins, modification of target DNA such as methylation, recruitment of a DNA modifier, modulation of histones associated with target DNA, recruitment of a histone modifier such as those that modify acetylation and/or methylation of histones, and the like). In some cases, the fusion partner is a protein (or a domain from a protein) that increases transcription (e.g., a transcription activator, a protein that acts via recruitment of transcription activator proteins, modification of target DNA such as demethylation, recruitment of a DNA modifier, modulation of histones associated with target DNA, recruitment of a histone modifier such as those that modify acetylation and/or methylation of histones, and the like).
[0146] In some cases, the fusion partner (heterologous polypeptide) is a reverse transcriptase. In some cases, the fusion partner is a base editor. In some cases, the fusion partner (heterologous polypeptide) is a deaminase.
[0147] In some cases, a fusion Cas12L protein includes a heterologous polypeptide that has enzymatic activity that modifies a target nucleic acid (e.g., nuclease activity, methyltransferase activity, demethylase activity, DNA repair activity, DNA damage activity, deamination activity, dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer forming activity, integrase activity, transposase activity, recombinase activity, polymerase activity, ligase activity, helicase activity, photolyase activity or glycosylase activity).
[0148] In some cases, a fusion Cas12L protein includes a heterologous polypeptide that has enzymatic activity that modifies a polypeptide (e.g., a histone) associated with a target nucleic acid (e.g., methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity or demyristoylation activity).
[0149] Examples of proteins (or fragments thereof) that can be used to increase transcription include but are not limited to: transcriptional activators such as VP16, VP64, VP48, VP160, p65 subdomain (e.g., from NFkB), and activation domain of EDLL and/or TAL activation domain (e.g., for activity in plants); histone lysine methyltransferases such as SET1A, SET1B, MLL1 to 5, ASH1, SYMD2, NSD1, and the like; histone lysine demethylases such as JHDM2a/b, UTX, JMJD3, and the like; histone acetyltransferases such as GCN5, PCAF, CBP, p300, TAF1, TIP60/PLIP, MOZ/MYST3, MORF/MYST4, SRC1, ACTR, P160, CLOCK, and the like; and DNA demethylases such as Ten-Eleven Translocation (TET) dioxygenase 1 (TET1CD), TET1, DME, DML1, DML2, ROS1, and the like.
[0150] Examples of proteins (or fragments thereof) that can be used to decrease transcription include but are not limited to: transcriptional repressors such as the Krüppel associated box (KRAB or SKD); KOX1 repression domain; the Mad mSIN3 interaction domain (SID); the ERF repressor domain (ERD), the SRDX repression domain (e.g., for repression in plants), and the like; histone lysine methyltransferases such as Pr-SET7/8, SUV4-20H1, RIZ1, and the like; histone lysine demethylases such as JMJD2A/JHDM3A, JMJD2B, JMJD2C/GASC1, JMJD2D, JARID1A/RBP2, JARID1B/PLU-1, JARID1C/SMCX, JARID1D/SMCY, and the like; histone lysine deacetylases such as HDAC1, HDAC2, HDAC3, HDAC8, HDAC4, HDAC5, HDAC7, HDAC9, SIRT1, SIRT2, HDAC11, and the like; DNA methylases such as HhaI DNA m5c-methyltransferase (M.HhaI), DNA methyltransferase 1 (DNMT1), DNA methyltransferase 3a (DNMT3a), DNA methyltransferase 3b (DNMT3b), MET1, DRM3 (plants), ZMET2, CMT1, CMT2 (plants), and the like; and periphery recruitment elements such as Lamin A, Lamin B, and the like.
[0151] In some cases, the fusion partner has enzymatic activity that modifies the target nucleic acid (e.g., ssRNA, dsRNA, ssDNA, dsDNA). Examples of enzymatic activity that can be provided by the fusion partner include but are not limited to: nuclease activity such as that provided by a restriction enzyme (e.g., FokI nuclease), methyltransferase activity such as that provided by a methyltransferase (e.g., HhaI DNA m5c-methyltransferase (M.HhaI), DNA methyltransferase 1 (DNMT1), DNA methyltransferase 3a (DNMT3a), DNA methyltransferase 3b (DNMT3b), MET1, DRM3 (plants), ZMET2, CMT1, CMT2 (plants), and the like); demethylase activity such as that provided by a demethylase (e.g., Ten-Eleven Translocation (TET) dioxygenase 1 (TET1CD), TET1, DME, DML1, DML2, ROS1, and the like), DNA repair activity, DNA damage activity, deamination activity such as that provided by a deaminase (e.g., a cytosine deaminase enzyme such as rat APOBEC1), dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer forming activity, integrase activity such as that provided by an integrase and/or resolvase (e.g., Gin invertase such as the hyperactive mutant of the Gin invertase, GinH106Y; human immunodeficiency virus type 1 integrase (IN); Tn3 resolvase; and the like), transposase activity, recombinase activity such as that provided by a recombinase (e.g., catalytic domain of Gin recombinase), polymerase activity, ligase activity, helicase activity, photolyase activity, and glycosylase activity.
[0152] In some cases, the fusion partner has enzymatic activity that modifies a protein associated with the target nucleic acid (e.g., ssRNA, dsRNA, ssDNA, dsDNA) (e.g., a histone, an RNA binding protein, a DNA binding protein, and the like). Examples of enzymatic activity (that modifies a protein associated with a target nucleic acid) that can be provided by the fusion partner include but are not limited to: methyltransferase activity such as that provided by a histone methyltransferase (HMT) (e.g., suppressor of variegation 3-9 homolog 1 (SUV39H1, also known as KMT1A), euchromatic histone lysine methyltransferase 2 (G9A, also known as KMT1C and EHMT2), SUV39H2, ESET/SETDB1, and the like, SET1A, SET1B, MLL1 to 5, ASH1, SYMD2, NSD1, DOT1L, Pr-SET7/8, SUV4-20H1, EZH2, RIZ1), demethylase activity such as that provided by a histone demethylase (e.g., Lysine Demethylase 1A (KDM1A, also known as LSD1), JHDM2a/b, JMJD2A/JHDM3A, JMJD2B, JMJD2C/GASC1, JMJD2D, JARID1A/RBP2, JARID1B/PLU-1, JARID1C/SMCX, JARID1D/SMCY, UTX, JMJD3, and the like), acetyltransferase activity such as that provided by a histone acetyltransferase (e.g., catalytic core/fragment of the human acetyltransferase p300, GCN5, PCAF, CBP, TAF1, TIP60/PLIP, MOZ/MYST3, MORF/MYST4, HBO1/MYST2, HMOF/MYST1, SRC1, ACTR, P160, CLOCK, and the like), deacetylase activity such as that provided by a histone deacetylase (e.g., HDAC1, HDAC2, HDAC3, HDAC8, HDAC4, HDAC5, HDAC7, HDAC9, SIRT1, SIRT2, HDAC11, and the like), kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity, and demyristoylation activity.
[0153] Additional examples of suitable fusion partners are a dihydrofolate reductase (DHFR) destabilization domain (e.g., to generate a chemically controllable fusion Cas12L protein), and a chloroplast transit peptide. Suitable chloroplast transit peptides include, but are not limited to:
[0154] MASMISSSAVTTVSRASRGQSAAMAPFGGLKSMTGFPVRKVNTDITSITSNGGRVKCMQVWPPIGKKKFETLSYLPPLTRDSRA (SEQ ID NO:95);
MASMISSSAVTTVSRASRGQSAAMAPFGGLKSMTGFPVRKVNTDITSITSNGGRVKS (SEQ ID NO:96);
MASSMLSSATMVASPAQATMVAPFNGLKSSAAFPATRKANNDITSITSNGGRVNCMQVWPPIEKKKFETLSYLPDLTDSGGRVNC (SEQ ID NO:97);
MAQVSRICNGVQNPSLISNLSKSSQRKSPLSVSLKTQQHPRAYPISSSWGLKKSGMTLIGSELRPLKVMSSVSTAC (SEQ ID NO:98);
MAQVSRICNGVWNPSLISNLSKSSQRKSPLSVSLKTQQHPRAYPISSSWGLKKSGMTLIGSELRPLKVMSSVSTAC (SEQ ID NO:99);
MAQINNMAQGIQTLNPNSNFHKPQVPKSSSFLVFGSKKLKNSANSMLVLKKDSIFMQLFCSFRISASVATAC (SEQ ID NO:100);
MAALVTSQLATSGTVLSVTDRFRRPGFQGLRPRNPADAALGMRTVGASAAPKQSRKPHRFDRRCLSMVV (SEQ ID NO:101);
MAALTTSQLATSATGFGIADRSAPSSLLRHGFQGLKPRSPAGGDATSLSVTTSARATPKQQRSVQRGSRRFPSVVVC (SEQ ID NO:102);
MASSVLSSAAVATRSNVAQANMVAPFTGLKSAASFPVSRKQNLDITSIASNGGRVQC (SEQ ID NO:103);
MESLAATSVFAPSRVAVPAARALVRAGTVVPTRRTSSTSGTSGVKCSAAVTPQASPVISRSAAAA (SEQ ID NO:104); and
MGAAATSMQSLKFSNRLVPPSRRLSPVPNNVTCNNLPKSAAPVRTVKCCASSWNSTINGAAATTNGASAASS (SEQ ID NO:105).
[0155] In some case, a Casl2L fusion polypeptide of the present disclosure comprises: a) a Casl2L polypeptide of the present disclosure; and b) a chloroplast transit peptide. Thus, for example, a Casl2L polypeptide/guide RNA complex can be targeted to the chloroplast. In some cases, this targeting may be achieved by the presence of an N-terminal extension, called a chloroplast transit peptide (CTP) or plastid transit peptide. Chromosomal transgenes from bacterial sources must have a sequence encoding a CTP sequence fused to a sequence encoding an expressed polypeptide if the expressed polypeptide is to be compartmentalized in the plant plastid (e.g. chloroplast). Accordingly, localization of an exogenous polypeptide to a chloroplast is often 1 accomplished by means of operably linking a polynucleotide sequence encoding a CTP sequence to the 5' region of a polynucleotide encoding the exogenous polypeptide. The CTP is removed in a processing step during translocation into the plastid. Processing efficiency may, however, be affected by the amino acid sequence of the CTP and nearby sequences at the amino terminus (NH2 terminus) of the peptide. Other options for targeting to the chloroplast which have been described are the maize cab-m7 signal sequence (U.S. Pat. No. 7,022,896, WO 97/41228) a pea glutathione reductase signal sequence (WO 97/41228) and the CTP described in US2009029861.
[0156] In some cases, a Cas12L fusion polypeptide of the present disclosure can comprise: a) a Cas12L polypeptide of the present disclosure; and b) an endosomal escape peptide. In some cases, an endosomal escape polypeptide comprises the amino acid sequence GLFXALLXLLXSLWXLLLXA (SEQ ID NO: 106), wherein each X is independently selected from lysine, histidine, and arginine. In some cases, an endosomal escape polypeptide comprises the amino acid sequence GLFHALLHLLHSLWHLLLHA (SEQ ID NO: 107).
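By way of illustration only, the consensus of SEQ ID NO: 106 can be written as a pattern in which each X position accepts lysine (K), histidine (H), or arginine (R). The following Python sketch is not part of the disclosure; the names ENDOSOMAL_ESCAPE_PATTERN and matches_endosomal_escape_consensus are hypothetical and were chosen here for illustration.

```python
import re

# Consensus GLFXALLXLLXSLWXLLLXA (SEQ ID NO: 106); each X is K, H, or R.
ENDOSOMAL_ESCAPE_PATTERN = re.compile(
    r"GLF[KHR]ALL[KHR]LL[KHR]SLW[KHR]LLL[KHR]A"
)


def matches_endosomal_escape_consensus(peptide: str) -> bool:
    """Return True if the whole peptide fits the SEQ ID NO: 106 consensus."""
    return ENDOSOMAL_ESCAPE_PATTERN.fullmatch(peptide.upper()) is not None


# SEQ ID NO: 107 uses histidine at every X position.
print(matches_endosomal_escape_consensus("GLFHALLHLLHSLWHLLLHA"))
```

Because each X is selected independently, mixed peptides (for example, lysine at one X position and arginine at another) also satisfy the pattern, whereas a peptide with any other residue at an X position does not.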
[0157] For examples of some of the above fusion partners (and more) used in the context of fusions with Cas9, Zinc Finger, and/or TALE proteins (for site specific target nucleic acid modification, modulation of transcription, and/or target protein modification, e.g., histone modification), see, e.g.: Nomura et al., J Am Chem Soc. 2007 Jul 18;129(28):8676-7; Rivenbark et al., Epigenetics. 2012 Apr;7(4):350-60; Nucleic Acids Res. 2016 Jul 8;44(12):5615-28; Gilbert et al., Cell. 2013 Jul 18;154(2):442-51; Kearns et al., Nat Methods. 2015 May;12(5):401-3; Mendenhall et al., Nat Biotechnol. 2013 Dec;31(12):1133-6; Hilton et al., Nat Biotechnol. 2015 May;33(5):510-7; Gordley et al., Proc Natl Acad Sci U S A. 2009 Mar 31;106(13):5053-8; Akopian et al., Proc Natl Acad Sci U S A. 2003 Jul 22;100(15):8688-91; Tan et al., J Virol. 2006 Feb;80(4):1939-48; Tan et al., Proc Natl Acad Sci U S A. 2003 Oct 14;100(21):11997-2002; Papworth et al., Proc Natl Acad Sci U S A. 2003 Feb 18;100(4):1621-6; Sanjana et al., Nat Protoc. 2012 Jan 5;7(1):171-92; Beerli et al., Proc Natl Acad Sci U S A. 1998 Dec 8;95(25):14628-33;
Snowden et al., Curr Biol. 2002 Dec 23;12(24):2159-66; Xu et al., Cell Discov. 2016 May 3;2:16009; Komor et al., Nature. 2016 Apr 20;533(7603):420-4; Chaikind et al., Nucleic Acids Res. 2016 Aug 11; Choudhury et al., Oncotarget. 2016 Jun 23; Du et al., Cold Spring Harb Protoc. 2016 Jan 4; Pham et al., Methods Mol Biol. 2016;1358:43-57; Balboa et al., Stem Cell Reports. 2015 Sep 8;5(3):448-59; Hara et al., Sci Rep. 2015 Jun 9;5:11221; Piatek et al., Plant Biotechnol J. 2015 May;13(4):578-89; Hu et al., Nucleic Acids Res. 2014 Apr;42(7):4375-90; Cheng et al., Cell Res. 2013 Oct;23(10):1163-71; and Maeder et al., Nat Methods. 2013 Oct;10(10):977-9.
[0158] Additional suitable heterologous polypeptides include, but are not limited to, a polypeptide that directly and/or indirectly provides for increased transcription and/or translation of a target nucleic acid (e.g., a transcription activator or a fragment thereof, a protein or fragment thereof that recruits a transcription activator, a small molecule/drug-responsive transcription and/or translation regulator, a translation-regulating protein, etc.). Non-limiting examples of heterologous polypeptides to accomplish increased or decreased transcription include transcription activator and transcription repressor domains. In some such cases, a fusion Cas12L polypeptide is targeted by the guide nucleic acid (guide RNA) to a specific location (i.e., sequence) in the target nucleic acid and exerts locus-specific regulation such as blocking RNA polymerase binding to a promoter (which selectively inhibits transcription activator function), and/or modifying the local chromatin status (e.g., when a fusion sequence is used that modifies the target nucleic acid or modifies a polypeptide associated with the target nucleic acid). In some cases, the changes are transient (e.g., transcription repression or activation). In some cases, the changes are inheritable (e.g., when epigenetic modifications are made to the target nucleic acid or to proteins associated with the target nucleic acid, e.g., nucleosomal histones).
[0159] Non-limiting examples of heterologous polypeptides for use when targeting ssRNA target nucleic acids include (but are not limited to): splicing factors (e.g., RS domains); protein translation components (e.g., translation initiation, elongation, and/or release factors; e.g., eIF4G); RNA methylases; RNA editing enzymes (e.g., RNA deaminases, e.g., adenosine deaminase acting on RNA (ADAR), including A to I and/or C to U editing enzymes); helicases; RNA-binding proteins; and the like. It is understood that a heterologous polypeptide can include the entire protein or in some cases can include a fragment of the protein (e.g., a functional domain).
[0160] The heterologous polypeptide of a subject fusion Cas12L polypeptide can be any domain capable of interacting with ssRNA (which, for the purposes of this disclosure, includes intramolecular and/or intermolecular secondary structures, e.g., double-stranded RNA duplexes such as hairpins, stem-loops, etc.), whether transiently or irreversibly, directly or indirectly, including but not limited to an effector domain selected from the group comprising: Endonucleases (for example RNase III, the CRR22 DYW domain, Dicer, and PIN (PilT N-terminus) domains from proteins such as SMG5 and SMG6); proteins and protein domains responsible for stimulating RNA cleavage (for example CPSF, CstF, CFIm and CFIIm); Exonucleases (for example XRN-1 or Exonuclease T); Deadenylases (for example HNT3); proteins and protein domains responsible for nonsense mediated RNA decay (for example UPF1, UPF2, UPF3, UPF3b, RNPS1, Y14, DEK, REF2, and SRm160); proteins and protein domains responsible for stabilizing RNA (for example PABP); proteins and protein domains responsible for repressing translation (for example Ago2 and Ago4); proteins and protein domains responsible for stimulating translation (for example Staufen); proteins and protein domains responsible for (e.g., capable of) modulating translation (e.g., translation factors such as initiation factors, elongation factors, release factors, etc., e.g., eIF4G); proteins and protein domains responsible for polyadenylation of RNA (for example PAP1, GLD-2, and Star-PAP); proteins and protein domains responsible for polyuridinylation of RNA (for example CID1 and terminal uridylate transferase); proteins and protein domains responsible for RNA localization (for example from IMP1, ZBP1, She2p, She3p, and Bicaudal-D); proteins and protein domains responsible for nuclear retention of RNA (for example Rrp6); proteins and protein domains responsible for nuclear export of RNA (for example TAP, NXF1, THO, TREX, REF, and Aly); proteins and 
protein domains responsible for repression of RNA splicing (for example PTB, Sam68, and hnRNP A1); proteins and protein domains responsible for stimulation of RNA splicing (for example Serine/Arginine-rich (SR) domains); proteins and protein domains responsible for reducing the efficiency of transcription (for example FUS (TLS)); and proteins and protein domains responsible for stimulating transcription (for example CDK7 and HIV Tat). Alternatively, the effector domain may be selected from the group comprising Endonucleases; proteins and protein domains capable of stimulating RNA cleavage; Exonucleases; Deadenylases; proteins and protein domains having nonsense mediated RNA decay activity; proteins and protein domains capable of stabilizing RNA; proteins and protein domains capable of repressing translation; proteins and protein domains capable of stimulating translation; proteins and protein domains capable of modulating translation (e.g., translation factors such as initiation factors, elongation factors, release factors, etc., e.g., eIF4G); proteins and protein domains capable of polyadenylation of RNA; proteins and protein domains capable of polyuridinylation of RNA; proteins and protein domains having RNA localization activity; proteins and protein domains capable of nuclear retention of RNA; proteins and protein domains having RNA nuclear export activity; proteins and protein domains capable of repression of RNA splicing; proteins and protein domains capable of stimulation of RNA splicing; proteins and protein domains capable of reducing the efficiency of transcription; and proteins and protein domains capable of stimulating transcription. Another suitable heterologous polypeptide is a PUF RNA-binding domain, which is described in more detail in WO2012068627, which is hereby incorporated by reference in its entirety.
[0161] Some RNA splicing factors that can be used (in whole or as fragments thereof) as heterologous polypeptides for a fusion Cas12L polypeptide have modular organization, with separate sequence-specific RNA binding modules and splicing effector domains. For example, members of the Serine/Arginine-rich (SR) protein family contain N-terminal RNA recognition motifs (RRMs) that bind to exonic splicing enhancers (ESEs) in pre-mRNAs and C-terminal RS domains that promote exon inclusion. As another example, the hnRNP protein hnRNP A1 binds to exonic splicing silencers (ESSs) through its RRM domains and inhibits exon inclusion through a C-terminal Glycine-rich domain. Some splicing factors can regulate alternative use of splice sites (ss) by binding to regulatory sequences between the two alternative sites. For example, ASF/SF2 can recognize ESEs and promote the use of intron proximal sites, whereas hnRNP A1 can bind to ESSs and shift splicing towards the use of intron distal sites. One application for such factors is to generate ESFs that modulate alternative splicing of endogenous genes, particularly disease associated genes. For example, Bcl-x pre-mRNA produces two splicing isoforms with two alternative 5' splice sites to encode proteins of opposite functions. The long splicing isoform Bcl-xL is a potent apoptosis inhibitor expressed in long-lived postmitotic cells and is up-regulated in many cancer cells, protecting cells against apoptotic signals. The short isoform Bcl-xS is a pro-apoptotic isoform and is expressed at high levels in cells with a high turnover rate (e.g., developing lymphocytes). The ratio of the two Bcl-x splicing isoforms is regulated by multiple cis-elements that are located in either the core exon region or the exon extension region (i.e., between the two alternative 5' splice sites). For more examples, see WO2010075303, which is hereby incorporated by reference in its entirety.
[0162] Further suitable fusion partners include, but are not limited to, proteins (or fragments thereof) that are boundary elements (e.g., CTCF), proteins and fragments thereof that provide periphery recruitment (e.g., Lamin A, Lamin B, etc.), and protein docking elements (e.g., FKBP/FRB, Pil1/Aby1, etc.).
Nucleases
[0163] In some cases, a subject fusion Cas12L polypeptide comprises: i) a Cas12L polypeptide of the present disclosure; and ii) one or more heterologous polypeptides (one or more “fusion partners”), where at least one of the one or more heterologous polypeptides is a nuclease. Suitable nucleases include, but are not limited to, a homing nuclease polypeptide; a FokI polypeptide; a transcription activator-like effector nuclease (TALEN) polypeptide; a MegaTAL polypeptide; a meganuclease polypeptide; a zinc finger nuclease (ZFN); an ARCUS nuclease; and the like. The meganuclease can be engineered from a LAGLIDADG homing endonuclease (LHE). A megaTAL polypeptide can comprise a TALE DNA binding domain and an engineered meganuclease. See, e.g., WO 2004/067736 (homing endonuclease); Urnov et al. (2005) Nature 435:646 (ZFN); Mussolino et al. (2011) Nucl. Acids Res. 39:9283 (TALE nuclease); Boissel et al. (2013) Nucl. Acids Res. 42:2591 (MegaTAL).
Reverse transcriptases
[0164] In some cases, a subject fusion Cas12L polypeptide comprises: i) a Cas12L polypeptide of the present disclosure; and ii) one or more heterologous polypeptides, where at least one of the one or more heterologous polypeptides is a reverse transcriptase polypeptide. In some cases, the Cas12L polypeptide is catalytically inactive. Suitable reverse transcriptases include, e.g., a murine leukemia virus reverse transcriptase; a Rous sarcoma virus reverse transcriptase; a human immunodeficiency virus type I reverse transcriptase; a Moloney murine leukemia virus reverse transcriptase; and the like.
Base editors
[0165] In some cases, a Cas12L fusion polypeptide of the present disclosure comprises: i) a Cas12L polypeptide of the present disclosure; and ii) one or more heterologous polypeptides, where at least one of the one or more heterologous polypeptides is a base editor. Suitable base editors include, e.g., an adenosine deaminase; a cytidine deaminase (e.g., an activation-induced cytidine deaminase (AID); APOBEC3G; and the like); and the like.
[0166] A suitable adenosine deaminase is any enzyme that is capable of deaminating adenosine in DNA. In some cases, the deaminase is a TadA deaminase.
[0167] In some cases, a suitable adenosine deaminase comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following amino acid sequence: MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMA LRQGGLVMQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHP GMNHRVEITEGILADECAALLSDFFRMRRQEIKAQKKAQSSTD (SEQ ID NO: 108)
[0168] In some cases, a suitable adenosine deaminase comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following amino acid sequence: MRRAFITGVFFLSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRH DPTAHAEIMALRQGGLVMQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAG SLMDVLHHPGMNHRVEITEGILADECAALLSDFFRMRRQEIKAQKKAQSSTD (SEQ ID NO: 109).
[0169] In some cases, a suitable adenosine deaminase comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following Staphylococcus aureus TadA amino acid sequence: MGSHMTNDIYFMTLAIEEAKKAAQLGEVPIGAIITKDDEVIARAHNLRETLQQPTAHAEHIAIERA AKVLGSWRLEGCTLYVTLEPCVMCAGTIVMSRIPRVVYGADDPKGGCSGSLMNLLQQSNFNHR AIVDKGVLKEACSTLLTTFFKNLRANKKSTN (SEQ ID NO: 110)
[0170] In some cases, a suitable adenosine deaminase comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following Bacillus subtilis TadA amino acid sequence:
MTQDELYMKEAIKEAKKAEEKGEVPIGAVLVINGEIIARAHNLRETEQRSIAHAEMLVIDEACKA LGTWRLEGATLYVTLEPCPMCAGAVVLSRVEKVVFGAFDPKGGCSGTLMNLLQEERFNHQAEV VSGVLEEECGGMLSAFFRELRKKKKAARKNLSE (SEQ ID NO: 111)
[0171] In some cases, a suitable adenosine deaminase comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following Salmonella typhimurium TadA:
MPPAFITGVTSLSDVELDHEYWMRHALTLAKRAWDEREVPVGAVLVHNHRVIGEGWNRPIGRH DPTAHAEIMALRQGGLVLQNYRLLDTTLYVTLEPCVMCAGAMVHSRIGRVVFGARDAKTGAA GSLIDVLHHPGMNHRVEIIEGVLRDECATLLSDFFRMRRQEIKALKKADRAEGAGPAV (SEQ ID NO:112)
[0172] In some cases, a suitable adenosine deaminase comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following Shewanella putrefaciens TadA amino acid sequence:
MDEYWMQVAMQMAEKAEAAGEVPVGAVLVKDGQQIATGYNLSISQHDPTAHAEILCLRSAGK KLENYRLLDATLYITLEPCAMCAGAMVHSRIARVVYGARDEKTGAAGTVVNLLQHPAFNHQVE VTSGVLAEACSAQLSRFFKRRRDEKKALKLAQRAQQGIE (SEQ ID NO: 113)
[0173] In some cases, a suitable adenosine deaminase comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following Haemophilus influenzae F3031 TadA amino acid sequence:
MDAAKVRSEFDEKMMRYALELADKAEALGEIPVGAVLVDDARNIIGEGWNLSIVQSDPTAHAEI
IALRNGAKNIQNYRLLNSTLYVTLEPCTMCAGAILHSRIKRLVFGASDYKTGAIGSRFHFFDDYK
MNHTLEITSGVLAEECSQKLS TFFQKRREEKKIEKALLKSLSDK (SEQ ID NO: 114)
[0174] In some cases, a suitable adenosine deaminase comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following Caulobacter crescentus TadA amino acid sequence:
MRTDESEDQDHRMMRLALDAARAAAEAGETPVGAVILDPSTGEVIATAGNGPIAAHDPTAHAEI AAMRAAAAKLGNYRLTDLTLVVTLEPCAMCAGAISHARIGRVVFGADDPKGGAVVHGPKFFA QPTCHWRPEVTGGVLADESADLLRGFFRARRKAKI (SEQ ID NO: 115)
[0175] In some cases, a suitable adenosine deaminase comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following Geobacter sulfurreducens TadA amino acid sequence:
MSSLKKTPIRDDAYWMGKAIREAAKAAARDEVPIGAVIVRDGAVIGRGHNLREGSNDPSAHAE MIAIRQAARRSANWRLTGATLYVTLEPCLMCMGAIILARLERVVFGCYDPKGGAAGSLYDLSAD PRLNHQVRLSPGVCQEECGTMLSDFFRDLRRRKKAKATPALFIDERKVPPEP (SEQ ID NO: 116)
[0176] Cytidine deaminases suitable for inclusion in a Cas12L fusion polypeptide include any enzyme that is capable of deaminating cytidine in DNA.
[0177] In some cases, the cytidine deaminase is a deaminase from the apolipoprotein B mRNA-editing complex (APOBEC) family of deaminases. In some cases, the APOBEC family deaminase is selected from the group consisting of APOBEC1 deaminase, APOBEC2 deaminase, APOBEC3A deaminase, APOBEC3B deaminase, APOBEC3C deaminase, APOBEC3D deaminase, APOBEC3F deaminase, APOBEC3G deaminase, and APOBEC3H deaminase. In some cases, the cytidine deaminase is an activation induced deaminase (AID).
[0178] In some cases, a suitable cytidine deaminase comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following amino acid sequence: MDSLLMNRRKFLYQFKNVRWAKGRRETYLCYVVKRRDSATSFSLDFGYLRNKNGCHVELLFLR YISDWDLDPGRCYRVTWFTSWSPCYDCARHVADFLRGNPNLSLRIFTARLYFCEDRKAEPEGLR RLHRAGVQIAIMTFKDYFYCWNTFVENHERTFKAWEGLHENSVRLSRQLRRILLPLYEVDDLRD AFRTLGL (SEQ ID NO: 117)
[0179] In some cases, a suitable cytidine deaminase is an AID and comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following amino acid sequence: MDSLLMNRRKFLYQFKNVRWAKGRRETYLCYVVKRRDSATSFSLDFGYLRNKNGCHVELLFLR YISDWDLDPGRCYRVTWFTSWSPCYDCARHVADFLRGNPNLSLRIFTARLYFCEDRKAEPEGLR RLHRAGVQIAIMTFKENHERTFKAWEGLHENSVRLSRQLRRILLPLYEVDDLRD AFRTLGL (SEQ ID NO: 118).
[0180] In some cases, a suitable cytidine deaminase is an AID and comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following amino acid sequence: MDSLLMNRRKFLYQFKNVRWAKGRRETYLCYVVKRRDSATSFSLDFGYLRNKNGCHVELLFLR YISDWDLDPGRCYRVTWFTSWSPCYDCARHVADFLRGNPNLSLRIFTARLYFCEDRKAEPEGLR RLHRAGVQIAIMTFKDYFYCWNTFVENHERTFKAWEGLHENSVRLSRQLRRILLPLYEVDDLRD AFRTLGL (SEQ ID NO: 117).
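The recurring "at least 80% ... amino acid sequence identity" thresholds above can be illustrated with a short calculation. The Python sketch below is a simplified illustration, not a method stated in this section: it assumes the two sequences have already been aligned by some external tool (with "-" marking gaps) and defines identity as identical columns divided by total alignment columns, which is only one of several conventions in use. The function names percent_identity and meets_identity_threshold are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over a pre-computed pairwise alignment.

    Assumption: both inputs come from the same alignment, so they have equal
    length, and '-' marks a gap. Columns that are gaps in both are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b) if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    # A column counts as identical only if both residues match and are not gaps.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity_threshold(
    aligned_a: str, aligned_b: str, threshold: float = 80.0
) -> bool:
    """True if identity is at least the threshold (e.g., 80, 85, 90, 95, 98, 99)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy example: one substitution in ten aligned positions.
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
```

In this toy example a single substitution in ten aligned positions gives 90% identity, which satisfies the 80%, 85%, and 90% thresholds but not the 95% threshold.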
Transcription factors
[0181] In some cases, a Cas12L fusion polypeptide of the present disclosure comprises: i) a Cas12L polypeptide of the present disclosure; and ii) one or more heterologous polypeptides, where at least one of the one or more heterologous polypeptides is a transcription factor. A transcription factor can include: i) a DNA binding domain; and ii) a transcription activator. A transcription factor can include: i) a DNA binding domain; and ii) a transcription repressor. Suitable transcription factors include polypeptides that include a transcription activator or a transcription repressor domain (e.g., the Krüppel-associated box (KRAB or SKD); the Mad mSIN3 interaction domain (SID); the ERF repressor domain (ERD), etc.); zinc-finger-based artificial transcription factors (see, e.g., Sera (2009) Adv. Drug Deliv. 61:513); TALE-based artificial transcription factors (see, e.g., Liu et al. (2013) Nat. Rev. Genetics 14:781); and the like. In some cases, the transcription factor comprises a VP64 polypeptide (transcriptional activation). In some cases, the transcription factor comprises a Krüppel-associated box (KRAB) polypeptide (transcriptional repression). In some cases, the transcription factor comprises a Mad mSIN3 interaction domain (SID) polypeptide (transcriptional repression). In some cases, the transcription factor comprises an ERF repressor domain (ERD) polypeptide (transcriptional repression). For example, in some cases, the transcription factor is a transcriptional activator, where the transcriptional activator is GAL4-VP16.
Recombinases
[0182] In some cases, a Cas12L fusion polypeptide of the present disclosure comprises: i) a Cas12L polypeptide of the present disclosure; and ii) one or more heterologous polypeptides, where at least one of the one or more heterologous polypeptides is a recombinase. Suitable recombinases include, e.g., a Cre recombinase; a Hin recombinase; a Tre recombinase; a FLP recombinase; and the like.
[0183] Examples of various additional suitable heterologous polypeptides (or fragments thereof) for a subject fusion Cas12L polypeptide include, but are not limited to, those described in the following applications (which publications are related to other CRISPR endonucleases such as Cas9, but the described fusion partners can also be used with Cas12L instead): PCT patent applications: WO2010075303, WO2012068627, and WO2013155555, and can be found, for example, in U.S. patents and patent applications: 8,906,616; 8,895,308; 8,889,418; 8,889,356; 8,871,445; 8,865,406; 8,795,965; 8,771,945; 8,697,359; 20140068797; 20140170753; 20140179006; 20140179770; 20140186843; 20140186919; 20140186958; 20140189896; 20140227787; 20140234972; 20140242664; 20140242699; 20140242700; 20140242702; 20140248702; 20140256046; 20140273037; 20140273226; 20140273230; 20140273231; 20140273232; 20140273233; 20140273234; 20140273235; 20140287938; 20140295556; 20140295557; 20140298547; 20140304853; 20140309487; 20140310828; 20140310830; 20140315985; 20140335063; 20140335620; 20140342456; 20140342457; 20140342458; 20140349400; 20140349405; 20140356867; 20140356956; 20140356958; 20140356959; 20140357523; 20140357530; 20140364333; and 20140377868; all of which are hereby incorporated by reference in their entirety.
[0184] In some cases, a heterologous polypeptide (a fusion partner) provides for subcellular localization, i.e., the heterologous polypeptide contains a subcellular localization sequence (e.g., a nuclear localization signal (NLS) for targeting to the nucleus, a sequence to keep the fusion protein out of the nucleus, e.g., a nuclear export sequence (NES), a sequence to keep the fusion protein retained in the cytoplasm, a mitochondrial localization signal for targeting to the mitochondria, a chloroplast localization signal for targeting to a chloroplast, an ER retention signal, and the like). In some embodiments, a Cas12L fusion polypeptide does not include an NLS so that the protein is not targeted to the nucleus (which can be advantageous, e.g., when the target nucleic acid is an RNA that is present in the cytosol). In some embodiments, the heterologous polypeptide can provide a tag (i.e., the heterologous polypeptide is a detectable label) for ease of tracking and/or purification (e.g., a fluorescent protein, e.g., green fluorescent protein (GFP), yellow fluorescent protein (YFP), red fluorescent protein (RFP), cyan fluorescent protein (CFP), mCherry, tdTomato, and the like; a histidine tag, e.g., a 6XHis tag; a hemagglutinin (HA) tag; a FLAG tag; a Myc tag; and the like).
[0185] In some cases, a Cas12L protein (e.g., a wild type Cas12L protein, a variant Cas12L protein, a fusion Cas12L protein, a dCas12L protein, and the like) includes (is fused to) a nuclear localization signal (NLS) (e.g., in some cases 2 or more, 3 or more, 4 or more, or 5 or more NLSs). Thus, in some cases, a Cas12L polypeptide includes one or more NLSs (e.g., 2 or more, 3 or more, 4 or more, or 5 or more NLSs). In some cases, one or more NLSs (2 or more, 3 or more, 4 or more, or 5 or more NLSs) are positioned at or near (e.g., within 50 amino acids of) the N-terminus and/or the C-terminus. In some cases, one or more NLSs (2 or more, 3 or more, 4 or more, or 5 or more NLSs) are positioned at or near (e.g., within 50 amino acids of) the N-terminus. In some cases, one or more NLSs (2 or more, 3 or more, 4 or more, or 5 or more NLSs) are positioned at or near (e.g., within 50 amino acids of) the C-terminus. In some cases, one or more NLSs (3 or more, 4 or more, or 5 or more NLSs) are positioned at or near (e.g., within 50 amino acids of) both the N-terminus and the C-terminus. In some cases, an NLS is positioned at the N-terminus and an NLS is positioned at the C-terminus.
[0186] In some cases, a Cas12L protein (e.g., a wild type Cas12L protein, a variant Cas12L protein, a fusion Cas12L protein, a dCas12L protein, and the like) includes (is fused to) between 1 and 10 NLSs (e.g., 1-9, 1-8, 1-7, 1-6, 1-5, 2-10, 2-9, 2-8, 2-7, 2-6, or 2-5 NLSs). In some cases, a Cas12L protein (e.g., a wild type Cas12L protein, a variant Cas12L protein, a fusion Cas12L protein, a dCas12L protein, and the like) includes (is fused to) between 2 and 5 NLSs (e.g., 2-4, or 2-3 NLSs).
[0187] Non-limiting examples of NLSs include an NLS sequence derived from: the NLS of the SV40 virus large T-antigen, having the amino acid sequence PKKKRKV (SEQ ID NO: 119); the NLS from nucleoplasmin (e.g., the nucleoplasmin bipartite NLS with the sequence KRPAATKKAGQAKKKK (SEQ ID NO: 120)); the c-myc NLS having the amino acid sequence PAAKRVKLD (SEQ ID NO: 121) or RQRRNELKRSP (SEQ ID NO: 122); the hRNPA1 M9 NLS having the sequence NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO: 123); the sequence RMRIZFKNKGKDTAELRRRRVEVSVELRKAKKDEQILKRRNV (SEQ ID NO: 124) of the IBB domain from importin-alpha; the sequences VSRKRPRP (SEQ ID NO: 125) and PPKKARED (SEQ ID NO: 126) of the myoma T protein; the sequence PQPKKKPL (SEQ ID NO: 127) of human p53; the sequence SALIKKKKKMAP (SEQ ID NO: 128) of mouse c-abl IV; the sequences DRLRR (SEQ ID NO: 129) and PKQKKRK (SEQ ID NO: 130) of the influenza virus NS1; the sequence RKLKKKIKKL (SEQ ID NO: 131) of the Hepatitis virus delta antigen; the sequence REKKKFLKRR (SEQ ID NO: 132) of the mouse Mx1 protein; the sequence KRKGDEVDGVDEVAKKKSKK (SEQ ID NO: 133) of the human poly(ADP-ribose) polymerase; and the sequence RKCLQAGMNLEARKTKK (SEQ ID NO: 134) of the steroid hormone receptors (human) glucocorticoid. In general, NLS (or multiple NLSs) are of sufficient strength to drive accumulation of the Cas12L protein in a detectable amount in the nucleus of a eukaryotic cell. Detection of accumulation in the nucleus may be performed by any suitable technique. For example, a detectable marker may be fused to the Cas12L protein such that location within a cell may be visualized. Cell nuclei may also be isolated from cells, the contents of which may then be analyzed by any suitable process for detecting protein, such as immunohistochemistry, Western blot, or enzyme activity assay. Accumulation in the nucleus may also be determined indirectly.
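As an illustration of the "at or near (e.g., within 50 amino acids of)" language used for NLS placement, the following Python sketch locates an NLS within a polypeptide sequence and reports whether each occurrence lies near a terminus. It is a sketch under stated assumptions: the disclosure does not define how the distance is measured, so the code treats an NLS as near the N-terminus if it begins within 50 residues of the first residue and near the C-terminus if it ends within 50 residues of the last residue. The function name nls_positions and the placeholder polypeptide (a run of alanines) are hypothetical.

```python
def nls_positions(protein: str, nls: str, window: int = 50) -> list:
    """Find every occurrence of an NLS and flag proximity to each terminus.

    Assumption: 'near' means the NLS starts within `window` residues of the
    N-terminus or ends within `window` residues of the C-terminus.
    """
    hits = []
    start = protein.find(nls)
    while start != -1:
        end = start + len(nls)  # exclusive end index of this occurrence
        hits.append(
            {
                "start": start,
                "near_n_terminus": start <= window,
                "near_c_terminus": len(protein) - end <= window,
            }
        )
        start = protein.find(nls, start + 1)
    return hits


SV40_NLS = "PKKKRKV"  # SEQ ID NO: 119
# Placeholder polypeptide: 200 alanines stand in for a real effector sequence.
fusion = SV40_NLS + "A" * 200 + SV40_NLS

for hit in nls_positions(fusion, SV40_NLS):
    print(hit)
```

With this placeholder, one NLS is reported at the N-terminus and one at the C-terminus, corresponding to the arrangement in which an NLS is positioned at each end of the polypeptide.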
[0188] In some cases, a Cas12L fusion polypeptide includes a "Protein Transduction Domain" or PTD (also known as a CPP - cell penetrating peptide), which refers to a polypeptide, polynucleotide, carbohydrate, or organic or inorganic compound that facilitates traversing a lipid bilayer, micelle, cell membrane, organelle membrane, or vesicle membrane. A PTD attached to another molecule, which can range from a small polar molecule to a large macromolecule and/or a nanoparticle, facilitates the molecule traversing a membrane, for example going from extracellular space to intracellular space, or cytosol to within an organelle. In some embodiments, a PTD is covalently linked to the amino terminus of a polypeptide (e.g., linked to a wild type Cas12L polypeptide to generate a fusion protein, or linked to a variant Cas12L protein such as a dCas12L, nickase Cas12L, or fusion Cas12L protein, to generate a fusion protein). In some cases, a PTD is covalently linked to the carboxyl terminus of a polypeptide (e.g., linked to a wild type Cas12L to generate a fusion protein, or linked to a variant Cas12L protein such as a dCas12L, nickase Cas12L, or fusion Cas12L protein to generate a fusion protein). In some cases, the PTD is inserted internally in the Cas12L fusion polypeptide (i.e., is not at the N- or C-terminus of the Cas12L fusion polypeptide) at a suitable insertion site. In some cases, a subject Cas12L fusion polypeptide includes (is conjugated to, is fused to) one or more PTDs (e.g., two or more, three or more, four or more PTDs). In some cases, a PTD includes a nuclear localization signal (NLS) (e.g., in some cases 2 or more, 3 or more, 4 or more, or 5 or more NLSs). Thus, in some cases, a Cas12L fusion polypeptide includes one or more NLSs (e.g., 2 or more, 3 or more, 4 or more, or 5 or more NLSs). 
In some embodiments, a PTD is covalently linked to a nucleic acid (e.g., a Cas12L guide nucleic acid, a polynucleotide encoding a Cas12L guide nucleic acid, a polynucleotide encoding a Cas12L fusion polypeptide, a donor polynucleotide, etc.). Examples of PTDs include but are not limited to a minimal undecapeptide protein transduction domain (corresponding to residues 47-57 of HIV-1 TAT comprising YGRKKRRQRRR; SEQ ID NO: 135); a polyarginine sequence comprising a number of arginines sufficient to direct entry into a cell (e.g., 3, 4, 5, 6, 7, 8, 9, 10, or 10-50 arginines); a VP22 domain (Zender et al. (2002) Cancer Gene Ther. 9(6):489-96); a Drosophila Antennapedia protein transduction domain (Noguchi et al. (2003) Diabetes 52(7):1732-1737); a truncated human calcitonin peptide (Trehin et al. (2004) Pharm. Research 21:1248-1256); polylysine (Wender et al. (2000) Proc. Natl. Acad. Sci. USA 97:13003-13008);
RRQRRTSKLMKR (SEQ ID NO: 136); Transportan GWTLNSAGYLLGKINLKALAALAKKIL (SEQ ID NO: 137); KALAWEAKLAKALAKALAKHLAKALAKALKCEA (SEQ ID NO: 138); and RQIKIWFQNRRMKWKK (SEQ ID NO: 139). Exemplary PTDs include, but are not limited to, YGRKKRRQRRR (SEQ ID NO: 135); RKKRRQRRR (SEQ ID NO: 140); and an arginine homopolymer of from 3 arginine residues to 50 arginine residues. Exemplary PTD domain amino acid sequences include, but are not limited to, any of the following: YGRKKRRQRRR (SEQ ID NO: 135); RKKRRQRR (SEQ ID NO: 141); YARAAARQARA (SEQ ID NO: 142); THRLPRRRRRR (SEQ ID NO: 143); and GGRRARRRRRR (SEQ ID NO: 144). In some cases, the PTD is an activatable CPP (ACPP) (Aguilera et al. (2009) Integr Biol (Camb) June;1(5-6):371-381). ACPPs comprise a polycationic CPP (e.g., Arg9 or “R9”) connected via a cleavable linker to a matching polyanion (e.g., Glu9 or “E9”), which reduces the net charge to nearly zero and thereby inhibits adhesion and uptake into cells. Upon cleavage of the linker, the polyanion is released, locally unmasking the polyarginine and its inherent adhesiveness, thus “activating” the ACPP to traverse the membrane.
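The charge-masking principle of an ACPP described above can be shown with simple arithmetic. The Python sketch below is illustrative only and makes loud simplifying assumptions: it counts each lysine or arginine as +1 and each aspartate or glutamate as -1, and it ignores histidine, the termini, and any pH dependence. The linker sequence PLGLAG is used purely as a placeholder for a cleavable linker and is not taken from this disclosure; the function name net_charge is hypothetical.

```python
def net_charge(peptide: str) -> int:
    """Crude net charge: +1 per K/R, -1 per D/E.

    Assumption: histidine, terminal groups, and pH effects are ignored.
    """
    positive = sum(peptide.count(residue) for residue in "KR")
    negative = sum(peptide.count(residue) for residue in "DE")
    return positive - negative


POLYCATION = "R" * 9   # Arg9, "R9"
POLYANION = "E" * 9    # Glu9, "E9"
LINKER = "PLGLAG"      # placeholder cleavable linker (assumption, uncharged)

intact_acpp = POLYANION + LINKER + POLYCATION

print(net_charge(intact_acpp))  # masked: charges cancel
print(net_charge(POLYCATION))   # unmasked after linker cleavage
```

Under these assumptions the intact construct has a net charge of zero, while the polyarginine released after linker cleavage carries a net charge of +9, consistent with the described unmasking of its adhesiveness.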
Linkers (e.g., for fusion partners)
[0189] In some embodiments, a subject Cas12L protein can be fused to a fusion partner via a linker polypeptide (e.g., one or more linker polypeptides). The linker polypeptide may have any of a variety of amino acid sequences. Proteins can be joined by a spacer peptide, generally of a flexible nature, although other chemical linkages are not excluded. Suitable linkers include polypeptides of between 4 amino acids and 40 amino acids in length, or between 4 amino acids and 25 amino acids in length. These linkers can be produced by using synthetic, linker-encoding oligonucleotides to couple the proteins, or can be encoded by a nucleic acid sequence encoding the fusion protein. Peptide linkers with a degree of flexibility can be used. The linking peptides may have virtually any amino acid sequence, bearing in mind that the preferred linkers will have a sequence that results in a generally flexible peptide. The use of small amino acids, such as glycine and alanine, is of use in creating a flexible peptide. The creation of such sequences is routine to those of skill in the art. A variety of different linkers are commercially available and are considered suitable for use.
[0190] Examples of linker polypeptides include glycine polymers (G)n, glycine-serine polymers (including, for example, (GS)n (SEQ ID NO: 145), and (GGGGS)n (SEQ ID NO: 146), where n is an integer from 1 to 10), glycine-alanine polymers, alanine-serine polymers. Exemplary linkers can comprise amino acid sequences including, but not limited to, GGSG (SEQ ID NO: 147), GGSGG (SEQ ID NO: 148), GSGSG (SEQ ID NO: 149), GSGGG (SEQ ID NO: 150), GGGSG (SEQ ID NO: 151), GSSSG (SEQ ID NO: 152), and the like. The ordinarily skilled artisan will recognize that design of a peptide conjugated to any desired element can include linkers that are all or partially flexible, such that the linker can include a flexible linker as well as one or more portions that confer less flexible structure.
[0191] A variety of shorter or longer linker regions are known in the art, for example corresponding to a series of glycine residues, a series of adjacent glycine-serine dipeptides, a series of adjacent glycine-glycine-serine tripeptides, or known linkers from other proteins. A flexible linker may include, for example, the amino acid sequence: SSGPPPGTG (SEQ ID NO: 153) and variants thereof. A rigid linker may include, for example, the amino acid sequence: AEAAAKEAAAKA (SEQ ID NO: 154) and variants thereof. The XTEN linker, SGSETPGTSESATPES (SEQ ID NO: 155) and variants thereof, described in Guilinger et al., 2014 (Nature Biotechnology 32, 577-582), may also be used.
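As an illustration of the repeat-based linkers described above, the following Python sketch builds a (GS)n or (GGGGS)n linker and joins two polypeptide sequences through it. The function names and the example protein fragments are hypothetical and are not part of the disclosure; the length check simply mirrors the 4 to 40 amino acid range mentioned above.

```python
def repeat_linker(unit: str, n: int) -> str:
    """Return a linker made of `unit` repeated n times, e.g. (GGGGS)n."""
    if not 1 <= n <= 10:
        raise ValueError("n must be an integer from 1 to 10")
    return unit * n


def fuse(n_terminal: str, linker: str, c_terminal: str) -> str:
    """Join two polypeptide sequences through a linker polypeptide.

    Assumption for this sketch: only linkers of 4 to 40 amino acids
    are accepted, mirroring the range given in the text.
    """
    if not 4 <= len(linker) <= 40:
        raise ValueError("linker must be 4 to 40 amino acids long")
    return n_terminal + linker + c_terminal


if __name__ == "__main__":
    linker = repeat_linker("GGGGS", 3)
    # "MKT" and "AAA" are placeholder fragments, not real protein sequences.
    print(fuse("MKT", linker, "AAA"))
```

A rigid or XTEN linker could be passed to `fuse` in the same way, since the function treats the linker as an opaque amino acid string.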
Tags, Reporters, and Other Features
[0192] A Cas12L polypeptide may contain one or more tags that allow for e.g. purification and/or detection of the recombinant polypeptide. Various tags may be used herein and are well-known to those of skill in the art. Exemplary tags may include hemagglutinin (HA), glutathione-S-transferase (GST), FLAG, maltose-binding protein (MBP), etc., and multiple copies of one or more tags may be present in a Cas12L polypeptide.
[0193] A Cas12L polypeptide may contain one or more reporters that allow for e.g. visualization and/or detection of the Cas12L polypeptide. A reporter polypeptide encodes a protein that may be readily detectable due to its biochemical characteristics such as, for example, enzymatic activity or chemifluorescent features. Reporter polypeptides may be detected in a number of ways depending on the characteristics of the particular reporter. For example, a reporter polypeptide may be detected by its ability to generate a detectable signal (e.g. fluorescence), by its ability to form a detectable product, etc. Various reporters may be used herein and are well-known to those of skill in the art. Exemplary reporters may include a green fluorescent protein (GFP), a yellow fluorescent protein (YFP), a cyan fluorescent protein, GUS, mCherry, luciferase, etc., and multiple copies of one or more tags may be present in a recombinant polypeptide.
[0194] A Cas12L polypeptide may contain one or more polypeptide domains that serve a particular purpose depending on the particular goal/need. For example, a Cas12L polypeptide may contain a GB1 polypeptide. A Cas12L polypeptide may contain translocation sequences that target the polypeptide to a particular cellular compartment or area. Suitable features will be readily apparent to those of skill in the art.
Protospacer Adjacent Motif (PAM)
[0195] A Cas12L protein binds to target DNA at a target sequence defined by the region of complementarity between the DNA-targeting RNA and the target DNA. As is the case for many CRISPR endonucleases, site-specific binding (and/or cleavage) of a double stranded target DNA occurs at locations determined by both (i) base-pairing complementarity between the guide RNA and the target DNA; and (ii) a short motif [referred to as the protospacer adjacent motif (PAM)] in the target DNA.
[0196] In some cases, the PAM for a Cas12L protein is immediately 5’ of the target sequence of the non-complementary strand of the target DNA (the complementary strand: (i) hybridizes to the guide sequence of the guide RNA, while the non-complementary strand does not directly hybridize with the guide RNA; and (ii) is the reverse complement of the non-complementary strand).
[0197] In some cases, different Cas12L proteins (i.e., Cas12L proteins from various species) may be advantageous to use in the various provided methods in order to capitalize on various enzymatic characteristics of the different Cas12L proteins (e.g., for different PAM sequence preferences; for increased or decreased enzymatic activity; for an increased or decreased level of cellular toxicity; to change the balance between NHEJ, homology-directed repair, single strand breaks, double strand breaks, etc.; to take advantage of a short total sequence; and the like). Cas12L proteins from different species may require different PAM sequences in the target DNA. Various methods (including in silico and/or wet lab methods) for identification of the appropriate PAM sequence are known in the art and are routine, and any convenient method can be used.
[0198] A Cas12L polypeptide of the present disclosure can be reprogrammed (by complexing with a guide RNA) to cleave any sequence of a target nucleic acid (e.g., a target DNA) that is complementary to the targeting segment of the guide RNA, where the PAM is present on the 5’ end of the target (e.g., a T-rich PAM for CasXl); additional RNA components are not required for the formation of functional effectors in vivo. In some cases, a PAM sequence is a T-rich sequence (e.g., TTR, where R is a purine). In some cases, a PAM sequence is TTA. In some cases, a PAM sequence is TTG.
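The PAM geometry described above can be sketched in Python as follows: the code scans the non-complementary strand for a TTR motif (TTA or TTG) and reports the target sequence lying immediately 3’ of each motif, i.e. with the PAM immediately 5’ of the target. The function names, the default target length of 20 nt, and the example DNA are assumptions made for illustration only; the sketch scans a single strand and does not reflect any validated PAM-identification method.

```python
import re

# Lookahead so that overlapping TTR motifs (R = A or G) are all reported.
PAM_PATTERN = re.compile(r"(?=(TT[AG]))")


def find_target_sites(non_complementary_strand: str, target_len: int = 20):
    """Return (pam_start, pam, target) tuples for one DNA strand (5'->3').

    The target sequence is the `target_len` nucleotides immediately 3' of
    the PAM on the non-complementary strand.
    """
    seq = non_complementary_strand.upper()
    sites = []
    for match in PAM_PATTERN.finditer(seq):
        pam_start = match.start()
        target = seq[pam_start + 3 : pam_start + 3 + target_len]
        if len(target) == target_len:  # skip PAMs too close to the 3' end
            sites.append((pam_start, match.group(1), target))
    return sites


def guide_sequence_for(target: str) -> str:
    """Guide sequence (RNA) that hybridizes to the complementary strand.

    Because the complementary strand is the reverse complement of the
    non-complementary strand, the guide has the same sequence as the
    non-complementary-strand target, with T replaced by U.
    """
    return target.upper().replace("T", "U")


if __name__ == "__main__":
    dna = "CCTTAGACCGGCATCGATCGGCATCGA"  # made-up example sequence
    for pam_start, pam, target in find_target_sites(dna):
        print(pam_start, pam, target, guide_sequence_for(target))
```

To cover both strands of a double stranded target, the same scan would be repeated on the reverse complement of the input.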
Cas12L Guide RNA
[0199] A nucleic acid that binds to a Cas12L protein, forming a ribonucleoprotein complex (RNP), and targets the complex to a specific location within a target nucleic acid (e.g., a target DNA) is referred to herein as a “Cas12L guide RNA” or simply as a “guide RNA.” It is to be understood that in some cases, a hybrid DNA/RNA can be made such that a Cas12L guide RNA includes DNA bases in addition to RNA bases, but the term “Cas12L guide RNA” is still used to encompass such a molecule herein.
[0200] A Cas12L guide RNA can be said to include two segments, a targeting segment and a protein-binding segment. The protein-binding segment is also referred to herein as the “constant region” of the guide RNA. The targeting segment of a Cas12L guide RNA includes a nucleotide sequence (a guide sequence) that is complementary to (and therefore hybridizes with) a specific sequence (a target site) within a target nucleic acid (e.g., a target dsDNA, a target ssRNA, a target ssDNA, the complementary strand of a double stranded target DNA, etc.). The protein-binding segment (or “protein-binding sequence”) interacts with (binds to) a Cas12L polypeptide. The protein-binding segment of a subject Cas12L guide RNA can include two complementary stretches of nucleotides that hybridize to one another to form a double stranded RNA duplex (dsRNA duplex). Site-specific binding and/or cleavage of a target nucleic acid (e.g., genomic DNA, dsDNA, RNA, etc.) can occur at locations (e.g., target sequence of a target locus) determined by base-pairing complementarity between the Cas12L guide RNA (the guide sequence of the Cas12L guide RNA) and the target nucleic acid.
[0201] A Cas12L guide RNA and a Cas12L protein (e.g., a wild-type Cas12L protein; a variant Cas12L protein; a fusion Cas12L polypeptide; etc.) form a complex (e.g., bind via non-covalent interactions). The Cas12L guide RNA provides target specificity to the complex by including a targeting segment, which includes a guide sequence (a nucleotide sequence that is complementary to a sequence of a target nucleic acid). The Cas12L protein of the complex provides the site-specific activity (e.g., cleavage activity provided by the Cas12L protein and/or an activity provided by the fusion partner in the case of a fusion Cas12L protein). In other words, the Cas12L protein is guided to a target nucleic acid sequence (e.g. a target sequence) by virtue of its association with the Cas12L guide RNA.
[0202] The “guide sequence” also referred to as the “targeting sequence” of a Cas12L guide RNA can be modified so that the Cas12L guide RNA can target a Cas12L protein (e.g., a naturally occurring Cas12L protein, a fusion Cas12L polypeptide, and the like) to any desired sequence of any desired target nucleic acid, with the exception (e.g., as described herein) that the PAM sequence can be taken into account. Thus, for example, a Cas12L guide RNA can have a guide sequence with complementarity to (e.g., can hybridize to) a sequence in a nucleic acid in a eukaryotic cell, e.g., a viral nucleic acid, a eukaryotic nucleic acid (e.g., a eukaryotic chromosome, chromosomal sequence, a eukaryotic RNA, etc.), and the like.
Guide sequence of a Cas12L guide RNA
[0203] A subject Cas12L guide RNA includes a guide sequence (i.e., a targeting sequence), which is a nucleotide sequence that is complementary to a sequence (a target site) in a target nucleic acid. In other words, the guide sequence of a Cas12L guide RNA can interact with a target nucleic acid (e.g., double stranded DNA (dsDNA), single stranded DNA (ssDNA), single stranded RNA (ssRNA), or double stranded RNA (dsRNA)) in a sequence-specific manner via hybridization (i.e., base pairing). The guide sequence of a Cas12L guide RNA can be modified (e.g., by genetic engineering)/designed to hybridize to any desired target sequence (e.g., while taking the PAM into account, e.g., when targeting a dsDNA target) within a target nucleic acid (e.g., a eukaryotic target nucleic acid such as genomic DNA).
[0204] In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 60% or more (e.g., 65% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%). In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 80% or more (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%). In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 90% or more (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100%). In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 100%.
[0205] In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 100% over the seven contiguous 3’-most nucleotides of the target site of the target nucleic acid.
[0206] In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 60% or more (e.g., 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 17 or more (e.g., 18 or more, 19 or more, 20 or more, 21 or more, 22 or more) contiguous nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 80% or more (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 17 or more (e.g., 18 or more, 19 or more, 20 or more, 21 or more, 22 or more) contiguous nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 90% or more (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 17 or more (e.g., 18 or more, 19 or more, 20 or more, 21 or more, 22 or more) contiguous nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 100% over 17 or more (e.g., 18 or more, 19 or more, 20 or more, 21 or more, 22 or more) contiguous nucleotides.
[0207] In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 60% or more (e.g., 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 19 or more (e.g., 20 or more, 21 or more, 22 or more) contiguous nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 80% or more (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 19 or more (e.g., 20 or more, 21 or more, 22 or more) contiguous nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 90% or more (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 19 or more (e.g., 20 or more, 21 or more, 22 or more) contiguous nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 100% over 19 or more (e.g., 20 or more, 21 or more, 22 or more) contiguous nucleotides.
[0208] In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 60% or more (e.g., 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 17-25 contiguous nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 80% or more (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 17-25 contiguous nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 90% or more (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 17-25 contiguous nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 100% over 17-25 contiguous nucleotides.
[0209] In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 60% or more (e.g., 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 19-25 contiguous nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 80% or more (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 19-25 contiguous nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 90% or more (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 19-25 contiguous nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 100% over 19-25 contiguous nucleotides.
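The percent complementarity figures recited above can be computed with a short Python sketch such as the one below. It assumes Watson-Crick pairing only, an RNA guide and a DNA target site of equal length (both written 5’ to 3’), and antiparallel hybridization; under that reading, the seven 3’-most nucleotides of the target site pair with the seven 5’-most nucleotides of the guide. The function names and example sequences are hypothetical.

```python
# RNA guide base -> DNA base it pairs with (Watson-Crick only; an assumption,
# wobble pairs such as G-U are not counted in this sketch).
PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}


def percent_complementarity(guide_rna: str, target_dna: str) -> float:
    """Percent of guide positions that pair with the target site.

    Both sequences are given 5'->3'; the target is reversed so that
    guide position i is compared with its antiparallel partner.
    """
    guide = guide_rna.upper()
    target = target_dna.upper()
    if not guide or len(guide) != len(target):
        raise ValueError("guide and target site must be non-empty and equal in length")
    paired = sum(PAIR.get(g) == t for g, t in zip(guide, reversed(target)))
    return 100.0 * paired / len(guide)


def three_prime_target_fully_paired(guide_rna: str, target_dna: str, n: int = 7) -> bool:
    """True if the n 3'-most target nucleotides are 100% complementary."""
    return percent_complementarity(guide_rna[:n], target_dna[-n:]) == 100.0


if __name__ == "__main__":
    guide = "AUGCAUGCAU"          # made-up 10-nt guide
    target = "CTGCATGCAT"         # one mismatch at the 5' end of the target
    print(percent_complementarity(guide, target))
    print(three_prime_target_fully_paired(guide, target))
```

Restricting the comparison to a contiguous window (e.g., 17-25 nt) would only require slicing both sequences before calling `percent_complementarity`.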
[0210] In some cases, the guide sequence has a length in a range of from 17-30 nucleotides (nt) (e.g., from 17-25, 17-22, 17-20, 19-30, 19-25, 19-22, 19-20, 20-30, 20-25, or 20-22 nt). In some cases, the guide sequence has a length in a range of from 17-25 nucleotides (nt) (e.g., from 17-22, 17-20, 19-25, 19- 22, 19-20, 20-25, or 20-22 nt). In some cases, the guide sequence has a length of 17 or more nt (e.g., 18 or more, 19 or more, 20 or more, 21 or more, or 22 or more nt; 19 nt, 20 nt, 21 nt, 22 nt, 23 nt, 24 nt, 25 nt, etc.). In some cases, the guide sequence has a length of 19 or more nt (e.g., 20 or more, 21 or more, or 22 or more nt; 19 nt, 20 nt, 21 nt, 22 nt, 23 nt, 24 nt, 25 nt, etc.). In some cases, the guide sequence has a length of 17 nt. In some cases, the guide sequence has a length of 18 nt. In some cases, the guide sequence has a length of 19 nt. In some cases, the guide sequence has a length of 20 nt. In some cases, the guide sequence has a length of 21 nt. In some cases, the guide sequence has a length of 22 nt. In some cases, the guide sequence has a length of 23 nt.
[0211] In some cases, the guide sequence (also referred to as a “spacer sequence”) has a length of from 15 to 50 nucleotides (e.g., from 15 nucleotides (nt) to 20 nt, from 20 nt to 25 nt, from 25 nt to 30 nt, from 30 nt to 35 nt, from 35 nt to 40 nt, from 40 nt to 45 nt, or from 45 nt to 50 nt).
Protein-binding segment of a Cas12L guide RNA
[0212] The protein-binding segment (the “constant region”) of a subject Cas12L guide RNA interacts with a Cas12L protein. The Cas12L guide RNA guides the bound Cas12L protein to a specific nucleotide sequence within target nucleic acid via the above-mentioned guide sequence. The protein-binding segment of a Cas12L guide RNA can include two stretches of nucleotides that are complementary to one another and hybridize to form a double stranded RNA duplex (dsRNA duplex). Thus, in some cases, the protein-binding segment includes a dsRNA duplex.
[0213] In some cases, the dsRNA duplex region includes a range of from 5-25 base pairs (bp) (e.g., from 5-22, 5-20, 5-18, 5-15, 5-12, 5-10, 5-8, 8-25, 8-22, 8-18, 8-15, 8-12, 12-25, 12-22, 12-18, 12-15, 13-25, 13-22, 13-18, 13-15, 14-25, 14-22, 14-18, 14-15, 15-25, 15-22, 15-18, 17-25, 17-22, or 17-18 bp, e.g., 5 bp, 6 bp, 7 bp, 8 bp, 9 bp, 10 bp, etc.). In some cases, the dsRNA duplex region includes a range of from 6-15 base pairs (bp) (e.g., from 6-12, 6-10, or 6-8 bp, e.g., 6 bp, 7 bp, 8 bp, 9 bp, 10 bp, etc.). In some cases, the duplex region includes 5 or more bp (e.g., 6 or more, 7 or more, or 8 or more bp). In some cases, the duplex region includes 6 or more bp (e.g., 7 or more, or 8 or more bp). In some cases, not all nucleotides of the duplex region are paired, and therefore the duplex forming region can include a bulge. The term “bulge” herein is used to mean a stretch of nucleotides (which can be one nucleotide) that do not contribute to a double stranded duplex, but which are surrounded 5’ and 3’ by nucleotides that do contribute, and as such a bulge is considered part of the duplex region. In some cases, the dsRNA includes 1 or more bulges (e.g., 2 or more, 3 or more, 4 or more bulges). In some cases, the dsRNA duplex includes 2 or more bulges (e.g., 3 or more, 4 or more bulges). In some cases, the dsRNA duplex includes 1-5 bulges (e.g., 1-4, 1-3, 2-5, 2-4, or 2-3 bulges).
[0214] Thus, in some cases, the stretches of nucleotides that hybridize to one another to form the dsRNA duplex have 70%-100% complementarity (e.g., 75%-100%, 80%-100%, 85%-100%, 90%-100%, 95%-100% complementarity) with one another. In some cases, the stretches of nucleotides that hybridize to one another to form the dsRNA duplex have 85%-100% complementarity (e.g., 90%-100%, 95%-100% complementarity) with one another. In some cases, the stretches of nucleotides that hybridize to one another to form the dsRNA duplex have 70%-95% complementarity (e.g., 75%-95%, 80%-95%, 85%-95%, 90%-95% complementarity) with one another.
[0215] In other words, in some embodiments, the dsRNA duplex includes two stretches of nucleotides that have 70%-100% complementarity (e.g., 75%-100%, 80%-100%, 85%-100%, 90%-100%, 95%-100% complementarity) with one another. In some cases, the dsRNA duplex includes two stretches of nucleotides that have 85%-100% complementarity (e.g., 90%-100%, 95%-100% complementarity) with one another. In some cases, the dsRNA duplex includes two stretches of nucleotides that have 70%-95% complementarity (e.g., 75%-95%, 80%-95%, 85%-95%, 90%-95% complementarity) with one another.
[0216] The duplex region of a subject Cas12L guide RNA can include one or more (1, 2, 3, 4, 5, etc.) mutations relative to a naturally occurring duplex region. For example, in some cases a base pair can be maintained while the nucleotides contributing to the base pair from each segment can be different. In some cases, the duplex region of a subject Cas12L guide RNA includes more paired bases, fewer paired bases, a smaller bulge, a larger bulge, fewer bulges, more bulges, or any convenient combination thereof, as compared to a naturally occurring duplex region (of a naturally occurring Cas12L guide RNA).
[0217] Examples of various Cas9 guide RNAs can be found in the art, and in some cases variations similar to those introduced into Cas9 guide RNAs can also be introduced into Cas12L guide RNAs of the present disclosure (e.g., mutations to the dsRNA duplex region, extension of the 5’ or 3’ end for added stability or to provide for interaction with another protein, and the like). For example, see Jinek et al., Science. 2012 Aug 17;337(6096):816-21; Chylinski et al., RNA Biol. 2013 May;10(5):726-37; Ma et al., Biomed Res Int. 2013;2013:270805; Hou et al., Proc Natl Acad Sci U S A. 2013 Sep 24;110(39):15644-9; Jinek et al., Elife. 2013;2:e00471; Pattanayak et al., Nat Biotechnol. 2013 Sep;31(9):839-43; Qi et al., Cell. 2013 Feb 28;152(5):1173-83; Wang et al., Cell. 2013 May 9;153(4):910-8; Auer et al., Genome Res. 2013 Oct 31; Chen et al., Nucleic Acids Res. 2013 Nov 1;41(20):e19; Cheng et al., Cell Res. 2013 Oct;23(10):1163-71; Cho et al., Genetics. 2013 Nov;195(3):1177-80; DiCarlo et al., Nucleic Acids Res. 2013 Apr;41(7):4336-43; Dickinson et al., Nat Methods. 2013 Oct;10(10):1028-34; Ebina et al., Sci Rep. 2013;3:2510; Fujii et al., Nucleic Acids Res. 2013 Nov 1;41(20):e187; Hu et al., Cell Res. 2013 Nov;23(11):1322-5; Jiang et al., Nucleic Acids Res. 2013 Nov 1;41(20):e188; Larson et al., Nat Protoc. 2013 Nov;8(11):2180-96; Mali et al., Nat Methods. 2013 Oct;10(10):957-63; Nakayama et al., Genesis. 2013 Dec;51(12):835-43; Ran et al., Nat Protoc. 2013 Nov;8(11):2281-308; Ran et al., Cell. 2013 Sep 12;154(6):1380-9; Upadhyay et al., G3 (Bethesda). 2013 Dec 9;3(12):2233-8; Walsh et al., Proc Natl Acad Sci U S A. 2013 Sep 24;110(39):15514-5; Xie et al., Mol Plant. 2013 Oct 9; Yang et al., Cell. 2013 Sep 12;154(6):1370-9; Briner et al., Mol Cell. 2014 Oct 23;56(2):333-9; and U.S. patents and patent applications: 8,906,616; 8,895,308; 8,889,418; 8,889,356; 8,871,445; 8,865,406; 8,795,965; 8,771,945; 8,697,359; 20140068797; 20140170753; 20140179006; 20140179770; 20140186843; 20140186919; 20140186958; 20140189896; 20140227787; 20140234972; 20140242664; 20140242699; 20140242700; 20140242702; 20140248702; 20140256046; 20140273037; 20140273226; 20140273230; 20140273231; 20140273232; 20140273233; 20140273234; 20140273235; 20140287938; 20140295556; 20140295557; 20140298547; 20140304853; 20140309487; 20140310828; 20140310830; 20140315985; 20140335063; 20140335620; 20140342456; 20140342457; 20140342458; 20140349400; 20140349405; 20140356867; 20140356956; 20140356958; 20140356959; 20140357523; 20140357530; 20140364333; and 20140377868; all of which are hereby incorporated by reference in their entirety.
[0218] Examples of constant regions suitable for inclusion in a Cas12L guide RNA are provided in FIG. 5A-5M (e.g., where T is substituted with U). A Cas12L guide RNA can include a constant region having from 1 to 5 nucleotide substitutions compared to any one of the nucleotide sequences depicted in FIG. 5A-5M.
[0219] The nucleotide sequences (with T substituted with U) can be combined with a spacer sequence (where the spacer sequence comprises a target nucleic acid-binding sequence (“guide sequence”)) of choice that is from 15 to 50 nucleotides (e.g., from 15 nucleotides (nt) to 20 nt, from 20 nt to 25 nt, from 25 nt to 30 nt, from 30 nt to 35 nt, from 35 nt to 40 nt, from 40 nt to 45 nt, or from 45 nt to 50 nt in length). In some cases, the spacer sequence is 35-38 nucleotides in length. For example, any one of the nucleotide sequences (with T substituted with U) depicted in FIG. 5A-5M can be included in a guide RNA comprising (N)n-constant region, where N is any nucleotide and n is an integer from 15 to 50 (e.g., from 15 to 20, from 20 to 25, from 25 to 30, from 30 to 35, from 35 to 38, from 35 to 40, from 40 to 45, or from 45 to 50).
[0220] As one example, the constant region of a Cas12L guide RNA can comprise the nucleotide sequence: AUUGUUGUAACUCUUAUUUUGUAUGGAGUAAACAAC (SEQ ID NO:74). As another example, the constant region of a Cas12L guide RNA can comprise the nucleotide sequence: AUUGUUGUAGACCUCUUUUUAUAAGGAUUGAACAAC (SEQ ID NO:76). As another example, the constant region of a Cas12L guide RNA can comprise the nucleotide sequence: UAUUGUUGUAGAUACCUUUUUGUAAGGAUUAAACAAC (SEQ ID NO:79). As another example, the constant region of a Cas12L guide RNA can comprise the nucleotide sequence: AAUGUUGUAGAUGCCUUUUUAUAAGGAUUAAACAACUUG (SEQ ID NO: 156). As another example, the constant region of a Cas12L guide RNA can comprise the nucleotide sequence: AUUGUUGAAAUAGUACUUUUAUAGUCUAUAUACAAC (SEQ ID NO:70). As another example, the constant region of a Cas12L guide RNA can comprise the nucleotide sequence: UAUUGUUGUAACUCUUAUUUUGUAUGGAGUAAACAAC (SEQ ID NO:72). As another example, the constant region of a Cas12L guide RNA can comprise the nucleotide sequence: AUUGUUGUAACUUUUAUUUUGUAUGGAGUAAACAAC (SEQ ID NO:75). As another example, the constant region of a Cas12L guide RNA can comprise the nucleotide sequence: AAUGUUGUAGAUACCUUUUUGUAAGGAUUGAACAAC (SEQ ID NO:78). As another example, the constant region of a Cas12L guide RNA can comprise the nucleotide sequence: AUUGUUGUAAUACUAUUUUUGUAAAGUAUAAACAAC (SEQ ID NO:81). As another example, the constant region of a Cas12L guide RNA can comprise the nucleotide sequence: AAUGUUGUAGAUGCCUUUUUAUAAGGAUUAAACAAC (SEQ ID NO:77). As another example, the constant region of a Cas12L guide RNA can comprise the nucleotide sequence: AUUGUUGUAAUACACUUUUUAUAAGGUAUGAACAAC (SEQ ID NO:82). As another example, the constant region of a Cas12L guide RNA can comprise the nucleotide sequence: AUUGUUGUAACAUCUAUUUUGUAAGGUGUAAACAAC (SEQ ID NO:71).
[0221] The reverse complement of any one of the nucleotide sequences depicted in FIG. 5A-5M (but with T substituted with U) can be included in a guide RNA comprising constant region-(N)n, where N is any nucleotide and n is an integer from 15 to 50 (e.g., from 15 to 20, from 20 to 25, from 25 to 30, from 30 to 35, from 35 to 38, from 35 to 40, from 40 to 45, or from 45 to 50). As one example, a guide RNA can have the following nucleotide sequence: NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAUUGUUGUAACUCUUAUUUUGUAUGGAGUAAACAAC (SEQ ID NO: 157) or in some cases the reverse complement, where N is any nucleotide, e.g., where the stretch of Ns includes a target nucleic acid-binding sequence.
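The (N)n-constant region and constant region-(N)n arrangements described above can be illustrated with a minimal Python sketch. It uses the constant region of SEQ ID NO:74 as recited in the text; the function names, the `spacer_first` switch, and the example spacer are assumptions made for illustration and do not represent a validated guide design procedure.

```python
# Constant region of SEQ ID NO:74, as recited in the text (RNA, 5'->3').
CONSTANT_REGION_74 = "AUUGUUGUAACUCUUAUUUUGUAUGGAGUAAACAAC"

RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(sequence: str) -> str:
    """Substitute T with U, as done for the sequences depicted in the figures."""
    return sequence.upper().replace("T", "U")


def reverse_complement_rna(sequence: str) -> str:
    """Reverse complement of an RNA (or T-containing) sequence, returned as RNA."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(to_rna(sequence)))


def build_guide(constant_region: str, spacer: str, spacer_first: bool = True) -> str:
    """Assemble a guide as (N)n-constant region, or constant region-(N)n."""
    spacer = to_rna(spacer)
    constant_region = to_rna(constant_region)
    if not 15 <= len(spacer) <= 50:
        raise ValueError("spacer must be 15 to 50 nucleotides long")
    if set(spacer) - set("ACGU"):
        raise ValueError("spacer may only contain A, C, G, U (or T)")
    if spacer_first:
        return spacer + constant_region
    return constant_region + spacer


if __name__ == "__main__":
    spacer = "GACCGGCATCGATCGGCATC"  # made-up 20-nt target-binding sequence
    print(build_guide(CONSTANT_REGION_74, spacer))
    # Reverse-complement orientation: constant region first, then the spacer.
    print(build_guide(reverse_complement_rna(CONSTANT_REGION_74), spacer,
                      spacer_first=False))
```

Keeping the orientation as an explicit argument makes it harder to concatenate a reverse-complemented constant region on the wrong side of the spacer.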
Cas12L guide polynucleotides
[0222] In some cases, a nucleic acid that binds to a Cas12L protein, forming a nucleic acid/Cas12L polypeptide complex, and that targets the complex to a specific location within a target nucleic acid (e.g., a target DNA) comprises ribonucleotides only, deoxyribonucleotides only, or a mixture of ribonucleotides and deoxyribonucleotides. In some cases, a guide polynucleotide comprises ribonucleotides only, and is referred to herein as a “guide RNA.” In some cases, a guide polynucleotide comprises deoxyribonucleotides only, and is referred to herein as a “guide DNA.” In some cases, a guide polynucleotide comprises both ribonucleotides and deoxyribonucleotides. A guide polynucleotide can comprise combinations of ribonucleotide bases, deoxyribonucleotide bases, nucleotide analogs, modified nucleotides, and the like; and may further include naturally-occurring backbone residues and/or linkages and/or non-naturally-occurring backbone residues and/or linkages.
Recombinant Nucleic Acids
[0223] Certain aspects of the present disclosure relate to recombinant nucleic acids. In some embodiments, recombinant nucleic acids encode recombinant polypeptides of the present disclosure.
[0224] As used herein, the terms “polynucleotide,” “nucleic acid,” and variations thereof shall be generic to polydeoxyribonucleotides (containing 2-deoxy-D-ribose), to polyribonucleotides (containing D-ribose), to any other type of polynucleotide that is an N-glycoside of a purine or pyrimidine base, and to other polymers containing non-nucleotidic backbones, provided that the polymers contain nucleobases in a configuration that allows for base pairing and base stacking, as found in DNA and RNA. Thus, these terms include known types of nucleic acid sequence modifications, for example, substitution of one or more of the naturally occurring nucleotides with an analog, and inter-nucleotide modifications. As used herein, the symbols for nucleotides and polynucleotides are those recommended by the IUPAC-IUB Commission of Biochemical Nomenclature.
[0225] “Recombinant nucleic acid” or “heterologous nucleic acid” or “recombinant polynucleotide” as used herein refers to a polymer of nucleic acids wherein at least one of the following is true: (a) the sequence of nucleic acids is foreign to (i.e., not naturally found in) a given host cell; (b) the sequence may be naturally found in a given host cell, but in an unnatural (e.g., greater than expected) amount; or (c) the sequence of nucleic acids contains two or more subsequences that are not found in the same relationship to each other in nature. For example, regarding instance (c), a recombinant nucleic acid sequence will have two or more sequences from unrelated genes arranged to make a new functional nucleic acid. In some embodiments, the present disclosure describes the introduction of an expression vector into a plant cell, where the expression vector contains a nucleic acid sequence coding for a protein that is not normally found in a plant cell or contains a nucleic acid coding for a protein that is normally found in a plant cell but is under the control of different regulatory sequences. With reference to the plant cell’s genome, then, the nucleic acid sequence that codes for the protein is recombinant. A protein that is referred to as recombinant may be encoded by a recombinant nucleic acid sequence which may be present in the plant cell. Recombinant proteins of the present disclosure may also be exogenously supplied directly to host cells (e.g. plant cells).
NUCLEIC ACIDS
[0226] The present disclosure provides one or more nucleic acids comprising one or more of: a donor polynucleotide sequence, a nucleotide sequence encoding a Cas12L polypeptide (e.g., a wild type Cas12L protein, a nickase Cas12L protein, a dCas12L protein, fusion Cas12L protein, and the like), a Cas12L guide RNA, and a nucleotide sequence encoding a Cas12L guide RNA. The present disclosure provides a nucleic acid comprising a nucleotide sequence encoding a Cas12L fusion polypeptide. The present disclosure provides a recombinant expression vector that comprises a nucleotide sequence encoding a Cas12L polypeptide. The present disclosure provides a recombinant expression vector that comprises a nucleotide sequence encoding a Cas12L fusion polypeptide. The present disclosure provides a recombinant expression vector that comprises: a) a nucleotide sequence encoding a Cas12L polypeptide; and b) a nucleotide sequence encoding a Cas12L guide RNA(s). The present disclosure provides a recombinant expression vector that comprises: a) a nucleotide sequence encoding a Cas12L fusion polypeptide; and b) a nucleotide sequence encoding a Cas12L guide RNA(s). In some cases, the nucleotide sequence encoding the Cas12L protein and/or the nucleotide sequence encoding the Cas12L guide RNA is operably linked to a promoter that is operable in a cell type of choice (e.g., a prokaryotic cell, a eukaryotic cell, a plant cell, an animal cell, a mammalian cell, a primate cell, a rodent cell, a human cell, etc.).
[0227] In some cases, a nucleotide sequence encoding a Cas12L polypeptide of the present disclosure is codon optimized. This type of optimization can entail a mutation of a Cas12L-encoding nucleotide sequence to mimic the codon preferences of the intended host organism or cell while encoding the same protein. Thus, the codons can be changed, but the encoded protein remains unchanged. For example, if the intended target cell were a human cell, a human codon-optimized Cas12L-encoding nucleotide sequence could be used. As another non-limiting example, if the intended host cell were a mouse cell, then a mouse codon-optimized Cas12L-encoding nucleotide sequence could be generated. As another non-limiting example, if the intended host cell were a plant cell, then a plant codon-optimized Cas12L-encoding nucleotide sequence could be generated. As another non-limiting example, if the intended host cell were an insect cell, then an insect codon-optimized Cas12L-encoding nucleotide sequence could be generated.
[0228] Codon usage tables are readily available, for example, at the "Codon Usage Database" available at www[dot]kazusa[dot]or[dot]jp[forwardslash]codon. In some cases, a nucleic acid of the present disclosure comprises a Casl2L polypeptide-encoding nucleotide sequence that is codon optimized for expression in a eukaryotic cell. In some cases, a nucleic acid of the present disclosure comprises a Casl2L polypeptide-encoding nucleotide sequence that is codon optimized for expression in an animal cell. In some cases, a nucleic acid of the present disclosure comprises a Casl2L polypeptide-encoding nucleotide sequence that is codon optimized for expression in a fungus cell. In some cases, a nucleic acid of the present disclosure comprises a Casl2L polypeptide-encoding nucleotide sequence that is codon optimized for expression in a plant cell. In some cases, a nucleic acid of the present disclosure comprises a Casl2L polypeptide-encoding nucleotide sequence that is codon optimized for expression in a monocotyledonous plant species. In some cases, a nucleic acid of the present disclosure comprises a Casl2L polypeptide-encoding nucleotide sequence that is codon optimized for expression in a dicotyledonous plant species. In some cases, a nucleic acid of the present disclosure comprises a Casl2L polypeptide-encoding nucleotide sequence that is codon optimized for expression in a gymnosperm plant species. In some cases, a nucleic acid of the present disclosure comprises a Casl2L polypeptide-encoding nucleotide sequence that is codon optimized for expression in an angiosperm plant species. In some cases, a nucleic acid of the present disclosure comprises a Casl2L polypeptide-encoding nucleotide sequence that is codon optimized for expression in a corn cell. In some cases, a nucleic acid of the present disclosure comprises a Casl2L polypeptide-encoding nucleotide sequence that is codon optimized for expression in a soybean cell. 
In some cases, a nucleic acid of the present disclosure comprises a Cas12L polypeptide-encoding nucleotide sequence that is codon optimized for expression in a rice cell. In some cases, a nucleic acid of the present disclosure comprises a Casl2L polypeptide-encoding nucleotide sequence that is codon optimized for expression in a wheat cell. In some cases, a nucleic acid of the present disclosure comprises a Casl2L polypeptide-encoding nucleotide sequence that is codon optimized for expression in a cotton cell. In some cases, a nucleic acid of the present disclosure comprises a Casl2L polypeptide-encoding nucleotide sequence that is codon optimized for expression in a sorghum cell. In some cases, a nucleic acid of the present disclosure comprises a Casl2L polypeptide-encoding nucleotide sequence that is codon optimized for expression in an alfalfa cell. In some cases, a nucleic acid of the present disclosure comprises a Casl2L polypeptide-encoding nucleotide sequence that is codon optimized for expression in a sugar cane cell. In some cases, a nucleic acid of the present disclosure comprises a Casl2L polypeptide-encoding nucleotide sequence that is codon optimized for expression in an Arabidopsis cell. In some cases, a nucleic acid of the present disclosure comprises a Casl2L polypeptide- encoding nucleotide sequence that is codon optimized for expression in a tomato cell. In some cases, a nucleic acid of the present disclosure comprises a Casl2L polypeptide-encoding nucleotide sequence that is codon optimized for expression in a cucumber cell. In some cases, a nucleic acid of the present disclosure comprises a Casl2L polypeptide-encoding nucleotide sequence that is codon optimized for expression in a potato cell. In some cases, a nucleic acid of the present disclosure comprises a Casl2L polypeptide-encoding nucleotide sequence that is codon optimized for expression in an algae cell.
[0229] The present disclosure provides one or more recombinant expression vectors that include (in different recombinant expression vectors in some cases, and in the same recombinant expression vector in some cases): (i) a nucleotide sequence of a donor template nucleic acid (where the donor template comprises a nucleotide sequence having homology to a target sequence of a target nucleic acid (e.g., a target genome)); (ii) a nucleotide sequence that encodes a Casl2L guide RNA that hybridizes to a target sequence of the target locus of the targeted genome (e.g., operably linked to a promoter that is operable in a target cell such as a eukaryotic cell); and (iii) a nucleotide sequence encoding a Casl2L protein (e.g., operably linked to a promoter that is operable in a target cell such as a eukaryotic cell). The present disclosure provides one or more recombinant expression vectors that include (in different recombinant expression vectors in some cases, and in the same recombinant expression vector in some cases): (i) a nucleotide sequence of a donor template nucleic acid (where the donor template comprises a nucleotide sequence having homology to a target sequence of a target nucleic acid (e.g., a target genome)); and (ii) a nucleotide sequence that encodes a Casl2L guide RNA that hybridizes to a target sequence of the target locus of the targeted genome (e.g., operably linked to a promoter that is operable in a target cell such as a eukaryotic cell). 
The present disclosure provides one or more recombinant expression vectors that include (in different recombinant expression vectors in some cases, and in the same recombinant expression vector in some cases): (i) a nucleotide sequence that encodes a Casl2L guide RNA that hybridizes to a target sequence of the target locus of the targeted genome (e.g., operably linked to a promoter that is operable in a target cell such as a eukaryotic cell); and (ii) a nucleotide sequence encoding a Casl2L protein (e.g., operably linked to a promoter that is operable in a target cell such as a eukaryotic cell).
[0230] Suitable expression vectors include viral expression vectors (e.g., viral vectors based on vaccinia virus; poliovirus; adenovirus (see, e.g., Li et al., Invest Ophthalmol Vis Sci 35:2543-2549, 1994; Borras et al., Gene Ther 6:515-524, 1999; Li and Davidson, PNAS 92:7700-7704, 1995; Sakamoto et al., H Gene Ther 5:1088-1097, 1999; WO 94/12649, WO 93/03769; WO 93/19191; WO 94/28938; WO 95/11984 and WO 95/00655); adeno-associated virus (AAV) (see, e.g., Ali et al., Hum Gene Ther 9:81-86, 1998, Flannery et al., PNAS 94:6916-6921, 1997; Bennett et al., Invest Ophthalmol Vis Sci 38:2857-2863, 1997; Jomary et al., Gene Ther 4:683-690, 1997, Rolling et al., Hum Gene Ther 10:641-648, 1999; Ali et al., Hum Mol Genet 5:591-594, 1996; Srivastava in WO 93/09239, Samulski et al., J. Vir. (1989) 63:3822-3828; Mendelson et al., Virol. (1988) 166:154-165; and Flotte et al., PNAS (1993) 90:10613-10617); SV40; herpes simplex virus; human immunodeficiency virus (see, e.g., Miyoshi et al., PNAS 94:10319-23, 1997; Takahashi et al., J Virol 73:7812-7816, 1999); a retroviral vector (e.g., Murine Leukemia Virus, spleen necrosis virus, and vectors derived from retroviruses such as Rous Sarcoma Virus, Harvey Sarcoma Virus, avian leukosis virus, a lentivirus, human immunodeficiency virus, myeloproliferative sarcoma virus, and mammary tumor virus); and the like. In some cases, a recombinant expression vector of the present disclosure is a recombinant adeno-associated virus (AAV) vector. In some cases, a recombinant expression vector of the present disclosure is a recombinant lentivirus vector. In some cases, a recombinant expression vector of the present disclosure is a recombinant retroviral vector.
[0231] For plant applications, viral vectors based on Tobamoviruses, Potexviruses, Potyviruses, Tobraviruses, Tombusviruses, Geminiviruses, Bromoviruses, Carmoviruses, Alfamoviruses, or Cucumoviruses can be used. See, e.g., Peyret and Lomonossoff (2015) Plant Biotechnol. J. 13:1121. Suitable Tobamovirus vectors include, for example, a tomato mosaic virus (ToMV) vector, a tobacco mosaic virus (TMV) vector, a tobacco mild green mosaic virus (TMGMV) vector, a pepper mild mottle virus (PMMoV) vector, a paprika mild mottle virus (PaMMV) vector, a cucumber green mottle mosaic virus (CGMMV) vector, a kyuri green mottle mosaic virus (KGMMV) vector, a hibiscus latent Fort Pierce virus (HLFPV) vector, an odontoglossum ringspot virus (ORSV) vector, a rehmannia mosaic virus (ReMV) vector, a Sammon's opuntia virus (SOV) vector, a wasabi mottle virus (WMoV) vector, a youcai mosaic virus (YoMV) vector, a sunn-hemp mosaic virus (SHMV) vector, and the like. Suitable Potexvirus vectors include, for example, a potato virus X (PVX) vector, a potato aucuba mosaic virus (PAMV) vector, an Alstroemeria virus X (AlsVX) vector, a cactus virus X (CVX) vector, a Cymbidium mosaic virus (CymMV) vector, a hosta virus X (HVX) vector, a lily virus X (LVX) vector, a Narcissus mosaic virus (NMV) vector, a Nerine virus X (NVX) vector, a Plantago asiatica mosaic virus (PlAMV) vector, a strawberry mild yellow edge virus (SMYEV) vector, a tulip virus X (TVX) vector, a white clover mosaic virus (WClMV) vector, a bamboo mosaic virus (BaMV) vector, and the like. 
Suitable Potyvirus vectors include, for example, a potato virus Y (PVY) vector, a bean common mosaic virus (BCMV) vector, a clover yellow vein virus (ClYVV) vector, an East Asian Passiflora virus (EAPV) vector, a Freesia mosaic virus (FreMV) vector, a Japanese yam mosaic virus (JYMV) vector, a lettuce mosaic virus (LMV) vector, a Maize dwarf mosaic virus (MDMV) vector, an onion yellow dwarf virus (OYDV) vector, a papaya ringspot virus (PRSV) vector, a pepper mottle virus (PepMoV) vector, a Perilla mottle virus (PerMoV) vector, a plum pox virus (PPV) vector, a potato virus A (PVA) vector, a sorghum mosaic virus (SrMV) vector, a soybean mosaic virus (SMV) vector, a sugarcane mosaic virus (SCMV) vector, a tulip mosaic virus (TulMV) vector, a turnip mosaic virus (TuMV) vector, a watermelon mosaic virus (WMV) vector, a zucchini yellow mosaic virus (ZYMV) vector, a tobacco etch virus (TEV) vector, and the like. Suitable Tobravirus vectors include, for example, a tobacco rattle virus (TRV) vector and the like. Suitable Tombusvirus vectors include, for example, a tomato bushy stunt virus (TBSV) vector, an eggplant mottled crinkle virus (EMCV) vector, a grapevine Algerian latent virus (GALV) vector, and the like. Suitable Cucumovirus vectors include, for example, a cucumber mosaic virus (CMV) vector, a peanut stunt virus (PSV) vector, a tomato aspermy virus (TAV) vector, and the like. Suitable Bromovirus vectors include, for example, a brome mosaic virus (BMV) vector, a cowpea chlorotic mottle virus (CCMV) vector, and the like. Suitable Carmovirus vectors include, for example, a carnation mottle virus (CarMV) vector, a melon necrotic spot virus (MNSV) vector, a pea stem necrotic virus (PSNV) vector, a turnip crinkle virus (TCV) vector, and the like. Suitable Alfamovirus vectors include, for example, an alfalfa mosaic virus (AMV) vector, and the like.
[0232] Depending on the host/vector system utilized, any of a number of suitable transcription and translation control elements, including constitutive and inducible promoters, transcription enhancer elements, transcription terminators, etc. may be used in the expression vector.
[0233] In some embodiments, a nucleotide sequence encoding a Cas12L guide RNA is operably linked to a control element, e.g., a transcriptional control element, such as a promoter. In some embodiments, a nucleotide sequence encoding a Cas12L protein or a Cas12L fusion polypeptide is operably linked to a control element, e.g., a transcriptional control element, such as a promoter.
[0234] The transcriptional control element can be a promoter. In some cases, the promoter is a constitutively active promoter. In some cases, the promoter is a regulatable promoter. In some cases, the promoter is an inducible promoter. In some cases, the promoter is a tissue-specific promoter. In some cases, the promoter is a cell type-specific promoter. In some cases, the transcriptional control element (e.g., the promoter) is functional in a targeted cell type or targeted cell population. For example, in some cases, the transcriptional control element can be functional in eukaryotic cells, e.g., hematopoietic stem cells (e.g., mobilized peripheral blood (mPB) CD34(+) cell, bone marrow (BM) CD34(+) cell, etc.).
[0235] Non-limiting examples of eukaryotic promoters (promoters functional in a eukaryotic cell) include EF1α, those from cytomegalovirus (CMV) immediate early, herpes simplex virus (HSV) thymidine kinase, early and late SV40, long terminal repeats (LTRs) from retrovirus, and mouse metallothionein-I. Selection of the appropriate vector and promoter is well within the level of ordinary skill in the art. The expression vector may also contain a ribosome binding site for translation initiation and a transcription terminator. The expression vector may also include appropriate sequences for amplifying expression. The expression vector may also include nucleotide sequences encoding protein tags (e.g., 6xHis tag, hemagglutinin tag, fluorescent protein, etc.) that can be fused to the Cas12L protein, thus resulting in a fusion Cas12L polypeptide.
[0236] In some embodiments, a nucleotide sequence encoding a Cas12L guide RNA and/or a Cas12L fusion polypeptide is operably linked to an inducible promoter. In some embodiments, a nucleotide sequence encoding a Cas12L guide RNA and/or a Cas12L fusion protein is operably linked to a constitutive promoter.
[0237] A promoter can be a constitutively active promoter (i.e., a promoter that is constitutively in an active/“ON” state), it may be an inducible promoter (i.e., a promoter whose state, active/“ON” or inactive/“OFF”, is controlled by an external stimulus, e.g., the presence of a particular temperature, compound, or protein), it may be a spatially restricted promoter (i.e., transcriptional control element, enhancer, etc.) (e.g., tissue specific promoter, cell type specific promoter, etc.), or it may be a temporally restricted promoter (i.e., the promoter is in the “ON” state or “OFF” state during specific stages of embryonic development or during specific stages of a biological process, e.g., the hair follicle cycle in mice).
[0238] Suitable promoters can be derived from viruses and can therefore be referred to as viral promoters, or they can be derived from any organism, including prokaryotic or eukaryotic organisms. Suitable promoters can be used to drive expression by any RNA polymerase (e.g., pol I, pol II, pol III). Exemplary promoters include, but are not limited to, the SV40 early promoter; mouse mammary tumor virus long terminal repeat (LTR) promoter; adenovirus major late promoter (Ad MLP); a herpes simplex virus (HSV) promoter; a cytomegalovirus (CMV) promoter such as the CMV immediate early promoter region (CMVIE); a Rous sarcoma virus (RSV) promoter; a human U6 small nuclear promoter (U6) (Miyagishi et al., Nature Biotechnology 20, 497-500 (2002)); an enhanced U6 promoter (e.g., Xia et al., Nucleic Acids Res. 2003 Sep 1;31(17)); a human H1 promoter (H1); and the like.
[0239] In some cases, a nucleotide sequence encoding a Cas12L guide RNA is operably linked to (under the control of) a promoter operable in a eukaryotic cell (e.g., a U6 promoter, an enhanced U6 promoter, an H1 promoter, and the like). As would be understood by one of ordinary skill in the art, when expressing an RNA (e.g., a guide RNA) from a nucleic acid (e.g., an expression vector) using a U6 promoter (e.g., in a eukaryotic cell), or another Pol III promoter, the RNA may need to be mutated if there are several Ts in a row (coding for Us in the RNA). This is because a string of Ts (e.g., 5 Ts) in DNA can act as a terminator for polymerase III (Pol III). Thus, in order to ensure transcription of a guide RNA in a eukaryotic cell, it may sometimes be necessary to modify the sequence encoding the guide RNA to eliminate runs of Ts. In some cases, a nucleotide sequence encoding a Cas12L protein (e.g., a wild type Cas12L protein, a nickase Cas12L protein, a dCas12L protein, a fusion Cas12L protein, and the like) is operably linked to a promoter operable in a eukaryotic cell (e.g., a CMV promoter, an EF1α promoter, an estrogen receptor-regulated promoter, and the like).
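The check for runs of Ts described above can be sketched as follows. This is a minimal illustration, assuming the guide-encoding sequence is available as a plain string; the function name `find_t_runs` and the example sequences are hypothetical, and the default threshold of 5 simply follows the "e.g., 5 Ts" example in the text rather than any fixed rule.

```python
import re


def find_t_runs(guide_dna, min_run=5):
    """Locate runs of consecutive Ts that could act as a Pol III terminator.

    Returns a list of (start_index, run_length) tuples, 0-based.
    An RNA-style spelling (U instead of T) is accepted and treated as T.
    """
    if min_run < 1:
        raise ValueError("min_run must be at least 1")
    seq = guide_dna.upper().replace("U", "T")
    pattern = re.compile("T{%d,}" % min_run)
    return [(m.start(), len(m.group())) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical guide-encoding sequences, for illustration only.
    print(find_t_runs("GACGTTTTTACGGA"))   # [(4, 5)]  -> run of 5 Ts found
    print(find_t_runs("GACGTTTTACGGA"))    # []        -> only 4 Ts, below threshold
```

A sequence that returns a non-empty list is a candidate for modification before being placed under a U6, H1, or other Pol III promoter; how the run is removed is left to the designer.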
[0240] Examples of inducible promoters include, but are not limited to, T7 RNA polymerase promoter, T3 RNA polymerase promoter, Isopropyl-beta-D-thiogalactopyranoside (IPTG)-regulated promoter, lactose induced promoter, heat shock promoter, Tetracycline-regulated promoter, Steroid-regulated promoter, Metal-regulated promoter, estrogen receptor-regulated promoter, etc. Inducible promoters can therefore be regulated by molecules including, but not limited to, doxycycline; estrogen and/or an estrogen analog; IPTG; etc.
[0241] Inducible promoters suitable for use include any inducible promoter described herein or known to one of ordinary skill in the art. Examples of inducible promoters include, without limitation, chemically/biochemically-regulated and physically-regulated promoters such as alcohol-regulated promoters, tetracycline-regulated promoters (e.g., anhydrotetracycline (aTc)-responsive promoters and other tetracycline-responsive promoter systems, which include a tetracycline repressor protein (tetR), a tetracycline operator sequence (tetO) and a tetracycline transactivator fusion protein (tTA)), steroid-regulated promoters (e.g., promoters based on the rat glucocorticoid receptor, human estrogen receptor, moth ecdysone receptors, and promoters from the steroid/retinoid/thyroid receptor superfamily), metal-regulated promoters (e.g., promoters derived from metallothionein (proteins that bind and sequester metal ions) genes from yeast, mouse and human), pathogenesis-regulated promoters (e.g., induced by salicylic acid, ethylene or benzothiadiazole (BTH)), temperature/heat-inducible promoters (e.g., heat shock promoters), and light-regulated promoters (e.g., light responsive promoters from plant cells).
[0242] In some cases, the promoter is a spatially restricted promoter (i.e., cell type specific promoter, tissue specific promoter, etc.) such that in a multi-cellular organism, the promoter is active (i.e., “ON”) in a subset of specific cells. Spatially restricted promoters may also be referred to as enhancers, transcriptional control elements, control sequences, etc. Any convenient spatially restricted promoter may be used as long as the promoter is functional in the targeted host cell (e.g., eukaryotic cell; prokaryotic cell).
[0243] In some cases, the promoter is a reversible promoter. Suitable reversible promoters, including reversible inducible promoters, are known in the art. Such reversible promoters may be isolated and derived from many organisms, e.g., eukaryotes and prokaryotes. Modification of reversible promoters derived from a first organism for use in a second organism, e.g., a first prokaryote and a second eukaryote, a first eukaryote and a second prokaryote, etc., is well known in the art. 
Such reversible promoters, and systems based on such reversible promoters but also comprising additional control proteins, include, but are not limited to, alcohol regulated promoters (e.g., alcohol dehydrogenase I (alcA) gene promoter, promoters responsive to alcohol transactivator proteins (AlcR), etc.), tetracycline regulated promoters (e.g., promoter systems including Tet Activators, TetON, TetOFF, etc.), steroid regulated promoters (e.g., rat glucocorticoid receptor promoter systems, human estrogen receptor promoter systems, retinoid promoter systems, thyroid promoter systems, ecdysone promoter systems, mifepristone promoter systems, etc.), metal regulated promoters (e.g., metallothionein promoter systems, etc.), pathogenesis-related regulated promoters (e.g., salicylic acid regulated promoters, ethylene regulated promoters, benzothiadiazole regulated promoters, etc.), temperature regulated promoters (e.g., heat shock inducible promoters (e.g., HSP-70, HSP-90, soybean heat shock promoter, etc.)), light regulated promoters, synthetic inducible promoters, and the like.
[0244] RNA polymerase III (Pol III) promoters can be used to drive the expression of non-protein coding RNA molecules (e.g., guide RNAs). In some cases, a suitable promoter is a Pol III promoter. In some cases, a Pol III promoter is operably linked to a nucleotide sequence encoding a guide RNA (gRNA). In some cases, a Pol III promoter is operably linked to a nucleotide sequence encoding a single-guide RNA (sgRNA). In some cases, a Pol III promoter is operably linked to a nucleotide sequence encoding a CRISPR RNA (crRNA). In some cases, a Pol III promoter is operably linked to a nucleotide sequence encoding a tracrRNA.
[0245] Non-limiting examples of Pol III promoters include a U6 promoter, an H1 promoter, a 5S promoter, an Adenovirus 2 (Ad2) VAI promoter, a tRNA promoter, and a 7SK promoter. See, for example, Schramm and Hernandez (2002) Genes & Development 16:2593-2620. In some cases, a Pol III promoter is selected from the group consisting of a U6 promoter, an H1 promoter, a 5S promoter, an Adenovirus 2 (Ad2) VAI promoter, a tRNA promoter, and a 7SK promoter. In some cases, a guide RNA-encoding nucleotide sequence is operably linked to a promoter selected from the group consisting of a U6 promoter, an H1 promoter, a 5S promoter, an Adenovirus 2 (Ad2) VAI promoter, a tRNA promoter, and a 7SK promoter. In some cases, a single-guide RNA-encoding nucleotide sequence is operably linked to a promoter selected from the group consisting of a U6 promoter, an H1 promoter, a 5S promoter, an Adenovirus 2 (Ad2) VAI promoter, a tRNA promoter, and a 7SK promoter.
[0246] Examples describing a promoter that can be used herein in connection with expression in plants, plant tissues, and plant cells include, but are not limited to, promoters described in: U.S. Pat. No. 6,437,217 (maize RS81 promoter), U.S. Pat. No. 5,641,876 (rice actin promoter), U.S. Pat. No. 6,426,446 (maize RS324 promoter), U.S. Pat. No. 6,429,362 (maize PR-1 promoter), U.S. Pat. No. 6,232,526 (maize A3 promoter), U.S. Pat. No. 6,177,611 (constitutive maize promoters), U.S. Pat. Nos. 5,322,938, 5,352,605, 5,359,142 and 5,530,196 (35S promoter), U.S. Pat. No. 6,433,252 (maize L3 oleosin promoter), U.S. Pat. No. 6,429,357 (rice actin 2 promoter as well as a rice actin 2 intron), U.S. Pat. No. 5,837,848 (root specific promoter), U.S. Pat. No. 6,294,714 (light inducible promoters), U.S. Pat. No. 6,140,078 (salt inducible promoters), U.S. Pat. No. 6,252,138 (pathogen inducible promoters), U.S. Pat. No. 6,175,060 (phosphorus deficiency inducible promoters), U.S. Pat. No. 6,635,806 (gamma-coixin promoter), and U.S. patent application Ser. No. 09/757,089 (maize chloroplast aldolase promoter). Additional promoters that can find use include a nopaline synthase (NOS) promoter (Ebert et al., 1987), the octopine synthase (OCS) promoter (which is carried on tumor-inducing plasmids of Agrobacterium tumefaciens), the caulimovirus promoters such as the cauliflower mosaic virus (CaMV) 19S promoter (Lawton et al. Plant Molecular Biology (1987) 9:315-324), the CaMV 35S promoter (Odell et al., Nature (1985) 313:810-812), the figwort mosaic virus 35S-promoter (U.S. Pat. Nos. 6,051,753; 5,378,619), the sucrose synthase promoter (Yang and Russell, Proceedings of the National Academy of Sciences, USA (1990) 87:4144-4148), the R gene complex promoter (Chandler et al., Plant Cell (1989) 1:1175-1183), and the chlorophyll a/b binding protein gene promoter, PClSV (U.S. Pat. No. 5,850,019), and AGRtu.nos (GenBank Accession V00087; Depicker et al., Journal of Molecular and Applied Genetics (1982) 1:561-573; Bevan et al., 1983) promoters.
[0247] Methods of introducing a nucleic acid (e.g., a nucleic acid comprising a donor polynucleotide sequence, one or more nucleic acids encoding a Cas12L protein and/or a Cas12L guide RNA, and the like) into a host cell are known in the art, and any convenient method can be used to introduce a nucleic acid (e.g., an expression construct) into a cell. Suitable methods include, e.g., viral infection, transfection, lipofection, electroporation, calcium phosphate precipitation, polyethyleneimine (PEI)-mediated transfection, DEAE-dextran mediated transfection, liposome-mediated transfection, particle gun technology, direct microinjection, nanoparticle-mediated nucleic acid delivery, and the like.
[0248] Introducing the recombinant expression vector into cells can occur in any culture media and under any culture conditions that promote the survival of the cells. Introducing the recombinant expression vector into a target cell can be carried out in vivo or ex vivo. Introducing the recombinant expression vector into a target cell can be carried out in vitro.
[0249] In some embodiments, a Cas12L protein can be provided as RNA. The RNA can be provided by direct chemical synthesis or may be transcribed in vitro from a DNA (e.g., encoding the Cas12L protein). Once synthesized, the RNA may be introduced into a cell by any of the well-known techniques for introducing nucleic acids into cells (e.g., microinjection, electroporation, transfection, etc.).
[0250] Nucleic acids may be provided to the cells using well-developed transfection techniques; see, e.g., Angel and Yanik (2010) PLoS ONE 5(7): e11756, and the commercially available TransMessenger® reagents from Qiagen, Stemfect™ RNA Transfection Kit from Stemgent, and TransIT®-mRNA Transfection Kit from Mirus Bio LLC. See also Beumer et al. (2008) PNAS 105(50): 19821-19826.
[0251] Vectors may be provided directly to a target host cell. In other words, the cells are contacted with vectors comprising the subject nucleic acids (e.g., recombinant expression vectors having the donor template sequence and encoding the Cas12L guide RNA; recombinant expression vectors encoding the Cas12L protein; etc.) such that the vectors are taken up by the cells. Methods for contacting cells with nucleic acid vectors that are plasmids, including electroporation, calcium chloride transfection, microinjection, and lipofection, are well known in the art. For viral vector delivery, cells can be contacted with viral particles comprising the subject viral expression vectors.
[0252] Retroviruses, for example, lentiviruses, are suitable for use in methods of the present disclosure. Commonly used retroviral vectors are “defective”, i.e., unable to produce viral proteins required for productive infection. Rather, replication of the vector requires growth in a packaging cell line. To generate viral particles comprising nucleic acids of interest, the retroviral nucleic acids comprising the nucleic acid are packaged into viral capsids by a packaging cell line. Different packaging cell lines provide a different envelope protein (ecotropic, amphotropic or xenotropic) to be incorporated into the capsid, this envelope protein determining the specificity of the viral particle for the cells (ecotropic for murine and rat; amphotropic for most mammalian cell types including human, dog and mouse; and xenotropic for most mammalian cell types except murine cells). The appropriate packaging cell line may be used to ensure that the cells are targeted by the packaged viral particles. Methods of introducing subject expression vectors into packaging cell lines and of collecting the viral particles that are generated by the packaging lines are well known in the art. Nucleic acids can also be introduced by direct micro-injection (e.g., injection of RNA).
[0253] Vectors used for providing the nucleic acids encoding a Cas12L guide RNA and/or a Cas12L polypeptide to a target host cell can include suitable promoters for driving the expression, that is, transcriptional activation, of the nucleic acid of interest. In other words, in some cases, the nucleic acid of interest will be operably linked to a promoter. This may include ubiquitously acting promoters, for example, the CMV-β-actin promoter, or inducible promoters, such as promoters that are active in particular cell populations or that respond to the presence of drugs such as tetracycline. By transcriptional activation, it is intended that transcription will be increased above basal levels in the target cell by 10 fold, by 100 fold, more usually by 1000 fold. In addition, vectors used for providing a nucleic acid encoding a Cas12L guide RNA and/or a Cas12L protein to a cell may include nucleic acid sequences that encode for selectable markers in the target cells, so as to identify cells that have taken up the Cas12L guide RNA and/or Cas12L protein.
[0254] A nucleic acid comprising a nucleotide sequence encoding a Cas12L polypeptide, or a Cas12L fusion polypeptide, is in some cases an RNA. Thus, a Cas12L fusion protein can be introduced into cells as RNA. Methods of introducing RNA into cells are known in the art and may include, for example, direct injection, transfection, or any other method used for the introduction of DNA. A Cas12L protein may instead be provided to cells as a polypeptide. Such a polypeptide may optionally be fused to a polypeptide domain that increases solubility of the product. The domain may be linked to the polypeptide through a defined protease cleavage site, e.g., a TEV sequence, which is cleaved by TEV protease. The linker may also include one or more flexible sequences, e.g., from 1 to 10 glycine residues. In some embodiments, the cleavage of the fusion protein is performed in a buffer that maintains solubility of the product, e.g., in the presence of from 0.5 to 2 M urea, in the presence of polypeptides and/or polynucleotides that increase solubility, and the like. Domains of interest include endosomolytic domains, e.g., influenza HA domain; and other polypeptides that aid in production, e.g., IF2 domain, GST domain, GRPE domain, and the like. The polypeptide may be formulated for improved stability. For example, the peptides may be PEGylated, where the polyethyleneoxy group provides for enhanced lifetime in the blood stream.
[0255] Additionally or alternatively, a Cas12L polypeptide of the present disclosure may be fused to a polypeptide permeant domain to promote uptake by the cell. A number of permeant domains are known in the art and may be used in the non-integrating polypeptides of the present disclosure, including peptides, peptidomimetics, and non-peptide carriers. For example, a permeant peptide may be derived from the third alpha helix of Drosophila melanogaster transcription factor Antennapedia, referred to as penetratin, which comprises the amino acid sequence RQIKIWFQNRRMKWKK (SEQ ID NO:139). As another example, the permeant peptide comprises the HIV-1 tat basic region amino acid sequence, which may include, for example, amino acids 49-57 of naturally-occurring tat protein. Other permeant domains include poly-arginine motifs, for example, the region of amino acids 34-56 of HIV-1 rev protein, nona-arginine, octa-arginine, and the like. (See, for example, Futaki et al. (2003) Curr Protein Pept Sci. 2003 Apr; 4(2): 87-9 and 446; and Wender et al. (2000) Proc. Natl. Acad. Sci. U.S.A 2000 Nov. 21; 97(24): 13003-8; published U.S. Patent applications 20030220334; 20030083256; 20030032593; and 20030022831, herein specifically incorporated by reference for the teachings of translocation peptides and peptoids). The nona-arginine (R9) sequence is one of the more efficient PTDs that have been characterized (Wender et al. 2000; Uemura et al. 2002). The site at which the fusion is made may be selected in order to optimize the biological activity, secretion or binding characteristics of the polypeptide. The optimal site will be determined by routine experimentation.
[0256] As noted above, in some cases, the target cell is a plant cell. Numerous methods for transforming chromosomes or plastids in a plant cell with a recombinant nucleic acid are known in the art, which can be used according to methods of the present application to produce a transgenic plant cell and/or a transgenic plant. Any suitable method or technique for transformation of a plant cell known in the art can be used. Effective methods for transformation of plants include bacterially mediated transformation, such as Agrobacterium-mediated or Rhizobium-mediated transformation and microprojectile bombardment-mediated transformation. A variety of methods are known in the art for transforming explants with a transformation vector via bacterially mediated transformation or microprojectile bombardment and then subsequently culturing, etc., those explants to regenerate or develop transgenic plants. Other methods for plant transformation, such as microinjection, electroporation, vacuum infiltration, pressure, sonication, silicon carbide fiber agitation, PEG-mediated transformation, etc., are also known in the art. Transgenic plants produced by these transformation methods can be chimeric or non-chimeric for the transformation event depending on the methods and explants used.
[0257] Methods of transforming plant cells are well known by persons of ordinary skill in the art. For instance, specific instructions for transforming plant cells by microprojectile bombardment with particles coated with recombinant DNA (e.g., biolistic transformation) are found in U.S. Patent Nos. 5,550,318; 5,538,880; 6,160,208; 6,399,861; and 6,153,812, and Agrobacterium-mediated transformation is described in U.S. Patent Nos. 5,159,135; 5,824,877; 5,591,616; 6,384,301; 5,750,871; 5,463,174; and 5,188,958. Additional methods for transforming plants can be found in, for example, Compendium of Transgenic Crop Plants (2009) Blackwell Publishing. Any appropriate method known to those skilled in the art can be used to transform a plant cell with any of the nucleic acids provided herein.
[0258] A Casl2L polypeptide of the present disclosure may be produced in vitro or by eukaryotic cells or by prokaryotic cells, and it may be further processed by unfolding, e.g. heat denaturation, dithiothreitol reduction, etc. and may be further refolded, using methods known in the art.
[0259] Modifications of interest that do not alter primary sequence include chemical derivatization of polypeptides, e.g., acylation, acetylation, carboxylation, amidation, etc. Also included are modifications of glycosylation, e.g. those made by modifying the glycosylation patterns of a polypeptide during its synthesis and processing or in further processing steps; e.g. by exposing the polypeptide to enzymes which affect glycosylation, such as mammalian glycosylating or deglycosylating enzymes. Also embraced are sequences that have phosphorylated amino acid residues, e.g. phosphotyrosine, phosphoserine, or phosphothreonine.
[0260] Also suitable for inclusion in embodiments of the present disclosure are nucleic acids (e.g., encoding a Casl2L guide RNA, encoding a Casl2L fusion protein, etc.) and proteins (e.g., a Casl2L fusion protein derived from a wild type protein or a variant protein) that have been modified using ordinary molecular biological techniques and synthetic chemistry so as to improve their resistance to proteolytic degradation, to change the target sequence specificity, to optimize solubility properties, to alter protein activity (e.g., transcription modulatory activity, enzymatic activity, etc.) or to render them more suitable. Analogs of such polypeptides include those containing residues other than naturally occurring L-amino acids, e.g. D-amino acids or non-naturally occurring synthetic amino acids. D-amino acids may be substituted for some or all of the amino acid residues.
[0261] A Casl2L polypeptide of the present disclosure may be prepared by in vitro synthesis, using conventional methods as known in the art. Various commercial synthetic apparatuses are available, for example, automated synthesizers by Applied Biosystems, Inc., Beckman, etc. By using synthesizers, naturally occurring amino acids may be substituted with unnatural amino acids. The particular sequence and the manner of preparation will be determined by convenience, economics, purity required, and the like.
[0262] If desired, various groups may be introduced into the peptide during synthesis or during expression, which allow for linking to other molecules or to a surface. Thus, cysteines can be used to make thioethers, histidines for linking to a metal ion complex, carboxyl groups for forming amides or esters, amino groups for forming amides, and the like.
[0263] A Casl2L polypeptide of the present disclosure may also be isolated and purified in accordance with conventional methods of recombinant synthesis. A lysate may be prepared of the expression host and the lysate purified using high performance liquid chromatography (HPLC), exclusion chromatography, gel electrophoresis, affinity chromatography, or other purification technique. For the most part, the compositions which are used will comprise 20% or more by weight of the desired product, more usually 75% or more by weight, preferably 95% or more by weight, and for therapeutic purposes, usually 99.5% or more by weight, in relation to contaminants related to the method of preparation of the product and its purification. Usually, the percentages will be based upon total protein. Thus, in some cases, a Casl2L polypeptide, or a Casl2L fusion polypeptide, of the present disclosure is at least 80% pure, at least 85% pure, at least 90% pure, at least 95% pure, at least 98% pure, or at least 99% pure (e.g., free of contaminants, non-Casl2L proteins or other macromolecules, etc.).
[0264] To induce cleavage or any desired modification to a target nucleic acid (e.g., genomic DNA), or any desired modification to a polypeptide associated with target nucleic acid, the Casl2L guide RNA and/or the Casl2L polypeptide of the present disclosure and/or the donor template sequence, whether they be introduced as nucleic acids or polypeptides, are provided to the cells for about 30 minutes to about 24 hours, e.g., 1 hour, 1.5 hours, 2 hours, 2.5 hours, 3 hours, 3.5 hours, 4 hours, 5 hours, 6 hours, 7 hours, 8 hours, 12 hours, 16 hours, 18 hours, 20 hours, or any other period from about 30 minutes to about 24 hours, which may be repeated with a frequency of about every day to about every 4 days, e.g., every 1.5 days, every 2 days, every 3 days, or any other frequency from about every day to about every four days. The agent(s) may be provided to the subject cells one or more times, e.g., one time, twice, three times, or more than three times, and the cells allowed to incubate with the agent(s) for some amount of time following each contacting event, e.g., 16-24 hours, after which time the media is replaced with fresh media and the cells are cultured further.
[0265] In cases in which two or more different targeting complexes are provided to the cell (e.g., two different Casl2L guide RNAs that are complementary to different sequences within the same or different target nucleic acid), the complexes may be provided simultaneously (e.g., as two polypeptides and/or nucleic acids), or delivered simultaneously. Alternatively, they may be provided consecutively, e.g., the first targeting complex being provided first, followed by the second targeting complex, etc., or vice versa.
[0266] To improve the delivery of a DNA vector into a target cell, the DNA can be protected from damage and its entry into the cell facilitated, for example, by using lipoplexes and polyplexes. Thus, in some cases, a nucleic acid of the present disclosure (e.g., a recombinant expression vector of the present disclosure) can be covered with lipids in an organized structure like a micelle or a liposome. When the organized structure is complexed with DNA it is called a lipoplex. There are three types of lipids: anionic (negatively-charged), neutral, or cationic (positively-charged). Lipoplexes that utilize cationic lipids have proven utility for gene transfer. Cationic lipids, due to their positive charge, naturally complex with the negatively charged DNA. Also as a result of their charge, they interact with the cell membrane. Endocytosis of the lipoplex then occurs, and the DNA is released into the cytoplasm. The cationic lipids also protect against degradation of the DNA by the cell.
[0267] Complexes of polymers with DNA are called polyplexes. Most polyplexes consist of cationic polymers and their production is regulated by ionic interactions. One large difference between the methods of action of polyplexes and lipoplexes is that polyplexes cannot release their DNA load into the cytoplasm, so to this end, co-transfection with endosome-lytic agents (to lyse the endosome that is made during endocytosis) such as inactivated adenovirus must occur. However, this is not always the case; polymers such as polyethylenimine have their own method of endosome disruption, as do chitosan and trimethylchitosan.
[0268] Dendrimers, highly branched macromolecules with a spherical shape, may also be used to genetically modify stem cells. The surface of the dendrimer particle may be functionalized to alter its properties. In particular, it is possible to construct a cationic dendrimer (i.e., one with a positive surface charge). When in the presence of genetic material such as a DNA plasmid, charge complementarity leads to a temporary association of the nucleic acid with the cationic dendrimer. On reaching its destination, the dendrimer-nucleic acid complex can be taken up into a cell by endocytosis.
[0269] In some cases, a nucleic acid of the disclosure (e.g., an expression vector) includes an insertion site for a guide sequence of interest. For example, a nucleic acid can include an insertion site for a guide sequence of interest, where the insertion site is immediately adjacent to a nucleotide sequence encoding the portion of a Casl2L guide RNA that does not change when the guide sequence is changed to hybridize to a desired target sequence (e.g., sequences that contribute to the Casl2L binding aspect of the guide RNA, e.g., the sequences that contribute to the dsRNA duplex(es) of the Casl2L guide RNA - this portion of the guide RNA can also be referred to as the ‘scaffold’ or ‘constant region’ of the guide RNA). Thus, in some cases, a subject nucleic acid (e.g., an expression vector) includes a nucleotide sequence encoding a Casl2L guide RNA, except that the portion encoding the guide sequence portion of the guide RNA is an insertion sequence (an insertion site). An insertion site is any nucleotide sequence used for the insertion of the desired sequence. “Insertion sites” for use with various technologies are known to those of ordinary skill in the art and any convenient insertion site can be used. An insertion site can be for any method for manipulating nucleic acid sequences. For example, in some cases the insertion site is a multiple cloning site (MCS) (e.g., a site including one or more restriction enzyme recognition sequences), a site for ligation independent cloning, a site for recombination based cloning (e.g., recombination based on att sites), a nucleotide sequence recognized by a CRISPR/Cas (e.g. Cas9) based technology, and the like.
[0270] An insertion site can be any desirable length, and can depend on the type of insertion site (e.g., can depend on whether the site includes one or more restriction enzyme recognition sequences (and how many), whether the site includes a target site for a CRISPR/Cas protein, etc.). In some cases, an insertion site of a subject nucleic acid is 3 or more nucleotides (nt) in length (e.g., 5 or more, 8 or more, 10 or more, 15 or more, 17 or more, 18 or more, 19 or more, 20 or more or 25 or more, or 30 or more nt in length). In some cases, an insertion site of a subject nucleic acid has a length in a range of from 2 to 50 nucleotides (nt) (e.g., from 2 to 40 nt, from 2 to 30 nt, from 2 to 25 nt, from 2 to 20 nt, from 5 to 50 nt, from 5 to 40 nt, from 5 to 30 nt, from 5 to 25 nt, from 5 to 20 nt, from 10 to 50 nt, from 10 to 40 nt, from 10 to 30 nt, from 10 to 25 nt, from 10 to 20 nt, from 17 to 50 nt, from 17 to 40 nt, from 17 to 30 nt, from 17 to 25 nt). In some cases, an insertion site of a subject nucleic acid has a length in a range of from 5 to 40 nt.
Nucleic acid modifications
[0271] In some embodiments, a subject nucleic acid (e.g., a Casl2L guide RNA) has one or more modifications, e.g., a base modification, a backbone modification, etc., to provide the nucleic acid with a new or enhanced feature (e.g., improved stability). A nucleoside is a base-sugar combination. The base portion of the nucleoside is normally a heterocyclic base. The two most common classes of such heterocyclic bases are the purines and the pyrimidines. Nucleotides are nucleosides that further include a phosphate group covalently linked to the sugar portion of the nucleoside. For those nucleosides that include a pentofuranosyl sugar, the phosphate group can be linked to the 2', the 3', or the 5' hydroxyl moiety of the sugar. In forming oligonucleotides, the phosphate groups covalently link adjacent nucleosides to one another to form a linear polymeric compound. In turn, the respective ends of this linear polymeric compound can be further joined to form a circular compound; however, linear compounds are suitable. In addition, linear compounds may have internal nucleotide base complementarity and may therefore fold in a manner as to produce a fully or partially double-stranded compound. Within oligonucleotides, the phosphate groups are commonly referred to as forming the internucleoside backbone of the oligonucleotide. The normal linkage or backbone of RNA and DNA is a 3' to 5' phosphodiester linkage.
[0272] Suitable nucleic acid modifications include, but are not limited to: 2'-O-methyl modified nucleotides, 2’ Fluoro modified nucleotides, locked nucleic acid (LNA) modified nucleotides, peptide nucleic acid (PNA) modified nucleotides, nucleotides with phosphorothioate linkages, and a 5’ cap (e.g., a 7-methylguanylate cap (m7G)). Additional details and additional modifications are described below.
[0273] A 2'-O-Methyl modified nucleotide (also referred to as 2'-O-Methyl RNA) is a naturally occurring modification of RNA found in tRNA and other small RNAs that arises as a post-transcriptional modification. Oligonucleotides can be directly synthesized that contain 2'-O-Methyl RNA. This modification increases Tm of RNA:RNA duplexes but results in only small changes in RNA:DNA stability. It is stable with respect to attack by single-stranded ribonucleases and is typically 5 to 10-fold less susceptible to DNases than DNA. It is commonly used in antisense oligos as a means to increase stability and binding affinity to the target message.
[0274] 2’ Fluoro modified nucleotides (e.g., 2' Fluoro bases) have a fluorine modified ribose which increases binding affinity (Tm) and also confers some relative nuclease resistance when compared to native RNA. These modifications can improve stability in serum or other biological fluids.
[0275] LNA bases have a modification to the ribose backbone that locks the base in the C3'-endo position, which favors RNA A-type helix duplex geometry. This modification significantly increases Tm and is also very nuclease resistant. Multiple LNA insertions can be placed in an oligo at any position except the 3'-end. Applications have been described ranging from antisense oligos to hybridization probes to SNP detection and allele specific PCR. Due to the large increase in Tm conferred by LNAs, they also can cause an increase in primer dimer formation as well as self-hairpin formation. In some cases, the number of LNAs incorporated into a single oligo is 10 bases or less.
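The placement constraints stated above for LNA bases (any position except the 3'-end, and in some cases 10 or fewer per oligo) can be illustrated with a short check. The following Python sketch is illustrative only and is not part of the disclosure; the function name, the 0-based position convention, and the `max_lna` parameter are assumptions introduced for the example.

```python
def check_lna_positions(oligo_length, lna_positions, max_lna=10):
    """Return a list of problems with a proposed set of LNA positions.

    Positions are 0-based from the 5' end, so the 3'-terminal base is at
    index oligo_length - 1. The default max_lna of 10 mirrors the
    "10 bases or less" case in the text; it is a parameter, not a rule.
    """
    problems = []
    unique = sorted(set(lna_positions))
    for pos in unique:
        if pos < 0 or pos >= oligo_length:
            problems.append(f"position {pos} is outside the oligo")
    if oligo_length - 1 in unique:
        problems.append("LNA at the 3'-end is not allowed")
    if len(unique) > max_lna:
        problems.append(f"{len(unique)} LNAs exceeds the limit of {max_lna}")
    return problems


if __name__ == "__main__":
    # A 20-mer with three internal LNA positions passes both checks.
    print(check_lna_positions(20, [0, 3, 7]))
    # Placing an LNA on the last base is flagged.
    print(check_lna_positions(20, [19]))
```

An empty list means the proposed layout satisfies both constraints; the check says nothing about Tm, primer dimer, or hairpin effects, which would need separate evaluation.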
[0276] The phosphorothioate (PS) bond (i.e., a phosphorothioate linkage) substitutes a sulfur atom for a non-bridging oxygen in the phosphate backbone of a nucleic acid (e.g., an oligo). This modification renders the internucleotide linkage resistant to nuclease degradation. Phosphorothioate bonds can be introduced between the last 3-5 nucleotides at the 5'- or 3'-end of the oligo to inhibit exonuclease degradation. Including phosphorothioate bonds within the oligo (e.g., throughout the entire oligo) can help reduce attack by endonucleases as well.
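As an illustration of terminal phosphorothioate placement, the following Python sketch marks a chosen number of linkages at the 5' and/or 3' end of an oligo. It is a minimal sketch and not part of the disclosure: the function name, the use of `*` after a base to denote a phosphorothioate linkage to the next base, and the default of three linkages per end are assumptions made for the example.

```python
def add_terminal_ps(seq, n_bonds=3, ends=("5", "3")):
    """Mark phosphorothioate linkages near the oligo termini with '*'.

    seq is the base sequence written 5'->3'. Linkage i joins base i and
    base i + 1, so an oligo of N bases has N - 1 linkages. n_bonds
    linkages are marked at each requested end.
    """
    n_links = len(seq) - 1
    if n_bonds < 0 or n_bonds > n_links:
        raise ValueError("n_bonds must be between 0 and len(seq) - 1")
    ps = set()
    if "5" in ends:
        ps.update(range(n_bonds))                     # first n_bonds linkages
    if "3" in ends:
        ps.update(range(n_links - n_bonds, n_links))  # last n_bonds linkages
    out = []
    for i, base in enumerate(seq):
        out.append(base)
        if i in ps:
            out.append("*")
    return "".join(out)


if __name__ == "__main__":
    print(add_terminal_ps("ACGUACGUAC"))                 # both ends
    print(add_terminal_ps("ACGUACGUAC", ends=("3",)))    # 3' end only
```

Setting `n_bonds` to `len(seq) - 1` marks every linkage, which corresponds to the fully phosphorothioated case mentioned above for reducing endonuclease attack.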
[0277] In some cases, a subject nucleic acid has one or more nucleotides that are 2'-O-Methyl modified nucleotides. In some embodiments, a subject nucleic acid has one or more 2’ Fluoro modified nucleotides. In some embodiments, a subject nucleic acid has one or more LNA bases. In some embodiments, a subject nucleic acid has one or more nucleotides that are linked by a phosphorothioate bond (i.e., the subject nucleic acid has one or more phosphorothioate linkages). In some embodiments, a subject nucleic acid has a 5’ cap (e.g., a 7-methylguanylate cap (m7G)). In some embodiments, a subject nucleic acid has a combination of modified nucleotides. For example, a subject nucleic acid can have a 5’ cap (e.g., a 7-methylguanylate cap (m7G)) in addition to having one or more nucleotides with other modifications (e.g., a 2'-O-Methyl nucleotide and/or a 2’ Fluoro modified nucleotide and/or a LNA base and/or a phosphorothioate linkage).
Modified backbones and modified internucleoside linkages
[0278] Examples of suitable nucleic acids (e.g., a Casl2L guide RNA) containing modifications include nucleic acids containing modified backbones or non-natural internucleoside linkages. Nucleic acids having modified backbones include those that retain a phosphorus atom in the backbone and those that do not have a phosphorus atom in the backbone.
[0279] Suitable modified oligonucleotide backbones containing a phosphorus atom therein include, for example, phosphorothioates, chiral phosphorothioates, phosphorodithioates, phosphotriesters, aminoalkylphosphotriesters, methyl and other alkyl phosphonates including 3'-alkylene phosphonates, 5'-alkylene phosphonates and chiral phosphonates, phosphinates, phosphoramidates including 3'-amino phosphoramidate and aminoalkylphosphoramidates, phosphorodiamidates, thionophosphoramidates, thionoalkylphosphonates, thionoalkylphosphotriesters, selenophosphates and boranophosphates having normal 3'-5' linkages, 2'-5' linked analogs of these, and those having inverted polarity wherein one or more internucleotide linkages is a 3' to 3', 5' to 5' or 2' to 2' linkage. Suitable oligonucleotides having inverted polarity comprise a single 3' to 3' linkage at the 3'-most internucleotide linkage, i.e., a single inverted nucleoside residue which may be abasic (the nucleobase is missing or has a hydroxyl group in place thereof). Various salts (such as, for example, potassium or sodium), mixed salts and free acid forms are also included.
[0280] In some embodiments, a subject nucleic acid comprises one or more phosphorothioate and/or heteroatom internucleoside linkages, in particular -CH2-NH-O-CH2-, -CH2-N(CH3)-O-CH2- (known as a methylene (methylimino) or MMI backbone), -CH2-O-N(CH3)-CH2-, -CH2-N(CH3)-N(CH3)-CH2- and -O-N(CH3)-CH2-CH2- (wherein the native phosphodiester internucleotide linkage is represented as -O-P(=O)(OH)-O-CH2-). MMI type internucleoside linkages are disclosed in the above referenced U.S. Pat. No. 5,489,677, the disclosure of which is incorporated herein by reference in its entirety. Suitable amide internucleoside linkages are disclosed in U.S. Pat. No. 5,602,240, the disclosure of which is incorporated herein by reference in its entirety.
[0281] Also suitable are nucleic acids having morpholino backbone structures as described in, e.g., U.S. Pat. No. 5,034,506. For example, in some embodiments, a subject nucleic acid comprises a 6-membered morpholino ring in place of a ribose ring. In some of these embodiments, a phosphorodiamidate or other non-phosphodiester internucleoside linkage replaces a phosphodiester linkage.
[0282] Suitable modified polynucleotide backbones that do not include a phosphorus atom therein have backbones that are formed by short chain alkyl or cycloalkyl internucleoside linkages, mixed heteroatom and alkyl or cycloalkyl internucleoside linkages, or one or more short chain heteroatomic or heterocyclic internucleoside linkages. These include those having morpholino linkages (formed in part from the sugar portion of a nucleoside); siloxane backbones; sulfide, sulfoxide and sulfone backbones; formacetyl and thioformacetyl backbones; methylene formacetyl and thioformacetyl backbones; riboacetyl backbones; alkene containing backbones; sulfamate backbones; methyleneimino and methylenehydrazino backbones; sulfonate and sulfonamide backbones; amide backbones; and others having mixed N, O, S and CH2 component parts.
Mimetics
[0283] A subject nucleic acid can be a nucleic acid mimetic. The term "mimetic" as it is applied to polynucleotides is intended to include polynucleotides wherein only the furanose ring or both the furanose ring and the internucleotide linkage are replaced with non-furanose groups; replacement of only the furanose ring is also referred to in the art as being a sugar surrogate. The heterocyclic base moiety or a modified heterocyclic base moiety is maintained for hybridization with an appropriate target nucleic acid. One such nucleic acid, a polynucleotide mimetic that has been shown to have excellent hybridization properties, is referred to as a peptide nucleic acid (PNA). In PNA, the sugar-backbone of a polynucleotide is replaced with an amide containing backbone, in particular an aminoethylglycine backbone. The nucleotides are retained and are bound directly or indirectly to aza nitrogen atoms of the amide portion of the backbone.
[0284] One polynucleotide mimetic that has been reported to have excellent hybridization properties is a peptide nucleic acid (PNA). The backbone in PNA compounds is two or more linked aminoethylglycine units which gives PNA an amide containing backbone. The heterocyclic base moieties are bound directly or indirectly to aza nitrogen atoms of the amide portion of the backbone. Representative U.S. patents that describe the preparation of PNA compounds include, but are not limited to: U.S. Pat. Nos. 5,539,082; 5,714,331; and 5,719,262, the disclosures of which are incorporated herein by reference in their entirety.
[0285] Another class of polynucleotide mimetic that has been studied is based on linked morpholino units (morpholino nucleic acid) having heterocyclic bases attached to the morpholino ring. A number of linking groups have been reported that link the morpholino monomeric units in a morpholino nucleic acid. One class of linking groups has been selected to give a non-ionic oligomeric compound. The nonionic morpholino-based oligomeric compounds are less likely to have undesired interactions with cellular proteins. Morpholino-based polynucleotides are non-ionic mimics of oligonucleotides which are less likely to form undesired interactions with cellular proteins (Dwaine A. Braasch and David R. Corey, Biochemistry, 2002, 41(14), 4503-4510). Morpholino-based polynucleotides are disclosed in U.S. Pat. No. 5,034,506, the disclosure of which is incorporated herein by reference in its entirety. A variety of compounds within the morpholino class of polynucleotides have been prepared, having a variety of different linking groups joining the monomeric subunits.
[0286] A further class of polynucleotide mimetic is referred to as cyclohexenyl nucleic acids (CeNA). The furanose ring normally present in a DNA/RNA molecule is replaced with a cyclohexenyl ring. CeNA DMT protected phosphoramidite monomers have been prepared and used for oligomeric compound synthesis following classical phosphoramidite chemistry. Fully modified CeNA oligomeric compounds and oligonucleotides having specific positions modified with CeNA have been prepared and studied (see Wang et al., J. Am. Chem. Soc., 2000, 122, 8595-8602, the disclosure of which is incorporated herein by reference in its entirety). In general, the incorporation of CeNA monomers into a DNA chain increases the stability of a DNA/RNA hybrid. CeNA oligoadenylates formed complexes with RNA and DNA complements with similar stability to the native complexes. The study of incorporating CeNA structures into natural nucleic acid structures was shown by NMR and circular dichroism to proceed with easy conformational adaptation.
[0287] A further modification includes Locked Nucleic Acids (LNAs) in which the 2'-hydroxyl group is linked to the 4' carbon atom of the sugar ring thereby forming a 2'-C,4'-C-oxymethylene linkage, thereby forming a bicyclic sugar moiety. The linkage can be a methylene (-CH2-)n group bridging the 2' oxygen atom and the 4' carbon atom wherein n is 1 or 2 (Singh et al., Chem. Commun., 1998, 4, 455-456, the disclosure of which is incorporated herein by reference in its entirety). LNA and LNA analogs display very high duplex thermal stabilities with complementary DNA and RNA (Tm=+3 to +10° C), stability towards 3'-exonucleolytic degradation and good solubility properties. Potent and nontoxic antisense oligonucleotides containing LNAs have been described (e.g., Wahlestedt et al., Proc. Natl. Acad. Sci. U.S.A., 2000, 97, 5633-5638, the disclosure of which is incorporated herein by reference in its entirety).
[0288] The synthesis and preparation of the LNA monomers adenine, cytosine, guanine, 5-methyl-cytosine, thymine and uracil, along with their oligomerization, and nucleic acid recognition properties have been described (e.g., Koshkin et al., Tetrahedron, 1998, 54, 3607-3630, the disclosure of which is incorporated herein by reference in its entirety). LNAs and preparation thereof are also described in WO 98/39352 and WO 99/14226, as well as U.S. applications 20120165514, 20100216983, 20090041809, 20060117410, 20040014959, 20020094555, and 20020086998, the disclosures of which are incorporated herein by reference in their entirety.
Modified sugar moieties
[0289] A subject nucleic acid can also include one or more substituted sugar moieties. Suitable polynucleotides comprise a sugar substituent group selected from: OH; F; O-, S-, or N-alkyl; O-, S-, or N-alkenyl; O-, S- or N-alkynyl; or O-alkyl-O-alkyl, wherein the alkyl, alkenyl and alkynyl may be substituted or unsubstituted C1 to C10 alkyl or C2 to C10 alkenyl and alkynyl. Particularly suitable are O((CH2)nO)mCH3, O(CH2)nOCH3, O(CH2)nNH2, O(CH2)nCH3, O(CH2)nONH2, and O(CH2)nON((CH2)nCH3)2, where n and m are from 1 to about 10. Other suitable polynucleotides comprise a sugar substituent group selected from: C1 to C10 lower alkyl, substituted lower alkyl, alkenyl, alkynyl, alkaryl, aralkyl, O-alkaryl or O-aralkyl, SH, SCH3, OCN, Cl, Br, CN, CF3, OCF3, SOCH3, SO2CH3, ONO2, NO2, N3, NH2, heterocycloalkyl, heterocycloalkaryl, aminoalkylamino, polyalkylamino, substituted silyl, an RNA cleaving group, a reporter group, an intercalator, a group for improving the pharmacokinetic properties of an oligonucleotide, or a group for improving the pharmacodynamic properties of an oligonucleotide, and other substituents having similar properties. A suitable modification includes 2'-methoxyethoxy (2'-O-CH2CH2OCH3, also known as 2'-O-(2-methoxyethyl) or 2'-MOE) (Martin et al., Helv. Chim. Acta, 1995, 78, 486-504, the disclosure of which is incorporated herein by reference in its entirety) i.e., an alkoxyalkoxy group. A further suitable modification includes 2'-dimethylaminooxyethoxy, i.e., a O(CH2)2ON(CH3)2 group, also known as 2'-DMAOE, as described in examples hereinbelow, and 2'-dimethylaminoethoxyethoxy (also known in the art as 2'-O-dimethyl-amino-ethoxy-ethyl or 2'-DMAEOE), i.e., 2'-O-CH2-O-CH2-N(CH3)2.
[0290] Other suitable sugar substituent groups include methoxy (-O-CH3), aminopropoxy (-O-CH2CH2CH2NH2), allyl (-CH2-CH=CH2), -O-allyl (-O-CH2-CH=CH2) and fluoro (F). 2'-sugar substituent groups may be in the arabino (up) position or ribo (down) position. A suitable 2'-arabino modification is 2'-F. Similar modifications may also be made at other positions on the oligomeric compound, particularly the 3' position of the sugar on the 3' terminal nucleoside or in 2'-5' linked oligonucleotides and the 5' position of 5' terminal nucleotide. Oligomeric compounds may also have sugar mimetics such as cyclobutyl moieties in place of the pentofuranosyl sugar.
Base modifications and substitutions
[0291] A subject nucleic acid may also include nucleobase (often referred to in the art simply as "base") modifications or substitutions. As used herein, "unmodified" or "natural" nucleobases include the purine bases adenine (A) and guanine (G), and the pyrimidine bases thymine (T), cytosine (C) and uracil (U). Modified nucleobases include other synthetic and natural nucleobases such as 5-methylcytosine (5-me-C), 5-hydroxymethyl cytosine, xanthine, hypoxanthine, 2-aminoadenine, 6-methyl and other alkyl derivatives of adenine and guanine, 2-propyl and other alkyl derivatives of adenine and guanine, 2-thiouracil, 2-thiothymine and 2-thiocytosine, 5-halouracil and cytosine, 5-propynyl (-C≡C-CH3) uracil and cytosine and other alkynyl derivatives of pyrimidine bases, 6-azo uracil, cytosine and thymine, 5-uracil (pseudouracil), 4-thiouracil, 8-halo, 8-amino, 8-thiol, 8-thioalkyl, 8-hydroxyl and other 8-substituted adenines and guanines, 5-halo particularly 5-bromo, 5-trifluoromethyl and other 5-substituted uracils and cytosines, 7-methylguanine and 7-methyladenine, 2-F-adenine, 2-amino-adenine, 8-azaguanine and 8-azaadenine, 7-deazaguanine and 7-deazaadenine and 3-deazaguanine and 3-deazaadenine. Further modified nucleobases include tricyclic pyrimidines such as phenoxazine cytidine (1H-pyrimido(5,4-b)(1,4)benzoxazin-2(3H)-one), phenothiazine cytidine (1H-pyrimido(5,4-b)(1,4)benzothiazin-2(3H)-one), G-clamps such as a substituted phenoxazine cytidine (e.g. 9-(2-aminoethoxy)-H-pyrimido(5,4-(b) (1,4)benzoxazin-2(3H)-one), carbazole cytidine (2H-pyrimido(4,5-b)indol-2-one), pyridoindole cytidine (H-pyrido(3',2':4,5)pyrrolo(2,3-d)pyrimidin-2-one).
[0292] Heterocyclic base moieties may also include those in which the purine or pyrimidine base is replaced with other heterocycles, for example 7-deaza-adenine, 7-deazaguanosine, 2-aminopyridine and 2-pyridone. Further nucleobases include those disclosed in U.S. Pat. No. 3,687,808, those disclosed in The Concise Encyclopedia Of Polymer Science And Engineering, pages 858-859, Kroschwitz, J. I., ed. John Wiley & Sons, 1990, those disclosed by Englisch et al., Angewandte Chemie, International Edition, 1991, 30, 613, and those disclosed by Sanghvi, Y. S., Chapter 15, Antisense Research and Applications, pages 289-302, Crooke, S. T. and Lebleu, B., ed., CRC Press, 1993; the disclosures of which are incorporated herein by reference in their entirety. Certain of these nucleobases are useful for increasing the binding affinity of an oligomeric compound. These include 5-substituted pyrimidines, 6-azapyrimidines and N-2, N-6 and O-6 substituted purines, including 2-aminopropyladenine, 5-propynyluracil and 5-propynylcytosine. 5-methylcytosine substitutions have been shown to increase nucleic acid duplex stability by 0.6-1.2° C. (Sanghvi et al., eds., Antisense Research and Applications, CRC Press, Boca Raton, 1993, pp. 276-278; the disclosure of which is incorporated herein by reference in its entirety) and are suitable base substitutions, e.g., when combined with 2'-O-methoxyethyl sugar modifications.
Conjugates
[0293] Another possible modification of a subject nucleic acid involves chemically linking to the polynucleotide one or more moieties or conjugates which enhance the activity, cellular distribution or cellular uptake of the oligonucleotide. These moieties or conjugates can include conjugate groups covalently bound to functional groups such as primary or secondary hydroxyl groups. Conjugate groups include, but are not limited to, intercalators, reporter molecules, polyamines, polyamides, polyethylene glycols, polyethers, groups that enhance the pharmacodynamic properties of oligomers, and groups that enhance the pharmacokinetic properties of oligomers. Suitable conjugate groups include, but are not limited to, cholesterols, lipids, phospholipids, biotin, phenazine, folate, phenanthridine, anthraquinone, acridine, fluoresceins, rhodamines, coumarins, and dyes. Groups that enhance the pharmacodynamic properties include groups that improve uptake, enhance resistance to degradation, and/or strengthen sequence-specific hybridization with the target nucleic acid. Groups that enhance the pharmacokinetic properties include groups that improve uptake, distribution, metabolism or excretion of a subject nucleic acid.
[0294] Conjugate moieties include but are not limited to lipid moieties such as a cholesterol moiety (Letsinger et al., Proc. Natl. Acad. Sci. USA, 1989, 86, 6553-6556), cholic acid (Manoharan et al., Bioorg. Med. Chem. Let., 1994, 4, 1053-1060), a thioether, e.g., hexyl-S-tritylthiol (Manoharan et al., Ann. N. Y. Acad. Sci., 1992, 660, 306-309; Manoharan et al., Bioorg. Med. Chem. Let., 1993, 3, 2765-2770), a thiocholesterol (Oberhauser et al., Nucl. Acids Res., 1992, 20, 533-538), an aliphatic chain, e.g., dodecandiol or undecyl residues (Saison-Behmoaras et al., EMBO J., 1991, 10, 1111-1118; Kabanov et al., FEBS Lett., 1990, 259, 327-330; Svinarchuk et al., Biochimie, 1993, 75, 49-54), a phospholipid, e.g., di-hexadecyl-rac-glycerol or triethylammonium 1,2-di-O-hexadecyl-rac-glycero-3-H-phosphonate (Manoharan et al., Tetrahedron Lett., 1995, 36, 3651-3654; Shea et al., Nucl. Acids Res., 1990, 18, 3777-3783), a polyamine or a polyethylene glycol chain (Manoharan et al., Nucleosides & Nucleotides, 1995, 14, 969-973), or adamantane acetic acid (Manoharan et al., Tetrahedron Lett., 1995, 36, 3651-3654), a palmityl moiety (Mishra et al., Biochim. Biophys. Acta, 1995, 1264, 229-237), or an octadecylamine or hexylamino-carbonyl-oxycholesterol moiety (Crooke et al., J. Pharmacol. Exp. Ther., 1996, 277, 923-937).
[0295] A conjugate may include a "Protein Transduction Domain" or PTD (also known as a CPP - cell penetrating peptide), which may refer to a polypeptide, polynucleotide, carbohydrate, or organic or inorganic compound that facilitates traversing a lipid bilayer, micelle, cell membrane, organelle membrane, or vesicle membrane. A PTD attached to another molecule, which can range from a small polar molecule to a large macromolecule and/or a nanoparticle, facilitates the molecule traversing a membrane, for example going from extracellular space to intracellular space, or cytosol to within an organelle (e.g., the nucleus). In some cases, a PTD is covalently linked to the 3’ end of an exogenous polynucleotide. In some cases, a PTD is covalently linked to the 5’ end of an exogenous polynucleotide. Exemplary PTDs include but are not limited to a minimal undecapeptide protein transduction domain (corresponding to residues 47-57 of HIV-1 TAT comprising YGRKKRRQRRR; SEQ ID NO:135); a polyarginine sequence comprising a number of arginines sufficient to direct entry into a cell (e.g., 3, 4, 5, 6, 7, 8, 9, 10, or 10-50 arginines); a VP22 domain (Zender et al. (2002) Cancer Gene Ther. 9(6):489-96); a Drosophila Antennapedia protein transduction domain (Noguchi et al. (2003) Diabetes 52(7):1732-1737); a truncated human calcitonin peptide (Trehin et al. (2004) Pharm. Research 21:1248-1256); polylysine (Wender et al. (2000) Proc. Natl. Acad. Sci. USA 97:13003-13008); RRQRRTSKLMKR (SEQ ID NO:136); Transportan GWTLNSAGYLLGKINLKALAALAKKIL (SEQ ID NO:137); KALAWEAKLAKALAKALAKHLAKALAKALKCEA (SEQ ID NO:138); and RQIKIWFQNRRMKWKK (SEQ ID NO:139). Exemplary PTDs include but are not limited to YGRKKRRQRRR (SEQ ID NO:135), RKKRRQRRR (SEQ ID NO:140), and an arginine homopolymer of from 3 arginine residues to 50 arginine residues. Exemplary PTD domain amino acid sequences include, but are not limited to, any of the following: YGRKKRRQRRR (SEQ ID NO:135); RKKRRQRR (SEQ ID NO:141); YARAAARQARA (SEQ ID NO:142); THRLPRRRRRR (SEQ ID NO:143); and GGRRARRRRRR (SEQ ID NO:144). In some cases, the PTD is an activatable CPP (ACPP) (Aguilera et al. (2009) Integr Biol (Camb) June; 1(5-6): 371-381). ACPPs comprise a polycationic CPP (e.g., Arg9 or “R9”) connected via a cleavable linker to a matching polyanion (e.g., Glu9 or “E9”), which reduces the net charge to nearly zero and thereby inhibits adhesion and uptake into cells. Upon cleavage of the linker, the polyanion is released, locally unmasking the polyarginine and its inherent adhesiveness, thus “activating” the ACPP to traverse the membrane.
[0296] Sequences of the polynucleotides of the present disclosure may be prepared by various suitable methods known in the art, including, for example, direct chemical synthesis or cloning. For direct chemical synthesis, formation of a polymer of nucleic acids typically involves sequential addition of 3'-blocked and 5'-blocked nucleotide monomers to the terminal 5'-hydroxyl group of a growing nucleotide chain, wherein each addition is effected by nucleophilic attack of the terminal 5'-hydroxyl group of the growing chain on the 3'-position of the added monomer, which is typically a phosphorus derivative, such as a phosphotriester, phosphoramidite, or the like. Such methodology is known to those of ordinary skill in the art and is described in the pertinent texts and literature (e.g., in Matteucci et al., (1980) Tetrahedron Lett 21:719-722; U.S. Pat. Nos. 4,500,707; 5,436,327; and 5,700,637). In addition, the desired sequences may be isolated from natural sources by splitting DNA using appropriate restriction enzymes, separating the fragments using gel electrophoresis, and thereafter, recovering the desired polynucleotide sequence from the gel via techniques known to those of ordinary skill in the art, such as utilization of a polymerase chain reaction (PCR; e.g., U.S. Pat. No. 4,683,195).
[0297] The nucleic acids employed in the methods and compositions described herein may be codon optimized relative to a parental template for expression in a particular host cell. Cells differ in their usage of particular codons, and codon bias corresponds to relative abundance of particular tRNAs in a given cell type. By altering codons in a sequence so that they are tailored to match with the relative abundance of corresponding tRNAs, it is possible to increase expression of a product (e.g. a polypeptide) from a nucleic acid. Similarly, it is possible to decrease expression by deliberately choosing codons corresponding to rare tRNAs. Thus, codon optimization/deoptimization can provide control over nucleic acid expression in a particular cell type (e.g. bacterial cell, plant cell, mammalian cell, etc.). Methods of codon optimizing a nucleic acid for tailored expression in a particular cell type are well-known to those of skill in the art.
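By way of non-limiting illustration, the codon optimization/deoptimization described above can be sketched in Python as follows. The codon usage values in HYPOTHETICAL_USAGE and the parental sequence are invented for illustration and are not measured data for any host cell; the function names recode and translate are likewise hypothetical. The sketch assumes the standard genetic code and simply selects, for each amino acid, the synonymous codon with the highest (or lowest) assumed usage, whereas practical tools typically weigh additional factors.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# ASSUMPTION: made-up relative codon usage for an imaginary host cell.
HYPOTHETICAL_USAGE = {"AAA": 0.2, "AAG": 0.8, "CTG": 0.5, "CTA": 0.05, "TTA": 0.1}


def translate(cds):
    """Translate a DNA coding sequence, ignoring any trailing partial codon."""
    end = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, end, 3))


def recode(cds, usage, optimize=True):
    """Swap each codon for the most (or least) used synonymous codon.

    Codons whose amino acid has no entry in `usage` are left unchanged.
    """
    pick = max if optimize else min
    end = len(cds) - len(cds) % 3
    out = []
    for i in range(0, end, 3):
        codon = cds[i:i + 3]
        aa = CODON_TABLE[codon]
        synonyms = [c for c, a in CODON_TABLE.items() if a == aa and c in usage]
        out.append(pick(synonyms, key=usage.get) if synonyms else codon)
    return "".join(out)


parental = "ATGAAACTATAA"  # hypothetical parental template encoding M-K-L-stop
optimized = recode(parental, HYPOTHETICAL_USAGE, optimize=True)
deoptimized = recode(parental, HYPOTHETICAL_USAGE, optimize=False)
print(optimized, deoptimized, translate(optimized))
```

In this sketch both recoded sequences encode the same polypeptide as the parental template; only the codon choice changes.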
Guide RNAs
[0298] Certain aspects of the present disclosure relate to guide RNAs and their use in CRISPR-based targeting of a target nucleic acid. Guide RNAs of the present disclosure are capable of binding or otherwise interacting with a Cas12L polypeptide to facilitate targeting of the Cas12L polypeptide to a target nucleic acid. Suitable and exemplary guide RNAs are provided herein, and the design of such guide RNAs to target a particular nucleic acid will be readily apparent to one of skill in the art. Guide RNAs may also be modified to improve the efficiency of their function in guiding Cas12L to a target nucleic acid.
[0299] Guide RNAs of the present disclosure contain a CRISPR RNA (crRNA) sequence, and the sequence of the crRNA is involved in conferring specificity to targeting a specific nucleic acid sequence.
[0300] In some embodiments, guide RNA molecules may be extended to include sites for the binding of RNA binding proteins. In some embodiments, multiple guide RNAs can be assembled into a pre-crRNA array that can be processed by the RuvC domain of Cas12L. This allows for multiplex editing, i.e., simultaneous targeting of several sites.
[0301] In some embodiments, a guide RNA contains both RNA and a repeat sequence that is composed of DNA. In this sense, a guide RNA may be an RNA-DNA hybrid molecule.
[0302] A guide RNA (gRNA) may be expressed in a variety of ways as will be apparent to one of skill in the art. For example, a gRNA may be expressed from a recombinant nucleic acid in vivo, from a recombinant nucleic acid in vitro, from a recombinant nucleic acid ex vivo, or can be chemically synthesized.
[0303] A guide RNA of the present disclosure may have various nucleotide lengths. A guide RNA may contain, for example, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 110, at least 120, at least 130, at least 140, at least 150, at least 160, at least 170, at least 180, at least 190, or at least 200 or more nucleotides. Longer guide RNAs may result in increased editing efficiency by Cas12L polypeptides.
[0304] A guide RNA of the present disclosure may hybridize with a particular nucleotide sequence on a target nucleic acid. This hybridization may be 100% complementary or it may be less than 100% complementary so long as the hybridization is sufficient to allow Cas12L to bind to or interact with the target nucleic acid. A guide RNA may contain a nucleotide sequence that is, for example, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identical or complementary to the target nucleotide sequence in the target nucleic acid that is targeted by/to be hybridized with the guide RNA.
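As a non-limiting sketch, the percent complementarity between a guide sequence and an equal-length target sequence could be computed as shown below. The 20-nucleotide guide is a made-up example, the function name percent_complementarity is hypothetical, and the sketch assumes Watson-Crick pairing only (no G-U wobble), antiparallel strands, and an ungapped comparison.

```python
# RNA base in the guide -> DNA base it pairs with (Watson-Crick only; assumption).
PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}


def percent_complementarity(guide_rna, target_dna):
    """Percent of guide positions that base-pair with the target strand.

    Both sequences are written 5'->3'; the target is reversed so that
    position i of the guide faces its antiparallel partner.
    """
    guide = guide_rna.upper()
    target = target_dna.upper()[::-1]
    if not guide or len(guide) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum(PAIR.get(g) == t for g, t in zip(guide, target))
    return 100.0 * paired / len(guide)


guide = "GACUUGCAAUCGGAUCCAUG"  # hypothetical guide sequence
perfect_target = "".join(PAIR[b] for b in reversed(guide))
one_mismatch = "A" + perfect_target[1:]  # first target base changed from C to A

print(percent_complementarity(guide, perfect_target))
print(percent_complementarity(guide, one_mismatch))
```

A single mismatch in a 20-nucleotide duplex corresponds to 95% complementarity, which falls within the ranges recited above.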
[0305] In some cases, increasing expression of a guide RNA may increase the editing efficiency of a target nucleic acid according to the methods of the present disclosure. In some cases, use of a Pol II promoter (e.g. a CmYLCV promoter) to drive gRNA expression may result in increased expression of the guide RNA as compared to a corresponding control promoter (e.g. a Pol III promoter, such as a U6 promoter for example). Use of a Pol II promoter to drive gRNA expression may increase the expression of the guide RNA by, for example, at least about 1%, at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, at least about 100%, at least about 125%, at least about 150%, at least about 175%, at least about 200%, at least about 225%, at least about 250%, at least about 275%, or at least about 300% or more as compared to a corresponding control (e.g. a U6 promoter).
[0306] In some embodiments, a guide RNA of the present disclosure may be recombinantly fused with a ribozyme sequence to assist in gRNA processing. Exemplary ribozymes for use herein will be readily apparent to one of skill in the art. Exemplary ribozymes may include, for example, a Hammerhead-type ribozyme and a hepatitis delta virus ribozyme. Use of a ribozyme to assist in processing of guide RNAs may increase efficiency of editing of a target nucleic acid sequence by a Cas12L polypeptide of the present disclosure. Use of a ribozyme fused to a gRNA may increase relative editing efficiency by, for example, at least about 1%, at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, at least about 100%, at least about 125%, at least about 150%, at least about 175%, at least about 200%, at least about 225%, at least about 250%, at least about 275%, or at least about 300% or more as compared to a corresponding control (e.g. a guide RNA that is expressed without the assistance of any additional processing machinery).
Methods of Identifying Sequence Similarity
[0307] Various methods are known to those of skill in the art for identifying similar (e.g. homologs, orthologs, paralogs, etc.) polypeptide and/or polynucleotide sequences, including phylogenetic methods, sequence similarity analysis, and hybridization methods.
[0308] Phylogenetic trees may be created for a gene family by using a program such as CLUSTAL (Thompson et al. Nucleic Acids Res. 22: 4673-4680 (1994); Higgins et al. Methods Enzymol 266: 383-402 (1996)) or MEGA (Tamura et al. Mol. Biol. & Evo. 24:1596-1599 (2007)). Once an initial tree for genes from one species is created, potential orthologous sequences can be placed in the phylogenetic tree and their relationships to genes from the species of interest can be determined. Evolutionary relationships may also be inferred using the Neighbor-Joining method (Saitou and Nei, Mol. Biol. & Evo. 4:406-425 (1987)). Homologous sequences may also be identified by a reciprocal BLAST strategy. Evolutionary distances may be computed using the Poisson correction method (Zuckerkandl and Pauling, pp. 97-166 in Evolving Genes and Proteins, edited by V. Bryson and H.J. Vogel. Academic Press, New York (1965)).
[0309] In addition, evolutionary information may be used to predict gene function. Functional predictions of genes can be greatly improved by focusing on how genes became similar in sequence (i.e. by evolutionary processes) rather than on the sequence similarity itself (Eisen, Genome Res. 8: 163-167 (1998)). Many specific examples exist in which gene function has been shown to correlate well with gene phylogeny (Eisen, Genome Res. 8: 163-167 (1998)). By using a phylogenetic analysis, one skilled in the art would recognize that the ability to deduce similar functions conferred by closely-related polypeptides is predictable.
[0310] When a group of related sequences is analyzed using a phylogenetic program such as CLUSTAL, closely related sequences typically cluster together or in the same clade (a group of similar genes). Groups of similar genes can also be identified with pair-wise BLAST analysis (Feng and Doolittle, J. Mol. Evol. 25: 351-360 (1987)). Analysis of groups of similar genes with similar function that fall within one clade can yield sub-sequences that are particular to the clade. These sub-sequences, known as consensus sequences, can be used not only to define the sequences within each clade, but also to define the functions of these genes; genes within a clade may contain paralogous sequences, or orthologous sequences that share the same function (see also, for example, Mount, Bioinformatics: Sequence and Genome Analysis, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., page 543 (2001)).
[0311] To find sequences that are homologous to a reference sequence, BLAST nucleotide searches can be performed with the BLASTN program, score=100, wordlength=12, to obtain nucleotide sequences homologous to a nucleotide sequence encoding a protein of the disclosure. BLAST protein searches can be performed with the BLASTX program, score=50, wordlength=3, to obtain amino acid sequences homologous to a protein or polypeptide of the disclosure. To obtain gapped alignments for comparison purposes, Gapped BLAST (in BLAST 2.0) can be utilized as described in Altschul et al. (1997) Nucleic Acids Res. 25:3389. Alternatively, PSI-BLAST (in BLAST 2.0) can be used to perform an iterated search that detects distant relationships between molecules. See Altschul et al. (1997) supra. When utilizing BLAST, Gapped BLAST, or PSI-BLAST, the default parameters of the respective programs (e.g., BLASTN for nucleotide sequences, BLASTX for proteins) can be used.
[0312] Methods for the alignment of sequences and for the analysis of similarity and identity of polypeptide and polynucleotide sequences are well-known in the art.
[0313] As used herein “sequence identity” refers to the percentage of residues that are identical in the same positions in the sequences being analyzed. As used herein “sequence similarity” refers to the percentage of residues that have similar biophysical/biochemical characteristics in the same positions (e.g. charge, size, hydrophobicity) in the sequences being analyzed.
[0314] Methods of alignment of sequences for comparison are well-known in the art, including manual alignment and computer assisted sequence alignment and analysis. This latter approach is a preferred approach in the present disclosure, due to the increased throughput afforded by computer assisted methods. As noted below, a variety of computer programs for performing sequence alignment are available, or can be produced by one of skill.
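By way of illustration only, the distinction drawn above between sequence identity and sequence similarity can be sketched for two already-aligned amino acid sequences. The residue groupings in SIMILAR_GROUPS are an assumed, simplified stand-in for "similar biophysical/biochemical characteristics" (alignment programs generally rely on a substitution matrix instead), the two aligned sequences are hypothetical, and the choice to divide by the full alignment length, with gap columns counting as neither identical nor similar, is one of several possible conventions.

```python
# ASSUMPTION: coarse residue groups sharing charge, size, or hydrophobicity.
SIMILAR_GROUPS = [set("KRH"), set("DE"), set("AVLIM"), set("FWY"), set("STNQ")]


def _is_similar(x, y):
    """True if residues are identical or fall in the same assumed group."""
    return x == y or any(x in g and y in g for g in SIMILAR_GROUPS)


def identity_and_similarity(aligned_a, aligned_b):
    """Return (percent identity, percent similarity) over the alignment length.

    Inputs must be pre-aligned, equal-length strings using '-' for gaps.
    Similarity as computed here includes the identical positions.
    """
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be non-empty and equal length")
    identical = similar = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if "-" in (x, y):
            continue  # a gap column is neither identical nor similar
        identical += x == y
        similar += _is_similar(x, y)
    n = len(aligned_a)
    return 100.0 * identical / n, 100.0 * similar / n


# Hypothetical aligned sequences: 4 identical columns, 3 conservative
# substitutions (K/R, V/I, R/K), and 1 gap column out of 8.
print(identity_and_similarity("MKLVDE-R", "MRLIDEAK"))
```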
[0315] The determination of percent sequence identity and/or similarity between any two sequences can be accomplished using a mathematical algorithm. Examples of such mathematical algorithms are the algorithm of Myers and Miller, CABIOS 4: 11-17 (1988); the local homology algorithm of Smith et al., Adv. Appl. Math. 2:482 (1981); the homology alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48:443-453 (1970); the search-for-similarity-method of Pearson and Lipman, Proc. Natl. Acad. Sci. 85:2444-2448 (1988); the algorithm of Karlin and Altschul, Proc. Natl. Acad. Sci. USA 87:2264-2268 (1990), modified as in Karlin and Altschul, Proc. Natl. Acad. Sci. USA 90:5873-5877 (1993).
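As one non-limiting example of such a mathematical algorithm, a minimal global alignment in the style of Needleman and Wunsch can be sketched as follows. The scoring values (match +1, mismatch -1, linear gap penalty -1) and the function name needleman_wunsch are assumptions chosen for illustration; the programs referenced herein use their own default scoring schemes, and this sketch is not a reproduction of any of them.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two sequences by dynamic programming.

    Returns (aligned_a, aligned_b, score). Scoring values are illustrative.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


print(needleman_wunsch("ACGT", "AGT"))
```

The aligned strings returned by such a routine can then be compared column by column to obtain a percent identity.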
[0316] Computer implementations of these mathematical algorithms can be utilized for comparison of sequences to determine sequence identity and/or similarity. Such implementations include, for example: CLUSTAL in the PC/Gene program (available from Intelligenetics, Mountain View, Calif.); the AlignX program, version 10.3.0 (Invitrogen, Carlsbad, CA) and GAP, BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Version 8 (available from Genetics Computer Group (GCG), 575 Science Drive, Madison, Wis., USA). Alignments using these programs can be performed using the default parameters. The CLUSTAL program is well described by Higgins et al. Gene 73:237-244 (1988); Higgins et al. CABIOS 5:151-153 (1989); Corpet et al., Nucleic Acids Res. 16:10881-90 (1988); Huang et al. CABIOS 8:155-65 (1992); and Pearson et al., Meth. Mol. Biol. 24:307-331 (1994). The BLAST programs of Altschul et al. J. Mol. Biol. 215:403-410 (1990) are based on the algorithm of Karlin and Altschul (1990) supra.
[0317] Polynucleotides homologous to a reference sequence can be identified by hybridization to each other under stringent or under highly stringent conditions. Single stranded polynucleotides hybridize when they associate based on a variety of well characterized physical-chemical forces, such as hydrogen bonding, solvent exclusion, base stacking and the like. The stringency of a hybridization reflects the degree of sequence identity of the nucleic acids involved, such that the higher the stringency, the more similar are the two polynucleotide strands. Stringency is influenced by a variety of factors, including temperature, salt concentration and composition, organic and non-organic additives, solvents, etc. present in both the hybridization and wash solutions and incubations (and number thereof), as described in more detail in references cited below (e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. ("Sambrook") (1989); Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology, vol. 152 Academic Press, Inc., San Diego, Calif. ("Berger and Kimmel") (1987); and Anderson and Young, "Quantitative Filter Hybridisation." In: Hames and Higgins, eds., Nucleic Acid Hybridisation, A Practical Approach. Oxford, IRL Press, 73-111 (1985)).
[0318] Encompassed by the disclosure are polynucleotide sequences that are capable of hybridizing to the disclosed polynucleotide sequences and fragments thereof under various conditions of stringency (see, for example, Wahl and Berger, Methods Enzymol. 152: 399-407 (1987); and Kimmel, Methods Enzymol. 152: 507-511 (1987)). Full length cDNA, homologs, orthologs, and paralogs of polynucleotides of the present disclosure may be identified and isolated using well-known polynucleotide hybridization methods.
[0319] With regard to hybridization, conditions that are highly stringent, and means for achieving them, are well known in the art. See, for example, Sambrook et al. (1989) (supra); Berger and Kimmel (1987) pp. 467-469 (supra); and Anderson and Young (1985)(supra).
[0320] Hybridization experiments are generally conducted in a buffer of pH between 6.8 and 7.4, although the rate of hybridization is nearly independent of pH at ionic strengths likely to be used in the hybridization buffer (Anderson and Young (1985)(supra)). In addition, one or more of the following may be used to reduce non-specific hybridization: sonicated salmon sperm DNA or another non-complementary DNA, bovine serum albumin, sodium pyrophosphate, sodium dodecylsulfate (SDS), polyvinyl-pyrrolidone, ficoll and Denhardt's solution. Dextran sulfate and polyethylene glycol 6000 act to exclude DNA from solution, thus raising the effective probe DNA concentration and the hybridization signal within a given unit of time. In some instances, conditions of even greater stringency may be desirable or required to reduce non-specific and/or background hybridization. These conditions may be created with the use of higher temperature, lower ionic strength and higher concentration of a denaturing agent such as formamide.
[0321] Stringency conditions can be adjusted to screen for moderately similar fragments such as homologous sequences from distantly related organisms, or to highly similar fragments such as genes that duplicate functional enzymes from closely related organisms. The stringency can be adjusted either during the hybridization step or in the post-hybridization washes. Salt concentration, formamide concentration, hybridization temperature and probe lengths are variables that can be used to alter stringency. As a general guideline, high stringency is typically performed at Tm-5°C to Tm-20°C, moderate stringency at Tm-20°C to Tm-35°C and low stringency at Tm-35°C to Tm-50°C for duplex >150 base pairs. Hybridization may be performed at low to moderate stringency (25-50°C below Tm), followed by post-hybridization washes at increasing stringencies. Maximum rates of hybridization in solution are determined empirically to occur at Tm-25°C for DNA-DNA duplex and Tm-15°C for RNA-DNA duplex. Optionally, the degree of dissociation may be assessed after each wash step to determine the need for subsequent, higher stringency wash steps.
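The general guideline above can be expressed, purely as an illustrative sketch, as a lookup on the difference between the melting temperature (Tm) and the working temperature. The function names, the example Tm of 85°C, and the assignment of the shared boundary values (a difference of exactly 20°C is treated as high stringency and exactly 35°C as moderate) are assumptions; Tm itself is taken as an input and is not calculated here.

```python
def classify_stringency(temp_c, tm_c):
    """Classify a hybridization or wash temperature relative to Tm.

    Follows the guideline ranges for duplexes >150 base pairs:
    high = Tm-5 to Tm-20, moderate = Tm-20 to Tm-35, low = Tm-35 to Tm-50.
    """
    delta = tm_c - temp_c  # degrees C below Tm
    if 5 <= delta <= 20:
        return "high"
    if 20 < delta <= 35:
        return "moderate"
    if 35 < delta <= 50:
        return "low"
    return "outside guideline ranges"


# Offsets below Tm at which the rate of hybridization in solution is maximal.
MAX_RATE_OFFSET_C = {"DNA-DNA": 25, "RNA-DNA": 15}


def max_rate_temperature(tm_c, duplex):
    """Temperature of maximal hybridization rate for the given duplex type."""
    return tm_c - MAX_RATE_OFFSET_C[duplex]


tm = 85  # hypothetical Tm in degrees C for an example duplex
for temp in (70, 55, 42):
    print(temp, classify_stringency(temp, tm))
print(max_rate_temperature(tm, "DNA-DNA"), max_rate_temperature(tm, "RNA-DNA"))
```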
[0322] High stringency conditions may be used to select for nucleic acid sequences with high degrees of identity to the disclosed sequences. An example of stringent hybridization conditions obtained in a filter-based method such as a Southern or northern blot for hybridization of complementary nucleic acids that have more than 100 complementary residues is about 5°C to 20°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH.
[0323] Hybridization and wash conditions that may be used to bind and remove polynucleotides with less than the desired homology to the nucleic acid sequences or their complements of the present disclosure include, for example: 6X saline sodium citrate (SSC) and 1% sodium dodecyl sulfate (SDS) at 65°C; 50% formamide, 4X SSC at 42°C; 0.5X SSC to 2.0 X SSC, 0.1% SDS at 50°C to 65°C; or 0.1X SSC to 2X SSC, 0.1% SDS at 50°C - 65°C; with a first wash step of, for example, 10 minutes at about 42°C with about 20% (v/v) formamide in 0.1X SSC, and with, for example, a subsequent wash step with 0.2 X SSC and 0.1% SDS at 65°C for 10, 20 or 30 minutes. A 20X solution of SSC is 3 M sodium chloride and 300 mM trisodium citrate, pH 7.0.
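Because 20X SSC is defined above as 3 M sodium chloride and 300 mM trisodium citrate, the composition of any working dilution follows by simple proportion. The following sketch, with hypothetical function names, performs that arithmetic and also computes the volume of 20X stock needed for a given final volume; values are rounded to three decimal places to suppress floating-point noise.

```python
STOCK_FOLD = 20            # the stock solution is 20X SSC
STOCK_NACL_MM = 3000.0     # 3 M sodium chloride in 20X SSC
STOCK_CITRATE_MM = 300.0   # 300 mM trisodium citrate in 20X SSC


def ssc_composition_mM(fold):
    """Return (NaCl mM, trisodium citrate mM) for a `fold`X SSC solution."""
    factor = fold / STOCK_FOLD
    return round(STOCK_NACL_MM * factor, 3), round(STOCK_CITRATE_MM * factor, 3)


def stock_volume_ml(fold, final_volume_ml):
    """Volume of 20X stock (mL) to dilute to `final_volume_ml` of `fold`X SSC."""
    return round(final_volume_ml * fold / STOCK_FOLD, 3)


for fold in (6, 2, 0.5, 0.2, 0.1):
    print(fold, ssc_composition_mM(fold))
print(stock_volume_ml(0.2, 1000))
```

For example, 0.2X SSC works out to 30 mM sodium chloride and 3 mM trisodium citrate, and 0.1X SSC to 15 mM sodium chloride and 1.5 mM trisodium citrate.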
[0324] For identification of less closely related homologs, wash steps may be performed at a lower temperature, e.g., 50°C. An example of a low stringency wash step employs a solution and conditions of at least 25°C in 30 mM NaCl, 3 mM trisodium citrate, and 0.1% SDS over 30 min. Greater stringency may be obtained at 42°C in 15 mM NaCl, with 1.5 mM trisodium citrate, and 0.1% SDS over 30 min. Wash procedures will generally employ at least two final wash steps. Additional variations on these conditions will be readily apparent to those skilled in the art (see, for example, US Patent Application No. 20010010913).
[0325] If desired, one may employ wash steps of even greater stringency, including conditions of 65°C-68°C in a solution of 15 mM NaCl, 1.5 mM trisodium citrate, and 0.1% SDS, or about 0.2X SSC, 0.1% SDS at 65°C and washing twice, each wash step of 10, 20 or 30 min in duration, or about 0.1X SSC, 0.1% SDS at 65°C and washing twice for 10, 20 or 30 min. Hybridization stringency may be increased further by using the same conditions as in the hybridization steps, with the wash temperature raised about 3°C to about 5°C, and stringency may be increased even further by using the same conditions except the wash temperature is raised about 6°C to about 9°C.
Recombinant Expression
[0326] Recombinant nucleic acids and/or recombinant polypeptides of the present disclosure may be present in host cells (e.g. plant cells). In some embodiments, recombinant nucleic acids are present in an expression vector and may encode a recombinant polypeptide, and the expression vector may be present in host cells (e.g. plant cells). In some embodiments, recombinant nucleic acids and/or recombinant polypeptides are present in host cells (e.g. plant cells) via direct introduction into the cell (e.g. via RNPs).
[0327] In some embodiments, the genes encoding the recombinant polypeptides in the plant cell may be heterologous to the plant cell. In certain embodiments, the plant cell does not naturally produce one or more polypeptides of the present disclosure, and contains heterologous nucleic acid constructs capable of expressing one or more genes necessary for producing those molecules. In certain embodiments, the plant cell does not naturally produce one or more polypeptides of the present disclosure, and is provided the one or more polypeptides through exogenous delivery of the polypeptides directly to the plant cell without the need to express a recombinant nucleic acid encoding the recombinant polypeptide in the plant cell.
[0328] Recombinant polypeptides of the present disclosure may be introduced into host cells (e.g. plant cells) via any suitable methods known in the art. For example, a Cas12L polypeptide can be exogenously added to plant cells and the plant cells are maintained under conditions such that the recombinant polypeptide is targeted (via a guide RNA) to one or more target nucleic acids to edit/modify the target nucleic acids in the plant cells. Alternatively, a recombinant nucleic acid encoding a Cas12L polypeptide of the present disclosure can be expressed in plant cells and the plant cells are maintained under conditions such that the Cas12L polypeptide is targeted (via a guide RNA) to one or more target nucleic acids to edit/modify the target nucleic acids in the plant cells. Additionally, in some embodiments, a Cas12L polypeptide of the present disclosure may be transiently expressed in a plant via viral infection of the plant, or by introducing a Cas12L polypeptide-encoding RNA into a plant to facilitate editing/modification of a target nucleic acid of interest. This approach may be particularly well-suited for Cas12L-based editing given that the small size of Cas12L proteins may make them more amenable to delivery via virus. Methods of introducing proteins via viral infection or via the introduction of RNAs into plants are well known in the art. For example, Tobacco rattle virus (TRV) has been successfully used to introduce zinc finger nucleases in plants to cause genome modification (“Nontransgenic Genome Modification in Plant Cells”, Plant Physiology 154:1079-1087 (2010)). TRV and other appropriate viruses may be used herein to facilitate editing in plant cells.
[0329] In some embodiments, a Cas12L polypeptide and a guide RNA may be exogenously and directly supplied to a plant cell as a ribonucleoprotein (RNP) complex. This particular form of delivery is useful for facilitating transgene-free editing in plants. Modified guide RNAs which are resistant to nuclease digestion could also be used in this approach. Transgene-free callus from plant cells provided with an RNP could be used to regenerate whole edited plants.
[0330] A recombinant nucleic acid encoding a recombinant polypeptide of the present disclosure can be expressed in a plant with any suitable plant expression vector. Typical vectors useful for expression of recombinant nucleic acids in higher plants are well known in the art and include, for example, vectors derived from the tumor-inducing (Ti) plasmid of Agrobacterium tumefaciens (e.g., see Rogers et al., Meth. in Enzymol. (1987) 153:253-277). These vectors are plant integrating vectors in that on transformation, the vectors integrate a portion of vector DNA into the genome of the host plant. Exemplary A. tumefaciens vectors useful herein are plasmids pKYLX6 and pKYLX7 (e.g., see Schardl et al., Gene (1987) 61:1-11; and Berger et al., Proc. Natl. Acad. Sci. USA (1989) 86:8402-8406); and plasmid pBI 101.2 that is available from Clontech Laboratories, Inc. (Palo Alto, CA).
[0331] In addition to regulatory domains, recombinant polypeptides of the present disclosure can be expressed as a fusion protein that is coupled to, for example, a maltose binding protein ("MBP"), glutathione S transferase (GST), hexahistidine, c-myc, or the FLAG epitope for ease of purification, monitoring expression, or monitoring cellular and subcellular localization.
[0332] Moreover, a recombinant nucleic acid encoding a recombinant polypeptide of the present disclosure can be modified to improve expression of the recombinant protein in plants by using codon preference/codon optimization to target preferential expression in plant cells. When the recombinant nucleic acid is prepared or altered synthetically, advantage can be taken of known codon preferences of the intended plant host where the nucleic acid is to be expressed. For example, recombinant nucleic acids of the present disclosure can be modified to account for the specific codon preferences and GC content preferences of monocotyledons and dicotyledons, as these preferences have been shown to differ (Murray et al., Nucl. Acids Res. (1989) 17: 477-498).
[0333] The present disclosure further provides expression vectors encoding recombinant polypeptides of the present disclosure. A nucleic acid sequence coding for the desired recombinant nucleic acid of the present disclosure can be used to construct a recombinant expression vector which can be introduced into the desired host cell. A recombinant expression vector will typically contain a nucleic acid encoding a recombinant protein of the present disclosure, operably linked to transcriptional initiation regulatory sequences which will direct the transcription of the nucleic acid in the intended host cell, such as tissues of a transformed plant.
[0334] Recombinant nucleic acids e.g. encoding recombinant polypeptides of the present disclosure may be expressed on multiple expression vectors or they may be expressed on a single expression vector. For example, plant expression vectors may include (1) a cloned gene under the transcriptional control of 5' and 3' regulatory sequences and (2) a dominant selectable marker. Such plant expression vectors may also contain, if desired, a promoter regulatory region (e.g., one conferring inducible or constitutive, environmentally-regulated or developmentally-regulated expression, or cell- or tissue-specific/selective expression), a transcription initiation start site, a ribosome binding site, an RNA processing signal, a transcription termination site, and/or a polyadenylation signal.
[0335] In some embodiments, expression of a nucleic acid of the present disclosure may be driven (in operable linkage) with a promoter (e.g. a promoter functional in plants or a plant-specific promoter). A promoter generally refers to a DNA sequence that contains an RNA polymerase binding site, transcription start site, and/or TATA box and assists or promotes the transcription and expression of an associated transcribable polynucleotide sequence such as, for example, a gene. A plant promoter, or functional fragment thereof, can be employed to e.g. control the expression of a recombinant nucleic acid of the present disclosure in regenerated plants. The selection of the promoter used in expression vectors will determine the spatial and temporal expression pattern of the recombinant nucleic acid in the modified plant, e.g., the nucleic acid encoding the recombinant polypeptide of the present disclosure is only expressed in the desired tissue or at a certain time in plant development or growth. Certain promoters will express recombinant nucleic acids in all plant tissues and are active under most environmental conditions and states of development or cell differentiation (i.e., constitutive promoters). Other promoters will express recombinant nucleic acids in specific cell types (such as leaf epidermal cells, mesophyll cells, root cortex cells) or in specific tissues or organs (roots, leaves or flowers, for example) and the selection will reflect the desired location of accumulation of the gene product. Alternatively, the selected promoter may drive expression of the recombinant nucleic acid under various inducing conditions.
[0336] Examples of suitable constitutive promoters may include, for example, the core promoter of the Rsyn7, the core CaMV 35S promoter (Odell et al., Nature (1985) 313:810-812), CaMV 19S (Lawton et al., 1987), rice actin (Wang et al., 1992; U.S. Pat. No. 5,641,876; and McElroy et al., Plant Cell (1985) 2:163-171); ubiquitin (Christensen et al., Plant Mol. Biol. (1989) 12:619-632; and Christensen et al., Plant Mol. Biol. (1992) 18:675-689), pEMU (Last et al., Theor. Appl. Genet. (1991) 81:581-588), MAS (Velten et al., EMBO J. (1984) 3:2723-2730), nos (Ebert et al., 1987), Adh (Walker et al., 1987), the 1'- or 2'-promoter derived from T-DNA of Agrobacterium tumefaciens, the Smas promoter, the cinnamyl alcohol dehydrogenase promoter (U.S. Pat. No. 5,683,439), the Nos promoter, the pEmu promoter, the rubisco promoter, the GRP1-8 promoter, and other transcription initiation regions from various plant genes known to skilled artisans, and constitutive promoters described in, for example, U.S. Pat. Nos. 5,608,149; 5,608,144; 5,604,121; 5,569,597; 5,466,785; 5,399,680; 5,268,463; and 5,608,142.
[0337] In some embodiments, expression of a nucleic acid of the present disclosure may be driven (in operable linkage) with a UBQ10 promoter. In some embodiments, expression of a nucleic acid of the present disclosure may be driven (in operable linkage) with a promoter having a nucleic acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% nucleic acid sequence identity to the following UBQ10 promoter sequence:
CGACGAGTCAGTAATAAACGGCGTCAAAGTGGTTGCAGCCGGCACACACGAGTCGTGTTTA TCAACTCAAAGCACAAATACTTTTCCTCAACCTAAAAATAAGGCAATTAGCCAAAAACAACT TTGCGTGTAAACAACGCTCAATACACGTGTCATTTTATTATTAGCTATTGCTTCACCGCCTTA GCTTTCTCGTGACCTAGTCGTCCTCGTCTTTTCTTCTTCTTCTTCTATAAAACAATACCCAAAG AGCTCTTCTTCTTCACAATTCAGATTTCAATTTCTCAAAATCTTAAAAACTTTCTCTCAATTCT CTCTACCGTGATCAAGGTAAATTTCTGTGTTCCTTATTCTCTCAAAATCTTCGATTTTGTTTTC GTTCGATCCCAATTTCGTATATGTTCTTTGGTTTAGATTCTGTTAATCTTAGATCGAAGACGA TTTTCTGGGTTTGATCGTTAGATATCATCTTAATTCTCGATTAGGGTTTCATAGATATCATCC GATTTGTTCAAATAATTTGAGTTTTGTCGAATAATTACTCTTCGATTTGTGATTTCTATCTAGA TCTGGTGTTAGTTTCTAGTTTGTGCGATCGAATTTGTAGATTAATCTGAGTTTTTCTGATTAAC A (SEQ ID NO:1).
[0338] In some cases, a UBQ10 promoter comprises the following nucleotide sequence: CGACGAGTCAGTAATAAACGGCGTCAAAGTGGTTGCAGCCGGCACACACGAGTCGTGTTTA TCAACTCAAAGCACAAATACTTTTCCTCAACCTAAAAATAAGGCAATTAGCCAAAAACAACT TTGCGTGTAAACAACGCTCAATACACGTGTCATTTTATTATTAGCTATTGCTTCACCGCCTTA GCTTTCTCGTGACCTAGTCGTCCTCGTCTTTTCTTCTTCTTCTTCTATAAAACAATACCCAAAG AGCTCTTCTTCTTCACAATTCAGATTTCAATTTCTCAAAATCTTAAAAACTTTCTCTCAATTCT
CTCTACCGTGATCAAGGTAAATTTCTGTGTTCCTTATTCTCTCAAAATCTTCGATTTTGTTTTC GTTCGATCCCAATTTCGTATATGTTCTTTGGTTTAGATTCTGTTAATCTTAGATCGAAGACGA TTTTCTGGGTTTGATCGTTAGATATCATCTTAATTCTCGATTAGGGTTTCATAGATATCATCC GATTTGTTCAAATAATTTGAGTTTTGTCGAATAATTACTCTTCGATTTGTGATTTCTATCTAGA TCTGGTGTTAGTTTCTAGTTTGTGCGATCGAATTTGTAGATTAATCTGAGTTTTTCTGATTAAC A (SEQ ID NO:1).
[0339] In some cases, expression of a nucleic acid of the present disclosure may be driven with a UBQ10 promoter (i.e., the nucleic acid is operably linked to a UBQ10 promoter) having a nucleic acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% nucleic acid sequence identity to the UBQ10 promoter sequence depicted in FIG. 6 (SEQ ID NO:5).
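The percent-identity thresholds recited above can be illustrated with a short calculation. The following Python sketch is illustrative only: the disclosure does not specify an alignment algorithm or an identity formula, so the sketch assumes the two sequences have already been aligned (equal length, with "-" marking gaps) and counts identity over aligned columns. The function names percent_identity and meets_threshold are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns.

    Assumes both inputs are already aligned (same length, '-' for gaps).
    Columns where both sequences have a gap are ignored; a gap opposite
    a base counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column carries no information
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / columns


def meets_threshold(aligned_a: str, aligned_b: str, threshold_percent: float) -> bool:
    """True if the aligned pair reaches the given percent-identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold_percent
```

In practice the alignment itself would come from a dedicated alignment tool; the choice of tool and gap treatment can change the reported identity, which is why those choices are stated as assumptions here.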
[0340] Recombinant nucleic acids of the present disclosure may be expressed using an RNA Polymerase III (Pol III) promoter such as, for example, the U6 promoter or the H1 promoter (eLife 2013 2:e00471). For example, an approach in plants has been described using three different Pol III promoters from three different Arabidopsis U6 genes, and their corresponding gene terminators (BMC Plant Biology 2014 14:327). One skilled in the art would readily understand that many additional Pol III promoters could be utilized to, for example, simultaneously express many guide RNAs targeted to many different locations in the genome. The use of different Pol III promoters for each gRNA expression cassette may be desirable to reduce the chances of natural gene silencing that can occur when multiple copies of identical sequences are expressed in plants.
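The rationale for pairing each gRNA expression cassette with a different Pol III promoter can be sketched as a simple bookkeeping step. The Python sketch below is illustrative only and is not a method described in the disclosure: the function names and the promoter and guide labels are hypothetical placeholders. It assigns promoters to guides in order and reports any promoter that had to be reused, since reuse introduces the repeated sequences that may favor silencing.

```python
from itertools import cycle


def assign_promoters(guide_names, promoter_names):
    """Pair each guide with a promoter, cycling only if guides outnumber promoters.

    Guide names are assumed to be unique (they are used as dictionary keys).
    """
    if not promoter_names:
        raise ValueError("at least one promoter is required")
    return dict(zip(guide_names, cycle(promoter_names)))


def reused_promoters(assignment):
    """Return the promoters that drive more than one guide, sorted by name."""
    counts = {}
    for promoter in assignment.values():
        counts[promoter] = counts.get(promoter, 0) + 1
    return sorted(p for p, n in counts.items() if n > 1)


# Hypothetical labels; not promoter names taken from the disclosure.
plan = assign_promoters(["guide_1", "guide_2", "guide_3"], ["U6_a", "U6_b", "U6_c"])
```

With three guides and three promoters, reused_promoters(plan) is empty; adding a fourth guide would cause the first promoter to be reported as reused.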
[0341] In some cases, expression of a nucleic acid of the present disclosure may be driven (in operable linkage) with a U6 promoter. In some embodiments, expression of a nucleic acid of the present disclosure may be driven (in operable linkage) with a promoter having a nucleic acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% nucleic acid sequence identity to the following U6 promoter sequence: AAGGTCGGGCAGGAAGAGGGCCTATTTCCCATGATTCCTTCATATTTGCATATACGATACAA GGCTGTTAGAGAGATAATTAGAATTAATTTGACTGTAAACACAAAGATATTAGTACAAAATA CGTGACGTAGAAAGTAATAATTTCTTGGGTAGTTTGCAGTTTTAAAATTATGTTTTAAAATGG ACTATCATATGCTTACCGTAACTTGAAAGTATTTCGATTTCTTGGCTTTATATATCTTGTGGA AAGGACG (SEQ ID NO:4).
[0342] A U6 promoter can have the following nucleotide sequence: AAGGTCGGGCAGGAAGAGGGCCTATTTCCCATGATTCCTTCATATTTGCATATACGATACAA GGCTGTTAGAGAGATAATTAGAATTAATTTGACTGTAAACACAAAGATATTAGTACAAAATA CGTGACGTAGAAAGTAATAATTTCTTGGGTAGTTTGCAGTTTTAAAATTATGTTTTAAAATGG ACTATCATATGCTTACCGTAACTTGAAAGTATTTCGATTTCTTGGCTTTATATATCTTGTGGA AAGGACG (SEQ ID NO:4).
[0343] In some cases, a nucleic acid comprises a nucleotide sequence that is operably linked to a U6 promoter having a nucleic acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% nucleic acid sequence identity to the following AtU626 promoter sequence:
AAGCTTCGTTGAACAACGGAAACTCGACTTGCCTTCCGCACAATACATCATTTCTTCTTAGCT TTTTTTCTTCTTCTTCGTTCATACAGTTTTTTTTTGTTTATCAGCTTACATTTTCTTGAACCGTA GCTTTCGTTTTCTTCTTTTTAACTTTCCATTCGGAGTTTTTGTATCTTGTTTCATAGTTTGTCCC AGGATTAGAATGATTAGGCATCGAACCTTCAAGAATTTGATTGAATAAAACATCTTCATTCT TAAGATATGAAGATAATCTTCAAAAGGCCCCTGGGAATCTGAAAGAAGAGAAGCAGGCCCA TTTATATGGGAAAGAACAATAGTATTTCTTATATAGGCCCATTTAAGTTGAAAACAATCTTC AAAAGTCCCACATCGCTTAGATAAGAAAACGAAGCTGAGTTTATATACAGCTAGAGTCGAA GTAGTGATT (SEQ ID NO:6).
[0344] Recombinant nucleic acids of the present disclosure may be expressed using an RNA Polymerase II (Pol II) promoter such as, for example, the CmYLCV promoter and the 35S promoter. See, e.g., Sahoo et al. (2014) Planta 240:855. Use of a Pol II promoter to drive expression of nucleic acids (e.g. guide RNA expression) may provide additional flexibility for controlling the strength/degree of expression and may provide the possibility of tissue-specific expression. One skilled in the art would recognize appropriate Pol II promoters for use in the methods and compositions of the present disclosure.
[0345] In some cases, expression of a nucleic acid of the present disclosure may be driven (in operable linkage) with a CmYLCV promoter. CmYLCV promoters are described in, e.g., WO 2001/073087; and Sahoo et al. (2016) Methods Mol. Biol. 1482:111. In some cases, expression of a nucleic acid of the present disclosure may be driven (in operable linkage) with a promoter having a nucleic acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100%, nucleic acid sequence identity to the following CmYLCV promoter nucleotide sequence:
TGGCAGACATACTGTCCCACAAATGAAGATGGAATCTGTAAAAGAAAACGCGTGAAATAAT GCGTCTGACAAAGGTTAGGTCGGCTGCCTTTAATCAATACCAAAGTGGTCCCTACCACGATG GAAAAACTGTGCAGTCGGTTTGGCTTTTTCTGACGAACAAATAAGATTCGTGGCCGACAGGT GGGGGTCCACCATGTGAAGGCATCTTCAGACTCCAATAATGGAGCAATGACGTAAGGGCTT ACGAAATAAGTAAGGGTAGTTTGGGAAATGTCCACTCACCCGTCAGTCTATAAATACTTAGC CCCTCCCTCATTGTTAAGGGAGCAAAATCTCAGAGAGATAGTCCTAGAGAGAGAAAGAGAG CAAGTAGCCTAGAAGTAGTCAAGGCGGCGAAGTATTCAGGCACGTGGCCAGGAAGAAGAA AAGCCAAGACGACGAAAACAGGTAAGAGCTAAGCTT (SEQ ID NO:2).
[0346] In some cases, expression of a nucleic acid of the present disclosure may be driven (in operable linkage) with a Cauliflower mosaic virus 35S promoter (CaMV 35S promoter). In some cases, a nucleic acid of the present disclosure comprises a nucleotide sequence operably linked to a promoter having a nucleic acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100%, nucleic acid sequence identity to the following CaMV 35S promoter nucleotide sequence:
GGTCAACATGGTGGAGCACGACACACTTGTCTACTCCAAAAATATCAAAGATACAGTCTCAG AAGACCAAAGGGCAATTGAGACTTTTCAACAAAGGGTAATATCCGGAAACCTCCTCGGATT CCATTGCCCAGCTATCTGTCACTTTATTGTGAAGATAGTGGAAAAGGAAGGTGGCTCCTACA AATGCCATCATTGCGATAAAGGAAAGGCCATCGTTGAAGATGCCTCTGCCGACAGTGGTCCC AAAGATGGACCCCCACCCACGAGGAGCATCGTGGAAAAAGAAGACGTTCCAACCACGTCTT CAAAGCAAGTGGATTGATGTGATAACATGGTGGAGCACGACACACTTGTCTACTCCAAAAA TATCAAAGATACAGTCTCAGAAGACCAAAGGGCAATTGAGACTTTTCAACAAAGGGTAATA TCCGGAAACCTCCTCGGATTCCATTGCCCAGCTATCTGTCACTTTATTGTGAAGATAGTGGAA AAGGAAGGTGGCTCCTACAAATGCCATCATTGCGATAAAGGAAAGGCCATCGTTGAAGATG CCTCTGCCGACAGTGGTCCCAAAGATGGACCCCCACCCACGAGGAGCATCGTGGAAAAAGA AGACGTTCCAACCACGTCTTCAAAGCAAGTGGATTGATGTGATATCTCCACTGACGTAAGGG ATGACGCACAATCCCACTATCCTTCGCAAGACCCTTCCTCTATATAAGGAAGTTCATTTCATT TGGAGAGGACCTCGACTCTAGAGGATCCC (SEQ ID NO:3). This promoter sequence is referred to as the “2x35S” promoter.
[0347] In some cases, a CaMV 35S promoter has the following nucleotide sequence: GGTCAACATGGTGGAGCACGACACACTTGTCTACTCCAAAAATATCAAAGATACAGTCTCAG AAGACCAAAGGGCAATTGAGACTTTTCAACAAAGGGTAATATCCGGAAACCTCCTCGGATT CCATTGCCCAGCTATCTGTCACTTTATTGTGAAGATAGTGGAAAAGGAAGGTGGCTCCTACA AATGCCATCATTGCGATAAAGGAAAGGCCATCGTTGAAGATGCCTCTGCCGACAGTGGTCCC AAAGATGGACCCCCACCCACGAGGAGCATCGTGGAAAAAGAAGACGTTCCAACCACGTCTT CAAAGCAAGTGGATTGATGTGATAACATGGTGGAGCACGACACACTTGTCTACTCCAAAAA TATCAAAGATACAGTCTCAGAAGACCAAAGGGCAATTGAGACTTTTCAACAAAGGGTAATA TCCGGAAACCTCCTCGGATTCCATTGCCCAGCTATCTGTCACTTTATTGTGAAGATAGTGGAA AAGGAAGGTGGCTCCTACAAATGCCATCATTGCGATAAAGGAAAGGCCATCGTTGAAGATG CCTCTGCCGACAGTGGTCCCAAAGATGGACCCCCACCCACGAGGAGCATCGTGGAAAAAGA AGACGTTCCAACCACGTCTTCAAAGCAAGTGGATTGATGTGATATCTCCACTGACGTAAGGG ATGACGCACAATCCCACTATCCTTCGCAAGACCCTTCCTCTATATAAGGAAGTTCATTTCATT TGGAGAGGACCTCGACTCTAGAGGATCCC (SEQ ID NO:3).
[0348] In some embodiments, expression of a nucleic acid of the present disclosure may be driven (in operable linkage) with a 2x35S promoter. In some embodiments, expression of a nucleic acid of the present disclosure may be driven (in operable linkage) with a promoter having a nucleic acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% nucleic acid sequence identity to the nucleic acid sequence of SEQ ID NO:3.
[0349] Examples of suitable tissue specific promoters may include, for example, the lectin promoter (Vodkin et al., 1983; Lindstrom et al., 1990), the corn alcohol dehydrogenase 1 promoter (Vogel et al., 1989; Dennis et al., 1984), the corn light harvesting complex promoter (Simpson, 1986; Bansal et al., 1992), the corn heat shock protein promoter (Odell et al., Nature (1985) 313:810-812; Rochester et al., 1986), the pea small subunit RuBP carboxylase promoter (Poulsen et al., 1986; Cashmore et al., 1983), the Ti plasmid mannopine synthase promoter (Langridge et al., 1989), the Ti plasmid nopaline synthase promoter (Langridge et al., 1989), the petunia chalcone isomerase promoter (Van Tunen et al., 1988), the bean glycine rich protein 1 promoter (Keller et al., 1989), the truncated CaMV 35S promoter (Odell et al., Nature (1985) 313:810-812), the potato patatin promoter (Wenzler et al., 1989), the root cell promoter (Conkling et al., 1990), the maize zein promoter (Reina et al., 1990; Kriz et al., 1987; Wandelt and Feix, 1989; Langridge and Feix, 1983; Reina et al., 1990), the globulin-1 promoter (Belanger and Kriz et al., 1991), the α-tubulin promoter, the cab promoter (Sullivan et al., 1989), the PEPCase promoter (Hudspeth & Grula, 1989), the R gene complex-associated promoters (Chandler et al., 1989), and the chalcone synthase promoters (Franken et al., 1991).
[0350] Alternatively, the plant promoter can direct expression of a recombinant nucleic acid of the present disclosure in a specific tissue or may be otherwise under more precise environmental or developmental control. Such promoters are referred to here as “inducible” promoters. Environmental conditions that may affect transcription by inducible promoters include, for example, pathogen attack, anaerobic conditions, or the presence of light.
Examples of inducible promoters include, for example, the Adh1 promoter which is inducible by hypoxia or cold stress, the Hsp70 promoter which is inducible by heat stress, and the PPDK promoter which is inducible by light. Examples of promoters under developmental control include, for example, promoters that initiate transcription only, or preferentially, in certain tissues, such as leaves, roots, fruit, seeds, or flowers. An exemplary promoter is the anther specific promoter 5126 (U.S. Pat. Nos. 5,689,049 and 5,689,051). The operation of a promoter may also vary depending on its location in the genome. Thus, an inducible promoter may become fully or partially constitutive in certain locations.
[0351] Moreover, any combination of a constitutive or inducible promoter, and a non-tissue specific or tissue specific promoter may be used to control the expression of various recombinant polypeptides of the present disclosure.
[0352] The recombinant nucleic acids of the present disclosure and/or a vector housing a recombinant nucleic acid of the present disclosure, may also contain a regulatory sequence that serves as a 3’ terminator sequence. A terminator sequence generally refers to a nucleic acid sequence that marks the end of a gene or transcribable nucleic acid during transcription. One of skill in the art would readily recognize a variety of terminators that may be used in the recombinant nucleic acids of the present disclosure. For example, a recombinant nucleic acid of the present disclosure may contain a 3’ NOS terminator. In some embodiments, recombinant nucleic acids of the present disclosure contain a transcriptional termination site. Transcription termination sites may include, for example, OCS terminators, rbcS-E9 terminators, NOS terminators, HSP18.2 terminators, and poly-T terminators.
[0353] In some embodiments, a nucleic acid of the present disclosure may contain a transcriptional termination site having a nucleic acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% nucleic acid sequence identity to the nucleic acid sequence of a 35S terminator, a HSP18 terminator, and/or an RbcS-E9 terminator.
[0354] Recombinant nucleic acids of the present disclosure may include one or more introns. Introns may be included in e.g. recombinant nucleic acids being expressed on a vector in a host cell. The inclusion of one or more introns in a recombinant nucleic acid to be expressed may be particularly helpful to increase expression in plant cells.
[0355] Recombinant nucleic acids of the present disclosure may also contain selectable markers. A selectable marker can be used to assist in the selection of transformed cells or tissue due to the presence of a selection agent, such as an antibiotic or herbicide, where the selectable marker gene provides tolerance or resistance to the selection agent. Thus, the selection agent can bias or favor the survival, development, growth, proliferation, etc., of transformed cells expressing the selectable marker gene. Selectable marker genes may include, for example, those conferring tolerance or resistance to antibiotics, such as kanamycin and paromomycin (nptII), hygromycin B (aph IV), streptomycin or spectinomycin (aadA) and gentamycin (aac3 and aacC4), or those conferring tolerance or resistance to herbicides such as glufosinate (bar or pat), dicamba (DMO) and glyphosate (aroA or Cp4-EPSPS). Selectable marker genes which provide an ability to visually screen for transformants may also be used such as, for example, luciferase or green fluorescent protein (GFP), or a gene expressing a beta glucuronidase or uidA gene (GUS) for which various chromogenic substrates are known. In some embodiments, a nucleic acid molecule provided herein contains a selectable marker gene selected from the group consisting of nptII, aph IV, aadA, aac3, aacC4, bar, pat, DMO, EPSPS, aroA, luciferase, GFP, and GUS.
Plants and Plant Cells
[0356] Certain aspects of the present disclosure relate to plants and plant cells that contain Cas12L polypeptides that are targeted to one or more target nucleic acids in the plant/plant cell in order to edit/modify the target nucleic acid.
[0357] As used herein, a “plant” refers to any of various photosynthetic, eukaryotic multi-cellular organisms of the kingdom Plantae, characteristically producing embryos, containing chloroplasts, having cellulose cell walls and lacking locomotion. As used herein, a “plant” includes any plant or part of a plant at any stage of development, including seeds, suspension cultures, plant cells, embryos, meristematic regions, callus tissue, leaves, roots, shoots, gametophytes, sporophytes, pollen, microspores, and progeny thereof. Also included are cuttings, and cell or tissue cultures. As used in conjunction with the present disclosure, plant tissue includes, for example, whole plants, plant cells, plant organs, e.g., leaves, stems, roots, meristems, plant seeds, protoplasts, callus, cell cultures, and any groups of plant cells organized into structural and/or functional units.
[0358] Various plant cells may be used in the present disclosure so long as they remain viable after being transformed or otherwise modified to express recombinant nucleic acids or house recombinant polypeptides. Preferably, the plant cell is not adversely affected by the transduction of the necessary nucleic acid sequences, the subsequent expression of the proteins or the resulting intermediates.
[0359] As disclosed herein, a broad range of plant types may be modified to incorporate recombinant polypeptides and/or polynucleotides of the present disclosure. Suitable plants that may be modified include both monocotyledonous (monocot) plants and dicotyledonous (dicot) plants.
[0360] Examples of suitable plants may include, for example, species of the Family Gramineae, including Sorghum bicolor and Zea mays; species of the genera: Cucurbita, Rosa, Vitis, Juglans, Fragaria, Lotus, Medicago, Onobrychis, Trifolium, Trigonella, Vigna, Citrus, Linum, Geranium, Manihot, Daucus, Arabidopsis, Brassica, Raphanus, Sinapis, Atropa, Capsicum, Datura, Hyoscyamus, Lycopersicon, Nicotiana, Solanum, Petunia, Digitalis, Majorana, Cichorium, Helianthus, Lactuca, Bromus, Asparagus, Antirrhinum, Heterocallis, Nemesis, Pelargonium, Panicum, Pennisetum, Ranunculus, Senecio, Salpiglossis, Cucumis, Browaalia, Glycine, Pisum, Phaseolus, Lolium, Oryza, Avena, Hordeum, Secale, and Triticum.
[0361] In some embodiments, plant cells may include, for example, those from corn (Zea mays), canola (Brassica napus, Brassica rapa ssp.), Brassica species useful as sources of seed oil, alfalfa (Medicago sativa), rice (Oryza sativa), rye (Secale cereale), sorghum (Sorghum bicolor, Sorghum vulgare), millet (e.g., pearl millet (Pennisetum glaucum), proso millet (Panicum miliaceum), foxtail millet (Setaria italica), finger millet (Eleusine coracana)), sunflower (Helianthus annuus), safflower (Carthamus tinctorius), wheat (Triticum aestivum), duckweed (Lemna), soybean (Glycine max), tobacco (Nicotiana tabacum), potato (Solanum tuberosum), peanuts (Arachis hypogaea), cotton (Gossypium barbadense, Gossypium hirsutum), sweet potato (Ipomoea batatas), cassava (Manihot esculenta), coffee (Coffea spp.), coconut (Cocos nucifera), pineapple (Ananas comosus), citrus trees (Citrus spp.), cocoa (Theobroma cacao), tea (Camellia sinensis), banana (Musa spp.), avocado (Persea americana), fig (Ficus carica), guava (Psidium guajava), mango (Mangifera indica), olive (Olea europaea), papaya (Carica papaya), cashew (Anacardium occidentale), macadamia (Macadamia spp.), almond (Prunus amygdalus), sugar beets (Beta vulgaris), sugarcane (Saccharum spp.), oats, barley, vegetables, ornamentals, and conifers.
[0362] Examples of suitable vegetable plants may include, for example, tomatoes (Lycopersicon esculentum), lettuce (e.g., Lactuca sativa), green beans (Phaseolus vulgaris), lima beans (Phaseolus limensis), peas (Lathyrus spp.), and members of the genus Cucumis such as cucumber (C. sativus), cantaloupe (C. cantalupensis), and musk melon (C. melo).
[0363] Examples of suitable ornamental plants may include, for example, azalea (Rhododendron spp.), hydrangea (Hydrangea macrophylla), hibiscus (Hibiscus rosa-sinensis), roses (Rosa spp.), tulips (Tulipa spp.), daffodils (Narcissus spp.), petunias (Petunia hybrida), carnation (Dianthus caryophyllus), poinsettia (Euphorbia pulcherrima), and chrysanthemum.
[0364] Examples of suitable conifer plants may include, for example, loblolly pine (Pinus taeda), slash pine (Pinus elliotii), ponderosa pine (Pinus ponderosa), lodgepole pine (Pinus contorta), Monterey pine (Pinus radiata), Douglas-fir (Pseudotsuga menziesii), Western hemlock (Tsuga canadensis), Sitka spruce (Picea glauca), redwood (Sequoia sempervirens), silver fir (Abies amabilis), balsam fir (Abies balsamea), Western red cedar (Thuja plicata), and Alaska yellow-cedar (Chamaecyparis nootkatensis).
[0365] Examples of suitable leguminous plants may include, for example, guar, locust bean, fenugreek, soybean, garden beans, cowpea, mungbean, lima bean, fava bean, lentils, chickpea, peanuts (Arachis sp.), crown vetch (Vicia sp.), hairy vetch, adzuki bean, lupine (Lupinus sp.), trifolium, common bean (Phaseolus sp.), field bean (Pisum sp.), clover (Melilotus sp.), Lotus, trefoil, lens, and false indigo.
[0366] Examples of suitable forage and turf grass may include, for example, alfalfa (Medicago sp.), orchard grass, tall fescue, perennial ryegrass, creeping bent grass, and redtop.
[0367] Examples of suitable crop plants and model plants may include, for example, Arabidopsis, corn, rice, alfalfa, sunflower, canola, soybean, cotton, peanut, sorghum, wheat, tobacco, and lemna.
[0368] The plants and plant cells of the present disclosure may be genetically modified in that recombinant nucleic acids have been introduced into the plants, and as such the genetically modified plants and/or plant cells do not occur in nature. A suitable plant of the present disclosure is e.g. one capable of expressing one or more nucleic acid constructs encoding one or more recombinant proteins. The recombinant proteins encoded by the nucleic acids may be e.g. Cas12L polypeptides.
[0369] As used herein, the terms “transgenic plant” and “genetically modified plant” are used interchangeably and refer to a plant which contains within its genome a recombinant nucleic acid. Generally, the recombinant nucleic acid is stably integrated within the genome such that the polynucleotide is passed on to successive generations. However, in certain embodiments, the recombinant nucleic acid is transiently expressed in the plant. The recombinant nucleic acid may be integrated into the genome alone or as part of a recombinant expression cassette. “Transgenic” is used herein to include any cell, cell line, callus, tissue, plant part or plant, the genotype of which has been altered by the presence of exogenous nucleic acid including those transgenics initially so altered as well as those created by sexual crosses or asexual propagation from the initial transgenic.
[0370] Plant transformation protocols as well as protocols for introducing recombinant nucleic acids of the present disclosure into plants may vary depending on the type of plant or plant cell, e.g., monocot or dicot, targeted for transformation. Suitable methods of introducing recombinant nucleic acids of the present disclosure into plant cells and subsequent insertion into the plant genome include, for example, microinjection (Crossway et al., Biotechniques (1986) 4:320-334), electroporation (Riggs et al., Proc. Natl. Acad. Sci. USA (1986) 83:5602-5606), Agrobacterium-mediated transformation (U.S. Pat. No. 5,563,055), direct gene transfer (Paszkowski et al., EMBO J. (1984) 3:2717-2722), and ballistic particle acceleration (U.S. Pat. No. 4,945,050; Tomes et al. (1995). "Direct DNA Transfer into Intact Plant Cells via Microprojectile Bombardment," in Plant Cell, Tissue, and Organ Culture: Fundamental Methods, ed. Gamborg and Phillips (Springer-Verlag, Berlin); and McCabe et al., Biotechnology (1988) 6:923-926).
[0371] Additionally, recombinant polypeptides of the present disclosure can be targeted to a specific organelle within a plant cell. Targeting can be achieved by providing the recombinant protein with an appropriate targeting peptide sequence. Examples of such targeting peptides include, for example, secretory signal peptides (for secretion or cell wall or membrane targeting), plastid transit peptides, chloroplast transit peptides, mitochondrial target peptides, vacuole targeting peptides, nuclear targeting peptides, and the like (e.g., see Reiss et al., Mol. Gen. Genet. (1987) 209(1):116-121; Settles and Martienssen, Trends Cell Biol (1998) 12:494-501; Scott et al., J Biol Chem (2000) 10:1074; and Luque and Correas, J Cell Sci (2000) 113:2485-2495).
[0372] Modified plants may be grown in accordance with conventional methods (e.g., see McCormick et al., Plant Cell Reports (1986) 81-84). These plants may then be grown, and pollinated with either the same transformed strain or different strains, with the resulting hybrid having the desired phenotypic characteristic. Two or more generations may be grown to ensure that the subject phenotypic characteristic is stably maintained and inherited and then seeds harvested to ensure the desired phenotype or other property has been achieved.
[0373] The present disclosure also provides plants derived from plants having an edited/modified nucleic acid as a consequence of the methods of the present disclosure. A plant having an edited/modified nucleic acid as a consequence of the methods of the present disclosure may be crossed with itself or with another plant to produce an F1 plant. In some embodiments, one or more of the resulting F1 plants may also have an edited/modified nucleic acid. Accordingly, in some embodiments, provided are progeny plants that are the progeny (either directly or indirectly) of plants having an edited/modified nucleic acid as a consequence of the methods of the present disclosure. These progeny plants may also have an edited/modified nucleic acid. Progeny plants may also have an altered or modified phenotype as compared to a corresponding control plant.
[0374] Further provided are methods of screening plants derived from plants having an edited/modified nucleic acid as a consequence of the methods of the present disclosure. In some embodiments, the derived plants (e.g. F1 or F2 plants resulting from or derived from crossing the plant having an edited/modified nucleic acid expression as a consequence of the methods of the present disclosure with another plant) can be selected from a population of derived plants. For example, provided are methods of selecting one or more of the derived plants that (i) lack recombinant nucleic acids, and (ii) have an edited/modified nucleic acid. Because the edit/modification of the target nucleic acid may be heritable, progeny plants as described herein do not necessarily need to contain a Cas12L polypeptide and/or a guide RNA in order to maintain the edit/modification to the target nucleic acid.
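The two selection criteria stated above, (i) absence of recombinant nucleic acids and (ii) presence of the edit, can be expressed as a simple filter over genotyping results. The Python sketch below is illustrative only: the PlantRecord fields and the function name are hypothetical, and the sketch assumes that each derived plant has already been assayed for both properties (for example, by PCR for the recombinant nucleic acid and by sequencing of the target locus; the disclosure does not prescribe particular assays here).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlantRecord:
    plant_id: str
    has_transgene: bool  # assumed result of an assay for the recombinant nucleic acid
    has_edit: bool       # assumed result of genotyping the target nucleic acid


def select_transgene_free_edited(plants):
    """Keep derived plants that lack the recombinant nucleic acid and carry the edit."""
    return [p for p in plants if not p.has_transgene and p.has_edit]


# Hypothetical population of derived (e.g. F2) plants.
population = [
    PlantRecord("plant_01", has_transgene=True, has_edit=True),
    PlantRecord("plant_02", has_transgene=False, has_edit=True),
    PlantRecord("plant_03", has_transgene=False, has_edit=False),
]
selected = select_transgene_free_edited(population)
```

In this hypothetical population only plant_02 satisfies both criteria.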
[0375] Plants with genetic backgrounds that are susceptible to transgene silencing may exhibit reduced Cas12L-mediated editing efficiency. It may thus be desirable, in some embodiments, to employ a genetic background that has reduced or eliminated susceptibility to transgene silencing. In some embodiments, employing a genetic background with reduced or eliminated susceptibility to transgene silencing may improve editing efficiency. Exemplary genetic backgrounds with reduced or eliminated susceptibility to transgene silencing will be readily apparent to one of skill in the art and include, for example, plants with mutations in RDR6 that reduce or eliminate RDR6 expression or function.
[0376] Conducting the methods of the present disclosure in a plant with a genetic background that reduces or eliminates susceptibility to transgene silencing may increase the relative editing efficiency of a target nucleic acid by, for example, at least about 1%, at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, at least about 100%, at least about 125%, at least about 150%, at least about 175%, at least about 200%, at least about 225%, at least about 250%, at least about 275%, or at least about 300% or more as compared to a corresponding control (e.g. a wild-type plant).
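One way to read the relative increases listed above is as a percent change against the control. The Python sketch below is illustrative only: the disclosure does not give a formula, so the sketch assumes that relative increase is computed as (test - control) / control x 100, with both efficiencies measured in the same units. The function name is hypothetical.

```python
def relative_increase_percent(efficiency_test: float, efficiency_control: float) -> float:
    """Percent increase of the test editing efficiency over the control.

    Assumes both efficiencies use the same units (e.g. percent of edited reads)
    and that the control efficiency is greater than zero.
    """
    if efficiency_control <= 0:
        raise ValueError("control efficiency must be positive")
    return 100.0 * (efficiency_test - efficiency_control) / efficiency_control
```

Under this assumption, a control efficiency of 10 and a test efficiency of 30 correspond to a 200% relative increase, and equal efficiencies correspond to 0%.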
Methods of Modifying a Target Nucleic Acid
[0377] The present disclosure provides methods of modifying a target nucleic acid in a eukaryotic cell; the methods generally involve contacting a target nucleic acid in the eukaryotic cell with a Cas12L polypeptide of the present disclosure and a guide RNA. The methods may further comprise use of a donor DNA. Suitable eukaryotic cells include mammalian cells, plant cells, insect cells, arachnid cells, protozoan cells, fish cells, fungal cells, yeast cells, amphibian cells, reptile cells, and avian cells.
[0378] In some cases, the eukaryotic cell is a plant cell. Growing and/or cultivation conditions sufficient for the recombinant polypeptides and/or polynucleotides of the present disclosure to be expressed and/or maintained in the plant/plant cell and to be targeted to and edit/modify one or more target nucleic acids of the present disclosure are well known in the art and include any suitable growing conditions disclosed herein. Typically, the plant is grown under conditions sufficient to express a Cas12L polypeptide of the present disclosure, and for the expressed Cas12L polypeptides to be localized to the nucleus of cells of the plant in order to be targeted to and edit/modify the target nucleic acids (if those target nucleic acids are present in the nucleus). Generally, the conditions sufficient for the expression of the Cas12L polypeptide will depend on the promoter used to control the expression of the Cas12L polypeptide. For example, if an inducible promoter is utilized, expression of the recombinant polypeptide in a plant will require that the plant be grown in the presence of the inducer.
Growth Conditions
[0379] As noted above, growing conditions sufficient for the recombinant polypeptides of the present disclosure to be expressed and/or maintained in the plant and to be targeted to one or more target nucleic acids to edit/modify the one or more target nucleic acids may vary depending on a number of factors (e.g. species of plant, use of inducible promoter, etc.). Suitable growing conditions may include, for example, ambient environmental conditions, standard laboratory conditions, standard greenhouse conditions, growth in long days under standard environmental conditions (e.g. 16 hours of light, 8 hours of dark), growth in 12 hour light: 12 hour dark day/night cycles, etc.
[0380] Plants and/or plant cells of the present disclosure housing a Cas12L polypeptide and a guide RNA may be maintained at a variety of temperatures. In general, the temperature should be sufficient for the Cas12L polypeptide and guide RNA to form, maintain, or otherwise be present as a complex that is able to target a target nucleic acid in order to edit/modify the target nucleic acids. Exemplary growth/cultivation temperatures include, for example, at least about 20°C, at least about 21°C, at least about 22°C, at least about 23°C, at least about 24°C, at least about 25°C, at least about 26°C, at least about 27°C, at least about 28°C, at least about 29°C, at least about 30°C, at least about 31°C, at least about 32°C, at least about 33°C, at least about 34°C, at least about 35°C, at least about 36°C, at least about 37°C, at least about 38°C, at least about 39°C, or at least about 40°C. Exemplary growth/cultivation temperatures include, for example, about 20°C to about 25°C, about 25°C to about 30°C, about 30°C to about 35°C, or about 35°C to about 40°C. Plants and plant cells may be maintained at a constant temperature throughout the duration of the growth and/or incubation period, or the temperature schedule can be adjusted at various points throughout the duration of the growth and/or incubation period as will be readily apparent to one of skill in the art depending on the particular growth and/or incubation purpose.
[0381] Various time frames may be used to observe editing/modification of a target nucleic acid according to the methods of the present disclosure. Plants and/or plant cells may be observed/assayed for editing/modification of a target nucleic acid after, for example, about 30 minutes, about 45 minutes, about 1 hour, about 2.5 hours, about 5 hours, about 7.5 hours, about 10 hours, about 15 hours, about 20 hours, about 1 day, about 5 days, about 10 days, about 15 days, about 20 days, about 25 days, about 30 days, about 35 days, about 40 days, about 45 days, about 50 days, or about 55 days or more after being cultivated/grown in conditions sufficient for a Cas12L polypeptide to facilitate editing/modification of a target nucleic acid.
Editing/Modifying a Target Nucleic Acid
[0382] Certain aspects of the present disclosure relate to editing or modifying a target nucleic acid using Cas12L polypeptides. In some embodiments, a Cas12L polypeptide is used to create a mutation in a target nucleic acid. Mutation of a nucleic acid generally refers to an insertion, deletion, substitution, duplication, or inversion of one or more nucleotides in the nucleic acid as compared to a reference or control nucleotide sequence.
[0383] In some embodiments, a Cas12L polypeptide of the present disclosure may induce a double-stranded break (DSB) at a target site of a nucleic acid sequence that is then repaired by the natural processes of either homologous recombination (HR) or non-homologous end-joining (NHEJ). Sequence modifications, such as for example insertions and deletions, can occur at the DSB locations via NHEJ repair. If two DSBs flanking one target region are created, the breaks can be repaired via NHEJ by reversing the orientation of the targeted DNA (also referred to as an “inversion”). HR can be used to integrate a donor nucleic acid sequence into a target site. In one aspect, a double-stranded break provided herein is repaired by NHEJ. In another aspect, a double-stranded break provided herein is repaired by HR.
[0384] In some embodiments, a Cas12L polypeptide of the present disclosure may induce a double-stranded break with 5’ nucleotide overhangs at a target site of a nucleic acid sequence such that an exogenous DNA segment of interest can serve as the donor nucleic acid to be ligated into the target nucleic acid. The presence of 5’ nucleotide overhangs allows the insertion of the exogenous DNA to be directional.
[0385] In some embodiments, a nucleic acid that encodes a polypeptide may be targeted and edited such that the modification to the nucleic acid results in a change to one or more codons in the encoded polypeptide. In some embodiments, the modification of the target nucleic acid may result in deletion of one or more codons in the encoded polypeptide.
[0386] A target nucleic acid of the present disclosure may be edited or modified in a variety of ways (e.g. deletion of nucleotides in the target nucleic acid) depending on the particular application as will be readily apparent to one of skill in the art. A target nucleic acid subjected to the methods of the present disclosure may have an edit or modification of at least 1 nucleotide, at least 2 nucleotides, at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, at least 22 nucleotides, at least 23 nucleotides, at least 24 nucleotides, or at least 25 nucleotides or more.
[0387] A target nucleic acid of the present disclosure may have its expression decreased/downregulated as compared to a corresponding control nucleic acid. A target nucleic acid of the present disclosure in a cell may have its expression decreased/downregulated by at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% as compared to a corresponding control. Various controls will be readily apparent to one of skill in the art. For example, a control may be a corresponding plant or plant cell that does not contain recombinant polypeptides of the present disclosure (e.g. wild-type plant or plant cell).
[0388] A target nucleic acid may have its expression decreased/downregulated at least about 5%, at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or more, as compared to a corresponding control nucleic acid. As stated above, various controls will be readily apparent to one of skill in the art. For example, a control nucleic acid may be a corresponding nucleic acid from a plant or plant cell that does not contain a nucleic acid encoding a recombinant polypeptide of the present disclosure.
[0389] A target nucleic acid of the present disclosure may have its expression increased/upregulated/activated as compared to a corresponding control nucleic acid. A target nucleic acid of the present disclosure in a cell may have its expression increased/upregulated/activated by at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, at least about 100% (or two-fold), at least 2.5-fold, at least 5-fold, at least 10-fold, at least 25-fold, at least 50-fold, at least 75-fold, at least 100-fold, or more than 100-fold, as compared to a corresponding control. Various controls will be readily apparent to one of skill in the art. For example, a control may be a corresponding plant or plant cell that does not contain recombinant polypeptides of the present disclosure (e.g. wild-type plant or plant cell). 
[0390] A target nucleic acid may have its expression increased/upregulated/activated at least about 1-fold, at least about 2-fold, at least about 3-fold, at least about 4-fold, at least about 5-fold, at least about 10-fold, at least about 15-fold, at least about 20-fold, at least about 25-fold, at least about 30-fold, at least about 40-fold, at least about 50-fold, at least about 75-fold, at least about 100-fold, at least about 150-fold, at least about 200-fold, at least about 300-fold, at least about 400-fold, at least about 500-fold, at least about 600-fold, at least about 700-fold, at least about 800-fold, at least about 900-fold, at least about 1,000-fold, at least about 1,250-fold, at least about 1,500-fold, at least about 1,750-fold, at least about 2,000-fold, at least about 2,500-fold, at least about 3,000-fold, at least about 3,500-fold, at least about 4,000-fold, at least about 4,500-fold, at least about 5,000-fold, at least about 5,500-fold, at least about 6,000-fold, at least about 6,500-fold, at least about 7,000-fold, at least about 7,500-fold, at least about 8,000-fold, at least about 8,500-fold, at least about 9,000-fold, at least about 9,500-fold, at least about 10,000-fold, at least about 12,000-fold, at least about 14,000-fold, at least about 16,000-fold, at least about 18,000-fold, or at least about 20,000-fold or more as compared to a corresponding control nucleic acid. As stated above, various controls will be readily apparent to one of skill in the art. For example, a control nucleic acid may be a corresponding nucleic acid from a plant or plant cell that does not contain a nucleic acid encoding a recombinant polypeptide of the present disclosure.
[0391] Certain aspects of the present disclosure relate to increasing the editing efficiency of a Cas12L polypeptide of the present disclosure. Editing frequency and efficiency, as well as methods of determining such, are well-known in the art. Generally speaking, editing efficiency is evaluated by determining the observed quantity of a given target sequence that experienced an editing event (editing frequency) as compared to the total quantity of the target sequence observed (whether edited or unedited). An increase in editing efficiency generally refers to an increase in the number of sequences experiencing an editing event (editing frequency) as compared to the total quantity of the target sequence observed (whether edited or unedited).
[0392] In some embodiments, increases in editing efficiency are compared to corresponding controls in relative terms (relative editing efficiency). For example, if the absolute editing frequency in one condition is 0.5% and the absolute editing frequency in a second condition is 1%, the second condition represents a doubling of the absolute editing frequency relative to the first condition, or in other words, the second condition represents a 100% increase in relative editing efficiency as compared to the first condition.
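By way of illustration only, the arithmetic described above can be sketched in a few lines of Python. The function names and the read counts are hypothetical and are not part of the disclosure; the sketch assumes that editing frequency is computed as edited target sequences divided by total target sequences observed, and that relative editing efficiency is the percent change of that frequency relative to a control condition.

```python
def editing_frequency(edited: int, total: int) -> float:
    """Fraction of observed target sequences that carry an editing event."""
    if total <= 0:
        raise ValueError("total must be a positive number of observed sequences")
    if not 0 <= edited <= total:
        raise ValueError("edited must be between 0 and total")
    return edited / total


def relative_increase_percent(control_freq: float, test_freq: float) -> float:
    """Percent change in editing frequency of a test condition versus a control."""
    if control_freq <= 0:
        raise ValueError("control frequency must be positive to form a ratio")
    return (test_freq - control_freq) / control_freq * 100.0


# Hypothetical counts chosen to reproduce the 0.5% versus 1% example above.
control = editing_frequency(edited=50, total=10_000)   # 0.5% absolute frequency
test = editing_frequency(edited=100, total=10_000)     # 1% absolute frequency
increase = relative_increase_percent(control, test)
print(f"relative editing efficiency increase: {increase:.1f}%")
```

With these assumed counts, a doubling of the absolute editing frequency corresponds to a 100% increase in relative editing efficiency, consistent with the example given in the preceding paragraph.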
[0393] The frequency or efficiency of editing of a target nucleic acid of the present disclosure may vary. For example, the particular promoter used to drive gRNA expression may influence the editing efficiency of a target nucleic acid. In some embodiments, use of a Pol II promoter (e.g. a CmYLCV promoter) to drive gRNA expression may result in increased editing efficiency as compared to a corresponding control promoter (e.g. a Pol III promoter, such as a U6 promoter for example). Use of a Pol II promoter to drive gRNA expression may increase the relative editing efficiency of a target nucleic acid by, for example, at least about 1%, at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, at least about 100%, at least about 125%, at least about 150%, at least about 175%, at least about 200%, at least about 225%, at least about 250%, at least about 275%, or at least about 300% or more as compared to a corresponding control (e.g. a U6 promoter).
[0394] Various conditions or variables described herein may improve editing efficiency of a Cas12L polypeptide as described herein (e.g. targeting a region of open chromatin for editing, use of a ribozyme in the gRNA targeting, performing editing in a plant genetic background that exhibits reduced transgene silencing, etc.) as compared to corresponding control conditions or variables. Various conditions or variables described herein may increase the relative editing efficiency of a target nucleic acid by, for example, at least about 1%, at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, at least about 100%, at least about 125%, at least about 150%, at least about 175%, at least about 200%, at least about 225%, at least about 250%, at least about 275%, or at least about 300% or more as compared to a corresponding control condition or variable. Applicable control conditions or variables will be readily apparent to one of skill in the art depending on the particular editing context. For example, the corresponding control may be as compared to a region of closed chromatin or heterochromatin, editing without the use of a ribozyme, and/or editing in a plant genetic background that exhibits relatively high transgene silencing.
[0395] Comparisons in the present disclosure may also be in reference to corresponding control cells. Various control cells will be readily apparent to one of skill in the art. For example, a control cell may be a cell that does not contain one or more of: (1) a Cas12L polypeptide, (2) a guide RNA, and/or (3) both a Cas12L polypeptide and a guide RNA.
[0396] Methods of probing the expression level of a nucleic acid are well-known to those of skill in the art. For example, quantitative reverse transcription-polymerase chain reaction (qRT-PCR) analysis may be used to determine the expression level of a population of nucleic acids isolated from a nucleic acid-containing sample.
KITS
[0397] Certain aspects of the present disclosure relate to an article of manufacture or kit comprising a polynucleotide, vector, cell, and/or composition described herein. In some embodiments, the kit further comprises a package insert comprising instructions for the use of the polynucleotide, vector, cell, and/or composition. In some embodiments, the article of manufacture or kit further comprises one or more buffers, e.g., for storing, transferring, or otherwise using the polynucleotide, vector, cell, and/or composition. In some embodiments, the kit further comprises one or more containers for storing the polynucleotide, vector, cell, and/or composition.
[0398] The foregoing written description is considered to be sufficient to enable one skilled in the art to practice the present disclosure. The following Examples are offered for illustrative purposes only, and are not intended to limit the scope of the present disclosure in any way. Indeed, various modifications of the present disclosure in addition to those shown and described herein will become apparent to those skilled in the art from the foregoing description and fall within the scope of the appended claims.
Examples of Non-Limiting Aspects of the Disclosure
[0399] Aspects, including embodiments, of the present subject matter described above may be beneficial alone or in combination, with one or more other aspects or embodiments. Without limiting the foregoing description, certain non-limiting aspects of the disclosure are provided below. As will be apparent to those of skill in the art upon reading this disclosure, each of the individually numbered aspects may be used or combined with any of the preceding or following individually numbered aspects. This is intended to provide support for all such combinations of aspects and is not limited to combinations of aspects explicitly provided below:
[0400] Aspect 1. A composition comprising:
[0401] a) a CRISPR-Cas effector polypeptide, or a nucleic acid comprising a nucleotide sequence encoding the CRISPR-Cas effector polypeptide, wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having at least 50% amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 5A-5M, wherein the CRISPR-Cas effector polypeptide has a length of from 600 amino acids to 850 amino acids; and
[0402] b) a CRISPR-Cas effector guide RNA, or one or more DNA molecules encoding the CRISPR-Cas effector guide RNA, wherein the CRISPR-Cas effector guide RNA comprises a target nucleic acid binding segment and a CRISPR-Cas effector polypeptide-binding segment, and wherein the target nucleic acid binding segment is heterologous to the CRISPR-Cas effector polypeptide-binding segment.
[0403] Aspect 2. The composition of aspect 1, wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having at least 80% amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 5A-5M.
[0404] Aspect 3. The composition of aspect 1, wherein the CRISPR-Cas effector guide RNA comprises a nucleotide sequence having 80%, 90%, 95%, 98%, 99%, or 100%, nucleotide sequence identity with any one of the nucleotide sequences depicted in FIG. 5A-5M, or is encoded by a nucleic acid comprising a nucleotide sequence having 80%, 90%, 95%, 98%, 99%, or 100%, nucleotide sequence identity with any one of the nucleotide sequences depicted in FIG. 5A-5M.
[0405] Aspect 4. The composition of any one of aspects 1-3, wherein the CRISPR-Cas effector polypeptide is fused to a nuclear localization signal (NLS).
[0406] Aspect 5. The composition of any one of aspects 1-4, wherein the composition comprises a lipid.
[0407] Aspect 6. The composition of any one of aspects 1-4, wherein a) and b) are within a liposome.
[0408] Aspect 7. The composition of any one of aspects 1-4, wherein a) and b) are within a particle.
[0409] Aspect 8. The composition of any one of aspects 1-7, comprising one or more of: a buffer, a nuclease inhibitor, and a protease inhibitor.
[0410] Aspect 9. The composition of any one of aspects 1-8, wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 96% or more identity to the amino acid sequence depicted in any one of FIG. 5A-5M.
[0411] Aspect 10. The composition of any one of aspects 1-9, wherein the CRISPR-Cas effector polypeptide is a nickase that can cleave only one strand of a double-stranded target nucleic acid molecule.
[0412] Aspect 11. The composition of any one of aspects 1-9, wherein the CRISPR-Cas effector polypeptide is a catalytically inactive CRISPR-Cas effector polypeptide (dCRISPR-Cas effector).
[0413] Aspect 12. The composition of any one of aspects 1-11, wherein the CRISPR-Cas effector polypeptide has a length of from 600 amino acids to 800 amino acids.
[0414] Aspect 13. The composition of any one of aspects 1-12, further comprising a DNA donor template.
[0415] Aspect 14. The composition of any one of aspects 1-13, wherein the CRISPR-Cas effector guide RNA is a single molecule.
[0416] Aspect 15. The composition of any one of aspects 1-14, wherein the CRISPR-Cas effector guide RNA comprises one or more of a base modification, a sugar modification, and a backbone modification.
[0417] Aspect 16. A CRISPR-Cas effector fusion polypeptide comprising:
[0418] a) a CRISPR-Cas effector polypeptide comprising an amino acid sequence having at least 50% amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 5A-5M, wherein the CRISPR-Cas effector polypeptide comprises a RuvC-like domain, and wherein the CRISPR-Cas effector polypeptide has a length of from 250 amino acids to 500 amino acids; and
[0419] b) one or more heterologous polypeptides.
[0420] Aspect 17. The CRISPR-Cas effector fusion polypeptide of aspect 16, wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 80% or more identity to the amino acid sequence depicted in any one of FIG. 5A-5M.
[0421] Aspect 18. The CRISPR-Cas effector fusion polypeptide of aspect 16, wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 90% or more identity to the amino acid sequence depicted in any one of FIG. 5A-5M.
[0422] Aspect 19. The CRISPR-Cas effector fusion polypeptide of any one of aspects 16-18, wherein the CRISPR-Cas effector polypeptide is a nickase that can cleave only one strand of a double-stranded target nucleic acid molecule.
[0423] Aspect 20. The CRISPR-Cas effector fusion polypeptide of any one of aspects 16-18, wherein the CRISPR-Cas effector polypeptide is a catalytically inactive CRISPR-Cas effector polypeptide (dCRISPR-Cas effector).
[0424] Aspect 21. The CRISPR-Cas effector fusion polypeptide of any one of aspects 16-20, wherein the CRISPR-Cas effector polypeptide has a length of from 600 amino acids to 800 amino acids.
[0425] Aspect 22. The CRISPR-Cas effector fusion polypeptide of any one of aspects 16-21, wherein the heterologous polypeptide is fused to the N-terminus and/or the C-terminus of the CRISPR-Cas effector polypeptide.
[0426] Aspect 23. The CRISPR-Cas effector fusion polypeptide of any one of aspects 16-22, comprising a nuclear localization signal (NLS).
[0427] Aspect 24. The CRISPR-Cas effector fusion polypeptide of any one of aspects 16-23, wherein the heterologous polypeptide is a targeting polypeptide that provides for binding to a cell surface moiety on a target cell or target cell type.
[0428] Aspect 25. The CRISPR-Cas effector fusion polypeptide of any one of aspects 16-23, wherein the heterologous polypeptide exhibits enzymatic activity.
[0429] Aspect 26. The CRISPR-Cas effector fusion polypeptide of aspect 25, wherein the heterologous polypeptide exhibits one or more enzymatic activities selected from: nuclease activity, methyltransferase activity, demethylase activity, DNA repair activity, DNA damage activity, deamination activity, dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer forming activity, integrase activity, transposase activity, recombinase activity, polymerase activity, ligase activity, helicase activity, photolyase activity and glycosylase activity.
[0430] Aspect 27. The CRISPR-Cas effector fusion polypeptide of aspect 25, wherein the heterologous polypeptide exhibits one or more enzymatic activities selected from: reverse transcriptase activity, nuclease activity, methyltransferase activity, demethylase activity, deamination activity, depurination activity, integrase activity, transposase activity, and recombinase activity.
[0431] Aspect 28. The CRISPR-Cas effector fusion polypeptide of any one of aspects 16-23, wherein the heterologous polypeptide exhibits an enzymatic activity that modifies a target polypeptide associated with a target nucleic acid.
[0432] Aspect 29. The CRISPR-Cas effector fusion polypeptide of aspect 28, wherein the heterologous polypeptide exhibits histone modification activity.
[0433] Aspect 30. The CRISPR-Cas effector fusion polypeptide of aspect 28 or aspect 29, wherein the heterologous polypeptide exhibits one or more enzymatic activities selected from: methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity, demyristoylation activity, glycosylation activity, and deglycosylation activity.
[0434] Aspect 31. The CRISPR-Cas effector fusion polypeptide of aspect 30, wherein the heterologous polypeptide exhibits one or more enzymatic activities selected from: methyltransferase activity, demethylase activity, acetyltransferase activity, and deacetylase activity.
[0435] Aspect 32. The CRISPR-Cas effector fusion polypeptide of any one of aspects 16-23, wherein the heterologous polypeptide is an endosomal escape polypeptide.
[0436] Aspect 33. The CRISPR-Cas effector fusion polypeptide of any one of aspects 16-23, wherein the heterologous polypeptide is a protein that increases or decreases transcription.
[0437] Aspect 34. The CRISPR-Cas effector fusion polypeptide of aspect 33, wherein the heterologous polypeptide is a transcriptional repressor domain.
[0438] Aspect 35. The CRISPR-Cas effector fusion polypeptide of aspect 33, wherein the heterologous polypeptide is a transcriptional activation domain.
[0439] Aspect 36. The CRISPR-Cas effector fusion polypeptide of any one of aspects 16-23, wherein the heterologous polypeptide is a protein binding domain.
[0440] Aspect 37. A nucleic acid comprising a nucleotide sequence encoding the CRISPR-Cas effector fusion polypeptide of any one of aspects 16-36.
[0441] Aspect 38. The nucleic acid of aspect 37, wherein the nucleotide sequence encoding the CRISPR-Cas effector fusion polypeptide is operably linked to a promoter.
[0442] Aspect 39. The nucleic acid of aspect 38, wherein the promoter is functional in an archaeal cell.
[0443] Aspect 40. The nucleic acid of aspect 38, wherein the promoter is functional in a eukaryotic cell.
[0444] Aspect 41. The nucleic acid of aspect 40, wherein the promoter is functional in one or more of: a plant cell, a fungal cell, an animal cell, cell of an invertebrate, a fly cell, a cell of a vertebrate, a mammalian cell, a primate cell, a non-human primate cell, and a human cell.
[0445] Aspect 42. The nucleic acid of any one of aspects 39-41, wherein the promoter is one or more of: a constitutive promoter, an inducible promoter, a cell type-specific promoter, and a tissue-specific promoter.
[0446] Aspect 43. The nucleic acid of any one of aspects 38-42, wherein the nucleic acid is a recombinant expression vector.
[0447] Aspect 44. The nucleic acid of aspect 43, wherein the recombinant expression vector is a recombinant adeno-associated viral vector, a recombinant retroviral vector, or a recombinant lentiviral vector.
[0448] Aspect 45. The nucleic acid of aspect 39, wherein the promoter is functional in a prokaryotic cell.
[0449] Aspect 46. The nucleic acid of aspect 38, wherein the nucleic acid is an mRNA.
[0450] Aspect 47. One or more nucleic acids comprising:
[0451] (a) a nucleotide sequence encoding a CRISPR-Cas effector guide RNA; and
[0452] (b) a nucleotide sequence encoding a CRISPR-Cas effector polypeptide,
[0453] wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 50% or more amino acid sequence identity to the amino acid sequence set forth in any one of FIG. 5A-5M.
[0454] Aspect 48. The one or more nucleic acids of aspect 47, wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 60% or more, or 75% or more, amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 5A-5M.
[0455] Aspect 49. The one or more nucleic acids of aspect 47, wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 85% or more amino acid identity to the amino acid depicted in any one of FIG. 5A-5M.
[0456] Aspect 50. The one or more nucleic acids of any one of aspects 47-49, wherein the CRISPR-Cas effector guide RNA comprises a nucleotide sequence having 80% or more nucleotide sequence identity with any one of the nucleotide sequences set forth in FIG. 5A-5M; or is encoded by a nucleic acid comprising a nucleotide sequence having 80%, 90%, 95%, 98%, 99%, or 100%, nucleotide sequence identity with any one of the nucleotide sequences depicted in FIG. 5A-5M.
[0457] Aspect 51. The one or more nucleic acids of any one of aspects 47-50, wherein the CRISPR-Cas effector polypeptide is fused to a nuclear localization signal (NLS).
[0458] Aspect 52. The one or more nucleic acids of any one of aspects 47-51, wherein the nucleotide sequence encoding the CRISPR-Cas effector guide RNA is operably linked to a promoter.
[0459] Aspect 53. The one or more nucleic acids of any one of aspects 47-52, wherein the nucleotide sequence encoding the CRISPR-Cas effector polypeptide is operably linked to a promoter.
[0460] Aspect 54. The one or more nucleic acids of aspect 52 or aspect 53, wherein the promoter operably linked to the nucleotide sequence encoding the CRISPR-Cas effector guide RNA, and/or the promoter operably linked to the nucleotide sequence encoding the CRISPR-Cas effector polypeptide, is functional in a eukaryotic cell.
[0461] Aspect 55. The one or more nucleic acids of aspect 54, wherein the promoter is functional in one or more of: a plant cell, a fungal cell, an animal cell, cell of an invertebrate, a fly cell, a cell of a vertebrate, a mammalian cell, a primate cell, a non-human primate cell, and a human cell.
[0462] Aspect 56. The one or more nucleic acids of any one of aspects 53-55, wherein the promoter is one or more of: a constitutive promoter, an inducible promoter, a cell type-specific promoter, and a tissue-specific promoter.
[0463] Aspect 57. The one or more nucleic acids of any one of aspects 47-56, wherein the one or more nucleic acids is one or more recombinant expression vectors.
[0464] Aspect 58. The one or more nucleic acids of aspect 57, wherein the one or more recombinant expression vectors are selected from: one or more adeno-associated viral vectors, one or more recombinant retroviral vectors, or one or more recombinant lentiviral vectors.
[0465] Aspect 59. The one or more nucleic acids of aspect 53, wherein the promoter is functional in a prokaryotic cell.
[0466] Aspect 60. A eukaryotic cell comprising one or more of:
[0467] a) a CRISPR-Cas effector polypeptide, or a nucleic acid comprising a nucleotide sequence encoding the CRISPR-Cas effector polypeptide, wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 50% or more amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 5A-5M;
[0468] b) a CRISPR-Cas effector fusion polypeptide of any one of aspects 16-36, or a nucleic acid comprising a nucleotide sequence encoding the CRISPR-Cas effector fusion polypeptide, and
[0469] c) a CRISPR-Cas effector guide RNA, or a nucleic acid comprising a nucleotide sequence encoding the CRISPR-Cas effector guide RNA, wherein the CRISPR-Cas effector guide RNA comprises a binding segment that binds to a CRISPR-Cas effector polypeptide that comprises an amino acid sequence having 50% or more amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 5A-5M.
[0470] Aspect 61. The eukaryotic cell of aspect 60, comprising the nucleic acid encoding the CRISPR-Cas effector polypeptide, wherein said nucleic acid is integrated into the genomic DNA of the cell.
[0471] Aspect 62. The eukaryotic cell of aspect 60 or aspect 61, wherein the eukaryotic cell is a plant cell, a mammalian cell, an insect cell, an arachnid cell, a fungal cell, a bird cell, a reptile cell, an amphibian cell, an invertebrate cell, a mouse cell, a rat cell, a primate cell, a non-human primate cell, or a human cell.
[0472] Aspect 63. A cell comprising a CRISPR-Cas effector fusion polypeptide, or a nucleic acid comprising a nucleotide sequence encoding the CRISPR-Cas effector fusion polypeptide.
[0473] Aspect 64. The cell of aspect 63, wherein the cell is a prokaryotic cell.
[0474] Aspect 65. The cell of aspect 63 or aspect 64, comprising the nucleic acid comprising a nucleotide sequence encoding the CRISPR-Cas effector fusion polypeptide, wherein said nucleic acid molecule is integrated into the genomic DNA of the cell.
[0475] Aspect 66. A method of modifying a target nucleic acid, the method comprising contacting the target nucleic acid with:
[0476] a) a CRISPR-Cas effector polypeptide; and
[0477] b) a CRISPR-Cas effector guide RNA comprising a guide sequence that hybridizes to a target sequence of the target nucleic acid,
[0478] wherein said contacting results in modification of the target nucleic acid by the CRISPR-Cas effector polypeptide,
[0479] wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 50% or more amino acid sequence identity to the amino acid sequence set forth in any one of FIG. 5A-5M.
[0480] Aspect 67. The method of aspect 66, wherein said modification is cleavage of the target nucleic acid.
[0481] Aspect 68. The method of aspect 66 or aspect 67, wherein the target nucleic acid is selected from: double stranded DNA, single stranded DNA, RNA, genomic DNA, and extrachromosomal DNA.
[0482] Aspect 69. The method of any of aspects 66-68, wherein said contacting takes place in vitro outside of a cell.
[0483] Aspect 70. The method of any of aspects 66-68, wherein said contacting takes place inside of a cell in culture.
[0484] Aspect 71. The method of any of aspects 66-68, wherein said contacting takes place inside of a cell in vivo.
[0485] Aspect 72. The method of aspect 70 or aspect 71, wherein the cell is a eukaryotic cell.
[0486] Aspect 73. The method of aspect 72, wherein the cell is selected from: a plant cell, a fungal cell, a mammalian cell, a reptile cell, an insect cell, an avian cell, a fish cell, a parasite cell, an arthropod cell, a cell of an invertebrate, a cell of a vertebrate, a rodent cell, a mouse cell, a rat cell, a primate cell, a non-human primate cell, and a human cell.
[0487] Aspect 74. The method of aspect 70 or aspect 71, wherein the cell is a prokaryotic cell.
[0488] Aspect 75. The method of any one of aspects 66-74, wherein said contacting results in genome editing.
[0489] Aspect 76. The method of any one of aspects 66-75, wherein said contacting comprises: introducing into a cell: (a) the CRISPR-Cas effector polypeptide, or a nucleic acid comprising a nucleotide sequence encoding the CRISPR-Cas effector polypeptide, and (b) the CRISPR-Cas effector guide RNA, or a nucleic acid comprising a nucleotide sequence encoding the CRISPR-Cas effector guide RNA.
[0490] Aspect 77. The method of aspect 76, wherein said contacting further comprises: introducing a DNA donor template into the cell.
[0491] Aspect 78. The method of any one of aspects 66-77, wherein the CRISPR-Cas effector guide RNA comprises a nucleotide sequence having 80% or more nucleotide sequence identity with any one of the nucleotide sequences set forth in FIG. 5A-5M; or is encoded by a nucleic acid comprising a nucleotide sequence having 80%, 90%, 95%, 98%, 99%, or 100%, nucleotide sequence identity with any one of the nucleotide sequences depicted in FIG. 5A-5M.
[0492] Aspect 79. The method of any one of aspects 66-78, wherein the CRISPR-Cas effector polypeptide is fused to a nuclear localization signal.
[0493] Aspect 80. A method of modulating transcription from a target DNA, modifying a target nucleic acid, or modifying a protein associated with a target nucleic acid, the method comprising contacting the target nucleic acid with:
[0494] a) a CRISPR-Cas effector fusion polypeptide comprising a CRISPR-Cas effector polypeptide fused to a heterologous polypeptide; and
[0495] b) a CRISPR-Cas effector guide RNA comprising a guide sequence that hybridizes to a target sequence of the target nucleic acid,
[0496] wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 50% or more amino acid sequence identity to the amino acid sequence set forth in any one of FIG. 5A-5M.
[0497] Aspect 81. The method of aspect 80, wherein the CRISPR-Cas effector guide RNA comprises a nucleotide sequence having 80% or more nucleotide sequence identity with any one of the crRNA sequences set forth in FIG. 5A-5M; or is encoded by a nucleic acid comprising a nucleotide sequence having 80%, 90%, 95%, 98%, 99%, or 100%, nucleotide sequence identity with any one of the nucleotide sequences depicted in FIG. 5A-5M.
[0498] Aspect 82. The method of aspect 80 or aspect 81, wherein the CRISPR-Cas effector fusion polypeptide comprises a nuclear localization signal.
[0499] Aspect 83. The method of any of aspects 80-82, wherein said modification is not cleavage of the target nucleic acid.
[0500] Aspect 84. The method of any of aspects 80-83, wherein the target nucleic acid is selected from: double stranded DNA, single stranded DNA, RNA, genomic DNA, and extrachromosomal DNA.
[0501] Aspect 85. The method of any of aspects 80-84, wherein said contacting takes place in vitro outside of a cell.
[0502] Aspect 86. The method of any of aspects 80-84, wherein said contacting takes place inside of a cell in culture.
[0503] Aspect 87. The method of any of aspects 80-84, wherein said contacting takes place inside of a cell in vivo.
[0504] Aspect 88. The method of aspect 86 or aspect 87, wherein the cell is a eukaryotic cell.
[0505] Aspect 89. The method of aspect 88, wherein the cell is selected from: a plant cell, a fungal cell, a mammalian cell, a reptile cell, an insect cell, an avian cell, a fish cell, a parasite cell, an arthropod cell, a cell of an invertebrate, a cell of a vertebrate, a rodent cell, a mouse cell, a rat cell, a primate cell, a non-human primate cell, and a human cell.
[0506] Aspect 90. The method of aspect 86 or aspect 87, wherein the cell is a prokaryotic cell.
[0507] Aspect 91. The method of any one of aspects 80-90, wherein said contacting comprises: introducing into a cell: (a) the CRISPR-Cas effector fusion polypeptide, or a nucleic acid comprising a nucleotide sequence encoding the CRISPR-Cas effector fusion polypeptide, and (b) the CRISPR-Cas effector guide RNA, or a nucleic acid comprising a nucleotide sequence encoding the CRISPR-Cas effector guide RNA.
[0508] Aspect 92. The method of any one of aspects 80-91, wherein the CRISPR-Cas effector polypeptide is a catalytically inactive CRISPR-Cas effector polypeptide (dCRISPR-Cas effector).
[0509] Aspect 93. The method of any one of aspects 80-92, wherein the CRISPR-Cas effector polypeptide has a length of from 275 amino acids to 465 amino acids.
[0510] Aspect 94. The method of any one of aspects 80-93, wherein the heterologous polypeptide exhibits an enzymatic activity.
[0511] Aspect 95. The method of aspect 94, wherein the heterologous polypeptide exhibits one or more enzymatic activities selected from: nuclease activity, methyltransferase activity, demethylase activity, DNA repair activity, DNA damage activity, deamination activity, dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer forming activity, integrase activity, transposase activity, recombinase activity, polymerase activity, ligase activity, helicase activity, photolyase activity and glycosylase activity.
[0512] Aspect 96. The method of aspect 94, wherein the heterologous polypeptide exhibits one or more enzymatic activities selected from: reverse transcriptase activity, nuclease activity, methyltransferase activity, demethylase activity, deamination activity, depurination activity, integrase activity, transposase activity, and recombinase activity.
[0513] Aspect 97. The method of any one of aspects 80-93, wherein the heterologous polypeptide exhibits an enzymatic activity that modifies a target polypeptide associated with a target nucleic acid.
[0514] Aspect 98. The method of aspect 97, wherein the heterologous polypeptide exhibits histone modification activity.
[0515] Aspect 99. The method of aspect 97 or aspect 98, wherein the heterologous polypeptide exhibits one or more enzymatic activities selected from: methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity, demyristoylation activity, glycosylation activity (e.g., from O-GlcNAc transferase) and deglycosylation activity.
[0516] Aspect 100. The method of aspect 99, wherein the heterologous polypeptide exhibits one or more enzymatic activities selected from: methyltransferase activity, demethylase activity, acetyltransferase activity, and deacetylase activity.
[0517] Aspect 101. The method of any one of aspects 80-93, wherein the heterologous polypeptide is a protein that increases or decreases transcription.
[0518] Aspect 102. The method of aspect 101, wherein the heterologous polypeptide is a transcriptional repressor domain.
[0519] Aspect 103. The method of aspect 101, wherein the heterologous polypeptide is a transcriptional activation domain.
[0520] Aspect 104. The method of any one of aspects 80-93, wherein the heterologous polypeptide is a protein binding domain.
[0521] Aspect 105. A transgenic, multicellular, non-human organism whose genome comprises a transgene comprising a nucleotide sequence encoding one or more of:
[0522] a) a CRISPR-Cas effector polypeptide,
[0523] b) a CRISPR-Cas effector fusion polypeptide, and
[0524] c) a CRISPR-Cas effector guide RNA,
[0525] wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 50% or more amino acid sequence identity to the amino acid sequence set forth in any one of FIG. 5A-5M.
[0526] Aspect 106. The transgenic, multicellular, non-human organism of aspect 105, wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 80% or more amino acid sequence identity to the amino acid sequence set forth in any one of FIG. 5A-5M.
[0527] Aspect 107. The transgenic, multicellular, non-human organism of aspect 105, wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 90% or more amino acid sequence identity to the amino acid sequence set forth in any one of FIG. 5A-5M.
[0528] Aspect 108. The transgenic, multicellular, non-human organism of any one of aspects 105-107, wherein the organism is a plant, a monocotyledon plant, a dicotyledon plant, an invertebrate animal, an insect, an arthropod, an arachnid, a parasite, a worm, a cnidarian, a vertebrate animal, a fish, a reptile, an amphibian, an ungulate, a bird, a pig, a horse, a sheep, a rodent, a mouse, a rat, or a non-human primate.
[0529] Aspect 109. A system comprising:
[0530] a) a CRISPR-Cas effector polypeptide and a CRISPR-Cas effector guide RNA;
[0531] b) a CRISPR-Cas effector polypeptide, a CRISPR-Cas effector guide RNA, and a DNA donor template;
[0532] c) a CRISPR-Cas effector fusion polypeptide and a CRISPR-Cas effector guide RNA;
[0533] d) a CRISPR-Cas effector fusion polypeptide, a CRISPR-Cas effector guide RNA, and a DNA donor template;
[0534] e) an mRNA encoding a CRISPR-Cas effector polypeptide, and a CRISPR-Cas effector guide RNA;
[0535] f) an mRNA encoding a CRISPR-Cas effector polypeptide, a CRISPR-Cas effector guide RNA, and a DNA donor template;
[0536] g) an mRNA encoding a CRISPR-Cas effector fusion polypeptide, and a CRISPR-Cas effector guide RNA;
[0537] h) an mRNA encoding a CRISPR-Cas effector fusion polypeptide, a CRISPR-Cas effector guide RNA, and a DNA donor template;
[0538] i) one or more recombinant expression vectors comprising: i) a nucleotide sequence encoding a CRISPR-Cas effector polypeptide; and ii) a nucleotide sequence encoding a CRISPR-Cas effector guide RNA;
[0539] j) one or more recombinant expression vectors comprising: i) a nucleotide sequence encoding a CRISPR-Cas effector polypeptide; ii) a nucleotide sequence encoding a CRISPR-Cas effector guide RNA; and iii) a DNA donor template;
[0540] k) one or more recombinant expression vectors comprising: i) a nucleotide sequence encoding a CRISPR-Cas effector fusion polypeptide; and ii) a nucleotide sequence encoding a CRISPR-Cas effector guide RNA; and
[0541] l) one or more recombinant expression vectors comprising: i) a nucleotide sequence encoding a CRISPR-Cas effector fusion polypeptide; ii) a nucleotide sequence encoding a CRISPR-Cas effector guide RNA; and iii) a DNA donor template,
[0542] wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 50% or more amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 5A-5M, and
[0543] wherein the CRISPR-Cas effector fusion polypeptide is a CRISPR-Cas effector fusion polypeptide of any one of aspects 16-36.
[0544] Aspect 110. The CRISPR-Cas effector system of aspect 109, wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 80% or more amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 5A-5M.
[0545] Aspect 111. The CRISPR-Cas effector system of aspect 109, wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 90% or more amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 5A-5M.
[0546] Aspect 112. The CRISPR-Cas effector system of any of aspects 109-111, wherein the donor template nucleic acid has a length of from 8 nucleotides to 1000 nucleotides.
[0547] Aspect 113. The CRISPR-Cas effector system of any of aspects 109-111, wherein the donor template nucleic acid has a length of from 25 nucleotides to 500 nucleotides.
[0548] Aspect 114. A kit comprising the CRISPR-Cas effector system of any one of aspects 109-113.
[0549] Aspect 115. The kit of aspect 114, wherein the components of the kit are in the same container.
[0550] Aspect 116. The kit of aspect 114, wherein the components of the kit are in separate containers.
[0551] Aspect 117. A sterile container comprising the CRISPR-Cas effector system of any one of aspects 109-116.
[0552] Aspect 118. The sterile container of aspect 117, wherein the container is a syringe.
[0553] Aspect 119. An implantable device comprising the CRISPR-Cas effector system of any one of aspects 109-116.
[0554] Aspect 120. The implantable device of aspect 119, wherein the CRISPR-Cas effector system is within a matrix.
[0555] Aspect 121. The implantable device of aspect 119, wherein the CRISPR-Cas effector system is in a reservoir.
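Several of the aspects above recite thresholds of amino acid sequence identity (50%, 80%, or 90% or more) relative to a reference sequence. As an illustration only, the following Python sketch shows one simple way such a percentage can be computed from two sequences that have already been aligned. The function names, the gap character, and the choice to count only columns in which neither sequence has a gap are assumptions made for this sketch; other conventions for the denominator exist, and this sketch is not offered as the definition of sequence identity used in this disclosure.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Both inputs are assumed to come from the same pairwise alignment, so they
    must have equal length. The denominator convention is an assumption.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a.upper() == b.upper():
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 50.0) -> bool:
    """True if the computed percent identity is at or above the threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, the hypothetical aligned pair "MKT-AL" and "MKTQAV" has five gap-free columns, four of which match, so the sketch would report an identity above the 50% threshold but below the 90% threshold.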
EXAMPLES
[0556] The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how to make and use the present invention, and are not intended to limit the scope of what the inventors regard as their invention nor are they intended to represent that the experiments below are all or the only experiments performed. Efforts have been made to ensure accuracy with respect to numbers used (e.g. amounts, temperature, etc.) but some experimental errors and deviations should be accounted for. Unless indicated otherwise, parts are parts by weight, molecular weight is weight average molecular weight, temperature is in degrees Celsius, and pressure is at or near atmospheric. Standard abbreviations may be used, e.g., bp, base pair(s); kb, kilobase(s); pl, picoliter(s); s or sec, second(s); min, minute(s); h or hr, hour(s); aa, amino acid(s); kb, kilobase(s); bp, base pair(s); nt, nucleotide(s); i.m., intramuscular(ly); i.p., intraperitoneal(ly); s.c., subcutaneous(ly); and the like.
[0557] The following examples are offered to illustrate provided embodiments and are not intended to limit the scope of the present disclosure. In the Examples provided herein, tables appear beneath the table heading that describes the respective table.
Example 1: Cas protein structure
[0558] To explore the mechanism by which Casλ achieves RNA-guided DNA recognition, we generated a purified Casλ-crRNA-dsDNA ternary complex suitable for analysis by cryo-electron microscopy (cryo-EM). Cryo-EM maps of this complex revealed a bi-lobed architecture analogous to Cas9 and Cas12 enzymes, despite the divergence in both sequence and size between Casλ and these much larger proteins (FIG. 3A, FIG. 3B, FIG. 2A-2E). The 3 Å resolution structure revealed the shape and domain organization of Casλ and the unique structure of the guide RNA (FIG. 3A-3C; FIG. 2A-2E).
Overall, several domains within Casλ, including the REC I, REC II, OBD, RuvC and TSL domains, exhibit segmentation and rearrangement compared to known type V systems. Notably, the RuvC domain of Casλ is split into four parts across the C-terminal half of the protein, likely hindering reliable alignment and clustering with reported Cas12 systems (FIG. 3D). The REC I and REC II domains are also segmented in the protein sequence, with the PAM-interacting domain wedged within REC I as opposed to the N terminus of the protein as seen in CasΦ but similar to Cas12i. In contrast to CasΦ, Casλ contains a Target Strand Loading (TSL) domain that likely functions to load the single-stranded DNA substrate into the active site. The TSL sits in a position analogous to the “Nuc” domain that was incorrectly hypothesized in other type V CRISPR-Cas enzymes to be a second nuclease domain responsible for DNA cleavage. Casλ also exhibits a distinct structure in the REC I domain compared to CasΦ (FIG. 2D).
[0559] The crRNA forms an unexpected shape that blankets the protein, with a divergent recognition lobe in Casλ that binds to distinct sequences and structural features of the guide RNA (FIG. 3C; FIG. 2A-2C). Specifically, possible interactions between primarily polar or charged residues within the REC II domain in Casλ with the conserved motifs of the crRNA hairpin were observed (FIG. 2C). These residues are conserved across the protein family and likely interact either directly with the RNA nucleobases (Q452, N510), or with the RNA phosphate backbone to stabilize the guide (S451, K596, E444, N445, K503, Y619) (FIG. 3C; FIG. 2C). Protein contacts to the unpaired nucleobases A9, A30, A31, which form a motif in the middle of the guide RNA stem-loop, were not observed, consistent with the lack of loop sequence conservation. Interestingly, however, the non-complementarity is a conserved feature of the Casλ crRNAs and may be important for the hairpin kink geometry.
[0560] CRISPR-Cas proteins initiate the unwinding of target double-stranded DNA following PAM recognition. In Casλ, this recognition is achieved via interactions with the OBD, REC I, and a five α-helical bundle referred to as the PAM-interacting domain (PID). Residues within the three domains interact with the sugar-phosphate backbone of the target DNA (FIG. 2B) and, in some cases such as residue N102, make direct contact with the nucleobases. The interaction between N102 and nucleobase G(-1) may explain the preference for purines in this position as opposed to pyrimidines since a pyrimidine substitution would result in a base that is too distant from the interacting asparagine (FIG. 2B). In examining the aftermath of cis-cleavage of DNA, we found that Casλ possesses minimal ssDNA or ssRNA trans-cleavage activity upon DNA recognition in cis (Fig. S4B). Incubation of the Casλ protein with non-cognate guides used by other orthologs within the protein family also produced minimal trans-ssDNA cleavage activity (FIG. 3E), consistent with the possibility that guide RNAs within the Casλ family may be interchangeable. Single mismatches across the ssDNA target sequence revealed that the seed region of the target DNA (nts. 1-5 from the PAM-containing end) and the region extending from nts. 7-13 are required to match the spacer sequence of the guide RNA for efficient cleavage (FIG. 4). Investigation of positions that possibly interact with the DNA in these regions (FIG. 2D) or the corresponding RNA revealed conserved residues in REC, OBD, PID and RuvC domains that may account for the complex’s intolerance to target mismatches.
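The mismatch observation above (positions 1-5 and 7-13 from the PAM-containing end must match the spacer for efficient cleavage) can be illustrated with a short Python sketch that flags whether a candidate site carries a mismatch at one of those positions. This is a sketch under stated assumptions, not a predictive model: the function names are hypothetical, the spacer and the protospacer are assumed to be written in the same sense and both starting at the PAM-proximal position, and the all-or-nothing treatment of a mismatch is a simplification of the single-mismatch data referred to in FIG. 4.

```python
# Positions reported in the text as mismatch-sensitive (1-based, counted
# from the PAM-containing end): nts 1-5 and nts 7-13.
SENSITIVE_POSITIONS = frozenset(range(1, 6)) | frozenset(range(7, 14))


def _as_dna(seq: str) -> str:
    """Upper-case the sequence and write U as T so RNA and DNA compare directly."""
    return seq.upper().replace("U", "T")


def mismatch_positions(spacer: str, protospacer: str) -> list:
    """1-based positions, from the PAM-proximal end, where the sequences differ.

    Assumption: both strings are in the same sense and start at position 1;
    only the overlapping length is compared.
    """
    a, b = _as_dna(spacer), _as_dna(protospacer)
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


def hits_sensitive_position(spacer: str, protospacer: str) -> bool:
    """True if any mismatch falls within positions 1-5 or 7-13."""
    return any(p in SENSITIVE_POSITIONS
               for p in mismatch_positions(spacer, protospacer))
```

In this sketch, a site whose only mismatch is at position 6, or at position 14 and beyond, would not be flagged, whereas a site with a mismatch at position 3 or position 10 would be.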
Example 2:
Summary
[0561] CRISPR-Cas systems are host-encoded pathways that protect microbes from viral infection using an adaptive RNA-guided mechanism. As described herein, using genome-resolved metagenomics, it was discovered that CRISPR systems are also encoded in diverse bacteriophages, where they occur as divergent and hypercompact anti-viral systems. Bacteriophage-encoded CRISPR systems belong to all six known CRISPR-Cas types, though some lack crucial components, suggesting alternate functional roles or host complementation. Described are multiple new Cas9-like proteins and families related to type V CRISPR-Cas systems, including the Casλ RNA-guided nuclease family. Casλ recognizes double-stranded DNA using a uniquely structured CRISPR RNA (crRNA). The Casλ-RNA-DNA structure determined by cryoelectron microscopy reveals a compact bilobed architecture capable of inducing genome editing in mammalian, Arabidopsis, and hexaploid wheat cells. These findings reveal a new source of CRISPR-Cas enzymes in phages and highlight their value as genome editors in plant and human cells.
Introduction
[0562] CRISPR-Cas systems confer resistance in prokaryotes against invading extrachromosomal elements, including viruses and plasmids (FIG. 8A). To generate immunological memory, microbes capture fragments of foreign genetic elements and incorporate them into their genomic CRISPR array using the Cas1-Cas2 integrase. Subsequent transcription of the array creates CRISPR RNAs (crRNAs) that bind to and direct CRISPR-associated (Cas) nucleases to target complementary nucleic acids. These systems comprise two classes, each with three different types, defined by the architectures of their nuclease effector modules involved in crRNA processing and DNA or RNA interference.
[0563] Reports of type I-F CRISPR-Cas loci encoded in bacteriophages (phage) that infect Vibrio cholerae, or type V CRISPR-Cas loci in huge phage genomes reconstructed from microbial community DNA sequences, hinted at a wider distribution of phage-encoded CRISPR systems that might function in novel ways and reveal important insights into prokaryotic biology. However, these were the only reported examples of phage-encoded CRISPR systems.
[0564] Here, using metagenomic analysis of microbial samples isolated from soil, aquatic, human, and animal microbiomes, reported is the widespread occurrence of diverse, compact CRISPR-Cas systems encoded in phage genomes, demonstrating an unexpected biological reservoir of anti-viral machinery within infectious agents. Phage-encoded CRISPR-Cas systems include members of all six CRISPR types (types I-VI) as defined by bacterially encoded examples. Evidence was found for new or alternative modes of nucleic acid interference involving phage-encoded type I, III, IV, and VI systems. In addition, the phage and phage-like sequences result in a several-fold expansion of CRISPR-Cas9 and -Cas12 enzymes belonging to the type II and type V families that are widely deployed for genome editing applications. Casλ was found to have robust biochemical activity as an RNA-guided double-stranded DNA (dsDNA) cutter. Its cryoelectron microscopy (cryo-EM)-determined molecular structure explains its use of a natural single-guide RNA for DNA binding, and cell-based experiments demonstrated robust endogenous genome editing activity in plant and human cells. The compact architecture of Casλ and other phage-encoded CRISPR-Cas proteins will help facilitate vector-based and direct delivery into cells for wide-ranging biotechnological applications.
Results
A wide diversity of phages across many bacterial phyla encode divergent CRISPR-Cas systems
[0565] Using genome-resolved metagenomics, over 660 gigabase pairs (Gbp) of assembled genomic DNA were analyzed from both environmental and animal-associated microbiomes to reveal a surprising diversity of over 6000 CRISPR-encoding phages (FIG. 8B). Analysis of publicly available phage genomes revealed that CRISPR-Cas systems occur in only 0.4% of phages, making them exceptionally rare compared to their abundance in prokaryotic genomes where they occur in 40% of bacteria and 85% of archaea. CRISPR-encoding phages, rather than being limited to specific phylogenetic clusters, are found within many diverse phage subtypes (FIG. 8B). This finding is consistent with previous work that determined phage phylogeny from protein-clustering analysis. At least two phages harboring CRISPR arrays were alternatively coded such that the TAG stop codon was recoded to glutamine. Although circularized CRISPR-encoding phages included huge phages such as a >620 kbp megaphage (FIG. 8C), most had a genome size close to the average of 52 kbp. Notably, however, relatively few phages encode complete CRISPR-Cas systems. Fewer than 10% of CRISPR-encoding phages were found to contain machinery for the acquisition of new spacer sequences into their CRISPR arrays, consistent with observations in huge phages. Many phages encode CRISPR arrays, but few of these (~6%) include Cas effectors encoded nearby (FIG. 8D). In such situations, phages may produce their own guide RNAs but hijack the Cas effectors provided by their hosts. Consistent with this possibility, ~1% of phages encode only the Cas1-Cas2 integrase used for the acquisition of new spacers, but no other Cas enzymes. In some cases, phage-encoded Cas1 contained a fusion to another protein such as reverse transcriptase, suggesting the possibility of the acquisition of RNA protospacers into the phage array.
Notably, only 27 of the thousands of phage-encoded CRISPR-Cas loci identified in this study target RNA and can be classified as new homologs of previously described RNA-targeting systems. Thus, the vast majority of phage-encoded CRISPR systems target DNA.
Phage-encoded CRISPR-Cas systems include all six known types but with phage-specific properties
[0566] As found in this work, all six known types of CRISPR-Cas systems occur in phages, and relative to host-encoded systems, they have various unique properties associated with their existence within phage genomes. These include missing sequence integration or targeting machinery as mentioned above, modified type III and VI systems that mitigate the abortive infection mechanism, and spacers that target other mobile genetic elements.
[0567] For example, some of the rare phage-encoded type III systems are associated with CRISPR arrays targeting vital or highly abundant RNA transcripts of other mobile elements, such as phage tail proteins or transposases (FIG. 9). In well-studied type III systems, the Cas10 protein converts ATP into a cyclic oligoadenylate (cOA) product, which allosterically activates an auxiliary Csm6 ribonuclease. The activated Csm6 amplifies the immune response by degrading RNA transcripts indiscriminately, thereby destroying the invasive transcriptome or inducing host cell dormancy or death, aborting the phage infectious cycle. Interestingly, in phage-encoded type III systems, the Cas10 subunit contains multiple mutations, hinting at an inability to produce cOA (FIG. 14), and Csm6 or a related CARF-domain ribonuclease is absent, similarly to archaeal Borg elements. Notably, the key residues for DNA cleavage in the Cas10 HD domain, and for RNA cleavage in Cas7, remain intact (FIG. 14). Unless the cOA production and Csm6 ribonuclease functionalities are complemented by orthogonal type III systems from the host genome, this suggests that the type III phage systems may be capable of cleaving key RNA transcripts and genomic DNA of competing mobile elements to interfere with their infectious cycle without activating abortive infection in which cOA signaling triggers trans-cleavage of transcripts in the host cell.
[0568] In addition to type III systems, the first examples of phage-encoded type VI (Cas13) ribonucleases were found, most of which belong to the Cas13b and the relatively small Cas13d superfamilies (FIG. 10). Analogous to the findings above with type III systems in abortive infection, the lack of signature Csx27 and Csx28 proteins, which are transmembrane factors that enhance abortive infection mechanisms, may indicate the absence of an abortive infection pathway unless supplemented by the host.
Miniature single-effector CRISPR-Cas systems are enriched in phage genomes
[0569] Class 2 CRISPR-Cas systems, including types II, V, and VI, generally employ single-subunit RNA-guided, nucleic acid-targeting interference enzymes. In addition to new Cas9 (a, b, c) and Cas12 (a, b, c, f, i) enzyme variants, miniature CRISPR-associated nucleases were identified in phages harboring both HNH and RuvC catalytic domains characteristic of Cas9. These miniature nucleases constitute phylogenetically distinct clades denoted as types II-x, -y, and -z (FIG. 10A). These systems lack the Cas1, Cas2, or Csn2 sequence acquisition machinery (Figs. 1D and 3A) and have distinct domain organizations compared to previously studied Cas9 orthologs, with significant deletions across the proteins in comparison (FIG. 15). The phylogenetic analyses indicated multiple evolutionary origins of the type II CRISPR-Cas systems in viruses, with the evolutionary relatedness of virally and bacterially encoded CRISPR-Cas systems suggesting that those encoded by viruses were obtained from their host during prior infection.
[0570] Furthermore, it was observed here that bacteriophage genomes harbor an unusual enrichment of hypercompact type V effectors (Figs. 1D and 3B) compared to abundance in bacteria, including hundreds of variants comprising 44 protein families that are evolutionarily distant from previously reported and experimentally validated miniature type V CRISPR-Cas nucleases, including Cas12f and CasΦ (FIG. 10B). Evolutionary analysis suggests that distinct type V nuclease subtypes may have evolved multiple times from separate transposon-encoded TnpB families, which have recently been shown to be RNA-guided nucleases themselves, and it was observed that TnpB is also widely encoded on phages.
[0571] CRISPR arrays associated with the type V families contained spacer sequences targeting competing dsDNA-based extrachromosomal elements that are predicted to infect the same host (FIG. 9). It was found in this work that in multiple related Biggiephages, miniature type V families including Casq and CasΦ co-occurred with a type I system, termed here type I-X, of which only one example was known previously, bearing similarities to type I-C CRISPR systems but featuring a distinct helicase in place of the processive nuclease Cas3. Biggiephage genomes were recovered over a four-year time span, and remained identical save for their CRISPR arrays, which only exhibited minor differences (Figs. 15C and 15D). Though DNA cleavage by this system was not validated, it is possible that dsDNA binding silences the expression of target genes. In some cases, the arrays of the type I-X system target the same circular extrachromosomal element, albeit with distinct spacers, as the array associated with co-occurring type V systems. One such cryptic element harbored restriction enzymes and retron-based anti-phage defense systems that could limit Biggiephage infectivity, underscoring the dynamic nature of the evolutionary arms race between mobile elements in competition for host resources.
[0572] Also found in this work were the first type IV systems encoded in lytic phage genomes. Type IV systems are predominantly found on plasmids, where their mechanisms of action are poorly understood and they sometimes lack a CRISPR array. A type IV subtype is reported here that lacks the DinG hallmark gene and encodes in its place a CysH-like protein bearing limited similarity to non-CRISPR associated CysH phosphoadenosine 5'-phosphosulfate reductases. Remarkably, the CRISPR array associated with this type IV-F system and a neighboring type V targets the type V Cas gene encoded in a competing cyanophage (FIG. 9).
Casλ is a divergent phage-specific CRISPR-Cas enzyme with a unique guide RNA
[0573] A distinctive phage-encoded enzyme family, Casλ, exists within huge bacteriophages that are evolutionarily linked to the recently reported Mahaphage clade. Named using Greek nomenclature to indicate its phage origin, this family of 55 compact systems exhibited such sequence divergence that it had negligible sequence identity (<5%) to, and clustered separately from, type V and type II enzymes (FIG. 16A). In addition, Casλ sequences have low similarity to these enzymes (<10%) but are phylogenetically closest to Casl4J5 (FIG. 10B). The protein is not encoded along with any other Cas proteins, and the RuvC nuclease was not immediately identifiable from the sequence. Difficulty in aligning this system to reported enzymes via remote homology (FIG. 19) further suggested that a direct evolutionary relationship with known Cas superfamilies was questionable. CRISPR arrays associated with Casλ contain spacer sequences complementary to dsDNA-based extrachromosomal elements predicted to infect the same Bacteroidetes host (FIG. 9).
[0574] In any CRISPR-Cas system, processing of CRISPR array transcripts, consisting of repeats and spacer sequences acquired from previously encountered mobile genetic elements (MGEs), is essential to generating mature crRNAs that guide Cas proteins to destroy foreign viruses. Analogous to the distinct nature of the protein, the Casλ crRNA is predicted to form an elongated hairpin secondary structure not previously observed in guide RNAs associated with Cas12 (FIG. 11A). Despite their divergent nucleotide sequences, crRNAs retain a similar predicted hairpin structure across the protein family (FIG. 16B). Furthermore, Casλ crRNAs contain conserved sequences at their 5' and 3' ends and in the center of the RNA (FIG. 11B).
RuvC-mediated crRNA processing in the spacer region and dsDNA cutting by Casλ
[0575] The lack of a detectable tracrRNA encoded within the genomic locus raised the question of how this RNA, akin to a naturally occurring crRNA-tracrRNA hybrid (FIG. 11A), may be processed by the CRISPR-Cas system or host factors to produce mature crRNA. Using radiolabeled precursor crRNAs (pre-crRNAs) as substrates, it was first tested whether purified Casλ protein catalyzes RNA cleavage. Analytical denaturing gel electrophoresis showed that pre-crRNAs are cut by Casλ in the spacer region as opposed to the 5' end of the RNA, where cutting has been observed in nearly all self-processing single-effector systems analyzed previously (Figs. 4C, 4D, and 16D), with the exception of a type V-C system. The Casλ-induced pre-crRNA processing yields a crRNA spacer sequence that is complementary to DNA target sites 14-17 nucleotides (nt) in length.
[0576] The fact that Casλ can process its own pre-crRNA obviates the need for Ribonuclease III or other host factors required for the function of most known Cas9 and Cas12 family members. Although some CRISPR-Cas proteins process pre-crRNAs using an internal active site distinct from the RuvC domain or by recruiting Ribonuclease III to cleave a pre-crRNA:tracrRNA duplex, the question arose whether phage-encoded Casλ, like phage-encoded CasΦ, processes pre-crRNA using its RuvC active site. Mg2+ dependence was therefore tested, showing that Casλ is indeed reliant on the presence of Mg2+ and thus, by extension, on the RuvC active site for crRNA maturation (FIG. 11D).
[0577] CRISPR-Cas systems target DNA sequences following or preceding a 2-5 bp Protospacer Adjacent Motif (PAM) for self-versus-non-self discrimination. The sequence requirements for DNA targeting by Casλ were then determined using a plasmid depletion assay in which targeting a library of putative PAM sites revealed sequence specificity. This assay demonstrated the ability of crRNA-guided Casλ to cleave dsDNA, without requirement for additional RNA components, and a TTR PAM sequence specificity (FIG. 17A). Casλ with host genome-targeting guides showed a reduction in colony-forming units (as a proxy for cell viability) of multiple orders of magnitude, in comparison to a negative control of Casλ with a non-targeting guide (FIG. 11E).
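As an illustration of the TTR PAM requirement described above, candidate target sites on one strand of a DNA sequence can be enumerated with a short script. This is a minimal sketch only: the function name, the example sequence, and the protospacer length are hypothetical, and the PAM is assumed to lie immediately 5' of the protospacer, as in the plasmid library design described in the methods.

```python
import re

def find_ttr_sites(seq, spacer_len=20):
    """List (pam_start, pam, protospacer) for each 5'-TTR-3' PAM on the given strand.

    R denotes a purine (A or G). spacer_len is an illustrative assumption.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=(TT[AG])([ACGT]{%d}))" % spacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]

# Toy example with a shortened protospacer length for readability.
sites = find_ttr_sites("CCTTAGGATCCGATTGACGTA", spacer_len=5)
for start, pam, protospacer in sites:
    print(start, pam, protospacer)
```

Only the supplied strand is scanned; the reverse complement would need to be searched separately to cover both strands.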
[0578] In vitro incubation of purified Casλ with crRNAs, along with a linear dsDNA target substrate, generated cleavage products with surprisingly pronounced staggered 5'-overhangs of 11-16 nt (Figs. 4F and 4G). Type V CRISPR-Cas enzymes such as Cas12a have also been observed to generate staggered overhangs, albeit smaller at only 5 nt. Furthermore, the non-target strand (NTS) was cleaved faster than the target strand (TS) within the RuvC active site over a 2 h time period (FIG. 11H).
Casλ ribonucleoproteins induce genome editing in endogenous genes in human and plant cells
[0579] The development of single-effector CRISPR-Cas systems for editing eukaryotic cells has revolutionized genome engineering. However, the large sizes of Cas9 and Cas12a enzymes can inhibit delivery into some cell types, for which hypercompact genome editors with favorable kinetics hold great promise as an alternative. A head-to-head comparison was conducted of insertion and deletion efficiencies using Casλ and Cas12a ribonucleoproteins (RNPs) with identical guide RNA spacers targeting the Vascular Endothelial Growth Factor A (VEGFA) and Empty Spiracles Homeobox 1 (EMX1) genes in HEK293T cells. Despite their miniature size, Casλ RNPs generated promising genome-editing outcomes compared to Cas12a, and in at least one case exceeded Cas12a insertion-deletion (indel) percentages (FIG. 12A). Extending these experiments to Arabidopsis thaliana, it was confirmed that Casλ exhibited editing efficiencies of up to 18% at the endogenous PDS3 gene (FIG. 12B), notably higher than observed previously using CasΦ. The efficiency of editing was dependent on temperature, with no editing occurring at 23°C, an intermediate level of editing occurring at 28°C, and the highest level of editing occurring at 32°C. Furthermore, editing in the endogenous disease resistance gene Snn5 was achieved in hexaploid wheat protoplasts, where a genome ~5x larger than the human genome poses a scanning challenge to achieve successful editing at the target site (FIG. 12C). Next-generation sequencing for both human and plant cells revealed indel profiles with large deletions (Figs. 5D and 18C), consistent with the staggered cuts observed in vitro at the PAM distal region.
Casλ protein structure explains interference mechanism
[0580] To explore the mechanism by which Casλ achieves RNA-guided DNA recognition, a purified Casλ-crRNA-dsDNA ternary complex suitable for analysis by cryo-EM was generated. Cryo-EM maps of this complex revealed a bilobed architecture analogous to Cas9 and Cas12 enzymes, despite the divergence in both sequence and size between Casλ and these much larger enzymes (Figs. 6A, 6B, 18, 19, and 20). The 3 Å resolution structure revealed the shape and domain organization of Casλ and the structure of the guide RNA (Figs. 6A-6C, 19, and 20). Notably, the RuvC domain of Casλ is split into four parts across the C-terminal half of the protein, likely hindering reliable alignment and clustering with reported Cas12 systems (FIG. 13D). The REC I and REC II domains are also segmented in the protein sequence, with the PAM-interacting domain wedged within REC I as opposed to the N terminus of the protein as seen in CasΦ, but similar to Cas12i. In contrast to CasΦ, Casλ contains a Target Strand Loading (TSL) domain that likely functions to load the single-stranded DNA (ssDNA) substrate, in a position analogous to the “Nuc” domain that was incorrectly hypothesized in other type V CRISPR-Cas enzymes to be a second nuclease domain responsible for DNA cleavage. Casλ also exhibits a distinct structure in the REC I domain compared to CasΦ (FIG. 19D).
[0581] The crRNA assumes a shape that blankets the protein, with a recognition lobe in Casλ that binds to distinct sequences and structural features of the guide RNA (Figs. 6C, and 19A-19C). Specifically, possible interactions between primarily polar or charged residues within the REC II domain in Casλ and the conserved motifs of the crRNA hairpin were observed (Figs. 4B and 19C). These residues are conserved across the protein family and likely interact either directly with the RNA nucleobases (Q452, N510), or with the RNA phosphate backbone to stabilize the guide (S451, K496, E444, N445, K503, Y619) (Figs. 6C and 19C). Protein contacts to the unpaired A9, A30, A31 nucleobase motif in the middle of the guide RNA stem-loop were not observed (FIG. 10A), which is further supported by the lack of sequence conservation. Interestingly, however, the non-complementarity is conserved, which is likely important for the hairpin kink geometry.
[0582] CRISPR-Cas proteins initiate the unwinding of target dsDNA following PAM recognition. In Casλ, this recognition is achieved via interactions with the oligonucleotide-binding domain (OBD), REC I, and a five α-helical bundle referred to as the PAM-interacting domain (PID). Residues within the three domains interact with the sugar-phosphate backbone of the target DNA (FIG. 19B) and, in some cases such as residue N102, interact directly with the nucleobases. The interaction between N102 and nucleobase G(-1) may explain the preference for purines in this position as opposed to pyrimidines, since a pyrimidine substitution would result in a base that is too distant from the interacting asparagine (FIG. 19B). In examining the aftermath of cis-cleavage of DNA, it was found that Casλ had a very low level of ssDNA or ssRNA cleavage in trans upon DNA recognition in cis (FIG. 17B). Incubation of the Casλ protein with non-cognate guides from other orthologs within the protein family replicated the ssDNA trans cleavage effect despite differences in their sequence (FIG. 13E), suggesting that guides within the Casλ family may be interchangeable, unlike Cas9. Single mismatches across the ssDNA target revealed that the seed region of the target DNA (1-5) and the region extending from bases 7-13 are required to match the spacer sequence of the guide RNA for efficient cleavage (FIG. 13F). Investigation of positions that possibly interact with the DNA in these regions (FIG. 19D) or the corresponding RNA revealed conserved residues in REC, OBD, PID, and RuvC domains that may account for the complex’s intolerance to target mismatches, and, therefore, the possibility of relatively high fidelity in the context of genome editing. Overall, more domains within Casλ, such as the RECI, RECII, OBD, RuvC, and TSL domains, exhibit segmentation and rearrangement compared to known type V systems.
Discussion
[0583] This study demonstrates that phage genomes are a natural reservoir of miniature single-effector CRISPR-Cas systems, including DNA-targeting type II and type V enzymes belonging to the Cas9 and Cas12 superfamilies. Greek nomenclature was used here to indicate the phage origins of Casμ, CasΩ, and Casλ, extending the naming convention established by phage-encoded CasΦ. In contrast to the prevalence of multi-subunit type I and type III CRISPR systems in prokaryotic genomes, the notable abundance of miniature Cas12-family enzymes in phages may reflect the size restriction of many phage genomes. Because phages evolve quickly, they serve as important sources of new, divergent, or hypercompact CRISPR systems. Some of these, such as Casλ, bear sufficient sequence-level divergence to cluster separately from Cas12 and Cas9 systems and obscure a direct evolutionary relationship with known Cas superfamilies. Nonetheless, Casλ's structure, domain composition, and biochemical mechanism are similar to other type V enzymes. This finding implies that within phage genomes, distinct type V nucleases may have evolved multiple times from ancestral transposon-encoded TnpB families, which also function as RNA-guided nucleases. Despite being from different clades of phages and having divergent sequences and domain organizations, a convergent evolution of Cas12-like architecture was observed in the Casλ and CasΦ protein structures. In addition, both can process their own pre-crRNA and rely on the same RuvC active site used for DNA cleavage for this activity. This extreme compression of enzymatic activities within one active site has not been observed for bacterially encoded CRISPR-Cas proteins.
[0584] The molecular structure of the Casλ-crRNA-dsDNA complex reported in this study illustrates possible convergent evolution of RNA-guided effectors, despite extreme sequence divergence and distinct ancestral protein origins. The domain architecture of Casλ exhibits more segmentation and likely structural rearrangements than have been seen in other Cas12-family enzymes, with multiple functional domains split at the sequence level into separate segments that assemble during protein folding. This unique domain organization may explain the difficulty in accurately aligning Casλ to previously reported enzymes, despite overall structural similarity. This segmented domain composition does not compromise genome editing activity, as shown, e.g., for Casλ-based editing of human, Arabidopsis, and wheat cells. The finding that Casλ can induce efficient genome editing of endogenous genes in these diverse cell types, in some cases exceeding the efficacy of Cas12a-mediated genome editing, shows that there is not necessarily a tradeoff between Cas effector size and function. This result, together with the compact size of phage-encoded CRISPR-Cas proteins that is advantageous for vector-based cellular delivery (FIG. 20), shows that nature’s phage reservoir is an important future source of enzymes useful for genome editing in heterologous cell types.
[0585] Overall, the discovery of thousands of viruses encoding CRISPR systems representing all six CRISPR-Cas types highlights the sparsity but broad diversity of RNA-guided systems within viruses. Genome-resolved metagenomics and bioinformatics-enabled phylogenetic insights facilitated the analysis of these systems from uncultivated viruses and inference of their mechanisms of action within their biological contexts. Hundreds of novel hypercompact and divergent CRISPR-Cas systems were investigated here, with particular focus on the unique Casλ family. The utility of Casλ as a valuable tool for genome editing in plant and human cells is reported here. In addition, the data show how the structural compaction of this protein family preserves robust biochemical and cell-based functionality essential to both natural activities and biotechnological applications.
Experimental model and subject details
Mammalian models
[0586] Mammalian gene-editing experiments were performed in HEK293T cells obtained from the University of California Berkeley Cell Culture Facility. HEK293T cells were female in origin and grown in DMEM media (Corning) containing 10% fetal bovine serum (VWR) and 100 U/mL of penicillin-streptomycin (Gibco) at 37°C with 5% CO2.
Plant models
[0587] PDS3 gene editing was tested in A. thaliana protoplasts isolated from the leaves of 4-week-old plants. Following RNP screening experiments, protoplasts were incubated in W5 solution (4 mM MES pH 5.7, 0.5 M mannitol, 20 mM KCl) at RT for 12 h, then moved to 37°C for 2.5 h, followed by a final incubation at room temperature for 48 h.
[0588] Additional experiments in wheat (Triticum aestivum L. cv. Fielder) were performed using protoplasts extracted from leaves of 2-week-old plantlets. Edited cells were incubated in darkness with W5 solution at 30°C for 24 h.
Method details
Phylogenetic analysis
[0589] For analysis of publicly available phage genomes, Genbank-recorded phages, complete RefSeq-recorded phages, and IMG-VR-recorded phages were analyzed. Cas protein sequences and representatives from the TnpB superfamily were collected from the literature. The resulting set was clustered at 90% amino acid identity to reduce redundancy. A new alignment of Casλ with the resulting sequence set was generated using MAFFT with 1000 iterations and filtered to remove columns composed of gaps in 95% of sequences. The phylogenetic tree was inferred using IQTREE v1.6.6 with automatic model selection and 1000 bootstraps.
crRNA sequence analysis
[0590] CRISPR-RNA (crRNA) repeats from Phage-encoded CRISPR loci were identified using MinCED (github.com/ctSkennerton/minced). The repeats were compared by generating pairwise similarity scores using the Needleman-Wunsch algorithm. A heatmap was built using the similarity score matrix and hierarchical clustering produced dendrograms that were overlaid onto the heatmap to delineate different clusters of repeats. The RNA structures were predicted with ViennaRNA.41
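The pairwise comparison of repeats described above can be illustrated with a minimal Needleman-Wunsch global alignment score. This is a sketch under stated assumptions: the scoring parameters (match +1, mismatch -1, linear gap penalty -1) and the function names are hypothetical, since the scoring scheme actually used is not specified here.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of sequences a and b with a linear gap penalty.

    Scoring values are illustrative assumptions, not those of the source.
    """
    # prev holds the previous row of the dynamic-programming matrix.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

def similarity_matrix(repeats):
    """All-against-all score matrix, suitable as input to clustering or a heatmap."""
    return [[needleman_wunsch_score(x, y) for y in repeats] for x in repeats]

# Toy sequences, not real repeats.
matrix = similarity_matrix(["ACGT", "ACT", "TTTT"])
```

The resulting symmetric matrix is the kind of input that hierarchical clustering and heatmap plotting would then consume.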
PAM depletion analysis
[0591] PAM depletion assays were performed with plasmids containing the casλ protein coding sequence as derived from metagenomics and a mini CRISPR targeting guide (pBAS18), or with plasmids that contained only the casλ gene and a non-targeting guide (pBAS12). Assays were performed as three individual biological replicates. Plasmids containing casλ and mini CRISPRs were transformed into E. coli BL21(DE3) (NEB). Subsequently, electrocompetent cells were prepared by washing with ice-cold H2O and 10% glycerol. A plasmid library was constructed with 8 randomized nucleotides upstream of the (5') end of the target sequence. Competent cells were transformed in triplicate by electroporation with 200 ng library plasmids (0.1 mm electroporation cuvettes (Bio-Rad) on a Micropulser electroporator (Bio-Rad)). After a 2 h recovery period, cells were plated on selective media and colony-forming units were determined to ensure appropriate coverage of all possible combinations of the randomized 5' PAM region. Strains were grown at 25°C for 48 h on media containing appropriate antibiotics (either 100 µg/mL carbenicillin and 34 µg/mL chloramphenicol, or 100 µg/mL carbenicillin and 50 µg/mL kanamycin) and 0.05 mM isopropyl-β-D-thiogalactopyranoside (IPTG), or 200 nM anhydrotetracycline (aTc), depending on the vector, to ensure propagation of plasmids and Casλ effector production. Subsequently, propagated plasmids were isolated using a QIAprep Spin Miniprep Kit (Qiagen).
PAM depletion sequencing analysis
[0592] Amplicon sequencing of the targeted plasmid was used to identify PAM motifs that are preferentially depleted. Sequencing reads were mapped to the respective plasmids and PAM randomized regions were extracted. The abundance of each possible 8 nucleotide combination was counted from the aligned reads and normalized to the total reads for each sample. Enriched PAMs were computed by calculating the log ratio compared to the abundance in the control plasmids, and were used to produce sequence logos.
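The counting and normalization steps above can be sketched as follows. This is an illustrative example only: the function name, the base-2 logarithm, the sign convention (positive values indicating depletion in the targeting sample), and the pseudocount used to avoid division by zero are all assumptions not stated in the source, and the toy inputs use 3-nt strings rather than the 8-nt randomized region.

```python
import math
from collections import Counter

def pam_log2_ratios(targeting_pams, control_pams, pseudocount=1.0):
    """log2(control frequency / targeting frequency) for each observed PAM.

    Positive values indicate PAMs depleted in the targeting sample.
    The pseudocount is an assumed smoothing choice.
    """
    t, c = Counter(targeting_pams), Counter(control_pams)
    t_total, c_total = sum(t.values()), sum(c.values())
    pams = set(t) | set(c)
    n = len(pams)
    ratios = {}
    for pam in pams:
        # Normalize each count to the total reads of its own sample.
        t_freq = (t[pam] + pseudocount) / (t_total + pseudocount * n)
        c_freq = (c[pam] + pseudocount) / (c_total + pseudocount * n)
        ratios[pam] = math.log2(c_freq / t_freq)
    return ratios

# Toy read-derived PAM lists: "TTA" is depleted under targeting, "CCC" is not.
control = ["TTA"] * 10 + ["CCC"] * 10
targeting = ["TTA"] * 1 + ["CCC"] * 19
ratios = pam_log2_ratios(targeting, control)
```

PAMs passing a chosen depletion threshold could then be passed to a sequence-logo tool.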
Programmable DNA targeting
[0593] A flp recombination assay was performed in E. coli to eliminate the kanamycin resistance cassette from E. coli strains that contain GFP and RFP expression cassettes integrated into the genome. Individual colonies of the E. coliMAm were picked to inoculate three 5 mL (LB) starter cultures to prepare electrocompetent cells the following day. 100 mL (LB) main cultures were inoculated from the starter cultures and grown with vigorous shaking at 37°C to an OD600 of 0.6-0.7 before preparation of electrocompetent cells by repeated washes with ice-cold H2O and 10% glycerol. Cells were resuspended in 10% glycerol, and 50 µL aliquots were flash-frozen in liquid nitrogen and stored at -80°C. Casλ vectors containing the codon-optimized casλ1 gene and a guide comprising its cognate repeat element and selected spacers targeting the GFP DNA within the resulting E. coliMAon strain (pBAS41, pBAS42, pBAS43, pBAS44) were subcloned from pBAS12. Casλ vectors containing Casλ1 and a guide composed of a non-cognate repeat unit from casλ2 and a GFP-targeting spacer (TAGCATCACCTTCACCCTCTCCACGGACAG) (SEQ ID NO: 158) were also subcloned to form pBAS40. The Casλ vectors and a control plasmid carrying Casλ with a non-targeting guide were each transformed (100 ng of plasmid) into 25 µL of electrocompetent cells by electroporation in 0.1 mm electroporation cuvettes (Bio-Rad) on a Micropulser electroporator (Bio-Rad). Cells were recovered in 1 mL recovery medium (Lucigen) with shaking at 37°C for 1 h. 10-fold dilution series were then prepared, and 3.5 µL of the respective dilutions were spot-plated on LB-agar containing the appropriate antibiotics and IPTG inducer. Plates were incubated overnight at 37°C and colonies were counted the following day to determine the transformation efficiency. To assess the transformation efficiency, the mean and standard deviation were calculated from the colony-forming units per ng of transformed plasmid for the electroporation triplicates.
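The transformation-efficiency calculation can be illustrated with a short worked example. This is a sketch only: the function name and the colony counts are hypothetical, and the recovery culture is assumed to be approximately 1000 µL (the 25 µL of cells is neglected). The 3.5 µL spot volume and 100 ng plasmid input are taken from the procedure above.

```python
from statistics import mean, stdev

def cfu_per_ng(colonies, dilution_factor, plated_ul=3.5,
               recovery_ul=1000.0, plasmid_ng=100.0):
    """Scale a spot count back to total CFU in the recovery culture, per ng plasmid.

    recovery_ul is an assumed approximation of the recovery volume.
    """
    total_cfu = colonies * dilution_factor * (recovery_ul / plated_ul)
    return total_cfu / plasmid_ng

# Hypothetical colony counts at the 10^3-fold dilution for three electroporations.
replicates = [cfu_per_ng(n, 1e3) for n in (14, 21, 7)]
summary = (mean(replicates), stdev(replicates))  # sample standard deviation
print(summary)
```

Comparing these values between targeting and non-targeting guides gives the fold reduction in colony-forming units.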
The experiment showed marked reduction of GFP E. coli using Casλ vectors with their cognate guides (pBAS44) in comparison to the non-targeting control, indicating a dsDNA break at the target region. The growth of primarily RFP-positive/GFP-negative colonies under blue light further supports the ability to confer targeted programmable genome editing to result in strains lacking GFP production. Growth inhibition using Casλ vectors with guides from a separate Casλ ortholog (pBAS40), with colonies observed expressing primarily RFP and no GFP, also indicates that Casλ orthologs may function using guides from related CRISPR-Cas systems to confer editing in cells, with a precise ablation of GFP production. This can be further expanded to HEK293T mammalian cells with integrated GFP, which would indicate activity in mammalian cells. The sickly phenotype of E. coli colonies that have grown in both cases, even in undiluted samples, is also indicative of possible trans-cleavage of nucleic acids (RNA or DNA), which can be used for diagnostic purposes by providing a sample containing the target nucleic acid with the Casλ RNP and a ssDNA fluorophore-quencher (ssDNA-FQ) reporter or ssRNA fluorophore-quencher (ssRNA-FQ) reporter molecule, generating a strong fluorescence signal in the presence of the target nucleic acid compared to a markedly lower fluorescence signal in its absence.
Protein purification
[0594] Casλ overexpression vectors containing a His-Tag were transformed into chemically competent E. coli BL21(DE3)-Star (QB3-Macrolab, UC Berkeley) and incubated overnight at 37°C on LB-Kan agar plates (50 µg/mL kanamycin). Single colonies were picked to inoculate 50 mL (LB, kanamycin 50 µg/mL) starter cultures, which were incubated at 37°C with vigorous shaking overnight. The following day, 1.5 L TB-Kan media (50 µg/mL kanamycin) were inoculated with 40 mL starter culture and grown at 37°C to an OD600 of 0.6, cooled down on ice, and gene expression was subsequently induced with 0.5 mM IPTG followed by incubation overnight at 16°C. The cells were harvested by centrifugation and resuspended in wash buffer (50 mM HEPES-Na pH 7.5 RT, 500 mM NaCl, 20 mM imidazole, 5% glycerol, and 0.5 mM TCEP), and then lysed by sonication. The soluble fraction was loaded on a 5 mL Ni-NTA Superflow Cartridge (Qiagen) which had been pre-equilibrated in the same wash buffer. Bound proteins were washed with 20 column volumes (CV) wash buffer and subsequently eluted in 5 CV elution buffer (50 mM HEPES-Na pH 7.5 RT, 500 mM NaCl, 500 mM imidazole, 5% glycerol, and 0.5 mM TCEP). The eluted proteins were concentrated to 1 mL before injection into a HiLoad 16/600 Superdex 200pg column (GE Healthcare) pre-equilibrated in size-exclusion chromatography buffer (20 mM HEPES-Na pH 7.5 RT, 500 mM NaCl, 5% glycerol, and 0.5 mM TCEP). Peak fractions were concentrated to 1 mL and concentrations were determined using a NanoDrop 8000 Spectrophotometer (Thermo Scientific). Proteins were purified at a constant temperature of 4°C and concentrated proteins were kept on ice to prevent aggregation, snap-frozen in liquid nitrogen, and stored at -80°C. SDS-PAGE gel electrophoresis of Casλ at varying stages of protein purification showed a protein size in line with computationally predicted values of 85 kDa.
Pre-crRNA processing assays
[0595] The reactions were carried out in RNA cleavage buffer containing 20 mM Tris-Cl (pH 7.5 at 37°C), 150 mM KCl, 5 mM MgCl2, 1 mM TCEP, and 5% (v/v) glycerol. Pre-crRNA substrates were 5'-radiolabeled with T4 PNK (NEB) in the presence of gamma 32P-ATP. In a typical pre-crRNA processing reaction, the concentrations of Casλ and 32P-labeled pre-crRNA substrates were 100 and 3 nM, respectively. Reactions were incubated at 37°C, and an aliquot of each reaction was quenched with 2x Quench Buffer (94% (v/v) formamide, 30 mM EDTA, 400 µg/mL heparin, 0.2% SDS, and 0.025% (w/v) bromophenol blue) at 0, 1, 5, 15, 30, and 60 min. RNA hydrolysis ladders were prepared by incubating RNA probes in 1X RNA Alkaline Hydrolysis Buffer (Invitrogen) at 95°C before the addition of 2x Quench Buffer. Quenched reactions were incubated at 95°C for 3 min, and products were then resolved by denaturing PAGE (10% or 20% acrylamide:bis-acrylamide 19:1, 7 M urea, 1X TBE). Gels were dried (3 h, 80°C) on a Model 583 Gel Dryer (Bio-Rad) and exposed to a phosphor screen. Phosphor screens were imaged on an Amersham Typhoon phosphorimager (GE Healthcare). For assays in an EDTA-containing buffer, 25 mM EDTA was substituted for 5 mM MgCl2.
In vitro cleavage assays - Radiolabeled nucleic acids
[0596] crRNA oligonucleotides were synthesized by IDT and dissolved in DEPC-treated ddH2O to a concentration of 0.5 mM. Subsequently, the crRNA was heated to 65°C for 3 min and allowed to cool down to room temperature. Casλ RNP complexes were reconstituted at a concentration of 10 µM by incubation of 10 µM Casλ and 12 µM crRNA for 10 min at RT in 2x cleavage buffer (20 mM Hepes-Na pH 7.5, 300 mM KCl, 10 mM MgCl2, 20% glycerol, 1 mM TCEP). RNPs were aliquoted to a volume of 10 µL, flash-frozen in liquid nitrogen, and stored at -80°C. RNP aliquots were thawed on ice before experimental use. Substrates were 5'-end-labelled using T4-PNK (NEB) in the presence of 32P-γ-ATP. Oligonucleotide-duplex targets were generated by combining 32P-labelled and unlabelled complementary oligonucleotides in a 1:1.5 molar ratio. Oligos were hybridized to a DNA-duplex concentration of 50 nM in hybridization buffer (10 mM Hepes-Na pH 7.5 RT, 150 mM NaCl), by heating for 5 min to 95°C and a slow cool down to RT in a heating block. Cleavage reactions were initiated by combining 200 nM RNP with 2 nM substrate in cleavage buffer and subsequently incubated at 37°C. Reactions were stopped by the addition of two volumes of formamide loading buffer (96% formamide, 100 µg/mL bromophenol blue, 50 µg/mL xylene cyanol, 10 mM EDTA, 50 µg/mL heparin), heated to 95°C for 5 min, and cooled down on ice before separation on a 12.5% denaturing urea-PAGE. Gels were dried for 4 h at 80°C before phosphor-imaging visualization using an Amersham Typhoon scanner, v2.0.0.6 firmware version 208 (GE Healthcare). Bands were quantified using ImageQuant TL 8.1 (Cytiva) and the cleaved fraction was calculated as the product intensity sum divided by the combined substrate and product intensity sum. Curves were fitted to a One-Phase-Decay model to derive the rate of cleavage.
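The quantification described above can be sketched in a few lines. This is an illustrative example under stated assumptions: the single-exponential form f(t) = plateau x (1 - exp(-k t)), starting from zero cleavage, is assumed as the one-phase model; the linearized least-squares estimate stands in for the nonlinear curve fit performed by the analysis software; and all function names and intensity values are hypothetical.

```python
import math

def cleaved_fraction(substrate, products):
    """Product intensity sum divided by the combined substrate and product sum."""
    product_sum = sum(products)
    return product_sum / (substrate + product_sum)

def one_phase(t, k, plateau=1.0):
    """Assumed one-phase model: fraction cleaved at time t, starting from zero."""
    return plateau * (1.0 - math.exp(-k * t))

def estimate_rate(times, fractions, plateau=1.0):
    """Estimate k from the slope (through the origin) of -ln(1 - f/plateau) vs t.

    A simple linearization, used here in place of a nonlinear fit.
    """
    pts = [(t, -math.log(1.0 - f / plateau))
           for t, f in zip(times, fractions) if t > 0 and f < plateau]
    return sum(t * y for t, y in pts) / sum(t * t for t, _ in pts)

# Hypothetical band intensities: one substrate band and two product bands.
frac = cleaved_fraction(75.0, [20.0, 5.0])

# Synthetic time course generated from the model with k = 0.2 per min.
times = [1.0, 5.0, 15.0]
k_est = estimate_rate(times, [one_phase(t, 0.2) for t in times])
```

With real data, a nonlinear fit that also estimates the plateau would generally be preferable to this linearization.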
Fluorophore quencher and DNA mismatch tolerance assay
[0597] DNA oligo activators were ordered from IDT to contain mismatches at each respective position (A→C, T→G, C→A, G→T). Casλ RNPs were prepared as described above. Reactions were started by combining 100 nM RNP (100 nM Casλ, 120 nM crRNA) and 100 nM DNase Alert (IDT) FQ probe, with and without activator ssDNA and with the addition of a non-targeting guide or activator control, in cleavage buffer in a 384-well flat-bottom black polystyrene assay plate (#3820, Corning). Three replicates for each reaction were monitored (λex: 530 nm; λem: 590 nm) in a Cytation 5 plate reader (BioTek, software Gen v3.04) at 37°C every 1.5 min for the activator titration experiment. For the FQ-mismatch assay, 2 nM activator oligonucleotides were used in singlicate. The data were background-subtracted using the mean values of the measurements taken for three no-activator controls at the respective time point.
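The background subtraction described above amounts to a per-time-point operation, sketched below. The function name and the fluorescence readings are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean

def background_subtract(sample_trace, control_traces):
    """Subtract, at each time point, the mean of the no-activator control readings."""
    # zip(*control_traces) groups the control readings by time point.
    background = [mean(vals) for vals in zip(*control_traces)]
    return [s - b for s, b in zip(sample_trace, background)]

# Hypothetical readings at three consecutive time points.
sample = [100, 150, 220]
controls = [[10, 12, 14], [12, 12, 16], [14, 12, 18]]
corrected = background_subtract(sample, controls)
```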
I-X PAM binding assay
[0598] The PAM binding assay was conducted using NEB 5-alpha Competent E. coli cells. Plasmids containing the type I-X system included a targeting or non-targeting guide downstream of T7 promoters. PAM library plasmids contained sfGFP under the control of an araBAD promoter; downstream of the promoter was a six-nucleotide variable region of potential PAM sequences, such that a successful PAM binding event results in loss of sfGFP fluorescence. All cultures used 2xYT media and were supplemented with kanamycin and ampicillin as needed for plasmid maintenance. Cell densities were maintained at greater than 100x library coverage throughout the assay.
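For orientation, the library size implied by a six-nucleotide variable region and the minimum cell number corresponding to 100x coverage can be computed directly. The function name below is hypothetical; the calculation assumes all four nucleotides are equally possible at each randomized position.

```python
def cells_for_coverage(variable_nt=6, coverage=100):
    """Return (number of library variants, minimum cells for the given coverage)."""
    variants = 4 ** variable_nt  # four possible bases per randomized position
    return variants, variants * coverage

variants, min_cells = cells_for_coverage()
print(variants, min_cells)
```

The same arithmetic applied to an 8-nt randomized region gives 65,536 variants.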
[0599] Transformations of type I-X systems with guide and library plasmids were conducted consecutively. Type I-X systems with guides were transformed into NEB 5-alpha Competent E. coli cells following the manufacturer’s instructions. Individual colonies were incubated at 37°C overnight at 250 RPM. Non-transformed cultures were included for library-only and no-plasmid controls. Cells were back-diluted 1000x and cultured to ABS600 ~0.6, pelleted, washed 3 times with sterile water, and resuspended in 10% glycerol to make them electrocompetent. Type I-X systems with guide, and non-transformed cultures, were electroporated at 1800 V with 100 ng of PAM library stock and recovered for 1 h in SOC media. Recovered cells were plated with appropriate antibiotics and incubated overnight. Plates were scraped, resuspended, and incubated at 37°C and 250 RPM for 3 h. 25% glycerol stocks were stored at -80°C.
[0600] In preparation for fluorescence-activated cell sorting, stocks were back-diluted 100x and cultured to ABS600 ~1 OD. To induce proteins and guides, cultures were back-diluted 1000x and supplemented with 0.5 mM IPTG and 1% arabinose. Strains were cultured overnight at 30°C until targeting and non-targeting strains reached an ABS600 ~1 OD. Prior to sorting on the Sony SH800 Cell Sorter, cells were pelleted and resuspended in PBS to 0.6 OD. The non-targeting strain was used to set forward scatter (Gate 1) and singlets (Gate 2) gates and to detect any decrease in fluorescence in the targeting strain. For the targeting strain, at least 270,000 events were sorted for the lowest ~0.1% of the fluorescent cell population (Gate 3) and 500,000 events for the next ~0.13 to 1.23% lowest fluorescent cells (Gate 4). Sorted cells were grown overnight on plates containing appropriate antibiotics along with non-targeting, library-only, and no-plasmid controls.
[0601] Targeting, non-targeting, and library-only strains were individually prepared for next-generation sequencing by first purifying plasmid DNA using a Qiagen HiSpeed Plasmid Maxi Kit. Plates were gently scraped and resuspended in ~50 mL 2xYT prior to pelleting. Concentrations were determined with a Nanodrop. In conjunction with the original naive PAM library stock control, PAM sequences were amplified using primers containing the 5' stub sequence GCTCTTCCGATCT (SEQ ID NO: 159). Samples were submitted to the Innovative Genomics Core for completion of library preparation and iSeq sequencing at greater than 100x library coverage.
Mammalian genome editing
[0602] RNPs were formed in SF nucleofection buffer (Lonza) with 100 pmol protein and 120 pmol crRNA at 10 µM concentration for 10 min at RT. 78 pmol (1 µL) of IDT Cas12a electroporation enhancer was then added. HEK293T cells (University of California Berkeley Cell Culture Facility) were added in 10 µL SF nucleofection buffer at 200,000 cells per nucleofection. 21 µL reactions were loaded into cuvettes and electroporated with pulse code DS-150 in a 4D-nucleofector (Lonza). Nucleofections were performed in triplicate for each guide RNA tested. Cells were grown in duplicate in DMEM media (Corning) containing 10% fetal bovine serum (VWR) and 100 U/mL of penicillin-streptomycin (Gibco) from each nucleofection in 24-well plates at 37°C with 5% CO2. gDNA was collected after 72 h in Quick Extract (Lucigen) by heating at 65°C for 20 min followed by 95°C for 20 min. PCR1 was performed, followed by bead clean-up to remove primers, and samples were submitted for PCR2, bead clean-up, and iSeq (Illumina) sequencing at the IGI Center for Translational Genomics. Approximately 20,000 reads per sample (2 x 150 bp) were analyzed for genome editing using CRISPResso2 (https://crispresso.pinellolab.partners.org/login).
Plant genome editing
[0603] Guides were designed to target the PDS3 gene in A. thaliana protoplasts and incubated with protein as described for in vitro assays, and 26 µL of 4 µM RNP was transfected into Arabidopsis protoplasts as previously described.
[0604] Wheat (T. aestivum L. cv. Fielder) seedlings were grown in darkness on wet filter paper, wherein every third day seedlings were exposed to 6 h low light (~100 µE m⁻² s⁻¹). Protoplasts from 2-week-old plantlets were isolated from leaf tissue as previously described. Leaves were cut into 0.5 mm strips perpendicular to the leaf midrib in 0.6 M mannitol solution, then placed in freshly prepared enzyme solution (20 mM MES pH 5.7, 0.6 M mannitol, 10 mM KCl, 1.5% cellulase R10, 0.75% macerozyme R10). Leaf strips in solution were vacuum infiltrated for 30 min in darkness and then incubated for 6 h shaking at 70 rpm. After the incubation, the enzyme/protoplast solution was diluted with an equal volume of W5 solution (2 mM MES pH 5.7, 154 mM NaCl, 125 mM CaCl2, 5 mM KCl) and filtered through 40 µm cell strainers. Protoplasts were spun down at 80g for 3 min, then resuspended in 15 mL W5 solution and left to aggregate on ice for 60 min. Supernatant was removed and protoplasts were resuspended in MMG solution (4 mM MES pH 5.7, 0.4 M mannitol, 15 mM MgCl2) at 2.5 × 10⁵ cells/mL. Casλ RNP complexes were reconstituted with 6 µM Casλ protein, purified as described, and 10 µM guide RNA assembled in RNP reconstitution buffer (20 mM Hepes-Na pH 7.5, 300 mM KCl, 10 mM MgCl2, 20% glycerol, 1 mM TCEP) and incubated for 20 min at 37°C. 25 µL of 6 µM assembled RNP were added to a 1.5 mL tube, then mixed with 200 µL protoplasts. After flicking to mix, 220 µL PEG-CaCl2 solution (40% PEG 4000, 0.2 M mannitol, 100 mM CaCl2) was added to tubes, and samples were mixed thoroughly by slowly inverting the Cryo-EM tube until streaks of PEG were no longer visible. Samples were incubated for 15 min in darkness at RT, then 880 µL of W5 solution was added and mixed by inverting. Protoplasts were harvested by centrifugation at 80g for 3 min, resuspended in 1.2 mL W5 solution (4 mM MES pH 5.7, 0.5 M mannitol, 20 mM KCl), and plated into 12-well plates.
Plate edges were sealed with parafilm and cells were incubated for 24 h in darkness at 30°C. At the end of the incubation period, protoplasts were collected in 1.5 mL tubes, pelleted at 8000 g, and flash frozen. gDNA was extracted from protoplasts using 2X CTAB (ref. 47) and suspended in 50 µL DEPC-treated H2O. 150-200 bp amplicons for amplicon sequencing were obtained using 30 cycles of Q5 High Fidelity DNA Polymerase, cleaned using the Zymo DNA Clean and Concentrator kit, then subsequently amplified using 15 cycles of PrimeSTAR GXL DNA Polymerase to add the requisite indices for sequencing. Amplicons were sequenced via paired-end 150 bp amplicon sequencing using an Illumina iSeq 100.
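The quantities in the wheat protoplast transfection can be worked out from the stated volumes and concentrations. The sketch below is illustrative only; all names are hypothetical, and it assumes the volumes are simply additive when the RNP, protoplasts, and PEG-CaCl2 solution are mixed.

```python
# Stated inputs (volumes in µL, concentration in µM, density in cells/mL)
rnp_ul, rnp_um = 25, 6
protoplast_ul, cells_per_ml = 200, 2.5e5
peg_ul, peg_stock_pct = 220, 40

# 1 µM equals 1 pmol/µL, so volume times concentration gives pmol of RNP
rnp_pmol = rnp_ul * rnp_um

# Number of protoplasts in the aliquot (µL converted to mL)
cells = protoplast_ul * cells_per_ml / 1000

# Final PEG percentage after mixing, assuming additive volumes
mix_ul = rnp_ul + protoplast_ul + peg_ul
peg_final_pct = peg_stock_pct * peg_ul / mix_ul

# Volume after the W5 addition that ends the PEG incubation
diluted_ul = mix_ul + 880

print(rnp_pmol, cells, round(peg_final_pct, 1), diluted_ul)
```

With these inputs each transfection delivers 150 pmol of RNP to about 50,000 protoplasts at roughly 20% PEG, and the W5 addition brings the volume to 1325 µL.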
Ternary complex reconstitution for cryo-EM
[0605] Casλ was produced as described above. crRNA (rBAS80, ssRNA oligo, AUUGUUGUAACUCUUAUUUUGUAUGGAGUAAACAACUAGCAUCACCUUCACC) (SEQ ID NO: 160) was ordered as a synthetic RNA oligonucleotide from IDT and dissolved in DEPC-treated ddH2O to a concentration of 0.5 mM. Subsequently, the crRNA was heated to 65°C for 3 min and cooled down to RT to allow for hairpin formation. DNA oligonucleotides (dBAS608, ssDNA oligo, CTTGACCGTTTGATCGTAGTGGAAGTGGGAGATAGTAATGTTAATG (SEQ ID NO: 161); and dBAS609, ssDNA oligo, CATTAACATTACTAAGAGGGTGAAGGTGATGCTACAAACGGTCAAG (SEQ ID NO: 162)) were designed to contain a non-complementary protospacer segment to produce 'bubbled' substrates and facilitate rapid R-loop formation during ternary complex reconstitution.
Oligonucleotides were synthesized by IDT. DNA oligonucleotides were combined in a 1:1.2 molar ratio (target strand:non-target strand) and annealed to form a DNA duplex in hybridization buffer (10 mM Hepes-Na pH 7.5 RT, 150 mM NaCl) by heating for 5 min at 95°C and a subsequent slow cool down in a thermocycler.
[0606] Prior to reconstitution, thawed Casλ protein was incubated with crRNA in a 1:1.1 ratio for 10 min at room temperature, and the DNA duplex was added. The ternary complex was reconstituted with a final Casλ:crRNA:TS:NTS stoichiometry of 1:1.1:1.2:1.4 for another 10 min at RT, and then injected into a Superdex 200 prep grade 10/300 column (GE Healthcare) pre-equilibrated in low salt buffer (10 mM Hepes-Na pH 7.5, 150 mM NaCl) at 4°C to separate complexes from excess nucleic acids. Peak fractions were pooled and concentrated down to ~20 µM with a centrifugal filter device (Millipore 10 kDa Mw cutoff), as measured by absorbance at 260 nm with a NanoDrop 8000 Spectrophotometer (Thermo Scientific), and kept on ice before plunge-freezing.
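The stated protein:crRNA:TS:NTS stoichiometry can be turned into component amounts for any chosen protein input. This is a minimal sketch under that assumption only; the function name and the example protein amount are hypothetical and are not taken from the description.

```python
# Molar ratios relative to protein, as stated for the ternary complex
RATIO = {"protein": 1.0, "crRNA": 1.1, "TS": 1.2, "NTS": 1.4}

def component_amounts(protein_nmol):
    """Scale the stated ratios to a given protein amount (nmol)."""
    return {name: protein_nmol * ratio for name, ratio in RATIO.items()}

# Hypothetical example input: 10 nmol of protein
amounts = component_amounts(10)
for name, nmol in amounts.items():
    print(f"{name}: {nmol:.1f} nmol")
```

The excess of each nucleic acid over protein is what the size-exclusion step is then used to remove.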
Electron microscopy grid preparation and data collection
[0607] The resulting sample was frozen using an FEI Vitrobot Mark IV, cooled to 8°C at 100% humidity. 1.2/1.3 300 mesh UltrAuFoil gold grids (Electron Microscopy Sciences #Q350AR13A) were glow discharged at 15 mA for 25 s using PELCO easyGLOW. A total volume of 4 µL sample was applied to the grid and immediately blotted for 5 s with a blot force of 8 units. Micrographs were collected on a Talos Arctica operated at 200 kV and ×36,000 magnification (1.115 Å pixel size), in the super-resolution setting of the K3 Direct Electron Detector. Cryo-EM data was collected using SerialEM v.3.8.7 software. Images were obtained in a series of exposures generated by the microscope stage and beam shifts.
Single-particle cryo-EM data processing and 3D volume reconstruction
[0608] In total, 2795 movies were collected with a defocus range of -0.8 to -2.2 µm and a 20° tilt. Data processing was further performed in cryoSPARC v3.2.0 (ref. 40). Movies were corrected for beam-induced motion using patch motion, and CTF parameters were calculated using patch CTF. Two rounds of Topaz training were applied to the data to enrich the number of Casλ ternary complex particles picked, as follows. In the first round, as a result of initial curation, a subset of 562 micrographs with seemingly the best ice quality and CTF fit was selected. Further, 3931 particles were manually picked and submitted to Topaz particle training. The resulting Topaz model was used to pick particles from the micrographs, and a total of 153,537 particles were extracted with bin factor 2 and subjected to 2D classification. Following the selection of the best classes, 113,638 particles were used for ab initio reconstruction with three classes. The 55,587 particles constituting the best class in terms of resolution and resemblance to an RNP were subjected to non-uniform map refinement, and an initial complex map was obtained. In the second round, the latter particles were used to train a new Topaz model. Following the second round of curation, a total of 1931 micrographs were selected, and the new Topaz model was applied to pick and extract the particles. In total, 884,595 particles were subjected to a round of 2D classification. After excluding a minor subset of classes, a total of 874,119 particles were selected and submitted to ab initio reconstruction with three classes. The three resulting maps and all particles were submitted to a round of heterogeneous refinement. Particles constituting the best class in terms of resolution were subjected to the remove duplicates procedure, and further to non-uniform map refinement. As a result, a 2.99 Å map reconstructed from a total of 369,389 particles was obtained. Half-maps from this refinement were used to generate the final LocSpiral map with improved weaker density regions (ref. 34). This map was further used for model building.
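The particle counts reported for the two Topaz rounds can be summarized as retention fractions at each step. The sketch below is illustrative only and uses the counts from the paragraph above; the stage labels and function name are hypothetical.

```python
# Particle counts per stage, as reported for each round of picking
ROUND_1 = [
    ("extracted", 153_537),
    ("after 2D classification", 113_638),
    ("best ab initio class", 55_587),
]
ROUND_2 = [
    ("extracted", 884_595),
    ("after 2D classification", 874_119),  # assumed reading of the printed count
    ("final non-uniform refinement", 369_389),
]

def retention(stages):
    """Return (label, count, fraction of the first stage) for each stage."""
    first = stages[0][1]
    return [(label, count, count / first) for label, count in stages]

for name, stages in (("round 1", ROUND_1), ("round 2", ROUND_2)):
    for label, count, frac in retention(stages):
        print(f"{name}: {label}: {count} ({frac:.1%})")
```

On these numbers, somewhat over a third of the first-round picks and around two fifths of the second-round picks end up in the refined class.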
Model building and refinement
[0609] The initial model of the Casλ protein was obtained with the AlphaFold2 program. The predicted model was split into two parts (eventually constituting the REC and Nuc lobes), and each was docked independently into the map with the fitmap tool in UCSF ChimeraX v1.2.5 (ref. 38). The dsDNA and crRNA models were built de novo. The combined ternary complex model was refined using the real-space refinement and rigid body fit tools in Coot v0.9.4.1 (ref. 11). Finally, the model was subjected to a round of refinement with the real_space_refine tool in Phenix v1.19.2-4158, using secondary structure, Ramachandran, and rotamer restraints.
Data deposition and figure preparation
[0610] Cryo-EM maps and model coordinates were deposited to the EMDB (code EMD-27320) and PDB (code 8DC2). The structure figures were generated in UCSF ChimeraX v1.2.5 (ref. 38). Cryo-EM map σ levels were calculated as: map level/root-mean-square deviation from zero. The orientation distribution plots were either obtained from cryoSPARC or generated using the pyem csparc2star.py and star2bild.py programs. Map versus model Fourier shell correlation (FSC) graphs were calculated in Mtriage, as implemented in Phenix (ref. 38). The gold standard FSC plot was generated in cryoSPARC.
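The normalization of a map contour level by the root-mean-square deviation from zero, as described above, can be sketched as follows. This is a minimal illustration, assuming the map is available as a flat sequence of voxel values; the function names and the toy values are hypothetical and are not taken from the deposited map.

```python
import math

def rms_from_zero(voxels):
    """Root-mean-square deviation of voxel values from zero."""
    return math.sqrt(sum(v * v for v in voxels) / len(voxels))

def normalized_level(map_level, voxels):
    """Contour level divided by the map's RMS deviation from zero."""
    return map_level / rms_from_zero(voxels)

# Toy voxel values for illustration only
toy_voxels = [3.0, -4.0, 0.0, 0.0]
print(rms_from_zero(toy_voxels))          # 2.5
print(normalized_level(5.0, toy_voxels))  # 2.0
```

Unlike a standard deviation, this RMS is taken about zero rather than about the map mean, so the two values differ whenever the map mean is non-zero.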
Quantification and statistical analysis
[0611] Computational analyses and figures were prepared as described in Method Details. Where applicable, the SEM was graphed to depict variability across replicates.
[0612] While the present invention has been described with reference to the specific embodiments thereof, it should be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the true spirit and scope of the invention. In addition, many modifications may be made to adapt a particular situation, material, composition of matter, process, process step or steps, to the objective, spirit and scope of the present invention. All such modifications are intended to be within the scope of the claims appended hereto.

Claims

What is claimed is:
1. A composition comprising: a) a CRISPR-Cas effector polypeptide, or a nucleic acid comprising a nucleotide sequence encoding the CRISPR-Cas effector polypeptide, wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having at least 50% amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 5A-5M, wherein the CRISPR-Cas effector polypeptide has a length of from 600 amino acids to 850 amino acids; and b) a CRISPR-Cas effector guide RNA, or one or more DNA molecules encoding the CRISPR-Cas effector guide RNA, wherein the CRISPR-Cas effector guide RNA comprises a target nucleic acid binding segment and a CRISPR-Cas effector polypeptide-binding segment, and wherein the target nucleic acid binding segment is heterologous to the CRISPR-Cas effector polypeptide-binding segment.
2. The composition of claim 1, wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having at least 80% amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 5A-5M.
3. The composition of claim 1, wherein the CRISPR-Cas effector guide RNA comprises a nucleotide sequence having 80%, 90%, 95%, 98%, 99%, or 100%, nucleotide sequence identity with any one of the nucleotide sequences depicted in FIG. 5A-5M, or is encoded by a nucleic acid comprising a nucleotide sequence having 80%, 90%, 95%, 98%, 99%, or 100%, nucleotide sequence identity with any one of the nucleotide sequences depicted in FIG. 5A-5M.
4. The composition of any one of claims 1-3, wherein the CRISPR-Cas effector polypeptide is fused to a nuclear localization signal (NLS).
5. The composition of any one of claims 1-4, wherein the composition comprises a lipid.
6. The composition of any one of claims 1-4, wherein a) and b) are within a liposome.
7. The composition of any one of claims 1-4, wherein a) and b) are within a particle.
8. The composition of any one of claims 1-7, comprising one or more of: a buffer, a nuclease inhibitor, and a protease inhibitor.
9. The composition of any one of claims 1-8, wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 96% or more identity to the amino acid sequence depicted in any one of FIG. 5A-5M.
10. The composition of any one of claims 1-9, wherein the CRISPR-Cas effector polypeptide is a nickase that can cleave only one strand of a double-stranded target nucleic acid molecule.
11. The composition of any one of claims 1-9, wherein the CRISPR-Cas effector polypeptide is a catalytically inactive CRISPR-Cas effector polypeptide (dCRISPR-Cas effector).
12. The composition of any one of claims 1-11, wherein the CRISPR-Cas effector polypeptide has a length of from 600 amino acids to 800 amino acids.
13. The composition of any one of claims 1-12, further comprising a DNA donor template.
14. The composition of any one of claims 1-13, wherein the CRISPR-Cas effector guide RNA is a single molecule.
15. The composition of any one of claims 1-14, wherein the CRISPR-Cas effector guide RNA comprises one or more of a base modification, a sugar modification, and a backbone modification.
16. A CRISPR-Cas effector fusion polypeptide comprising: a) a CRISPR-Cas effector polypeptide comprising an amino acid sequence having at least 50% amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 5A-5M, wherein the CRISPR-Cas effector polypeptide comprises a RuvC-like domain, and wherein the CRISPR-Cas effector polypeptide has a length of from 250 amino acids to 500 amino acids; and b) one or more heterologous polypeptides.
17. The CRISPR-Cas effector fusion polypeptide of claim 16, wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 80% or more identity to the amino acid sequence depicted in any one of FIG. 5A-5M.
18. The CRISPR-Cas effector fusion polypeptide of claim 16, wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 90% or more identity to the amino acid sequence depicted in any one of FIG. 5A-5M.
19. The CRISPR-Cas effector fusion polypeptide of any one of claims 16-18, wherein the CRISPR-Cas effector polypeptide is a nickase that can cleave only one strand of a double-stranded target nucleic acid molecule.
20. The CRISPR-Cas effector fusion polypeptide of any one of claims 16-18, wherein the CRISPR-Cas effector polypeptide is a catalytically inactive CRISPR-Cas effector polypeptide (dCRISPR-Cas effector).
21. The CRISPR-Cas effector fusion polypeptide of any one of claims 16-20, wherein the CRISPR-Cas effector polypeptide has a length of from 600 amino acids to 800 amino acids.
22. The CRISPR-Cas effector fusion polypeptide of any one of claims 16-21, wherein the heterologous polypeptide is fused to the N-terminus and/or the C-terminus of the CRISPR-Cas effector polypeptide.
23. The CRISPR-Cas effector fusion polypeptide of any one of claims 16-22, comprising a nuclear localization signal (NLS).
24. The CRISPR-Cas effector fusion polypeptide of any one of claims 16-23, wherein the heterologous polypeptide is a targeting polypeptide that provides for binding to a cell surface moiety on a target cell or target cell type.
25. The CRISPR-Cas effector fusion polypeptide of any one of claims 16-23, wherein the heterologous polypeptide exhibits enzymatic activity.
26. The CRISPR-Cas effector fusion polypeptide of claim 25, wherein the heterologous polypeptide exhibits one or more enzymatic activities selected from: nuclease activity, methyltransferase activity, demethylase activity, DNA repair activity, DNA damage activity, deamination activity, dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer forming activity, integrase activity, transposase activity, recombinase activity, polymerase activity, ligase activity, helicase activity, photolyase activity and glycosylase activity.
27. The CRISPR-Cas effector fusion polypeptide of claim 25, wherein the heterologous polypeptide exhibits one or more enzymatic activities selected from: reverse transcriptase activity, nuclease activity, methyltransferase activity, demethylase activity, deamination activity, depurination activity, integrase activity, transposase activity, and recombinase activity.
28. The CRISPR-Cas effector fusion polypeptide of any one of claims 16-23, wherein the heterologous polypeptide exhibits an enzymatic activity that modifies a target polypeptide associated with a target nucleic acid.
29. The CRISPR-Cas effector fusion polypeptide of claim 28, wherein the heterologous polypeptide exhibits histone modification activity.
30. The CRISPR-Cas effector fusion polypeptide of claim 28 or claim 29, wherein the heterologous polypeptide exhibits one or more enzymatic activities selected from: methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity, demyristoylation activity, glycosylation activity, and deglycosylation activity.
31. The CRISPR-Cas effector fusion polypeptide of claim 30, wherein the heterologous polypeptide exhibits one or more enzymatic activities selected from: methyltransferase activity, demethylase activity, acetyltransferase activity, and deacetylase activity.
32. The CRISPR-Cas effector fusion polypeptide of any one of claims 16-23, wherein the heterologous polypeptide is an endosomal escape polypeptide.
33. The CRISPR-Cas effector fusion polypeptide of any one of claims 16-23, wherein the heterologous polypeptide is a protein that increases or decreases transcription.
34. The CRISPR-Cas effector fusion polypeptide of claim 33, wherein the heterologous polypeptide is a transcriptional repressor domain.
35. The CRISPR-Cas effector fusion polypeptide of claim 33, wherein the heterologous polypeptide is a transcriptional activation domain.
36. The CRISPR-Cas effector fusion polypeptide of any one of claims 16-23, wherein the heterologous polypeptide is a protein binding domain.
37. A nucleic acid comprising a nucleotide sequence encoding the CRISPR-Cas effector fusion polypeptide of any one of claims 16-36.
38. The nucleic acid of claim 37, wherein the nucleotide sequence encoding the CRISPR-Cas effector fusion polypeptide is operably linked to a promoter.
39. The nucleic acid of claim 38, wherein the promoter is functional in an archaeal cell.
40. The nucleic acid of claim 38, wherein the promoter is functional in a eukaryotic cell.
41. The nucleic acid of claim 40, wherein the promoter is functional in one or more of: a plant cell, a fungal cell, an animal cell, cell of an invertebrate, a fly cell, a cell of a vertebrate, a mammalian cell, a primate cell, a non-human primate cell, and a human cell.
42. The nucleic acid of any one of claims 39-41, wherein the promoter is one or more of: a constitutive promoter, an inducible promoter, a cell type-specific promoter, and a tissue-specific promoter.
43. The nucleic acid of any one of claims 38-42, wherein the nucleic acid is a recombinant expression vector.
44. The nucleic acid of claim 43, wherein the recombinant expression vector is a recombinant adeno-associated viral vector, a recombinant retroviral vector, or a recombinant lentiviral vector.
45. The nucleic acid of claim 39, wherein the promoter is functional in a prokaryotic cell.
46. The nucleic acid of claim 38, wherein the nucleic acid is an mRNA.
47. One or more nucleic acids comprising:
(a) a nucleotide sequence encoding a CRISPR-Cas effector guide RNA; and
(b) a nucleotide sequence encoding a CRISPR-Cas effector polypeptide, wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 50% or more amino acid sequence identity to the amino acid sequence set forth in any one of FIG. 5A-5M.
48. The one or more nucleic acids of claim 47, wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 60% or more, or 75% or more, amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 5A-5M.
49. The one or more nucleic acids of claim 47, wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 85% or more amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 5A-5M.
50. The one or more nucleic acids of any one of claims 47-49, wherein the CRISPR-Cas effector guide RNA comprises a nucleotide sequence having 80% or more nucleotide sequence identity with any one of the nucleotide sequences set forth in FIG. 5A-5M; or is encoded by a nucleic acid comprising a nucleotide sequence having 80%, 90%, 95%, 98%, 99%, or 100%, nucleotide sequence identity with any one of the nucleotide sequences depicted in FIG. 5A-5M.
51. The one or more nucleic acids of any one of claims 47-50, wherein the CRISPR-Cas effector polypeptide is fused to a nuclear localization signal (NLS).
52. The one or more nucleic acids of any one of claims 47-51, wherein the nucleotide sequence encoding the CRISPR-Cas effector guide RNA is operably linked to a promoter.
53. The one or more nucleic acids of any one of claims 47-52, wherein the nucleotide sequence encoding the CRISPR-Cas effector polypeptide is operably linked to a promoter.
54. The one or more nucleic acids of claim 52 or claim 53, wherein the promoter operably linked to the nucleotide sequence encoding the CRISPR-Cas effector guide RNA, and/or the promoter operably linked to the nucleotide sequence encoding the CRISPR-Cas effector polypeptide, is functional in a eukaryotic cell.
55. The one or more nucleic acids of claim 54, wherein the promoter is functional in one or more of: a plant cell, a fungal cell, an animal cell, cell of an invertebrate, a fly cell, a cell of a vertebrate, a mammalian cell, a primate cell, a non-human primate cell, and a human cell.
56. The one or more nucleic acids of any one of claims 53-55, wherein the promoter is one or more of: a constitutive promoter, an inducible promoter, a cell type-specific promoter, and a tissue-specific promoter.
57. The one or more nucleic acids of any one of claims 47-56, wherein the one or more nucleic acids is one or more recombinant expression vectors.
58. The one or more nucleic acids of claim 57, wherein the one or more recombinant expression vectors are selected from: one or more adeno-associated viral vectors, one or more recombinant retroviral vectors, or one or more recombinant lentiviral vectors.
59. The one or more nucleic acids of claim 53, wherein the promoter is functional in a prokaryotic cell.
60. A eukaryotic cell comprising one or more of: a) a CRISPR-Cas effector polypeptide, or a nucleic acid comprising a nucleotide sequence encoding the CRISPR-Cas effector polypeptide, wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 50% or more amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 5A-5M; b) a CRISPR-Cas effector fusion polypeptide of any one of claims 16-36, or a nucleic acid comprising a nucleotide sequence encoding the CRISPR-Cas effector fusion polypeptide, and c) a CRISPR-Cas effector guide RNA, or a nucleic acid comprising a nucleotide sequence encoding the CRISPR-Cas effector guide RNA, wherein the CRISPR-Cas effector guide RNA comprises a binding segment that binds to a CRISPR-Cas effector polypeptide that comprises an amino acid sequence having 50% or more amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 5A-5M.
61. The eukaryotic cell of claim 60, comprising the nucleic acid encoding the CRISPR-Cas effector polypeptide, wherein said nucleic acid is integrated into the genomic DNA of the cell.
62. The eukaryotic cell of claim 60 or claim 61, wherein the eukaryotic cell is a plant cell, a mammalian cell, an insect cell, an arachnid cell, a fungal cell, a bird cell, a reptile cell, an amphibian cell, an invertebrate cell, a mouse cell, a rat cell, a primate cell, a non-human primate cell, or a human cell.
63. A cell comprising a CRISPR-Cas effector fusion polypeptide, or a nucleic acid comprising a nucleotide sequence encoding the CRISPR-Cas effector fusion polypeptide.
64. The cell of claim 63, wherein the cell is a prokaryotic cell.
65. The cell of claim 63 or claim 64, comprising the nucleic acid comprising a nucleotide sequence encoding the CRISPR-Cas effector fusion polypeptide, wherein said nucleic acid molecule is integrated into the genomic DNA of the cell.
66. A method of modifying a target nucleic acid, the method comprising contacting the target nucleic acid with: a) a CRISPR-Cas effector polypeptide; and b) a CRISPR-Cas effector guide RNA comprising a guide sequence that hybridizes to a target sequence of the target nucleic acid, wherein said contacting results in modification of the target nucleic acid by the CRISPR-Cas effector polypeptide, wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 50% or more amino acid sequence identity to the amino acid sequence set forth in any one of FIG. 5A-5M.
67. The method of claim 66, wherein said modification is cleavage of the target nucleic acid.
68. The method of claim 66 or claim 67, wherein the target nucleic acid is selected from: double stranded DNA, single stranded DNA, RNA, genomic DNA, and extrachromosomal DNA.
69. The method of any of claims 66-68, wherein said contacting takes place in vitro outside of a cell.
70. The method of any of claims 66-68, wherein said contacting takes place inside of a cell in culture.
71. The method of any of claims 66-68, wherein said contacting takes place inside of a cell in vivo.
72. The method of claim 70 or claim 71, wherein the cell is a eukaryotic cell.
73. The method of claim 72, wherein the cell is selected from: a plant cell, a fungal cell, a mammalian cell, a reptile cell, an insect cell, an avian cell, a fish cell, a parasite cell, an arthropod cell, a cell of an invertebrate, a cell of a vertebrate, a rodent cell, a mouse cell, a rat cell, a primate cell, a non-human primate cell, and a human cell.
74. The method of claim 70 or claim 71, wherein the cell is a prokaryotic cell.
75. The method of any one of claims 66-74, wherein said contacting results in genome editing.
76. The method of any one of claims 66-75, wherein said contacting comprises: introducing into a cell: (a) the CRISPR-Cas effector polypeptide, or a nucleic acid comprising a nucleotide sequence encoding the CRISPR-Cas effector polypeptide, and (b) the CRISPR-Cas effector guide RNA, or a nucleic acid comprising a nucleotide sequence encoding the CRISPR-Cas effector guide RNA.
77. The method of claim 76, wherein said contacting further comprises: introducing a DNA donor template into the cell.
78. The method of any one of claims 66-77, wherein the CRISPR-Cas effector guide RNA comprises a nucleotide sequence having 80% or more nucleotide sequence identity with any one of the nucleotide sequences set forth in FIG. 5A-5M; or is encoded by a nucleic acid comprising a nucleotide sequence having 80%, 90%, 95%, 98%, 99%, or 100%, nucleotide sequence identity with any one of the nucleotide sequences depicted in FIG. 5A-5M.
79. The method of any one of claims 66-78, wherein the CRISPR-Cas effector polypeptide is fused to a nuclear localization signal.
80. A method of modulating transcription from a target DNA, modifying a target nucleic acid, or modifying a protein associated with a target nucleic acid, the method comprising contacting the target nucleic acid with: a) a CRISPR-Cas effector fusion polypeptide comprising a CRISPR-Cas effector polypeptide fused to a heterologous polypeptide; and b) a CRISPR-Cas effector guide RNA comprising a guide sequence that hybridizes to a target sequence of the target nucleic acid, wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 50% or more amino acid sequence identity to the amino acid sequence set forth in any one of FIG. 5A-5M.
81. The method of claim 80, wherein the CRISPR-Cas effector guide RNA comprises a nucleotide sequence having 80% or more nucleotide sequence identity with any one of the crRNA sequences set forth in FIG. 5A-5M; or is encoded by a nucleic acid comprising a nucleotide sequence having 80%, 90%, 95%, 98%, 99%, or 100%, nucleotide sequence identity with any one of the nucleotide sequences depicted in FIG. 5A-5M.
82. The method of claim 80 or claim 81, wherein the CRISPR-Cas effector fusion polypeptide comprises a nuclear localization signal.
83. The method of any of claims 80-82, wherein said modification is not cleavage of the target nucleic acid.
84. The method of any of claims 80-83, wherein the target nucleic acid is selected from: double stranded DNA, single stranded DNA, RNA, genomic DNA, and extrachromosomal DNA.
85. The method of any of claims 80-84, wherein said contacting takes place in vitro outside of a cell.
86. The method of any of claims 80-84, wherein said contacting takes place inside of a cell in culture.
87. The method of any of claims 80-84, wherein said contacting takes place inside of a cell in vivo.
88. The method of claim 86 or claim 87, wherein the cell is a eukaryotic cell.
89. The method of claim 88, wherein the cell is selected from: a plant cell, a fungal cell, a mammalian cell, a reptile cell, an insect cell, an avian cell, a fish cell, a parasite cell, an arthropod cell, a cell of an invertebrate, a cell of a vertebrate, a rodent cell, a mouse cell, a rat cell, a primate cell, a non-human primate cell, and a human cell.
90. The method of claim 86 or claim 87, wherein the cell is a prokaryotic cell.
91. The method of any one of claims 80-90, wherein said contacting comprises: introducing into a cell: (a) the CRISPR-Cas effector fusion polypeptide, or a nucleic acid comprising a nucleotide sequence encoding the CRISPR-Cas effector fusion polypeptide, and (b) the CRISPR-Cas effector guide RNA, or a nucleic acid comprising a nucleotide sequence encoding the CRISPR-Cas effector guide RNA.
92. The method of any one of claims 80-91, wherein the CRISPR-Cas effector polypeptide is a catalytically inactive CRISPR-Cas effector polypeptide (dCRISPR-Cas effector).
93. The method of any one of claims 80-92, wherein the CRISPR-Cas effector polypeptide has a length of from 275 amino acids to 465 amino acids.
94. The method of any one of claims 80-93, wherein the heterologous polypeptide exhibits an enzymatic activity.
95. The method of claim 94, wherein the heterologous polypeptide exhibits one or more enzymatic activities selected from: nuclease activity, methyltransferase activity, demethylase activity, DNA repair activity, DNA damage activity, deamination activity, dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer forming activity, integrase activity, transposase activity, recombinase activity, polymerase activity, ligase activity, helicase activity, photolyase activity and glycosylase activity.
96. The method of claim 94, wherein the heterologous polypeptide exhibits one or more enzymatic activities selected from: reverse transcriptase activity, nuclease activity, methyltransferase activity, demethylase activity, deamination activity, depurination activity, integrase activity, transposase activity, and recombinase activity.
97. The method of any one of claims 80-93, wherein the heterologous polypeptide exhibits an enzymatic activity that modifies a target polypeptide associated with a target nucleic acid.
98. The method of claim 97, wherein the heterologous polypeptide exhibits histone modification activity.
99. The method of claim 97 or claim 98, wherein the heterologous polypeptide exhibits one or more enzymatic activities selected from: methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity, demyristoylation activity, glycosylation activity (e.g., from O-GlcNAc transferase) and deglycosylation activity.
100. The method of claim 99, wherein the heterologous polypeptide exhibits one or more enzymatic activities selected from: methyltransferase activity, demethylase activity, acetyltransferase activity, and deacetylase activity.
101. The method of any one of claims 80-93, wherein the heterologous polypeptide is a protein that increases or decreases transcription.
102. The method of claim 101, wherein the heterologous polypeptide is a transcriptional repressor domain.
103. The method of claim 101, wherein the heterologous polypeptide is a transcriptional activation domain.
104. The method of any one of claims 80-93, wherein the heterologous polypeptide is a protein binding domain.
105. A transgenic, multicellular, non-human organism whose genome comprises a transgene comprising a nucleotide sequence encoding one or more of: a) a CRISPR-Cas effector polypeptide, b) a CRISPR-Cas effector fusion polypeptide, and c) a CRISPR-Cas effector guide RNA, wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 50% or more amino acid sequence identity to the amino acid sequence set forth in any one of FIG. 5A-5M.
106. The transgenic, multicellular, non-human organism of claim 105, wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 80% or more amino acid sequence identity to the amino acid sequence set forth in any one of FIG. 5A-5M.
107. The transgenic, multicellular, non-human organism of claim 105, wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 90% or more amino acid sequence identity to the amino acid sequence set forth in any one of FIG. 5A-5M.
108. The transgenic, multicellular, non-human organism of any one of claims 105-107, wherein the organism is a plant, a monocotyledon plant, a dicotyledon plant, an invertebrate animal, an insect, an arthropod, an arachnid, a parasite, a worm, a cnidarian, a vertebrate animal, a fish, a reptile, an amphibian, an ungulate, a bird, a pig, a horse, a sheep, a rodent, a mouse, a rat, or a non-human primate.
109. A system comprising: a) a CRISPR-Cas effector polypeptide and a CRISPR-Cas effector guide RNA; b) a CRISPR-Cas effector polypeptide, a CRISPR-Cas effector guide RNA, and a DNA donor template; c) a CRISPR-Cas effector fusion polypeptide and a CRISPR-Cas effector guide RNA; d) a CRISPR-Cas effector fusion polypeptide, a CRISPR-Cas effector guide RNA, and a DNA donor template; e) an mRNA encoding a CRISPR-Cas effector polypeptide, and a CRISPR-Cas effector guide RNA;
f) an mRNA encoding a CRISPR-Cas effector polypeptide, a CRISPR-Cas effector guide RNA, and a DNA donor template; g) an mRNA encoding a CRISPR-Cas effector fusion polypeptide, and a CRISPR-Cas effector guide RNA; h) an mRNA encoding a CRISPR-Cas effector fusion polypeptide, a CRISPR-Cas effector guide RNA, and a DNA donor template; i) one or more recombinant expression vectors comprising: i) a nucleotide sequence encoding a CRISPR-Cas effector polypeptide; and ii) a nucleotide sequence encoding a CRISPR-Cas effector guide RNA; j) one or more recombinant expression vectors comprising: i) a nucleotide sequence encoding a CRISPR-Cas effector polypeptide; ii) a nucleotide sequence encoding a CRISPR-Cas effector guide RNA; and iii) a DNA donor template; k) one or more recombinant expression vectors comprising: i) a nucleotide sequence encoding a CRISPR-Cas effector fusion polypeptide; and ii) a nucleotide sequence encoding a CRISPR-Cas effector guide RNA; and l) one or more recombinant expression vectors comprising: i) a nucleotide sequence encoding a CRISPR-Cas effector fusion polypeptide; ii) a nucleotide sequence encoding a CRISPR-Cas effector guide RNA; and a DNA donor template, wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 50% or more amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 5A-5M, and wherein the CRISPR-Cas effector fusion polypeptide is a CRISPR-Cas effector fusion polypeptide of any one of claims 16-36.
110. The CRISPR-Cas effector system of claim 109, wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 80% or more amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 5A-5M.
111. The CRISPR-Cas effector system of claim 109, wherein the CRISPR-Cas effector polypeptide comprises an amino acid sequence having 90% or more amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 5A-5M.
112. The CRISPR-Cas effector system of any of claims 109-111, wherein the donor template nucleic acid has a length of from 8 nucleotides to 1000 nucleotides.
113. The CRISPR-Cas effector system of any of claims 109-111, wherein the donor template nucleic acid has a length of from 25 nucleotides to 500 nucleotides.
114. A kit comprising the CRISPR-Cas effector system of any one of claims 109-113.
115. The kit of claim 114, wherein the components of the kit are in the same container.
116. The kit of claim 114, wherein the components of the kit are in separate containers.
117. A sterile container comprising the CRISPR-Cas effector system of any one of claims 109-116.
118. The sterile container of claim 117, wherein the container is a syringe.
119. An implantable device comprising the CRISPR-Cas effector system of any one of claims 109-116.
120. The implantable device of claim 119, wherein the CRISPR-Cas effector system is within a matrix.
121. The implantable device of claim 119, wherein the CRISPR-Cas effector system is in a reservoir.
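
The numeric limits recited in the claims above (an effector length of 275 to 465 amino acids, sequence identity tiers of 50%, 80%, and 90%, and donor template lengths of 8 to 1000 or 25 to 500 nucleotides) can be illustrated with a minimal Python sketch. This is an illustration only, not a method described in the application. It assumes that the ranges are inclusive, that the two sequences have already been aligned by some external tool with gaps written as '-', and that identity is counted as matching columns divided by all alignment columns; other definitions of percent identity exist. All function names are hypothetical.

```python
from typing import Optional

# Identity tiers recited in claims 105-107 and 109-111.
IDENTITY_TIERS = (50, 80, 90)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over a pre-computed pairwise alignment.

    Assumption: both rows have equal length and use '-' for gaps.
    Identity = identical non-gap columns / all columns (gap-only columns skipped).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment is empty")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def identity_tier(pct: float) -> Optional[int]:
    """Return the highest recited tier (50, 80, or 90) that pct meets, else None."""
    met = [t for t in IDENTITY_TIERS if pct >= t]
    return max(met) if met else None


def effector_length_in_range(protein: str, lo: int = 275, hi: int = 465) -> bool:
    """Length window of claim 93, treated here as inclusive (an assumption)."""
    return lo <= len(protein) <= hi


def donor_length_in_range(n_nucleotides: int, lo: int = 8, hi: int = 1000) -> bool:
    """Donor template window of claim 112; pass lo=25, hi=500 for claim 113."""
    return lo <= n_nucleotides <= hi
```

For example, a candidate whose alignment yields 85% identity falls in the 80% tier but not the 90% tier, and a 466-residue polypeptide falls outside the 275 to 465 window.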
PCT/US2023/068823 2022-06-22 2023-06-21 Crispr-cas effector polypeptides and methods of use thereof WO2023250384A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202263354590P 2022-06-22 2022-06-22
US63/354,590 2022-06-22

Publications (2)

Publication Number Publication Date
WO2023250384A2 2023-12-28
WO2023250384A3 2024-02-08

Family

ID=89380685

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2023/068823 WO2023250384A2 (en) 2022-06-22 2023-06-21 Crispr-cas effector polypeptides and methods of use thereof

Country Status (1)

Country Link
WO (1) WO2023250384A2 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102676910B1 (en) * 2012-12-06 2024-06-19 시그마-알드리치 컴퍼니., 엘엘씨 Crispr-based genome modification and regulation
CA3150454A1 (en) * 2019-09-09 2021-03-18 Arbor Biotechnologies, Inc. Novel crispr dna targeting enzymes and systems
EP4051788A4 (en) * 2019-10-30 2023-12-06 Pairwise Plants Services, Inc. Type v crispr-cas base editors and methods of use thereof
MX2022007858A (en) * 2019-12-23 2022-09-19 Univ California Crispr-cas effector polypeptides and methods of use thereof.

Also Published As

Publication number Publication date
WO2023250384A3 (en) 2024-02-08

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23828022

Country of ref document: EP

Kind code of ref document: A2