WO2018208755A1 - Compositions and methods for tagging target proteins in proximity to a nucleotide sequence of interest - Google Patents

Compositions and methods for tagging target proteins in proximity to a nucleotide sequence of interest

Publication number
WO2018208755A1
Authority
WO
WIPO (PCT)
Prior art keywords
nucleic acid
protein
bira
target
cas
Prior art date
Application number
PCT/US2018/031563
Inventor
Jacob E. Corn
Alan Shyijong WANG
Gavin SCHLISSEL
Original Assignee
The Regents Of The University Of California
Application filed by The Regents Of The University Of California
Publication of WO2018208755A1

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/53 Immunoassay; Biospecific binding assay; Materials therefor
    • G01N33/5308 Immunoassay; Biospecific binding assay; Materials therefor for analytes not provided for elsewhere, e.g. nucleic acids, uric acid, worms, mites
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/62 DNA sequences coding for fusion proteins
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/93 Ligases (6)
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C07K2319/40 Fusion polypeptide containing a tag for immunodetection, or an epitope for immunisation
    • C07K2319/42 Fusion polypeptide containing a tag for immunodetection, or an epitope for immunisation containing a HA(hemagglutinin)-tag
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C07K2319/40 Fusion polypeptide containing a tag for immunodetection, or an epitope for immunisation
    • C07K2319/43 Fusion polypeptide containing a tag for immunodetection, or an epitope for immunisation containing a FLAG-tag

Definitions

  • a subject method can include contacting a target nucleic acid (e.g., DNA molecule) that includes the nucleotide sequence of interest with (a) a first fusion protein that includes a first nuclease defective CRISPR/Cas nucleic acid-guided polypeptide (e.g., a dCas9) fused to a first portion of a split BirA* protein; (b) a first guide nucleic acid (e.g., a first Cas9 guide RNA) that binds to the first fusion protein and hybridizes to a first target sequence of the target nucleic acid; (c) a second fusion protein comprising a second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide (e.g.,
  • a complex of (a) and (b) forms [complex 1 - first fusion protein plus first guide nucleic acid] and a complex of (c) and (d) forms [complex 2 - second fusion protein plus second guide nucleic acid] .
  • Complex 1 is targeted to a first target sequence (the first guide nucleic acid hybridizes to the first target sequence) and complex 2 is targeted to a second target sequence (the second guide nucleic acid hybridizes to the second target sequence) such that (1) the nuclease defective CRISPR/Cas nucleic acid-guided polypeptide (e.g., dCas9) of the first fusion protein is not occupying the same space as the nuclease defective CRISPR/Cas nucleic acid-guided polypeptide (e.g., dCas9) of the second fusion protein; and (2) a functional BirA* protein forms when complexes 1 and 2 are positioned in such a way to bring the first and second portions of the split BirA* protein together.
  • the first and second portions of the split BirA* protein are brought together (thereby forming a functional BirA* protein), by virtue of the proper positioning of the first and second fusion proteins [noted above as (a) and (c)] near the nucleotide sequence of interest.
  • the functional BirA* protein then biotinylates target proteins within its proximity (e.g., within a range of 12 nm, such as within a range of 10 nm or within a range of 8 nm).
  • which fusion protein (first or second) binds to which guide nucleic acid (first or second) is not controlled; thus in some cases each fusion protein can bind to either of the two guide nucleic acids.
  • the first target sequence (targeted by complex 1) is immediately adjacent a first protospacer adjacent motif (PAM); the second target sequence (targeted by complex 2) is immediately adjacent a second protospacer adjacent motif (PAM); and the first and second PAMs are separated by an appropriate number of base pairs so as to allow the first and second portions of the split BirA* protein to be brought together (thereby forming a functional BirA* protein) when complex 1 and complex 2 are targeted to their respective target sequences.
  • the first target sequence is immediately adjacent a first protospacer adjacent motif (PAM); the second target sequence is immediately adjacent a second protospacer adjacent motif (PAM); and the first and second PAMs are separated by 15 to 100 base pairs (bp).
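The PAM spacing constraint above can be expressed as a simple check. The following is a minimal sketch, assuming both PAM positions are given as 0-based coordinates on the same reference sequence and that separation is measured as the absolute difference between them. The function name, the coordinate convention, and the example positions are illustrative assumptions; only the 15 to 100 bp range comes from the text.

```python
def pams_suitably_spaced(pam1_pos: int, pam2_pos: int,
                         min_bp: int = 15, max_bp: int = 100) -> bool:
    """Return True if two PAMs are separated by min_bp..max_bp base pairs.

    Positions are hypothetical 0-based coordinates on one reference sequence.
    How the separation is measured (e.g., PAM start vs. PAM end) is an
    assumption of this sketch, not something the text specifies.
    """
    separation = abs(pam2_pos - pam1_pos)
    return min_bp <= separation <= max_bp


# Hypothetical guide pairs near a promoter of interest.
print(pams_suitably_spaced(1200, 1245))  # 45 bp apart, inside the range
print(pams_suitably_spaced(1200, 1205))  # 5 bp apart, outside the range
```

A check of this kind could be used to pre-filter candidate guide pairs before testing which spacing actually reconstitutes BirA* activity.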
  • nuclease defective CRISPR/Cas nucleic acid-guided polypeptide of the first and/or second fusion protein is a nuclease defective class 2 CRISPR/Cas nucleic acid-guided polypeptide (e.g., nuclease defective type II, type V, or type VI CRISPR/Cas nucleic acid-guided polypeptide).
  • nuclease defective CRISPR/Cas nucleic acid-guided polypeptide of the fusion protein (a) and/or (c) is a nuclease defective Cas9, Cpf1 (also called Cas12a), CasX, or CasY protein.
  • the step of contacting a target nucleic acid (e.g., DNA molecule) of a subject method includes introducing into a cell one or more of: (i) the first fusion protein and/or a nucleic acid encoding the first fusion protein; (ii) the first guide nucleic acid and/or a nucleic acid encoding the first guide nucleic acid; (iii) the second fusion protein and/or a nucleic acid encoding the second fusion protein; and (iv) the second guide nucleic acid and/or a nucleic acid encoding the second guide nucleic acid.
  • a subject method includes a step of detecting, isolating, and/or identifying one or more of the tagged (e.g., biotinylated) target proteins.
  • Figure 1 depicts crystal structures used to identify the cut site within BirA*. The full-length BirA* is depicted in green, and the biotin molecule is depicted in white. With the chosen cut site within BirA*, the N-terminal half is depicted in yellow and the C-terminal half is depicted in blue.
  • Figure 2 depicts various split BirA* constructs used to determine whether or not a split GFP tether can effectively drive assembly of the split BirA* halves into a catalytically active enzyme.
  • Figure 3 depicts split BirA* is capable of biotinylation only when brought together by a
  • Figure 4 depicts various orientations for fusion to a Cas9 protein.
  • Figure 5 depicts various subject fusion proteins (nuclease defective Cas9 fused to split BirA*) that were designed.
  • Split BirA* could be fused to either the N-terminus or C-terminus of dCas9.
  • Two different flexible linkers between the split BirA* and dCas9 were tested.
  • the XTEN linker is SGSETPGTSES (SEQ ID NO: 10).
  • the second linker includes an additional five amino acids GGGGS (SEQ ID NO: 9) so that this longer linker is composed of the amino acids GGGGSSGSETPGTSES (SEQ ID NO: 1135).
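As a quick consistency check on the two linkers described above, the longer linker can be rebuilt from its parts. This is only a sketch: the variable names are illustrative, and placing the five extra residues N-terminal to the XTEN sequence is an assumption inferred from the resulting sequence given for SEQ ID NO: 1135.

```python
XTEN_LINKER = "SGSETPGTSES"  # SEQ ID NO: 10, as given in the text
G4S_UNIT = "GGGGS"           # SEQ ID NO: 9, as given in the text

# Assumption: the additional five residues precede the XTEN linker.
long_linker = G4S_UNIT + XTEN_LINKER

print(long_linker)       # GGGGSSGSETPGTSES
print(len(long_linker))  # 16
```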
  • Figure 6 depicts various possible orientations for different combinations of subject fusion proteins (nuclease defective Cas9 fused to split BirA*).
  • the two fusion proteins can be targeted towards the same DNA strand or opposite DNA strands.
  • Figure 7 depicts various spacings between the two guide nucleic acids (termed “sgRNAs” in the figure) used to localize the two fusion proteins near the beta actin promoter portion of a target DNA.
  • spacer is equivalent to "guide sequence” / "targeting sequence.”
  • Figure 8 depicts experimental work flow that was used for experiments of the disclosure.
  • HEK293 cells were transiently transfected with plasmids to express the two fusion proteins and two guide nucleic acids. Biotin was added to the culture media 24 hours after transfection to a final concentration of 50 µM. Cells were lysed 24 hours after the addition of biotin. Biotinylated proteins in the cell lysate were then allowed to bind to Streptavidin-coated magnetic beads and eluted off the beads after an incubation.
  • Figure 9 depicts Western Blots demonstrating expression of subject fusion proteins (Cas9-Split BirA* fusion proteins in this case) of various orientations (e.g., split-BirA* fused to the N-terminus or C-terminus of Cas9).
  • Figure 10 depicts data from streptavidin pull down experiments.
  • S primary supernatant
  • E eluate
  • FL full-length BirA*
  • +/+ sgRNA two sgRNAs targeting the same strand
  • +/- sgRNA two sgRNAs targeting different strands.
  • A.2 GGGGSSGSETPGTSES (SEQ ID NO: 1135) linker.
  • Figure 11 depicts sequences of generated fusion constructs.
  • polynucleotide and “nucleic acid,” used interchangeably herein, refer to a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides.
  • this term includes, but is not limited to, single-, double-, or multi-stranded DNA or RNA, genomic DNA, cDNA, DNA-RNA hybrids, or a polymer comprising purine and pyrimidine bases or other natural, chemically or biochemically modified, non-natural, or derivatized nucleotide bases.
  • hybridizable or “complementary” or “substantially complementary” it is meant that a nucleic acid (e.g. RNA, DNA) comprises a sequence of nucleotides that enables it to non-covalently bind, i.e. form Watson-Crick base pairs and/or G/U base pairs, “anneal”, or
  • Standard Watson-Crick base-pairing includes: adenine (A) pairing with thymidine (T), adenine (A) pairing with uracil (U), and guanine (G) pairing with cytosine (C) [DNA, RNA].
  • RNA molecules e.g., dsRNA
  • DNA molecule with an RNA molecule e.g., when a DNA target nucleic acid base pairs with a guide RNA, etc.
  • guanine (G) can also base pair with uracil (U).
  • G/U base-pairing is at least partially responsible for the degeneracy (i.e., redundancy) of the genetic code in the context of tRNA anti-codon base-pairing with codons in mRNA.
  • a guanine (e.g., of dsRNA duplex of a guide RNA molecule; of a guide RNA base pairing with a target nucleic acid, etc.) is considered complementary to both a uracil (U) and to an adenine (A).
  • Hybridization requires that the two nucleic acids contain complementary sequences, although mismatches between bases are possible.
  • the conditions appropriate for hybridization between two nucleic acids depend on the length of the nucleic acids and the degree of complementarity, variables well known in the art. The greater the degree of complementarity between two nucleotide sequences, the greater the value of the melting temperature (Tm) for hybrids of nucleic acids having those sequences.
  • the length for a hybridizable nucleic acid is 8 nucleotides or more (e.g., 10 nucleotides or more, 12 nucleotides or more, 15 nucleotides or more, 20 nucleotides or more, 22 nucleotides or more, 25 nucleotides or more, or 30 nucleotides or more).
  • Temperature, wash solution salt concentration, and other conditions may be adjusted as necessary according to factors such as length of the region of complementation and the degree of complementation.
  • sequence of a polynucleotide need not be 100% complementary to that of its target nucleic acid to be specifically hybridizable or hybridizable. Moreover, a polynucleotide may hybridize over one or more segments such that intervening or adjacent segments are not involved in the hybridization event (e.g., a bulge, a loop structure or hairpin structure, etc.).
  • a polynucleotide can comprise 60% or more, 65% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, or 100% sequence complementarity to a target region within the target nucleic acid sequence to which it will hybridize.
  • an antisense nucleic acid in which 18 of 20 nucleotides of the antisense compound are complementary to a target region, and would therefore specifically hybridize, would represent 90 percent complementarity.
  • the remaining noncomplementary nucleotides may be clustered or interspersed with complementary nucleotides and need not be contiguous to each other or to complementary nucleotides.
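The "18 of 20" arithmetic above can be reproduced with a short calculation. The following is a minimal sketch, assuming two equal-length strands written 5' to 3' that anneal antiparallel with no gaps, so bulges and loops are not modeled. The function name and example sequences are illustrative assumptions; counting G/U wobble pairs as complementary follows the base-pairing rules described in the text.

```python
# Watson-Crick pairs for DNA/RNA, plus the G/U wobble pair noted in the text.
PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"),
         ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def percent_complementarity(oligo: str, target: str) -> float:
    """Percent of positions that pair when two strands anneal antiparallel.

    Both sequences are written 5'->3' and must have equal length. This is an
    ungapped comparison, which is a simplifying assumption of the sketch.
    """
    if len(oligo) != len(target):
        raise ValueError("sequences must have the same length")
    if not oligo:
        raise ValueError("sequences must not be empty")
    paired = sum((a, b) in PAIRS
                 for a, b in zip(oligo.upper(), reversed(target.upper())))
    return 100.0 * paired / len(oligo)


# Hypothetical 20-nt target region and an oligo with two mismatched positions.
target_region = "ACGTACGTACGTACGTACGT"
antisense = "CCGTACGTACATACGTACGT"
print(percent_complementarity(antisense, target_region))  # 90.0
```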
  • Percent complementarity between particular stretches of nucleic acid sequences within nucleic acids can be determined using any convenient method.
  • Example methods include BLAST programs (basic local alignment search tools) and PowerBLAST programs (Altschul et al., J. Mol.
  • peptide refers to a polymeric form of amino acids of any length, which can include coded and non-coded amino acids, chemically or biochemically modified or derivatized amino acids, and polypeptides having modified peptide backbones.
  • Binding refers to a non-covalent interaction between macromolecules (e.g., between a protein and a nucleic acid; between a CRISPR/Cas protein/guide RNA complex and a target nucleic acid; and the like). While in a state of non-covalent interaction, the macromolecules are said to be "associated" or "interacting" or "binding" (e.g., when a molecule X is said to interact with a molecule Y, it is meant the molecule X binds to molecule Y in a non-covalent manner). Not all components of a binding interaction need be sequence-specific (e.g., contacts with phosphate residues in a DNA backbone), but some portions of a binding interaction may be sequence-specific. Binding interactions are generally characterized by a dissociation constant (K_D) of less than 10⁻⁶ M, less than 10⁻⁷ M, less than 10⁻⁸ M, less than 10⁻⁹ M, less than 10⁻¹⁰ M, less than 10⁻¹¹ M, less than 10⁻¹² M, less than 10⁻¹³ M, less than 10⁻¹⁴ M, or less than 10⁻¹⁵ M. "Affinity" refers to the strength of binding, increased binding affinity being correlated with a lower K_D.
  • binding domain it is meant a protein domain that is able to bind non-covalently to
  • a binding domain can bind to, for example, a DNA molecule (a DNA-binding domain), an RNA molecule (an RNA-binding domain) and/or a protein molecule (a protein-binding domain).
  • it can in some cases bind to itself (to form homodimers, homotrimers, etc.) and/or it can bind to one or more regions of a different protein or proteins.
  • a group of amino acids having aliphatic side chains consists of glycine, alanine, valine, leucine, and isoleucine; a group of amino acids having aliphatic-hydroxyl side chains consists of serine and threonine; a group of amino acids having amide containing side chains consists of asparagine and glutamine; a group of amino acids having aromatic side chains consists of phenylalanine, tyrosine, and tryptophan; a group of amino acids having basic side chains consists of lysine, arginine, and histidine; a group of amino acids having acidic side chains consists of glutamate and aspartate; and a group of amino acids having sulfur containing side chains consists of cysteine and methionine.
  • Exemplary conservative amino acid substitution groups are: valine-leucine-isoleucine, phenylalanine-tyrosine, lysine-arginine, alanine-valine-glycine, and asparagine-glutamine.
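The exemplary substitution groups listed above lend themselves to a simple lookup. The sketch below encodes those five groups using one-letter amino acid codes. The function name is an illustrative assumption, and treating a substitution as conservative only when both residues share one of these exemplary groups is a simplification, since the text presents the groups as examples rather than an exhaustive rule.

```python
# The five exemplary groups from the text, in one-letter amino acid codes.
CONSERVATIVE_GROUPS = [
    {"V", "L", "I"},  # valine-leucine-isoleucine
    {"F", "Y"},       # phenylalanine-tyrosine
    {"K", "R"},       # lysine-arginine
    {"A", "V", "G"},  # alanine-valine-glycine
    {"N", "Q"},       # asparagine-glutamine
]


def is_conservative(original: str, substitute: str) -> bool:
    """True if two different residues share an exemplary conservative group."""
    a, b = original.upper(), substitute.upper()
    return a != b and any(a in group and b in group
                          for group in CONSERVATIVE_GROUPS)


print(is_conservative("L", "I"))  # leucine -> isoleucine
print(is_conservative("K", "E"))  # lysine -> glutamate
```

Note that valine appears in two groups, so it is conservative with respect to both leucine/isoleucine and alanine/glycine under this lookup, while leucine and alanine are not grouped with each other.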
  • a polynucleotide or polypeptide has a certain percent "sequence identity" to another polynucleotide or polypeptide, meaning that, when aligned, that percentage of bases or amino acids are the same, and in the same relative position, when comparing the two sequences.
  • Sequence identity can be determined in a number of different ways. To determine sequence identity, sequences can be aligned using various convenient methods and computer programs (e.g., BLAST, T-COFFEE, MUSCLE, MAFFT, etc.), available over the world wide web at sites including ncbi.nlm.nih.gov/BLAST, ebi.ac.uk/Tools/msa/tcoffee/, ebi.ac.uk/Tools/msa/muscle/, mafft.cbrc.jp/alignment/software/. See, e.g., Altschul et al. (1990), J. Mol. Biol. 215:403-10.
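Once two sequences have been aligned by one of the programs named above, percent identity reduces to counting matching columns. The following is a minimal sketch, assuming the alignment is already available as two equal-length strings with "-" marking gaps. The function name and the choice of denominator (all alignment columns that are not gap-only) are assumptions of this sketch; alignment tools may report identity using other conventions.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns in which both sequences carry the same residue.

    Inputs must be pre-aligned strings of equal length. Columns where both
    sequences have a gap are ignored; a gap opposite a residue counts as a
    non-identical column (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment has no comparable columns")
    matches = sum(a == b for a, b in columns if a != gap)
    return 100.0 * matches / len(columns)


# Hypothetical pre-aligned sequences differing at one of ten positions.
print(percent_identity("ACGTACGTAC", "ACGTACGTTC"))  # 90.0
```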
  • a DNA sequence that "encodes" a particular RNA is a DNA nucleotide sequence that is
  • a DNA polynucleotide may encode an RNA (mRNA) that is translated into protein (and therefore the DNA and the mRNA both encode the protein), or a DNA polynucleotide may encode an RNA that is not translated into protein (e.g. tRNA, rRNA, microRNA (miRNA), a "non-coding" RNA (ncRNA), a guide RNA, etc.).
  • a "protein coding sequence” or a sequence that encodes a particular protein or polypeptide is a nucleotide sequence that is transcribed into mRNA (in the case of DNA) and is translated (in the case of mRNA) into a polypeptide in vitro or in vivo when placed under the control of appropriate regulatory sequences.
  • DNA regulatory sequences refer to transcriptional and translational control sequences, such as promoters, enhancers, polyadenylation signals, terminators, protein degradation signals, and the like, that provide for and/or regulate transcription of a non-coding sequence (e.g., guide RNA) or a coding sequence (e.g., CRISPR/Cas RNA-guided polypeptide, Cas9 polypeptide, and the like) and/or regulate translation of an encoded polypeptide.
  • a “promoter” or a “promoter sequence” is a DNA regulatory region capable of binding RNA polymerase and initiating transcription of a downstream (3' direction) coding or non-coding sequence.
  • the promoter sequence is bounded at its 3' terminus by the transcription initiation site and extends upstream (5' direction) to include the minimum number of bases or elements necessary to initiate transcription at levels detectable above background.
  • within the promoter sequence will be found a transcription initiation site, as well as protein binding domains responsible for the binding of RNA polymerase.
  • Eukaryotic promoters will often, but not always, contain "TATA" boxes and "CAT” boxes.
  • Various promoters, including inducible promoters may be used to drive the various vectors of the present disclosure.
  • "naturally-occurring", as applied to a nucleic acid, a polypeptide, a cell, or an organism, refers to a nucleic acid, polypeptide, cell, or organism that is found in nature.
  • a polypeptide or polynucleotide sequence that is present in an organism (including viruses) that can be isolated from a source in nature is naturally occurring.
  • chimeric refers to two components that are defined by structures derived from different sources.
  • in the context of a chimeric polypeptide (e.g., a chimeric Cas9 protein or Cas9 fusion protein), the chimeric polypeptide includes amino acid sequences that are derived from different polypeptides.
  • a chimeric polypeptide may comprise either modified or naturally-occurring polypeptide sequences (e.g., a first amino acid sequence from a modified or unmodified Cas9 protein; and a second amino acid sequence from a modified or unmodified protein other than a Cas9 protein, etc.).
  • chimeric in the context of a polynucleotide encoding a chimeric polypeptide includes nucleotide sequences derived from different coding regions (e.g., a first nucleotide sequence encoding a modified or unmodified Cas9 protein; and a second nucleotide sequence encoding a polypeptide other than a Cas9 protein).
  • chimeric polypeptide refers to a polypeptide which is made by the combination (i.e., "fusion") of two otherwise separated segments of amino acid sequence, usually through human intervention.
  • a polypeptide that comprises a chimeric amino acid sequence is a chimeric polypeptide.
  • Chimeric polypeptides can also be referred to as "fusion proteins” or "fusion variants.”
  • Heterologous means a nucleotide or polypeptide sequence that is not found in the native nucleic acid or protein, respectively.
  • RNA-binding domain of a naturally-occurring bacterial Cas9 polypeptide may be fused to a heterologous polypeptide sequence (i.e. a polypeptide sequence from a protein other than Cas9 or a polypeptide sequence from another organism).
  • the heterologous polypeptide sequence may exhibit an activity (e.g., enzymatic activity) that will also be exhibited by the chimeric Cas9 protein (e.g., biotin ligase activity, etc.).
  • a heterologous nucleic acid sequence may be linked to a naturally-occurring nucleic acid sequence (or a variant thereof) (e.g., by genetic engineering) to generate a chimeric nucleotide sequence encoding a chimeric polypeptide (a fusion protein).
  • Recombinant means that a particular nucleic acid (DNA or RNA) is the product of various combinations of cloning, restriction, polymerase chain reaction (PCR) and/or ligation steps resulting in a construct having a structural coding or non-coding sequence distinguishable from endogenous nucleic acids found in natural systems.
  • DNA sequences encoding polypeptides can be assembled from cDNA fragments or from a series of synthetic oligonucleotides, to provide a synthetic nucleic acid which is capable of being expressed from a recombinant transcriptional unit contained in a cell or in a cell-free transcription and translation system.
  • Genomic DNA comprising the relevant sequences can also be used in the formation of a recombinant gene or transcriptional unit. Sequences of non-translated DNA may be present 5' or 3' from the open reading frame, where such sequences do not interfere with manipulation or expression of the coding regions, and may indeed act to modulate production of a desired product by various mechanisms (see “DNA regulatory sequences”). Alternatively, DNA sequences encoding RNA (e.g., guide RNA) that is not translated may also be considered recombinant. Thus, e.g., the term "recombinant" nucleic acid refers to one which is not naturally occurring, e.g., is made by the artificial combination of two otherwise separated segments of sequence through human intervention.
  • This artificial combination is often accomplished by either chemical synthesis means, or by the artificial manipulation of isolated segments of nucleic acids, e.g., by genetic engineering techniques. Such is usually done to replace a codon with a codon encoding the same amino acid, a conservative amino acid, or a non-conservative amino acid. Alternatively, it is performed to join together nucleic acid segments of desired functions to generate a desired combination of functions.
  • the sequence of the encoded polypeptide can be naturally occurring ("wild type") or can be a variant (e.g., a mutant) of the naturally occurring sequence.
  • An example of such a case is a DNA (a recombinant) encoding a wild-type protein where the DNA sequence is codon optimized for expression of the protein in a cell (e.g., a eukaryotic cell) in which the protein is not naturally found (e.g., expression of a CRISPR/Cas RNA-guided polypeptide such as Cas9 in a eukaryotic cell).
  • a codon-optimized DNA can therefore be recombinant and non-naturally occurring while the protein encoded by the DNA may have a wild type amino acid sequence.
  • the term "recombinant" polypeptide does not necessarily refer to a polypeptide whose amino acid sequence does not naturally occur. Instead, a "recombinant" polypeptide is encoded by a recombinant non-naturally occurring DNA sequence, but the amino acid sequence of the polypeptide can be naturally occurring ("wild type") or non-naturally occurring (e.g., a variant, a mutant, etc.). Thus, a "recombinant" polypeptide is the result of human intervention, but may have a naturally occurring amino acid sequence.
  • a "vector" or "expression vector" is a replicon, such as a plasmid, phage, virus, or cosmid, to which another DNA segment, i.e. an "insert", may be attached so as to bring about the replication of the attached segment in a cell.
  • An "expression cassette” comprises a DNA coding sequence operably linked to a promoter.
  • operably linked refers to a juxtaposition wherein the components so described are in a relationship permitting them to function in their intended manner.
  • a promoter is operably linked to a coding sequence (or the coding sequence can also be said to be operably linked to the promoter) if the promoter affects its transcription or expression.
  • recombinant expression vector or "DNA construct” are used interchangeably herein to refer to a DNA molecule comprising a vector and one insert.
  • Recombinant expression vectors are usually generated for the purpose of expressing and/or propagating the insert(s), or for the construction of other recombinant nucleotide sequences.
  • the insert(s) may or may not be operably linked to a promoter sequence and may or may not be operably linked to DNA regulatory sequences.
  • a cell has been "genetically modified” or “transformed” or “transfected” by exogenous DNA, e.g. a recombinant expression vector, when such DNA has been introduced inside the cell.
  • the presence of the exogenous DNA results in permanent or transient genetic change.
  • the transforming DNA may or may not be integrated (covalently linked) into the genome of the cell.
  • the transforming DNA may be maintained on an episomal element such as a plasmid.
  • a stably transformed cell is one in which the transforming DNA has become integrated into a chromosome so that it is inherited by daughter cells through chromosome replication. This stability is demonstrated by the ability of the eukaryotic cell to establish cell lines or clones that comprise a population of daughter cells containing the transforming DNA.
  • a "clone” is a population of cells derived from a single cell or common ancestor by mitosis.
  • a "cell line” is a clone of a primary cell that is capable of stable growth in vitro for many generations.
  • Suitable methods of genetic modification (also referred to as "transformation") include, e.g., viral or bacteriophage infection, transfection, conjugation, protoplast fusion, lipofection, electroporation, calcium phosphate precipitation, polyethyleneimine (PEI)-mediated transfection, DEAE-dextran mediated transfection, liposome-mediated transfection, particle gun technology, direct microinjection, nanoparticle-mediated nucleic acid delivery (see, e.g., Panyam et al., Adv Drug Deliv Rev. 2012 Sep 13. pii: S0169-409X(12)00283-9. doi: 10.1016/j.addr.2012.09.023), and the like.
  • a "target nucleic acid” as used herein is a polynucleotide (e.g., DNA such as genomic DNA) that includes a site ("target site” or “target sequence") targeted by a CRISPR/Cas RNA-guided polypeptide (e.g., a subject Cas9 fusion protein).
  • the target sequence is the sequence to which the guide sequence of a subject CRISPR/Cas guide RNA (e.g., a Cas9 guide RNA) will hybridize.
  • the target site (or target sequence) 5'-GAGCAUAUC-3' within a target nucleic acid is targeted by (or is bound by, or hybridizes with, or is complementary to) the sequence 5'-GAUAUGCUC-3' .
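The relationship between the example target site and the sequence that hybridizes to it can be verified by taking the reverse complement. This is a minimal sketch using strict Watson-Crick RNA pairing only (G/U wobble pairs are deliberately left out); the function name is an illustrative assumption.

```python
# Watson-Crick complements for RNA bases.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return seq.upper().translate(RNA_COMPLEMENT)[::-1]


target_site = "GAGCAUAUC"  # 5'-GAGCAUAUC-3', from the text
print(reverse_complement_rna(target_site))  # GAUAUGCUC
```

The result matches the 5'-GAUAUGCUC-3' sequence given in the text, confirming that the two sequences are fully complementary when paired antiparallel.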
  • Suitable hybridization conditions include physiological conditions normally present in a cell.
  • the strand of the target nucleic acid that is complementary to and hybridizes with the guide RNA is referred to as the "complementary strand” or “target strand”; while the strand of the target nucleic acid that is complementary to the “target strand” (and is therefore not complementary to the guide RNA) is referred to as the "non-target strand” or “non-complementary strand.”
  • cleavage it is meant the breakage of the covalent backbone of a target nucleic acid
  • Cleavage can be initiated by a variety of methods including, but not limited to, enzymatic or chemical hydrolysis of a phosphodiester bond. Both single-stranded cleavage and double-stranded cleavage are possible, and double-stranded cleavage can occur as a result of two distinct single-stranded cleavage events.
  • Nuclease and “endonuclease” are used interchangeably herein to mean an enzyme which possesses catalytic activity for nucleic acid cleavage (e.g., ribonuclease activity (ribonucleic acid cleavage), deoxyribonuclease activity (deoxyribonucleic acid cleavage), etc.).
  • a “genome editing endonuclease” is an endonuclease that can be used for the editing of a cell's genome (e.g., by cleaving at a targeted location within the cell's genomic DNA).
  • Examples of genome editing endonucleases include but are not limited to: (i) Zinc finger nucleases, (ii) TAL endonucleases, and (iii) CRISPR/Cas endonucleases.
  • Examples of CRISPR/Cas endonucleases include class 2 CRISPR/Cas endonucleases such as: (a) type II CRISPR/Cas proteins, e.g., a Cas9 protein; (b) type V CRISPR/Cas proteins, e.g., a Cpf1 polypeptide, a C2c1 polypeptide, a C2c3 polypeptide, and the like; and (c) type VI CRISPR/Cas proteins, e.g., a C2c2 polypeptide.
  • By "cleavage domain" or "active domain" or "nuclease domain" of a nuclease it is meant the polypeptide sequence or domain within the nuclease which possesses the catalytic activity for nucleic acid cleavage.
  • a cleavage domain can be contained in a single polypeptide chain or cleavage activity can result from the association of two (or more) polypeptides.
  • a single nuclease domain may consist of more than one isolated stretch of amino acids within a given polypeptide.
  • the term "stem cell" is used herein to refer to a cell (e.g., plant stem cell, vertebrate stem cell) that has the ability both to self-renew and to generate a differentiated cell type (see Morrison et al. (1997) Cell 88:287-298).
  • the adjective "differentiated”, or “differentiating” is a relative term.
  • a “differentiated cell” is a cell that has progressed further down the developmental pathway than the cell it is being compared with.
  • pluripotent stem cells can differentiate into lineage-restricted progenitor cells (e.g., mesodermal stem cells), which in turn can differentiate into cells that are further restricted (e.g., neuron progenitors), which can differentiate into end-stage cells (i.e., terminally differentiated cells, e.g., neurons, cardiomyocytes, etc.), which play a characteristic role in a certain tissue type, and may or may not retain the capacity to proliferate further.
  • Stem cells may be characterized by both the presence of specific markers (e.g., proteins, RNAs, etc.) and the absence of specific markers.
  • Stem cells may also be identified by functional assays both in vitro and in vivo, particularly assays relating to the ability of stem cells to give rise to multiple differentiated
  • Stem cells of interest include pluripotent stem cells (PSCs).
  • PSC pluripotent stem cell
  • the term "pluripotent stem cell” or “PSC” is used herein to mean a stem cell capable of producing all cell types of the organism. Therefore, a PSC can give rise to cells of all germ layers of the organism (e.g., the endoderm, mesoderm, and ectoderm of a vertebrate).
  • Pluripotent cells are capable of forming teratomas and of contributing to ectoderm, mesoderm, or endoderm tissues in a living organism.
  • Pluripotent stem cells of plants are capable of giving rise to all cell types of the plant (e.g., cells of the root, stem, leaves, etc.).
  • PSCs of animals can be derived in a number of different ways.
  • ESCs embryonic stem cells
  • ESCs are derived from the inner cell mass of an embryo (Thomson et al., Science. 1998
  • iPSCs induced pluripotent stem cells
  • PSC refers to pluripotent stem cells regardless of their derivation
  • the term PSC encompasses the terms ESC and iPSC, as well as the term embryonic germ stem cells (EGSC), which are another example of a PSC.
  • EGSC embryonic germ stem cells
  • PSCs may be in the form of an established cell line, they may be obtained directly from primary embryonic tissue, or they may be derived from a somatic cell. PSCs can be target cells of the methods described herein.
  • ESC embryonic stem cell
  • ESC lines are listed in the NIH Human Embryonic Stem Cell Registry, e.g.
  • hESBGN-01, hESBGN-02, hESBGN-03, hESBGN-04 (BresaGen, Inc.); HES-1, HES-2, HES-3, HES-4, HES-5, HES-6 (ES Cell International); Miz-hES1 (MizMedi Hospital-Seoul National University); HSF-1, HSF-6 (University of California at San Francisco); and H1, H7, H9, H13, H14 (Wisconsin Alumni Research Foundation (WiCell Research
  • Stem cells of interest also include embryonic stem cells from other primates, such as Rhesus stem cells and marmoset stem cells.
  • the stem cells may be obtained from any mammalian species, e.g. human, equine, bovine, porcine, canine, feline, rodent, e.g. mice, rats, hamster, primate, etc. (Thomson et al. (1998) Science 282: 1145; Thomson et al. (1995) Proc. Natl. Acad. Sci USA 92:7844; Thomson et al. (1996) Biol. Reprod. 55:254; Shamblott et al., Proc. Natl. Acad. Sci. USA 95: 13726, 1998).
  • In culture, ESCs typically grow as flat colonies with large nucleo-cytoplasmic ratios, defined borders and prominent nucleoli. In addition, ESCs express SSEA-3, SSEA-4, TRA-1-60, TRA-1-81, and Alkaline Phosphatase, but not SSEA-1. Examples of methods of generating and characterizing ESCs may be found in, for example, US Patent No. 7,029,913, US Patent No. 5,843,780, and US Patent No. 6,200,806, the disclosures of which are incorporated herein by reference. Methods for proliferating hESCs in the
  • EGSC embryonic germ stem cell
  • EG cell embryonic germ stem cell
  • Embryonic germ cells are thought to have properties similar to embryonic stem cells as described above. Examples of methods of generating and characterizing EG cells may be found in, for example, US Patent No. 7,153,684; Matsui, Y., et al., (1992) Cell 70:841 ; Shamblott, M., et al. (2001) Proc. Natl. Acad. Sci. USA 98: 113; Shamblott, M., et al. (1998) Proc. Natl. Acad. Sci. USA, 95: 13726; and Koshimizu, U., et al. (1996) Development, 122: 1235, the disclosures of which are incorporated herein by reference.
  • iPSC induced pluripotent stem cell
  • iPSCs can be derived from multiple different cell types, including terminally differentiated cells. iPSCs have an ES cell-like morphology, growing as flat colonies with large nucleo-cytoplasmic ratios, defined borders and prominent nuclei.
  • iPSCs express one or more key pluripotency markers known by one of ordinary skill in the art, including but not limited to Alkaline Phosphatase, SSEA3, SSEA4, Sox2, Oct3/4, Nanog, TRA160, TRA181, TDGF 1, Dnmt3b, FoxD3, GDF3, Cyp26a1, TERT, and zfp42. Examples of methods of generating and characterizing iPSCs may be found in, for example, U.S. Patent Publication Nos. US20090047263, US20090068742,
  • somatic cells are provided with reprogramming factors (e.g. Oct4, SOX2, KLF4, MYC, Nanog, Lin28, etc.) known in the art to reprogram the somatic cells to become pluripotent stem cells.
  • By "somatic cell" it is meant any cell in an organism that, in the absence of experimental manipulation, does not ordinarily give rise to all types of cells in an organism.
  • somatic cells are cells that have differentiated sufficiently that they will not naturally generate cells of all three germ layers of the body, i.e. ectoderm, mesoderm and endoderm.
  • somatic cells would include both neurons and neural progenitors, the latter of which may be able to naturally give rise to all or some cell types of the central nervous system but cannot give rise to cells of the mesoderm or endoderm lineages.
  • By "mitotic cell" it is meant a cell undergoing mitosis. Mitosis is the process by which a eukaryotic cell separates the chromosomes in its nucleus into two identical sets in two separate nuclei. It is generally followed immediately by cytokinesis, which divides the nuclei, cytoplasm, organelles and cell membrane into two cells containing roughly equal shares of these cellular components.
  • By "post-mitotic cell" it is meant a cell that has exited from mitosis, i.e., it is "quiescent", i.e. it is no longer undergoing divisions. This quiescent state may be temporary, i.e. reversible, or it may be permanent.
  • By "meiotic cell" it is meant a cell that is undergoing meiosis.
  • Meiosis is the process by which a cell divides its nuclear material for the purpose of producing gametes or spores. Unlike mitosis, in meiosis, the chromosomes undergo a recombination step which shuffles genetic material between chromosomes. Additionally, the outcome of meiosis is four (genetically unique) haploid cells, as compared with the two (genetically identical) diploid cells produced from mitosis.
  • a component (e.g., a nucleic acid component such as a CRISPR/Cas guide RNA, or a protein component such as a subject fusion protein)
  • the term "label moiety" refers to any moiety that provides for signal detection and may vary widely depending on the particular nature of the assay.
  • Label moieties of interest include both directly detectable labels (direct labels; e.g., a fluorescent label) and indirectly detectable labels (indirect labels; e.g., a binding pair member).
  • a fluorescent label can be any fluorescent label (e.g., a fluorescent dye (e.g., fluorescein, Texas red, rhodamine, ALEXAFLUOR® labels, and the like), a fluorescent protein (e.g., green fluorescent protein (GFP), enhanced GFP (EGFP), yellow fluorescent protein (YFP), red fluorescent protein (RFP), cyan fluorescent protein (CFP), cherry, tomato, tangerine, and any fluorescent derivative thereof), etc.).
  • Suitable detectable (directly or indirectly) label moieties for use in the methods include any moiety that is detectable by spectroscopic, photochemical, biochemical, immunochemical, electrical, optical, chemical, or other means.
  • suitable indirect labels include biotin (a binding pair member), which can be bound by streptavidin (which can itself be directly or indirectly labeled).
  • Labels can also include: a radiolabel (a direct label) (e.g., ³H, ¹²⁵I, ³⁵S, ¹⁴C, or ³²P); an enzyme (an indirect label) (e.g., peroxidase, alkaline phosphatase, galactosidase, luciferase, glucose oxidase, and the like); a fluorescent protein (a direct label) (e.g., green fluorescent protein, red fluorescent protein, yellow fluorescent protein, and any convenient derivatives thereof); a metal label (a direct label); a colorimetric label; a binding pair member; and the like.
  • By "partner of a binding pair" or "binding pair member" is meant one of a first and a second moiety, wherein the first and the second moiety have a specific binding affinity for each other.
  • Suitable binding pairs include, but are not limited to: antigen/antibodies (for example, digoxigenin/anti-digoxigenin, dinitrophenyl (DNP)/anti-DNP, dansyl-X/anti-dansyl, fluorescein/anti-fluorescein, lucifer yellow/anti-lucifer yellow, and rhodamine/anti-rhodamine), biotin/avidin (or biotin/streptavidin) and calmodulin binding protein (CBP)/calmodulin.
  • Any binding pair member can be suitable for use as an indirectly detectable label moiety.
  • Any given component, or combination of components can be unlabeled, or can be detectably labeled with a label moiety. In some cases, when two or more components are labeled, they can be labeled with label moieties that are distinguishable from one another.
  • a subject method can include contacting a target nucleic acid (e.g., DNA molecule) that includes the nucleotide sequence of interest with (a) a first fusion protein that includes a first nuclease defective CRISPR/Cas nucleic acid-guided polypeptide (e.g., a dCas9) fused to a first portion of a split BirA* protein; (b) a first guide nucleic acid (e.g., a first Cas9 guide RNA) that binds to the first fusion protein and hybridizes to a first target sequence of the target nucleic acid; (c) a second fusion protein comprising a second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide (e.g.,
  • the first and second portions of the split BirA* protein are brought together (thereby forming a functional BirA* protein) by virtue of the proper positioning of the two fusion proteins (a) [first CRISPR/Cas nucleic acid-guided polypeptide fused to first portion of split BirA*] and (c) [second CRISPR/Cas nucleic acid-guided polypeptide fused to second portion of split BirA*] near the nucleotide sequence of interest.
  • a complex of (a) and (b) forms [complex 1 - first fusion protein plus first guide nucleic acid] and a complex of (c) and (d) forms [complex 2 - second fusion protein plus second guide nucleic acid] .
  • Complexes 1 and 2 are targeted to the target nucleic acid such that the nuclease defective CRISPR/Cas nucleic acid-guided polypeptide (e.g., dCas9) of the first fusion protein is not occupying the same space as the nuclease defective CRISPR/Cas nucleic acid-guided polypeptide (e.g., dCas9) of the second fusion protein.
  • first and second portions of the split BirA* protein of the first and second fusion proteins are brought together thereby forming a functional BirA* protein.
  • 'complex 1' is targeted to a first target sequence (the first guide nucleic acid hybridizes to the first target sequence) and 'complex 2' is targeted to a second target sequence (the second guide nucleic acid hybridizes to the second target sequence) such that a functional BirA* protein forms when complexes 1 and 2 are positioned in such a way to bring the first and second portions of the split BirA* protein together.
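The requirement that the two complexes bind distinct target sequences without occupying the same space can be sketched as a simple interval check. This is only an illustration: the `TargetSite` class, the coordinates, and both function names are hypothetical, and the actual geometry (strand, linker length, protein footprint) is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetSite:
    """Hypothetical half-open interval [start, end) on the target nucleic acid."""
    start: int
    end: int


def sites_overlap(first: TargetSite, second: TargetSite) -> bool:
    """True if the two target sequences share at least one position."""
    return first.start < second.end and second.start < first.end


def gap_between(first: TargetSite, second: TargetSite) -> int:
    """Number of positions separating two non-overlapping target sequences."""
    if sites_overlap(first, second):
        raise ValueError("target sequences overlap; the complexes would compete for the same space")
    return max(first.start, second.start) - min(first.end, second.end)


site_1 = TargetSite(100, 120)   # illustrative coordinates for the first target sequence
site_2 = TargetSite(135, 155)   # illustrative coordinates for the second target sequence
print(sites_overlap(site_1, site_2), gap_between(site_1, site_2))
```

Which spacings actually allow the two split BirA* portions to reassemble is an empirical question that this sketch does not address.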
  • the functional BirA* protein then biotinylates target proteins within its proximity (e.g., within a range of 12 nm, such as within a range of 10 nm or within a range of 8 nm). In some cases the functional BirA* protein then biotinylates target proteins within a range of 12 nm. In some cases the functional BirA* protein then biotinylates target proteins within a range of 10 nm. In some cases the functional BirA* protein then biotinylates target proteins within a range of 8 nm. In some cases the functional BirA* protein biotinylates target proteins that are within 12 nm of (are 12 nm or less in distance from) the functional BirA* protein.
  • the functional BirA* protein biotinylates target proteins that are within 10 nm of (are 10 nm or less in distance from) the functional BirA* protein. In some cases the functional BirA* protein biotinylates target proteins that are within 8 nm of (are 8 nm or less in distance from) the functional BirA* protein. In some cases the functional BirA* protein biotinylates target proteins that are 12 or less nm from the functional BirA* protein. In some cases the functional BirA* protein biotinylates target proteins that are 10 or less nm from the functional BirA* protein. In some cases the functional BirA* protein biotinylates target proteins that are 8 or less nm from the functional BirA* protein.
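The distance ranges recited above (12 nm, 10 nm, 8 nm) amount to a radius filter around the reconstituted BirA* protein. The following sketch assumes hypothetical three-dimensional coordinates in nanometers; the protein names, positions, and the function `proteins_within_radius` are invented for illustration and do not come from the disclosure.

```python
import math


def proteins_within_radius(bira_position, protein_positions, radius_nm=10.0):
    """Return the names of proteins located radius_nm or less from BirA*."""
    return sorted(
        name
        for name, position in protein_positions.items()
        if math.dist(bira_position, position) <= radius_nm
    )


# Hypothetical positions in nm, with the functional BirA* protein at the origin.
positions = {
    "protein_A": (3.0, 4.0, 0.0),   # 5 nm away
    "protein_B": (6.0, 7.0, 0.0),   # about 9.2 nm away
    "protein_C": (7.0, 9.0, 0.0),   # about 11.4 nm away
}

for radius in (12.0, 10.0, 8.0):
    print(radius, proteins_within_radius((0.0, 0.0, 0.0), positions, radius))
```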
  • a pair of fusion proteins [first and second fusion proteins described above as (a) and (c)] that can be used to form a functional BirA* protein when brought together are referred to herein as a "corresponding pair of split BirA* fusion proteins.”
  • the CRISPR/Cas nucleic acid-guided polypeptide portion of the first and second fusion proteins of a corresponding pair of split BirA* fusion proteins are the same, e.g., both can be the same dCas9 protein, but in some cases they are different.
  • they are the same protein type (e.g., Cas9) but are derived from a different prokaryotic species (e.g., one from S. pyogenes and the other from S. thermophilus), and in some such cases the 'handle' of the associated guide RNA can be different for each fusion protein.
  • one of the fusion proteins has one type of CRISPR/Cas nucleic acid-guided polypeptide portion (e.g., a dCas9) and the other has a different type (e.g., Cpf1).
  • fusion proteins that include a nuclease defective CRISPR/Cas RNA-guided polypeptide (e.g., dCas9; described in more detail below) fused to a portion of a split BirA* protein (described in more detail below).
  • Methods and compositions of the disclosure can include two such fusion proteins, where one fusion protein includes a first portion of a split BirA* protein and a second fusion protein includes a second portion of a split BirA* protein such that, when the first and second portions are brought together, they form a functional BirA* protein.
  • the first and second portions of a split BirA* protein can be fused, independently relative to one another, to the N-terminus or the C-terminus of the CRISPR/Cas RNA-guided polypeptide or can be fused to an internal insertion site of the CRISPR/Cas RNA-guided polypeptide.
  • a first fusion protein includes a nuclease defective CRISPR/Cas RNA-guided polypeptide (e.g., dCas9) fused (internally, at the N-terminus, or at the C-terminus) to a first portion of a split BirA* protein and the second fusion protein includes a nuclease defective CRISPR/Cas RNA-guided polypeptide (e.g., dCas9) fused (internally, at the N-terminus, or at the C-terminus) to a second portion of a split BirA* protein.
  • in some cases the nuclease defective CRISPR/Cas RNA-guided polypeptide (e.g., dCas9) of the first fusion protein is fused at its N-terminus to a first portion of a split BirA* protein; and the nuclease defective CRISPR/Cas RNA-guided polypeptide (e.g., dCas9) of the second fusion protein is fused at its N-terminus to a second portion of a split BirA* protein.
  • nuclease defective CRISPR/Cas RNA-guided polypeptide (e.g., dCas9) of the first fusion protein is fused at its N-terminus to a first portion of a split BirA* protein; and the nuclease defective CRISPR/Cas RNA-guided polypeptide (e.g., dCas9) of the second fusion protein is fused at its C-terminus to a second portion of a split BirA* protein.
  • nuclease defective CRISPR/Cas RNA-guided polypeptide (e.g., dCas9) of the first fusion protein is fused at its C-terminus to a first portion of a split BirA* protein; and the nuclease defective CRISPR/Cas RNA-guided polypeptide (e.g., dCas9) of the second fusion protein is fused at its C-terminus to a second portion of a split BirA* protein.
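Because each split BirA* portion can be fused independently at the N-terminus, at the C-terminus, or at an internal insertion site, the possible arrangements for a pair of fusion proteins can be enumerated mechanically. The sketch below is illustrative only; the labels and the function name `fusion_configurations` are assumptions, and it says nothing about which arrangements are functional.

```python
from itertools import product

# Fusion sites named in the text for the CRISPR/Cas RNA-guided polypeptide.
FUSION_SITES = ("N-terminus", "C-terminus", "internal")


def fusion_configurations():
    """Enumerate (first fusion protein site, second fusion protein site) pairs."""
    return list(product(FUSION_SITES, repeat=2))


for first_site, second_site in fusion_configurations():
    print(f"first portion at {first_site}; second portion at {second_site}")
```

The N-terminus/N-terminus, N-terminus/C-terminus, and C-terminus/C-terminus cases spelled out above are three of the nine pairs produced.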
  • fusion proteins (dCas9 fused to split BirA*) are depicted in Figure 11 and are set forth as SEQ ID NOs: 5-6.
  • the sequence set forth as SEQ ID NO: 7 is a fusion protein that includes dCas9 fused to a BirA* protein that is not split.
  • BirA* is a mutated version (e.g., R118G) of a biotin ligase protein.
  • the mutated version of the protein forms biotinoyl-5'-AMP from biotin and ATP but this 'activated' biotin is only loosely bound at the active site.
  • the highly reactive biotinoyl-5'-AMP diffuses away and indiscriminately biotinylates proteins at lysine residues within an action radius of about 10 nm.
  • the biotinylated proteins can be recovered by biotin-affinity purification approaches (e.g.
  • Biotinylated proteins can be identified by any convenient method, e.g., by mass spectrometry.
  • sequence of BirA* is:
  • a subject fusion protein includes an amino acid sequence having 70% or more identity (e.g., 80% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5 % or more, or 100% identity) with the amino acid sequence:
  • BirA* can be split into two portions such that each portion is inactive unless brought together with the other portion.
  • Such a protein is referred to herein as a “split BirA* polypeptide” or “split BirA* protein.”
  • one portion of a split BirA* protein can be:
  • a fusion protein of the disclosure includes a portion of a split BirA* protein that includes an amino acid sequence having 70% or more identity (e.g., 80% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5 % or more, or 100% identity) with the amino acid sequence set forth as SEQ ID NO: 3.
  • a fusion protein of the disclosure includes a portion of a split BirA* protein that includes an amino acid sequence having 70% or more identity (e.g., 80% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5 % or more, or 100% identity) with the amino acid sequence set forth as SEQ ID NO: 4.
  • one fusion protein of the disclosure includes a first portion of a split BirA* protein that includes an amino acid sequence having 70% or more identity (e.g., 80% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5 % or more, or 100% identity) with the amino acid sequence set forth as SEQ ID NO: 3; and another fusion protein includes a second portion of a split BirA* protein that includes an amino acid sequence having 70% or more identity (e.g., 80% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5 % or more, or 100% identity) with the amino acid sequence set forth as SEQ ID NO: 4.
  • one fusion protein of the disclosure includes a first portion of a split BirA* protein that includes the amino acid sequence set forth as SEQ ID NO: 3; and another fusion protein includes a second portion of a split BirA* protein that includes the amino acid sequence set forth as SEQ ID NO: 4.
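The identity thresholds recited above can be illustrated with a small percent-identity calculation. This is a sketch under stated assumptions: the two sequences are already aligned to equal length with `-` as the gap character, identity is counted over all aligned columns, and the toy sequences are invented (they are not SEQ ID NO: 3 or 4). Other conventions for the denominator exist and would give different values.

```python
IDENTITY_THRESHOLDS = (70.0, 80.0, 90.0, 95.0, 98.0, 99.0, 99.5, 100.0)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns with identical residues (gaps never match)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns that are gaps in both sequences.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def thresholds_met(identity: float):
    """Return the recited thresholds that a given identity satisfies."""
    return [t for t in IDENTITY_THRESHOLDS if identity >= t]


# Toy, hypothetical sequences differing at one of ten positions.
identity = percent_identity("MKDNTVPLKL", "MKDNSVPLKL")
print(identity, thresholds_met(identity))
```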
  • a subject fusion protein includes a nuclease defective CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein fused to a first or second portion of a split BirA* protein.
  • the term "nuclease defective" as used herein refers to the inability of the protein to cleave the target DNA molecule.
  • a nuclease defective CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein can be targeted to a target sequence of a target DNA molecule but does not cleave the target DNA molecule.
  • CRISPR/Cas RNA-guided proteins, in addition to their ability to cleave target DNA, can cleave precursor CRISPR RNA (pre-crRNA) during processing of pre-crRNA into mature crRNA.
  • the term "nuclease defective" does not necessarily refer to their ability (or inability) to cleave pre-crRNA but instead refers to their inability to cleave target DNA.
  • a CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein that cannot cleave target DNA (e.g., via mutation of the appropriate catalytic domain(s)) but can still cleave pre-crRNA is referred to herein as "nuclease defective."
  • nuclease defective in the context of a CRISPR/Cas nucleic acid-guided polypeptide is meant herein to encompass proteins that have some nuclease activity (e.g., to cleave pre-crRNA) but do not have nuclease activity that can cleave target DNA.
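The definition above turns on a single property: whether the protein can cleave target DNA. Pre-crRNA processing activity does not affect the classification. A minimal sketch of that rule follows; the `CasProtein` class, its field names, and the example entries are hypothetical and serve only to illustrate the logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CasProtein:
    """Hypothetical record of two activities discussed in the text."""
    name: str
    cleaves_target_dna: bool
    cleaves_pre_crrna: bool


def is_nuclease_defective(protein: CasProtein) -> bool:
    """Nuclease defective means unable to cleave target DNA, whatever the pre-crRNA activity."""
    return not protein.cleaves_target_dna


examples = [
    CasProtein("wild-type example", cleaves_target_dna=True, cleaves_pre_crrna=True),
    CasProtein("catalytic-domain mutant", cleaves_target_dna=False, cleaves_pre_crrna=True),
    CasProtein("fully inactive mutant", cleaves_target_dna=False, cleaves_pre_crrna=False),
]
for protein in examples:
    print(protein.name, is_nuclease_defective(protein))
```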
  • the nuclease defective CRISPR/Cas nucleic acid-guided polypeptide of a subject fusion protein is a nuclease defective class 2 CRISPR/Cas nucleic acid-guided polypeptide (e.g., nuclease defective type II, type V, or type VI CRISPR/Cas nucleic acid-guided polypeptide).
  • nuclease defective CRISPR/Cas nucleic acid-guided polypeptide of a subject fusion protein is a nuclease defective class 2 CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein.
  • nuclease defective CRISPR/Cas nucleic acid-guided polypeptide of a subject fusion protein is a nuclease defective class 2 type II CRISPR/Cas endonuclease (e.g., a dCas9 protein).
  • nuclease defective CRISPR/Cas nucleic acid-guided polypeptide of a subject fusion protein is a class 2 type V CRISPR/Cas endonuclease (e.g., a Cpf1 protein, a C2c1 protein, or a C2c3 protein).
  • nuclease defective CRISPR/Cas nucleic acid-guided polypeptide of a subject fusion protein is a nuclease defective CasY protein. In some cases the nuclease defective CRISPR/Cas nucleic acid-guided polypeptide of a subject fusion protein is a nuclease defective CasX protein.
  • RNA-mediated adaptive immune systems in bacteria and archaea rely on Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR) genomic loci and CRISPR-associated (Cas) proteins that function together to provide protection from invading viruses and plasmids.
  • a CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein of a fusion protein of the present disclosure is a class 2 CRISPR/Cas endonuclease.
  • the functions of the effector complex are carried out by a single endonuclease (e.g., see Shmakov et al., Nat Rev Microbiol. 2017 Mar;15(3):169-182; Zetsche et al., Cell. 2015 Oct 22;163(3):759-71; Makarova et al., Nat Rev Microbiol. 2015 Nov;13(11):722-36; and Shmakov et al., Mol Cell. 2015 Nov 5;60(3):385-97).
  • the terms "class 2 CRISPR/Cas protein," "class 2 CRISPR/Cas nucleic acid-guided protein," and "class 2 CRISPR/Cas RNA-guided protein" are used herein to encompass the endonuclease (the target nucleic acid cleaving protein) from class 2 CRISPR systems, but in fusion proteins of this disclosure, the protein is nuclease defective.
  • class 2 CRISPR/Cas endonuclease encompasses type II CRISPR/Cas proteins (e.g., Cas9), type V CRISPR/Cas proteins (e.g., Cpf1, C2c1, C2c3), and type VI CRISPR/Cas proteins (e.g., C2c2), as well as CasX and CasY.
  • Class 2 CRISPR/Cas proteins encompass type II, type V, and type VI CRISPR/Cas proteins as well as CasX and CasY, but the term is also meant to encompass any class 2 CRISPR/Cas protein suitable for binding to a corresponding guide RNA, forming a ribonucleoprotein (RNP) complex that can target a specific location of a target nucleic acid molecule via hybridization of the guide sequence of the guide RNA with the target nucleic acid molecule (e.g., target RNA molecule, target DNA molecule).
  • Type II CRISPR/Cas RNA-guided proteins (e.g., Cas9)
  • Cas9 functions as an RNA-guided endonuclease that uses a dual-guide RNA having a crRNA and trans-activating crRNA (tracrRNA) for target recognition and cleavage by a mechanism involving two nuclease active sites in Cas9 that together generate double-stranded DNA breaks (DSBs), or can individually generate single-stranded DNA breaks (SSBs).
  • dgRNA dual guide RNA
  • sgRNA single guide RNA
  • Guided by a dual-RNA complex or a chimeric single-guide RNA, Cas9 generates site-specific DSBs or SSBs within double-stranded DNA (dsDNA) target nucleic acids, which are repaired either by non-homologous end joining (NHEJ) or homology-directed recombination (HDR).
  • a CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein of a fusion protein of the present disclosure is a type II CRISPR/Cas endonuclease (but nuclease defective).
  • a type II CRISPR/Cas endonuclease is a type of class 2 CRISPR/Cas endonuclease.
  • a type II CRISPR/Cas endonuclease is a Cas9 protein.
  • a Cas9 protein forms a complex with a Cas9 guide RNA.
  • the guide RNA provides target specificity to a Cas9-guide RNA complex by having a nucleotide sequence (a guide sequence) that is complementary to a sequence (the target site) of a target nucleic acid (as described elsewhere herein).
  • the Cas9 protein of the complex provides the site-specific activity.
  • the Cas9 protein is guided to a target site (e.g., stabilized at a target site) within a target nucleic acid sequence (e.g., a chromosomal sequence or an extrachromosomal sequence, e.g., an episomal sequence, a minicircle sequence, a mitochondrial sequence, a chloroplast sequence, etc.).
  • Examples of amino acid sequences of suitable Cas9 proteins that can be part of a subject fusion protein include but are not limited to nuclease defective versions of the amino acid sequences set forth in SEQ ID NOs: 15-826. Those listed as SEQ ID NOs: 15-271 and 275-826 are naturally occurring Cas9 protein sequences and a subject nuclease defective Cas9 protein would be a variant of a naturally occurring Cas9 protein (e.g., would be mutated, e.g., include an insertion, a deletion, and/or a substitution relative to a corresponding naturally occurring Cas9 protein).
  • Naturally occurring Cas9 proteins bind a Cas9 guide RNA, are thereby directed to a specific sequence within a target nucleic acid (a target site), and cleave the target nucleic acid (e.g., cleave dsDNA to generate a double strand break, cleave ssDNA, cleave ssRNA, etc.).
  • a nuclease defective version of such a protein can bind a Cas9 guide RNA, can be directed to a specific sequence within a target nucleic acid (a target site), but cannot cleave the target nucleic acid.
  • Assays to determine whether a given protein interacts with a Cas9 guide RNA can be any convenient binding assays (e.g., gel shift assays), including assays that include adding a Cas9 guide RNA and a protein to a target nucleic acid.
  • Assays to determine whether a protein has an activity can be any convenient assay (e.g., any convenient nucleic acid cleavage assay that tests for nucleic acid cleavage). Suitable assays (e.g., cleavage assays) will be known to one of ordinary skill in the art and can include adding a Cas9 guide RNA and a protein to a target nucleic acid.
  • Many Cas9 orthologs from a wide variety of species have been identified and in some cases the proteins share only a few identical amino acids.
  • Identified Cas9 orthologs have similar domain architecture with a central HNH endonuclease domain and a split RuvC/RNaseH domain (e.g., RuvCI, RuvCII, and RuvCIII) (e.g., see Table 1).
  • a Cas9 protein can have 3 different regions (sometimes referred to as RuvC-I, RuvC-II, and RuvC-III), that are not contiguous with respect to the primary amino acid sequence of the Cas9 protein, but fold together to form a RuvC domain once the protein is produced and folds.
  • Cas9 proteins can be said to share at least 4 key motifs with a conserved architecture.
  • Motifs 1, 2, and 4 are RuvC-like motifs while motif 3 is an HNH-motif.
  • the motifs set forth in Table 1 may not represent the entire RuvC-like and/or HNH domains as accepted in the art, but Table 1 does present motifs that can be used to help determine whether a given protein is a Cas9 protein.
  • Table 1 lists 4 motifs that are present in Cas9 sequences from various species. The amino acids listed in Table 1 are from the Cas9 from S. pyogenes (SEQ ID NO: 15).
  • a variant Cas9 protein can cleave the complementary strand (sometimes referred to in the art as the target strand) of a target nucleic acid but has reduced ability to cleave the non-complementary strand (sometimes referred to in the art as the non-target strand) of a target nucleic acid.
  • the variant Cas9 protein can have a mutation (amino acid substitution) that reduces the function of the RuvC domain.
  • the Cas9 protein can be a nickase that cleaves the complementary strand, but does not cleave the non-complementary strand.
  • a variant Cas9 protein has a mutation at an amino acid position corresponding to residue D10 (e.g., D10A, aspartate to alanine) of SEQ ID NO: 15 (or the corresponding position of any of the proteins set forth in SEQ ID NOs: 16-271 and 275-826) and can therefore cleave the complementary strand of a double stranded target nucleic acid but has reduced ability to cleave the non-complementary strand of a double stranded target nucleic acid (thus resulting in a single strand break (SSB) instead of a double strand break (DSB) when the variant Cas9 protein cleaves a double stranded target nucleic acid) (see, for example, Jinek et al., Science. 2012 Aug 17;337(6096):816-21). See, e.g., SEQ ID NO: 272.
  • a variant Cas9 protein can cleave the non-complementary strand of a target nucleic acid but has reduced ability to cleave the complementary strand of the target nucleic acid.
  • the variant Cas9 protein can have a mutation (amino acid substitution) that reduces the function of the HNH domain.
  • the Cas9 protein can be a nickase that cleaves the non-complementary strand, but does not cleave the complementary strand.
  • the variant Cas9 protein has a mutation at an amino acid position corresponding to residue H840 (e.g., an H840A mutation, histidine to alanine) of SEQ ID NO: 15 (or the corresponding position of any of the proteins set forth as SEQ ID NOs: 16-271 and 275-816) and can therefore cleave the non-complementary strand of the target nucleic acid but has reduced ability to cleave (e.g., does not cleave) the complementary strand of the target nucleic acid.
  • Such a Cas9 protein has a reduced ability to cleave a target nucleic acid (e.g., a single stranded target nucleic acid) but retains the ability to bind a target nucleic acid (e.g., a single stranded target nucleic acid). See, e.g., SEQ ID NO: 273.
  • a variant Cas9 protein has a reduced ability to cleave both the complementary and the non-complementary strands of a double stranded target nucleic acid.
  • a "nuclease defective" Cas9 lacks a functioning RuvC domain (i.e., does not cleave the non-complementary strand of a double stranded target DNA) and lacks a functioning HNH domain (i.e., does not cleave the complementary strand of a double stranded target DNA).
  • the nuclease defective Cas9 protein harbors mutations at amino acid positions corresponding to residues D10 and H840 (e.g., D10A and H840A) of SEQ ID NO: 15 (or the corresponding residues of any of the proteins set forth as SEQ ID NOs: 16-271 and 275-816) such that the polypeptide has a reduced ability to cleave (e.g., does not cleave) both the complementary and the non-complementary strands of a target nucleic acid.
  • Such a Cas9 protein has a reduced ability to cleave a target nucleic acid (e.g., a single stranded or double stranded target nucleic acid) but retains the ability to bind a target nucleic acid.
  • a Cas9 protein that cannot cleave target nucleic acid (e.g., due to one or more mutations, e.g., in the catalytic domains of the RuvC and HNH domains) is referred to as a nuclease defective Cas9, "dead Cas9," or simply "dCas9." See, e.g., SEQ ID NO: 274.
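The relationship described above between the D10 and H840 substitutions and the resulting cleavage behavior can be summarized in a short sketch. This is an illustration only, not part of the disclosure: the function name and return labels are hypothetical, residue numbering is assumed to follow SEQ ID NO: 15, and "reduced ability to cleave" is simplified to a yes/no flag.

```python
# Minimal sketch (hypothetical names): map D10A/H840A status to the
# cleavage behavior described in the text. Numbering per SEQ ID NO: 15.
RUVC_INACTIVATING = {"D10A"}   # RuvC cleaves the non-complementary strand
HNH_INACTIVATING = {"H840A"}   # HNH cleaves the complementary strand


def classify_cas9_variant(mutations):
    """Return a coarse label for a Cas9 variant given its substitutions."""
    muts = set(mutations)
    ruvc_inactive = bool(muts & RUVC_INACTIVATING)
    hnh_inactive = bool(muts & HNH_INACTIVATING)
    if ruvc_inactive and hnh_inactive:
        return "dCas9 (binds target, cleaves neither strand)"
    if ruvc_inactive:
        return "nickase (cleaves complementary strand only)"
    if hnh_inactive:
        return "nickase (cleaves non-complementary strand only)"
    return "nuclease active (cleaves both strands)"


if __name__ == "__main__":
    for variant in ([], ["D10A"], ["H840A"], ["D10A", "H840A"]):
        print(variant, "->", classify_cas9_variant(variant))
```

Other substitutions listed in this section (e.g., at G12, E762, or N863) are deliberately left out of the sketch, because the text does not assign each of them to a strand.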
  • residues D10, G12, G17, E762, H840, N854, N863, H982, H983, A984, D986, and/or A987 of SEQ ID NO: 15 can be altered (i.e., substituted).
  • mutations other than alanine substitutions are suitable.
  • for a subject nuclease defective Cas9 protein (e.g., one that includes a mutation that corresponds to a D10, G12, G17, E762, H840, N854, N863, H982, H983, A984, D986, and/or A987 mutation (e.g., D10A, G12A, G17A, E762A, H840A, N854A, N863A, H982A, H983A, A984A, and/or D986A) of SEQ ID NO: 15 (e.g., a corresponding mutation of any of the proteins set forth as SEQ ID NOs: 16-826)), the variant Cas9 protein can still bind to target nucleic acid in a site-specific manner (because it is still guided to a target nucleic acid sequence by a Cas9 guide RNA) as long as it retains the ability to interact with the Cas9 guide RNA.
  • a suitable nuclease defective Cas9 protein (of a subject fusion protein) comprises an amino acid sequence having 4 motifs, each of motifs 1-4 having 60% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 99% or more or 100% amino acid sequence identity (independently) to motifs 1-4 as set forth in SEQ ID NOs: 11-14, respectively (e.g., see Table 1), or to the corresponding portions in any of the amino acid sequences set forth in SEQ ID NOs: 15-826 (with the exception for the case of '100%' that the nuclease defective Cas9 protein can include a mutation, or two mutations, that render inactive the RuvC and HNH domains).
  • a suitable nuclease defective Cas9 protein (of a subject fusion protein) comprises an amino acid sequence having 4 motifs, each of motifs 1-4 having 60% or more amino acid sequence identity to motifs 1-4 of the Cas9 amino acid sequence set forth as SEQ ID NO: 15 (the motifs are in Table 1, and are set forth as SEQ ID NOs: 11-14, respectively), or to the corresponding portions in any of the amino acid sequences set forth in SEQ ID NOs: 16-826.
  • a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 4 motifs, each of motifs 1-4 having 70% or more amino acid sequence identity to motifs 1-4 of the Cas9 amino acid sequence set forth as SEQ ID NO: 15 (the motifs are in Table 1, and are set forth as SEQ ID NOs: 11-14, respectively), or to the corresponding portions in any of the amino acid sequences set forth in SEQ ID NOs: 16-826.
  • a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 4 motifs, each of motifs 1-4 having 75% or more amino acid sequence identity to motifs 1-4 of the Cas9 amino acid sequence set forth as SEQ ID NO: 15 (the motifs are in Table 1, and are set forth as SEQ ID NOs: 11-14, respectively), or to the corresponding portions in any of the amino acid sequences set forth in SEQ ID NOs: 16-826.
  • a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 4 motifs, each of motifs 1-4 having 80% or more amino acid sequence identity to motifs 1-4 of the Cas9 amino acid sequence set forth as SEQ ID NO: 15 (the motifs are in Table 1, and are set forth as SEQ ID NOs: 11-14, respectively), or to the corresponding portions in any of the amino acid sequences set forth in SEQ ID NOs: 16-826.
  • a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 4 motifs, each of motifs 1-4 having 85% or more amino acid sequence identity to motifs 1-4 of the Cas9 amino acid sequence set forth as SEQ ID NO: 15 (the motifs are in Table 1, and are set forth as SEQ ID NOs: 11-14, respectively), or to the corresponding portions in any of the amino acid sequences set forth in SEQ ID NOs: 16-826.
  • a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 4 motifs, each of motifs 1-4 having 90% or more amino acid sequence identity to motifs 1-4 of the Cas9 amino acid sequence set forth as SEQ ID NO: 15 (the motifs are in Table 1, and are set forth as SEQ ID NOs: 11-14, respectively), or to the corresponding portions in any of the amino acid sequences set forth in SEQ ID NOs: 16-826.
  • a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 4 motifs, each of motifs 1-4 having 95% or more amino acid sequence identity to motifs 1-4 of the Cas9 amino acid sequence set forth as SEQ ID NO: 15 (the motifs are in Table 1, and are set forth as SEQ ID NOs: 11-14, respectively), or to the corresponding portions in any of the amino acid sequences set forth in SEQ ID NOs: 16-826.
  • a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 4 motifs, each of motifs 1-4 having 99% or more amino acid sequence identity to motifs 1-4 of the Cas9 amino acid sequence set forth as SEQ ID NO: 15 (the motifs are in Table 1, and are set forth as SEQ ID NOs: 11-14, respectively), or to the corresponding portions in any of the amino acid sequences set forth in SEQ ID NOs: 16-826.
  • a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 4 motifs, each of motifs 1-4 having 100% amino acid sequence identity to motifs 1-4 of the Cas9 amino acid sequence set forth as SEQ ID NO: 15 (the motifs are in Table 1, and are set forth as SEQ ID NOs: 11-14, respectively), or to the corresponding portions in any of the amino acid sequences set forth in SEQ ID NOs: 16-826 (with the exception that the nuclease defective Cas9 protein can include a mutation, or two mutations, that render inactive the RuvC and HNH domains).
  • nuclease defective Cas9 protein comprises an amino acid sequence having 60% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 99% or more or 100% amino acid sequence identity to amino acids 7-166 or 731-1003 of the Cas9 amino acid sequence set forth in SEQ ID NO: 15, or to the corresponding portions in any of the amino acid sequences set forth as SEQ ID NOs: 16-826 (with the exception for the '100%' case that the nuclease defective Cas9 protein can include a mutation, or two mutations, that render inactive the RuvC and HNH domains).
  • a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 70% or more amino acid sequence identity to amino acids 7-166 or 731-1003 of the Cas9 amino acid sequence set forth in SEQ ID NO: 15, or to the corresponding portions in any of the amino acid sequences set forth as SEQ ID NOs: 16-826.
  • a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 75% or more amino acid sequence identity to amino acids 7-166 or 731-1003 of the Cas9 amino acid sequence set forth in SEQ ID NO: 15, or to the corresponding portions in any of the amino acid sequences set forth as SEQ ID NOs: 16-826. In some cases, a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 80% or more amino acid sequence identity to amino acids 7-166 or 731-1003 of the Cas9 amino acid sequence set forth in SEQ ID NO: 15, or to the corresponding portions in any of the amino acid sequences set forth as SEQ ID NOs: 16-826.
  • a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 85% or more amino acid sequence identity to amino acids 7-166 or 731-1003 of the Cas9 amino acid sequence set forth in SEQ ID NO: 15, or to the corresponding portions in any of the amino acid sequences set forth as SEQ ID NOs: 16-826. In some cases, a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 90% or more amino acid sequence identity to amino acids 7-166 or 731-1003 of the Cas9 amino acid sequence set forth in SEQ ID NO: 15, or to the corresponding portions in any of the amino acid sequences set forth as SEQ ID NOs: 16-826.
  • a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 95% or more amino acid sequence identity to amino acids 7-166 or 731-1003 of the Cas9 amino acid sequence set forth in SEQ ID NO: 15, or to the corresponding portions in any of the amino acid sequences set forth as SEQ ID NOs: 16-826. In some cases, a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 99% or more amino acid sequence identity to amino acids 7-166 or 731-1003 of the Cas9 amino acid sequence set forth in SEQ ID NO: 15, or to the corresponding portions in any of the amino acid sequences set forth as SEQ ID NOs: 16-826.
  • a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 100% amino acid sequence identity to amino acids 7-166 or 731-1003 of the Cas9 amino acid sequence set forth in SEQ ID NO: 15, or to the corresponding portions in any of the amino acid sequences set forth as SEQ ID NOs: 16-826 (with the exception that the nuclease defective Cas9 protein can include a mutation, or two mutations, that render inactive the RuvC and HNH domains).
  • a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 70% or more amino acid sequence identity to the Cas9 amino acid sequence set forth in any one of SEQ ID NOs: 15-826. In some cases, a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 75% or more amino acid sequence identity to the Cas9 amino acid sequence set forth in any one of SEQ ID NOs: 15-826.
  • a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 80% or more amino acid sequence identity to the Cas9 amino acid sequence set forth in any one of SEQ ID NOs: 15-826. In some cases, a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 85% or more amino acid sequence identity to the Cas9 amino acid sequence set forth in any one of SEQ ID NOs: 15-826. In some cases, a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 90% or more amino acid sequence identity to the Cas9 amino acid sequence set forth in any one of SEQ ID NOs: 15-826.
  • a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 95% or more amino acid sequence identity to the Cas9 amino acid sequence set forth in any one of SEQ ID NOs: 15-826. In some cases, a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 99% or more amino acid sequence identity to the Cas9 amino acid sequence set forth in any one of SEQ ID NOs: 15-826.
  • a nuclease defective Cas9 protein (of a subject fusion protein) comprises 4 motifs (as listed in Table 1), at least one with (or each with) amino acid sequences having 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 99% or more or 100% amino acid sequence identity to each of the 4 motifs listed in Table 1 (SEQ ID NOs: 11-14), or to the corresponding portions in any of the amino acid sequences set forth as SEQ ID NOs: 16-826 (with the exception for the '100%' case that the nuclease defective Cas9 protein can include a mutation, or two mutations, that render inactive the RuvC and HNH domains).
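The motif-identity thresholds recited above can be illustrated with a small sketch. It is offered under stated assumptions rather than as the method of the disclosure: percent identity is taken here as the fraction of identical positions between two pre-aligned, equal-length, ungapped sequences; the function names are hypothetical; and the motif strings below are made-up placeholders, not the sequences of SEQ ID NOs: 11-14.

```python
# Minimal sketch (hypothetical names; placeholder sequences, NOT SEQ ID NOs: 11-14).
def percent_identity(a, b):
    """Percent of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def motifs_meet_threshold(candidate_motifs, reference_motifs, threshold):
    """True if each candidate motif independently meets the identity threshold."""
    if len(candidate_motifs) != len(reference_motifs):
        raise ValueError("expected the same number of motifs")
    return all(
        percent_identity(c, r) >= threshold
        for c, r in zip(candidate_motifs, reference_motifs)
    )


if __name__ == "__main__":
    # Placeholder 10-residue "motifs" for demonstration only.
    reference = ["ACDEFGHIKL", "MNPQRSTVWY", "ACDEFGHIKL", "MNPQRSTVWY"]
    candidate = ["ACDEFGHIKV", "MNPQRSTVWY", "ACDEFGHIKL", "MNPQRSTVAA"]
    print(motifs_meet_threshold(candidate, reference, 80.0))
    print(motifs_meet_threshold(candidate, reference, 90.0))
```

In practice the "corresponding portions" of a homolog would first have to be located by sequence alignment, a step this sketch leaves out.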
  • Cas9 proteins and Cas9 domain structure
  • Cas9 guide RNAs as well as information regarding requirements related to protospacer adjacent motif (PAM) sequences present in targeted nucleic acids
  • a genome targeting composition includes a type V or type VI CRISPR/Cas RNA-guided protein (i.e., the genome editing endonuclease is a type V or type VI CRISPR/Cas endonuclease) (e.g., Cpf1, C2c1, C2c2, C2c3).
  • Type V and type VI CRISPR/Cas endonucleases are a type of class 2 CRISPR/Cas endonuclease. Examples of type V CRISPR/Cas endonucleases include but are not limited to: Cpf1, C2c1, and C2c3.
  • An example of a type VI CRISPR/Cas endonuclease is C2c2.
  • type V and VI CRISPR/Cas endonucleases form a complex with a corresponding guide RNA.
  • the guide RNA provides target specificity to an endonuclease-guide RNA RNP complex by having a nucleotide sequence (a guide sequence) that is complementary to a sequence (the target site) of a target nucleic acid (as described elsewhere herein).
  • the endonuclease of the complex provides the site-specific activity. In other words, the endonuclease is guided to a target site (e.g., stabilized at a target site) within a target nucleic acid sequence (e.g., a chromosomal sequence or an extrachromosomal sequence, e.g., an episomal sequence, a minicircle sequence, a mitochondrial sequence, a chloroplast sequence, etc.).
  • Examples and guidance related to type V and type VI CRISPR/Cas proteins (e.g., Cpf1, C2c1, C2c2, and C2c3 guide RNAs), e.g., information regarding PAM sequences and guide RNA sequences, can be found in the art, for example, see Zetsche et al., Cell. 2015 Oct 22;163(3):759-71; Makarova et al., Nat Rev Microbiol. 2015 Nov;13(11):722-36; Shmakov et al., Mol Cell. 2015 Nov 5;60(3):385-97; and Shmakov et al., Nat Rev Microbiol. 2017 Mar;15(3):169-182.
  • a CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein of a fusion protein of the present disclosure is a Type V or type VI CRISPR/Cas nucleic acid-guided protein (e.g., Cpf1, C2c1, C2c2, C2c3, C2c7, C2c6) and is nuclease defective ("dead"), e.g., the Type V or type VI CRISPR/Cas polypeptide, when bound to a guide RNA, retains target (e.g., DNA) binding activity but does not cleave the target nucleic acid.
  • a nuclease defective CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein of a fusion protein of the present disclosure is a nuclease defective Cpf1 protein (also known as "Cas12a" - a type V CRISPR/Cas nucleic acid-guided protein).
  • a nuclease defective CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein of a fusion protein of the present disclosure is a Cpf1 protein that lacks a functioning RuvC-like domain.
  • a nuclease defective Cpf1 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% amino acid sequence identity to the Cpf1 amino acid sequence set forth in any of SEQ ID NOs: 828-832; and lacks a functioning RuvC-like domain.
  • a subject nuclease defective Cpf1 protein includes an amino acid substitution (e.g., a D→A substitution) at an amino acid residue corresponding to amino acid 917 of the Cpf1 amino acid sequence set forth in SEQ ID NO: 828.
  • a nuclease defective Cpf1 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the Cpf1 amino acid sequence set forth in any of SEQ ID NOs: 828-832; and comprises an amino acid substitution (e.g., an E→A substitution) at an amino acid residue corresponding to amino acid 1006 of the Cpf1 amino acid sequence set forth in SEQ ID NO: 828.
  • a nuclease defective Cpf1 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% amino acid sequence identity to the Cpf1 amino acid sequence set forth in any of SEQ ID NOs: 828-832; and comprises an amino acid substitution (e.g., a D→A substitution) at an amino acid residue corresponding to amino acid 1255 of the Cpf1 amino acid sequence set forth in SEQ ID NO: 828.
  • a nuclease defective Cpf1 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the Cpf1 amino acid sequence set forth in any of SEQ ID NOs: 828-832.
  • a nuclease defective Cpf1 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% amino acid sequence identity to a contiguous stretch of from 100 amino acids to 200 amino acids (aa), from 200 aa to 400 aa, from 400 aa to 600 aa, from 600 aa to 800 aa, from 800 aa to 1000 aa, from 1000 aa to 1100 aa, from 1100 aa to 1200 aa, or from 1200 aa to 1300 aa, of the Cpf1 amino acid sequence set forth in any of SEQ ID NOs: 828-832.
  • a nuclease defective Cpf1 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCI domain of the Cpf1 amino acid sequence set forth in any of SEQ ID NOs: 828-832.
  • a nuclease defective Cpf1 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCII domain of the Cpf1 amino acid sequence set forth in any of SEQ ID NOs: 828-832.
  • a nuclease defective Cpf1 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCIII domain of the Cpf1 amino acid sequence set forth in any of SEQ ID NOs: 828-832.
  • a nuclease defective Cpf1 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCI, RuvCII, and RuvCIII domains of the Cpf1 amino acid sequence set forth in any of SEQ ID NOs: 828-832.
  • a suitable nuclease defective Cpf1 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the Cpf1 amino acid sequence set forth in any of SEQ ID NOs: 828-832.
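The point substitutions recited above for Cpf1 (a D→A substitution at positions 917 and 1255, an E→A substitution at position 1006, numbered per SEQ ID NO: 828) can be sketched as a simple string operation. This is a hedged illustration: the helper name is hypothetical, positions are treated as 1-based, the position is assumed to have already been mapped onto the sequence at hand (finding the "corresponding" residue in a homolog requires an alignment, not shown), and the toy sequence is a placeholder rather than any sequence of the disclosure.

```python
# Minimal sketch (hypothetical helper; toy sequence, not SEQ ID NO: 828).
def apply_substitution(sequence, position, expected, replacement):
    """Return `sequence` with the 1-based `position` changed to `replacement`.

    Raises ValueError if the residue at that position is not `expected`,
    which guards against off-by-one or mis-mapped positions.
    """
    if not 1 <= position <= len(sequence):
        raise ValueError("position outside sequence")
    found = sequence[position - 1]
    if found != expected:
        raise ValueError(
            f"expected {expected} at position {position}, found {found}"
        )
    return sequence[: position - 1] + replacement + sequence[position:]


if __name__ == "__main__":
    toy = "MKDLE"  # placeholder peptide for demonstration
    # A D->A substitution at toy position 3, analogous in form to D917A.
    print(apply_substitution(toy, 3, "D", "A"))
```

Checking the expected wild-type residue before substituting is a design choice of the sketch: it fails loudly when a position recited for one sequence is applied to a different sequence without remapping.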
  • a nuclease defective CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein of a fusion protein of the present disclosure is a nuclease defective C2c1 protein (also known as "Cas12b" - a type V CRISPR/Cas nucleic acid-guided protein) (examples include those set forth as SEQ ID NOs: 833-840).
  • a nuclease defective C2c1 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the C2c1 amino acid sequence set forth in any of SEQ ID NOs: 833-840.
  • a nuclease defective C2c1 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to a contiguous stretch of from 100 amino acids to 200 amino acids (aa), from 200 aa to 400 aa, from 400 aa to 600 aa, from 600 aa to 800 aa, from 800 aa to 1000 aa, from 1000 aa to 1100 aa, from 1100 aa to 1200 aa, or from 1200 aa to 1300 aa, of the C2c1 amino acid sequence set forth in any of SEQ ID NOs: 833-840.
  • a nuclease defective C2c1 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCI domain of the C2c1 amino acid sequence set forth in any of SEQ ID NOs: 833-840.
  • a nuclease defective C2c1 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCII domain of the C2c1 amino acid sequence set forth in any of SEQ ID NOs: 833-840.
  • a nuclease defective C2c1 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCIII domain of the C2c1 amino acid sequence set forth in any of SEQ ID NOs: 833-840.
  • a nuclease defective C2c1 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCI, RuvCII, and RuvCIII domains of the C2c1 amino acid sequence set forth in any of SEQ ID NOs: 833-840.
  • a nuclease defective CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein of a fusion protein of the present disclosure is a nuclease defective C2c3 protein (also known as "Cas12c" - a type V CRISPR/Cas nucleic acid-guided protein) (examples include those set forth as SEQ ID NOs: 841-844).
  • a nuclease defective C2c3 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% amino acid sequence identity to the C2c3 amino acid sequence set forth in any of SEQ ID NOs: 841-844.
  • a nuclease defective C2c3 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to a contiguous stretch of from 100 amino acids to 200 amino acids (aa), from 200 aa to 400 aa, from 400 aa to 600 aa, from 600 aa to 800 aa, from 800 aa to 1000 aa, from 1000 aa to 1100 aa, from 1100 aa to 1200 aa, or from 1200 aa to 1300 aa, of the C2c3 amino acid sequence set forth in any of SEQ ID NOs: 841-844.
  • a nuclease defective C2c3 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCI domain of the C2c3 amino acid sequence set forth in any of SEQ ID NOs: 841-844.
  • a nuclease defective C2c3 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCII domain of the C2c3 amino acid sequence set forth in any of SEQ ID NOs: 841-844.
  • a nuclease defective C2c3 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCIII domain of the C2c3 amino acid sequence set forth in any of SEQ ID NOs: 841-844.
  • a nuclease defective C2c3 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCI, RuvCII, and RuvCIII domains of the C2c3 amino acid sequence set forth in any of SEQ ID NOs: 841-844.
  • a suitable C2c3 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the C2c3 amino acid sequence set forth in any of SEQ ID NOs: 841-844.
  • a nuclease defective CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein of a fusion protein of the present disclosure is a nuclease defective C2c2 protein (also known as "Cas13a" - a type VI CRISPR/Cas endonuclease) (examples include those set forth as SEQ ID NOs: 845-856).
  • a nuclease defective C2c2 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the C2c2 amino acid sequence set forth in any of SEQ ID NOs: 845-856.
  • a nuclease defective C2c2 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to a contiguous stretch of from 100 amino acids to 200 amino acids (aa), from 200 aa to 400 aa, from 400 aa to 600 aa, from 600 aa to 800 aa, from 800 aa to 1000 aa, from 1000 aa to 1100 aa, from 1100 aa to 1200 aa, or from 1200 aa to 1300 aa, of the C2c2 amino acid sequence set forth in any of SEQ ID NOs: 845-856.
  • a nuclease defective C2c2 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCI domain of the C2c2 amino acid sequence set forth in any of SEQ ID NOs: 845-856.
  • a nuclease defective C2c2 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCII domain of the C2c2 amino acid sequence set forth in any of SEQ ID NOs: 845-856.
  • a nuclease defective C2c2 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCIII domain of the C2c2 amino acid sequence set forth in any of SEQ ID NOs: 845-856.
  • a nuclease defective C2c2 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCI, RuvCII, and RuvCIII domains of the C2c2 amino acid sequence set forth in any of SEQ ID NOs: 845-856.
  • a nuclease defective CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein of a fusion protein of the present disclosure is a nuclease defective C2c7 protein (also known as "Cas13c" - a type VI CRISPR/Cas endonuclease) or a nuclease defective C2c6 (also known as "Cas13b" - a type VI CRISPR/Cas endonuclease) [e.g., see Shmakov et al., Nat Rev Microbiol. 2017 Mar;15(3):169-182].
  • a nuclease defective CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein of a fusion protein of the present disclosure is a nuclease defective CasX or CasY protein.
  • a nuclease defective CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein of a fusion protein of the present disclosure is a nuclease defective CasX protein.
  • a nuclease defective CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein of a fusion protein of the present disclosure is a nuclease defective CasY protein.
  • a nuclease defective CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein of a fusion protein of the present disclosure is a nuclease defective CASX protein.
  • a nuclease defective CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein of a fusion protein of the present disclosure is a CASX protein that lacks a functioning RuvC domain.
  • a nuclease defective CASX protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% amino acid sequence identity to the CASX amino acid sequence set forth in any of SEQ ID NOs: 1146-1147; and lacks a functioning RuvC domain.
  • a nuclease defective CasX protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the CASX amino acid sequence set forth in any of SEQ ID NOs: 1146-1147.
  • a nuclease defective CASX protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% amino acid sequence identity to a contiguous stretch of from 100 amino acids to 200 amino acids (aa), from 200 aa to 400 aa, from 400 aa to 600 aa, from 600 aa to 800 aa, or from 800 aa to 960 aa, of the CASX amino acid sequence set forth in any of SEQ ID NOs: 1146-1147.
  • a nuclease defective CASX protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCI domain of the CASX amino acid sequence set forth in any of SEQ ID NOs: 1146-1147.
  • a nuclease defective CasX protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCII domain of the CasX amino acid sequence set forth in any of SEQ ID NOs: 1146-1147.
  • a nuclease defective CasX protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCIII domain of the CasX amino acid sequence set forth in any of SEQ ID NOs: 1146-1147.
  • a nuclease defective CasX protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCI, RuvCII, and RuvCIII domains of the CasX amino acid sequence set forth in any of SEQ ID NOs: 1146-1147.
  • a suitable nuclease defective CasX protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the CasX amino acid sequence set forth in any of SEQ ID NOs: 1146-1147.
  • a nuclease defective CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein of a fusion protein of the present disclosure is a nuclease defective CASY protein.
  • a nuclease defective CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein of a fusion protein of the present disclosure is a CASY protein that lacks a functioning RuvC domain.
  • a nuclease defective CASY protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% amino acid sequence identity to the CASY amino acid sequence set forth in any of SEQ ID NOs: 1148-1153; and lacks a functioning RuvC domain.
  • a nuclease defective CasY protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the CASY amino acid sequence set forth in any of SEQ ID NOs: 1148-1153.
  • a nuclease defective CASY protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% amino acid sequence identity to a contiguous stretch of from 100 amino acids to 200 amino acids (aa), from 200 aa to 400 aa, from 400 aa to 600 aa, from 600 aa to 800 aa, or from 800 aa to 960 aa, of the CASY amino acid sequence set forth in any of SEQ ID NOs: 1148-1153.
  • a nuclease defective CASY protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCI domain of the CASY amino acid sequence set forth in any of SEQ ID NOs: 1148-1153.
  • a nuclease defective CasY protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCII domain of the CasY amino acid sequence set forth in any of SEQ ID NOs: 1148-1153.
  • a nuclease defective CasY protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCIII domain of the CasY amino acid sequence set forth in any of SEQ ID NOs: 1148-1153.
  • a nuclease defective CasY protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCI, RuvCII, and RuvCIII domains of the CasY amino acid sequence set forth in any of SEQ ID NOs: 1148-1153.
  • a suitable nuclease defective CasY protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the CasY amino acid sequence set forth in any of SEQ ID NOs: 1148-1153.
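The identity thresholds recited above (at least 30% through at least 99%) can be illustrated with a short calculation. The following is a minimal sketch, assuming the two sequences have already been aligned (gaps written as "-") and assuming identity is counted over every aligned column that is not a gap in both sequences; other denominators are in common use, and the source does not say which applies. The function names and toy sequences are hypothetical and are not taken from the SEQ ID NOs.

```python
# Thresholds recited in the text, in percent.
THRESHOLDS = (30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 98, 99)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: columns where both sequences have a gap are skipped; every
    other column counts toward the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned positions to compare")
    return 100.0 * matches / compared


def thresholds_met(identity: float) -> list:
    """Return every recited threshold that the identity value satisfies."""
    return [t for t in THRESHOLDS if identity >= t]


if __name__ == "__main__":
    # Toy sequences (hypothetical): 7 of 8 aligned residues match.
    identity = percent_identity("MKTAYIAK", "MKTAYLAK")
    print(identity, max(thresholds_met(identity)))
```

In practice the alignment itself would come from a dedicated alignment tool; this sketch covers only the counting step.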
  • a subject fusion protein (a nuclease defective CRISPR/Cas RNA-guided polypeptide, e.g., dCas9, fused to a first or second portion of a split BirA* protein) can include an amino acid sequence (a fusion partner), e.g., the fusion partner is a subcellular localization sequence (e.g., one or more nuclear localization signals (NLSs) for targeting to the nucleus).
  • a subject fusion protein can have one or more NLSs (e.g., two or more, three or more, four or more, five or more, 1, 2, 3, 4, or 5 NLSs).
  • a subject fusion protein includes one or more (e.g., two or more, three or more, four or more, etc.) nuclear localization signals (NLSs).
  • a fusion partner (or multiple fusion partners, e.g., 1, 2, 3, 4, or 5 fusion partners; e.g., an NLS, a tag, a fusion partner providing an activity, etc.) is located at or near the C-terminus of the subject fusion protein. In some cases a fusion partner (or multiple fusion partners, e.g., 1, 2, 3, 4, or 5 fusion partners; e.g., an NLS, a tag, a fusion partner providing an activity, etc.) is located at the N-terminus of the subject fusion protein.
  • the subject fusion protein has a fusion partner (or multiple fusion partners, e.g., 1, 2, 3, 4, or 5 fusion partners; e.g., an NLS, a tag, a fusion partner providing an activity, etc.) at both the N-terminus and C-terminus.
  • the subject fusion protein has a fusion partner (or multiple fusion partners, e.g., 1, 2, 3, 4, or 5 fusion partners; e.g., an NLS, a tag, a fusion partner providing an activity, etc.) located internal to the primary amino acid sequence.
  • the subject fusion protein has a fusion partner (or multiple fusion partners, e.g., 1, 2, 3, 4, or 5 fusion partners; e.g., an NLS, a tag, a fusion partner providing an activity, etc.) located internal to the primary amino acid sequence in addition to a fusion partner located at C- and/or N- terminus.
  • a subject fusion protein (a nuclease defective CRISPR/Cas RNA-guided polypeptide, e.g., dCas9, fused to a first or second portion of a split BirA* protein) can have a fusion partner that provides for tagging (e.g., GFP), and can also have a subcellular localization sequence (e.g., one or more NLSs, 2 or more NLSs, etc.).
  • such a fusion protein might also have a tag for ease of tracking and/or purification (e.g., a histidine tag, e.g., a 6XHis tag; a hemagglutinin (HA) tag; a FLAG tag; a Myc tag; and the like).
  • a subject fusion protein is fused to an amino acid sequence (a fusion partner) that provides a tag (i.e., the fusion partner is a detectable label) for ease of tracking and/or purification (e.g., a fluorescent protein, e.g., green fluorescent protein (GFP), YFP, RFP, CFP, mCherry, tdTomato, and the like; a histidine tag, e.g., a 6XHis tag; a hemagglutinin (HA) tag; a FLAG tag; a Myc tag; and the like).
  • GFP green fluorescent protein
  • YFP yellow fluorescent protein
  • RFP red fluorescent protein
  • CFP cyan fluorescent protein
  • mCherry
  • tdTomato
  • a subject fusion partner can provide for increased or decreased stability (i.e., the fusion partner can be a stability control peptide, e.g., a degron, which in some cases is controllable (e.g., a temperature sensitive or drug controllable degron sequence)).
  • a subject fusion protein (a nuclease defective CRISPR/Cas RNA-guided polypeptide, e.g., dCas9, fused to a first or second portion of a split BirA* protein) can be fused to a polypeptide permeant domain to promote uptake by the cell (i.e., a fusion partner can promote uptake by a cell).
  • permeant domains are known in the art and may be used, including peptides, peptidomimetics, and non-peptide carriers.
  • a permeant peptide may be derived from the third alpha helix of Drosophila melanogaster transcription factor Antennapedia, referred to as penetratin, which comprises the amino acid sequence RQIKIWFQNRRMKWKK (SEQ ID NO: 1129).
  • the permeant peptide can comprise the HIV-1 tat basic region amino acid sequence, which may include, for example, amino acids 49-57 of naturally-occurring tat protein.
  • Other permeant domains include poly-arginine motifs, for example, the region of amino acids 34-56 of HIV-1 rev protein, nona-arginine, octa-arginine, and the like. (See, for example, Futaki et al.
  • the nona-arginine (R9) sequence is one of the more efficient PTDs that have been characterized (Wender et al. 2000; Uemura et al. 2002).
  • the site at which the fusion is made may be selected in order to optimize the biological activity, secretion or binding characteristics of the polypeptide. The optimal site can be determined by routine experimentation.
  • a subject fusion protein includes a "Protein Transduction Domain” or PTD (also known as a CPP - cell penetrating peptide), which refers to a polypeptide, polynucleotide, carbohydrate, or organic or inorganic compound that facilitates traversing a lipid bilayer, micelle, cell membrane, organelle membrane, or vesicle membrane.
  • PTD Protein Transduction Domain
  • a PTD attached to another molecule which can range from a small polar molecule to a large macromolecule and/or a nanoparticle, facilitates the molecule traversing a membrane, for example going from extracellular space to intracellular space, or cytosol to within an organelle.
  • a PTD is covalently linked to the N-terminus of a polypeptide (e.g., a subject fusion protein). In some embodiments, a PTD is covalently linked to the C-terminus of a polypeptide (e.g., a subject fusion protein). In some cases, the PTD is inserted internally in the primary amino acid sequence of a subject fusion protein. In some cases, a subject fusion protein includes (is conjugated to, is fused to) one or more PTDs (e.g., two or more, three or more, four or more PTDs).
  • a PTD includes a nuclear localization signal (NLS) (e.g., in some cases 2 or more, 3 or more, 4 or more, or 5 or more NLSs).
  • NLS nuclear localization signal
  • a subject fusion protein includes one or more NLSs (e.g., 2 or more, 3 or more, 4 or more, or 5 or more NLSs).
  • a PTD is covalently linked to a nucleic acid (e.g., a CRISPR/Cas guide RNA, a DNA molecule encoding a CRISPR/Cas guide RNA, a nucleic acid encoding a subject fusion protein).
  • a nucleic acid e.g., a CRISPR/Cas guide RNA, a DNA molecule encoding a CRISPR/Cas guide RNA, a nucleic acid encoding a subject fusion protein.
  • PTDs include but are not limited to a minimal undecapeptide protein transduction domain (corresponding to residues 47-57 of HIV-1 TAT comprising YGRKKRRQRRR; SEQ ID NO: 1125); a polyarginine sequence comprising a number of arginines sufficient to direct entry into a cell (e.g., 3, 4, 5, 6, 7, 8, 9, 10, or 10-50 arginines); a VP22 domain (Zender et al. (2002) Cancer Gene Ther. 9(6):489-96); an Drosophila
  • Exemplary PTDs include but are not limited to, YGRKKRRQRRR (SEQ ID NO: 1125), RKKRRQRRR (SEQ ID NO: 1130); an arginine homopolymer of from 3 arginine residues to 50 arginine residues;
  • Exemplary PTD domain amino acid sequences include, but are not limited to, any of the following: YGRKKRRQRRR (SEQ ID NO: 1125); RKKRRQRR (SEQ ID NO: 1131); YARAAARQARA (SEQ ID NO: 1132); THRLPRRRRRR (SEQ ID NO: 1133); and GGRRARRRRRR (SEQ ID NO: 1134).
  • the PTD is an activatable CPP (ACPP) (Aguilera et al. (2009) Integr Biol (Camb) June; 1(5-6): 371-381).
  • ACPPs comprise a polycationic CPP (e.g., Arg9 or "R9") connected via a cleavable linker to a matching polyanion (e.g., Glu9 or "E9”), which reduces the net charge to nearly zero and thereby inhibits adhesion and uptake into cells.
  • a polyanion e.g., Glu9 or "E9
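The charge-masking idea behind an ACPP (a polycation such as R9 paired with a matching polyanion such as E9, giving a net charge near zero) can be checked with simple residue counting. This is a minimal sketch under stated assumptions: Arg and Lys each count as +1, Asp and Glu each count as -1, and termini, histidine, and pH effects are ignored. The function name and the linker sequence are placeholders for illustration and are not taken from the source.

```python
POSITIVE = set("RK")  # Arg, Lys counted as +1 (simplifying assumption)
NEGATIVE = set("DE")  # Asp, Glu counted as -1 (simplifying assumption)


def net_charge(peptide: str) -> int:
    """Crude integer net charge from side chains only."""
    seq = peptide.upper()
    positives = sum(1 for aa in seq if aa in POSITIVE)
    negatives = sum(1 for aa in seq if aa in NEGATIVE)
    return positives - negatives


if __name__ == "__main__":
    r9 = "R" * 9           # polycationic CPP ("R9")
    e9 = "E" * 9           # matching polyanion ("E9")
    linker = "PLGLAG"      # placeholder uncharged linker, hypothetical here

    print(net_charge(r9))                # charge of the free CPP
    print(net_charge(e9 + linker + r9))  # charge of the intact, masked construct
```

Under this counting, the intact construct is neutral while the free R9 carries its full positive charge, which matches the stated rationale that the polyanion inhibits adhesion and uptake until the linker is cleaved.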
  • a subject fusion protein includes one or more linker polypeptides (i.e., a linker).
  • a linker separates the nuclease defective CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) polypeptide (e.g., dCas9) from the split BirA* portion of the fusion protein.
  • a linker separates a fusion partner (e.g., NLS, PTD, tag, etc.) from the nuclease defective CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) polypeptide.
  • a linker separates a fusion partner (e.g., NLS, PTD, tag, etc.) from the split BirA* portion of a subject fusion protein.
  • the linker polypeptide may have any of a variety of amino acid sequences.
  • the linker is a flexible linker. In some cases the linker is a rigid linker.
  • a linker can be of any convenient length, and the length of the linker can influence guide RNA positioning. For example, a long linker between the nuclease defective CRISPR/Cas RNA-guided polypeptide and the split BirA* portion of a subject fusion protein will likely allow the guide RNAs to be targeted farther apart at the target locus, because the CRISPR/Cas RNA-guided polypeptide portions (e.g., dCas9 portions of the fusion proteins) do not need to be as close to one another in order to bring the first and second split BirA* portions together.
  • CRISPR/Cas RNA-guided polypeptide portions e.g., dCas9 portions of the fusion proteins
  • a linker is 3 amino acids (aa) to 40 amino acids in length, e.g., from 3 amino acids to 25 amino acids in length, from 3 amino acids to 10 amino acids in length, from 3 amino acids to 5 amino acids in length, etc.
  • a linker is 10 to 30 amino acids long (e.g., 10 to 28, 10 to 25, 10 to 22, 10 to 20, 12 to 30, 12 to 28, 12 to 25, 12 to 22, 12 to 20, 15 to 30, 15 to 28, 15 to 25, 15 to 22, or 15 to 20 amino acids long).
  • a linker is 15 aa, 16 aa, 17 aa, 18 aa, 19 aa or 20 aa long.
  • Peptide linkers with a degree of flexibility can be used.
  • the linking peptides may have virtually any amino acid sequence, bearing in mind that some linkers will have a sequence that results in a generally flexible peptide.
  • small amino acids, such as glycine and alanine, are of use in creating a flexible peptide.
  • the creation of such sequences is routine to those of skill in the art.
  • a variety of different linkers are commercially available and are considered suitable for use.
  • the linker is an XTEN linker, e.g., SGSETPGTSES (SEQ ID NO: 10).
  • a subject linker includes the amino acid sequence GGGGSSGSETPGTSES (SEQ ID NO: 1135).
  • a subject linker includes the amino acid sequence ETPGTSES (SEQ ID NO: 1136).
  • linker polypeptides include glycine polymers (G)n, glycine-serine polymers (where n is an integer of at least one, and can range from 1 to about 10), glycine-alanine polymers, and alanine-serine polymers.
  • Example linkers can include amino acid sequences including, but not limited to, GS, GGS, GGSG (SEQ ID NO: 1140), GGSGG (SEQ ID NO: 1141), GSGSG (SEQ ID NO: 1142), GSGGG (SEQ ID NO: 1143), GGGSG (SEQ ID NO: 1144), GSSSG (SEQ ID NO: 1145), and the like.
  • the ordinarily skilled artisan will recognize that design of a peptide conjugated to any elements described above can include linkers that are all or partially flexible, such that the linker can include a flexible linker as well as one or more portions that confer less flexible structure.
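The linker options above (repeat polymers with n from 1 to about 10, and the stated length ranges such as 3 to 40 aa or 10 to 30 aa) can be sketched as a small helper that builds a repeat linker and checks its length. This is only an illustration; the function names and default bounds are assumptions chosen to mirror the ranges in the text, and nothing here reflects how the linkers were actually designed.

```python
def repeat_linker(unit: str, n: int) -> str:
    """Build a repeat linker such as (G)n or (GS)n from a unit and a count."""
    if n < 1:
        raise ValueError("n must be an integer of at least one")
    return unit * n


def in_length_range(linker: str, min_len: int = 3, max_len: int = 40) -> bool:
    """True if the linker length falls within the inclusive range given."""
    return min_len <= len(linker) <= max_len


# Short example linkers listed in the text.
EXAMPLE_LINKERS = ["GS", "GGS", "GGSG", "GGSGG", "GSGSG", "GSGGG", "GGGSG", "GSSSG"]

if __name__ == "__main__":
    linker = repeat_linker("GGS", 5)
    print(linker, len(linker), in_length_range(linker, 10, 30))

    # Which listed examples fall inside the 3 to 40 aa range?
    print([seq for seq in EXAMPLE_LINKERS if in_length_range(seq)])

    # The XTEN-containing linker recited in the text (SEQ ID NO: 1135).
    print(len("GGGGSSGSETPGTSES"))
```

A length check of this kind says nothing about flexibility or function, which the text notes depends on composition and is determined empirically.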
  • a CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein interacts with (binds to) a corresponding guide RNA to form a ribonucleoprotein (RNP) complex that is targeted to a particular site in a target genome via base pairing between the guide RNA and a target sequence within the target genome.
  • RNP ribonucleoprotein
  • a guide RNA includes a nucleotide sequence (a guide sequence) that is complementary to a sequence (the target site) of a target nucleic acid and includes a 'constant region,' referred to as a 'handle,' that binds to the CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein.
  • RNA-guided proteins bind to different guide RNA handles.
  • a subject fusion protein binds to a corresponding guide RNA when being targeted to a specific sequence of a target DNA.
  • a subject composition or kit includes only the fusion proteins and the user adds the desired corresponding guide RNA (or a nucleic acid encoding the
  • a nucleic acid molecule (e.g., RNA, DNA, or RNA/DNA hybrid) that binds to a class 2 CRISPR/Cas endonuclease (e.g., a Cas9 protein; a type V or type VI CRISPR/Cas protein; a Cpf1 protein; etc.) and targets the complex to a specific location within a target nucleic acid is referred to herein as a "guide RNA" or "CRISPR/Cas guide nucleic acid" or "CRISPR/Cas guide RNA" (all of which terms are used interchangeably).
  • a guide RNA provides target specificity to the complex (the RNP complex) by including a targeting segment, which includes a guide sequence (also referred to herein as a targeting sequence), which is a nucleotide sequence that is complementary to a sequence of a target nucleic acid.
  • a targeting segment which includes a guide sequence (also referred to herein as a targeting sequence), which is a nucleotide sequence that is complementary to a sequence of a target nucleic acid.
  • a guide RNA can be referred to by the protein to which it corresponds.
  • when the CRISPR/Cas protein is a Cas9 protein, the corresponding guide RNA can be referred to as a "Cas9 guide RNA."
  • when the CRISPR/Cas protein is a Cpf1 protein, the corresponding guide RNA can be referred to as a "Cpf1 guide RNA."
  • a guide RNA can include two separate nucleic acid molecules: an "activator” and a “targeter” and is referred to herein as a "dual guide RNA", a “double-molecule guide RNA", a "two-molecule guide RNA", or a "dgRNA.”
  • the guide RNA is one molecule (e.g., for some class 2 CRISPR/Cas proteins,
  • a nucleic acid molecule that binds to a Cas9 protein and targets the complex to a specific location within a target nucleic acid is referred to herein as a "Cas9 guide RNA" (even though in some cases it can be an RNA/DNA hybrid molecule).
  • a Cas9 guide RNA can be said to include two segments, a first segment (referred to herein as a “targeting segment”); and a second segment (referred to herein as a “protein-binding segment”).
  • by "segment" is meant a segment/section/region of a molecule, e.g., a contiguous stretch of nucleotides in a nucleic acid molecule.
  • a segment can also mean a region/section of a complex such that a segment may comprise regions of more than one molecule.
  • the first segment (targeting segment) of a Cas9 guide RNA includes a nucleotide sequence (a guide sequence) that is complementary to (and therefore hybridizes with) a specific sequence (a target site) within a target nucleic acid (e.g., a target ssRNA, a target ssDNA, the complementary strand of a double stranded target DNA, etc.).
  • the protein-binding segment (or "protein-binding sequence”) interacts with (binds to) a Cas9 polypeptide.
  • the protein-binding segment of a subject Cas9 guide RNA includes two complementary stretches of nucleotides that hybridize to one another to form a double stranded RNA duplex (dsRNA duplex).
  • Site-specific binding to and/or cleavage of a target nucleic acid can occur at locations (e.g., target sequence of a target locus) determined by base-pairing complementarity between the Cas9 guide RNA (the guide sequence of the Cas9 guide RNA) and the target nucleic acid.
  • a Cas9 guide RNA and a Cas9 protein form a complex (e.g., bind via non-covalent
  • the Cas9 guide RNA provides target specificity to the complex by including a targeting segment, which includes a guide sequence (a nucleotide sequence that is
  • the Cas9 protein of the complex provides the site-specific activity (e.g., cleavage activity or an activity provided by the Cas9 protein when the Cas9 protein is a Cas9 fusion polypeptide, i.e., has a fusion partner such as a split BirA*).
  • the Cas9 protein is guided to a target nucleic acid sequence (e.g. a target sequence in a chromosomal nucleic acid, e.g., a chromosome; a target sequence in an extrachromosomal nucleic acid, e.g.
  • the "guide sequence” also referred to as the "targeting sequence” of a Cas9 guide RNA can be modified so that the Cas9 guide RNA can target a Cas9 protein to any desired sequence of any desired target nucleic acid, with the exception that the protospacer adjacent motif (PAM) sequence can be taken into account.
  • PAM protospacer adjacent motif
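The relationship described above, in which the guide sequence is complementary to the target site and the PAM is taken into account when choosing a site, can be sketched as follows. This is a minimal illustration under stated assumptions: sequences are written 5' to 3', the target site is given as DNA, and the PAM pattern defaults to "NGG", which is commonly associated with S. pyogenes Cas9 but is not specified in this section. The function names and toy sequences are hypothetical.

```python
_DNA_TO_RNA_COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}


def guide_sequence_for(target_site: str) -> str:
    """RNA guide sequence (5'->3') complementary to a DNA target site (5'->3').

    The result is the reverse complement of the target site, written as RNA.
    """
    site = target_site.upper()
    try:
        return "".join(_DNA_TO_RNA_COMPLEMENT[base] for base in reversed(site))
    except KeyError as exc:
        raise ValueError(f"unexpected base: {exc.args[0]}") from None


def pam_follows(non_target_strand: str, start: int, length: int, pam: str = "NGG") -> bool:
    """Check for a PAM immediately 3' of a protospacer on the non-target strand.

    'N' in the pattern matches any base. The default pattern is an assumption.
    """
    seq = non_target_strand.upper()
    window = seq[start + length : start + length + len(pam)]
    if len(window) < len(pam):
        return False
    return all(p == "N" or p == b for p, b in zip(pam.upper(), window))


if __name__ == "__main__":
    print(guide_sequence_for("ATGC"))
    # Toy 20-nt protospacer followed by a 3-nt candidate PAM (hypothetical).
    print(pam_follows("ACGTACGTACGTACGTACGTTGG", start=0, length=20))
```

Other CRISPR/Cas proteins recognize different PAM sequences and positions, so the pattern and its placement would need to be changed accordingly.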
  • a Cas9 guide RNA can have a targeting segment with a sequence (a guide sequence) that has complementarity with (e.g., can hybridize to) a sequence in a nucleic acid in a eukaryotic cell, e.g., a viral nucleic acid, a eukaryotic nucleic acid (e.g., a eukaryotic chromosome, chromosomal sequence, a eukaryotic RNA, etc.), and the like.
  • a eukaryotic cell e.g., a viral nucleic acid, a eukaryotic nucleic acid (e.g., a eukaryotic chromosome, chromosomal sequence, a eukaryotic RNA, etc.), and the like.
  • a Cas9 guide RNA includes two separate nucleic acid molecules: an "activator" and a "targeter" and is referred to herein as a "dual Cas9 guide RNA", a "double-molecule Cas9 guide RNA", a "two-molecule Cas9 guide RNA", a "dual guide RNA", or a "dgRNA."
  • the activator and targeter are covalently linked to one another (e.g., via intervening nucleotides) and the guide RNA is referred to as a "single guide RNA", a “Cas9 single guide RNA", a “single-molecule Cas9 guide RNA,” or a “one-molecule Cas9 guide RNA”, or simply "sgRNA.”
  • a Cas9 guide RNA comprises a crRNA-like ("CRISPR RNA" / "targeter" / "crRNA" / "crRNA repeat") molecule and a corresponding tracrRNA-like ("trans-acting CRISPR RNA" / "activator" / "tracrRNA") molecule.
  • a crRNA-like molecule comprises both the targeting segment (single stranded) of the Cas9 guide RNA and a stretch ("duplex-forming segment") of nucleotides that forms one half of the dsRNA duplex of the protein-binding segment of the Cas9 guide RNA.
  • a corresponding tracrRNA-like molecule (activator / tracrRNA) comprises a stretch of nucleotides (duplex-forming segment) that forms the other half of the dsRNA duplex of the protein-binding segment of the guide nucleic acid.
  • a stretch of nucleotides of a crRNA-like molecule is complementary to and hybridizes with a stretch of nucleotides of a tracrRNA-like molecule to form the dsRNA duplex of the protein-binding domain of the Cas9 guide RNA.
  • each targeter molecule can be said to have a corresponding activator molecule (which has a region that hybridizes with the targeter).
  • the targeter molecule additionally provides the targeting segment.
  • a targeter and an activator molecule hybridize to form a Cas9 guide RNA.
  • the exact sequence of a given crRNA or tracrRNA molecule is characteristic of the species in which the RNA molecules are found.
  • a subject dual Cas9 guide RNA can include any corresponding activator and targeter pair.
  • the term "activator" or "activator RNA" is used herein to mean a tracrRNA-like molecule (tracrRNA: "trans-acting CRISPR RNA") of a Cas9 dual guide RNA (and therefore of a Cas9 single guide RNA when the "activator" and the "targeter" are linked together by, e.g., intervening nucleotides).
  • a Cas9 guide RNA (dgRNA or sgRNA) comprises an activator sequence (e.g., a tracrRNA sequence).
  • a tracr molecule is a naturally existing molecule that hybridizes with a CRISPR RNA molecule (a crRNA) to form a Cas9 dual guide RNA.
  • the term "activator" is used herein to encompass naturally existing tracrRNAs, but also to encompass tracrRNAs with modifications (e.g., truncations, sequence variations, base modifications, backbone modifications, linkage modifications, etc.) where the activator retains at least one function of a tracrRNA (e.g., contributes to the dsRNA duplex to which Cas9 protein binds). In some cases the activator provides one or more stem loops that can interact with Cas9 protein.
  • An activator can be referred to as having a tracr sequence (tracrRNA sequence) and in some cases is a tracrRNA, but the term “activator” is not limited to naturally existing tracrRNAs.
  • targeter or “targeter RNA” is used herein to refer to a crRNA-like molecule
  • a Cas9 guide RNA comprises a targeting segment (which includes nucleotides that hybridize with (are complementary to) a target nucleic acid), and a duplex-forming segment (e.g., a duplex forming segment of a crRNA, which can also be referred to as a crRNA repeat).
  • the sequence of a targeting segment (the segment that hybridizes with a target sequence of a target nucleic acid) of a targeter is modified by a user to hybridize with a desired target nucleic acid
  • the sequence of a targeter will often be a non-naturally occurring sequence.
  • the duplex-forming segment of a targeter (described in more detail below), which hybridizes with the duplex-forming segment of an activator, can include a naturally existing sequence (e.g., can include the sequence of a duplex-forming segment of a naturally existing crRNA, which can also be referred to as a crRNA repeat).
  • targeter is used herein to distinguish from naturally occurring crRNAs, despite the fact that part of a targeter (e.g., the duplex-forming segment) often includes a naturally occurring sequence from a crRNA. However, the term “targeter” encompasses naturally occurring crRNAs.
  • a Cas9 guide RNA can also be said to include 3 parts: (i) a targeting sequence (a nucleotide sequence that hybridizes with a sequence of the target nucleic acid); (ii) an activator sequence (as described above) (in some cases, referred to as a tracr sequence); and (iii) a sequence that hybridizes to at least a portion of the activator sequence to form a double stranded duplex.
  • a targeter has (i) and (iii); while an activator has (ii).
  • a Cas9 guide RNA (e.g. a dual guide RNA or a single guide RNA) can be comprised of any corresponding activator and targeter pair.
  • the duplex forming segments can be swapped between the activator and the targeter.
  • the targeter includes a sequence of nucleotides from a duplex forming segment of a tracrRNA (which sequence would normally be part of an activator) while the activator includes a sequence of nucleotides from a duplex forming segment of a crRNA (which sequence would normally be part of a targeter).
  • a targeter comprises both the targeting segment (single stranded) of the Cas9 guide RNA and a stretch ("duplex-forming segment") of nucleotides that forms one half of the dsRNA duplex of the protein-binding segment of the Cas9 guide RNA.
  • a corresponding tracrRNA-like molecule comprises a stretch of nucleotides (a duplex-forming segment) that forms the other half of the dsRNA duplex of the protein-binding segment of the Cas9 guide RNA.
  • a stretch of nucleotides of the targeter is complementary to and hybridizes with a stretch of nucleotides of the activator to form the dsRNA duplex of the protein-binding segment of a Cas9 guide RNA.
  • each targeter can be said to have a
  • a targeter and an activator hybridize to form a Cas9 guide RNA.
  • the particular sequence of a given naturally existing crRNA or tracrRNA molecule is characteristic of the species in which the RNA molecules are found. Examples of suitable activators and targeters are well known in the art.
  • Non-limiting examples of nucleotide sequences that can be included in a Cas9 guide RNA include sequences set forth in SEQ ID NOs: 857-1105, or complements thereof.
  • sequences from SEQ ID NOs: 857-987 (which are from tracrRNAs) or complements thereof can pair with sequences from SEQ ID NOs: 994-1105 (which are from crRNAs), or complements thereof, to form a dsRNA duplex of a protein binding segment.
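The pairing between the duplex-forming segments of a targeter and an activator can be illustrated with a simple complementarity check. This is a minimal sketch, assuming both segments are RNA written 5' to 3', that they pair antiparallel over their full length, and that only Watson-Crick pairs count unless G-U wobble pairs are explicitly allowed. Naturally occurring duplexes can contain bulges and mismatches that this check does not model. The function name and toy sequences are hypothetical and are not drawn from SEQ ID NOs: 857-1105.

```python
_WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
_WOBBLE = {("G", "U"), ("U", "G")}


def forms_duplex(targeter_segment: str, activator_segment: str, allow_wobble: bool = False) -> bool:
    """True if two RNA segments (both 5'->3') pair antiparallel, base by base."""
    a = targeter_segment.upper()
    b = activator_segment.upper()[::-1]  # reverse so position i pairs with position i
    if not a or len(a) != len(b):
        return False
    allowed = _WATSON_CRICK | _WOBBLE if allow_wobble else _WATSON_CRICK
    return all((x, y) in allowed for x, y in zip(a, b))


if __name__ == "__main__":
    targeter_duplex = "GUUUUAGA"   # toy duplex-forming segment (hypothetical)
    activator_duplex = "UCUAAAAC"  # its reverse complement (hypothetical)
    print(forms_duplex(targeter_duplex, activator_duplex))
    print(forms_duplex(targeter_duplex, "UCUAAAAU"))
    print(forms_duplex(targeter_duplex, "UCUAAAAU", allow_wobble=True))
```

Because the check is symmetric in its two arguments, it also covers the case noted above in which the duplex-forming segments are swapped between the activator and the targeter.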
  • the first segment of a subject guide nucleic acid includes a guide sequence (i.e., a targeting sequence) (a nucleotide sequence that is complementary to a sequence (a target site) in a target nucleic acid).
  • the targeting segment of a subject guide nucleic acid can interact with a target nucleic acid (e.g., double stranded DNA (dsDNA)) in a sequence-specific manner via hybridization (i.e., base pairing).
  • the nucleotide sequence of the targeting segment may vary (depending on the target) and can determine the location within the target nucleic acid that the Cas9 guide RNA and the target nucleic acid will interact.
  • the targeting segment of a Cas9 guide RNA can be modified (e.g., by genetic engineering)/designed to hybridize to any desired sequence (target site) within a target nucleic acid (e.g., a eukaryotic target nucleic acid such as genomic DNA).
  • the targeting segment can have a length of 7 or more nucleotides (nt) (e.g., 8 or more, 9 or more, 10 or more, 12 or more, 15 or more, 17 or more, 20 or more, 25 or more, 30 or more, or 40 or more nucleotides).
  • the targeting segment can have a length of from 7 to 100 nucleotides (nt) (e.g., from 7 to 80 nt, from 7 to 60 nt, from 7 to 40 nt, from 7 to 30 nt, from 7 to 25 nt, from 7 to 22 nt, from 7 to 20 nt, from 7 to 18 nt, from 8 to 80 nt, from 8 to 60 nt, from 8 to 40 nt, from 8 to 30 nt, from 8 to 25 nt, from 8 to 22 nt, from 8 to 20 nt, from 8 to 18 nt, from 10 to 100 nt, from 10 to 80 nt, from 10 to 60 nt, from 10 to 40 nt, from 10 to 30 nt, from 10 to 25 nt, from 10 to 22 nt, from 10 to 20 nt, from 10 to 18 nt, from 12 to 100 nt, from 12 to 80 nt, from 12 to 60 nt
  • the targeting sequence of the targeting segment that is complementary to a nucleotide sequence (target site) of the target nucleic acid can have a length of 10 nt or more.
  • the targeting sequence of the targeting segment that is complementary to a target site of the target nucleic acid can have a length of 12 nt or more, 15 nt or more, 17 nt or more, 18 nt or more, 19 nt or more, or 20 nt or more.
  • the nucleotide sequence (the targeting sequence) of the targeting segment that is complementary to a nucleotide sequence (target site) of the target nucleic acid has a length of 12 nt or more.
  • the nucleotide sequence (the targeting sequence) of the targeting segment that is complementary to a nucleotide sequence (target site) of the target nucleic acid has a length of 18 nt or more.
  • the targeting sequence of the targeting segment that is complementary to a target sequence of the target nucleic acid can have a length of from 10 to 100 nucleotides (nt) (e.g., from 10 to 90 nt, from 10 to 75 nt, from 10 to 60 nt, from 10 to 50 nt, from 10 to 35 nt, from 10 to 30 nt, from 10 to 25 nt, from 10 to 22 nt, from 10 to 20 nt, from 12 to 100 nt, from 12 to 90 nt, from 12 to 75 nt, from 12 to 60 nt, from 12 to 50 nt, from 12 to 35 nt, from 12 to 30 nt, from 12 to 25 nt, from 12 to 22 nt, from 12 to 20 nt, from 15 to 100 nt, from 15 to 90 nt, from 15 to 75 nt, from 15 to 60 nt, from 15 to 50 nt, from 15 to 35 nt, from 10 to 30 nt
  • the targeting sequence of the targeting segment that is complementary to a target sequence of the target nucleic acid has a length of from 15 nt to 30 nt. In some cases, the targeting sequence of the targeting segment that is complementary to a target sequence of the target nucleic acid has a length of from 15 nt to 25 nt. In some cases, the targeting sequence of the targeting segment that is complementary to a target sequence of the target nucleic acid has a length of from 18 nt to 30 nt. In some cases, the targeting sequence of the targeting segment that is complementary to a target sequence of the target nucleic acid has a length of from 18 nt to 25 nt.
  • the targeting sequence of the targeting segment that is complementary to a target sequence of the target nucleic acid has a length of from 18 nt to 22 nt. In some cases, the targeting sequence of the targeting segment that is complementary to a target site of the target nucleic acid is 20 nucleotides in length. In some cases, the targeting sequence of the targeting segment that is complementary to a target site of the target nucleic acid is 19 nucleotides in length.
  • the percent complementarity between the targeting sequence (guide sequence) of the targeting segment and the target site of the target nucleic acid can be 60% or more (e.g., 65% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%). In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the seven contiguous 5'-most nucleotides of the target site of the target nucleic acid. In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 60% or more over about 20 contiguous nucleotides.
  • the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the fourteen contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 14 nucleotides in length. In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the seven contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 7 nucleotides in length.
  • the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 7 contiguous 5'-most nucleotides of the target site of the target nucleic acid (which can be complementary to the 3'-most nucleotides of the targeting sequence of the Cas9 guide RNA). In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 8 contiguous 5'-most nucleotides of the target site of the target nucleic acid (which can be complementary to the 3'-most nucleotides of the targeting sequence of the Cas9 guide RNA).
  • the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 9 contiguous 5'-most nucleotides of the target site of the target nucleic acid (which can be complementary to the 3'-most nucleotides of the targeting sequence of the Cas9 guide RNA). In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 10 contiguous 5'-most nucleotides of the target site of the target nucleic acid (which can be complementary to the 3'-most nucleotides of the targeting sequence of the Cas9 guide RNA).
  • the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 17 contiguous 5'-most nucleotides of the target site of the target nucleic acid (which can be complementary to the 3'-most nucleotides of the targeting sequence of the Cas9 guide RNA). In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 18 contiguous 5'-most nucleotides of the target site of the target nucleic acid (which can be complementary to the 3'-most nucleotides of the targeting sequence of the Cas9 guide RNA).
  • the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 60% or more (e.g., 65% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over about 20 contiguous nucleotides.
  • the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 7 contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 7 nucleotides in length. In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 8 contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 8 nucleotides in length.
  • the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 9 contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 9 nucleotides in length. In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 10 contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 10 nucleotides in length.
  • the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 11 contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 11 nucleotides in length. In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 12 contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 12 nucleotides in length.
  • the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 13 contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 13 nucleotides in length. In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 14 contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 14 nucleotides in length.
  • the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 17 contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 17 nucleotides in length. In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 18 contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 18 nucleotides in length.
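The complementarity figures above can be illustrated with a short calculation. The following is a minimal sketch, not part of the disclosure: it assumes the guide sequence is RNA and the target site is the DNA strand that hybridizes to it, both written 5' to 3' and of equal length, and it counts only Watson-Crick pairs. The function names are hypothetical.

```python
# Watson-Crick partner (DNA base) for each RNA guide base.
RNA_TO_DNA_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}


def pairing_mask(guide_rna: str, target_site_dna: str) -> list[bool]:
    """Per-position pairing, indexed from the 5' end of the target site.

    Hybridization is antiparallel, so the 5'-most target nucleotide pairs
    with the 3'-most guide nucleotide.
    """
    if not guide_rna or len(guide_rna) != len(target_site_dna):
        raise ValueError("guide and target site must be non-empty and equal in length")
    guide_3_to_5 = guide_rna.upper()[::-1]
    target = target_site_dna.upper()
    return [RNA_TO_DNA_PARTNER.get(g) == t for g, t in zip(guide_3_to_5, target)]


def percent_complementarity(guide_rna: str, target_site_dna: str) -> float:
    """Percentage of positions that form a Watson-Crick pair."""
    mask = pairing_mask(guide_rna, target_site_dna)
    return 100.0 * sum(mask) / len(mask)


def perfect_5prime_run(guide_rna: str, target_site_dna: str) -> int:
    """Count of contiguous paired nucleotides starting at the target site's 5' end."""
    run = 0
    for paired in pairing_mask(guide_rna, target_site_dna):
        if not paired:
            break
        run += 1
    return run
```

For a 10 nt target site `ACGTACGTAC`, the guide `GUACGUACGU` pairs at every position, while `CUACGUACGU` (one change at the guide's 5' end) leaves the nine 5'-most target nucleotides fully paired.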
  • Examples of various Cas9 proteins and Cas9 guide RNAs can be found in the art, for example, see Jinek et al., Science. 2012 Aug 17;337(6096):816-21; Chylinski et al., RNA Biol. 2013 May;10(5):726-37; Ma et al., Biomed Res Int. 2013;2013:270805; Hou et al., Proc Natl Acad Sci U S A. 2013 Sep 24;110(39):15644-9; Jinek et al., Elife.
  • RNAs corresponding to type V and type VI CRISPR/Cas endonucleases (e.g., Cpf1 guide RNA)
  • a guide RNA that binds to a type V or type VI CRISPR/Cas protein (e.g., Cpf1, C2c1, C2c2, C2c3)
  • a type V or type VI CRISPR/Cas guide RNA (e.g., Cpf1, C2c1, C2c2, C2c3)
  • An example of a more specific term is a "Cpf1 guide RNA."
  • a type V or type VI CRISPR/Cas guide RNA can have a total length of from 30 nucleotides (nt) to 200 nt, e.g., from 30 nt to 180 nt, from 30 nt to 160 nt, from 30 nt to 150 nt, from 30 nt to 125 nt, from 30 nt to 100 nt, from 30 nt to 90 nt, from 30 nt to 80 nt, from 30 nt to 70 nt, from 30 nt to 60 nt, from 30 nt to 50 nt, from 50 nt to 200 nt, from 50 nt to 180 nt, from 50 nt to 160 nt, from 50 nt to 150 nt, from 50 nt to 125 nt, from 50 nt to 100 nt, from 50 nt to 90 nt, from 50
  • a type V or type VI CRISPR/Cas guide RNA (e.g., Cpf1 guide RNA) has a total length of at least 30 nt (e.g., at least 40 nt, at least 50 nt, at least 60 nt, at least 70 nt, at least 80 nt, at least 90 nt, at least 100 nt, or at least 120 nt).
  • a Cpf1 guide RNA has a total length of 35 nt, 36 nt, 37 nt, 38 nt, 39 nt, 40 nt, 41 nt, 42 nt, 43 nt, 44 nt, 45 nt, 46 nt, 47 nt, 48 nt, 49 nt, or 50 nt.
  • a type V or type VI CRISPR/Cas guide RNA can include a target nucleic acid-binding segment and a duplex-forming region (e.g., in some cases formed from two duplex-forming segments, i.e., two stretches of nucleotides that hybridize to one another to form a duplex).
  • the target nucleic acid-binding segment of a type V or type VI CRISPR/Cas guide RNA can have a length of from 15 nt to 30 nt, e.g., 15 nt, 16 nt, 17 nt, 18 nt, 19 nt, 20 nt, 21 nt, 22 nt, 23 nt, 24 nt, 25 nt, 26 nt, 27 nt, 28 nt, 29 nt, or 30 nt.
  • the target nucleic acid-binding segment has a length of 23 nt.
  • the target nucleic acid-binding segment has a length of 24 nt.
  • the target nucleic acid-binding segment has a length of 25 nt.
  • the guide sequence of a type V or type VI CRISPR/Cas guide RNA can have a length of from 15 nt to 30 nt (e.g., 15 to 25 nt, 15 to 24 nt, 15 to 23 nt, 15 to 22 nt, 15 to 21 nt, 15 to 20 nt, 15 to 19 nt, 15 to 18 nt, 17 to 30 nt, 17 to 25 nt, 17 to 24 nt, 17 to 23 nt, 17 to 22 nt, 17 to 21 nt, 17 to 20 nt, 17 to 19 nt, 17 to 18 nt, 18 to 30 nt, 18 to 25 nt, 18 to 24 nt, 18 to 23 nt, 18 to 22 nt, 18 to 21 nt, 18 to 20 nt, 18 to 19 nt, 19 to 30 nt, 19 to 25 nt, 19 to 24 nt,
  • the guide sequence has a length of 17 nt. In some cases, the guide sequence has a length of 18 nt. In some cases, the guide sequence has a length of 19 nt. In some cases, the guide sequence has a length of 20 nt. In some cases, the guide sequence has a length of 21 nt. In some cases, the guide sequence has a length of 22 nt. In some cases, the guide sequence has a length of 23 nt. In some cases, the guide sequence has a length of 24 nt.
  • the guide sequence of a type V or type VI CRISPR/Cas guide RNA can have 100% complementarity with a corresponding length of target nucleic acid sequence.
  • the guide sequence can have less than 100% complementarity with a corresponding length of target nucleic acid sequence.
  • the guide sequence of a type V or type VI CRISPR/Cas guide RNA (e.g., Cpf1 guide RNA)
  • the target nucleic acid-binding segment has 100% complementarity to the target nucleic acid sequence.
  • the target nucleic acid-binding segment has 1 non-complementary nucleotide and 24 complementary nucleotides with the target nucleic acid sequence.
  • the target nucleic acid-binding segment has 2 non-complementary nucleotides and 23 complementary nucleotides with the target nucleic acid sequence.
  • the duplex-forming segment of a type V or type VI CRISPR/Cas guide RNA (e.g., Cpf1 guide RNA) (e.g., of a targeter RNA or an activator RNA) can have a length of from 15 nt to 25 nt (e.g., 15 nt, 16 nt, 17 nt, 18 nt, 19 nt, 20 nt, 21 nt, 22 nt, 23 nt, 24 nt, or 25 nt).
  • the RNA duplex of a type V or type VI CRISPR/Cas guide RNA can have a length of from 5 base pairs (bp) to 40 bp (e.g., from 5 to 35 bp, 5 to 30 bp, 5 to 25 bp, 5 to 20 bp, 5 to 15 bp, 5-12 bp, 5-10 bp, 5-8 bp, 6 to 40 bp, 6 to 35 bp, 6 to 30 bp, 6 to 25 bp, 6 to 20 bp, 6 to 15 bp, 6 to 12 bp, 6 to 10 bp, 6 to 8 bp, 7 to 40 bp, 7 to 35 bp, 7 to 30 bp, 7 to 25 bp, 7 to 20 bp, 7 to 15 bp, 7 to 12 bp, 7 to 10 bp, 8 to 40 bp, 8 to 35 bp, 8 to 30 bp, 7 to 25 bp, 7 to 20 b
  • a duplex-forming segment of a Cpf1 guide RNA can comprise a nucleotide sequence selected from (5' to 3'): AAUUUCUACUGUUGUAGAU (SEQ ID NO: 1106), AAUUUCUGCUGUUGCAGAU (SEQ ID NO: 1107), AAUUUCCACUGUUGUGGAU (SEQ ID NO: 1095), AAUUCCUACUGUUGUAGGU (SEQ ID NO: 1109), AAUUUCUACUAUUGUAGAU (SEQ ID NO: 1110), AAUUUCUACUGCUGUAGAU (SEQ ID NO: 1111), AAUUUCUACUUUGUAGAU (SEQ ID NO: 1112), and AAUUUCUACUUGUAGAU (SEQ ID NO: 1113).
  • the guide sequence can then follow (5' to 3') the duplex forming segment.
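The 5'-to-3' arrangement described above (duplex-forming segment first, guide sequence after it) can be sketched as a simple string assembly. This is an illustrative sketch only: the function name is hypothetical, the default duplex segment is the SEQ ID NO: 1106 sequence quoted above, and the 15 to 30 nt bound is the guide sequence range stated earlier in this section.

```python
# Duplex-forming segment quoted in the text (SEQ ID NO: 1106), written 5' to 3'.
CPF1_DUPLEX_SEGMENT = "AAUUUCUACUGUUGUAGAU"


def build_cpf1_guide(guide_sequence: str,
                     duplex_segment: str = CPF1_DUPLEX_SEGMENT) -> str:
    """Return duplex segment followed (5' to 3') by the guide sequence, as RNA.

    The guide may be given as DNA or RNA letters; T is converted to U.
    """
    guide = guide_sequence.upper().replace("T", "U")
    if set(guide) - set("ACGU"):
        raise ValueError("guide sequence must contain only A, C, G, U/T")
    # Guide sequence length range stated in the text: 15 nt to 30 nt.
    if not 15 <= len(guide) <= 30:
        raise ValueError("guide sequence must be 15 to 30 nt long")
    return duplex_segment + guide
```

With a 23 nt guide sequence, the assembled RNA is 42 nt long (19 nt duplex-forming segment plus 23 nt guide), which falls within the 35 to 50 nt total lengths listed above for a Cpf1 guide RNA.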
  • a non-limiting example of an activator RNA (e.g., tracrRNA) of a C2c1 guide RNA (dual guide or single guide) is an RNA that includes the nucleotide sequence
  • a C2c1 guide RNA is an RNA that includes the nucleotide sequence
  • a C2c1 guide RNA is an RNA that includes the nucleotide sequence
  • a C2c1 guide RNA (dual guide or single guide) is an RNA that includes the nucleotide sequence
  • an activator RNA (e.g., tracrRNA) of a C2c1 guide RNA is an RNA that includes the nucleotide sequence ACUUUCCAGGCAAAGCCCGUUGAGCUUCUCAAAAAG (SEQ ID NO: 1117).
  • a duplex forming segment of a C2c1 guide RNA (dual guide or single guide) of an activator RNA includes the nucleotide sequence AGCUUCUCA (SEQ ID NO: 1118) or the nucleotide sequence GCUUCUCA (SEQ ID NO: 1119) (the duplex forming segment from a naturally existing tracrRNA).
  • a non-limiting example of a targeter RNA (e.g., crRNA) of a C2c1 guide RNA (dual guide or single guide) is an RNA with the nucleotide sequence
  • CUGAGAAGUGGCACNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN SEQ ID NO: 1120
  • the Ns represent the guide sequence, which will vary depending on the target sequence, and although 20 Ns are depicted a range of different lengths are acceptable.
  • a duplex forming segment of a C2c1 guide RNA (dual guide or single guide) of a targeter RNA (e.g., crRNA) includes the nucleotide sequence CUGAGAAGUGGCAC (SEQ ID NO: 1121) or includes the nucleotide sequence CUGAGAAGU (SEQ ID NO: 1122) or includes the nucleotide sequence UGAGAAGUGGCAC (SEQ ID NO: 1123) or includes the nucleotide sequence UGAGAAGU (SEQ ID NO: 1124).
  • a target nucleic acid (e.g., target genomic DNA) can be located within a eukaryotic cell, for example, inside of a eukaryotic cell in vitro, inside of a eukaryotic cell in vivo, inside of a eukaryotic cell ex vivo.
  • Suitable target cells include, but are not limited to: a single-celled eukaryotic organism; a cell of a single-cell eukaryotic organism; a plant cell; an algal cell, e.g., Botryococcus braunii, Chlamydomonas reinhardtii, Nannochloropsis gaditana, Chlorella pyrenoidosa, Sargassum patens, C. agardh, and the like; a fungal cell (e.g., a yeast cell); an animal cell; a cell of an invertebrate animal (e.g.
  • a cell of a vertebrate animal e.g., fish, amphibian, reptile, bird, mammal
  • a cell of a mammal e.g., a cell of a rodent such as a mouse or rat, a cell of a non-human primate, a cell of a human, etc.
  • Any type of cell may be of interest (e.g. a stem cell, e.g.
  • Cells may be from established cell lines or they may be primary cells, where "primary cells", "primary cell lines", and "primary cultures" are used interchangeably herein to refer to cells and cell cultures that have been derived from a subject and allowed to grow in vitro for a limited number of passages, i.e. splittings, of the culture.
  • primary cultures are cultures that may have been passaged 0 times, 1 time, 2 times, 4 times, 5 times, 10 times, or 15 times, but not enough times to go through the crisis stage.
  • the primary cell lines are maintained for fewer than 10 passages in vitro.
  • Target cells can be unicellular organisms and/or can be grown in culture.
  • the cells may be harvested from an individual by any convenient method.
  • leukocytes may be conveniently harvested by apheresis, leukocytapheresis, density gradient separation, etc., while cells from tissues such as skin, muscle, bone marrow, spleen, liver, pancreas, lung, intestine, stomach, etc. can be conveniently harvested by biopsy.
  • Target cells include in vivo target cells.
  • Target cells include retinal cells (e.g., Muller cells, ganglion cells, amacrine cells, horizontal cells, bipolar cells, and photoreceptor cells including rods and cones, Muller glial cells, and retinal pigmented epithelium); neural cells (e.g., cells of the thalamus, sensory cortex, zona incerta (ZI), ventral tegmental area (VTA), prefrontal cortex (PFC), nucleus accumbens (NAc), amygdala (BLA), substantia nigra, ventral pallidum, globus pallidus, dorsal striatum, ventral striatum, subthalamic nucleus, hippocampus, dentate gyrus, cingulate gyrus, entorhinal cortex, olfactory cortex, primary motor cortex, or cerebellum); liver cells; kidney cells; immune cells (e.g., T cells
  • the guide sequences of the guide RNAs are targeted to target sequences of the target DNA, and the guide sequences are positioned relative to one another such that upon hybridization to the target sequences the fusion proteins to which the guide RNAs are bound are oriented so that the split BirA* portions of fusion proteins interact to form a functional BirA* protein.
  • This can be achieved in a number of ways, and can depend on the orientation of the fusion proteins (e.g., whether the split BirA* portion is fused to the N-terminus, C-terminus, or internal to the CRISPR/Cas nucleic acid-guided protein).
  • a CRISPR/Cas nucleic acid-guided protein binds to a double stranded target DNA at a target sequence defined by both (i) base-pairing complementarity between the guide RNA and the target DNA; and (ii) a short motif [referred to as the protospacer adjacent motif (PAM)] in the target DNA.
  • the PAM for a natural Cas9 protein is immediately 3' of the target sequence of the non-complementary strand of the target DNA (the complementary strand hybridizes to the guide sequence of the guide RNA while the non-complementary strand does not directly hybridize with the guide RNA and is the reverse complement of the complementary strand) while the PAM for a natural CasX protein is immediately 5' of the target sequence of the non-complementary strand of the target DNA.
  • Cas9 when Cas9 from 5.
  • the PAM sequence of the non-complementary strand is 5'-NGG-3', where N is any DNA nucleotide.
  • the PAM sequence of the non-complementary strand is 5'-TCN-3' (and in some cases TTCN), where N is any DNA nucleotide.
  • different CRISPR/Cas nucleic acid-guided proteins (e.g., Cas9, Cpf1, CasX, CasY, and the like) may be advantageous to use in the various provided methods in order to capitalize on desired enzymatic characteristics of the different proteins (e.g., for different PAM sequence preferences; for increased or decreased enzymatic activity; for an increased or decreased level of cellular toxicity; to change the balance between NHEJ, homology-directed repair, single strand breaks, double strand breaks, etc.; to take advantage of a short amino acid sequence; and the like).
  • CRISPR/Cas nucleic acid-guided proteins from different species may require different PAM sequences in the target DNA.
  • the PAM sequence requirement may be different than the 5'-NGG-3' or 5'-TCN-3' described above.
  • Various methods (including in silico and/or wet lab methods) for identification of the appropriate PAM sequence for any given CRISPR/Cas nucleic acid-guided protein are known in the art and are routine, and any convenient method can be used (e.g., an in silico sequence analysis and/or a PAM depletion assay can be used).
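The PAM geometry described above for a 5'-NGG-3' PAM (the PAM lies immediately 3' of the target sequence on the non-complementary strand) lends itself to a simple in silico scan. The sketch below is an illustration under stated assumptions, not a method from the disclosure: the function names are hypothetical, a 20 nt target length is assumed as the default, only the NGG case is handled, and coordinates for "-" strand hits are reported on the reverse-complemented sequence.

```python
import re


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_ngg_sites(top_strand: str, target_len: int = 20) -> list[tuple[str, int, str, str]]:
    """List candidate (strand, start, target_sequence, pam) tuples.

    Each strand is scanned 5' to 3' as the non-complementary strand: a hit is
    target_len nucleotides immediately followed by N-G-G.
    """
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % target_len)
    sites = []
    for strand, seq in (("+", top_strand.upper()),
                        ("-", reverse_complement(top_strand))):
        # A lookahead is used so that overlapping candidates are all reported.
        for match in pattern.finditer(seq):
            sites.append((strand, match.start(), match.group(1), match.group(2)))
    return sites
```

A PAM that sits 5' of the target sequence (as described above for CasX) would need the two groups in the pattern swapped; that variant is left out to keep the sketch short.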
  • the first target sequence is immediately adjacent a first protospacer adjacent motif (PAM); the second target sequence is immediately adjacent a second protospacer adjacent motif (PAM); and the first and second PAMs are separated by 15 to 100 base pairs (bp).
  • the first target sequence is immediately adjacent a first protospacer adjacent motif (PAM); the second target sequence is immediately adjacent a second protospacer adjacent motif (PAM); and the first and second PAMs are separated by 15 to 100 base pairs (bp) (e.g., separated by 15 to 80 bp, 15 to 60 bp, 15 to 50 bp, 15 to 40 bp, 15 to 30 bp, 15 to 25 bp, 25 to 100 bp, 25 to 80 bp, 25 to 60 bp, 25 to 50 bp, 25 to 40 bp, 25 to 35 bp, 35 to 100 bp, 35 to 80 bp, 35 to 60 bp, 35 to 55 bp, 35 to 50 bp, 35 to 45 bp, 55 to 100 bp, 55 to 80 bp, or 55 to 75 bp).
  • the first and second PAMs would be separated by 40 bp.
  • the first and second PAMs would be separated by 60 bp. In some cases the first and second PAMs are separated by 35 to 80 bp. In some cases the first and second PAMs are separated by 35 to 55 bp. In some cases the first and second PAMs are separated by 55 to 75 bp.
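The PAM spacing constraint above can be checked programmatically when choosing a pair of guide target sites. The following sketch makes one interpretive assumption that the text does not settle: "separated by" is taken to mean the number of base pairs lying between the two PAMs. PAM positions are given as half-open (start, end) intervals in top-strand coordinates, and the function names are hypothetical.

```python
def pam_separation(pam1: tuple[int, int], pam2: tuple[int, int]) -> int:
    """Base pairs lying between two PAMs (0 if they touch or overlap).

    Each PAM is a half-open (start, end) interval in top-strand coordinates.
    """
    (_, first_end), (second_start, _) = sorted([pam1, pam2])
    return max(0, second_start - first_end)


def pams_compatible(pam1: tuple[int, int], pam2: tuple[int, int],
                    min_bp: int = 15, max_bp: int = 100) -> bool:
    """True if the PAM separation falls within [min_bp, max_bp] (default 15 to 100 bp)."""
    return min_bp <= pam_separation(pam1, pam2) <= max_bp
```

For example, PAMs at (20, 23) and (63, 66) are 40 bp apart under this definition; narrower windows such as 35 to 55 bp can be tested by passing `min_bp=35, max_bp=55`.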
  • the first and second guide nucleic acids bind to target sites on a target DNA that are adjacent one another. In some cases the first and second guide nucleic acids (e.g., guide RNAs) bind to target sites on a target DNA that overlap. Thus, in some embodiments, the first and second guide nucleic acids (e.g., guide RNAs) hybridize to overlapping target sequences (on opposite strands of the target DNA). In some embodiments, the first and second guide nucleic acids (e.g., guide RNAs) hybridize to target sequences that do not overlap (can be to same or different strand of the target DNA, depending on the orientation of the fusion proteins).
  • the first and second guide nucleic acids hybridize to target sequences that are separated from one another by 0 to 60 intervening base pairs (bp) (e.g., from 0 to 50, 0 to 40, 0 to 30, 0 to 20, 0 to 15, 0 to 10, 5 to 60, 5 to 50, 5 to 40, 5 to 30, 5 to 20, 5 to 15, 5 to 10, 10 to 60, 10 to 50, 10 to 40, 10 to 30, 10 to 20, 10 to 15, 20 to 60, 20 to 50, 20 to 40, or 20 to 30 intervening bp).
  • the first and second guide nucleic acids hybridize to target sequences where the PAMs (and target sequences) are farther apart than 100 bp.
  • the first and second guide nucleic acids hybridize to target sequences that are separated by more than 60 bp. For example in some cases regions of DNA that are very far apart (even megabases apart) can physically interact. For example, in some such cases distant enhancers influence expression from a promoter.
  • the first and second guide nucleic acids can hybridize to target sequences that are far apart when considering primary nucleotide sequence, but are actually physically close to one another at particular points in time (e.g., while a distant enhancer is influencing expression from a promoter).
  • one of the guide nucleic acids could be targeted to a sequence at or near the promoter while the other could be targeted to a distant enhancer sequence - in such a case, target proteins would only be tagged when a functional BirA* is present, and that would only happen at times when the two split BirA* portions of the subject fusion proteins are brought together (such would be the case when/if the distant enhancer is physically near the promoter sequence despite being far apart when considering the nucleotide sequence).
  • each fusion protein can bind to either of the two guide nucleic acids.
  • which fusion protein (first or second) binds to which guide nucleic acid (first or second) is controlled.
  • the nuclease defective CRISPR/Cas nucleic acid-guided polypeptide (e.g., dCas9) of the first fusion protein can be a protein (or can be from a species) that can bind to the first guide nucleic acid but cannot bind to the second guide nucleic acid.
  • the nuclease defective CRISPR/Cas nucleic acid-guided polypeptide (e.g., dCas9) of the second fusion protein can be a protein (or can be from a species) that can bind to the second guide nucleic acid but cannot bind to the first guide nucleic acid.
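The controlled pairing described above (each fusion protein binds its own guide nucleic acid and not the other) can be expressed as a small consistency check. This is a sketch for illustration only: the protein and guide labels are placeholders, the binding table would have to come from experimental knowledge of the chosen proteins, and the function name is hypothetical.

```python
def pairing_is_controlled(binds: dict[str, set[str]],
                          assignment: dict[str, str]) -> bool:
    """True if every fusion protein binds its assigned guide and no other assigned guide.

    binds:      fusion protein label -> set of guide labels it can bind
    assignment: fusion protein label -> the guide label intended for it
    """
    assigned_guides = set(assignment.values())
    return all(
        binds.get(protein, set()) & assigned_guides == {guide}
        for protein, guide in assignment.items()
    )
```

With `binds = {"fusion1": {"guide1"}, "fusion2": {"guide2"}}` and the matching assignment, the check passes; if `fusion1` could also bind `guide2`, it fails, which corresponds to the uncontrolled case in which each fusion protein can bind either guide.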
  • a subject method or composition/kit includes a nucleic acid (RNA or DNA) encoding a subject fusion protein (e.g., a CRISPR/Cas nucleic acid-guided protein fused to a split BirA* protein portion) and/or a CRISPR/Cas guide nucleic acid (e.g., guide RNA).
  • a subject method or composition/kit includes a DNA (e.g., recombinant expression vector) encoding a subject fusion protein (e.g., a CRISPR/Cas nucleic acid-guided protein fused to a split BirA* protein portion) and/or a CRISPR/Cas guide nucleic acid (e.g., guide RNA).
  • a subject fusion protein is encoded on the same nucleic acid molecule (e.g., viral DNA, plasmid DNA) as a CRISPR/Cas guide nucleic acid.
  • a subject fusion protein is encoded on a different nucleic acid molecule (e.g., viral DNA, plasmid DNA, mRNA) than a CRISPR/Cas guide nucleic acid.
  • a subject method or composition/kit includes one or more nucleic acids (RNA or DNA) encoding one or more subject fusion proteins (e.g., CRISPR/Cas nucleic acid-guided protein fused to a split BirA* protein portion) and/or one or more CRISPR/Cas guide nucleic acids (e.g., guide RNAs).
  • vectors (e.g. plasmids, cosmids, minicircles, phage, viruses, etc.) useful for transferring nucleic acids into target cells are available.
  • the vectors comprising the nucleic acid(s) may be maintained episomally, e.g. as plasmids, minicircle DNAs, viruses such as cytomegalovirus, adenovirus, etc., or they may be integrated into the target cell genome, through homologous recombination or random integration, e.g. retrovirus-derived vectors such as MMLV, HIV-1, ALV, etc.
  • Vectors may be provided directly to cells.
  • cells can be contacted with one or more vectors encoding subject fusion proteins and/or corresponding CRISPR/Cas guide nucleic acids (e.g., guide RNAs) such that the vectors are taken up by the cells.
  • Methods for contacting cells with nucleic acid vectors that are plasmids, including electroporation, calcium chloride transfection, microinjection, and lipofection, are well known in the art.
  • the cells can be contacted with viral particles comprising the nucleic acid.
  • Retroviruses, for example, lentiviruses, are particularly suitable to the method of the invention. Commonly used retroviral vectors are "defective", i.e.
  • the retroviral nucleic acids comprising the nucleic acid can be packaged into viral capsids by a packaging cell line.
  • Different packaging cell lines provide a different envelope protein (ecotropic, amphotropic or xenotropic) to be incorporated into the capsid, this envelope protein determining the specificity of the viral particle for the cells (ecotropic for murine and rat; amphotropic for most mammalian cell types including human, dog and mouse; and xenotropic for most mammalian cell types except murine cells).
  • the appropriate packaging cell line may be used to ensure that the cells are targeted by the packaged viral particles.
  • Vectors used for providing subject fusion proteins and/or corresponding CRISPR/Cas guide nucleic acids (e.g., guide RNAs) to target cells can include suitable promoters for driving expression, that is, transcriptional activation, of the nucleic acid of interest.
  • the nucleic acid of interest e.g., nucleic acids that include nucleotide sequences encoding subject fusion proteins and/or corresponding CRISPR/Cas guide RNAs
  • a promoter e.g., a promoter operable in the target cell.
  • a promoter can be a pol II promoter or a pol III promoter (e.g., for expressing guide RNA) such as a U6 promoter or an H1 promoter.
  • pol II promoter e.g., CMV promoter, β-actin promoter, EF-1 alpha promoter, and the like
  • inducible promoters e.g., promoters that are active in particular cell populations or that respond to the presence of drugs such as tetracycline.
  • pol III promoter e.g., for expressing guide RNA
  • a subject fusion protein (nuclease defective CRISPR/Cas nucleic acid-guided protein, e.g., Cas9, fused to a portion of a split BirA* protein) can be introduced into cells as a polypeptide (e.g., in some cases complexed with a guide RNA, thus forming an RNP).
  • a polypeptide may optionally be fused to a polypeptide domain that increases solubility of the product.
  • the domain may be linked to the polypeptide through a defined protease cleavage site, e.g. a TEV sequence, which is cleaved by TEV protease.
  • the linker may also include one or more flexible sequences, e.g.
  • the cleavage of the fusion protein is performed in a buffer that maintains solubility of the product, e.g. in the presence of from 0.5 to 2 M urea, in the presence of polypeptides and/or polynucleotides that increase solubility, and the like.
  • Domains of interest include endosomolytic domains, e.g. influenza HA domain; and other polypeptides that aid in production, e.g. IF2 domain, GST domain, GRPE domain, and the like.
  • the polypeptide may be formulated for improved stability.
  • the peptides may be PEGylated, where the polyethyleneoxy group provides for enhanced lifetime in the blood stream.
  • a subject fusion protein may be fused to a polypeptide permeant domain to promote uptake by the cell.
  • a permeant domain are known in the art and may be used in the non-integrating polypeptides of the present invention, including peptides, peptidomimetics, and non-peptide carriers.
  • a permeant peptide may be derived from the third alpha helix of Drosophila melanogaster transcription factor Antennapedia, referred to as penetratin, which comprises the amino acid sequence
  • the permeant peptide comprises the HIV-1 tat basic region amino acid sequence, which may include, for example, amino acids 49-57 of naturally-occurring tat protein.
  • Other permeant domains include poly-arginine motifs, for example, the region of amino acids 34-56 of HIV-1 rev protein, nona-arginine, octa-arginine, and the like.
  • the nona-arginine (R9) sequence is one of the more efficient PTDs that have been characterized (Wender et al. 2000; Uemura et al. 2002).
  • the site at which the fusion is made may be selected in order to optimize the biological activity, secretion or binding characteristics of the polypeptide. The optimal site will be determined by routine experimentation.
  • a subject fusion protein (nuclease defective CRISPR/Cas nucleic acid-guided protein, e.g., Cas9, fused to a portion of a split BirA* protein) may be produced in vitro or by eukaryotic cells or by prokaryotic cells, and it may be further processed by unfolding, e.g. heat denaturation, DTT reduction, etc. and may be further refolded, using methods known in the art.
  • Modifications of interest that do not alter primary sequence include chemical derivatization of polypeptides, e.g., acylation, acetylation, carboxylation, amidation, etc. Also included are modifications of glycosylation, e.g. those made by modifying the glycosylation patterns of a polypeptide during its synthesis and processing or in further processing steps; e.g. by exposing the polypeptide to enzymes which affect glycosylation, such as mammalian glycosylating or deglycosylating enzymes. Also embraced are sequences that have phosphorylated amino acid residues, e.g. phosphotyrosine, phosphoserine, or phosphothreonine.
  • a subject fusion protein (nuclease defective CRISPR/Cas nucleic acid-guided protein, e.g., Cas9, fused to a portion of a split BirA* protein) may be isolated and purified, e.g., in accordance with conventional methods of recombinant synthesis.
  • a lysate may be prepared of the expression host and the lysate purified using HPLC, exclusion chromatography, gel electrophoresis, affinity chromatography, or other purification technique.
  • the compositions which are used can comprise at least 20% by weight of the desired product, more usually at least 75% by weight, e.g., at least 95% by weight, in relation to contaminants related to the method of preparation of the product and its purification. The percentages can be based upon total protein.
  • a target nucleic acid e.g., target DNA, target genomic DNA
  • a target nucleic acid can be located within a eukaryotic cell, for example, inside of a eukaryotic cell in vitro, inside of a eukaryotic cell in vivo, inside of a eukaryotic cell ex vivo.
  • Suitable target cells include, but are not limited to: a single-celled eukaryotic organism; a cell of a single-cell eukaryotic organism; a plant cell; an algal cell, e.g., Botryococcus braunii, Chlamydomonas reinhardtii, Nannochloropsis gaditana, Chlorella pyrenoidosa, Sargassum patens, C. agardh, and the like; a fungal cell (e.g., a yeast cell); an animal cell; a cell of an invertebrate animal (e.g.
  • a cell of a vertebrate animal e.g., fish, amphibian, reptile, bird, mammal
  • a cell of a mammal e.g., a cell of a rodent such as a mouse or rat, a cell of a non-human primate, a cell of a human, etc.
  • a target nucleic acid is located inside of a prokaryotic cell.
  • Any type of cell may be of interest (e.g. a stem cell, e.g. an embryonic stem (ES) cell, an induced pluripotent stem (iPS) cell, a hematopoietic stem cell, a germ cell (e.g., an oocyte, a sperm, an oogonia, a spermatogonia, etc.), a somatic cell, e.g.
  • a fibroblast, an oligodendrocyte, a glial cell, a hematopoietic cell, a neuron, a muscle cell, a bone cell, a hepatocyte, a pancreatic cell; an in vitro or in vivo embryonic cell of an embryo at any stage, e.g., a 1-cell, 2-cell, 4-cell, 8-cell, etc. stage zebrafish embryo; etc.).
  • Cells may be from established cell lines or they may be primary cells, where "primary cells", "primary cell lines", and "primary cultures" are used interchangeably herein to refer to cells and cell cultures that have been derived from a subject and allowed to grow in vitro for a limited number of passages, i.e. splittings, of the culture.
  • primary cultures are cultures that may have been passaged 0 times, 1 time, 2 times, 4 times, 5 times, 10 times, or 15 times, but not enough times to go through the crisis stage.
  • the primary cell lines are maintained for fewer than 10 passages in vitro.
  • Target cells can be unicellular organisms and/or can be grown in culture. If the cells are primary cells, they may be harvested from an individual by any convenient method. For example, leukocytes may be conveniently harvested by apheresis, leukocytapheresis, density gradient separation, etc., while cells from tissues such as skin, muscle, bone marrow, spleen, liver, pancreas, lung, intestine, stomach, etc. can be conveniently harvested by biopsy.
  • Target cells include in vivo target cells.
  • Target cells include retinal cells (e.g., Muller cells, ganglion cells, amacrine cells, horizontal cells, bipolar cells, and photoreceptor cells including rods and cones, Muller glial cells, and retinal pigmented epithelium); neural cells (e.g., cells of the thalamus, sensory cortex, zona incerta (ZI), ventral tegmental area (VTA), prefrontal cortex (PFC), nucleus accumbens (NAc), amygdala (BLA), substantia nigra, ventral pallidum, globus pallidus, dorsal striatum, ventral striatum, subthalamic nucleus, hippocampus, dentate gyrus, cingulate gyrus, entorhinal cortex, olfactory cortex, primary motor cortex, or cerebellum); liver cells; kidney cells; immune cells (e.g., T
  • a subject method takes place outside of a cell in vitro (e.g., in a vessel such as a test tube). In some cases, a subject method takes place inside of a cell (e.g., in vivo, ex vivo, or inside of a cell in vitro, e.g., in culture). Whether a subject method takes place outside of a cell or inside of a cell, in some cases, a subject method includes providing biotin. In some such cases, e.g., cells in culture can be supplemented with exogenous biotin (e.g., by supplying biotin in the culture medium).
  • a subject method includes introducing at least one of the following (and in some cases all of the following) into a cell: (i) the first protein or a nucleic acid encoding the same; (ii) the second fusion protein or a nucleic acid encoding the same; (iii) the first guide nucleic acid or a nucleic acid encoding the same; and (iv) the second guide nucleic acid or a nucleic acid encoding the same.
  • Components described herein can be delivered (introduced into a cell) as DNA, RNA, or protein.
  • a subject fusion protein nuclease defective CRISPR/Cas nucleic acid-guided protein, e.g., Cas9, fused to a portion of a split BirA* protein
  • a corresponding guide nucleic acid e.g., guide RNA
  • the fusion protein(s) and guide RNA(s) can be delivered (introduced into the cell) as an RNP complex (i.e., a pre-assembled complex of the protein and guide).
  • a subject fusion protein can be introduced into a cell as a protein.
  • a subject fusion protein can be introduced into a cell as a nucleic acid (DNA and/or RNA) encoding the fusion protein.
  • a CRISPR/Cas guide nucleic acid e.g., guide RNA
  • DNA/RNA hybrid molecule any of which - RNA, DNA, RNA/DNA hybrid - can include non-natural modifications.
  • eukaryotic cells include but are not limited to: viral or bacteriophage infection, transfection, conjugation, protoplast fusion, lipofection, electroporation, calcium phosphate precipitation, polyethyleneimine (PEI)-mediated transfection, DEAE-dextran mediated transfection, liposome-mediated transfection, particle gun technology, direct microinjection, nanoparticle-mediated nucleic acid delivery (see, e.g., Panyam et al., Adv Drug Deliv Rev. 2012 Sep 13. pii: S0169-409X(12)00283-9. doi: 10.1016/j.addr.2012.09.023), and the like.
  • the components can be introduced into a cell in vivo (e.g., administered to an individual) and/or in vitro (e.g., delivered to a cell in culture) and/or ex vivo using any convenient method.
  • injection, local or systemic injection, oral, parenteral, subcutaneous, intravenous, intracranial, intraspinal, intraocular, into spinal fluid, intratumoral, topical, and the like.
  • introduction can include nucleofection, electroporation, and the like. In some cases, introduction does not include nucleofection or electroporation.
  • Detection / isolation
  • a subject method includes a step of detecting, identifying, and/or isolating one or more (e.g., 2 or more, 3 or more, 4 or more, 5 or more, etc.) target proteins tagged by a subject corresponding pair of split BirA* fusion proteins.
  • Biotinylated proteins can be enriched and/or isolated from a sample using any convenient technique, e.g., any biotin-affinity enrichment and/or purification approach such as NeutrAvidin-, avidin-, and/or streptavidin-coated beads (e.g., magnetic beads).
  • the biotinylated proteins can be identified by any convenient method, e.g., by mass spectrometry.
  • mass spectrometry e.g., tandem mass spectrometry such as LC-MS/MS
  • results from mass spectrometry can be compared to control experiments (e.g., experiments in which one or more guide RNAs were not present, guide RNAs of inappropriate location were present, one or more fusion proteins were not present, and the like).
  • a subject kit and/or subject composition includes (a) a first fusion protein
  • the first fusion protein comprises a first nuclease defective CRISPR/Cas nucleic acid-guided polypeptide (e.g., dCas9) fused to a first portion of a split BirA* protein;
  • a first guide nucleic acid and/or a nucleic acid encoding the first guide nucleic acid
  • the first guide nucleic acid binds to the first fusion protein;
  • a second fusion protein and/or a nucleic acid encoding the second fusion protein
  • the second fusion protein comprises a second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide (e.g., dCas9) fused to a second portion of the split BirA* protein; and
  • a second guide nucleic acid and/or a nucleic acid encoding the second guide nucleic acid
  • the fusion proteins (a) and (c) are a corresponding pair of split BirA* fusion proteins (i.e., the first and second portions of the split BirA* protein form a functional BirA* protein when they are brought together, e.g., when the CRISPR/Cas nucleic acid-guided polypeptide portions of the fusion proteins, e.g., dCas9 portions of the fusion proteins, are brought near each other at a target DNA via hybridization of the corresponding guide RNAs - see elsewhere herein for details regarding positioning of guide nucleic acids).
  • a subject kit and/or subject composition includes (a) a first fusion protein (and/or a nucleic acid encoding the first fusion protein) where the first fusion protein comprises a first nuclease defective CRISPR/Cas nucleic acid-guided polypeptide (e.g., dCas9) fused to a first portion of a split BirA* protein; and (b) a second fusion protein (and/or a nucleic acid encoding the second fusion protein) where the second fusion protein comprises a second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide (e.g., dCas9) fused to a second portion of the split BirA* protein.
  • the fusion proteins (a) and (b) are a corresponding pair of split BirA* fusion proteins (i.e., the first and second portions of the split BirA* protein form a functional BirA* protein when they are brought together, e.g., when the CRISPR/Cas nucleic acid-guided polypeptide portions of the fusion proteins, e.g., dCas9 portions of the fusion proteins, are brought near one another at a target DNA via hybridization of corresponding guide nucleic acids - see elsewhere herein for details regarding positioning of the guide nucleic acids).
  • a composition and/or kit can further include one or more additional reagents, e.g., selected from: a dilution buffer; a reconstitution solution (e.g., for reconstituting lyophilized fusion protein(s)); a wash buffer; a control reagent; a control expression vector or RNA polynucleotide; a reagent for in vitro production of a subject fusion protein from DNA or RNA, a reagent for in vitro production of a CRISPR/Cas guide RNA from DNA, and the like.
  • a subject kit can further include instructions for using the components of the kit to practice the subject methods.
  • the instructions for practicing the subject methods are generally recorded on a suitable recording medium.
  • the instructions may be printed on a substrate, such as paper or plastic, etc.
  • the instructions may be present in the kits as a package insert, in the labeling of the container of the kit or components thereof (i.e., associated with the packaging or subpackaging) etc.
  • the instructions are present as an electronic storage data file present on a suitable computer readable storage medium, e.g. CD-ROM, diskette, flash drive, etc.
  • the actual instructions are not present in the kit, but means for obtaining the instructions from a remote source, e.g. via the internet, are provided.
  • An example of this embodiment is a kit that includes a web address where the instructions can be viewed and/or from which the instructions can be downloaded. As with the instructions, this means for obtaining the instructions is recorded on a suitable substrate.
  • a method of tagging target proteins in proximity to a nucleotide sequence of interest comprising contacting a target nucleic acid that comprises the nucleotide sequence of interest with: (a) a first fusion protein comprising a first nuclease defective CRISPR/Cas nucleic acid-guided polypeptide fused to a first portion of a split BirA* protein; (b) a first guide nucleic acid that binds to the first fusion protein and hybridizes to a first target sequence of the target nucleic acid; (c) a second fusion protein comprising a second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide fused to a second portion of the split BirA* protein; and (d) a second guide nucleic acid that binds to the second fusion protein and hybridizes to a second target sequence of the target nucleic acid;
  • first and second portions of the split BirA* protein are brought together, forming a functional BirA* protein, by targeting of a first complex, comprising (a) and (b), to the first target sequence and targeting of a second complex, comprising (c) and (d), to the second target sequence, and wherein the functional BirA* protein biotinylates target proteins within its proximity.
  • nuclease defective CRISPR/Cas nucleic acid-guided polypeptides are each selected from nuclease defective forms of one of: Cas9, Cpf1, CasX, and CasY.
  • CRISPR/Cas nucleic acid-guided polypeptide is Cas9.
  • nucleotide sequence of interest is located within a eukaryotic cell genome and said contacting takes place within the eukaryotic cell.
  • any one of 1-8 wherein said contacting comprises introducing into a cell (i) the first protein or a nucleic acid encoding the same; (ii) the second fusion protein or a nucleic acid encoding the same; (iii) the first guide nucleic acid or a nucleic acid encoding the same; and (iv) the second guide nucleic acid or a nucleic acid encoding the same.
  • a composition for detecting target proteins comprising: (a) a first fusion protein, or a nucleic acid encoding the first fusion protein, wherein the first fusion protein comprises a first nuclease defective CRISPR/Cas nucleic acid-guided polypeptide fused to a first portion of a split BirA* protein; (b) a first guide nucleic acid, or a nucleic acid encoding the first guide nucleic acid, wherein the first guide nucleic acid binds to the first fusion protein; (c) a second fusion protein, or a nucleic acid encoding the second fusion protein, wherein the second fusion protein comprises a second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide fused to a second portion of the split BirA* protein; and (d) a second guide nucleic acid, or a nucleic acid encoding the second guide nucleic acid, wherein the second guide nucleic acid binds
  • composition of 14, wherein the first and second nuclease defective CRISPR/Cas nucleic acid-guided polypeptides are each selected from a nuclease defective form of: Cas9, Cpf1, CasX, and CasY.
  • CRISPR/Cas nucleic acid-guided polypeptide is Cas9.
  • composition of any one of 14-16 wherein (i) a linker separates the first nuclease defective CRISPR/Cas nucleic acid-guided polypeptide from the first portion of the split BirA* protein, and/or (ii) a linker separates the second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide from the second portion of the split BirA* protein.
  • a method of identifying one or more target proteins in proximity to a nucleotide sequence of interest comprising: (1) introducing into a cell (a) a first fusion protein, or a nucleic acid encoding the first fusion protein, wherein the first fusion protein comprises a first nuclease defective CRISPR/Cas nucleic acid-guided polypeptide fused to a first portion of a split BirA* protein; (b) a first guide nucleic acid, or a nucleic acid encoding the first guide nucleic acid, wherein the first guide nucleic acid binds to the first fusion protein and hybridizes to a first target sequence of a target nucleic acid; (c) a second fusion protein, or a nucleic acid encoding the second fusion protein, wherein the second fusion protein comprises a second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide fused to a second portion of the split BirA* protein;
  • first and second portions of the split BirA* protein are brought together, forming a functional BirA* protein that biotinylates target proteins within its proximity; and (2) identifying one or more of the biotinylated target proteins.
  • CRISPR/Cas nucleic acid-guided polypeptides are each selected from: Cas9, Cpf1, CasX, and CasY.
  • a kit comprising: (i) a first fusion protein, or a nucleic acid encoding the first fusion protein, wherein the first fusion protein comprises a first nuclease defective CRISPR/Cas nucleic acid-guided polypeptide fused to a first portion of a split BirA* protein; and (ii) a second fusion protein, or a nucleic acid encoding the second fusion protein, wherein the second fusion protein comprises a second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide fused to a second portion of the split BirA* protein; wherein the first and second portions of the split BirA* protein form a functional BirA* protein when brought together.
  • kit of 29, further comprising: (iii) a first guide nucleic acid, or a nucleic acid encoding the first guide nucleic acid, wherein the first guide nucleic acid is capable of binding to the first fusion protein and comprises a guide sequence that is complementary to a first target sequence of a target nucleic acid; (iv) a second guide nucleic acid, or a nucleic acid encoding the second guide nucleic acid, wherein the second guide nucleic acid is capable of binding to the second fusion protein and comprises a guide sequence that is complementary to a second target sequence of a target nucleic acid
  • Standard abbreviations may be used, e.g., bp, base pair(s); kb, kilobase(s); pl, picoliter(s); s or sec, second(s); min, minute(s); h or hr, hour(s); aa, amino acid(s); kb, kilobase(s); bp, base pair(s); nt, nucleotide(s); i.m.,
  • Figure 3 demonstrates that two halves of the split BirA* can form a catalytically active
  • HEK293 cells were transiently transfected with plasmids encoding either the full length BirA* (lane 1), either half of the split BirA* in the absence of the split GFP fusions (lanes 2 and 3), either half of the split BirA* in the presence of the split GFP fusions (lanes 4 and 5), both halves of the split BirA* in the absence of the split GFP fusions (lane 6), both halves of the split BirA* in the presence of the split GFP fusions (lane 7), or both halves of the split BirA* in which the N-terminal half contains the specified alanine mutation in the presence of the split GFP fusions (lanes 8-13).
  • Biotin was added to the culture media 24 hours after transfection to a final concentration of 50 µM.
  • Cells were lysed 24 hours after the addition of biotin.
  • the lysates were then run on a polyacrylamide gel and the presence of biotinylated proteins was detected using an anti-biotin primary antibody. Bands were visualized using secondary antibodies conjugated to horseradish peroxidase.
  • Figure 9 demonstrates the expression of HA and FLAG-tagged fusion proteins containing one of the two halves of the split BirA* fused to either the N-terminus or C-terminus of dCas9.
  • Two sets of fusion proteins, one in which either half of split BirA* was N-terminally or C-terminally linked to dCas9 through the linker SGSETPGTSES (SEQ ID NO: 10) ("XTEN Linker") and one in which either half of split BirA* was N-terminally or C-terminally linked to dCas9 through the linker GGGGSSGSETPGTSES (SEQ ID NO: 1135) ("XTEN + GGGS linker"), were expressed.
  • a linker in a subject fusion protein can be different lengths.
  • Figure 10 demonstrates that expression of two guide nucleic acids (sgRNAs in this case) used to localize in close proximity two dCas9 molecules fused to complementary halves of split BirA* can drive assembly of the split BirA* into a catalytically active complex.
  • A.2 an N-terminal fusion of the first half of split BirA* linked to a dCas9 molecule through a "GGGGS + XTEN linker" [GGGGSSGSETPGTSES (SEQ ID NO: 1135)] and an N-terminal fusion of the second half of split BirA* linked to a dCas9 molecule through a "GGGGS + XTEN linker" [GGGGSSGSETPGTSES (SEQ ID NO: 1135)].
  • Biotin was then added to a final concentration of 50 µM 24 hours after transfections.

Abstract

The present disclosure provides methods and compositions for tagging target proteins in proximity to a nucleotide sequence of interest. A subject method can include contacting a target nucleic acid that includes the nucleotide sequence of interest with (a) a first fusion protein including a first nuclease defective CRISPR/Cas nucleic acid-guided polypeptide (e.g., a dCas9) fused to a first portion of a split BirA* protein; (b) a first guide nucleic acid (e.g., a first Cas9 guide RNA) that binds to (a) and hybridizes to a first target sequence of the target nucleic acid; (c) a second fusion protein including a second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide (e.g., a dCas9) fused to a second portion of the split BirA* protein; and (d) a second guide nucleic acid (e.g., a second Cas9 guide RNA) that binds to (c) and hybridizes to a second target sequence of the target nucleic acid.

Description

COMPOSITIONS AND METHODS FOR TAGGING TARGET PROTEINS IN PROXIMITY TO A NUCLEOTIDE SEQUENCE OF INTEREST
CROSS-REFERENCE
[0001] This application claims the benefit of U.S. Provisional Patent Application No. 62/503,779, filed May 9, 2017, which application is incorporated herein by reference in its entirety.
INCORPORATION BY REFERENCE OF SEQUENCE LISTING PROVIDED AS A TEXT FILE
[0002] A Sequence Listing is provided herewith as a text file, "BERK-350PRV_SeqList_ST25.txt" created on May 8, 2017 and having a size of 8,008 KB. The contents of the text file are incorporated by reference herein in their entirety.
I. INTRODUCTION
[0003] Hundreds of proteins interact with DNA, often very transiently and rapidly (e.g. DNA repair factors, helicases, etc.). Technologies exist to isolate genomic DNA sequences tightly bound to proteins (ChIP), or proteins very tightly bound to DNA (reverse ChIP). Reverse ChIP is capable of capturing proteins bound to repetitive sequences but is not suitable for capturing proteins bound to non-repetitive sequences. Additionally, given the nature of many protein-DNA interactions, there is a need for technologies that can "record" (identify) transient interactions between proteins and a given target locus of a DNA molecule. The compositions and methods disclosed herein address at least these needs.
II. SUMMARY
[0004] The present disclosure provides methods (and compositions for performing a subject method) for tagging, e.g., biotinylating, target proteins in proximity to a nucleotide sequence of interest. A subject method can include contacting a target nucleic acid (e.g., DNA molecule) that includes the nucleotide sequence of interest with (a) a first fusion protein that includes a first nuclease defective CRISPR/Cas nucleic acid-guided polypeptide (e.g., a dCas9) fused to a first portion of a split BirA* protein; (b) a first guide nucleic acid (e.g., a first Cas9 guide RNA) that binds to the first fusion protein and hybridizes to a first target sequence of the target nucleic acid; (c) a second fusion protein comprising a second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide (e.g., a dCas9) fused to a second portion of the split BirA* protein; and (d) a second guide nucleic acid (e.g., a second Cas9 guide RNA) that binds to the second fusion protein and hybridizes to a second target sequence of the target nucleic acid.
[0005] A complex of (a) and (b) forms [complex 1 - first fusion protein plus first guide nucleic acid] and a complex of (c) and (d) forms [complex 2 - second fusion protein plus second guide nucleic acid]. Complex 1 is targeted to a first target sequence (the first guide nucleic acid hybridizes to the first target sequence) and complex 2 is targeted to a second target sequence (the second guide nucleic acid hybridizes to the second target sequence) such that (1) the nuclease defective CRISPR/Cas nucleic acid-guided polypeptide (e.g., dCas9) of the first fusion protein is not occupying the same space as the nuclease defective CRISPR/Cas nucleic acid-guided polypeptide (e.g., dCas9) of the second fusion protein; and (2) a functional BirA* protein forms when complexes 1 and 2 are positioned in such a way as to bring the first and second portions of the split BirA* protein together. Thus, the first and second portions of the split BirA* protein are brought together (thereby forming a functional BirA* protein), by virtue of the proper positioning of the first and second fusion proteins [noted above as (a) and (c)] near the nucleotide sequence of interest. The functional BirA* protein then biotinylates target proteins within its proximity (e.g., within a range of 12 nm, such as within a range of 10 nm or within a range of 8 nm).
[0006] In some cases, which fusion protein (first or second) binds to which guide nucleic acid (first or second) is not controlled - thus in some cases each fusion protein can bind to either of the two guide nucleic acids.
[0007] In some cases, the first target sequence (targeted by complex 1) is immediately adjacent a first protospacer adjacent motif (PAM); the second target sequence (targeted by complex 2) is immediately adjacent a second protospacer adjacent motif (PAM); and the first and second PAMs are separated by an appropriate number of base pairs so as to allow the first and second portions of the split BirA* protein to be brought together (thereby forming a functional BirA* protein) when complex 1 and complex 2 are targeted to their respective target sequences. In some cases, the first target sequence is immediately adjacent a first protospacer adjacent motif (PAM); the second target sequence is immediately adjacent a second protospacer adjacent motif (PAM); and the first and second PAMs are separated by 15 to 100 base pairs (bp).
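By way of non-limiting illustration, the PAM spacing criterion of paragraph [0007] can be sketched as a short script that scans a DNA sequence for candidate PAM pairs. The sketch below is an assumption-laden example rather than part of the disclosed method: it assumes an NGG PAM (as for S. pyogenes Cas9), considers both strands, measures separation as the number of base pairs lying between the two PAMs in top-strand coordinates, and ignores PAM orientation and guide sequence selection. The function names `find_pams` and `pam_pairs` are hypothetical.

```python
import re
from itertools import combinations


def find_pams(seq):
    """Return sorted (start, strand) tuples for every NGG PAM.

    start is the leftmost top-strand index of the 3-nt motif.
    Assumption: NGG PAM; a bottom-strand NGG reads as CCN on the top strand.
    """
    seq = seq.upper()
    sites = []
    # Zero-width lookahead so that overlapping motifs are all reported.
    for m in re.finditer(r"(?=[ACGT]GG)", seq):
        sites.append((m.start(), "+"))
    for m in re.finditer(r"(?=CC[ACGT])", seq):
        sites.append((m.start(), "-"))
    return sorted(sites)


def pam_pairs(seq, min_gap=15, max_gap=100):
    """List PAM pairs separated by min_gap..max_gap bp (inclusive).

    The gap is the count of bases between the end of the upstream PAM
    and the start of the downstream PAM (an assumed convention).
    """
    pairs = []
    for (s1, strand1), (s2, strand2) in combinations(find_pams(seq), 2):
        gap = s2 - (s1 + 3)
        if min_gap <= gap <= max_gap:
            pairs.append(((s1, strand1), (s2, strand2), gap))
    return pairs


if __name__ == "__main__":
    # Toy sequence: a top-strand PAM, 20 bp of spacer, a bottom-strand PAM.
    example = "TGG" + "A" * 20 + "CCA"
    for first, second, gap in pam_pairs(example):
        print(first, second, gap)
```

Candidate pairs returned by such a scan would still need to be filtered for PAM orientation and for the positioning of the guide nucleic acids described elsewhere herein.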
[0008] In some cases the nuclease defective CRISPR/Cas nucleic acid-guided polypeptide of the first and/or second fusion protein is a nuclease defective class 2 CRISPR/Cas nucleic acid-guided polypeptide (e.g., nuclease defective type II, type V, or type VI CRISPR/Cas nucleic acid-guided polypeptide). In some cases the nuclease defective CRISPR/Cas nucleic acid-guided polypeptide of the fusion protein (a) and/or (c) is a nuclease defective Cas9, Cpf1 (also called Cas12a), CasX, or CasY protein.
[0009] In some cases the step of contacting a target nucleic acid (e.g., DNA molecule) of a subject method includes introducing into a cell one or more of: (i) the first fusion protein and/or a nucleic acid encoding the first fusion protein; (ii) the first guide nucleic acid and/or a nucleic acid encoding the first guide nucleic acid; (iii) the second fusion protein and/or a nucleic acid encoding the second fusion protein; and (iv) the second guide nucleic acid and/or a nucleic acid encoding the second guide nucleic acid. In some cases a subject method includes a step of detecting, isolating, and/or identifying one or more of the tagged (e.g., biotinylated) target proteins.
III. Brief Description Of The Drawings
[0010] Figure 1 depicts crystal structures used to identify the cut site within BirA*. The full-length BirA* is depicted in green, and the biotin molecule is depicted in white. With the chosen cut site within BirA*, the N-terminal half is depicted in yellow and the C-terminal half is depicted in blue.
[0011] Figure 2 depicts various split BirA* constructs used to determine whether or not a split GFP tether can effectively drive assembly of the split BirA* halves into a catalytically active enzyme.
[0012] Figure 3 depicts that split BirA* is capable of biotinylation only when brought together by a dimerizing fusion. In this case, split-GFP was used to force dimerization. Mutations at the interface of BirA* greatly reduced biotinylation even with a dimerizing fusion.
[0013] Figure 4 depicts various orientations for fusion to a Cas9 protein.
[0014] Figure 5 depicts various subject fusion proteins (nuclease defective Cas9 fused to split BirA*) that were designed. Split BirA* could be fused to either the N-terminus or C-terminus of dCas9. Two different flexible linkers between the split BirA* and dCas9 were tested. The XTEN linker is SGSETPGTSES (SEQ ID NO: 10). The second linker includes an additional five amino acids, GGGGS (SEQ ID NO: 9), so that this longer linker consists of the amino acids GGGGSSGSETPGTSES (SEQ ID NO: 1135).
[0015] Figure 6 depicts various possible orientations for different combinations of subject fusion proteins (nuclease defective Cas9 fused to split BirA*). The two fusion proteins can be targeted towards the same DNA strand or opposite DNA strands.
[0016] Figure 7 depicts various spacings between the two guide nucleic acids (termed "sgRNAs" in the figure) used to localize the two fusion proteins near the beta actin promoter portion of a target DNA. "spacer" is equivalent to "guide sequence" / "targeting sequence."
[0017] Figure 8 depicts the experimental workflow that was used for experiments of the disclosure. HEK293 cells were transiently transfected with plasmids to express the two fusion proteins and two guide nucleic acids. Biotin was added to the culture media 24 hours after transfection to a final concentration of 50 μM. Cells were lysed 24 hours after the addition of biotin. Biotinylated proteins in the cell lysate were then allowed to bind to streptavidin-coated magnetic beads and eluted off the beads after an incubation.
[0018] Figure 9 depicts Western blots demonstrating expression of subject fusion proteins (Cas9-Split BirA* fusion proteins in this case) of various orientations (e.g., split-BirA* fused to the N-terminus or C-terminus of Cas9).
[0019] Figure 10 depicts data from streptavidin pull down experiments. S = primary supernatant; E = eluate; FL = full-length BirA*; +/+ sgRNA = two sgRNAs targeting the same strand; +/- sgRNA = two sgRNAs targeting different strands. A.2 = GGGGSSGSETPGTSES (SEQ ID NO: 1135) linker.
[0020] Figure 11 depicts sequences of generated fusion constructs.
IV. DEFINITIONS
[0021] The terms "polynucleotide" and "nucleic acid," used interchangeably herein, refer to a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides. Thus, this term includes, but is not limited to, single-, double-, or multi-stranded DNA or RNA, genomic DNA, cDNA, DNA-RNA hybrids, or a polymer comprising purine and pyrimidine bases or other natural, chemically or biochemically modified, non-natural, or derivatized nucleotide bases.
[0022] By "hybridizable" or "complementary" or "substantially complementary" it is meant that a nucleic acid (e.g. RNA, DNA) comprises a sequence of nucleotides that enables it to non-covalently bind, i.e. form Watson-Crick base pairs and/or G/U base pairs, "anneal", or "hybridize," to another nucleic acid in a sequence-specific, antiparallel, manner (i.e., a nucleic acid specifically binds to a complementary nucleic acid) under the appropriate in vitro and/or in vivo conditions of temperature and solution ionic strength. Standard Watson-Crick base-pairing includes: adenine (A) pairing with thymidine (T), adenine (A) pairing with uracil (U), and guanine (G) pairing with cytosine (C) [DNA, RNA]. In addition, for hybridization between two RNA molecules (e.g., dsRNA), and for hybridization of a DNA molecule with an RNA molecule (e.g., when a DNA target nucleic acid base pairs with a guide RNA, etc.): guanine (G) can also base pair with uracil (U). For example, G/U base-pairing is at least partially responsible for the degeneracy (i.e., redundancy) of the genetic code in the context of tRNA anti-codon base-pairing with codons in mRNA. Thus, in the context of this disclosure, a guanine (G) (e.g., of dsRNA duplex of a guide RNA molecule; of a guide RNA base pairing with a target nucleic acid, etc.) is considered complementary to both a uracil (U) and to a cytosine (C). For example, when a G/U base-pair can be made at a given nucleotide position of a dsRNA duplex of a guide RNA molecule, the position is not considered to be non-complementary, but is instead considered to be complementary.
[0023] Hybridization and washing conditions are well known and exemplified in Sambrook, J., Fritsch, E. F. and Maniatis, T. Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor (1989), particularly Chapter 11 and Table 11.1 therein; and Sambrook, J. and Russell, W., Molecular Cloning: A Laboratory Manual, Third Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor (2001). The conditions of temperature and ionic strength determine the "stringency" of the hybridization.
[0024] Hybridization requires that the two nucleic acids contain complementary sequences, although mismatches between bases are possible. The conditions appropriate for hybridization between two nucleic acids depend on the length of the nucleic acids and the degree of complementarity, variables well known in the art. The greater the degree of complementarity between two nucleotide sequences, the greater the value of the melting temperature (Tm) for hybrids of nucleic acids having those sequences. For hybridizations between nucleic acids with short stretches of complementarity (e.g. complementarity over 35 or less, 30 or less, 25 or less, 22 or less, 20 or less, or 18 or less nucleotides) the position of mismatches can become important (see Sambrook et al., supra, 11.7-11.8). Typically, the length for a hybridizable nucleic acid is 8 nucleotides or more (e.g., 10 nucleotides or more, 12 nucleotides or more, 15 nucleotides or more, 20 nucleotides or more, 22 nucleotides or more, 25 nucleotides or more, or 30 nucleotides or more). Temperature, wash solution salt concentration, and other conditions may be adjusted as necessary according to factors such as length of the region of complementation and the degree of complementation.
[0025] It is understood that the sequence of a polynucleotide need not be 100% complementary to that of its target nucleic acid to be specifically hybridizable or hybridizable. Moreover, a polynucleotide may hybridize over one or more segments such that intervening or adjacent segments are not involved in the hybridization event (e.g., a bulge, a loop structure or hairpin structure, etc.). A polynucleotide can comprise 60% or more, 65% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, or 100% sequence complementarity to a target region within the target nucleic acid sequence to which it will hybridize. For example, an antisense nucleic acid in which 18 of 20 nucleotides of the antisense compound are complementary to a target region, and would therefore specifically hybridize, would represent 90 percent complementarity. In this example, the remaining noncomplementary nucleotides may be clustered or interspersed with complementary nucleotides and need not be contiguous to each other or to complementary nucleotides. Percent complementarity between particular stretches of nucleic acid sequences within nucleic acids can be determined using any convenient method. Example methods include BLAST programs (basic local alignment search tools) and PowerBLAST programs (Altschul et al., J. Mol. Biol., 1990, 215, 403-410; Zhang and Madden, Genome Res., 1997, 7, 649-656), the Gap program (Wisconsin Sequence Analysis Package, Version 8 for Unix, Genetics Computer Group, University Research Park, Madison Wis.), e.g., using default settings, which uses the algorithm of Smith and Waterman (Adv. Appl. Math., 1981, 2, 482-489), and the like.
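The 18-of-20 example of paragraph [0025] can be reproduced with a minimal Python sketch. This is not one of the programs named above (BLAST, PowerBLAST, Gap). The function name is hypothetical, and the sketch assumes two ungapped sequences of equal length, both written 5' to 3', paired position by position in antiparallel orientation, with G/U pairs optionally counted as complementary as described in paragraph [0022].

```python
# Watson-Crick pairs (DNA and RNA) and G/U wobble pairs.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("A", "T"), ("T", "A"),
                ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def percent_complementarity(query: str, target: str,
                            allow_gu: bool = True) -> float:
    """Percent of query positions that pair with the target.

    Assumptions (illustrative only): both sequences are given 5'->3',
    have equal length, and are compared antiparallel without gaps.
    """
    if len(query) != len(target):
        raise ValueError("sequences must have equal length")
    if not query:
        raise ValueError("sequences must not be empty")
    allowed = WATSON_CRICK | (WOBBLE if allow_gu else set())
    # Reverse the target so that position i of the query faces its
    # antiparallel partner.
    paired = sum((q, t) in allowed
                 for q, t in zip(query.upper(), reversed(target.upper())))
    return 100.0 * paired / len(query)


if __name__ == "__main__":
    # 18 U residues pair with A; the two G residues do not pair with A.
    print(percent_complementarity("U" * 18 + "GG", "A" * 20))  # 90.0
```

Because the comparison is ungapped, bulges or loops of the kind mentioned in [0025] are not modeled; an alignment program would be needed for those cases.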
[0026] The terms "peptide," "polypeptide," and "protein" are used interchangeably herein, and refer to a polymeric form of amino acids of any length, which can include coded and non-coded amino acids, chemically or biochemically modified or derivatized amino acids, and polypeptides having modified peptide backbones.
[0027] "Binding" as used herein (e.g. with reference to an RNA-binding domain of a polypeptide, binding to a target nucleic acid, and the like) refers to a non-covalent interaction between macromolecules (e.g., between a protein and a nucleic acid; between a CRISPR/Cas protein/guide RNA complex and a target nucleic acid; and the like). While in a state of non-covalent interaction, the macromolecules are said to be "associated" or "interacting" or "binding" (e.g., when a molecule X is said to interact with a molecule Y, it is meant the molecule X binds to molecule Y in a non-covalent manner). Not all components of a binding interaction need be sequence-specific (e.g., contacts with phosphate residues in a DNA backbone), but some portions of a binding interaction may be sequence-specific. Binding interactions are generally characterized by a dissociation constant (KD) of less than 10⁻⁶ M, less than 10⁻⁷ M, less than 10⁻⁸ M, less than 10⁻⁹ M, less than 10⁻¹⁰ M, less than 10⁻¹¹ M, less than 10⁻¹² M, less than 10⁻¹³ M, less than 10⁻¹⁴ M, or less than 10⁻¹⁵ M. "Affinity" refers to the strength of binding, increased binding affinity being correlated with a lower KD.
[0028] By "binding domain" it is meant a protein domain that is able to bind non-covalently to another molecule. A binding domain can bind to, for example, a DNA molecule (a DNA-binding domain), an RNA molecule (an RNA-binding domain) and/or a protein molecule (a protein-binding domain). In the case of a protein having a protein-binding domain, it can in some cases bind to itself (to form homodimers, homotrimers, etc.) and/or it can bind to one or more regions of a different protein or proteins.
[0029] The term "conservative amino acid substitution" refers to the interchangeability in proteins of amino acid residues having similar side chains. For example, a group of amino acids having aliphatic side chains consists of glycine, alanine, valine, leucine, and isoleucine; a group of amino acids having aliphatic-hydroxyl side chains consists of serine and threonine; a group of amino acids having amide containing side chains consists of asparagine and glutamine; a group of amino acids having aromatic side chains consists of phenylalanine, tyrosine, and tryptophan; a group of amino acids having basic side chains consists of lysine, arginine, and histidine; a group of amino acids having acidic side chains consists of glutamate and aspartate; and a group of amino acids having sulfur containing side chains consists of cysteine and methionine. Exemplary conservative amino acid substitution groups are: valine-leucine-isoleucine, phenylalanine-tyrosine, lysine-arginine, alanine-valine-glycine, and asparagine-glutamine.
[0030] A polynucleotide or polypeptide has a certain percent "sequence identity" to another polynucleotide or polypeptide, meaning that, when aligned, that percentage of bases or amino acids are the same, and in the same relative position, when comparing the two sequences. Sequence identity can be determined in a number of different ways. To determine sequence identity, sequences can be aligned using various convenient methods and computer programs (e.g., BLAST, T-COFFEE, MUSCLE, MAFFT, etc.), available over the world wide web at sites including ncbi.nlm.nih.gov/BLAST, ebi.ac.uk/Tools/msa/tcoffee/, ebi.ac.uk/Tools/msa/muscle/, mafft.cbrc.jp/alignment/software/. See, e.g., Altschul et al. (1990), J. Mol. Biol. 215:403-10.
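Once two sequences have been aligned by a program such as those listed in paragraph [0030], percent identity can be computed along the lines of the following minimal Python sketch. The function name is hypothetical, and the choice of denominator (every alignment column in which at least one sequence has a residue) is an assumption; alignment programs differ in how they treat gapped columns, and the disclosure does not specify a convention.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns holding the same residue in both rows.

    Assumptions (illustrative only): the inputs are two rows of an
    existing pairwise alignment, of equal length, with `gap` marking
    gaps; columns that are gaps in both rows are ignored; a residue
    facing a gap counts as a non-identical column.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment has no residue columns")
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    # Four identical columns out of six.
    print(round(percent_identity("ACGT-A", "ACCTGA"), 1))
```

The sketch does not perform the alignment itself; it only scores an alignment produced elsewhere.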
[0031] A DNA sequence that "encodes" a particular RNA is a DNA nucleotide sequence that is transcribed into RNA. A DNA polynucleotide may encode an RNA (mRNA) that is translated into protein (and therefore the DNA and the mRNA both encode the protein), or a DNA polynucleotide may encode an RNA that is not translated into protein (e.g. tRNA, rRNA, microRNA (miRNA), a "non-coding" RNA (ncRNA), a guide RNA, etc.).
[0032] A "protein coding sequence" or a sequence that encodes a particular protein or polypeptide, is a nucleotide sequence that is transcribed into mRNA (in the case of DNA) and is translated (in the case of mRNA) into a polypeptide in vitro or in vivo when placed under the control of appropriate regulatory sequences.
[0033] The terms "DNA regulatory sequences," "control elements," and "regulatory elements," used interchangeably herein, refer to transcriptional and translational control sequences, such as promoters, enhancers, polyadenylation signals, terminators, protein degradation signals, and the like, that provide for and/or regulate transcription of a non-coding sequence (e.g., guide RNA) or a coding sequence (e.g., CRISPR/Cas RNA-guided polypeptide, Cas9 polypeptide, and the like) and/or regulate translation of an encoded polypeptide.
[0034] As used herein, a "promoter" or a "promoter sequence" is a DNA regulatory region capable of binding RNA polymerase and initiating transcription of a downstream (3' direction) coding or non-coding sequence. For purposes of the present disclosure, the promoter sequence is bounded at its 3' terminus by the transcription initiation site and extends upstream (5' direction) to include the minimum number of bases or elements necessary to initiate transcription at levels detectable above background. Within the promoter sequence will be found a transcription initiation site, as well as protein binding domains responsible for the binding of RNA polymerase. Eukaryotic promoters will often, but not always, contain "TATA" boxes and "CAT" boxes. Various promoters, including inducible promoters, may be used to drive the various vectors of the present disclosure.
[0035] The term "naturally-occurring" or "unmodified" or "wild type" as used herein as applied to a nucleic acid, a polypeptide, a cell, or an organism, refers to a nucleic acid, polypeptide, cell, or organism that is found in nature. For example, a polypeptide or polynucleotide sequence that is present in an organism (including viruses) that can be isolated from a source in nature is naturally occurring.
[0036] The term "chimeric" as used herein as applied to a nucleic acid or polypeptide refers to two components that are defined by structures derived from different sources. For example, where "chimeric" is used in the context of a chimeric polypeptide (e.g., a chimeric Cas9 protein or Cas9 fusion protein), the chimeric polypeptide includes amino acid sequences that are derived from different polypeptides. A chimeric polypeptide may comprise either modified or naturally-occurring polypeptide sequences (e.g., a first amino acid sequence from a modified or unmodified Cas9 protein; and a second amino acid sequence from a modified or unmodified protein other than a Cas9 protein, etc.). Similarly, "chimeric" in the context of a polynucleotide encoding a chimeric polypeptide includes nucleotide sequences derived from different coding regions (e.g., a first nucleotide sequence encoding a modified or unmodified Cas9 protein; and a second nucleotide sequence encoding a polypeptide other than a Cas9 protein).
[0037] The term "chimeric polypeptide" refers to a polypeptide which is made by the combination (i.e., "fusion") of two otherwise separated segments of amino acid sequence, usually through human intervention. A polypeptide that comprises a chimeric amino acid sequence is a chimeric polypeptide. Chimeric polypeptides can also be referred to as "fusion proteins" or "fusion variants."
[0038] "Heterologous," as used herein, means a nucleotide or polypeptide sequence that is not found in the native nucleic acid or protein, respectively. For example, in a chimeric Cas9 protein, the RNA-binding domain of a naturally-occurring bacterial Cas9 polypeptide (or a variant thereof) may be fused to a heterologous polypeptide sequence (i.e. a polypeptide sequence from a protein other than Cas9 or a polypeptide sequence from another organism). The heterologous polypeptide sequence may exhibit an activity (e.g., enzymatic activity) that will also be exhibited by the chimeric Cas9 protein (e.g., biotin ligase activity, etc.). A heterologous nucleic acid sequence may be linked to a naturally-occurring nucleic acid sequence (or a variant thereof) (e.g., by genetic engineering) to generate a chimeric nucleotide sequence encoding a chimeric polypeptide (a fusion protein).
[0039] "Recombinant," as used herein, means that a particular nucleic acid (DNA or RNA) is the product of various combinations of cloning, restriction, polymerase chain reaction (PCR) and/or ligation steps resulting in a construct having a structural coding or non-coding sequence distinguishable from endogenous nucleic acids found in natural systems. DNA sequences encoding polypeptides can be assembled from cDNA fragments or from a series of synthetic oligonucleotides, to provide a synthetic nucleic acid which is capable of being expressed from a recombinant transcriptional unit contained in a cell or in a cell-free transcription and translation system. Genomic DNA comprising the relevant sequences can also be used in the formation of a recombinant gene or transcriptional unit. Sequences of non-translated DNA may be present 5' or 3' from the open reading frame, where such sequences do not interfere with manipulation or expression of the coding regions, and may indeed act to modulate production of a desired product by various mechanisms (see "DNA regulatory sequences"). Alternatively, DNA sequences encoding RNA (e.g., guide RNA) that is not translated may also be considered recombinant. Thus, e.g., the term "recombinant" nucleic acid refers to one which is not naturally occurring, e.g., is made by the artificial combination of two otherwise separated segments of sequence through human intervention. This artificial combination is often accomplished by either chemical synthesis means, or by the artificial manipulation of isolated segments of nucleic acids, e.g., by genetic engineering techniques. Such is usually done to replace a codon with a codon encoding the same amino acid, a conservative amino acid, or a non-conservative amino acid. Alternatively, it is performed to join together nucleic acid segments of desired functions to generate a desired combination of functions. 
When a recombinant polynucleotide encodes a polypeptide, the sequence of the encoded polypeptide can be naturally occurring ("wild type") or can be a variant (e.g., a mutant) of the naturally occurring sequence. An example of such a case is a DNA (a recombinant) encoding a wild-type protein where the DNA sequence is codon optimized for expression of the protein in a cell (e.g., a eukaryotic cell) in which the protein is not naturally found (e.g., expression of a CRISPR/Cas RNA-guided polypeptide such as Cas9 in a eukaryotic cell). A codon-optimized DNA can therefore be recombinant and non-naturally occurring while the protein encoded by the DNA may have a wild type amino acid sequence.
[0040] Thus, the term "recombinant" polypeptide does not necessarily refer to a polypeptide whose amino acid sequence does not naturally occur. Instead, a "recombinant" polypeptide is encoded by a recombinant non-naturally occurring DNA sequence, but the amino acid sequence of the polypeptide can be naturally occurring ("wild type") or non-naturally occurring (e.g., a variant, a mutant, etc.). Thus, a "recombinant" polypeptide is the result of human intervention, but may have a naturally occurring amino acid sequence.
[0041] A "vector" or "expression vector" is a replicon, such as plasmid, phage, virus, or cosmid, to which another DNA segment, i.e. an "insert", may be attached so as to bring about the replication of the attached segment in a cell.
[0042] An "expression cassette" comprises a DNA coding sequence operably linked to a promoter. "Operably linked" refers to a juxtaposition wherein the components so described are in a relationship permitting them to function in their intended manner. For instance, a promoter is operably linked to a coding sequence (or the coding sequence can also be said to be operably linked to the promoter) if the promoter affects its transcription or expression.
[0043] The terms "recombinant expression vector," or "DNA construct" are used interchangeably herein to refer to a DNA molecule comprising a vector and at least one insert. Recombinant expression vectors are usually generated for the purpose of expressing and/or propagating the insert(s), or for the construction of other recombinant nucleotide sequences. The insert(s) may or may not be operably linked to a promoter sequence and may or may not be operably linked to DNA regulatory sequences.
[0044] A cell has been "genetically modified" or "transformed" or "transfected" by exogenous DNA, e.g. a recombinant expression vector, when such DNA has been introduced inside the cell. The presence of the exogenous DNA results in permanent or transient genetic change. The transforming DNA may or may not be integrated (covalently linked) into the genome of the cell. In prokaryotes, yeast, and mammalian cells for example, the transforming DNA may be maintained on an episomal element such as a plasmid. With respect to eukaryotic cells, a stably transformed cell is one in which the transforming DNA has become integrated into a chromosome so that it is inherited by daughter cells through chromosome replication. This stability is demonstrated by the ability of the eukaryotic cell to establish cell lines or clones that comprise a population of daughter cells containing the transforming DNA. A "clone" is a population of cells derived from a single cell or common ancestor by mitosis. A "cell line" is a clone of a primary cell that is capable of stable growth in vitro for many generations.
[0045] Suitable methods of genetic modification (also referred to as "transformation") include e.g., viral or bacteriophage infection, transfection, conjugation, protoplast fusion, lipofection, electroporation, calcium phosphate precipitation, polyethyleneimine (PEI)-mediated transfection, DEAE-dextran mediated transfection, liposome-mediated transfection, particle gun technology, direct microinjection, nanoparticle-mediated nucleic acid delivery (see, e.g., Panyam et al., Adv Drug Deliv Rev. 2012 Sep 13. pii: S0169-409X(12)00283-9. doi: 10.1016/j.addr.2012.09.023), and the like.
[0046] The choice of method of genetic modification is generally dependent on the type of cell being transformed and the circumstances under which the transformation is taking place (e.g., in vitro, ex vivo, or in vivo). A general discussion of these methods can be found in Ausubel, et al., Short Protocols in Molecular Biology, 3rd ed., Wiley & Sons, 1995.
[0047] A "target nucleic acid" as used herein is a polynucleotide (e.g., DNA such as genomic DNA) that includes a site ("target site" or "target sequence") targeted by a CRISPR/Cas RNA-guided polypeptide (e.g., a subject Cas9 fusion protein). The target sequence is the sequence to which the guide sequence of a subject CRISPR/Cas guide RNA (e.g., a Cas9 guide RNA) will hybridize. For example, the target site (or target sequence) 5'-GAGCAUAUC-3' within a target nucleic acid is targeted by (or is bound by, or hybridizes with, or is complementary to) the sequence 5'-GAUAUGCUC-3'. Suitable hybridization conditions include physiological conditions normally present in a cell. For a double stranded target nucleic acid, the strand of the target nucleic acid that is complementary to and hybridizes with the guide RNA is referred to as the "complementary strand" or "target strand"; while the strand of the target nucleic acid that is complementary to the "target strand" (and is therefore not complementary to the guide RNA) is referred to as the "non-target strand" or "non-complementary strand."
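The relationship between the two example sequences of paragraph [0047] is that of an antiparallel (reverse) complement, which can be checked with a minimal Python sketch. The function name is hypothetical, and the sketch assumes an RNA alphabet (A, C, G, U) and strict Watson-Crick pairing, without the G/U pairing discussed in paragraph [0022].

```python
# Translation table mapping each RNA base to its Watson-Crick partner.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'.

    Assumption (illustrative only): the input contains only A, C, G, U.
    """
    return seq.upper().translate(RNA_COMPLEMENT)[::-1]


if __name__ == "__main__":
    target_site = "GAGCAUAUC"  # 5'-GAGCAUAUC-3', from paragraph [0047]
    print(reverse_complement_rna(target_site))  # GAUAUGCUC
```

The output matches the sequence 5'-GAUAUGCUC-3' given in the text as hybridizing with the example target site.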
[0048] By "cleavage" it is meant the breakage of the covalent backbone of a target nucleic acid molecule (e.g., RNA, DNA). Cleavage can be initiated by a variety of methods including, but not limited to, enzymatic or chemical hydrolysis of a phosphodiester bond. Both single-stranded cleavage and double-stranded cleavage are possible, and double-stranded cleavage can occur as a result of two distinct single-stranded cleavage events.
[0049] "Nuclease" and "endonuclease" are used interchangeably herein to mean an enzyme which possesses catalytic activity for nucleic acid cleavage (e.g., ribonuclease activity (ribonucleic acid cleavage), deoxyribonuclease activity (deoxyribonucleic acid cleavage), etc.). A "genome editing endonuclease" is an endonuclease that can be used for the editing of a cell's genome (e.g., by cleaving at a targeted location within the cell's genomic DNA). Examples of genome editing endonucleases include but are not limited to: (i) Zinc finger nucleases, (ii) TAL endonucleases, and (iii) CRISPR/Cas endonucleases. Examples of CRISPR/Cas endonucleases include class 2 CRISPR/Cas endonucleases such as: (a) type II CRISPR/Cas proteins, e.g., a Cas9 protein; (b) type V CRISPR/Cas proteins, e.g., a Cpf1 polypeptide, a C2c1 polypeptide, a C2c3 polypeptide, and the like; and (c) type VI CRISPR/Cas proteins, e.g., a C2c2 polypeptide.
[0050] By "cleavage domain" or "active domain" or "nuclease domain" of a nuclease it is meant the polypeptide sequence or domain within the nuclease which possesses the catalytic activity for nucleic acid cleavage. A cleavage domain can be contained in a single polypeptide chain or cleavage activity can result from the association of two (or more) polypeptides. A single nuclease domain may consist of more than one isolated stretch of amino acids within a given polypeptide.
[0051] The term "stem cell" is used herein to refer to a cell (e.g., plant stem cell, vertebrate stem cell) that has the ability both to self-renew and to generate a differentiated cell type (see Morrison et al. (1997) Cell 88:287-298). In the context of cell ontogeny, the adjective "differentiated", or "differentiating" is a relative term. A "differentiated cell" is a cell that has progressed further down the developmental pathway than the cell it is being compared with. Thus, pluripotent stem cells (described below) can differentiate into lineage-restricted progenitor cells (e.g., mesodermal stem cells), which in turn can differentiate into cells that are further restricted (e.g., neuron progenitors), which can differentiate into end-stage cells (i.e., terminally differentiated cells, e.g., neurons, cardiomyocytes, etc.), which play a characteristic role in a certain tissue type, and may or may not retain the capacity to proliferate further. Stem cells may be characterized by both the presence of specific markers (e.g., proteins, RNAs, etc.) and the absence of specific markers. Stem cells may also be identified by functional assays both in vitro and in vivo, particularly assays relating to the ability of stem cells to give rise to multiple differentiated progeny.
[0052] Stem cells of interest include pluripotent stem cells (PSCs). The term "pluripotent stem cell" or "PSC" is used herein to mean a stem cell capable of producing all cell types of the organism. Therefore, a PSC can give rise to cells of all germ layers of the organism (e.g., the endoderm, mesoderm, and ectoderm of a vertebrate). Pluripotent cells are capable of forming teratomas and of contributing to ectoderm, mesoderm, or endoderm tissues in a living organism. Pluripotent stem cells of plants are capable of giving rise to all cell types of the plant (e.g., cells of the root, stem, leaves, etc.).
[0053] PSCs of animals can be derived in a number of different ways. For example, embryonic stem cells (ESCs) are derived from the inner cell mass of an embryo (Thomson et al., Science. 1998 Nov 6;282(5391):1145-7) whereas induced pluripotent stem cells (iPSCs) are derived from somatic cells (Takahashi et al., Cell. 2007 Nov 30;131(5):861-72; Takahashi et al., Nat Protoc. 2007;2(12):3081-9; Yu et al., Science. 2007 Dec 21;318(5858):1917-20. Epub 2007 Nov 20). Because the term PSC refers to pluripotent stem cells regardless of their derivation, the term PSC encompasses the terms ESC and iPSC, as well as the term embryonic germ stem cells (EGSC), which are another example of a PSC. PSCs may be in the form of an established cell line, they may be obtained directly from primary embryonic tissue, or they may be derived from a somatic cell. PSCs can be target cells of the methods described herein.
[0054] By "embryonic stem cell" (ESC) is meant a PSC that was isolated from an embryo, typically from the inner cell mass of the blastocyst. ESC lines are listed in the NIH Human Embryonic Stem Cell Registry, e.g. hESBGN-01, hESBGN-02, hESBGN-03, hESBGN-04 (BresaGen, Inc.); HES-1, HES-2, HES-3, HES-4, HES-5, HES-6 (ES Cell International); Miz-hES1 (MizMedi Hospital-Seoul National University); HSF-1, HSF-6 (University of California at San Francisco); and H1, H7, H9, H13, H14 (Wisconsin Alumni Research Foundation (WiCell Research Institute)). Stem cells of interest also include embryonic stem cells from other primates, such as Rhesus stem cells and marmoset stem cells. The stem cells may be obtained from any mammalian species, e.g. human, equine, bovine, porcine, canine, feline, rodent, e.g. mice, rats, hamster, primate, etc. (Thomson et al. (1998) Science 282:1145; Thomson et al. (1995) Proc. Natl. Acad. Sci USA 92:7844; Thomson et al. (1996) Biol. Reprod. 55:254; Shamblott et al., Proc. Natl. Acad. Sci. USA 95:13726, 1998). In culture, ESCs typically grow as flat colonies with large nucleo-cytoplasmic ratios, defined borders and prominent nucleoli. In addition, ESCs express SSEA-3, SSEA-4, TRA-1-60, TRA-1-81, and Alkaline Phosphatase, but not SSEA-1. Examples of methods of generating and characterizing ESCs may be found in, for example, US Patent No. 7,029,913, US Patent No. 5,843,780, and US Patent No. 6,200,806, the disclosures of which are incorporated herein by reference. Methods for proliferating hESCs in the undifferentiated form are described in WO 99/20741, WO 01/51616, and WO 03/020920.
[0055] By "embryonic germ stem cell" (EGSC) or "embryonic germ cell" or "EG cell" is meant a
PSC that is derived from germ cells and/or germ cell progenitors, e.g. primordial germ cells, i.e. those that would become sperm and eggs. Embryonic germ cells (EG cells) are thought to have properties similar to embryonic stem cells as described above. Examples of methods of generating and characterizing EG cells may be found in, for example, US Patent No. 7,153,684; Matsui, Y., et al., (1992) Cell 70:841 ; Shamblott, M., et al. (2001) Proc. Natl. Acad. Sci. USA 98: 113; Shamblott, M., et al. (1998) Proc. Natl. Acad. Sci. USA, 95: 13726; and Koshimizu, U., et al. (1996) Development, 122: 1235, the disclosures of which are incorporated herein by reference.
[0056] By "induced pluripotent stem cell" or "iPSC" it is meant a PSC that is derived from a cell that is not a PSC (i.e., from a cell that is differentiated relative to a PSC). iPSCs can be derived from multiple different cell types, including terminally differentiated cells. iPSCs have an ES cell-like morphology, growing as flat colonies with large nucleo-cytoplasmic ratios, defined borders and prominent nuclei. In addition, iPSCs express one or more key pluripotency markers known by one of ordinary skill in the art, including but not limited to Alkaline Phosphatase, SSEA3, SSEA4, Sox2, Oct3/4, Nanog, TRA160, TRA181, TDGF 1, Dnmt3b, FoxD3, GDF3, Cyp26a1, TERT, and zfp42. Examples of methods of generating and characterizing iPSCs may be found in, for example, U.S. Patent Publication Nos. US20090047263, US20090068742, US20090191159, US20090227032, US20090246875, and US20090304646, the disclosures of which are incorporated herein by reference. Generally, to generate iPSCs, somatic cells are provided with reprogramming factors (e.g. Oct4, SOX2, KLF4, MYC, Nanog, Lin28, etc.) known in the art to reprogram the somatic cells to become pluripotent stem cells.
[0057] By "somatic cell" it is meant any cell in an organism that, in the absence of experimental manipulation, does not ordinarily give rise to all types of cells in an organism. In other words, somatic cells are cells that have differentiated sufficiently that they will not naturally generate cells of all three germ layers of the body, i.e. ectoderm, mesoderm and endoderm. For example, somatic cells would include both neurons and neural progenitors, the latter of which may be able to naturally give rise to all or some cell types of the central nervous system but cannot give rise to cells of the mesoderm or endoderm lineages.
[0058] By "mitotic cell" it is meant a cell undergoing mitosis. Mitosis is the process by which a
eukaryotic cell separates the chromosomes in its nucleus into two identical sets in two separate nuclei. It is generally followed immediately by cytokinesis, which divides the nuclei, cytoplasm, organelles and cell membrane into two cells containing roughly equal shares of these cellular components.
[0059] By "post-mitotic cell" it is meant a cell that has exited from mitosis, i.e., it is "quiescent", i.e. it is no longer undergoing divisions. This quiescent state may be temporary, i.e. reversible, or it may be permanent.
[0060] By "meiotic cell" it is meant a cell that is undergoing meiosis. Meiosis is the process by which a cell divides its nuclear material for the purpose of producing gametes or spores. Unlike mitosis, in meiosis, the chromosomes undergo a recombination step which shuffles genetic material between chromosomes. Additionally, the outcome of meiosis is four (genetically unique) haploid cells, as compared with the two (genetically identical) diploid cells produced from mitosis.
[0061] In some instances, a component (e.g., a nucleic acid component (e.g., a CRISPR/Cas guide
RNA); a protein component (e.g., a subject fusion protein); and the like) includes a label moiety. The terms "label", "detectable label", or "label moiety" as used herein refer to any moiety that provides for signal detection and may vary widely depending on the particular nature of the assay. Label moieties of interest include both directly detectable labels (direct labels; e.g., a fluorescent label) and indirectly detectable labels (indirect labels; e.g., a binding pair member). A fluorescent label can be any fluorescent label (e.g., a fluorescent dye (e.g., fluorescein, Texas red, rhodamine, ALEXAFLUOR® labels, and the like), a fluorescent protein (e.g., green fluorescent protein (GFP), enhanced GFP (EGFP), yellow fluorescent protein (YFP), red fluorescent protein (RFP), cyan fluorescent protein (CFP), cherry, tomato, tangerine, and any fluorescent derivative thereof), etc.). Suitable detectable (directly or indirectly) label moieties for use in the methods include any moiety that is detectable by spectroscopic, photochemical, biochemical, immunochemical, electrical, optical, chemical, or other means. For example, suitable indirect labels include biotin (a binding pair member), which can be bound by streptavidin (which can itself be directly or indirectly labeled). Labels can also include: a radiolabel (a direct label) (e.g., ³H, ¹²⁵I, ³⁵S, ¹⁴C, or ³²P); an enzyme (an indirect label) (e.g., peroxidase, alkaline phosphatase, galactosidase, luciferase, glucose oxidase, and the like); a fluorescent protein (a direct label) (e.g., green fluorescent protein, red fluorescent protein, yellow fluorescent protein, and any convenient derivatives thereof); a metal label (a direct label); a colorimetric label; a binding pair member; and the like. By "partner of a binding pair" or "binding pair member" is meant one of a first and a second moiety, wherein the first and the second moiety have a specific binding affinity for each other.
Suitable binding pairs include, but are not limited to: antigen/antibodies (for example, digoxigenin/anti-digoxigenin, dinitrophenyl (DNP)/anti-DNP, dansyl-X/anti-dansyl, fluorescein/anti-fluorescein, lucifer yellow/anti-lucifer yellow, and rhodamine/anti-rhodamine), biotin/avidin (or biotin/streptavidin) and calmodulin binding protein (CBP)/calmodulin. Any binding pair member can be suitable for use as an indirectly detectable label moiety.
[0062] Any given component, or combination of components can be unlabeled, or can be detectably labeled with a label moiety. In some cases, when two or more components are labeled, they can be labeled with label moieties that are distinguishable from one another.
[0063] General methods in molecular and cellular biochemistry can be found in such standard textbooks as Molecular Cloning: A Laboratory Manual, 3rd Ed. (Sambrook et al., Cold Spring Harbor Laboratory Press 2001); Short Protocols in Molecular Biology, 4th Ed. (Ausubel et al. eds., John Wiley & Sons 1999); Protein Methods (Bollag et al., John Wiley & Sons 1996); Nonviral Vectors for Gene Therapy (Wagner et al. eds., Academic Press 1999); Viral Vectors (Kaplitt & Loewy eds., Academic Press 1995); Immunology Methods Manual (I. Lefkovits ed., Academic Press 1997); and Cell and Tissue Culture: Laboratory Procedures in Biotechnology (Doyle & Griffiths, John Wiley & Sons 1998), the disclosures of which are incorporated herein by reference.
[0064] Before the present invention is further described, it is to be understood that this invention is not limited to particular embodiments described, as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting, since the scope of the present invention will be limited only by the appended claims.
[0065] Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limit of that range and any other stated or intervening value in that stated range, is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges, and are also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.
[0066] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present invention, the preferred methods and materials are now described. All publications mentioned herein are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited.
[0067] It must be noted that as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to "a protein" includes a plurality of such proteins and reference to "the cell" includes reference to one or more cells and equivalents thereof known to those skilled in the art, and so forth. It is further noted that the claims may be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as "solely," "only" and the like in connection with the recitation of claim elements, or use of a "negative" limitation.
[0068] It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment.
Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable sub-combination. All combinations of the embodiments pertaining to the invention are specifically embraced by the present invention and are disclosed herein just as if each and every combination was individually and explicitly disclosed. In addition, all sub-combinations of the various embodiments and elements thereof are also specifically embraced by the present invention and are disclosed herein just as if each and every such sub-combination was individually and explicitly disclosed herein.
[0069] The publications discussed herein are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention. Further, the dates of publication provided may be different from the actual publication dates which may need to be independently confirmed.
V. DETAILED DESCRIPTION
[0070] The present disclosure provides methods (and compositions for performing a subject method) for tagging, e.g., biotinylating, target proteins in proximity to a nucleotide sequence of interest. A subject method can include contacting a target nucleic acid (e.g., DNA molecule) that includes the nucleotide sequence of interest with (a) a first fusion protein that includes a first nuclease defective CRISPR/Cas nucleic acid-guided polypeptide (e.g., a dCas9) fused to a first portion of a split BirA* protein; (b) a first guide nucleic acid (e.g., a first Cas9 guide RNA) that binds to the first fusion protein and hybridizes to a first target sequence of the target nucleic acid; (c) a second fusion protein comprising a second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide (e.g., a dCas9) fused to a second portion of the split BirA* protein; and (d) a second guide nucleic acid (e.g., a second Cas9 guide RNA) that binds to the second fusion protein and hybridizes to a second target sequence of the target nucleic acid.
[0071] The first and second portions of the split BirA* protein are brought together (thereby forming a functional BirA* protein) by virtue of the proper positioning of the two fusion proteins (a) [first CRISPR/Cas nucleic acid-guided polypeptide fused to first portion of split BirA*] and (c) [second CRISPR/Cas nucleic acid-guided polypeptide fused to second portion of split BirA*] near the nucleotide sequence of interest.
[0072] Thus, a complex of (a) and (b) forms [complex 1 - first fusion protein plus first guide nucleic acid] and a complex of (c) and (d) forms [complex 2 - second fusion protein plus second guide nucleic acid] . It is the guide nucleic acids (b) and (d) that position the first and second fusion proteins (a) and (c) at the nucleotide sequence of interest by virtue of where each guide nucleic acid hybridizes, i.e., by virtue of the location of the first and second target sequences in the target nucleic acid relative to one another. Complexes 1 and 2 are targeted to the target nucleic acid such that the nuclease defective CRISPR/Cas nucleic acid-guided polypeptide (e.g., dCas9) of the first fusion protein is not occupying the same space as the nuclease defective CRISPR/Cas nucleic acid-guided polypeptide (e.g., dCas9) of the second fusion protein. The fusion partners
(first and second portions of the split BirA* protein) of the first and second fusion proteins are brought together thereby forming a functional BirA* protein. In other words, 'complex 1' is targeted to a first target sequence (the first guide nucleic acid hybridizes to the first target sequence) and 'complex 2' is targeted to a second target sequence (the second guide nucleic acid hybridizes to the second target sequence) such that a functional BirA* protein forms when complexes 1 and 2 are positioned in such a way as to bring the first and second portions of the split BirA* protein together. The functional BirA* protein then biotinylates target proteins within its proximity (e.g., within a range of 12 nm, such as within a range of 10 nm or within a range of 8 nm). In some cases the functional BirA* protein then biotinylates target proteins within a range of 12 nm. In some cases the functional BirA* protein then biotinylates target proteins within a range of 10 nm. In some cases the functional BirA* protein then biotinylates target proteins within a range of 8 nm. In some cases the functional BirA* protein biotinylates target proteins that are within 12 nm of (are 12 nm or less in distance from) the functional BirA* protein. In some cases the functional BirA* protein biotinylates target proteins that are within 10 nm of (are 10 nm or less in distance from) the functional BirA* protein. In some cases the functional BirA* protein biotinylates target proteins that are within 8 nm of (are 8 nm or less in distance from) the functional BirA* protein. In some cases the functional BirA* protein biotinylates target proteins that are 12 nm or less from the functional BirA* protein. In some cases the functional BirA* protein biotinylates target proteins that are 10 nm or less from the functional BirA* protein. In some cases the functional BirA* protein biotinylates target proteins that are 8 nm or less from the functional BirA* protein.
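As a purely illustrative sketch of the distance criterion described above, and not a part of the disclosed method, the following Python snippet filters candidate proteins by their distance from the reconstituted BirA* protein. The protein names and distances are hypothetical placeholders, and the function name `proteins_within_radius` is an assumption introduced here; only the 12 nm, 10 nm, and 8 nm ranges are taken from the text.

```python
from typing import Dict, List

# Labeling ranges named in the text, in nanometers.
LABELING_RADII_NM = (12.0, 10.0, 8.0)


def proteins_within_radius(distances_nm: Dict[str, float], radius_nm: float) -> List[str]:
    """Return the names of proteins at or within radius_nm, nearest first."""
    if radius_nm <= 0:
        raise ValueError("radius_nm must be positive")
    hits = [(dist, name) for name, dist in distances_nm.items() if dist <= radius_nm]
    return [name for dist, name in sorted(hits)]


# Hypothetical distances (nm) from the functional BirA* protein; not measured data.
example_distances = {
    "protein_A": 4.0,
    "protein_B": 9.5,
    "protein_C": 11.0,
    "protein_D": 25.0,
}

for radius in LABELING_RADII_NM:
    print(radius, proteins_within_radius(example_distances, radius))
```

In this toy model a smaller range simply yields a subset of the proteins returned for a larger range.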
[0073] A pair of fusion proteins [first and second fusion proteins described above as (a) and (c)] that can be used to form a functional BirA* protein when brought together are referred to herein as a "corresponding pair of split BirA* fusion proteins." In some cases the CRISPR/Cas nucleic acid-guided polypeptide portions of the first and second fusion proteins of a corresponding pair of split BirA* fusion proteins are the same, e.g., both can be the same dCas9 protein, but in some cases they are different. For example, in some cases they are the same protein type (e.g., Cas9) but are derived from different prokaryotic species (e.g., one from S. pyogenes and the other from S. thermophilus) and in some such cases the 'handle' of the associated guide RNA can be different for each fusion protein. As another example, in some cases, one of the fusion proteins has one type of CRISPR/Cas nucleic acid-guided polypeptide portion (e.g., a dCas9) and the other has a different type (e.g., Cpf1).
Fusion proteins
[0074] Provided are fusion proteins that include a nuclease defective CRISPR/Cas RNA-guided
polypeptide (e.g., dCas9 - described in more detail below) fused to a portion of a split BirA* protein (described in more detail below). Methods and compositions of the disclosure can include two such fusion proteins, where one fusion protein includes a first portion of a split BirA* protein and a second fusion protein includes a second portion of a split BirA* protein such that, when the first and second portions are brought together, they form a functional BirA* protein. The first and second portions of a split BirA* protein can be fused, independently relative to one another, to the N-terminus or the C-terminus of the CRISPR/Cas RNA-guided polypeptide or can be fused to an internal insertion site of the CRISPR/Cas RNA-guided polypeptide. For example, in some cases a first fusion protein includes a nuclease defective CRISPR/Cas RNA-guided polypeptide (e.g., dCas9) fused (internally, at the N-terminus, or at the C-terminus) to a first portion of a split BirA* protein and the second fusion protein includes a nuclease defective CRISPR/Cas RNA-guided polypeptide (e.g., dCas9) fused (internally, at the N-terminus, or at the C-terminus) to a second portion of a split BirA* protein.
[0075] For example, in some cases the nuclease defective CRISPR/Cas RNA-guided polypeptide
(e.g., dCas9) of the first fusion protein is fused at its N-terminus to a first portion of a split BirA* protein; and the nuclease defective CRISPR/Cas RNA-guided polypeptide (e.g., dCas9) of the second fusion protein is fused at its N-terminus to a second portion of a split BirA* protein. In some cases the nuclease defective CRISPR/Cas RNA-guided polypeptide (e.g., dCas9) of the first fusion protein is fused at its N-terminus to a first portion of a split BirA* protein; and the nuclease defective CRISPR/Cas RNA-guided polypeptide (e.g., dCas9) of the second fusion protein is fused at its C-terminus to a second portion of a split BirA* protein. In some cases the nuclease defective CRISPR/Cas RNA-guided polypeptide (e.g., dCas9) of the first fusion protein is fused at its C-terminus to a first portion of a split BirA* protein; and the nuclease defective CRISPR/Cas RNA-guided polypeptide (e.g., dCas9) of the second fusion protein is fused at its C-terminus to a second portion of a split BirA* protein.
[0076] Illustrative examples of fusion proteins (dCas9 fused to split BirA*) are depicted in Figure 11 and are set forth as SEQ ID NOs: 5-6. The sequence set forth as SEQ ID NO: 7 is a fusion protein that includes dCas9 fused to a BirA* protein that is not split.
Split BirA*
[0077] "BirA*" is a mutated version (e.g., R118G) of a biotin ligase protein. The mutated version of the protein forms biotinoyl-5'-AMP from biotin and ATP but this 'activated' biotin is only loosely bound at the active site. The highly reactive biotinoyl-5'-AMP diffuses away and indiscriminately biotinylates proteins at lysine residues within an action radius of about 10 nm. The biotinylated proteins can be recovered by biotin-affinity purification approaches (e.g.
NeutrAvidin-coated beads). The biotinylated proteins can be identified by any convenient method, e.g., by mass spectrometry. The sequence of BirA* is:
MKDNTVPLKLIALLANGEFHSGEQLGETLGMSRAAINKHIQTLRDWGVDVFTVPGKGY SLPEPIQLLNAKQILGQLDGGSVAVLPVIDSTNQYLLDRIGELKSGDACVAEYQQAGRG GRGRKWFSPFGANLYLSMFWRLEQGPAAAIGLSLVIGIVMAEVLRKLGADKVRVKWP NDLYLQDRKLAGILVELTGKTGDAAQIVIGAGINMAMRRVEESVVNQGWITLQEAGINL DRNTLAAMLIRELRAALELFEQEGLAPYLSRWEKLDNFINRPVKLIIGDKEIFGISRGIDK QGALLLEQDGIIKPWMGGEISLRSAEK (SEQ ID NO: 1).
[0078] In some cases, a subject fusion protein includes an amino acid sequence having 70% or more identity (e.g., 80% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5 % or more, or 100% identity) with the amino acid sequence:
KDNTVPLKLIALLANGEFHSGEQLGETLGMSRAAINKHIQTLRDWGVDVFTVPGKGYSL
PEPIQLLNAKQILGQLDGGSVAVLPVIDSTNQYLLDRIGELKSGDACVAEYQQAGRGGR
GRKWFSPFGANLYLSMFWRLEQGPAAAIGLSLVIGIVMAEVLRKLGADKVRVKWPNDL
YLQDRKLAGILVELTGKTGDAAQIVIGAGINMAMRRVEESVVNQGWITLQEAGINLDR
NTLAAMLIRELRAALELFEQEGLAPYLSRWEKLDNFINRPVKLIIGDKEIFGISRGIDKQG
ALLLEQDGIIKPWMGGEISLRSAEK (SEQ ID NO: 2)
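The identity thresholds recited above (e.g., 70% or more) can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned by some external tool, with '-' marking gaps, and it assumes one common convention (identical residues divided by the number of aligned columns); other conventions exist, and the text does not specify one. The function names are hypothetical and introduced here only for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns; '-' marks a gap.

    Assumed convention: identical residues / alignment columns * 100,
    ignoring columns in which both sequences have a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 70.0) -> bool:
    """True if the aligned pair has threshold percent identity or more."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# First ten residues of SEQ ID NO: 2 versus a hypothetical one-residue variant.
reference = "KDNTVPLKLI"
variant = "KDNTAPLKLI"
print(percent_identity(reference, variant))
print(meets_identity_threshold(reference, variant, threshold=95.0))
```

A full-length comparison would substitute a real pairwise alignment for the short fragments used here.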
[0079] BirA* can be split into two portions such that each portion is inactive unless brought together with the other portion. Such a protein is referred to herein as a "split BirA* polypeptide" or "split BirA* protein."
[0080] In some cases, one portion of a split BirA* protein can be:
KDNTVPLKLIALLANGEFHSGEQLGETLGMSRAAINKHIQTLRDWGVDVFTVPGKGYSL PEPIQLLNAKQILGQLDGGSVAVLPVIDSTNQYLLDRIGELKSGDACVAEYQQAGRGGR GRKWFSPFGANLYLSMFWRLEQGPAAAIGLSLVIGIVMAEVLRKLGADKVRVKWPNDL YLQDRKLAGILVELTGKTGDAAQIVIGAGINMAMRRVEE (SEQ ID NO: 3); and the other portion of the split BirA* protein can be
SVVNQGWITLQEAGINLDRNTLAAMLIRELRAALELFEQEGLAPYLSRWEKLDNFINRP VKLIIGDKEIFGISRGIDKQG ALLLEQDGIIKPWMGGEISLRSAEK (SEQ ID NO: 4).
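As printed, the two portions above appear to be contiguous segments of SEQ ID NO: 2: SEQ ID NO: 3 ends in "...RRVEE" and SEQ ID NO: 4 begins "SVVNQ...". The following Python sketch, offered only as an illustration, splits SEQ ID NO: 2 at that junction and checks that the two pieces rejoin into the full sequence. The helper name `split_before_motif` is an assumption introduced here, and the sequence is transcribed from the printed text, so it should be checked against the official sequence listing before any real use.

```python
# SEQ ID NO: 2 as printed in the text, joined from its printed lines.
BIRA_STAR = (
    "KDNTVPLKLIALLANGEFHSGEQLGETLGMSRAAINKHIQTLRDWGVDVFTVPGKGYSL"
    "PEPIQLLNAKQILGQLDGGSVAVLPVIDSTNQYLLDRIGELKSGDACVAEYQQAGRGGR"
    "GRKWFSPFGANLYLSMFWRLEQGPAAAIGLSLVIGIVMAEVLRKLGADKVRVKWPNDL"
    "YLQDRKLAGILVELTGKTGDAAQIVIGAGINMAMRRVEESVVNQGWITLQEAGINLDR"
    "NTLAAMLIRELRAALELFEQEGLAPYLSRWEKLDNFINRPVKLIIGDKEIFGISRGIDKQG"
    "ALLLEQDGIIKPWMGGEISLRSAEK"
)


def split_before_motif(full: str, motif: str) -> tuple:
    """Split full into (N-terminal portion, C-terminal portion).

    The C-terminal portion starts at the single occurrence of motif.
    """
    if full.count(motif) != 1:
        raise ValueError("motif must occur exactly once")
    index = full.find(motif)
    if index == 0:
        raise ValueError("motif at position 0 leaves no N-terminal portion")
    return full[:index], full[index:]


# The junction motif is taken from the start of SEQ ID NO: 4 as printed.
n_portion, c_portion = split_before_motif(BIRA_STAR, "SVVNQGWITLQ")
print(len(n_portion), len(c_portion), n_portion + c_portion == BIRA_STAR)
```

Splitting by motif rather than by a hard-coded index avoids relying on a residue count that the text does not state.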
[0081] Thus, in some cases a fusion protein of the disclosure includes a portion of a split BirA*
protein that includes an amino acid sequence having 70% or more identity (e.g., 80% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5 % or more, or 100% identity) with the amino acid sequence set forth as SEQ ID NO: 3. In some cases a fusion protein of the disclosure includes a portion of a split BirA* protein that includes an amino acid sequence having 70% or more identity (e.g., 80% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5 % or more, or 100% identity) with the amino acid sequence set forth as SEQ ID NO: 4.
[0082] In some cases one fusion protein of the disclosure includes a first portion of a split BirA* protein that includes an amino acid sequence having 70% or more identity (e.g., 80% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, or 100% identity) with the amino acid sequence set forth as SEQ ID NO: 3; and another fusion protein includes a second portion of a split BirA* protein that includes an amino acid sequence having 70% or more identity (e.g., 80% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, or 100% identity) with the amino acid sequence set forth as SEQ ID NO: 4. In some cases one fusion protein of the disclosure includes a first portion of a split BirA* protein that includes the amino acid sequence set forth as SEQ ID NO: 3; and another fusion protein includes a second portion of a split BirA* protein that includes the amino acid sequence set forth as SEQ ID NO: 4.
CRISPR/Cas RNA-guided polypeptides
[0083] A subject fusion protein includes a nuclease defective CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein fused to a first or second portion of a split BirA* protein. The term "nuclease defective" as used herein refers to the inability of the protein to cleave the target DNA molecule. Thus, a nuclease defective CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein can be targeted to a target sequence of a target DNA molecule but does not cleave the target DNA molecule. Some CRISPR/Cas RNA-guided proteins, in addition to their ability to cleave target DNA, can cleave precursor CRISPR RNA (pre-crRNA) during processing of pre-crRNA into mature crRNA. For such proteins, the term "nuclease defective" does not necessarily refer to their ability (or inability) to cleave pre-crRNA but instead refers to their inability to cleave target DNA. As an example, a CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein that cannot cleave target DNA (e.g., via mutation of the appropriate catalytic domain(s)) but can still cleave pre-crRNA is referred to herein as "nuclease defective." In other words, the term "nuclease defective" in the context of a CRISPR/Cas nucleic acid-guided polypeptide is meant herein to encompass proteins that have some nuclease activity (e.g., to cleave pre-crRNA) but do not have nuclease activity that can cleave target DNA.
[0084] In some cases the nuclease defective CRISPR/Cas nucleic acid-guided polypeptide of a subject fusion protein is a nuclease defective class 2 CRISPR/Cas nucleic acid-guided polypeptide (e.g., nuclease defective type II, type V, or type VI CRISPR/Cas nucleic acid-guided polypeptide). In some cases the nuclease defective CRISPR/Cas nucleic acid-guided polypeptide of a subject fusion protein is a nuclease defective class 2 CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein.
[0085] In some cases the nuclease defective CRISPR/Cas nucleic acid-guided polypeptide of a subject fusion protein is a nuclease defective class 2 type II CRISPR/Cas endonuclease (e.g., a dCas9 protein). In some cases the nuclease defective CRISPR/Cas nucleic acid-guided polypeptide of a subject fusion protein is a class 2 type V CRISPR/Cas endonuclease (e.g., a Cpf1 protein, a C2c1 protein, or a C2c3 protein). In some cases the nuclease defective CRISPR/Cas nucleic acid-guided polypeptide of a subject fusion protein is a nuclease defective CasY protein. In some cases the nuclease defective CRISPR/Cas nucleic acid-guided polypeptide of a subject fusion protein is a nuclease defective CasX protein.
Class 2 CRISPR/Cas endonucleases
[0086] RNA-mediated adaptive immune systems in bacteria and archaea rely on Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR) genomic loci and CRISPR-associated (Cas) proteins that function together to provide protection from invading viruses and plasmids. In some embodiments, a CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein of a fusion protein of the present disclosure is a class 2 CRISPR/Cas endonuclease. In class 2 CRISPR systems, the functions of the effector complex (e.g., the cleavage of target DNA) are carried out by a single endonuclease (e.g., see Shmakov et al., Nat Rev Microbiol. 2017 Mar;15(3):169-182; Zetsche et al., Cell. 2015 Oct 22;163(3):759-71; Makarova et al., Nat Rev Microbiol. 2015 Nov;13(11):722-36; and Shmakov et al., Mol Cell. 2015 Nov 5;60(3):385-97). As such, the terms "class 2 CRISPR/Cas protein", "class 2 CRISPR/Cas nucleic acid-guided protein" and "class 2 CRISPR/Cas RNA-guided protein" are used herein to encompass the endonuclease (the target nucleic acid cleaving protein) from class 2 CRISPR systems, but in fusion proteins of this disclosure, the protein is nuclease defective. Thus, the term "class 2 CRISPR/Cas endonuclease" as used herein encompasses type II CRISPR/Cas proteins (e.g., Cas9), type V CRISPR/Cas proteins (e.g., Cpf1, C2c1, C2c3), and type VI CRISPR/Cas proteins (e.g., C2c2), as well as CasX and CasY. Class 2 CRISPR/Cas proteins encompass type II, type V, and type VI CRISPR/Cas proteins, as well as CasX and CasY, but the term is also meant to encompass any class 2 CRISPR/Cas protein suitable for binding to a corresponding guide RNA, forming a ribonucleoprotein (RNP) complex that can target a specific location of a target nucleic acid molecule via hybridization of the guide sequence of the guide RNA with the target nucleic acid molecule (e.g., target RNA molecule, target DNA molecule).
Type II CRISPR/Cas RNA-guided proteins (e.g., Cas9)
[0087] In natural Type II CRISPR/Cas systems, Cas9 functions as an RNA-guided endonuclease that uses a dual-guide RNA having a crRNA and trans-activating crRNA (tracrRNA) for target recognition and cleavage by a mechanism involving two nuclease active sites in Cas9 that together generate double-stranded DNA breaks (DSBs), or can individually generate single-stranded DNA breaks (SSBs). The Type II CRISPR endonuclease Cas9 and engineered dual- (dgRNA) or single guide RNA (sgRNA) form a ribonucleoprotein (RNP) complex that can be targeted to a desired DNA sequence. Guided by a dual-RNA complex or a chimeric single-guide RNA, Cas9 generates site-specific DSBs or SSBs within double-stranded DNA (dsDNA) target nucleic acids, which are repaired either by non-homologous end joining (NHEJ) or homology-directed recombination (HDR).
[0088] In some embodiments, a CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein of a fusion protein of the present disclosure is a type II CRISPR/Cas endonuclease (but nuclease defective). A type II CRISPR/Cas endonuclease is a type of class 2 CRISPR/Cas endonuclease. In some cases, a type II CRISPR/Cas endonuclease is a Cas9 protein. A Cas9 protein forms a complex with a Cas9 guide RNA. The guide RNA provides target specificity to a Cas9-guide RNA complex by having a nucleotide sequence (a guide sequence) that is complementary to a sequence (the target site) of a target nucleic acid (as described elsewhere herein). The Cas9 protein of the complex provides the site-specific activity. In other words, the Cas9 protein is guided to a target site (e.g., stabilized at a target site) within a target nucleic acid sequence (e.g. a chromosomal sequence or an extrachromosomal sequence, e.g., an episomal sequence, a minicircle sequence, a mitochondrial sequence, a chloroplast sequence, etc.) by virtue of its association with the protein-binding segment of the Cas9 guide RNA.
[0089] Examples of amino acid sequences of suitable Cas9 proteins that can be part of a subject fusion protein include but are not limited to nuclease defective versions of the amino acid sequences set forth in SEQ ID NOs: 15-826. Those listed as SEQ ID NOs: 15-271 and 275-826 are naturally occurring Cas9 protein sequences and a subject nuclease defective Cas9 protein would be a variant of a naturally occurring Cas9 protein (e.g., would be mutated, e.g., include an insertion, a deletion, and/or a substitution relative to a corresponding naturally occurring Cas9 protein). Naturally occurring Cas9 proteins bind a Cas9 guide RNA, are thereby directed to a specific sequence within a target nucleic acid (a target site), and cleave the target nucleic acid (e.g., cleave dsDNA to generate a double strand break, cleave ssDNA, cleave ssRNA, etc.). A nuclease defective version of such a protein can bind a Cas9 guide RNA, can be directed to a specific sequence within a target nucleic acid (a target site), but cannot cleave the target nucleic acid.
[0090] Assays to determine whether a given protein interacts with a Cas9 guide RNA can be any convenient binding assay that tests for binding between a protein and a nucleic acid. Suitable binding assays (e.g., gel shift assays) will be known to one of ordinary skill in the art (e.g., assays that include adding a Cas9 guide RNA and a protein to a target nucleic acid).
[0091] Assays to determine whether a protein has an activity (e.g., to determine if the protein has nuclease activity that cleaves a target nucleic acid and/or some heterologous activity) can be any convenient assay (e.g., any convenient nucleic acid cleavage assay that tests for nucleic acid cleavage). Suitable assays (e.g., cleavage assays) will be known to one of ordinary skill in the art and can include adding a Cas9 guide RNA and a protein to a target nucleic acid.
[0092] Many Cas9 orthologs from a wide variety of species have been identified and in some cases the proteins share only a few identical amino acids. Identified Cas9 orthologs have similar domain architecture with a central HNH endonuclease domain and a split RuvC/RNaseH domain (e.g., RuvCI, RuvCII, and RuvCIII) (e.g., see Table 1). For example, a Cas9 protein can have 3 different regions (sometimes referred to as RuvC-I, RuvC-II, and RuvC-III), that are not contiguous with respect to the primary amino acid sequence of the Cas9 protein, but fold together to form a RuvC domain once the protein is produced and folds. Thus, Cas9 proteins can be said to share at least 4 key motifs with a conserved architecture. Motifs 1, 2, and 4 are RuvC-like motifs while motif 3 is an HNH motif. The motifs set forth in Table 1 may not represent the entire RuvC-like and/or HNH domains as accepted in the art, but Table 1 does present motifs that can be used to help determine whether a given protein is a Cas9 protein.
[0093] Table 1. Table 1 lists 4 motifs that are present in Cas9 sequences from various species. The amino acids listed in Table 1 are from the Cas9 from S. pyogenes (SEQ ID NO: 15).
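[Table 1 appears as an image (imgf000026_0001) in the original publication; motifs 1-4 are set forth as SEQ ID NOs: 11-14.]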
[0094] In some cases, a variant Cas9 protein can cleave the complementary strand (sometimes referred to in the art as the target strand) of a target nucleic acid but has reduced ability to cleave the non-complementary strand (sometimes referred to in the art as the non-target strand) of a target nucleic acid. For example, the variant Cas9 protein can have a mutation (amino acid substitution) that reduces the function of the RuvC domain. Thus, the Cas9 protein can be a nickase that cleaves the complementary strand, but does not cleave the non-complementary strand. As a non-limiting example, in some embodiments, a variant Cas9 protein has a mutation at an amino acid position corresponding to residue D10 (e.g., D10A, aspartate to alanine) of SEQ ID NO: 15 (or the corresponding position of any of the proteins set forth in SEQ ID NOs: 16-271 and 275-826) and can therefore cleave the complementary strand of a double stranded target nucleic acid but has reduced ability to cleave the non-complementary strand of a double stranded target nucleic acid (thus resulting in a single strand break (SSB) instead of a double strand break (DSB) when the variant Cas9 protein cleaves a double stranded target nucleic acid) (see, for example, Jinek et al., Science. 2012 Aug 17;337(6096):816-21). See, e.g., SEQ ID NO: 272.
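The phrase "an amino acid position corresponding to residue D10 of SEQ ID NO: 15" presupposes an alignment between the reference sequence and another Cas9 sequence. The following is a minimal Python sketch of that lookup, assuming the two sequences have already been aligned by some external tool and are supplied as equal-length gapped strings ('-' marks a gap). The function name `corresponding_position` and the toy sequences are hypothetical illustrations and are not taken from the sequence listing.

```python
def corresponding_position(ref_aligned: str, query_aligned: str, ref_pos: int):
    """Map a 1-based residue number in the reference onto the query.

    Both arguments are rows of one pairwise alignment (same length,
    '-' = gap). Returns (query_position, query_residue), or None when the
    reference residue is aligned to a gap in the query.
    """
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned sequences must have equal length")
    if ref_pos < 1:
        raise ValueError("ref_pos is 1-based and must be positive")
    ref_count = 0
    query_count = 0
    for ref_char, query_char in zip(ref_aligned, query_aligned):
        if ref_char != "-":
            ref_count += 1
        if query_char != "-":
            query_count += 1
        if ref_char != "-" and ref_count == ref_pos:
            if query_char == "-":
                return None  # no corresponding residue in the query
            return query_count, query_char
    raise ValueError("ref_pos is beyond the end of the reference sequence")


# Toy alignment (hypothetical sequences, for illustration only).
REF_ALIGNED = "MDKKYSIGLDIGT"
QUERY_ALIGNED = "M--KYTLGLDIGI"

if __name__ == "__main__":
    print(corresponding_position(REF_ALIGNED, QUERY_ALIGNED, 10))
```

In this toy alignment the reference residue at position 10 lines up with the eighth residue of the shorter query, which is why a substitution is described relative to the reference numbering rather than the numbering of each ortholog.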
[0095] In some cases, a variant Cas9 protein can cleave the non-complementary strand of a target nucleic acid but has reduced ability to cleave the complementary strand of the target nucleic acid. For example, the variant Cas9 protein can have a mutation (amino acid substitution) that reduces the function of the HNH domain. Thus, the Cas9 protein can be a nickase that cleaves the non-complementary strand, but does not cleave the complementary strand. As a non-limiting example, in some embodiments, the variant Cas9 protein has a mutation at an amino acid position corresponding to residue H840 (e.g., an H840A mutation, histidine to alanine) of SEQ ID NO: 15 (or the corresponding position of any of the proteins set forth as SEQ ID NOs: 16-271 and 275-816) and can therefore cleave the non-complementary strand of the target nucleic acid but has reduced ability to cleave (e.g., does not cleave) the complementary strand of the target nucleic acid. Such a Cas9 protein has a reduced ability to cleave a target nucleic acid (e.g., a single stranded target nucleic acid) but retains the ability to bind a target nucleic acid (e.g., a single stranded target nucleic acid). See, e.g., SEQ ID NO: 273.
[0096] In some cases, a variant Cas9 protein has a reduced ability to cleave both the complementary and the non-complementary strands of a double stranded target nucleic acid. For example, a "nuclease defective" Cas9 lacks a functioning RuvC domain (i.e., does not cleave the non-complementary strand of a double stranded target DNA) and lacks a functioning HNH domain (i.e., does not cleave the complementary strand of a double stranded target DNA). As a non-limiting example, in some cases, the nuclease defective Cas9 protein harbors mutations at amino acid positions corresponding to residues D10 and H840 (e.g., D10A and H840A) of SEQ ID NO: 15 (or the corresponding residues of any of the proteins set forth as SEQ ID NOs: 16-271 and 275-816) such that the polypeptide has a reduced ability to cleave (e.g., does not cleave) both the complementary and the non-complementary strands of a target nucleic acid. Such a Cas9 protein has a reduced ability to cleave a target nucleic acid (e.g., a single stranded or double stranded target nucleic acid) but retains the ability to bind a target nucleic acid. A Cas9 protein that cannot cleave target nucleic acid (e.g., due to one or more mutations, e.g., in the catalytic domains of the RuvC and HNH domains) is referred to as a "nuclease defective Cas9", "dead Cas9" or simply "dCas9." See, e.g., SEQ ID NO: 274.
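The relationship between the D10 and H840 substitutions and the resulting cleavage behavior can be summarized as a small lookup. The Python sketch below is illustrative only: it treats each nuclease domain as either functional or not, which simplifies the "reduced ability" language of the text, and the names `DOMAIN_INACTIVATED_BY` and `classify_cas9_variant` are hypothetical.

```python
# Domain whose function is reduced by each substitution, as described in the
# text (residue numbering relative to SEQ ID NO: 15).
DOMAIN_INACTIVATED_BY = {"D10A": "RuvC", "H840A": "HNH"}


def classify_cas9_variant(substitutions):
    """Return a coarse label for a Cas9 variant given its substitutions."""
    inactive = {
        DOMAIN_INACTIVATED_BY[s] for s in substitutions if s in DOMAIN_INACTIVATED_BY
    }
    if inactive == {"RuvC", "HNH"}:
        return "nuclease defective (dCas9): binds target, cleaves neither strand"
    if inactive == {"RuvC"}:
        # HNH still functions, so the complementary (target) strand is cut.
        return "nickase: cleaves complementary strand only"
    if inactive == {"HNH"}:
        # RuvC still functions, so the non-complementary strand is cut.
        return "nickase: cleaves non-complementary strand only"
    return "nuclease active: cleaves both strands"


if __name__ == "__main__":
    for variant in ([], ["D10A"], ["H840A"], ["D10A", "H840A"]):
        print(variant, "->", classify_cas9_variant(variant))
```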
[0097] Other residues can be mutated to achieve the above effects (i.e. inactivate one or the other nuclease portions). As non-limiting examples, residues D10, G12, G17, E762, H840, N854, N863, H982, H983, A984, D986, and/or A987 of SEQ ID NO: 15 (or the corresponding mutations of any of the proteins set forth as SEQ ID NOs: 16-826) can be altered (i.e., substituted). Also, mutations other than alanine substitutions are suitable.
[0098] A subject nuclease defective Cas9 protein (e.g., one that includes a mutation that corresponds to a D10, G12, G17, E762, H840, N854, N863, H982, H983, A984, D986, and/or A987 mutation (e.g., D10A, G12A, G17A, E762A, H840A, N854A, N863A, H982A, H983A, A984A, and/or D986A) of SEQ ID NO: 15, e.g., a corresponding mutation of any of the proteins set forth as SEQ ID NOs: 16-826) can still bind to target nucleic acid in a site-specific manner (because it is still guided to a target nucleic acid sequence by a Cas9 guide RNA) as long as it retains the ability to interact with the Cas9 guide RNA.
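Substitutions written in the form D10A (wild-type residue, 1-based position, replacement residue) can be applied to a sequence mechanically. The following Python sketch shows one way to do so while checking that the stated wild-type residue is actually present at the stated position. The function name `apply_substitutions` and the short toy sequence are hypothetical; the toy sequence is not SEQ ID NO: 15.

```python
import re

# Pattern for a substitution such as "D10A": wild-type residue, 1-based
# position, replacement residue.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(sequence: str, substitutions):
    """Return a copy of `sequence` with each substitution applied."""
    residues = list(sequence)
    for substitution in substitutions:
        match = _SUBSTITUTION.match(substitution)
        if match is None:
            raise ValueError(f"cannot parse substitution {substitution!r}")
        wild_type, replacement = match.group(1), match.group(3)
        position = int(match.group(2))
        if not 1 <= position <= len(residues):
            raise ValueError(f"{substitution}: position outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"{substitution}: expected {wild_type} at position {position}, "
                f"found {residues[position - 1]}"
            )
        residues[position - 1] = replacement
    return "".join(residues)


# Toy 17-residue sequence (hypothetical, for illustration only).
TOY_SEQUENCE = "MDKKYSIGLDIGTNSVG"

if __name__ == "__main__":
    print(apply_substitutions(TOY_SEQUENCE, ["D10A", "G12A"]))
```

The wild-type check is a design choice in this sketch: it catches numbering mistakes that arise when a position from the reference sequence is applied directly to an ortholog without first finding the corresponding residue.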
[0099] In some cases, a suitable nuclease defective Cas9 protein (of a subject fusion protein) comprises an amino acid sequence having 4 motifs, each of motifs 1-4 having 60% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 99% or more or 100% amino acid sequence identity (independently) to motifs 1-4 as set forth in SEQ ID NOs: 11-14, respectively (e.g., see Table 1), or to the corresponding portions in any of the amino acid sequences set forth in SEQ ID NOs: 15-826 (with the exception for the case of '100%' that the nuclease defective Cas9 protein can include a mutation, or two mutations, that render inactive the RuvC and HNH domains).
[00100] In other words, in some cases, a suitable Cas9 protein (of a subject fusion protein) comprises an amino acid sequence having 4 motifs, each of motifs 1-4 having 60% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 99% or more or 100% amino acid sequence identity to motifs 1-4 (independently) of the Cas9 amino acid sequence set forth in SEQ ID NO: 15 (e.g., the sequences set forth in SEQ ID NOs: 11-14, e.g., see Table 1), or to the corresponding portions in any of the amino acid sequences set forth in SEQ ID NOs: 16-826 (with the exception for the case of '100%' that the nuclease defective Cas9 protein can include a mutation, or two mutations, that render inactive the RuvC and HNH domains). In some cases, a suitable nuclease defective Cas9 protein (of a subject fusion protein) comprises an amino acid sequence having 4 motifs, each of motifs 1-4 having 60% or more amino acid sequence identity to motifs 1-4 of the Cas9 amino acid sequence set forth as SEQ ID NO: 15 (the motifs are in Table 1, and are set forth as SEQ ID NOs: 11-14, respectively), or to the corresponding portions in any of the amino acid sequences set forth in SEQ ID NOs: 16-826. In some cases, a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 4 motifs, each of motifs 1-4 having 70% or more amino acid sequence identity to motifs 1-4 of the Cas9 amino acid sequence set forth as SEQ ID NO: 15 (the motifs are in Table 1, and are set forth as SEQ ID NOs: 11-14, respectively), or to the corresponding portions in any of the amino acid sequences set forth in SEQ ID NOs: 16-826. In some cases, a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 4 motifs, each of motifs 1-4 having 75% or more amino acid sequence identity to motifs 1-4 of the Cas9 amino acid sequence set forth as SEQ ID NO: 15 (the motifs are in Table 1, and are set forth as SEQ ID NOs: 11-14, respectively), or to the corresponding portions in any of the amino acid sequences set forth in SEQ ID NOs: 16-826. In some cases, a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 4 motifs, each of motifs 1-4 having 80% or more amino acid sequence identity to motifs 1-4 of the Cas9 amino acid sequence set forth as SEQ ID NO: 15 (the motifs are in Table 1, and are set forth as SEQ ID NOs: 11-14, respectively), or to the corresponding portions in any of the amino acid sequences set forth in SEQ ID NOs: 16-826.
In some cases, a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 4 motifs, each of motifs 1-4 having 85% or more amino acid sequence identity to motifs 1-4 of the Cas9 amino acid sequence set forth as SEQ ID NO: 15 (the motifs are in Table 1, and are set forth as SEQ ID NOs: 11-14, respectively), or to the corresponding portions in any of the amino acid sequences set forth in SEQ ID NOs: 16-826. In some cases, a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 4 motifs, each of motifs 1-4 having 90% or more amino acid sequence identity to motifs 1-4 of the Cas9 amino acid sequence set forth as SEQ ID NO: 15 (the motifs are in Table 1, and are set forth as SEQ ID NOs: 11-14, respectively), or to the corresponding portions in any of the amino acid sequences set forth in SEQ ID NOs: 16-826. In some cases, a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 4 motifs, each of motifs 1-4 having 95% or more amino acid sequence identity to motifs 1-4 of the Cas9 amino acid sequence set forth as SEQ ID NO: 15 (the motifs are in Table 1, and are set forth as SEQ ID NOs: 11-14, respectively), or to the corresponding portions in any of the amino acid sequences set forth in SEQ ID NOs: 16-826. In some cases, a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 4 motifs, each of motifs 1-4 having 99% or more amino acid sequence identity to motifs 1-4 of the Cas9 amino acid sequence set forth as SEQ ID NO: 15 (the motifs are in Table 1, and are set forth as SEQ ID NOs: 11-14, respectively), or to the corresponding portions in any of the amino acid sequences set forth in SEQ ID NOs: 16-826. 
In some cases, a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 4 motifs, each of motifs 1-4 having 100% amino acid sequence identity to motifs 1-4 of the Cas9 amino acid sequence set forth as SEQ ID NO: 15 (the motifs are in Table 1, and are set forth as SEQ ID NOs: 11-14, respectively), or to the corresponding portions in any of the amino acid sequences set forth in SEQ ID NOs: 16-826 (with the exception that the nuclease defective Cas9 protein can include a mutation, or two mutations, that render inactive the RuvC and HNH domains).
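The per-motif identity criterion described above, in which each of motifs 1-4 independently meets a stated threshold, can be illustrated with a short calculation. The Python sketch below assumes each candidate motif has already been extracted and is the same length as its reference motif (an ungapped, position-by-position comparison). The motif strings are hypothetical placeholders and are not SEQ ID NOs: 11-14; the function names are likewise illustrative.

```python
def percent_identity(reference: str, candidate: str) -> float:
    """Percent of positions that are identical in two equal-length sequences."""
    if not reference or len(reference) != len(candidate):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(reference, candidate))
    return 100.0 * matches / len(reference)


def motifs_meet_threshold(reference_motifs, candidate_motifs, threshold: float) -> bool:
    """True when every motif independently reaches `threshold` percent identity."""
    if len(reference_motifs) != len(candidate_motifs):
        raise ValueError("expected the same number of motifs in both lists")
    return all(
        percent_identity(ref, cand) >= threshold
        for ref, cand in zip(reference_motifs, candidate_motifs)
    )


# Hypothetical 10-residue placeholder motifs (not the motifs of Table 1).
REFERENCE_MOTIFS = ["ACDEFGHIKL", "MNPQRSTVWY", "ACACACACAC", "GHGHGHGHGH"]
CANDIDATE_MOTIFS = ["ACDEFGHIKV", "MNPQRSTVWY", "ACACACACAA", "GHGHGHGAAA"]

if __name__ == "__main__":
    for ref, cand in zip(REFERENCE_MOTIFS, CANDIDATE_MOTIFS):
        print(ref, cand, percent_identity(ref, cand))
```

With these placeholders the fourth motif is 70% identical to its reference, so the candidate satisfies the 70% tier for all four motifs but not the 75% tier, even though the other three motifs are at 90% or higher.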
[00102] In some cases, a suitable nuclease defective Cas9 protein (of a subject fusion protein) comprises an amino acid sequence having 60% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 99% or more or 100% amino acid sequence identity to amino acids 7-166 or 731-1003 of the Cas9 amino acid sequence set forth in SEQ ID NO: 15, or to the corresponding portions in any of the amino acid sequences set forth as SEQ ID NOs: 16-826 (with the exception for the '100%' case that the nuclease defective Cas9 protein can include a mutation, or two mutations, that render inactive the RuvC and HNH domains).
[00103] In some cases, a suitable nuclease defective Cas9 protein (of a subject fusion protein) comprises an amino acid sequence having 60% or more amino acid sequence identity to amino acids 7-166 or 731-1003 of the Cas9 amino acid sequence set forth in SEQ ID NO: 15, or to the corresponding portions in any of the amino acid sequences set forth as SEQ ID NOs: 16-826. In some cases, a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 70% or more amino acid sequence identity to amino acids 7-166 or 731-1003 of the Cas9 amino acid sequence set forth in SEQ ID NO: 15, or to the corresponding portions in any of the amino acid sequences set forth as SEQ ID NOs: 16-826. In some cases, a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 75% or more amino acid sequence identity to amino acids 7-166 or 731-1003 of the Cas9 amino acid sequence set forth in SEQ ID NO: 15, or to the corresponding portions in any of the amino acid sequences set forth as SEQ ID NOs: 16-826. In some cases, a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 80% or more amino acid sequence identity to amino acids 7-166 or 731-1003 of the Cas9 amino acid sequence set forth in SEQ ID NO: 15, or to the corresponding portions in any of the amino acid sequences set forth as SEQ ID NOs: 16-826. In some cases, a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 85% or more amino acid sequence identity to amino acids 7-166 or 731-1003 of the Cas9 amino acid sequence set forth in SEQ ID NO: 15, or to the corresponding portions in any of the amino acid sequences set forth as SEQ ID NOs: 16-826. In some cases, a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 90% or more amino acid sequence identity to amino acids 7-166 or 731-1003 of the Cas9 amino acid sequence set forth in SEQ ID NO: 15, or to the corresponding portions in any of the amino acid sequences set forth as SEQ ID NOs: 16-826.
In some cases, a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 95% or more amino acid sequence identity to amino acids 7-166 or 731-1003 of the Cas9 amino acid sequence set forth in SEQ ID NO: 15, or to the corresponding portions in any of the amino acid sequences set forth as SEQ ID NOs: 16-826. In some cases, a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 99% or more amino acid sequence identity to amino acids 7-166 or 731-1003 of the Cas9 amino acid sequence set forth in SEQ ID NO: 15, or to the corresponding portions in any of the amino acid sequences set forth as SEQ ID NOs: 16-826. In some cases, a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 100% amino acid sequence identity to amino acids 7-166 or 731-1003 of the Cas9 amino acid sequence set forth in SEQ ID NO: 15, or to the corresponding portions in any of the amino acid sequences set forth as SEQ ID NOs: 16-826 (with the exception that the nuclease defective Cas9 protein can include a mutation, or two mutations, that render inactive the RuvC and HNH domains).
[00104] In some cases, a suitable nuclease defective Cas9 protein (of a subject fusion protein) comprises an amino acid sequence having 60% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, or 99% or more amino acid sequence identity to the Cas9 amino acid sequence set forth in any one of SEQ ID NOs: 15-826.
[00105] In some cases, a suitable nuclease defective Cas9 protein (of a subject fusion protein) comprises an amino acid sequence having 60% or more amino acid sequence identity to the Cas9 amino acid sequence set forth in any one of SEQ ID NOs: 15-826. In some cases, a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 70% or more amino acid sequence identity to the Cas9 amino acid sequence set forth in any one of SEQ ID NOs: 15-826. In some cases, a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 75% or more amino acid sequence identity to the Cas9 amino acid sequence set forth in any one of SEQ ID NOs: 15-826. In some cases, a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 80% or more amino acid sequence identity to the Cas9 amino acid sequence set forth in any one of SEQ ID NOs: 15-826. In some cases, a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 85% or more amino acid sequence identity to the Cas9 amino acid sequence set forth in any one of SEQ ID NOs: 15-826. In some cases, a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 90% or more amino acid sequence identity to the Cas9 amino acid sequence set forth in any one of SEQ ID NOs: 15-826. In some cases, a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 95% or more amino acid sequence identity to the Cas9 amino acid sequence set forth in any one of SEQ ID NOs: 15-826. In some cases, a suitable nuclease defective Cas9 protein comprises an amino acid sequence having 99% or more amino acid sequence identity to the Cas9 amino acid sequence set forth in any one of SEQ ID NOs: 15-826.
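For full-length sequences, percent identity is computed over an alignment that may contain gaps, and the result can then be compared against the tiers recited above (60%, 70%, 75%, 80%, 85%, 90%, 95%, 99%). The Python sketch below assumes the alignment has already been produced elsewhere and counts identity over all aligned columns, gaps included. That denominator is one common convention and is an assumption here, since the text does not specify one. The names `alignment_identity`, `highest_tier`, and `IDENTITY_TIERS` are hypothetical.

```python
IDENTITY_TIERS = (60, 70, 75, 80, 85, 90, 95, 99)


def alignment_identity(ref_aligned: str, query_aligned: str) -> float:
    """Percent identity over the columns of a gapped pairwise alignment."""
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both rows are gaps; they carry no information.
    columns = [
        (r, q) for r, q in zip(ref_aligned, query_aligned) if not (r == "-" and q == "-")
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(r == q for r, q in columns)
    return 100.0 * matches / len(columns)


def highest_tier(identity: float):
    """Largest recited tier that `identity` meets, or None if below 60%."""
    met = [tier for tier in IDENTITY_TIERS if identity >= tier]
    return max(met) if met else None


if __name__ == "__main__":
    # Toy aligned sequences (hypothetical, for illustration only).
    identity = alignment_identity("ACDEFGHIKL", "AC-EFGHIKL")
    print(identity, highest_tier(identity))
```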
[00106] In some cases, a nuclease defective Cas9 protein (of a subject fusion protein) comprises 4 motifs (as listed in Table 1), at least one with (or each with) amino acid sequences having 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 99% or more or 100% amino acid sequence identity to each of the 4 motifs listed in Table 1 (SEQ ID NOs: 11-14), or to the corresponding portions in any of the amino acid sequences set forth as SEQ ID NOs: 16-826 (with the exception for the '100%' case that the nuclease defective Cas9 protein can include a mutation, or two mutations, that render inactive the RuvC and HNH domains).
[00107] Examples of various Cas9 proteins (and Cas9 domain structure) and Cas9 guide RNAs (as well as information regarding requirements related to protospacer adjacent motif (PAM) sequences present in targeted nucleic acids) can be found in the art, for example, see Jinek et al., Science. 2012 Aug 17;337(6096):816-21; Chylinski et al., RNA Biol. 2013 May;10(5):726-37; Ma et al., Biomed Res Int. 2013;2013:270805; Hou et al., Proc Natl Acad Sci U S A. 2013 Sep 24;110(39):15644-9; Jinek et al., Elife. 2013;2:e00471; Pattanayak et al., Nat Biotechnol. 2013 Sep;31(9):839-43; Qi et al., Cell. 2013 Feb 28;152(5):1173-83; Wang et al., Cell. 2013 May 9;153(4):910-8; Auer et al., Genome Res. 2013 Oct 31; Chen et al., Nucleic Acids Res. 2013 Nov 1;41(20):e19; Cheng et al., Cell Res. 2013 Oct;23(10):1163-71; Cho et al., Genetics. 2013 Nov;195(3):1177-80; DiCarlo et al., Nucleic Acids Res. 2013 Apr;41(7):4336-43; Dickinson et al., Nat Methods. 2013 Oct;10(10):1028-34; Ebina et al., Sci Rep. 2013;3:2510; Fujii et al., Nucleic Acids Res. 2013 Nov 1;41(20):e187; Hu et al., Cell Res. 2013 Nov;23(11):1322-5; Jiang et al., Nucleic Acids Res. 2013 Nov 1;41(20):e188; Larson et al., Nat Protoc. 2013 Nov;8(11):2180-96; Mali et al., Nat Methods. 2013 Oct;10(10):957-63; Nakayama et al., Genesis. 2013 Dec;51(12):835-43; Ran et al., Nat Protoc. 2013 Nov;8(11):2281-308; Ran et al., Cell. 2013 Sep 12;154(6):1380-9; Upadhyay et al., G3 (Bethesda). 2013 Dec 9;3(12):2233-8; Walsh et al., Proc Natl Acad Sci U S A. 2013 Sep 24;110(39):15514-5; Xie et al., Mol Plant. 2013 Oct 9; Yang et al., Cell. 2013 Sep 12;154(6):1370-9; Briner et al., Mol Cell. 2014 Oct 23;56(2):333-9; Shmakov et al., Nat Rev Microbiol. 2017 Mar;15(3):169-182; and U.S. patents and patent applications: 8,906,616; 8,895,308; 8,889,418; 8,889,356; 8,871,445; 8,865,406; 8,795,965; 8,771,945; 8,697,359; 20140068797; 20140170753; 20140179006; 20140179770; 20140186843; 20140186919; 20140186958; 20140189896; 20140227787; 20140234972; 20140242664; 20140242699; 20140242700; 20140242702; 20140248702; 20140256046; 20140273037; 20140273226; 20140273230; 20140273231; 20140273232; 20140273233; 20140273234; 20140273235; 20140287938; 20140295556; 20140295557; 20140298547; 20140304853; 20140309487; 20140310828; 20140310830; 20140315985; 20140335063; 20140335620; 20140342456; 20140342457; 20140342458; 20140349400; 20140349405; 20140356867; 20140356956; 20140356958; 20140356959; 20140357523; 20140357530; 20140364333; and 20140377868; all of which are hereby incorporated by reference in their entirety.
Type V and Type VI CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) proteins
[00108] In some cases, a genome targeting composition includes a type V or type VI CRISPR/Cas RNA-guided protein (i.e., the genome editing endonuclease is a type V or type VI CRISPR/Cas endonuclease) (e.g., Cpf1, C2c1, C2c2, C2c3). Type V and type VI CRISPR/Cas endonucleases are a type of class 2 CRISPR/Cas endonuclease. Examples of type V CRISPR/Cas endonucleases include but are not limited to: Cpf1, C2c1, and C2c3. An example of a type VI CRISPR/Cas endonuclease is C2c2.
[00109] Like type II CRISPR/Cas endonucleases, type V and VI CRISPR/Cas endonucleases form a complex with a corresponding guide RNA. The guide RNA provides target specificity to an endonuclease-guide RNA RNP complex by having a nucleotide sequence (a guide sequence) that is complementary to a sequence (the target site) of a target nucleic acid (as described elsewhere herein). The endonuclease of the complex provides the site-specific activity. In other words, the endonuclease is guided to a target site (e.g., stabilized at a target site) within a target nucleic acid sequence (e.g. a chromosomal sequence or an extrachromosomal sequence, e.g., an episomal sequence, a minicircle sequence, a mitochondrial sequence, a chloroplast sequence, etc.) by virtue of its association with the protein-binding segment of the guide RNA.
[00110] Examples and guidance related to type V and type VI CRISPR/Cas proteins (e.g., Cpf1, C2c1, C2c2, and C2c3 guide RNAs), e.g., information regarding PAM sequences and guide RNA sequences, can be found in the art, for example, see Zetsche et al., Cell. 2015 Oct 22;163(3):759-71; Makarova et al., Nat Rev Microbiol. 2015 Nov;13(11):722-36; Shmakov et al., Mol Cell. 2015 Nov 5;60(3):385-97; and Shmakov et al., Nat Rev Microbiol. 2017 Mar;15(3):169-182.
[00111] In some embodiments, a CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein of a fusion protein of the present disclosure is a Type V or type VI CRISPR/Cas nucleic acid-guided protein (e.g., Cpf1, C2c1, C2c2, C2c3, C2c7, C2c6) and is nuclease defective ("dead"), e.g., the Type V or type VI CRISPR/Cas polypeptide, when bound to a guide RNA, retains target (e.g., DNA) binding activity but does not cleave the target nucleic acid.
[00112] In some embodiments, a nuclease defective CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein of a fusion protein of the present disclosure is a nuclease defective Cpf1 protein (also known as "cas12a" - a type V CRISPR/Cas nucleic acid-guided protein). For example, in some cases a nuclease defective CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein of a fusion protein of the present disclosure is a Cpf1 protein that lacks a functioning RuvC-like domain. For example, in some cases, a nuclease defective Cpf1 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% amino acid sequence identity to the Cpf1 amino acid sequence set forth in any of SEQ ID NOs: 828-832; and lacks a functioning RuvC-like domain. In some cases, a subject nuclease defective Cpf1 protein includes an amino acid substitution (e.g., a D→A substitution) at an amino acid residue corresponding to amino acid 917 of the Cpf1 amino acid sequence set forth in SEQ ID NO: 828.
In some cases, a nuclease defective Cpf1 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the Cpf1 amino acid sequence set forth in any of SEQ ID NOs: 828-832; and comprises an amino acid substitution (e.g., an E→A substitution) at an amino acid residue corresponding to amino acid 1006 of the Cpf1 amino acid sequence set forth in SEQ ID NO: 828. In some cases, a nuclease defective Cpf1 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% amino acid sequence identity to the Cpf1 amino acid sequence set forth in any of SEQ ID NOs: 828-832; and comprises an amino acid substitution (e.g., a D→A substitution) at an amino acid residue corresponding to amino acid 1255 of the Cpf1 amino acid sequence set forth in SEQ ID NO: 828.
[00113] In some cases, a nuclease defective Cpf1 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the Cpf1 amino acid sequence set forth in any of SEQ ID NOs: 828-832. In some cases, a nuclease defective Cpf1 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% amino acid sequence identity to a contiguous stretch of from 100 amino acids to 200 amino acids (aa), from 200 aa to 400 aa, from 400 aa to 600 aa, from 600 aa to 800 aa, from 800 aa to 1000 aa, from 1000 aa to 1100 aa, from 1100 aa to 1200 aa, or from 1200 aa to 1300 aa, of the Cpf1 amino acid sequence set forth in any of SEQ ID NOs: 828-832.
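Identity to "a contiguous stretch" of a reference sequence can be pictured as sliding a candidate segment along the reference and scoring each window. The Python sketch below does this without gaps, which is a simplifying assumption (a real comparison would normally use a local alignment tool). The function name `best_window_identity` and the toy sequences are hypothetical and far shorter than the 100 to 1300 amino acid stretches recited in the text.

```python
def best_window_identity(reference: str, segment: str):
    """Find the reference window most similar to `segment` (ungapped).

    Returns (start, identity) where `start` is the 1-based position of the
    best window in the reference and `identity` is its percent identity.
    Ties are resolved in favor of the earliest window.
    """
    length = len(segment)
    if length == 0 or length > len(reference):
        raise ValueError("segment must be non-empty and no longer than the reference")
    best_start, best_identity = 1, -1.0
    for offset in range(len(reference) - length + 1):
        window = reference[offset:offset + length]
        matches = sum(a == b for a, b in zip(window, segment))
        identity = 100.0 * matches / length
        if identity > best_identity:
            best_start, best_identity = offset + 1, identity
    return best_start, best_identity


if __name__ == "__main__":
    # Toy sequences (hypothetical, for illustration only).
    print(best_window_identity("MKTAYIAKQRQISFVK", "AKQW"))
```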
[00114] In some cases, a nuclease defective Cpf1 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCI domain of the Cpf1 amino acid sequence set forth in any of SEQ ID NOs: 828-832. In some cases, a nuclease defective Cpf1 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCII domain of the Cpf1 amino acid sequence set forth in any of SEQ ID NOs: 828-832. In some cases, a nuclease defective Cpf1 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCIII domain of the Cpf1 amino acid sequence set forth in any of SEQ ID NOs: 828-832. In some cases, a nuclease defective Cpf1 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCI, RuvCII, and RuvCIII domains of the Cpf1 amino acid sequence set forth in any of SEQ ID NOs: 828-832.
[00115] In some cases, a suitable nuclease defective Cpf1 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the Cpf1 amino acid sequence set forth in any of SEQ ID NOs: 828-832.
[00116] In some embodiments, a nuclease defective CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein of a fusion protein of the present disclosure is a nuclease defective C2c1 protein (also known as "cas12b" - a type V CRISPR/Cas nucleic acid-guided protein) (examples include those set forth as SEQ ID NOs: 833-840). In some cases, a nuclease defective C2c1 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the C2c1 amino acid sequence set forth in any of SEQ ID NOs: 833-840. In some cases, a nuclease defective C2c1 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to a contiguous stretch of from 100 amino acids to 200 amino acids (aa), from 200 aa to 400 aa, from 400 aa to 600 aa, from 600 aa to 800 aa, from 800 aa to 1000 aa, from 1000 aa to 1100 aa, from 1100 aa to 1200 aa, or from 1200 aa to 1300 aa, of the C2c1 amino acid sequence set forth in any of SEQ ID NOs: 833-840.
[00117] In some cases, a nuclease defective C2c1 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCI domain of the C2c1 amino acid sequence set forth in any of SEQ ID NOs: 833-840. In some cases, a nuclease defective C2c1 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCII domain of the C2c1 amino acid sequence set forth in any of SEQ ID NOs: 833-840. In some cases, a nuclease defective C2c1 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCIII domain of the C2c1 amino acid sequence set forth in any of SEQ ID NOs: 833-840. In some cases, a nuclease defective C2c1 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCI, RuvCII, and RuvCIII domains of the C2c1 amino acid sequence set forth in any of SEQ ID NOs: 833-840.
[00118] In some embodiments, a nuclease defective CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein of a fusion protein of the present disclosure is a nuclease defective C2c3 protein (also known as "cas12c" - a type V CRISPR/Cas nucleic acid-guided protein) (examples include those set forth as SEQ ID NOs: 841-844). In some cases, a nuclease defective C2c3 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% amino acid sequence identity to the C2c3 amino acid sequence set forth in any of SEQ ID NOs: 841-844. In some cases, a nuclease defective C2c3 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to a contiguous stretch of from 100 amino acids to 200 amino acids (aa), from 200 aa to 400 aa, from 400 aa to 600 aa, from 600 aa to 800 aa, from 800 aa to 1000 aa, from 1000 aa to 1100 aa, from 1100 aa to 1200 aa, or from 1200 aa to 1300 aa, of the C2c3 amino acid sequence set forth in any of SEQ ID NOs: 841-844.
[00119] In some cases, a nuclease defective C2c3 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCI domain of the C2c3 amino acid sequence set forth in any of SEQ ID NOs: 841-844. In some cases, a nuclease defective C2c3 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCII domain of the C2c3 amino acid sequence set forth in any of SEQ ID NOs: 841-844. In some cases, a nuclease defective C2c3 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCIII domain of the C2c3 amino acid sequence set forth in any of SEQ ID NOs: 841-844. In some cases, a nuclease defective C2c3 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCI, RuvCII, and RuvCIII domains of the C2c3 amino acid sequence set forth in any of SEQ ID NOs: 841-844.
[00120] In some cases, a suitable C2c3 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the C2c3 amino acid sequence set forth in any of SEQ ID NOs: 841-844.
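As an illustration of how the recited identity thresholds might be checked in practice, the following is a minimal sketch in Python. It assumes the two sequences have already been aligned by some external tool (this passage does not fix an alignment method), and it assumes one particular denominator convention (all aligned columns in which at least one sequence has a residue). The function names `percent_identity` and `thresholds_met` are hypothetical and do not come from the disclosure.

```python
# Recited thresholds, in percent, taken from the paragraphs above.
THRESHOLDS = (30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 98, 99)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences ('-' marks a gap).

    Assumption: the denominator counts every column in which at least one
    sequence has a residue; other conventions exist and give other values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue  # a column of two gaps carries no information
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / columns


def thresholds_met(identity: float) -> list:
    """Return every recited 'at least N%' threshold that the identity satisfies."""
    return [t for t in THRESHOLDS if identity >= t]


if __name__ == "__main__":
    # Hypothetical toy alignment, not a real Cas sequence.
    identity = percent_identity("MKRT-LL", "MKKTALL")
    print(round(identity, 2), thresholds_met(identity))
```

Under a different convention (for example, dividing by the length of the shorter sequence), the same pair of sequences can fall on the other side of a threshold, so the convention should be stated whenever a percentage is reported.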
[00121] In some embodiments, a nuclease defective CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein of a fusion protein of the present disclosure is a nuclease defective C2c2 protein (also known as "Cas13a" - a type VI CRISPR/Cas endonuclease) (examples include those set forth as SEQ ID NOs: 845-856). In some cases, a nuclease defective C2c2 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the C2c2 amino acid sequence set forth in any of SEQ ID NOs: 845-856. In some cases, a nuclease defective C2c2 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to a contiguous stretch of from 100 amino acids to 200 amino acids (aa), from 200 aa to 400 aa, from 400 aa to 600 aa, from 600 aa to 800 aa, from 800 aa to 1000 aa, from 1000 aa to 1100 aa, from 1100 aa to 1200 aa, or from 1200 aa to 1300 aa, of the C2c2 amino acid sequence set forth in any of SEQ ID NOs: 845-856.
[00122] In some cases, a nuclease defective C2c2 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCI domain of the C2c2 amino acid sequence set forth in any of SEQ ID NOs: 845-856. In some cases, a nuclease defective C2c2 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCII domain of the C2c2 amino acid sequence set forth in any of SEQ ID NOs: 845-856. In some cases, a nuclease defective C2c2 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCIII domain of the C2c2 amino acid sequence set forth in any of SEQ ID NOs: 845-856. In some cases, a nuclease defective C2c2 protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCI, RuvCII, and RuvCIII domains of the C2c2 amino acid sequence set forth in any of SEQ ID NOs: 845-856.
[00123] In some embodiments, a nuclease defective CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein of a fusion protein of the present disclosure is a nuclease defective C2c7 protein (also known as "Cas13c" - a type VI CRISPR/Cas endonuclease) or a nuclease defective C2c6 (also known as "Cas13b" - a type VI CRISPR/Cas endonuclease) [e.g., see Shmakov et al., Nat Rev Microbiol. 2017 Mar;15(3):169-182].
[00124] Examples and guidance related to type V or type VI CRISPR/Cas endonucleases (including domain structure) and guide RNAs (as well as information regarding requirements related to protospacer adjacent motif (PAM) sequences present in targeted nucleic acids) can be found in the art, for example, see Zetsche et al., Cell. 2015 Oct 22;163(3):759-71; Makarova et al., Nat Rev Microbiol. 2015 Nov;13(11):722-36; Shmakov et al., Mol Cell. 2015 Nov 5;60(3):385-97; and Shmakov et al., Nat Rev Microbiol. 2017 Mar;15(3):169-182; and U.S. patents and patent applications: 9,580,701; 20170073695, 20170058272, 20160362668, 20160362667, 20160298078, 20160289637, 20160215300, 20160208243, and 20160208241, all of which are hereby incorporated by reference in their entirety.
CasX and CasY CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) proteins
[00125] In some embodiments, a nuclease defective CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein of a fusion protein of the present disclosure is a nuclease defective CasX or CasY protein. As such, in some cases a nuclease defective CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein of a fusion protein of the present disclosure is a nuclease defective CasX protein. In some cases a nuclease defective CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein of a fusion protein of the present disclosure is a nuclease defective CasY protein.
[00126] In some embodiments, a nuclease defective CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein of a fusion protein of the present disclosure is a nuclease defective CASX protein. For example, in some cases a nuclease defective CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein of a fusion protein of the present disclosure is a CASX protein that lacks a functioning RuvC domain. For example, in some cases, a nuclease defective CASX protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% amino acid sequence identity to the CASX amino acid sequence set forth in any of SEQ ID NOs: 1146-1147; and lacks a functioning RuvC domain.
[00127] In some cases, a nuclease defective CasX protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the CASX amino acid sequence set forth in any of SEQ ID NOs: 1146-1147. In some cases, a nuclease defective CASX protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% amino acid sequence identity to a contiguous stretch of from 100 amino acids to 200 amino acids (aa), from 200 aa to 400 aa, from 400 aa to 600 aa, from 600 aa to 800 aa, or from 800 aa to 960 aa, of the CASX amino acid sequence set forth in any of SEQ ID NOs: 1146-1147.
[00128] In some cases, a nuclease defective CASX protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCI domain of the CASX amino acid sequence set forth in any of SEQ ID NOs: 1146-1147. In some cases, a nuclease defective CasX protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCII domain of the CasX amino acid sequence set forth in any of SEQ ID NOs: 1146-1147. In some cases, a nuclease defective CasX protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCIII domain of the CasX amino acid sequence set forth in any of SEQ ID NOs: 1146-1147. In some cases, a nuclease defective CasX protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCI, RuvCII, and RuvCIII domains of the CasX amino acid sequence set forth in any of SEQ ID NOs: 1146-1147.
[00129] In some cases, a suitable nuclease defective CasX protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the CasX amino acid sequence set forth in any of SEQ ID NOs: 1146-1147.
[00130] In some embodiments, a nuclease defective CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein of a fusion protein of the present disclosure is a nuclease defective CASY protein. For example, in some cases a nuclease defective CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein of a fusion protein of the present disclosure is a CASY protein that lacks a functioning RuvC domain. For example, in some cases, a nuclease defective CASY protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% amino acid sequence identity to the CASY amino acid sequence set forth in any of SEQ ID NOs: 1148-1153; and lacks a functioning RuvC domain.
[00131] In some cases, a nuclease defective CasY protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the CASY amino acid sequence set forth in any of SEQ ID NOs: 1148-1153. In some cases, a nuclease defective CASY protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% amino acid sequence identity to a contiguous stretch of from 100 amino acids to 200 amino acids (aa), from 200 aa to 400 aa, from 400 aa to 600 aa, from 600 aa to 800 aa, or from 800 aa to 960 aa, of the CASY amino acid sequence set forth in any of SEQ ID NOs: 1148-1153.
[00132] In some cases, a nuclease defective CASY protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCI domain of the CASY amino acid sequence set forth in any of SEQ ID NOs: 1148-1153. In some cases, a nuclease defective CasY protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCII domain of the CasY amino acid sequence set forth in any of SEQ ID NOs: 1148-1153. In some cases, a nuclease defective CasY protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCIII domain of the CasY amino acid sequence set forth in any of SEQ ID NOs: 1148-1153. In some cases, a nuclease defective CasY protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the RuvCI, RuvCII, and RuvCIII domains of the CasY amino acid sequence set forth in any of SEQ ID NOs: 1148-1153.
[00133] In some cases, a suitable nuclease defective CasY protein comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, amino acid sequence identity to the CasY amino acid sequence set forth in any of SEQ ID NOs: 1148-1153.
[00134] Examples and guidance related to CasX and CasY proteins (including domain structure) and guide RNAs (as well as information regarding requirements related to protospacer adjacent motif (PAM) sequences present in targeted nucleic acids) can be found in the art, for example, see Burstein et al., Nature. 2017 Feb 9; 542 (7640):237-241, which reference is hereby incorporated by reference in its entirety.
Fusion partners and linkers
[00135] In some cases, a subject fusion protein (a nuclease defective CRISPR/Cas RNA-guided polypeptide (e.g., dCas9) fused to a first or second portion of a split BirA* protein) is fused to an amino acid sequence (a fusion partner) that provides for subcellular localization, i.e., the fusion partner is a subcellular localization sequence (e.g., one or more nuclear localization signals (NLSs) for targeting to the nucleus). For example, a subject fusion protein can have one or more NLSs (e.g., two or more, three or more, four or more, five or more, 1, 2, 3, 4, or 5 NLSs). In some cases, a subject fusion protein includes one or more (e.g., two or more, three or more, four or more, etc.) nuclear localization signals (NLSs). In some cases a fusion partner (or multiple fusion partners, e.g., 1, 2, 3, 4, or 5 fusion partners; e.g., an NLS, a tag, a fusion partner providing an activity, etc.) is located at or near the C-terminus of the subject fusion protein. In some cases a fusion partner (or multiple fusion partners, e.g., 1, 2, 3, 4, or 5 fusion partners; e.g., an NLS, a tag, a fusion partner providing an activity, etc.) is located at the N-terminus of the subject fusion protein. In some cases the subject fusion protein has a fusion partner (or multiple fusion partners, e.g., 1, 2, 3, 4, or 5 fusion partners; e.g., an NLS, a tag, a fusion partner providing an activity, etc.) at both the N-terminus and C-terminus. In some cases, the subject fusion protein has a fusion partner (or multiple fusion partners, e.g., 1, 2, 3, 4, or 5 fusion partners; e.g., an NLS, a tag, a fusion partner providing an activity, etc.) located internal to the primary amino acid sequence. In some cases, the subject fusion protein has a fusion partner (or multiple fusion partners, e.g., 1, 2, 3, 4, or 5 fusion partners; e.g., an NLS, a tag, a fusion partner providing an activity, etc.) located internal to the primary amino acid sequence in addition to a fusion partner located at the C- and/or N-terminus.
[00136] As an illustrative example, a subject fusion protein (a nuclease defective CRISPR/Cas RNA-guided polypeptide, e.g., dCas9, fused to a first or second portion of a split BirA* protein) can have a fusion partner that provides for tagging (e.g., GFP), and can also have a subcellular localization sequence (e.g., one or more NLSs, 2 or more NLSs, etc.). In some cases, such a fusion protein might also have a tag for ease of tracking and/or purification (e.g., a histidine tag, e.g., a 6XHis tag; a hemagglutinin (HA) tag; a FLAG tag; a Myc tag; and the like).
[00137] In some embodiments, a subject fusion protein is fused to an amino acid sequence (a fusion partner) that provides a tag (i.e., the fusion partner is a detectable label) for ease of tracking and/or purification (e.g., a fluorescent protein, e.g., green fluorescent protein (GFP), YFP, RFP, CFP, mCherry, tdTomato, and the like; a histidine tag, e.g., a 6XHis tag; a hemagglutinin (HA) tag; a FLAG tag; a Myc tag; and the like). In some embodiments, a subject fusion partner can provide for increased or decreased stability (i.e., the fusion partner can be a stability control peptide, e.g., a degron, which in some cases is controllable (e.g., a temperature sensitive or drug controllable degron sequence)).
[00138] In some cases, a subject fusion protein (a nuclease defective CRISPR/Cas RNA-guided polypeptide, e.g., dCas9, fused to a first or second portion of a split BirA* protein) is conjugated (e.g., fused) to a polypeptide permeant domain to promote uptake by the cell (i.e., a fusion partner can promote uptake by a cell). A number of permeant domains are known in the art and may be used, including peptides, peptidomimetics, and non-peptide carriers. For example, a permeant peptide may be derived from the third alpha helix of Drosophila melanogaster transcription factor Antennapedia, referred to as penetratin, which comprises the amino acid sequence RQIKIWFQNRRMKWKK (SEQ ID NO: 1129). As another example, the permeant peptide can comprise the HIV-1 tat basic region amino acid sequence, which may include, for example, amino acids 49-57 of naturally-occurring tat protein. Other permeant domains include poly-arginine motifs, for example, the region of amino acids 34-56 of HIV-1 rev protein, nona-arginine, octa-arginine, and the like. (See, for example, Futaki et al. (2003) Curr Protein Pept Sci. 2003 Apr; 4(2): 87-9 and 446; and Wender et al. (2000) Proc. Natl. Acad. Sci. U.S.A 2000 Nov. 21; 97(24):13003-8; published U.S. Patent applications 20030220334; 20030083256; 20030032593; and 20030022831, herein specifically incorporated by reference for the teachings of translocation peptides and peptoids). The nona-arginine (R9) sequence is one of the more efficient PTDs that have been characterized (Wender et al. 2000; Uemura et al. 2002). The site at which the fusion is made may be selected in order to optimize the biological activity, secretion or binding characteristics of the polypeptide. The optimal site can be determined by routine experimentation.
[00139] In some cases, a subject fusion protein includes a "Protein Transduction Domain" or PTD (also known as a CPP - cell penetrating peptide), which refers to a polypeptide, polynucleotide, carbohydrate, or organic or inorganic compound that facilitates traversing a lipid bilayer, micelle, cell membrane, organelle membrane, or vesicle membrane. A PTD attached to another molecule, which can range from a small polar molecule to a large macromolecule and/or a nanoparticle, facilitates the molecule traversing a membrane, for example going from extracellular space to intracellular space, or cytosol to within an organelle. In some embodiments, a PTD is covalently linked to the N-terminus of a polypeptide (e.g., a subject fusion protein). In some embodiments, a PTD is covalently linked to the C-terminus of a polypeptide (e.g., a subject fusion protein). In some cases, the PTD is inserted internally in the primary amino acid sequence of a subject fusion protein. In some cases, a subject fusion protein includes (is conjugated to, is fused to) one or more PTDs (e.g., two or more, three or more, four or more PTDs). In some cases a PTD includes a nuclear localization signal (NLS) (e.g., in some cases 2 or more, 3 or more, 4 or more, or 5 or more NLSs). In some cases, a subject fusion protein includes one or more NLSs (e.g., 2 or more, 3 or more, 4 or more, or 5 or more NLSs).
In some embodiments, a PTD is covalently linked to a nucleic acid (e.g., a CRISPR/Cas guide RNA, a DNA molecule encoding a CRISPR/Cas guide RNA, a nucleic acid encoding a subject fusion protein). Examples of PTDs include but are not limited to a minimal undecapeptide protein transduction domain (corresponding to residues 47-57 of HIV-1 TAT comprising YGRKKRRQRRR; SEQ ID NO: 1125); a polyarginine sequence comprising a number of arginines sufficient to direct entry into a cell (e.g., 3, 4, 5, 6, 7, 8, 9, 10, or 10-50 arginines); a VP22 domain (Zender et al. (2002) Cancer Gene Ther. 9(6):489-96); a Drosophila Antennapedia protein transduction domain (Noguchi et al. (2003) Diabetes 52(7):1732-1737); a truncated human calcitonin peptide (Trehin et al. (2004) Pharm. Research 21:1248-1256); polylysine (Wender et al. (2000) Proc. Natl. Acad. Sci. USA 97:13003-13008); RRQRRTSKLMKR (SEQ ID NO: 1126); Transportan GWTLNSAGYLLGKINLKALAALAKKIL (SEQ ID NO: 1127); KALAWEAKLAKALAKALAKHLAKALAKALKCEA (SEQ ID NO: 1128); and RQIKIWFQNRRMKWKK (SEQ ID NO: 1129). Exemplary PTDs include, but are not limited to, YGRKKRRQRRR (SEQ ID NO: 1125); RKKRRQRRR (SEQ ID NO: 1130); and an arginine homopolymer of from 3 arginine residues to 50 arginine residues. Exemplary PTD domain amino acid sequences include, but are not limited to, any of the following: YGRKKRRQRRR (SEQ ID NO: 1125); RKKRRQRR (SEQ ID NO: 1131); YARAAARQARA (SEQ ID NO: 1132); THRLPRRRRRR (SEQ ID NO: 1133); and GGRRARRRRRR (SEQ ID NO: 1134). In some embodiments, the PTD is an activatable CPP (ACPP) (Aguilera et al. (2009) Integr Biol (Camb) June; 1(5-6): 371-381). ACPPs comprise a polycationic CPP (e.g., Arg9 or "R9") connected via a cleavable linker to a matching polyanion (e.g., Glu9 or "E9"), which reduces the net charge to nearly zero and thereby inhibits adhesion and uptake into cells. Upon cleavage of the linker, the polyanion is released, locally unmasking the polyarginine and its inherent adhesiveness, thus "activating" the ACPP to traverse the membrane.
Linkers
[00141] In some embodiments, a subject fusion protein includes one or more linker polypeptides (i.e., a linker). In some cases a linker separates the nuclease defective CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) polypeptide (e.g., dCas9) from the split BirA* portion of the fusion protein. In some cases a linker separates a fusion partner (e.g., NLS, PTD, tag, etc.) from the nuclease defective CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) polypeptide. In some cases a linker separates a fusion partner (e.g., NLS, PTD, tag, etc.) from the split BirA* portion of a subject fusion protein. The linker polypeptide may have any of a variety of amino acid sequences. In some cases the linker is a flexible linker. In some cases the linker is a rigid linker.
[00142] A linker can be of any convenient length, and the length of the linker can influence guide RNA positioning. For example, a long linker between the nuclease defective CRISPR/Cas RNA-guided polypeptide and the split BirA* portion of a subject fusion protein will likely allow the guide RNAs to be targeted farther apart at the target locus, because the CRISPR/Cas RNA-guided polypeptide portions (e.g., dCas9 portions of the fusion proteins) do not need to be as close to one another in order to bring the first and second split BirA* portions together.
[00143] In some cases a linker is 3 amino acids (aa) to 40 amino acids in length, e.g., from 3 amino acids to 25 amino acids in length, from 3 amino acids to 10 amino acids in length, from 3 amino acids to 5 amino acids in length, etc. In some cases, a linker is 10 to 30 amino acids long (e.g., 10 to 28, 10 to 25, 10 to 22, 10 to 20, 12 to 30, 12 to 28, 12 to 25, 12 to 22, 12 to 20, 15 to 30, 15 to 28, 15 to 25, 15 to 22, or 15 to 20 amino acids long). In some cases a linker is 15 aa, 16 aa, 17 aa, 18 aa, 19 aa or 20 aa long.
[00144] Peptide linkers with a degree of flexibility can be used. The linking peptides may have virtually any amino acid sequence, bearing in mind that some linkers will have a sequence that results in a generally flexible peptide. Small amino acids, such as glycine and alanine, are of use in creating a flexible peptide. The creation of such sequences is routine to those of skill in the art. A variety of different linkers are commercially available and are considered suitable for use. In some cases the linker is an XTEN linker, e.g., SGSETPGTSES (SEQ ID NO: 10). In some cases a subject linker includes the amino acid sequence GGGGSSGSETPGTSES (SEQ ID NO: 1135). In some cases a subject linker includes the amino acid sequence ETPGTSES (SEQ ID NO: 1136).
[00145] Examples of linker polypeptides include glycine polymers (G)n, glycine-serine polymers (including, for example, GGS, (GGS)n, (GS)n, (GSGGS)n (SEQ ID NO: 1137), (GGSGGS)n (SEQ ID NO: 1138), and (GGGS)n (SEQ ID NO: 1139), where n is an integer of at least one, and can range from 1 to about 10), glycine-alanine polymers, and alanine-serine polymers. Example linkers can include amino acid sequences including, but not limited to, GS, GGS, GGSG (SEQ ID NO: 1140), GGSGG (SEQ ID NO: 1141), GSGSG (SEQ ID NO: 1142), GSGGG (SEQ ID NO: 1143), GGGSG (SEQ ID NO: 1144), GSSSG (SEQ ID NO: 1145), and the like. The ordinarily skilled artisan will recognize that design of a peptide conjugated to any elements described above can include linkers that are all or partially flexible, such that the linker can include a flexible linker as well as one or more portions that confer less flexible structure.
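By way of illustration only, the repeat-unit notation used above (e.g., (GGS)n, where n is an integer of at least one) can be expanded into explicit linker sequences and compared against a length range such as the 3 to 40 amino acid range recited earlier. The Python sketch below is one way to do so; the helper names `repeat_linker` and `within_length` are hypothetical, and the choice of n values in the example is arbitrary.

```python
def repeat_linker(unit: str, n: int) -> str:
    """Expand a repeat-unit linker such as (GGS)n into its amino acid sequence."""
    if n < 1:
        raise ValueError("n must be an integer of at least one")
    return unit * n


def within_length(linker: str, min_aa: int = 3, max_aa: int = 40) -> bool:
    """True if the linker length falls within the inclusive range given in aa."""
    return min_aa <= len(linker) <= max_aa


if __name__ == "__main__":
    # Repeat units named in the text; n = 1, 4, and 10 are illustrative choices.
    for unit in ("GGS", "GS", "GSGGS", "GGSGGS", "GGGS"):
        for n in (1, 4, 10):
            linker = repeat_linker(unit, n)
            print(f"({unit}){n}: {len(linker)} aa, "
                  f"within 3-40 aa: {within_length(linker)}")
```

A length check of this kind says nothing about flexibility or folding; it only confirms that a candidate repeat linker falls inside a chosen length window.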
Guide RNAs
[00146] A CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein interacts with (binds to) a corresponding guide RNA to form a ribonucleoprotein (RNP) complex that is targeted to a particular site in a target genome via base pairing between the guide RNA and a target sequence within the target genome. A guide RNA includes a nucleotide sequence (a guide sequence) that is complementary to a sequence (the target site) of a target nucleic acid and includes a 'constant region,' referred to as a 'handle,' that binds to the CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) protein. Different CRISPR/Cas nucleic acid-guided (e.g., RNA-guided) proteins bind to different guide RNA handles. Thus, a subject fusion protein binds to a corresponding guide RNA when being targeted to a specific sequence of a target DNA. However, because the guide sequence of a guide RNA can be readily modified to target different target sequences within a target genome, in some cases, a subject composition or kit includes only the fusion proteins and the user adds the desired corresponding guide RNA (or a nucleic acid encoding the corresponding guide RNA).
[00147] A nucleic acid molecule (e.g., RNA, DNA, or RNA/DNA hybrid) that binds to a class 2 CRISPR/Cas endonuclease (e.g., a Cas9 protein; a type V or type VI CRISPR/Cas protein; a Cpf1 protein; etc.) and targets the complex to a specific location within a target nucleic acid is referred to herein as a "guide RNA" or "CRISPR/Cas guide nucleic acid" or "CRISPR/Cas guide RNA" (all of which terms are used interchangeably).
[00148] A guide RNA provides target specificity to the complex (the RNP complex) by including a targeting segment, which includes a guide sequence (also referred to herein as a targeting sequence), which is a nucleotide sequence that is complementary to a sequence of a target nucleic acid.
[00149] A guide RNA can be referred to by the protein to which it corresponds. For example, when the class 2 CRISPR/Cas endonuclease is a Cas9 protein, the corresponding guide RNA can be referred to as a "Cas9 guide RNA." Likewise, as another example, when the class 2 CRISPR/Cas endonuclease is a Cpf1 protein, the corresponding guide RNA can be referred to as a "Cpf1 guide RNA."
[00150] In some embodiments (e.g., when Cas9 is used), a guide RNA can include two separate nucleic acid molecules: an "activator" and a "targeter" and is referred to herein as a "dual guide RNA", a "double-molecule guide RNA", a "two-molecule guide RNA", or a "dgRNA." In some embodiments, the guide RNA is one molecule (e.g., for some class 2 CRISPR/Cas proteins, the corresponding guide RNA is a single molecule; and in some cases, an activator and targeter are covalently linked to one another, e.g., via intervening nucleotides), and the guide RNA is referred to as a "single guide RNA", a "single-molecule guide RNA," a "one-molecule guide RNA", or simply "sgRNA."
Cas9 Guide RNA
[00151] A nucleic acid molecule that binds to a Cas9 protein and targets the complex to a specific location within a target nucleic acid is referred to herein as a "Cas9 guide RNA" (even though in some cases it can be an RNA/DNA hybrid molecule).
[00152] A Cas9 guide RNA can be said to include two segments: a first segment (referred to herein as a "targeting segment") and a second segment (referred to herein as a "protein-binding segment"). By "segment" it is meant a segment/section/region of a molecule, e.g., a contiguous stretch of nucleotides in a nucleic acid molecule. A segment can also mean a region/section of a complex such that a segment may comprise regions of more than one molecule.
[00153] The first segment (targeting segment) of a Cas9 guide RNA includes a nucleotide sequence (a guide sequence) that is complementary to (and therefore hybridizes with) a specific sequence (a target site) within a target nucleic acid (e.g., a target ssRNA, a target ssDNA, the complementary strand of a double stranded target DNA, etc.). The protein-binding segment (or "protein-binding sequence") interacts with (binds to) a Cas9 polypeptide. The protein-binding segment of a subject Cas9 guide RNA includes two complementary stretches of nucleotides that hybridize to one another to form a double stranded RNA duplex (dsRNA duplex). Site-specific binding to and/or cleavage of a target nucleic acid (e.g., genomic DNA) can occur at locations (e.g., target sequence of a target locus) determined by base-pairing complementarity between the Cas9 guide RNA (the guide sequence of the Cas9 guide RNA) and the target nucleic acid.
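The base-pairing relationship described above can be sketched computationally. The following Python example is a simplified illustration that assumes perfect Watson-Crick pairing between an RNA guide sequence and a DNA target strand, with both sequences written 5' to 3'. It ignores mismatch tolerance, wobble pairing, and any protein requirements. The function names and the five-nucleotide example sequence are hypothetical and are far shorter than a real guide sequence.

```python
# RNA base in the guide -> DNA base it pairs with on the target strand.
RNA_TO_DNA_PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}


def target_site_for(guide_rna: str) -> str:
    """Return the DNA strand (5'->3') perfectly complementary to the guide.

    Hybridized strands run antiparallel, so the guide is read in reverse
    while each base is replaced by its pairing partner.
    """
    return "".join(RNA_TO_DNA_PAIR[base] for base in reversed(guide_rna.upper()))


def hybridizes(guide_rna: str, target_dna: str) -> bool:
    """True if the DNA strand is the exact complement of the guide sequence."""
    return target_site_for(guide_rna) == target_dna.upper()


if __name__ == "__main__":
    guide = "GACGU"  # hypothetical toy guide sequence
    print(target_site_for(guide))
    print(hybridizes(guide, "ACGTC"), hybridizes(guide, "ACGTT"))
```

For a double stranded target DNA, the strand returned here corresponds to the complementary strand that the guide sequence hybridizes with; the opposite strand has the same sequence as the guide, with T in place of U.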
[00154] A Cas9 guide RNA and a Cas9 protein form a complex (e.g., bind via non-covalent interactions). The Cas9 guide RNA provides target specificity to the complex by including a targeting segment, which includes a guide sequence (a nucleotide sequence that is complementary to a sequence of a target nucleic acid). The Cas9 protein of the complex provides the site-specific activity (e.g., cleavage activity or an activity provided by the Cas9 protein when the Cas9 protein is a Cas9 fusion polypeptide, i.e., has a fusion partner such as a split BirA*). In other words, the Cas9 protein is guided to a target nucleic acid sequence (e.g., a target sequence in a chromosomal nucleic acid, e.g., a chromosome; a target sequence in an extrachromosomal nucleic acid, e.g., an episomal nucleic acid, a minicircle, an ssRNA, an ssDNA, etc.; a target sequence in a mitochondrial nucleic acid; a target sequence in a chloroplast nucleic acid; a target sequence in a plasmid; a target sequence in a viral nucleic acid; etc.) by virtue of its association with the Cas9 guide RNA.
[00155] The "guide sequence" also referred to as the "targeting sequence" of a Cas9 guide RNA can be modified so that the Cas9 guide RNA can target a Cas9 protein to any desired sequence of any desired target nucleic acid, with the exception that the protospacer adjacent motif (PAM) sequence can be taken into account. Thus, for example, a Cas9 guide RNA can have a targeting segment with a sequence (a guide sequence) that has complementarity with (e.g., can hybridize to) a sequence in a nucleic acid in a eukaryotic cell, e.g., a viral nucleic acid, a eukaryotic nucleic acid (e.g., a eukaryotic chromosome, chromosomal sequence, a eukaryotic RNA, etc.), and the like.
[00156] In some embodiments, a Cas9 guide RNA includes two separate nucleic acid molecules: an "activator" and a "targeter" and is referred to herein as a "dual Cas9 guide RNA", a "double-molecule Cas9 guide RNA", a "two-molecule Cas9 guide RNA", a "dual guide RNA", or a "dgRNA." In some embodiments, the activator and targeter are covalently linked to one another (e.g., via intervening nucleotides) and the guide RNA is referred to as a "single guide RNA", a "Cas9 single guide RNA", a "single-molecule Cas9 guide RNA," a "one-molecule Cas9 guide RNA", or simply "sgRNA."
[00157] A Cas9 guide RNA comprises a crRNA-like ("CRISPR RNA" / "targeter" / "crRNA" / "crRNA repeat") molecule and a corresponding tracrRNA-like ("trans-acting CRISPR RNA" / "activator" / "tracrRNA") molecule. A crRNA-like molecule (targeter) comprises both the targeting segment (single stranded) of the Cas9 guide RNA and a stretch ("duplex-forming segment") of nucleotides that forms one half of the dsRNA duplex of the protein-binding segment of the Cas9 guide RNA. A corresponding tracrRNA-like molecule (activator / tracrRNA) comprises a stretch of nucleotides (duplex-forming segment) that forms the other half of the dsRNA duplex of the protein-binding segment of the guide nucleic acid. In other words, a stretch of nucleotides of a crRNA-like molecule is complementary to and hybridizes with a stretch of nucleotides of a tracrRNA-like molecule to form the dsRNA duplex of the protein-binding domain of the Cas9 guide RNA. As such, each targeter molecule can be said to have a corresponding activator molecule (which has a region that hybridizes with the targeter). The targeter molecule additionally provides the targeting segment. Thus, a targeter and an activator molecule (as a corresponding pair) hybridize to form a Cas9 guide RNA. The exact sequence of a given crRNA or tracrRNA molecule is characteristic of the species in which the RNA molecules are found. A subject dual Cas9 guide RNA can include any corresponding activator and targeter pair.
[00158] The term "activator" or "activator RNA" is used herein to mean a tracrRNA-like molecule (tracrRNA : "trans-acting CRISPR RNA") of a Cas9 dual guide RNA (and therefore of a Cas9 single guide RNA when the "activator" and the "targeter" are linked together by, e.g., intervening nucleotides). Thus, for example, a Cas9 guide RNA (dgRNA or sgRNA) comprises an activator sequence (e.g., a tracrRNA sequence). A tracr molecule (a tracrRNA) is a naturally existing molecule that hybridizes with a CRISPR RNA molecule (a crRNA) to form a Cas9 dual guide RNA. The term "activator" is used herein to encompass naturally existing tracrRNAs, but also to encompass tracrRNAs with modifications (e.g., truncations, sequence variations, base modifications, backbone modifications, linkage modifications, etc.) where the activator retains at least one function of a tracrRNA (e.g., contributes to the dsRNA duplex to which Cas9 protein binds). In some cases the activator provides one or more stem loops that can interact with Cas9 protein. An activator can be referred to as having a tracr sequence (tracrRNA sequence) and in some cases is a tracrRNA, but the term "activator" is not limited to naturally existing tracrRNAs.
[00159] The term "targeter" or "targeter RNA" is used herein to refer to a crRNA-like molecule (crRNA: "CRISPR RNA") of a Cas9 dual guide RNA (and therefore of a Cas9 single guide RNA when the "activator" and the "targeter" are linked together, e.g., by intervening nucleotides). Thus, for example, a Cas9 guide RNA (dgRNA or sgRNA) comprises a targeting segment (which includes nucleotides that hybridize with (are complementary to) a target nucleic acid) and a duplex-forming segment (e.g., a duplex-forming segment of a crRNA, which can also be referred to as a crRNA repeat). Because the sequence of a targeting segment (the segment that hybridizes with a target sequence of a target nucleic acid) of a targeter is modified by a user to hybridize with a desired target nucleic acid, the sequence of a targeter will often be a non-naturally occurring sequence. However, the duplex-forming segment of a targeter (described in more detail below), which hybridizes with the duplex-forming segment of an activator, can include a naturally existing sequence (e.g., can include the sequence of a duplex-forming segment of a naturally existing crRNA, which can also be referred to as a crRNA repeat). Thus, the term targeter is used herein to distinguish from naturally occurring crRNAs, despite the fact that part of a targeter (e.g., the duplex-forming segment) often includes a naturally occurring sequence from a crRNA. However, the term "targeter" encompasses naturally occurring crRNAs.
[00160] A Cas9 guide RNA can also be said to include 3 parts: (i) a targeting sequence (a nucleotide sequence that hybridizes with a sequence of the target nucleic acid); (ii) an activator sequence (as described above) (in some cases, referred to as a tracr sequence); and (iii) a sequence that hybridizes to at least a portion of the activator sequence to form a double stranded duplex. A targeter has (i) and (iii); while an activator has (ii).
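As a non-limiting sketch, the three-part description above can be modeled as a small data structure in Python. The class and function names, the placeholder sequences, and the linker are all hypothetical. The 5' to 3' ordering used here (targeting sequence, then duplex-forming sequence, then linker, then activator) is an assumption made for illustration and is not stated in the paragraph above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Targeter:
    """crRNA-like molecule: carries parts (i) and (iii)."""
    targeting_sequence: str      # (i) hybridizes with the target nucleic acid
    duplex_forming_segment: str  # (iii) hybridizes with part of the activator

    def sequence(self) -> str:
        # Assumed order: targeting sequence 5' of the duplex-forming segment.
        return self.targeting_sequence + self.duplex_forming_segment


@dataclass(frozen=True)
class Activator:
    """tracrRNA-like molecule: carries part (ii)."""
    activator_sequence: str


def dual_guide(targeter: Targeter, activator: Activator) -> Tuple[str, str]:
    """A dual guide RNA (dgRNA): two separate molecules."""
    return targeter.sequence(), activator.activator_sequence


def single_guide(targeter: Targeter, activator: Activator, linker: str) -> str:
    """A single guide RNA (sgRNA): targeter and activator joined by intervening nucleotides."""
    return targeter.sequence() + linker + activator.activator_sequence


# Arbitrary placeholder sequences, not taken from any SEQ ID NO.
example_targeter = Targeter("ACGUACGUACGUACGUACGU", "GGAUCC")
example_activator = Activator("GGAUCCAAAA")
print(dual_guide(example_targeter, example_activator))
print(single_guide(example_targeter, example_activator, linker="GAAA"))
```

Keeping the targeter and activator as separate objects mirrors the statement that a targeter has (i) and (iii) while an activator has (ii), whether the guide is provided as one molecule or two.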
[00161] A Cas9 guide RNA (e.g. a dual guide RNA or a single guide RNA) can be comprised of any corresponding activator and targeter pair. In some cases, the duplex forming segments can be swapped between the activator and the targeter. In other words, in some cases, the targeter includes a sequence of nucleotides from a duplex forming segment of a tracrRNA (which sequence would normally be part of an activator) while the activator includes a sequence of nucleotides from a duplex forming segment of a crRNA (which sequence would normally be part of a targeter).
[00162] As noted above, a targeter comprises both the targeting segment (single stranded) of the Cas9 guide RNA and a stretch ("duplex-forming segment") of nucleotides that forms one half of the dsRNA duplex of the protein-binding segment of the Cas9 guide RNA. A corresponding tracrRNA-like molecule (activator) comprises a stretch of nucleotides (a duplex-forming segment) that forms the other half of the dsRNA duplex of the protein-binding segment of the Cas9 guide RNA. In other words, a stretch of nucleotides of the targeter is complementary to and hybridizes with a stretch of nucleotides of the activator to form the dsRNA duplex of the protein-binding segment of a Cas9 guide RNA. As such, each targeter can be said to have a corresponding activator (which has a region that hybridizes with the targeter). The targeter molecule additionally provides the targeting segment. Thus, a targeter and an activator (as a corresponding pair) hybridize to form a Cas9 guide RNA. The particular sequence of a given naturally existing crRNA or tracrRNA molecule is characteristic of the species in which the RNA molecules are found. Examples of suitable activators and targeters are well known in the art.
[00163] A Cas9 guide RNA (e.g. a dual guide RNA or a single guide RNA) can be comprised of any corresponding activator and targeter pair. Non-limiting examples of nucleotide sequences that can be included in a Cas9 guide RNA (dgRNA or sgRNA) include sequences set forth in SEQ ID NOs: 857-1105, or complements thereof. For example, in some cases, sequences from SEQ ID NOs: 857-987 (which are from tracrRNAs) or complements thereof, can pair with sequences from SEQ ID NOs: 994-1105 (which are from crRNAs), or complements thereof, to form a dsRNA duplex of a protein binding segment.
Targeting segment of a Cas9 guide RNA
[00164] The first segment of a subject guide nucleic acid includes a guide sequence (i.e., a targeting sequence) (a nucleotide sequence that is complementary to a sequence (a target site) in a target nucleic acid). In other words, the targeting segment of a subject guide nucleic acid can interact with a target nucleic acid (e.g., double stranded DNA (dsDNA)) in a sequence-specific manner via hybridization (i.e., base pairing). As such, the nucleotide sequence of the targeting segment may vary (depending on the target) and can determine the location within the target nucleic acid at which the Cas9 guide RNA and the target nucleic acid will interact. The targeting segment of a Cas9 guide RNA can be modified (e.g., by genetic engineering) or designed to hybridize to any desired sequence (target site) within a target nucleic acid (e.g., a eukaryotic target nucleic acid such as genomic DNA).
[00165] The targeting segment can have a length of 7 or more nucleotides (nt) (e.g., 8 or more, 9 or more, 10 or more, 12 or more, 15 or more, 17 or more, 20 or more, 25 or more, 30 or more, or 40 or more nucleotides). In some cases, the targeting segment can have a length of from 7 to 100 nucleotides (nt) (e.g., from 7 to 80 nt, from 7 to 60 nt, from 7 to 40 nt, from 7 to 30 nt, from 7 to 25 nt, from 7 to 22 nt, from 7 to 20 nt, from 7 to 18 nt, from 8 to 80 nt, from 8 to 60 nt, from 8 to 40 nt, from 8 to 30 nt, from 8 to 25 nt, from 8 to 22 nt, from 8 to 20 nt, from 8 to 18 nt, from 10 to 100 nt, from 10 to 80 nt, from 10 to 60 nt, from 10 to 40 nt, from 10 to 30 nt, from 10 to 25 nt, from 10 to 22 nt, from 10 to 20 nt, from 10 to 18 nt, from 12 to 100 nt, from 12 to 80 nt, from 12 to 60 nt, from 12 to 40 nt, from 12 to 30 nt, from 12 to 25 nt, from 12 to 22 nt, from 12 to 20 nt, from 12 to 18 nt, from 14 to 100 nt, from 14 to 80 nt, from 14 to 60 nt, from 14 to 40 nt, from 14 to 30 nt, from 14 to 25 nt, from 14 to 22 nt, from 14 to 20 nt, from 14 to 18 nt, from 16 to 100 nt, from 16 to 80 nt, from 16 to 60 nt, from 16 to 40 nt, from 16 to 30 nt, from 16 to 25 nt, from 16 to 22 nt, from 16 to 20 nt, from 16 to 18 nt, from 18 to 100 nt, from 18 to 80 nt, from 18 to 60 nt, from 18 to 40 nt, from 18 to 30 nt, from 18 to 25 nt, from 18 to 22 nt, or from 18 to 20 nt).
[00166] The nucleotide sequence (the targeting sequence) of the targeting segment that is complementary to a nucleotide sequence (target site) of the target nucleic acid can have a length of 10 nt or more. For example, the targeting sequence of the targeting segment that is complementary to a target site of the target nucleic acid can have a length of 12 nt or more, 15 nt or more, 17 nt or more, 18 nt or more, 19 nt or more, or 20 nt or more. In some cases, the nucleotide sequence (the targeting sequence) of the targeting segment that is complementary to a nucleotide sequence (target site) of the target nucleic acid has a length of 12 nt or more. In some cases, the nucleotide sequence (the targeting sequence) of the targeting segment that is complementary to a nucleotide sequence (target site) of the target nucleic acid has a length of 18 nt or more.
[00167] For example, the targeting sequence of the targeting segment that is complementary to a target sequence of the target nucleic acid can have a length of from 10 to 100 nucleotides (nt) (e.g., from 10 to 90 nt, from 10 to 75 nt, from 10 to 60 nt, from 10 to 50 nt, from 10 to 35 nt, from 10 to 30 nt, from 10 to 25 nt, from 10 to 22 nt, from 10 to 20 nt, from 12 to 100 nt, from 12 to 90 nt, from 12 to 75 nt, from 12 to 60 nt, from 12 to 50 nt, from 12 to 35 nt, from 12 to 30 nt, from 12 to 25 nt, from 12 to 22 nt, from 12 to 20 nt, from 15 to 100 nt, from 15 to 90 nt, from 15 to 75 nt, from 15 to 60 nt, from 15 to 50 nt, from 15 to 35 nt, from 15 to 30 nt, from 15 to 25 nt, from 15 to 22 nt, from 15 to 20 nt, from 17 to 100 nt, from 17 to 90 nt, from 17 to 75 nt, from 17 to 60 nt, from 17 to 50 nt, from 17 to 35 nt, from 17 to 30 nt, from 17 to 25 nt, from 17 to 22 nt, from 17 to 20 nt, from 17 to 19 nt, from 17 to 18 nt, from 18 to 100 nt, from 18 to 90 nt, from 18 to 75 nt, from 18 to 60 nt, from 18 to 50 nt, from 18 to 35 nt, from 18 to 30 nt, from 18 to 25 nt, from 18 to 22 nt, or from 18 to 20 nt). In some cases, the targeting sequence of the targeting segment that is complementary to a target sequence of the target nucleic acid has a length of from 15 nt to 30 nt. In some cases, the targeting sequence of the targeting segment that is complementary to a target sequence of the target nucleic acid has a length of from 15 nt to 25 nt. In some cases, the targeting sequence of the targeting segment that is complementary to a target sequence of the target nucleic acid has a length of from 18 nt to 30 nt. In some cases, the targeting sequence of the targeting segment that is complementary to a target sequence of the target nucleic acid has a length of from 18 nt to 25 nt. In some cases, the targeting sequence of the targeting segment that is complementary to a target sequence of the target nucleic acid has a length of from 18 nt to 22 nt. 
In some cases, the targeting sequence of the targeting segment that is complementary to a target site of the target nucleic acid is 20 nucleotides in length. In some cases, the targeting sequence of the targeting segment that is complementary to a target site of the target nucleic acid is 19 nucleotides in length.
[00168] The percent complementarity between the targeting sequence (guide sequence) of the targeting segment and the target site of the target nucleic acid can be 60% or more (e.g., 65% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%). In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the seven contiguous 5'-most nucleotides of the target site of the target nucleic acid. In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 60% or more over about 20 contiguous nucleotides. In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the fourteen contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 14 nucleotides in length. In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the seven contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 20 nucleotides in length.
[00169] In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 7 contiguous 5'-most nucleotides of the target site of the target nucleic acid (which can be complementary to the 3'-most nucleotides of the targeting sequence of the Cas9 guide RNA).
In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 8 contiguous 5'-most nucleotides of the target site of the target nucleic acid (which can be complementary to the 3'-most nucleotides of the targeting sequence of the Cas9 guide RNA). In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 9 contiguous 5'-most nucleotides of the target site of the target nucleic acid (which can be complementary to the 3'-most nucleotides of the targeting sequence of the Cas9 guide RNA). In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 10 contiguous 5'-most nucleotides of the target site of the target nucleic acid (which can be complementary to the 3'-most nucleotides of the targeting sequence of the Cas9 guide RNA). In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 17 contiguous 5'-most nucleotides of the target site of the target nucleic acid (which can be complementary to the 3'-most nucleotides of the targeting sequence of the Cas9 guide RNA). In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 18 contiguous 5'-most nucleotides of the target site of the target nucleic acid (which can be complementary to the 3'-most nucleotides of the targeting sequence of the Cas9 guide RNA). In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 60% or more (e.g., 65% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over about 20 contiguous nucleotides.
[00170] In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 7 contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 7 nucleotides in length. In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 8 contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 8 nucleotides in length. In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 9 contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 9 nucleotides in length. In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 10 contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 10 nucleotides in length. In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 11 contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 11 nucleotides in length. In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 12 contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 12 nucleotides in length. In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 13 contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 13 nucleotides in length. In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 14 contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 14 nucleotides in length. In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 17 contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 17 nucleotides in length. In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 18 contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 18 nucleotides in length.
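For illustration only, the two quantities discussed above (overall percent complementarity, and the number of contiguous 5'-most nucleotides of the target site that are fully complementary) can be computed as in the following Python sketch. The function names and the eight-nucleotide example are hypothetical. The sketch assumes equal-length sequences written 5' to 3', an antiparallel end-to-end alignment, and Watson-Crick pairing only (G·U wobble pairs are counted as non-complementary).

```python
# Allowed (target base, guide base) pairs; the target may be DNA or RNA.
PAIRS = {("A", "U"), ("T", "A"), ("U", "A"), ("G", "C"), ("C", "G")}


def paired_flags(target_site: str, guide: str) -> list:
    """True/False per target position, reading the target site 5' to 3'.

    Hybridization is antiparallel, so the 5'-most target nucleotide is
    compared with the 3'-most guide nucleotide, and so on.
    """
    if len(target_site) != len(guide):
        raise ValueError("this sketch assumes sequences of equal length")
    return [(t, g) in PAIRS for t, g in zip(target_site.upper(), reversed(guide.upper()))]


def percent_complementarity(target_site: str, guide: str) -> float:
    flags = paired_flags(target_site, guide)
    return 100.0 * sum(flags) / len(flags)


def contiguous_5prime_matches(target_site: str, guide: str) -> int:
    """Count complementary positions from the 5' end of the target site up to the first mismatch."""
    count = 0
    for ok in paired_flags(target_site, guide):
        if not ok:
            break
        count += 1
    return count


target = "ATGCCTTA"  # hypothetical target site, 5' to 3'
guide = "CAAGGCAU"   # hypothetical guide, 5' to 3'; its 5'-most base is a mismatch
print(percent_complementarity(target, guide))    # 87.5
print(contiguous_5prime_matches(target, guide))  # 7
```

In this example the guide is 100% complementary over the 7 contiguous 5'-most nucleotides of the target site and 87.5% complementary overall, because the single mismatch lies at the 5' end of the guide.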
[00171] Examples of various Cas9 proteins and Cas9 guide RNAs (as well as information regarding requirements related to protospacer adjacent motif (PAM) sequences present in targeted nucleic acids) can be found in the art, for example, see Jinek et al., Science. 2012 Aug 17;337(6096):816-21; Chylinski et al., RNA Biol. 2013 May;10(5):726-37; Ma et al., Biomed Res Int. 2013;2013:270805; Hou et al., Proc Natl Acad Sci U S A. 2013 Sep 24;110(39):15644-9; Jinek et al., Elife. 2013;2:e00471; Pattanayak et al., Nat Biotechnol. 2013 Sep;31(9):839-43; Qi et al., Cell. 2013 Feb 28;152(5):1173-83; Wang et al., Cell. 2013 May 9;153(4):910-8; Auer et al., Genome Res. 2013 Oct 31; Chen et al., Nucleic Acids Res. 2013 Nov 1;41(20):e19; Cheng et al., Cell Res. 2013 Oct;23(10):1163-71; Cho et al., Genetics. 2013 Nov;195(3):1177-80; DiCarlo et al., Nucleic Acids Res. 2013 Apr;41(7):4336-43; Dickinson et al., Nat Methods. 2013 Oct;10(10):1028-34; Ebina et al., Sci Rep. 2013;3:2510; Fujii et al., Nucleic Acids Res. 2013 Nov 1;41(20):e187; Hu et al., Cell Res. 2013 Nov;23(11):1322-5; Jiang et al., Nucleic Acids Res. 2013 Nov 1;41(20):e188; Larson et al., Nat Protoc. 2013 Nov;8(11):2180-96; Mali et al., Nat Methods. 2013 Oct;10(10):957-63; Nakayama et al., Genesis. 2013 Dec;51(12):835-43; Ran et al., Nat Protoc. 2013 Nov;8(11):2281-308; Ran et al., Cell. 2013 Sep 12;154(6):1380-9; Upadhyay et al., G3 (Bethesda). 2013 Dec 9;3(12):2233-8; Walsh et al., Proc Natl Acad Sci U S A. 2013 Sep 24;110(39):15514-5; Xie et al., Mol Plant. 2013 Oct 9; Yang et al., Cell. 2013 Sep 12;154(6):1370-9; Briner et al., Mol Cell. 2014 Oct 23;56(2):333-9; and U.S. patents and patent applications: 8,906,616; 8,895,308; 8,889,418; 8,889,356; 8,871,445; 8,865,406; 8,795,965; 8,771,945; 8,697,359; 20140068797; 20140170753; 20140179006; 20140179770; 20140186843; 20140186919; 20140186958; 20140189896; 20140227787; 20140234972; 20140242664; 20140242699; 20140242700; 20140242702; 20140248702; 20140256046; 20140273037; 20140273226; 20140273230; 20140273231; 20140273232; 20140273233; 20140273234; 20140273235; 20140287938; 20140295556; 20140295557; 20140298547; 20140304853; 20140309487; 20140310828; 20140310830; 20140315985; 20140335063; 20140335620; 20140342456; 20140342457; 20140342458; 20140349400; 20140349405; 20140356867; 20140356956; 20140356958; 20140356959; 20140357523; 20140357530; 20140364333; and 20140377868; all of which are hereby incorporated by reference in their entirety.
Guide RNAs corresponding to type V and type VI CRISPR/Cas endonucleases (e.g., Cpf1 Guide RNA)
[00172] A guide RNA that binds to a type V or type VI CRISPR/Cas protein (e.g., Cpf1, C2c1, C2c2, C2c3), and targets the complex to a specific location within a target nucleic acid is referred to herein generally as a "type V or type VI CRISPR/Cas guide RNA". An example of a more specific term is a "Cpf1 guide RNA."
[00173] A type V or type VI CRISPR/Cas guide RNA (e.g., Cpf1 guide RNA) can have a total length of from 30 nucleotides (nt) to 200 nt (e.g., from 30 nt to 180 nt, from 30 nt to 160 nt, from 30 nt to 150 nt, from 30 nt to 125 nt, from 30 nt to 100 nt, from 30 nt to 90 nt, from 30 nt to 80 nt, from 30 nt to 70 nt, from 30 nt to 60 nt, from 30 nt to 50 nt, from 50 nt to 200 nt, from 50 nt to 180 nt, from 50 nt to 160 nt, from 50 nt to 150 nt, from 50 nt to 125 nt, from 50 nt to 100 nt, from 50 nt to 90 nt, from 50 nt to 80 nt, from 50 nt to 70 nt, from 50 nt to 60 nt, from 70 nt to 200 nt, from 70 nt to 180 nt, from 70 nt to 160 nt, from 70 nt to 150 nt, from 70 nt to 125 nt, from 70 nt to 100 nt, from 70 nt to 90 nt, or from 70 nt to 80 nt). In some cases, a type V or type VI CRISPR/Cas guide RNA (e.g., Cpf1 guide RNA) has a total length of at least 30 nt (e.g., at least 40 nt, at least 50 nt, at least 60 nt, at least 70 nt, at least 80 nt, at least 90 nt, at least 100 nt, or at least 120 nt).
[00174] In some cases, a Cpf1 guide RNA has a total length of 35 nt, 36 nt, 37 nt, 38 nt, 39 nt, 40 nt, 41 nt, 42 nt, 43 nt, 44 nt, 45 nt, 46 nt, 47 nt, 48 nt, 49 nt, or 50 nt.
[00175] Like a Cas9 guide RNA, a type V or type VI CRISPR/Cas guide RNA (e.g., Cpf1 guide RNA) can include a target nucleic acid-binding segment and a duplex-forming region (e.g., in some cases formed from two duplex-forming segments, i.e., two stretches of nucleotides that hybridize to one another to form a duplex).
[00176] The target nucleic acid-binding segment of a type V or type VI CRISPR/Cas guide RNA (e.g., Cpf1 guide RNA) can have a length of from 15 nt to 30 nt, e.g., 15 nt, 16 nt, 17 nt, 18 nt, 19 nt, 20 nt, 21 nt, 22 nt, 23 nt, 24 nt, 25 nt, 26 nt, 27 nt, 28 nt, 29 nt, or 30 nt. In some cases, the target nucleic acid-binding segment has a length of 23 nt. In some cases, the target nucleic acid-binding segment has a length of 24 nt. In some cases, the target nucleic acid-binding segment has a length of 25 nt.
[00177] The guide sequence of a type V or type VI CRISPR/Cas guide RNA (e.g., Cpf1 guide RNA) can have a length of from 15 nt to 30 nt (e.g., 15 to 25 nt, 15 to 24 nt, 15 to 23 nt, 15 to 22 nt, 15 to 21 nt, 15 to 20 nt, 15 to 19 nt, 15 to 18 nt, 17 to 30 nt, 17 to 25 nt, 17 to 24 nt, 17 to 23 nt, 17 to 22 nt, 17 to 21 nt, 17 to 20 nt, 17 to 19 nt, 17 to 18 nt, 18 to 30 nt, 18 to 25 nt, 18 to 24 nt, 18 to 23 nt, 18 to 22 nt, 18 to 21 nt, 18 to 20 nt, 18 to 19 nt, 19 to 30 nt, 19 to 25 nt, 19 to 24 nt, 19 to 23 nt, 19 to 22 nt, 19 to 21 nt, 19 to 20 nt, 20 to 30 nt, 20 to 25 nt, 20 to 24 nt, 20 to 23 nt, 20 to 22 nt, 20 to 21 nt, 15 nt, 16 nt, 17 nt, 18 nt, 19 nt, 20 nt, 21 nt, 22 nt, 23 nt, 24 nt, 25 nt, 26 nt, 27 nt, 28 nt, 29 nt, or 30 nt). In some cases, the guide sequence has a length of 17 nt. In some cases, the guide sequence has a length of 18 nt. In some cases, the guide sequence has a length of 19 nt. In some cases, the guide sequence has a length of 20 nt. In some cases, the guide sequence has a length of 21 nt. In some cases, the guide sequence has a length of 22 nt. In some cases, the guide sequence has a length of 23 nt. In some cases, the guide sequence has a length of 24 nt.
[00178] The guide sequence of a type V or type VI CRISPR/Cas guide RNA (e.g., Cpf1 guide RNA) can have 100% complementarity with a corresponding length of target nucleic acid sequence. The guide sequence can have less than 100% complementarity with a corresponding length of target nucleic acid sequence. For example, the guide sequence of a type V or type VI CRISPR/Cas guide RNA (e.g., Cpf1 guide RNA) can have 1, 2, 3, 4, or 5 nucleotides that are not complementary to the target nucleic acid sequence. For example, in some cases, where a guide sequence has a length of 25 nucleotides, and the target nucleic acid sequence has a length of 25 nucleotides, the target nucleic acid-binding segment has 100% complementarity to the target nucleic acid sequence. As another example, in some cases, where a guide sequence has a length of 25 nucleotides, and the target nucleic acid sequence has a length of 25 nucleotides, the target nucleic acid-binding segment has 1 non-complementary nucleotide and 24 complementary nucleotides with the target nucleic acid sequence. As another example, in some cases, where a guide sequence has a length of 25 nucleotides, and the target nucleic acid sequence has a length of 25 nucleotides, the target nucleic acid-binding segment has 2 non-complementary nucleotides and 23 complementary nucleotides with the target nucleic acid sequence.
[00179] The duplex-forming segment of a type V or type VI CRISPR/Cas guide RNA (e.g., Cpf1 guide RNA) (e.g., of a targeter RNA or an activator RNA) can have a length of from 15 nt to 25 nt (e.g., 15 nt, 16 nt, 17 nt, 18 nt, 19 nt, 20 nt, 21 nt, 22 nt, 23 nt, 24 nt, or 25 nt).
[00180] The RNA duplex of a type V or type VI CRISPR/Cas guide RNA (e.g., Cpf1 guide RNA) can have a length of from 5 base pairs (bp) to 40 bp (e.g., from 5 to 35 bp, 5 to 30 bp, 5 to 25 bp, 5 to 20 bp, 5 to 15 bp, 5 to 12 bp, 5 to 10 bp, 5 to 8 bp, 6 to 40 bp, 6 to 35 bp, 6 to 30 bp, 6 to 25 bp, 6 to 20 bp, 6 to 15 bp, 6 to 12 bp, 6 to 10 bp, 6 to 8 bp, 7 to 40 bp, 7 to 35 bp, 7 to 30 bp, 7 to 25 bp, 7 to 20 bp, 7 to 15 bp, 7 to 12 bp, 7 to 10 bp, 8 to 40 bp, 8 to 35 bp, 8 to 30 bp, 8 to 25 bp, 8 to 20 bp, 8 to 15 bp, 8 to 12 bp, 8 to 10 bp, 9 to 40 bp, 9 to 35 bp, 9 to 30 bp, 9 to 25 bp, 9 to 20 bp, 9 to 15 bp, 9 to 12 bp, 9 to 10 bp, 10 to 40 bp, 10 to 35 bp, 10 to 30 bp, 10 to 25 bp, 10 to 20 bp, 10 to 15 bp, or 10 to 12 bp).
[00181] As an example, a duplex-forming segment of a Cpf1 guide RNA can comprise a nucleotide sequence selected from (5' to 3'): AAUUUCUACUGUUGUAGAU (SEQ ID NO: 1106), AAUUUCUGCUGUUGCAGAU (SEQ ID NO: 1107), AAUUUCCACUGUUGUGGAU (SEQ ID NO: 1095), AAUUCCUACUGUUGUAGGU (SEQ ID NO: 1109), AAUUUCUACUAUUGUAGAU (SEQ ID NO: 1110), AAUUUCUACUGCUGUAGAU (SEQ ID NO: 1111), AAUUUCUACUUUGUAGAU (SEQ ID NO: 1112), and AAUUUCUACUUGUAGAU (SEQ ID NO: 1113). The guide sequence can then follow (5' to 3') the duplex-forming segment.
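As a non-limiting sketch of the arrangement just described, the following Python function places a guide sequence 3' of a duplex-forming segment. The function name `build_cpf1_guide` and the 23-nucleotide guide sequence are hypothetical. The 15 nt to 30 nt check reflects the guide sequence lengths recited above, and accepting DNA-style input (T converted to U) is a convenience assumed for illustration.

```python
# Duplex-forming segment as listed above (SEQ ID NO: 1106), written 5' to 3'.
CPF1_DUPLEX_SEGMENT = "AAUUUCUACUGUUGUAGAU"


def build_cpf1_guide(guide_sequence: str, duplex_segment: str = CPF1_DUPLEX_SEGMENT) -> str:
    """Return duplex-forming segment followed (5' to 3') by the guide sequence."""
    guide_sequence = guide_sequence.upper().replace("T", "U")
    if set(guide_sequence) - set("ACGU"):
        raise ValueError("guide sequence must contain only A, C, G, U (or T)")
    if not 15 <= len(guide_sequence) <= 30:
        raise ValueError("guide sequence length outside the 15 nt to 30 nt range")
    return duplex_segment + guide_sequence


# Hypothetical 23-nt guide sequence, chosen arbitrarily for illustration.
example_guide = "GACCUUGAAGCUAGCUUACGGAU"
cpf1_guide_rna = build_cpf1_guide(example_guide)
print(cpf1_guide_rna)
print(len(cpf1_guide_rna))  # 42
```

With the 19-nt duplex-forming segment and a 23-nt guide sequence, the total length is 42 nt, which falls within the 35 nt to 50 nt total lengths recited above for a Cpf1 guide RNA.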
[00182] A non-limiting example of an activator RNA (e.g., tracrRNA) of a C2c1 guide RNA (dual guide or single guide) is an RNA that includes the nucleotide sequence GAAUUUUUCAACGGGUGUGCCAAUGGCCACUUUCCAGGUGGCAAAGCCCGUUGAGCUUCUCAAAAAG (SEQ ID NO: 1114). In some cases, a C2c1 guide RNA (dual guide or single guide) is an RNA that includes the nucleotide sequence GUCUAGAGGACAGAAUUUUUCAACGGGUGUGCCAAUGGCCACUUUCCAGGUGGCAAAGCCCGUUGAGCUUCUCAAAAAG (SEQ ID NO: 1115). In some cases, a C2c1 guide RNA (dual guide or single guide) is an RNA that includes the nucleotide sequence UCUAGAGGACAGAAUUUUUCAACGGGUGUGCCAAUGGCCACUUUCCAGGUGGCAAAGCCCGUUGAGCUUCUCAAAAAG (SEQ ID NO: 1116). A non-limiting example of an activator RNA (e.g., tracrRNA) of a C2c1 guide RNA (dual guide or single guide) is an RNA that includes the nucleotide sequence ACUUUCCAGGCAAAGCCCGUUGAGCUUCUCAAAAAG (SEQ ID NO: 1117). In some cases, a duplex-forming segment of a C2c1 guide RNA (dual guide or single guide) of an activator RNA (e.g., tracrRNA) includes the nucleotide sequence AGCUUCUCA (SEQ ID NO: 1118) or the nucleotide sequence GCUUCUCA (SEQ ID NO: 1119) (the duplex-forming segment from a naturally existing tracrRNA).
[00183] A non-limiting example of a targeter RNA (e.g. crRNA) of a C2c1 guide RNA (dual guide or single guide) is an RNA with the nucleotide sequence CUGAGAAGUGGCACNNNNNNNNNNNNNNNNNNNN (SEQ ID NO: 1120), where the Ns represent the guide sequence, which will vary depending on the target sequence, and although 20 Ns are depicted, a range of different lengths is acceptable. In some cases, a duplex-forming segment of a C2c1 guide RNA (dual guide or single guide) of a targeter RNA (e.g. crRNA) includes the nucleotide sequence CUGAGAAGUGGCAC (SEQ ID NO: 1121) or includes the nucleotide sequence CUGAGAAGU (SEQ ID NO: 1122) or includes the nucleotide sequence UGAGAAGUGGCAC (SEQ ID NO: 1123) or includes the nucleotide sequence UGAGAAGU (SEQ ID NO: 1124).
[00184] Examples and guidance related to type V or type VI CRISPR/Cas endonucleases (including domain structure) and guide RNAs (as well as information regarding requirements related to protospacer adjacent motif (PAM) sequences present in targeted nucleic acids) can be found in the art, for example, see Zetsche et al., Cell. 2015 Oct 22;163(3):759-71; Makarova et al., Nat Rev Microbiol. 2015 Nov;13(11):722-36; Shmakov et al., Mol Cell. 2015 Nov 5;60(3):385-97; and Shmakov et al., Nat Rev Microbiol. 2017 Mar;15(3):169-182; and U.S. patents and patent applications: 9,580,701; 20170073695, 20170058272, 20160362668, 20160362667, 20160298078, 20160289637, 20160215300, 20160208243, and 20160208241, all of which are hereby incorporated by reference in their entirety.
Target cells
[00185] A target nucleic acid (e.g., target genomic DNA) can be located within a eukaryotic cell, for example, inside of a eukaryotic cell in vitro, inside of a eukaryotic cell in vivo, inside of a eukaryotic cell ex vivo. Suitable target cells (which can comprise target nucleic acids) include, but are not limited to: a single-celled eukaryotic organism; a cell of a single-cell eukaryotic organism; a plant cell; an algal cell, e.g., Botryococcus braunii, Chlamydomonas reinhardtii, Nannochloropsis gaditana, Chlorella pyrenoidosa, Sargassum patens, C. agardh, and the like; a fungal cell (e.g., a yeast cell); an animal cell; a cell of an invertebrate animal (e.g. fruit fly, cnidarian, echinoderm, nematode, etc.); a cell of a vertebrate animal (e.g., fish, amphibian, reptile, bird, mammal); a cell of a mammal (e.g., a cell of a rodent such as a mouse or rat, a cell of a non-human primate, a cell of a human, etc.); and the like. Any type of cell may be of interest (e.g. a stem cell, e.g. an embryonic stem (ES) cell, an induced pluripotent stem (iPS) cell, a hematopoietic stem cell, a germ cell (e.g., an oocyte, a sperm, an oogonia, a spermatogonia, etc.), a somatic cell, e.g. a fibroblast, an oligodendrocyte, a glial cell, a hematopoietic cell, a neuron, a muscle cell, a bone cell, a hepatocyte, a pancreatic cell; an in vitro or in vivo embryonic cell of an embryo at any stage, e.g., a 1-cell, 2-cell, 4-cell, 8-cell, etc. stage zebrafish embryo; etc.). Cells may be from established cell lines or they may be primary cells, where "primary cells", "primary cell lines", and "primary cultures" are used interchangeably herein to refer to cells and cell cultures that have been derived from a subject and allowed to grow in vitro for a limited number of passages, i.e. splittings, of the culture. For example, primary cultures are cultures that may have been passaged 0 times, 1 time, 2 times, 4 times, 5 times, 10 times, or 15 times, but not enough times to go through the crisis stage. Typically, the primary cell lines are maintained for fewer than 10 passages in vitro. Target cells can be unicellular organisms and/or can be grown in culture. If the cells are primary cells, they may be harvested from an individual by any convenient method. For example, leukocytes may be conveniently harvested by apheresis, leukocytapheresis, density gradient separation, etc., while cells from tissues such as skin, muscle, bone marrow, spleen, liver, pancreas, lung, intestine, stomach, etc. can be conveniently harvested by biopsy.
[00186] Target cells include in vivo target cells. Target cells include retinal cells (e.g., Muller cells, ganglion cells, amacrine cells, horizontal cells, bipolar cells, and photoreceptor cells including rods and cones, Muller glial cells, and retinal pigmented epithelium); neural cells (e.g., cells of the thalamus, sensory cortex, zona incerta (ZI), ventral tegmental area (VTA), prefrontal cortex (PFC), nucleus accumbens (NAc), amygdala (BLA), substantia nigra, ventral pallidum, globus pallidus, dorsal striatum, ventral striatum, subthalamic nucleus, hippocampus, dentate gyrus, cingulate gyrus, entorhinal cortex, olfactory cortex, primary motor cortex, or cerebellum); liver cells; kidney cells; immune cells (e.g., T cells, B cells, and the like); cardiac cells; skeletal muscle cells; smooth muscle cells; lung cells; blood cells; pancreas cells; bone cells; and the like.
Guide nucleic acid (guide RNA) positioning
[00187] As noted above, the guide sequences of the guide RNAs are targeted to target sequences of the target DNA, and the guide sequences are positioned relative to one another such that upon hybridization to the target sequences the fusion proteins to which the guide RNAs are bound are oriented so that the split BirA* portions of fusion proteins interact to form a functional BirA* protein. This can be achieved in a number of ways, and can depend on the orientation of the fusion proteins (e.g., whether the split BirA* portion is fused to the N-terminus, C-terminus, or internal to the CRISPR/Cas nucleic acid-guided protein).
[00188] A CRISPR/Cas nucleic acid-guided protein (e.g., Cas9, Cpf1, CasX, CasY, and the like) binds to a double stranded target DNA at a target sequence defined by both (i) base-pairing complementarity between the guide RNA and the target DNA; and (ii) a short motif [referred to as the protospacer adjacent motif (PAM)] in the target DNA. The location/orientation of the PAM relative to the target sequence can vary from protein to protein, and for a given protein, the species from which the protein is derived can define the sequence of the PAM. For example, the PAM for a natural Cas9 protein is immediately 3' of the target sequence of the non-complementary strand of the target DNA (the complementary strand hybridizes to the guide sequence of the guide RNA, while the non-complementary strand does not directly hybridize with the guide RNA and is the reverse complement of the complementary strand), while the PAM for a natural CasX protein is immediately 5' of the target sequence of the non-complementary strand of the target DNA. In some embodiments (e.g., when Cas9 from S. pyogenes is used), the PAM sequence of the non-complementary strand is 5'-NGG-3', where N is any DNA nucleotide. In some embodiments (e.g., when CasX1 as described herein is used), the PAM sequence of the non-complementary strand is 5'-TCN-3' (and in some cases TTCN), where N is any DNA nucleotide.
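The PAM placement rules above can be illustrated by a minimal sketch that scans one strand (taken here as the non-complementary strand, written 5' to 3') for candidate target sequences. The function names, the fixed 20-nt target length, and the example sequences are assumptions for illustration; a practical design tool would also scan the opposite strand and apply further filters.

```python
import re


def find_targets_with_3prime_pam(strand, pam_regex="[ACGT]GG", target_len=20):
    """Find targets with the PAM immediately 3' (e.g., 5'-NGG-3' for S. pyogenes Cas9).

    Returns (target_start, target, pam) tuples using 0-based positions.
    """
    # The lookahead makes the scan report overlapping candidates as well.
    pattern = re.compile(rf"(?=([ACGT]{{{target_len}}})({pam_regex}))")
    return [(m.start(), m.group(1), m.group(2))
            for m in pattern.finditer(strand.upper())]


def find_targets_with_5prime_pam(strand, pam_regex="TC[ACGT]", target_len=20):
    """Find targets with the PAM immediately 5' (e.g., 5'-TCN-3' for CasX)."""
    pattern = re.compile(rf"(?=({pam_regex})([ACGT]{{{target_len}}}))")
    return [(m.start() + len(m.group(1)), m.group(2), m.group(1))
            for m in pattern.finditer(strand.upper())]


if __name__ == "__main__":
    cas9_example = "A" * 20 + "TGG" + "CCCC"           # made-up sequence
    casx_example = "GTCA" + "ACGT" * 5 + "AA"          # made-up sequence
    print(find_targets_with_3prime_pam(cas9_example))
    print(find_targets_with_5prime_pam(casx_example))
```

Passing a different `pam_regex` would adapt the same scan to a protein with a different PAM requirement.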
[00189] In some cases, different CRISPR/Cas nucleic acid-guided proteins (e.g., Cas9, Cpf1, CasX, CasY, and the like), e.g., from various different species, may be advantageous to use in the various provided methods in order to capitalize on desired enzymatic characteristics of the different proteins (e.g., for different PAM sequence preferences; for increased or decreased enzymatic activity; for an increased or decreased level of cellular toxicity; to change the balance between NHEJ, homology-directed repair, single strand breaks, double strand breaks, etc.; to take advantage of a short amino acid sequence; and the like). CRISPR/Cas nucleic acid-guided proteins from different species may require different PAM sequences in the target DNA. Thus, for a particular protein of choice, the PAM sequence requirement may be different than the 5'-NGG-3' or 5'-TCN-3' described above. Various methods (including in silico and/or wet lab methods) for identification of the appropriate PAM sequence for any given CRISPR/Cas nucleic acid-guided protein are known in the art and are routine, and any convenient method can be used (e.g., an in silico sequence analysis and/or a PAM depletion assay can be used).
[00190] In some embodiments, the first target sequence is immediately adjacent a first protospacer adjacent motif (PAM); the second target sequence is immediately adjacent a second protospacer adjacent motif (PAM); and the first and second PAMs are separated by 15 to 100 base pairs (bp). In some cases, the first target sequence is immediately adjacent a first protospacer adjacent motif (PAM); the second target sequence is immediately adjacent a second protospacer adjacent motif (PAM); and the first and second PAMs are separated by 15 to 100 base pairs (bp) (e.g., separated by 15 to 80 bp, 15 to 60 bp, 15 to 50 bp, 15 to 40 bp, 15 to 30 bp, 15 to 25 bp, 25 to 100 bp, 25 to 80 bp, 25 to 60 bp, 25 to 50 bp, 25 to 40 bp, 25 to 35 bp, 35 to 100 bp, 35 to 80 bp, 35 to 60 bp, 35 to 55 bp, 35 to 50 bp, 35 to 45 bp, 55 to 100 bp, 55 to 80 bp, or 55 to 75 bp). For example, if two guide RNAs are used and each has a guide sequence of 20 nucleotides (20 nt), where the two target sequences have no overlap but have no intervening base pairs (they are adjacent one another), then the first and second PAMs would be separated by 40 bp. As another example, if two guide RNAs are used and each has a guide sequence of 20 nucleotides (20 nt), where the two target sequences have no overlap and have 20 intervening base pairs, then the first and second PAMs would be separated by 60 bp. In some cases the first and second PAMs are separated by 35 to 80 bp. In some cases the first and second PAMs are separated by 35 to 55 bp. In some cases the first and second PAMs are separated by 55 to 75 bp.
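The arithmetic in the two worked examples above can be written as a minimal sketch. It assumes, consistent with those examples, that the two PAMs flank the outer ends of the pair of target sequences, so the separation is the sum of the two target lengths plus any intervening base pairs. The function names are hypothetical.

```python
def pam_separation(target_len_1: int, target_len_2: int, intervening_bp: int) -> int:
    """Base pairs between two PAMs that flank the outer ends of two targets.

    A negative intervening_bp can be used to represent overlapping targets.
    """
    return target_len_1 + target_len_2 + intervening_bp


def separation_in_range(separation: int, low: int = 15, high: int = 100) -> bool:
    """Check a separation against a range such as the 15 to 100 bp stated above."""
    return low <= separation <= high


if __name__ == "__main__":
    print(pam_separation(20, 20, 0))    # adjacent 20-nt targets
    print(pam_separation(20, 20, 20))   # 20 intervening bp
    print(separation_in_range(pam_separation(20, 20, 20), low=55, high=75))
```

Other PAM orientations (for example, PAMs facing one another) would need a different formula; this sketch covers only the arrangement implied by the examples in the text.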
[00191] In some cases the first and second guide nucleic acids (e.g., guide RNAs), bind to target sites on a target DNA that are adjacent one another. In some cases the first and second guide nucleic acids (e.g., guide RNAs), bind to target sites on a target DNA that overlap. Thus, in some embodiments, the first and second guide nucleic acids (e.g., guide RNAs) hybridize to overlapping target sequences (on opposite strands of the target DNA). In some embodiments, the first and second guide nucleic acids (e.g., guide RNAs) hybridize to target sequences that do not overlap (can be to same or different strand of the target DNA, depending on the orientation of the fusion proteins). In some embodiments, the first and second guide nucleic acids (e.g., guide RNAs) hybridize to target sequences that are separated from one another by 0 to 60 intervening base pairs (bp) (e.g., from 0 to 50, 0 to 40, 0 to 30, 0 to 20, 0 to 15, 0 to 10, 5 to 60, 5 to 50, 5 to 40, 5 to 30, 5 to 20, 5 to 15, 5 to 10, 10 to 60, 10 to 50, 10 to 40, 10 to 30, 10 to 20, 10 to 15, 20 to 60, 20 to 50, 20 to 40, or 20 to 30 intervening bp).
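The adjacent, overlapping, and separated arrangements described above can be distinguished from target coordinates, as in the following minimal sketch. It assumes each target sequence is given as a half-open (start, end) interval in base-pair coordinates on the same reference sequence; the function names and coordinates are hypothetical.

```python
def intervening_bp(site_a, site_b):
    """Return intervening bp between two (start, end) half-open intervals.

    Zero means the targets are adjacent; a negative value is the overlap size.
    """
    latest_start = max(site_a[0], site_b[0])
    earliest_end = min(site_a[1], site_b[1])
    return latest_start - earliest_end


def describe_spacing(site_a, site_b):
    """Summarize how two target sequences are positioned relative to each other."""
    gap = intervening_bp(site_a, site_b)
    if gap < 0:
        return f"overlap of {-gap} bp"
    if gap == 0:
        return "adjacent (0 intervening bp)"
    return f"{gap} intervening bp"


if __name__ == "__main__":
    print(describe_spacing((100, 120), (120, 140)))
    print(describe_spacing((100, 120), (140, 160)))
    print(describe_spacing((100, 120), (110, 130)))
```

A value between 0 and 60 from `intervening_bp` would fall within the 0 to 60 intervening bp range recited above.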
[00192] In some embodiments, the first and second guide nucleic acids (e.g., guide RNAs) hybridize to target sequences where the PAMs (and target sequences) are farther apart than 100 bp. Likewise, in some embodiments, the first and second guide nucleic acids (e.g., guide RNAs) hybridize to target sequences that are separated by more than 60 bp. For example, in some cases regions of DNA that are very far apart (even megabases apart) can physically interact. For example, in some such cases distant enhancers influence expression from a promoter. Thus, the first and second guide nucleic acids (e.g., guide RNAs) can hybridize to target sequences that are far apart when considering primary nucleotide sequence, but are actually physically close to one another at particular points in time (e.g., while a distant enhancer is influencing expression from a promoter). As an illustrative example, if one were interested in tagging proteins that interact at a promoter sequence when a particular distant enhancer is actively influencing expression from that promoter, one of the guide nucleic acids (guide RNAs) could be targeted to a sequence at or near the promoter while the other could be targeted to a distant enhancer sequence - in such a case, target proteins would only be tagged when a functional BirA* is present, and that would only happen at times when the two split BirA* portions of the subject fusion proteins are brought together (such would be the case when/if the distant enhancer is physically near the promoter sequence despite being far apart when considering the nucleotide sequence).
[00193] In some embodiments, which fusion protein (first or second) binds to which guide nucleic acid (first or second) is not controlled - thus in some cases each fusion protein can bind to either of the two guide nucleic acids. In some cases which fusion protein (first or second) binds to which guide nucleic acid (first or second) is controlled. For example, the nuclease defective CRISPR/Cas nucleic acid-guided polypeptide (e.g., dCas9) of the first fusion protein can be a protein (or can be from a species) that can bind to the first guide nucleic acid but cannot bind to the second guide nucleic acid. Likewise, the nuclease defective CRISPR/Cas nucleic acid-guided polypeptide (e.g., dCas9) of the second fusion protein can be a protein (or can be from a species) that can bind to the second guide nucleic acid but cannot bind to the first guide nucleic acid.
Nucleic acids
[00194] In some cases, a subject method or composition/kit includes a nucleic acid (RNA or DNA) encoding a subject fusion protein (e.g., a CRISPR/Cas nucleic acid-guided protein fused to a split BirA* protein portion) and/or a CRISPR/Cas guide nucleic acid (e.g., guide RNA). In some cases, a subject method or composition/kit includes a DNA (e.g., recombinant expression vector) encoding a subject fusion protein (e.g., a CRISPR/Cas nucleic acid-guided protein fused to a split BirA* protein portion) and/or a CRISPR/Cas guide nucleic acid (e.g., guide RNA). In some cases a subject fusion protein is encoded on the same nucleic acid molecule (e.g., viral DNA, plasmid DNA) as a CRISPR/Cas guide nucleic acid. In some cases a subject fusion protein is encoded on a different nucleic acid molecule (e.g., viral DNA, plasmid DNA, mRNA) as a CRISPR/Cas guide nucleic acid. Thus, in some cases a subject method or composition/kit includes one or more nucleic acids (RNA or DNA) encoding one or more subject fusion proteins (e.g., CRISPR/Cas nucleic acid-guided protein fused to a split BirA* protein portion) and/or one or more CRISPR/Cas guide nucleic acids (e.g., guide RNAs).
[00195] Many vectors, e.g. plasmids, cosmids, minicircles, phage, viruses, etc., useful for transferring nucleic acids into target cells are available. The vectors comprising the nucleic acid(s) may be maintained episomally, e.g. as plasmids, minicircle DNAs, viruses such as cytomegalovirus, adenovirus, etc., or they may be integrated into the target cell genome, through homologous recombination or random integration, e.g. retrovirus-derived vectors such as MMLV, HIV-1, ALV, etc.
[00196] Vectors may be provided directly to cells. In other words, cells can be contacted with one or more vectors encoding subject fusion proteins and/or corresponding CRISPR/Cas guide nucleic acids (e.g., guide RNAs) such that the vectors are taken up by the cells. Methods for contacting cells with nucleic acid vectors that are plasmids, including electroporation, calcium chloride transfection, microinjection, and lipofection are well known in the art. For viral vector delivery, the cells can be contacted with viral particles comprising the nucleic acid. Retroviruses, for example, lentiviruses, are particularly suitable to the method of the invention. Commonly used retroviral vectors are "defective", i.e. unable to produce viral proteins required for productive infection. Rather, replication of the vector requires growth in a packaging cell line. To generate viral particles comprising nucleic acids of interest, the retroviral nucleic acids comprising the nucleic acid can be packaged into viral capsids by a packaging cell line. Different packaging cell lines provide a different envelope protein (ecotropic, amphotropic or xenotropic) to be incorporated into the capsid, this envelope protein determining the specificity of the viral particle for the cells (ecotropic for murine and rat; amphotropic for most mammalian cell types including human, dog and mouse; and xenotropic for most mammalian cell types except murine cells). The appropriate packaging cell line may be used to ensure that the cells are targeted by the packaged viral particles. Methods of introducing vectors into packaging cell lines and of collecting the viral particles that are generated by the packaging lines are well known in the art. Nucleic acids can also be introduced by direct micro-injection.
[00197] Vectors used for providing subject fusion proteins and/or corresponding CRISPR/Cas guide nucleic acids (e.g., guide RNAs) to target cells can include suitable promoters for driving expression, that is, transcriptional activation, of the nucleic acid of interest. In other words, the nucleic acid of interest (e.g., nucleic acids that include nucleotide sequences encoding subject fusion proteins and/or corresponding CRISPR/Cas guide RNAs) can be operably linked to a promoter (e.g., a promoter operable in the target cell). This may include ubiquitously acting promoters (e.g., CMV promoter, β-actin promoter, EF-1 alpha promoter, and the like), or inducible promoters (e.g., promoters that are active in particular cell populations or that respond to the presence of drugs such as tetracycline). A promoter can be a pol II promoter or a pol III promoter (e.g., for expressing guide RNA) such as a U6 promoter or an H1 promoter. By transcriptional activation, it is intended that transcription will be increased above basal levels in the target cell by at least about 10 fold, by at least about 100 fold, more usually by at least about 1000 fold. In addition, suitable vectors may include nucleic acid sequences that encode for selectable markers in the target cells, so as to identify cells that have taken up the introduced nucleic acid.
[00198] A subject fusion protein (nuclease defective CRISPR/Cas nucleic acid-guided protein, e.g., Cas9, fused to a portion of a split BirA* protein) can be introduced into cells as a polypeptide (e.g., in some cases complexed with a guide RNA, thus forming an RNP). Such a polypeptide may optionally be fused to a polypeptide domain that increases solubility of the product. The domain may be linked to the polypeptide through a defined protease cleavage site, e.g. a TEV sequence, which is cleaved by TEV protease. The linker may also include one or more flexible sequences, e.g. from 1 to 10 glycine residues. In some embodiments, the cleavage of the fusion protein is performed in a buffer that maintains solubility of the product, e.g. in the presence of from 0.5 to 2 M urea, in the presence of polypeptides and/or polynucleotides that increase solubility, and the like. Domains of interest include endosomolytic domains, e.g. influenza HA domain; and other polypeptides that aid in production, e.g. IF2 domain, GST domain, GRPE domain, and the like. The polypeptide may be formulated for improved stability. For example, the peptides may be PEGylated, where the polyethyleneoxy group provides for enhanced lifetime in the blood stream.
[00199] Additionally or alternatively, a subject fusion protein (nuclease defective CRISPR/Cas nucleic acid-guided protein, e.g., Cas9, fused to a portion of a split BirA* protein) may be fused to a polypeptide permeant domain to promote uptake by the cell. A number of permeant domains are known in the art and may be used in the non-integrating polypeptides of the present invention, including peptides, peptidomimetics, and non-peptide carriers. For example, a permeant peptide may be derived from the third alpha helix of Drosophila melanogaster transcription factor Antennapedia, referred to as penetratin, which comprises the amino acid sequence RQIKIWFQNRRMKWKK (SEQ ID NO: 1129). As another example, the permeant peptide comprises the HIV-1 tat basic region amino acid sequence, which may include, for example, amino acids 49-57 of naturally-occurring tat protein. Other permeant domains include poly-arginine motifs, for example, the region of amino acids 34-56 of HIV-1 rev protein, nona-arginine, octa-arginine, and the like. (See, for example, Futaki et al. (2003) Curr Protein Pept Sci. 2003 Apr; 4(2): 87-9 and 446; and Wender et al. (2000) Proc. Natl. Acad. Sci. U.S.A 2000 Nov. 21; 97(24): 13003-8; published U.S. Patent applications 20030220334; 20030083256; 20030032593; and 20030022831, herein specifically incorporated by reference for the teachings of translocation peptides and peptoids). The nona-arginine (R9) sequence is one of the more efficient PTDs that have been characterized (Wender et al. 2000; Uemura et al. 2002). The site at which the fusion is made may be selected in order to optimize the biological activity, secretion or binding characteristics of the polypeptide. The optimal site will be determined by routine experimentation.
[00200] A subject fusion protein (nuclease defective CRISPR/Cas nucleic acid-guided protein, e.g., Cas9, fused to a portion of a split BirA* protein) may be produced in vitro or by eukaryotic cells or by prokaryotic cells, and it may be further processed by unfolding, e.g. heat denaturation, DTT reduction, etc. and may be further refolded, using methods known in the art.
[00201] Modifications of interest that do not alter primary sequence include chemical derivatization of polypeptides, e.g., acylation, acetylation, carboxylation, amidation, etc. Also included are modifications of glycosylation, e.g. those made by modifying the glycosylation patterns of a polypeptide during its synthesis and processing or in further processing steps; e.g. by exposing the polypeptide to enzymes which affect glycosylation, such as mammalian glycosylating or deglycosylating enzymes. Also embraced are sequences that have phosphorylated amino acid residues, e.g. phosphotyrosine, phosphoserine, or phosphothreonine.
[00202] A subject fusion protein (nuclease defective CRISPR/Cas nucleic acid-guided protein, e.g., Cas9, fused to a portion of a split BirA* protein) may be isolated and purified, e.g., in accordance with conventional methods of recombinant synthesis. A lysate may be prepared of the expression host and the lysate purified using HPLC, exclusion chromatography, gel electrophoresis, affinity chromatography, or other purification technique. The compositions which are used can comprise at least 20% by weight of the desired product, more usually at least 75% by weight, e.g., at least 95% by weight, in relation to contaminants related to the method of preparation of the product and its purification. The percentages can be based upon total protein.
Target cells
[00203] A target nucleic acid (e.g., target DNA, target genomic DNA) can be located within any desired type of cell. For example, a target nucleic acid (e.g., target DNA, target genomic DNA) can be located within a eukaryotic cell, for example, inside of a eukaryotic cell in vitro, inside of a eukaryotic cell in vivo, inside of a eukaryotic cell ex vivo. Suitable target cells (which can comprise target nucleic acids) include, but are not limited to: a single-celled eukaryotic organism; a cell of a single-cell eukaryotic organism; a plant cell; an algal cell, e.g., Botryococcus braunii, Chlamydomonas reinhardtii, Nannochloropsis gaditana, Chlorella pyrenoidosa, Sargassum patens, C. agardh, and the like; a fungal cell (e.g., a yeast cell); an animal cell; a cell of an invertebrate animal (e.g. fruit fly, cnidarian, echinoderm, nematode, etc.); a cell of a vertebrate animal (e.g., fish, amphibian, reptile, bird, mammal); a cell of a mammal (e.g., a cell of a rodent such as a mouse or rat, a cell of a non-human primate, a cell of a human, etc.); and the like. In some cases a target nucleic acid is located inside of a prokaryotic cell.
[00204] Any type of cell may be of interest (e.g. a stem cell, e.g. an embryonic stem (ES) cell, an induced pluripotent stem (iPS) cell, a hematopoietic stem cell, a germ cell (e.g., an oocyte, a sperm, an oogonia, a spermatogonia, etc.), a somatic cell, e.g. a fibroblast, an oligodendrocyte, a glial cell, a hematopoietic cell, a neuron, a muscle cell, a bone cell, a hepatocyte, a pancreatic cell; an in vitro or in vivo embryonic cell of an embryo at any stage, e.g., a 1-cell, 2-cell, 4-cell, 8-cell, etc. stage zebrafish embryo; etc.). Cells may be from established cell lines or they may be primary cells, where "primary cells", "primary cell lines", and "primary cultures" are used interchangeably herein to refer to cells and cell cultures that have been derived from a subject and allowed to grow in vitro for a limited number of passages, i.e. splittings, of the culture. For example, primary cultures are cultures that may have been passaged 0 times, 1 time, 2 times, 4 times, 5 times, 10 times, or 15 times, but not enough times to go through the crisis stage. Typically, the primary cell lines are maintained for fewer than 10 passages in vitro. Target cells can be unicellular organisms and/or can be grown in culture. If the cells are primary cells, they may be harvested from an individual by any convenient method. For example, leukocytes may be conveniently harvested by apheresis, leukocytapheresis, density gradient separation, etc., while cells from tissues such as skin, muscle, bone marrow, spleen, liver, pancreas, lung, intestine, stomach, etc. can be conveniently harvested by biopsy.
[00205] Target cells include in vivo target cells. Target cells include retinal cells (e.g., Muller cells, ganglion cells, amacrine cells, horizontal cells, bipolar cells, and photoreceptor cells including rods and cones, Muller glial cells, and retinal pigmented epithelium); neural cells (e.g., cells of the thalamus, sensory cortex, zona incerta (ZI), ventral tegmental area (VTA), prefrontal cortex (PFC), nucleus accumbens (NAc), amygdala (BLA), substantia nigra, ventral pallidum, globus pallidus, dorsal striatum, ventral striatum, subthalamic nucleus, hippocampus, dentate gyrus, cingulate gyrus, entorhinal cortex, olfactory cortex, primary motor cortex, or cerebellum); liver cells; kidney cells; immune cells (e.g., T cells, B cells, and the like); cardiac cells; skeletal muscle cells; smooth muscle cells; lung cells; and the like.
Introducing components (e.g., into a cell)
[00206] In some cases a subject method takes place outside of a cell in vitro (e.g., in a vessel such as a test tube). In some cases, a subject method takes place inside of a cell (e.g., in vivo, ex vivo, or inside of a cell in vitro, e.g., in culture). Whether a subject method takes place outside of a cell or inside of a cell, in some cases, a subject method includes providing biotin. In some such cases, e.g., cells in culture can be supplemented with exogenous biotin (e.g., by supplying biotin in the culture medium).
[00207] In some cases, a subject method includes introducing at least one of the following (and in some cases all of the following) into a cell: (i) the first fusion protein or a nucleic acid encoding the same; (ii) the second fusion protein or a nucleic acid encoding the same; (iii) the first guide nucleic acid or a nucleic acid encoding the same; and (iv) the second guide nucleic acid or a nucleic acid encoding the same.
[00208] Components described herein can be delivered (introduced into a cell) as DNA, RNA, or protein. For example, when the composition includes a subject fusion protein (nuclease defective CRISPR/Cas nucleic acid-guided protein, e.g., Cas9, fused to a portion of a split BirA* protein) and a corresponding guide nucleic acid (e.g., guide RNA), the fusion protein(s) and guide RNA(s) can be delivered (introduced into the cell) as an RNP complex (i.e., a pre-assembled complex of the protein and guide). Thus, a subject fusion protein can be introduced into a cell as a protein. Alternatively, a subject fusion protein can be introduced into a cell as a nucleic acid (DNA and/or RNA) encoding the fusion protein. A CRISPR/Cas guide nucleic acid (e.g., guide RNA) can be introduced into a cell as RNA, as DNA encoding the guide RNA, or as a DNA/RNA hybrid molecule (any of which - RNA, DNA, RNA/DNA hybrid - can include non-natural modifications).
[00209] Examples of ways to introduce the above components to cells (e.g., prokaryotic cells, eukaryotic cells, plant cells, animal cells, mammalian cells, human cells, etc.) include but are not limited to: viral or bacteriophage infection, transfection, conjugation, protoplast fusion, lipofection, electroporation, calcium phosphate precipitation, polyethyleneimine (PEI)-mediated transfection, DEAE-dextran mediated transfection, liposome-mediated transfection, particle gun technology, direct microinjection, nanoparticle-mediated nucleic acid delivery (see, e.g., Panyam et al., Adv Drug Deliv Rev. 2012 Sep 13. pii: S0169-409X(12)00283-9. doi: 10.1016/j.addr.2012.09.023), and the like.
[00210] The components can be introduced into a cell in vivo (e.g., administered to an individual) and/or in vitro (e.g., delivered to a cell in culture) and/or ex vivo using any convenient method. For example, for a cell in vivo, administration can be local or systemic, e.g., by injection, oral, parenteral, subcutaneous, intravenous, intracranial, intraspinal, intraocular, into spinal fluid, intratumoral, topical, and the like. In some cases, introduction can include nucleofection, electroporation, and the like. In some cases, introduction does not include nucleofection or electroporation.
Detection / isolation
[00211] In some cases, a subject method includes a step of detecting, identifying, and/or isolating one or more (e.g., 2 or more, 3 or more, 4 or more, 5 or more, etc.) target proteins tagged by a subject corresponding pair of split BirA* fusion proteins. Biotinylated proteins can be enriched and/or isolated from a sample using any convenient technique, e.g., any biotin-affinity enrichment and/or purification approach such as NeutrAvidin-, avidin-, and/or streptavidin-coated beads (e.g., magnetic beads). The biotinylated proteins can be identified by any convenient method, e.g., by mass spectrometry. For example, mass spectrometry (e.g., tandem mass spectrometry such as LC-MS/MS) can then be used to quantify and/or identify the proteins that are tagged. Results from mass spectrometry can be compared to control experiments (e.g., experiments in which one or more guide RNAs were not present, guide RNAs of inappropriate location were present, one or more fusion proteins were not present, and the like).
Kits and Compositions
[00212] In some cases, a subject kit and/or subject composition includes (a) a first fusion protein (and/or a nucleic acid encoding the first fusion protein) where the first fusion protein comprises a first nuclease defective CRISPR/Cas nucleic acid-guided polypeptide (e.g., dCas9) fused to a first portion of a split BirA* protein; (b) a first guide nucleic acid (and/or a nucleic acid encoding the first guide nucleic acid) where the first guide nucleic acid binds to the first fusion protein; (c) a second fusion protein (and/or a nucleic acid encoding the second fusion protein) where the second fusion protein comprises a second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide (e.g., dCas9) fused to a second portion of the split BirA* protein; and (d) a second guide nucleic acid (and/or a nucleic acid encoding the second guide nucleic acid) where the second guide nucleic acid binds to the second fusion protein. In such cases, the fusion proteins (a) and (c) are a corresponding pair of split BirA* fusion proteins (i.e., the first and second portions of the split BirA* protein form a functional BirA* protein when they are brought together, e.g., when the CRISPR/Cas nucleic acid-guided polypeptide portions of the fusion proteins, e.g., dCas9 portions of the fusion proteins, are brought near each other at a target DNA via hybridization of the corresponding guide RNAs - see elsewhere herein for details regarding positioning of guide nucleic acids). In some cases, a subject kit and/or subject composition includes (a) a first fusion protein (and/or a nucleic acid encoding the first fusion protein) where the first fusion protein comprises a first nuclease defective CRISPR/Cas nucleic acid-guided polypeptide (e.g., dCas9) fused to a first portion of a split BirA* protein; and (b) a second fusion protein (and/or a nucleic acid encoding the second fusion protein) where the second fusion protein comprises a second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide (e.g., dCas9) fused to a second portion of the split BirA* protein. In such cases, the fusion proteins (a) and (b) are a corresponding pair of split BirA* fusion proteins (i.e., the first and second portions of the split BirA* protein form a functional BirA* protein when they are brought together, e.g., when the CRISPR/Cas nucleic acid-guided polypeptide portions of the fusion proteins, e.g., dCas9 portions of the fusion proteins, are brought near one another at a target DNA via hybridization of corresponding guide nucleic acids - see elsewhere herein for details regarding positioning of the guide nucleic acids).
[00213] The components of a subject kit can be in the same or separate containers. A composition and/or kit can further include one or more additional reagents, e.g., selected from: a dilution buffer; a reconstitution solution (e.g., for reconstituting lyophilized fusion protein(s)); a wash buffer; a control reagent; a control expression vector or RNA polynucleotide; a reagent for in vitro production of a subject fusion protein from DNA or RNA; a reagent for in vitro production of a CRISPR/Cas guide RNA from DNA; and the like.
[00214] In addition to above-mentioned components, a subject kit can further include instructions for using the components of the kit to practice the subject methods. The instructions for practicing the subject methods are generally recorded on a suitable recording medium. For example, the instructions may be printed on a substrate, such as paper or plastic, etc. As such, the instructions may be present in the kits as a package insert, in the labeling of the container of the kit or components thereof (i.e., associated with the packaging or subpackaging) etc. In other embodiments, the instructions are present as an electronic storage data file present on a suitable computer readable storage medium, e.g. CD-ROM, diskette, flash drive, etc. In yet other embodiments, the actual instructions are not present in the kit, but means for obtaining the instructions from a remote source, e.g. via the internet, are provided. An example of this embodiment is a kit that includes a web address where the instructions can be viewed and/or from which the instructions can be downloaded. As with the instructions, this means for obtaining the instructions is recorded on a suitable substrate.
EXAMPLES OF NON-LIMITING ASPECTS OF THE DISCLOSURE
[00215] Aspects, including embodiments, of the present subject matter described above may be beneficial alone or in combination with one or more other aspects or embodiments. Without limiting the foregoing description, certain non-limiting aspects of the disclosure numbered 1-30 are provided below. As will be apparent to those of skill in the art upon reading this disclosure, each of the individually numbered aspects may be used or combined with any of the preceding or following individually numbered aspects. This is intended to provide support for all such combinations of aspects and is not limited to combinations of aspects explicitly provided below:
1. A method of tagging target proteins in proximity to a nucleotide sequence of interest, the method comprising contacting a target nucleic acid that comprises the nucleotide sequence of interest with: (a) a first fusion protein comprising a first nuclease defective CRISPR/Cas nucleic acid-guided polypeptide fused to a first portion of a split BirA* protein; (b) a first guide nucleic acid that binds to the first fusion protein and hybridizes to a first target sequence of the target nucleic acid; (c) a second fusion protein comprising a second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide fused to a second portion of the split BirA* protein; and (d) a second guide nucleic acid that binds to the second fusion protein and hybridizes to a second target sequence of the target nucleic acid;
wherein the first and second portions of the split BirA* protein are brought together, forming a functional BirA* protein, by targeting of a first complex, comprising (a) and (b), to the first target sequence and targeting of a second complex, comprising (c) and (d), to the second target sequence, and wherein the functional BirA* protein biotinylates target proteins within its proximity.
2. The method of 1, wherein the first target sequence is immediately adjacent a first protospacer adjacent motif (PAM); the second target sequence is immediately adjacent a second protospacer adjacent motif (PAM); and the first and second PAMs are separated by 35 to 80 base pairs (bp).
3. The method of 2, wherein the first and second PAMs are separated by 55 to 75 bp.
4. The method of 2, wherein the first and second PAMs are separated by 35 to 55 bp.
5. The method of 1, wherein the first and second target sequences are separated from one another by more than 60 intervening base pairs (bp).
6. The method of any one of 1-5, wherein the first and second nuclease defective CRISPR/Cas nucleic acid-guided polypeptides are each selected from nuclease defective forms of one of: Cas9, Cpf1, CasX, and CasY.
7. The method of any one of 1-6, wherein the first and/or second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide is Cas9.
8. The method of any one of 1-7, wherein the nucleotide sequence of interest is located within a eukaryotic cell genome and said contacting takes place within the eukaryotic cell.
9. The method of any one of 1-8, wherein said contacting comprises introducing into a cell (i) the first protein or a nucleic acid encoding the same; (ii) the second fusion protein or a nucleic acid encoding the same; (iii) the first guide nucleic acid or a nucleic acid encoding the same; and (iv) the second guide nucleic acid or a nucleic acid encoding the same.
10. The method of 9, wherein the cell is a eukaryotic cell.
11. The method of any one of 1-10, further comprising a step of detecting, isolating, and/or identifying one or more of the biotinylated target proteins.
12. The method of any one of 1-11, wherein (i) a linker separates the first nuclease defective CRISPR/Cas nucleic acid-guided polypeptide from the first portion of the split BirA* protein, and/or (ii) a linker separates the second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide from the second portion of the split BirA* protein.
13. The method of any one of 1-12, wherein the first portion of the split BirA* protein is fused to the C-terminus or N-terminus of the first nuclease defective CRISPR/Cas nucleic acid-guided polypeptide, and the second portion of the split BirA* protein is fused to the C-terminus or N-terminus of the second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide.
14. A composition for detecting target proteins, comprising: (a) a first fusion protein, or a nucleic acid encoding the first fusion protein, wherein the first fusion protein comprises a first nuclease defective CRISPR/Cas nucleic acid-guided polypeptide fused to a first portion of a split BirA* protein; (b) a first guide nucleic acid, or a nucleic acid encoding the first guide nucleic acid, wherein the first guide nucleic acid binds to the first fusion protein; (c) a second fusion protein, or a nucleic acid encoding the second fusion protein, wherein the second fusion protein comprises a second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide fused to a second portion of the split BirA* protein; and (d) a second guide nucleic acid, or a nucleic acid encoding the second guide nucleic acid, wherein the second guide nucleic acid binds to the second fusion protein; wherein the first and second portions of the split BirA* protein form a functional BirA* protein when they are brought together.
15. The composition of 14, wherein the first and second nuclease defective CRISPR/Cas nucleic acid-guided polypeptides are each selected from a nuclease defective form of: Cas9, Cpf1, CasX, and CasY.
16. The composition of 14 or 15, wherein the first and/or second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide is Cas9.
17. The composition of any one of 14-16, wherein (i) a linker separates the first nuclease defective CRISPR/Cas nucleic acid-guided polypeptide from the first portion of the split BirA* protein, and/or (ii) a linker separates the second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide from the second portion of the split BirA* protein.
18. The composition of any one of 14-17, wherein the first portion of the split BirA* protein is fused to the C-terminus or N-terminus of the first nuclease defective CRISPR/Cas nucleic acid-guided polypeptide, and the second portion of the split BirA* protein is fused to the C-terminus or N-terminus of the second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide.
19. A method of identifying one or more target proteins in proximity to a nucleotide sequence of interest, the method comprising: (1) introducing into a cell (a) a first fusion protein, or a nucleic acid encoding the first fusion protein, wherein the first fusion protein comprises a first nuclease defective CRISPR/Cas nucleic acid-guided polypeptide fused to a first portion of a split BirA* protein; (b) a first guide nucleic acid, or a nucleic acid encoding the first guide nucleic acid, wherein the first guide nucleic acid binds to the first fusion protein and hybridizes to a first target sequence of a target nucleic acid; (c) a second fusion protein, or a nucleic acid encoding the second fusion protein, wherein the second fusion protein comprises a second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide fused to a second portion of the split BirA* protein; and (d) a second guide nucleic acid, or a nucleic acid encoding the second guide nucleic acid, wherein the second guide nucleic acid binds to the second fusion protein and hybridizes to a second target sequence of the target nucleic acid,
wherein the first and second portions of the split BirA* protein are brought together, forming a functional BirA* protein that biotinylates target proteins within its proximity; and (2) identifying one or more of the biotinylated target proteins.
20. The method of 19, wherein the first target sequence is immediately adjacent a first protospacer adjacent motif (PAM); the second target sequence is immediately adjacent a second protospacer adjacent motif (PAM); and the first and second PAMs are separated by 35 to 80 base pairs (bp).
21. The method of 20, wherein the first and second PAMs are separated by 55 to 75 bp.
22. The method of 20, wherein the first and second PAMs are separated by 35 to 55 bp.
23. The method of 19, wherein the first and second target sequences are separated from one another by more than 60 intervening base pairs (bp).
24. The method of any one of 19-23, wherein the first and second nuclease defective CRISPR/Cas nucleic acid-guided polypeptides are each selected from: Cas9, Cpf1, CasX, and CasY.
25. The method of any one of 19-24, wherein the first and/or second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide is Cas9.
26. The method of any one of 19-25, wherein the cell is a eukaryotic cell.
27. The method of any one of 19-26, wherein (i) a linker separates the first nuclease defective CRISPR/Cas nucleic acid-guided polypeptide from the first portion of the split BirA* protein, and/or (ii) a linker separates the second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide from the second portion of the split BirA* protein.
28. The method of any one of 19-27, wherein the first portion of the split BirA* protein is fused to the C-terminus or N-terminus of the first nuclease defective CRISPR/Cas nucleic acid-guided polypeptide, and the second portion of the split BirA* protein is fused to the C-terminus or N-terminus of the second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide.
29. A kit, comprising: (i) a first fusion protein, or a nucleic acid encoding the first fusion protein, wherein the first fusion protein comprises a first nuclease defective CRISPR/Cas nucleic acid-guided polypeptide fused to a first portion of a split BirA* protein; and (ii) a second fusion protein, or a nucleic acid encoding the second fusion protein, wherein the second fusion protein comprises a second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide fused to a second portion of the split BirA* protein; wherein the first and second portions of the split BirA* protein form a functional BirA* protein when brought together.
30. The kit of 29, further comprising: (iii) a first guide nucleic acid, or a nucleic acid encoding the first guide nucleic acid, wherein the first guide nucleic acid is capable of binding to the first fusion protein and comprises a guide sequence that is complementary to a first target sequence of a target nucleic acid; (iv) a second guide nucleic acid, or a nucleic acid encoding the second guide nucleic acid, wherein the second guide nucleic acid is capable of binding to the second fusion protein and comprises a guide sequence that is complementary to a second target sequence of a target nucleic acid.
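As a rough illustration of the PAM-spacing ranges recited in aspects 2-4 (35 to 80 bp, 55 to 75 bp, and 35 to 55 bp), the following Python sketch scans a DNA sequence for candidate PAM pairs. It rests on assumptions that are not stated in the disclosure: an SpCas9-style NGG PAM, a separation counted as the number of base pairs lying between the two PAMs, and hypothetical helper names (`find_ngg_pams`, `pam_pairs`). It is not the inventors' design procedure.

```python
from itertools import combinations
from typing import List, NamedTuple, Tuple


class Pam(NamedTuple):
    start: int   # 0-based index of the first PAM base (top-strand coordinates)
    end: int     # exclusive end index (top-strand coordinates)
    strand: str  # "+" if NGG reads on the top strand, "-" if on the bottom strand


def find_ngg_pams(seq: str) -> List[Pam]:
    """Locate NGG PAMs on both strands (ASSUMPTION: SpCas9-style NGG PAM)."""
    seq = seq.upper()
    pams = []
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri[1:] == "GG":   # NGG on the top strand
            pams.append(Pam(i, i + 3, "+"))
        if tri[:2] == "CC":   # CCN on the top strand is NGG on the bottom strand
            pams.append(Pam(i, i + 3, "-"))
    return pams


def pam_pairs(seq: str, min_bp: int = 35, max_bp: int = 80) -> List[Tuple[Pam, Pam, int]]:
    """Return (first PAM, second PAM, gap) for pairs whose gap is in range.

    ASSUMPTION: "separated by" is read as the number of base pairs lying
    between the two PAMs; the disclosure may intend a different convention.
    """
    hits = []
    for a, b in combinations(sorted(find_ngg_pams(seq)), 2):
        gap = b.start - a.end
        if min_bp <= gap <= max_bp:
            hits.append((a, b, gap))
    return hits


# Toy sequence: one top-strand PAM (TGG) and one bottom-strand PAM (CCA on top)
toy = "TGG" + "A" * 40 + "CCA"
for first, second, gap in pam_pairs(toy):
    print(first.strand, second.strand, gap)
```

Passing `min_bp=55, max_bp=75` or `min_bp=35, max_bp=55` restricts the search to the narrower ranges of aspects 3 and 4. The `strand` field makes it possible to separate same-strand pairs from opposite-strand pairs.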
EXAMPLES
[00216] The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how to make and use the present invention, and are not intended to limit the scope of what the inventors regard as their invention nor are they intended to represent that the experiments below are all or the only experiments performed. Efforts have been made to ensure accuracy with respect to numbers used (e.g. amounts, temperature, etc.) but some experimental errors and deviations should be accounted for. Unless indicated otherwise, parts are parts by weight, molecular weight is weight average molecular weight, temperature is in degrees Celsius, and pressure is at or near atmospheric. Standard abbreviations may be used, e.g., bp, base pair(s); kb, kilobase(s); pl, picoliter(s); s or sec, second(s); min, minute(s); h or hr, hour(s); aa, amino acid(s); nt, nucleotide(s); i.m., intramuscular(ly); i.p., intraperitoneal(ly); s.c., subcutaneous(ly); and the like.
[00217] While the present invention has been described with reference to the specific embodiments thereof, it should be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the true spirit and scope of the invention. In addition, many modifications may be made to adapt a particular situation, material, composition of matter, process, process step or steps, to the objective, spirit and scope of the present invention. All such modifications are intended to be within the scope of the claims appended hereto.
Example 1
Results
[00218] Figure 3 demonstrates that two halves of the split BirA* can form a catalytically active complex when the two halves are brought together. In this particular case the two halves are fused to complementary halves of split GFP that drive assembly of the split BirA*. HEK293 cells were transiently transfected with plasmids encoding either the full length BirA* (lane 1), either half of the split BirA* in the absence of the split GFP fusions (lanes 2 and 3), either half of the split BirA* in the presence of the split GFP fusions (lanes 4 and 5), both halves of the split BirA* in the absence of the split GFP fusions (lane 6), both halves of the split BirA* in the presence of the split GFP fusions (lane 7), or both halves of the split BirA* in which the N-terminal half contains the specified alanine mutation in the presence of the split GFP fusions (lanes 8-13). Biotin was added to the culture media 24 hours after transfection to a final concentration of 50 μM. Cells were lysed 24 hours after the addition of biotin. The lysates were then run on a polyacrylamide gel and the presence of biotinylated proteins was detected using an anti-biotin primary antibody. Bands were visualized using secondary antibodies conjugated to horseradish peroxidase.
[00219] Figure 9 demonstrates the expression of HA and FLAG-tagged fusion proteins containing one of the two halves of the split BirA* fused to either the N-terminus or C-terminus of dCas9. Two sets of fusion proteins were expressed: one in which either half of split BirA* was N-terminally or C-terminally linked to dCas9 through the linker SGSETPGTSES (SEQ ID NO: 10) ("XTEN Linker"), and one in which either half of split BirA* was N-terminally or C-terminally linked to dCas9 through the linker GGGGSSGSETPGTSES (SEQ ID NO: 1135) ("XTEN + GGGS linker"). Thus, e.g., a linker in a subject fusion protein can be of different lengths.
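The relationship between the two linkers quoted above can be checked with a few lines of Python. This is only an illustrative sketch: the two sequences are copied from the text, while the helper name `split_linker`, the labels, and the reading that every half/terminus/linker combination was built are assumptions.

```python
from itertools import product

XTEN = "SGSETPGTSES"               # SEQ ID NO: 10, as quoted in the text
LONG_LINKER = "GGGGSSGSETPGTSES"   # SEQ ID NO: 1135, as quoted in the text


def split_linker(long_linker: str, core: str) -> str:
    """Return the extra N-terminal residues that precede `core` in `long_linker`."""
    if not long_linker.endswith(core):
        raise ValueError("core linker is not the C-terminal part of the long linker")
    return long_linker[: len(long_linker) - len(core)]


extension = split_linker(LONG_LINKER, XTEN)
print(len(XTEN), len(LONG_LINKER), extension)

# ASSUMPTION: each split BirA* half was fused at each dCas9 terminus with each
# linker; the labels below are descriptive only and are not construct names
# taken from the disclosure.
designs = list(product(("first half", "second half"),
                       ("N-terminal fusion", "C-terminal fusion"),
                       ("XTEN", "GGGGS + XTEN")))
print(len(designs))
```

Under these assumptions, the longer linker is the 11-residue XTEN linker preceded by a five-residue GGGGS extension, giving 16 residues in total, and the two linker sets together correspond to eight possible fusion designs.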
[00220] Figure 10 demonstrates that expression of two guide nucleic acids (sgRNAs in this case) used to localize in close proximity two dCas9 molecules fused to complementary halves of split BirA* can drive assembly of the split BirA* into a catalytically active complex. Cells were transiently transfected with the indicated constructs (FL = full-length BirA*; +/+ sgRNA = two sgRNAs targeting the same strand; +/- sgRNA = two sgRNAs targeting different strands; A.2 = an N-terminal fusion of the first half of split BirA* linked to a dCas9 molecule through a "GGGGS + XTEN linker" [GGGGSSGSETPGTSES (SEQ ID NO: 1135)] and an N-terminal fusion of the second half of split BirA* linked to a dCas9 molecule through a "GGGGS + XTEN linker" [GGGGSSGSETPGTSES (SEQ ID NO: 1135)]). Biotin was then added to a final concentration of 50 μM 24 hours after transfection. Cells were lysed 24 hours after the addition of biotin, and streptavidin-coated magnetic beads were used to pull down the biotinylated proteins in the cell lysates. Biotinylated proteins were eluted off the streptavidin-coated beads (S = primary supernatant; E = eluate). The supernatant and eluate samples were run on a polyacrylamide gel and probed for the presence of the HA-tagged and FLAG-tagged proteins using anti-HA or anti-FLAG primary antibodies. Bands were visualized using secondary antibodies conjugated to horseradish peroxidase. The presence of the HA-tagged and FLAG-tagged split BirA*-dCas9 fusion proteins in the eluate of samples derived from cells that also expressed two guide RNAs targeting opposite strands of DNA indicates that simultaneous binding of two dCas9 molecules can drive assembly of the split BirA*.

Claims

CLAIMS
That which is claimed is:
1. A method of tagging target proteins in proximity to a nucleotide sequence of interest, the method comprising contacting a target nucleic acid that comprises the nucleotide sequence of interest with:
(a) a first fusion protein comprising a first nuclease defective CRISPR/Cas nucleic acid-guided polypeptide fused to a first portion of a split BirA* protein;
(b) a first guide nucleic acid that binds to the first fusion protein and hybridizes to a first target sequence of the target nucleic acid;
(c) a second fusion protein comprising a second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide fused to a second portion of the split BirA* protein; and
(d) a second guide nucleic acid that binds to the second fusion protein and hybridizes to a second target sequence of the target nucleic acid;
wherein the first and second portions of the split BirA* protein are brought together, forming a functional BirA* protein, by targeting of a first complex, comprising (a) and (b), to the first target sequence and targeting of a second complex, comprising (c) and (d), to the second target sequence, and wherein the functional BirA* protein biotinylates target proteins within its proximity.
2. The method of claim 1, wherein the first target sequence is immediately adjacent a first protospacer adjacent motif (PAM); the second target sequence is immediately adjacent a second protospacer adjacent motif (PAM); and the first and second PAMs are separated by 35 to 80 base pairs (bp).
3. The method of claim 2, wherein the first and second PAMs are separated by 55 to 75 bp.
4. The method of claim 2, wherein the first and second PAMs are separated by 35 to 55 bp.
5. The method of claim 1, wherein the first and second target sequences are separated from one another by more than 60 intervening base pairs (bp).
6. The method of any one of claims 1-5, wherein the first and second nuclease defective CRISPR/Cas nucleic acid-guided polypeptides are each selected from nuclease defective forms of one of: Cas9, Cpf1, CasX, and CasY.
7. The method of any one of claims 1-6, wherein the first and/or second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide is Cas9.
8. The method of any one of claims 1-7, wherein the nucleotide sequence of interest is located within a eukaryotic cell genome and said contacting takes place within the eukaryotic cell.
9. The method of any one of claims 1-8, wherein said contacting comprises introducing into a cell (i) the first protein or a nucleic acid encoding the same; (ii) the second fusion protein or a nucleic acid encoding the same; (iii) the first guide nucleic acid or a nucleic acid encoding the same; and (iv) the second guide nucleic acid or a nucleic acid encoding the same.
10. The method of claim 9, wherein the cell is a eukaryotic cell.
11. The method of any one of claims 1-10, further comprising a step of detecting, isolating, and/or identifying one or more of the biotinylated target proteins.
12. The method of any one of claims 1-11, wherein (i) a linker separates the first nuclease defective CRISPR/Cas nucleic acid-guided polypeptide from the first portion of the split BirA* protein, and/or (ii) a linker separates the second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide from the second portion of the split BirA* protein.
13. The method of any one of claims 1-12, wherein the first portion of the split BirA* protein is fused to the C-terminus or N-terminus of the first nuclease defective CRISPR/Cas nucleic acid-guided polypeptide, and the second portion of the split BirA* protein is fused to the C-terminus or N-terminus of the second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide.
14. A composition for detecting target proteins, comprising:
(a) a first fusion protein, or a nucleic acid encoding the first fusion protein, wherein the first fusion protein comprises a first nuclease defective CRISPR/Cas nucleic acid-guided polypeptide fused to a first portion of a split BirA* protein;
(b) a first guide nucleic acid, or a nucleic acid encoding the first guide nucleic acid, wherein the first guide nucleic acid binds to the first fusion protein;
(c) a second fusion protein, or a nucleic acid encoding the second fusion protein, wherein the second fusion protein comprises a second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide fused to a second portion of the split BirA* protein; and
(d) a second guide nucleic acid, or a nucleic acid encoding the second guide nucleic acid, wherein the second guide nucleic acid binds to the second fusion protein;
wherein the first and second portions of the split BirA* protein form a functional BirA* protein when they are brought together.
15. The composition of claim 14, wherein the first and second nuclease defective CRISPR/Cas nucleic acid-guided polypeptides are each selected from a nuclease defective form of: Cas9, Cpf1, CasX, and CasY.
16. The composition of claim 14 or claim 15, wherein the first and/or second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide is Cas9.
17. The composition of any one of claims 14-16, wherein (i) a linker separates the first nuclease defective CRISPR/Cas nucleic acid-guided polypeptide from the first portion of the split BirA* protein, and/or (ii) a linker separates the second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide from the second portion of the split BirA* protein.
18. The composition of any one of claims 14-17, wherein the first portion of the split BirA* protein is fused to the C-terminus or N-terminus of the first nuclease defective CRISPR/Cas nucleic acid-guided polypeptide, and the second portion of the split BirA* protein is fused to the C-terminus or N-terminus of the second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide.
19. A method of identifying one or more target proteins in proximity to a nucleotide sequence of interest, the method comprising:
(1) introducing into a cell
(a) a first fusion protein, or a nucleic acid encoding the first fusion protein, wherein the first fusion protein comprises a first nuclease defective CRISPR/Cas nucleic acid-guided polypeptide fused to a first portion of a split BirA* protein;
(b) a first guide nucleic acid, or a nucleic acid encoding the first guide nucleic acid, wherein the first guide nucleic acid binds to the first fusion protein and hybridizes to a first target sequence of a target nucleic acid;
(c) a second fusion protein, or a nucleic acid encoding the second fusion protein, wherein the second fusion protein comprises a second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide fused to a second portion of the split BirA* protein; and
(d) a second guide nucleic acid, or a nucleic acid encoding the second guide nucleic acid, wherein the second guide nucleic acid binds to the second fusion protein and hybridizes to a second target sequence of the target nucleic acid,
wherein the first and second portions of the split BirA* protein are brought together, forming a functional BirA* protein that biotinylates target proteins within its proximity; and
(2) identifying one or more of the biotinylated target proteins.
20. The method of claim 19, wherein the first target sequence is immediately adjacent a first protospacer adjacent motif (PAM); the second target sequence is immediately adjacent a second protospacer adjacent motif (PAM); and the first and second PAMs are separated by 35 to 80 base pairs (bp).
21. The method of claim 20, wherein the first and second PAMs are separated by 55 to 75 bp.
22. The method of claim 20, wherein the first and second PAMs are separated by 35 to 55 bp.
23. The method of claim 19, wherein the first and second target sequences are separated from one another by more than 60 intervening base pairs (bp).
24. The method of any one of claims 19-23, wherein the first and second nuclease defective CRISPR/Cas nucleic acid-guided polypeptides are each selected from: Cas9, Cpf1, CasX, and CasY.
25. The method of any one of claims 19-24, wherein the first and/or second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide is Cas9.
26. The method of any one of claims 19-25, wherein the cell is a eukaryotic cell.
27. The method of any one of claims 19-26, wherein (i) a linker separates the first nuclease defective CRISPR/Cas nucleic acid-guided polypeptide from the first portion of the split BirA* protein, and/or (ii) a linker separates the second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide from the second portion of the split BirA* protein.
28. The method of any one of claims 19-27, wherein the first portion of the split BirA* protein is fused to the C-terminus or N-terminus of the first nuclease defective CRISPR/Cas nucleic acid-guided polypeptide, and the second portion of the split BirA* protein is fused to the C-terminus or N-terminus of the second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide.
29. A kit, comprising:
(i) a first fusion protein, or a nucleic acid encoding the first fusion protein, wherein the first fusion protein comprises a first nuclease defective CRISPR/Cas nucleic acid-guided polypeptide fused to a first portion of a split BirA* protein; and
(ii) a second fusion protein, or a nucleic acid encoding the second fusion protein, wherein the second fusion protein comprises a second nuclease defective CRISPR/Cas nucleic acid-guided polypeptide fused to a second portion of the split BirA* protein;
wherein the first and second portions of the split BirA* protein form a functional BirA* protein when brought together.
30. The kit of claim 29, further comprising:
(iii) a first guide nucleic acid, or a nucleic acid encoding the first guide nucleic acid, wherein the first guide nucleic acid is capable of binding to the first fusion protein and comprises a guide sequence that is complementary to a first target sequence of a target nucleic acid;
(iv) a second guide nucleic acid, or a nucleic acid encoding the second guide nucleic acid, wherein the second guide nucleic acid is capable of binding to the second fusion protein and comprises a guide sequence that is complementary to a second target sequence of a target nucleic acid.
PCT/US2018/031563 2017-05-09 2018-05-08 Compositions and methods for tagging target proteins in proximity to a nucleotide sequence of interest WO2018208755A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201762503779P 2017-05-09 2017-05-09
US62/503,779 2017-05-09

Publications (1)

Publication Number Publication Date
WO2018208755A1 true WO2018208755A1 (en) 2018-11-15

Family

ID=64105493

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2018/031563 WO2018208755A1 (en) 2017-05-09 2018-05-08 Compositions and methods for tagging target proteins in proximity to a nucleotide sequence of interest

Country Status (1)

Country Link
WO (1) WO2018208755A1 (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004033485A2 (en) * 2002-10-09 2004-04-22 The Trustees Of Boston University Nucleic acid supported protein complementation
WO2009018003A2 (en) * 2007-07-27 2009-02-05 Ensemble Discovery Corporation Detection assays and use thereof
WO2014070227A1 (en) * 2012-11-03 2014-05-08 Roux Kyle Methods and reagents for identifying proximate proteins
WO2014139980A1 (en) * 2013-03-12 2014-09-18 Ventana Medical Systems, Inc. Proximity assay for in situ detection of targets
WO2014204578A1 (en) * 2013-06-21 2014-12-24 The General Hospital Corporation Using RNA-guided FokI nucleases (RFNs) to increase specificity for RNA-guided genome editing
WO2015035162A2 (en) * 2013-09-06 2015-03-12 President And Fellows Of Harvard College Cas9 variants and uses thereof
WO2018053053A1 (en) * 2016-09-13 2018-03-22 The Broad Institute, Inc. Proximity-dependent biotinylation and uses thereof

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DEMUNTER ET AL.: "Split-BioID: a proximity biotinylation assay for dimerization-dependent protein interactions", FEBS LETTERS, vol. 591, no. 2, 12 January 2017 (2017-01-12), pages 415 - 424, XP055552769 *
RAN ET AL.: "Double nicking by RNA-guided CRISPR Cas9 for enhanced genome editing specificity", CELL, vol. 154, no. 6, September 2013 (2013-09-01), pages 1380 - 1389, XP055299681 *
SCHMIDTMANN E. ET AL.: "Determination of local chromatin composition by CasID", NUCLEUS, vol. 7, no. 5, 27 September 2016 (2016-09-27), pages 476 - 484, XP055552115 *

Cited By (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10954548B2 (en) 2013-08-09 2021-03-23 President And Fellows Of Harvard College Nuclease profiling system
US10508298B2 (en) 2013-08-09 2019-12-17 President And Fellows Of Harvard College Methods for identifying a target site of a CAS9 nuclease
US11920181B2 (en) 2013-08-09 2024-03-05 President And Fellows Of Harvard College Nuclease profiling system
US11046948B2 (en) 2013-08-22 2021-06-29 President And Fellows Of Harvard College Engineered transcription activator-like effector (TALE) domains and uses thereof
US10597679B2 (en) 2013-09-06 2020-03-24 President And Fellows Of Harvard College Switchable Cas9 nucleases and uses thereof
US10682410B2 (en) 2013-09-06 2020-06-16 President And Fellows Of Harvard College Delivery system for functional nucleases
US11299755B2 (en) 2013-09-06 2022-04-12 President And Fellows Of Harvard College Switchable CAS9 nucleases and uses thereof
US10858639B2 (en) 2013-09-06 2020-12-08 President And Fellows Of Harvard College CAS9 variants and uses thereof
US10912833B2 (en) 2013-09-06 2021-02-09 President And Fellows Of Harvard College Delivery of negatively charged proteins using cationic lipids
US11124782B2 (en) 2013-12-12 2021-09-21 President And Fellows Of Harvard College Cas variants for gene editing
US11053481B2 (en) 2013-12-12 2021-07-06 President And Fellows Of Harvard College Fusions of Cas9 domains and nucleic acid-editing domains
US10465176B2 (en) 2013-12-12 2019-11-05 President And Fellows Of Harvard College Cas variants for gene editing
US10704062B2 (en) 2014-07-30 2020-07-07 President And Fellows Of Harvard College CAS9 proteins including ligand-dependent inteins
US11578343B2 (en) 2014-07-30 2023-02-14 President And Fellows Of Harvard College CAS9 proteins including ligand-dependent inteins
US11214780B2 (en) 2015-10-23 2022-01-04 President And Fellows Of Harvard College Nucleobase editors and uses thereof
US10947530B2 (en) 2016-08-03 2021-03-16 President And Fellows Of Harvard College Adenosine nucleobase editors and uses thereof
US11702651B2 (en) 2016-08-03 2023-07-18 President And Fellows Of Harvard College Adenosine nucleobase editors and uses thereof
US11661590B2 (en) 2016-08-09 2023-05-30 President And Fellows Of Harvard College Programmable CAS9-recombinase fusion proteins and uses thereof
US11542509B2 (en) 2016-08-24 2023-01-03 President And Fellows Of Harvard College Incorporation of unnatural amino acids into proteins using base editing
US11306324B2 (en) 2016-10-14 2022-04-19 President And Fellows Of Harvard College AAV delivery of nucleobase editors
US10745677B2 (en) 2016-12-23 2020-08-18 President And Fellows Of Harvard College Editing of CCR5 receptor gene to protect against HIV infection
US11820969B2 (en) 2016-12-23 2023-11-21 President And Fellows Of Harvard College Editing of CCR2 receptor gene to protect against HIV infection
US11898179B2 (en) 2017-03-09 2024-02-13 President And Fellows Of Harvard College Suppression of pain by gene editing
US11542496B2 (en) 2017-03-10 2023-01-03 President And Fellows Of Harvard College Cytosine to guanine base editor
US11268082B2 (en) 2017-03-23 2022-03-08 President And Fellows Of Harvard College Nucleobase editors comprising nucleic acid programmable DNA binding proteins
US11560566B2 (en) 2017-05-12 2023-01-24 President And Fellows Of Harvard College Aptazyme-embedded guide RNAs for use with CRISPR-Cas9 in genome editing and transcriptional activation
US11732274B2 (en) 2017-07-28 2023-08-22 President And Fellows Of Harvard College Methods and compositions for evolving base editors using phage-assisted continuous evolution (PACE)
US11932884B2 (en) 2017-08-30 2024-03-19 President And Fellows Of Harvard College High efficiency base editors comprising Gam
US11319532B2 (en) 2017-08-30 2022-05-03 President And Fellows Of Harvard College High efficiency base editors comprising Gam
US11795443B2 (en) 2017-10-16 2023-10-24 The Broad Institute, Inc. Uses of adenosine base editors
US11795452B2 (en) 2019-03-19 2023-10-24 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
US11643652B2 (en) 2019-03-19 2023-05-09 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
US11447770B1 (en) 2019-03-19 2022-09-20 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
US11912985B2 (en) 2020-05-08 2024-02-27 The Broad Institute, Inc. Methods and compositions for simultaneous editing of both strands of a target double-stranded nucleotide sequence
WO2024005863A1 (en) * 2022-06-30 2024-01-04 Inari Agriculture Technology, Inc. Compositions, systems, and methods for genome editing
WO2024005864A1 (en) * 2022-06-30 2024-01-04 Inari Agriculture Technology, Inc. Compositions, systems, and methods for genome editing

Similar Documents

Publication Publication Date Title
WO2018208755A1 (en) Compositions and methods for tagging target proteins in proximity to a nucleotide sequence of interest
US11666666B2 (en) Methods and compositions for modifying a mutant dystrophin gene in a cell's genome
US20220042047A1 (en) Compositions and methods for modifying a target nucleic acid
EP3352795B1 (en) Compositions and methods for target nucleic acid modification
US11427837B2 (en) Compositions and methods for enhanced genome editing
US11118194B2 (en) Modified site-directed modifying polypeptides and methods of use thereof
US20220220508A1 (en) Engineered CasX systems
US11578323B2 (en) RNA-programmable endonuclease systems and their use in genome editing and other applications
US11248216B2 (en) Methods and compositions for genomic editing
US11053271B2 (en) Methods and compositions for nucleic acid integration
US11208638B2 (en) Heterodimeric Cas9 and methods of use thereof
US11530421B2 (en) Self-inactivating endonuclease-encoding nucleic acids and methods of using the same
US20200199552A1 (en) Variant Cas9 polypeptides comprising internal insertions
US20200347387A1 (en) Compositions and methods for target nucleic acid modification
WO2018195540A1 (en) Inducible, tunable, and multiplex human gene regulation using CRISPR-Cpf1
US20200115688A1 (en) Compositions and methods for enhancing genome editing
CA3036695A1 (en) Targeted enhanced DNA demethylation
WO2020209959A1 (en) Nucleobase-editing fusion protein systems, compositions, and uses thereof

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 18797821; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 18797821; Country of ref document: EP; Kind code of ref document: A1)