WO2023122663A2 - Effector proteins and methods of use - Google Patents

Effector proteins and methods of use

Info

Publication number
WO2023122663A2
Authority
WO
WIPO (PCT)
Prior art keywords
nucleic acid
sequence
effector protein
composition
seq
Prior art date
Application number
PCT/US2022/082137
Other languages
English (en)
Other versions
WO2023122663A3 (fr
Inventor
Lucas Benjamin HARRINGTON
Benjamin Julius RAUCH
Aaron DELOUGHERY
Alexander Richard NECKELMANN
Timothy Robert ABBOTT
William Douglass WRIGHT
Yuxuan Zheng
David Burstein
Dov Arie GERTZ
Tomer PARKIET
Fnu YUNANDA
Original Assignee
Mammoth Biosciences, Inc.
Ramot At Tel-Aviv University Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mammoth Biosciences, Inc. and Ramot At Tel-Aviv University Ltd.
Publication of WO2023122663A2
Publication of WO2023122663A3

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/87 Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90 Stable introduction of foreign DNA into chromosome
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/78 Hydrolases (3) acting on carbon to nitrogen bonds other than peptide bonds (3.5)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2750/00 MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA ssDNA viruses
    • C12N2750/00011 Details
    • C12N2750/14011 Parvoviridae
    • C12N2750/14111 Dependovirus, e.g. adenoassociated viruses
    • C12N2750/14141 Use of virus, viral particle or viral elements as a vector
    • C12N2750/14143 Use of virus, viral particle or viral elements as a vector viral genome or elements thereof as genetic vector

Definitions

  • the present disclosure relates generally to effector proteins, compositions of such effector proteins and guide nucleic acids, systems and methods of using such effector proteins and compositions, including detecting and editing target nucleic acids.
  • Programmable nucleases are proteins that bind and cleave nucleic acids in a sequence-specific manner.
  • a programmable nuclease may bind a target region of a nucleic acid and cleave the nucleic acid within the target region or at a position adjacent to the target region.
  • a programmable nuclease is activated when it binds a target region of a nucleic acid to cleave regions of the nucleic acid that are near, but not adjacent to, the target region.
  • a programmable nuclease, such as a CRISPR-associated (Cas) protein, may be coupled to a guide nucleic acid that imparts activity or sequence selectivity to the programmable nuclease.
  • guide nucleic acids comprise a CRISPR RNA (crRNA) that is at least partially complementary to a target nucleic acid.
  • guide nucleic acids comprise a trans-activating crRNA (tracrRNA), at least a portion of which interacts with the programmable nuclease.
  • tracrRNA is provided separately from the crRNA and hybridizes to a portion of the crRNA that does not hybridize to the target nucleic acid.
  • the tracrRNA and crRNA are linked as a single guide RNA.
  • Programmable nucleases may cleave nucleic acids, including single-stranded RNA (ssRNA), double-stranded DNA (dsDNA), and single-stranded DNA (ssDNA). Programmable nucleases may provide cis cleavage activity, trans cleavage activity, nickase activity, or a combination thereof.
  • Cis cleavage activity is cleavage of a target nucleic acid that is hybridized to a guide nucleic acid, wherein cleavage occurs within or directly adjacent to the region of the target nucleic acid that is hybridized to guide RNA.
  • Trans cleavage activity is cleavage of ssDNA or ssRNA that is near, but not hybridized to the guide RNA. Trans cleavage activity may be triggered by the hybridization of guide RNA to the target nucleic acid.
  • nickase activity is the selective cleavage of one strand of a dsDNA molecule.
  • Programmable nucleases may be modified to have reduced nuclease or nickase activity relative to their unmodified versions but retain their sequence selectivity. For instance, amino acid residues of the programmable nuclease that impart catalytic activity to the programmable nuclease may be substituted with an alternative amino acid that does not impart catalytic activity to the programmable nuclease.
  • effector protein is used herein and throughout to encompass both programmable nucleases and modified versions thereof that may not necessarily have nuclease activity.
  • compositions, systems, and methods comprising effector proteins and uses thereof.
  • Compositions, systems and methods disclosed herein leverage nucleic acid modifying activities (e.g., cis cleavage activity and trans cleavage activity) of these effector proteins for the modification, detection and engineering of target nucleic acids.
  • compositions comprising an effector protein and a guide nucleic acid, wherein the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 1-23.
  • compositions comprising an effector protein and a guide nucleic acid, wherein the amino acid sequence of the effector protein is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 1-23.
  • compositions comprising an effector protein and a guide nucleic acid, wherein the effector protein comprises about 100, about 120, about 140, about 160, about 180, about 200, about 220, about 240, about 260, about 280, about
  • compositions comprising an effector protein and a guide nucleic acid, wherein the effector protein comprises the amino acid sequence located at positions 1-100, 150-250, 101-200, 250-350, 201-300, 350-450, 301-400, 350-450, 401-500, 450-550, 501-600, 550-650, 601-700, 650-750, 701-800, 750-850, 801-900, 850-950, 901-1000, 950-1050, 1001-1100, 1050-1150, 1101-1200, 1150-1250, 1201-1300, 1250-1350, 1301-1400, 1350-1450, 1401-1500, 1450-1550, 1501-1600, 1550-1650, 1601-1700, 1650-1750, 1701-1800, 1850-1950, or 1801-1900 of a sequence selected from SEQ ID NOs: 1-23.
  • compositions comprising an effector protein and a guide nucleic acid, wherein the effector protein comprises an amino acid sequence that is at least 90%, at least 95%, or 100% identical to a portion of a sequence selected from SEQ ID NOs: 1-23, wherein the portion of the sequence is about 30%, about 40%, about 50%, about 60%, about 70%, about 80%, or about 90% of a sequence selected from SEQ ID NOs: 1-23.
  • at least a portion of the guide nucleic acid binds the effector protein.
  • the guide nucleic acid comprises a crRNA.
  • the crRNA comprises or consists of a sequence that is at least 90% identical to a sequence selected from SEQ ID NO: 114 and SEQ ID NO: 115. In some embodiments, the crRNA comprises or consists of a sequence that is 100% identical to a sequence selected from SEQ ID NO: 114 and SEQ ID NO: 115.
  • the guide nucleic acid comprises a tracrRNA sequence. In some embodiments, the guide nucleic acid does not comprise a tracrRNA. In some embodiments, the guide nucleic acid comprises a crRNA and a tracrRNA sequence.
  • the tracrRNA sequence comprises or consists of a sequence that is at least 90% identical to at least a portion of a sequence selected from SEQ ID NO: 116 and SEQ ID NO: 117. In some embodiments, the tracrRNA sequence comprises or consists of a sequence that is at least 95% identical to at least a portion of a sequence selected from SEQ ID NO: 116 and SEQ ID NO: 117. In some embodiments, the tracrRNA sequence comprises a sequence that is 100% identical to at least a portion of a sequence selected from SEQ ID NO: 116 and SEQ ID NO: 117.
  • the portion of the sequence comprises at least 50 nucleobases, at least 60 nucleobases, at least 70 nucleobases, at least 80 nucleobases, at least 90 nucleobases, at least 100 nucleobases, at least 110 nucleobases, at least 120 nucleobases, at least 130 nucleobases, at least 140 nucleobases, or at least 150 nucleobases.
  • the composition comprises an effector protein, a crRNA and a tracrRNA sequence, wherein the effector protein comprises an amino acid sequence that is at least 90% identical to SEQ ID NO: 22; the crRNA comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 114; and the tracrRNA sequence comprises a nucleotide sequence that is at least 90% identical to a portion of SEQ ID NO: 116, wherein the portion of SEQ ID NO: 116 comprises at least 100 nucleobases.
  • the composition comprises an effector protein, a crRNA and a tracrRNA sequence, wherein the effector protein comprises an amino acid sequence that is at least 90% identical to SEQ ID NO: 23; the crRNA comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 115; and the tracrRNA sequence comprises a nucleotide sequence that is at least 90% identical to a portion of SEQ ID NO: 117, wherein the portion of SEQ ID NO: 117 comprises at least 100 nucleobases.
  • the guide nucleic acid comprises a first sequence and a second sequence, wherein the first sequence is heterologous with the second sequence.
  • the first sequence comprises at least five amino acids and the second sequence comprises at least five amino acids.
  • the effector protein comprises a nuclear localization signal.
  • the length of the effector protein is at least 300, at least 350, at least 400, at least 450, at least 500, at least 550, at least 600, at least 650, at least 700, at least 750, at least 800, at least 850, at least 900, at least 950, at least 1000, at least 1050, at least 1100, at least 1150, at least 1200, at least 1250, or at least 1300 linked amino acid residues. In some embodiments, the length of the effector protein is less than about 1900 linked amino acids.
  • the length of the effector protein is about 300 to about 400, about 350 to about 450, about 400 to about 500, about 450 to about 550, about 500 to about 600, about 550 to about 650, about 600 to about 700, about 650 to about 750, about 700 to about 800, about 750 to about 850, about 800 to about 900, about 850 to about 950, about 900 to about 1000, about 950 to about 1050, about 1000 to about 1100, about 1050 to about 1150, about 1100 to about 1200, about 1150 to about 1250, about 1200 to about 1300, about 1250 to about 1350, about 1300 to about 1400, about 1350 to about 1450, about 1400 to about 1500, about 1450 to about
  • the length of the effector protein is at least 300, at least 350, at least 400, at least 450, at least 500, at least 550, at least 600, at least 650, at least 700, at least 750, at least 800, at least 850, at least 900, at least 950, at least 1000, at least 1050, at least 1100, at least 1150, at least 1200, at least 1250, at least 1300, at least 1350, at least 1400, at least 1450, at least 1500, at least 1550, at least 1600, or at least 1650 linked amino acids.
  • compositions comprise a donor nucleic acid.
  • compositions comprise a fusion partner protein linked to the effector protein.
  • the fusion partner protein is directly fused to the N terminus or C terminus of the effector protein via an amide bond.
  • the fusion partner protein is directly fused to the N terminus or C terminus of the effector protein via a peptide linker.
  • the fusion partner protein comprises a polypeptide selected from a deaminase, a transcriptional activator, a transcriptional repressor, or a functional domain thereof.
  • the effector protein comprises at least one mutation that reduces its nuclease activity relative to the effector protein without the mutation as measured in a cleavage assay.
  • the effector protein is a catalytically inactive nuclease.
  • the effector protein and the guide nucleic acid do not occur together in nature.
  • compositions comprising a nucleic acid expression vector encoding an effector protein, wherein the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 1-23. Also disclosed herein, in some aspects, are compositions comprising a nucleic acid expression vector encoding an effector protein, wherein the amino acid sequence of the effector protein is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 1-23.
  • compositions comprising a nucleic acid expression vector encoding an effector protein, wherein the effector protein comprises about 100, about 120, about 140, about 160, about 180, about 200, about 220, about 240, about 260, about 280, about 300, about 320, about 340, about 360, about 380, about 400, about 420, about 440, about 460, about 480, about 500, about 520, about 540, about 560, about 580, about 600, about 620, about 640, about 660, about 680, about 700, about 720, about 740, about 760, about 780, about 800, about 820, about 840, about 860, about 880, about 900, about 920, about 940, about 960, about 980, about 1000, about 1020, about 1040, about 1060, about 1080, about 1100, about 1120, about 1140, about 1160, about 1180, about 1200, about 1220, about 1240,
  • compositions comprising a nucleic acid expression vector encoding an effector protein, wherein the effector protein comprises the amino acid sequence located at positions 1-100, 150-250, 101-200, 250-350, 201-300, 350-450, 301-400, 350-450, 401-500, 450-550, 501-600, 550-650, 601-700, 650-750, 701-800, 750-850, 801-900, 850-950, 901-1000, 950-1050, 1001-1100, 1050-1150, 1101-1200, 1150-1250, 1201-1300, 1250-1350, 1301-1400, 1350-1450, 1401-1500, 1450-1550, 1501-1600, 1550-1650, 1601-1700, 1650-1750, 1701-1800, 1850-1950, or 1801-1900 of a sequence selected from SEQ ID NOs: 1-23.
  • compositions comprising a nucleic acid expression vector encoding an effector protein, wherein the effector protein comprises an amino acid sequence that is at least 90%, at least 95%, or 100% identical to a portion of a sequence selected from SEQ ID NOs: 1-23, wherein the portion of the sequence is about 30%, about 40%, about 50%, about 60%, about 70%, about 80%, or about 90% of a sequence selected from SEQ ID NOs: 1-23.
  • the nucleic acid expression vector encodes at least one guide nucleic acid.
  • compositions comprise an additional nucleic acid expression vector encoding an engineered guide nucleic acid.
  • compositions comprise a donor nucleic acid, optionally wherein the donor nucleic acid is encoded by the nucleic acid expression vector or additional nucleic acid expression vector.
  • the nucleic acid expression vector is a viral vector.
  • the viral vector is an adeno-associated viral (AAV) vector.
  • compositions comprising a virus, wherein the virus comprises a nucleic acid expression vector encoding an effector protein, wherein the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 1-23.
  • compositions comprising: an effector protein or a nucleic acid expression vector encoding the effector protein, wherein the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 1-23; and a pharmaceutically acceptable excipient.
  • systems comprising an effector protein or a nucleic acid expression vector encoding the effector protein, wherein the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 1-23; and at least one detection reagent for detecting a target nucleic acid.
  • the at least one detection reagent is selected from a reporter nucleic acid, a detection moiety, an additional effector protein, or a combination thereof, optionally wherein the reporter nucleic acid comprises a fluorophore, a quencher, or a combination thereof.
  • systems comprising a composition described herein.
  • systems comprise at least one amplification reagent for amplifying a target nucleic acid.
  • systems comprising a composition described herein, and at least one amplification reagent for amplifying a target nucleic acid.
  • the at least one amplification reagent is selected from the group consisting of a primer, a polymerase, a deoxynucleoside triphosphate (dNTP), a ribonucleoside triphosphate (rNTP), and combinations thereof.
  • the system comprises a device with a chamber or solid support for containing the composition, target nucleic acid, detection reagent or combination thereof.
  • the amino acid sequence of the effector protein is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 1-23.
  • methods of detecting a target nucleic acid in a sample, comprising the steps of: (a) contacting the sample with: (i) an effector protein comprising an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 1-23 or a nucleic acid expression vector encoding an effector protein, wherein the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 1-23; and (ii) a reporter nucleic acid comprising a detectable moiety that produces a detectable signal in the presence of the target nucleic acid and the composition or system, and (b) detecting the detectable signal.
  • an effector protein comprising an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%
  • the reporter nucleic acid comprises a fluorophore, a quencher, or a combination thereof, and wherein the detecting comprises detecting a fluorescent signal.
  • methods comprise reverse transcribing the target nucleic acid, amplifying the target nucleic acid, in vitro transcribing the target nucleic acid, or any combination thereof.
  • methods comprise reverse transcribing the target nucleic acid and/or amplifying the target nucleic acid before contacting the sample with the composition.
  • methods comprise reverse transcribing the target nucleic acid and/or amplifying the target nucleic acid after contacting the sample with the composition.
  • amplifying comprises isothermal amplification.
  • the target nucleic acid is from a pathogen. In some embodiments, the pathogen is a virus. In some embodiments, the target nucleic acid comprises RNA. In some embodiments, the target nucleic acid comprises DNA. In some embodiments, the amino acid sequence of the effector protein is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 1-23.
  • methods of modifying a target nucleic acid, comprising contacting a target nucleic acid with an effector protein comprising an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 1-23 or a nucleic acid expression vector encoding an effector protein, wherein the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 1-23, thereby modifying the target nucleic acid.
  • modifying the target nucleic acid comprises cleaving the target nucleic acid, deleting a nucleotide of the target nucleic acid, inserting a nucleotide into the target nucleic acid, substituting a nucleotide of the target nucleic acid with an alternative nucleotide or an additional nucleotide, or any combination thereof.
  • methods comprise contacting a target nucleic acid with a donor nucleic acid.
  • the target nucleic acid comprises a mutation associated with a disease.
  • the disease is selected from an autoimmune disease, a cancer, an inherited disorder, an ophthalmological disorder, a metabolic disorder, or a combination thereof.
  • the disease is cystic fibrosis, thalassemia, Duchenne muscular dystrophy, myotonic dystrophy Type 1, or sickle cell anemia.
  • contacting the target nucleic acid comprises contacting a cell, wherein the target nucleic acid is located in the cell. In some embodiments, the contacting occurs in vitro. In some embodiments, the contacting occurs in vivo. In some embodiments, the contacting occurs ex vivo. In some embodiments, the amino acid sequence of the effector protein is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 1-23.
  • cells that comprise an effector protein comprising an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 1-23 or a nucleic acid expression vector encoding an effector protein, wherein the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 1-23.
  • cells modified by an effector protein comprising an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 1-23 or a nucleic acid expression vector encoding an effector protein, wherein the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 1-23.
  • the cell is a eukaryotic cell.
  • the cell is a mammalian cell.
  • the cell is a prokaryotic cell. In some embodiments, the cell is a plant cell. In some embodiments, the cell is an animal cell. In some embodiments, the cell is a T cell, optionally wherein the T cell is a natural killer T cell (NKT). In some embodiments, the cell is a chimeric antigen receptor T cell (CAR T-cell). In some embodiments, the cell is an induced pluripotent stem cell (iPSC). In some embodiments are populations of cells described herein.
  • the amino acid sequence of the effector protein is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 1-23.
  • a cell comprising target nucleic acid to a composition comprising an effector protein comprising an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 1-23 or a nucleic acid expression vector encoding an effector protein, wherein the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 1-23, thereby editing the target nucleic acid to produce a modified cell comprising a modified target nucleic acid, and producing a protein from the cell that is encoded, transcriptionally affected, or translationally affected by the modified nucleic acid.
  • methods of treating a disease comprising administering to a subject in need thereof an effector protein comprising an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 1-23; a nucleic acid expression vector encoding an effector protein; or a cell expressing an effector protein comprising an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 1-23.
  • the amino acid sequence of the effector protein is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 1-23.
  • compositions comprising an effector protein and a guide nucleic acid, wherein the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 15-16 and 31-104.
  • compositions comprising an effector protein and a guide nucleic acid, wherein the amino acid sequence of the effector protein is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 15-16 and 31-104.
  • compositions comprising an effector protein and a guide nucleic acid, wherein the effector protein comprises about 100, about 120, about 140, about 160, about 180, about 200, about 220, about 240, about 260, about 280, about 300, about 320, about 340, about 360, about 380, about 400, about
  • compositions comprising an effector protein and a guide nucleic acid, wherein the effector protein comprises the amino acid sequence located at positions 1-100, 150-250, 101-200, 250-350, 201-300, 350-450, 301-400, 350-450, 401-500, 450-550, 501-600, 550-650, 601-700, 650-750, 701-800, 750-850, 801-900, 850-950, 901-1000, 950-1050, 1001-1100, 1050-1150, 1101-1200, 1150-1250, 1201-1300, 1250-1350, 1301-1400, 1350-1450, 1401-1500, 1450-1550, 1501-1600, 1550-1650, 1601-1700, 1650-1750, 1701-1800, 1850-1950, or 1801-1900 of a sequence selected from SEQ ID NOs: 15-16 and 31-104.
  • compositions comprising an effector protein and a guide nucleic acid, wherein the effector protein comprises an amino acid sequence that is at least 90%, at least 95%, or 100% identical to a portion of a sequence selected from SEQ ID NOs: 15-16 and 31-104, wherein the portion of the sequence is about 30%, about 40%, about 50%, about 60%, about 70%, about 80%, or about 90% of a sequence selected from SEQ ID NOs: 15-16 and 31-104.
  • the composition comprises a second effector protein.
  • the second effector protein comprises a nuclease domain.
  • the second effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 1-14 and 17-23.
  • at least a portion of the guide nucleic acid binds to the effector protein.
  • the guide nucleic acid comprises a crRNA.
  • the crRNA comprises or consists of a sequence that is at least 90% identical to a sequence selected from SEQ ID NO: 114 and SEQ ID NO: 115.
  • the crRNA comprises or consists of a sequence that is 100% identical to a sequence selected from SEQ ID NO: 114 and SEQ ID NO: 115. In some embodiments, the composition does not comprise a tracrRNA. In some embodiments, the crRNA comprises or consists of a sequence that is 90% identical to SEQ ID NO: 108. In some embodiments, the crRNA comprises or consists of a sequence that is 90% identical to any one of SEQ ID NOs: 109-113.
  • a composition comprises an effector protein and a crRNA, wherein the effector protein comprises an amino acid sequence that is at least 90% identical to a portion of a sequence selected from SEQ ID NOs: 15-16 and 31-104, and wherein the crRNA comprises a nucleotide sequence that is at least 90% identical to any one of SEQ ID NO: 108-115.
  • the guide nucleic acid comprises a first sequence and a second sequence, wherein the first sequence is heterologous with the second sequence.
  • the first sequence comprises at least five amino acids and the second sequence comprises at least five amino acids.
  • the effector protein comprises a nuclear localization signal.
  • the length of the effector protein is at least 300, at least 350, at least 400, at least 450, at least 500, at least 550, at least 600, at least 650, at least 700, at least 750, at least 800, at least 850, at least 900, at least 950, at least 1000, at least 1050, at least 1100, at least 1150, at least 1200, at least 1250, or at least 1300 linked amino acid residues. In some embodiments, the length of the effector protein is less than 1900 linked amino acids.
  • the length of the effector protein is about 300 to about 400, about 350 to about 450, about 400 to about 500, about 450 to about 550, about 500 to about 600, about 550 to about 650, about 600 to about 700, about 650 to about 750, about 700 to about 800, about 750 to about 850, about 800 to about 900, about 850 to about 950, about 900 to about 1000, about 950 to about 1050, about 1000 to about 1100, about 1050 to about 1150, about 1100 to about 1200, about 1150 to about 1250, about 1200 to about 1300, about 1250 to about 1350, about 1300 to about 1400, about 1350 to about 1450, about 1400 to about 1500, about 1450 to about 1550, about 1500 to about 1600, about 1550 to about 1650, about 1600 to about 1700, about 1650 to about 1750, about 1700 to about 1800, about 1750 to about 1850, or about 1800 to about 1900 linked amino acids.
  • the effector protein comprises a nuclease domain. In some embodiments, the nuclease domain is located at the C-terminus. In some embodiments, the effector protein does not comprise a nuclease domain. In some embodiments, the effector protein has a helicase domain. In some embodiments, the effector protein comprises a RecA1 domain. In some embodiments, the effector protein comprises a RecA2 domain. In some embodiments, the effector protein comprises a helicase domain. In some embodiments, the effector protein has helicase activity.
  • compositions comprising an effector protein and a guide nucleic acid, wherein the effector protein comprises helicase activity, and wherein the effector protein has an amino acid sequence of at least 800, at least 850, at least 900, at least 950, at least 1000, at least 1050, at least 1100, at least 1150, at least 1200, at least 1250, at least 1300, at least 1350, at least 1400, at least 1450, at least 1500, or at least 1600 linked amino acids.
  • the composition comprises a fusion partner protein linked to the effector protein.
  • the fusion partner protein is directly fused to the N terminus or C terminus of the effector protein via an amide bond.
  • the fusion partner protein is directly fused to the N terminus or C terminus of the effector protein via a peptide linker.
  • the fusion partner protein comprises a polypeptide selected from a deaminase, a transcriptional activator, a transcriptional repressor, or a functional domain thereof.
  • the effector protein comprises at least one mutation that reduces its nuclease activity relative to the effector protein without the mutation as measured in a cleavage assay, optionally wherein the effector protein is a catalytically inactive nuclease.
  • the effector protein and the guide nucleic acid do not occur together in nature.
  • the effector protein recognizes a protospacer adjacent motif (PAM) comprising the nucleotide sequence GAA within a target nucleic acid. In some embodiments, the effector protein recognizes a PAM comprising the nucleotide sequence TTC within a target nucleic acid. In some embodiments, the effector protein comprises the amino acid sequence of any one of SEQ ID NO: 31-104.
  • PAM: protospacer adjacent motif
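The PAM requirement above can be illustrated with a short search routine. This is only a sketch under stated assumptions: the text says the PAM (GAA or TTC) lies within the target nucleic acid but does not say on which side of the target sequence it sits, so the code assumes the PAM is immediately 5' of the target sequence. The function name `find_pam_sites`, the `protospacer_len` parameter, and the toy sequence are hypothetical and do not come from the disclosure.

```python
def find_pam_sites(target, pam="GAA", protospacer_len=20):
    """Return (pam_index, candidate_target_sequence) pairs.

    Assumption (not stated in the source): the candidate target
    sequence starts immediately 3' of the PAM on the strand given.
    """
    sites = []
    start = target.find(pam)
    while start != -1:
        proto_start = start + len(pam)
        proto_end = proto_start + protospacer_len
        # Keep only sites where a full-length candidate fits in the target.
        if proto_end <= len(target):
            sites.append((start, target[proto_start:proto_end]))
        # Continue searching one base further to allow overlapping PAMs.
        start = target.find(pam, start + 1)
    return sites


# Toy sequence, for illustration only.
toy_target = "CCGAAACGTACGTTTCGGA"
gaa_sites = find_pam_sites(toy_target, pam="GAA", protospacer_len=5)
ttc_sites = find_pam_sites(toy_target, pam="TTC", protospacer_len=3)
```

A real design tool would also scan the opposite strand and apply the spacer length appropriate to the effector protein in question; both are left out here.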
  • compositions comprising a nucleic acid expression vector encoding an effector protein, wherein the amino acid sequence of the effector protein is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 15-16 and 31-104.
  • compositions comprising a nucleic acid expression vector encoding an effector protein, wherein the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 31-104.
  • compositions comprising a nucleic acid expression vector encoding an effector protein, wherein the effector protein comprises about 100, about 120, about 140, about 160, about 180, about 200, about 220, about 240, about 260, about 280, about 300, about
  • compositions comprising a nucleic acid expression vector encoding an effector protein, wherein the effector protein comprises the amino acid sequence located at positions 1-100, 150-250, 101-200, 250-350, 201-300, 350-450, 301-400, 350-450, 401-500, 450-550, 501-600, 550-650, 601-700, 650-750, 701-800, 750-850, 801-900, 850-950, 901-1000, 950-1050, 1001-1100, 1050-1150, 1101-1200, 1150-1250, 1201-1300, 1250-1350, 1301-1400, 1350-1450, 1401-1500, 1450-1550, 1501-1600, 1550-1650, 1601-1700, 1650-1750, 1701-1800, 1850-1950, or 1801-1900 of a sequence selected from SEQ ID NOs: 15-16 and 31-104.
  • compositions comprising a nucleic acid expression vector encoding a first effector protein and a second effector protein, wherein the first effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identical to a sequence selected from SEQ ID NOs: 1-14 or 17-23, and wherein the second effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identical to a sequence selected from SEQ ID NOs: 15-16 and 31-104.
  • the nucleic acid expression vector encodes at least one guide nucleic acid.
  • the composition comprises an additional nucleic acid expression vector encoding an engineered guide nucleic acid.
  • the composition comprises a donor nucleic acid, optionally wherein the donor nucleic acid is encoded by the nucleic acid expression vector or additional nucleic acid expression vector.
  • the nucleic acid expression vector is a viral vector.
  • the viral vector is an adeno-associated viral (AAV) vector.
  • the virus comprises a composition described herein.
  • compositions comprising: an effector protein or a nucleic acid expression vector encoding the effector protein, wherein the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 15-16 and 31-104; and a pharmaceutically acceptable excipient.
  • systems comprising an effector protein or a nucleic acid expression vector encoding the effector protein, wherein the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 15-16 and 31-104; and at least one detection reagent for detecting a target nucleic acid.
  • the at least one detection reagent is selected from a reporter nucleic acid, a detection moiety, an additional effector protein, or a combination thereof, optionally wherein the reporter nucleic acid comprises a fluorophore, a quencher, or a combination thereof.
  • systems comprising a composition described herein.
  • systems comprise at least one amplification reagent for amplifying a target nucleic acid.
  • systems comprising a composition described herein, and at least one amplification reagent for amplifying a target nucleic acid.
  • the at least one amplification reagent is selected from the group consisting of a primer, a polymerase, a deoxynucleoside triphosphate (dNTP), a ribonucleoside triphosphate (rNTP), and combinations thereof.
  • the system comprises a device with a chamber or solid support for containing the composition, target nucleic acid, detection reagent or combination thereof.
  • the amino acid sequence of the effector protein is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 15-16 and 31-104.
  • a target nucleic acid in a sample comprising the steps of: (a) contacting the sample with: (i) an effector protein comprising an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 15-16 and 31-104 or a nucleic acid expression vector encoding an effector protein, wherein the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 15-16 and 31-104; and (ii) a reporter nucleic acid comprising a detectable moiety that produces a detectable signal in the presence of the target nucleic acid and the composition or system, and (b) detecting the detectable signal.
  • an effector protein comprising an amino acid sequence that is at least 75%, at least 80%,
  • the reporter nucleic acid comprises a fluorophore, a quencher, or a combination thereof, and wherein the detecting comprises detecting a fluorescent signal.
  • methods comprise reverse transcribing the target nucleic acid, amplifying the target nucleic acid, in vitro transcribing the target nucleic acid, or any combination thereof.
  • methods comprise reverse transcribing the target nucleic acid and/or amplifying the target nucleic acid before contacting the sample with the composition.
  • methods comprise reverse transcribing the target nucleic acid and/or amplifying the target nucleic acid after contacting the sample with the composition.
  • amplifying comprises isothermal amplification.
  • the target nucleic acid is from a pathogen. In some embodiments, the pathogen is a virus. In some embodiments, the target nucleic acid comprises RNA. In some embodiments, the target nucleic acid comprises DNA. In some embodiments, the amino acid sequence of the effector protein is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 15-16 and 31-104.
  • a nucleic acid comprising contacting a target nucleic acid with an effector protein comprising an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 15-16 and 31-104 or a nucleic acid expression vector encoding an effector protein, wherein the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 15-16 and 31-104, thereby modifying the target nucleic acid.
  • modifying the target nucleic acid comprises cleaving the target nucleic acid, deleting a nucleotide of the target nucleic acid, inserting a nucleotide into the target nucleic acid, substituting a nucleotide of the target nucleic acid with an alternative nucleotide or an additional nucleotide, or any combination thereof.
  • methods comprise contacting a target nucleic acid with a donor nucleic acid.
  • the target nucleic acid comprises a mutation associated with a disease.
  • the disease is selected from an autoimmune disease, a cancer, an inherited disorder, an ophthalmological disorder, a metabolic disorder, or a combination thereof.
  • the disease is cystic fibrosis, thalassemia, Duchenne muscular dystrophy, myotonic dystrophy Type 1, or sickle cell anemia.
  • contacting the target nucleic acid comprises contacting a cell, wherein the target nucleic acid is located in the cell. In some embodiments, the contacting occurs in vitro. In some embodiments, the contacting occurs in vivo. In some embodiments, the contacting occurs ex vivo. In some embodiments, the amino acid sequence of the effector protein is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 15-16 and 31-104.
  • cells that comprise an effector protein comprising an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 15-16 and 31-104 or a nucleic acid expression vector encoding an effector protein, wherein the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 15-16 and 31-104.
  • cells modified by an effector protein comprising an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 15-16 and 31-104 or a nucleic acid expression vector encoding an effector protein, wherein the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 15-16 and 31-104.
  • the cell is a eukaryotic cell.
  • the cell is a mammalian cell.
  • the cell is a prokaryotic cell. In some embodiments, the cell is a plant cell. In some embodiments, the cell is an animal cell. In some embodiments, the cell is a T cell, optionally wherein the T cell is a natural killer T cell (NKT). In some embodiments, the cell is a chimeric antigen receptor T cell (CAR T-cell). In some embodiments, the cell is an induced pluripotent stem cell (iPSC). In some embodiments are populations of cells described herein.
  • the amino acid sequence of the effector protein is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 15-16 and 31-104.
  • a cell comprising target nucleic acid to a composition comprising an effector protein comprising an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 15-16 and 31-104 or a nucleic acid expression vector encoding an effector protein, wherein the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 15-16 and 31-104, thereby editing the target nucleic acid to produce a modified cell comprising a modified target nucleic acid, and producing a protein from the cell that is encoded, transcriptionally affected, or translationally affected by the modified nucleic acid.
  • methods of treating a disease comprising administering to a subject in need thereof an effector protein comprising an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 15-16 and 31-104; a nucleic acid expression vector encoding an effector protein; or a cell expressing an effector protein comprising an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 15-16 and 31-104.
  • the amino acid sequence of the effector protein is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to any one of SEQ ID NOs: 15-16 and 31-104.
  • compositions comprising an effector protein and a guide nucleic acid, wherein the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to SEQ ID NO: 121, and wherein the guide nucleic acid comprises at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to at least one of SEQ ID NOs: 154 and 159.
  • compositions comprising an effector protein and a guide nucleic acid, wherein the amino acid sequence of the effector protein is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to SEQ ID NO: 121, wherein the guide nucleic acid comprises a spacer sequence that hybridizes to a target sequence of a target nucleic acid, and wherein a PAM of SEQ ID NOs: 135 or 136 is adjacent to the target sequence.
  • systems comprising an effector protein and a guide nucleic acid
  • the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to SEQ ID NO: 121
  • the guide nucleic acid comprises at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to at least one of SEQ ID NOs: 154 and 159.
  • cells comprising a target nucleic acid, wherein the target nucleic acid is modified by the composition or system comprising an effector protein and a guide nucleic acid, wherein the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to SEQ ID NO: 121, and wherein the guide nucleic acid comprises at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to at least one of SEQ ID NOs: 154 and 159.
  • compositions comprising an effector protein and a guide nucleic acid, wherein the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to SEQ ID NO: 122, and wherein the guide nucleic acid comprises at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to at least one of SEQ ID NOs: 114, 116, 152, and 157.
  • compositions comprising an effector protein and a guide nucleic acid, wherein the amino acid sequence of the effector protein is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to SEQ ID NO: 122, wherein the guide nucleic acid comprises a spacer sequence that hybridizes to a target sequence of a target nucleic acid, and wherein a PAM of SEQ ID NOs: 131 or 132 is adjacent to the target sequence.
  • systems comprising an effector protein and a guide nucleic acid
  • the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to SEQ ID NO: 122
  • the guide nucleic acid comprises at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to at least one of SEQ ID NOs: 114, 116, 152, and 157.
  • cells comprising a target nucleic acid, wherein the target nucleic acid is modified by the composition or system comprising an effector protein and a guide nucleic acid, wherein the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to SEQ ID NO: 122, and wherein the guide nucleic acid comprises at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to at least one of SEQ ID NOs: 114, 116, 152, and 157.
  • compositions comprising an effector protein and a guide nucleic acid, wherein the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to SEQ ID NO: 123, and wherein the guide nucleic acid comprises at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to at least one of SEQ ID NOs: 114, 116, 153, and 158.
  • compositions comprising an effector protein and a guide nucleic acid, wherein the amino acid sequence of the effector protein is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to SEQ ID NO: 123, wherein the guide nucleic acid comprises a spacer sequence that hybridizes to a target sequence of a target nucleic acid, and wherein a PAM of SEQ ID NOs: 133 or 134 is adjacent to the target sequence.
  • systems comprising an effector protein and a guide nucleic acid
  • the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to SEQ ID NO: 123
  • the guide nucleic acid comprises at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to at least one of SEQ ID NOs: 114, 116, 153, and 158.
  • cells comprising a target nucleic acid, wherein the target nucleic acid is modified by the composition or system comprising an effector protein and a guide nucleic acid, wherein the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to SEQ ID NO: 123, and wherein the guide nucleic acid comprises at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to at least one of SEQ ID NOs: 114, 116, 153, and 158.
  • compositions comprising an effector protein and a guide nucleic acid
  • the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to SEQ ID NO: 124
  • the guide nucleic acid comprises at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to at least one of SEQ ID NOs: 114, 161, 162, and 163.
  • compositions comprising an effector protein and a guide nucleic acid, wherein the amino acid sequence of the effector protein is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to SEQ ID NO: 124, wherein the guide nucleic acid comprises a spacer sequence that hybridizes to a target sequence of a target nucleic acid, and wherein a PAM of SEQ ID NOs: 139 or 140 is adjacent to the target sequence.
  • systems comprising an effector protein and a guide nucleic acid
  • the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to SEQ ID NO: 124
  • the guide nucleic acid comprises at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to at least one of SEQ ID NOs: 114, 161, 162, and 163.
  • a target nucleic acid in a cell comprising contacting the cell with the composition or system comprising an effector protein and a guide nucleic acid, wherein the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to SEQ ID NO: 124, and wherein the guide nucleic acid comprises at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to at least one of SEQ ID NOs: 114, 161, 162, and 163.
  • detecting a target nucleic acid in a sample comprising contacting the sample with the composition or system comprising an effector protein and a guide nucleic acid, wherein the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to SEQ ID NO: 124, and wherein the guide nucleic acid comprises at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to at least one of SEQ ID NOs: 114, 161, 162, and 163.
  • cells comprising a target nucleic acid, wherein the target nucleic acid is modified by the composition or system comprising an effector protein and a guide nucleic acid, wherein the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to SEQ ID NO: 124, and wherein the guide nucleic acid comprises at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to at least one of SEQ ID NOs: 114, 161, 162, and 163.
  • compositions comprising an effector protein and a guide nucleic acid, wherein the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to SEQ ID NO: 125, and wherein the guide nucleic acid comprises at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to at least one of SEQ ID NOs: 114, 116, 155, and 160.
  • compositions comprising an effector protein and a guide nucleic acid, wherein the amino acid sequence of the effector protein is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to SEQ ID NO: 125, wherein the guide nucleic acid comprises a spacer sequence that hybridizes to a target sequence of a target nucleic acid, and wherein a PAM of SEQ ID NO: 137 or 138 is adjacent to the target sequence.
  • systems comprising an effector protein and a guide nucleic acid
  • the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to SEQ ID NO: 125
  • the guide nucleic acid comprises at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to at least one of SEQ ID NOs: 114, 116, 155, and 160.
  • cells comprising a target nucleic acid, wherein the target nucleic acid is modified by the composition or system comprising an effector protein and a guide nucleic acid, wherein the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to SEQ ID NO: 125, and wherein the guide nucleic acid comprises at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to at least one of SEQ ID NOs: 114, 116, 155, and 160.
  • compositions comprising an effector protein and a guide nucleic acid
  • the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to SEQ ID NO: 126
  • the guide nucleic acid comprises at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to at least one of SEQ ID NOs: 114, 116, 151, 156, and 161.
  • compositions comprising an effector protein and a guide nucleic acid, wherein the amino acid sequence of the effector protein is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to SEQ ID NO: 126, wherein the guide nucleic acid comprises a spacer sequence that hybridizes to a target sequence of a target nucleic acid, and wherein a PAM of SEQ ID NO: 131 is adjacent to the target sequence.
  • systems comprising an effector protein and a guide nucleic acid
  • the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to SEQ ID NO: 126
  • the guide nucleic acid comprises at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to at least one of SEQ ID NOs: 114, 116, 151, 156, and 161.
  • detecting a target nucleic acid in a sample comprising contacting the sample with the composition or system comprising an effector protein and a guide nucleic acid, wherein the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to SEQ ID NO: 126, and wherein the guide nucleic acid comprises at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to at least one of SEQ ID NOs: 114, 116, 151, 156, and 161.
  • cells comprising a target nucleic acid, wherein the target nucleic acid is modified by the composition or system comprising an effector protein and a guide nucleic acid, wherein the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to SEQ ID NO: 126, and wherein the guide nucleic acid comprises at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to at least one of SEQ ID NOs: 114, 116, 151, 156, and 161.
  • systems for modifying a target nucleic acid comprising at least two components each individually comprising one of the following: (i) an effector protein or a nucleic acid encoding the effector protein, wherein the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% identical to any one of the sequences recited in TABLE 1; and (ii) a guide nucleic acid or a nucleic acid encoding the guide nucleic acid, wherein at least a portion of the guide nucleic acid is complementary to a target sequence of a target nucleic acid.
  • an effector protein or a nucleic acid encoding the effector protein comprising the effector protein comprises about 100, about 120, about 140, about 160, about 180, about 200, about 220, about 240, about 260, about
  • the term, “about,” in reference to a number or range of numbers, is understood to mean the stated number and numbers +/- 10% thereof, or 10% below the lower listed limit and 10% above the higher listed limit for the values listed for a range.
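The definition of "about" can be expressed as a small calculation. The following is a minimal sketch that applies the ±10% rule to a single number and to a range; the function names `about_number` and `about_range` are illustrative and do not come from the disclosure.

```python
def about_number(value, percent=10):
    """Interval covered by 'about value': the value +/- percent of it."""
    delta = abs(value) * percent / 100
    return (value - delta, value + delta)


def about_range(lower, upper, percent=10):
    """Interval covered by 'about lower to about upper'.

    The lower limit is reduced by percent of itself and the upper
    limit is increased by percent of itself.
    """
    return (lower - abs(lower) * percent / 100,
            upper + abs(upper) * percent / 100)


# Example: an effector protein of "about 300 to about 400" linked amino acids.
low, high = about_range(300, 400)
```

Read this way, "about 300 to about 400" spans 270 to 440 residues, so adjacent "about" ranges in the length lists overlap more than their nominal endpoints suggest.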
  • percent identity refers to the extent to which two sequences (nucleotide or amino acid) have the same residue at the same positions in an alignment.
  • the statement that an amino acid sequence is X% identical to SEQ ID NO: Y can refer to the % identity of the amino acid sequence to SEQ ID NO: Y, and is elaborated as X% of the residues in the amino acid sequence being identical to the residues of the sequence disclosed in SEQ ID NO: Y.
  • computer programs can be employed for such calculations.
  • Illustrative programs that compare and align pairs of sequences include ALIGN (Myers and Miller, Comput Appl Biosci.
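As an illustration of the percent identity definition, the sketch below computes identity from two sequences that have already been aligned (for example, by one of the alignment programs mentioned), with `-` marking gaps. The choice of denominator is an assumption: it counts the residues of the query sequence, following the "X% of residues in the amino acid sequence" wording, whereas alignment tools differ on this point. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_query, aligned_reference):
    """Percent of query residues identical to the aligned reference residue.

    Both inputs must be rows of the same alignment (equal length),
    using '-' for gap positions.
    """
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    query_residues = 0
    for q, r in zip(aligned_query, aligned_reference):
        if q == "-":
            continue  # a gap in the query is not a query residue
        query_residues += 1
        if q == r:
            matches += 1
    if query_residues == 0:
        raise ValueError("query contains no residues")
    return 100.0 * matches / query_residues


# Toy alignment, for illustration only (not a sequence from the disclosure).
identity = percent_identity("MKT-LLV", "MKTALIV")
meets_90 = identity >= 90.0
```

In the toy alignment, five of the six query residues match, so the query would not meet an "at least 90% identical" threshold.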
  • amplification refers to a process by which a nucleic acid molecule is enzymatically copied to generate a plurality of nucleic acid molecules containing the same sequence as the original nucleic acid molecule or a distinguishable portion thereof.
  • base editing enzyme refers to a protein, polypeptide or fragment thereof that is capable of catalyzing the chemical modification of a nucleobase of a deoxyribonucleotide or a ribonucleotide.
  • a base editing enzyme for example, is capable of catalyzing a reaction that modifies a nucleobase that is present in a nucleic acid molecule, such as DNA or RNA (single stranded or double stranded).
  • Non-limiting examples of the types of modification that a base editing enzyme is capable of catalyzing include converting an existing nucleobase to a different nucleobase, such as converting a cytosine to a guanine or thymine or converting an adenine to a guanine, hydrolytic deamination of an adenine or adenosine, or methylation of cytosine (e.g., CpG, CpA, CpT or CpC).
  • a base editing enzyme itself may or may not bind to the nucleic acid molecule containing the nucleobase.
  • base editor refers to a fusion protein comprising a base editing enzyme fused to an effector protein.
  • the base editor is functional when the effector protein is coupled to a guide nucleic acid.
  • the guide nucleic acid imparts sequence specific activity to the base editor.
  • the effector protein may comprise a catalytically inactive effector protein.
  • the base editing enzyme may comprise deaminase activity. Additional base editors are described herein.
  • catalytically inactive effector protein refers to an effector protein that is modified relative to a naturally-occurring effector protein to have a reduced or eliminated catalytic activity relative to that of the naturally-occurring effector protein, but retains its ability to interact with a guide nucleic acid.
  • the catalytic activity that is reduced or eliminated is often a nuclease activity.
  • the naturally-occurring effector protein may be a wildtype protein.
  • the catalytically inactive effector protein is referred to as a catalytically inactive variant of an effector protein, e.g., a Cas effector protein.
  • cis cleavage refers to cleavage (hydrolysis of a phosphodiester bond) of a target nucleic acid by an effector protein complexed with a guide nucleic acid, wherein the target nucleic acid is hybridized to the guide nucleic acid and cleavage occurs within or directly adjacent to the region of the target nucleic acid that is hybridized to the guide nucleic acid.
  • complementary, with reference to a nucleic acid molecule or nucleotide sequence, refers to the characteristic of a polynucleotide having nucleotides that base pair with their Watson-Crick counterparts (C with G; or A with T) in a reference nucleic acid. For example, when every nucleotide in a polynucleotide forms a base pair with a reference nucleic acid, that polynucleotide is said to be 100% complementary to the reference nucleic acid.
  • the upper (sense) strand sequence is, in general, understood as going in the direction from its 5'- to 3'-end, and the complementary sequence is thus understood as the sequence of the lower (antisense) strand in the same direction as the upper strand.
  • the reverse sequence is understood as the sequence of the upper strand in the direction from its 3'- to its 5'-end, while the “reverse complement” sequence or the “reverse complementary” sequence is understood as the sequence of the lower strand in the direction of its 5'- to its 3'-end.
  • Each nucleotide in a double stranded DNA or RNA molecule that is paired with its Watson-Crick counterpart is called its complementary nucleotide.
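The strand conventions above can be illustrated with a minimal Python sketch. It assumes DNA letters only (A, C, G, T) and ungapped, equal-length strands. The function names are hypothetical and chosen for illustration; they are not taken from the specification.

```python
# Watson-Crick pairing for DNA (C with G; A with T), as in the definition above.
_PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(upper):
    """Lower-strand sequence read in the same direction as the upper strand."""
    return "".join(_PAIR[base] for base in upper.upper())


def reverse(upper):
    """Upper-strand sequence read from its 3'-end to its 5'-end."""
    return upper.upper()[::-1]


def reverse_complement(upper):
    """Lower-strand sequence read from its 5'-end to its 3'-end."""
    return complement(upper)[::-1]


def percent_complementary(polynucleotide, reference):
    """Percent of nucleotides in `polynucleotide` that base pair with `reference`.

    Both inputs are written 5' to 3'. Strands pair antiparallel, so position i
    of the polynucleotide is compared with position i of the reversed reference.
    Assumption of this sketch: equal lengths and no gaps.
    """
    if not polynucleotide or len(polynucleotide) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    reference_reversed = reference.upper()[::-1]
    paired = sum(
        1
        for a, b in zip(polynucleotide.upper(), reference_reversed)
        if _PAIR.get(a) == b
    )
    return 100.0 * paired / len(polynucleotide)
```

For the upper strand `ATGC`, `complement` gives `TACG`, `reverse` gives `CGTA`, and `reverse_complement` gives `GCAT`. A polynucleotide `GCAT` is 100% complementary to the reference `ATGC` under this sketch.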
  • cleavage assay refers to an assay designed to visualize, quantitate or identify cleavage of a nucleic acid.
  • the cleavage activity may be cis-cleavage activity.
  • the cleavage activity may be trans-cleavage activity.
  • CRISPR is an abbreviation for clustered regularly interspaced short palindromic repeats.
  • CRISPR RNA refers to a type of guide nucleic acid, wherein the nucleic acid is RNA comprising a first sequence, often referred to herein as a “spacer sequence,” that hybridizes to a target sequence of a target nucleic acid, and a second sequence that either a) hybridizes to a portion of a tracrRNA or b) is capable of being non-covalently bound by an effector protein.
  • the crRNA is covalently linked to an additional nucleic acid (e.g., a tracrRNA), wherein the additional nucleic acid interacts with the effector protein.
  • detectable signal refers to a signal that can be detected using optical, fluorescent, chemiluminescent, electrochemical and other detection methods known in the art.
  • donor nucleic acid refers to nucleic acid that is incorporated into a target nucleic acid.
  • effector protein refers to a protein, polypeptide, or peptide that non-covalently binds to a guide nucleic acid to form a complex that contacts a target nucleic acid, wherein at least a portion of the guide nucleic acid hybridizes to a target sequence of the target nucleic acid.
  • a complex between an effector protein and a guide nucleic acid can include multiple effector proteins or a single effector protein. In some instances, the effector protein modifies the target nucleic acid when the complex contacts the target nucleic acid.
  • the effector protein does not modify the target nucleic acid, but it is fused to a fusion partner protein that modifies the target nucleic acid when the complex contacts the target nucleic acid.
  • a non-limiting example of an effector protein modifying a target nucleic acid is cleaving (hydrolysis) of a phosphodiester bond of the target nucleic acid. Additional examples of modifications an effector protein can make to target nucleic acids are described herein and throughout.
  • fusion effector protein refers to a protein comprising at least two heterologous polypeptides. Often a fusion effector protein comprises an effector protein and a fusion partner protein. In general, the fusion partner protein is not an effector protein. Examples of fusion partner proteins are provided herein.
  • fusion partner protein refers to a protein, polypeptide or peptide that is fused to an effector protein.
  • the fusion partner generally imparts some function to the fusion protein that is not provided by the effector protein.
  • the fusion partner may provide a detectable signal.
  • the fusion partner may modify a target nucleic acid, including changing a nucleobase of the target nucleic acid and making a chemical modification to one or more nucleotides of the target nucleic acid.
  • the fusion partner may be capable of modulating the expression of a target nucleic acid.
  • the fusion partner may inhibit, reduce, activate or increase expression of a target nucleic acid via additional proteins or nucleic acid modifications to the target sequence.
  • heterologous means a nucleotide or polypeptide sequence that is not found in a native nucleic acid or protein, respectively.
  • fusion proteins comprise an effector protein and a fusion partner protein, wherein the fusion partner protein is heterologous to an effector protein. These fusion proteins may be referred to as a “heterologous protein.”
  • a protein that is heterologous to the effector protein is a protein that is not covalently linked via an amide bond to the effector protein in nature.
  • a heterologous protein is not encoded by a species that encodes the effector protein.
  • the heterologous protein exhibits an activity (e.g., enzymatic activity) when it is fused to the effector protein. In some instances, the heterologous protein exhibits increased or reduced activity (e.g., enzymatic activity) when it is fused to the effector protein, relative to when it is not fused to the effector protein. In some instances, the heterologous protein exhibits an activity (e.g., enzymatic activity) that it does not exhibit when it is fused to the effector protein.
  • a guide nucleic acid may comprise a first sequence and a second sequence, wherein the first sequence and the second sequence are not found covalently linked via a phosphodiester bond in nature. Thus, the first sequence is considered to be heterologous with the second sequence, and the guide nucleic acid may be referred to as a heterologous guide nucleic acid.
  • in vitro is used to describe an event that takes place in a container for holding laboratory reagents such that it is separated from the biological source from which the material is obtained.
  • In vitro assays can encompass cell-based assays in which living or dead cells are employed.
  • In vitro assays can also encompass a cell-free assay in which no intact cells are employed.
  • the term “in vivo” is used to describe an event that takes place in a subject’s body.
  • ex vivo is used to describe an event that takes place outside of a subject’s body.
  • An ex vivo assay is not performed on a subject. Rather, it is performed upon a sample separate from a subject.
  • An example of an ex vivo assay performed on a sample is an “in vitro” assay.
  • the term, “functional domain,” as used herein, refers to a region of one or more amino acids in a protein that is required for an activity of the protein, or the full extent of that activity, as measured in an in vitro assay. Activities include, but are not limited to, nucleic acid binding, nucleic acid modification, nucleic acid cleavage, and protein binding. The absence of the functional domain, including mutations of the functional domain, would abolish or reduce activity.
  • guide nucleic acid refers to a nucleic acid comprising: a first nucleotide sequence that hybridizes to a target nucleic acid; and a second nucleotide sequence that is capable of being non-covalently bound by an effector protein.
  • the first sequence may be referred to herein as a spacer sequence.
  • the second sequence may be referred to herein as a repeat sequence. In some instances, the first sequence is located 5’ of the second nucleotide sequence. In some instances, the first sequence is located 3’ of the second nucleotide sequence.
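The two-part layout of a guide nucleic acid, with the spacer located either 5’ or 3’ of the repeat, can be modeled as a small data structure. The following is a minimal sketch only. The class name, field names, and the placeholder sequences in the usage note are hypothetical and do not correspond to any sequence disclosed herein.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuideNucleicAcid:
    """Illustrative model of a guide nucleic acid (hypothetical names)."""

    spacer: str  # first sequence: hybridizes to the target nucleic acid
    repeat: str  # second sequence: capable of being bound by the effector protein
    spacer_is_5_prime: bool = True  # True: spacer is located 5' of the repeat

    def sequence(self):
        """Full guide sequence written 5' to 3'."""
        if self.spacer_is_5_prime:
            return self.spacer + self.repeat
        return self.repeat + self.spacer
```

With placeholder inputs `spacer="AAAA"` and `repeat="GGCC"`, `sequence()` gives `AAAAGGCC` when the spacer is 5’ of the repeat and `GGCCAAAA` when it is 3’ of the repeat.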
  • helicase activity refers to the enzymatic activity of an enzyme which allows the enzyme to unwind, or separate hybridized nucleic acid strands.
  • An enzyme with a “helicase domain,” as used herein, refers to a protein sequence or fragment, thereof, which results in helicase activity.
  • An enzyme, protein sequence or fragment, thereof, with helicase activity may be referred to as a “helicase.”
  • linked amino acids refers to at least two amino acids linked by an amide bond.
  • linker refers to a bond or molecule that links a first polypeptide to a second polypeptide.
  • a “peptide linker” comprises at least two amino acids linked by an amide bond.
  • modified target nucleic acid refers to a target nucleic acid, wherein the target nucleic acid has undergone a modification, for example, after contact with an effector protein. In some instances, the modification is an alteration in the sequence of the target nucleic acid. In some instances, the modified target nucleic acid comprises an insertion, deletion, or replacement of one or more nucleotides compared to the unmodified target nucleic acid.
  • mutation associated with a disease refers to the co-occurrence of a mutation and the phenotype of a disease.
  • the mutation may occur in a gene, wherein transcription or translation products from the gene occur at a significantly abnormal level or in an abnormal form in a cell or subject harboring the mutation as compared to a non-disease control subject not having the mutation.
  • non-naturally occurring or engineered, when referring to a nucleic acid, nucleotide, protein, polypeptide, peptide or amino acid, refers to a nucleic acid, nucleotide, protein, polypeptide, peptide or amino acid that is at least substantially free from at least one other feature with which it is naturally associated in nature and as found in nature, and/or contains a modification (e.g., chemical modification, nucleotide sequence, or amino acid sequence) that is not present in the naturally occurring nucleic acid, nucleotide, protein, polypeptide, peptide, or amino acid.
  • compositions or systems described herein refer to a composition or system having at least one component that is not naturally associated with the other components of the composition or system.
  • a composition may include an effector protein and a guide nucleic acid that do not naturally occur together.
  • an effector protein or guide nucleic acid that is “natural,” “naturally-occurring,” or “found in nature” includes an effector protein and a guide nucleic acid from a cell or organism that have not been genetically modified by the hand of man.
  • nucleic acid expression vector refers to a plasmid that can be used to express a nucleic acid of interest.
  • nuclear localization signal refers to an entity (e.g., peptide) that facilitates localization of a nucleic acid, protein, or small molecule to the nucleus, when present in a cell that contains a nuclear compartment.
  • nuclease activity refers to the enzymatic activity of an enzyme which allows the enzyme to cleave the phosphodiester bonds between the nucleotide subunits of nucleic acids; the term “endonuclease activity” refers to the enzymatic activity of an enzyme which allows the enzyme to cleave the phosphodiester bond within a polynucleotide chain.
  • An enzyme with nuclease activity may be referred to as a “nuclease.”
  • nuclease domain refers to a protein sequence or fragment, thereof, which is responsible for the enzymatic activity of an enzyme which allows the enzyme to cleave the phosphodiester bonds between the nucleotide subunits of nucleic acids.
  • PAM is an abbreviation for protospacer adjacent motif.
  • DNA sequences encoding the structural coding sequence can be assembled from cDNA fragments and short oligonucleotide linkers, or from a series of synthetic oligonucleotides, to provide a synthetic nucleic acid which is capable of being expressed from a recombinant transcriptional unit contained in a cell or in a cell-free transcription and translation system.
  • sequences can be provided in the form of an open reading frame uninterrupted by internal non-translated sequences, or introns, which are typically present in eukaryotic genes.
  • Genomic DNA comprising the relevant sequences can also be used in the formation of a recombinant gene or transcriptional unit.
  • sequences of non-translated DNA may be present 5’ or 3’ from the open reading frame, where such sequences do not interfere with manipulation or expression of the coding regions and may act to modulate production of a desired product by various mechanisms.
  • the term, “recombinant” polynucleotide or “recombinant” nucleic acid refers to one which is not naturally occurring, e.g., is made by the artificial combination of two otherwise separated segments of sequence through human intervention. This artificial combination is often accomplished by either chemical synthesis means, or by the artificial manipulation of isolated segments of nucleic acids, e.g., by genetic engineering techniques.
  • such artificial combination is usually done to replace a codon with a redundant codon encoding the same or a conservative amino acid, while typically introducing or removing a sequence recognition site.
  • it is performed to join together nucleic acid segments of desired functions to generate a desired combination of functions.
  • the term, “recombinant polypeptide” or “recombinant protein,” refers to one which is not naturally occurring, e.g., is made by the artificial combination of two otherwise separated segments of amino acid sequences through human intervention.
  • a polypeptide that comprises a heterologous amino acid sequence is a recombinant polypeptide.
  • reporter and “reporter nucleic acid,” are used interchangeably herein to refer to a non-target nucleic acid molecule that can provide a detectable signal upon cleavage by an effector protein. Examples of detectable signals and detectable moieties that generate detectable signals are provided herein.
  • sample generally refers to something comprising a target nucleic acid.
  • the sample is a biological sample, such as a biological fluid or tissue sample.
  • the sample is an environmental sample.
  • the sample may be a biological sample or environmental sample that is modified or manipulated.
  • samples may be modified or manipulated with purification techniques, heat, nucleic acid amplification, salts and buffers.
  • the term, “subject,” as used herein, refers to a biological entity containing expressed genetic materials.
  • the biological entity can be a plant, animal, or microorganism, including, for example, bacteria, viruses, fungi, and protozoa.
  • the subject can be tissues, cells and their progeny of a biological entity obtained in vivo or cultured in vitro.
  • the subject can be a mammal.
  • the mammal can be a human.
  • the subject may be diagnosed or suspected of being at high risk for a disease. In some instances, the subject is not necessarily diagnosed or suspected of being at high risk for the disease.
  • target nucleic acid refers to a nucleic acid that is selected as the nucleic acid for modification, binding, hybridization or any other activity of or interaction with a nucleic acid, protein, polypeptide, or peptide described herein.
  • a target nucleic acid may comprise RNA, DNA, or a combination thereof.
  • a target nucleic acid may be single-stranded (e.g., single-stranded RNA or single-stranded DNA) or double-stranded (e.g., double-stranded DNA).
  • target sequence refers to a sequence of nucleotides found within a target nucleic acid. Such a sequence of nucleotides can, for example, hybridize to an equal length portion of a guide nucleic acid. Hybridization of the guide nucleic acid to the target sequence may bring an effector protein into contact with the target nucleic acid.
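As a minimal sketch of how a target sequence might be located within a target nucleic acid, the following Python example searches one strand for the stretch that is fully complementary to a spacer. Several simplifications are assumptions of this sketch and are not requirements stated in the text: perfect complementarity over the full spacer length, a DNA target strand written 5’ to 3’, and a search restricted to the strand that is supplied. The function names are hypothetical.

```python
_DNA_PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement_dna(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(_DNA_PAIR[base] for base in reversed(seq.upper()))


def find_target_sequences(target_strand, spacer):
    """Return 0-based start positions of target sequences on `target_strand`.

    A target sequence here is a stretch of the strand that base pairs with
    every nucleotide of the spacer. An RNA spacer is handled by treating U
    as T for matching, since U pairs with A just as T does.
    """
    if not spacer:
        raise ValueError("spacer must not be empty")
    spacer_dna = spacer.upper().replace("U", "T")
    target_sequence = reverse_complement_dna(spacer_dna)
    strand = target_strand.upper()
    hits = []
    start = strand.find(target_sequence)
    while start != -1:
        hits.append(start)
        # Advance by one so that overlapping occurrences are also reported.
        start = strand.find(target_sequence, start + 1)
    return hits
```

For the placeholder strand `TTATGCAA` and the placeholder RNA spacer `GCAU`, the spacer pairs with `ATGC`, which begins at position 2 of the strand.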
  • trans cleavage is used herein in reference to cleavage (hydrolysis of a phosphodiester bond) of one or more nucleic acids by an effector protein that is complexed with a guide nucleic acid and a target nucleic acid.
  • the one or more nucleic acids may include the target nucleic acid as well as non-target nucleic acids.
  • Trans cleavage may occur near, but not within or directly adjacent to, the region of the target nucleic acid that is hybridized to the guide nucleic acid.
  • Trans cleavage activity may be triggered by the hybridization of the guide nucleic acid to the target nucleic acid.
  • tracrRNA is an abbreviation for trans-activating RNA.
  • tracrRNA refers to a nucleic acid that comprises a first sequence that is capable of being non-covalently bound by an effector protein.
  • TracrRNAs may comprise a second sequence that hybridizes to a portion of a crRNA, which may be referred to as a repeat hybridization sequence.
  • tracrRNAs are covalently linked to a crRNA.
  • a tracrRNA may include deoxyribonucleosides, ribonucleosides, chemically modified nucleosides, or any combination thereof.
  • a tracrRNA may be separate from, but form a complex with, a guide nucleic acid and an effector protein.
  • the tracrRNA may be attached (e.g., covalently) by an artificial linker to a guide nucleic acid.
  • a tracrRNA may include a nucleotide sequence that hybridizes with a portion of a guide nucleic acid.
  • a tracrRNA may also form a secondary structure (e.g., one or more hairpin loops) that facilitates the binding of an effector protein to a guide nucleic acid and/or modification activity of an effector protein on a target nucleic acid.
  • a tracrRNA may include a repeat hybridization region and a hairpin region.
  • the repeat hybridization region may hybridize to all or part of the repeat sequence of a guide nucleic acid.
  • the repeat hybridization region may be positioned 3’ of the hairpin region.
  • the hairpin region may include a first sequence, a second sequence that is reverse complementary to the first sequence, and a stem-loop linking the first sequence and the second sequence.
  • transcriptional activator refers to a polypeptide or a fragment thereof that can activate or increase transcription of a target nucleic acid molecule.
  • transcriptional repressor refers to a polypeptide or a fragment thereof that is capable of arresting, preventing, or reducing transcription of a target nucleic acid.
  • treatment and “treating,” as used herein, are used in reference to a pharmaceutical or other intervention regimen for obtaining beneficial or desired results in the recipient.
  • beneficial or desired results include but are not limited to a therapeutic benefit and/or a prophylactic benefit.
  • a therapeutic benefit may refer to eradication or amelioration of symptoms or of an underlying disorder being treated.
  • a therapeutic benefit can be achieved with the eradication or amelioration of one or more of the physiological symptoms associated with the underlying disorder such that an improvement is observed in the subject, notwithstanding that the subject may still be afflicted with the underlying disorder.
  • a prophylactic effect includes delaying, preventing, or eliminating the appearance of a disease or condition, delaying, or eliminating the onset of symptoms of a disease or condition, slowing, halting, or reversing the progression of a disease or condition, or any combination thereof.
  • a subject at risk of developing a particular disease, or a subject reporting one or more of the physiological symptoms of a disease, may undergo treatment, even though a diagnosis of this disease may not have been made.
  • viral vector refers to a nucleic acid to be delivered into a host cell via a recombinantly produced virus or viral particle.
  • the nucleic acid may be single-stranded or double stranded, linear or circular, segmented or non-segmented.
  • the nucleic acid may comprise DNA, RNA, or a combination thereof.
  • viruses or viral particles that can deliver a viral vector include retroviruses (e.g., lentiviruses and γ-retroviruses), adenoviruses, arenaviruses, alphaviruses, adeno-associated viruses (AAVs), baculoviruses, vaccinia viruses, herpes simplex viruses and poxviruses.
  • a viral vector delivered by such viruses or viral particles may be referred to by the type of virus to deliver the viral vector (e.g., an AAV viral vector is a viral vector that is to be delivered by an adeno-associated virus).
  • a viral vector referred to by the type of virus to be delivered by the viral vector can contain viral elements (e.g., nucleotide sequences) necessary for packaging of the viral vector into the virus or viral particle, replicating the virus, or other desired viral activities.
  • a virus containing a viral vector may be replication competent, replication deficient or replication defective.
  • compositions and systems comprising an effector protein and an engineered guide nucleic acid, which may simply be referred to herein as a guide nucleic acid.
  • an engineered effector protein and an engineered guide nucleic acid refer to an effector protein and a guide nucleic acid, respectively, that are not found in nature.
  • systems and compositions comprise at least one non-naturally occurring component.
  • compositions and systems may comprise a guide nucleic acid, wherein the sequence of the guide nucleic acid is different or modified from that of a naturally-occurring guide nucleic acid.
  • compositions and systems comprise at least two components that do not naturally occur together.
  • compositions and systems may comprise a guide nucleic acid comprising a repeat region and a spacer region which do not naturally occur together.
  • compositions and systems may comprise a guide nucleic acid and an effector protein that do not naturally occur together.
  • an effector protein or guide nucleic acid that is “natural,” “naturally-occurring,” or “found in nature” includes effector proteins and guide nucleic acids from cells or organisms that have not been genetically modified by a human or machine.
  • the guide nucleic acid comprises a non-natural nucleobase sequence.
  • the non-natural sequence is a nucleobase sequence that is not found in nature.
  • the non-natural sequence may comprise a portion of a naturally-occurring sequence, wherein the portion of the naturally-occurring sequence is not present in nature, absent the remainder of the naturally-occurring sequence.
  • the guide nucleic acid comprises two naturally-occurring sequences arranged in an order or proximity that is not observed in nature.
  • compositions and systems comprise a ribonucleotide complex comprising an effector protein and a guide nucleic acid that do not occur together in nature.
  • Engineered guide nucleic acids may comprise a first sequence and a second sequence that do not occur naturally together.
  • an engineered guide nucleic acid may comprise a sequence of a naturally-occurring repeat region and a spacer region that is complementary to a naturally-occurring eukaryotic sequence.
  • the engineered guide nucleic acid may comprise a sequence of a repeat region that occurs naturally in an organism and a spacer region that does not occur naturally in that organism.
  • An engineered guide nucleic acid may comprise a first sequence that occurs in a first organism and a second sequence that occurs in a second organism, wherein the first organism and the second organism are different.
  • the guide nucleic acid may comprise a third sequence located at a 3’ or 5’ end of the guide nucleic acid, or between the first and second sequences of the guide nucleic acid.
  • an engineered guide nucleic acid may comprise a naturally occurring CRISPR RNA (crRNA) and trans-activating crRNA (tracrRNA) coupled by a linker sequence.
  • compositions and systems described herein comprise an engineered effector protein that is similar to a naturally occurring effector protein.
  • the engineered effector protein may lack a portion of the naturally occurring effector protein.
  • the effector protein may comprise a mutation relative to the naturally-occurring effector protein, wherein the mutation is not found in nature.
  • the effector protein may also comprise at least one additional amino acid relative to the naturally-occurring effector protein.
  • the effector protein may comprise an addition of a nuclear localization signal relative to the naturally occurring effector protein.
  • the nucleotide sequence encoding the effector protein is codon optimized (e.g., for expression in a eukaryotic cell) relative to the naturally occurring sequence.
  • compositions that comprise one or more effector proteins and uses thereof are also provided herein.
  • compositions that comprise a nucleic acid, wherein the nucleic acid encodes any one of the effector proteins described herein.
  • the nucleic acid may be a nucleic acid expression vector.
  • the nucleic acid expression vector may be a viral vector, such as an AAV vector.
  • Such an expression vector can comprise nucleic acid sequences encoding one or more effector proteins described herein, including in some embodiments, a nucleic acid sequence encoding an engineered guide nucleic acid and/or a donor nucleic acid described herein.
  • an effector protein is brought into proximity of a target nucleic acid in the presence of a guide nucleic acid when the guide nucleic acid includes a nucleotide sequence that is complementary with a target sequence in the target nucleic acid.
  • the ability of an effector protein to modify a target nucleic acid may be dependent upon the effector protein being bound to a guide nucleic acid and the guide nucleic acid being hybridized to a target nucleic acid.
  • An effector protein may also recognize a protospacer adjacent motif (PAM) sequence present in the target nucleic acid, which may direct the modification activity of the effector protein.
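PAM recognition can be pictured as a motif search along the target nucleic acid. The sketch below scans a DNA strand for a motif written with IUPAC nucleotide codes. The motif `TTN` used in the usage note is a hypothetical placeholder, not a PAM disclosed in this section, and the sketch makes no assumption about whether the PAM lies 5’ or 3’ of the target sequence. The function name is likewise hypothetical.

```python
# IUPAC nucleotide codes mapped to the DNA bases each code allows.
_IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_pam_sites(strand, pam):
    """Return 0-based start positions where `pam` matches `strand`.

    `strand` is a DNA sequence written 5' to 3'. `pam` may contain IUPAC
    codes (e.g., N for any base). Overlapping matches are reported.
    """
    if not pam:
        raise ValueError("pam must not be empty")
    strand = strand.upper()
    allowed = [_IUPAC[code] for code in pam.upper()]
    sites = []
    for start in range(len(strand) - len(allowed) + 1):
        window = strand[start:start + len(allowed)]
        if all(base in options for base, options in zip(window, allowed)):
            sites.append(start)
    return sites
```

For the placeholder strand `ATTGCCTTAA` and the placeholder motif `TTN`, the matches start at positions 1 (`TTG`) and 6 (`TTA`).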
  • An effector protein may modify a nucleic acid by cis cleavage or trans cleavage.
  • the modification of the target nucleic acid generated by an effector protein may, as a non-limiting example, result in modulation of the expression of the nucleic acid (e.g., increasing or decreasing expression of the nucleic acid) or modulation of the activity of a translation product of the target nucleic acid (e.g., inactivation of a protein binding to an RNA molecule or hybridization).
  • An effector protein may be a CRISPR-associated (“Cas”) protein.
  • An effector protein may function as a single protein, including a single protein that is capable of binding to a guide nucleic acid and modifying a target nucleic acid.
  • an effector protein may function as part of a multiprotein complex, including, for example, a complex having two or more effector proteins, including two or more of the same effector proteins (e.g., dimer or multimer).
  • An effector protein, when functioning in a multiprotein complex, may have only one functional activity (e.g., binding to a guide nucleic acid), while other effector proteins present in the multiprotein complex are capable of the other functional activity (e.g., modifying a target nucleic acid).
  • An effector protein may be a modified effector protein having reduced modification activity (e.g., a catalytically defective effector protein) or no modification activity (e.g., a catalytically inactive effector protein). Accordingly, an effector protein as used herein encompasses a modified or programmable nuclease that does not have nuclease activity.
  • the effector protein comprises an amino acid sequence that is at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100% identical to any one of SEQ ID NOs: 1-23 and 31-104.
  • the amino acid sequence of the effector protein is at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100% identical to any one of SEQ ID NOs: 1-23, 31-104 and 121-126. In some embodiments, the amino acid sequence is at least 80% identical to any one of SEQ ID NOs: 1-23, 31-104 and 121-126. In some embodiments, the amino acid sequence is at least 85% identical to any one of SEQ ID NOs: 1-23, 31-104 and 121-126.
  • the amino acid sequence is at least 90% identical to any one of SEQ ID NOs: 1-23, 31-104 and 121-126. In some embodiments, the amino acid sequence is at least 95% identical to any one of SEQ ID NOs: 1-23, 31-104 and 121-126. In some embodiments, the amino acid sequence is at least 97% identical to any one of SEQ ID NOs: 1-23, 31-104 and 121-126. In some embodiments, the amino acid sequence is at least 99% identical to any one of SEQ ID NOs: 1-23, 31-104 and 121-126. In some embodiments, the amino acid sequence is 100% identical to any one of SEQ ID NOs: 1-23, 31-104 and 121-126.
  • compositions comprise a nucleic acid sequence encoding an amino acid sequence of an effector protein, wherein the amino acid sequence is at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100% identical to any one of SEQ ID NOs: 1-23, 31-104 and 121-126.
  • the nucleic acid sequence encodes the amino acid sequence, wherein the amino acid sequence is at least 85% identical to any one of SEQ ID NOs: 1-23, 31-104 and 121-126.
  • the nucleic acid sequence encodes the amino acid sequence, wherein the amino acid sequence is at least 90% identical to any one of SEQ ID NOs: 1-23, 31-104 and 121-126. In some embodiments, the nucleic acid sequence encodes the amino acid sequence, wherein the amino acid sequence is at least 95% identical to any one of SEQ ID NOs: 1-23, 31-104 and 121-126. In some embodiments, the nucleic acid sequence encodes the amino acid sequence, wherein the amino acid sequence is at least 97% identical to any one of SEQ ID NOs: 1-23, 31-104 and 121-126.
  • the nucleic acid sequence encodes the amino acid sequence, wherein the amino acid sequence is at least 99% identical to any one of SEQ ID NOs: 1-23, 31-104 and 121-126. In some embodiments, the nucleic acid sequence encodes the amino acid sequence, wherein the amino acid sequence is 100% identical to any one of SEQ ID NOs: 1-23, 31-104 and 121-126.
  • compositions comprise an effector protein, wherein the effector protein comprises an amino acid sequence that is at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100% identical to any one of SEQ ID NO: 1-23, 31-104 and 121-126.
  • the amino acid sequence is at least 80% identical to any one of SEQ ID NO: 1-23, 31-104 and 121-126.
  • the amino acid sequence is at least 85% identical to any one of SEQ ID NO: 1-23, 31-104 and 121-126.
  • the amino acid sequence is at least 90% identical to any one of SEQ ID NO: 1-23, 31-104 and 121-126. In some embodiments, the amino acid sequence is at least 95% identical to any one of SEQ ID NO: 1-23, 31-104 and 121-126. In some embodiments, the amino acid sequence is at least 97% identical to any one of SEQ ID NO: 1-23, 31-104 and 121-126. In some embodiments, the amino acid sequence is at least 99% identical to any one of SEQ ID NO: 1-23, 31-104 and 121-126. In some embodiments, the amino acid sequence is 100% identical to any one of SEQ ID NO: 1-23, 31-104 and 121-126.
  • compositions comprise an effector protein, wherein the amino acid sequence of the effector protein is at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100% identical to any one of SEQ ID NO: 1-23, 31-104 and 121-126.
  • the amino acid sequence of the effector protein is at least 80% identical to any one of SEQ ID NO: 1-23, 31-104 and 121-126.
  • the amino acid sequence of the effector protein is at least 85% identical to any one of SEQ ID NO: 1-23, 31-104 and 121-126.
  • the amino acid sequence of the effector protein is at least 90% identical to any one of SEQ ID NO: 1-23, 31-104 and 121-126. In some embodiments, the amino acid sequence of the effector protein is at least 95% identical to any one of SEQ ID NO: 1-23, 31-104 and 121-126. In some embodiments, the amino acid sequence of the effector protein is at least 97% identical to any one of SEQ ID NO: 1-23, 31-104 and 121-126. In some embodiments, the amino acid sequence of the effector protein is at least 99% identical to any one of SEQ ID NO: 1-23, 31-104 and 121-126. In some embodiments, the amino acid sequence of the effector protein is 100% identical to any one of SEQ ID NO: 1-23, 31-104 and 121-126.
  • compositions comprise an effector protein, wherein a portion of the amino acid sequence of the effector protein is at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100% identical to an equal length portion of a sequence selected from any one of SEQ ID NOs: 1-23, 31-104, and 121-126.
  • the length of the portion is selected from: 20 to 40, 40 to 60, 60 to 80, 80 to 100, 100 to 120, 120 to 140, 140 to 160, 160 to 180, 180 to 200, 200 to 220, 220 to 240, 240 to 260, 260 to
  • the length of the portion is selected from: 400 to 420, 420 to 440, 440 to 460, 460 to 480, 480 to 500, 520 to 540, 540 to 560, 560 to 580, 580 to 600, 600 to 620, 620 to 640, 640 to 660, 660 to
  • the length of the portion is selected from: 1000 to 1020, 1020 to 1040, 1040 to 1060, 1060 to 1080, 1080 to 1100, 1100 to 1120, 1120 to 1140, 1140 to 1160, 1160 to 1180, 1180 to 1200,
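The comparison of a portion of one sequence against an equal-length portion of another can be sketched as an ungapped window comparison. The following is a minimal illustration only, assuming that identity is counted as the number of matching positions divided by the portion length and that no gaps are allowed inside a portion. The function names and the toy sequences are hypothetical and are not taken from any SEQ ID NO.

```python
def percent_identity(a, b):
    """Percent of positions holding identical residues in two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("portions must have equal length")
    if not a:
        raise ValueError("portions must be non-empty")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def best_portion_identity(query, reference, length):
    """Highest identity between any query window and any reference window.

    Both windows have the same `length`; every pair of ungapped windows is
    compared (an assumption made for illustration, not a method from the text).
    """
    if length <= 0 or length > min(len(query), len(reference)):
        raise ValueError("length must fit inside both sequences")
    best = 0.0
    for i in range(len(query) - length + 1):
        window = query[i:i + length]
        for j in range(len(reference) - length + 1):
            best = max(best, percent_identity(window, reference[j:j + length]))
    return best


# Toy sequences (hypothetical), compared over 5-residue portions.
print(best_portion_identity("GGMKTAYGG", "AAMKSAYAA", 5))
```

The exhaustive pairwise window scan is quadratic in sequence length; it is chosen here for clarity rather than speed.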
  • effector proteins comprise a functional domain.
  • the functional domain may comprise nucleic acid binding activity.
  • the functional domain may comprise catalytic activity, also referred to as enzymatic activity.
  • the catalytic activity may be nuclease activity.
  • the nuclease activity may comprise cleaving a strand of a nucleic acid.
  • the nuclease activity may comprise cleaving only one strand of a double stranded nucleic acid, also referred to as nicking.
  • the functional domain is an HNH domain.
  • the functional domain is a RuvC domain.
  • the RuvC domain comprises multiple subdomains.
  • the functional domain is a zinc finger binding domain.
  • the functional domain is a HEPN domain.
  • effector proteins lack a certain functional domain.
  • the effector protein lacks an HNH domain.
  • effector proteins lack a zinc finger binding domain.
  • the effector protein comprises a nuclease domain. In some embodiments, the effector protein comprises a nuclease domain located at the C-terminus. In some embodiments, the effector protein comprises a helicase domain. In some embodiments, the effector protein comprises a RecA1 domain. In some embodiments, the effector protein comprises a RecA2 domain. In some embodiments, the effector protein comprises a RuvA domain. In some embodiments, the effector protein comprises a RuvA dimer.
  • an effector protein comprising a helicase domain has an amino acid sequence of at least 800, at least 850, at least 900, at least 950, at least 1000, at least 1050, at least 1100, at least 1150, at least 1200, at least 1250, at least 1300, at least 1350, at least 1400, at least 1450, at least 1500, at least 1550, at least 1600, at least 1650, or at least 1680 linked amino acids.
  • an effector protein comprises a helicase domain.
  • the effector protein comprising a helicase domain comprises about 800 to about 900, about 900 to about 1000, about 1000 to about 1100, about 1100 to about 1200, about 1200 to about 1300, about 1300 to about 1400, about 1400 to about 1500, about 1500 to about 1600, about 1600 to about 1700, or about 1700 to about 1800 linked amino acids.
  • the enzymatic activity of the effector protein comprises helicase activity. In some embodiments, the enzymatic activity of the effector protein comprises nuclease activity. In some embodiments, the enzymatic activity of the effector protein comprises helicase activity and nuclease activity.
  • compositions comprising multiple effector proteins and/or multiple domains having various enzymatic activity as described herein.
  • compositions comprising a first effector protein, or fragment thereof, linked to a second effector protein, or fragment thereof.
  • compositions comprising multiple effector proteins.
  • a composition comprises at least one effector protein.
  • a composition comprises an effector protein with multiple domains with enzymatic activity as described herein (e.g., nuclease and/or helicase domains).
  • a composition comprises dimers of effector proteins, or fragments thereof that are operatively linked.
  • a composition comprises multimers of effector proteins, or fragments thereof that are operatively linked.
  • a composition comprises an effector protein with helicase activity.
  • a composition comprises at least one effector protein.
  • the effector protein further comprises multiple domains, wherein a first domain comprises an amino acid sequence of at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% identical to a sequence described in TABLE 1.
  • the first domain is linked to a second domain comprising an amino acid sequence of at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% identical to a sequence described in TABLE 1.
  • the amino acid sequence of the effector protein is at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% identical to one or more sequences described in TABLE 1.
  • a composition may comprise multiple effector proteins each with an amino acid sequence of at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% identical to a sequence described in TABLE 1.
  • compositions, systems, and methods described herein comprise an effector protein, or a nucleic acid encoding the effector protein, wherein the effector protein comprises an amino acid sequence that is at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100% similar to any one of the sequences as set forth in TABLE 1.
  • an effector protein provided herein comprises an amino acid sequence that is at least 80% similar to any one of the sequences as set forth in TABLE 1.
  • an effector protein provided herein comprises an amino acid sequence that is at least 85% similar to any one of the sequences as set forth in TABLE 1.
  • an effector protein provided herein comprises an amino acid sequence that is at least 90% identical to any one of the sequences as set forth in TABLE 1. In some embodiments, an effector protein provided herein comprises an amino acid sequence that is at least 95% similar to any one of the sequences as set forth in TABLE 1. In some embodiments, an effector protein provided herein comprises an amino acid sequence that is at least 97% similar to any one of the sequences as set forth in TABLE 1. In some embodiments, an effector protein provided herein comprises an amino acid sequence that is at least 98% identical to any one of the sequences as set forth in TABLE 1. In some embodiments, an effector protein provided herein comprises an amino acid sequence that is at least 99% similar to any one of the sequences as set forth in TABLE 1. In some embodiments, an effector protein provided herein comprises an amino acid sequence that is 100% similar to any one of the sequences as set forth in TABLE 1.
  • the similarity of two amino acid sequences can be calculated by using a BLOSUM62 similarity matrix (Henikoff and Henikoff, Proc. Natl. Acad. Sci. USA, 89: 10915-10919 (1992)) that is transformed so that any value ≥ 1 is replaced with +1 and any value ≤ 0 is replaced with 0.
  • an Ile (I) to Leu (L) substitution is scored at +2.0 by the BLOSUM62 similarity matrix, which in the transformed matrix is scored at +1.
  • when comparing two full protein sequences, the proteins can be aligned using pairwise MUSCLE alignment. Then, the % similarity can be scored at each residue and divided by the length of the alignment. For determining % similarity over a protein domain or motif, a multilevel consensus sequence (or PROSITE motif sequence) can be used to identify how strongly each domain or motif is conserved. In calculating the similarity of a domain or motif, the second and third levels of the multilevel sequence are treated as equivalent to the top level. Additionally, in some embodiments, if a substitution could be treated as conservative with any of the amino acids in that position of the multilevel consensus sequence, +1 point is assigned.
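The similarity calculation described above can be sketched as follows. This is a minimal illustration, not the applicants' implementation. It assumes the two sequences have already been aligned (the MUSCLE step is not performed here), that a raw score of 1 or more maps to 1 and anything else maps to 0, and that columns containing a gap score 0 while still counting toward the alignment length. Only a small excerpt of BLOSUM62 is included for the demonstration; the Ile/Leu value of +2 is the one given in the text, and a real calculation would load the full matrix.

```python
# Small excerpt of BLOSUM62 for demonstration only (not the full matrix).
BLOSUM62_EXCERPT = {
    ("I", "I"): 4,
    ("L", "L"): 4,
    ("A", "A"): 4,
    ("I", "L"): 2,   # value stated in the text
    ("A", "D"): -2,
}


def transform(score):
    """Collapse a raw substitution score to 1 (similar) or 0 (not similar)."""
    return 1 if score >= 1 else 0


def percent_similarity(aligned_a, aligned_b, matrix, gap="-"):
    """Percent similarity over a pairwise alignment.

    Each column contributes the transformed score; the sum is divided by the
    alignment length. Gap columns contribute 0 (an assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment must be non-empty")
    total = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        raw = matrix.get((a, b), matrix.get((b, a)))
        if raw is None:
            raise KeyError(f"no score for pair {a}/{b} in the supplied matrix")
        total += transform(raw)
    return 100.0 * total / len(aligned_a)


# Hypothetical 4-column alignment: I/L and L/L score 1, A/D and the gap score 0.
print(percent_similarity("ILA-", "LLDG", BLOSUM62_EXCERPT))
```

Passing the matrix as an argument keeps the scoring rule separate from the data, so the excerpt can be swapped for a complete BLOSUM62 table without changing the function.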
  • a composition described herein comprises a Type 1 CRISPR system.
  • a composition comprises a Type 1 CRISPR system and an effector protein.
  • the Type 1 CRISPR system forms a complex with an effector protein described herein.
  • compositions comprising a Type 1 CRISPR system and an effector protein function together as a complex.
  • effector proteins catalyze cleavage of a target nucleic acid in a cell or a sample.
  • the target nucleic acid is single stranded (ss).
  • the target nucleic acid is double stranded (ds).
  • the target nucleic acid is dsDNA.
  • the target nucleic acid is ssDNA.
  • the target nucleic acid is RNA.
  • effector proteins cleave the target nucleic acid within a target sequence of the target nucleic acid.
  • effector proteins cleave the target nucleic acid, as well as additional nucleic acids in the cell or the sample, which may be referred to as trans cleavage activity. In some embodiments, effector proteins catalyze cis cleavage activity. In some embodiments, effector proteins cleave both strands of dsDNA.
  • effector proteins modify (e.g., cleave, nick, or unwind) a target nucleic acid within or near a PAM sequence of the target nucleic acid.
  • the modification e.g., cleavage, nicking, unwinding
  • a target nucleic acid may comprise a PAM sequence adjacent to a sequence that is complementary to a guide nucleic acid spacer region.
  • a given polypeptide may not require a PAM sequence being present in a target nucleic acid for the effector protein to modify the target nucleic acid.
  • a polypeptide e.g., effector protein
  • TABLE 2 illustrates exemplary PAM sequences.
  • a target nucleic acid includes a PAM sequence described herein (e.g., SEQ ID NOs: 28, 29, 131-142).
  • a target nucleic acid includes a PAM sequence described herein (e.g., SEQ ID NO: 28 or 29 or GAA or TTC).
  • an effector protein described herein recognizes a PAM sequence described herein (e.g., SEQ ID NO: 28 or 29 or GAA or TTC).
  • the effector protein recognizes a PAM sequence comprising any one of SEQ ID NO: 28-29, and 131-142.
  • the effector protein recognizing a PAM sequence comprising SEQ ID NO: 28 comprises the amino acid sequence of SEQ ID NO: 22.
  • the effector protein recognizing a PAM sequence comprising SEQ ID NO: 29 comprises the amino acid sequence of SEQ ID NO: 23.
  • the effector protein recognizes a PAM sequence comprising GAA.
  • the effector protein recognizes a PAM sequence comprising TTC.
  • the effector protein recognizing a PAM sequence comprising GAA or TTC comprises the amino acid sequence of any one of SEQ ID NO: 31-104.
  • the effector protein recognizing a PAM sequence comprising any one of SEQ ID NOs: 131-140 comprises the amino acid sequence of any one of SEQ ID NO: 121-126. In some embodiments, the effector protein recognizing a PAM sequence comprising SEQ ID NO: 135 or 136 comprises the amino acid sequence of SEQ ID NO: 121. In some embodiments, the effector protein recognizing a PAM sequence comprising SEQ ID NO: 131 or 132 comprises the amino acid sequence of SEQ ID NO: 122. In some embodiments, the effector protein recognizing a PAM sequence comprising SEQ ID NO: 133 or 134 comprises the amino acid sequence of SEQ ID NO: 123.
  • the effector protein recognizing a PAM sequence comprising SEQ ID NO: 139 or 140 comprises the amino acid sequence of SEQ ID NO: 124. In some embodiments, the effector protein recognizing a PAM sequence comprising SEQ ID NO: 137 or 138 comprises the amino acid sequence of SEQ ID NO: 125. In some embodiments, the effector protein recognizing a PAM sequence comprising SEQ ID NO: 131 comprises the amino acid sequence of SEQ ID NO: 126. In some embodiments, effector proteins do not require a PAM sequence to cleave or nick a target nucleic acid.
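The idea of a PAM sequence lying adjacent to a sequence targeted by a guide can be illustrated with a simple scan for the trinucleotide PAMs named above (GAA and TTC). This is a sketch under stated assumptions: the PAM is taken to sit immediately 5' of the adjacent sequence on the strand being scanned, and the adjacent sequence length (`spacer_len`) is an arbitrary parameter. Neither the orientation nor the length is specified in this passage, and the function name and example sequence are hypothetical.

```python
def find_pam_sites(target, pams=("GAA", "TTC"), spacer_len=20):
    """List (pam, pam_start, adjacent_sequence) for each PAM occurrence.

    Only PAMs followed by a full `spacer_len` stretch are reported.
    Positions are 0-based on the scanned strand.
    """
    target = target.upper()
    sites = []
    for pam in pams:
        start = target.find(pam)
        while start != -1:
            adjacent_start = start + len(pam)
            adjacent = target[adjacent_start:adjacent_start + spacer_len]
            if len(adjacent) == spacer_len:
                sites.append((pam, start, adjacent))
            # Step by one so overlapping occurrences are not missed.
            start = target.find(pam, start + 1)
    return sorted(sites, key=lambda site: site[1])


# Hypothetical 22-nt sequence scanned with a short 5-nt adjacent window.
for site in find_pam_sites("CCGAAACGTACGTTTCGGATCC", spacer_len=5):
    print(site)
```

Scanning the opposite strand would require running the same function on the reverse complement of the input.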
  • effector proteins disclosed herein are engineered proteins.
  • Engineered proteins are not identical to a naturally-occurring protein.
  • Engineered proteins may not comprise an amino acid sequence that is identical to that of a naturally-occurring protein.
  • the amino acid sequence of an engineered protein is not identical to that of a naturally occurring protein.
  • Engineered proteins may provide an increased activity relative to a naturally occurring protein.
  • Engineered proteins may provide a reduced activity relative to a naturally occurring protein.
  • the activity may be nuclease activity.
  • the activity may be nickase activity.
  • the activity may be nucleic acid binding activity.
  • Engineered proteins may provide an increased or reduced activity relative to a naturally occurring protein under a given condition of a cell or sample in which the activity occurs.
  • the condition may be temperature.
  • the temperature may be greater than 20 °C, greater than 25 °C, greater than 30 °C, greater than 35 °C, greater than 40 °C, greater than 45 °C, greater than 50 °C, greater than 55 °C, greater than 60 °C, greater than 65 °C, or greater than 70 °C, but not greater than 80 °C.
  • the condition may be the presence of a salt.
  • the salt may be a magnesium salt, a zinc salt, a potassium salt, a calcium salt or a sodium salt.
  • the condition may be the concentration of one or more salts.
  • the amino acid sequence of an engineered protein comprises at least one residue that is different from that of a naturally occurring protein. In some embodiments, the amino acid sequence of an engineered protein comprises at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9 or at least 10 residues that are different from that of a naturally occurring protein.
  • the residues in the engineered protein that differ from those at corresponding positions of the naturally occurring protein (when the engineered and naturally occurring proteins are aligned for maximal identity) may be referred to as substituted residues or amino acid substitutions. In some embodiments, the substituted residues are non-conserved residues relative to the residues at corresponding positions of the naturally occurring protein.
  • a non-conserved residue has a different physicochemical property from the amino acid for which it substitutes.
  • Physicochemical properties include aliphatic, cyclic, aromatic, basic, acidic and hydroxyl-containing.
  • Glycine, alanine, valine, leucine and isoleucine are aliphatic.
  • Serine, cysteine, threonine and methionine are hydroxyl-containing.
  • Proline is cyclic. Phenylalanine, tyrosine and tryptophan are aromatic. Histidine, lysine and arginine are basic.
  • Aspartate, Glutamate, Asparagine and glutamine are acidic.
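The distinction between conserved and non-conserved substitutions can be expressed as a lookup of physicochemical class. The sketch below is illustrative only. The groups follow the listing above, with phenylalanine, tyrosine and tryptophan treated as aromatic and histidine, lysine and arginine treated as basic (standard assignments, stated here as an assumption). The dictionary and function names are hypothetical.

```python
# Residue groups keyed by physicochemical property (one-letter codes).
CLASSES = {
    "aliphatic": "GAVLI",
    "hydroxyl-containing": "SCTM",  # grouping as listed in the text
    "cyclic": "P",
    "aromatic": "FYW",              # assumption: standard assignment
    "basic": "HKR",                 # assumption: standard assignment
    "acidic": "DENQ",               # grouping as listed in the text
}

# Invert to a per-residue lookup table.
RESIDUE_CLASS = {aa: name for name, residues in CLASSES.items() for aa in residues}


def is_non_conserved(original, substituted):
    """True when the substituted residue belongs to a different class."""
    return RESIDUE_CLASS[original] != RESIDUE_CLASS[substituted]


print(is_non_conserved("D", "A"))  # aspartate to alanine
print(is_non_conserved("I", "L"))  # isoleucine to leucine
```

Under this grouping an aspartate to alanine change crosses classes, while isoleucine to leucine stays within the aliphatic class.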
  • engineered proteins are designed to be catalytically inactive or to have reduced catalytic activity relative to a naturally occurring protein.
  • a catalytically inactive effector protein may be generated by substituting an amino acid that confers a catalytic activity (also referred to as a “catalytic residue”) with a substituted residue that does not support the catalytic activity.
  • the substituted residue has an aliphatic side chain.
  • the substituted residue is glycine.
  • the substituted residue is valine.
  • the substituted residue is leucine.
  • the substituted residue is alanine.
  • the amino acid is aspartate and it is substituted with asparagine. In some embodiments, the amino acid is glutamate and it is substituted with glutamine.
  • An amino acid that confers catalytic activity may be identified by performing sequence alignment of an unmodified effector protein with a similar enzyme having at least one identified catalytic residue; selecting at least one putative catalytic residue in the unmodified effector protein within the portion of the unmodified effector protein that aligns with a portion of the similar enzyme that comprises the identified catalytic residue; substituting the at least one putative catalytic residue of the unmodified effector protein with the different amino acid; and comparing the catalytic activity of the unmodified effector protein to the modified effector protein.
  • a similar enzyme may be an enzyme that is at least 50%, at least 60%, at least 70%, at least 80%, or at least 90% identical to the unmodified effector protein.
  • a similar enzyme may be an enzyme that is not greater than 99.9% identical to the unmodified effector protein.
  • the portion of the unmodified effector protein that aligns with a portion of the similar enzyme is at least 10 amino acids, at least 20 amino acids, at least 30 amino acids, at least 40 amino acids, at least 50 amino acids, at least 60 amino acids, at least 70 amino acids, at least 80 amino acids, at least 90 amino acids, or at least 100 amino acids in length.
  • the portion of the unmodified effector protein that aligns with a portion of the similar enzyme is not greater than 200 amino acids.
  • the portion of the unmodified effector protein that aligns with a portion of the similar enzyme comprises a functional domain (e.g., HEPN, HNH, RuvC, zinc finger binding).
  • comparing the catalytic activity comprises performing a cleavage assay. An example of generating a catalytically inactive effector protein is provided in Example 7.
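The alignment-based step of the procedure above (locating a putative catalytic residue and substituting it) can be sketched in code; the final comparison of catalytic activity is an experimental cleavage assay and is not modeled. This is a minimal sketch, assuming a pairwise alignment is already available as two gapped strings of equal length. The function names and the toy sequences are hypothetical.

```python
def map_catalytic_residue(aligned_similar, aligned_effector, similar_pos, gap="-"):
    """Map a known catalytic position in the similar enzyme onto the effector.

    `similar_pos` is 1-based in the ungapped similar enzyme. Returns
    (1-based effector position, effector residue), or None if the catalytic
    residue aligns to a gap in the effector.
    """
    if len(aligned_similar) != len(aligned_effector):
        raise ValueError("aligned sequences must have equal length")
    s_index = e_index = 0
    for s, e in zip(aligned_similar, aligned_effector):
        if s != gap:
            s_index += 1
        if e != gap:
            e_index += 1
        if s != gap and s_index == similar_pos:
            return (e_index, e) if e != gap else None
    raise ValueError("similar_pos lies outside the similar enzyme")


def substitute(sequence, position, new_residue):
    """Return `sequence` with the residue at 1-based `position` replaced."""
    if not 1 <= position <= len(sequence):
        raise ValueError("position outside sequence")
    return sequence[:position - 1] + new_residue + sequence[position:]


# Toy alignment: the similar enzyme "MKDAL" has a catalytic D at position 3.
hit = map_catalytic_residue("MK-DAL", "MRSDGL", similar_pos=3)
print(hit)
print(substitute("MRSDGL", hit[0], "A"))
```

Keeping the mapping separate from the substitution makes it easy to inspect the putative residue before deciding which replacement (for example alanine, or asparagine for aspartate) to make.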
  • compositions comprise a fusion effector protein, wherein the fusion effector protein comprises an effector protein described herein.
  • compositions comprise a nucleic acid encoding the fusion effector protein.
  • fusion effector proteins comprise an effector protein or a portion thereof, and a fusion partner protein.
  • compositions comprise a fusion effector protein and a guide nucleic acid, wherein at least a portion of the guide nucleic acid hybridizes to a target nucleic acid, and the fusion partner modulates the target nucleic acid or expression thereof.
  • a fusion partner protein may also simply be referred to herein as a fusion partner.
  • the effector protein and the fusion partner protein are heterologous proteins.
  • fusion effector proteins modify a target nucleic acid or the expression thereof.
  • the modifications are transient (e.g., transcription repression or activation).
  • the modifications are inheritable.
  • epigenetic modifications made to a target nucleic acid, or to proteins associated with the target nucleic acid, e.g., nucleosomal histones, in a cell are observed in cells produced by proliferation of the cell.
  • fusion effector proteins modify a target nucleic acid or the expression thereof, wherein the target nucleic acid comprises a deoxyribonucleoside, a ribonucleoside or a combination thereof.
  • the target nucleic acid may comprise or consist of a single-stranded RNA (ssRNA), a double-stranded RNA (dsRNA), a single-stranded DNA (ssDNA), or a double-stranded DNA (dsDNA).
  • Non-limiting examples of fusion partners for modifying ssRNA include: splicing factors (e.g., RS domains); protein translation components (e.g., translation initiation, elongation, and/or release factors; e.g., eIF4G); RNA methylases; RNA editing enzymes (e.g., RNA deaminases, e.g., adenosine deaminase acting on RNA (ADAR), including A to I and/or C to U editing enzymes); helicases; and RNA-binding proteins.
  • the fusion partner protein is fused to the 5’ end of the effector protein. In some embodiments, the fusion partner protein is fused to the 3’ end of the effector protein. In some embodiments, the effector protein is located at an internal location of the fusion partner protein. In some embodiments, the fusion partner protein is located at an internal location of the Cas effector protein. For example, a base editing enzyme (e.g., a deaminase enzyme) is inserted at an internal location of a Cas effector protein. The effector protein may be fused directly or indirectly (e.g., via a linker) to the fusion partner protein. Exemplary linkers are described herein.
  • fusion partners inhibit or reduce expression of a target nucleic acid.
  • fusion partners reduce expression of the target nucleic acid relative to its expression in the absence of the fusion effector protein.
  • Relative expression, including transcription and RNA levels, may be assessed, quantified, and compared, e.g., by RT-qPCR.
  • fusion partners may comprise a transcriptional repressor.
  • Transcriptional repressors may inhibit transcription via: recruitment of other transcription factor proteins; modification of target DNA such as methylation; recruitment of a DNA modifier; modulation of histones associated with target DNA; recruitment of a histone modifier such as those that modify acetylation and/or methylation of histones; or a combination thereof.
  • fusion partners that decrease or inhibit transcription include, but are not limited to: histone lysine methyltransferases; histone lysine demethylases; histone lysine deacetylases; and DNA methylases; and functional domains thereof.
  • fusion partners activate or increase expression of a target nucleic acid.
  • fusion partners increase expression of the target nucleic acid relative to its expression in the absence of the fusion effector protein. Relative expression, including transcription and RNA levels, may be assessed, quantified, and compared, e.g., by RT-qPCR.
  • fusion partners comprise a transcriptional activator. Transcriptional activators may promote transcription via: recruitment of other transcription factor proteins; modification of target DNA such as demethylation; recruitment of a DNA modifier; modulation of histones associated with target DNA; recruitment of a histone modifier such as those that modify acetylation and/or methylation of histones; or a combination thereof.
  • Non-limiting examples of fusion partners that activate or increase transcription include: histone lysine methyltransferases; histone lysine demethylases; histone acetyltransferases; and DNA demethylases; and functional domains thereof.
  • fusion partners modify a nucleobase of a target nucleic acid. Fusion proteins comprising such fusion partners and a catalytically inactive Cas effector protein may be referred to as base editors.
  • base editors modify a sequence of a target nucleic acid.
  • base editors provide a nucleobase change in a DNA molecule.
  • the nucleobase change in the DNA molecule is selected from: an adenine (A) to guanine (G); cytosine (C) to thymine (T); and cytosine (C) to guanine (G).
  • base editors provide a nucleobase change in an RNA molecule.
  • the nucleobase change in the RNA molecule is selected from: adenine (A) to guanine (G); uracil (U) to cytosine (C); cytosine (C) to guanine (G); and guanine (G) to adenine (A).
  • the fusion partner is a deaminase, e.g., ADAR1/2.
  • RNA base editors modify a nucleobase of an RNA.
  • RNA base editors comprise an adenosine deaminase.
  • ADAR proteins bind to RNAs and alter their sequence by changing an adenosine into an inosine.
  • RNA base editors comprise a Cas effector protein that is activated by or binds RNA.
  • Cas effector proteins that are activated by or bind RNA are Cas13 proteins.
  • base editors are used to treat a subject having or a subject suspected of having a disease related to a gene of interest.
  • base editors are useful for treating a disease or a disorder caused by a point mutation in a gene of interest.
  • compositions comprise a base editor and a guide nucleic acid, wherein the guide nucleic acid directs the base editor to a sequence in a target gene.
  • the target gene may be associated with a disease.
  • the guide nucleic acid directs the base editor to or near a mutation in the sequence of a target gene.
  • the mutation may be the deletion of one or more nucleotides.
  • the mutation may be the addition of one or more nucleotides.
  • the mutation may be the substitution of one or more nucleotides.
  • the mutation may be the insertion, deletion or substitution of a single nucleotide, also referred to as a point mutation.
  • the point mutation may be a SNP.
  • the mutation may be associated with a disease.
  • the guide nucleic acid directs the base editor to bind a target sequence within the target nucleic acid that is within 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides of the mutation.
  • the guide nucleic acid comprises a sequence that is identical, complementary or reverse complementary to a target sequence of a target nucleic acid that comprises the mutation.
  • the guide nucleic acid comprises a sequence that is identical, complementary or reverse complementary to a target sequence of a target nucleic acid that is within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides of the mutation.
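The two relationships described above, between a guide sequence and its target sequence and between the target sequence and a nearby mutation, can be sketched as follows. This is an illustration under stated assumptions: sequences are written in the DNA alphabet (a guide containing uracil would need U converted first), coordinates are 0-based and inclusive, and a mutation inside the target sequence is treated as distance 0. All function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def relation_to_target(guide_seq, target_seq):
    """Classify the guide as identical, complementary, or reverse complementary."""
    guide, target = guide_seq.upper(), target_seq.upper()
    if guide == target:
        return "identical"
    if guide == target.translate(COMPLEMENT):
        return "complementary"
    if guide == reverse_complement(target):
        return "reverse complementary"
    return None


def distance_to_mutation(target_start, target_end, mutation_pos):
    """Nucleotides between the target sequence and the mutation (0 if inside)."""
    if target_start > target_end:
        raise ValueError("target_start must not exceed target_end")
    if mutation_pos < target_start:
        return target_start - mutation_pos
    if mutation_pos > target_end:
        return mutation_pos - target_end
    return 0


def within_window(target_start, target_end, mutation_pos, max_distance=20):
    """True when the mutation lies within `max_distance` nucleotides."""
    return distance_to_mutation(target_start, target_end, mutation_pos) <= max_distance


print(relation_to_target("GCAT", "ATGC"))
print(distance_to_mutation(100, 119, 125))
```

The default window of 20 nucleotides mirrors the upper end of the range recited above; any smaller value from that range can be passed as `max_distance`.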
  • fusion partners comprise an RNA splicing factor.
  • the RNA splicing factor may be used (in whole or as fragments thereof) for modular organization, with separate sequence-specific RNA binding modules and splicing effector domains.
  • RNA splicing factors include members of the serine/arginine-rich (SR) protein family, which contain N-terminal RNA recognition motifs (RRMs) that bind to exonic splicing enhancers (ESEs) in pre-mRNAs and C-terminal RS domains that promote exon inclusion.
  • the hnRNP protein hnRNP A1 binds to exonic splicing silencers (ESSs) through its RRM domains and inhibits exon inclusion through a C-terminal glycine-rich domain.
  • Some splicing factors may regulate alternative use of splice sites (ss) by binding to regulatory sequences between the two alternative sites.
  • ASF/SF2 may recognize ESEs and promote the use of intron proximal sites, whereas hnRNP A1 may bind to ESSs and shift splicing towards the use of intron distal sites.
  • One application for such factors is to generate ESFs that modulate alternative splicing of endogenous genes, particularly disease associated genes.
  • Bcl-x pre-mRNA produces two splicing isoforms with two alternative 5’ splice sites to encode proteins of opposite functions.
  • the long splicing isoform Bcl-xL is a potent apoptosis inhibitor expressed in long-lived postmitotic cells and is up-regulated in many cancer cells, protecting cells against apoptotic signals.
  • the short isoform Bcl-xS is a pro-apoptotic isoform and expressed at high levels in cells with a high turnover rate (e.g., developing lymphocytes).
  • the ratio of the two Bcl-x splicing isoforms is regulated by multiple cis-elements that are located in either the core exon region or the exon extension region (i.e., between the two alternative 5’ splice sites). For more examples, see WO2010075303, which is hereby incorporated by reference in its entirety.
  • effector proteins and fusion partners of a fusion effector protein are connected via a linker.
  • the linker may comprise or consist of a covalent bond.
  • the linker may comprise or consist of a chemical group.
  • the linker comprises an amino acid.
  • the linker connects a terminus of the effector protein to a terminus of the fusion partner.
  • the carboxy terminus of the effector protein is linked to the amino terminus of the fusion partner.
  • the carboxy terminus of the fusion partner is linked to the amino terminus of the effector protein.
  • fusion effector proteins disclosed herein comprise a linker, wherein the linker comprises or consists of a peptide.
  • the peptide may comprise a region of rigidity (e.g., beta sheet, alpha helix), a region of flexibility, or any combination thereof.
  • the linker comprises small amino acids, such as glycine and alanine, that impart linker flexibility.
  • the linker comprises amino acids that impart linker rigidity, such as valine and isoleucine.
  • linkers comprise or consist of a non-peptide linker.
  • non-peptide linkers are linkers comprising polyethylene glycol (PEG), polypropylene glycol (PPG), co-poly(ethylene/propylene) glycol, polyoxyethylene (POE), polyurethane, polyphosphazene, polysaccharides, dextran, polyvinyl alcohol, polyvinylpyrrolidones, polyvinyl ethyl ether, polyacrylamide, polyacrylate, polycyanoacrylates, lipid polymers, chitins, hyaluronic acid, heparin, an alkyl linker, or a combination thereof.
  • linkers comprise or consist of a nucleic acid.
  • the nucleic acid comprises DNA.
  • the nucleic acid comprises RNA.
  • the effector protein and the fusion partner each interact with the nucleic acid, the nucleic acid thereby linking the effector protein and the fusion partner.
  • the nucleic acid serves as a scaffold for both the effector protein and the fusion partner to interact with, thereby linking the effector protein and the fusion partner.
  • nucleic acids include those described by Tadakuma et al., (2016), Progress in Molecular Biology and Translational Science, Volume 139, 2016, Pages 121-163, incorporated herein by reference.
  • the fusion effector protein or the guide nucleic acid comprises a chemical modification that allows for direct crosslinking between the guide nucleic acid or the effector protein and the fusion partner.
  • the chemical modification may comprise any one of a SNAP-tag, CLIP-tag, ACP-tag, Halo-tag, and an MCP-tag.
  • modifications are introduced with a Click Reaction, also known as Click Chemistry. The Click reaction may be copper dependent or copper independent.
  • guide nucleic acids comprise an aptamer.
  • the aptamer may serve as a linker between the effector protein and the fusion partner by interacting non-covalently with both.
  • the aptamer binds a fusion partner, wherein the fusion partner is a transcriptional activator.
  • the aptamer binds a fusion partner, wherein the fusion partner is a transcriptional inhibitor.
  • the aptamer binds a fusion partner, wherein the fusion partner comprises a base editor.
  • the aptamer binds the fusion partner directly.
  • the aptamer binds the fusion partner indirectly.
  • Aptamers may bind the fusion partner indirectly through an aptamer binding protein.
  • the aptamer binding protein may be MS2 and the aptamer sequence may be ACATGAGGATCACCCATGT (SEQ ID NO: 127); the aptamer binding protein may be PP7 and the aptamer sequence may be GGAGCAGACGATATGGCGTCGCTCC (SEQ ID NO: 128); or the aptamer binding protein may be BoxB and the aptamer sequence may be GCCCTGAAGAAGGGC (SEQ ID NO: 129).
  • the fusion partner is located within the effector protein.
  • the fusion partner may be a domain of a fusion partner protein that is internally integrated into the effector protein.
  • the fusion partner may be located between the 5’ and 3’ ends of the effector protein without disrupting the ability of the fusion effector protein to recognize/bind a target nucleic acid.
  • the fusion partner replaces a portion of the effector protein.
  • the fusion partner replaces a domain of the effector protein.
  • the fusion partner does not replace a portion of the effector protein.
  • an effector protein disclosed herein or fusion effector protein may comprise a nuclear localization signal (NLS).
  • TABLE 2.1 recites exemplary NLS sequences.
  • the NLS may comprise a sequence of KRPAATKKAGQAKKKK (SEQ ID NO: 130).
  • the NLS comprises or consists of a sequence of PKKKRKV (SEQ ID NO: 119).
  • the NLS comprises or consists of a sequence of LPPLERLTL (SEQ ID NO: 120).
  • An effector protein may be codon optimized for expression in a specific cell, for example, a bacterial cell, a plant cell, a eukaryotic cell, an animal cell, a mammalian cell, or a human cell.
  • the effector protein is codon optimized for a human cell.
  • the NLS may be located at a variety of locations, including, but not limited to, 5’ of the effector protein, 5’ of the fusion partner, 3’ of the effector protein, 3’ of the fusion partner, between the effector protein and the fusion partner, within the fusion partner, or within the effector protein.
  • compositions, systems, and methods of the present disclosure may comprise a guide nucleic acid or a use thereof.
  • a guide nucleic acid may bind to an effector protein.
  • a guide nucleic acid may comprise one or more deoxyribonucleotides, one or more ribonucleotides, one or more chemically modified nucleotides, or a combination thereof.
  • the guide nucleic acid comprises a CRISPR RNA (crRNA), at least a portion of which is complementary to a target sequence of a target nucleic acid.
  • the guide nucleic acid comprises a trans-activating CRISPR RNA (tracrRNA) that interacts with the effector protein.
  • the tracrRNA may hybridize to a portion of the guide nucleic acid that does not hybridize to the target nucleic acid.
  • the guide nucleic acid does not comprise a tracrRNA.
  • the composition does not comprise a tracrRNA.
  • the guide RNA is a single guide RNA (sgRNA) (e.g., a crRNA linked to a tracrRNA sequence).
  • the portion of a sgRNA having a tracrRNA sequence and a repeat sequence of a crRNA is referred to as a handle sequence, wherein the sgRNA comprises a repeat sequence or a portion thereof linked to all or a portion of a tracrRNA sequence.
  • a crRNA and tracrRNA function as two separate, unlinked molecules.
  • Guide nucleic acids are often referred to as “guide RNA.”
  • a guide nucleic acid may comprise deoxyribonucleotides.
  • the term “guide RNA,” as well as crRNA, tracrRNA and sgRNA, includes guide nucleic acids comprising DNA bases and RNA bases, and chemically modified nucleobases.
  • the guide RNA may be chemically synthesized or recombinantly produced.
  • the sequence of the guide nucleic acid, or a portion thereof, may be different from the sequence of a naturally occurring nucleic acid.
  • the sequence of the guide nucleic acid may comprise two or more heterologous sequences.
  • Guide nucleic acids, when complexed with an effector protein, may bring the effector protein into proximity of a target nucleic acid.
  • Sufficient conditions for hybridization of a guide nucleic acid to a target nucleic acid and/or for binding of a guide nucleic acid to an effector protein include in vivo physiological conditions of a desired cell type or in vitro conditions sufficient for assaying catalytic activity of a protein, polypeptide or peptide described herein, such as the nuclease activity of an effector protein.
  • Guide nucleic acids may comprise DNA, RNA, or a combination thereof (e.g., RNA with a thymine base).
  • Guide nucleic acids may include a chemically modified nucleobase or phosphate backbone. Guide nucleic acids may be referred to herein as a guide RNA (gRNA). However, a guide RNA is not limited to ribonucleotides, but may comprise deoxyribonucleotides and other chemically modified nucleotides.
  • a guide nucleic acid may comprise a CRISPR RNA (crRNA), a short-complementarity untranslated RNA (scoutRNA), an associated trans-activating RNA (tracrRNA) sequence or a combination thereof.
  • a crRNA with a tracrRNA sequence may be referred to herein as a single guide RNA (sgRNA), wherein the crRNA and the tracrRNA sequence are covalently linked.
  • a portion of the sgRNA comprising a tracrRNA sequence or a portion thereof and a repeat sequence of a crRNA or a portion thereof is referred to as a handle sequence, wherein the crRNA and tracrRNA sequences are covalently linked (e.g., by a phosphodiester bond).
  • the handle sequence of the sgRNA further comprises a linker, wherein the linker comprises one or more nucleotides.
  • the tracrRNA sequence or a portion thereof and the repeat sequence of crRNA or a portion thereof are linked by the linker of the sgRNA. In some embodiments, the crRNA and tracrRNA sequence are linked by a phosphodiester bond. In some embodiments, the crRNA and tracrRNA sequence are linked by one or more linked nucleotides.
  • a guide nucleic acid may comprise a naturally occurring guide nucleic acid.
  • a guide nucleic acid may comprise a non-naturally occurring guide nucleic acid, including a guide nucleic acid that is designed to contain a chemical or biochemical modification.
  • a crRNA may be the product of processing of a longer precursor CRISPR RNA (pre-crRNA) transcribed from the CRISPR array by cleavage of the pre-crRNA within each direct repeat sequence to afford shorter, mature crRNAs.
  • a crRNA may be generated by a variety of mechanisms, including the use of dedicated endonucleases (e.g., Cas6 or Cas5d in Type I and III systems), coupling of a host endonuclease (e.g., RNase III) with tracrRNA (Type II systems), or a ribonuclease activity endogenous to the effector protein itself (e.g., Cpf1, from Type V systems).
  • a crRNA may also be specifically generated outside of processing of a pre-crRNA and individually contacted to an effector protein in vivo or in vitro.
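The processing of a pre-crRNA into mature crRNAs, by cleavage within each direct repeat, can be illustrated with a toy string model. This is only a sketch under stated assumptions: the repeat sequence, the spacers, and the single fixed cut position `cut_offset` are hypothetical placeholders, and real cleavage sites depend on the specific endonuclease or effector protein involved.

```python
def process_pre_crrna(pre_crrna: str, repeat: str, cut_offset: int) -> list:
    """Cut the transcript once inside every repeat and return the internal fragments.

    cut_offset is the (hypothetical) number of repeat nucleotides left on the
    5' side of each cut. Each returned fragment holds the 3' part of one repeat,
    one spacer, and the 5' part of the next repeat.
    """
    if not 0 < cut_offset < len(repeat):
        raise ValueError("cut_offset must fall strictly inside the repeat")
    cuts = []
    start = pre_crrna.find(repeat)
    while start != -1:
        cuts.append(start + cut_offset)
        start = pre_crrna.find(repeat, start + len(repeat))
    # Fragments upstream of the first cut and downstream of the last are dropped.
    return [pre_crrna[a:b] for a, b in zip(cuts, cuts[1:])]


# Hypothetical placeholder sequences, chosen only so the repeat is unambiguous.
REPEAT = "GUUUCAGAC"
SPACERS = ["ACGUACGUACGUACGU", "UGCAUGCAUGCAUGCA"]
PRE_CRRNA = REPEAT + SPACERS[0] + REPEAT + SPACERS[1] + REPEAT

for fragment in process_pre_crrna(PRE_CRRNA, REPEAT, cut_offset=4):
    print(fragment)
```

With three repeats and two spacers, the model yields two fragments, one per spacer, each flanked by partial repeat sequence.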
  • fusion effector proteins are targeted by a guide nucleic acid (e.g., a guide RNA) to a specific location in the target nucleic acid where they exert locus-specific regulation.
  • examples of locus-specific regulation include blocking RNA polymerase binding to a promoter (which selectively inhibits transcription activator function), and/or modifying local chromatin (e.g., when a fusion sequence is used that modifies the target nucleic acid or modifies a protein associated with the target nucleic acid).
  • the guide RNA may bind to a target nucleic acid (e.g., a single strand of a target nucleic acid) or a portion thereof, an amplicon thereof, or a portion thereof.
  • a guide nucleic acid may bind to a target nucleic acid, such as DNA or RNA, from a cancer gene or gene associated with a genetic disorder, or an amplicon thereof, as described herein.
  • an effector protein cleaves a precursor RNA (“pre-crRNA”) to produce a guide RNA, also referred to as a “mature guide RNA.”
  • An effector protein that cleaves pre-crRNA to produce a mature guide RNA is said to have pre-crRNA processing activity.
  • a repeat region of a guide RNA comprises mutations or truncations relative to respective regions in a corresponding pre-crRNA.
  • the guide nucleic acid may comprise a first region complementary to a target nucleic acid (FR1) and a second region that is not complementary to the target nucleic acid (FR2).
  • FR1 is located 5’ to FR2 (FR1-FR2).
  • FR2 is located 5’ to FR1 (FR2-FR1).
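The two orientations of the target-complementary region (FR1) and the non-complementary region (FR2) can be represented as a small data structure. The sketch below is illustrative only: the class `Guide`, the function `assemble_guide`, and both sequences are hypothetical placeholders that are not taken from any table of this disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guide:
    sequence: str     # full guide, written 5' to 3'
    fr1_span: tuple   # (start, end) of the target-complementary region, end exclusive
    fr2_span: tuple   # (start, end) of the non-complementary region, end exclusive


def assemble_guide(fr1: str, fr2: str, order: str = "FR2-FR1") -> Guide:
    """Join FR1 and FR2 in the requested 5'-to-3' order and record where each lies."""
    if order == "FR1-FR2":
        return Guide(fr1 + fr2, (0, len(fr1)), (len(fr1), len(fr1) + len(fr2)))
    if order == "FR2-FR1":
        return Guide(fr2 + fr1, (len(fr2), len(fr2) + len(fr1)), (0, len(fr2)))
    raise ValueError("order must be 'FR1-FR2' or 'FR2-FR1'")


# Placeholder sequences for illustration only.
FR1 = "ACGUACGUACGUACGUACGU"   # stands in for a target-complementary region
FR2 = "GGGAUUCCCAA"            # stands in for a non-complementary region

guide = assemble_guide(FR1, FR2, order="FR2-FR1")
print(guide.sequence, guide.fr1_span, guide.fr2_span)
```

Recording the spans alongside the sequence keeps the region boundaries available to later checks, such as length limits on the whole guide or on FR1 alone.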
  • the guide nucleic acid comprises 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 linked nucleosides.
  • a guide nucleic acid comprises at least linked nucleosides.
  • a guide nucleic acid comprises at least 25 linked nucleosides.
  • a guide nucleic acid may comprise 10 to 50 linked nucleosides.
  • the guide nucleic acid comprises or consists essentially of about 12 to about 80 linked nucleosides, about 12 to about 50, about 12 to about 45, about 12 to about 40, about 12 to about 35, about 12 to about 30, about 12 to about 25, from about 12 to about 20, about 12 to about 19 , about 19 to about 20, about 19 to about 25, about 19 to about 30, about 19 to about 35, about 19 to about 40, about 19 to about 45, about 19 to about 50, about 19 to about 60, about 20 to about 25, about 20 to about 30, about 20 to about 35, about 20 to about 40, about 20 to about 45, about 20 to about 50, or about 20 to about 60 linked nucleosides.
  • the guide nucleic acid has about 10 to about 60, about 20 to about 50, or about 30 to about 40 linked nucleosides.
  • the guide nucleic acid comprises a nucleotide sequence as described herein (e.g., TABLE 3, TABLE 4, TABLE 5, TABLE 6, TABLE 9, TABLE 10, or TABLE 11).
  • nucleotide sequences described herein may be described as a nucleotide sequence of either DNA or RNA, however, no matter the form the sequence is described, it is readily understood that such nucleotide sequences can be revised to be RNA or DNA, as needed, for describing a sequence within a guide nucleic acid itself or the sequence that encodes a guide nucleic acid, such as a nucleotide sequence described herein for a vector.
  • the nucleotide sequences described herein also disclose the complementary nucleotide sequence, the reverse nucleotide sequence, and the reverse complement nucleotide sequence, any one of which can be a nucleotide sequence for use in a guide nucleic acid as described herein.
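The conversions named above (DNA to RNA and back, complement, reverse, and reverse complement) are simple string operations. The following minimal sketch uses hypothetical helper names and, as a worked input, the BoxB aptamer sequence recited earlier in this description (SEQ ID NO: 129); it handles only the unambiguous bases A, C, G, T and U.

```python
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_rna(dna: str) -> str:
    """Rewrite a DNA-form sequence as RNA (T becomes U)."""
    return dna.upper().replace("T", "U")


def to_dna(rna: str) -> str:
    """Rewrite an RNA-form sequence as DNA (U becomes T)."""
    return rna.upper().replace("U", "T")


def complement(dna: str) -> str:
    """Base-by-base complement, same orientation as the input."""
    return dna.upper().translate(_DNA_COMPLEMENT)


def reverse(seq: str) -> str:
    """Same bases, read in the opposite direction."""
    return seq[::-1]


def reverse_complement(dna: str) -> str:
    """Complement read in the opposite direction."""
    return complement(dna)[::-1]


boxb = "GCCCTGAAGAAGGGC"  # SEQ ID NO: 129, as written in the text
print(to_rna(boxb))
print(complement(boxb))
print(reverse(boxb))
print(reverse_complement(boxb))
```

For RNA-form input, one option is to convert with `to_dna`, apply the operation, and convert back with `to_rna`.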
  • the gRNA comprises or consists of a sequence that is at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical, or is identical, to any one of sequences recited in TABLE 3, TABLE 4, TABLE 5, TABLE 6, TABLE 9, TABLE 10, or TABLE 11.
  • the gRNA comprises or consists of a sequence that is at least 90% identical to any one of sequences recited in TABLE 3, TABLE 4, TABLE 5, TABLE 6, TABLE 9, TABLE 10, or TABLE 11.
  • the gRNA comprises or consists of a sequence that is at least 91% identical to any one of sequences recited in TABLE 3, TABLE 4, TABLE 5, TABLE 6, TABLE 9, TABLE 10, or TABLE 11. In some embodiments, the gRNA comprises or consists of a sequence that is at least 92% identical to any one of sequences recited in TABLE 3, TABLE 4, TABLE 5, TABLE 6, TABLE 9, TABLE 10, or TABLE 11. In some embodiments, the gRNA comprises or consists of a sequence that is at least 93% identical to any one of sequences recited in TABLE 3, TABLE 4, TABLE 5, TABLE 6, TABLE 9, TABLE 10, or TABLE 11.
  • the gRNA comprises or consists of a sequence that is at least 94% identical to any one of sequences recited in TABLE 3, TABLE 4, TABLE 5, TABLE 6, TABLE 9, TABLE 10, or TABLE 11. In some embodiments, the gRNA comprises or consists of a sequence that is at least 95% identical to any one of sequences recited in TABLE 3, TABLE 4, TABLE 5, TABLE 6, TABLE 9, TABLE 10, or TABLE 11. In some embodiments, the gRNA comprises or consists of a sequence that is at least 96% identical to any one of sequences recited in TABLE 3, TABLE 4, TABLE 5, TABLE 6, TABLE 9, TABLE 10, or TABLE 11.
  • the gRNA comprises or consists of a sequence that is at least 97% identical to any one of sequences recited in TABLE 3, TABLE 4, TABLE 5, TABLE 6, TABLE 9, TABLE 10, or TABLE 11. In some embodiments, the gRNA comprises or consists of a sequence that is at least 98% identical to any one of sequences recited in TABLE 3, TABLE 4, TABLE 5, TABLE 6, TABLE 9, TABLE 10, or TABLE 11.
  • the gRNA comprises or consists of a sequence that is at least 99% identical to any one of sequences recited in TABLE 3, TABLE 4, TABLE 5, TABLE 6, TABLE 9, TABLE 10, or TABLE 11. In some embodiments, the gRNA comprises or consists of a sequence that is 100% identical to any one of sequences recited in TABLE 3, TABLE 4, TABLE 5, TABLE 6, TABLE 9, TABLE 10, or TABLE 11.
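A percent-identity threshold such as "at least 90% identical" presupposes a way of computing identity between a candidate sequence and a reference. The sketch below shows one possible convention, stated here as an assumption and not as the definition used in this disclosure: a global alignment with unit scores (match +1, mismatch -1, gap -1), with identity reported as identical columns divided by total alignment columns. The function names are hypothetical, and the worked input is the PP7 aptamer sequence recited earlier (SEQ ID NO: 128) with a single substitution introduced for illustration.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over a simple global alignment of a and b."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = -i
    for j in range(1, m + 1):
        score[0][j] = -j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (1 if a[i - 1] == b[j - 1] else -1)
            score[i][j] = max(diag, score[i - 1][j] - 1, score[i][j - 1] - 1)
    # Trace back one optimal path, counting identical and total columns.
    i, j, matches, columns = n, m, 0, 0
    while i > 0 or j > 0:
        same = i > 0 and j > 0 and a[i - 1] == b[j - 1]
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (1 if same else -1):
            matches += 1 if same else 0
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] - 1:
            i -= 1
        else:
            j -= 1
        columns += 1
    return 100.0 * matches / columns if columns else 100.0


def meets_identity(candidate: str, reference: str, threshold: float = 90.0) -> bool:
    """True if candidate is at least `threshold` percent identical to reference."""
    return percent_identity(candidate, reference) >= threshold


reference = "GGAGCAGACGATATGGCGTCGCTCC"  # SEQ ID NO: 128, as written in the text
variant = "GGAGCAGACGATTTGGCGTCGCTCC"    # one substitution, for illustration only
print(percent_identity(variant, reference), meets_identity(variant, reference, 95.0))
```

Other conventions (for example, dividing by the length of the shorter sequence, or using different gap penalties) can give different percentages for the same pair, so the chosen convention matters when a sequence sits near a threshold.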
  • a crRNA comprises a spacer region that hybridizes to a target sequence of a target nucleic acid, and a repeat region that interacts with the effector protein.
  • the spacer region may comprise complementarity with (e.g., hybridize to) a target sequence of a target nucleic acid.
  • the spacer region is 15-28 linked nucleosides in length.
  • the spacer region is 15-26, 15-24, 15-22, 15-20, 15-18, 16-28, 16-26, 16-24, 16-22, 16-20, 16-18, 17-26, 17-24, 17-22, 17-20, 17-18, 18-26, 18-24, or 18-22 linked nucleosides in length.
  • the spacer region is 18-24 linked nucleosides in length. In some embodiments, the spacer region is at least 15 linked nucleosides in length. In some embodiments, the spacer region is at least 16, 18, 20, or 22 linked nucleosides in length. In some embodiments, the spacer region comprises at least 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides. In some embodiments, the spacer region is at least 17 linked nucleosides in length. In some embodiments, the spacer region is at least 18 linked nucleosides in length. In some embodiments, the spacer region is at least 20 linked nucleosides in length.
  • the spacer region is at least 80%, at least 85%, at least 90%, at least 95% or 100% complementary to a target sequence of the target nucleic acid. In some embodiments, the spacer region is 100% complementary to the target sequence of the target nucleic acid. In some embodiments, the spacer region comprises at least 15 contiguous nucleobases that are complementary to the target nucleic acid.
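The spacer criteria above (percent complementarity to the target sequence, and a minimum run of contiguous complementary nucleobases) can be checked as follows. This is a sketch under stated assumptions: the spacer is RNA, the target sequence is a DNA strand of equal length, both are written 5' to 3', and pairing is antiparallel Watson-Crick only. All function names and the 20-nucleotide target are hypothetical.

```python
# RNA base in the spacer -> DNA base it pairs with in the target strand.
_PAIRS_WITH = {"A": "T", "U": "A", "G": "C", "C": "G"}
# DNA base in the target -> RNA base of a fully complementary spacer.
_SPACER_BASE = {"A": "U", "T": "A", "G": "C", "C": "G"}


def spacer_for(target_dna: str) -> str:
    """Fully complementary RNA spacer for a 5'-to-3' DNA target sequence."""
    return "".join(_SPACER_BASE[b] for b in reversed(target_dna))


def percent_complementarity(spacer_rna: str, target_dna: str) -> float:
    """Share of spacer positions that pair with the antiparallel target base."""
    if len(spacer_rna) != len(target_dna):
        raise ValueError("this sketch compares equal-length sequences only")
    paired = sum(_PAIRS_WITH[s] == t for s, t in zip(spacer_rna, reversed(target_dna)))
    return 100.0 * paired / len(spacer_rna)


def longest_complementary_run(spacer_rna: str, target_dna: str) -> int:
    """Length of the longest stretch of contiguous paired positions."""
    best = run = 0
    for s, t in zip(spacer_rna, reversed(target_dna)):
        run = run + 1 if _PAIRS_WITH[s] == t else 0
        best = max(best, run)
    return best


target = "ATGGCCAAGTTGACCAGTGC"          # hypothetical 20-nt target sequence
perfect = spacer_for(target)
# Introduce one mismatch at spacer position 10 (index 9) for illustration.
swap = "C" if perfect[9] == "A" else "A"
mismatched = perfect[:9] + swap + perfect[10:]

print(percent_complementarity(perfect, target), longest_complementary_run(perfect, target))
print(percent_complementarity(mismatched, target), longest_complementary_run(mismatched, target))
```

The mismatched spacer shows why the two criteria are distinct: a single central mismatch leaves the spacer 95% complementary overall while breaking it into contiguous runs shorter than 15 nucleobases.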
  • the repeat region may also be referred to as a “protein-binding segment.” Typically, the repeat region is adjacent to the spacer region.
  • a guide RNA that interacts with an effector protein comprises a repeat region that is 5’ of the spacer region. TABLE 3 illustrates exemplary repeat sequences.
  • the repeat sequence comprises or consists of a sequence that is at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical, or is identical, to any one of sequences recited in TABLE 3. Accordingly, in some embodiments, the repeat sequence comprises or consists of a sequence that is at least 90% identical to any one of sequences recited in TABLE 3. In some embodiments, the repeat sequence comprises or consists of a sequence that is at least 91% identical to any one of sequences recited in TABLE 3. In some embodiments, the repeat sequence comprises or consists of a sequence that is at least 92% identical to any one of sequences recited in TABLE 3.
  • the repeat sequence comprises or consists of a sequence that is at least 93% identical to any one of sequences recited in TABLE 3. In some embodiments, the repeat sequence comprises or consists of a sequence that is at least 94% identical to any one of sequences recited in TABLE 3. In some embodiments, the repeat sequence comprises or consists of a sequence that is at least 95% identical to any one of sequences recited in TABLE 3. In some embodiments, the repeat sequence comprises or consists of a sequence that is at least 96% identical to any one of sequences recited in TABLE 3. In some embodiments, the repeat sequence comprises or consists of a sequence that is at least 97% identical to any one of sequences recited in TABLE 3.
  • the repeat sequence comprises or consists of a sequence that is at least 98% identical to any one of sequences recited in TABLE 3. In some embodiments, the repeat sequence comprises or consists of a sequence that is at least 99% identical to any one of sequences recited in TABLE 3. In some embodiments, the repeat sequence comprises or consists of a sequence that is 100% identical to any one of sequences recited in TABLE 3.
  • the sequence of a spacer region need not be 100% complementary to that of a target sequence of a target nucleic acid to hybridize or hybridize specifically to the target sequence.
  • the guide nucleic acid may comprise at least one uracil between nucleic acid residues 5 to 20 of the spacer region that is not complementary to the corresponding nucleoside of the target sequence.
  • the guide nucleic acid may comprise at least one uracil between nucleic acid residues 5 to 9, 10 to 14, or 15 to 20 of the spacer region that is not complementary to the corresponding nucleoside of the target sequence.
  • the region of the target nucleic acid that is complementary to the spacer region comprises an epigenetic modification or a post-transcriptional modification.
  • the epigenetic modification comprises an acetylation, methylation, or thiol modification.
  • TABLE 4, TABLE 6, TABLE 10 and TABLE 11 illustrate exemplary crRNA sequences.
  • the crRNA comprises or consists of a sequence that is at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical, or is identical, to any one of sequences recited in TABLE 4, TABLE 6, TABLE 10 and TABLE 11.
  • the crRNA comprises or consists of a sequence that is at least 90% identical to any one of sequences recited in TABLE 4, TABLE 6, TABLE 10 and TABLE 11.
  • the crRNA comprises or consists of a sequence that is at least 91% identical to any one of sequences recited in TABLE 4, TABLE 6, TABLE 10 and TABLE 11. In some embodiments, the crRNA comprises or consists of a sequence that is at least 92% identical to any one of sequences recited in TABLE 4, TABLE 6, TABLE 10 and TABLE 11. In some embodiments, the crRNA comprises or consists of a sequence that is at least 93% identical to any one of sequences recited in TABLE 4, TABLE 6, TABLE 10 and TABLE 11. In some embodiments, the crRNA comprises or consists of a sequence that is at least 94% identical to any one of sequences recited in TABLE 4, TABLE 6, TABLE 10 and TABLE 11.
  • the crRNA comprises or consists of a sequence that is at least 95% identical to any one of sequences recited in TABLE 4, TABLE 6, TABLE 10 and TABLE 11. In some embodiments, the crRNA comprises or consists of a sequence that is at least 96% identical to any one of sequences recited in TABLE 4, TABLE 6, TABLE 10 and TABLE 11. In some embodiments, the crRNA comprises or consists of a sequence that is at least 97% identical to any one of sequences recited in TABLE 4, TABLE 6, TABLE 10 and TABLE 11. In some embodiments, the crRNA comprises or consists of a sequence that is at least 98% identical to any one of sequences recited in TABLE 4, TABLE 6, TABLE 10 and TABLE 11.
  • the crRNA comprises or consists of a sequence that is at least 99% identical to any one of sequences recited in TABLE 4, TABLE 6, TABLE 10 and TABLE 11. In some embodiments, the crRNA comprises or consists of a sequence that is 100% identical to any one of sequences recited in TABLE 4, TABLE 6, TABLE 10 and TABLE 11.
  • the crRNA comprises or consists of a sequence that is at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical, or is identical, to a sequence described herein (e.g., TABLE 3, TABLE 4, TABLE 6, TABLE 9, TABLE 10 and TABLE 11). Accordingly, in some embodiments, the crRNA comprises or consists of a sequence that is at least 90% identical to any one of sequences recited in TABLE 3, TABLE 4, TABLE 6, TABLE 9, TABLE 10 and TABLE 11.
  • the crRNA comprises or consists of a sequence that is at least 91% identical to any one of sequences recited in TABLE 3, TABLE 4, TABLE 6, TABLE 9, TABLE 10 and TABLE 11. In some embodiments, the crRNA comprises or consists of a sequence that is at least 92% identical to any one of sequences recited in TABLE 3, TABLE 4, TABLE 6, TABLE 9, TABLE 10 and TABLE 11. In some embodiments, the crRNA comprises or consists of a sequence that is at least 93% identical to any one of sequences recited in TABLE 3, TABLE 4, TABLE 6, TABLE 9, TABLE 10 and TABLE 11.
  • the crRNA comprises or consists of a sequence that is at least 94% identical to any one of sequences recited in TABLE 3, TABLE 4, TABLE 6, TABLE 9, TABLE 10 and TABLE 11. In some embodiments, the crRNA comprises or consists of a sequence that is at least 95% identical to any one of sequences recited in TABLE 3, TABLE 4, TABLE 6, TABLE 9, TABLE 10 and TABLE 11. In some embodiments, the crRNA comprises or consists of a sequence that is at least 96% identical to any one of sequences recited in TABLE 3, TABLE 4, TABLE 6, TABLE 9, TABLE 10 and TABLE 11.
  • the crRNA comprises or consists of a sequence that is at least 97% identical to any one of sequences recited in TABLE 3, TABLE 4, TABLE 6, TABLE 9, TABLE 10 and TABLE 11. In some embodiments, the crRNA comprises or consists of a sequence that is at least 98% identical to any one of sequences recited in TABLE 3, TABLE 4, TABLE 6, TABLE 9, TABLE 10 and TABLE 11. In some embodiments, the crRNA comprises or consists of a sequence that is at least 99% identical to any one of sequences recited in TABLE 3, TABLE 4, TABLE 6, TABLE 9, TABLE 10 and TABLE 11. In some embodiments, the crRNA comprises or consists of a sequence that is 100% identical to a sequence described herein (e.g., TABLE 3, TABLE 4, TABLE 6, TABLE 9, TABLE 10 and TABLE 11).
  • the guide nucleic acid comprises or consists of a sequence that is at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical, or is identical, to a sequence described herein (e.g., TABLE 3, TABLE 4, TABLE 6, TABLE 9, TABLE 10 and TABLE 11). Accordingly, in some embodiments, the guide nucleic acid comprises or consists of a sequence that is at least 90% identical to any one of sequences recited in TABLE 3, TABLE 4, TABLE 6, TABLE 9, TABLE 10 and TABLE 11.
  • the guide nucleic acid comprises or consists of a sequence that is at least 91% identical to any one of sequences recited in TABLE 3, TABLE 4, TABLE 6, TABLE 9, TABLE 10 and TABLE 11. In some embodiments, the guide nucleic acid comprises or consists of a sequence that is at least 92% identical to any one of sequences recited in TABLE 3, TABLE 4, TABLE 6, TABLE 9, TABLE 10 and TABLE 11. In some embodiments, the guide nucleic acid comprises or consists of a sequence that is at least 93% identical to any one of sequences recited in TABLE 3, TABLE 4, TABLE 6, TABLE 9, TABLE 10 and TABLE 11.
  • the guide nucleic acid comprises or consists of a sequence that is at least 94% identical to any one of sequences recited in TABLE 3, TABLE 4, TABLE 6, TABLE 9, TABLE 10 and TABLE 11. In some embodiments, the guide nucleic acid comprises or consists of a sequence that is at least 95% identical to any one of sequences recited in TABLE 3, TABLE 4, TABLE 6, TABLE 9, TABLE 10 and TABLE 11. In some embodiments, the guide nucleic acid comprises or consists of a sequence that is at least 96% identical to any one of sequences recited in TABLE 3, TABLE 4, TABLE 6, TABLE 9, TABLE 10 and TABLE 11.
  • the guide nucleic acid comprises or consists of a sequence that is at least 97% identical to any one of sequences recited in TABLE 3, TABLE 4, TABLE 6, TABLE 9, TABLE 10 and TABLE 11. In some embodiments, the guide nucleic acid comprises or consists of a sequence that is at least 98% identical to any one of sequences recited in TABLE 3, TABLE 4, TABLE 6, TABLE 9, TABLE 10 and TABLE 11. In some embodiments, the guide nucleic acid comprises or consists of a sequence that is at least 99% identical to any one of sequences recited in TABLE 3, TABLE 4, TABLE 6, TABLE 9, TABLE 10 and TABLE 11. In some embodiments, the guide nucleic acid comprises or consists of a sequence that is 100% identical to a sequence described herein (e.g., TABLE 3, TABLE 4, TABLE 6, TABLE 9, TABLE 10 and TABLE 11).
  • the crRNA comprises or consists of a sequence that is at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical, or is identical, to a sequence described herein (e.g., SEQ ID NOs: 24, 25, 108-115, 118, 151-155, 164-168, or 177-183). Accordingly, in some embodiments, the crRNA comprises or consists of a sequence that is at least 90% identical to SEQ ID NOs: 24, 25, 108-115, 118, 151-155, 164-168, or 177-183.
  • the crRNA comprises or consists of a sequence that is at least 91% identical to SEQ ID NOs: 24, 25, 108-115, 118, 151-155, 164-168, or 177-183. In some embodiments, the crRNA comprises or consists of a sequence that is at least 92% identical to SEQ ID NOs: 24, 25, 108-115, 118, 151-155, 164-168, or 177-183. In some embodiments, the crRNA comprises or consists of a sequence that is at least 93% identical to SEQ ID NOs: 24, 25, 108-115, 118, 151-155, 164-168, or 177-183.
  • the crRNA comprises or consists of a sequence that is at least 94% identical to SEQ ID NOs: 24, 25, 108-115, 118, 151-155, 164-168, or 177-183. In some embodiments, the crRNA comprises or consists of a sequence that is at least 95% identical to SEQ ID NOs: 24, 25, 108-115, 118, 151-155, 164-168, or 177-183. In some embodiments, the crRNA comprises or consists of a sequence that is at least 96% identical to SEQ ID NOs: 24, 25, 108-115, 118, 151-155, 164-168, or 177-183.
  • the crRNA comprises or consists of a sequence that is at least 97% identical to SEQ ID NOs: 24, 25, 108-115, 118, 151-155, 164-168, or 177-183. In some embodiments, the crRNA comprises or consists of a sequence that is at least 98% identical to SEQ ID NOs: 24, 25, 108-115, 118, 151-155, 164-168, or 177-183. In some embodiments, the crRNA comprises or consists of a sequence that is at least 99% identical to SEQ ID NOs: 24, 25, 108-115, 118, 151-155, 164-168, or 177-183.
  • the crRNA comprises or consists of a sequence that is 100% identical to a sequence described herein (e.g., SEQ ID NOs: 24, 25, 108-115, 118, 151-155, 164-168, or 177-183).
  • a sgRNA may include deoxyribonucleosides, ribonucleosides, chemically modified nucleosides, or any combination thereof.
  • a sgRNA may also include a nucleotide sequence that forms a secondary structure (e.g., one or more hairpin loops) that facilitates the binding of an effector protein to the sgRNA and/or modification activity of an effector protein on a target nucleic acid (e.g., a hairpin region).
  • Such a sequence can be contained within a handle sequence as described herein.
  • a sgRNA may include a handle sequence having a hairpin region, as well as a linker and a repeat sequence.
  • the sgRNA having a handle sequence can have a hairpin region positioned 3’ of the linker and/or repeat sequence.
  • the sgRNA having a handle sequence can have a hairpin region positioned 5’ of the linker and/or repeat sequence.
  • the hairpin region may include a first sequence, a second sequence that is reverse complementary to the first sequence, and a stem-loop linking the first sequence and the second sequence.
  • the handle sequence of a sgRNA comprises a stem-loop structure comprising a stem region and a loop region.
  • the stem region is 4 to 8 linked nucleotides in length.
  • the stem region is 5 to 6 linked nucleotides in length.
  • the stem region is 4 to 5 linked nucleotides in length.
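The hairpin architecture described above (a first sequence, a loop, and a second sequence that is the reverse complement of the first) can be written out directly. The sketch below is illustrative only: `build_hairpin` and the stem and loop sequences are hypothetical, the 4 to 8 nucleotide stem limit is taken from the stem lengths recited above, and only Watson-Crick RNA pairs are considered.

```python
_RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence written 5' to 3'."""
    return "".join(_RNA_COMPLEMENT[b] for b in reversed(seq))


def build_hairpin(stem: str, loop: str) -> str:
    """Return first sequence + loop + reverse complement of the first sequence."""
    if not 4 <= len(stem) <= 8:
        raise ValueError("stem must be 4 to 8 nucleotides long in this sketch")
    return stem + loop + reverse_complement_rna(stem)


def is_hairpin(seq: str, stem_len: int) -> bool:
    """Check that the last stem_len bases are the reverse complement of the first."""
    if stem_len <= 0 or 2 * stem_len > len(seq):
        return False
    return seq[-stem_len:] == reverse_complement_rna(seq[:stem_len])


hairpin = build_hairpin("GGCAC", "GAAA")   # placeholder stem and loop
print(hairpin, is_hairpin(hairpin, 5))
```

A handle sequence or tracrRNA with several stem regions could be modeled by concatenating several such hairpins, with identical or different stems.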
  • the sgRNA comprises a pseudoknot (e.g., a secondary structure comprising a stem at least partially hybridized to a second stem or half-stem secondary structure).
  • An effector protein may recognize a sgRNA comprising multiple stem regions.
  • the nucleotide sequences of the multiple stem regions are identical to one another.
  • the nucleotide sequence of at least one of the multiple stem regions is not identical to those of the others.
  • the sgRNA comprises at least 2, at least 3, at least 4, or at least 5 stem regions.
  • the length of a handle sequence in a sgRNA is not greater than 50, 56, 66, 67, 68, 69, 70, 71, 72, 73, 95, or 105 linked nucleotides. In some embodiments, the length of a handle sequence in a sgRNA is about 30 to about 120 linked nucleotides. In some embodiments, the length of a handle sequence in a sgRNA is about 50 to about 105, about 50 to about 95, about 50 to about 73, about 50 to about 71, about 50 to about 70, or about 50 to about 69 linked nucleotides.
  • the length of a handle sequence in a sgRNA is 56 to 105 linked nucleotides, 66 to 105 linked nucleotides, 67 to 105 linked nucleotides, 68 to 105 linked nucleotides, 69 to 105 linked nucleotides, 70 to 105 linked nucleotides, 71 to 105 linked nucleotides, 72 to 105 linked nucleotides, 73 to 105 linked nucleotides, or 95 to 105 linked nucleotides. In some embodiments, the length of a handle sequence in a sgRNA is 40 to 70 nucleotides.
  • the length of a handle sequence in a sgRNA is 50, 56, 66, 67, 68, 69, 70, 71, 72, 73, 95, or 105 linked nucleotides. In some embodiments, the length of a handle sequence in a sgRNA is 69 nucleotides.
  • the handle sequence comprises or consists of a sequence that is at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical, or is identical, to any one of sequences recited in TABLE 3, TABLE 5, TABLE 6, TABLE 9, TABLE 10 and TABLE 11. Accordingly, in some embodiments, the handle sequence comprises or consists of a sequence that is at least 90% identical to any one of sequences recited in TABLE 3, TABLE 5, TABLE 6, TABLE 9, TABLE 10 and TABLE 11.
  • the handle sequence comprises or consists of a sequence that is at least 91% identical to any one of sequences recited in TABLE 3, TABLE 5, TABLE 6, TABLE 9, TABLE 10 and TABLE 11. In some embodiments, the handle sequence comprises or consists of a sequence that is at least 92% identical to any one of sequences recited in TABLE 3, TABLE 5, TABLE 6, TABLE 9, TABLE 10 and TABLE 11.
  • the handle sequence comprises or consists of a sequence that is at least 93% identical to any one of sequences recited in TABLE 3, TABLE 5, TABLE 6, TABLE 9, TABLE 10 and TABLE 11.
  • the handle sequence comprises or consists of a sequence that is at least 94% identical to any one of sequences recited in TABLE 3, TABLE 5, TABLE 6, TABLE 9, TABLE 10 and TABLE 11.
  • the handle sequence comprises or consists of a sequence that is at least 95% identical to any one of sequences recited in TABLE 3, TABLE 5, TABLE 6, TABLE 9, TABLE 10 and TABLE 11.
  • the handle sequence comprises or consists of a sequence that is at least 96% identical to any one of sequences recited in TABLE 3, TABLE 5, TABLE 6, TABLE 9, TABLE 10 and TABLE 11. In some embodiments, the handle sequence comprises or consists of a sequence that is at least 97% identical to any one of sequences recited in TABLE 3, TABLE 5, TABLE 6, TABLE 9, TABLE 10 and TABLE 11. In some embodiments, the handle sequence comprises or consists of a sequence that is at least 98% identical to any one of sequences recited in TABLE 3, TABLE 5, TABLE 6, TABLE 9, TABLE 10 and TABLE 11.
  • the handle sequence comprises or consists of a sequence that is at least 99% identical to any one of sequences recited in TABLE 3, TABLE 5, TABLE 6, TABLE 9, TABLE 10 and TABLE 11. In some embodiments, the handle sequence comprises or consists of a sequence that is 100% identical to any one of sequences recited in TABLE 3, TABLE 5, TABLE 6, TABLE 9, TABLE 10 and TABLE 11.
tracrRNA
  • the guide RNA comprises a tracrRNA.
  • the tracrRNA may be linked to a crRNA to form a composite gRNA.
  • the crRNA and the tracrRNA are provided as a single nucleic acid (e.g., covalently linked).
  • compositions comprise a tracrRNA that is separate from, but forms a complex with a crRNA to form a gRNA system.
  • the crRNA and the tracrRNA are separate polynucleotides.
  • a tracrRNA may comprise a repeat hybridization region and a hairpin region.
  • the repeat hybridization region may hybridize to all or part of the sequence of the repeat of a crRNA.
  • the repeat hybridization region may be positioned 3’ of the hairpin region.
  • the hairpin region may comprise a first sequence, a second sequence that is reverse complementary to the first sequence, and a stem-loop linking the first sequence and the second sequence.
  • tracrRNAs comprise a stem-loop structure comprising a stem region and a loop region.
  • the stem region is 4 to 8 linked nucleosides in length.
  • the stem region is 5 to 6 linked nucleosides in length.
  • the stem region is 4 to 5 linked nucleosides in length.
  • the tracrRNA comprises a pseudoknot (e.g., a secondary structure comprising a stem at least partially hybridized to a second stem or half-stem secondary structure).
  • An effector protein may recognize a tracrRNA comprising multiple stem regions.
  • the nucleotide sequences of the multiple stem regions are identical to one another.
  • the nucleotide sequence of at least one of the multiple stem regions is not identical to those of the others.
  • the tracrRNA comprises at least 2, at least 3, at least 4, or at least 5 stem regions.
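The hairpin arrangement described above (a first sequence, a second sequence that is reverse complementary to the first, and a loop linking them) can be illustrated with a minimal Python sketch. The sequences and the helper names `reverse_complement` and `is_hairpin` are hypothetical and are not taken from any TABLE; the sketch assumes strict Watson-Crick pairing and ignores G-U wobble pairs.

```python
# Illustrative sketch only: strict Watson-Crick pairing, no G-U wobble pairs.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna.upper()))


def is_hairpin(first: str, loop: str, second: str) -> bool:
    """True if `second` is the reverse complement of `first` and a loop links them."""
    return len(loop) > 0 and second.upper() == reverse_complement(first)


# Hypothetical 5-nucleoside stem with a 4-nucleoside loop (not a sequence from the disclosure).
first, loop = "GGCAC", "GAAA"
second = reverse_complement(first)
hairpin = first + loop + second
print(hairpin, is_hairpin(first, loop, second))
```

Here the stem length of 5 falls within the 4 to 8 linked nucleosides recited above; a real design would also need to account for the secondary structure of the full tracrRNA.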
  • the length of a tracrRNA is not greater than 50, 56, 68, 71, 73, 95, or 105 linked nucleosides. In some embodiments, the length of a tracrRNA is about 30 to about 120 linked nucleosides. In some embodiments, the length of a tracrRNA is about 50 to about 105, about 50 to about 95, about 50 to about 73, about 50 to about 71, about 50 to about 68, or about 50 to about 56 linked nucleosides.
  • the length of a tracrRNA is 56 to 105 linked nucleosides, 68 to 105 linked nucleosides, 71 to 105 linked nucleosides, 73 to 105 linked nucleosides, or 95 to 105 linked nucleosides. In some embodiments, the length of a tracrRNA is 40 to 60 nucleotides. In some embodiments, the length of a tracrRNA is 50, 56, 68, 71, 73, 95, or 105 linked nucleosides. In some embodiments, the length of a tracrRNA is 50 nucleotides. TABLE 5 shows exemplary tracrRNA sequences.
  • An exemplary tracrRNA may comprise, from 5’ to 3’, a 5’ region, a hairpin region, a repeat hybridization region, and a 3’ region.
  • the 5’ region may hybridize to the 3’ region.
  • the 5’ region does not hybridize to the 3’ region.
  • the 3’ region is covalently linked to the crRNA (e.g., through a phosphodiester bond).
  • a tracrRNA may comprise an unhybridized region at the 3’ end of the tracrRNA.
  • the unhybridized region may have a length of about 1, about 2, about 3, about 4, about 5, about 6, about 7, about 8, about 9, about 10, about 12, about 14, about 16, about 18, or about 20 linked nucleosides. In some embodiments, the length of the unhybridized region is 0 to 20 linked nucleosides.
  • the guide RNA does not comprise a tracrRNA.
  • an effector protein does not require a tracrRNA to locate and/or cleave a target nucleic acid.
  • the crRNA of the guide nucleic acid comprises a repeat region and a spacer region, wherein the repeat region binds to the effector protein and the spacer region hybridizes to a target sequence of the target nucleic acid.
  • the repeat sequence of the crRNA may interact with an effector protein, allowing for the guide nucleic acid and the effector protein to form a complex.
  • the tracrRNA comprises or consists of a sequence that is at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical, or is identical, to any one of sequences recited in TABLE 5. Accordingly, in some embodiments, the tracrRNA comprises or consists of a sequence that is at least 90% identical to any one of sequences recited in TABLE 5. In some embodiments, the tracrRNA comprises or consists of a sequence that is at least 91% identical to any one of sequences recited in TABLE 5. In some embodiments, the tracrRNA comprises or consists of a sequence that is at least 92% identical to any one of sequences recited in TABLE 5.
  • the tracrRNA comprises or consists of a sequence that is at least 93% identical to any one of sequences recited in TABLE 5. In some embodiments, the tracrRNA comprises or consists of a sequence that is at least 94% identical to any one of sequences recited in TABLE 5. In some embodiments, the tracrRNA comprises or consists of a sequence that is at least 95% identical to any one of sequences recited in TABLE 5. In some embodiments, the tracrRNA comprises or consists of a sequence that is at least 96% identical to any one of sequences recited in TABLE 5. In some embodiments, the tracrRNA comprises or consists of a sequence that is at least 97% identical to any one of sequences recited in TABLE 5.
  • the tracrRNA comprises or consists of a sequence that is at least 98% identical to any one of sequences recited in TABLE 5. In some embodiments, the tracrRNA comprises or consists of a sequence that is at least 99% identical to any one of sequences recited in TABLE 5. In some embodiments, the tracrRNA comprises or consists of a sequence that is 100% identical to any one of sequences recited in TABLE 5.
  • the tracrRNA comprises or consists of a sequence that is at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical, or is identical, to at least a portion of a sequence described herein (e.g., SEQ ID NOs: 26, 27, 116, 117, 156-163, and 169-176). Accordingly, in some embodiments, the tracrRNA comprises or consists of a sequence that is at least 90% identical to at least a portion of a sequence selected from SEQ ID NOs: 26, 27, 116, 117, 156-163, and 169-176.
  • the tracrRNA comprises or consists of a sequence that is at least 91% identical to at least a portion of a sequence selected from SEQ ID NOs: 26, 27, 116, 117, 156-163, and 169-176. In some embodiments, the tracrRNA comprises or consists of a sequence that is at least 92% identical to at least a portion of a sequence selected from SEQ ID NOs: 26, 27, 116, 117, 156-163, and 169-176. In some embodiments, the tracrRNA comprises or consists of a sequence that is at least 93% identical to at least a portion of a sequence selected from SEQ ID NOs: 26, 27, 116, 117, 156-163, and 169-176.
  • the tracrRNA comprises or consists of a sequence that is at least 94% identical to at least a portion of a sequence selected from SEQ ID NOs: 26, 27, 116, 117, 156-163, and 169-176. In some embodiments, the tracrRNA comprises or consists of a sequence that is at least 95% identical to at least a portion of a sequence selected from SEQ ID NOs: 26, 27, 116, 117, 156-163, and 169-176. In some embodiments, the tracrRNA comprises or consists of a sequence that is at least 96% identical to at least a portion of a sequence selected from SEQ ID NOs: 26, 27, 116, 117, 156-163, and 169-176.
  • the tracrRNA comprises or consists of a sequence that is at least 97% identical to at least a portion of a sequence selected from SEQ ID NOs: 26, 27, 116, 117, 156-163, and 169-176. In some embodiments, the tracrRNA comprises or consists of a sequence that is at least 98% identical to at least a portion of a sequence selected from SEQ ID NOs: 26, 27, 116, 117, 156-163, and 169-176. In some embodiments, the tracrRNA comprises or consists of a sequence that is at least 99% identical to at least a portion of a sequence selected from SEQ ID NOs: 26, 27, 116, 117, 156-163, and 169-176. In some embodiments, the tracrRNA comprises a sequence that is 100% identical to at least a portion of a sequence selected from SEQ ID NOs: 26, 27, 116, 117, 156-163, and 169-176.
  • the portion of the sequence comprises at least 50 nucleobases, at least 60 nucleobases, at least 70 nucleobases, at least 80 nucleobases, at least 90 nucleobases, at least 100 nucleobases, at least 110 nucleobases, at least 120 nucleobases, at least 130 nucleobases, at least 140 nucleobases, or at least 150 nucleobases.
  • the portion of the sequence comprises at least 50 nucleobases. In some embodiments, the portion of the sequence comprises at least 60 nucleobases. In some embodiments, the portion of the sequence comprises at least 70 nucleobases. In some embodiments, the portion of the sequence comprises at least 80 nucleobases.
  • the portion of the sequence comprises at least 90 nucleobases. In some embodiments, the portion of the sequence comprises at least 100 nucleobases. In some embodiments, the portion of the sequence comprises at least 110 nucleobases. In some embodiments, the portion of the sequence comprises at least 120 nucleobases. In some embodiments, the portion of the sequence comprises at least 130 nucleobases. In some embodiments, the portion of the sequence comprises at least 140 nucleobases. In some embodiments, the portion of the sequence comprises at least 150 nucleobases.
  • the tracrRNA comprises or consists of a sequence that is at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical, or is identical, to a sequence described herein (e.g., SEQ ID NOs: 26, 27, 116, 117, 156, 157, 158, 159, 160, 161, 162, or 163).
  • the tracrRNA comprises or consists of a sequence that is at least 90% identical to SEQ ID NOs: 116, 117, 156, 157, 158, 159, 160, 161, 162, or 163.
  • the tracrRNA comprises or consists of a sequence that is at least 91% identical to SEQ ID NOs: 116, 117, 156, 157, 158, 159, 160, 161, 162, or 163. In some embodiments, the tracrRNA comprises or consists of a sequence that is at least 92% identical to SEQ ID NOs: 116, 117, 156, 157, 158, 159, 160, 161, 162, or 163. In some embodiments, the tracrRNA comprises or consists of a sequence that is at least 93% identical to SEQ ID NOs: 116, 117, 156, 157, 158, 159, 160, 161, 162, or 163.
  • the tracrRNA comprises or consists of a sequence that is at least 94% identical to SEQ ID NOs: 116, 117, 156, 157, 158, 159, 160, 161, 162, or 163. In some embodiments, the tracrRNA comprises or consists of a sequence that is at least 95% identical to SEQ ID NOs: 116, 117, 156, 157, 158, 159, 160, 161, 162, or 163. In some embodiments, the tracrRNA comprises or consists of a sequence that is at least 96% identical to SEQ ID NOs: 116, 117, 156, 157, 158, 159, 160, 161, 162, or 163.
  • the tracrRNA comprises or consists of a sequence that is at least 97% identical to SEQ ID NOs: 116, 117, 156, 157, 158, 159, 160, 161, 162, or 163. In some embodiments, the tracrRNA comprises or consists of a sequence that is at least 98% identical to SEQ ID NOs: 116, 117, 156, 157, 158, 159, 160, 161, 162, or 163. In some embodiments, the tracrRNA comprises or consists of a sequence that is at least 99% identical to SEQ ID NOs: 116, 117, 156, 157, 158, 159, 160, 161, 162, or 163. In some embodiments, the tracrRNA comprises or consists of a sequence that is identical to SEQ ID NOs: 116, 117, 156, 157, 158, 159, 160, 161, 162, or 163.
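The percent-identity thresholds recited above can be made concrete with a short Python sketch. It is only an illustration under a stated assumption: identity is taken here as the number of matched positions in a gap-tolerant comparison (the longest common subsequence) divided by the length of the longer sequence. This passage does not define how identity is calculated, and established alignment tools may give different values. The function names and the example sequences are hypothetical.

```python
def matched_positions(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def percent_identity(query: str, reference: str) -> float:
    """Assumed metric: matched positions divided by the longer sequence length."""
    if not query or not reference:
        raise ValueError("both sequences must be non-empty")
    longer = max(len(query), len(reference))
    return 100.0 * matched_positions(query.upper(), reference.upper()) / longer


def meets_threshold(query: str, reference: str, threshold: float = 90.0) -> bool:
    """True if the query is at least `threshold` percent identical to the reference."""
    return percent_identity(query, reference) >= threshold


# Hypothetical 10-nucleobase sequences differing at one position.
reference = "ACGUACGUAC"
variant = "ACGUACGUAG"
print(percent_identity(variant, reference), meets_threshold(variant, reference, 90.0))
```

To test identity against "a portion" of a reference, the same function could be applied to a chosen subsequence of the reference, for example one of at least 100 nucleobases.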
  • a composition described herein comprises a specific combination of effector protein, crRNA and tracrRNA. Accordingly, in some embodiments, the composition comprises an effector protein, a crRNA and a tracrRNA, wherein the effector protein comprises an amino acid sequence that is at least 90% identical to SEQ ID NO: 22; the crRNA comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 114; and the tracrRNA comprises a nucleotide sequence that is at least 90% identical to a portion of SEQ ID NO: 116, wherein the portion of SEQ ID NO: 116 comprises at least 100 nucleobases.
  • the composition comprises an effector protein, a crRNA and a tracrRNA, wherein the effector protein comprises an amino acid sequence that is at least 90% identical to SEQ ID NO: 23; the crRNA comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 114; and the tracrRNA comprises a nucleotide sequence that is at least 90% identical to a portion of SEQ ID NO: 117, wherein the portion of SEQ ID NO: 117 comprises at least 100 nucleobases.
  • the composition comprises an effector protein, a crRNA and a tracrRNA, wherein the effector protein comprises an amino acid sequence that is at least 90% identical to SEQ ID NO: 121; the crRNA comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 154; and the tracrRNA comprises a nucleotide sequence that is at least 90% identical to a portion of SEQ ID NO: 159, wherein the portion of SEQ ID NO: 159 comprises at least 100 nucleobases.
  • the composition comprises an effector protein, a crRNA and a tracrRNA, wherein the effector protein comprises an amino acid sequence that is at least 90% identical to SEQ ID NO: 122; the crRNA comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 152; and the tracrRNA comprises a nucleotide sequence that is at least 90% identical to a portion of SEQ ID NO: 157, wherein the portion of SEQ ID NO: 157 comprises at least 100 nucleobases.
  • the composition comprises an effector protein, a crRNA and a tracrRNA, wherein the effector protein comprises an amino acid sequence that is at least 90% identical to SEQ ID NO: 122; the crRNA comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 114; and the tracrRNA comprises a nucleotide sequence that is at least 90% identical to a portion of SEQ ID NO: 116, wherein the portion of SEQ ID NO: 116 comprises at least 100 nucleobases.
  • the composition comprises an effector protein, a crRNA and a tracrRNA, wherein the effector protein comprises an amino acid sequence that is at least 90% identical to SEQ ID NO: 123; the crRNA comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 153; and the tracrRNA comprises a nucleotide sequence that is at least 90% identical to a portion of SEQ ID NO: 158, wherein the portion of SEQ ID NO: 158 comprises at least 100 nucleobases.
  • the composition comprises an effector protein, a crRNA and a tracrRNA, wherein the effector protein comprises an amino acid sequence that is at least 90% identical to SEQ ID NO: 123; the crRNA comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 114; and the tracrRNA comprises a nucleotide sequence that is at least 90% identical to a portion of SEQ ID NO: 116, wherein the portion of SEQ ID NO: 116 comprises at least 100 nucleobases.
  • the composition comprises an effector protein, a crRNA and a tracrRNA, wherein the effector protein comprises an amino acid sequence that is at least 90% identical to SEQ ID NO: 124; the crRNA comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 114; and the tracrRNA comprises a nucleotide sequence that is at least 90% identical to a portion of SEQ ID NO: 161, wherein the portion of SEQ ID NO: 161 comprises at least 100 nucleobases.
  • the composition comprises an effector protein, a crRNA and a tracrRNA, wherein the effector protein comprises an amino acid sequence that is at least 90% identical to SEQ ID NO: 124; the crRNA comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 114; and the tracrRNA comprises a nucleotide sequence that is at least 90% identical to a portion of SEQ ID NO: 162, wherein the portion of SEQ ID NO: 162 comprises at least 100 nucleobases.
  • the composition comprises an effector protein, a crRNA and a tracrRNA, wherein the effector protein comprises an amino acid sequence that is at least 90% identical to SEQ ID NO: 124; the crRNA comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 114; and the tracrRNA comprises a nucleotide sequence that is at least 90% identical to a portion of SEQ ID NO: 163, wherein the portion of SEQ ID NO: 163 comprises at least 100 nucleobases.
  • the composition comprises an effector protein, a crRNA and a tracrRNA, wherein the effector protein comprises an amino acid sequence that is at least 90% identical to SEQ ID NO: 125; the crRNA comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 155; and the tracrRNA comprises a nucleotide sequence that is at least 90% identical to a portion of SEQ ID NO: 160, wherein the portion of SEQ ID NO: 160 comprises at least 100 nucleobases.
  • the composition comprises an effector protein, a crRNA and a tracrRNA, wherein the effector protein comprises an amino acid sequence that is at least 90% identical to SEQ ID NO: 125; the crRNA comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 114; and the tracrRNA comprises a nucleotide sequence that is at least 90% identical to a portion of SEQ ID NO: 116, wherein the portion of SEQ ID NO: 116 comprises at least 100 nucleobases.
  • the composition comprises an effector protein, a crRNA and a tracrRNA, wherein the effector protein comprises an amino acid sequence that is at least 90% identical to SEQ ID NO: 126; the crRNA comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 151; and the tracrRNA comprises a nucleotide sequence that is at least 90% identical to a portion of SEQ ID NO: 156, wherein the portion of SEQ ID NO: 156 comprises at least 100 nucleobases.
  • the composition comprises an effector protein, a crRNA and a tracrRNA, wherein the effector protein comprises an amino acid sequence that is at least 90% identical to SEQ ID NO: 126; the crRNA comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 114; and the tracrRNA comprises a nucleotide sequence that is at least 90% identical to a portion of SEQ ID NO: 116, wherein the portion of SEQ ID NO: 116 comprises at least 100 nucleobases.
  • the composition comprises an effector protein, a crRNA and a tracrRNA, wherein the effector protein comprises an amino acid sequence that is at least 90% identical to SEQ ID NO: 126; the crRNA comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 114; and the tracrRNA comprises a nucleotide sequence that is at least 90% identical to a portion of SEQ ID NO: 161, wherein the portion of SEQ ID NO: 161 comprises at least 100 nucleobases.
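The specific effector protein, crRNA and tracrRNA combinations recited above can be tabulated for lookup. The Python sketch below records only the SEQ ID NO triples stated in this passage; the `Combination` type and the `tracrrnas_for` helper are hypothetical names introduced for illustration, and the identity thresholds and the 100-nucleobase portion requirement are not modeled.

```python
from typing import List, NamedTuple, Optional


class Combination(NamedTuple):
    effector: int  # SEQ ID NO of the effector protein
    crrna: int     # SEQ ID NO of the crRNA
    tracrrna: int  # SEQ ID NO of the tracrRNA


# SEQ ID NO triples as recited in the combinations above.
COMBINATIONS = [
    Combination(22, 114, 116),
    Combination(23, 114, 117),
    Combination(121, 154, 159),
    Combination(122, 152, 157),
    Combination(122, 114, 116),
    Combination(123, 153, 158),
    Combination(123, 114, 116),
    Combination(124, 114, 161),
    Combination(124, 114, 162),
    Combination(124, 114, 163),
    Combination(125, 155, 160),
    Combination(125, 114, 116),
    Combination(126, 151, 156),
    Combination(126, 114, 116),
    Combination(126, 114, 161),
]


def tracrrnas_for(effector: int, crrna: Optional[int] = None) -> List[int]:
    """List tracrRNA SEQ ID NOs recited with an effector (optionally a given crRNA)."""
    return [
        combo.tracrrna
        for combo in COMBINATIONS
        if combo.effector == effector and (crrna is None or combo.crrna == crrna)
    ]


print(tracrrnas_for(124))
print(tracrrnas_for(126, crrna=114))
```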
  • compositions and methods described herein may comprise a donor nucleic acid or uses thereof.
  • the donor nucleic acid is present in a viral vector.
  • the donor nucleic acid may be introduced into the cell by any mechanism of the viral vector, including, but not limited to, integration into the genome of the cell or introduction of an episomal plasmid or viral genome.
  • the donor nucleic acid is inserted at the site of cleavage by the effector protein (i.e., nuclease activity: hydrolysis of a phosphodiester bond of a nucleic acid, resulting in a nick or double-strand break).
  • donor nucleic acid refers to a sequence of DNA that serves as a template in the process of homologous recombination, which may carry the modification that is to be or has been introduced into the target nucleic acid.
  • Using this donor nucleic acid as a template, the genetic information, including the modification, is copied into the target nucleic acid by way of homologous recombination.
  • compositions for modifying a target nucleic acid in a cell or a subject comprising any one of the effector proteins, engineered effector proteins, or fusion effector proteins described herein.
  • pharmaceutical compositions comprising a nucleic acid encoding any one of the effector proteins, engineered effector proteins, or fusion effector proteins described herein.
  • pharmaceutical compositions comprise a guide nucleic acid.
  • pharmaceutical compositions comprise a plurality of guide nucleic acids.
  • Pharmaceutical compositions may be used to modify a target nucleic acid or the expression thereof in a cell in vitro, in vivo or ex vivo.
  • compositions comprise one or more nucleic acids encoding an effector protein, fusion effector protein, fusion partner, a guide nucleic acid, or a combination thereof; and a pharmaceutically acceptable carrier or diluent.
  • the effector protein, fusion effector protein, fusion partner protein, or combination thereof may be any one of those described herein.
  • the one or more nucleic acids may comprise a plasmid.
  • the one or more nucleic acids may comprise a nucleic acid expression vector.
  • the one or more nucleic acids may comprise a viral vector.
  • the viral vector is a lentiviral vector.
  • the vector is an adeno-associated viral (AAV) vector.
  • compositions, including pharmaceutical compositions, comprise a viral vector encoding a fusion effector protein and a guide nucleic acid, wherein at least a portion of the guide nucleic acid binds to the effector protein of the fusion effector protein.
  • compositions comprise a virus comprising a viral vector encoding a fusion effector protein, an effector protein, a fusion partner, a guide nucleic acid, or a combination thereof; and a pharmaceutically acceptable carrier or diluent.
  • the virus may be a lentivirus.
  • the virus may be an adenovirus.
  • the virus may be a non-replicating virus.
  • the virus may be an adeno-associated virus (AAV).
  • the viral vector may be a retroviral vector.
  • Retroviral vectors may include gamma-retroviral vectors such as vectors derived from the Moloney Murine Leukemia Virus (MoMLV, MMLV, MuLV, or MLV) or the Murine Stem Cell Virus (MSCV) genome. Retroviral vectors may include lentiviral vectors such as those derived from the human immunodeficiency virus (HIV) genome.
  • the viral vector is a chimeric viral vector, comprising viral portions from two or more viruses.
  • the viral vector is a recombinant viral vector.
  • the viral vector is an AAV.
  • the AAV may be any AAV known in the art.
  • the viral vector corresponds to a virus of a specific serotype.
  • the serotype is selected from an AAV1 serotype, an AAV2 serotype, an AAV3 serotype, an AAV4 serotype, an AAV5 serotype, an AAV6 serotype, an AAV7 serotype, an AAV8 serotype, an AAV9 serotype, an AAV10 serotype, an AAV11 serotype, and an AAV12 serotype.
  • the AAV vector is a recombinant vector, a hybrid AAV vector, a chimeric AAV vector, a self-complementary AAV (scAAV) vector, a single-stranded AAV or any combination thereof.
  • scAAV genomes are generally known in the art and contain both DNA strands which can anneal together to form double-stranded DNA.
  • methods of producing delivery vectors herein comprise packaging a nucleic acid encoding an effector protein and a guide nucleic acid, or a combination thereof, into an AAV vector.
  • methods of producing the delivery vector comprise: (a) contacting a cell with at least one nucleic acid encoding: (i) a guide nucleic acid; (ii) a Replication (Rep) gene; and (iii) a Capsid (Cap) gene that encodes an AAV capsid protein; (b) expressing the AAV capsid protein in the cell; (c) assembling an AAV particle; and (d) packaging a Cas effector encoding nucleic acid into the AAV particle, thereby generating an AAV delivery vector.
  • promoters, stuffer sequences, and any combination thereof may be packaged in the AAV vector.
  • the AAV vector can package 1, 2, 3, 4, or 5 guide nucleic acids or copies thereof.
  • the AAV vector comprises inverted terminal repeats, e.g., a 5’ inverted terminal repeat and a 3’ inverted terminal repeat.
  • the AAV vector comprises a mutated inverted terminal repeat that lacks a terminal resolution site.
  • a hybrid AAV vector is produced by transcapsidation, e.g., packaging an inverted terminal repeat (ITR) from a first serotype into a capsid of a second serotype, wherein the first and second serotypes may not be the same.
  • the Rep gene and ITR from a first AAV serotype (e.g., AAV2) may be used with a capsid from a second AAV serotype (e.g., AAV9).
  • a hybrid AAV serotype comprising the AAV2 ITRs and AAV9 capsid protein may be indicated AAV2/9.
  • the hybrid AAV delivery vector comprises an AAV2/1, AAV2/2, AAV2/4, AAV2/5, AAV2/8, or AAV2/9 vector.
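The hybrid naming convention described above, in which AAV2/9 indicates AAV2 ITRs packaged in an AAV9 capsid, can be read mechanically. The following Python sketch is illustrative only; the `HybridAAV` type and the `parse_hybrid_aav` function are hypothetical names, and the parser assumes the simple "AAV<ITR>/<capsid>" form with numeric serotypes.

```python
import re
from typing import NamedTuple


class HybridAAV(NamedTuple):
    itr_serotype: int     # serotype supplying the ITRs (first number)
    capsid_serotype: int  # serotype supplying the capsid (second number)


# Tolerates optional whitespace after "AAV", e.g. "AAV 2/4".
_HYBRID_PATTERN = re.compile(r"^AAV\s*(\d+)/(\d+)$")


def parse_hybrid_aav(name: str) -> HybridAAV:
    """Split a hybrid designation such as 'AAV2/9' into ITR and capsid serotypes."""
    match = _HYBRID_PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"not a hybrid AAV designation: {name!r}")
    return HybridAAV(int(match.group(1)), int(match.group(2)))


for designation in ["AAV2/1", "AAV2/2", "AAV2/4", "AAV2/5", "AAV2/8", "AAV2/9"]:
    parsed = parse_hybrid_aav(designation)
    print(designation, "ITR: AAV%d" % parsed.itr_serotype, "capsid: AAV%d" % parsed.capsid_serotype)
```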
  • the AAV vector may be a chimeric AAV vector.
  • the chimeric AAV vector comprises an exogenous amino acid or an amino acid substitution, or capsid proteins from two or more serotypes.
  • a chimeric AAV vector may be genetically engineered to increase transduction efficiency, selectivity, or a combination thereof.
  • the delivery vector may be a eukaryotic vector, a prokaryotic vector (e.g., a bacterial vector), a viral vector, or any combination thereof.
  • the delivery vehicle may be a non-viral vector.
  • the delivery vehicle may be a plasmid.
  • the plasmid comprises DNA.
  • the plasmid comprises RNA.
  • the plasmid comprises circular double-stranded DNA.
  • the plasmid may be linear.
  • the plasmid comprises one or more genes of interest and one or more regulatory elements.
  • the plasmid comprises a bacterial backbone containing an origin of replication and an antibiotic resistance gene or other selectable marker for plasmid amplification in bacteria.
  • the plasmid may be a minicircle plasmid.
  • the plasmid contains one or more genes that provide a selective marker to induce a target cell to retain the plasmid.
  • the plasmid may be formulated for delivery through injection by a needle-carrying syringe.
  • the plasmid may be formulated for delivery via electroporation.
  • the plasmids may be engineered through synthetic or other suitable means known in the art.
  • the genetic elements may be assembled by restriction digest of the desired genetic sequence from a donor plasmid or organism to produce ends of the DNA which may then be readily ligated to another genetic sequence.
  • the vector is a non-viral vector, and a physical method or a chemical method is employed for delivery into the somatic cell.
  • exemplary physical methods include electroporation, gene gun, sonoporation, magnetofection, or hydrodynamic delivery.
  • Exemplary chemical methods include delivery of the recombinant polynucleotide via liposomes, such as cationic lipids or neutral lipids; dendrimers; nanoparticles; or cell-penetrating peptides.
  • a fusion effector protein as described herein is inserted into a vector.
  • the vector optionally comprises one or more promoters, enhancers, ribosome binding sites, RNA splice sites, polyadenylation sites, a replication origin, and/or transcriptional terminator sequences.
  • plasmids and vectors described herein comprise at least one promoter.
  • the promoters are constitutive promoters.
  • the promoters are inducible promoters.
  • the promoters are prokaryotic promoters (e.g., drive expression of a gene in a prokaryotic cell).
  • the promoters are eukaryotic promoters (e.g., drive expression of a gene in a eukaryotic cell).
  • Exemplary promoters include, but are not limited to, CMV, 7SK, EF1a, RPBSA, hPGK, EFS, SV40, PGK1, Ubc, human beta actin, CAG, TRE, UAS, Ac5, polyhedrin, CaMKIIa, GAL1-10, TEF1, GDS, ADH1, CaMV35S, Ubi, H1, U6, MNDU3, MSCV, MND, and HSV TK promoter.
  • the promoter is CMV.
  • the promoter is EF1a.
  • the promoter is ubiquitin.
  • vectors are bicistronic or polycistronic vectors (e.g., having or involving two or more loci responsible for generating a protein) having an internal ribosome entry site (IRES) for translation initiation in a cap-independent manner.
  • vectors comprise an enhancer.
  • Enhancers are nucleotide sequences that have the effect of enhancing promoter activity.
  • enhancers augment transcription regardless of the orientation of their sequence.
  • enhancers activate transcription from a distance of several kilobase pairs.
  • enhancers are located optionally upstream or downstream of a gene region to be transcribed, and/or located within the gene, to activate the transcription.
  • Exemplary enhancers include, but are not limited to, WPRE; CMV enhancers; the R-U5' segment in LTR of HTLV-I (Mol. Cell. Biol., Vol. 8(1), p.
  • compositions described herein may comprise a salt.
  • the salt is a sodium salt.
  • the salt is a potassium salt.
  • the salt is a magnesium salt.
  • the salt is NaCl.
  • the salt is KNO3.
  • the salt is Mg²⁺ SO₄²⁻.
  • Non-limiting examples of pharmaceutically acceptable carriers and diluents suitable for the pharmaceutical compositions disclosed herein include buffers (e.g., neutral buffered saline, phosphate buffered saline); carbohydrates (e.g., glucose, mannose, sucrose, dextran, mannitol); polypeptides or amino acids (e.g., glycine); antioxidants; chelating agents (e.g., EDTA, glutathione); adjuvants (e.g., aluminum hydroxide); surfactants (Polysorbate 80, Polysorbate 20, or Pluronic F68); glycerol; sorbitol; mannitol; polyethyleneglycol; and preservatives.
  • compositions are in the form of a solution (e.g., a liquid).
  • the solution may be formulated for injection, e.g., intravenous or subcutaneous injection.
  • the pH of the solution is about 7, about 7.1, about 7.2, about 7.3, about 7.4, about 7.5, about 7.6, about 7.7, about 7.8, about 7.9, about 8, about 8.1, about 8.2, about 8.3, about 8.4, about 8.5, about 8.6, about 8.7, about 8.8, about 8.9, or about 9.
  • the pH is 7 to 7.5, 7.5 to 8, 8 to 8.5, 8.5 to 9, or 7 to 8.5.
  • the pH of the solution is less than 7.
  • the pH is greater than 7.
  • methods of modifying target nucleic acids or the expression thereof comprise editing a target nucleic acid.
  • editing refers to modifying the nucleobase sequence of a target nucleic acid.
  • methods of modulating the expression of a target nucleic acid. Fusion effector proteins and systems described herein may be used for such methods.
  • Methods of editing a target nucleic acid may comprise one or more of cleaving the target nucleic acid, deleting one or more nucleotides of the target nucleic acid, inserting one or more nucleotides into the target nucleic acid, or modifying one or more nucleotides of the target nucleic acid.
  • Methods of modulating expression of target nucleic acids may comprise modifying the target nucleic acid or a protein associated with the target nucleic acid, e.g., a histone.
  • methods comprise contacting a target nucleic acid with a composition described herein. In some embodiments, methods comprise contacting a target nucleic acid with an effector protein described herein. In some embodiments, methods comprise contacting a target nucleic acid with a fusion effector protein described herein.
  • the effector protein may be an effector protein provided in TABLE 1 or a catalytically inactive variant thereof.
  • the effector protein may comprise an amino acid sequence that is at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% identical to a sequence described in TABLE 1.
  • the amino acid sequence of the effector protein is at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% identical to a sequence described in TABLE 1.
  • methods comprise contacting a target nucleic acid with an effector protein described herein and a guide nucleic acid described herein.
  • methods comprise contacting a target nucleic acid with an effector protein described herein and a crRNA described herein.
  • methods comprise contacting a target nucleic acid with an effector protein described herein, a crRNA described herein, and a tracrRNA described herein.
  • methods comprise contacting a target nucleic acid with a composition comprising a specific combination of effector protein, crRNA and tracrRNA. Accordingly, in some embodiments, the method comprises contacting a target nucleic acid with a composition comprising an effector protein, a crRNA and a tracrRNA, wherein the effector protein comprises an amino acid sequence that is at least 90% identical to SEQ ID NO: 22; the crRNA comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 114; and the tracrRNA comprises a nucleotide sequence that is at least 90% identical to a portion of SEQ ID NO: 116, wherein the portion of SEQ ID NO: 116 comprises at least 100 nucleobases.
  • the method comprises contacting a target nucleic acid with a composition comprising an effector protein, a crRNA and a tracrRNA, wherein the effector protein comprises an amino acid sequence that is at least 90% identical to SEQ ID NO: 23; the crRNA comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 115; and the tracrRNA comprises a nucleotide sequence that is at least 90% identical to a portion of SEQ ID NO: 117, wherein the portion of SEQ ID NO: 117 comprises at least 100 nucleobases.
  • the method comprises contacting a target nucleic acid with a composition comprising an effector protein, a crRNA and a tracrRNA, wherein the effector protein comprises an amino acid sequence that is at least 90% identical to SEQ ID NO: 121; the crRNA comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 154; and the tracrRNA comprises a nucleotide sequence that is at least 90% identical to a portion of SEQ ID NO: 159, wherein the portion of SEQ ID NO: 159 comprises at least 100 nucleobases.
  • the method comprises contacting a target nucleic acid with a composition comprising an effector protein, a crRNA and a tracrRNA, wherein the effector protein comprises an amino acid sequence that is at least 90% identical to SEQ ID NO: 122; the crRNA comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 152; and the tracrRNA comprises a nucleotide sequence that is at least 90% identical to a portion of SEQ ID NO: 157, wherein the portion of SEQ ID NO: 157 comprises at least 100 nucleobases.
  • the method comprises contacting a target nucleic acid with a composition comprising an effector protein, a crRNA and a tracrRNA, wherein the effector protein comprises an amino acid sequence that is at least 90% identical to SEQ ID NO: 122; the crRNA comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 114; and the tracrRNA comprises a nucleotide sequence that is at least 90% identical to a portion of SEQ ID NO: 116, wherein the portion of SEQ ID NO: 116 comprises at least 100 nucleobases.
  • the method comprises contacting a target nucleic acid with a composition comprising an effector protein, a crRNA and a tracrRNA, wherein the effector protein comprises an amino acid sequence that is at least 90% identical to SEQ ID NO: 123; the crRNA comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 153; and the tracrRNA comprises a nucleotide sequence that is at least 90% identical to a portion of SEQ ID NO: 158, wherein the portion of SEQ ID NO: 158 comprises at least 100 nucleobases.
  • the method comprises contacting a target nucleic acid with a composition comprising an effector protein, a crRNA and a tracrRNA, wherein the effector protein comprises an amino acid sequence that is at least 90% identical to SEQ ID NO: 123; the crRNA comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 114; and the tracrRNA comprises a nucleotide sequence that is at least 90% identical to a portion of SEQ ID NO: 116, wherein the portion of SEQ ID NO: 116 comprises at least 100 nucleobases.
  • the method comprises contacting a target nucleic acid with a composition comprising an effector protein, a crRNA and a tracrRNA, wherein the effector protein comprises an amino acid sequence that is at least 90% identical to SEQ ID NO: 124; the crRNA comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 114; and the tracrRNA comprises a nucleotide sequence that is at least 90% identical to a portion of SEQ ID NO: 161, wherein the portion of SEQ ID NO: 161 comprises at least 100 nucleobases.
  • the method comprises contacting a target nucleic acid with a composition comprising an effector protein, a crRNA and a tracrRNA, wherein the effector protein comprises an amino acid sequence that is at least 90% identical to SEQ ID NO: 124; the crRNA comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 114; and the tracrRNA comprises a nucleotide sequence that is at least 90% identical to a portion of SEQ ID NO: 162, wherein the portion of SEQ ID NO: 162 comprises at least 100 nucleobases.
  • the method comprises contacting a target nucleic acid with a composition comprising an effector protein, a crRNA and a tracrRNA, wherein the effector protein comprises an amino acid sequence that is at least 90% identical to SEQ ID NO: 124; the crRNA comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 114; and the tracrRNA comprises a nucleotide sequence that is at least 90% identical to a portion of SEQ ID NO: 163, wherein the portion of SEQ ID NO: 163 comprises at least 100 nucleobases.
  • the method comprises contacting a target nucleic acid with a composition comprising an effector protein, a crRNA and a tracrRNA, wherein the effector protein comprises an amino acid sequence that is at least 90% identical to SEQ ID NO: 125; the crRNA comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 155; and the tracrRNA comprises a nucleotide sequence that is at least 90% identical to a portion of SEQ ID NO: 160, wherein the portion of SEQ ID NO: 160 comprises at least 100 nucleobases.
  • the method comprises contacting a target nucleic acid with a composition comprising an effector protein, a crRNA and a tracrRNA, wherein the effector protein comprises an amino acid sequence that is at least 90% identical to SEQ ID NO: 125; the crRNA comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 114; and the tracrRNA comprises a nucleotide sequence that is at least 90% identical to a portion of SEQ ID NO: 116, wherein the portion of SEQ ID NO: 116 comprises at least 100 nucleobases.
  • the method comprises contacting a target nucleic acid with a composition comprising an effector protein, a crRNA and a tracrRNA, wherein the effector protein comprises an amino acid sequence that is at least 90% identical to SEQ ID NO: 126; the crRNA comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 151; and the tracrRNA comprises a nucleotide sequence that is at least 90% identical to a portion of SEQ ID NO: 156, wherein the portion of SEQ ID NO: 156 comprises at least 100 nucleobases.
  • the method comprises contacting a target nucleic acid with a composition comprising an effector protein, a crRNA and a tracrRNA, wherein the effector protein comprises an amino acid sequence that is at least 90% identical to SEQ ID NO: 126; the crRNA comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 114; and the tracrRNA comprises a nucleotide sequence that is at least 90% identical to a portion of SEQ ID NO: 116, wherein the portion of SEQ ID NO: 116 comprises at least 100 nucleobases.
  • the method comprises contacting a target nucleic acid with a composition comprising an effector protein, a crRNA and a tracrRNA, wherein the effector protein comprises an amino acid sequence that is at least 90% identical to SEQ ID NO: 126; the crRNA comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 114; and the tracrRNA comprises a nucleotide sequence that is at least 90% identical to a portion of SEQ ID NO: 161, wherein the portion of SEQ ID NO: 161 comprises at least 100 nucleobases.
  • methods comprise base editing.
  • base editing comprises contacting a target nucleic acid with a fusion effector protein comprising an effector protein fused to a base editing enzyme, such as a deaminase, thereby changing a nucleobase of the target nucleic acid to an alternative nucleobase.
  • the nucleobase of the target nucleic acid is adenine (A) and the method comprises changing A to guanine (G).
  • the nucleobase of the target nucleic acid is cytosine (C) and the method comprises changing C to thymine (T).
  • the nucleobase of the target nucleic acid is C and the method comprises changing C to G.
  • the nucleobase of the target nucleic acid is A and the method comprises changing A to G.
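The nucleobase conversions listed above (A to G, C to T, C to G) can be modeled as a simple string substitution. The following is a sketch only: it treats the target as a plain DNA string and says nothing about the enzymatic mechanism, and the names `ALLOWED_CONVERSIONS` and `apply_base_edit` are hypothetical.

```python
# Conversions recited in the text above: A->G, C->T, C->G.
ALLOWED_CONVERSIONS = {("A", "G"), ("C", "T"), ("C", "G")}


def apply_base_edit(sequence: str, position: int, new_base: str) -> str:
    """Return a copy of `sequence` with the base at `position` (0-based) changed.

    Raises ValueError if the requested change is not one of the conversions
    listed in ALLOWED_CONVERSIONS.
    """
    old_base = sequence[position]
    if (old_base, new_base) not in ALLOWED_CONVERSIONS:
        raise ValueError(f"conversion {old_base}->{new_base} is not modeled")
    return sequence[:position] + new_base + sequence[position + 1:]
```

Restricting the function to an explicit set of conversions keeps the model aligned with what the text recites rather than permitting arbitrary substitutions.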
  • methods introduce a nucleobase change in a target nucleic acid relative to a corresponding wildtype or mutant nucleobase sequence.
  • methods remove or correct a disease-causing mutation in a nucleic acid sequence, e.g., to produce a corresponding wildtype nucleobase sequence.
  • methods remove/correct point mutations, deletions, null mutations, or tissue-specific mutations in a target nucleic acid.
  • methods generate gene knock-out, gene knock-in, gene editing, gene tagging, or a combination thereof. Methods of the disclosure may be targeted to a locus in a genome of a cell.
  • methods of editing a target nucleic acid or modulating the expression of a target nucleic acid are performed in vivo.
  • methods of editing a target nucleic acid or modulating the expression of a target nucleic acid are performed in vitro.
  • a plasmid may be modified in vitro using a composition described herein and introduced into a cell or organism.
  • methods of editing a target nucleic acid or modulating the expression of a target nucleic acid are performed ex vivo.
  • methods may comprise obtaining a cell from a subject, modifying a target nucleic acid in the cell with methods and compositions described herein, and returning the cell to the subject.
  • Methods of editing performed ex vivo may be particularly advantageous to produce CAR T-cells.
  • methods comprise editing a target nucleic acid or modulating the expression of the target nucleic acid in a cell or a subject.
  • the cell may be a dividing cell.
  • the cell may be a terminally differentiated cell.
  • the target nucleic acid is a gene.
  • the cell may be a prokaryotic cell.
  • the cell may be an archaeal cell.
  • the cell may be a eukaryotic cell.
  • the cell may be a mammalian cell.
  • the cell may be a human cell.
  • the cell may be a T cell.
  • the cell may be a hematopoietic stem cell.
  • the cell may be a bone marrow derived cell, a white blood cell, a blood cell progenitor, or a combination thereof.
  • Generating a genetically modified cell may comprise contacting a target cell with an effector protein or a fusion effector protein described herein and a guide nucleic acid. Contacting may comprise electroporation, acoustic poration, optoporation, viral vector-based delivery, iTOP, nanoparticle delivery (e.g., lipid or gold nanoparticle delivery), cell-penetrating peptide (CPP) delivery, DNA nanostructure delivery, or any combination thereof.
  • the nanoparticle delivery comprises lipid nanoparticle delivery or gold nanoparticle delivery.
  • the nanoparticle delivery comprises lipid nanoparticle delivery.
  • the nanoparticle delivery comprises gold nanoparticle delivery.
  • Methods may comprise cell line engineering.
  • cell line engineering comprises modifying a pre-existing cell (e.g., naturally-occurring or engineered) or pre-existing cell line to produce a novel cell line or modified cell line.
  • modifying the pre-existing cell or cell line comprises contacting the pre-existing cell or cell line with an effector protein or fusion effector protein described herein and a guide nucleic acid. The resulting modified cell line may be useful for production of a protein of interest.
  • Non-limiting examples of cell lines include: 132-d5 human fetal fibroblasts, 10.1 mouse fibroblasts, 293-T, 3T3, 3T3 Swiss, 3T3-L1, 721, 9L, A-549, A10, A172, A20, A253, A2780, A2780ADR, A2780cis, A375, A431, ALC, ARH-77, B16, B35, BALB/3T3 mouse embryo fibroblast, BC-3, BCP-1 cells, BEAS-2B, BHK-21, BR 293, BS-C-1 monkey kidney epithelial, Bcl-1, bEnd.3, BxPC3, C3H-10T1/2, C6/36, C8161, CCRF-CEM, CHO, CHO Dhfr-/- CHO-7, CHOIR, CHO-K1, CHO-K2, CHO-T, CIR, CML T1, CMT, COR-L23, COR-L23/5010, COR-
  • modified cells or populations of modified cells wherein the modified cell comprises an effector protein described herein, a nucleic acid encoding an effector protein described herein, or a combination thereof.
  • the modified cell comprises a fusion effector protein described herein, a nucleic acid encoding an effector protein described herein, or a combination thereof.
  • the modified cell is a modified prokaryotic cell.
  • the modified cell is a modified eukaryotic cell.
  • a modified cell may be a modified fungal cell.
  • the modified cell is a modified vertebrate cell.
  • the modified cell is a modified invertebrate cell.
  • the modified cell is a modified mammalian cell. In some embodiments, the modified cell is a modified human cell. In some embodiments, the modified cell is in a subject.
  • a modified cell may be in vitro.
  • a modified cell may be in vivo.
  • a modified cell may be ex vivo.
  • a modified cell may be a cell in a cell culture.
  • a modified cell may be a cell obtained from a biological fluid, organ or tissue of a subject and modified with a composition and/or method described herein. Non-limiting examples of biological fluids are blood, plasma, serum, and cerebrospinal fluid.
  • Non-limiting examples of tissues and organs are bone marrow, adipose tissue, skeletal muscle, smooth muscle, spleen, thymus, brain, lymph node, adrenal gland, prostate gland, intestine, colon, liver, kidney, pancreas, heart, lung, bladder, ovary, uterus, breast, and testes.
  • Non-limiting examples of cells that may be obtained from a subject are hepatocytes, epithelial cells, endothelial cells, neurons, cardiomyocytes, muscle cells and adipocytes.
  • Non-limiting examples of cells that may be modified with compositions and methods described herein include immune cells, such as CAR T-cells, T-cells, B-cells, NK cells, granulocytes, basophils, eosinophils, neutrophils, mast cells, monocytes, macrophages, dendritic cells, microglia, Kupffer cells, antigen-presenting cells (APC), or adaptive cells.
  • Non-limiting examples of cells that may be engineered or modified with compositions and methods described herein include stem cells, such as human stem cells, animal stem cells, stem cells that are not derived from human embryonic stem cells, embryonic stem cells, mesenchymal stem cells, pluripotent stem cells, induced pluripotent stem cells (iPS), somatic stem cells, adult stem cells, hematopoietic stem cells, and tissue-specific stem cells.
  • a cell may be a pluripotent cell.
  • Non-limiting examples of cells that may be engineered or modified with compositions and methods described herein include plant cells, such as parenchyma, sclerenchyma, collenchyma, xylem, phloem, and germline (e.g., pollen) cells, as well as cells from lycophytes, ferns, gymnosperms, angiosperms, bryophytes, charophytes, chlorophytes, rhodophytes, or glaucophytes.
  • the target nucleic acid may be from any organism, including, but not limited to, a bacterium, a virus, a parasite, a protozoon, a fungus, a mammal, a plant, and an insect.
  • the target nucleic acid may be responsible for a disease, contain a mutation (e.g., single strand polymorphism, point mutation, insertion, or deletion), be contained in an amplicon, or be uniquely identifiable from the surrounding nucleic acids (e.g., contain a unique sequence of nucleotides).
  • the target nucleic acid is a double stranded nucleic acid. In some embodiments, the target nucleic acid is a single stranded nucleic acid. In some embodiments, the target nucleic acid is a double stranded nucleic acid that is prepared into single stranded nucleic acids before or upon contacting a reagent or sample. In some embodiments, the target nucleic acid comprises DNA. In some embodiments, the target nucleic acid comprises RNA.
  • the target nucleic acids include but are not limited to mRNA, rRNA, tRNA, non-coding RNA, long non-coding RNA, and microRNA (miRNA).
  • the target nucleic acid is complementary DNA (cDNA) synthesized from a single-stranded RNA template in a reaction catalyzed by a reverse transcriptase.
  • the target nucleic acid is single-stranded RNA (ssRNA) or mRNA.
  • target nucleic acids comprise a mutation.
  • the mutation may be a mutation of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more nucleotides.
  • the mutation may result in the insertion of at least one amino acid in a polypeptide encoded by the target nucleic acid.
  • the mutation may result in the deletion of at least one amino acid in a polypeptide encoded by the target nucleic acid.
  • the mutation may result in the substitution of at least one amino acid in a polypeptide encoded by the target nucleic acid.
  • the mutation may result in misfolding of the polypeptide.
  • the mutation may result in a premature stop codon.
  • the mutation may result in a truncation of the protein.
  • At least a portion of a guide nucleic acid of a composition described herein hybridizes to a region of the target nucleic acid comprising the mutation. In some embodiments, at least a portion of a guide nucleic acid of a composition described herein hybridizes to a region of the target nucleic acid that is within 10 nucleotides, within 50 nucleotides, within 100 nucleotides, or within 200 nucleotides of the mutation. The mutation may be located in a non-coding region or a coding region of a gene.
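The proximity windows above (within 10, 50, 100, or 200 nucleotides of the mutation) can be checked with simple coordinate arithmetic. This is a sketch under stated assumptions: the source does not say how distance is measured, so here the hybridized region is a half-open interval `[region_start, region_end)` on the target, and distance is counted from the mutation to the nearest base of that region. The function names are hypothetical.

```python
def distance_to_mutation(region_start: int, region_end: int, mutation_pos: int) -> int:
    """Nucleotides between a mutation and the nearest base of the hybridized region.

    The region is the half-open interval [region_start, region_end).
    Returns 0 when the mutation lies inside the region.
    """
    if region_start >= region_end:
        raise ValueError("region must contain at least one nucleotide")
    if region_start <= mutation_pos < region_end:
        return 0
    if mutation_pos < region_start:
        return region_start - mutation_pos
    return mutation_pos - (region_end - 1)


def satisfied_windows(distance: int, windows=(10, 50, 100, 200)) -> list:
    """Return the recited windows that a given distance falls within."""
    return [w for w in windows if distance <= w]
```

A mutation 11 nucleotides from the hybridized region would fall outside the 10-nucleotide window but inside the 50-, 100-, and 200-nucleotide windows under this convention.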
  • the mutation is an autosomal dominant mutation. In some embodiments, the mutation is a dominant negative mutation. In some embodiments, the mutation is a loss of function mutation. In some embodiments, the mutation is a single nucleotide polymorphism (SNP). In some embodiments, the SNP is associated with a phenotype of the sample or a phenotype of the organism from which the sample was taken. The SNP, in some embodiments, is associated with altered phenotype from wild type phenotype.
  • the SNP may be a synonymous substitution or a nonsynonymous substitution. The nonsynonymous substitution may be a missense substitution, or a nonsense point mutation. The synonymous substitution may be a silent substitution.
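The distinction drawn above between synonymous, missense, and nonsense substitutions can be made concrete by translating the reference and variant codons. This is a minimal sketch assuming the standard genetic code and a single in-frame DNA codon. The function name `classify_substitution` and the label used for changes to an existing stop codon are illustrative, not taken from the source.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change as synonymous, missense, or nonsense."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"          # encoded amino acid unchanged (silent)
    if ref_aa == "*":
        return "other nonsynonymous" # stop codon changed; not discussed in the text
    if alt_aa == "*":
        return "nonsense"            # introduces a premature stop codon
    return "missense"                # encodes a different amino acid
```

For instance, GAA to GAG leaves glutamate unchanged, GAG to GTG replaces glutamate with valine, and TAC to TAA converts a tyrosine codon into a stop codon.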
  • the mutation may be a deletion of one or more nucleotides.
  • the single nucleotide mutation, SNP, or deletion is associated with a disease such as cancer or a genetic disorder.
  • the mutation, such as a single nucleotide mutation, a SNP, or a deletion, may be encoded in the sequence of a target nucleic acid from the germline of an organism or may be encoded in a target nucleic acid from a diseased cell, such as a cancer cell.
  • the target nucleic acid comprises a mutation associated with a disease.
  • a mutation associated with a disease refers to a mutation whose presence in a subject indicates that the subject is susceptible to, or suffers from, a disease, disorder, or pathological state.
  • a mutation associated with a disease refers to a mutation which causes the disease, contributes to the development of the disease, or indicates the existence of the disease.
  • a mutation associated with a disease may also refer to any mutation which generates transcription or translation products at an abnormal level, or in an abnormal form, in cells affected by a disease relative to a control without the disease.
  • the mutation causes the disease.
  • Non-limiting examples of diseases associated with genetic mutations are cystic fibrosis, Duchenne muscular dystrophy, β-thalassemia, hemophilia, sickle cell anemia, amyotrophic lateral sclerosis (ALS), severe combined immunodeficiency, Huntington’s disease, Alzheimer’s Disease, alpha-1 antitrypsin deficiency, myotonic dystrophy Type 1, and Usher syndrome.
  • the disease may comprise, at least in part, a cancer, an inherited disorder, an ophthalmological disorder, a neurological disorder, a blood disorder, a metabolic disorder, or a combination thereof.
  • the target nucleic acid in some embodiments, comprises a portion of a gene comprising a mutation associated with cancer, a gene whose overexpression is associated with cancer, a tumor suppressor gene, an oncogene, a checkpoint inhibitor gene, a gene associated with cellular growth, a gene associated with cellular metabolism, or a gene associated with cell cycle.
  • the target nucleic acid encodes a cancer biomarker, such as a prostate cancer biomarker or a non-small cell lung cancer biomarker.
  • the assay may be used to detect “hotspots” in target nucleic acids that may be predictive of lung cancer.
  • the target nucleic acid comprises a portion of a nucleic acid that is associated with a blood fever.
  • the target nucleic acid is a portion of a nucleic acid from a genomic locus, any DNA amplicon of, a reverse transcribed mRNA, or a cDNA from a locus of at least one of: ALK, APC, ATM, AXIN2, BAP1, BARD1, BLM, BMPR1A, BRCA1, BRCA2, BRIP1, CASR, CDC73, CDH1, CDK4, CDKN1B, CDKN1C, CDKN2A, CEBPA, CHEK2, CTNNA1, DICER1, DIS3L2, EGFR, EPCAM, FH, FLCN, GATA2, GPC3, GREM1, HOXB13, HRAS, MAX, MEN1, MET, MITF, MLH1, MSH2, MSH3, MSH6, MUTYH, NBN, NF1, NF2, NTHL1, PALB2, PDGFRA, PHOX2B, PMS2, POLD1, POLE, POT1, PRK
  • any region of the aforementioned gene loci may be probed for a mutation or deletion using the compositions and methods disclosed herein.
  • the compositions and methods for detection disclosed herein may be used to detect a single nucleotide polymorphism or a deletion.
  • the target nucleic acid comprises a portion of a nucleic acid from a genomic locus, any DNA amplicon of, a reverse transcribed mRNA, or a cDNA from a locus of at least one of: TRAC, B2M, PD1, PCSK9, DNMT1, HPRT1, RPL32P3, CCR5, FANCF, GRIN2B, EMX1, AAVS1, ALKBH5, CLTA, CDK11, CTNNB1, AXIN1, LRP6, TBK1, BAP1, TLE3, PPM1A, BCL2L2, SUFU, RICTOR, VPS35, TOP1, SIRT1, PTEN, MMD, PAQR8, H2AX, POU5F1, OCT4, SYS1, ARFRP1, TSPAN14, EMC2, EMC3, SEL1L, DERL2, UBE2G2, UBE2J1, and HRD1.
  • the target nucleic acid is a portion of a nucleic acid from a genomic locus, any DNA amplicon of, a reverse transcribed mRNA, or a cDNA from a locus of at least one of: CFTR, FMR1, SMN1, ABCB11, ABCC8, ABCD1, ACAD9, ACADM, ACADVL, ACAT1, ACOX1, ACSF3, ADA, ADAMTS2, ADGRG1, AGA, AGL, AGPS, AGXT, AIRE, ALDH3A2, ALDOB, ALG6, ALMS1, ALPL, AMT, AQP2, ARG1, ARSA, ARSB, ASL, ASNS, ASPA, ASS1, ATM, ATP6V1B1, ATP7A, ATP7B, ATRX, BBS1, BBS10, BBS12, BBS2, BCKDHA, BCKDHB, BCS1L, BLM, BSND, CAPN3, CBS, CDH23
  • the target nucleic acid is a portion of a nucleic acid from a genomic locus, any DNA amplicon of, a reverse transcribed mRNA, or a cDNA from a locus of any one of the genes recited in TABLE 12.
  • TABLE 12 recites exemplary target nucleic acids.
  • the systems comprise at least two components each individually comprising one of the following: (i) an effector protein or a nucleic acid encoding the effector protein, wherein the effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% identical to any one of the sequences recited in TABLE 1; and (ii) a guide nucleic acid or a nucleic acid encoding the guide nucleic acid, wherein at least a portion of the guide nucleic acid is complementary to a target sequence of a target nucleic acid.
  • systems for modifying a target nucleic acid comprise at least two components each individually comprising one of the following: (i) an effector protein or a nucleic acid encoding the effector protein comprising the effector protein comprises about 100, about 120, about 140, about 160, about 180, about 200, about 220, about 240, about 260, about 280, about 300, about 320, about 340, about 360, about 380, about
  • systems for detecting a target nucleic acid comprising any one of the effector proteins described herein.
  • systems comprise a guide nucleic acid.
  • Systems may be used to detect a target nucleic acid.
  • systems comprise an effector protein described herein, a reagent, support medium, or a combination thereof.
  • systems comprise a fusion protein described herein.
  • effector proteins comprise an amino acid sequence that is at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identical to any one of the amino acid sequences selected from SEQ ID NOs: 1-23, 31-104, and 121-126.
  • the amino acid sequence of the effector protein is at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identical to any one of the amino acid sequences selected from SEQ ID NOs: 1-23, 31-104, and 121-126.
  • Systems may be used for detecting the presence of a target nucleic acid associated with or causative of a disease, such as cancer, a genetic disorder, or an infection.
  • systems are useful for phenotyping, genotyping, or determining ancestry.
  • systems include kits and may be referred to as kits.
  • systems include devices and may also be referred to as devices.
  • Systems described herein may be provided in the form of a companion diagnostic assay or device, a point-of-care assay or device, or an over-the-counter diagnostic assay/device.
  • Reagents and effector proteins of various systems may be provided in a reagent chamber or on a support medium.
  • the reagent and/or effector protein may be contacted with the reagent chamber or the support medium by the individual using the system.
  • An exemplary reagent chamber is a test well or container.
  • the opening of the reagent chamber may be large enough to accommodate the support medium.
  • the system comprises a buffer and a dropper.
  • the buffer may be provided in a dropper bottle for ease of dispensing.
  • the dropper may be disposable and transfer a fixed volume. The dropper may be used to place a sample into the reagent chamber or on the support medium.
  • systems comprise a solution in which the activity of an effector protein occurs.
  • the solution comprises or consists essentially of a buffer.
  • the solution or buffer may comprise a buffering agent, a salt, a crowding agent, a detergent, a reducing agent, a competitor, or a combination thereof.
  • the buffer is the primary component or the basis for the solution in which the activity occurs.
  • concentrations for components of buffers described herein (e.g., buffering agents, salts, crowding agents, detergents, reducing agents, and competitors) are the same or essentially the same as the concentrations of these components in the solution in which the activity occurs.
  • a buffer is required for cell lysis activity or viral lysis activity.
  • systems comprise a buffer, wherein the buffer comprises at least one buffering agent.
  • buffering agents include HEPES, TRIS, MES, ADA, PIPES, ACES, MOPSO, BIS-TRIS propane, BES, MOPS, TES, DIPSO, Trizma, TRICINE, GLY-GLY, HEPPS, BICINE, TAPS, AMPD, AMPSO, CHES, CAPSO, AMP, CAPS, phosphate, citrate, acetate, imidazole, or any combination thereof.
  • the concentration of the buffering agent in the buffer is 1 mM to 200 mM.
  • a buffer compatible with an effector protein may comprise a buffering agent at a concentration of 10 mM to 30 mM.
  • a buffer compatible with an effector protein may comprise a buffering agent at a concentration of about 20 mM.
  • a buffering agent may provide a pH for the buffer or the solution in which the activity of the effector protein occurs. The pH may be 3 to 4, 3.5 to 4.5, 4 to 5, 4.5 to 5.5, 5 to 6, 5.5 to 6.5, 6 to 7, 6.5 to 7.5, 7 to 8, 7.5 to 8.5, 8 to 9, 8.5 to 9.5, 9 to 10, or 9.5 to 10.5.
  • systems comprise a solution, wherein the solution comprises at least one salt.
  • the at least one salt is selected from potassium acetate, magnesium acetate, sodium chloride, potassium chloride, magnesium chloride, calcium chloride, and any combination thereof.
  • the concentration of the at least one salt in the solution is 5 mM to 100 mM, 5 mM to 10 mM, 1 mM to 60 mM, or 1 mM to 10 mM.
  • the concentration of the at least one salt is about 105 mM.
  • the concentration of the at least one salt is about 55 mM.
  • the concentration of the at least one salt is about 7 mM.
  • the solution comprises potassium acetate and magnesium acetate.
  • the solution comprises sodium chloride and magnesium chloride.
  • the solution comprises potassium chloride and magnesium chloride.
  • the salt is a magnesium salt and the concentration of magnesium in the solution is at least 5 mM, at least 7 mM, at least 9 mM, at least 11 mM, at least 13 mM, or at least 15 mM. In some embodiments, the concentration of magnesium is less than 20 mM, less than 18 mM, or less than 16 mM.
  • systems comprise a solution, wherein the solution comprises at least one crowding agent.
  • a crowding agent may reduce the volume of solvent available for other molecules in the solution, thereby increasing the effective concentrations of said molecules.
  • crowding agents include glycerol and bovine serum albumin.
  • the crowding agent is glycerol.
  • the concentration of the crowding agent in the solution is 0.01% (v/v) to 10% (v/v). In some embodiments, the concentration of the crowding agent in the solution is 0.5% (v/v) to 10% (v/v).
  • systems comprise a solution, wherein the solution comprises at least one detergent.
  • exemplary detergents include Tween, Triton-X, and IGEPAL.
  • a solution may comprise Tween, Triton-X, or any combination thereof.
  • a solution may comprise Triton-X.
  • a solution may comprise IGEPAL CA-630.
  • the concentration of the detergent in the solution is 2% (v/v) or less.
  • the concentration of the detergent in the solution is 1% (v/v) or less.
  • the concentration of the detergent in the solution is 0.00001% (v/v) to 0.01% (v/v).
  • the concentration of the detergent in the solution is about 0.01% (v/v).
  • systems comprise a solution, wherein the solution comprises at least one reducing agent.
  • exemplary reducing agents comprise dithiothreitol (DTT), β-mercaptoethanol (BME), or tris(2-carboxyethyl)phosphine (TCEP).
  • the reducing agent is DTT.
  • the concentration of the reducing agent in the solution is 0.01 mM to 100 mM. In some embodiments, the concentration of the reducing agent in the solution is 0.1 mM to 10 mM. In some embodiments, the concentration of the reducing agent in the solution is 0.5 mM to 2 mM.
  • the concentration of the reducing agent in the solution is 0.01 mM to 100 mM. In some embodiments, the concentration of the reducing agent in the solution is 0.1 mM to 10 mM. In some embodiments, the concentration of the reducing agent in the solution is about 1 mM.
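The concentration ranges recited above for several solution components can be collected into a simple range check. This is an illustrative sketch only: the dictionary keys and the function name `out_of_range` are hypothetical, and only the broadest range recited for each component is used (buffering agent 1 mM to 200 mM, crowding agent 0.01% to 10% v/v, detergent 2% v/v or less, reducing agent 0.01 mM to 100 mM). Salts are omitted because several overlapping ranges and point values are recited for them.

```python
# Broadest ranges recited in the text for each component (inclusive bounds).
RECITED_RANGES = {
    "buffering_agent_mM": (1.0, 200.0),
    "crowding_agent_pct_vv": (0.01, 10.0),
    "detergent_pct_vv": (0.0, 2.0),      # "2% (v/v) or less"
    "reducing_agent_mM": (0.01, 100.0),
}


def out_of_range(solution: dict) -> list:
    """Return the names of components whose value falls outside the recited range."""
    problems = []
    for name, value in solution.items():
        if name not in RECITED_RANGES:
            raise KeyError(f"no recited range modeled for {name!r}")
        low, high = RECITED_RANGES[name]
        if not (low <= value <= high):
            problems.append(name)
    return sorted(problems)
```

A solution with 20 mM buffering agent, 5% (v/v) glycerol, 0.01% (v/v) detergent, and 1 mM DTT would pass every check, matching the "about" values given in the narrower embodiments.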
  • systems comprise a solution, wherein the solution comprises a competitor.
  • competitors compete with the target nucleic acid or the reporter nucleic acid for cleavage by the effector protein or a dimer thereof.
  • exemplary competitors include heparin, imidazole, and salmon sperm DNA.
  • the concentration of the competitor in the solution is 1 µg/mL to 100 µg/mL. In some embodiments, the concentration of the competitor in the solution is 40 µg/mL to 60 µg/mL.
  • systems comprise a solution, wherein the solution comprises a co-factor.
  • the co-factor allows an effector protein or a multimeric complex thereof to perform a function, including pre-crRNA processing and/or target nucleic acid cleavage.
  • the suitability of a cofactor for an effector protein or a multimeric complex thereof may be assessed, such as by methods based on those described by Sundaresan et al. (Cell Rep. 2017 Dec 26; 21(13): 3728-3739).
  • an effector or a multimeric complex thereof forms a complex with a co-factor.
  • the co-factor is a divalent metal ion.
  • the divalent metal ion is selected from Mg²⁺, Mn²⁺, Zn²⁺, Ca²⁺, and Cu²⁺.
  • the divalent metal ion is Mg²⁺.
  • the co-factor is Mg²⁺.
  • systems disclosed herein comprise a reporter.
  • a reporter may comprise a single stranded nucleic acid and a detection moiety (e.g., a labeled single stranded RNA reporter), wherein the nucleic acid is capable of being cleaved by an effector protein (e.g., a CRISPR/Cas protein as disclosed herein) or a multimeric complex thereof, releasing the detection moiety, and, generating a detectable signal.
  • reporter is used interchangeably with “reporter nucleic acid” or “reporter molecule”.
  • the effector proteins disclosed herein, activated upon hybridization of a guide nucleic acid to a target nucleic acid, may cleave the reporter.
  • Cleaving the “reporter” may be referred to herein as cleaving the “reporter nucleic acid,” the “reporter molecule,” or the “nucleic acid of the reporter.”
  • Reporters may comprise RNA.
  • Reporters may comprise DNA.
  • Reporters may be double-stranded. Reporters may be single-stranded.
  • reporters comprise a protein capable of generating a signal.
  • a signal may be a calorimetric, potentiometric, amperometric, optical (e.g., fluorescent, colorimetric, etc.), or piezo-electric signal.
  • the reporter comprises a detection moiety. Suitable detectable labels and/or moieties that may provide a signal include, but are not limited to, an enzyme, a radioisotope, a member of a specific binding pair; a fluorophore; a fluorescent protein; a quantum dot; and the like.
  • the reporter comprises a detection moiety and a quenching moiety.
  • the reporter comprises a cleavage site, wherein the detection moiety is located at a first site on the reporter and the quenching moiety is located at a second site on the reporter, wherein the first site and the second site are separated by the cleavage site.
  • the quenching moiety is a fluorescence quenching moiety.
  • the quenching moiety is 5’ to the cleavage site and the detection moiety is 3’ to the cleavage site.
  • the detection moiety is 5’ to the cleavage site and the quenching moiety is 3’ to the cleavage site.
  • the quenching moiety is at the 5’ terminus of the nucleic acid of a reporter.
  • the detection moiety is at the 3’ terminus of the nucleic acid of a reporter. In some embodiments, the detection moiety is at the 5’ terminus of the nucleic acid of a reporter. In some embodiments, the quenching moiety is at the 3 ’ terminus of the nucleic acid of a reporter.
  • Suitable fluorescent proteins include, but are not limited to, green fluorescent protein (GFP) or variants thereof, blue fluorescent variant of GFP (BFP), cyan fluorescent variant of GFP (CFP), yellow fluorescent variant of GFP (YFP), enhanced GFP (EGFP), enhanced CFP (ECFP), enhanced YFP (EYFP), GFPS65T, Emerald, Topaz (TYFP), Venus, Citrine, mCitrine, GFPuv, destabilised EGFP (dEGFP), destabilised ECFP (dECFP), destabilised EYFP (dEYFP), mCFPm, Cerulean, T-Sapphire, CyPet, YPet, mKO, HcRed, t-HcRed, DsRed, DsRed2, DsRed-monomer, J-Red, dimer2, t-dimer2(12), mRFP1, pocilloporin, Renilla GFP, Monster GFP, paGFP
  • Suitable enzymes include, but are not limited to, horseradish peroxidase (HRP), alkaline phosphatase (AP), beta-galactosidase (GAL), glucose-6-phosphate dehydrogenase, beta-N-acetylglucosaminidase, β-glucuronidase, invertase, xanthine oxidase, firefly luciferase, and glucose oxidase (GO).
  • the detection moiety comprises an invertase.
  • the substrate of the invertase may be sucrose.
  • a DNS reagent may be included in the system to produce a colorimetric change when the invertase converts sucrose to glucose.
  • the reporter nucleic acid and invertase are conjugated using a heterobifunctional linker via sulfo-SMCC chemistry.
  • Suitable fluorophores may provide a detectable fluorescence signal in the same range as 6-Fluorescein (Integrated DNA Technologies), IRDye 700 (Integrated DNA Technologies), TYE 665 (Integrated DNA Technologies), Alexa Fluor 594 (Integrated DNA Technologies), or ATTO™ 633 (NHS Ester) (Integrated DNA Technologies).
  • fluorophores are fluorescein amidite, 6-Fluorescein, IRDye 700, TYE 665, Alexa Fluor 594, or ATTO™ 633 (NHS Ester).
  • the fluorophore may be an infrared fluorophore.
  • the fluorophore may emit fluorescence in the range of 500 nm to 720 nm.
  • the fluorophore emits fluorescence at a wavelength of 700 nm or higher. In other embodiments, the fluorophore emits fluorescence at about 665 nm. In some embodiments, the fluorophore emits fluorescence in the range of 500 nm to 520 nm, 500 nm to 540 nm, 500 nm to 590 nm, 590 nm to 600 nm, 600 nm to 610 nm, 610 nm to 620 nm, 620 nm to 630 nm, 630 nm to 640 nm, 640 nm to 650 nm, 650 nm to 660 nm, 660 nm to 670 nm, 670 nm to 680 nm, 680 nm to 690 nm, 690 nm to 700 nm, 700 nm to 710 nm, 710 nm to 720 nm, 690
  • Systems may comprise a quenching moiety.
  • a quenching moiety may be chosen based on its ability to quench the detection moiety.
  • a quenching moiety may be a non-fluorescent fluorescence quencher.
  • a quenching moiety may quench a detection moiety that emits fluorescence in the range of 500 nm to 720 nm.
  • the quenching moiety quenches a detection moiety that emits fluorescence at a wavelength of 700 nm or higher.
  • the quenching moiety quenches a detection moiety that emits fluorescence at about 660 nm or about 670 nm. In some embodiments, the quenching moiety quenches a detection moiety that emits fluorescence in the range of 500 to 520, 500 to 540, 500 to 590, 590 to 600, 600 to 610, 610 to 620, 620 to 630, 630 to 640, 640 to 650, 650 to 660, 660 to 670, 670 to 680, 680 to 690, 690 to 700, 700 to 710, 710 to 720, or 720 to 730 nm.
  • the quenching moiety quenches a detection moiety that emits fluorescence in the range 450 nm to 750 nm, 500 nm to 650 nm, or 550 to 650 nm.
  • a quenching moiety may quench fluorescein amidite, 6-Fluorescein, IRDye 700, TYE 665, Alexa Fluor 594, or ATTO™ 633 (NHS Ester).
  • a quenching moiety may be Iowa Black RQ, Iowa Black FQ or IRDye QC-1 Quencher.
  • a quenching moiety may quench fluorescein amidite, 6-Fluorescein (Integrated DNA Technologies), IRDye 700 (Integrated DNA Technologies), TYE 665 (Integrated DNA Technologies), Alexa Fluor 594 (Integrated DNA Technologies), or ATTO™ 633 (NHS Ester) (Integrated DNA Technologies).
  • a quenching moiety may be Iowa Black RQ (Integrated DNA Technologies), Iowa Black FQ (Integrated DNA Technologies) or IRDye QC-1 Quencher (LiCor). Any of the quenching moieties described herein may be from any commercially available source, may be an alternative with a similar function, a generic, or a non-tradename of the quenching moieties listed.
  • the detection moiety comprises a fluorescent dye. Sometimes the detection moiety comprises a fluorescence resonance energy transfer (FRET) pair. In some embodiments, the detection moiety comprises an infrared (IR) dye. In some embodiments, the detection moiety comprises an ultraviolet (UV) dye. Alternatively, or in combination, the detection moiety comprises a protein. Sometimes the detection moiety comprises a biotin. Sometimes the detection moiety comprises at least one of avidin or streptavidin. In some embodiments, the detection moiety comprises a polysaccharide, a polymer, or a nanoparticle. In some embodiments, the detection moiety comprises a gold nanoparticle or a latex nanoparticle.
  • a detection moiety may be any moiety capable of generating a calorimetric, potentiometric, amperometric, optical (e.g., fluorescent, colorimetric, etc.), or piezo-electric signal.
  • a nucleic acid of a reporter is sometimes a protein-nucleic acid that is capable of generating a calorimetric, potentiometric, amperometric, optical (e.g., fluorescent, colorimetric, etc.), or piezo-electric signal upon cleavage of the nucleic acid.
  • a calorimetric signal is heat produced after cleavage of the nucleic acids of a reporter.
  • a calorimetric signal is heat absorbed after cleavage of the nucleic acids of a reporter.
  • a potentiometric signal is, for example, electrical potential produced after cleavage of the nucleic acids of a reporter.
  • An amperometric signal may be movement of electrons produced after the cleavage of nucleic acid of a reporter.
  • the signal is an optical signal, such as a colorimetric signal or a fluorescence signal.
  • An optical signal is, for example, a light output produced after the cleavage of the nucleic acids of a reporter.
  • an optical signal is a change in light absorbance between before and after the cleavage of nucleic acids of a reporter.
  • a piezo-electric signal is a change in mass between before and after the cleavage of the nucleic acid of a reporter.
  • the detectable signal may be a colorimetric signal or a signal visible by eye.
  • the detectable signal may be fluorescent, electrical, chemical, electrochemical, or magnetic.
  • the first detection signal may be generated by binding of the detection moiety to the capture molecule in the detection region, where the first detection signal indicates that the sample contained the target nucleic acid.
  • systems are capable of detecting more than one type of target nucleic acid, wherein the system comprises more than one type of guide nucleic acid and more than one type of reporter nucleic acid.
  • the detectable signal may be generated directly by the cleavage event. Alternatively, or in combination, the detectable signal may be generated indirectly by the signal event. Sometimes the detectable signal is not a fluorescent signal.
  • the detectable signal may be a colorimetric or color-based signal.
  • the detected target nucleic acid may be identified based on its spatial location on the detection region of the support medium.
  • the second detectable signal may be generated in a spatially distinct location than the first generated signal.
  • the reporter nucleic acid is a single-stranded nucleic acid sequence comprising ribonucleotides.
  • the nucleic acid of a reporter may be a single-stranded nucleic acid sequence comprising at least one ribonucleotide.
  • the nucleic acid of a reporter is a single-stranded nucleic acid comprising at least one ribonucleotide residue at an internal position that functions as a cleavage site.
  • the nucleic acid of a reporter comprises at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 ribonucleotide residues at an internal position.
  • the nucleic acid of a reporter comprises from 2 to 10, from 3 to 9, from 4 to 8, or from 5 to 7 ribonucleotide residues at an internal position. Sometimes the ribonucleotide residues are continuous. Alternatively, the ribonucleotide residues are interspersed in between non-ribonucleotide residues. In some embodiments, the nucleic acid of a reporter has only ribonucleotide residues. In some embodiments, the nucleic acid of a reporter has only deoxyribonucleotide residues. In some embodiments, the nucleic acid comprises nucleotides resistant to cleavage by the effector protein described herein.
  • the nucleic acid of a reporter comprises synthetic nucleotides. In some embodiments, the nucleic acid of a reporter comprises at least one ribonucleotide residue and at least one non-ribonucleotide residue.
  • the nucleic acid of a reporter comprises at least one uracil ribonucleotide. In some embodiments, the nucleic acid of a reporter comprises at least two uracil ribonucleotides. Sometimes the nucleic acid of a reporter has only uracil ribonucleotides. In some embodiments, the nucleic acid of a reporter comprises at least one adenine ribonucleotide. In some embodiments, the nucleic acid of a reporter comprises at least two adenine ribonucleotides. In some embodiments, the nucleic acid of a reporter has only adenine ribonucleotides.
  • the nucleic acid of a reporter comprises at least one cytosine ribonucleotide. In some embodiments, the nucleic acid of a reporter comprises at least two cytosine ribonucleotides. In some embodiments, the nucleic acid of a reporter comprises at least one guanine ribonucleotide. In some embodiments, the nucleic acid of a reporter comprises at least two guanine ribonucleotides. In some embodiments, a nucleic acid of a reporter comprises a single unmodified ribonucleotide. In some embodiments, a nucleic acid of a reporter comprises only unmodified deoxyribonucleotides.
  • the nucleic acid of a reporter is 5 to 20, 5 to 15, 5 to 10, 7 to 20, 7 to 15, or 7 to 10 nucleotides in length. In some embodiments, the nucleic acid of a reporter is 3 to 20, 4 to 10, 5 to 10, or 5 to 8 nucleotides in length. In some embodiments, the nucleic acid of a reporter is 5 to 12 nucleotides in length.
  • the reporter nucleic acid is at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, or at least 30 nucleotides in length.
  • the reporter nucleic acid is 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides in length.
  • systems comprise a plurality of reporters.
  • the plurality of reporters may comprise a plurality of signals.
  • systems comprise at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 20, at least 30, at least 40, or at least 50 reporters.
  • systems comprise an effector protein and a reporter nucleic acid configured to undergo trans cleavage by the effector protein.
  • Trans cleavage of the reporter may generate a signal from the reporter or alter a signal from the reporter.
  • the signal is an optical signal, such as a fluorescence signal or absorbance band.
  • Trans cleavage of the reporter may alter the wavelength, intensity, or polarization of the optical signal.
  • the reporter may comprise a fluorophore and a quencher, such that trans cleavage of the reporter separates the fluorophore and the quencher thereby increasing a fluorescence signal from the fluorophore.
  • detection of reporter cleavage to determine the presence of a target nucleic acid sequence may be referred to as ‘DETECTR’.
  • a method of assaying for a target nucleic acid in a sample comprising contacting the target nucleic acid with an effector protein, a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid, and a reporter nucleic acid, and assaying for a change in a signal, wherein the change in the signal is produced by cleavage of the reporter nucleic acid.
  • an activity of an effector protein may be inhibited. This is because the activated effector proteins collaterally cleave any nucleic acids. If total nucleic acids are present in large amounts, they may outcompete reporters for the effector proteins.
  • systems comprise an excess of reporter(s), such that when the system is operated and a solution of the system comprising the reporter is combined with a sample comprising a target nucleic acid, the concentration of the reporter in the combined solution-sample is greater than the concentration of the target nucleic acid.
  • the sample comprises amplified target nucleic acid.
  • the sample comprises an unamplified target nucleic acid.
  • the concentration of the reporter is greater than the concentration of target nucleic acids and non-target nucleic acids.
  • the non-target nucleic acids may be from the original sample, either lysed or unlysed.
  • the non-target nucleic acids may comprise byproducts of amplification.
  • systems comprise a reporter wherein the concentration of the reporter in a solution is in at least 1.5 fold, at least 2 fold, at least 3 fold, at least 4 fold, at least 5 fold, at least 6 fold, at least 7 fold, at least 8 fold, at least 9 fold, at least 10 fold, at least 11 fold, at least 12 fold, at least 13 fold, at least 14 fold, at least 15 fold, at least 16 fold, at least 17 fold, at least 18 fold, at least 19 fold, at least 20 fold, at least 30 fold, at least 40 fold, at least 50 fold, at least 60 fold, at least 70 fold, at least 80 fold, at least 90 fold, or at least 100 fold excess of total nucleic acids.
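The fold-excess condition above can be sketched as a simple ratio check. This is an illustrative sketch only: the function names are hypothetical, and it assumes both concentrations are expressed in the same molar unit (nM here).

```python
def reporter_fold_excess(reporter_nm: float, total_nucleic_acid_nm: float) -> float:
    """Return the reporter concentration divided by the total nucleic acid concentration.

    Both arguments are assumed to be in nM (any shared unit works).
    """
    if reporter_nm < 0 or total_nucleic_acid_nm <= 0:
        raise ValueError("reporter must be >= 0 and total nucleic acid must be > 0")
    return reporter_nm / total_nucleic_acid_nm


def meets_excess(reporter_nm: float, total_nucleic_acid_nm: float,
                 min_fold: float = 1.5) -> bool:
    """True when the reporter is present in at least `min_fold` excess."""
    return reporter_fold_excess(reporter_nm, total_nucleic_acid_nm) >= min_fold
```

For example, 500 nM reporter against 50 nM total nucleic acids is a 10 fold excess, whereas 60 nM against 50 nM falls short of the 1.5 fold minimum.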
  • systems described herein comprise a reagent or component for amplifying a nucleic acid.
  • reagents for amplifying a nucleic acid include polymerases, primers, and nucleotides.
  • systems comprise reagents for nucleic acid amplification of a target nucleic acid in a sample. Nucleic acid amplification of the target nucleic acid may improve at least one of sensitivity, specificity, or accuracy of the assay in detecting the target nucleic acid.
  • nucleic acid amplification is isothermal nucleic acid amplification, providing for the use of the system in remote regions or low resource settings without specialized equipment for amplification.
  • amplification of the target nucleic acid increases the concentration of the target nucleic acid in the sample relative to the concentration of nucleic acids that do not correspond to the target nucleic acid.
  • the reagents for nucleic acid amplification may comprise a recombinase, an oligonucleotide primer, a single-stranded DNA binding (SSB) protein, a polymerase, or a combination thereof that is suitable for an amplification reaction.
  • Non-limiting examples of amplification reactions are transcription mediated amplification (TMA), helicase dependent amplification (HDA), or circular helicase dependent amplification (cHDA), strand displacement amplification (SDA), recombinase polymerase amplification (RPA), loop mediated amplification (LAMP), exponential amplification reaction (EXPAR), rolling circle amplification (RCA), ligase chain reaction (LCR), simple method amplifying RNA targets (SMART), single primer isothermal amplification (SPIA), multiple displacement amplification (MDA), nucleic acid sequence based amplification (NASBA), hinge-initiated primer-dependent amplification of nucleic acids (HIP), nicking enzyme amplification reaction (NEAR), and improved multiple displacement amplification (IMDA).
  • systems comprise a PCR tube, a PCR well or a PCR plate.
  • the wells of the PCR plate may be pre-aliquoted with the reagent for amplifying a nucleic acid, as well as a guide nucleic acid, an effector protein, a multimeric complex, or any combination thereof.
  • the wells of the PCR plate may be pre-aliquoted with a guide nucleic acid targeting a target sequence, an effector protein capable of being activated when complexed with the guide nucleic acid and the target sequence, and at least one population of a single stranded reporter nucleic acid comprising a detection moiety.
  • a user may thus add the biological sample of interest to a well of the pre-aliquoted PCR plate and measure for the detectable signal with a fluorescent light reader or a visible light reader.
  • systems comprise a PCR plate; a guide nucleic acid targeting a target sequence; an effector protein capable of being activated when complexed with the guide nucleic acid and the target sequence; and a single stranded reporter nucleic acid comprising a detection moiety, wherein the reporter nucleic acid is capable of being cleaved by the activated nuclease, thereby generating a detectable signal.
  • systems comprise a support medium; a guide nucleic acid targeting a target sequence; and an effector protein capable of being activated when complexed with the guide nucleic acid and the target sequence.
  • nucleic acid amplification is performed in a nucleic acid amplification region on the support medium.
  • the nucleic acid amplification is performed in a reagent chamber, and the resulting sample is applied to the support medium.
  • a system for modifying a target nucleic acid comprises a PCR plate; a guide nucleic acid targeting a target sequence; and an effector protein capable of being activated when complexed with the guide nucleic acid and the target sequence.
  • the wells of the PCR plate may be pre-aliquoted with the guide nucleic acid targeting a target sequence, and an effector protein capable of being activated when complexed with the guide nucleic acid and the target sequence. A user may thus add the biological sample of interest to a well of the pre-aliquoted PCR plate.
  • the nucleic acid amplification is performed for no greater than 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 40, 50, or 60 minutes, or any value 1 to 60 minutes. Sometimes, the nucleic acid amplification is performed for 1 to 60, 5 to 55, 10 to 50, 15 to 45, 20 to 40, or 25 to 35 minutes. Sometimes, the nucleic acid amplification reaction is performed at a temperature of around 20-45 °C.
  • the nucleic acid amplification reaction is performed at a temperature no greater than 20 °C, 25 °C, 30 °C, 35 °C, 37 °C, 40 °C, 45 °C, or any value 20 °C to 45 °C. In some embodiments, the nucleic acid amplification reaction is performed at a temperature of at least 20 °C, 25 °C, 30 °C, 35 °C, 37 °C, 40 °C, or 45 °C, or any value 20 °C to 45 °C. In some embodiments, the nucleic acid amplification reaction is performed at a temperature of 20 °C to 45 °C, 25 °C to 40 °C, 30 °C to 40 °C, or 35 °C to 40 °C.
  • systems comprise primers for amplifying a target nucleic acid to produce an amplification product comprising the target nucleic acid and a PAM.
  • at least one of the primers may comprise the PAM that is incorporated into the amplification product during amplification.
  • the compositions for amplification of target nucleic acids and methods of use thereof, as described herein, are compatible with any of the methods disclosed herein including methods of assaying for at least one base difference (e.g., assaying for a SNP or a base mutation) in a target nucleic acid sequence, methods of assaying for a target nucleic acid that lacks a PAM by amplifying the target nucleic acid sequence to introduce a PAM, and compositions used in introducing a PAM via amplification into the target nucleic acid sequence.
  • systems include a package, carrier, or container that is compartmentalized to receive one or more containers such as vials, tubes, and the like, each of the container(s) comprising one of the separate elements to be used in a method described herein.
  • Suitable containers include, for example, test wells, bottles, vials, and test tubes.
  • the containers are formed from a variety of materials such as glass, plastic, or polymers.
  • the systems described herein contain packaging materials. Examples of packaging materials include, but are not limited to, pouches, blister packs, bottles, tubes, bags, containers, and any packaging material suitable for the intended mode of use.
  • a system may include labels listing contents and/or instructions for use, or package inserts with instructions for use.
  • a set of instructions will also typically be included.
  • a label is on or associated with the container.
  • a label is on a container when letters, numbers or other characters forming the label are attached, molded or etched into the container itself; a label is associated with a container when it is present within a receptacle or carrier that also holds the container, e.g., as a package insert.
  • a label is used to indicate that the contents are to be used for a specific therapeutic application. The label also indicates directions for use of the contents, such as in the methods described herein.
  • After packaging the formed product and wrapping or boxing to maintain a sterile barrier, the product may be terminally sterilized by heat sterilization, gas sterilization, gamma irradiation, or by electron beam sterilization. Alternatively, the product may be prepared and packaged by aseptic processing.
  • systems comprise a solid support.
  • An RNP or effector protein may be attached to a solid support.
  • the solid support may be an electrode or a bead.
  • the bead may be a magnetic bead.
  • the RNP is liberated from the solid support and interacts with other mixtures.
  • the effector protein of the RNP flows through a chamber into a mixture comprising a substrate.
  • a reaction occurs, such as a colorimetric reaction, which is then detected.
  • the protein is an enzyme substrate, and upon cleavage of the nucleic acid of the enzyme substrate-nucleic acid, the enzyme substrate flows through a chamber into a mixture comprising the enzyme. When the enzyme substrate meets the enzyme, a reaction occurs, such as a calorimetric reaction, which is then detected.
  • systems and methods are employed under certain conditions that enhance an activity of the effector protein relative to alternative conditions, as measured by a detectable signal released from cleavage of a reporter in the presence of the target nucleic acid.
  • the detectable signal may be generated at about the rate of trans cleavage of a reporter nucleic acid.
  • the reporter nucleic acid is a homopolymeric reporter nucleic acid comprising 5 to 20 consecutive adenines, 5 to 20 consecutive thymines, 5 to 20 consecutive cytosines, or 5 to 20 consecutive guanines.
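As an illustration of the homopolymeric reporter described above, the following sketch finds the longest run of one base in a candidate sequence and checks that it falls within 5 to 20 nucleotides. The function names are hypothetical, and reading "comprising 5 to 20 consecutive" as a bound on the longest run is an assumption made for this example.

```python
from itertools import groupby
from typing import Tuple


def longest_homopolymer_run(seq: str) -> Tuple[str, int]:
    """Return (base, length) for the longest run of a single repeated base."""
    best_base, best_len = "", 0
    for base, group in groupby(seq.upper()):
        run_len = sum(1 for _ in group)
        if run_len > best_len:
            best_base, best_len = base, run_len
    return best_base, best_len


def is_homopolymeric_reporter(seq: str, min_run: int = 5, max_run: int = 20) -> bool:
    """True when the longest A, T, C, or G run is between min_run and max_run long."""
    base, run_len = longest_homopolymer_run(seq)
    return base in {"A", "T", "C", "G"} and min_run <= run_len <= max_run
```

Under these assumptions, a sequence of eight consecutive thymines qualifies, while a mixed sequence such as ACGTACGT does not.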
  • the reporter is an RNA-FQ reporter.
  • effector proteins disclosed herein recognize, bind, or are activated by, different target nucleic acids having different sequences, but are active toward the same reporter nucleic acid, allowing for facile multiplexing in a single assay having a single ssRNA-FQ reporter.
  • systems are employed under certain conditions that enhance trans cleavage activity of an effector protein.
  • trans cleavage occurs at a rate of at least 0.005 mmol/min, at least 0.01 mmol/min, at least 0.05 mmol/min, at least 0.1 mmol/min, at least 0.2 mmol/min, at least 0.5 mmol/min, or at least 1 mmol/min.
  • systems and methods are employed under certain conditions that enhance cis-cleavage activity of the effector protein.
  • Certain conditions that may enhance the activity of an effector protein include a certain salt presence or salt concentration of the solution in which the activity occurs.
  • cis-cleavage activity of an effector protein may be inhibited or halted by a high salt concentration.
  • the salt may be a sodium salt, a potassium salt, or a magnesium salt.
  • the salt is NaCl.
  • the salt is KNO3.
  • the salt concentration is less than 150 mM, less than 125 mM, less than 100 mM, less than 75 mM, less than 50 mM, or less than 25 mM.
  • Certain conditions that may enhance the activity of an effector protein include the pH of a solution in which the activity occurs.
  • increasing pH may enhance trans cleavage activity.
  • the rate of trans cleavage activity may increase with increase in pH up to pH 9.
  • the pH is about 7, about 7.1, about 7.2, about 7.3, about 7.4, about 7.5, about 7.6, about
  • the pH is 7 to 7.5, 7.5 to 8, 8 to 8.5, 8.5 to 9, or 7 to 8.5. In some embodiments, the pH is less than 7. In some embodiments, the pH is greater than 7.
  • Certain conditions that may enhance the activity of an effector protein include the temperature at which the activity is performed.
  • the temperature is about 25 °C to about 50 °C.
  • the temperature is about 20 °C to about 40 °C, about 30 °C to about 50 °C, or about 40 °C to about 60 °C.
  • the temperature is about 25 °C, about 30 °C, about 35 °C, about 40 °C, about 45 °C, or about 50 °C.
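The salt, pH, and temperature conditions listed above can be collected into a simple range check. This is a sketch under stated assumptions: the class and function names are hypothetical, and the default limits (salt below 150 mM, pH 7 to 9, 25 °C to 50 °C) are just one combination of the ranges recited above, not a recommended protocol.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ReactionConditions:
    salt_mm: float        # salt concentration in mM
    ph: float             # pH of the reaction solution
    temperature_c: float  # reaction temperature in degrees Celsius


def out_of_range(cond: ReactionConditions,
                 max_salt_mm: float = 150.0,
                 ph_range: Tuple[float, float] = (7.0, 9.0),
                 temp_range_c: Tuple[float, float] = (25.0, 50.0)) -> List[str]:
    """Return the names of the conditions that fall outside the assumed limits."""
    problems = []
    if not cond.salt_mm < max_salt_mm:
        problems.append("salt")
    if not ph_range[0] <= cond.ph <= ph_range[1]:
        problems.append("pH")
    if not temp_range_c[0] <= cond.temperature_c <= temp_range_c[1]:
        problems.append("temperature")
    return problems
```

An empty list means every condition is inside the chosen limits; passing different limits selects another of the recited ranges.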
  • Methods may comprise detecting target nucleic acids with compositions or systems described herein.
  • Methods may comprise detecting a target nucleic acid in a sample, e.g. , a cell lysate, a biological fluid, or environmental sample.
  • Methods may comprise detecting a target nucleic acid in a cell.
  • methods of detecting a target nucleic acid in a sample or cell comprise contacting the sample or cell with an effector protein or a multimeric complex thereof, a guide nucleic acid, wherein at least a portion of the guide nucleic acid is complementary to at least a portion of the target nucleic acid, and a reporter nucleic acid that is cleaved in the presence of the effector protein, the guide nucleic acid, and the target nucleic acid, and detecting a signal produced by cleavage of the reporter nucleic acid, thereby detecting the target nucleic acid in the sample.
  • methods result in trans cleavage of the reporter nucleic acid.
  • methods result in cis cleavage of the reporter nucleic acid.
  • methods of detecting comprise contacting a target nucleic acid, a cell comprising the target nucleic acid, or a sample comprising a target nucleic acid with an effector protein that comprises an amino acid sequence that is at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 97%, at least 98%, at least 99%, or 100% identical to any one of SEQ ID NOs: 1-23, 31-104, and 121-126.
  • the amino acid sequence of the effector protein is at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 97%, at least 98%, at least 99%, or 100% identical to any one of SEQ ID NOs: 1-23, 31-104, and 121-126.
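Percent identity thresholds such as those above are evaluated on aligned sequences. The sketch below assumes the two sequences have already been aligned to equal length with '-' marking gaps, and counts identical non-gap positions over the alignment length. The function name is hypothetical, and this is only one of several identity conventions in use; it is not stated to be the one applied in this disclosure.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue in both sequences.

    Assumes pre-aligned, equal-length inputs; gap characters ('-') never count as matches.
    """
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be non-empty and aligned to equal length")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)
```

With this convention, two ten-residue sequences differing at one position are 90% identical, which would satisfy the "at least 85%" and "at least 90%" thresholds but not "at least 92%".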
  • Methods may comprise contacting the sample to a complex comprising a guide nucleic acid comprising a segment that is reverse complementary to a segment of the target nucleic acid and an effector protein that exhibits sequence independent cleavage upon forming a complex comprising the segment of the guide nucleic acid binding to the segment of the target nucleic acid; and assaying for a signal indicating cleavage of at least some protein-nucleic acids of a population of protein-nucleic acids, wherein the signal indicates a presence of the target nucleic acid in the sample and wherein absence of the signal indicates an absence of the target nucleic acid in the sample.
  • Methods may comprise contacting the sample comprising the target nucleic acid with a guide nucleic acid targeting a target nucleic acid segment, an effector protein capable of being activated when complexed with the guide nucleic acid and the target nucleic acid segment, a single stranded nucleic acid of a reporter comprising a detection moiety, wherein the nucleic acid of a reporter is capable of being cleaved by the activated effector protein, thereby generating a first detectable signal, cleaving the single stranded nucleic acid of a reporter using the effector protein that cleaves as measured by a change in color, and measuring the first detectable signal on the support medium.
  • Methods may comprise contacting the sample or cell with an effector protein or a multimeric complex thereof and a guide nucleic acid at a temperature of at least about 25 °C, at least about 30 °C, at least about 35 °C, at least about 40 °C, at least about 50 °C, or at least about 65 °C.
  • the temperature is not greater than 80 °C.
  • the temperature is about 25 °C, about 30 °C, about 35 °C, about 40 °C, about 45 °C, about 50 °C, about 55 °C, about 60 °C, about 65 °C, or about 70 °C.
  • the temperature is about 25 °C to about 45 °C, about 35 °C to about 55 °C, or about 55 °C to about 65 °C.
  • there is a threshold of detection for methods of detecting target nucleic acids.
  • methods are not capable of detecting target nucleic acids that are present in a sample or solution at a concentration less than or equal to 10 nM.
  • the term “threshold of detection” is used herein to describe the minimal amount of target nucleic acid that must be present in a sample in order for detection to occur. For example, when a threshold of detection is 10 nM, then a signal can be detected when a target nucleic acid is present in the sample at a concentration of 10 nM or more.
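The definition of a threshold of detection can be sketched as a unit-aware comparison. The names below are hypothetical, and exact decimal arithmetic is used so that equal concentrations written in different units (for example, 10,000 pM and 10 nM) compare as equal.

```python
from decimal import Decimal

# Power-of-ten exponent of each molar unit relative to 1 M.
UNIT_EXPONENT = {"M": 0, "mM": -3, "uM": -6, "nM": -9, "pM": -12, "fM": -15, "aM": -18}


def to_molar(value: float, unit: str) -> Decimal:
    """Convert a concentration to molar as an exact Decimal."""
    return Decimal(str(value)).scaleb(UNIT_EXPONENT[unit])


def is_detectable(target: float, target_unit: str,
                  threshold: float, threshold_unit: str) -> bool:
    """True when the target concentration is at or above the threshold of detection."""
    return to_molar(target, target_unit) >= to_molar(threshold, threshold_unit)
```

With a 10 nM threshold, a target at 10 nM or 50 nM is reported as detectable, while a target at 500 pM is not.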
  • the threshold of detection is less than or equal to 5 nM, 1 nM, 0.5 nM, 0.1 nM, 0.05 nM, 0.01 nM, 0.005 nM, 0.001 nM, 0.0005 nM, 0.0001 nM, 0.00005 nM, 0.00001 nM, 10 pM, 1 pM, 500 fM, 250 fM, 100 fM, 50 fM, 10 fM, 5 fM, 1 fM, 500 attomolar (aM), 100 aM, 50 aM, 10 aM, or 1 aM.
  • the threshold of detection is in a range of from 1 aM to 1 nM, 1 aM to 500 pM, 1 aM to 200 pM, 1 aM to 100 pM, 1 aM to 10 pM, 1 aM to 1 pM, 1 aM to 500 fM, 1 aM to 100 fM, 1 aM to 1 fM, 1 aM to 500 aM, 1 aM to 100 aM, 1 aM to 50 aM, 1 aM to 10 aM, 10 aM to 1 nM, 10 aM to 500 pM, 10 aM to 200 pM, 10 aM to 100 pM, 10 aM to 10 pM, 10 aM to 1 pM, 10 aM to 500 fM, 10 aM to 100 fM, 10 aM to 1 fM, 10 aM to 100 aM to 100 aM, 10 aM to 50 a
  • the threshold of detection is in a range of from 800 fM to 100 pM, 1 pM to 10 pM, 10 fM to 500 fM, 10 fM to 50 fM, 50 fM to 100 fM, 100 fM to 250 fM, or 250 fM to 500 fM. In some embodiments, the threshold of detection is in a range of from
  • the target nucleic acid is present in a cleavage reaction at a concentration of about 10 nM, about 20 nM, about 30 nM, about 40 nM, about 50 nM, about 60 nM, about 70 nM, about 80 nM, about 90 nM, about 100 nM, about 200 nM, about 300 nM, about 400 nM, about 500 nM, about 600 nM, about 700 nM, about 800 nM, about 900 nM, about 1 µM, about 10 µM, or about 100 µM.
  • the target nucleic acid is present in the cleavage reaction at a concentration of from 10 nM to 20 nM, from 20 nM to 30 nM, from 30 nM to 40 nM, from 40 nM to 50 nM, from 50 nM to 60 nM, from 60 nM to 70 nM, from 70 nM to 80 nM, from 80 nM to 90 nM, from 90 nM to 100 nM, from 100 nM to 200 nM, from 200 nM to 300 nM, from 300 nM to 400 nM, from 400 nM to 500 nM, from 500 nM to 600 nM, from 600 nM to 700 nM, from 700 nM to 800 nM, from 800 nM to 900 nM, from 900 nM to 1 µM, from 1 µM to 10 µM, from 10 µM to 100 µM, from 10 nM to 100 µM, from
  • methods detect a target nucleic acid in less than 60 minutes. In some embodiments, methods detect a target nucleic acid in less than about 120 minutes, less than about 110 minutes, less than about 100 minutes, less than about 90 minutes, less than about 80 minutes, less than about 70 minutes, less than about 60 minutes, less than about 55 minutes, less than about 50 minutes, less than about 45 minutes, less than about 40 minutes, less than about 35 minutes, less than about 30 minutes, less than about 25 minutes, less than about 20 minutes, less than about 15 minutes, less than about 10 minutes, less than about 5 minutes, less than about 4 minutes, less than about 3 minutes, less than about 2 minutes, or less than about 1 minute.
  • Methods may comprise detecting a detectable signal within 5 minutes of contacting the sample and/or the target nucleic acid with the guide nucleic acid and/or the effector protein. In some embodiments, detecting occurs within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 70, 80, 90, 100, 110, or 120 minutes of contacting the target nucleic acid. In some embodiments, detecting occurs within 1 to 120, 5 to 100, 10 to 90, 15 to 80, 20 to 60, or 30 to 45 minutes of contacting the target nucleic acid.
  • Methods may comprise amplifying a target nucleic acid for detection using any of the compositions or systems described herein.
  • Amplifying may comprise changing the temperature of the amplification reaction, also known as thermal amplification (e.g., PCR).
  • Amplifying may be performed at essentially one temperature, also known as isothermal amplification.
  • Amplifying may improve at least one of sensitivity, specificity, or accuracy of the detection of the target nucleic acid.
  • Amplifying may comprise subjecting a target nucleic acid to an amplification reaction selected from transcription mediated amplification (TMA), helicase dependent amplification (HDA), circular helicase dependent amplification (cHDA), strand displacement amplification (SDA), recombinase polymerase amplification (RPA), loop mediated amplification (LAMP), exponential amplification reaction (EXPAR), rolling circle amplification (RCA), ligase chain reaction (LCR), simple method amplifying RNA targets (SMART), single primer isothermal amplification (SPIA), multiple displacement amplification (MDA), nucleic acid sequence based amplification (NASBA), hinge-initiated primer-dependent amplification of nucleic acids (HIP), nicking enzyme amplification reaction (NEAR), and improved multiple displacement amplification (IMDA).
  • amplification of the target nucleic acid comprises modifying the sequence of the target nucleic acid. For example, amplification may be used to insert a PAM sequence into a target nucleic acid that lacks a PAM sequence. In some embodiments, amplification may be used to increase the homogeneity of a target nucleic acid in a sample. For example, amplification may be used to remove a nucleic acid variation that is not of interest in the target nucleic acid sequence.
  • Amplifying may take 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 40, 50, or 60 minutes. Amplifying may be performed at a temperature of around 20-45 °C. Amplifying may be performed at a temperature of less than about 20 °C, less than about 25 °C, less than about 30 °C, less than about 35 °C, less than about 37 °C, less than about 40 °C, or less than about 45 °C.
  • the nucleic acid amplification reaction may be performed at a temperature of at least about 20 °C, at least about 25 °C, at least about 30 °C, at least about 35 °C, at least about 37 °C, at least about 40 °C, or at least about 45
  • PAM screening reactions used 10 µl of RNP in 100 µl reactions with 1,000 ng of a plasmid library containing a randomized PAM sequence 5’ of a target protospacer (5’-NNNNN-3’, where N is any of A, C, G, T) in 1x CutSmart buffer and were carried out for 60 minutes at 37 °C, and 15 minutes at 45 °C. Reactions were terminated with 1 µl of proteinase K and 5 µl of 500 mM EDTA for 30 minutes at 37 °C. Any target plasmid that was successfully cleaved had an adapter ligated to the cut end, enabling PCR amplification. Amplicons were detected by gel electrophoresis. Enriched PAMs are provided in
  • Effector proteins are tested for trans cleavage. Briefly, partially purified (e.g., nickel-NTA purified) effector proteins are incubated with crRNA and tracrRNA (or an sgRNA) in a trans cleavage buffer (e.g., 20 mM Tricine, 15 mM MgCl2, 0.2 mg/ml BSA, 1 mM TCEP (pH 9 at 37 °C)) at room temperature for about 10 to about 30 minutes, followed by addition of a target nucleic acid to produce effector-protein guide complexes. Trans cleavage activity is detected by fluorescence signal upon cleavage of a fluorophore-quencher reporter. Dilutions of the effector-protein guide complexes are performed, and the assay is repeated at various concentrations of the effector-protein guide complexes.
  • Effector proteins are tested for their ability to produce indels in a mammalian cell line (e.g., HEK293T cells). Briefly, a plasmid encoding the effector proteins and a guide RNA are delivered by lipofection to the mammalian cells. This is performed with a variety of guide RNAs targeting several loci adjacent to biochemically determined PAM sequences. Indels in the loci are detected by next generation sequencing of PCR amplicons at the targeted loci and indel percentage is calculated as the fraction of sequencing reads containing insertions or deletions relative to an unedited reference sequence. “No plasmid” and Cas9 are included as negative and positive controls, respectively.
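The indel percentage described above is a simple ratio of edited reads to all reads at a locus. The sketch below illustrates the arithmetic under one stated assumption: each read has already been aligned to the unedited reference and is summarized by a SAM-style CIGAR string, where `I` marks an insertion and `D` a deletion. The function name `indel_percentage` and the example reads are hypothetical; the source does not specify an analysis pipeline.

```python
def indel_percentage(cigars):
    """Percentage of aligned reads whose CIGAR string contains an insertion or deletion."""
    total = 0
    edited = 0
    for cigar in cigars:
        total += 1
        # In SAM CIGAR strings, 'I' is an insertion and 'D' a deletion
        # relative to the reference.
        if "I" in cigar or "D" in cigar:
            edited += 1
    if total == 0:
        raise ValueError("at least one read is required")
    return 100.0 * edited / total


if __name__ == "__main__":
    # Four hypothetical reads: two unedited, one deletion, one insertion.
    reads = ["50M", "20M2D28M", "25M1I24M", "50M"]
    print(indel_percentage(reads))
```

Running the same calculation on the "no plasmid" control gives a background rate against which the edited samples can be compared.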
  • a nucleic acid vector encoding a fusion protein is constructed for base editing.
  • the fusion protein comprises a catalytically inactive variant of an effector protein fused to a deaminase.
  • the fusion protein and at least one guide nucleic acid are tested for their ability to edit a target sequence in eukaryotic cells.
  • Cells are transfected with the nucleic acid vector and guide nucleic acid. After sufficient incubation, DNA is extracted from the transfected cells.
  • Target sequences are PCR amplified and sequenced by NGS (e.g., MiSeq). The presence of base modifications is analyzed from sequencing data. Results are recorded as a change in % base call relative to the negative control.
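Recording results as a change in % base call relative to the negative control amounts to a difference of two percentages at a given target position. A minimal sketch follows, assuming the base observed at that position has already been extracted from each read; the function names and the example counts are hypothetical and are not taken from the source.

```python
from collections import Counter


def base_call_percent(bases, base):
    """Percentage of reads calling `base` at one target position."""
    counts = Counter(bases)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one base call is required")
    return 100.0 * counts[base] / total


def change_in_base_call(treated, control, base):
    """Difference in % base call (treated minus negative control), in percentage points."""
    return base_call_percent(treated, base) - base_call_percent(control, base)


if __name__ == "__main__":
    # Hypothetical C-to-T editing: 30 of 100 treated reads call T, versus 2 of 100 control reads.
    treated = "T" * 30 + "C" * 70
    control = "T" * 2 + "C" * 98
    print(change_in_base_call(treated, control, "T"))
```

Subtracting the control value removes the background contributed by sequencing error and pre-existing variation at that position.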
  • a single stranded reporter nucleic acid encoding a fluorescent protein (e.g., enhanced green fluorescent protein (EGFP)) and a eukaryotic promoter is generated with a target sequence that is known to be recognized by complexes of effector proteins disclosed herein and corresponding guide nucleic acids.
  • a nucleic acid vector encoding the Cas effector fused to a transcriptional activator; a guide nucleic acid; and the single stranded reporter nucleic acid encoding EGFP are introduced to eukaryotic cells via lipofection and EGFP expression is quantified by flow cytometry. Relative amounts of RNA, indicative of relative gene expression, are quantified with RT-qPCR.
  • a single stranded reporter nucleic acid encoding a fluorescent protein (e.g., enhanced green fluorescent protein (EGFP)) and a pSV40 promoter that drives constitutive expression of EGFP is generated with a target sequence that is known to be recognized by complexes of effector proteins disclosed herein and corresponding guide nucleic acids.
  • a nucleic acid vector encoding the Cas effector fused to a transcriptional repressor; a guide nucleic acid; and the single stranded reporter nucleic acid encoding EGFP are introduced to eukaryotic cells via lipofection and EGFP expression is quantified by flow cytometry. Relative amounts of RNA, indicative of relative gene expression, are quantified with RT-qPCR.
  • Example 7 Generating a Catalytically Inactive Variant of a CRISPR Cas Effector Protein
  • Sequence or structural analogs of a Cas nuclease provide an additional or supplemental way to predict the catalytic residues of the novel Cas nuclease relative to the previous description in this Example. Catalytic residues are usually highly conserved and can be identified in this manner.
  • computational software may be used to predict the structure of a Cas nuclease.
  • Effector proteins were compared for their helicase and nuclease activity. Swiss-modeling showed that SEQ ID NOs: 45, 65, 91 had structural similarity to helicase Lhr and ATP dependent helicase. Modeling was additionally used for structure and sequence identification. These modeling results suggest that some of these effector proteins may have a RuvA domain. RuvA is a signature domain in helicase proteins. RuvA dimers are present in certain ATP-dependent helicases. These modeling results also suggested that they may have a RecA domain.
  • effector proteins are assessed for their helicase activity and/or nuclease activity.
  • Such effector proteins are expressed in E. coli cells and are then purified with Ni-NTA and a heparin column.
  • the purified effector proteins are then combined with predicted guide RNA and potential target nucleic acids, such as double-stranded DNA, single-stranded DNA or RNA.
  • Target depletion from helicase or nuclease activity by the effector proteins is assessed using a polyacrylamide gel electrophoresis (PAGE) system.
  • Predicted PAM sequences were identified for effector proteins corresponding to SEQ ID NO: 31-104. Analysis of the CRISPR arrays from which these effector proteins originated found that there was a conserved “GAA” sequence in the sequences immediately next to the protospacer sequence (see TABLE 7). The “GAA” sequence present in the arrays is indicative of a common PAM for SEQ ID NO: 31-104. It is predicted that the “GAA” and the reverse complement thereof “TTC” is the PAM sequence for these effector proteins.
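The relationship between “GAA” and “TTC” stated above can be checked directly: the reverse complement is obtained by complementing each base and reversing the order. The helper below is a small illustrative sketch for DNA sequences; the name `reverse_complement` is a hypothetical choice.

```python
# Translation table mapping each DNA base to its complement (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("GAA"))
```

Because a PAM read from one strand appears as its reverse complement on the other, a search for the predicted PAM would typically check both forms.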
  • effector protein sequences were analyzed by standard sequence alignment. Effector proteins having SEQ ID NO: 31-39 were identified as having highly similar sequences. Percent sequence identities across these sequences showed a range from 69.179% to 98.829% sequence identities (see TABLE 8).
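Percent sequence identity values such as those above depend on how the alignment columns are counted, and the source does not state which convention was used. The sketch below assumes one common convention: two sequences that are already aligned to equal length, with identity computed as matching columns divided by all columns in which at least one sequence has a residue. The function name and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue  # a column that is a gap in both sequences carries no information
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("no comparable alignment columns")
    return 100.0 * matches / columns


if __name__ == "__main__":
    # Four of five columns match in this hypothetical pair.
    print(percent_identity("MKTLV", "MKSLV"))
```

Other conventions (for example, dividing by the length of the shorter sequence) give different values for the same alignment, so the denominator should be stated alongside any reported identity.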
  • In vitro enrichment involves the amplification of DNA fragments that may be excised by potential CRISPR-Cas candidates. The method begins with a cis cleavage assay, which is then followed by dA end repair, ligation, and multiple rounds of PCR. Magnetic bead purification is also performed after interference, ligation, and both rounds of PCR. The final purified PCR product can then be sequenced on a MiSeq instrument.
  • HEK293T lysates comprising effector proteins were combined with guide nucleic acids as set forth in TABLE 10 below and allowed to complex for 25 minutes at room temperature.
  • the complex was subsequently combined with 10x CutSmart buffer and a 5’ PAM plasmid library, and incubated at 37 °C for 1 hour.
  • 5 µL 0.5 M EDTA + 1 µL Proteinase K solution were added to each reaction, and further incubated at 37 °C for 30 minutes.
  • Samples were subjected to adapter ligation and PCR before sequencing. The observed enriched PAM sequences (1% enriched and 5% enriched) are provided in TABLE 10.
  • *N is selected from A, G, C and T; W is A or T; Y is C or T; and R is A or G.
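The degenerate codes in the footnote above follow the standard IUPAC nucleotide notation, so a degenerate PAM can be expanded into the concrete sequences it covers or used to test a candidate sequence. The sketch below covers only the codes named in the footnote plus the four bases; the function names and example patterns are hypothetical illustrations, not PAMs taken from TABLE 10.

```python
from itertools import product

# Degenerate codes from the footnote, plus the four unambiguous bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT",  # any base
    "W": "AT",
    "Y": "CT",
    "R": "AG",
}


def expand_pam(pattern):
    """List every concrete sequence matched by a degenerate PAM pattern."""
    return ["".join(bases) for bases in product(*(IUPAC[code] for code in pattern))]


def matches_pam(pattern, seq):
    """Return True when `seq` is one of the sequences described by `pattern`."""
    return len(pattern) == len(seq) and all(
        base in IUPAC[code] for code, base in zip(pattern, seq)
    )


if __name__ == "__main__":
    print(expand_pam("WY"))
    print(len(expand_pam("NNNNN")))  # size of a fully randomized 5-nt library
    print(matches_pam("GAR", "GAA"))
```

The same expansion shows why a 5’-NNNNN-3’ library contains 4^5 = 1,024 distinct PAM candidates.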

Abstract

The invention relates to compositions, systems, and methods comprising effector proteins and uses thereof. These effector proteins may be characterized as CRISPR-associated (Cas) proteins. Various compositions, systems, and methods of the present invention may leverage the activities of these effector proteins for the modification, detection, and engineering of nucleic acids.
PCT/US2022/082137 2021-12-23 2022-12-21 Effector proteins and methods of use WO2023122663A2 (fr)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US202163293443P 2021-12-23 2021-12-23
US63/293,443 2021-12-23
US202263321624P 2022-03-18 2022-03-18
US63/321,624 2022-03-18
US202263353469P 2022-06-17 2022-06-17
US63/353,469 2022-06-17

Publications (2)

Publication Number Publication Date
WO2023122663A2 true WO2023122663A2 (fr) 2023-06-29
WO2023122663A3 WO2023122663A3 (fr) 2023-08-03

Family

ID=86903769

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2022/082137 WO2023122663A2 (fr) 2021-12-23 2022-12-21 Effector proteins and methods of use

Country Status (1)

Country Link
WO (1) WO2023122663A2 (fr)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10253365B1 (en) * 2017-11-22 2019-04-09 The Regents Of The University Of California Type V CRISPR/Cas effector proteins for cleaving ssDNAs and detecting target DNAs
CA3130135A1 (fr) * 2019-02-14 2020-08-20 Metagenomi Ip Technologies, Llc Enzymes with RuvC domains
CN114867852A (zh) * 2019-10-30 2022-08-05 Pairwise Plants Services, Inc. Type V CRISPR-Cas base editors and methods of use thereof

Also Published As

Publication number Publication date
WO2023122663A3 (fr) 2023-08-03

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22912698

Country of ref document: EP

Kind code of ref document: A2