WO2022109058A1 - Nucleases comprising cell-penetrating peptide sequences - Google Patents

Nucleases comprising cell-penetrating peptide sequences

Info

Publication number
WO2022109058A1
Authority
WO
WIPO (PCT)
Prior art keywords
nuclease
cpp
construct
sequence
amino acid
Prior art date
Application number
PCT/US2021/059773
Other languages
English (en)
Inventor
Natarajan Sethuraman
Original Assignee
Entrada Therapeutics, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Entrada Therapeutics, Inc.
Publication of WO2022109058A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K5/00 Peptides containing up to four amino acids in a fully defined sequence; Derivatives thereof
    • C07K5/04 Peptides containing up to four amino acids in a fully defined sequence; Derivatives thereof containing only normal peptide links
    • C07K5/08 Tripeptides
    • C07K5/0815 Tripeptides with the first amino acid being basic
    • C07K5/0817 Tripeptides with the first amino acid being basic the first amino acid being Arg
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K5/00 Peptides containing up to four amino acids in a fully defined sequence; Derivatives thereof
    • C07K5/04 Peptides containing up to four amino acids in a fully defined sequence; Derivatives thereof containing only normal peptide links
    • C07K5/08 Tripeptides
    • C07K5/0821 Tripeptides with the first amino acid being heterocyclic, e.g. His, Pro, Trp
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K5/00 Peptides containing up to four amino acids in a fully defined sequence; Derivatives thereof
    • C07K5/04 Peptides containing up to four amino acids in a fully defined sequence; Derivatives thereof containing only normal peptide links
    • C07K5/10 Tetrapeptides
    • C07K5/1019 Tetrapeptides with the first amino acid being basic
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K5/00 Peptides containing up to four amino acids in a fully defined sequence; Derivatives thereof
    • C07K5/04 Peptides containing up to four amino acids in a fully defined sequence; Derivatives thereof containing only normal peptide links
    • C07K5/10 Tetrapeptides
    • C07K5/1024 Tetrapeptides with the first amino acid being heterocyclic
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K38/00 Medicinal preparations containing peptides
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C07K2319/01 Fusion polypeptide containing a localisation/targetting motif
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12R INDEXING SCHEME ASSOCIATED WITH SUBCLASSES C12C - C12Q, RELATING TO MICROORGANISMS
    • C12R2001/00 Microorganisms; Processes using microorganisms
    • C12R2001/01 Bacteria or Actinomycetales; using bacteria or Actinomycetales
    • C12R2001/46 Streptococcus; Enterococcus; Lactococcus
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A50/00 TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE in human health protection, e.g. against extreme weather
    • Y02A50/30 Against vector-borne diseases, e.g. mosquito-borne, fly-borne, tick-borne or waterborne diseases whose impact is exacerbated by climate change

Definitions

  • CRISPR Clustered regularly interspaced short palindromic repeats
  • CRISPR/CRISPR-associated proteins (Cas) is a prokaryotic RNA-guided adaptive immune system that was identified in archaea and bacteria and has been adapted for gene editing.
  • CRISPR-Cas gene editing systems generally include two components: a CRISPR associated (Cas) nuclease and a guide RNA (gRNA).
  • the gRNA can be programmed to recognize a nucleic acid sequence via a “spacer” sequence of about 18 to about 22 nucleotides at the 5’ end of the gRNA.
  • the gRNA forms a ribonucleoprotein complex with the Cas nuclease.
  • the spacer region of the gRNA forms Watson-Crick base pairs with the target nucleic acid sequence, enabling the Cas nuclease to precisely cleave the nucleic acid at the target sequence.
  • Many types of CRISPR-Cas systems have been identified and can be classified into three major types (I, II, and III) plus a less common but clearly distinct type IV. Some CRISPR-Cas systems target DNA, others target RNA, and others can target both DNA and RNA.
  • the present disclosure provides a construct comprising at least one component of a CRISPR-Cas gene editing system and at least one cell penetrating peptide (CPP) sequence, wherein: (a) the component of a CRISPR-Cas gene editing system comprises a nuclease comprising one or more loop regions, and at least one loop region comprises a CPP sequence inserted into the loop region; (b) the component of a CRISPR-Cas gene editing system comprises a nuclease to which at least one CPP sequence is conjugated; (c) the component of a CRISPR-Cas gene editing system comprises a gRNA to which at least one CPP sequence is conjugated; or (d) a combination of any of (a), (b) or (c).
  • CPP cell penetrating peptide
  • the nuclease comprises one or more loop regions, and the at least one loop region comprises a CPP sequence inserted into the loop region. In some embodiments, the at least one CPP sequence is conjugated to the nuclease.
  • (a) the nuclease comprises one or more loop regions, and at least one loop region comprises a CPP sequence inserted into the loop region; and (b) at least one CPP sequence is conjugated to the nuclease.
  • the looped nuclease is zinc-finger nuclease, meganuclease, transcription activator-like effector nuclease (TALEN), RNA nuclease, DNA nuclease, or CRISPR/Cas nuclease.
  • the CRISPR/Cas nuclease is Cas9, Cas9 variant, Cas12a (Cpf1), Cas12b, Cas12c, Tnp-B like, Cas13a (C2c2), Cas13b, or Cas14.
  • the nuclease is Cas9 or a Cas9 variant.
  • the at least one CPP sequence is conjugated to an N-terminus of the nuclease, to a C-terminus of the nuclease, to a side chain of an amino acid residue of the nuclease, or a combination thereof. In embodiments, the at least one CPP sequence is conjugated to the N-terminus of the nuclease, to the C-terminus of the nuclease, or a combination thereof.
  • the at least one CPP sequence is conjugated to a side chain of an amino acid residue of the nuclease.
  • the side chain is the side chain of a residue of lysine, glutamine, glutamic acid, asparagine, or aspartic acid.
  • the side chain is the side chain of a residue of lysine.
  • the component of the CRISPR-Cas gene editing system comprises a guide RNA sequence.
  • one or more CPP sequences are conjugated to a 5’ end of the guide RNA sequence, a 3’ end of the guide RNA sequence, or on a backbone of the guide RNA sequence.
  • the construct comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 CPPs.
  • the at least one CPP sequence conjugated to the nuclease, the guide RNA sequence, or a combination thereof is a cyclic CPP sequence.
  • the construct comprises a linker, wherein the at least one CPP sequence conjugated to the nuclease, the guide RNA sequence, or a combination thereof is a cyclic CPP sequence conjugated through the linker.
  • the linker is a bivalent or trivalent C1-C50 saturated or unsaturated, straight or branched alkyl, wherein 1-25 methylene groups are optionally and independently replaced by -N(H)-, -N(C1-C4 alkyl)-, -N(cycloalkyl)-, -O-, -C(O)-, -C(O)O-, -S-, -S(O)-, -S(O)2-, -S(O)2N(C1-C4 alkyl)-, -S(O)2N(cycloalkyl)-, -N(H)C(O)-, -N(C1-C4 alkyl)C(
  • the CPP comprises at least two arginine residues. In embodiments, the CPP comprises from two to six arginine residues.
  • the CPP comprises at least one amino acid residue that comprises a hydrophobic side chain.
  • the CPP comprises from one to six amino acid residues which independently comprise a hydrophobic side chain.
  • the amino acid residues comprising a hydrophobic side chain are residues of glycine, alanine, valine, leucine, isoleucine, methionine, phenylalanine, tryptophan, proline, naphthylalanine, phenylglycine, homophenylalanine, tyrosine, cyclohexylalanine, piperidine-2-carboxylic acid, cyclohexylalanine, norleucine, 3-(3-benzothienyl)-alanine, 3-(2-quinolyl)-alanine, O-benzylserine, 3-(4-(benzyloxy)phenyl)-alanine, S-(4-methylbenzyl)cystein
  • at least one of the amino acid residues comprising a hydrophobic side chain is a residue of tryptophan or phenylalanine. In embodiments, at least one of the amino acid residues comprising a hydrophobic side chain is a tryptophan residue. In embodiments, at least one of the amino acid residues comprising a hydrophobic side chain is a phenylalanine residue.
  • each of the at least one of the amino acids comprising a hydrophobic side chain is tryptophan.
  • the CPP sequence comprises at least three arginine residues and at least three tryptophan residues.
  • the CPP sequence in at least one loop region of the nuclease comprises at least three arginine residues and at least three tryptophan residues.
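The composition thresholds described above (at least three arginine and at least three tryptophan residues) can be checked mechanically. The following is a minimal sketch, assuming the CPP is written as a one-letter amino acid string; the function name and the example sequences are hypothetical illustrations and are not taken from the disclosure or its Table D.

```python
from collections import Counter


def meets_arg_trp_minimum(cpp_sequence: str, min_arg: int = 3, min_trp: int = 3) -> bool:
    """Return True if a one-letter peptide string contains at least
    `min_arg` arginine (R) and `min_trp` tryptophan (W) residues.

    Assumption: plain one-letter codes only. D-amino acids and
    non-natural residues such as naphthylalanine cannot be represented
    in this notation and are outside the scope of this sketch.
    """
    counts = Counter(cpp_sequence.upper())
    return counts["R"] >= min_arg and counts["W"] >= min_trp


# Hypothetical example sequences, for illustration only.
print(meets_arg_trp_minimum("RRRWWW"))  # three R and three W
print(meets_arg_trp_minimum("RRWWW"))   # only two R
```

A check of this kind only counts residues; it says nothing about whether a given sequence actually penetrates cells.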
  • the nuclease comprises a first looped region and a second looped region, wherein a first CPP sequence is inserted into the first looped region, and a second CPP sequence is inserted into the second looped region.
  • the first CPP comprises at least three arginine residues
  • the second CPP comprises at least three amino acid residues each of which independently comprises a hydrophobic side chain.
  • the CPP sequence comprises from one to six D-amino acid residues.
  • the one or more D-amino acid residues are arginine.
  • the one or more D-amino acid residues are residues of amino acids comprising a hydrophobic side chain.
  • the one or more of the residues of amino acids comprising a hydrophobic side chain is a residue of phenylalanine.
  • the one or more of the residues of amino acids comprising a hydrophobic side chain is a residue of naphthylalanine.
  • the construct comprises an exocyclic peptide (EP), wherein the EP is conjugated to the nuclease, guide RNA sequence, or combination thereof.
  • the EP comprises from 2 to 10 amino acid residues. In embodiments, the EP comprises from 4 to 8 amino acid residues.
  • the EP comprises 1 or 2 arginine residues. In embodiments, the EP comprises 1, 2, 3, or 4 lysine residues. In embodiments, the amino group on the side chain of each lysine residue is substituted with a trifluoroacetyl (-COCF3) group, allyloxycarbonyl (Alloc), 1-(4,4-dimethyl-2,6-dioxocyclohexylidene)ethyl (Dde), or (4,4-dimethyl-2,6-dioxocyclohex-1-ylidene-3)-methylbutyl (ivDde) group. In embodiments, the EP comprises at least 2 amino acid residues with a hydrophobic side chain. In embodiments, the amino acid residue with a hydrophobic side chain is valine, proline, alanine, leucine, isoleucine, or methionine.
  • the exocyclic peptide comprises one of the following sequences: PKKKRKV; KR; RR; KKK; KGK; KBK; KBR; KRK; KRR; RKK; RRR; KKKK; KKRK; KRKK; KRRK; RKKR; RRRR; KGKK; KKGK; KKKKK; KKKRK; KBKBK; KKKRKV; PGKKRKV; PKGKRKV; PKKGRKV; PKKKGKV; PKKKRGV; or PKKKRKG.
  • the exocyclic peptide has the structure: Ac-P-K-K-K-R-K-V-.
  • each CPP sequence independently comprises a sequence from Table D.
  • the construct comprises a detectable tag.
  • the detectable tag is a FLAG tag, a polyhistidine tag, a SNAP tag, a Halo tag, cMyc, glutathione-S-transferase, avidin, an enzyme, a fluorescent protein, a luminescent protein, a chemiluminescent protein, a bioluminescent protein, or a phosphorescent protein.
  • the present disclosure provides a recombinant nucleic acid molecule encoding a construct as disclosed herein.
  • the construct is operably linked to a promoter.
  • the present disclosure provides a vector comprising an expression cassette encoding one or more components of a CRISPR-Cas gene editing system.
  • the present disclosure provides a host cell comprising a vector disclosed herein.
  • the host cell is a Chinese Hamster Ovary (CHO) cell, an HEK 293 cell, a BHK cell, a murine NS0 cell, a murine SP2/0 cell, or an E. coli cell.
  • the present disclosure provides a composition comprising a construct as disclosed herein.
  • the present disclosure provides a method of producing a construct as disclosed herein, comprising culturing a host cell disclosed herein and purifying an expressed modified looped nuclease from the supernatant.
  • the present disclosure provides a method of treating a disease or condition, comprising administering a construct as disclosed herein.
  • the present disclosure provides a method of gene editing, comprising administering a construct as disclosed herein.
  • the method comprises upregulating target DNA.
  • the method comprises upregulating expression of a target RNA.
  • the method comprises downregulating target DNA.
  • the method comprises downregulating expression of a target RNA.
  • FIG. 1 shows the secondary structure of Cas9 from Streptococcus pyogenes serotype M1 (SEQ ID NO: 1). Beta strands are italicized and bold. Loops are double underlined. Helices are underlined with a squiggly line.
  • CRISPR Clustered Regularly Interspaced Short Palindromic Repeats
  • Cas CRISPR associated protein
  • a system, method or composition is provided that facilitates the delivery of a Cas nuclease, a guide RNA (gRNA), or a combination thereof.
  • gRNA guide RNA
  • the disclosure provides a construct comprising a component of a CRISPR-Cas gene editing system and a peptide sequence which allows for the component of the CRISPR-Cas gene editing system to penetrate the cell membrane and deliver the component of the CRISPR-Cas gene editing system intracellularly.
  • the disclosure provides nucleases comprising an exogenous peptide sequence which allows for the nuclease to penetrate the cell membrane and deliver the nuclease intracellularly.
  • the disclosure provides a construct comprising a nuclease and at least one cell penetrating peptide (CPP) sequence, wherein: (a) the nuclease comprises one or more loop regions, and at least one loop region comprises a CPP sequence inserted into the loop region;
  • the disclosure provides a construct that comprises a guide RNA (gRNA) and at least one peptide sequence, for example, at least one cell penetrating peptide (CPP) which allows the gRNA to penetrate the cell membrane and deliver the gRNA intracellularly.
  • the disclosure provides a composition comprising (1) a construct comprising a nuclease and at least one cell penetrating peptide (CPP) sequence, wherein: (a) the nuclease comprises one or more loop regions, and at least one loop region comprises a CPP sequence inserted into the loop region; (b) at least one CPP sequence is conjugated to the nuclease; or (c) a combination of (a) and (b); and (2) a construct that comprises a guide RNA (gRNA) and at least one peptide sequence, for example, at least one cell penetrating peptide (CPP); or (3) a combination of (1) and (2).
  • the disclosure provides a construct that comprises at least one expression vector that encodes at least one component of a CRISPR-Cas gene editing system and at least one peptide sequence which allows for the expression vector to penetrate the cell membrane and deliver the expression vector intracellularly.
  • the expression vector encodes at least one CRISPR-associated nuclease.
  • the expression vector encodes at least one gRNA.
  • the expression vector encodes at least one CRISPR-associated nuclease and at least one gRNA.
  • the present disclosure provides polynucleotides encoding the constructs described herein and methods for the production of the constructs described herein.
  • the compositions and methods for inserting CPP motifs (also referred to herein as “CPP sequences”) into the loops of nucleases, or for conjugating CPPs to nucleases and/or guide sequences, as described herein, represent a general approach to endowing cell permeability to a component of a CRISPR-Cas gene editing system that would otherwise be cell-impermeable. This approach offers a number of advantages over previous methods, not the least of which is its simplicity. Additionally or alternatively, conjugating a CPP to a nuclease and/or guide RNA sequence can further improve the cell delivery efficiency of the disclosed constructs.
  • the methods described herein involve relatively minor changes to the structure of the CRISPR-Cas component and should be applicable to a broad range of nucleases and gRNAs.
  • the modified nucleases described herein are expected to be less immunogenic than other nucleases modified by other protein surface remodeling methods.
  • the CPP motifs grafted to protein loops are structurally constrained and relatively stable against proteolytic degradation.
  • the term “about” or “approximately” refers to a quantity, level, value, number, frequency, percentage, dimension, size, amount, weight or length that varies by acceptable levels in the art. In embodiments, the amount of variation may be as much as 15%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2% or 1% to a reference quantity, level, value, number, frequency, percentage, dimension, size, amount, weight or length.
  • the term “about” or “approximately” refers to a range of quantity, level, value, number, frequency, percentage, dimension, size, amount, weight or length ±15%, ±10%, ±9%, ±8%, ±7%, ±6%, ±5%, ±4%, ±3%, ±2%, or ±1% about a reference quantity, level, value, number, frequency, percentage, dimension, size, amount, weight or length.
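The "about" definition above amounts to a symmetric percentage tolerance around a reference value. As a minimal sketch (the function name and the default tolerance are illustrative assumptions; the disclosure lists several permissible tolerances from ±1% to ±15% and does not single one out), the check could be expressed as:

```python
def within_about(value: float, reference: float, tolerance_pct: float = 10.0) -> bool:
    """Return True if `value` lies within +/- `tolerance_pct` percent of `reference`.

    `tolerance_pct` is an assumption chosen by the caller, e.g. one of the
    listed values 15, 10, 9, ..., 1.
    """
    if reference == 0:
        raise ValueError("a percentage tolerance is undefined for a zero reference")
    allowed_deviation = abs(reference) * tolerance_pct / 100.0
    return abs(value - reference) <= allowed_deviation


# Illustration: is a 21-nucleotide spacer "about 20" nucleotides at +/-5%?
print(within_about(21, 20, tolerance_pct=5))
```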
  • a numerical range e.g., 1 to 5, about 1 to 5, or about 1 to about 5, refers to each numerical value encompassed by the range.
  • the range “1 to 5” is equivalent to the expression 1, 2, 3, 4, 5; or 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, or 5.0; or 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1,
  • the term “substantially” refers to a quantity, level, value, number, frequency, percentage, dimension, size, amount, weight or length that is 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or higher compared to a reference quantity, level, value, number, frequency, percentage, dimension, size, amount, weight or length.
  • “substantially the same” refers to a quantity, level, value, number, frequency, percentage, dimension, size, amount, weight or length that produces an effect, e.g., a physiological effect, that is approximately the same as a reference quantity, level, value, number, frequency, percentage, dimension, size, amount, weight or length.
  • nuclease and “endonuclease” can be used interchangeably to refer to an enzyme capable of cleaving the phosphodiester bonds between the nucleotide subunits of a nucleic acid.
  • the nuclease is capable of cleaving DNA.
  • the nuclease is capable of cleaving RNA.
  • the nuclease is capable of cleaving both DNA and RNA.
  • CRISPR-associated protein refers to an RNA-guided endonuclease component of a CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) gene editing system and includes wild-type proteins as well as homologs, variants, fragments and derivatives thereof that exhibit one or more desired biological properties or functions, including, but not limited to, the ability to be targeted by a guide RNA (gRNA) to a target nucleic acid sequence (e.g., DNA or RNA sequence) in a sequence- specific manner.
  • a functional homolog, variant, fragment or derivative is capable of (1) specifically interacting with a target nucleic acid sequence, for example by binding to and, optionally, cleaving (by endonuclease or nickase activity) the target nucleic acid sequence, (2) associating with a guide RNA, (3) recognizing a protospacer adjacent motif (PAM) that is juxtaposed to a target DNA or RNA sequence, or (4) combinations thereof.
  • CRISPR-associated proteins include, but are not limited to, Cas9, Cpf1 (Cas12), C2c1, C2c3, C2c2, Cas13, CasX and CasY.
  • the term “CRISPR-associated protein” includes all post-translationally modified forms thereof, including, but not limited to glycosylation, phosphorylation, ubiquitinylation, S-nitrosylation, methylation, N-acetylation, lipidation, disulfide bond formation, sulfation, acylation, deamination, etc.
  • variants have a sequence that is at least about 70%, about 75%, about 80%, about 85%, about 90%, about 95%, about 96%, about 97%, about 98%, about 99%, or 100% identical to an amino acid sequence of a naturally occurring (e.g., wild-type) CRISPR-associated protein.
  • fragments have an amino acid sequence of at least 50, at least 75, at least 100, at least 125, at least 150, at least 175, at least 200 or at least 250 contiguous amino acid residues of a naturally occurring (e.g., wild-type) CRISPR-associated protein.
  • the gRNA is a chimeric RNA molecule which includes a CRISPR RNA (crRNA) and trans-encoded CRISPR RNA (tracrRNA) component.
  • the crRNA includes from about 19 to about 22 consecutive nucleotides that are at least about 80%, about 85%, about 90%, about 95% or about 100% complementary to a target nucleic acid sequence.
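The complementarity criterion above can be illustrated with a short calculation. This is a minimal sketch under stated assumptions: the spacer is an RNA string written 5'→3', the target is the DNA strand it hybridizes with (also written 5'→3', so pairing is antiparallel), both have equal length, and only Watson-Crick pairs count. The function name is hypothetical, and this is not a gRNA design tool such as those described in the cited references.

```python
# RNA base -> DNA base it pairs with under Watson-Crick rules.
RNA_TO_DNA_PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}


def percent_complementarity(spacer_rna: str, target_dna: str) -> float:
    """Percentage of spacer positions that base-pair with the target strand.

    Both sequences are written 5'->3'; because hybridization is
    antiparallel, spacer position i is compared with target position
    (length - 1 - i).
    """
    spacer = spacer_rna.upper()
    target = target_dna.upper()
    if not spacer or len(spacer) != len(target):
        raise ValueError("spacer and target must be non-empty and of equal length")
    paired = sum(
        1
        for rna_base, dna_base in zip(spacer, reversed(target))
        if RNA_TO_DNA_PAIR.get(rna_base) == dna_base
    )
    return 100.0 * paired / len(spacer)
```

For a 20-nucleotide spacer, 16 paired positions correspond to 80% complementarity, the lowest threshold listed above.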
  • Techniques for designing gRNAs are known; see, for example, Doench et al. (2014) Nature Biotechnology 32(12): 1262-7 and Graham et al. (2015) Genome Biol. 16: 260, the disclosures of which are incorporated by reference herein.
  • CRISPR-Cas gene-editing system refers to a protein, for example, a Cas protein, a nucleic acid, for example, a guide RNA (gRNA), or a combination thereof, which may be used to edit a genome.
  • the following patent documents describe CRISPR-Cas gene-editing systems: U.S. Pat. No. 8,697,359, U.S. Pat. No. 8,771,945, U.S. Pat. No. 8,795,965, U.S. Pat. No. 8,865,406, U.S. Pat. No. 8,871,445, U.S. Pat. No. 8,889,356, U.S. Pat. No. 8,895,308, U.S. Pat. No.
  • CRISPR-Cas ribonucleoprotein complex or “ribonucleoprotein complex” (RNP) refers to a complex that includes a nuclease and targeting gRNA.
  • the nuclease is a Cas protein.
  • component of a CRISPR-Cas gene editing system refers to a Cas endonuclease, a gRNA, a CRISPR-Cas ribonucleoprotein complex, or combinations thereof.
  • modified looped nuclease refers to a nuclease in which a CPP sequence described herein is inserted into a looped region of the nuclease.
  • peptide refers to a polymeric form of amino acids of any length, which can include coded and non-coded amino acids, chemically or biochemically modified or derivatized amino acids, and polypeptides having modified peptide backbones.
  • modified refers to a substance or compound (e.g., a cell, a polynucleotide sequence, and/or a polypeptide sequence) that has been altered or changed as compared to the corresponding unmodified substance or compound.
  • two or more amino acid residues are linked by the carboxyl group of one amino acid to the alpha amino group, thereby forming a peptide bond.
  • the polypeptide comprises a peptide backbone modification in which two or more amino acids are covalently attached by a bond other than a peptide bond.
  • the polypeptide comprises one or more non-natural amino acids, amino acid analogs, or other synthetic molecules that are capable of integrating into a polypeptide.
  • the term polypeptide comprises naturally occurring and artificially occurring amino acids. There is no upper limit to the number of amino acids that can be included in a polypeptide.
  • a residue of an amino acid refers to a derivative of the amino acid that is present in a particular product (e.g., peptide).
  • the CPPs described herein comprise amino acids (e.g., arginine) incorporated therein through formation of one or more peptide bonds.
  • the amino acids incorporated into the CPP may be referred to as a “residue” of such amino acid, or simply as the amino acid.
  • arginine or an arginine residue refers to an arginine moiety wherein the N- and C-termini are attached to other amino acids through peptide bonds.
  • insert or “insertion” means the addition of a CPP sequence into a protein sequence.
  • the CPP sequence is inserted between amino acids in the looped region of a protein without removing or replacing amino acids of the protein, such that the resulting protein contains all of the amino acids in the native protein in addition to the CPP.
  • CPP insertion increases the total number of amino acids in the protein.
  • the CPP replaces amino acids present in the loop region of a protein, such that the resulting protein does not contain all of the amino acids that were present prior to CPP insertion.
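The two insertion modes described above (adding the CPP between loop residues versus replacing loop residues with the CPP) differ only in whether native residues are dropped. A minimal sketch on one-letter sequence strings follows; the function names, the 0-based indexing convention, and the example sequences are assumptions made for illustration and do not correspond to any sequence in the disclosure.

```python
def insert_cpp(protein: str, position: int, cpp: str) -> str:
    """Insert `cpp` before 0-based index `position`.

    All native residues are retained, so the result is longer than the
    input by len(cpp).
    """
    if not 0 <= position <= len(protein):
        raise ValueError("position is outside the protein sequence")
    return protein[:position] + cpp + protein[position:]


def replace_with_cpp(protein: str, loop_start: int, loop_end: int, cpp: str) -> str:
    """Replace residues protein[loop_start:loop_end] (end exclusive) with `cpp`.

    The replaced native residues are absent from the result.
    """
    if not 0 <= loop_start <= loop_end <= len(protein):
        raise ValueError("loop boundaries are outside the protein sequence")
    return protein[:loop_start] + cpp + protein[loop_end:]


# Hypothetical toy sequences, for illustration only.
native = "MKTAYIAK"
cpp = "RRRWWW"
print(insert_cpp(native, 4, cpp))
print(replace_with_cpp(native, 3, 5, cpp))
```

In the first mode the product length always equals the native length plus the CPP length; in the second it depends on how many loop residues are replaced.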
  • treat refers to any administration of one or more of the disclosed compounds that partially or completely alleviates, ameliorates, relieves, inhibits, delays onset of, reduces severity of, and/or reduces incidence of one or more symptoms or features of a disease as described herein.
  • inhibit refers to a decrease in an activity, expression, function or other biological parameter and can include, but does not require, complete ablation of the activity, expression, function or other biological parameter. Inhibition can include, for example, at least about a 10% reduction in the activity, response, condition, or disease as compared to a control. In embodiments, expression, activity or function of a gene or protein is decreased by a statistically significant amount.
  • therapeutically effective refers to an amount of a disclosed compound which confers a therapeutic effect on a patient.
  • the therapeutically effective amount is an amount sufficient to treat a disease in a subject in need thereof.
  • cell penetrating peptide refers to any peptide which is capable of penetrating a cell membrane.
  • the CPP is cyclic, and may be represented as “cCPP”.
  • the cyclic cell penetrating peptide is also capable of directing a compound (e.g., nuclease) to penetrate the membrane of a cell.
  • the CPP is conjugated to the nuclease (rather than inserted into a looped region)
  • the CPP is cyclic.
  • the CPP delivers the nuclease to the cytosol of the cell.
  • the CPP delivers the nuclease to the cellular location where the target sequence is located.
  • exocyclic peptide EP
  • modulatory peptide MP
  • the terms “exocyclic peptide” (EP) and “modulatory peptide” (MP) may be used interchangeably to refer to two or more amino acid residues linked by a peptide bond that are attached to the cyclic peptides described herein and alter the tissue distribution and/or retention of the compound.
  • the modulatory peptide comprises at least one positively charged amino acid residue, e.g., at least one lysine residue and/or at least one arginine residue.
  • exocyclic peptides are described
  • the exocyclic peptide can comprise a peptide that has been identified in the art as a “nuclear localization signal” (NLS).
  • NLS nuclear localization signal
  • nuclear localization sequence refers to an amino acid sequence which induces transport of molecules including such sequences or linked to such sequences into the nucleus of eukaryotic cells.
  • nuclear localization sequences include the nuclear localization sequence of the SV40 virus large T-antigen, the minimal functional unit of which is the seven amino acid sequence PKKKRKV, the nucleoplasmin bipartite NLS with the sequence NLSKRPAAIKKAGQAKKKK, the c-myc nuclear localization sequence comprising the amino acid sequence PAAKRVKLD or RQRRNELKRSF, the sequence
  • linker refers to a moiety that covalently bonds two or more moieties (e.g., a CPP and a nuclease or guide RNA sequence, and optionally an exocyclic peptide).
  • the linker can be natural or non-natural amino acid or polypeptide.
  • the linker is a synthetic compound containing two or more appropriate functional groups suitable to bind a CPP and a nuclease or guide RNA sequence, to thereby form the constructs disclosed herein.
  • the linker comprises an M moiety to thereby conjugate the CPP to the nuclease or guide RNA sequence.
  • the CPP may be covalently bound to the Cas nuclease via a linker.
  • sequence identity refers to the percentage of nucleic acids or amino acids between two oligonucleotide or polypeptide sequences, respectively, that are the same and in the same relative position. As such, one oligonucleotide or polypeptide sequence has a certain percentage of sequence identity compared to another oligonucleotide or polypeptide sequence, respectively. For sequence comparison, typically one sequence acts as a reference sequence, to which test sequences are compared.
  • Those of ordinary skill in the art will appreciate that two sequences are generally considered to be “substantially identical” if they contain identical residues in corresponding positions.
  • the sequence identity between two amino acid sequences may be determined using the Needleman-Wunsch algorithm (Needleman and Wunsch, 1970, J. Mol. Biol. 48: 443-453) as implemented in the Needle program of the EMBOSS package (EMBOSS: The European Molecular Biology Open Software Suite, Rice et al., 2000, Trends Genet. 16: 276-277), in the version that exists as of the date of filing.
  • the parameters used are gap open penalty of 10, gap extension penalty of 0.5, and the EBLOSUM62 (EMBOSS version of BLOSUM62) substitution matrix.
  • the output of Needle labeled “longest identity” (obtained using the -nobrief option) is used as the percent identity and is calculated as follows: (Identical Residues × 100) / (Length of Alignment − Total Number of Gaps in Alignment)
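The percent identity formula above can be applied to any pair of already-aligned sequences. The sketch below is an illustration only: it assumes the alignment has been produced elsewhere (for example by Needle with the stated parameters), that gaps are written as "-", and that the total number of gaps equals the number of gap characters. It does not run EMBOSS, and the function name is hypothetical.

```python
def percent_identity_from_alignment(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Apply (identical residues * 100) / (alignment length - total gaps)
    to two aligned sequences of equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    alignment_length = len(aligned_a)
    columns = list(zip(aligned_a, aligned_b))
    # Count every gap character in either row of the alignment.
    total_gaps = sum((a == gap) + (b == gap) for a, b in columns)
    # A column is identical when both rows carry the same non-gap residue.
    identical = sum(1 for a, b in columns if a == b and a != gap)
    denominator = alignment_length - total_gaps
    if denominator <= 0:
        raise ValueError("alignment contains no ungapped columns")
    return 100.0 * identical / denominator


# Toy alignment: 7 columns, 1 gap, 5 identical residues.
print(percent_identity_from_alignment("ACDE-GH", "ACDQFGH"))
```

Excluding gapped columns from the denominator means that long gaps do not lower the reported identity, which is worth keeping in mind when comparing a short fragment to a full-length protein.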
  • sequence identity may be determined using the Smith-Waterman algorithm, in the version that exists as of the date of filing.
  • sequence homology refers to the percentage of amino acids between two polypeptide sequences, or the percentage of nucleic acids between two oligonucleotide sequences, that are homologous and in the same relative position. As such one polypeptide sequence has a certain percentage of sequence homology compared to another polypeptide sequence. As will be appreciated by those of ordinary skill in the art, two sequences are generally considered to be “substantially homologous” if they contain homologous residues in corresponding positions. Homologous residues may be identical residues. Alternatively, homologous residues may be non-identical residues with appropriately similar structural and/or functional characteristics.
  • amino acids are typically classified as “hydrophobic” or “hydrophilic” amino acids, and/or as having “polar” or “non-polar” side chains, and substitution of one amino acid for another of the same type may often be considered a “homologous” substitution.
  • amino acid sequences or nucleic acid sequences may be compared using any of a variety of algorithms, including those available in commercial computer programs such as BLASTP, gapped BLAST, and PSI-BLAST, in existence as of the date of filing. Exemplary such programs are described in Altschul, et al., Basic local alignment search tool, J. Mol. Biol., 215(3): 403-410, 1990; Altschul, et al., Methods in Enzymology, Altschul, et al., “Gapped BLAST and PSI-BLAST: a new generation of protein database search
  • the terms “targeting” or “targeted to” refer to the association of a nuclease with a target nucleic acid molecule or a region of a target nucleic acid molecule.
  • the nuclease is associated with a guide RNA (gRNA) that is capable of hybridizing to a target nucleic acid under physiological conditions.
  • the nuclease targets a specific portion or site within the target nucleic acid, for example, a portion of the target nucleic acid comprising at least one protospacer adjacent motif (PAM) sequence or region.
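As an illustration of locating candidate target sites that sit next to a PAM, the following is a minimal sketch. The disclosure does not fix a particular motif at this point, so the default `NGG` motif, the 20-nucleotide protospacer length, and the placement of the PAM immediately 3' of the protospacer are all assumptions chosen for the example. The function name `find_pam_sites` is hypothetical.

```python
import re


def find_pam_sites(target: str, pam: str = "NGG", protospacer_len: int = 20):
    """Return (protospacer_start, pam_start) index pairs on the given strand.

    Only the IUPAC code N (any base) is expanded; other letters are
    matched literally. Indices are 0-based.
    """
    pattern = "".join("[ACGT]" if base == "N" else base for base in pam.upper())
    # A lookahead is used so that overlapping PAM occurrences are all found.
    regex = re.compile(f"(?=({pattern}))")
    sites = []
    for match in regex.finditer(target.upper()):
        pam_start = match.start()
        # Keep only PAMs with a full-length protospacer upstream of them.
        if pam_start >= protospacer_len:
            sites.append((pam_start - protospacer_len, pam_start))
    return sites
```

The sketch scans one strand only; a fuller search would also scan the reverse complement.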
  • target nucleic acid and “target sequence” refer to a nucleic acid molecule comprising a nucleic acid sequence to which the construct binds or hybridizes.
  • Target nucleic acids include, but are not limited to, RNA (including, but not limited to pre-mRNA and mRNA or portions thereof), DNA, including, for example, genomic DNA or cDNA, as well as non-translated RNA, such as miRNA.
  • a target nucleic acid can be a cellular gene (or mRNA transcribed from such gene) whose expression is associated with a particular disorder or disease state, or a nucleic acid molecule from an infectious agent.
  • the target nucleic acid is RNA.
  • the target nucleic acid is DNA.
  • the target nucleic acid is mRNA.
  • the target nucleic acid is pre-mRNA.
  • mRNA refers to an RNA molecule that encodes a protein and comprises pre-mRNA or mature mRNA.
  • Pre-mRNA refers to a newly synthesized eukaryotic mRNA molecule directly after DNA transcription.
  • a pre-mRNA is capped with a 5' cap, modified with a 3' poly-A tail, and/or spliced to produce a mature mRNA sequence.
  • pre-mRNA comprises one or more introns.
  • the pre-mRNA undergoes a process known as splicing to remove introns and join exons.
  • pre-mRNA comprises a polyadenylation site.
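The splicing step described above (introns removed, exons joined) can be sketched as a simple string operation. The function name `splice` and the coordinate convention (0-based, end-exclusive exon intervals) are assumptions made for illustration; capping and polyadenylation are not modeled.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon slices in order, dropping the intron sequence between them.

    Exons are (start, end) pairs, 0-based and end-exclusive, and must not
    overlap or run past the end of the pre-mRNA.
    """
    previous_end = 0
    pieces = []
    for start, end in sorted(exons):
        if start < previous_end or start >= end or end > len(pre_mrna):
            raise ValueError(f"invalid exon interval: ({start}, {end})")
        pieces.append(pre_mrna[start:end])
        previous_end = end
    return "".join(pieces)
```

For a hypothetical pre-mRNA `AUGGUAAGUCCC` with exons at (0, 3) and (9, 12), the result is `AUGCCC`.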
  • Target nucleic acid sequence refers to a nucleic acid sequence to which the gRNA of a CRISPR-Cas ribonucleoprotein complex (RNP) hybridizes.
  • the target nucleic acid sequence is a DNA sequence.
  • the target nucleic acid sequence is an RNA sequence.
  • the target nucleic acid sequence is an mRNA sequence.
  • the target nucleic acid sequence is a pre-mRNA sequence.
  • the target nucleic acid sequence is a mature mRNA sequence.
  • the term “gene” refers to a nucleic acid molecule comprising a nucleic acid sequence that encompasses a 5' promoter region associated with the expression of the gene product, and any intron and exon regions and 3' untranslated regions (“UTR”) associated with the expression of the gene product.
  • target gene refers to a gene that includes a nucleic acid sequence to which the gRNA of a CRISPR-Cas ribonucleoprotein complex (RNP) hybridizes or that encodes a target mRNA, for example, a target pre-mRNA or a mature target mRNA.
  • target protein refers to the amino acid sequence encoded by the target gene or target mRNA.
  • the target protein may have aberrant or reduced activity, or may not be a functional protein.
  • Alkyl refers to a fully saturated, straight or branched hydrocarbon chain having from one to twelve carbon atoms, and which is attached to the rest of the molecule by a single bond. Alkyls comprising any number of carbon atoms from 1 to 12 are included. An alkyl comprising up to 12 carbon atoms is a C1-C12 alkyl, an alkyl comprising up to 10 carbon atoms is a C1-C10 alkyl, an alkyl comprising up to 6 carbon atoms is a C1-C6 alkyl and an alkyl comprising up to 5 carbon atoms is a C1-C5 alkyl.
  • a C1-C5 alkyl includes C5 alkyls, C4 alkyls, C3 alkyls, C2 alkyls and C1 alkyl (i.e., methyl).
  • a C1-C6 alkyl includes all moieties described above for C1-C5 alkyls but also includes C6 alkyls.
  • a C1-C10 alkyl includes all moieties described above for C1-C5 alkyls and C1-C6 alkyls, but also includes C7, C8, C9 and C10 alkyls.
  • a C1-C12 alkyl includes all the foregoing moieties, but also includes C11 and C12 alkyls.
  • Non-limiting examples of C1-C12 alkyl include methyl, ethyl, n-propyl, i-propyl, sec-propyl, n-butyl, i-butyl, sec-butyl, t-butyl, n-pentyl, t-amyl, n-hexyl, n-heptyl, n-octyl, n-nonyl, n-decyl, n-undecyl, and n-dodecyl.
  • an alkyl group can be optionally substituted.
  • Alkylene refers to a fully saturated, straight or branched divalent or trivalent hydrocarbon chain radical, having from one to forty carbon atoms.
  • C2-C40 alkylene include ethylene, propylene, n-butylene, ethenylene, propenylene, n-butenylene, propynylene, n-butynylene, and the like.
  • the alkylene chain is attached, directly or indirectly, to the CPP through a single bond and, directly or indirectly, to the nuclease through a single bond (and optionally to the exocyclic peptide through a single bond).
  • an alkylene chain can be optionally substituted as described herein.
  • alkenylene refers to a straight or branched divalent or trivalent hydrocarbon chain radical, having from two to forty carbon atoms, and having one or more carbon-carbon double bonds.
  • C2-C40 alkenylene include ethene, propene, butene, and the like.
  • the alkenylene chain is attached, directly or indirectly, to the CPP through a single bond and, directly or indirectly, to the nuclease through a single bond (and optionally to the exocyclic peptide through a single bond). Unless stated otherwise specifically in the specification, an alkenylene chain can be optionally substituted.
  • Alkynyl refers to a straight or branched hydrocarbon chain having from two to twelve carbon atoms and having one or more carbon-carbon triple bonds. Each alkynyl group is attached to the rest of the molecule by a single bond. Alkynyl groups comprising any number of carbon atoms from 2 to 12 are included.
  • An alkynyl group comprising up to 12 carbon atoms is a C2-C12 alkynyl
  • an alkynyl comprising up to 10 carbon atoms is a C2-C10 alkynyl
  • an alkynyl group comprising up to 6 carbon atoms is a C2-C6 alkynyl
  • an alkynyl comprising up to 5 carbon atoms is a C2-C5 alkynyl.
  • a C2-C5 alkynyl includes C5 alkynyls, C4 alkynyls, C3 alkynyls, and C2 alkynyls.
  • a C2-C6 alkynyl includes all moieties described above for C2-C5 alkynyls but also includes C6 alkynyls.
  • a C2-C10 alkynyl includes all moieties described above for C2-C5 alkynyls and C2-C6 alkynyls, but also includes C7, C8, C9 and C10 alkynyls.
  • a C2-C12 alkynyl includes all the foregoing moieties, but also includes C11 and C12 alkynyls.
  • Non-limiting examples of C2-C12 alkynyl include ethynyl, propynyl, butynyl, pentynyl and the like. Unless stated otherwise specifically in the specification, an alkynyl group can be optionally substituted.
  • Alkynylene refers to a straight or branched divalent or trivalent hydrocarbon chain, having from two to forty carbon atoms, and having one or more carbon-carbon triple bonds.
  • C2-C40 alkynylene include ethynylene, propargylene and the like.
  • the alkynylene chain is attached, directly or indirectly, to the CPP through a single bond and, directly or indirectly, to the nuclease through a single bond (and optionally to the exocyclic peptide through a single bond). Unless stated otherwise specifically in the specification, an alkynylene chain can be optionally substituted.
  • Carbocyclyl refers to a ring structure, wherein the atoms which form the ring are each carbon, and which is attached to the rest of the molecule by a single bond. Carbocyclic rings can comprise from 3 to 20 carbon atoms in the ring.
  • the carbocyclyl can be a monocyclic, bicyclic, tricyclic or tetracyclic ring system, which can include fused or bridged ring systems.
  • Carbocyclic rings include aryls and cycloalkyl, cycloalkenyl, and cycloalkynyl as defined herein.
  • a carbocyclyl group can be optionally substituted.
  • a heterocyclyl group can be optionally substituted.
  • Cycloalkyl refers to a stable non-aromatic monocyclic or polycyclic fully saturated hydrocarbon having from 3 to 40 carbon atoms and at least one ring, wherein the ring consists solely of carbon and hydrogen atoms, which can include fused or bridged ring systems.
  • Monocyclic cycloalkyls include, for example, cyclopropyl, cyclobutyl, cyclopentyl, cyclohexyl, cycloheptyl, and cyclooctyl.
  • Polycyclic cycloalkyls include, for example, adamantyl, norbornyl, decalinyl, 7,7-dimethyl-bicyclo[2.2.1]heptanyl, and the like.
  • the cycloalkyl is divalent and is attached, directly or indirectly, to the CPP through a single bond and, directly or indirectly, to the nuclease through a single bond (and optionally to the exocyclic peptide through a single bond).
  • a cycloalkyl group can be optionally substituted.
  • Cycloalkenyl refers to a stable non-aromatic monocyclic or polycyclic hydrocarbon having from 3 to 40 carbon atoms, at least one ring, and one or more carbon-carbon double bonds, wherein the ring consists solely of carbon and hydrogen atoms, which can include fused or bridged ring systems.
  • Monocyclic cycloalkenyls include, for example, cyclopentenyl, cyclohexenyl, cycloheptenyl, cyclooctenyl, and the like.
  • Polycyclic cycloalkenyl radicals include, for example, bicyclo[2.2.1]hept-2-enyl and the like.
  • cycloalkenyl is divalent and is attached, directly or indirectly, to the CPP through a single bond and, directly or indirectly, to the nuclease through a single bond (and optionally to the exocyclic peptide through a single bond). Unless otherwise stated specifically in the specification, a cycloalkenyl group can be optionally substituted.
  • Cycloalkynyl refers to a stable non-aromatic monocyclic or polycyclic hydrocarbon having from 3 to 40 carbon atoms, at least one ring, and one or more carbon-carbon triple bonds, wherein the ring consists solely of carbon and hydrogen atoms, which can include fused or bridged ring systems.
  • Monocyclic cycloalkynyls include, for example, cycloheptynyl, cyclooctynyl, and the like.
  • cycloalkynyl is attached, directly or indirectly, to the CPP through a single bond and, directly or indirectly, to the nuclease through a single bond (and optionally to the exocyclic peptide through a single bond). Unless otherwise stated specifically in the specification, a cycloalkynyl group can be optionally substituted.
  • Aryl refers to a hydrocarbon ring system comprising hydrogen, 6 to 40 carbon atoms and at least one aromatic ring.
  • the aryl can be a monocyclic, bicyclic, tricyclic or tetracyclic ring system, which can include fused or bridged ring systems.
  • Aryls include, but are not limited to, aryl divalent radicals derived from aceanthrylene, acenaphthylene, acephenanthrylene, anthracene, azulene, benzene, chrysene, fluoranthene, fluorene, as-indacene, s-indacene, indane, indene, naphthalene, phenalene, phenanthrene, pleiadene, pyrene, and triphenylene.
  • an aryl group can be optionally substituted.
  • Heterocyclyl refers to a stable 3- to 22-membered ring system which includes two to fourteen carbon atoms and from one to eight heteroatoms selected from nitrogen, oxygen and sulfur. Heterocyclyl or heterocyclic rings include heteroaryls as defined below. Unless stated otherwise specifically in the specification, the heterocyclyl can be a monocyclic, bicyclic, tricyclic or tetracyclic ring system, which can include fused or bridged ring systems; and the nitrogen, carbon or sulfur atoms in the heterocyclyl can be optionally oxidized; the nitrogen atom can be optionally quaternized; and the heterocyclyl can be partially or fully saturated.
  • heterocyclyl radicals include, but are not limited to, dioxolanyl, thienyl[1,3]dithianyl, decahydroisoquinolyl, imidazolinyl, imidazolidinyl, isothiazolidinyl, isoxazolidinyl, morpholinyl, octahydroindolyl, octahydroisoindolyl, 2-oxopiperazinyl, 2-oxopiperidinyl, 2-oxopyrrolidinyl, oxazolidinyl, piperidinyl, piperazinyl, 4-piperidonyl, pyrrolidinyl, succinimidyl, pyrazolidinyl, quinuclidinyl, thiazolidinyl, tetrahydrofuryl, trithianyl, tetrahydropyranyl, thiomorpholinyl, thiamorpholinyl, 1-oxo-thiomorpholinyl, and 1,1-dioxo-thiomorpholinyl.
  • the heterocyclyl is divalent and is attached, directly or indirectly, to the CPP through a single bond and, directly or indirectly, to the nuclease through a single bond (and optionally to the exocyclic peptide through a single bond).
  • a heterocyclyl group can be optionally substituted.
  • Heteroaryl refers to a 5- to 22-membered aromatic ring comprising hydrogen atoms, one to fourteen carbon atoms, one to eight heteroatoms selected from nitrogen, oxygen and sulfur, and at least one aromatic ring.
  • the heteroaryl can be a monocyclic, bicyclic, tricyclic or tetracyclic ring system, which can include fused or bridged ring systems; and the nitrogen, carbon or sulfur atoms in the heteroaryl can be optionally oxidized; the nitrogen atom can be optionally quaternized.
  • Examples include, but are not limited to, azepinyl, acridinyl, benzimidazolyl, benzothiazolyl, benzindolyl, benzodioxolyl, benzofuranyl, benzooxazolyl, benzothiazolyl, benzothiadiazolyl, benzo[b][1,4]dioxepinyl, 1,4-benzodioxanyl, benzonaphthofuranyl, benzoxazolyl, benzodioxolyl, benzodioxinyl, benzopyranyl, benzopyranonyl, benzofuranyl, benzofuranonyl, benzothienyl (benzothiophenyl), benzotriazolyl, benzo[4,6]imidazo[1,2-a]pyridinyl, carbazolyl, cinnolinyl, dibenzofuranyl, dibenzothiophenyl
  • the heteroaryl is divalent and is attached, directly or indirectly, to the CPP through a single bond and, directly or indirectly, to the nuclease through a single bond (and optionally to the exocyclic peptide through a single bond). Unless stated otherwise specifically in the specification, a heteroaryl group can be optionally substituted.
  • ether refers to a divalent moiety having a formula -[(R1)m-O-(R2)n]z- wherein each of m, n, and z is independently an integer from 1 to 40, and R1 and R2 are independently an alkylene. Examples include polyethylene glycol.
  • the ether is attached, directly or indirectly, to the CPP through a single bond and, directly or indirectly, to the nuclease through a single bond (and optionally to the exocyclic peptide through a single
  • substituted means any of the above groups (i.e., alkylene, alkenylene, alkynylene, aryl, carbocyclyl, cycloalkyl, cycloalkenyl, cycloalkynyl, heterocyclyl, heteroaryl, and/or ether) wherein at least one hydrogen atom is replaced by a bond to a non-hydrogen atom such as, but not limited to: a deuterium atom; a halogen atom such as F, Cl, Br, and I; an oxygen atom in groups such as hydroxyl groups, alkoxy groups, and ester groups; a sulfur atom in groups such as thiol groups, thioalkyl groups, sulfone groups, sulfonyl groups, and sulfoxide groups; a nitrogen atom in groups such as amines, amides, alkylamines, dialkylamines, arylamines,
  • “Substituted” also means any of the above groups in which one or more hydrogen atoms are replaced by a higher-order bond (e.g., a double- or triple-bond) to a heteroatom such as oxygen in oxo, carbonyl, carboxyl, and ester groups; and nitrogen in groups such as imines, oximes, hydrazones, and nitriles.
  • Rg and Rh are the same or different and independently hydrogen, alkyl, alkenyl, alkynyl, alkoxy, alkylamino, thioalkyl, aryl, aralkyl, cycloalkyl, cycloalkenyl, cycloalkynyl, cycloalkylalkyl, haloalkyl, haloalkenyl, haloalkynyl, heterocyclyl, N-heterocyclyl, heterocyclylalkyl, heteroaryl, N-heteroaryl and/or heteroarylalkyl.
  • “Substituted” further means any of the above groups in which one or more hydrogen atoms are replaced by a bond to an amino, cyano, hydroxyl, imino, nitro, oxo, thioxo, halo, alkyl, alkenyl, alkynyl, alkoxy, alkylamino, thioalkyl, aryl, aralkyl, cycloalkyl, cycloalkenyl, cycloalkynyl, cycloalkylalkyl, haloalkyl, haloalkenyl, haloalkynyl, heterocyclyl, N-heterocyclyl, heterocyclylalkyl, heteroaryl, N-heteroaryl and/or heteroarylalkyl group.
  • each of the foregoing substituents can also be optionally substituted with one or more of the above substituents.
  • substituted also encompasses instances in which one or more hydrogen atoms on any of the above groups are replaced by a substituent listed in this paragraph, and the
  • an appropriate amino acid CPP e.g., lysine
  • the resulting bond e.g., amide bond
  • the second position is substituted with a thiol group which forms a disulfide bond with a -SH group on the nuclease.
  • the resulting disulfide is encompassed by the term substituent.
  • a point of attachment bond denotes a bond that is a point of attachment between two chemical entities, one of which is depicted as being attached to the point of attachment bond and the other of which is not depicted as being attached to the point of attachment bond.
  • the present disclosure provides constructs that include at least one cell penetrating peptide (CPP) sequence and at least one component of a CRISPR-Cas gene editing system.
  • the construct includes one or more (1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more) components of a CRISPR-Cas gene editing system.
  • the construct includes one or more (1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more) cell penetrating peptides (CPPs).
  • the present disclosure provides nucleases comprising at least one cell penetrating peptide (CPP) sequence.
  • a nuclease is provided in which one or more (1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more) CPP may be inserted into the nuclease.
  • constructs are provided in which one or more (1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more) CPP may be conjugated to a nuclease. In embodiments, constructs are provided in which one or more (1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more) CPP may be conjugated to a gRNA. In embodiments, constructs are provided in which one or more (1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more) CPP may be conjugated to a CRISPR-Cas ribonucleoprotein complex (RNP).
  • the CPP is inserted at any suitable location in the nuclease, such as the N- or C-terminus, or between the N- and C-terminus.
  • the nuclease comprises at least one loop region, and the CPP is inserted in the at least one loop region.
  • the nuclease can contain any number of loops and any number of CPP sequences.
  • suitable loops for CPP insertion are those in which CPP insertion does not abolish the desired activity of the protein. Methods for determining the impact of CPP insertion on protein activity are known in the art (see, for example, the methods described herein).
  • the nuclease comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more loops, and 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 CPP sequences are inserted into the loop(s).
  • the CPP is inserted into at least about 10%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90% and up to about 50%, up to about 60%, up to about 70%, up to about 80%, up to about 90% or up to about 100% of the loop regions in the nuclease.
  • constructs are provided that include one or more (1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more) nucleases.
  • constructs are provided that include one or more (1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more) gRNA.
  • constructs are provided that include one or more (1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more) CRISPR-Cas ribonucleoprotein complexes (RNP).
  • one or more (1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more) CPP may be conjugated to a nuclease, a guide RNA (gRNA), a ribonucleoprotein complex (RNP) or a combination thereof.
  • Conjugation of the CPP to the nuclease, gRNA or RNP may occur at any suitable location, such as the N- or C-terminus, or a side chain of an amino acid in the nuclease with a suitable functional group, at the 5’ or 3’ end of the gRNA, or on the backbone of the gRNA.
  • the CPP may be cyclic.
  • the CPP may be or include any amino acid sequence which facilitates cellular uptake of the modified looped proteins (e.g., nucleases) disclosed herein.
  • Suitable CPPs can include naturally occurring sequences, modified sequences, and synthetic sequences, and linear or cyclic sequences, which facilitate uptake of a looped nuclease.
  • Non-limiting examples of linear CPPs include Polyarginine (e.g., R9 or R11), Antennapedia sequences, HIV-TAT, Penetratin, Antp-3A (Antp mutant), Buforin II, MAP (model amphipathic peptide), K-FGF, Ku70, Prion, pVEC, Pep-1, SynB1, Pep-7, HN-1, BGSC (Bis-Guanidinium-Spermidine-Cholesterol), and BGTC (Bis-Guanidinium-Tren-Cholesterol).
  • the total number of amino acids in the CPP may be in the range of from about 4 to about 20 amino acids, e.g., about 4, about 5, about 6, about 7, about 8, about 9, about 10, about 11, about 12, about 13, about 14, about 15, about 16, about 17, about 18, about 19 or about 20 amino acids, inclusive of all ranges and subranges therebetween.
  • the CPPs disclosed herein comprise about 4 to about 13 amino acids.
  • the CPPs disclosed herein comprise about 6 to about 10 amino acids, or about 6 to about 8 amino acids.
  • Each amino acid in the CPP may be a natural or non-natural amino acid.
  • non-natural amino acid refers to an organic compound that is a congener of a natural amino acid in that it has an amine (-NH2) group on one end and a carboxylic acid (-COOH) group on the other end but the side chain or backbone is modified.
  • the resulting moiety has a structure and reactivity that is similar but not identical to a natural amino acid.
  • Non-limiting examples of such modifications include elongation or truncation of the side chain by one or more methylene groups, replacing one atom with another, and increasing the size of an aromatic ring.
  • the non-natural amino acid can be a modified amino acid, and/or amino acid analog, that is not one of the 20 common naturally occurring amino acids or the rare natural amino acids selenocysteine or pyrrolysine.
  • an analog of arginine may have one more or one fewer methylene groups on the side chain.
  • Non-natural amino acids can also be the D-isomer of the natural amino acids.
  • suitable amino acids include, but are not limited to, alanine, alloisoleucine, arginine, asparagine, aspartic acid, cysteine, glutamine, glutamic acid, glycine, histidine, isoleucine, leucine, lysine, methionine, naphthylalanine, phenylalanine, proline, pyroglutamic acid, serine, threonine, tryptophan, tyrosine, valine, a derivative, or combinations thereof.
  • the CPP comprises at least 2 arginine residues, or analogs thereof, e.g., 2, 3, 4, 5, 6, 7, 8, 9, or 10. In embodiments, the CPP comprises 2-6 arginine residues, or analogs thereof. In embodiments, the CPP comprises at least 3 arginine residues, or analogs thereof, e.g., 3, 4, 5, 6, 7, 8, 9, or 10. In embodiments, the CPP comprises from 3-6 arginine residues, or analogs thereof. In embodiments, the CPP comprises at least 2 arginine residues, e.g., 3, 4, 5, 6, 7, 8, 9, or 10. In embodiments, the CPP comprises 3-6 arginine residues, e.g., 3, 4, 5, 6, 7, 8, 9, or 10.
  • the CPP comprises at least one amino acid residue with a hydrophobic side chain, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 amino acid residues with a hydrophobic side chain. In embodiments, the CPP comprises from 1-6 amino acid residues with a hydrophobic side chain.
  • each hydrophobic amino acid (also referred to herein as an amino acid having a hydrophobic side chain) independently has a hydrophobicity value which is greater than that of glycine.
  • each hydrophobic amino acid independently has a hydrophobicity value which is greater than that of alanine.
  • each hydrophobic amino acid independently has a hydrophobicity value which is greater than or equal to that of phenylalanine. Hydrophobicity may be measured using hydrophobicity scales known in the art.
  • the CPP sequence comprises 4, 5, 6, 7, 8, 9, 10, or more amino acid residues.
  • the CPP sequence comprises from 1-6 D-amino acids.
  • the chirality of the amino acids can be selected to improve cytosolic uptake efficiency.
  • at least two of the amino acids have the opposite chirality.
  • the at least two amino acids having the opposite chirality can be adjacent to each other.
  • at least three amino acids have alternating stereochemistry relative to each other.
  • the at least three amino acids having the alternating chirality relative to each other can be adjacent to each other.
  • at least two of the amino acids have the same chirality.
  • the at least two amino acids having the same chirality can be adjacent to each other. In embodiments, at least two amino acids have the same chirality and at least two amino acids have the opposite chirality. In embodiments, the at least two amino acids having the opposite chirality can be adjacent to the at least two amino acids having the same chirality. Accordingly, in embodiments, adjacent amino acids in the CPP can have any of the following sequences: D-L; L-D; D-L-L-D; L-D-D-L; L-D-L-L-D; D-L-D-D-L; D-L-L-D-L; or L-D-D-L-D.
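The listed stereochemical arrangements can be checked mechanically. The following is a minimal sketch that reports which of the listed D/L patterns occur as a contiguous run in a CPP's chirality string. The function name `matching_patterns` and the input convention (one `D` or `L` per residue, hyphens optional) are assumptions made for illustration; achiral residues such as glycine are not handled.

```python
# The eight adjacent-residue patterns listed in the text, without hyphens.
LISTED_PATTERNS = ("DL", "LD", "DLLD", "LDDL", "LDLLD", "DLDDL", "DLLDL", "LDDLD")


def matching_patterns(chirality: str) -> list[str]:
    """Return the listed patterns found as contiguous runs in a D/L string."""
    normalized = chirality.upper().replace("-", "")
    if set(normalized) - {"D", "L"}:
        raise ValueError("chirality string may contain only D, L and hyphens")
    return [pattern for pattern in LISTED_PATTERNS if pattern in normalized]
```

For the hypothetical input `L-D-D-L`, the function returns `["DL", "LD", "LDDL"]`; an all-L peptide returns an empty list.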
  • the hydrophobic amino acid residue comprises an aryl or heteroaryl group, each of which is optionally substituted.
  • the hydrophobic amino acid residue comprises an alkyl, alkenyl, or alkynyl side chain, each of which is optionally substituted.
  • each amino acid residue comprising a hydrophobic side chain is independently selected from a residue of glycine, alanine, valine, leucine, isoleucine, methionine, phenylalanine, tryptophan, proline, naphthylalanine, phenylglycine, homophenylalanine, tyrosine, cyclohexylalanine, piperidine-2-carboxylic acid, cyclohexylalanine, norleucine, 3-(3-benzothienyl)-alanine, 3-(2-quinolyl)-alanine, O-benzylserine, 3-(4-(benzyloxy)phenyl)-alanine, S-(4-methylbenzyl)cysteine, N-(naphthalen-2-yl)glutamine, 3-(1,1'-biphenyl-4-yl)-alanine, tert-leucine, or nicotinoyl lysine, each of which is optionally substituted with one or more substituents.
  • At least one of the amino acid residues comprising a hydrophobic side chain is a residue of tryptophan or phenylalanine. In embodiments, at least one of the amino acid residues comprising a hydrophobic side chain is a tryptophan residue. In embodiments, at least one of the amino acid residues comprising a hydrophobic side chain is a phenylalanine residue.
  • each of the at least one of the amino acid residues comprising a hydrophobic side chain is a tryptophan residue.
  • each hydrophobic amino acid is independently a hydrophobic aromatic amino acid.
  • the aromatic hydrophobic amino acid is naphthylalanine, 3-(3-benzothienyl)-alanine, phenylglycine, homophenylalanine, phenylalanine, tryptophan, or tyrosine, each of which is optionally substituted with one or more substituents.
  • each hydrophobic amino acid is tryptophan or phenylalanine. In embodiments, each hydrophobic amino acid is tryptophan. In embodiments, the hydrophobic amino acid is tryptophan when the CPP sequence is inserted into the nuclease. In embodiments, each hydrophobic amino acid is phenylalanine.
  • the aromatic hydrophobic amino acid is:
  • the optional substituent can be any atom or group which does not significantly reduce (e.g., by more than 50%) the cytosolic delivery efficiency of the CPP, e.g., compared to an otherwise identical sequence which does not have the substituent.
  • the optional substituent can be a hydrophobic substituent or a hydrophilic substituent.
  • the optional substituent is a hydrophobic substituent.
  • the substituent increases the solvent-accessible surface area (as defined herein) of the hydrophobic amino acid.
  • the substituent can be a halogen, alkyl, alkenyl, alkynyl, cycloalkyl, cycloalkenyl, cycloalkynyl, heterocyclyl, aryl, heteroaryl, alkoxy, aryloxy, acyl, alkylcarbamoyl, alkylcarboxamidyl, alkoxycarbonyl, alkylthio, or arylthio.
  • the substituent is a halogen.
  • the size of the hydrophobic amino acid may be selected to improve cytosolic delivery efficiency of the CPP. For example, a larger hydrophobic amino acid may improve cytosolic delivery efficiency compared to an otherwise identical sequence having a smaller hydrophobic amino acid.
  • the size of the hydrophobic amino acid can be measured in terms of molecular weight of the hydrophobic amino acid, the steric effects of the hydrophobic amino acid, the solvent-accessible surface area (SASA) of the side chain, or combinations thereof.
  • the size of the hydrophobic amino acid is measured in terms of the molecular weight of the hydrophobic amino acid, and the larger hydrophobic amino acid has a side chain with a molecular weight of at least about 90 g/mol, or at least about 130 g/mol, or at least about 141 g/mol.
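To make the molecular-weight thresholds concrete, the following minimal sketch computes side-chain masses from elemental composition and compares them with a threshold. The side-chain formulas (benzyl for phenylalanine, indol-3-ylmethyl for tryptophan, naphthylmethyl for naphthylalanine) and the standard atomic masses are supplied here as assumptions for illustration and are not taken from the disclosure. The names `side_chain_mass` and `meets_threshold` are hypothetical.

```python
# Standard atomic masses in g/mol (illustrative precision).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007}

# Side-chain elemental compositions, assumed for this example.
SIDE_CHAIN_FORMULA = {
    "phenylalanine": {"C": 7, "H": 7},            # benzyl
    "tryptophan": {"C": 9, "H": 8, "N": 1},       # indol-3-ylmethyl
    "naphthylalanine": {"C": 11, "H": 9},         # naphthylmethyl
}


def side_chain_mass(name: str) -> float:
    """Molecular weight of the named side chain in g/mol."""
    formula = SIDE_CHAIN_FORMULA[name]
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def meets_threshold(name: str, threshold: float) -> bool:
    """True if the side chain weighs at least `threshold` g/mol."""
    return side_chain_mass(name) >= threshold
```

Under these assumptions the three side chains come out at roughly 91, 130 and 141 g/mol, which line up with the three thresholds recited above.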
  • the size of the amino acid is measured in terms of the SASA of the hydrophobic side chain, and the larger hydrophobic amino acid has a side chain with a SASA greater than alanine, or greater than glycine.
  • the hydrophobic amino acid(s) have a hydrophobic side chain with a SASA greater than or equal to about piperidine-2-carboxylic acid, greater than or equal to about tryptophan, greater than or equal to about phenylalanine, or equal to or greater than about naphthylalanine.
  • the hydrophobic amino acid(s) have a side chain with a SASA of at least about 200 Å², at least about 210 Å², at least about 220 Å², at least about 240 Å², at least about 250 Å², at least about 260 Å², at least about 270 Å², at least about 280 Å², at least about 290 Å², at least about 300 Å², at least about 310 Å², at least about 320 Å², at least about 330 Å², at least about 350 Å², at least about 360 Å², at least about 370 Å², at least about 380 Å², at least about 390 Å², at least about 400 Å², at least about 410 Å², at least about 420 Å², at least about 430 Å², at least about 440 Å², at least about 450 Å², at least about 460 Å², at least about 470 Å², at least
  • hydrophobic surface area refers to the surface area (reported as square Angstroms; Å²) of an amino acid side chain that is accessible to a solvent.
  • SASA is calculated using the 'rolling ball' algorithm developed by Shrake & Rupley (J Mol Biol. 79 (2): 351-71), which is herein incorporated by reference in its entirety for all purposes. This algorithm uses a “sphere” of solvent of a particular radius to probe the surface of the molecule. A typical value of the sphere is 1.4 Å, which approximates to the radius of a water molecule.
  • SASA values for certain side chains are shown below in Table C.
  • the SASA values described herein are based on the theoretical values listed in Table C below, as reported by Tien et al. (PLOS ONE 8(11): e80635).
  • the CPPs described herein comprise at least two or at least three arginine residues. In embodiments, the CPPs described herein comprise at least one, two, or three amino acid residues independently having a hydrophobic side chain. In embodiments, the CPP sequence described herein comprises at least three arginine residues and at least three tryptophan residues.
  • the at least three arginines and the at least three amino acids having a hydrophobic side chain together constitute a CPP and may be inserted into one loop.
  • a CPP may be inserted into more than one looped region.
  • a CPP with at least three arginines is inserted into a first loop.
  • the at least three arginines are considered a CPP.
  • the at least three amino acids with a hydrophobic side chain are inserted into a second loop.
  • the at least three hydrophobic amino acids are considered a CPP.
  • the CPPs may include any combination of at least three arginines and at least one, two, or three hydrophobic amino acids described herein.
  • the CPPs described herein comprise at least three arginines and at least three hydrophobic amino acids described herein.
  • the CPPs described herein comprise at least three arginines and at least four hydrophobic amino acid residues described herein.
  • the CPPs described herein comprise at least four arginines and at least three hydrophobic amino acids described herein.
  • the CPPs described herein comprise at least four arginines and at least four hydrophobic amino acids described herein.
  • an arginine is adjacent to a hydrophobic amino acid.
  • the arginine residue has the same chirality as the hydrophobic amino acid residue.
  • at least two arginine residues are adjacent to each other.
  • three arginine residues are adjacent to each other.
  • at least two hydrophobic amino acid residues are adjacent to each other.
  • at least three hydrophobic amino acid residues are adjacent to each other.
  • the CPPs described herein comprise at least two consecutive hydrophobic amino acid residues and at least two consecutive arginine residues.
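
The counting and adjacency criteria listed above can be checked mechanically. The following is a minimal sketch, assuming one-letter residue codes in which a lowercase letter denotes a D-amino acid; the set of residues treated as hydrophobic, the function names, and the example sequence are illustrative assumptions, not definitions from this disclosure.

```python
# Assumption for illustration only: one-letter codes treated as hydrophobic.
HYDROPHOBIC = set("FWYLIVM")


def longest_run(sequence, predicate):
    """Length of the longest stretch of consecutive residues satisfying predicate."""
    best = current = 0
    for residue in sequence:
        current = current + 1 if predicate(residue) else 0
        best = max(best, current)
    return best


def cpp_motif_summary(sequence):
    """Count arginines and hydrophobic residues, ignoring chirality (letter case)."""
    def is_arg(residue):
        return residue.upper() == "R"

    def is_hydrophobic(residue):
        return residue.upper() in HYDROPHOBIC

    return {
        "arginines": sum(1 for r in sequence if is_arg(r)),
        "hydrophobic": sum(1 for r in sequence if is_hydrophobic(r)),
        "max_consecutive_arginines": longest_run(sequence, is_arg),
        "max_consecutive_hydrophobic": longest_run(sequence, is_hydrophobic),
    }


# Hypothetical example sequence, not taken from Table D.
summary = cpp_motif_summary("FfWRrRrQ")
```

The sketch treats the sequence as linear; for a cyclized peptide, a run that wraps around from the last residue to the first would need separate handling.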
  • the CPPs described herein comprise at least two hydrophobic amino acids and at least two arginine residues. In embodiments, one hydrophobic amino acid is adjacent to one of the arginines. In embodiments, the CPPs described herein comprise at least three consecutive hydrophobic amino acid residues and three consecutive arginine residues. In embodiments, one hydrophobic amino acid is adjacent to one of the arginines. These various combinations of amino acids can have any arrangement of D and L amino acids.
  • the CPPs described herein comprise at least two hydrophobic amino acid residues and at least two arginine residues, wherein the at least two hydrophobic residues are separated from each other by an intervening amino acid and the at least two arginine residues are separated by an intervening amino acid.
  • the hydrophobic residues have the same chirality.
  • the arginine residues have the same chirality.
  • the intervening amino acid does not have the same chirality as the arginine residues.
  • the CPP may comprise a residue of lysine, glutamine, glutamic acid, asparagine, aspartic acid, or an amino acid comprising an -SH group.
  • the CPP may be or include any of the sequences listed in Table D.
  • the CPPs used in the modified loop nucleases and/or conjugated to the nucleases as disclosed herein may comprise any one of the sequences in Table D or comprise any one of the sequences listed in Table D, along with additional amino acids (e.g., lysine, glutamine, glutamic acid, asparagine, aspartic acid).
  • the sequences in Table D may be cyclized by forming a peptide bond between the terminal amino acids.
  • one or more amino acids may be added to the sequences below to mediate cyclization and/or conjugation to nucleases.
  • such amino acids include, but are not limited to, amino acids that include a thiol (-SH) group, such as cysteine, or amino acids with functional groups that may be used to conjugate the CPP to the nuclease (e.g., through a linker), such as lysine, glutamine, glutamic acid, asparagine, or aspartic acid.
  • phosphothreonine; Pip, L-piperidine-2-carboxylic acid; Cha, L-3-cyclohexyl-alanine; Tm, trimesic acid; Dap, L-2,3-diaminopropionic acid; Sar, sarcosine; F2Pmp, L-difluorophosphonomethyl phenylalanine; Dod, dodecanoyl; Pra, L-propargylglycine; AzK, L-6-azido-2-amino-hexanoic acid; Agp, L-2-amino-3-guanidinylpropionic acid.
  • each W may be independently replaced with phenylalanine (F or f) or tyrosine (Y or y).
  • cytosolic delivery efficiency refers to the ability of a construct described herein comprising a CPP to traverse a cell membrane and enter the cytosol.
  • cytosolic delivery efficiency of the construct comprising the CPP is not dependent on a receptor or a cell type. Cytosolic delivery efficiency can refer to absolute cytosolic delivery efficiency or relative cytosolic delivery efficiency.
  • Absolute cytosolic delivery efficiency is the ratio of cytosolic concentration of a construct comprising a CPP over the concentration of the construct comprising the CPP in the growth medium.
  • Relative cytosolic delivery efficiency refers to the concentration of a construct comprising a CPP in the cytosol compared to the concentration of a control construct not comprising a CPP in the cytosol. Quantification can be achieved by fluorescently labeling the protein (e.g., with a FITC dye) and measuring the fluorescence intensity using techniques well-known in the art.
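
The two efficiency measures defined above reduce to simple ratios. The sketch below is illustrative only: the function names are hypothetical, the inputs are assumed to be concentrations in the same units (for example, derived from calibrated fluorescence intensities), and expressing the relative efficiency as a ratio is an assumption, since the text only says the two concentrations are compared.

```python
def absolute_cytosolic_delivery_efficiency(cytosolic_conc, medium_conc):
    """Cytosolic concentration of the CPP construct divided by its concentration in the growth medium."""
    if medium_conc <= 0:
        raise ValueError("medium concentration must be positive")
    return cytosolic_conc / medium_conc


def relative_cytosolic_delivery_efficiency(cpp_cytosolic_conc, control_cytosolic_conc):
    """Cytosolic concentration of the CPP construct relative to a control construct lacking a CPP."""
    if control_cytosolic_conc <= 0:
        raise ValueError("control concentration must be positive")
    return cpp_cytosolic_conc / control_cytosolic_conc
```

With made-up numbers, a construct at 2.0 units in the cytosol and 8.0 units in the medium has an absolute efficiency of 0.25; if a control without a CPP reaches 0.5 units in the cytosol, the relative efficiency is 4.0.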
  • the present disclosure provides modified looped nucleases comprising one or more loop regions, wherein at least one loop region comprises a CPP sequence inserted into the loop.
  • “looped nuclease” refers to a nuclease with a secondary structure comprising one or more looped regions. Loops refer to regions of the protein other than alpha helices and beta-strands. Structurally, loops are generally located in regions where there is a change in direction in the secondary structure. In embodiments, the change in direction can be at least 120 degrees. In embodiments, the change in direction is determined across 200 amino acids or fewer. Loops that have only 4 or 5 amino acid residues which participate in internal hydrogen bonding are referred to as “turns”.
  • Protein loops include beta turns and omega loops. The most common types of loops and turns cause a change in direction of the polypeptide chain, allowing it to fold back on itself to create a more compact structure. Looped regions in nucleases can be determined by means known in the art, such as queries of the Loops in Proteins database (see Michalsky and Preissner, Loops In Proteins (LIP) - a comprehensive loop database for homology modelling. Protein Engineering, Design, and Selection. (2003) 16(12): 979-985), and the online protein fold recognition server Phyre2 (Kelley et al., The Phyre2 Web Portal For Protein Modeling, Prediction And Analysis. Nat. Protoc. 2015, 10 (6), 845-858).
  • Looped regions in nucleases may be annotated within online databases, such as UniProt.
  • the secondary structure of Cas9 from Streptococcus pyogenes serotype M1 (UniProt Accession Number Q99ZW2) (SEQ ID NO: 1) is annotated within the Structure section of UniProt.
  • Cas9 from Streptococcus pyogenes serotype M1 comprises a CPP sequence disclosed herein in one or more of Cas9’s loop regions, which are described in Table E.
  • the amino acid ranges contained in Table E are numbered with respect to SEQ ID NO: 1.
  • a CPP replaces one or more of, or is inserted between one or more of, amino acids 23-25 of Cas9 (SEQ ID NO: 1), for example, amino acid 23, 24, 25, or combinations thereof.
  • a CPP replaces one or more of, or is inserted between one or more of, amino acids 103 - 105 of Cas9 (SEQ ID NO: 1), for example, amino acid 103, 104, 105, or combinations thereof.
  • a CPP replaces one or more of, or is inserted between one or more of, amino acids 117 - 119 of Cas9 (SEQ ID NO: 1), for example, amino acid 117, 118, 119, or combinations thereof.
  • a CPP replaces one or more of, or is inserted between one or more of, amino acids 196 - 198 of Cas9 (SEQ ID NO: 1), for example, amino acid 196, 197, 198, or combinations thereof.
  • a CPP replaces one or more of, or is inserted between one or more of, amino acids 253 - 257 of Cas9 (SEQ ID NO: 1), for example, amino acid 253, 254, 255, 256, 257, or combinations thereof.
  • a CPP replaces one or more of, or is inserted between one or more of, amino acids 300 - 305 of Cas9 (SEQ ID NO: 1), for example, amino acid 300, 301, 302, 303, 304, 305, or combinations thereof.
  • a CPP replaces one or more of, or is inserted between one or more of, amino acids 427 - 429 of Cas9 (SEQ ID NO: 1), for example, amino acid 427, 428, 429, or combinations thereof.
  • a CPP replaces one or more of, or is inserted between one or more of, amino acids 450 - 452 of Cas9 (SEQ ID NO: 1), for example, amino acid 450, 451, 452, or combinations thereof.
  • a CPP replaces one or more of, or is inserted between one or more of, amino acids 475 - 477 of Cas9 (SEQ ID NO: 1), for example, amino acid 475, 476, 477, or combinations thereof.
  • a CPP replaces one or more of, or is inserted between one or more of, amino acids 532 - 534 of Cas9 (SEQ ID NO: 1), for example, amino acid 532, 533, 534, or combinations thereof.
  • a CPP replaces one or more of, or is inserted between one or more of, amino acids 552 - 555 of Cas9 (SEQ ID NO: 1), for example, amino acid 552, 553, 554, 555, or combinations thereof.
  • a CPP replaces one or more of, or is inserted between one or more of, amino acids 568 - 573 of Cas9 (SEQ ID NO: 1), for example, amino acid 568, 569, 570, 571, 572, 573, or combinations thereof.
  • a CPP replaces one or more of, or is inserted between one or more of, amino acids 673 - 675 of Cas9 (SEQ ID NO: 1), for example, amino acid 673, 674, 675, or combinations thereof.
  • a CPP replaces one or more of, or is inserted between one or more of, amino acids 687 - 689 of Cas9 (SEQ ID NO: 1), for example, amino acid 687, 688, 689, or combinations thereof.
  • a CPP replaces one or more of, or is inserted between one or more of, amino acids 751 - 753 of Cas9 (SEQ ID NO: 1), for example, amino acid 751, 752, 753, or combinations thereof.
  • a CPP replaces one or more of, or is inserted between one or more of, amino acids 771 - 774 of Cas9 (SEQ ID NO: 1), for example, amino acid 771, 772, 773, 774, or combinations thereof.
  • a CPP replaces one or more of, or is inserted between one or more of, amino acids 817 - 819 of Cas9 (SEQ ID NO: 1), for example, amino acid 817, 818, 819, or combinations thereof.
  • a CPP replaces one or more of, or is inserted between one or more of, amino acids 844 - 846 of Cas9 (SEQ ID NO: 1), for example, amino acid 844, 845, 846, or combinations thereof.
  • a CPP replaces one or more of, or is inserted between one or more of, amino acids 1053 - 1055 of Cas9 (SEQ ID NO: 1), for example, amino acid 1053, 1054, 1055, or combinations thereof.
  • a CPP replaces one or more of, or is inserted between one or more of, amino acids 1067 - 1069 of Cas9 (SEQ ID NO: 1), for example, amino acid 1067, 1068, 1069, or combinations thereof.
  • a CPP replaces one or more of, or is inserted between one or more of, amino acids 1076 - 1078 of Cas9 (SEQ ID NO: 1), for example, amino acid 1076, 1077, 1078, or combinations thereof.
  • a CPP replaces one or more of, or is inserted between one or more of, amino acids 1152 - 1155 of Cas9 (SEQ ID NO: 1), for example, amino acid 1152, 1153, 1154, 1155, or combinations thereof.
  • a CPP replaces one or more of, or is inserted between one or more of, amino acids 1168 - 1170 of Cas9 (SEQ ID NO: 1), for example, amino acid 1168, 1169, 1170, or combinations thereof.
  • a CPP replaces one or more of, or is inserted between one or more of, amino acids 1262 - 1264 of Cas9 (SEQ ID NO: 1), for example, amino acid 1262, 1263, 1264, or combinations thereof.
  • a CPP replaces one or more of, or is inserted between one or more of, amino acids 1297 - 1299 of Cas9 (SEQ ID NO: 1), for example, amino acid 1297, 1298, 1299, or combinations thereof.
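
The residue ranges enumerated in the embodiments above can be collected into a small lookup structure. The sketch below simply transcribes those ranges (numbered with respect to SEQ ID NO: 1); the names `CAS9_LOOP_RANGES` and `loop_containing` are hypothetical and chosen for illustration.

```python
# Inclusive, 1-based residue ranges transcribed from the embodiments listed above.
CAS9_LOOP_RANGES = [
    (23, 25), (103, 105), (117, 119), (196, 198), (253, 257),
    (300, 305), (427, 429), (450, 452), (475, 477), (532, 534),
    (552, 555), (568, 573), (673, 675), (687, 689), (751, 753),
    (771, 774), (817, 819), (844, 846), (1053, 1055), (1067, 1069),
    (1076, 1078), (1152, 1155), (1168, 1170), (1262, 1264), (1297, 1299),
]


def loop_containing(position):
    """Return the (start, end) range that contains a 1-based position, or None."""
    for start, end in CAS9_LOOP_RANGES:
        if start <= position <= end:
            return (start, end)
    return None
```

For example, `loop_containing(254)` returns `(253, 257)`, while a position outside every listed range returns `None`.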
  • a CPP is inserted immediately after an amino acid within the range 23 - 25 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 23, 24, or 25.
  • a CPP is inserted immediately after an amino acid within the range 103 - 105 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 103, 104, or 105.
  • a CPP is inserted immediately after an amino acid within the range 117 - 119 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 117, 118, or 119.
  • a CPP is inserted immediately after an amino acid within the range 196 - 198 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 196, 197, or 198. In embodiments, a CPP is inserted immediately after an amino acid within the range 253 - 257 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 253, 254, 255, 256, or 257. In embodiments, a CPP is inserted immediately after an amino acid within the range 300 - 305 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 300, 301, 302, 303, 304, or 305.
  • a CPP is inserted immediately after an amino acid within the range 427 - 429 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 427, 428, or 429. In embodiments, a CPP is inserted immediately after an amino acid within the range 450 - 452 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 450, 451, or 452. In embodiments, a CPP is inserted immediately after an amino acid within the range 475 - 477 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 475, 476, or 477.
  • a CPP is inserted immediately after an amino acid within the range 532 - 534 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 532, 533, or 534.
  • a CPP is inserted immediately after an amino acid within the range 552 - 555 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 552, 553, 554, or 555.
  • a CPP is inserted immediately after an amino acid within the range 568 - 573 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 568, 569, 570, 571, 572, or 573.
  • a CPP is inserted immediately after an amino acid within the range 673 - 675 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 673, 674, or 675.
  • a CPP is inserted immediately after an amino acid within the range 687 - 689 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 687, 688, or 689.
  • a CPP is inserted immediately after an amino acid within the range 751 - 753 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 751, 752, or 753.
  • a CPP is inserted immediately after an amino acid within the range 771 - 774 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 771, 772, 773, or 774.
  • a CPP is inserted immediately after an amino acid within the range 817 - 819 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 817, 818, or 819.
  • a CPP is inserted immediately after an amino acid within the range 844 - 846 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 844, 845, or 846.
  • a CPP is inserted immediately after an amino acid within the range 1053 - 1055 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 1053, 1054, or 1055.
  • a CPP is inserted immediately after an amino acid within the range 1067 - 1069 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 1067, 1068, or 1069.
  • a CPP is inserted immediately after an amino acid within the range 1076 - 1078 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 1076, 1077, or 1078.
  • a CPP is inserted immediately after an amino acid within the range 1152 - 1155 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 1152, 1153, 1154, or 1155.
  • a CPP is inserted immediately after an amino acid within the range 1168 - 1170 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 1168, 1169, or 1170.
  • a CPP is inserted immediately after an amino acid within the range 1262 - 1264 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 1262, 1263, or 1264. In embodiments, a CPP is inserted immediately after an amino acid within the range 1297 - 1299 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 1297, 1298, or 1299.
  • a CPP is inserted immediately before an amino acid within the range 23 - 25 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 23, 24, or 25.
  • a CPP is inserted immediately before an amino acid within the range 103 - 105 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 103, 104, or 105.
  • a CPP is inserted immediately before an amino acid within the range 117 - 119 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 117, 118, or 119.
  • a CPP is inserted immediately before an amino acid within the range 196 - 198 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 196, 197, or 198.
  • a CPP is inserted immediately before an amino acid within the range 253 - 257 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 253, 254, 255, 256, or 257.
  • a CPP is inserted immediately before an amino acid within the range 300 - 305 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 300, 301, 302, 303, 304, or 305.
  • a CPP is inserted immediately before an amino acid within the range 427 - 429 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 427, 428, or 429.
  • a CPP is inserted immediately before an amino acid within the range 450 - 452 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 450, 451, or 452.
  • a CPP is inserted immediately before an amino acid within the range 475 - 477 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 475, 476, or 477.
  • a CPP is inserted immediately before an amino acid within the range 532 - 534 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 532, 533, or 534.
  • a CPP is inserted immediately before an amino acid within the range 552 - 555 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 552, 553, 554, or 555.
  • a CPP is inserted immediately before an amino acid within the range 568 - 573 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 568, 569, 570, 571, 572, or 573.
  • a CPP is inserted immediately before an amino acid within the range 673 - 675 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 673, 674, or 675.
  • a CPP is inserted immediately before an amino acid within the range 687 - 689 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 687, 688, or 689.
  • a CPP is inserted immediately before an amino acid within the range 751 - 753 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 751, 752, or 753.
  • a CPP is inserted immediately before an amino acid within the range 771 - 774 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 771, 772, 773, or 774.
  • a CPP is inserted immediately before an amino acid within the range 817 - 819 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 817, 818, or 819.
  • a CPP is inserted immediately before an amino acid within the range 844 - 846 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 844, 845, or 846.
  • a CPP is inserted immediately before an amino acid within the range 1053 - 1055 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 1053, 1054, or 1055.
  • a CPP is inserted immediately before an amino acid within the range 1067 - 1069 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 1067, 1068, or 1069.
  • a CPP is inserted immediately before an amino acid within the range 1076 - 1078 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 1076, 1077, or 1078.
  • a CPP is inserted immediately before an amino acid within the range 1152 - 1155 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 1152, 1153, 1154, or 1155.
  • a CPP is inserted immediately before an amino acid within the range 1168 - 1170 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 1168, 1169, or 1170.
  • a CPP is inserted immediately before an amino acid within the range 1262 - 1264 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 1262, 1263, or 1264.
  • a CPP is inserted immediately before an amino acid within the range 1297 - 1299 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 1297, 1298, or 1299.
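
The three modes described above (replacing loop residues, inserting immediately after a residue, and inserting immediately before a residue) differ only in where the host sequence is split. The following minimal sketch works on plain strings with 1-based residue numbering; the function names and the toy sequences are hypothetical and are not SEQ ID NO: 1 or a sequence from Table D.

```python
def _check_position(sequence, position):
    """Validate a 1-based residue position against the sequence length."""
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside 1..{len(sequence)}")


def insert_after(sequence, position, cpp):
    """Insert cpp immediately after the residue at the 1-based position."""
    _check_position(sequence, position)
    return sequence[:position] + cpp + sequence[position:]


def insert_before(sequence, position, cpp):
    """Insert cpp immediately before the residue at the 1-based position."""
    _check_position(sequence, position)
    return sequence[:position - 1] + cpp + sequence[position - 1:]


def replace_range(sequence, start, end, cpp):
    """Replace residues start..end (1-based, inclusive) with cpp."""
    _check_position(sequence, start)
    _check_position(sequence, end)
    if start > end:
        raise ValueError("start must not exceed end")
    return sequence[:start - 1] + cpp + sequence[end:]


# Toy host sequence and toy CPP used only to show the indexing.
host = "ABCDEFGHIJ"
toy_cpp = "RRR"
```

Note that inserting before residue n gives the same product as inserting after residue n - 1, so the "before" and "after" embodiments overlap for adjacent positions within a loop.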
  • CPP motifs are fused into the loop regions of nucleases, rather than at the N- or C-terminus.
  • insertion of a short CPP peptide into a surface loop or replacement of the original loop sequence with a CPP is expected to constrain the CPP sequence into a “cyclic” like conformation, which is expected to enhance the proteolytic stability of the CPP sequence.
  • the “cyclic” like conformation of a loop-embedded CPP may mimic that of a cyclic CPP and potentially enhance its cellular entry efficiency (cyclic CPPs have greater cytosolic uptake efficiency compared to linear CPPs).
  • Another important consideration is the CPP sequence. CPPs are thought to escape the endosome by binding to the intraluminal membrane and inducing CPP-enriched lipid domains to bud off the endosomal membrane as tiny vesicles, which then disintegrate into amorphous lipid/CPP aggregates inside the cytoplasm (Qian et al., Biochemistry 2016, 55, 2601-2612).
  • Amphipathic CPPs likely facilitate endosomal escape by stabilizing the budding neck structure, which features simultaneous positive and negative membrane curvatures in orthogonal directions (or negative Gaussian curvature), as the hydrophobic group(s) can insert into the membrane to generate positive curvature, while the arginine residues bring the phospholipid head groups together to induce negative curvature (Dougherty et al., Understanding Cell Penetration of Cyclic Peptides. Chem. Rev. 2019, 119, 10241-10287).
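
For readers unfamiliar with the term, negative Gaussian curvature follows directly from the standard definition of Gaussian curvature as the product of the two principal curvatures. The symbols below are introduced here for illustration and do not appear elsewhere in this disclosure: κ₁ and κ₂ are the principal curvatures of the membrane in the two orthogonal directions.

```latex
K = \kappa_1 \kappa_2, \qquad
\kappa_1 > 0,\ \kappa_2 < 0 \;\Longrightarrow\; K < 0
```

A saddle-shaped budding neck, curved positively in one direction and negatively in the orthogonal direction, therefore has K < 0.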
  • cyclo(Phe-phe-Nal-Arg-arg-Arg-arg-Gln) (SEQ ID NO: 126), where phe is D-phenylalanine, Nal is L-naphthylalanine, and arg is D-arginine
  • the modified looped nucleases described herein further comprise a detectable tag.
  • detectable tags include but are not limited to, FLAG tags, poly-histidine tags (e.g. 6xHis (SEQ ID NO: 127)), SNAP tags, Halo tags, cMyc tags, glutathione-S-transferase tags, avidin, enzymes, fluorescent proteins, luminescent proteins, chemiluminescent proteins, bioluminescent proteins, and phosphorescent proteins.
  • the fluorescent protein is selected from blue/UV proteins (such as BFP, TagBFP, mTagBFP2, Azurite, EBFP2, mKalama1, Sirius, Sapphire, and T-Sapphire); cyan proteins (such as CFP, eCFP, Cerulean, SCFP3A, mTurquoise, mTurquoise2, monomeric Midoriishi-Cyan, TagCFP, and mTFP1); green proteins (such as GFP, eGFP, meGFP (A208K mutation), Emerald, Superfolder GFP, Monomeric Azami Green, TagGFP2, mUKG, mWasabi, Clover, and mNeonGreen); yellow proteins (such as YFP, eYFP, Citrine, Venus, SYFP2, and TagYFP); orange proteins (such as Monomeric Kusabira-Orange, mKOκ, mKO2, mOrange, and mOrange2); red proteins (such as RFP, LSS-mKate2, and mBeRFP); photoactivatable proteins (such as PA-GFP, PAmCherry1, and PATagRFP); photoconvertible proteins (such as Kaede (green), Kaede (red), KikGR1 (green), KikGR1 (red), PS-CFP2, PS-CFP2, mEos2 (green), mEos2 (red), mEos3.2 (green), mEos3.2 (red), PSmOrange, and PSmOrange); and photoswitchable proteins (such as Dronpa).
  • the detectable tag can be selected from AmCyan, AsRed, DsRed2, DsRed Express, E2-Crimson, HcRed, ZsGreen, ZsYellow, mCherry, mStrawberry, mOrange, mBanana, mPlum, mRaspberry, tdTomato, DsRed Monomer, and/or AcGFP, all of which are available from Clontech.
  • CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats
  • the compounds disclosed herein comprise one or more CPP (or cCPP) conjugated to one or more components of a CRISPR-Cas gene-editing system.
  • the compounds disclosed herein comprise one or more CPP (or cCPP) conjugated to an expression vector encoding one or more components of a CRISPR-Cas gene-editing system.
  • a linker conjugates the CPP (or cCPP) to the component of the CRISPR-Cas gene-editing system. Any linker described in this disclosure or that is known to a person of skill in the art may be utilized.
  • a compound comprising a CPP conjugated to a nuclease or fragment thereof is provided.
  • a compound comprising a CPP conjugated to a DNA sequence (e.g., an expression vector, for example, an expression vector encoding a nuclease, an RNA, or both)
  • a compound comprising a CPP conjugated to an RNA sequence (e.g., a gRNA)
  • a CPP is conjugated to multiple cargos.
  • a CPP is conjugated to RNA and DNA cargos.
  • a CPP is conjugated to RNA and polypeptide cargos.
  • a CPP is conjugated to DNA and polypeptide cargos.
  • the construct described herein comprises one or more nucleases. In embodiments, one or more CPPs are conjugated to one or more nucleases. In embodiments, the construct described herein comprises an expression vector encoding one or more nucleases. In embodiments, one or more CPPs are conjugated to an expression vector encoding one or more nucleases.
  • nuclease refers to a protein that cleaves a phosphodiester bond connecting nucleotide residues in a nucleic acid molecule.
  • the nuclease is an endonuclease.
  • An endonuclease cleaves a phosphodiester bond within a polynucleotide chain.
  • the endonuclease cuts a double-stranded nucleic acid target site symmetrically, i.e., cutting both strands at the same position so that the ends comprise base-paired nucleotides, also referred to herein as blunt ends.
  • the endonuclease cuts a double-stranded nucleic acid target site asymmetrically, i.e., cutting each strand at a different position so that the ends comprise unpaired nucleotides.
  • Unpaired nucleotides at the end of a double-stranded DNA molecule are also referred to as “overhangs,” e.g., as “5'-overhang” or as “3'-overhang,” depending on whether the unpaired nucleotide(s) form(s) the 5' or the 3' end of the respective DNA strand.
  • Double-stranded DNA molecule ends ending with unpaired nucleotide(s) are also referred to as sticky ends, as they can “stick to” other double-stranded DNA molecule ends comprising complementary unpaired nucleotide(s).
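
The distinction between blunt ends and 5' or 3' overhangs can be expressed in terms of where each strand is cut. The sketch below is illustrative: the function name is hypothetical, and both cut positions are assumed to be given in top-strand coordinates, as the number of base pairs to the left of the cut when the top strand is written 5' to 3'.

```python
def classify_ends(top_cut, bottom_cut):
    """Classify the ends produced by a double-strand cut.

    top_cut and bottom_cut are cut positions in top-strand coordinates
    (number of base pairs to the left of the cut on each strand).
    Returns (end type, number of unpaired nucleotides).
    """
    if top_cut < 0 or bottom_cut < 0:
        raise ValueError("cut positions must be non-negative")
    overhang = abs(top_cut - bottom_cut)
    if overhang == 0:
        return ("blunt", 0)
    # Top strand cut further left: the bottom strand keeps extra bases whose
    # free end is a 5' end, so both fragments carry 5' overhangs.
    if top_cut < bottom_cut:
        return ("5'-overhang", overhang)
    return ("3'-overhang", overhang)


# Commonly cited cut patterns, used here only as examples:
ecori = classify_ends(1, 5)   # G^AATTC on both strands
ecorv = classify_ends(3, 3)   # GAT^ATC on both strands
psti = classify_ends(5, 1)    # CTGCA^G on both strands
```

Complementary overhangs of the same type and length are what allow two sticky ends to anneal to each other.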
  • the endonuclease is a DNA restriction nuclease.
  • a restriction nuclease, such as EcoRI, HindIII, or BamHI, recognizes a palindromic, double-stranded DNA target site of 4 to 10 base pairs in length, and cuts each of the two DNA strands at a specific position within the target site.
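
In this context, “palindromic” means that the target site reads the same 5' to 3' on both strands, i.e., the sequence equals its own reverse complement. A minimal sketch of that check follows; the helper names are hypothetical, and the recognition sequences shown are the commonly cited sites for the three enzymes named above.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def is_palindromic_site(dna):
    """True if the site equals its own reverse complement."""
    site = dna.upper()
    return site == reverse_complement(site)


# Commonly cited recognition sequences for the enzymes named in the text.
RECOGNITION_SITES = {"EcoRI": "GAATTC", "HindIII": "AAGCTT", "BamHI": "GGATCC"}
palindromic = {name: is_palindromic_site(site) for name, site in RECOGNITION_SITES.items()}
```

All three sites pass the check, whereas an arbitrary hexamer such as GAATTA does not.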
  • the nuclease is an exonuclease.
  • An exonuclease cleaves a phosphodiester bond at the end of the polynucleotide chain.
  • a nuclease is a site-specific nuclease, binding and/or cleaving a specific phosphodiester bond within a specific nucleotide sequence, which is also referred to herein as the “recognition sequence,” the “nuclease target site,” or the “target site.”
  • a nuclease is a deoxyribonuclease (also referred to as a DNase or DNA nuclease).
  • a DNA nuclease catalyzes the hydrolytic cleavage of phosphodiester bonds in the DNA backbone.
  • the DNA nuclease is deoxyribonuclease I, deoxyribonuclease II, or micrococcal nuclease.
  • the DNA nuclease is an endonuclease.
  • the DNA nuclease is a Type I nuclease (e.g. restriction enzyme that cleaves away from the recognition site).
  • the DNA nuclease is a Type II nuclease (e.g. restriction enzyme that cleaves within or close to the recognition site).
  • the nuclease is a Type V nuclease.
  • a nuclease is a ribonuclease (also referred to as an RNase or RNA nuclease).
  • an RNA nuclease catalyzes the hydrolytic cleavage of phosphodiester bonds in the RNA backbone.
  • the RNA nuclease is RNaseA, RNaseH, RNase III, RNase L, RNaseP, RNase PhyM, RNase T1, RNase T2, RNase U2, RNase V, RNaseE, PNPase, RNase PH, RNase R, RNase D, RNase T, oligoribonuclease, exoribonuclease I, exoribonuclease II, or RNaseG.
  • the RNA nuclease is a Type VI nuclease.
  • the nuclease is a DNA and RNA nuclease. In embodiments, the DNA and RNA nuclease is a Type III nuclease.
  • the nuclease is a Type II, Type V-A, Type V-B, Type V-C, Type V-U, or Type VI-B nuclease.
  • a nuclease recognizes a single-stranded target site. In embodiments, a nuclease recognizes a double-stranded target site, for example a double-stranded DNA target site.
  • a nuclease comprises a “binding domain” that mediates the interaction of the protein with the nucleic acid substrate. In embodiments, the nuclease specifically binds to a target site. In embodiments, a nuclease comprises a “cleavage domain” that catalyzes the cleavage of the phosphodiester bond within the nucleic acid backbone.
  • a nuclease binds and cleaves a nucleic acid molecule in a monomeric form.
  • a nuclease protein has to dimerize or multimerize in order to cleave a target nucleic acid molecule.
  • Binding domains and cleavage domains of naturally occurring nucleases, as well as modular binding domains and cleavage domains that can be fused to create nucleases binding specific target sites, are well known to those of skill in the art.
  • zinc fingers or transcriptional activator like elements can be used as binding domains to specifically bind a desired target site, and fused or conjugated to a cleavage domain, for example, the cleavage domain of FokI, to create an engineered nuclease cleaving the target site.
  • the nuclease is part of a CRISPR-Cas system.
  • CRISPR-Cas systems fall into two classes: Class 1 systems use a complex of multiple Cas proteins to degrade foreign nucleic acids. In contrast, Class 2 systems use a single large Cas protein for the same purpose.
  • the CRISPR-Cas system is a Class 1 system.
  • the Class 1 system is a Type I system.
  • the Class 1, Type I system is a I-A, I-B, I-
  • the Class 1, Type I system incorporates Cas3, Cas8a, Cas5, Cas8b, Cas8c, Cas10d, Cse1, Cse2, Csy1, Csy2, Csy3, and/or GSU0054 or a variant thereof.
  • the Class 1 system is a Type III system.
  • the Class 1, Type III system is a III-A, III-B, III-C, III-D, III-E, or III-F subtype.
  • the Class 1, Type III system incorporates Cas10, Csm2, Cmr5, Cas10, Csx11, and/or Csx10 or a variant thereof.
  • the Class 1 system is a Type IV system.
  • the Class 1, Type IV system is a IV-A, IV-B, or IV-C subtype.
  • the Class 1, Type IV system incorporates Csf1 or a variant thereof.
  • the CRISPR-Cas system is a Class 2 system.
  • the Class 2 system is a Type II system.
  • the Class 2, Type II system is a II-A, II-B, or II-C subtype.
  • the Class 2, Type II system incorporates Cas9, Csn2, and/or Cas4 or a variant thereof.
  • the Class 2 system is a Type V system.
  • the Class 2, Type V system is a V-A, V-B, V-C, V-D, V-E, V-F, V-G, V-H, V-
  • the Class 2, Type V system incorporates Cas12, Cas12a (Cpf1), Cas12b (C2c1), Cas12c (C2c3), Cas12d (CasY), Cas12e (CasX), Cas12f (Cas14, C2c10), Cas12g, Cas12h, Cas12i, Cas12k (C2c5), C2c4, C2c8, and/or C2c9 or a variant thereof.
  • the Class 2 system is a Type VI system.
  • the Class 2, Type VI system is a VI-A, VI-B, VI-C, or VI-D subtype.
  • the Class 2, Type VI system incorporates Cas13, Cas13a (C2c2), Cas13b, Cas13c, and/or Cas13d or a variant thereof.
  • the CRISPR-Cas system and components are those taught in US 8,697,359, the entire contents of which are incorporated herein by reference.
  • the CRISPR-Cas system employs a nuclease that can cleave a single stranded nucleotide sequence. In embodiments, the CRISPR-Cas system employs a nuclease that can cleave RNA, mRNA, and/or pre-mRNA.
  • the nuclease is a transcription activator-like effector nuclease (TALEN), a meganuclease, or a zinc-finger nuclease.
  • the nuclease is a Cas9, Cas9 variant, Cas12a (Cpf1), Cas12b, Cas12c, Tnp-B like, Cas13a (C2c2), Cas13b, or Cas14 nuclease.
  • the nuclease is a Cas9 nuclease or a Cpf1 nuclease.
  • the nuclease is a TALEN.
  • TALEN Transcriptional Activator-Like Element Nuclease
  • a DNA cleavage domain, for example, a FokI domain.
  • a number of modular assembly schemes for generating engineered TALE constructs have been reported (see, e.g., Zhang, Feng; et al. (February 2011). “Efficient construction of sequence-specific TAL effectors for modulating mammalian transcription”. Nature Biotechnology 29 (2): 149-53; Geißler, R.; Scholze, H.; Hahn, S.; Streubel, J.; Bonas, U.; Behrens, S. E.; Boch, J. (2011), Shiu, Shin-Han. ed. “Transcriptional Activators of Human Genes with Programmable DNA-Specificity”.
  • the nuclease is a zinc finger nuclease.
  • Zinc fingers encompass a wide variety of differing protein structures (see, e.g., Klug A, Rhodes D (1987). “Zinc fingers: a novel protein fold for nucleic acid recognition”. Cold Spring Harb. Symp. Quant. Biol.
  • Zinc fingers can be designed to bind a specific sequence of nucleotides, and zinc finger arrays comprising fusions of a series of zinc fingers can be designed to bind virtually any desired target sequence.
  • Such zinc finger arrays can form a binding domain of a protein, for example, of a nuclease, e.g., if conjugated to a nucleic acid cleavage domain.
  • zinc finger motifs are known to those of skill in the art, including, but not limited to, Cys2His2, Gag knuckle, Treble clef, Zinc ribbon, Zn2/Cys6, and TAZ2 domain-like motifs (see, e.g., Krishna S S, Majumdar I, Grishin N V (January 2003). “Structural classification of zinc fingers: survey and summary”. Nucleic Acids Res. 31 (2): 532-50).
  • the zinc finger array comprises one or more different zinc finger motifs selected from Cys2His2, Gag knuckle, Treble clef, Zinc ribbon, Zn2/Cys6, and TAZ2 domain-like motifs.
  • a single zinc finger motif binds 3 or 4 nucleotides of a nucleic acid molecule.
  • a zinc finger domain comprising 2 zinc finger motifs binds 6-8 nucleotides.
  • a zinc finger domain comprising 3 zinc finger motifs binds 9-12 nucleotides.
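The relationship between motif count and binding-site length stated above is simple arithmetic, and can be sketched as follows. This is a minimal illustration assuming each motif contributes 3 or 4 nucleotides, as stated in the text; the function name `binding_site_range` is hypothetical and not part of the disclosure.

```python
from typing import Tuple


def binding_site_range(n_motifs: int) -> Tuple[int, int]:
    """Return the (minimum, maximum) number of nucleotides bound.

    Assumes each zinc finger motif binds 3 or 4 nucleotides.
    """
    if n_motifs < 1:
        raise ValueError("n_motifs must be at least 1")
    return 3 * n_motifs, 4 * n_motifs


# Print the range for one, two, and three motifs.
for n in (1, 2, 3):
    low, high = binding_site_range(n)
    print(f"{n} motif(s): {low}-{high} nucleotides")
```

For two and three motifs the sketch returns the 6-8 and 9-12 nucleotide ranges given above.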
  • Any suitable protein engineering technique can be employed to alter the DNA-binding specificity of zinc fingers and/or design novel zinc finger fusions to bind virtually any desired target sequence from 3-30 nucleotides in length (see, e.g., Pabo C O, Peisach E, Grant R A (2001). “Design and selection of novel Cys2His2 Zinc finger proteins”.
  • the nuclease is a modified form or variant of a Cas9, Cas12a (Cpf1), Cas12b, Cas12c, Tnp-B like, Cas13a (C2c2), Cas13b, or Cas14 nuclease.
  • the nuclease is a modified form or variant of a TAL nuclease, a meganuclease, or a zinc-finger nuclease.
  • a “modified” or “variant” nuclease is one that is, for example, truncated, fused to another protein (such as another nuclease), catalytically inactivated, etc.
  • the nuclease may have at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to a naturally occurring Cas9, Cas12a (Cpf1), Cas12b, Cas12c, Tnp-B like, Cas13a (C2c2), Cas13b, Cas14 nuclease, or a TALEN, meganuclease, or zinc-finger nuclease.
  • the nuclease is a Cas9 nuclease variant.
  • the nuclease is a naturally-occurring Cas9 variant.
  • the nuclease is an engineered Cas9 variant. In embodiments, the nuclease is an engineered Cas9 variant in which one or more amino acid residues are replaced with a cysteine. In embodiments, the nuclease is a high fidelity Cas9 variant. In embodiments, the nuclease is a Cas9 nuclease derived from S. pyogenes (SpCas9; SEQ ID NO: 1). In embodiments, a nuclease has at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to a Cas9 nuclease derived from S. pyogenes (SpCas9; SEQ ID NO: 1).
  • the nuclease is SpCas9-HF1. In embodiments, the nuclease is an enhanced SpCas9 (eSpCas9). In embodiments, the nuclease is a Cas9 derived from S. aureus (SaCas9; SEQ ID NO: 133). In embodiments, the nuclease has at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to a Cas9 derived from S. aureus (SaCas9; SEQ ID NO: 133).
  • the nuclease is a SaCas9-HF nuclease. In embodiments, the nuclease is a KKHSaCas9 nuclease. In embodiments, the nuclease is a catalytically inactive Cas9. In embodiments, the nuclease is dCas9 (SEQ ID NO: 132). In embodiments, the nuclease has at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to dCas9 (SEQ ID NO: 132). In embodiments, the Cas9 nuclease is from S. thermophilus (stCas9; SEQ ID NO: 134).
  • the nuclease has at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to a Cas9 derived from S. thermophilus (stCas9; SEQ ID NO: 134).
  • the nuclease is derived from N. meningitidis (nmCas9; SEQ ID NO: 135).
  • the nuclease has at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to a Cas9 derived from N. meningitidis (nmCas9; SEQ ID NO: 135).
  • the nuclease is derived from F. novicida (fnCas9; SEQ ID NO: 136). In embodiments, the nuclease has at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to a Cas9 derived from F. novicida (fnCas9; SEQ ID NO: 136). In embodiments, the nuclease is derived from C. jejuni (cjCas9; SEQ ID NO: 137).
  • the nuclease has at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to a Cas9 derived from C. jejuni (cjCas9; SEQ ID NO: 137).
  • the nuclease is derived from S. canis (scCas9; SEQ ID NO: 138).
  • the nuclease has at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to a Cas9 derived from S. canis (scCas9; SEQ ID NO: 138).
  • the nuclease is derived from S. auricularis (SauriCas9; SEQ ID NO: 139). In embodiments, the nuclease has at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to a Cas9 derived from S. auricularis (SauriCas9; SEQ ID NO: 139). In embodiments, the Cas9 variant binds to a protospacer adjacent motif (PAM) including, but not limited to, 5'-NGG-3', 3'-NNGRRT-5', 5'-NNGRRT-3', 5'-NNG-3', or 5'-NNGG-3'.
  • the Cpf1 is a Cpf1 enzyme from Acidaminococcus (species BV3L6, UniProt Accession No. U2UMQ6).
  • the nuclease has at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to a Cpf1 enzyme from Acidaminococcus (species BV3L6, UniProt Accession No. U2UMQ6).
  • the Cpf1 is a Cpf1 enzyme from Lachnospiraceae (species ND2006, UniProt Accession No. A0A182DWE3).
  • the nuclease has at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to a Cpf1 enzyme from Lachnospiraceae.
  • a sequence encoding the nuclease is codon optimized for expression in mammalian cells.
  • the sequence encoding the nuclease is codon optimized for expression in human cells or mouse cells.
  • the nuclease is selected from Table F.
  • a nuclease has at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to a nuclease from Table F.
  • the nuclease recognizes a protospacer adjacent motif (PAM).
  • the PAM is selected from Table F.
  • gRNA Guide RNA
  • the construct described herein comprises guide RNA (also referred to as a gRNA).
  • one or more CPPs are conjugated to one or more guide RNAs.
  • the construct described herein comprises an expression vector encoding one or more gRNAs.
  • one or more CPPs are conjugated to an expression vector encoding one or more gRNAs.
  • the gRNA is a single-molecule guide RNA (sgRNA).
  • a sgRNA comprises a spacer sequence and a scaffold sequence.
  • a spacer sequence is a short nucleic acid sequence used to target a nuclease (e.g., a Cas9 nuclease) to a specific nucleotide region of interest (e.g., a genomic DNA sequence to be cleaved).
  • the spacer may be about 17-24 nucleotides in length, such as about 20 nucleotides in length.
  • the spacer may be about 15, about 16, about 17, about 18, about 19, about 20, about 21, about 22, about 23, about 24, about 25, about 26, about 27, about 28, about 29, or about 30 nucleotides in length. In embodiments, the spacer may be at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, or at least 30 nucleotides in length. In embodiments, the spacer may be 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides in length. In embodiments, the spacer sequence has from about 40% to about 80% GC content.
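The spacer length and GC-content ranges above can be checked programmatically. The following is a minimal sketch, assuming the 15 to 30 nucleotide length range and the 40% to 80% GC range are treated as inclusive bounds (the text says "about", so the exact cutoffs are an assumption); the function names and the example sequence are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def spacer_within_ranges(
    spacer: str,
    min_len: int = 15,
    max_len: int = 30,
    min_gc: float = 40.0,
    max_gc: float = 80.0,
) -> bool:
    """Check a spacer against the length and GC ranges (inclusive bounds assumed)."""
    if not min_len <= len(spacer) <= max_len:
        return False
    return min_gc <= gc_content(spacer) <= max_gc


# Hypothetical 20-nucleotide spacer used only for illustration.
example = "GACGTTACCGGATCAAGCTG"
print(len(example), gc_content(example), spacer_within_ranges(example))
```

The defaults can be narrowed, for example to 17 to 24 nucleotides, by passing different `min_len` and `max_len` values.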
  • the spacer targets a site that immediately precedes a 5’ protospacer adjacent motif (PAM).
  • the PAM sequence may be selected based on the desired nuclease.
  • the PAM sequence may be any one of the PAM sequences shown in Table E, wherein N refers to any nucleic acid, R refers to A or G, Y refers to C or T, W refers to A or T, and V refers to A or C or G.
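The degenerate-base codes defined above (N, R, Y, W, V) can be expanded into a pattern for testing whether a candidate DNA site matches a PAM. This is a sketch under the assumption that sites are given as DNA in the 5' to 3' direction; the names `IUPAC`, `pam_to_regex`, and `matches_pam` are hypothetical, and the PAM strings are taken from the examples elsewhere in this section.

```python
import re

# Degenerate codes as defined in the text: N = any base, R = A or G,
# Y = C or T, W = A or T, V = A, C, or G.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]", "R": "[AG]", "Y": "[CT]", "W": "[AT]", "V": "[ACG]",
}


def pam_to_regex(pam: str) -> re.Pattern:
    """Compile a PAM written with degenerate codes into a regular expression."""
    return re.compile("".join(IUPAC[code] for code in pam.upper()))


def matches_pam(pam: str, site: str) -> bool:
    """Return True if the whole candidate site matches the PAM."""
    return pam_to_regex(pam).fullmatch(site.upper()) is not None


print(matches_pam("NGG", "TGG"))
print(matches_pam("NNGRRT", "ACGAGT"))
print(matches_pam("NNGRRT", "ACGCGT"))
```

A code not listed in `IUPAC` raises a `KeyError`, which makes unsupported symbols visible rather than silently ignored.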
  • a spacer may target a sequence of a mammalian gene, such as a human gene.
  • the spacer may target a mutant gene.
  • the spacer may target a coding sequence.
  • the spacer may target an exonic sequence.
  • the scaffold sequence is the sequence within the sgRNA that is responsible for nuclease (e.g., Cas9) binding.
  • the scaffold sequence does not include the spacer/targeting sequence.
  • the scaffold may be from about 1 to about 130 nucleotides in length, about 1 to about 10, about 10 to about 20, about 20 to about 30, about 30 to about 40, about 40 to about 50, about 50 to about 60, about 60 to about 70, about 70 to about 80, about 80 to about 90, about 90 to about 100, about 100 to about 110, about 110 to about 120, or about 120 to about 130 nucleotides in length.
  • the scaffold may be about 1, about 2, about 3, about 4, about 5, about 6, about 7, about 8, about 9, about 10, about 11, about 12, about 13, about 14, about 15, about 16, about 17, about 18, about 19, about 20, about 21, about 22, about 23, about 24, about 25, about 26, about 27, about 28, about 29, about 30, about 31, about 32, about 33, about 34, about 35, about 36, about 37, about 38, about 39, about 40, about 41, about 42, about 43, about 44, about 45, about 46, about 47, about 48, about 49, about 50, about 51, about 52, about 53, about 54, about 55, about 56, about 57, about 58, about 59, about 60, about 61, about 62, about 63, about 64, about 65, about 66, about 67, about 68, about 69, about 70, about 71, about 72, about 73, about 74, about 75, about 76, about 77, about 78, about 79, about 80, about 81, about 82, about 83, about 84, about 85, about 86, about 87, about 88, about 89, about 90, about 91, about 92, about 93, about 94, about 95, about 96, about 97, about 98, about 99, about 100, about 101, about 102, about 103, about 104, about 105, about 106, about 107, about 108, about 109, about 110, about 111, about 112, about 113, about 114, about 115, about 116, about 117, about 118, about 119, about 120, about 121, about 122, about 123, about 124, or about 125 nucleotides in length.
  • the scaffold may be at least 10, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 110, at least 120, or at least 125 nucleotides in length. In embodiments, the scaffold may be up to 100, up to 110, up to 120 or up to 130 nucleotides in length.
  • the gRNA is a dual-molecule guide RNA, e.g., crRNA and tracrRNA.
  • the gRNA may further comprise a polyA tail.
  • a compound comprising a CPP conjugated to a nucleic acid comprising a gRNA is provided.
  • a compound comprising a CPP conjugated to an expression vector encoding a gRNA is provided.
  • the nucleic acid comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 gRNAs.
  • the expression vector encodes about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 gRNAs.
  • the gRNAs recognize the same target.
  • the gRNAs recognize different targets.
  • the expression vector comprising a gRNA comprises a sequence encoding a promoter, wherein the promoter drives expression of the gRNA.
  • RNP Ribonucleoprotein complex
  • the construct described herein comprises one or more ribonucleoprotein complexes (RNPs).
  • one or more CPPs are conjugated to one or more RNPs.
  • the construct comprising at least one component of a CRISPR-Cas gene editing system and CPP further comprises an exocyclic peptide (EP).
  • the EP may be referred to interchangeably as a modulatory peptide (MP).
  • the EP can include a peptide that has been identified in the art as a “nuclear localization sequence” (NLS).
  • the EP is coupled to the component of the CRISPR-Cas gene editing system.
  • the EP is coupled to the CPP.
  • the EP is coupled to the component of the CRISPR-Cas gene editing system and the CPP. Coupling between the EP, the component of the CRISPR-Cas gene editing system, CPP, or combinations thereof, may be non-covalent or covalent.
  • the EP is attached through a peptide bond to the N-terminus of the CPP.
  • the EP is attached through a peptide bond to the C-terminus of the CPP.
  • the EP is attached to the CPP through a side chain of an amino acid in the CPP.
  • the EP is attached to the CPP through a side chain of a lysine which is conjugated to the side chain of a glutamine or glutamic acid in the CPP.
  • the EP is attached through a peptide bond to the N-terminus of the nuclease. In embodiments, the EP is attached through a peptide bond to the C-terminus of the nuclease. In embodiments, the EP is attached to the CPP through a side chain of an amino acid in the nuclease. In embodiments, the EP is attached to the CPP through a side chain of a lysine which is conjugated to the side chain of a glutamine or glutamic acid in the nuclease. In embodiments, the EP is conjugated to the 5’ or 3’ end of a guide RNA sequence. In embodiments, the EP is coupled to a linker.
  • the EP is coupled to a linker via the C-terminus of an EP and a CPP through a side chain on the CPP and/or EP.
  • an EP may comprise a terminal lysine which is then coupled to a CPP containing a glutamine through an amide bond.
  • the C- or N-terminus may be attached to the linker on the component of the CRISPR-Cas gene editing system.
  • the EP comprises at least one positively charged amino acid residue, e.g., at least one lysine residue and/or at least one arginine residue. In one embodiment, the EP comprises at least two, at least three or at least four or more lysine residues and/or arginine residues.
  • the EP is selected from KK, KR, RR, KKK, KGK, KBK, KBR, KRK, KRR, RKK, RRR, KKKK, KKRK, KRKK, KRRK, RKKR, RRRR, KGKK, KKGK, KKKKK, KKKRK, KBKBK, KKKRKV, PKKKRKV, PGKKRKV, PKGKRKV, PKKGRKV, PKKKGKV, PKKKRGV and PKKKRKG.
  • the EP is selected from KK, KR, RR, KKK, KGK, KBK, KBR, KRK, KRR, RKK, RRR, KKKK, KKRK, KRKK, KRRK, RKKR, RRRR, KGKK, KKGK, KKKKK, KKKRK, KBKBK, KKKRKV, PGKKRKV, PKGKRKV, PKKGRKV, PKKKGKV, PKKKRGV and PKKKRKG.
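The requirement that an EP contain a minimum number of lysine and/or arginine residues can be illustrated with a short count over one-letter sequences such as those listed above. This sketch assumes only K (lysine) and R (arginine) are counted as positively charged, as named in the text; the function names are hypothetical.

```python
# Residues counted as positively charged: lysine (K) and arginine (R).
POSITIVE_RESIDUES = frozenset("KR")


def count_positive(peptide: str) -> int:
    """Count lysine and arginine residues in a one-letter peptide sequence."""
    return sum(1 for residue in peptide.upper() if residue in POSITIVE_RESIDUES)


def has_at_least(peptide: str, minimum: int) -> bool:
    """Return True if the peptide has at least `minimum` lysine/arginine residues."""
    return count_positive(peptide) >= minimum


# A few sequences taken from the list above.
for ep in ("KK", "KGK", "KKKRK", "PKKKRKV"):
    print(ep, count_positive(ep), has_at_least(ep, 2))
```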
  • the EP comprises an amino acid sequence identified in the art as a nuclear localization sequence (NLS).
  • the EP comprises an NLS comprising an amino acid sequence selected from NLSKRPAAIKKAGQAKKKK, PAAKRVKLD, RQRRNELKRSF, RMRKFKNKGKDTAELRRRRVEVSVELR, KAKKDEQILKRRNV, VSRKRPRP, PPKKARED, PQPKKKPL, SALIKKKKKMAP, DRLRR, PKQKKRK, RKLKKKIKKL, REKKKFLKRR, KRKGDEVDGVDEVAKKKSKK and RKCLQAGMNLEARKTKK.
  • the compounds further comprise a linker (L), which conjugates the CPP to the nuclease, guide sequence (e.g., gRNA), or ribonucleoprotein complex (RNP).
  • a linker (L) conjugates the CPP to the 5' or the 3' end of the guide sequence (e.g., gRNA).
  • L may be any appropriate moiety which conjugates the CPP (e.g., as described herein) to a component of the CRISPR-Cas gene editing system.
  • prior to conjugation to the CPP and a component of the CRISPR-Cas gene editing system, the linker has two or more functional groups, each of which is independently capable of forming a covalent bond to the CPP moiety and the component of the CRISPR-Cas gene editing system.
  • the CPP is covalently bound to the N-terminus of the nuclease.
  • the CPP is covalently bound to the C-terminus of the nuclease.
  • the CPP is covalently bound to a side chain of an amino acid in the nuclease.
  • a linker is covalently bound to the 5' end of the guide RNA or the 3' end of the guide RNA.
  • a linker is covalently bound to the 5' end of the guide RNA.
  • L is covalently bound to the 3' end of the guide RNA.
  • a linker is covalently bound to the backbone of the guide RNA.
  • L may be any appropriate moiety which conjugates CPP (e.g., as described herein) to a component of the CRISPR-Cas gene editing system.
  • prior to conjugation to the CPP and the component of the CRISPR-Cas gene editing system, the linker has two or more functional groups, each of which is independently capable of forming a covalent bond to the CPP moiety, the nuclease, or the guide RNA sequence.
  • a linker is covalently bound to a nucleophilic moiety on the nuclease or guide RNA sequence.
  • the nucleophilic moiety is conjugated to the nuclease or guide RNA sequence so that the nuclease can be attached to the CPP through a linker.
  • a linker is covalently bound to a side chain or terminus of an amino acid on the CPP.
  • a linker is covalently bound to the side chain of an amino acid on the CPP. In embodiments, a linker is covalently bound to a side chain or terminus of an amino acid on the nuclease. In embodiments, a linker is covalently bound to the 5’ end, 3’ end, or backbone of the guide RNA sequence.
  • the linker is a bivalent or trivalent C1-C50 saturated or unsaturated, straight or branched alkyl, wherein 1-25 methylene groups are optionally and independently replaced by -N(H)-, -N(C1-C4 alkyl)-, -N(cycloalkyl)-, -O-, -C(O)-, -C(O)O-, -S-, -S(O)-, -S(O)2-, -S(O)2N(C1-C4 alkyl)-, -S(O)2N(cycloalkyl)-, -N(H)C(O)-, -N(C1-C4 alkyl)C(O)-, -N(cycloalkyl)C(O)-, -C(O)N(H)-, -C(O)N(C1-C4 alkyl), -C(O)N(cycloalkyl), -C(O
  • a linker comprises (i) one or more D or L amino acids, each of which is optionally substituted; (ii) alkylene, alkenylene, alkynylene, carbocyclyl, or heterocyclyl, each of which is optionally substituted; (iii) -(J-R1)z, wherein each R1 is independently alkylene, alkenylene, alkynylene, carbocyclyl, or heterocyclyl, each J is independently NR3, -NR3C(O)-, S, or O, wherein R3 is H, alkyl, alkenyl, alkynyl, carbocyclyl, or heterocyclyl, each of which is optionally substituted, and z is an integer from 1 to 50; (iv) -(J-R2)x wherein each R2 is independently alkylene, alkenylene, alkynylene, carbocyclyl, or heterocyclyl, each J is independently NR3
  • a linker comprises (i) β-alanine and lysine residues; (ii) -(J-R1)z; (iii) -(J-R2)x; or (iv) combinations thereof.
  • each R1 and R2 is independently alkylene, alkenylene, alkynylene, carbocyclyl, or heterocyclyl
  • each J is independently NR3, -NR3C(O)-, S, or O, wherein R3 is H, alkyl, alkenyl, alkynyl, carbocyclyl, or heterocyclyl, each of which is optionally substituted, and x and z are independently an integer from 1 to 50.
  • each R1 and R2 is independently alkylene and each J is O.
  • the linker has the structure:
  • each AA is independently an amino acid; AAsc is an amino acid side chain; x is an integer from 1 to 10; y is an integer from 1 to 5; and z is an integer from 1 to 10.
  • a linker has the following structure: wherein: x is an integer from 2 to 20; y is an integer from 1 to 5; and z is an integer from 2 to 20; wherein AAsc is a side chain of an amino acid residue of the CPP; and wherein M is a bonding group.
  • z is an integer from 5 to 15. In one embodiment, z is 11. In embodiments, x is an integer from 1 to 10. In one embodiment, x is 1. In embodiments, the CPP is attached to the nuclease or guide RNA sequence through a linker (“L”). In embodiments, the linker is conjugated to the nuclease or guide RNA sequence through a bonding group (“M”).
  • a linker or M may be covalently bound to the nuclease or guide RNA sequence at any suitable location on the nuclease or guide RNA sequence.
  • L or M is covalently bound to the 3' end of the nuclease or guide RNA sequence or the 5' end of the nuclease or guide RNA sequence.
  • L or M is covalently bound to the backbone of the nuclease or guide RNA sequence.
  • a linker is bound to the side chain of aspartic acid, glutamic acid, glutamine, asparagine, or lysine, or a modified side chain of glutamine or asparagine (e.g., a reduced side chain comprising an amino group), on the CPP.
  • L is bound to the side chain of lysine on the CPP.
  • a linker has the following structure:
  • M is a group that conjugates L to the nuclease or guide RNA sequence
  • AAx is an amino acid; o is an integer from 0 to 10; and p is an integer from 0 to 5.
  • a linker has the following structure: wherein
  • M is a group that conjugates the linker to an oligonucleotide
  • AAx is an amino acid; o is an integer from 0 to 10; and p is an integer from 0 to 5.
  • a linker or M may be covalently bound to the nuclease or guide RNA sequence at any suitable location on the nuclease or guide RNA sequence.
  • M is covalently bound to a nucleophilic moiety on the nuclease.
  • the nucleophilic moiety is a nitrogen-containing moiety.
  • M comprises an alkylene, alkenylene, alkynylene, carbocyclyl, or heterocyclyl, each of which is optionally substituted.
  • M is:
  • M is:
  • R1 is alkylene, cycloalkyl, or [structure not reproduced], wherein m is an integer from 0 to 10.
  • M is a heterobifunctional crosslinker, e.g., [structure not reproduced], which is disclosed in Williams et al. Curr. Protoc. Nucleic Acid Chem. 2010, 42, 4.41.1-4.41.20, incorporated herein by reference in its entirety.
  • m is an integer from 0 to 10, e.g., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10. In embodiments, m is an integer from 1 to 5. In embodiments, m is an integer from 1 to 3. In embodiments, m is 1. In embodiments, m is 2. In embodiments, m is 3. In embodiments, m is 4. In embodiments, m is 5.
  • AAs is a side chain or terminus of an amino acid on the CPP.
  • Non-limiting examples of AAs include aspartic acid, glutamic acid, glutamine, asparagine, or lysine, or a modified side chain of glutamine or asparagine (e.g., a reduced side chain comprising an amino group).
  • each AAx is independently a natural or non-natural amino acid.
  • one or more AAx are a natural amino acid. In embodiments, one or more AAx are a non-natural amino acid. In embodiments, one or more AAx are a β-amino acid. In embodiments, the β-amino acid is β-alanine.
  • o is an integer from 0 to 10, e.g., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10. In embodiments, o is 0, 1, 2, or 3. In embodiments, o is 0. In other embodiments, o is 1. In other embodiments, o is 2. In another embodiment, o is 3.
  • p is 0 to 5, e.g., 0, 1, 2, 3, 4, or 5. In embodiments, p is 0. In other embodiments, p is 1. In other embodiments p is 2. In other embodiments, p is 3. In another embodiment, p is 4. In another embodiment, p is 5.
  • a linker has a structure according to: wherein M, AAs, each -(R1-J-R2)z-, and o are defined as described herein; and r is 0 or 1.
  • r is 0. In embodiments, r is 1.
  • each of R1 and R2 is independently selected from alkylene, alkenylene, alkynylene, carbocyclyl, and heterocyclyl, each of which is optionally substituted.
  • each J is independently NR3, -NR3C(O)-, S, or O, and wherein R3 is independently H, alkyl, alkenyl, alkynyl, carbocyclyl, or heterocyclyl, each of which is optionally substituted.
  • a linker has a structure according to: wherein each of M, AAs, o, p, q and r are defined above.
  • q is an integer from 1 to 50, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50, inclusive of all ranges and values therebetween.
  • q is an integer from 5 to 20.
  • q is an integer from 10 to 15.
  • a linker has a structure according to: wherein M, AAs and o are as defined herein.
  • linker groups include:
  • wherein M is a group that conjugates a linker to a nuclease, guide sequence, or exocyclic peptide; and AAs is a side chain or terminus of an amino acid on the CPP.
  • a linker and M comprise the following structure: wherein m is an integer from 0 to 10;
  • NU is a nuclease or guide RNA associated with the nuclease; and AAs is a side chain or terminus of an amino acid on the CPP.
  • the present disclosure provides a compound comprising the following structure: wherein:
  • EP, M, NU (a nuclease or guide RNA associated with the nuclease), x, y, and z are as defined above; and
  • AAsc comprises a side chain of an amino acid residue on the CPP.
  • a precursor to L also contains a thiol (-SH) group, which forms a disulfide bond with the side chain of cysteine or cysteine analog located on the nuclease.
  • the compounds disclosed herein e.g., the compounds for comprise the following structure:
  • the disulfide bond is formed between a thiol group on L, and the side chain of cysteine or an amino acid analog having a thiol group on the nuclease.
  • thiol-containing side chains may be located on native amino acids of the nuclease, or such thiol-containing amino acids may be introduced on the nuclease.
  • amino acid analogs having a thiol group which can be used with the polypeptide conjugates disclosed herein include:
  • L is wherein AAs is a side chain or terminus of an amino acid on the CPP.
  • a disulfide bond is formed between a thiol group on L, and the side chain of cysteine on the nuclease.
  • the cysteine may be a constituent of the nuclease or the nuclease may be modified to include cysteine or an amino acid analog having a thiol group.
  • any suitable functional group of the nuclease may be modified to form a thiol group for bonding to the linker L.
  • nucleic acid molecules comprising a nucleic acid sequence encoding a nuclease, a modified looped nuclease, or a gRNA as described herein.
  • polynucleotide and nucleic acid used interchangeably herein, refer to a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides.
  • this term includes, but is not limited to, single-, double-, or multi-stranded DNA or RNA, genomic DNA, cDNA, DNA-RNA hybrids, or a polymer comprising purine and pyrimidine bases or other natural, chemically or biochemically modified, non-natural, or derivatized nucleotide bases.
  • Oligonucleotide generally refers to polynucleotides of between about 5 and about 100 nucleotides of single- or double-stranded DNA. However, for the purposes of this disclosure, there is no upper limit to the length of an oligonucleotide.
  • Oligonucleotides are also known as “oligomers” or “oligos” and may be isolated from genes, or chemically synthesized by methods known in the art.
  • polynucleotide and nucleic acid should be understood to include, as applicable to the embodiments being described, single-stranded and double-stranded polynucleotides.
  • Terms used to describe sequence relationships between two or more polynucleotides or polypeptides include “reference sequence,” “comparison window,” “sequence identity,” “percentage of sequence identity,” and “substantial identity”.
  • a “reference sequence” is at least 12 but frequently 15 to 18 and often at least 25 monomer units, inclusive of nucleotides and amino acid residues, in length.
  • two polynucleotides may each comprise (1) a sequence (i.e., only a portion of the complete polynucleotide sequence) that is similar between the two polynucleotides, and (2) a sequence that is divergent between the two polynucleotides.
  • sequence comparisons between two (or more) polynucleotides are typically performed by comparing sequences of the two polynucleotides over a “comparison window” to identify and compare local regions of sequence similarity.
  • a “comparison window” refers to a conceptual segment of at least 6 contiguous positions, usually about 50 to about 100, more usually about 100 to about 150, in which a sequence is compared to a reference sequence of the same number of contiguous positions after the two sequences are optimally aligned.
  • the comparison window may comprise additions or deletions (i.e., gaps) of about 20% or less as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences.
  • Optimal alignment of sequences for aligning a comparison window may be conducted by computerized implementations of algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package Release 7.0, Genetics Computer Group, 575 Science Drive, Madison, WI, USA) or by inspection and the best alignment (i.e., resulting in the highest percentage homology over the comparison window) generated by any of the various methods selected.
  • a detailed discussion of sequence analysis can be found in Unit 19.3 of Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons Inc, 1994-1998, Chapter 15.
  • sequence identity or, for example, comprising a “sequence 50% identical to,” as used herein, refer to the extent that sequences are identical on a nucleotide-by-nucleotide basis or an amino acid-by-amino acid basis over a window of comparison.
  • a “percentage of sequence identity” may be calculated by comparing two optimally aligned sequences over the window of comparison, determining the number of positions at which the identical nucleic acid base (e.g., A, T, C, G, I) or the identical amino acid residue (e.g., Ala, Pro, Ser, Thr, Gly, Val, Leu, Ile, Phe, Tyr, Trp, Lys, Arg, His, Asp, Glu, Asn, Gln, Cys and Met) occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison (i.e., the window size), and multiplying the result by 100 to yield the percentage of sequence identity.
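The calculation described above (matched positions divided by window size, multiplied by 100) can be sketched as follows. This is a minimal illustration that assumes the two sequences have already been optimally aligned and padded to equal length with `-` for gaps; it does not perform the alignment itself, and treating a gap column as a non-match is an assumption. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of sequence identity over a pre-aligned comparison window.

    Both inputs must have the same length; '-' marks a gap position.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("comparison window must not be empty")
    # Count positions where both sequences carry the same base or residue.
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    # Divide by the window size and scale to a percentage.
    return 100.0 * matches / len(aligned_a)


print(percent_identity("ACGTACGT", "ACGTTCGT"))
print(percent_identity("ACG-T", "ACGAT"))
```

The same function applies to amino acid sequences written in one-letter code, since it only compares characters position by position.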
  • polynucleotide variant and “variant” and the like refer to polynucleotides displaying substantial sequence identity with a reference polynucleotide sequence or polynucleotides that hybridize with a reference sequence under stringent conditions that are defined hereinafter. These terms include polynucleotides in which one or more nucleotides have been added or deleted, or replaced with different nucleotides compared to a reference polynucleotide. In this regard, it is well understood in the art that certain alterations inclusive of mutations, additions, deletions, and substitutions can be made to a reference polynucleotide whereby the altered polynucleotide retains the biological function or activity of the reference polynucleotide.
  • polynucleotides or variants have at least or about 50%, 55%, 60%, 65%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to a reference sequence.
  • polynucleotides contemplated herein may be combined with other DNA sequences, such as promoters and/or enhancers, untranslated regions (UTRs), signal sequences, Kozak sequences, polyadenylation signals, additional restriction enzyme sites, multiple cloning sites, internal ribosomal entry sites (IRES), recombinase recognition sites (e.g., LoxP, FRT, and Att sites), termination codons, transcriptional termination signals, and polynucleotides encoding self-cleaving polypeptides, epitope tags, as disclosed elsewhere herein or as known in the art, such that their overall length may vary considerably.
  • polynucleotide fragment of almost any length may be employed, with the total length preferably being limited by the ease of preparation and use in the intended recombinant DNA protocol.
  • Polynucleotides can be prepared, manipulated and/or expressed using any of a variety of well-established techniques known and available in the art.
  • a vector may also comprise a sequence encoding a signal peptide (e.g., for nuclear localization, nucleolar localization, mitochondrial localization), fused to the polynucleotide encoding the modified looped nuclease.
  • a vector may comprise a nuclear localization sequence (e.g., from SV40 or cMyc) fused to the polynucleotide encoding the modified looped nuclease.
  • SV40 PKKKRKV (SEQ ID NO: 128)
  • NLP AVKRPAATKKAGQAKKKKLD (SEQ ID NO: 129)
  • TUS KLKIKRPVK (SEQ ID NO: 130)
  • EGL-13 MSRRRKANPTKLSENAKKLAKEVEN (SEQ ID NO: 131)
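As a rough illustration of fusing one of the listed localization sequences to a protein, the sketch below works at the amino acid level. The four sequences are copied from the list above; the function `fuse_nls`, the optional `linker` argument, and the placeholder protein `MAAA` are assumptions made for the example and do not appear in the source.

```python
# Nuclear localization sequences as listed above (SEQ ID NOs: 128-131).
NLS = {
    "SV40": "PKKKRKV",
    "NLP": "AVKRPAATKKAGQAKKKKLD",
    "TUS": "KLKIKRPVK",
    "EGL-13": "MSRRRKANPTKLSENAKKLAKEVEN",
}


def fuse_nls(protein: str, nls_name: str, terminus: str = "C", linker: str = "") -> str:
    """Return the protein sequence with the named NLS fused at the N or C terminus.

    `linker` is a hypothetical optional spacer placed between the NLS and the protein.
    """
    nls = NLS[nls_name]
    if terminus == "C":
        return protein + linker + nls
    if terminus == "N":
        return nls + linker + protein
    raise ValueError("terminus must be 'N' or 'C'")


# 'MAAA' is a placeholder, not a real nuclease sequence.
print(fuse_nls("MAAA", "SV40"))
```

In practice the fusion would be made at the polynucleotide level in the vector; the string version only shows the resulting order of the parts.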
  • vector is used herein to refer to a nucleic acid molecule capable of transferring or transporting another nucleic acid molecule.
  • the transferred nucleic acid is generally linked to, e.g., inserted into, the vector nucleic acid molecule.
  • a vector may include sequences that direct autonomous replication in a cell, or may include sequences sufficient to allow integration into host cell DNA.
  • expression cassette refers to genetic sequences within a vector which can express a RNA, and subsequently a protein.
  • the nucleic acid cassette contains the gene of interest, e.g., a modified looped nuclease.
  • the nucleic acid cassette is positionally and sequentially oriented within the vector such that the nucleic acid in the cassette can be transcribed into RNA, and when necessary, translated into a protein or a polypeptide, undergo appropriate post-translational modifications required for activity in the transformed cell, and be translocated to the appropriate compartment for biological activity by targeting to appropriate intracellular compartments or secretion into extracellular compartments.
  • the cassette has its 3' and 5' ends adapted for ready insertion into a vector, e.g., it has restriction endonuclease sites at each end.
  • the cassette can be removed and inserted into a plasmid or viral vector as a single unit.
  • the nucleic acid cassette contains the sequence of a modified looped nuclease.
  • Exemplary vectors include, without limitation, plasmids, phagemids, cosmids, transposons, artificial chromosomes such as yeast artificial chromosome (YAC), bacterial artificial chromosome (BAC), or P1-derived artificial chromosome (PAC), bacteriophages such as lambda phage or M13 phage, and animal viruses.
  • Examples of categories of animal viruses useful as vectors include, without limitation, retrovirus (including lentivirus), adenovirus, adeno-associated virus, herpesvirus (e.g., herpes simplex virus), poxvirus, baculovirus, papillomavirus, and papovavirus (e.g., SV40).
  • examples of expression vectors are pCIneo vectors (Promega) for expression in mammalian cells; pLenti4/V5-DEST™, pLenti6/V5-DEST™, and pLenti6.2/V5-GW/lacZ (Invitrogen) for lentivirus-mediated gene transfer and expression in mammalian cells.
  • the coding sequences of the modified looped nuclease disclosed herein can be ligated into such expression vectors for the expression of the modified looped nuclease in host cells.
  • non-viral vectors are used to deliver one or more polynucleotides contemplated herein to a host cell.
  • the vector is a non-integrating vector, including but not limited to, an episomal vector or a vector that is maintained extrachromosomally.
  • episomal vector refers to a vector that is able to replicate without integration into the host’s chromosomal DNA and without gradual loss from a dividing host cell, also meaning that the vector replicates extrachromosomally or episomally.
  • the vector is engineered to harbor the sequence coding for the origin of DNA replication or “ori” from a lymphotrophic herpes virus or a gamma herpesvirus, an adenovirus, SV40, a bovine papilloma virus, or a yeast, specifically a replication origin of a lymphotrophic herpes virus or a gamma herpesvirus corresponding to oriP of EBV.
  • the lymphotrophic herpes virus may be Epstein Barr virus (EBV), Kaposi's sarcoma herpes virus (KSHV), Herpes virus saimiri (HS), or Marek's disease virus (MDV).
  • Epstein Barr virus (EBV) and Kaposi's sarcoma herpes virus (KSHV) are also examples of a gamma herpesvirus.
  • the host cell comprises the viral replication transactivator protein that activates the replication.
  • a polynucleotide is introduced into a target or host cell using a transposon vector system.
  • the transposon vector system comprises a vector comprising transposable elements and a polynucleotide contemplated herein; and a transposase.
  • the transposon vector system is a single transposase vector system, see, e.g., WO 2008/027384.
  • Exemplary transposases include, but are not limited to: piggyBac, Sleeping Beauty, Mos1, Tc1/mariner, Tol2, mini-Tol2, Tc3, MuA, Himar1, Frog Prince, and derivatives thereof.
  • the piggyBac transposon and transposase are described, for example, in U.S. Patent 6,962,810, which is incorporated herein by reference in its entirety.
  • the Sleeping Beauty transposon and transposase are described, for example, in Izsvak et al., J. Mol. Biol. 302: 93-102 (2000), which is incorporated herein by reference in its entirety.
  • the Tol2 transposon which was first isolated from the medaka fish Oryzias latipes and belongs to the hAT family of transposons is described in Kawakami et al. (2000).
  • Mini-Tol2 is a variant of Tol2 and is described in Balciunas et al. (2006).
  • the Tol2 and Mini-Tol2 transposons facilitate integration of a transgene into the genome of an organism when co-acting with the Tol2 transposase.
  • the Frog Prince transposon and transposase are described, for example, in Miskey et al., Nucleic Acids Res. 31:6873-6881 (2003).
  • control elements or “regulatory sequences” present in an expression vector are those non-translated regions of the vector (e.g., origin of replication, selection cassettes, promoters, enhancers, translation initiation signals (Shine Dalgarno sequence or Kozak sequence), introns, a polyadenylation sequence, 5' and 3' untranslated regions) which interact with host cellular proteins to carry out transcription and translation.
  • Such elements may vary in their strength and specificity.
  • any number of suitable transcription and translation elements including ubiquitous promoters and inducible promoters may be used.
  • the polynucleotide of interest is operably linked to a control element or regulatory sequence. “Operably linked” refers to a juxtaposition wherein the components so described are in a relationship permitting them to function in their intended manner.
  • a promoter is operably linked to a polynucleotide sequence if the promoter affects the transcription or expression of the polynucleotide sequence.
  • the polynucleotide of interest is operably linked to a promoter sequence.
  • promoter refers to a recognition site of a polynucleotide (DNA or RNA) to which an RNA polymerase binds. An RNA polymerase initiates and transcribes polynucleotides operably linked to the promoter.
  • Illustrative ubiquitous promoters suitable for use in particular embodiments include, but are not limited to, a cytomegalovirus (CMV) immediate early promoter, a viral simian virus 40 (SV40) (e.g., early or late) promoter, a spleen focus forming virus (SFFV) promoter, a Moloney murine leukemia virus (MoMLV) LTR promoter, a Rous sarcoma virus (RSV) LTR, a herpes simplex virus (HSV) (thymidine kinase) promoter, H5, P7.5, and P11 promoters from vaccinia virus, an elongation factor 1-alpha (EF1a) promoter, early growth response 1 (EGR1) promoter, a ferritin H (FerH) promoter, a ferritin L (FerL) promoter, a Glyceraldehyde 3-phosphate dehydrogenase (GAPDH) promoter, a
  • Illustrative methods of non-viral delivery of polynucleotides contemplated in particular embodiments include, but are not limited to: electroporation, sonoporation, lipofection, microinjection, biolistics, virosomes, liposomes, immunoliposomes, nanoparticles, polycation or lipid:nucleic acid conjugates, naked DNA, artificial virions, DEAE-dextran-mediated transfer, gene gun, and heat-shock.
  • Illustrative examples of polynucleotide delivery systems suitable for use in particular embodiments include, but are not limited to, those provided by Amaxa Biosystems, Maxcyte, Inc., BTX Molecular Delivery Systems, and Copernicus Therapeutics Inc.
  • Lipofection reagents are sold commercially (e.g., Transfectam™ and Lipofectin™). Cationic and neutral lipids that are suitable for efficient receptor-recognition lipofection of polynucleotides have been described in the literature. See e.g., Liu et al. (2003) Gene Therapy. 10:180-187; and Balazs et al. (2011) Journal of Drug Delivery. 2011:1-12.
  • Antibody-targeted, bacterially derived, non-living nanocell-based delivery is also contemplated in particular embodiments.
  • a vector comprising an expression cassette comprising nucleic acid sequence encoding a modified looped nuclease described herein is introduced into a host cell that is capable of expressing the encoded modified looped nuclease.
  • exemplary host cells include Chinese Hamster Ovary (CHO) cells, HEK 293 cells, BHK cells, murine NS0 cells, or murine SP2/0 cells, and E. coli cells.
  • the expressed protein is then purified from the culture system using any one of a variety of methods known in the art (e.g., Protein A columns, affinity chromatography, size-exclusion chromatography, and the like).
  • Eukaryote-based systems in particular can be employed to produce nucleic acid sequences, or their cognate polypeptides, proteins and peptides. Many such systems are commercially and widely available.
  • the modified loop proteins described herein are produced using Chinese Hamster Ovary (CHO) cells following standardized protocols.
  • transgenic animals may be utilized to produce the modified loop proteins described herein, generally by expression into the milk of the animal using well established transgenic animal techniques. Lonberg N. Human antibodies from transgenic animals. Nat Biotechnol. 2005 Sep;23(9):1117-25; Kipriyanov et al. Generation and production of engineered antibodies. Mol Biotechnol. 2004 Jan;26(1):39-60; See also Ko et al., Plant biopharming of monoclonal antibodies. Virus Res. 2005 Jul;111(1):93-100.
  • the insect cell/baculovirus system can produce a high level of protein expression of a heterologous nucleic acid segment, such as described in U.S. Patent Nos. 5,871,986 and 4,879,236, both incorporated herein by reference in their entireties, and which can be bought, for example, under the name MAXBAC® 2.0 from Invitrogen and BACPACK™ Baculovirus expression system from Clontech.
  • expression systems include Stratagene's Complete Control Inducible Mammalian Expression System, which utilizes a synthetic ecdysone-inducible receptor.
  • an inducible expression system is available from Invitrogen, which carries the T-REX™ (tetracycline-regulated expression) System, an inducible mammalian expression system that uses the full-length CMV promoter.
  • Invitrogen also provides a yeast expression system called the Pichia methanolica Expression System, which is designed for high-level production of recombinant proteins in the methylotrophic yeast Pichia methanolica.
  • vectors such as an expression construct comprising a nucleic acid sequence encoding a modified looped nuclease described herein, to produce its encoded nucleic acid sequence or its cognate polypeptide, protein, or peptide. See, generally, Recombinant Gene Expression Protocols By Rocky S. Tuan, Humana Press (1997), ISBN 0896033333; Advanced Technologies for Biopharmaceutical Processing By Roshni L. Dutton, Jeno M.
  • proteins of the present invention can be synthesized by exclusive solid phase synthesis, partial solid phase methods, fragment condensation or classical solution synthesis. These synthesis methods are well-known to those of skill in the art (see, for example, Merrifield, J. Am. Chem. Soc. 85:2149 (1963), Stewart et al., “Solid Phase Peptide Synthesis” (2nd Edition), (Pierce Chemical Co. 1984), Bayer and Rapp, Chem. Pept. Prot.
  • a recombinantly expressed protein is cleaved from an intein and the protein is ligated to a peptide containing an N-terminal cysteine having an unoxidized sulfhydryl side chain, by contacting the protein with the peptide in a reaction solution containing a conjugated thiophenol.
  • any appropriate means may be used to introduce a component of a CRISPR-Cas gene editing system, including, for example, a nuclease, a modified looped nuclease, a gRNA, or a ribonucleoprotein complex (RNP) to the target cell.
  • the compound comprising a component of a CRISPR-Cas gene editing system is transfected into the target cell by electroporation, lipofection, viral-delivery, or incubation of the target cell and modified looped nuclease.
  • the nuclease or modified looped nuclease is delivered to the target cell as a ribonucleoprotein comprising guide RNA conjugated to a nuclease, for example, a modified loop nuclease.
  • the RNP comprises a recombinant Cas9 protein and an sgRNA or a crRNA:tracrRNA duplex.
  • the nuclease or modified looped nuclease is delivered to the target cell separately from the guide RNA.
  • the target cell is transfected with the guide RNA (e.g. sgRNA) and then incubated with a nuclease, for example, a modified looped nuclease.
  • the CPPs comprise one or more cell- or tissue-targeting moieties.
  • the conjugation of these types of moieties targets the CPP and cargo to the desired cell or tissue type to improve the efficiency of uptake of the CPP and cargo into the targeted cell or tissue.
  • the cell- or tissue-targeting moiety can be identified by biopanning of phage-displayed libraries.
  • the cell- or tissue-targeting moiety is a polypeptide or fragment thereof that binds to a cell (e.g. an antibody, ligand, or polypeptide that binds a carbohydrate).
  • the cell- or tissue-targeting moiety delivers the CPP conjugates of the present disclosure to any desired cell or tissue.
  • the cell or tissue includes, but is not limited to, breast, prostate, colon, brain, liver, neuronal, central nervous system, muscle, cardiac muscle, smooth muscle, skeletal muscle, lung, heart, epithelial tissue, vascular tissue, gastrointestinal tract, spinal cord, tumor, solid tumor, and a cancer cell.
  • An advantage of the CRISPR system is the ability to target any sequence in a DNA sequence that contains a PAM motif on either strand of DNA for editing. Nuclease binding depends on the complementary base pairing of the guide RNA to the DNA target to produce a targeted double-strand break in the DNA. This break is then repaired by the endogenous cellular repair machinery and can lead to local insertion and/or deletion events via the nonhomologous end-joining pathway or to precise sequence modification via homology-directed repair when a user-defined donor template is provided.
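To make the "PAM motif on either strand" idea concrete, the following sketch enumerates candidate target sites in a DNA string. It assumes the SpCas9 convention of a 5'-NGG PAM immediately 3' of a 20-nucleotide protospacer; both the PAM and the protospacer length are assumptions for this example, and the helper names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_ngg_targets(dna: str, protospacer_len: int = 20):
    """List (strand, start, protospacer, pam) for every NGG PAM that has a
    full-length protospacer directly 5' of it.

    `start` is a 0-based offset in the scanned strand: the input itself for
    '+', and the reverse complement of the input for '-'.
    """
    hits = []
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for i in range(protospacer_len, len(seq) - 2):
            pam = seq[i:i + 3]
            if pam[1:] == "GG":  # N can be any base, so only GG is checked
                start = i - protospacer_len
                hits.append((strand, start, seq[start:i], pam))
    return hits


# Toy input: twenty A's followed by a TGG PAM gives one plus-strand site.
for hit in find_ngg_targets("A" * 20 + "TGG"):
    print(hit)
```

A real guide-design workflow would also score candidates for off-target matches and activity, which this sketch does not attempt.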
  • not all guide RNAs are equally effective at directing the nuclease-mediated DNA modifications and the conjugates disclosed herein may show different stability in vitro, ex vivo, or in vivo.
  • the assay includes, but is not limited to, a T7 endonuclease 1 (T7E1) mismatch detection assay, next-generation sequencing (NGS), tracking of indels by decomposition (TIDE) assay, Indel Detection by Amplicon Analysis (IDAA), and a DNA cleavage assay.
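The source does not say how the readout of a T7E1 mismatch assay is converted to an editing frequency. One commonly used estimate, given here only as an assumption, takes the fraction of PCR product that is cleaved, f_cut, from gel band intensities and reports indel % = 100 × (1 − √(1 − f_cut)). The function name and the intensity values are hypothetical.

```python
import math


def indel_percent(uncut: float, cut_bands: list[float]) -> float:
    """Estimate indel frequency (%) from T7E1 gel band intensities.

    uncut     : intensity of the full-length (uncleaved) band
    cut_bands : intensities of the cleavage-product bands
    """
    cut = sum(cut_bands)
    total = uncut + cut
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    f_cut = cut / total
    # The square root reflects that only heteroduplexes formed by
    # re-annealing are cleaved, under a random-pairing assumption.
    return 100.0 * (1.0 - math.sqrt(1.0 - f_cut))


print(indel_percent(25.0, [50.0, 25.0]))  # 50.0
```

Sequencing-based readouts such as NGS or TIDE report indel frequencies directly and do not need this correction.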
  • nuclease activity is assayed in vitro.
  • nuclease activity is assayed ex vivo.
  • the nuclease activity is assayed in vivo.
  • the assay is a cell-based assay.
  • the assay is a synthetic assay.
  • the synthetic assay comprises one or more substrate DNA sequences known to be targets of the g
  • compositions comprising a nuclease, for example, CRISPR-associated nuclease, for example, a modified looped nuclease.
  • the compositions comprise a gRNA. The characteristics of gRNAs are described throughout this
  • compositions comprise an unmodified nuclease.
  • An unmodified nuclease does not comprise a CPP.
  • compositions described herein comprise at least one modified nuclease, at least two nucleases, at least three nucleases, or more. In embodiments, the compositions described herein comprise at least one modified looped nuclease, at least two modified looped nucleases, at least three modified loop nucleases, or more.
  • Embodiment 1 A modified looped nuclease comprising at least one loop region, wherein the at least one loop region comprises a cell penetrating peptide (CPP) sequence inserted into the loop region.
  • Embodiment 2 The modified looped nuclease of embodiment 1, wherein the looped nuclease is selected from Cas9, Cas9 variant, Cas12a (Cpf1), Cas12b, Cas12c, TnpB-like, Cas13a (C2c2), Cas13b, and Cas14 nuclease.
  • Embodiment 3 The modified looped nuclease of embodiment 1, wherein the nuclease is Cas9 or a Cas9 variant.
  • Embodiment 4 The modified looped nuclease of embodiment 2, wherein the nuclease comprises at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% sequence identity to the nuclease of claim 2.
  • Embodiment 5 The modified looped nuclease of embodiment 1, comprising a detectable tag.
  • Embodiment 6 The modified looped nuclease of embodiment 5, wherein the detectable tag is selected from a FLAG tag, a polyhistidine tag, a SNAP tag, a Halo tag, cMyc, glutathione-S-transferase, avidin, an enzyme, a fluorescent protein, a luminescent protein, a chemiluminescent protein, a bioluminescent protein, and a phosphorescent protein.
  • Embodiment 7 The modified looped nuclease of any one of embodiments 1-6, comprising a guide RNA (gRNA).
  • Embodiment 8 The modified looped nuclease of any one of embodiment
  • the CPP sequence comprises at least three arginines, or analogs thereof.
  • Embodiment 9 The modified looped nuclease of any one of embodiment
  • Embodiment 10 The modified looped nuclease of any one of embodiment
  • the CPP comprises at least one amino acid with a hydrophobic side chain.
  • Embodiment 11 The modified looped nuclease of embodiment 10, wherein the CPP comprises from one to six amino acids with a hydrophobic side chain.
  • Embodiment 12 The modified looped nuclease of embodiment 10, wherein the amino acids with a hydrophobic side chain are independently selected from glycine, alanine, valine, leucine, isoleucine, methionine, phenylalanine, tryptophan, proline, naphthylalanine, phenylglycine, homophenylalanine, tyrosine, cyclohexylalanine, piperidine-2-carboxylic acid, cyclohexylalanine, norleucine, 3-(3-benzothienyl)-alanine, 3-(2-quinolyl)-alanine, O-benzylserine, 3-(4-(benzyloxy)phenyl)-alanine, S-(4-methylbenzyl)cysteine, N-(naphthalen-2-yl)glutamine, 3-(1,1'-biphenyl-4-yl)-alanine, tert-leucine, or nicotinoyl lysine, each of which is optionally substituted with one or more substituents.
  • Embodiment 13 The modified looped nuclease of any one of embodiment
  • Embodiment 14 The modified looped nuclease of any one of embodiment
  • each of the at least one of the amino acids with a hydrophobic side chain is tryptophan.
  • Embodiment 15 The modified looped nuclease of any one of embodiment 8-14, wherein the CPP sequence comprises at least three arginines and at least three tryptophans.
  • Embodiment 16 The modified looped nuclease of any one of embodiment
  • CPP sequence comprises from one to six D-amino acids.
  • Embodiment 17 The modified looped nuclease of any one of embodiment
  • Embodiment 18 The modified looped nuclease of embodiment 17, wherein the first CPP comprises at least three arginines, and the second CPP comprises at least three amino acids with a hydrophobic side chain.
  • Embodiment 19 The modified looped nuclease of any one of embodiment
  • Embodiment 20 A recombinant nucleic acid molecule encoding the modified looped nuclease of any one of embodiments 1-19.
  • Embodiment 21 An expression cassette comprising the recombinant nucleic acid molecule of embodiment 20 operably linked to a promoter.
  • Embodiment 22 A vector comprising the expression cassette of embodiment 21.
  • Embodiment 23 A host cell comprising the vector of embodiment 22.
  • Embodiment 24 The host cell of embodiment 23, wherein the host cell is selected from a Chinese Hamster Ovary (CHO) cell, an HEK 293 cell, a BHK cell, a murine NS0 cell, a murine SP2/0 cell, or an E. coli cell.
  • Embodiment 25 A composition comprising a modified looped nuclease of any one of embodiment 1-19 and a gRNA.
  • Embodiment 26 A method of producing the modified looped nuclease of any one of embodiments 1-19, comprising culturing the host cell of embodiment 23 and purifying the expressed modified looped nuclease from the supernatant.
  • Embodiment 27 A method of treating a disease or condition, comprising administering a modified looped nuclease of any one of embodiment 1-19.
  • Embodiment 28 A method of gene editing, comprising administering a modified looped nuclease of any one of embodiment 1-19.
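The composition limits recited for the CPP in the embodiments above (at least three arginines, at least three tryptophans, and a count of residues with hydrophobic side chains) can be checked mechanically for a candidate sequence. The sketch below is illustrative only: it covers the natural residues named in embodiment 12 by one-letter code and ignores the non-natural ones, and the function names and the test peptide are hypothetical.

```python
# One-letter codes for the natural hydrophobic residues named in the embodiments
# (Gly, Ala, Val, Leu, Ile, Met, Phe, Trp, Pro, Tyr). Non-natural residues such
# as naphthylalanine are not modeled in this sketch.
HYDROPHOBIC = frozenset("GAVLIMFWPY")


def cpp_profile(peptide: str) -> dict:
    """Count arginines, tryptophans, and hydrophobic residues in a peptide."""
    seq = peptide.upper()
    return {
        "arginine": seq.count("R"),
        "tryptophan": seq.count("W"),
        "hydrophobic": sum(1 for aa in seq if aa in HYDROPHOBIC),
    }


def has_three_arg_three_trp(peptide: str) -> bool:
    """True if the peptide has at least three Arg and at least three Trp."""
    profile = cpp_profile(peptide)
    return profile["arginine"] >= 3 and profile["tryptophan"] >= 3


# 'RRRWWW' is a made-up peptide, not a sequence from Table D.
print(cpp_profile("RRRWWW"), has_three_arg_three_trp("RRRWWW"))
```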
  • Example 1 Intracellular delivery of a Modified Looped Nuclease
  • Modified Looped Nucleases comprising a CPP from Table D are prepared.
  • FIG. 1 highlights the secondary structure of Cas9 (SEQ ID NO: 1). Loop regions are double underlined, helices are single underlined, and beta strands are highlighted in bold and italics.
  • Table E shows the amino acid ranges of Cas9 which contain loops. The amino acid ranges are numbered with respect to SEQ ID NO: 1.
  • the CPP was inserted immediately prior to an amino acid within a loop region, immediately after an amino acid within a loop region of Cas9, or in place of one or more amino acids within loop regions.
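The three placement options just described (before a loop residue, after a loop residue, or in place of one or more loop residues) amount to simple string operations on the protein sequence. The sketch below illustrates them with 1-based residue numbering; the function name, the `mode` and `replace_len` parameters, and the toy sequences are assumptions for illustration and are not taken from SEQ ID NO: 1 or Table E.

```python
def insert_cpp(protein: str, cpp: str, position: int,
               mode: str = "before", replace_len: int = 1) -> str:
    """Place a CPP sequence at a 1-based residue position of a protein.

    mode='before'  : CPP goes immediately before the residue at `position`
    mode='after'   : CPP goes immediately after that residue
    mode='replace' : CPP replaces `replace_len` residues starting at `position`
    """
    if not 1 <= position <= len(protein):
        raise ValueError("position is outside the protein sequence")
    i = position - 1  # convert to 0-based index
    if mode == "before":
        return protein[:i] + cpp + protein[i:]
    if mode == "after":
        return protein[:i + 1] + cpp + protein[i + 1:]
    if mode == "replace":
        if replace_len < 1 or i + replace_len > len(protein):
            raise ValueError("replacement window is outside the protein sequence")
        return protein[:i] + cpp + protein[i + replace_len:]
    raise ValueError("mode must be 'before', 'after', or 'replace'")


# Toy 8-residue 'protein' and a made-up CPP.
print(insert_cpp("MKTAYIAK", "RRRWWW", 4, mode="before"))
print(insert_cpp("MKTAYIAK", "RRRWWW", 4, mode="after"))
print(insert_cpp("MKTAYIAK", "RRRWWW", 4, mode="replace", replace_len=2))
```

A real design would first confirm that `position` falls inside one of the loop ranges before building the construct.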
  • the CPP is labeled with a fluorescent dye.
  • HeLa cells are cultured in six-well plates (5 x 10^5 cells per well) for 24 hours. After 24 hours, HeLa cells are transfected with sgRNA and then incubated with a modified looped nuclease (e.g., Cas9 comprising a CPP). As a negative control, the HeLa cells are incubated with an unmodified nuclease (e.g., Cas9) in the absence of CPP. The uptake efficiency of the modified looped nuclease is compared to uptake of the unmodified looped nuclease using fluorescence.
  • human HEK293T cells are transfected with the modified looped nuclease and/or gRNA.
  • Western blotting is performed to confirm that the modified looped nuclease enters the HEK293T cells.
  • Northern blotting is performed to confirm that the gRNA enters the HEK293T cells.
  • Fluorescence microscopy is performed to visualize Cas9.
  • a Surveyor assay is performed to assess site-specific genome cleavage.
  • CRISPR/Cas9 can be used in a ribonucleoprotein (RNP) format, in which the recombinant Cas9 protein is assembled in vitro with chemically synthesized sgRNA or a crRNA:tracrRNA duplex.
  • sgRNA or duplex RNA is prepared by annealing crRNA to tracrRNA, and then mixed with a modified looped nuclease of Example 1.
  • The DNA cleavage activity of the RNP is assayed on PCR products of chosen amplified genes by mixing the RNP with the PCR product of an amplified gene and incubating at 37°C for 2 hours. After incubation, 1 µl Proteinase K is added to the reaction, and then the mixture is incubated at 65°C for 10 minutes to release the DNA from the RNP. The products of each reaction are assessed by electrophoresis on 2% agarose gel to visualize cleavage of the PCR product.
  • Example 4 Intracellular delivery of a modified nuclease ribonucleoprotein (RNP)
  • the conjugation of the nuclease to a cell-penetrating peptide allows the conjugate to be introduced to a cell without needing viral-delivery, lipofection, or other transfection methods that may introduce artefacts. Additionally, the use of RNP can reduce off-target effects versus plasmid transfection and can be less stressful on the cells.
  • Cas9 RNPs are prepared immediately before experiments by incubating 20 µM Cas9 with 20 µM sgRNA at a 1:1 ratio in 20 µM HEPES (pH 7.5), 150 mM KCl, 1 mM MgCl2, 10% (vol/vol) glycerol, and 1 mM tris(2-carboxyethyl)phosphine (TCEP) at 37 °C for 10 min to a final concentration of 10 µM.
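The halving of concentration in this step follows from simple dilution arithmetic: combining two stocks dilutes each by the ratio of its own volume to the total volume. The sketch below works through it, assuming equal volumes of the two stocks are mixed (the source states the 1:1 ratio but not the volumes); the function name and the 5-unit volumes are hypothetical, and the concentration units cancel, so any consistent unit may be used.

```python
def mix_two_stocks(conc_a: float, vol_a: float, conc_b: float, vol_b: float):
    """Final concentrations of A and B after combining two stock solutions.

    Each component is diluted by (its own volume / total volume).
    Returns (final_conc_a, final_conc_b, molar_ratio_a_to_b).
    """
    total = vol_a + vol_b
    if total <= 0:
        raise ValueError("total volume must be positive")
    final_a = conc_a * vol_a / total
    final_b = conc_b * vol_b / total
    ratio = (conc_a * vol_a) / (conc_b * vol_b)
    return final_a, final_b, ratio


# Equal volumes of a 20-unit nuclease stock and a 20-unit sgRNA stock
# give 10 units of each at a 1:1 molar ratio.
print(mix_two_stocks(20.0, 5.0, 20.0, 5.0))  # (10.0, 10.0, 1.0)
```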
  • the RNPs are then electroporated into HeLa cells.
  • the HeLa cells are electroporated with an unmodified nuclease (e.g., Cas9) in the absence of CPP.
  • the uptake efficiency of the modified looped nuclease is compared to uptake of the unmodified looped nuclease using fluorescence.
  • CPP is conjugated to Cas nuclease, such as Cas9 and one or more Cas9 variants (see Table G, below) using one of three possible conjugation methods: site-specific thiol-maleimide conjugation (for engineered cysteine variants), N-terminal conjugation, or lysine conjugation.
  • the CPP and Cas nuclease may be conjugated by any known means in the art.
  • the CPP and the Cas nuclease may be conjugated by the enzymatic oxidation of tyrosine residues followed by reaction with cysteine thiols to allow covalent coupling as substantially described in Lobba et al. (2020) “Site-Specific Bioconjugation through Enzyme-Catalyzed Tyrosine-Cysteine Bond Formation” ACS Cent. Sci. 6(9):1564-1571, the contents of which are incorporated by reference herein.
  • Cas9 belongs to the class 2 type II CRISPR systems and is the most widely used genome editing tool. Streptococcus pyogenes Cas9 (SpCas9) was the first Cas nuclease to be used for genome editing in mammalian cells. However, recognition of the PAM 5’-NGG (N represents any nucleotide) sequence limits the target sites in the human genome.
  • Cas9 variants have been generated with altered PAM specificities, some of which are listed in Table G, below. See, Pickar-Oliver and Gersbach (2019) “The next generation of CRISPR-Cas technologies and applications.” Nat Rev Mol Cell Biol. 20(8):490-507, the contents of which is incorporated by reference herein.
  • Conjugation efficiency and retention of activity for the CPP-Cas conjugates prepared in Example 5 is evaluated by assembly of the RNP complex and evaluation of nuclease activity in an in vitro DNA cleavage assay, for example, substantially as described by Cromwell and Hubbard (2021) “In vitro assays for comparing the specificity of First- and Next-Generation CRISPR/Cas9 systems.” Methods Mol. Biol. 2162:215-232, the contents of which is incorporated by reference herein.
  • Intracellular uptake of the CPP-conjugated Cas nuclease prepared in Example 5 is evaluated by transfecting cells with sgRNA and treating transfected cells with the CPP-conjugated Cas nuclease.
  • CPP conjugation to the Cas nuclease is quantified by mass spectrometry.
  • Cas nuclease activity is evaluated using an enzyme activity assay using a DNA substrate, such as a reporter construct as substantially described in Martin et al. (2018) “A fluorescent reporter for quantification and enrichment of DNA editing by APOBEC-Cas9 or cleavage by Cas9 in living cells” Nucleic Acids Res. 46(14):e84, the contents of which is incorporated by reference herein.
  • Intracellular uptake of CPP-conjugated RNPs is evaluated as described in Example 3.
  • CPP conjugation to the RNP is quantified by mass spectrometry.
  • Nuclease activity is evaluated using an enzyme activity assay using a DNA substrate as described in Example 7.

Abstract

The present invention relates to systems, methods, and compositions for the delivery of one or more components of a CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)-Cas (CRISPR-associated protein) gene editing system.
PCT/US2021/059773 2020-11-18 2021-11-17 Nucleases comprising cell-penetrating peptide sequences WO2022109058A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202063115314P 2020-11-18 2020-11-18
US63/115,314 2020-11-18

Publications (1)

Publication Number Publication Date
WO2022109058A1 (fr) 2022-05-27

Family

ID=81709688

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2021/059773 WO2022109058A1 (fr) 2020-11-18 2021-11-17 Nucleases comprising cell-penetrating peptide sequences

Country Status (1)

Country Link
WO (1) WO2022109058A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022241408A1 (fr) * 2021-05-10 2022-11-17 Entrada Therapeutics, Inc. Compositions and methods for modulating tissue distribution of intracellular therapeutics
WO2023219933A1 (fr) * 2022-05-09 2023-11-16 Entrada Therapeutics, Inc. Compositions and methods for delivery of nucleic acid therapeutics

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016205613A1 (fr) * 2015-06-18 2016-12-22 The Broad Institute Inc. CRISPR enzyme mutations that reduce off-target effects

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHEN ET AL.: "Engineering Cell-Permeable Proteins through Insertion of Cell-Penetrating Motifs into Surface Loops", ACS CHEM. BIOL., vol. 15, 3 August 2020 (2020-08-03), pages 2568 - 2576, XP055837925, DOI: 10.1021/acschembio.0c00593 *
RAMAKRISHNA ET AL.: "Gene disruption by cell-penetrating peptide-mediated delivery of Cas9 protein and guide RNA", GENOME RESEARCH, vol. 24, no. 6, 2 April 2014 (2014-04-02), pages 1020 - 1027, XP055692365, DOI: 10.1101/gr.171264.113 *

Similar Documents

Publication Publication Date Title
AU2020233745B2 (en) Delivery system for functional nucleases
US20220315952A1 (en) Cas 9 retroviral integrase and cas 9 recombinase systems for targeted incorporation of a dna sequence into a genome of a cell or organism
AU2017341736B2 (en) Rationally-designed synthetic peptide shuttle agents for delivering polypeptide cargos from an extracellular space to the cytosol and/or nucleus of a target eukaryotic cell, uses thereof, methods and kits relating to same
Chang et al. BCL-6, a POZ/zinc-finger protein, is a sequence-specific transcriptional repressor.
EP1991560B1 (fr) Peptide having cell membrane penetrating activity
US20230348537A1 (en) Rationally-designed synthetic peptide shuttle agents for delivering polypeptide cargos from an extracellular space to the cytosol and/or nucleus of a target eukaryotic cell, uses thereof, methods and kits relating to same
AU2019222568B2 (en) Engineered Cas9 systems for eukaryotic genome modification
Bosnali et al. Generation of transducible versions of transcription factors Oct4 and Sox2
US10669531B2 (en) Cell-permeable Cre (iCP-Cre) recombinant protein and use thereof
WO2022109058A1 (fr) Nucleases comprising cell-penetrating peptide sequences
US9371533B2 (en) Protein expression system
Furukawa et al. Identification of the lamina‐associated‐polypeptide‐2‐binding domain of B‐type lamin
US10508265B2 (en) Cell-permeable reprogramming factor (iCP-RF) recombinant protein and use thereof
US7994148B2 (en) Transmembrane delivery peptide and bio-material comprising the same
US20230212235A1 (en) Looped proteins comprising cell penetrating peptides
JP2003518947A (ja) Transduction of recombinases for inducible gene targeting
CN114057861A (zh) A bio-PROTAC artificial protein targeting UBE2C
JPH10500311A (ja) Factors that interact with nuclear proteins
Kim et al. Addition of an N-Terminal Poly-Glutamate Fusion Tag Improves Solubility and Production of Recombinant TAT-Cre Recombinase in Escherichia coli
AU773085B2 (en) Transfer compounds, the production and the use thereof
Niemeyer et al. Purification of a high-mobility-group 1 sea-urchin protein and cloning of cDNAs
Khokhlova et al. Bifidobacterium longum modified recombinant HU protein as a vector for nonviral delivery of DNA to HEK293 human cell culture
KR102668726B1 (ko) Delivery system for functional nucleases
Zhang et al. Prokaryotic expression and characterization of human AC3-33 protein

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21895534

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 21895534

Country of ref document: EP

Kind code of ref document: A1