WO2022120520A1 - Engineered cas effector proteins and methods of use thereof - Google Patents


Info

Publication number
WO2022120520A1
Authority
WO
WIPO (PCT)
Prior art keywords
engineered
amino acid
nuclease
cas
acid residues
Application number
PCT/CN2020/134249
Other languages
French (fr)
Inventor
Wei Li
Qi Zhou
Yangcan CHEN
Yanping Hu
Original Assignee
Institute Of Zoology, Chinese Academy Of Sciences
Beijing Institute For Stem Cell And Regenerative Medicine
Application filed by Institute Of Zoology, Chinese Academy Of Sciences and Beijing Institute For Stem Cell And Regenerative Medicine
Priority to CN202080107728.8A (CN116601293A)
Priority to PCT/CN2020/134249 (WO2022120520A1)
Publication of WO2022120520A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]

Definitions

  • The present application relates generally to the field of biotechnology. More specifically, the present application relates to methods and compositions of engineered Cas effector proteins with improved activity (e.g., gene-editing activity).
  • Genome editing is an important and useful technology in genomic research and various applications.
  • Various systems may be used for genome editing, including the clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system, the transcription activator-like effector nuclease (TALEN) system, and the zinc finger nuclease (ZFN) system.
  • CRISPR clustered regularly interspaced short palindromic repeats
  • TALEN transcription activator-like effector nuclease
  • ZFN zinc finger nuclease
  • The CRISPR-Cas system is an efficient and cost-effective genome-editing technology that is widely applicable in a range of eukaryotic organisms, from yeast and plants to zebrafish and humans (reviewed by van der Oost, 2013, Science 339: 768-770, and Charpentier and Doudna, 2013, Nature 495: 50-51).
  • The CRISPR-Cas system provides adaptive immunity in archaea and bacteria by employing a combination of Cas effector proteins and CRISPR RNAs (crRNAs).
  • crRNAs CRISPR RNAs
  • Two classes (class 1 and class 2), comprising six types (types I-VI), of CRISPR-Cas systems have been characterized according to the prominent functional and evolutionary modularity of the systems.
  • Type II Cas9 systems and type V-A/B/E/J Cas12a/Cas12b/Cas12e/Cas12j systems have been harnessed for genome editing and hold tremendous promise for biomedical research.
  • The present disclosure provides methods for engineering enzymes, such as Cas nucleases, to improve their enzymatic activity, as well as engineered Cas effector proteins and methods of using the engineered Cas effector proteins.
  • The present application provides a method of engineering an enzyme, comprising: (a) obtaining a plurality of engineered enzymes, each comprising one or more mutations that increase the flexibility of a flexible region in a plurality of flexible regions of a reference enzyme; and (b) selecting one or more engineered enzymes from the plurality of engineered enzymes, wherein the one or more selected engineered enzymes have increased activity compared to the reference enzyme.
  • The method further comprises determining a plurality of flexible regions in the reference enzyme.
  • The plurality of flexible regions is determined based on the amino acid sequence of the reference enzyme. In some embodiments, the plurality of flexible regions is determined without reference to a three-dimensional structure of the reference enzyme or a homolog thereof. In some embodiments, the plurality of flexible regions is determined using a program selected from the group consisting of PredyFlexy, FoldUnfold, PROFbval, FlexServ, FlexPred, DynaMine, and DisoMine. In some embodiments, the plurality of flexible regions is determined using DynaMine.
  • The method comprises: (i) calculating a flexibility score for each amino acid residue of the reference enzyme, wherein a higher flexibility score indicates lower conformational flexibility; (ii) selecting a plurality of peak amino acid residues at positions $X_i$, wherein each peak amino acid residue has a flexibility score that is below a pre-determined threshold value and is lower than the flexibility scores of the amino acid residues at positions $X_{i-5}$ to $X_{i-1}$ and $X_{i+1}$ to $X_{i+5}$; and (iii) defining each flexible region as the amino acid residues at positions $X_{i-2}$ to $X_{i+2}$ around the corresponding peak amino acid residue.
  • The plurality of flexible regions is located in random coils.
  • The one or more mutations comprise insertion of one or more glycine (G) residues in a flexible region. In some embodiments, the one or more mutations comprise insertion of two G residues in a flexible region.
  • G Glycine
  • The one or more G residues are inserted at the N-terminus of a flexible amino acid residue in the flexible region, wherein the flexible amino acid residue is selected from the group consisting of G, serine (S), asparagine (N), aspartic acid (D), histidine (H), methionine (M), threonine (T), glutamic acid (E), glutamine (Q), lysine (K), arginine (R), alanine (A), and proline (P).
  • The flexible amino acid residue is chosen according to the preference G>S>N>D>H>M>T>E>Q>K>R>A>P.
  • The one or more mutations comprise substitution of one or more non-G residues with one or more G residues.
  • The one or more mutations comprise substitution of a hydrophobic amino acid residue in a flexible region with a G residue, wherein the hydrophobic amino acid residue is selected from the group consisting of leucine (L), isoleucine (I), valine (V), cysteine (C), tyrosine (Y), phenylalanine (F), and tryptophan (W).
  • The enzyme is a bacterial or archaeal enzyme.
  • The enzyme is a Cas nuclease.
  • The Cas nuclease is selected from the group consisting of Cas9, Cas12a, Cas12b, Cas12c, Cas12d, Cas12f, Cas12g, Cas12h, Cms1, Cas12i, Cas12j, Cas12k, and CasX.
  • The plurality of flexible regions is located in domains of the reference Cas nuclease that interact with DNA and/or RNA.
  • The activity described in step (b) is site-specific nuclease activity.
  • The activity is gene-editing activity in a eukaryotic cell (e.g., a mammalian cell). In some embodiments, the activity is gene-editing activity in a human cell.
  • A selected engineered Cas nuclease of step (b) has at least about 20% (e.g., at least about 30%, 40%, 50%, 60%, 70%, 80%, or more) higher gene-editing efficiency than the reference Cas nuclease at a genomic locus in the cell.
  • The average gene-editing efficiency of a selected engineered Cas nuclease of step (b) at a plurality of genomic loci in the cell is at least about 20% (e.g., at least about 30%, 40%, 50%, 60%, 70%, 80%, or more) higher than that of the reference Cas nuclease.
  • The gene-editing efficiency is measured using a T7 endonuclease 1 (T7E1) assay, sequencing of the target DNA, a Tracking of Indels by Decomposition (TIDE) assay, or an Indel Detection by Amplicon Analysis (IDAA) assay.
  • T7E1 T7 endonuclease 1
  • TIDE Tracking of Indels by Decomposition
  • IDAA Indel Detection by Amplicon Analysis
  • Another aspect of the present application provides an engineered Cas nuclease obtained using any one of the methods described above.
  • The present application provides an engineered Cas nuclease comprising one or more mutations that increase the flexibility of a flexible region comprising at least 5 consecutive amino acid residues of a reference Cas nuclease, wherein the engineered Cas nuclease has increased activity compared to the reference Cas nuclease.
  • The flexible region is determined based on the amino acid sequence of the reference Cas nuclease.
  • The flexible region is determined using a program selected from the group consisting of PredyFlexy, FoldUnfold, PROFbval, FlexServ, FlexPred, DynaMine, and DisoMine.
  • The flexible region is determined using DynaMine, wherein the amino acid residue with the highest flexibility in the flexible region has a predicted flexibility score (S2pred) of no more than about 0.8.
  • The flexible region has 5 consecutive amino acid residues, wherein the 3rd amino acid residue has the lowest flexibility score, and wherein a higher flexibility score indicates lower conformational flexibility. In some embodiments according to any one of the engineered Cas nucleases described above, the flexible region is located in a random coil.
  • The flexible region is in a domain of the reference Cas nuclease that interacts with DNA and/or RNA.
  • The reference Cas nuclease is selected from the group consisting of Cas9, Cas12a, Cas12b, Cas12c, Cas12d, Cas12f, Cas12g, Cas12h, Cms1, Cas12i, Cas12j, Cas12k, and CasX.
  • The present application provides an engineered Cas12b nuclease comprising one or more mutations that increase the flexibility of a flexible region corresponding to amino acid residues 835 to 839 in a reference Cas12b nuclease, wherein the amino acid residue numbering is based on SEQ ID NO: 1, and wherein the engineered Cas12b nuclease has increased activity compared to the reference Cas12b nuclease.
  • The engineered Cas12b nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) sequence identity to SEQ ID NO: 2.
  • The present application provides an engineered Cas12i nuclease comprising one or more mutations that increase the flexibility of a flexible region in a reference Cas12i nuclease selected from the group consisting of regions corresponding to amino acid residues 228-232, 439-443, 478-482, 500-504, 775-779, and 925-929, wherein the amino acid residue numbering is based on SEQ ID NO: 8, and wherein the engineered Cas12i nuclease has increased activity compared to the reference Cas12i nuclease.
  • The flexible region corresponds to amino acid residues 439-443 or 925-929, wherein the amino acid residue numbering is based on SEQ ID NO: 8.
  • The engineered Cas12i nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) sequence identity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 14, 18, and 20.
  • The present application provides an engineered Cas9 nuclease comprising one or more mutations that increase the flexibility of a flexible region in a reference Cas9 nuclease selected from the group consisting of regions corresponding to amino acid residues 39-43, 135-139, 176-180, 274-278, 351-355, 389-393, 521-525, 541-545, 755-759, 774-778, 786-790, 811-815, 848-852, 855-859, 874-878, 891-895, 1019-1023, and 1036-1040, wherein the amino acid residue numbering is based on SEQ ID NO: 25, and wherein the engineered Cas9 nuclease has increased activity compared to the reference Cas9 nuclease.
  • The flexible region is selected from the group consisting of regions corresponding to amino acid residues 135-139, 176-180, 541-545, 755-759, and 811-815, wherein the amino acid residue numbering is based on SEQ ID NO: 25.
  • The engineered Cas9 nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) sequence identity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 27, 28, 33, 34, and 41.
  • The present application provides an engineered Cas9 nuclease comprising one or more mutations that increase the flexibility of a flexible region in a reference Cas9 nuclease selected from the group consisting of regions corresponding to amino acid residues 45-49, 84-88, 116-120, 128-132, 216-220, 318-322, 387-391, 497-501, 583-587, 594-598, 614-618, 696-700, and 739-743, wherein the amino acid residue numbering is based on SEQ ID NO: 53, and wherein the engineered Cas9 nuclease has increased activity compared to the reference Cas9 nuclease.
  • The flexible region corresponds to amino acid residues 45-49 or 116-120, wherein the amino acid residue numbering is based on SEQ ID NO: 53.
  • The engineered Cas9 nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) sequence identity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 54 and 58-59.
  • The one or more mutations comprise insertion of one or more G residues in the flexible region. In some embodiments, the one or more mutations comprise insertion of two G residues in the flexible region.
  • The one or more G residues are inserted at the N-terminus of a flexible amino acid residue in the flexible region, wherein the flexible amino acid residue is selected from the group consisting of G, serine (S), asparagine (N), aspartic acid (D), histidine (H), methionine (M), threonine (T), glutamic acid (E), glutamine (Q), lysine (K), arginine (R), alanine (A), and proline (P), and wherein the flexible amino acid residue is chosen according to the preference G>S>N>D>H>M>T>E>Q>K>R>A>P.
  • The one or more mutations comprise substitution of a hydrophobic amino acid residue in a flexible region with a G residue, wherein the hydrophobic amino acid residue is selected from the group consisting of leucine (L), isoleucine (I), valine (V), cysteine (C), tyrosine (Y), phenylalanine (F), and tryptophan (W).
  • The engineered Cas nuclease comprises one or more mutations that increase the flexibility of two or more flexible regions in the reference Cas nuclease.
  • The increased activity of the Cas nuclease is site-specific nuclease activity.
  • The increased activity of the Cas nuclease is gene-editing activity in a eukaryotic cell (e.g., a mammalian cell).
  • The increased activity of the Cas nuclease is gene-editing activity in a human cell.
  • The engineered Cas nuclease has at least about 20% (e.g., at least about 30%, 40%, 50%, 60%, 70%, 80%, or more) higher gene-editing efficiency than the reference Cas nuclease at a genomic locus in the cell.
  • The average gene-editing efficiency of the engineered Cas nuclease at a plurality of genomic loci in the cell is at least about 20% (e.g., at least about 30%, 40%, 50%, 60%, 70%, 80%, or more) higher than that of the reference Cas nuclease.
  • The gene-editing efficiency is measured using a T7 endonuclease 1 (T7E1) assay, sequencing of the target DNA, a Tracking of Indels by Decomposition (TIDE) assay, or an Indel Detection by Amplicon Analysis (IDAA) assay.
  • The present application provides a Cas nuclease comprising an amino acid sequence selected from the group consisting of SEQ ID NOs: 2, 14, 18, 20, 27, 28, 33, 34, 41, 54, and 58-59.
  • The present application provides an engineered Cas effector protein comprising any one of the engineered Cas nucleases described above, or a functional derivative thereof.
  • The engineered Cas nuclease or functional derivative thereof is enzymatically active.
  • The engineered Cas effector protein is capable of inducing a double-strand break in a DNA molecule.
  • The engineered Cas effector protein is capable of inducing a single-strand break in a DNA molecule.
  • The engineered Cas effector protein comprises an enzymatically inactive mutant of the engineered Cas nuclease.
  • The engineered Cas effector protein further comprises a functional domain fused to the engineered Cas nuclease or functional derivative thereof.
  • The functional domain is selected from the group consisting of a translation initiator domain, a transcription repressor domain, a transactivation domain, an epigenetic modification domain, a nucleobase-editing domain (e.g., a CBE or ABE domain), a reverse transcriptase domain, a reporter domain (e.g., a fluorescent domain), and a nuclease domain.
  • The engineered Cas effector protein comprises a first polypeptide comprising an N-terminal portion of the engineered Cas nuclease or functional derivative thereof, and a second polypeptide comprising a C-terminal portion of the engineered Cas nuclease or functional derivative thereof, wherein the first polypeptide and the second polypeptide are capable of associating with each other in the presence of a guide RNA comprising a guide sequence to form a Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR) complex that specifically binds to a target nucleic acid comprising a target sequence complementary to the guide sequence.
  • The first polypeptide and the second polypeptide each comprise a dimerization domain. In some embodiments, the first dimerization domain and the second dimerization domain associate with each other in the presence of an inducer. In some embodiments, the first polypeptide and the second polypeptide do not comprise dimerization domains.
  • The present application further provides an engineered CRISPR-Cas system comprising: (a) any one of the engineered Cas effector proteins described above; and (b) a guide RNA comprising a guide sequence complementary to a target sequence, or one or more nucleic acids encoding the guide RNA, wherein the engineered Cas effector protein and the guide RNA are capable of forming a CRISPR complex that specifically binds to a target nucleic acid comprising the target sequence and induces a modification of the target nucleic acid.
  • The guide RNA is a crRNA comprising the guide sequence.
  • The system comprises a precursor guide RNA array encoding a plurality of crRNAs.
  • The guide RNA comprises a crRNA and a tracrRNA. In some embodiments, the guide RNA is a single guide RNA (sgRNA). In some embodiments, the guide RNA comprises a crRNA and a scoutRNA. In some embodiments in which the engineered Cas effector protein is a prime editor, the guide RNA is a pegRNA.
  • The system comprises one or more vectors encoding the engineered Cas effector protein.
  • The one or more vectors are selected from the group consisting of retroviral vectors, lentiviral vectors, adenoviral vectors, adeno-associated viral vectors, and herpes simplex viral vectors.
  • The one or more vectors are adeno-associated viral (AAV) vectors.
  • The AAV vector further encodes the guide RNA (e.g., a crRNA, an sgRNA, or a precursor guide RNA array).
  • The present application further provides an engineered CRISPR-Cas system comprising: (a) a Cas12i effector protein comprising a Cas12i nuclease (e.g., Cas12i2) or a functional derivative thereof; and (b) a crRNA comprising a guide sequence complementary to a target sequence and a repeat sequence comprising at least four uridine (U) residues, wherein one or more of the U residues in the repeat sequence are substituted with a non-U residue; wherein the Cas12i effector protein and the crRNA are capable of forming a CRISPR complex that specifically binds to a target nucleic acid comprising the target sequence and induces a modification of the target nucleic acid.
  • Another aspect of the present application provides a method of detecting a target nucleic acid in a sample, comprising: (a) contacting the sample with any one of the engineered CRISPR-Cas systems described above, and a labeled detector nucleic acid (e.g., DNA or RNA) that is single stranded and does not hybridize with the guide sequence of the guide RNA; and (b) measuring a detectable signal produced by cleavage of the labeled detector nucleic acid by the engineered Cas effector protein, thereby detecting the target nucleic acid.
  • Another aspect of the present application provides a method of modifying a target nucleic acid comprising a target sequence, comprising contacting the target nucleic acid with any one of the engineered CRISPR-Cas systems described above.
  • The method is carried out in vitro.
  • The target nucleic acid is present in a cell.
  • The cell is a bacterial cell, a yeast cell, a mammalian cell, a plant cell, or an animal cell.
  • The method is carried out ex vivo. In some embodiments, the method is carried out in vivo.
  • The target nucleic acid is cleaved, or the target sequence in the target nucleic acid is altered, by the engineered CRISPR-Cas system. In some embodiments, expression of the target nucleic acid is altered by the engineered CRISPR-Cas system. In some embodiments, the target nucleic acid is a genomic DNA. In some embodiments, the target sequence is associated with a disease or condition.
  • The engineered CRISPR-Cas system comprises a precursor guide RNA array encoding a plurality of crRNAs, wherein each crRNA comprises a different guide sequence.
  • The method is carried out at a temperature of about 4°C to about 67°C (e.g., about 4°C to about 15°C, about 15°C to about 40°C, about 4°C to about 37°C, or about 40°C to about 67°C).
  • Another aspect of the present application provides a method of treating a disease or condition associated with a target nucleic acid in cells of an individual, comprising modifying the target nucleic acid in the cells of the individual using any one of the methods employing the engineered CRISPR-Cas systems described above, thereby treating the disease or condition.
  • The disease or condition is selected from the group consisting of cancer, cardiovascular diseases, hereditary diseases, autoimmune diseases, metabolic diseases, neurodegenerative diseases, ocular diseases, bacterial infections, and viral infections.
  • Another aspect of the present application provides an engineered cell comprising a modified target nucleic acid, wherein the target nucleic acid has been modified using any one of the methods described above.
  • The present application provides an engineered non-human animal comprising one or more of the engineered cells described above.
  • Another aspect of the present application provides a method of modifying a target sequence in a target nucleic acid, comprising contacting the target nucleic acid with an engineered CRISPR-Cas system at a temperature of about 40°C to about 67°C, wherein the engineered CRISPR-Cas system comprises: (a) a Cas12i2 effector protein comprising a Cas12i2 nuclease or a functional derivative thereof; and (b) a crRNA comprising a guide sequence that is complementary to the target sequence, wherein the Cas12i2 effector protein and the crRNA are capable of forming a CRISPR complex that specifically binds to a target nucleic acid comprising the target sequence and induces a modification of the target nucleic acid.
  • The present application further provides an engineered crRNA comprising a substitution of one or more uridine (U) residues with a non-U residue in a repeat sequence comprising at least four U residues.
  • The engineered crRNA comprises a spacer sequence of about 17 to 25 nucleotides in length. In some embodiments, the engineered crRNA comprises a spacer sequence of about 20 nucleotides in length. In some embodiments, the engineered crRNA comprises a repeat sequence comprising the nucleic acid sequence of SEQ ID NO: 173.
  • Also provided are compositions, kits, and articles of manufacture for use in any one of the methods described above.
  • FIG. 1A shows an exemplary design pipeline for engineering Cas proteins with improved activity.
  • Candidate flexible regions are identified using DynaMine software, and glycine substitutions (single underlined) or insertions (double underlined) are introduced to generate Cas protein variants.
  • Variants are then cloned into expression vectors together with eGFP, and cells are transfected with the variant Cas + GFP plasmid and an sgRNA or crRNA plasmid to test editing efficiency of the variants.
  • FIG. 1B shows an enlarged view of the third step of FIG. 1A.
  • FIG. 2 shows the flexibility (S2) score profile for BhCas12b4.
  • FIG. 3 shows the %indels generated by wild-type BhCas12b4 compared to enBhCas12b4 using the indicated sgRNAs in human cells.
  • The last graph of FIG. 3 shows the %indels generated across all loci tested.
  • FIG. 4 shows the flexibility (S2) score profile for Cas12i2.
  • FIG. 5 shows computationally determined secondary structure regions of Cas12i2.
  • FIG. 6 shows the gene-editing efficiency (%indels) of the Cas12i2 variants compared to WT Cas12i2 using four different crRNAs.
  • FIG. 7 shows the gene-editing efficiency (%indels) of a Cas12i2 variant combining the mutations of variants 2.2 and 6.1, designated enCas12i2, compared to WT Cas12i2 or to variants 2.2 and 6.1 alone.
  • enCas12i2 (2.2+6.1) showed improved editing efficiency.
  • FIG. 8A shows the overall genome-editing efficiency in human cells of enCas12i2 compared to SpCas9 and BhCas12b-v4. Editing efficiency (indel %) was analyzed at 46 loci for enCas12i2, 18 loci for SpCas9, and 23 loci for BhCas12b-v4.
  • FIG. 8B shows the average editing efficiency of enCas12i2 in human 293T cells at different protospacer adjacent motif (PAM) sites, including NTTA, NTTC, NTTG, and NTTT, and ATTN, CTTN, GTTN, and TTTN.
  • PAM protospacer adjacent motif
  • FIG. 9 shows enCas12i2 processing of pre-crRNA in vivo.
  • enCas12i2 shows comparable genome-editing activity when using a pre-crRNA targeting 3 sites and when using a single crRNA.
  • FIG. 10 shows in vitro cleavage of DNA plasmid by Cas12i2 and enCas12i2.
  • FIG. 11A shows detection of dsDNA containing an XBP target sequence by wild-type Cas12i2 and engineered Cas12i2 variants.
  • FIG. 11B shows results demonstrating that wild-type Cas12i2 can cleave a fluorescent reporter linked via rU in nucleic acid detection experiments.
  • FIG. 12 shows the flexibility (S2) score profile for GeoCas9.
  • FIG. 13 shows computationally determined secondary structure regions of GeoCas9.
  • FIG. 14 shows the results of a targeted deep sequencing assay to determine the efficiency of indel generation by the engineered GeoCas9 variants.
  • Engineered GeoCas9 variants with significantly improved editing efficiency are indicated by arrowheads.
  • FIG. 15 shows the locations of the selected flexible regions and corresponding domains of SaCas9.
  • FIG. 16 shows the results of a T7 endonuclease 1 (T7E1) cleavage assay testing the editing efficiency of the SaCas9 variants.
  • Engineered SaCas9 variants 1.1, 3.1, 3.2 (indicated by box outline) showed significantly improved gene editing efficiency.
  • FIG. 17A shows in vitro dsDNA cleavage by wild-type Cas12i1 and Cas12i2.
  • FIG. 17B shows the crRNA sequences used to test cleavage by Cas12i1 or Cas12i2.
  • FIG. 18 shows %indels generated by wild-type Cas12i1 and Cas12i2 using crRNAs at target sites in the human genome.
  • The gene-editing efficiency is measured using a T7 endonuclease 1 (T7E1) assay.
  • FIG. 19 shows the editing efficiencies of the indicated crRNA mutants compared to wild-type crRNA sequences using Cas12i1.
  • FIG. 20 shows results demonstrating that Cas12i2 is enzymatically active (i.e., able to cleave dsDNA) over a wide temperature range, from 12°C to 67°C.
  • FIG. 21 shows Cas12i2 editing of multiple genomic target loci in human cells using a crRNA array (pre-crRNA).
  • FIG. 22 shows results defining the seed sequence of Cas12i2, obtained by testing the ability of Cas12i2 to generate indels using crRNAs with a single-base mismatch at one of bases 1-19 of the crRNA.
  • FIG. 23 shows identification of the optimal spacer length of 20 bp for Cas12i2.
  • the present application provides methods for engineering an enzyme by introducing amino acid mutations that enhance flexibility in flexible regions of the enzyme, which leads to increased enzymatic activity in vitro and in vivo.
  • the methods described herein are applicable for a variety of Cas nucleases, including Cas12b, Cas12i and Cas9.
  • the methods described herein do not rely on three-dimensional structures of the Cas nuclease.
  • the methods described herein have been successfully applied to Cas12i to create a number of engineered Cas12i proteins with improved genome editing efficiency across a wide range of genetic loci.
  • Engineered Cas effector proteins e.g., Cas12b, Cas12i and Cas9 and methods of using the engineered Cas effector proteins are also provided.
  • an “effector protein” refers to a protein having an activity, such as site-specific binding activity, single-strand DNA cleavage activity, double-strand DNA cleavage activity, single-strand RNA cleavage activity, or transcriptional regulation activity.
  • guide RNA and “gRNA” are used herein interchangeably to refer to RNA that is capable of forming a complex with a Cas effector protein and a target nucleic acid (e.g., duplex DNA) .
  • a guide RNA may comprise a single RNA molecule or two or more RNA molecules associated with each other via hybridization of complementary regions in the two or more RNA molecules.
  • a guide RNA comprises a crRNA and a tracrRNA.
  • When used in connection with a single RNA-guided Cas nuclease, such as Cas12i, the guide RNA does not comprise a tracrRNA or another transactivating RNA. Also contemplated herein are precursor guide RNA arrays that can be processed into a plurality of crRNAs; for some CRISPR/Cas systems, the processed crRNAs further associate with a tracrRNA or another transactivating RNA (e.g., scoutRNA) to guide the Cas effector protein.
  • the “crRNA” or “CRISPR RNA” comprises a guide sequence that has sufficient complementarity to a target sequence of a target nucleic acid (e.g., duplex DNA), which guides sequence-specific binding of the CRISPR complex to the target nucleic acid.
  • the “tracrRNA” or “trans-activating CRISPR RNA” is partially complementary to and base pairs with the crRNA, and may play a role in the maturation of the crRNA.
  • a “single guide RNA” or “sgRNA” is an engineered guide RNA having both crRNA and tracrRNA fused to each other in a single molecule.
  • The terms “nucleic acid,” “polynucleotide,” and “nucleotide sequence” are used interchangeably to refer to a polymeric form of nucleotides of any length, including deoxyribonucleotides, ribonucleotides, combinations thereof, and analogs thereof.
  • The terms “oligonucleotide” and “oligo” are used interchangeably to refer to a short polynucleotide having no more than about 50 nucleotides.
  • complementarity refers to the ability of a nucleic acid to form hydrogen bond(s) with another nucleic acid by traditional Watson-Crick base pairing.
  • a percent complementarity indicates the percentage of residues in a nucleic acid molecule that can form hydrogen bonds (i.e., Watson-Crick base pairing) with a second nucleic acid (e.g., about 5, 6, 7, 8, 9, or 10 out of 10 residues being about 50%, 60%, 70%, 80%, 90%, and 100% complementary, respectively).
  • Perfectly complementary means that all the contiguous residues of a nucleic acid sequence form hydrogen bonds with the same number of contiguous residues in a second nucleic acid sequence.
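  • As an illustration only, the percent complementarity arithmetic can be sketched in Python. The helper name `percent_complementarity`, the requirement that both strands have equal length and are written 5'→3' (so that they pair antiparallel), and the omission of gaps, bulges and wobble pairs are assumptions of this sketch; only Watson-Crick pairs are counted.

```python
# Watson-Crick pairs; U is included so an RNA strand (e.g., a crRNA) can be compared with DNA.
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}


def percent_complementarity(strand_a: str, strand_b: str) -> float:
    """Percent of residues in strand_a that can Watson-Crick pair with strand_b.

    Hypothetical helper: both strands are given 5'->3' and have equal length,
    so residue i of strand_a pairs with residue (n - 1 - i) of strand_b.
    No gaps or bulges are modelled.
    """
    if len(strand_a) != len(strand_b):
        raise ValueError("strands must have equal length in this sketch")
    a = strand_a.upper()
    b = strand_b.upper()[::-1]  # reverse strand_b so that paired residues share an index
    paired = sum((x, y) in WATSON_CRICK for x, y in zip(a, b))
    return 100.0 * paired / len(a)


# 10 residues, 8 of which can pair -> 80% complementary
print(percent_complementarity("AAAAAAAAAA", "TTTTTTTTGG"))  # 80.0
```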
  • “Substantially complementary” as used herein refers to a degree of complementarity that is at least about any one of 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99%, or 100% over a region of about 40, 50, 60, 70, 80, 100, 150, 200, 250 or more nucleotides, or refers to two nucleic acids that hybridize under stringent conditions.
  • stringent conditions for hybridization refer to conditions under which a nucleic acid having complementarity to a target sequence predominantly hybridizes with the target sequence, and substantially does not hybridize to non-target sequences. Stringent conditions are generally sequence-dependent, and vary depending on a number of factors. In general, the longer the sequence, the higher the temperature at which the sequence specifically hybridizes to its target sequence. Non-limiting examples of stringent conditions are described in detail in Tijssen (1993), Laboratory Techniques In Biochemistry And Molecular Biology-Hybridization With Nucleic Acid Probes Part I, Chapter 2, “Overview of principles of hybridization and the strategy of nucleic acid probe assay,” Elsevier, N.Y.
  • Hybridization refers to a reaction in which one or more polynucleotides react to form a complex that is stabilized via hydrogen bonding between the bases of the nucleotide residues.
  • the hydrogen bonding may occur by Watson-Crick base pairing, Hoogsteen binding, or in any other sequence-specific manner.
  • a sequence capable of hybridizing with a given sequence is referred to as the “complement” of the given sequence.
  • “Percentage (%) sequence identity” with respect to a nucleic acid sequence is defined as the percentage of nucleotides in a candidate sequence that are identical with the nucleotides in the specific nucleic acid sequence, after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent sequence identity. “Percentage (%) sequence identity” with respect to a peptide, polypeptide or protein sequence is the percentage of amino acid residues in a candidate sequence that are identical to the amino acid residues in the specific peptide or polypeptide sequence, after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent sequence identity.
  • Alignment for purposes of determining percent amino acid sequence identity can be achieved in various ways that are within the skill in the art, for instance, using publicly available computer software such as BLAST, BLAST-2, ALIGN or MEGALIGN™ (DNASTAR) software. Those skilled in the art can determine appropriate parameters for measuring alignment, including any algorithms needed to achieve maximal alignment over the full length of the sequences being compared.
  • The terms “polypeptide” and “peptide” are used interchangeably herein to refer to polymers of amino acids of any length.
  • the polymer may be linear or branched, it may comprise modified amino acids, and it may be interrupted by non-amino acids.
  • a protein may have one or more polypeptides.
  • the terms also encompass an amino acid polymer that has been modified; for example, disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, or any other manipulation, such as conjugation with a labeling component.
  • a “variant” is interpreted to mean a polynucleotide or polypeptide that differs from a reference polynucleotide or polypeptide, respectively, but retains essential properties.
  • a typical variant of a polynucleotide differs in nucleic acid sequence from another, reference polynucleotide. Changes in the nucleic acid sequence of the variant may or may not alter the amino acid sequence of a polypeptide encoded by the reference polynucleotide. Nucleotide changes may result in amino acid substitutions, additions, deletions, fusions and truncations in the polypeptide encoded by the reference sequence, as discussed below.
  • a typical variant of a polypeptide differs in amino acid sequence from another, reference polypeptide. Generally, differences are limited so that the sequences of the reference polypeptide and the variant are closely similar overall and, in many regions, identical.
  • a variant and reference polypeptide may differ in amino acid sequence by one or more substitutions, additions, and deletions in any combination.
  • a substituted or inserted amino acid residue may or may not be one encoded by the genetic code.
  • a variant of a polynucleotide or polypeptide may be naturally occurring, such as an allelic variant, or it may be a variant that is not known to occur naturally. Non-naturally occurring variants of polynucleotides and polypeptides may be made by mutagenesis techniques, by direct synthesis, and by other recombinant methods known to skilled artisans.
  • the term “wild type” has the meaning commonly understood by those skilled in the art: the typical form of an organism, strain, gene, or characteristic as it occurs in nature, as distinguished from a mutant or variant form. A wild-type form can be isolated from sources in nature and has not been intentionally modified.
  • As used herein, the terms “non-naturally occurring” and “engineered” are used interchangeably and indicate the involvement of human intervention. When these terms are used to describe a nucleic acid molecule or polypeptide, they mean that the nucleic acid molecule or polypeptide is at least substantially freed from at least one other component with which it is associated in nature or as found in nature.
  • The terms “orthologue” and “ortholog” have the meaning as commonly understood by one of ordinary skill in the art.
  • an “orthologue” of a protein as referred to herein refers to a protein belonging to a different species that performs the same or a similar function as the protein of which it is an orthologue.
  • the term "identity" is used to mean the matching of sequences between two polypeptides or between two nucleic acids.
  • when a position in both of the two sequences being compared is occupied by the same base or amino acid monomer subunit (for example, a position in each of two DNA molecules is occupied by adenine, or a position in each of two polypeptides is occupied by lysine), the molecules are identical at that position.
  • the “percent identity” between two sequences is the number of matching positions shared by the two sequences divided by the number of positions compared, multiplied by 100. For example, if 6 of the 10 positions of the two sequences match, then the two sequences have 60% identity.
  • the DNA sequences CTGACT and CAGGTT share 50% identity (3 out of a total of 6 positions match).
  • the comparison is made when the two sequences are aligned to produce maximum identity.
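  • A minimal Python sketch of the percent identity calculation for two sequences that have already been aligned. The function name is hypothetical, and treating a gap column ('-') as a compared but non-matching position is an assumption; other conventions exclude gap columns from the denominator.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Matching positions / positions compared x 100 for two pre-aligned sequences.

    Assumption: '-' marks a gap, and a gap column counts as a compared position
    that does not match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


print(percent_identity("CTGACT", "CAGGTT"))  # 50.0 (3 of 6 positions match)
print(percent_identity("AAAAAAAAAA", "AAAAAATTTT"))  # 60.0 (6 of 10 positions match)
```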
  • Such alignment can be achieved by, for example, the method of Needleman et al. (1970) J. Mol. Biol. 48: 443-453, which can be conveniently performed by a computer program such as the Align program (DNAstar, Inc.). It is also possible to use the algorithm of E. Meyers and W. Miller (Comput. Appl. Biosci., 4: 11-17 (1988)) integrated into the ALIGN program (version 2.0), using the PAM120 weight residue table.
  • a gap length penalty of 12 and a gap penalty of 4 can be used to determine the percent identity between two amino acid sequences.
  • the Needleman and Wunsch (J. Mol. Biol. 48: 443-453 (1970)) algorithm in the GAP program integrated into the GCG software package can be used, with either a BLOSUM62 matrix or a PAM250 matrix, a gap weight of 16, 14, 12, 10, 8, 6 or 4, and a length weight of 1, 2, 3, 4, 5 or 6, to determine the percent identity between two amino acid sequences.
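  • For illustration, a simplified Needleman-Wunsch global alignment can be sketched in Python, followed by the identity calculation on the aligned result. This is not a reimplementation of the ALIGN or GCG GAP programs: the match (+1), mismatch (-1) and linear gap (-2) scores are arbitrary assumptions standing in for a BLOSUM62/PAM250 substitution matrix and the gap and length weights named above, and only one optimal alignment is returned when several tie.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Globally align a and b; return (aligned_a, aligned_b, score).

    Simplified scoring (assumption): fixed match/mismatch scores and a linear gap
    penalty instead of a BLOSUM62/PAM250 matrix with GCG-style gap weights.
    """
    n, m = len(a), len(b)

    def pair_score(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] holds the best score for aligning the prefixes a[:i] and b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # pair a[i-1] with b[j-1]
                score[i - 1][j] + gap,  # a[i-1] against a gap
                score[i][j - 1] + gap,  # b[j-1] against a gap
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    # Gap columns count as compared but non-matching positions (one of several conventions).
    matches = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


aln_a, aln_b, best = needleman_wunsch("ACGT", "AGT")
print(aln_a, aln_b, best)  # ACGT A-GT 1
print(percent_identity(aln_a, aln_b))  # 75.0
```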
  • a “cell” as used herein, is understood to refer not only to the particular individual cell, but to the progeny or potential progeny of the cell. Because certain modifications may occur in succeeding generations due to either mutation or environmental influences, such progeny may not, in fact, be identical to the parent cell, but are still included within the scope of the term as used herein.
  • The terms “transduction” and “transfection” as used herein include all methods known in the art that use an infectious agent (such as a virus) or other means to introduce DNA into cells for expression of a protein or molecule of interest.
  • besides a virus or virus-like agent, there are chemical-based transfection methods, such as those using calcium phosphate, dendrimers, liposomes, or cationic polymers (e.g., DEAE-dextran or polyethylenimine); non-chemical methods, such as electroporation, cell squeezing, sonoporation, optical transfection, impalefection, protoplast fusion, and delivery of plasmids or transposons; particle-based methods, such as a gene gun, magnetofection (magnet-assisted transfection), or particle bombardment; and hybrid methods, such as nucleofection.
  • The terms “transfected,” “transformed,” and “transduced” as used herein refer to a process by which exogenous nucleic acid is transferred or introduced into the host cell.
  • a “transfected,” “transformed,” or “transduced” cell is one that has been transfected, transformed, or transduced with exogenous nucleic acid.
  • “In vivo” refers to inside the body of the organism from which the cell is obtained. “Ex vivo” or “in vitro” means outside the body of the organism from which the cell is obtained.
  • treatment is an approach for obtaining beneficial or desired results including clinical results.
  • beneficial or desired clinical results include, but are not limited to, one or more of the following: alleviating one or more symptoms resulting from the disease, diminishing the extent of the disease, stabilizing the disease (e.g., preventing or delaying the worsening of the disease), preventing or delaying the spread (e.g., metastasis) of the disease, preventing or delaying the recurrence of the disease, reducing the recurrence rate of the disease, delaying or slowing the progression of the disease, ameliorating the disease state, providing a remission (partial or total) of the disease, decreasing the dose of one or more other medications required to treat the disease, increasing the quality of life, and/or prolonging survival.
  • treatment is a reduction of the pathological consequences of cancer. The methods of the invention contemplate any one or more of these aspects of treatment.
  • an “effective amount” used herein refers to an amount of a compound or composition sufficient to treat a specified disorder, condition or disease, for example, to ameliorate, palliate, lessen, and/or delay one or more of its symptoms.
  • an “effective amount” may be in one or more doses, i.e., a single dose or multiple doses may be required to achieve the desired treatment endpoint.
  • the terms “subject,” “individual,” and “patient” are used herein interchangeably for purposes of treatment, and refer to any animal classified as a mammal, including humans, domestic and farm animals, and zoo, sports, or pet animals, such as dogs, horses, cats, cows, etc. In some embodiments, the individual is a human individual.
  • reference to “about” a value or parameter herein includes (and describes) variations that are directed to that value or parameter per se. For example, description referring to “about X” includes description of “X.”
  • reference to “not” a value or parameter generally means and describes “other than” a value or parameter.
  • for example, “the method is not used to treat cancer of type X” means the method is used to treat cancer of types other than X.
  • the term “and/or” as used herein in a phrase such as “A and/or B” is intended to include both A and B; A or B; A (alone); and B (alone).
  • the term “and/or” as used herein in a phrase such as “A, B, and/or C” is intended to encompass each of the following embodiments: A, B, and C; A, B, or C; A or C; A or B; B or C; A and C; A and B; B and C; A (alone); B (alone); and C (alone).
  • the present application provides a method of engineering an enzyme, comprising: (a) obtaining a plurality of engineered enzymes each comprising one or more mutations that increase flexibility of a flexible region in a plurality of flexible regions of a reference enzyme; and (b) selecting one or more engineered enzymes from the plurality of the engineered enzymes, wherein the one or more engineered enzymes have an increased activity compared to the reference enzyme.
  • the method further comprises determining a plurality of flexible regions in the reference enzyme.
  • a three-dimensional structure of the reference enzyme or a homolog thereof is not available.
  • the enzyme is a bacterial or archaeal enzyme.
  • the activity is measured in a eukaryotic cell, such as a mammalian cell, e.g., a human cell.
  • a method of engineering a Cas nuclease comprising: (a) determining a plurality of flexible regions in a reference Cas nuclease (e.g., based on the primary sequence of the reference Cas nuclease); (b) obtaining a plurality of engineered Cas nucleases each comprising one or more mutations that increase flexibility of a flexible region in the plurality of flexible regions of the reference Cas nuclease; and (c) selecting one or more engineered Cas nucleases from the plurality of the engineered Cas nucleases, wherein the one or more engineered Cas nucleases have an increased activity compared to the reference Cas nuclease.
  • the plurality of flexible regions is determined using a program selected from the group consisting of PredyFlexy, FoldUnfold, PROFbval, Flexserv, FlexPred, DynaMine and DisoMine.
  • the Cas nuclease is selected from the group consisting of Cas9, Cas12a, Cas12b, Cas12c, Cas12d, Cas12f, Cas12g, Cas12h, Cms1, Cas12i, Cas12j, Cas12k and CasX.
  • the plurality of flexible regions are located in random coils.
  • the plurality of flexible regions are in domains of the reference Cas nuclease that interact with DNA and/or RNA.
  • the flexible region is at least about 5 (e.g., 5) amino acids long.
  • the one or more mutations comprise insertion of one or more (e.g., 2) Glycine (G) residues in a flexible region.
  • the one or more G residues are inserted at the N-terminus of a flexible amino acid residue in the flexible region, wherein the flexible amino acid residue is selected from the group consisting of G, Serine (S), Asparagine (N), Aspartic acid (D), Histidine (H), Methionine (M), Threonine (T), Glutamic acid (E), Glutamine (Q), Lysine (K), Arginine (R), Alanine (A) and Proline (P).
  • the flexible amino acid residue is chosen according to the preference: G>S>N>D>H>M>T>E>Q>K>R>A>P.
  • the one or more mutations comprise substitution of one or more non-G residues with one or more G residues.
  • the one or more mutations comprise substitution of a hydrophobic amino acid residue in a flexible region with a G residue, wherein the hydrophobic amino acid residue is selected from the group consisting of leucine (L), isoleucine (I), valine (V), cysteine (C), tyrosine (Y), phenylalanine (F), and tryptophan (W).
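  • The two mutation types described above (G insertion N-terminal to a flexible residue chosen by the G>S>N>D>H>M>T>E>Q>K>R>A>P preference, and G substitution of hydrophobic residues) can be enumerated for one flexible region with a short Python sketch. The function name `design_variants`, the 0-based inclusive region coordinates, the mutation labels, and the choice to insert GG only before the single most preferred flexible residue are assumptions for illustration, not the procedure used in the Examples.

```python
# Preference order for flexible residues, most preferred first: G>S>N>D>H>M>T>E>Q>K>R>A>P
FLEX_PREFERENCE = "GSNDHMTEQKRAP"
HYDROPHOBIC = set("LIVCYFW")


def design_variants(seq: str, start: int, end: int, n_gly: int = 2) -> dict:
    """Return {label: mutant sequence} for the flexible region seq[start..end] (0-based, inclusive)."""
    variants = {}
    positions = range(start, end + 1)

    # (1) Insert n_gly glycines immediately N-terminal to the most preferred flexible residue.
    flexible = [p for p in positions if seq[p] in FLEX_PREFERENCE]
    if flexible:
        # min() keeps the first (most N-terminal) residue when two share the same rank
        best = min(flexible, key=lambda p: FLEX_PREFERENCE.index(seq[p]))
        label = f"ins{'G' * n_gly}-{seq[best]}{best + 1}"  # 1-based residue number in the label
        variants[label] = seq[:best] + "G" * n_gly + seq[best:]

    # (2) Substitute each hydrophobic residue in the region with G.
    for p in positions:
        if seq[p] in HYDROPHOBIC:
            variants[f"{seq[p]}{p + 1}G"] = seq[:p] + "G" + seq[p + 1:]
    return variants


for name, mutant in design_variants("MAKLSDVE", start=2, end=6).items():
    print(name, mutant)
# insGG-S5 MAKLGGSDVE
# L4G MAKGSDVE
# V7G MAKLSDGE
```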
  • the activity is site-specific nuclease activity.
  • the activity is gene-editing activity in a eukaryotic cell (e.g., a human cell).
  • the gene-editing efficiency is measured using a T7 endonuclease 1 (T7E1) assay, sequencing of the target DNA, a Tracking of Indels by Decomposition (TIDE) assay, or Indel Detection by Amplicon Analysis (IDAA) assay.
  • a method of engineering a Cas nuclease comprising: (a) calculating a flexibility score of each amino acid residue of a reference Cas nuclease (e.g., based on the primary sequence of the reference Cas nuclease), wherein a higher flexibility score indicates lower conformational flexibility; (b) selecting a plurality of peak amino acid residues at positions X_i, wherein each peak amino acid residue has a flexibility score that is below a pre-determined threshold value and is lower than the flexibility scores of amino acid residues at positions X_i - 5 to X_i - 1 and X_i + 1 to X_i + 5; (c) selecting a plurality of flexible regions as amino acid residues X_i - 2 to X_i + 2; (d) obtaining a plurality of engineered Cas nucleases each comprising one or more mutations that increase flexibility of a flexible region in the plurality of flexible regions of the reference Cas nuclease …
  • the plurality of flexible regions is determined using a program selected from the group consisting of PredyFlexy, FoldUnfold, PROFbval, Flexserv, FlexPred, DynaMine and DisoMine.
  • the Cas nuclease is selected from the group consisting of Cas9, Cas12a, Cas12b, Cas12c, Cas12d, Cas12f, Cas12g, Cas12h, Cms1, Cas12i, Cas12j, Cas12k and CasX.
  • the plurality of flexible regions are located in random coils.
  • the plurality of flexible regions are in domains of the reference Cas nuclease that interact with DNA and/or RNA.
  • the one or more G residues are inserted at the N-terminus of a flexible amino acid residue in the flexible region, wherein the flexible amino acid residue is selected from the group consisting of G, Serine (S), Asparagine (N), Aspartic acid (D), Histidine (H), Methionine (M), Threonine (T), Glutamic acid (E), Glutamine (Q), Lysine (K), Arginine (R), Alanine (A) and Proline (P).
  • the flexible amino acid residue is chosen according to the preference: G>S>N>D>H>M>T>E>Q>K>R>A>P.
  • the one or more mutations comprise substitution of a hydrophobic amino acid residue in a flexible region with a G residue, wherein the hydrophobic amino acid residue is selected from the group consisting of leucine (L), isoleucine (I), valine (V), cysteine (C), tyrosine (Y), phenylalanine (F), and tryptophan (W).
  • the activity is site-specific nuclease activity.
  • the activity is gene-editing activity in a eukaryotic cell (e.g., a human cell).
  • the gene-editing efficiency is measured using a T7 endonuclease 1 (T7E1) assay, sequencing of the target DNA, a Tracking of Indels by Decomposition (TIDE) assay, or Indel Detection by Amplicon Analysis (IDAA) assay.
  • also provided are engineered enzymes (e.g., engineered Cas nucleases) obtained using any one of the methods described herein, and a library comprising the plurality of engineered enzymes described herein.
  • the plurality of flexible regions in the reference enzyme may be determined using any known methods in the art. In some embodiments, the plurality of flexible regions are determined solely based on the amino acid sequence of the reference enzyme. In some embodiments, the plurality of flexible regions are determined based on structural information of the reference enzyme, including, for example, secondary structure, crystal structure, NMR structure, etc. In some embodiments, the plurality of flexible regions are determined without reference to the structural information, e.g., three-dimensional structures, of the reference enzyme. In some embodiments, a three-dimensional structure of the reference enzyme or homolog thereof is not available.
  • the plurality of flexible regions is determined based on the amino acid sequence of the reference enzyme. In some embodiments, the plurality of flexible regions is determined using a program selected from the group consisting of PredyFlexy, FoldUnfold, PROFbval, Flexserv, FlexPred, and DynaMine. Methods of determining flexible regions based on amino acid sequence have been described, for example, by Yu et al. (Engineering proteins for thermostability through rigidifying flexible sites. Biotechnology Advances, Volume 32, Issue 2, March-April 2014, Pages 308-315).
  • the plurality of flexible regions is determined based on NMR chemical shift data for proteins in solution.
  • the plurality of flexible regions is determined using DynaMine (Cilia et al. The DynaMine webserver: predicting protein dynamics from sequence. Nucleic Acids Res. 2014 Jul 1; 42(Web Server issue): W264-W270).
  • DynaMine leverages NMR chemical shift data for proteins in solution to obtain a quantitative insight into the relationship between the amino-acid sequence and backbone dynamics and predict flexible regions.
  • the DynaMine predictor developed from these data predicts the residue-level potential of a protein for backbone dynamics from sequence information alone, as opposed to approaches that require 3D structural information. This approach opens the vast number of available protein sequences that lack structural information to dynamics analysis.
  • the flexibility profile is an S2 score profile, wherein a lower S2 score indicates higher flexibility.
  • the plurality of flexible regions is determined using molecular dynamics simulations (e.g., simulated root mean square fluctuation). In some embodiments, the plurality of flexible regions is determined based on B-factor (e.g., crystallographic) data. In some embodiments, the plurality of flexible regions is determined using PredyFlexy, described by Narwani et al. (In silico prediction of protein flexibility with local structure approach. Biochimie, Volume 165, October 2019, Pages 150-155).
  • the plurality of flexible regions is determined using the expected average number of contacts per residue as an indicator of whether the given region is folded or unfolded, such as with FoldUnfold (Galzitskaya et al. FoldUnfold: web server for the prediction of disordered regions in protein chain. Bioinformatics, Volume 22, Issue 23, 1 December 2006, Pages 2948-2949).
  • the plurality of flexible regions is determined by calculating normalized B-values from amino acid sequence, such as with PROFbval (Schlessinger et al. PROFbval: predict flexible and rigid residues in proteins. Bioinformatics, Volume 22, Issue 7, 1 April 2006, Pages 891-893).
  • the plurality of flexible regions is determined using a combination of multiple coarse-grained approaches, structural databases, and atomistic models, such as with Flexserv (Camps et al. FlexServ: an integrated tool for the analysis of protein flexibility. Bioinformatics 2009 Jul 1; 25(13): 1709-10).
  • the plurality of flexible regions is determined using a trained supervised pattern recognition method, a Support Vector Machine (SVM), such as with the FlexPred webserver (Kuznetsov et al. FlexPred: a web-server for predicting residue positions involved in conformational switches in proteins. Bioinformation. 2008; 3(3): 134-136).
  • the plurality of flexible regions is determined based on protein disorder calculated with recurrent neural networks, such as using DisoMine (Orlando et al. Prediction of disordered regions in proteins with recurrent Neural Networks and protein dynamics. bioRxiv 2020.05.25.115253; doi: https://doi.org/10.1101/2020.05.25.115253).
  • determining the plurality of flexible regions comprises: (i) calculating a flexibility score of each amino acid residue of the reference enzyme, wherein a higher flexibility score indicates lower conformational flexibility; (ii) selecting a plurality of peak amino acid residues at positions X_i, wherein each peak amino acid residue has a flexibility score that is below a pre-determined threshold value and is lower than the flexibility scores of amino acid residues at positions X_i - 5 to X_i - 1 and X_i + 1 to X_i + 5; and (iii) defining the plurality of flexible regions as amino acid residues X_i - 2 to X_i + 2.
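  • Steps (i) to (iii) amount to finding local minima in a per-residue score profile and taking a 5-residue window around each. A minimal Python sketch follows, assuming the scores have already been computed (for example, predicted S2 values, where a higher score means a more rigid residue). The function name, the default threshold of 0.8 (borrowed from the DynaMine S2pred cut-off mentioned in this section), the made-up example profile, and the edge handling (near the sequence ends only the neighbours that exist are compared and the region is clipped) are assumptions.

```python
def find_flexible_regions(scores, threshold=0.8, window=5, half_width=2):
    """Return flexible regions as (first, last) 0-based inclusive residue indices.

    scores: per-residue flexibility scores where a HIGHER score means LOWER
    flexibility. A residue X_i is a peak if its score is below `threshold` and
    strictly lower than every score at X_i-window..X_i-1 and X_i+1..X_i+window.
    Each peak defines the region X_i-half_width..X_i+half_width.
    Assumption: near the sequence ends, only existing neighbours are compared
    and the region is clipped to the sequence.
    """
    n = len(scores)
    regions = []
    for i, s in enumerate(scores):
        if s >= threshold:
            continue  # not below the pre-determined threshold
        neighbours = [scores[j] for j in range(max(0, i - window), min(n, i + window + 1)) if j != i]
        if all(s < t for t in neighbours):
            regions.append((max(0, i - half_width), min(n - 1, i + half_width)))
    return regions


# Made-up profile: a pronounced dip at index 6, and a shallow dip (0.85 at index 11) above the threshold.
profile = [0.90, 0.88, 0.87, 0.86, 0.84, 0.80, 0.62, 0.79, 0.83, 0.86, 0.89, 0.85, 0.90, 0.91]
print(find_flexible_regions(profile))  # [(4, 8)]
```

Taking a strict "lower than all neighbours" test means two adjacent residues with identical minimal scores yield no peak; whether ties should instead be resolved is not specified in the text.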
  • the flexible region is determined using DynaMine, and the amino acid residue with the highest flexibility in the flexible region has a flexibility score S2pred of no more than about 0.8 (e.g., no more than about 0.75 or 0.7).
  • the flexible region has 5 consecutive amino acid residues, wherein the 3rd amino acid residue has the lowest flexibility score, and wherein a higher flexibility score indicates lower conformational flexibility.
  • a flexible region is located in a random coil or linker region of the protein. In some embodiments, the random coil or linker region is determined based on available structural data. In some embodiments, the random coil or linker region is determined based on homology to orthologues with known structures. In some embodiments, the random coil or linker region is determined based on amino acid sequence. In some embodiments, no 3D structural data is available for the reference protein or an orthologue of the reference protein. In some embodiments, a flexible region is not located in a random coil or linker region. In some embodiments, a flexible region is located within about 10 amino acids, such as about any one of 9, 8, 7, 6, 5, 4, 3, 2, or 1 amino acids away from a random coil or linker region.
  • a flexible region is longer than a random coil or linker region. In some embodiments, a flexible region is shorter than a random coil or linker region. In some embodiments, at least a portion of a flexible region is located in an alpha helix or a beta strand.
  • a flexible region is located in a functional domain of the reference enzyme. In some embodiments, a flexible region is located in proximity to the catalytic site of the reference enzyme. In some embodiments, a flexible region is not located in a functional domain of the reference enzyme.
  • the reference enzyme is a reference nuclease. In some embodiments, the flexible region is in a domain of the reference nuclease (e.g., Cas nuclease) that interacts with DNA and/or RNA. In some embodiments, the flexible region is not located in a domain of the reference nuclease that interacts with DNA and/or RNA.
  • the reference enzyme is a Cas nuclease selected from the group consisting of Cas9, Cas12a, Cas12b, Cas12c, Cas12d, Cas12f, Cas12g, Cas12h, Cms1, Cas12i, Cas12j, Cas12k and CasX.
  • Cas nucleases that can be engineered using the methods described herein are discussed in the “Other engineered Cas effector proteins” subsection of Section III.
  • the reference enzyme is an Argonaute protein.
  • the engineered enzyme (e.g., Cas nuclease) comprises one or more mutations that increase flexibility of two or more (e.g., 2, 3, 4, 5, 6) flexible regions.
  • the one or more mutations in different flexible regions have a synergistic effect in increasing the activity of the engineered enzyme with respect to the reference enzyme.
  • the method comprises generating an insertion of one or more G residues and/or G substitution of a hydrophobic amino acid residue in a flexible region comprising at least 5 consecutive amino acid residues of a reference Cas nuclease, wherein the engineered Cas nuclease has an increased activity compared to the reference Cas nuclease.
  • the flexible region has 5 consecutive amino acid residues, wherein the 3rd amino acid residue has the lowest flexibility score, and wherein a higher flexibility score indicates lower conformational flexibility.
  • the one or more mutations comprise inserting one or more (e.g., 2) G residues in the flexible region.
  • the one or more G residues are inserted at the N-terminus of a flexible amino acid residue in the flexible region, wherein the flexible amino acid residue is selected from the group consisting of G, Serine (S), Asparagine (N), Aspartic acid (D), Histidine (H), Methionine (M), Threonine (T), Glutamic acid (E), Glutamine (Q), Lysine (K), Arginine (R), Alanine (A) and Proline (P).
  • the flexible amino acid residue is chosen according to the preference: G>S>N>D>H>M>T>E>Q>K>R>A>P.
  • the one or more mutations comprise substituting a hydrophobic amino acid residue in a flexible region with a G residue, wherein the hydrophobic amino acid residue is selected from the group consisting of L, I, V, C, Y, F and W.
  • the engineered Cas nuclease comprises one or more mutations that increase flexibility of two or more flexible regions.
  • the reference Cas nuclease is selected from the group consisting of Cas9, Cas12a, Cas12b, Cas12c, Cas12d, Cas12f, Cas12g, Cas12h, Cms1, Cas12i, Cas12j, Cas12k and CasX.
  • the method comprises generating insertion of one or more G or S residues and/or substitution of a hydrophobic amino acid residue in a flexible region with a G or S residue.
  • the method comprises generating a substitution of a hydrophobic residue (e.g., L, I, V, C, Y, F or W) or a less flexible amino acid residue with a more flexible amino acid residue.
  • the more flexible amino acid residue is chosen according to the preference: G>S>N>D>H>M>T>E>Q>K>R>A>P.
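  • One way to read the substitution rule is as a ranking: a residue may be replaced by any residue that appears earlier in the G>S>N>D>H>M>T>E>Q>K>R>A>P order. The Python sketch below assumes that interpretation, ranks the hydrophobic residues (L, I, V, C, Y, F, W) below every listed residue, and returns only the top few candidates; the function names and the `top_k` cut-off are hypothetical.

```python
FLEX_PREFERENCE = "GSNDHMTEQKRAP"  # G>S>N>D>H>M>T>E>Q>K>R>A>P, most flexible first
HYDROPHOBIC = set("LIVCYFW")


def flexibility_rank(residue: str) -> int:
    """Lower rank = more flexible. Hydrophobic residues rank below every listed residue (assumption)."""
    if residue in FLEX_PREFERENCE:
        return FLEX_PREFERENCE.index(residue)
    if residue in HYDROPHOBIC:
        return len(FLEX_PREFERENCE)
    raise ValueError(f"unexpected residue: {residue!r}")


def more_flexible_substitutes(residue: str, top_k: int = 2) -> list:
    """Up to top_k residues ranked as more flexible than `residue`, most flexible first."""
    rank = flexibility_rank(residue)
    return list(FLEX_PREFERENCE[:rank])[:top_k]


print(more_flexible_substitutes("L"))  # ['G', 'S']
print(more_flexible_substitutes("D", top_k=3))  # ['G', 'S', 'N']
print(more_flexible_substitutes("G"))  # []
```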
  • the method comprises selecting one or more engineered enzymes from the plurality of the engineered enzymes, wherein the one or more engineered enzymes have an increased activity compared to the reference enzyme, and wherein the activity is site-specific nuclease activity.
  • the activity is gene-editing activity in a eukaryotic cell. In some embodiments, the activity is gene-editing activity in a human cell.
  • the increased activity described herein is site-specific nuclease activity.
  • the site-specific nuclease activity is determined in vitro.
  • the site-specific nuclease activity is determined in a cell. Site-specific nuclease activity may be assessed using known methods in the art, including, for example, an in vitro cleavage assay based on agarose gel electrophoresis as described in the Examples provided herein.
  • engineered enzymes (e.g., engineered Cas nucleases) having at least about any one of 20%, 30%, 40%, 60%, 70%, 80%, 90%, 1.5-fold, 2-fold, 3-fold, 4-fold, 5-fold or higher increase of site-specific nuclease activity compared to that of the reference enzyme are selected.
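  • Selecting variants against such thresholds is simple arithmetic on measured activities; a hedged Python sketch follows. The function name, the use of a plain activity ratio (variant activity divided by reference activity), and the example numbers are hypothetical. A 20% increase corresponds to a ratio threshold of 1.2; how an “N-fold increase” maps onto a ratio is left to the caller here, since the text does not spell it out.

```python
def select_improved(reference: float, variants: dict, min_ratio: float = 1.2) -> list:
    """Names of variants whose activity / reference activity is at least min_ratio.

    min_ratio=1.2 corresponds to "at least a 20% increase". Results are sorted
    by fold change, highest first.
    """
    if reference <= 0:
        raise ValueError("reference activity must be positive")
    ratios = {name: activity / reference for name, activity in variants.items()}
    hits = [name for name, ratio in ratios.items() if ratio >= min_ratio]
    return sorted(hits, key=lambda name: ratios[name], reverse=True)


# Made-up editing efficiencies (%) for a reference nuclease and three hypothetical variants
measured = {"v1": 12.5, "v2": 11.0, "v3": 30.0}
print(select_improved(10.0, measured))  # ['v3', 'v1']
print(select_improved(10.0, measured, min_ratio=2.0))  # ['v3']
```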
  • the activity is gene-editing activity in a cell, such as a prokaryotic cell or a eukaryotic cell.
  • Gene-editing efficiency of an engineered enzyme in a cell may be determined using known methods in the art, including, for example, a T7 endonuclease 1 (T7E1) assay, sequencing of the target DNA (including, e.g., Sanger sequencing and next-generation sequencing), a Tracking of Indels by Decomposition (TIDE) assay, or an Indel Detection by Amplicon Analysis (IDAA) assay.
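For the T7E1 assay, a widely used estimate converts gel band intensities into an indel frequency: indel % = 100 × (1 − √(1 − f_cut)), where f_cut is the fraction of the PCR product that T7E1 cleaved. The sketch below applies that formula. It assumes background-subtracted band intensities for the uncut band and the two cleavage products. The formula is a general convention, not one stated in this document, and the intensities in the example are hypothetical.

```python
import math


def t7e1_indel_percent(uncut: float, cleaved_1: float, cleaved_2: float) -> float:
    """Estimate indel frequency (%) from T7E1 band intensities.

    f_cut = (cleaved_1 + cleaved_2) / (uncut + cleaved_1 + cleaved_2)
    indel % = 100 * (1 - sqrt(1 - f_cut))
    The square root reflects the assumption that edited and unedited strands
    re-anneal at random, so only heteroduplexes are cleaved.
    """
    total = uncut + cleaved_1 + cleaved_2
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    f_cut = (cleaved_1 + cleaved_2) / total
    return 100.0 * (1.0 - math.sqrt(1.0 - f_cut))


if __name__ == "__main__":
    # Hypothetical intensities: 40% of the product is cleaved -> roughly 22.5% indels
    print(round(t7e1_indel_percent(60, 25, 15), 2))
```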
  • the gene-editing efficiency of an engineered enzyme in a cell is measured using targeted next-generation sequencing (NGS), for example, as described in the Examples herein.
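With targeted NGS, editing efficiency is usually reported as the fraction of reads that carry an insertion or deletion near the expected cut site. The sketch below assumes two things: the reads are already aligned to the amplicon and summarised as (0-based reference start, CIGAR string) pairs, and the quantification window around the cut site is chosen by the user. Read alignment, quality filtering and substitution handling are deliberately left out. Function names and data are hypothetical.

```python
import re

# CIGAR operations: M, D, N, =, X consume reference positions; I, S, H, P do not
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def has_indel_in_window(ref_start: int, cigar: str, win_start: int, win_end: int) -> bool:
    """True if the read has an insertion or deletion within [win_start, win_end] (0-based, inclusive)."""
    pos = ref_start
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "I":
            # An insertion lies immediately before reference position `pos`
            if win_start <= pos <= win_end:
                return True
        elif op == "D":
            # A deletion spans reference positions pos .. pos + n - 1
            if pos <= win_end and pos + n - 1 >= win_start:
                return True
            pos += n
        elif op in "MN=X":
            pos += n
    return False


def indel_frequency(reads: list[tuple[int, str]], win_start: int, win_end: int) -> float:
    """Fraction of aligned reads with an indel in the quantification window."""
    if not reads:
        raise ValueError("no reads supplied")
    edited = sum(has_indel_in_window(s, c, win_start, win_end) for s, c in reads)
    return edited / len(reads)


if __name__ == "__main__":
    # Hypothetical reads: one unedited, one 3-bp deletion, one 2-bp insertion, one deletion outside the window
    demo = [(100, "50M"), (100, "20M3D30M"), (100, "25M2I23M"), (100, "10M1D40M")]
    print(indel_frequency(demo, 118, 128))
```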
  • the activity is gene-editing activity in a eukaryotic cell, such as a plant cell or a mammalian cell.
  • engineered Cas nuclease variants having at least about any one of 20%, 30%, 40%, 60%, 70%, 80%, 90%, 1.5 fold, 2 fold, 3 fold, 4 fold, 5 fold or higher increase of site-specific nuclease activity in a eukaryotic cell compared to that of the reference enzyme are selected.
  • the activity is gene-editing activity in a human cell, such as a 293T cell.
  • engineered Cas nuclease variants having at least about any one of 20%, 30%, 40%, 60%, 70%, 80%, 90%, 1.5 fold, 2 fold, 3 fold, 4 fold, 5 fold or higher increase of site-specific nuclease activity in a human cell compared to that of the reference enzyme are selected.
  • the method further comprises a step of combining mutations from one or more engineered enzymes from the plurality of the engineered enzymes, wherein the one or more engineered enzymes have an increased activity compared to the reference enzyme.
  • the combined mutations comprise mutations in one or more (e.g., 2, 3, 4, or more) different flexible regions.
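Combining mutations from improved variants in different flexible regions can be sketched as "keep the best improved hit per region, then pool their mutations". This is a minimal illustration that assumes each screened variant carries mutations in a single flexible region and has a measured activity ratio against the reference. Whether the combined variant actually performs better must still be tested experimentally. Region identifiers and mutation labels are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    name: str          # hypothetical variant label
    region: str        # identifier of the flexible region that was mutated
    mutations: tuple   # mutation labels carried by this variant
    ratio: float       # activity relative to the reference enzyme


def combine_best_per_region(hits: list[Hit], min_ratio: float = 1.0) -> tuple:
    """Pool the mutations of the most active improved hit in each flexible region."""
    best: dict[str, Hit] = {}
    for hit in hits:
        if hit.ratio <= min_ratio:
            continue  # not improved over the reference
        current = best.get(hit.region)
        if current is None or hit.ratio > current.ratio:
            best[hit.region] = hit
    combined = []
    for region in sorted(best):
        combined.extend(best[region].mutations)
    return tuple(combined)


if __name__ == "__main__":
    screen = [
        Hit("v1", "R1", ("R1:insGG",), 1.4),
        Hit("v2", "R1", ("R1:L3G",), 1.8),
        Hit("v3", "R2", ("R2:insGG",), 1.3),
        Hit("v4", "R3", ("R3:V2G",), 0.9),  # below reference, excluded
    ]
    print(combine_best_per_region(screen))
```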
  • the present application provides engineered Cas effector proteins that have improved activity, such as target binding, double-strand cleavage activity, nickase activity, and/or gene-editing activity.
  • the engineered Cas nucleases are obtained using any one of the methods of engineering described in Section II.
  • an engineered Cas effector protein (e.g., Cas nuclease, Cas nickase, Cas fusion effector protein, or split Cas effector protein)
  • an engineered Cas effector protein comprising any one of the engineered Cas nucleases described herein or a functional derivative thereof.
  • an engineered Cas nuclease comprising one or more mutations that increase flexibility of a flexible region comprising at least 5 consecutive amino acid residues of a reference Cas nuclease, wherein the engineered Cas nuclease has an increased activity compared to the reference Cas nuclease.
  • the flexible region is determined based on the amino acid sequence of the reference Cas nuclease, e.g., using a program selected from the group consisting of PredyFlexy, FoldUnfold, PROFbval, Flexserv, FlexPred, DynaMine and Disomine.
  • the flexible region is determined using DynaMine, and wherein the amino acid residue with the highest flexibility in the flexible region has a flexibility score S²pred of no more than about 0.8 (e.g., no more than about 0.75 or 0.7).
  • the flexible region has 5 consecutive amino acid residues, wherein the 3rd amino acid residue has the lowest flexibility score, and wherein a higher flexibility score indicates lower conformational flexibility.
  • the flexible region is located in a random coil.
  • the flexible region is in a domain of the reference Cas nuclease that interacts with DNA and/or RNA.
  • the engineered Cas nuclease comprises one or more mutations that increase flexibility of two or more flexible regions.
  • the reference Cas nuclease is selected from the group consisting of Cas9, Cas12a, Cas12b, Cas12c, Cas12d, Cas12f, Cas12g, Cas12h, Cms1, Cas12i, Cas12j, Cas12k and CasX.
  • an engineered Cas nuclease comprising insertion of one or more G residues and/or substitution of a hydrophobic amino acid residue in a flexible region comprising at least 5 consecutive amino acid residues of a reference Cas nuclease, wherein the engineered Cas nuclease has an increased activity compared to the reference Cas nuclease.
  • the flexible region has 5 consecutive amino acid residues, wherein the 3rd amino acid residue has the lowest flexibility score, and wherein a higher flexibility score indicates lower conformational flexibility.
  • the one or more mutations comprise inserting one or more (e.g., 2) G residues in the flexible region.
  • the one or more G residues are inserted at the N-terminus of a flexible amino acid residue in the flexible region, wherein the flexible amino acid residue is selected from the group consisting of G, S, N, D, H, M, T, E, Q, K, R, A and P.
  • the flexible amino acid residue is chosen according to the preference: G>S>N>D>H>M>T>E>Q>K>R>A>P.
  • the one or more mutations comprise substituting a hydrophobic amino acid residue in a flexible region with a G residue, wherein the hydrophobic amino acid residue is selected from the group consisting of L, I, V, C, Y, F and W.
  • the engineered Cas nuclease comprises one or more mutations that increase flexibility of two or more flexible regions.
  • the reference Cas nuclease is selected from the group consisting of Cas9, Cas12a, Cas12b, Cas12c, Cas12d, Cas12f, Cas12g, Cas12h, Cms1, Cas12i, Cas12j, Cas12k and CasX.
  • an engineered Cas effector protein comprising an engineered Cas nuclease or a functional derivative thereof, wherein the engineered Cas nuclease comprises one or more mutations that increase flexibility of a flexible region comprising at least 5 consecutive amino acid residues of a reference Cas nuclease, wherein the engineered Cas nuclease has an increased activity compared to the reference Cas nuclease.
  • the flexible region is determined using DynaMine, and wherein the amino acid residue with the highest flexibility in the flexible region has a flexibility score S²pred of no more than about 0.8 (e.g., no more than about 0.75 or 0.7).
  • the flexible region has 5 consecutive amino acid residues, wherein the 3rd amino acid residue has the lowest flexibility score, and wherein a higher flexibility score indicates lower conformational flexibility.
  • the engineered Cas nuclease comprises insertion of one or more G residues and/or substitution of a hydrophobic amino acid residue in the flexible region.
  • the one or more mutations comprise inserting one or more (e.g., 2) G residues in the flexible region.
  • the one or more G residues are inserted at the N-terminus of a flexible amino acid residue in the flexible region, wherein the flexible amino acid residue is selected from the group consisting of G, S, N, D, H, M, T, E, Q, K, R, A and P.
  • the flexible amino acid residue is chosen according to the preference: G>S>N>D>H>M>T>E>Q>K>R>A>P.
  • the one or more mutations comprise substituting a hydrophobic amino acid residue in a flexible region with a G residue, wherein the hydrophobic amino acid residue is selected from the group consisting of L, I, V, C, Y, F and W.
  • the engineered Cas nuclease comprises one or more mutations that increase flexibility of two or more flexible regions.
  • the reference Cas nuclease is selected from the group consisting of Cas9, Cas12a, Cas12b, Cas12c, Cas12d, Cas12f, Cas12g, Cas12h, Cms1, Cas12i, Cas12j, Cas12k and CasX.
  • the functional derivative of the engineered Cas nuclease is an enzymatically active Cas nuclease, a Cas nickase, an enzymatically dead Cas, a fusion effector protein (e.g., a transcriptional activator, a transcriptional repressor, a base editor, or a prime editor), or a split Cas effector protein (e.g., an inducible split Cas effector protein, or an auto-inducible split Cas effector protein).
  • engineered CRISPR-Cas systems comprising any one of the engineered Cas effector proteins (e.g., engineered Cas nucleases) described herein and a guide RNA (including a precursor guide RNA array, a crRNA, a single guide RNA, or a crRNA and a tracrRNA).
  • the engineered CRISPR-Cas system comprises one or more nucleic acid molecules encoding the engineered Cas effector protein, and/or the guide RNA.
  • the engineered CRISPR-Cas system comprises one or more vectors encoding the engineered Cas effector protein, and/or the guide RNA.
  • a vector encoding the engineered Cas effector protein.
  • the vector further comprises a guide RNA.
  • the vector is selected from the group consisting of retroviral vectors, lentiviral vectors, adenoviral vectors, adeno-associated viral vectors, and herpes simplex viral vectors.
  • the vector is an adeno-associated viral (AAV) vector.
  • the engineered Cas nucleases described herein have mutations that increase conformational flexibility of one or more (e.g., 1, 2, 3, or more) flexible regions in a reference Cas nuclease, such as a naturally occurring wildtype Cas nuclease. Mutations in two or more flexible regions may be combined to provide synergistic increase of activity in the engineered Cas nuclease as compared to the reference Cas nuclease.
  • flexible regions in a reference Cas nuclease may be determined using methods known in the art, for example, based on one or more of the methods described in Section II, “Methods of Engineering Enzymes,” above.
  • the flexible region(s) of a reference Cas nuclease is determined based on the amino acid sequence of the reference Cas nuclease. In some embodiments, the flexible region(s) of a reference Cas nuclease is determined using a program selected from the group consisting of PredyFlexy, FoldUnfold, PROFbval, Flexserv, FlexPred, DynaMine and Disomine.
  • the flexible region may comprise 5 or more (e.g., 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 or more) consecutive amino acid residues of the reference Cas nuclease. In some embodiments, the flexible region has 5 consecutive amino acid residues of the reference Cas nuclease.
  • the flexible region may be defined by first selecting a peak amino acid residue at position X of the reference Cas nuclease, in which the peak amino acid residue has a flexibility score that is below a pre-determined threshold value (e.g., an S²pred of 0.8 or less as determined by DynaMine), and wherein the peak amino acid residue has a flexibility score that is lower than the flexibility scores of the amino acid residues at positions X-5 to X-1 (i.e., the 5 amino acid residues flanking the N-terminus of the peak amino acid residue) and X+1 to X+5 (i.e., the 5 amino acid residues flanking the C-terminus of the peak amino acid residue), wherein a higher flexibility score indicates lower conformational flexibility.
  • the flexible region has 5 consecutive amino acid residues, wherein the 3rd amino acid residue has the lowest flexibility score, and wherein a higher flexibility score indicates lower conformational flexibility.
  • the flexible region(s) of a reference Cas nuclease is determined using DynaMine.
  • the amino acid residue with the highest flexibility in the flexible region (i.e., the peak amino acid residue) is in the context-dependent flexible zone as determined by DynaMine.
  • the amino acid residue with the highest flexibility in the flexible region (i.e., the peak amino acid residue) is in the flexible zone as determined by DynaMine.
  • the amino acid residue with the highest flexibility (i.e., the peak amino acid residue) in the flexible region has a flexibility score S²pred of no more than about 0.8, e.g., no more than about any one of 0.79, 0.78, 0.77, 0.76, 0.75, 0.74, 0.73, 0.72, 0.71, 0.7 or less, based on calculation in DynaMine.
  • the amino acid residues of the flexible region are in the context-dependent flexible zone and/or the flexible zone as determined by DynaMine.
  • each amino acid residue in the flexible region has a flexibility score S²pred of no more than about 0.8, e.g., no more than about any one of 0.79, 0.78, 0.77, 0.76, 0.75, 0.74, 0.73, 0.72, 0.71, 0.7 or less, based on calculation in DynaMine.
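The peak-based definition of a flexible region can be sketched as a scan over a per-residue S²pred profile. The sketch assumes the DynaMine predictions have already been exported as a list of floats, one per residue in sequence order. It does not call DynaMine itself. A residue at 0-based index X counts as a peak when its S²pred is no more than the threshold and strictly lower than each of the 5 residues on either side. The region returned is the 5-residue window X-2 to X+2, so the peak is the 3rd residue. Residues within 5 positions of either terminus are skipped, which is a simplifying assumption. The profile in the example is hypothetical.

```python
def find_flexible_regions(s2: list[float], threshold: float = 0.8,
                          flank: int = 5, half_width: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) 0-based inclusive windows centred on S2pred peaks.

    Lower S2pred means higher conformational flexibility, so a "peak" of
    flexibility is a local minimum of the S2pred profile.
    """
    regions = []
    for x in range(flank, len(s2) - flank):
        score = s2[x]
        if score > threshold:
            continue  # not flexible enough to qualify as a peak
        neighbours = s2[x - flank:x] + s2[x + 1:x + flank + 1]
        if all(score < other for other in neighbours):
            regions.append((x - half_width, x + half_width))
    return regions


if __name__ == "__main__":
    # Hypothetical profile: a flexibility peak (S2pred minimum) at index 7
    profile = [0.90] * 15
    profile[6], profile[7], profile[8] = 0.78, 0.70, 0.79
    print(find_flexible_regions(profile))
```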
  • the flexible region(s) of a reference Cas nuclease is located in a random coil. In some embodiments, the flexible region(s) of a reference Cas nuclease is not located in an alpha helix. In some embodiments, the flexible region(s) of a reference Cas nuclease is not located in a beta strand. In some embodiments, the flexible region(s) of a reference Cas nuclease is not located in a random coil. In some embodiments, at least a portion of the flexible region(s) of a reference Cas nuclease is located in an alpha helix or a beta strand.
  • the flexible region(s) of a reference Cas nuclease is in a domain of the reference Cas nuclease that interacts with DNA. In some embodiments, the flexible region(s) of a reference Cas nuclease is in a domain of the reference Cas nuclease that interacts with RNA (e.g., guide RNA, such as crRNA and/or tracrRNA). In some embodiments, the flexible region(s) of a reference Cas nuclease is in a domain of the reference Cas nuclease that does not interact with DNA or RNA. In some embodiments, the flexible region(s) of a reference Cas nuclease is located between functional domains of the reference Cas nuclease.
  • Any mutations that can increase flexibility of the flexible region(s) can be used. Flexibility of the 20 naturally occurring amino acid residues has been characterized based on experimental data (e.g., crystal structures, NMR, and other protein dynamics studies). Flexible amino acid residues include G, T, R, S, N, Q, D, P, E, K, A and M; rigid amino acid residues include W, Y, F, C, I, V, H and L. In some embodiments, the flexibility of an amino acid depends on the identity of its neighboring amino acids. See, for example, Smith DK et al., “Improved amino acid flexibility parameters,” Protein Sci., 2003, 12(5):1060-1072.
  • the one or more mutations to the flexible region comprises insertion of one or more (e.g., 1, 2, 3, or more) amino acid residues associated with flexible conformation.
  • the insertion may occur at any position(s) of the flexible region.
  • the insertion is at the N-terminus or the C-terminus of the peak amino acid in the flexible region.
  • the insertion is at the N-terminus or the C-terminus of a flexible amino acid residue, e.g., with the preference of G>S>N>D>H>M>T>E>Q>K>R>A>P.
  • the insertion is made at the N-terminus or the C-terminus of the preferred flexible amino acid residue that is closer to the peak amino acid residue.
  • the insertion is made at the N-terminus or the C-terminus of the preferred flexible amino acid residue having a neighboring amino acid residue that has a higher preference in terms of flexibility (i.e., G>S>N>D>H>M>T>E>Q>K>R>A>P) than the neighboring amino acid residue of the other preferred flexible amino acid residue.
  • the insertion is made at the N-terminus or the C-terminus of the preferred flexible amino acid residue that is closer to the N-terminus of the reference Cas nuclease.
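The insertion-site rules above can be expressed as a ranking with tie-breakers, applied in this order:

1. The flexible residue with the highest preference in the G>S>N>D>H>M>T>E>Q>K>R>A>P order.
2. The candidate closer to the peak residue.
3. The candidate whose best-ranked immediate neighbour has the higher preference.
4. The candidate closer to the N-terminus.

The neighbour rule is an interpretation: the text does not say which neighbour to compare, so the sketch uses the better-ranked of the two. `choose_insertion_site` is a hypothetical helper. It returns the 0-based index, within the region, of the residue whose N-terminal side receives the insertion.

```python
FLEX_PREFERENCE = "GSNDHMTEQKRAP"


def _rank(residue: str) -> int:
    """Lower is more preferred; residues outside the list rank last."""
    return FLEX_PREFERENCE.index(residue) if residue in FLEX_PREFERENCE else len(FLEX_PREFERENCE)


def choose_insertion_site(region: str, peak_index: int):
    """Return the 0-based index of the preferred flexible residue, or None if there is none."""
    candidates = [i for i, r in enumerate(region) if r in FLEX_PREFERENCE]
    if not candidates:
        return None

    def neighbour_rank(i: int) -> int:
        ranks = [_rank(region[j]) for j in (i - 1, i + 1) if 0 <= j < len(region)]
        return min(ranks, default=len(FLEX_PREFERENCE))

    # Sort key mirrors the rule order: preference, distance to peak, neighbour, N-terminal position
    return min(candidates,
               key=lambda i: (_rank(region[i]), abs(i - peak_index), neighbour_rank(i), i))


if __name__ == "__main__":
    # Two equally ranked S residues equidistant from the peak: the N-terminal one wins
    print(choose_insertion_site("LSKSV", peak_index=2))
```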
  • Glycine (G) is widely accepted as a flexible amino acid residue.
  • the one or more mutations to the flexible region comprises insertion of one or more (e.g., 1, 2, 3, or more) Gs in the flexible region.
  • the one or more mutations comprise inserting two G residues in the flexible region.
  • the one or more G residues are inserted at the N-terminus of a flexible amino acid residue in the flexible region, wherein the flexible amino acid residue is selected from the group consisting of G, S, N, D, H, M, T, E, Q, K, R, A and P.
  • the flexible amino acid residue is chosen according to the preference: G>S>N>D>H>M>T>E>Q>K>R>A>P. Insertion of other flexible amino acid residues or groups of flexible amino acid residues, such as S, T, GS, SG, GGS, etc., is also contemplated.
  • the one or more mutations to the flexible region comprises substitution of one or more amino acid residues associated with less flexible conformation (e.g., rigid amino acid residues) with one or more amino acid residues with more flexible conformation (e.g., flexible amino acid residues).
  • the one or more mutations of the flexible region comprises substitution of a hydrophobic amino acid residue in the flexible region with a flexible amino acid residue.
  • the hydrophobic amino acid residue is selected from the group consisting of L, I, V, C, Y, F and W.
  • the one or more mutations of the flexible region comprises substitution of two or more (e.g., 2, 3, or 4) hydrophobic amino acid residues in the flexible region with a flexible amino acid residue.
  • the flexible amino acid residue is G, S, T, N, D, H, K or R. In some embodiments, the flexible amino acid residue is G, S, or T.
  • the one or more mutations comprise substituting a hydrophobic amino acid residue in a flexible region with a G residue, wherein the hydrophobic amino acid residue is selected from the group consisting of L, I, V, C, Y, F and W.
  • the hydrophobic amino acid residue is selected from the group consisting of L, I, V, C, Y, F and W.
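Applying the two mutation types to a protein sequence is simple string editing. The sketch assumes 1-based residue numbering, matching how positions are cited in this document, and a one-letter amino acid string. When an insertion and a substitution are combined, it applies the substitution first, because an insertion shifts the numbering of every downstream residue. Function names and the example sequence are hypothetical.

```python
HYDROPHOBIC = frozenset("LIVCYFW")


def insert_before(seq: str, position: int, insert: str = "GG") -> str:
    """Insert residues immediately N-terminal to the residue at 1-based `position`."""
    if not 1 <= position <= len(seq):
        raise ValueError("position out of range")
    i = position - 1
    return seq[:i] + insert + seq[i:]


def substitute_hydrophobic(seq: str, position: int, new: str = "G") -> str:
    """Replace the hydrophobic residue at 1-based `position` with `new`."""
    if not 1 <= position <= len(seq):
        raise ValueError("position out of range")
    i = position - 1
    if seq[i] not in HYDROPHOBIC:
        raise ValueError(f"{seq[i]}{position} is not one of L, I, V, C, Y, F, W")
    return seq[:i] + new + seq[i + 1:]


if __name__ == "__main__":
    wild_type = "MKLSAVDE"  # hypothetical fragment
    # Substitute V6G first, then insert GG N-terminal to S4 (numbering of the original sequence)
    print(insert_before(substitute_hydrophobic(wild_type, 6), 4))
```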
  • Combinations of insertion(s) and substitution(s) in the flexible region as described herein are also contemplated, provided that the mutations do not adversely affect the folding and/or activity of the engineered Cas nuclease.
  • the engineered Cas nuclease is a Cas effector protein of a Class 1 or Class 2 CRISPR-Cas system. In some embodiments, the engineered Cas nuclease is a Cas effector protein of any one of the Type I-A, Type I-B, Type I-C, Type I-D, Type I-E, Type I-F, Type II-A, Type II-B, Type II-C, Type III-A, Type III-B, Type IV-A, Type IV-B, Type V-A, Type V-B, Type V-F, Type V-U3, Type V-U4, Type V-U2, Type V-U1, Type V-C, Type V-D, Type V-E, Type V-U5, Type V-G, Type V-H, Type V-I, Type V-K and Type VI CRISPR-Cas systems.
  • the engineered Cas nuclease is selected from the group consisting of Cas9, Cas12a, Cas12b, Cas12c, Cas12d, Cas12f, Cas12g, Cas12h, Cms1, Cas12i, Cas12j, Cas12k and CasX.
  • the engineered Cas nuclease is a Cas9.
  • the engineered Cas nuclease is a Cas12i.
  • the engineered Cas nuclease is a Cas12b.
  • the engineered Cas nuclease has increased activity compared to the reference Cas nuclease.
  • the activity is target-DNA binding activity.
  • the activity is site-specific nuclease activity.
  • the activity is double-strand DNA cleavage activity.
  • the activity is single-strand DNA cleavage activity, including, e.g., site-specific DNA cleavage activity, or nonspecific DNA cleavage activity.
  • the activity is single-strand RNA cleavage activity, e.g., site-specific RNA cleavage activity or nonspecific RNA cleavage activity.
  • the activity is measured in vitro.
  • the activity is measured in a cell, such as a bacterial cell, a plant cell, or a eukaryotic cell. In some embodiments, the activity is measured in a mammalian cell, e.g., a rodent cell or a human cell. In some embodiments, the activity is measured in a human cell, such as a 293T cell.
  • the engineered Cas nuclease has at least about any one of 20%, 30%, 40%, 60%, 70%, 80%, 90%, 1.5 fold, 2 fold, 3 fold, 4 fold, 5 fold or higher increase of site-specific nuclease activity compared to that of the reference Cas nuclease.
  • the site-specific nuclease activity of the engineered Cas nuclease may be measured using known methods in the art, including, for example, gel-shift assay.
  • the activity is gene-editing activity in a cell.
  • the cell is a bacterial cell, a plant cell, or a eukaryotic cell.
  • the cell is a mammalian cell, such as a rodent cell or a human cell.
  • the cell is a 293T cell.
  • the activity is indel formation activity, e.g., via site-specific cleavage of a target nucleic acid by the engineered Cas nuclease and DNA repair by non-homologous end-joining (NHEJ) mechanism, in a cell at a target genomic locus.
  • the activity is insertion of an exogenous nucleic acid sequence, e.g., via site-specific cleavage of a target nucleic acid by the engineered Cas nuclease and DNA repair by homologous recombination (HR) mechanism, in a cell at a target genomic locus.
  • the engineered Cas nuclease has at least about any one of 20%, 30%, 40%, 60%, 70%, 80%, 90%, 1.5 fold, 2 fold, 3 fold, 4 fold, 5 fold or higher increase of gene-editing (e.g., indel formation) activity compared to that of the reference Cas nuclease at a genomic locus in a cell (e.g., a human cell such as a 293T cell).
  • the engineered Cas nuclease has at least about any one of 20%, 30%, 40%, 60%, 70%, 80%, 90%, 1.5 fold, 2 fold, 3 fold, 4 fold, 5 fold or higher increase of gene-editing (e.g., indel formation) activity compared to that of the reference Cas nuclease at a plurality (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10 or more) of genomic loci in a cell (e.g., a human cell such as a 293T cell).
  • the engineered Cas nuclease is capable of editing a larger number of genomic loci than the reference Cas nuclease.
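Comparing an engineered nuclease with its reference across several loci involves two summaries: the per-locus fold change in indel frequency, and the number of loci edited above a detection limit. The sketch below computes both. It assumes indel frequencies are given as percentages for the same loci under matched conditions. The 0.1% detection limit is an arbitrary placeholder. Locus labels and values are hypothetical, not data from this document.

```python
import math


def compare_loci(reference: dict[str, float], engineered: dict[str, float],
                 min_ratio: float = 1.2, detection_limit: float = 0.1):
    """Return (per-locus ratios, loci meeting min_ratio, #loci edited by reference, #loci edited by engineered)."""
    loci = sorted(set(reference) & set(engineered))
    ratios = {}
    for locus in loci:
        ref, eng = reference[locus], engineered[locus]
        if ref > 0:
            ratios[locus] = eng / ref
        else:
            # Reference shows no editing: any editing by the variant is an unbounded gain
            ratios[locus] = math.inf if eng > 0 else 1.0
    improved = [locus for locus in loci if ratios[locus] >= min_ratio]
    edited_ref = sum(reference[locus] > detection_limit for locus in loci)
    edited_eng = sum(engineered[locus] > detection_limit for locus in loci)
    return ratios, improved, edited_ref, edited_eng


if __name__ == "__main__":
    ref = {"locus_1": 10.0, "locus_2": 0.0, "locus_3": 25.0}   # % indels, hypothetical
    eng = {"locus_1": 30.0, "locus_2": 5.0, "locus_3": 26.0}
    ratios, improved, n_ref, n_eng = compare_loci(ref, eng)
    print(improved, n_ref, n_eng)
```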
  • the consensus PAM sequence of the engineered Cas nuclease is the same as the reference Cas nuclease.
  • Gene-editing efficiency of an engineered Cas nuclease in a cell may be determined using methods known in the art, including, for example, a T7 endonuclease 1 (T7E1) assay, sequencing of the target DNA (including, e.g., Sanger sequencing and next-generation sequencing), a Tracking of Indels by Decomposition (TIDE) assay, or an Indel Detection by Amplicon Analysis (IDAA) assay.
  • the gene-editing efficiency of the engineered Cas nuclease in a cell is measured using targeted next-generation sequencing (NGS), for example, as described in the Examples herein.
  • Exemplary genomic loci for determination of gene-editing efficiency of the engineered Cas nuclease include, but are not limited to, CCR5, AAVS, CD34, RNF2, and EMX1.
  • the present application further provides engineered Cas effector proteins based on any one of the engineered Cas nucleases described herein.
  • the engineered Cas effector protein comprises a functional derivative of the engineered Cas nuclease, such as any one of the functional derivatives as described in the section “Functional Derivatives” below.
  • the engineered Cas effector protein has site-specific nuclease activity. In some embodiments, the engineered Cas effector protein can induce double-strand breaks in a target DNA molecule. In some embodiments, the engineered Cas effector protein comprises an enzymatically active engineered Cas nuclease. In some embodiments, the engineered Cas effector protein is a Cas nickase that can induce a single-strand break in a target DNA molecule. In some embodiments, the engineered Cas effector protein comprises a nickase mutant of the engineered Cas nuclease.
  • the engineered Cas effector protein comprises an enzymatically inactive (i.e., enzymatically dead) mutant of the engineered Cas nuclease. In some embodiments, the engineered Cas effector protein further comprises one or more functional domains fused to the engineered Cas nuclease or functional derivative thereof. In some embodiments, the engineered Cas effector protein comprises a functional domain fused to an enzymatically inactive mutant of the engineered Cas nuclease.
  • the functional domain is selected from the group consisting of a translation initiator domain, a transcription repressor domain, a transactivation domain, an epigenetic modification domain, a nucleobase-editing domain (e.g., CBE or ABE domain), a reverse transcriptase domain, a reporter domain (e.g., a fluorescent domain) and a nuclease domain.
  • the engineered Cas effector protein comprises split Cas polypeptides based on the engineered Cas nuclease or functional derivative thereof.
  • the engineered Cas effector protein comprises a first polypeptide comprising an N-terminal portion of the engineered Cas nuclease or functional derivative thereof, and a second polypeptide comprising a C-terminal portion of the engineered Cas nuclease or functional derivative thereof, wherein the first polypeptide and the second polypeptide are capable of associating with each other in the presence of a guide RNA comprising a guide sequence to form a CRISPR complex that specifically binds to a target nucleic acid comprising a target sequence complementary to the guide sequence.
  • the split Cas effector protein is inducible, e.g., each split Cas polypeptide comprises a dimerization domain, and the dimerization domains are capable of associating with each other in the presence of an inducer (e.g., rapamycin).
  • the split Cas effector protein is auto-inducible, e.g., the split Cas polypeptides do not comprise dimerization domains, and they are capable of associating with each other in the presence of a guide RNA.
  • the present application provides engineered Cas12b effector proteins (e.g., Cas12b nucleases, Cas12b nickases, Cas12b fusion effector proteins, split Cas12b effector proteins) that have improved activity (e.g., target binding, double-strand cleavage activity, nickase activity, and/or gene-editing activity).
  • an engineered Cas12b nuclease comprising one or more mutations that increase flexibility of a flexible region that corresponds to amino acid residues 835 to 839, wherein the amino acid residue numbering is based on SEQ ID NO: 1, wherein the engineered Cas12b nuclease has an increased activity compared to a reference Cas12b nuclease.
  • the reference Cas12b nuclease is BhCas12b (e.g., BhCas12bv4).
  • the one or more mutations comprise inserting one or more (e.g., 2) G residues in the flexible region.
  • the one or more G residues are inserted at the N-terminus of a flexible amino acid residue in the flexible region, wherein the flexible amino acid residue is selected from the group consisting of G, S, N, D, H, M, T, E, Q, K, R, A and P.
  • the flexible amino acid residue is chosen according to the preference: G>S>N>D>H>M>T>E>Q>K>R>A>P.
  • the one or more mutations comprise substituting a hydrophobic amino acid residue in a flexible region with a G residue, wherein the hydrophobic amino acid residue is selected from the group consisting of L, I, V, C, Y, F and W.
  • the engineered Cas12b nuclease comprises the amino acid sequence of SEQ ID NO: 2. In some embodiments, the engineered Cas12b nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more) sequence identity to SEQ ID NO: 2.
  • an engineered Cas12b nuclease comprising the amino acid sequence of SEQ ID NO: 2.
  • The Type V-B CRISPR-Cas12b (also known as C2c1) system has been identified as a dual-RNA-guided (i.e., crRNA and tracrRNA) DNA endonuclease system with features distinct from those of Cas9 and Cas12a (Shmakov, S. et al. Mol. Cell 60, 385-397 (2015)).
  • Cas12b was reported to generate staggered ends distal to the protospacer adjacent motif (PAM) site in vitro when reconstituted with the crRNA/tracrRNA duplex.
  • Cas12b proteins are smaller than the most widely used SpCas9 and Cas12a (e.g., AacCas12b: 1,129 amino acids (aa); SpCas9: 1,368 aa; AsCas12a: 1,307 aa; LbCas12a: 1,228 aa), making Cas12b suitable for adeno-associated virus (AAV)-mediated in vivo delivery in gene therapy.
  • Compared with small-sized Cas9 proteins, such as SaCas9 and CjCas9, Cas12b recognizes simpler PAM sequences (e.g., AacCas12b: 5′-TTN-3′ (SEQ ID NO: 3); compared to SaCas9: 5′-NNGRRT-3′ (SEQ ID NO: 4) and CjCas9: 5′-NNNNRYAC-3′ (SEQ ID NO: 5)), which significantly increases the targeting range of Cas12b in the genome. Additionally, Cas12b has minimal off-target effects and thus may serve as a safer choice for therapeutic and clinical applications.
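The size argument reduces to simple arithmetic: a protein of n amino acids needs a coding sequence of 3n nucleotides plus a 3-nt stop codon. That length must fit inside an AAV genome together with regulatory elements. The sketch below uses a commonly cited AAV packaging limit of roughly 4.7 kb as an approximate, assumed figure, since the practical limit depends on the construct. It ignores NLS and tag sequences, so the remaining capacity it reports is an upper bound.

```python
AAV_CAPACITY_BP = 4700  # approximate AAV packaging limit; an assumption, not a fixed constant


def orf_length_bp(n_amino_acids: int, include_stop: bool = True) -> int:
    """Coding-sequence length in bp for a protein of n amino acids."""
    return 3 * n_amino_acids + (3 if include_stop else 0)


def remaining_capacity_bp(n_amino_acids: int, capacity: int = AAV_CAPACITY_BP) -> int:
    """Space left for promoter, polyA signal, guide RNA cassette, etc."""
    return capacity - orf_length_bp(n_amino_acids)


if __name__ == "__main__":
    for name, length in {"AacCas12b": 1129, "LbCas12a": 1228}.items():
        print(name, orf_length_bp(length), remaining_capacity_bp(length))
```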
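The targeting-range argument can be made concrete with the IUPAC codes in these PAMs. In sequence of uniform base composition, the expected frequency of a PAM is the product, over its positions, of (number of bases allowed) / 4. That gives 1/16 for TTN, but 1/64 for both NNGRRT and NNNNRYAC, so a TTN PAM occurs about four times as often. The sketch computes that expectation and counts overlapping matches on one strand of a given sequence. Uniform base composition and single-strand scanning are simplifying assumptions, since real genomes deviate from both. Function names are hypothetical.

```python
import re
from fractions import Fraction

# IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def pam_regex(pam: str) -> re.Pattern:
    """Compile a lookahead regex so overlapping PAM occurrences are all found."""
    body = "".join(f"[{IUPAC[c]}]" for c in pam.upper())
    return re.compile(f"(?=({body}))")


def count_pam_sites(seq: str, pam: str) -> int:
    """Count PAM matches on the given strand only."""
    return sum(1 for _ in pam_regex(pam).finditer(seq.upper()))


def expected_pam_frequency(pam: str) -> Fraction:
    """Probability that a random position starts a PAM, assuming equal base frequencies."""
    freq = Fraction(1)
    for c in pam.upper():
        freq *= Fraction(len(IUPAC[c]), 4)
    return freq


if __name__ == "__main__":
    for pam in ("TTN", "NNGRRT", "NNNNRYAC"):
        print(pam, expected_pam_frequency(pam))
```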
  • Cas12b (C2c1) nucleases from various organisms may be used as the reference Cas12b nuclease to provide engineered Cas12b effector proteins of the present application.
  • Exemplary Cas12b nucleases have been described, for example, in Shmakov, S. et al. Mol. Cell 60, 385-397 (2015); Shmakov, S. et al. Nat. Rev. Microbiol. 15, 169-182 (2017); WO2016205764, and WO2020/087631, which are incorporated herein by reference in their entirety.
  • the engineered Cas12b effector protein is based on a reference Cas12b protein (e.g., Cas12b nuclease) selected from Cas12b proteins from Alicyclobacillus acidiphilus (AaCas12b), Cas12b from Alicyclobacillus kakegawensis (AkCas12b), Cas12b from Alicyclobacillus macrosporangiidus (AmCas12b), Cas12b from Bacillus hisashii (BhCas12b), BsCas12b from Bacillus, Bs3Cas12b from Bacillus, Cas12b from Desulfovibrio inopinatus (DiCas12b), Cas12b from Laceyella sediminis (LsCas12b), Cas12b from Spirochaetes bacterium (SbCas12b), Cas12b from Sp
  • the reference Cas12b protein is a Cas12b nuclease from Bacillus hisashii (BhCas12b) or a functional derivative thereof.
  • the engineered Cas12b effector protein is based on a reference Cas12b protein comprising an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%) sequence identity to the amino acid sequence of SEQ ID NO: 6.
  • the engineered Cas12b effector protein is based on a reference Cas12b nuclease comprising the amino acid sequence of SEQ ID NO: 6. Exemplary flexible regions in BhCas12b and mutations that increase flexibility of the flexible regions are shown in FIGs. 2-3.
  • the reference Cas12b protein is a Cas12b nuclease from Alicyclobacillus acidiphilus (AaCas12b) or a functional derivative thereof.
  • the engineered Cas12b effector protein is based on a reference Cas12b protein comprising an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%) sequence identity to the amino acid sequence of SEQ ID NO: 7.
  • the engineered Cas12b effector protein is based on a reference Cas12b nuclease comprising the amino acid sequence of SEQ ID NO: 7.
  • orthologues having a certain sequence identity (e.g., at least about any one of 60%, 70%, 80%, 85%, 90%, 95%, 98% or higher)
  • the skilled artisan can determine, based on the purpose and application, the percentage of sequence identity of an orthologue of Cas12b or fragment thereof suitable for use in the present application.
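Percent identity depends on how it is counted, so the convention should be fixed before thresholds such as 85% are applied. The sketch below takes a pairwise alignment already produced by an alignment tool, with gaps written as "-". It counts identical aligned residues and divides by the ungapped length of the reference sequence. This is one common convention, assumed here. Other conventions divide by alignment length or by the shorter sequence and can give different values. The example sequences are hypothetical.

```python
def percent_identity(aligned_ref: str, aligned_query: str) -> float:
    """Identity (%) of a pre-aligned query to the reference, over the reference's ungapped length."""
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    ref_length = sum(1 for c in aligned_ref if c != "-")
    if ref_length == 0:
        raise ValueError("reference contains no residues")
    matches = sum(1 for a, b in zip(aligned_ref, aligned_query) if a == b and a != "-")
    return 100.0 * matches / ref_length


if __name__ == "__main__":
    # 5 identical positions over a 6-residue reference (the gap column is not a match)
    print(round(percent_identity("MKT-LLV", "MKTALIV"), 2))
```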
  • a reference Cas12b nuclease comprises from the N-terminus to the C-terminus: a first WED domain (WED-I; also known as OBD-I domain), a first REC domain (REC1), a second WED domain (WED-II; also known as OBD-II domain), a first RuvC domain (RuvC-I), a bridge helix (BH) domain, a second REC domain (REC2), a second RuvC domain (RuvC-II), a first Nuc domain (Nuc-I; also known as UK-I domain), a third RuvC domain (RuvC-III) and a second Nuc domain (Nuc-II; also known as UK-II domain).
  • Domain boundaries may be determined using methods known in the art, such as based on crystal structures of a reference Cas12b nuclease (e.g., PDB ID Nos: 5U30, 5U31, 5U33, 5U34 and 5WQE for AaCas12b), and/or sequence homology to known functional domains in a reference Cas12b nuclease.
  • the AaCas12b has the following domains: WED-I domain (amino acid residues 1-14), REC1 domain (amino acid residues 15-386), WED-II domain (amino acid residues 387-518), RuvC-I domain (amino acid residues 519-628), BH domain (amino acid residues 629-658), REC2 domain (amino acid residues 659-784), RuvC-II domain (amino acid residues 785-900), Nuc-I domain (amino acid residues 901-974), RuvC-III domain (amino acid residues 975-993), and Nuc-II domain (amino acid residues 994-1129), wherein the amino acid numbering is based on SEQ ID NO: 7.
  • Crystal structures of Alicyclobacillus acidoterrestris Cas12b bound to sgRNA as a binary complex and to target DNAs as ternary complexes have been described in Yang H., et al. Cell 167: 1814-1828 (2016) and Liu L. et al. Mol. Cell 65: 310-322 (2017). Briefly, the crystal structures show two lobes, a discontinuous REC (recognition) lobe (residues 15-386 and 658-784) and a discontinuous NUC (nuclease) lobe (residues 1-14, 387-658 and 785-1129), each composed of several domains.
  • PAM recognition is sequence specific and occurs mostly via interaction with the REC1 (helical-I) and WED-II (OBD-II) domains.
  • the sgRNA-target DNA heteroduplex binds primarily to the REC lobe in a sequence-independent manner.
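The AaCas12b domain boundaries listed above can be used to locate a residue. This sketch, with hypothetical helper names, reports which domain contains a given residue under SEQ ID NO: 7 numbering. Applied to the catalytic-dead mutations R785A, R911A and D977A discussed in the text, it places them in RuvC-II, Nuc-I and RuvC-III, respectively. The table treats the domain ranges as contiguous, 1-based and inclusive. It does not model the lobe description, which assigns residue 658 to both lobes.

```python
from bisect import bisect_right

# (start, end, name), 1-based inclusive, SEQ ID NO: 7 numbering as listed in the text.
AACAS12B_DOMAINS = [
    (1, 14, "WED-I"),
    (15, 386, "REC1"),
    (387, 518, "WED-II"),
    (519, 628, "RuvC-I"),
    (629, 658, "BH"),
    (659, 784, "REC2"),
    (785, 900, "RuvC-II"),
    (901, 974, "Nuc-I"),
    (975, 993, "RuvC-III"),
    (994, 1129, "Nuc-II"),
]
_STARTS = [start for start, _, _ in AACAS12B_DOMAINS]


def domain_of(residue: int) -> str:
    """Return the name of the domain containing a residue number."""
    # Index of the last domain whose start is <= residue.
    i = bisect_right(_STARTS, residue) - 1
    if i < 0 or residue > AACAS12B_DOMAINS[i][1]:
        raise ValueError(f"residue {residue} is outside the annotated range 1-1129")
    return AACAS12B_DOMAINS[i][2]


for mutation in ("R785A", "R911A", "D977A"):
    # Strip the wild-type and replacement letters to get the position.
    print(mutation, domain_of(int(mutation[1:-1])))
# R785A RuvC-II
# R911A Nuc-I
# D977A RuvC-III
```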
  • the engineered Cas12b nuclease is based on a functional variant of a naturally occurring Cas12b nuclease.
  • the functional variant has one or more mutations, such as amino acid substitutions, insertions and deletions.
  • the functional variant may comprise any one of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more amino acid substitutions compared to a wild type naturally occurring Cas12b nuclease.
  • the one or more substitutions are conservative substitutions.
  • the functional variant has all domains of a naturally occurring Cas12b nuclease. In some embodiments, the functional variant does not have one or more domains of a naturally occurring Cas12b nuclease.
  • engineered Cas12b effector proteins based on any one of the engineered Cas12b nucleases described herein.
  • the engineered Cas12b effector protein is enzymatically active.
  • the engineered Cas12b effector protein is a nuclease that cleaves both strands of a target duplex nucleic acid (e.g., duplex DNA).
  • the engineered Cas12b effector protein is a nickase, i.e., cleaving a single strand of a target duplex nucleic acid (e.g., duplex DNA).
  • the engineered Cas12b effector protein comprises an enzymatically inactive mutant of the engineered Cas12b nuclease. Mutations at one or more amino acid residues in the active site of a Cas12b nuclease can result in an enzymatically dead Cas12b.
  • R785A, R911A, or D977A mutants of AaCas12b have no nuclease activity in human cells. See, for example, Teng F. et al., Cell Discovery, 4, Article number: 63 (2018), which is incorporated herein by reference in its entirety.
  • the engineered Cas12b effector protein comprises an engineered Cas12b having a mutation corresponding to the R785A mutation of AaCas12b. In some embodiments, the engineered Cas12b effector protein comprises an engineered Cas12b having one or more mutations corresponding to R785A, R911A or D977A of AaCas12b. In some embodiments, the engineered Cas12b effector protein comprises an engineered Cas12b having a mutation corresponding to the R911A mutation of AaCas12b. In some embodiments, the engineered Cas12b effector protein comprises an engineered Cas12b having a mutation corresponding to the D977A mutation of AaCas12b.
  • there is provided an engineered Cas12b nickase. There is also provided an engineered Cas12b fusion effector protein, comprising an engineered Cas12b nuclease or functional derivative thereof (e.g., an enzymatically inactive mutant of the engineered Cas12b nuclease) fused to a functional domain, such as a translation initiator domain, a transcription repressor domain, a transactivation domain, an epigenetic modification domain, a nucleobase-editing domain (e.g., CBE or ABE domain), a reverse transcriptase domain, a reporter domain (e.g., a fluorescent domain), or a nuclease domain.
  • engineered CRISPR-Cas12b systems comprising any one of the engineered Cas12b effector proteins described herein, and a guide RNA comprising a guide sequence complementary to a target sequence, or one or more nucleic acids encoding the guide RNA, wherein the engineered Cas12b effector protein and the guide RNA are capable of forming a CRISPR complex that specifically binds to a target nucleic acid comprising the target sequence and induces a modification of the target nucleic acid.
  • the guide RNA comprises a crRNA and a tracrRNA.
  • the guide RNA is a sgRNA.
  • the guide RNA is a precursor that can be processed into a crRNA and a tracrRNA. In some embodiments, the guide RNA is a precursor RNA array encoding a plurality of crRNAs, and wherein each processed crRNA is associated with a tracrRNA.
  • the engineered Cas12b effector protein and/or the guide RNA are encoded by one or more vectors such as AAV vectors.
  • the engineered CRISPR-Cas12b system is a ribonucleoprotein (RNP) complex comprising the engineered Cas12b effector protein bound to the guide RNA.
  • the present application provides engineered Cas12i effector proteins that have improved activity (e.g., target binding, double-strand cleavage activity, nickase activity, and/or gene-editing activity).
  • an engineered Cas12i nuclease comprising one or more mutations that increase flexibility of a flexible region in a reference Cas12i nuclease that is selected from the group consisting of regions corresponding to amino acid residues 228-232, amino acid residues 439-443, amino acid residues 478-482, amino acid residues 500-504, amino acid residues 775-779, and amino acid residues 925-929, wherein the amino acid residue numbering is based on SEQ ID NO: 8, wherein the engineered Cas12i nuclease has an increased activity compared to the reference Cas12i nuclease.
  • the flexible region corresponds to amino acid residues 439-443, or amino acid residues 925-929, wherein the amino acid residue numbering is based on SEQ ID NO: 8.
  • the reference Cas12i nuclease is Cas12i2.
  • the one or more mutations comprise inserting one or more (e.g., 2) G residues in the flexible region.
  • the one or more G residues are inserted at the N-terminus of a flexible amino acid residue in the flexible region, wherein the flexible amino acid residue is selected from the group consisting of G, S, N, D, H, M, T, E, Q, K, R, A and P.
  • the flexible amino acid residue is chosen according to the preference: G>S>N>D>H>M>T>E>Q>K>R>A>P.
  • the one or more mutations comprise substituting a hydrophobic amino acid residue in a flexible region with a G residue, wherein the hydrophobic amino acid residue is selected from the group consisting of L, I, V, C, Y, F and W.
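The two mutation strategies above can be sketched on a plain sequence string. One inserts G residues N-terminal to a flexible residue; the other replaces a hydrophobic residue with G. This is a minimal illustration, not the authors' procedure. It assumes 1-based inclusive region bounds and picks the single highest-ranked flexible residue in the window, with ties going to the leftmost. It substitutes only the first hydrophobic residue found. The toy sequence and function names are hypothetical.

```python
# Flexible residues in the stated order of preference (most preferred first).
FLEX_PREFERENCE = "GSNDHMTEQKRAP"
# Hydrophobic residues that may be replaced with G.
HYDROPHOBIC = "LIVCYFW"


def insert_glycines(seq: str, start: int, end: int, n_gly: int = 2) -> str:
    """Insert n_gly G residues N-terminal to the top-ranked flexible residue in seq[start..end].

    start/end are 1-based and inclusive; ties go to the leftmost residue (assumption).
    """
    window = seq[start - 1:end]
    ranked = [(FLEX_PREFERENCE.index(aa), offset)
              for offset, aa in enumerate(window) if aa in FLEX_PREFERENCE]
    if not ranked:
        raise ValueError("no flexible residue in the region")
    _, offset = min(ranked)  # lowest rank wins, then lowest offset
    pos = start - 1 + offset  # 0-based index of the chosen residue
    return seq[:pos] + "G" * n_gly + seq[pos:]


def substitute_hydrophobic(seq: str, start: int, end: int) -> str:
    """Replace the first hydrophobic residue in seq[start..end] (1-based) with G."""
    for pos in range(start - 1, min(end, len(seq))):
        if seq[pos] in HYDROPHOBIC:
            return seq[:pos] + "G" + seq[pos + 1:]
    raise ValueError("no hydrophobic residue in the region")


toy = "MKLAVSEPLR"  # hypothetical toy sequence, not a real Cas12i
print(insert_glycines(toy, 3, 7))         # MKLAVGGSEPLR (GG placed before S)
print(substitute_hydrophobic(toy, 3, 7))  # MKGAVSEPLR (L3 -> G)
```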
  • the engineered Cas12i nuclease comprises one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 96-105.
  • the engineered Cas12i nuclease comprises one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 99 and 104-105. In some embodiments, the engineered Cas12i nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more) sequence identity to SEQ ID NO: 8.
  • an engineered Cas12i2 nuclease comprising the amino acid sequence of SEQ ID NO: 99.
  • the engineered Cas12i2 nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more) sequence identity to SEQ ID NO: 14.
  • an engineered Cas12i nuclease comprising the amino acid sequence of SEQ ID NO: 14.
  • an engineered Cas12i2 nuclease comprising the amino acid sequence of SEQ ID NO: 104.
  • the engineered Cas12i2 nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more) sequence identity to SEQ ID NO: 19.
  • an engineered Cas12i nuclease comprising the amino acid sequence of SEQ ID NO: 18.
  • an engineered Cas12i2 nuclease comprising the amino acid sequence of SEQ ID NO: 105.
  • the engineered Cas12i2 nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more) sequence identity to SEQ ID NO: 18.
  • an engineered Cas12i nuclease comprising the amino acid sequence of SEQ ID NO: 19.
  • an engineered Cas12i2 nuclease comprising the amino acid sequence of SEQ ID NO: 99 and the amino acid sequence of SEQ ID NO: 104.
  • the engineered Cas12i2 nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more) sequence identity to SEQ ID NO: 20.
  • an engineered Cas12i nuclease comprising the amino acid sequence of SEQ ID NO: 20.
  • Type V-I CRISPR-Cas12i has been identified as an RNA-guided DNA endonuclease system. Unlike CRISPR-Cas systems such as Cas12b or Cas9, the Cas12i-based CRISPR system does not require tracrRNA sequences.
  • the RNA guide includes a crRNA.
  • the crRNAs described herein include a direct repeat sequence and a spacer sequence.
  • the crRNA includes, consists essentially of, or consists of a direct repeat sequence linked to a guide sequence or spacer sequence.
  • the crRNA includes a direct repeat sequence, a spacer sequence, and a direct repeat sequence (DR-spacer-DR), which is typical of precursor crRNA (pre-crRNA) configurations in other CRISPR systems.
  • the crRNA includes a truncated direct repeat sequence and a spacer sequence, which is typical of processed or mature crRNA.
  • the CRISPR-Cas effector protein forms a complex with the RNA guide, and the spacer sequence directs the complex to bind, in a sequence-specific manner, to the target nucleic acid that is complementary to the spacer sequence.
  • the engineered Cas12i of the present application is an endonuclease that, under the guidance of a guide RNA, binds to and cleaves a specific site of a target sequence; it has both DNA and RNA endonuclease activity.
  • the Cas12i is capable of autonomous crRNA biogenesis by processing of pre-crRNA arrays. Autonomous pre-crRNA processing facilitates Cas12i delivery for double nicking applications, as two separate genomic loci can be targeted from a single crRNA transcript.
  • the Cas12i protein then processes the CRISPR array into two cognate crRNAs that result in the formation of paired nicking complexes.
  • the guide RNA comprises a pre-crRNA expressed from a CRISPR array consisting of target sequences interleaved by unprocessed DR sequences, repeated to enable targeting of one, two, or more loci simultaneously by the intrinsic pre-crRNA processing of the effector.
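The DR-spacer-DR array layout and its processing into one crRNA per target can be sketched with string operations. The sketch uses hypothetical placeholder sequences and keeps the full DR on each processed unit. The text says mature crRNA carries a truncated DR, but the truncation length is not given here. It also assumes no spacer contains the DR sequence. The function names are hypothetical.

```python
from __future__ import annotations


def build_precrrna(direct_repeat: str, spacers: list[str]) -> str:
    """Assemble a DR-spacer-DR-spacer-...-DR array, one spacer per target locus."""
    return direct_repeat + "".join(spacer + direct_repeat for spacer in spacers)


def process_precrrna(pre_crrna: str, direct_repeat: str) -> list[str]:
    """Split an array into DR + spacer units, one per target.

    Simplification: the full DR is kept (no truncation modeled), and the DR is
    assumed not to occur inside any spacer.
    """
    spacers = [s for s in pre_crrna.split(direct_repeat) if s]
    return [direct_repeat + s for s in spacers]


dr = "AAAU"                # hypothetical placeholder DR, not a real Cas12i repeat
targets = ["CCCC", "GGGG"] # hypothetical placeholder spacers for two loci
array = build_precrrna(dr, targets)
print(array)                        # AAAUCCCCAAAUGGGGAAAU
print(process_precrrna(array, dr))  # ['AAAUCCCC', 'AAAUGGGG']
```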
  • the Type V-I CRISPR-Cas effector protein is capable of recognizing a protospacer adjacent motif (PAM).
  • the target nucleic acid includes or consists of a PAM including or consisting of the nucleic acid sequence 5′-TTN-3′ (SEQ ID NO: 21) or 5′-TTH-3′ (SEQ ID NO: 22) or 5′-TTY-3′ (SEQ ID NO: 23) or 5′-TTC-3′ (SEQ ID NO: 24) .
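The PAMs above use IUPAC ambiguity codes: N is any base, H is any base except G, and Y is C or T. The sketch below scans one DNA strand for matching PAM sites. The function names and the toy sequence are hypothetical. Only the given strand is searched, not its reverse complement.

```python
from __future__ import annotations

# Standard IUPAC nucleotide ambiguity codes (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def pam_matches(site: str, pattern: str) -> bool:
    """True if every base of site is allowed by the IUPAC code at the same position."""
    return len(site) == len(pattern) and all(
        base in IUPAC[code] for base, code in zip(site.upper(), pattern.upper())
    )


def find_pam_sites(dna: str, pattern: str) -> list[int]:
    """0-based start positions of PAM matches on the given strand only."""
    k = len(pattern)
    return [i for i in range(len(dna) - k + 1) if pam_matches(dna[i:i + k], pattern)]


toy = "GATTCAGTTG"  # hypothetical toy target
print(find_pam_sites(toy, "TTN"))  # [2, 7]
print(find_pam_sites(toy, "TTH"))  # [2]  (TTG at 7 is excluded because H excludes G)
```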
  • Cas12i nucleases from various organisms may be used as the reference Cas12i nuclease to provide engineered Cas12i effector proteins of the present application.
  • Exemplary Cas12i nucleases have been described, for example, in WO2019/201331A1 and US2020/0063126A1, which are incorporated herein by reference in their entirety.
  • the reference Cas12i protein is enzymatically active.
  • the reference Cas12i is a nuclease, i.e., cleaving both strands of a target duplex nucleic acid (e.g., duplex DNA).
  • the reference Cas12i is a nickase, i.e., cleaving a single strand of a target duplex nucleic acid (e.g., duplex DNA).
  • the reference Cas12i protein is enzymatically inactive.
  • the reference Cas12i nuclease is Cas12i1 (e.g., SEQ ID NO: 9), Cas12i2 (e.g., SEQ ID NO: 8), or Cas12i-Phi (e.g., SEQ ID NO: 10).
  • the reference Cas12i nuclease comprises an amino acid sequence selected from the group consisting of SEQ ID NOs: 8-10.
  • Cas12i orthologues having a certain sequence identity (e.g., at least about any one of 60%, 70%, 80%, 85%, 90%, 95%, 98% or higher) may also be used.
  • Cas12i or functional derivatives thereof may be used as a basis to design the engineered Cas12i effector proteins of the present application.
  • the engineered Cas12i proteins are based on a functional variant of a naturally occurring Cas12i protein.
  • the functional variant has one or more mutations, such as amino acid substitutions, insertions and deletions.
  • the functional variant may comprise any one of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more amino acid substitutions compared to a wild type naturally occurring Cas12i protein.
  • the one or more substitutions are conservative substitutions.
  • the functional variant has all domains of a naturally occurring Cas12i protein. In some embodiments, the functional variant does not have one or more domains of a naturally occurring Cas12i protein.
  • engineered Cas12i effector proteins based on any one of the engineered Cas12i nucleases described herein.
  • the engineered Cas12i effector protein is enzymatically active.
  • the engineered Cas12i effector protein is a nuclease that cleaves both strands of a target duplex nucleic acid (e.g., duplex DNA).
  • the engineered Cas12i effector protein is a nickase, i.e., cleaving a single strand of a target duplex nucleic acid (e.g., duplex DNA).
  • the engineered Cas12i effector protein comprises an enzymatically inactive mutant of the engineered Cas12i nuclease. Mutations at one or more amino acid residues in the active site of a Cas12i nuclease can result in an enzymatically dead Cas12i.
  • the engineered Cas12i enzymes provided herein can be modified to have diminished nuclease activity, e.g., nuclease inactivation of at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 97%, or 100% as compared with the wild type Cas12i enzymes.
  • the nuclease activity can be diminished by several methods, e.g., introducing mutations into the nuclease or PAM interacting domains of the Cas12i enzymes.
  • catalytic residues for the nuclease activities are identified, and these amino acid residues can be substituted by different amino acid residues (e.g., glycine or alanine) to diminish the nuclease activity.
  • examples of such mutations for Cas12i1 include D647A, E894A, or D948A.
  • Examples of such mutations for Cas12i2 include D599A, E833A, or D886A.
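Substitutions such as D599A follow the usual notation: wild-type residue, 1-based position, then replacement residue. The sketch below applies such substitutions to a protein sequence string. It first checks that the expected wild-type residue is present, which guards against numbering mismatches. The helper names are hypothetical. The full Cas12i sequences are not reproduced here, so the example runs on a toy sequence.

```python
from __future__ import annotations

import re

# Wild-type residue, 1-based position, replacement residue (e.g., "D599A").
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitution(seq: str, mutation: str) -> str:
    """Apply one point substitution after verifying the wild-type residue."""
    m = _SUBSTITUTION.match(mutation)
    if m is None:
        raise ValueError(f"unrecognized substitution: {mutation}")
    wild_type, pos, new = m.group(1), int(m.group(2)), m.group(3)
    if not 1 <= pos <= len(seq):
        raise ValueError(f"position {pos} outside sequence of length {len(seq)}")
    if seq[pos - 1] != wild_type:
        raise ValueError(f"expected {wild_type} at {pos}, found {seq[pos - 1]}")
    return seq[:pos - 1] + new + seq[pos:]


def apply_substitutions(seq: str, mutations: list[str]) -> str:
    """Apply several substitutions in order (positions are not shifted by substitutions)."""
    for mutation in mutations:
        seq = apply_substitution(seq, mutation)
    return seq


toy = "MKDLE"  # hypothetical toy sequence, not a real Cas12i
print(apply_substitutions(toy, ["D3A", "E5A"]))  # MKALA
```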
  • there is provided an engineered Cas12i nickase. There is also provided an engineered Cas12i fusion effector protein, comprising an engineered Cas12i nuclease or functional derivative thereof (e.g., an enzymatically inactive mutant of the engineered Cas12i nuclease) fused to a functional domain, such as a translation initiator domain, a transcription repressor domain, a transactivation domain, an epigenetic modification domain, a nucleobase-editing domain (e.g., CBE or ABE domain), a reverse transcriptase domain, a reporter domain (e.g., a fluorescent domain), or a nuclease domain.
  • an engineered Cas12i base editor comprising a catalytically inactive variant of any one of the engineered Cas12i nucleases described herein (e.g., enCas12i2 or SEQ ID NO: 20) fused to a cytosine deaminase domain or an adenosine deaminase domain.
  • an engineered Cas12i prime editor comprising a catalytically inactive variant of any one of the engineered Cas12i nucleases described herein (e.g., enCas12i2 or SEQ ID NO: 20) fused to a reverse transcriptase domain.
  • there is provided an engineered split Cas12i effector protein.
  • engineered CRISPR-Cas12i systems comprising any one of the engineered Cas12i effector proteins described herein, and a guide RNA comprising a guide sequence complementary to a target sequence, or one or more nucleic acids encoding the guide RNA, wherein the engineered Cas12i effector protein and the guide RNA are capable of forming a CRISPR complex that specifically binds to a target nucleic acid comprising the target sequence and induces a modification of the target nucleic acid.
  • the guide RNA comprises a crRNA.
  • a tracrRNA is not required.
  • the guide RNA comprises a pre-crRNA expressed from a CRISPR array consisting of target sequences interleaved by unprocessed DR sequences, repeated to enable targeting of one, two, or more loci simultaneously by the intrinsic pre-crRNA processing of the effector.
  • the guide RNA is a precursor RNA array encoding a plurality of crRNAs.
  • the engineered Cas12i effector protein and/or the guide RNA are encoded by one or more vectors such as AAV vectors.
  • the engineered CRISPR-Cas12i system is a ribonucleoprotein (RNP) complex comprising the engineered Cas12i effector protein bound to the guide RNA.
  • the present application provides engineered Cas9 effector proteins that have improved activity (e.g., target binding, double-strand cleavage activity, nickase activity, and/or gene-editing activity).
  • an engineered Cas9 nuclease comprising one or more mutations that increase flexibility of a flexible region in a reference Cas9 nuclease that is selected from the group consisting of regions corresponding to amino acid residues 39-43, amino acid residues 135-139, amino acid residues 176-180, amino acid residues 274-278, amino acid residues 351-355, amino acid residues 389-393, amino acid residues 521-525, amino acid residues 541-545, amino acid residues 755-759, amino acid residues 774-778, amino acid residues 786-790, amino acid residues 811-815, amino acid residues 848-852, amino acid residues 855-859, amino acid residues 874-878, amino acid residues 891-895, amino acid residues 1019-1023, and amino acid residues 1036-1040, wherein the amino acid residue numbering is based on SEQ ID NO: 25, and wherein the engineered Cas9 nuclease has an increased activity compared to the reference Cas9 nuclease.
  • the reference Cas9 nuclease is GeoCas9.
  • the flexible region in the reference Cas9 nuclease is selected from the group consisting of regions corresponding to amino acid residues 135-139, amino acid residues 176-180, amino acid residues 541-545, amino acid residues 755-759, and amino acid residues 811-815, wherein the amino acid residue numbering is based on SEQ ID NO: 25.
  • the one or more mutations comprise inserting one or more (e.g., 2) G residues in the flexible region.
  • the one or more G residues are inserted at the N-terminus of a flexible amino acid residue in the flexible region, wherein the flexible amino acid residue is selected from the group consisting of G, S, N, D, H, M, T, E, Q, K, R, A and P.
  • the flexible amino acid residue is chosen according to the preference: G>S>N>D>H>M>T>E>Q>K>R>A>P.
  • the one or more mutations comprise substituting a hydrophobic amino acid residue in a flexible region with a G residue, wherein the hydrophobic amino acid residue is selected from the group consisting of L, I, V, C, Y, F and W.
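The flexible regions are defined by reference numbering (SEQ ID NO: 25). Inserting G residues therefore shifts the numbering of every downstream residue in the engineered protein. The sketch below maps a reference position to its engineered position. It assumes each insertion is recorded as (reference residue it precedes, number of residues inserted), with the inserted residues placed immediately N-terminal to that residue. The function name is hypothetical. The insertion sites in the example are hypothetical choices within regions 135-139 and 541-545.

```python
from __future__ import annotations


def map_position(ref_pos: int, insertions: list[tuple[int, int]]) -> int:
    """Map a 1-based reference residue number to its number in the engineered protein.

    insertions: (reference residue the insert precedes, number of residues inserted).
    A residue shifts by every insertion placed at or before it.
    """
    shift = sum(n for site, n in insertions if site <= ref_pos)
    return ref_pos + shift


# Hypothetical: two G residues before reference residue 137, two before 543.
ins = [(137, 2), (543, 2)]
print(map_position(100, ins))  # 100 (upstream of both insertions)
print(map_position(137, ins))  # 139 (the residue itself moves past the inserted GG)
print(map_position(800, ins))  # 804
```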
  • the engineered Cas9 nuclease comprises one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 138-175. In some embodiments, the engineered Cas9 nuclease comprises one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 139, 140, 145-146 and 153. In some embodiments, the engineered Cas9 nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more) sequence identity to SEQ ID NO: 25.
  • an engineered GeoCas9 nuclease comprising the amino acid sequence of SEQ ID NO: 139.
  • the engineered Cas9 nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more) sequence identity to SEQ ID NO: 27.
  • an engineered Cas9 nuclease comprising the amino acid sequence of SEQ ID NO: 27.
  • an engineered GeoCas9 nuclease comprising the amino acid sequence of SEQ ID NO: 140.
  • the engineered Cas9 nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more) sequence identity to SEQ ID NO: 28.
  • an engineered Cas9 nuclease comprising the amino acid sequence of SEQ ID NO: 28.
  • an engineered GeoCas9 nuclease comprising the amino acid sequence of SEQ ID NO: 145.
  • the engineered Cas9 nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more) sequence identity to SEQ ID NO: 33.
  • an engineered Cas9 nuclease comprising the amino acid sequence of SEQ ID NO: 33.
  • an engineered GeoCas9 nuclease comprising the amino acid sequence of SEQ ID NO: 146.
  • the engineered Cas9 nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more) sequence identity to SEQ ID NO: 34.
  • an engineered Cas9 nuclease comprising the amino acid sequence of SEQ ID NO: 34.
  • an engineered GeoCas9 nuclease comprising the amino acid sequence of SEQ ID NO: 153.
  • the engineered Cas9 nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more) sequence identity to SEQ ID NO: 41.
  • an engineered Cas9 nuclease comprising the amino acid sequence of SEQ ID NO: 41.
  • an engineered Cas9 nuclease comprising one or more mutations that increase flexibility of a flexible region in a reference Cas9 nuclease that is selected from the group consisting of regions corresponding to amino acid residues 45-49, amino acid residues 84-88, amino acid residues 116-120, amino acid residues 128-132, amino acid residues 216-220, amino acid residues 318-322, amino acid residues 387-391, amino acid residues 497-501, amino acid residues 583-587, amino acid residues 594-598, amino acid residues 614-618, amino acid residues 696-700, and amino acid residues 739-743, wherein the amino acid residue numbering is based on SEQ ID NO: 53, and wherein the engineered Cas9 nuclease has an increased activity compared to the reference Cas9 nuclease.
  • the flexible region in the reference Cas9 nuclease corresponds to amino acid residues 45-49, or amino acid residues 116-120, wherein the amino acid residue numbering is based on SEQ ID NO: 53.
  • the reference Cas9 nuclease is SaCas9.
  • the one or more mutations comprise inserting one or more (e.g., 2) G residues in the flexible region.
  • the one or more G residues are inserted at the N-terminus of a flexible amino acid residue in the flexible region, wherein the flexible amino acid residue is selected from the group consisting of G, S, N, D, H, M, T, E, Q, K, R, A and P.
  • the flexible amino acid residue is chosen according to the preference: G>S>N>D>H>M>T>E>Q>K>R>A>P.
  • the one or more mutations comprise substituting a hydrophobic amino acid residue in a flexible region with a G residue, wherein the hydrophobic amino acid residue is selected from the group consisting of L, I, V, C, Y, F and W.
  • the engineered Cas9 nuclease comprises one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 199-217.
  • the engineered Cas9 nuclease comprises one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 199, 203 and 204. In some embodiments, the engineered Cas9 nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more) sequence identity to SEQ ID NO: 53.
  • an engineered SaCas9 nuclease comprising the amino acid sequence of SEQ ID NO: 199.
  • the engineered Cas9 nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more) sequence identity to SEQ ID NO: 54.
  • an engineered Cas9 nuclease comprising the amino acid sequence of SEQ ID NO: 54.
  • an engineered SaCas9 nuclease comprising the amino acid sequence of SEQ ID NO: 203.
  • the engineered Cas9 nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more) sequence identity to SEQ ID NO: 58.
  • an engineered Cas9 nuclease comprising the amino acid sequence of SEQ ID NO: 58.
  • an engineered SaCas9 nuclease comprising the amino acid sequence of SEQ ID NO: 204.
  • the engineered Cas9 nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more) sequence identity to SEQ ID NO: 59.
  • an engineered Cas9 nuclease comprising the amino acid sequence of SEQ ID NO: 59.
  • the Type II CRISPR-Cas9 system is a dual-RNA-guided (i.e., crRNA and tracrRNA) DNA endonuclease system.
  • the mature crRNA that is base-paired to trans-activating crRNA (tracrRNA) forms a two-RNA structure that directs the CRISPR-associated protein Cas9 to introduce double-stranded (ds) breaks in target DNA.
  • the dual tracrRNA:crRNA, when engineered as a single RNA chimera, also directs sequence-specific Cas9 dsDNA cleavage.
  • the DNA-targeting RNA (also referred to herein as “crRNA”, “guide RNA”, or “gRNA”) comprises: i) a first segment comprising a nucleotide sequence that is complementary to a target sequence in a target DNA; ii) a second segment that interacts with a site-directed polypeptide; and iii) a transcriptional terminator.
  • Cas9 nucleases from various organisms may be used as the reference Cas9 nuclease to provide engineered Cas9 effector proteins of the present application.
  • Exemplary Cas9 proteins have been described, for example, in US8697359, US10266850, and US20170145425, which are incorporated herein by reference in their entirety.
  • the engineered Cas9 effector protein is based on a reference Cas9 protein (e.g., Cas9 nuclease) selected from Cas9 proteins from Streptococcus pneumoniae (Csn1), Streptococcus pyogenes (SpCas9), Streptococcus thermophilus (StCas9), Staphylococcus aureus (SaCas9), Neisseria meningitidis (Nm2Cas9), Campylobacter jejuni (CjCas9), Geobacillus stearothermophilus (GeoCas9), and Treponema denticola (TdCas9), and may include mutated Cas9 derived from these organisms.
  • the reference Cas9 protein may have desirable properties for certain applications, such as targeting thermophiles.
  • GeoCas9 is active at temperatures up to 70 °C, compared to 45 °C for Streptococcus pyogenes Cas9 (SpCas9).
  • the reference Cas9 nuclease comprises an amino acid sequence selected from the group consisting of SEQ ID NOs: 25, 53, 72 and 73.
  • Cas9 orthologues having a certain sequence identity (e.g., at least about any one of 60%, 70%, 80%, 85%, 90%, 95%, 98% or higher) may also be used.
  • Cas9 or functional derivatives thereof may be used as a basis to design the engineered Cas9 effector proteins of the present application.
  • Naturally occurring Cas9 nucleases have various structural domains.
  • a reference Cas9 nuclease comprises a domain architecture with a central HNH endonuclease domain and a split RuvC/RNaseH domain.
  • a reference Cas9 nuclease comprises four key motifs with a conserved architecture. Motifs 1, 2, and 4 are RuvC-like motifs, while motif 3 is an HNH motif. Domain boundaries may be determined using known methods in the art, such as based on crystal structures of a reference Cas9 nuclease (e.g., PDB ID Nos: 5CZZ, 4OGC, 5X2G, 6JOO).
  • engineered Cas9 effector proteins based on any one of the engineered Cas9 nucleases described herein.
  • the engineered Cas9 effector protein is enzymatically active.
  • the engineered Cas9 effector protein is a nuclease that cleaves both strands of a target duplex nucleic acid (e.g., duplex DNA).
  • the engineered Cas9 effector protein is a nickase, i.e., cleaving a single strand of a target duplex nucleic acid (e.g., duplex DNA).
  • an aspartate-to-alanine substitution (D10A) in the RuvC I catalytic domain of Cas9 from S. pyogenes converts Cas9 from a nuclease that cleaves both strands to a nickase (cleaves a single strand).
  • Other examples of mutations that render Cas9 a nickase include, without limitation, H840A, N854A, and N863A.
  • the engineered Cas9 effector protein comprises an enzymatically inactive mutant of the engineered Cas9 nuclease.
  • Mutations at one or more amino acid residues in the active site of a Cas9 nuclease can result in an enzymatically dead Cas9.
  • two or more catalytic domains of Cas9 may be mutated to produce a mutated Cas9 substantially lacking all DNA cleavage activity.
  • a D10A mutation is combined with one or more of H840A, N854A, or N863A mutations to produce a Cas9 enzyme substantially lacking all DNA cleavage activity.
  • a CRISPR enzyme is considered to substantially lack all DNA cleavage activity when the DNA cleavage activity of the mutated enzyme is less than about 25%, 10%, 5%, 1%, 0.1%, 0.01%, or lower with respect to its non-mutated form.
  • Other mutations may be useful; where the Cas9 or other CRISPR enzyme is from a species other than S. pyogenes, mutations in corresponding amino acids may be made to achieve similar effects.
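The activity thresholds above come down to a ratio against the non-mutated enzyme. The sketch below computes that ratio and checks it against a threshold. It assumes cleavage is measured for the mutant and the non-mutated enzyme in the same assay, for example as the fraction of substrate cleaved. It reads the "nuclease inactivation" percentages used for Cas12i as 100% minus the relative activity. The function names are hypothetical.

```python
def relative_activity(mutant_cleavage: float, wild_type_cleavage: float) -> float:
    """Mutant cleavage as a percentage of the non-mutated enzyme's cleavage."""
    if wild_type_cleavage <= 0:
        raise ValueError("non-mutated cleavage must be positive")
    return 100.0 * mutant_cleavage / wild_type_cleavage


def substantially_lacks_cleavage(mutant_cleavage: float, wild_type_cleavage: float,
                                 threshold_pct: float = 25.0) -> bool:
    """True if relative activity is below the chosen threshold (25%, 10%, 5%, 1%, ...)."""
    return relative_activity(mutant_cleavage, wild_type_cleavage) < threshold_pct


def percent_inactivation(mutant_cleavage: float, wild_type_cleavage: float) -> float:
    """Inactivation read as 100% minus relative activity (an interpretive assumption)."""
    return 100.0 - relative_activity(mutant_cleavage, wild_type_cleavage)


# Hypothetical measurements: mutant cleaves 2% of substrate, non-mutated enzyme 80%.
print(relative_activity(2.0, 80.0))                                # 2.5
print(substantially_lacks_cleavage(2.0, 80.0))                     # True (below 25%)
print(substantially_lacks_cleavage(2.0, 80.0, threshold_pct=1.0))  # False
print(percent_inactivation(2.0, 80.0))                             # 97.5
```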
  • there is provided an engineered Cas9 nickase.
  • an engineered Cas9 fusion effector protein comprising an engineered Cas9 nuclease or functional derivative thereof (e.g., an enzymatically inactive mutant of the engineered Cas9 nuclease) fused to a functional domain, such as a translation initiator domain, a transcription repressor domain, a transactivation domain, an epigenetic modification domain, a nucleobase-editing domain (e.g., CBE or ABE domain), a reverse transcriptase domain, a reporter domain (e.g., a fluorescent domain), or a nuclease domain.
  • engineered CRISPR-Cas9 systems comprising any one of the engineered Cas9 effector proteins described herein, and a guide RNA comprising a guide sequence complementary to a target sequence, or one or more nucleic acids encoding the guide RNA, wherein the engineered Cas9 effector protein and the guide RNA are capable of forming a CRISPR complex that specifically binds to a target nucleic acid comprising the target sequence and induces a modification of the target nucleic acid.
  • the guide RNA comprises a crRNA and a tracrRNA.
  • the guide RNA is a sgRNA.
  • the guide RNA is a precursor that can be processed into a crRNA and a tracrRNA. In some embodiments, the guide RNA is a precursor RNA array encoding a plurality of crRNAs, and wherein each processed crRNA is associated with a tracrRNA.
  • the engineered Cas9 effector protein and/or the guide RNA are encoded by one or more vectors such as AAV vectors.
  • the engineered CRISPR-Cas9 system is a ribonucleoprotein (RNP) complex comprising the engineered Cas9 effector protein bound to the guide RNA.
  • the present application provides engineered Cas effector proteins (e.g., Cas nucleases, Cas nickases, Cas fusion effector proteins, split Cas effector proteins) that have improved activity (e.g., target binding, double-strand cleavage activity, nickase activity, and/or gene-editing activity).
  • Suitable reference Cas proteins for engineering may include, for example, Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9, Csn1, Csx12, Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, Cpf1 (also known as Cas12a), CasX (Cas12e), Cas12c, Cas12d, Cas12g, Cas12k, Cas
  • the reference Cas nuclease is a Cas12a, Cas12b (such as any one of the reference proteins described in the “Engineered Cas12b effector proteins” subsection), Cas9 (such as any one of the reference proteins described in the “Engineered Cas9 effector proteins” subsection), Cas12i (such as any one of the reference proteins described in the “Engineered Cas12i effector proteins” subsection), Cas12f, Cas12j, or CasX (Cas12e).
  • the reference protein is a Cas12a protein of a Type V-A CRISPR-Cas system (previously known as Cpf1).
  • Type V-A systems do not require tracrRNA, allowing for simplified guide RNA design.
  • Cas12a (Cpf1) nucleases from various organisms may be used as the reference Cas12a nuclease to provide engineered Cas12a effector proteins of the present application.
  • Exemplary Cas12a nucleases have been described, for example, in US10648020; US10669540; US9790490; US20180282713; and WO2018188571, which are incorporated herein by reference in their entirety.
  • the engineered Cas12a effector protein is based on a reference Cas12a protein (e.g., Cas12a nuclease) selected from Cas12a proteins from Prevotella and Francisella, such as Francisella novicida (FnCas12a or FnCpf1), or from Acidaminococcus (AsCas12a or AsCpf1) or Lachnospiraceae bacterium (LbCas12a or LbCpf1).
  • the reference Cas12a nuclease comprises an amino acid sequence selected from the group consisting of SEQ ID NOs: 74-75.
  • Cas12a orthologues having a certain sequence identity (e.g., at least about any one of 60%, 70%, 80%, 85%, 90%, 95%, 98% or higher) to a reference Cas12a nuclease, or functional derivatives thereof, may be used as a basis to design the engineered Cas12a effector proteins of the present application.
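The identity thresholds above (60% to 98% or higher) compare an orthologue with a reference sequence. The Python sketch below shows one common way to compute percent identity, under stated assumptions: the two sequences are already aligned (the aligner is not shown), gapped columns are skipped, and the example sequences are hypothetical toy fragments rather than SEQ ID NOs: 74-75. Other conventions, such as dividing by the full reference length, give different numbers.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned protein sequences.

    Only columns where neither sequence has a gap are compared (one convention
    among several; this is an assumption, not a definition from the text).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        identical += a == b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float = 60.0) -> bool:
    """True if identity reaches the threshold (e.g., 60, 70, 80, 85, 90, 95 or 98)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical toy fragments, not real Cas12a sequences.
print(percent_identity("MKVLA", "MKILA"))  # 80.0
print(percent_identity("MK-LA", "MKILA"))  # 100.0 (gapped column skipped)
```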
  • a crystal structure of the Acidaminococcus Cas12a-crRNA-target DNA complex has been described in Yamano T, Nishimasu H, Zetsche B, et al. Crystal structure of Cpf1 in complex with guide RNA and target DNA. Cell. 2016; 165: 949-962, and is available under PDB code 5B43. Crystal structures are also available for LbCas12a (LbCpf1) (e.g., PDB codes 5XUU, 5XH6, and 5XUT).
  • the engineered Cas12a effector protein comprises one or more mutations that reduce or eliminate a nuclease activity.
  • suitable mutations in the FnCpf1 RuvC domain include but are not limited to D917A, E1006A, E1028A, D1227A, D1255A and N1257A.
  • the point mutations to substantially reduce nuclease activity include mutations in a putative second nuclease domain such as N580A, N584A, T587A, W609A, D610A, K613A, E614A, D616A, K624A, D625A, K627A and Y629A.
  • the mutations in the AsCpf1 RuvC domain are D908A, E993A, and D1263A, wherein the D908A, E993A, and D1263A mutations completely inactivate the DNA cleavage activity of the AsCpf1 effector protein.
  • the mutations in the LbCpf1 RuvC domain include but are not limited to mutations at positions 832, 947 or 1180. In some embodiments, the mutation in the LbCpf1 RuvC domain is D832A, E925A, D947A or D1180A, wherein the D832A, E925A, D947A or D1180A mutation completely inactivates the DNA cleavage activity of the Cas12a protein. Mutations at positions in other engineered Cas12a proteins corresponding to those described for FnCpf1 and LbCpf1 are contemplated herein.
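Mutations such as D917A or D1263A use the usual one-letter notation: wild-type residue, 1-based position, substituted residue. A minimal Python sketch, assuming that notation and plain one-letter protein strings, parses such codes and applies them while checking the expected wild-type residue. The helper names and the toy sequence are hypothetical; no real FnCpf1, AsCpf1 or LbCpf1 sequence is reproduced.

```python
import re

# Notation such as "D917A": wild-type residue, 1-based position, new residue.
MUTATION_RE = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


def parse_mutation(code: str) -> tuple[str, int, str]:
    match = MUTATION_RE.match(code.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized mutation notation: {code!r}")
    wild_type, position, substitute = match.groups()
    return wild_type, int(position), substitute


def apply_mutations(sequence: str, codes: list[str]) -> str:
    """Return a copy of `sequence` carrying the listed point substitutions.

    The wild-type residue is checked first, so a numbering mismatch (e.g., an
    orthologue with a different offset) raises instead of silently mutating
    the wrong residue.
    """
    residues = list(sequence)
    for code in codes:
        wild_type, position, substitute = parse_mutation(code)
        if not 1 <= position <= len(residues):
            raise IndexError(f"{code}: position outside sequence of length {len(residues)}")
        if residues[position - 1] != wild_type:
            raise ValueError(f"{code}: expected {wild_type}, found {residues[position - 1]}")
        residues[position - 1] = substitute
    return "".join(residues)


# Hypothetical toy sequence standing in for a RuvC-containing protein.
print(apply_mutations("MDKEY", ["D2A", "E4A"]))  # MAKAY
```

Checking the wild-type residue first is a deliberate design choice: when mutations are transferred to "corresponding" positions in another orthologue, a numbering error fails loudly rather than producing a silently wrong variant.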
  • the reference protein is a CasX (also known as Cas12e) protein.
  • CasX nucleases from various organisms may be used as the reference CasX nuclease to provide engineered CasX effector proteins of the present application.
  • the engineered CasX protein of the subject methods and/or compositions is (or is derived from) a naturally occurring (wild type) protein.
  • Exemplary CasX nucleases have been described, for example, in US10570415, WO2018/202800, and WO2019/084148, which are incorporated herein by reference in their entirety.
  • the engineered CasX effector protein is based on a reference CasX protein (e.g., CasX nuclease) selected from CasX proteins from Deltaproteobacteria (DpbCasX or CasX1) and Planctomycetes (PlmCasX or CasX2).
  • the reference CasX nuclease comprises an amino acid sequence selected from the group consisting of SEQ ID NOs: 76-77.
  • CasX orthologues having a certain sequence identity (e.g., at least about any one of 60%, 70%, 80%, 85%, 90%, 95%, 98% or higher) to a reference CasX nuclease, or functional derivatives thereof, may be used as a basis to design the engineered CasX effector proteins of the present application.
  • CasX proteins are short compared to previously identified CRISPR-Cas endonucleases, and thus use of this protein as an alternative provides the advantage that the nucleotide sequence encoding the protein is relatively short. This is useful, for example, in cases where a nucleic acid encoding the engineered CasX protein is desirable, e.g., in situations that employ a viral vector (e.g., an AAV vector) for delivery to a cell such as a eukaryotic cell (e.g., mammalian cell, human cell, mouse cell, in vitro, ex vivo, in vivo) for research and/or clinical applications.
  • CasX is expected to be able to function well at low temperatures (e.g., 10-14°C, 10-17°C, 10-20°C).
  • CasX and Cas12a share a similar cleavage mechanism: after cleavage of the non-target DNA strand by the RuvC domain, the guide RNA-DNA duplex is bent. This conformational change allows the target DNA strand to be cleaved by the same RuvC domain.
  • both Cas12e and Cas12a rely on a single nuclease domain for double-stranded DNA cleavage, in contrast to Cas9, which uses distinct domains, HNH and RuvC, to cleave each DNA strand.
  • in Cas12e and Cas12a, a large structural change alters the accessibility of the DNA strands to the RuvC nuclease and in this way compensates for the lack of a second nuclease domain.
  • Cas12a and CasX (Cas12e) generate products with staggered ends.
  • Cas9 proteins mainly produce blunt ends.
  • Jun-Jie Liu et al. found that DpbCasX produces staggered ends with overhangs about 10 nucleotides long, which are longer than the 3-5 nt overhangs usually produced by Cas12a proteins.
  • the 5′ overhangs produced by Cas12a and CasX could potentially be used for in vivo or in vitro insertion of DNA fragments into the genome through direct DNA ligation.
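The difference between blunt ends (Cas9) and staggered 5′ overhangs (about 10 nt for DpbCasX, 3-5 nt for Cas12a) follows from where each strand is cut. The Python sketch below is a simplified geometric model, not a prediction of actual enzyme cut sites: it takes the two strand-specific cut positions as inputs (in top-strand coordinates), reports the resulting overhang type and sequences, and checks whether two overhangs could anneal before ligation. All sequences and cut positions are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def staggered_cut(top: str, top_cut: int, bottom_cut: int) -> dict:
    """Describe the two ends left by cutting a DNA duplex.

    `top` is the top strand written 5'->3'. A cut position k means the strand is
    broken between top-strand bases k and k+1 (1-based), so both cut positions
    are given in top-strand coordinates. Overhangs are reported 5'->3'.
    """
    if not (0 < top_cut < len(top) and 0 < bottom_cut < len(top)):
        raise ValueError("cut positions must fall inside the sequence")
    if top_cut == bottom_cut:
        return {"type": "blunt", "length": 0, "left_end": "", "right_end": ""}
    if bottom_cut > top_cut:
        # Bottom strand runs past the top-strand cut: both ends carry 5' overhangs.
        right = top[top_cut:bottom_cut].upper()  # 5' end of the right fragment's top strand
        return {"type": "5'", "length": bottom_cut - top_cut,
                "left_end": reverse_complement(right), "right_end": right}
    # Top strand runs past the bottom-strand cut: both ends carry 3' overhangs.
    left = top[bottom_cut:top_cut].upper()  # 3' end of the left fragment's top strand
    return {"type": "3'", "length": top_cut - bottom_cut,
            "left_end": left, "right_end": reverse_complement(left)}


def ends_can_anneal(overhang_a: str, overhang_b: str) -> bool:
    """Two single-stranded overhangs (each read 5'->3') anneal if they are reverse complements."""
    return len(overhang_a) == len(overhang_b) and overhang_a.upper() == reverse_complement(overhang_b)


# Hypothetical 20-bp site cut with a 10-nt stagger, similar in size to the DpbCasX overhang.
site = "GATCCAGTTCAGGCATTACG"
ends = staggered_cut(site, 5, 15)
print(ends["type"], ends["length"], ends["right_end"])  # 5' 10 AGTTCAGGCA
print(ends_can_anneal(ends["left_end"], ends["right_end"]))  # True
```

In this model, a donor fragment whose end carries the same overhang as `right_end` could anneal to the left genomic end, which is the basis of the ligation idea above; a blunt cut offers no such sequence-specific pairing.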
  • CasX enzymes require tracrRNA in addition to crRNA for DNA target recognition.
  • the reference Cas nuclease is a Cas12f.
  • Exemplary Cas12f nucleases have been disclosed, for example, in WO2020088450.
  • the reference Cas nuclease is a Cas12i.
  • Exemplary Cas12i nucleases have been disclosed, for example, in WO2020098772.
  • the reference Cas nuclease is a Cas12j (e.g., SEQ ID NO: 78).
  • the present application provides engineered Cas effector proteins which comprise functional variants of the engineered Cas nucleases described herein.
  • the functional variant has an amino acid sequence that is different by at least one amino acid residue (e.g., has a deletion, insertion, substitution, and/or fusion) when compared to the amino acid sequence of the corresponding engineered Cas nuclease.
  • the functional variant has one or more mutations, such as amino acid substitutions, insertions and/or deletions.
  • the functional variant may comprise any one of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more amino acid substitutions compared to an engineered Cas nuclease.
  • the one or more substitutions are conservative substitutions.
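Counting substitutions against an engineered Cas nuclease and asking whether each one is conservative can be sketched as follows. This Python sketch assumes equal-length sequences with substitutions only (no insertions or deletions). The amino-acid grouping used to define "conservative" is an illustrative assumption; published schemes differ, and the text does not specify one.

```python
# Illustrative side-chain groupings (an assumption; other published schemes differ).
CONSERVATIVE_GROUPS = [set("AVLIM"), set("FWY"), set("KRH"), set("DE"), set("NQST"),
                       {"G"}, {"P"}, {"C"}]


def is_conservative(wild_type: str, substitute: str) -> bool:
    """True if both residues fall in the same group."""
    return any(wild_type in group and substitute in group for group in CONSERVATIVE_GROUPS)


def list_substitutions(reference: str, variant: str) -> list[tuple[int, str, str, bool]]:
    """Return (1-based position, reference residue, variant residue, conservative?) per substitution."""
    if len(reference) != len(variant):
        raise ValueError("this sketch handles substitutions only, not insertions or deletions")
    return [(position, a, b, is_conservative(a, b))
            for position, (a, b) in enumerate(zip(reference.upper(), variant.upper()), start=1)
            if a != b]


# Hypothetical toy sequences, not an actual engineered Cas nuclease.
subs = list_substitutions("MKVLDE", "MRVIDG")
print(len(subs), all(s[3] for s in subs))  # 3 False
```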
  • the functional variant has all domains of an engineered Cas nuclease. In some embodiments, the functional variant does not have one or more domains of an engineered Cas nuclease.
  • the Cas variant can include a Cas protein sequence with the same parameters described above (e.g., domains that are present, percent identity, and the like).
  • the functional variant has a different catalytic activity compared to the non-mutated form of the engineered Cas nuclease.
  • the variant comprises mutations in multiple catalytic domains.
  • a Cas effector protein that cleaves one strand but not the other of a double-stranded target nucleic acid is referred to herein as a “nickase” (e.g., a “nickase Cas”).
  • a Cas protein that has substantially no nuclease activity is referred to herein as a dead Cas protein (“dCas”) (with the caveat that, in the case of a fusion Cas effector protein, nuclease activity can be provided by a heterologous polypeptide, i.e., a fusion partner, as described in more detail below).
  • a Cas effector protein is considered to substantially lack all DNA cleavage activity when the DNA cleavage activity of the mutated enzyme is less than about 25%, 10%, 5%, 1%, 0.1%, 0.01%, or lower with respect to its non-mutated form.
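The nuclease / nickase / dCas terminology can be expressed as a small classification rule. In this Python sketch, each strand's cleavage activity is given as a fraction of the non-mutated enzyme, and a strand is treated as uncut when it falls below a threshold. Applying the cut-off per strand and defaulting to 0.25 (the most permissive value listed above) are assumptions made for illustration; the activity values shown are hypothetical, not assay results.

```python
def classify_cleavage(strand1_activity: float, strand2_activity: float,
                      threshold: float = 0.25) -> str:
    """Label a variant from per-strand cleavage activity relative to its non-mutated form.

    Activities are fractions of the non-mutated enzyme (1.0 = unchanged). A strand counts
    as cut when its activity is at least `threshold`; 0.25 mirrors the most permissive
    cut-off listed above, and stricter values (0.10, 0.05, 0.01, ...) can be passed instead.
    """
    cut = [activity >= threshold for activity in (strand1_activity, strand2_activity)]
    if all(cut):
        return "nuclease"
    if any(cut):
        return "nickase"
    return "dCas"


# Hypothetical activity values for illustration only.
print(classify_cleavage(0.95, 0.90))   # nuclease
print(classify_cleavage(0.92, 0.03))   # nickase
print(classify_cleavage(0.02, 0.001))  # dCas
```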
  • Exemplary mutations in Cas functional variants are described in the Cas12b, Cas12i, and Cas9 subsections above, and in WO2016205764, WO2020/087631, WO2019/201331A1, US2020/0063126A1, US8697359, US10266850, US20170145425, US10648020, US10669540, US9790490, US20180282713, WO2018188571, US10570415, WO2018/202800, and WO2019/084148, which are herein incorporated by reference in their entirety.
  • the present application also provides split Cas effector proteins based on any one of the engineered Cas effector proteins described herein.
  • the split Cas effector proteins may be advantageous for delivery.
  • the engineered Cas effector proteins are split into two parts, which can be reconstituted to provide a substantially functional Cas effector protein.
  • Split versions of Cas effector proteins (e.g., Cas12 and Cas9 proteins) have been described, for example, in WO2016/112242; WO2016/205749; and PCT/CN2020/111057, which are herein incorporated by reference in their entirety.
  • a split Cas effector protein comprising a first polypeptide comprising an N-terminal portion of any one of the engineered Cas nucleases described herein or functional derivative thereof, and a second polypeptide comprising a C-terminal portion of the engineered Cas nuclease or functional derivative thereof, wherein the first polypeptide and the second polypeptide are capable of associating with each other in the presence of a guide RNA comprising a guide sequence to form a CRISPR complex that specifically binds to a target nucleic acid comprising a target sequence complementary to the guide sequence.
  • the first polypeptide and the second polypeptide each comprises a dimerization domain.
  • the first dimerization domain and the second dimerization domain associate with each other in the presence of an inducer (e.g., rapamycin).
  • the first polypeptide and the second polypeptide do not comprise dimerization domains.
  • the split Cas effector protein is auto-inducing.
  • the split can be made such that the catalytic domain(s) are unaffected.
  • the Cas effector proteins may function as a nuclease (including a nickase) or may be inactivated enzymes, which are essentially RNA-guided DNA-binding proteins with very little or no catalytic activity (e.g., due to mutation(s) in their catalytic domains).
  • the nuclease lobe and α-helical lobe of a Cas protein are expressed as separate polypeptides. Although the lobes do not interact on their own, the RNA guide recruits them into a complex that recapitulates the activity of full-length Cas enzymes and catalyzes site-specific DNA cleavage.
  • a modified RNA guide may be used to abrogate split-enzyme activity by preventing dimerization, allowing for the development of an inducible dimerization system.
  • the split enzyme is described, e.g., in Wright, Addison V., et al. “Rational design of a split-Cas9 enzyme complex,” Proc. Nat'l Acad. Sci., 112.10 (2015): 2984-2989, which is incorporated herein by reference in its entirety.
  • the split Cas effector protein portions described herein can be designed by dividing (i.e., splitting) a reference engineered Cas effector protein (e.g., a full-length engineered Cas12b, Cas12i, Cas9, Cas12a, or CasX effector protein or a functional variant thereof) into two halves at a split position, which is the point at which the N-terminal portion of the reference Cas effector protein is separated from the C-terminal portion.
  • the N-terminal portion comprises amino acid residues 1 to X.
  • the C-terminal portion comprises amino acid residues X+1 to the C-terminal end of the reference Cas effector protein.
  • the numbering is contiguous, but this may not always be necessary, as amino acids (or the nucleotides encoding them) could be trimmed from the end of either one of the split ends, and/or mutations (e.g., insertions, deletions and substitutions) at internal regions of the polypeptide chain(s) are also contemplated, provided that sufficient DNA binding activity and, if required, DNA nickase or cleavage activity of the reconstituted Cas effector protein is retained, for example at least 40%, 50%, 60%, 70%, 80%, 90% or 95% activity compared to the reference Cas effector protein.
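Splitting at position X, with residues 1 to X in the N-terminal portion and X+1 to the C-terminal end in the C-terminal portion, is easy to get off by one when converting to zero-based string indices. A minimal Python sketch, assuming a plain one-letter sequence and a hypothetical toy protein, with optional trimming at the new split ends as contemplated above; the function and parameter names are illustrative.

```python
def split_protein(sequence: str, x: int, trim_n_end: int = 0,
                  trim_c_start: int = 0) -> tuple[str, str]:
    """Split a one-letter protein sequence after residue X (1-based).

    The N-terminal portion holds residues 1..X and the C-terminal portion residues
    X+1..end; in zero-based slicing that is sequence[:x] and sequence[x:]. Optional
    trimming removes residues from the new ends created by the split.
    """
    if not 1 <= x < len(sequence):
        raise ValueError("X must leave at least one residue in each portion")
    n_term, c_term = sequence[:x], sequence[x:]
    n_term = n_term[:len(n_term) - trim_n_end]  # trim the C-terminal end of the N-terminal portion
    c_term = c_term[trim_c_start:]              # trim the N-terminal end of the C-terminal portion
    if not n_term or not c_term:
        raise ValueError("trimming removed an entire portion")
    return n_term, c_term


toy = "MSKLEPRTAVGQDN"  # hypothetical 14-residue toy protein
n_part, c_part = split_protein(toy, 6)
print(n_part, c_part)  # MSKLEP RTAVGQDN
```

Without trimming, joining the two portions restores the original sequence, which corresponds to the contiguous numbering case described above.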
  • the split point may be designed in silico and cloned into the constructs. During this process, mutations can be introduced to the split Cas effector protein and non-functional domains can be removed.
  • the split Cas effector proteins may each comprise one or more dimerization domains.
  • the first polypeptide comprises a first dimerization domain fused to the first split Cas effector portion.
  • the second polypeptide comprises a second dimerization domain fused to the second split Cas effector portion.
  • the dimerization domain may be fused to the split Cas effector portion via a peptide linker (e.g., a flexible peptide linker such as a GS linker) or a chemical bond.
  • the dimerization domain is fused to the N-terminus of the split Cas effector portion.
  • the dimerization domain is fused to the C-terminus of the split Cas effector portion.
  • the split Cas effector proteins do not comprise any dimerization domains.
  • the dimerization domains promote association of the two split Cas effector portions.
  • the split Cas effector portions are induced to associate or dimerize into a functional Cas effector protein by an inducer.
  • the split Cas effector proteins comprise inducible dimerization domains.
  • the dimerization domains are not inducible dimerization domains, i.e., the dimerization domains dimerize without the presence of an inducer.
  • An inducer may be an inducing energy source or an inducing molecule other than a guide RNA (e.g., a sgRNA).
  • the inducer acts to reconstitute two split Cas effector portions into a functional Cas effector protein via induced dimerization of the dimerization domains.
  • the inducer brings the two split Cas effector portions together through the action of induced association of the inducible dimerization domains.
  • the two split Cas effector portions do not associate with each other to reconstitute into a functional Cas effector protein.
  • the two split Cas effector portions may associate with each other to reconstitute into a functional Cas effector protein in the presence of a guide RNA (e.g., a sgRNA).
  • the inducer of the present application may be heat, ultrasound, electromagnetic energy or a chemical compound.
  • the inducer is an antibiotic, a small molecule, a hormone, a hormone derivative, a steroid or a steroid derivative.
  • the inducer is abscisic acid (ABA), doxycycline (DOX), cumate, rapamycin, 4-hydroxytamoxifen (4OHT), estrogen or ecdysone.
  • the split Cas effector system is an inducer-controlled system selected from the group consisting of antibiotic based inducible systems, electromagnetic energy based inducible systems, small molecule based inducible systems, nuclear receptor based inducible systems and hormone based inducible systems.
  • the split Cas effector system is an inducer-controlled system selected from the group consisting of tetracycline (Tet)/DOX inducible systems, light inducible systems, ABA inducible systems, cumate repressor/operator systems, 4OHT/estrogen inducible systems, ecdysone-based inducible systems and FKBP12/FRAP (FKBP12-rapamycin complex) inducible systems.
  • the pair of split Cas effector proteins are separate and inactive until induced dimerization of the dimerization domains (e.g., FRB and FKBP), which results in reassembly of a functional Cas effector nuclease.
  • the first split Cas effector protein comprises a first half of an inducible dimer (e.g., FRB).
  • the second split Cas effector protein comprises a second half of an inducible dimer (e.g., FKBP).
  • FKBP-based and other inducible dimerization systems that may be used in inducer-controlled split Cas effector systems described herein include, but are not limited to, FKBP, which dimerizes with CalcineurinA (CNA) in the presence of FK506; FKBP, which dimerizes with CyP-Fas in the presence of FKCsA; FKBP, which dimerizes with FRB in the presence of rapamycin; GyrB, which dimerizes with GyrB in the presence of coumermycin; GAI, which dimerizes with GID1 in the presence of gibberellin; or Snap-tag, which dimerizes with HaloTag in the presence of HaXS.
  • FKBP which homodimerizes (i.e., one FKBP dimerizes with another FKBP) in the presence of FK1012.
  • the dimerization domain is FKBP and the inducer is FK1012. In some embodiments, the dimerization domain is GyrB and the inducer is coumermycin. In some embodiments, the dimerization domain is GAI and the inducer is gibberellin.
  • the split Cas effector portions may be auto-induced (i.e., auto-activated or self-induced) to associate/dimerize into a functional Cas effector protein without the presence of an inducer.
  • auto-induction of the split Cas effector portions may be mediated by binding to a guide RNA, such as sgRNA.
  • the first polypeptide and the second polypeptide do not comprise dimerization domains.
  • the first polypeptide and the second polypeptide comprise dimerization domains.
  • the reconstituted Cas effector protein of the split Cas effector systems described herein has an editing efficiency of at least 70% (such as at least about any of 70%, 75%, 80%, 85%, 90%, 95%, 98%, 99% or more, or 100%) of the editing efficiency of the reference Cas effector protein.
  • the reconstituted Cas effector protein of an inducer-controlled split Cas effector system described herein has, in the absence of an inducer (i.e., due to auto-induction), an editing efficiency of no more than 50% (such as no more than about any of 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, 5%, or less, or 0%) of the editing efficiency of the reference Cas effector protein.
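Both criteria above are ratios against the reference Cas effector protein. A minimal Python sketch, assuming the efficiencies are measured at the same target and in the same units (e.g., percent edited reads), with the 70% and 50% cut-offs as default parameters. The example numbers are made up for illustration and are not data from the application; for an auto-inducing split system only the first criterion would apply.

```python
def relative_efficiency(split_efficiency: float, reference_efficiency: float) -> float:
    """Editing efficiency of a split system as a percentage of the reference protein's."""
    if reference_efficiency <= 0:
        raise ValueError("reference editing efficiency must be positive")
    return 100.0 * split_efficiency / reference_efficiency


def meets_split_criteria(induced: float, uninduced: float, reference: float,
                         min_induced: float = 70.0, max_uninduced: float = 50.0) -> bool:
    """True if induced editing is >= min_induced % and leaky (uninduced) editing <= max_uninduced %."""
    return (relative_efficiency(induced, reference) >= min_induced
            and relative_efficiency(uninduced, reference) <= max_uninduced)


# Made-up percentages of edited reads at one target, for illustration only.
print(relative_efficiency(32.0, 40.0))        # 80.0
print(meets_split_criteria(32.0, 4.0, 40.0))  # True
```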
  • the present application also provides engineered Cas effector proteins comprising additional protein domains and/or components, such as linkers, nuclear localization/exportation sequences, functional domains, and/or reporter proteins.
  • the engineered Cas effector protein is a protein complex comprising one or more heterologous protein domains (e.g., about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more domains) in addition to the nucleic acid-targeting domains of the engineered Cas nuclease or functional derivative thereof.
  • the engineered Cas effector protein is a fusion protein comprising one or more heterologous protein domains (e.g. about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more domains) fused to the engineered Cas nuclease.
  • the engineered Cas effector proteins of the present application can comprise (e.g., via a fusion protein, such as via one or more peptide linkers, for example, GS peptide linkers, etc.) or be associated (e.g., via co-expression of multiple proteins) with one or more functional domains.
  • the one or more functional domains are enzymatic domains.
  • These functional domains can have various activities, e.g., DNA and/or RNA methylase activity, demethylase activity, transcription activation activity, transcription repression activity, transcription release factor activity, histone modification activity, RNA cleavage activity, DNA cleavage activity, nucleic acid binding activity, and switch activity (e.g., light inducible).
  • the one or more functional domains are transcriptional activation domains (i.e., transactivation domains) or repressor domains.
  • the one or more functional domains are histone-modifying domains.
  • the one or more functional domains are transposase domains, HR (Homologous Recombination) machinery domains, recombinase domains, and/or integrase domains.
  • the functional domains include Krüppel associated box (KRAB), VP64, VP16, Fok1, P65, HSF1, MyoD1, biotin-APEX, APOBEC1, AID, PmCDA1, Tad1, and M-MLV reverse transcriptase.
  • the functional domain is selected from the group consisting of a translation initiator domain, a transcription repressor domain, a transactivation domain, an epigenetic modification domain, a nucleobase-editing domain (e.g., a CBE or ABE domain), a reverse transcriptase domain, a reporter domain (e.g., a fluorescent domain) and a nuclease domain.
  • the positioning of the one or more functional domains in the engineered Cas effector proteins allows for correct spatial orientation for the functional domains to affect the target with the attributed functional effects.
  • the functional domain is a transcription activator (e.g., VP16, VP64, or p65).
  • the transcription activator is placed in a spatial orientation that allows it to affect the transcription of the target.
  • a transcription repressor is positioned to affect the transcription of the target, and a nuclease (e.g., Fok1) is positioned to cleave the target.
  • the functional domain is positioned at the N-terminus of the engineered Cas effector protein.
  • the functional domain is positioned at the C-terminus of the engineered Cas effector protein.
  • the engineered Cas effector protein comprises a first functional domain at the N-terminus and a second functional domain at the C-terminus.
  • the engineered Cas effector protein comprises a catalytically inactive mutant of any one of the engineered Cas nucleases described herein fused to one or more functional domains.
  • the engineered Cas effector protein is a transcriptional activator.
  • the engineered Cas effector protein comprises an enzymatically inactive variant of any one of the engineered Cas nucleases described herein fused to a transactivation domain.
  • the transactivation domain is selected from the group consisting of VP64, p65, HSF1, VP16, MyoD1, RTA, SET7/9, and combinations thereof.
  • the transactivation domain comprises VP64, p65 and HSF1.
  • the engineered Cas effector protein comprises two split Cas effector polypeptides, each fused to a transactivation domain.
  • the engineered Cas effector protein is a transcriptional repressor.
  • the engineered Cas effector protein comprises an enzymatically inactive variant of any one of the engineered Cas nucleases described herein fused to a transcription repressor domain.
  • the transcription repressor domain is selected from the group consisting of Krüppel associated box (KRAB), EnR, NuE, NcoR, SID, SID4X, and combinations thereof.
  • the engineered Cas effector protein comprises two split Cas effector polypeptides, each fused to a transcription repressor domain.
  • the engineered Cas effector protein is a base editor, such as a cytosine editor or an adenosine editor.
  • the engineered Cas effector protein comprises an enzymatically inactive variant of any one of the engineered Cas nucleases described herein fused to a nucleobase-editing domain, such as a cytosine base editing (CBE) domain or an adenosine base editing (ABE) domain.
  • the nucleobase-editing domain is a DNA-editing domain.
  • the nucleobase-editing domain has deaminase activity.
  • the nucleobase-editing domain is a cytosine deaminase domain. In some embodiments, the nucleobase-editing domain is an adenosine deaminase domain.
  • Exemplary base editors based on Cas nucleases have been described, for example, in WO2018/165629A1 and WO2019/226953A1, which are incorporated herein by reference in their entirety.
  • Exemplary CBE domains include, but are not limited to, activation-induced cytidine deaminase or AID (e.g., hAID), apolipoprotein B mRNA-editing enzyme, catalytic polypeptide-like or APOBEC (e.g., rat APOBEC1, hAPOBEC3A/B/C/D/E/F/G) and PmCDA1.
  • Exemplary ABE domains include, but are not limited to, TadA, ABE8 and variants thereof (see, e.g., Gaudelli et al., 2017, Nature 551: 464-471; and Richter et al., 2020, Nature Biotechnology 38: 883-891).
  • the functional domain is an APOBEC1 domain, e.g., a rat APOBEC1 domain comprising the amino acid sequence of SEQ ID NO: 218.
  • the functional domain is a TadA domain, e.g., an E. coli TadA domain comprising the amino acid sequence of SEQ ID NO: 219.
  • the engineered Cas effector protein further comprises one or more nuclear localization sequences.
  • the engineered Cas effector protein is a prime editor. Prime editors based on Cas9 have been described, for example, in A. Anzalone et al., Nature, 2019, 576(7785): 149-157, which is incorporated herein by reference in its entirety.
  • the engineered Cas effector protein comprises a nickase variant of any one of the engineered Cas nucleases described herein fused to a reverse transcriptase domain.
  • the functional domain is a reverse transcriptase domain.
  • the reverse transcriptase domain is an M-MLV reverse transcriptase, or a variant thereof, e.g., an M-MLV reverse transcriptase having one or more of the mutations D200N, T306K, W313F, T330P and L603W.
  • the reverse transcriptase domain comprises the amino acid sequence of SEQ ID NO: 220 or 221.
  • an engineered CRISPR/Cas system comprising the prime editor.
  • the engineered CRISPR/Cas system further comprises a second Cas nickase, e.g., based on the same engineered Cas nuclease as the prime editor.
  • the engineered CRISPR/Cas system comprises a prime editor guide RNA (pegRNA), which comprises a primer binding site and a reverse transcriptase (RT) template sequence.
  • the present application provides a split Cas effector system having one or more (e.g., 1, 2, 3, 4, 5, 6, or more) functional domains associated with (i.e., bound to or fused to) one or both split Cas effector portions.
  • the functional domain(s) may be provided as part of the first and/or second split Cas effector proteins, as fusions within that construct.
  • the functional domains are typically fused to other parts in the split Cas effector proteins (e.g., split Cas effector portions) via a peptide linker, such as GS linker.
  • the functional domains can be used to repurpose the function of the split Cas effector system based on a catalytically dead Cas effector.
  • the engineered Cas effector proteins comprise one or more nuclear localization sequences (NLSs) and/or one or more nuclear export sequences (NESs).
  • NLS sequences include, for example, PKKKRKVPG (SEQ ID NO: 79) and ASPKKKRKV (SEQ ID NO: 80).
  • the NLS(s) and/or NES(s) may be operably linked to the N-terminus and/or the C-terminus of the engineered Cas effector proteins or of polypeptide chains in the engineered Cas effector proteins.
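The order of parts in a fusion construct (NLS, optional functional domain, Cas core, NLS) can be made explicit with a small helper that concatenates peptide sequences. This Python sketch uses the two NLS sequences given above (SEQ ID NOs: 79 and 80); the GGGGS linker, the helper name and the Cas placeholder are hypothetical, and whether NLSs are joined through a linker or fused directly is a design choice the text leaves open.

```python
from collections.abc import Sequence

NLS_SEQ_ID_79 = "PKKKRKVPG"  # NLS quoted in the text (SEQ ID NO: 79)
NLS_SEQ_ID_80 = "ASPKKKRKV"  # NLS quoted in the text (SEQ ID NO: 80)
GS_LINKER = "GGGGS"          # hypothetical linker choice


def build_fusion(core: str, n_terminal: Sequence[str] = (), c_terminal: Sequence[str] = (),
                 linker: str = GS_LINKER) -> str:
    """Concatenate N-terminal parts, the Cas core and C-terminal parts.

    `linker` is placed between neighbouring parts; pass linker="" for direct fusion.
    """
    parts = [*n_terminal, core, *c_terminal]
    return linker.join(part for part in parts if part)


cas_core = "M" + "X" * 20  # placeholder for an engineered Cas sequence (hypothetical)
construct = build_fusion(cas_core, n_terminal=[NLS_SEQ_ID_80], c_terminal=[NLS_SEQ_ID_79])
print(construct)
```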
  • the engineered Cas effector proteins may comprise additional components, such as reporter proteins.
  • the engineered Cas effector protein comprises a fluorescent protein, e.g., GFP.
  • the engineered Cas effector protein is an inducible split Cas effector system that can be used to image genomic loci.
  • engineered CRISPR-Cas systems comprising: (a) any one of the engineered Cas effector proteins (e.g., engineered Cas nuclease, nickase, split Cas, transcriptional repressor, transcriptional activator, base editor, or prime editor) described herein; and (b) a guide RNA comprising a guide sequence complementary to a target sequence, or one or more nucleic acids encoding the guide RNA, wherein the engineered Cas effector protein and the guide RNA are capable of forming a CRISPR complex that specifically binds to a target nucleic acid comprising the target sequence and induces a modification of the target nucleic acid.
  • the engineered CRISPR-Cas system comprises one or more nucleic acids encoding the engineered Cas effector protein and/or the guide RNA.
  • the engineered CRISPR-Cas system comprises a precursor guide RNA array that can be processed, e.g., by the engineered Cas effector protein, into a plurality of crRNAs.
  • the engineered CRISPR-Cas system comprises one or more vectors encoding the engineered Cas effector protein and/or the guide RNA.
  • the engineered CRISPR-Cas system comprises a ribonucleoprotein (RNP) complex comprising the engineered Cas effector protein bound to the guide RNA.
  • the engineered CRISPR-Cas systems of the present application may comprise any suitable guide RNAs.
  • a guide RNA may comprise a guide sequence capable of hybridizing to a target sequence in a target nucleic acid of interest, such as a genomic locus of interest in a cell.
  • the gRNA comprises a CRISPR RNA (crRNA) sequence comprising the guide sequence.
  • the gRNA comprises a trans-activating CRISPR RNA (tracrRNA) sequence.
  • the guide RNA is a single-guide RNA (sgRNA).
  • the sgRNA comprises a tracrRNA and a crRNA.
  • the CRISPR-Cas systems provided herein do not require tracrRNA sequences (e.g., CRISPR-Cas12i or CRISPR-Cas12a systems).
  • the guide RNA comprises a crRNA.
  • the crRNAs described herein include a direct repeat sequence and a spacer sequence.
  • the crRNA includes, consists essentially of, or consists of a direct repeat sequence linked to a guide sequence or spacer sequence.
  • the crRNA includes a direct repeat sequence, a spacer sequence, and a direct repeat sequence (DR-spacer-DR), which is typical of precursor crRNA (pre-crRNA) configurations.
  • the crRNA includes a truncated direct repeat sequence and a spacer sequence, which is typical of processed or mature crRNA.
  • the CRISPR-Cas effector protein forms a complex with the RNA guide, and the spacer sequence directs the complex to a sequence-specific binding with the target nucleic acid that is complementary to the spacer sequence.
  • the guide RNA comprises a crRNA and a scoutRNA.
  • the guide RNA is a crRNA comprising the guide sequence.
  • the engineered CRISPR-Cas system comprises a precursor guide RNA array encoding a plurality of crRNAs.
  • the Cas effector protein cleaves the precursor guide RNA array to produce a plurality of crRNAs.
  • the engineered CRISPR-Cas system comprises a precursor guide RNA array encoding a plurality of crRNAs, wherein each crRNA comprises a different guide sequence.
  • the crRNA encoded by the precursor guide RNA array is associated with a tracrRNA or scoutRNA.
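The DR-spacer-DR arrangement and its processing into individual crRNAs can be illustrated with a minimal Python sketch. The direct repeat used below is a hypothetical placeholder rather than a Cas12i repeat from this application, and the function names are illustrative. Splitting on the full repeat is a simplification: as noted above, processed (mature) crRNAs retain a truncated repeat rather than losing it entirely.

```python
def build_pre_crrna_array(direct_repeat: str, spacers: list[str]) -> str:
    """Assemble a DR-spacer-DR-...-spacer-DR precursor array (5'->3')."""
    if not spacers:
        raise ValueError("at least one spacer is required")
    for spacer in spacers:
        # A spacer that contains the repeat would make the array ambiguous.
        if direct_repeat in spacer:
            raise ValueError(f"spacer {spacer!r} contains the direct repeat")
    return direct_repeat + direct_repeat.join(spacers) + direct_repeat


def spacers_from_array(array: str, direct_repeat: str) -> list[str]:
    """Recover the spacers by splitting on the full direct repeat.

    Simplification: real processing leaves a truncated repeat on each
    mature crRNA instead of discarding the repeat.
    """
    # Leading and trailing repeats produce empty strings, which are dropped.
    return [part for part in array.split(direct_repeat) if part]


if __name__ == "__main__":
    dr = "GUUUCGGGAAC"  # hypothetical placeholder repeat
    guides = ["ACGUACGUACGUACGUACGU", "UUGGCCAAUUGGCCAAUUGG"]
    array = build_pre_crrna_array(dr, guides)
    print(array)
    print(spacers_from_array(array, dr))
```

The same helper can assemble a multiplex array with one spacer per target locus.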
  • the guide sequence may have any suitable length. In some embodiments, the guide sequence is about 18 to about 35 nucleotides long, including, for example, any one of 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, or 35 nucleotides.
  • the guide sequence may have at least about 70%, 75%, 80%, 85%, 90%, 95%, 98%, 99%, or 100% complementarity to a target sequence of the target nucleic acid.
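The length range and percent-complementarity figures above can be made concrete with a short Python sketch. It assumes both sequences are written 5'→3' and aligned end to end without gaps, counts only Watson-Crick RNA:DNA pairs (G:T wobble is not counted), and uses illustrative function names not taken from this application.

```python
# Watson-Crick pairs between a guide RNA base and a DNA target-strand base.
RNA_DNA_PAIRS = {("A", "T"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(guide_rna: str, target_strand_dna: str) -> float:
    """Percent of positions at which the guide base-pairs with the target strand.

    Both sequences are given 5'->3' and must have equal length. Because the
    duplex is antiparallel, guide[i] pairs with the i-th base counted from the
    3' end of the target strand.
    """
    guide = guide_rna.upper()
    target = target_strand_dna.upper()
    if not guide or len(guide) != len(target):
        raise ValueError("guide and target must be non-empty and equal in length")
    matches = sum((g, t) in RNA_DNA_PAIRS for g, t in zip(guide, reversed(target)))
    return 100.0 * matches / len(guide)


def guide_length_ok(guide_rna: str, min_nt: int = 18, max_nt: int = 35) -> bool:
    """Check the 18-35 nt guide length range mentioned above."""
    return min_nt <= len(guide_rna) <= max_nt


if __name__ == "__main__":
    target = "ACGTACGTACGTACGTACGT"   # hypothetical 20-nt target strand, 5'->3'
    perfect = "ACGUACGUACGUACGUACGU"  # its reverse complement written as RNA
    one_mismatch = "G" + perfect[1:]
    print(percent_complementarity(perfect, target))       # 100.0
    print(percent_complementarity(one_mismatch, target))  # 95.0
    print(guide_length_ok(perfect))                       # True
```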
  • constructs, vectors and expression systems encoding any one of the engineered Cas effector proteins described herein.
  • the construct, vector, or expression system further comprises one or more gRNAs (e.g., sgRNAs) or crRNA arrays.
  • a “vector” is a composition of matter which comprises an isolated nucleic acid and which can be used to deliver the isolated nucleic acid to the interior of a cell.
  • vectors are known in the art including, but not limited to, linear polynucleotides, polynucleotides associated with ionic or amphiphilic compounds, plasmids, and viruses.
  • a suitable vector contains an origin of replication functional in at least one organism, a promoter sequence, convenient restriction endonuclease sites, and one or more selectable markers.
  • the term “vector” should also be construed to include non-plasmid and non-viral compounds, which facilitate transfer of nucleic acid into cells, such as, for example, polylysine compounds, liposomes, and the like.
  • the vector is a viral vector.
  • viral vectors include, but are not limited to, adenoviral vectors, adeno-associated virus vectors, lentiviral vectors, retroviral vectors, vaccinia vectors, herpes simplex viral vectors, and derivatives thereof.
  • the vector is a phage vector. Viral vector technology is well known in the art and is described, for example, in Sambrook et al. (2001, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, New York), and in other virology and molecular biology manuals.
  • retroviruses provide a convenient platform for gene delivery systems.
  • the heterologous nucleic acid can be inserted into a vector and packaged in retroviral particles using techniques known in the art.
  • the recombinant virus can then be isolated and delivered to the engineered mammalian cell in vitro or ex vivo.
  • retroviral systems are known in the art.
  • adenovirus vectors are used.
  • a number of adenovirus vectors are known in the art.
  • lentivirus vectors are used.
  • self-inactivating lentiviral vectors are used.
  • the vector is an adeno-associated virus (AAV) vector, e.g., AAV2, AAV8, or AAV9, which can be administered in a single dose containing at least 1 × 10⁵ particles (also referred to as particle units, pu) of adenoviruses or adeno-associated viruses.
  • the dose is at least about 1 × 10⁶ particles, at least about 1 × 10⁷ particles, at least about 1 × 10⁸ particles, or at least about 1 × 10⁹ particles of the adeno-associated viruses.
  • the delivery methods and the doses are described, e.g., in WO 2016205764 and U.S. Pat. No. 8,454,972, both of which are incorporated herein by reference in their entirety.
  • the vector is a recombinant adeno-associated virus (rAAV) vector.
  • a modified AAV vector may be used for delivery.
  • Modified AAV vectors can be based on one or more of several capsid types, including AAV1, AAV2, AAV5, AAV6, AAV8, AAV8.2, AAV9, AAVrh10, modified AAV vectors (e.g., modified AAV2, modified AAV3, modified AAV6), and pseudotyped AAV (e.g., AAV2/8, AAV2/5, and AAV2/6).
  • Exemplary AAV vectors and techniques that may be used to produce rAAV particles are known in the art (see, e.g., Aponte-Ubillus et al. (2016) Appl. Microbiol. Biotechnol. 102(3):1045-54; Zhong et al. (2012) J. Genet. Syndr. Gene Ther. S1:008; West et al. (1987) Virology 160:38-47; Tratschin et al. (1985) Mol. Cell. Biol. 5:3251-60; U.S. Pat. Nos. 4,797,368 and 5,173,414; and International Publication Nos. WO 2015/054653 and WO 93/24641, each of which is incorporated by reference).
  • Any one of the known AAV vectors for delivering Cas9 and other Cas proteins may be used for delivery of the engineered Cas systems of the present application.
  • vectors can be transferred into a host cell by physical, chemical, or biological methods.
  • Biological methods for introducing the heterologous nucleic acid into a host cell include the use of DNA and RNA vectors.
  • Viral vectors have become the most widely used method for inserting genes into mammalian (e.g., human) cells.
  • Chemical means for introducing the vector into a host cell include colloidal dispersion systems, such as macromolecule complexes, nanocapsules, microspheres, beads, and lipid-based systems including oil-in-water emulsions, micelles, mixed micelles, and liposomes.
  • An exemplary colloidal system for use as a delivery vehicle in vitro is a liposome (e.g., an artificial membrane vesicle).
  • the engineered CRISPR-Cas system is delivered as an RNP in a nanoparticle.
  • the vector(s) or expression system encoding the CRISPR-Cas systems or components thereof comprise one or more selectable or detectable markers that provide a means to isolate or efficiently select cells that contain and/or have been modified by the CRISPR-Cas system, e.g., at an early stage and on a large scale.
  • Reporter genes may be used for identifying potentially transfected cells and for evaluating the functionality of regulatory sequences.
  • a reporter gene is a gene that is not present in or expressed by the recipient organism or tissue and that encodes a polypeptide whose expression is manifested by some easily detectable property, e.g., enzymatic activity. Expression of the reporter gene is assayed at a suitable time after the DNA has been introduced into the recipient cells.
  • Suitable reporter genes may include genes encoding luciferase, beta-galactosidase, chloramphenicol acetyl transferase, secreted alkaline phosphatase, or the green fluorescent protein gene (e.g., Ui-Tei et al. FEBS Letters 479:79-82 (2000)).
  • assays for confirming the presence of the heterologous nucleic acid in a host cell include, for example, molecular biological assays well known to those of skill in the art, such as Southern and Northern blotting, RT-PCR, and PCR; and biochemical assays, such as detecting the presence or absence of a particular peptide, e.g., by immunological methods (such as ELISAs and Western blots).
  • the nucleic acid sequences encoding the engineered Cas effector protein(s) and/or the guide RNA are operably linked to a promoter.
  • the promoter is an endogenous promoter with respect to a cell that is engineered using the engineered CRISPR-Cas system.
  • the nucleic acid encoding the engineered Cas effector protein may be knocked into the genome of an engineered mammalian cell downstream of an endogenous promoter using any method known in the art.
  • the endogenous promoter is a promoter for an abundant protein, such as beta-actin.
  • the endogenous promoter is an inducible promoter, for example, inducible by an endogenous activation signal of an engineered mammalian cell.
  • the promoter is a T cell activation-dependent promoter (such as an IL-2 promoter, an NFAT promoter, or an NF-κB promoter).
  • the promoter is a heterologous promoter with respect to a cell that is engineered using the engineered CRISPR-Cas system.
  • A variety of promoters have been explored for gene expression in mammalian cells, and any of the promoters known in the art may be used in the present application. Promoters may be roughly categorized as constitutive promoters or regulated promoters, such as inducible promoters.
  • the nucleic acid sequences encoding the engineered Cas effector protein and/or the guide RNA are operably linked to a constitutive promoter.
  • Constitutive promoters allow heterologous genes (also referred to as transgenes) to be expressed constitutively in the host cells.
  • Exemplary constitutive promoters contemplated herein include, but are not limited to, the cytomegalovirus (CMV) promoter, the human elongation factor-1 alpha (hEF1α) promoter, the ubiquitin C (UbiC) promoter, the phosphoglycerate kinase (PGK) promoter, the simian virus 40 early (SV40) promoter, and the chicken β-actin promoter coupled with the CMV early enhancer (CAG).
  • the promoter is a CAG promoter comprising a cytomegalovirus (CMV) early enhancer element; the promoter, first exon, and first intron of the chicken beta-actin gene; and the splice acceptor of the rabbit beta-globin gene.
  • the nucleic acid sequences encoding the engineered CRISPR-Cas protein(s) and/or the guide RNA are operably linked to an inducible promoter.
  • Inducible promoters belong to the category of regulated promoters.
  • the inducible promoter can be induced by one or more conditions, such as a physical condition, microenvironment, or the physiological state of a host cell, an inducer (i.e., an inducing agent), or a combination thereof.
  • the inducing condition is selected from the group consisting of an inducer, irradiation (such as ionizing radiation or light), temperature (such as heat), redox state, tumor environment, and the activation state of a cell to be engineered by the engineered CRISPR-Cas system.
  • the promoter is inducible by a small molecule inducer, such as a chemical compound.
  • the small molecule is selected from the group consisting of doxycycline, tetracycline, alcohol, metal, and steroids. Chemically inducible promoters have been the most widely explored.
  • Such promoters include promoters whose transcriptional activity is regulated by the presence or absence of a small-molecule chemical, such as doxycycline, tetracycline, alcohol, steroids, metal, or other compounds.
  • The doxycycline-inducible system, which combines the reverse tetracycline-controlled transactivator (rtTA) with a tetracycline-responsive element (TRE) promoter, is currently the most mature system.
  • WO9429442 describes the tight control of gene expression in eukaryotic cells by tetracycline responsive promoters.
  • WO9601313 discloses tetracycline-regulated transcriptional modulators.
  • Tet technology, such as the Tet-On system, has been described, for example, on the TetSystems.com website. Any of the known chemically regulated promoters may be used to drive expression of the nucleic acid encoding the engineered CRISPR
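The logic of a Tet-On promoter can be sketched as a toy dose-response model in Python: the TRE promoter is activated only when rtTA is present and bound to doxycycline. The Hill-function form, the basal (leaky) activity, the half-maximal dose, and the Hill coefficient below are all hypothetical assumptions chosen for illustration, not measured values for any construct in this application.

```python
def tet_on_activity(dox: float, rtta_present: bool = True, basal: float = 0.01,
                    k_half: float = 100.0, hill_n: float = 2.0) -> float:
    """Relative TRE-promoter activity (0..1) in a toy Tet-On model.

    dox is in arbitrary units; basal, k_half and hill_n are hypothetical
    parameters, not measured values for any particular construct.
    """
    if dox < 0:
        raise ValueError("doxycycline level must be non-negative")
    if not rtta_present:
        # Without rtTA nothing activates the TRE promoter: only leaky expression.
        return basal
    # Fraction of rtTA in the dox-bound, TRE-binding state (Hill function).
    bound = dox**hill_n / (k_half**hill_n + dox**hill_n)
    return basal + (1.0 - basal) * bound


if __name__ == "__main__":
    for level in (0.0, 10.0, 100.0, 1000.0):
        print(level, round(tet_on_activity(level), 3))
```

In a Tet-Off design (tTA rather than rtTA) the dependence is inverted, with expression switched off by doxycycline.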
  • the nucleic acid sequence encoding the engineered Cas effector protein (e.g., enCas12i2) is codon optimized.
  • an expression construct comprising the codon optimized sequence encoding the engineered Cas effector protein ligated into a BPK2104-ccdB vector.
  • the expression construct encodes a tag (e.g., a 10xHis tag) operably linked to the C terminus of the engineered Cas effector protein.
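Codon optimization is described above only in general terms. One simple heuristic (not necessarily the one used in this application) replaces each amino acid with the host's most frequently used codon. The Python sketch below applies that heuristic after appending the C-terminal 10xHis tag mentioned above; the codon table is an illustrative subset for human cells and the protein is a toy stand-in, both assumptions. Practical optimization tools often also consider GC content, repeats, and cryptic splice sites.

```python
# Illustrative subset of frequently used human codons (assumption: verify
# against a complete codon-usage table before any real design).
PREFERRED_CODON = {
    "M": "ATG", "K": "AAG", "L": "CTG", "E": "GAG",
    "A": "GCC", "G": "GGC", "H": "CAC", "*": "TGA",
}


def with_c_terminal_his_tag(protein: str, n_his: int = 10) -> str:
    """Append an n x His tag to the C terminus, then a stop symbol '*'."""
    return protein.rstrip("*") + "H" * n_his + "*"


def back_translate(protein: str, table: dict[str, str] = PREFERRED_CODON) -> str:
    """Naive 'one preferred codon per residue' back-translation to DNA."""
    missing = sorted(set(protein) - set(table))
    if missing:
        raise KeyError(f"no codon defined for residues: {missing}")
    return "".join(table[aa] for aa in protein)


if __name__ == "__main__":
    toy_protein = "MKLEAG"  # hypothetical stand-in for the Cas protein sequence
    tagged = with_c_terminal_his_tag(toy_protein)
    print(tagged)
    print(back_translate(tagged))
```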
  • each engineered split Cas construct encodes a fluorescent protein, such as GFP or RFP.
  • the reporter proteins may be used to assess co-localization and/or dimerization of the engineered split Cas proteins, e.g., using microscopy.
  • a nucleic acid sequence encoding an engineered Cas effector protein may be fused to a nucleic acid sequence encoding an additional component using a sequence encoding a self-cleaving peptide, such as a T2A, P2A, E2A or F2A peptide.
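A self-cleaving 2A peptide yields two separate polypeptides from one open reading frame through ribosomal skipping at the end of the peptide. The protein-level Python sketch below assumes a P2A peptide and an optional GSG linker (both common choices, but not specified in this passage) and shows where the two products split. It ignores skipping efficiency, which in practice is not complete, so some fused product can remain.

```python
# P2A peptide from porcine teschovirus-1. Ribosomal skipping occurs between the
# final glycine and proline of its C-terminal "NPGP" motif.
P2A = "ATNFSLLKQAGDVEENPGP"


def two_a_products(upstream: str, downstream: str, peptide: str = P2A,
                   linker: str = "GSG") -> tuple[str, str, str]:
    """Return (fusion ORF, upstream product, downstream product) as protein."""
    if not peptide.endswith("NPGP"):
        raise ValueError("expected a 2A peptide ending in NPGP")
    if "*" in upstream:
        raise ValueError("the upstream ORF must not contain a stop codon")
    fusion = upstream + linker + peptide + downstream
    # The upstream product keeps the 2A residues up to the final glycine...
    upstream_product = upstream + linker + peptide[:-1]
    # ...and the downstream product starts with the 2A proline.
    downstream_product = peptide[-1] + downstream
    return fusion, upstream_product, downstream_product


if __name__ == "__main__":
    # Hypothetical toy sequences standing in for the Cas protein and a reporter.
    fusion, cas_part, reporter_part = two_a_products("MAAA", "MKKK")
    print(fusion)
    print(cas_part)
    print(reporter_part)
```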
  • an expression construct for mammalian cells comprising a nucleic acid sequence encoding the engineered Cas effector protein.
  • the expression construct comprises the codon-optimized sequence encoding the engineered Cas effector protein inserted into a pCAG-2A-eGFP vector, such that the Cas protein is operably linked to eGFP.
  • a second vector is provided for expression of a guide RNA (e.g., an sgRNA, crRNA, or pre-crRNA array) in mammalian cells (e.g., human cells).
  • the sequence encoding the guide RNA is expressed in a pUC19-U6-i2-crRNA vector backbone.
  • An exemplary two-vector expression system is shown in Fig. 1.
  • One aspect of the present application provides methods of using any one of the engineered Cas effector proteins or CRISPR-Cas systems described herein for detecting a target nucleic acid or modifying a nucleic acid in vitro, ex vivo, or in vivo, as well as methods of treatment or diagnosis using the engineered Cas effector proteins or CRISPR-Cas systems.
  • uses of any one of the engineered Cas effector proteins or CRISPR-Cas systems described herein for detecting or modifying a nucleic acid in a cell, and for treating or diagnosing a disease or condition in a subject; and compositions comprising any one of the engineered Cas effector proteins or one or more components of the engineered CRISPR-Cas systems for use in the manufacture of a medicament for detecting or modifying a nucleic acid in a cell, and for treating or diagnosing a disease or condition in a subject.
  • One aspect of the present application provides methods of cleaving target nucleic acids and genome-editing in mammalian cells (e.g., human cells) using Cas12i, including wildtype or engineered Cas12i effector proteins.
  • the present application provides a method of modifying a target sequence in a target nucleic acid, comprising contacting the target nucleic acid with an engineered CRISPR-Cas system at a temperature of about 40°C to about 67°C, wherein the engineered CRISPR-Cas system comprises: (a) a Cas12i effector protein comprising a Cas12i nuclease or a functional derivative thereof; and (b) a crRNA comprising a guide sequence that is complementary to the target sequence, wherein the Cas12i effector protein and the crRNA are capable of forming a CRISPR complex that specifically binds to a target nucleic acid comprising the target sequence and induces a modification of the target nucleic acid.
  • the Cas12i effector protein is a Cas12i2 nuclease or a functional derivative thereof.
  • the method is carried out at an elevated temperature, e.g., at a temperature of about any one of 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, or 67 °C.
  • the method is carried out at a temperature of about 40°C to 50°C, 50°C to 60°C, 45°C to 55°C, 55°C to 65°C, 40°C to 60°C, or 50°C to 67°C.
  • the Cas12i effector protein has non-specific single-strand RNA cleavage activity.
  • the method of using a Cas12i effector protein described herein is carried out at a temperature of about 4°C to about 40°C, such as about any one of 4-10, 10-20, 20-30, 30-40, 15-37, 4-20, or 20-40°C.
  • the present application further provides engineered crRNAs that improve the gene-editing efficacy of Cas12i nucleases (e.g., Cas12i1 nuclease).
  • the engineered crRNA increases the gene-editing activity of a Cas12i2 nuclease in a human cell by at least about 20% (e.g., at least about 30%, 40%, 50%, 60%, 70%, 80%, or more) compared to a crRNA comprising endogenous repeat sequences (e.g., SEQ ID NO: 171) corresponding to the Cas12i2 nuclease.
  • an engineered crRNA comprising a substitution of one or more uridine (U) residues with a non-U residue in a repeat sequence comprising at least four U residues.
  • an engineered precursor guide RNA array encoding a plurality of the engineered crRNAs described herein.
  • the engineered crRNA comprises a spacer sequence about 17 to 25 (e.g., any one of 17, 18, 19, 20, 21, 22, 23, 24, or 25) nucleotides long. In some embodiments, the engineered crRNA comprises a spacer sequence about 20 nucleotides long.
  • the engineered crRNA comprises a repeat sequence comprising the nucleic acid sequence of SEQ ID NO: 173.
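The uridine-substitution rule above can be sketched in Python. The sketch assumes the "at least four U residues" form a consecutive run; such a run, written as TTTT in a DNA template, would also act as a termination signal for RNA polymerase III, which transcribes U6-driven guides, although this passage does not state that as the rationale. The repeat fragment and function names are hypothetical; it is not SEQ ID NO: 173.

```python
import re


def find_u_runs(repeat: str, min_run: int = 4) -> list[tuple[int, int]]:
    """Return 0-based (start, end) spans of runs of >= min_run consecutive U."""
    pattern = f"U{{{min_run},}}"  # e.g. "U{4,}" for the default min_run
    return [m.span() for m in re.finditer(pattern, repeat.upper())]


def substitute_u(repeat: str, position: int, new_base: str) -> str:
    """Replace the U at a 0-based position with a non-U base (A, C or G)."""
    if repeat[position].upper() != "U":
        raise ValueError(f"position {position} is not a U residue")
    if new_base.upper() not in {"A", "C", "G"}:
        raise ValueError("the replacement base must be A, C or G")
    return repeat[:position] + new_base.upper() + repeat[position + 1:]


if __name__ == "__main__":
    repeat = "GAUUUUCA"  # hypothetical repeat fragment, not SEQ ID NO: 173
    print(find_u_runs(repeat))           # [(2, 6)]
    edited = substitute_u(repeat, 3, "A")
    print(edited, find_u_runs(edited))   # GAUAUUCA []
```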
  • an engineered CRISPR-Cas system comprising: (a) a Cas12i effector protein comprising a Cas12i1 nuclease or a functional derivative thereof; and (b) a crRNA comprising a substitution of one or more Uridine (U) residues with a non-U residue in a repeat sequence comprising at least four U residues and a guide sequence complementary to a target sequence; wherein the Cas12i effector protein and the crRNA are capable of forming a CRISPR complex that specifically binds to a target nucleic acid comprising the target sequence and induces a modification of the target nucleic acid.
  • a method of modifying a target sequence in a target nucleic acid using the engineered CRISPR-Cas system comprising: (a) a Cas12i effector protein comprising a Cas12i1 nuclease or a functional derivative thereof; and (b) a crRNA comprising a substitution of one or more Uridine (U) residues with
  • Methods of using a Cas12i (including wildtype or engineered Cas12i effector proteins) and/or the engineered crRNAs described herein to modify a target nucleic acid in a mammalian cell, methods of treatment, methods of detection, etc., according to any one of the methods described in Section IV, “Methods of use,” are also contemplated.
  • the present application provides a method of modifying a target nucleic acid comprising a target sequence, comprising contacting the target nucleic acid with any one of the engineered CRISPR-Cas systems described herein.
  • the method is carried out in vitro.
  • the target nucleic acid is present in a cell.
  • the cell is a bacterial cell, a yeast cell, a mammalian cell, a plant cell, or an animal cell.
  • the method is carried out ex vivo. In some embodiments, the method is carried out in vivo.
  • the target nucleic acid is cleaved or the target sequence in the target nucleic acid is altered by the engineered CRISPR-Cas system. In some embodiments, expression of the target nucleic acid is altered by the engineered CRISPR-Cas system. In some embodiments, the target nucleic acid is a genomic DNA. In some embodiments, the target sequence is associated with a disease or condition. In some embodiments, the engineered CRISPR-Cas system comprises a precursor guide RNA array encoding a plurality of crRNA, wherein each crRNA comprises a different guide sequence.
  • the method is carried out at a temperature of about 4°C to about 67°C, such as about any one of 12°C to 67°C, 4°C to 25°C, 25°C to 37°C, 37°C to 45°C, 45°C to 50°C, 50°C to 60°C, or 40°C to 67°C.
  • the method is carried out at a low temperature, such as about 4°C to about 12°C.
  • the method is carried out at a high temperature, such as about 40 °C to about 67 °C.
  • the present application provides a method of treating a disease or condition associated with a target nucleic acid in cells of an individual, comprising modifying the target nucleic acid in the cells of the individual using any one of the methods described herein, thereby treating the disease or condition.
  • the disease or condition is selected from the group consisting of cancer, cardiovascular diseases, hereditary diseases, autoimmune diseases, metabolic diseases, neurodegenerative diseases, ocular diseases, bacterial infections and viral infections.
  • the engineered CRISPR-Cas systems described herein can modify a target nucleic acid in a cell in a variety of ways, depending on the types of engineered Cas effector protein in the CRISPR-Cas system.
  • the method induces a site-specific cleavage in the target nucleic acid.
  • the method cleaves a genomic DNA in a cell, such as a bacterial cell, a plant cell, or an animal cell (e.g., a mammalian cell) .
  • the method kills a cell by cleaving a genomic DNA in the cell.
  • the method cleaves a viral nucleic acid in a cell.
  • the method alters (e.g., increases or decreases) the expression level of the target nucleic acid in the cell.
  • the method increases the expression level of the target nucleic acid in the cell, e.g., using an engineered Cas effector protein based on an enzymatically inactive Cas protein fused to one or more transactivation domains.
  • the method reduces the expression level of the target nucleic acid in the cell, e.g., using an engineered Cas effector protein based on an enzymatically inactive Cas protein fused to one or more transcription repressor domains.
  • the method introduces epigenetic modifications to the target nucleic acid in the cell, e.g., using an engineered Cas effector protein based on an enzymatically inactive Cas protein fused to epigenetic modification domains.
  • the engineered Cas systems described herein may be used to introduce other modifications to the target nucleic acid, depending on the functional domains comprised by the engineered Cas effector proteins.
  • the method alters a target sequence in the target nucleic acid in the cell. In some embodiments, the method introduces a mutation into the target nucleic acid in the cell. In some embodiments, the method uses one or more endogenous DNA repair pathways, such as non-homologous end joining (NHEJ) or homology-directed recombination (HDR), in the cell to repair a double-strand break induced in a target DNA as a result of sequence-specific cleavage by the CRISPR complex. Exemplary mutations include, but are not limited to, insertions, deletions, substitutions, and frameshifts. In some embodiments, the method inserts a donor DNA at the target locus.
  • the insertion of the donor DNA results in introduction of a selection marker or a reporter protein to the cell. In some embodiments, the insertion of the donor DNA results in knock-in of a gene. In some embodiments, the insertion of the donor DNA results in a knockout mutation. In some embodiments, the insertion of the donor DNA results in a substitution mutation, such as a single nucleotide substitution. In some embodiments, the method induces a phenotypic change to the cell.
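For HDR-mediated insertion, a donor is typically flanked by homology arms that match the sequence on either side of the break. The Python sketch below assumes the break lies between two adjacent bases at a known index and uses a hypothetical arm length and toy locus; it does not model staggered cuts, does not mutate the target site to prevent re-cleavage of the edited allele, and is not a procedure described in this application.

```python
def design_hdr_donor(locus: str, cut_index: int, insert: str, arm_length: int) -> str:
    """Return left_arm + insert + right_arm for a break before locus[cut_index].

    cut_index is the 0-based position of the first base 3' of the break, so the
    break lies between locus[cut_index - 1] and locus[cut_index].
    """
    if not 0 < cut_index < len(locus):
        raise ValueError("the break must fall inside the locus")
    if cut_index < arm_length or len(locus) - cut_index < arm_length:
        raise ValueError("locus is too short for the requested arm length")
    left_arm = locus[cut_index - arm_length:cut_index]
    right_arm = locus[cut_index:cut_index + arm_length]
    return left_arm + insert + right_arm


if __name__ == "__main__":
    locus = "AAAACCCCGGGGTTTT"  # hypothetical toy locus
    print(design_hdr_donor(locus, cut_index=8, insert="NNN", arm_length=4))
    # CCCCNNNGGGG
```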
  • the engineered CRISPR-Cas system is used as part of a genetic circuit, or for inserting a genetic circuit into the genomic DNA of a cell.
  • the inducer-controlled engineered split Cas effector proteins described herein may be especially useful as a component of a genetic circuit.
  • Genetic circuits can be useful for gene therapy. Methods and techniques of designing and using genetic circuits are known in the art. Further reference may be made to, for example, Brophy, Jennifer A. N., and Christopher A. Voigt. "Principles of genetic circuit design." Nature Methods 11.5 (2014): 508.
  • the target nucleic acid is in a cell.
  • the target nucleic acid is a genomic DNA.
  • the target nucleic acid is an extrachromosomal DNA.
  • the target nucleic acid is exogenous to a cell.
  • the target nucleic acid is a viral nucleic acid, such as viral DNA.
  • the target nucleic acid is a plasmid in a cell.
  • the target nucleic acid is a horizontally transferred plasmid.
  • the target nucleic acid is an RNA.
  • the target nucleic acid is an isolated nucleic acid, such as an isolated DNA. In some embodiments, the target nucleic acid is present in a cell-free environment. In some embodiments, the target nucleic acid is an isolated vector, such as a plasmid. In some embodiments, the target nucleic acid is an isolated linear DNA fragment.
  • the cell is a bacterium, a yeast cell, a fungal cell, an algal cell, a plant cell, or an animal cell (e.g., a mammalian cell, such as a human cell).
  • the cell is a cell isolated from natural sources, such as a tissue biopsy.
  • the cell is a cell isolated from an in vitro cultured cell line.
  • the cell is from a primary cell line.
  • the cell is from an immortalized cell line.
  • the cell is a genetically engineered cell.
  • the cell is an animal cell from an organism selected from the group consisting of cattle, sheep, goat, horse, pig, deer, chicken, duck, goose, rabbit, and fish.
  • the cell is a plant cell from an organism selected from the group consisting of maize, wheat, barley, oat, rice, soybean, oil palm, safflower, sesame, tobacco, flax, cotton, sunflower, pearl millet, foxtail millet, sorghum, canola, cannabis, a vegetable crop, a forage crop, an industrial crop, a woody crop, and a biomass crop.
  • the cell is a mammalian cell. In some embodiments, the cell is a human cell. In some embodiments, the human cell is a human embryonic kidney 293T (HEK293T or 293T) cell or a HeLa cell. In some embodiments, the cell is a human embryonic kidney (HEK293T) cell. In some embodiments, the mammalian cell is selected from the group consisting of an immune cell, a hepatic cell, a tumor cell, a stem cell, a zygote, a muscle cell, and a skin cell.
  • the cell is an immune cell selected from the group consisting of a cytotoxic T cell, a helper T cell, a natural killer (NK) T cell, an iNK-T cell, an NK-T-like cell, a γδ T cell, a tumor-infiltrating T cell, and a dendritic cell (DC)-activated T cell.
  • the method produces a modified immune cell, such as a CAR-T cell or a TCR-T cell.
  • the cell is an embryonic stem (ES) cell, an induced pluripotent stem (iPS) cell, a progenitor cell of a gamete, a gamete, a zygote, or a cell in an embryo.
  • the methods described herein can be used to modify a target cell in vivo, ex vivo, or in vitro, and may be conducted in a manner that alters the cell such that, once modified, the progeny or cell line of the modified cell retains the altered phenotype.
  • the modified cells and progeny may be part of a multi-cellular organism such as a plant or animal with ex vivo or in vivo applications, such as genome editing and gene therapy.
  • the method is carried out ex vivo.
  • the modified cell (e.g., a mammalian cell) is propagated ex vivo after introduction of the engineered CRISPR-Cas system into the cell.
  • the modified cell is cultured to propagate for at least about any of 1 day, 2 days, 3 days, 4 days, 5 days, 6 days, 7 days, 10 days, 12 days, or 14 days.
  • the modified cell is cultured for no more than about any of 1 day, 2 days, 3 days, 4 days, 5 days, 6 days, 7 days, 10 days, 12 days, or 14 days.
  • the modified cell is further evaluated or screened to select cells with one or more desirable phenotypes or properties.
  • the target sequence is a sequence associated with a disease or condition.
  • diseases or conditions include, but are not limited to, cancer, cardiovascular diseases, hereditary diseases, autoimmune diseases, metabolic diseases, neurodegenerative diseases, ocular diseases, bacterial infections and viral infections.
  • the disease or condition is a genetic disease.
  • the disease or condition is a monogenic disease or condition.
  • the disease or condition is a polygenic disease or condition.
  • the target sequence has a mutation compared to a wildtype sequence. In some embodiments, the target sequence has a single-nucleotide polymorphism (SNP) associated with a disease or condition.
  • the donor DNA that is inserted into the target nucleic acid encodes a biological product selected from the group consisting of a reporter protein, an antigen-specific receptor, a therapeutic protein, an antibiotic resistance protein, an RNAi molecule, a cytokine, a kinase, an antigen, a cytokine receptor, and a suicide polypeptide.
  • the donor DNA encodes a therapeutic protein.
  • the donor DNA encodes a therapeutic protein useful for gene therapy.
  • the donor DNA encodes a therapeutic antibody.
  • the donor DNA encodes an engineered receptor, such as a chimeric antigen receptor (CAR), or an engineered TCR.
  • the donor DNA encodes a therapeutic RNA, such as a small RNA (e.g., siRNA, shRNA, or miRNA) or a long non-coding RNA (lncRNA).
  • the methods described herein may be used for multiplex gene editing or regulation at two or more (e.g., 2, 3, 4, 5, 6, 8, 10 or more) different target loci.
  • the method detects or modifies a plurality of target nucleic acids or target nucleic acid sequences.
  • the method comprises contacting the target nucleic acid with a guide RNA comprising a plurality (e.g., 2, 3, 4, 5, 6, 8, 10 or more) of crRNA sequences, wherein each crRNA corresponds to a different target sequence.
  • engineered cells comprising a modified target nucleic acid, which are produced using any one of the methods described herein.
  • the engineered cells may be used for cell therapy.
  • Autologous or allogeneic cells may be used to prepare engineered cells using the methods described herein for cell therapy.
  • the methods described herein may also be used to generate isogenic lines of cells (e.g., mammalian cells) to study genetic variants.
  • engineered non-human animals comprising the engineered cells described herein.
  • the engineered non-human animals are genome-edited non-human animals.
  • the engineered non-human animals can be used as disease models.
  • Methods for producing non-human genome-edited or transgenic animals include, but are not limited to, pronuclear microinjection, viral infection, and transformation of embryonic stem cells and induced pluripotent stem (iPS) cells.
  • Detailed methods that can be used include, but are not limited to, those described in Sundberg and Ichiki (2006, Genetically Engineered Mice Handbook, CRC Press) and Gibson (2004, A Primer Of Genome Science 2nd ed. Sunderland, Mass.: Sinauer) .
  • the engineered animals may be of any suitable species, including, but not limited to, bovids, equids, ovids, canids, cervids, felids, goats, swine, and primates, as well as less commonly known mammals such as elephants, deer, zebras, or camels.
  • the present application provides a method of treating a disease or condition associated with a target nucleic acid in cells of an individual, comprising contacting the target nucleic acid with any one of the engineered CRISPR-Cas systems described herein, wherein the guide sequence of the guide RNA is complementary to a target sequence of the target nucleic acid, and wherein the engineered Cas effector protein and the guide RNA associate with each other to bind to and modify the target nucleic acid, thereby treating the disease or condition.
  • a mutation (e.g., a knockout or knock-in mutation) is introduced into the target nucleic acid.
  • expression of the target nucleic acid is enhanced.
  • expression of the target nucleic acid is inhibited.
  • the present application provides a method of treating a disease or condition in an individual, comprising administering to the individual an effective amount of any one of the engineered CRISPR-Cas systems described herein and a donor DNA encoding a therapeutic agent, wherein the guide sequence of the guide RNA is complementary to a target sequence of a target nucleic acid of the individual, and wherein the engineered Cas effector protein and the guide RNA associate with each other to bind to the target nucleic acid and insert the donor DNA into the target sequence, thereby treating the disease or condition.
  • the present application provides a method of treating a disease or condition in an individual, comprising administering to the individual an effective amount of engineered cells comprising a modified target nucleic acid, wherein the engineered cells are prepared by contacting the cell with any one of the engineered CRISPR-Cas systems described herein, wherein the guide sequence of the guide RNA is complementary to a target sequence of the target nucleic acid, wherein the engineered Cas effector protein and the guide RNA associate with each other to bind to the target nucleic acid to modify the target nucleic acid.
  • the engineered cells are immune cells.
  • the individual is a human being. In some embodiments, the individual is an animal, e.g., a model animal such as a rodent, a pet, or a farm animal. In some embodiments, the individual is a mammal.
  • the disease or condition is selected from the group consisting of cancer, cardiovascular diseases, hereditary diseases, autoimmune diseases, metabolic diseases, neurodegenerative diseases, ocular diseases, bacterial infections and viral infections.
  • the target nucleic acid is PCSK9.
  • the disease or condition is a cardiovascular disease.
  • the disease or condition is a coronary artery disease.
  • the method reduces cholesterol levels in an individual.
  • the method treats diabetes in the individual.
  • the present application also provides methods of using any one of the engineered Cas effector proteins with improved activity or CRISPR-Cas systems for detection of a target nucleic acid.
  • the use of Cas effector proteins as detection agents takes advantage of the discovery that type V CRISPR/Cas proteins (e.g., Cas12 proteins such as Cas12a, Cas12b, Cas12c, Cas12d, Cas12e (CasX), and Cas12i) can promiscuously cleave non-targeted single stranded DNA (ssDNA) once activated by detection of a target DNA.
  • upon activation of a type V CRISPR/Cas effector protein (e.g., a Cas12 protein such as Cas12a, Cas12b, Cas12c, Cas12d, Cas12e (CasX), or Cas12i) by a guide RNA, which occurs when a sample includes a target DNA to which the guide RNA hybridizes (i.e., the sample includes the targeted DNA), the Cas effector protein becomes a nuclease that promiscuously cleaves single-stranded nucleic acids (e.g., non-target ssDNAs or RNAs, i.e., single-stranded nucleic acids to which the guide sequence of the guide RNA does not hybridize).
  • the targeted DNA can be double-stranded or single-stranded.
  • the result is cleavage of single strand nucleic acids in the sample, which can be detected using any convenient detection method (e.g., using a labeled single stranded detector nucleic acid, such as DNA or RNA) .
  • Cas12i can cleave ssDNA and ssRNA.
  • a method of detecting a target DNA comprising: (a) contacting the sample with: (i) any one of the engineered type V CRISPR/Cas effector proteins described herein (e.g., a Cas12 protein such as Cas12a, Cas12b, Cas12c, Cas12d, Cas12e (CasX), or Cas12i); (ii) a guide RNA comprising a guide sequence that hybridizes with the target DNA; and (iii) a detector nucleic acid that is single stranded (i.e., a “single stranded detector nucleic acid”) and does not hybridize with the guide sequence of the guide RNA; and (b) measuring a detectable signal produced by cleavage of the single stranded detector nucleic acid by the engineered type V CRISPR/Cas effector protein.
  • the single stranded detector nucleic acid includes a fluorescence-emitting dye pair (e.g., a fluorescence resonance energy transfer (FRET) pair or a quencher/fluor pair).
  • the target DNA is a viral DNA (e.g., papovavirus, hepadnavirus, herpesvirus, adenovirus, poxvirus, parvovirus, and the like) .
  • the single stranded detector nucleic acid is a DNA.
  • the single stranded detector nucleic acid is an RNA.
  • the engineered Cas effector protein is an engineered Cas12i nuclease, such as enCas12i2.
  • a method of the present disclosure for detecting a target DNA (single-stranded or double-stranded) in a sample can detect a target DNA with a high degree of sensitivity.
  • a method of the present disclosure can be used to detect a target DNA present in a sample comprising a plurality of DNAs (including the target DNA and a plurality of non-target DNAs), where the target DNA is present at one or more copies per $10^{7}$ non-target DNAs (e.g., one or more copies per $10^{6}$ non-target DNAs, one or more copies per $10^{5}$ non-target DNAs, one or more copies per $10^{4}$ non-target DNAs, one or more copies per $10^{3}$ non-target DNAs, one or more copies per $10^{2}$ non-target DNAs, one or more copies per 50 non-target DNAs, one or more copies per 20 non-target DNAs, one or more copies per 10 non-target DNAs, or one or more copies per 5 non-target DNAs).
  • the engineered Cas effector proteins described herein can detect a target DNA with a higher degree of sensitivity compared to the reference Cas nuclease. In some embodiments, the engineered Cas effector protein can detect a target DNA with 10%, 15%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90% or higher sensitivity compared to the reference Cas nuclease.
  • the engineered CRISPR-Cas systems described herein, or components thereof, nucleic acid molecules thereof, or nucleic acid molecules encoding or providing components thereof can be delivered to host cells by various delivery systems such as plasmid or viral vectors (e.g., any one of the vectors described in the “Constructs and Vectors” subsection above) .
  • the engineered CRISPR-Cas systems can be delivered by other methods, such as nucleofection or electroporation of ribonucleoprotein complexes consisting of the engineered Cas effector proteins and their cognate RNA guide or guides.
  • the delivery is via nanoparticles or exosomes.
  • paired Cas nickase complexes can be delivered directly using nanoparticle or other direct protein delivery methods, such that complexes containing both paired crRNA elements are co-delivered.
  • protein can be delivered to cells by viral vector or directly, followed by the direct delivery of a CRISPR array containing two paired spacers for double nicking.
  • the RNA may be conjugated to at least one sugar moiety, such as N-acetyl galactosamine (GalNAc) (particularly, triantennary GalNAc) .
  • compositions, kits, unit dosages, and articles of manufacture comprising one or more components of any one of the engineered Cas effector proteins or engineered CRISPR-Cas systems described herein.
  • kits comprising: one or more AAV vectors encoding any one of the engineered Cas effector proteins or engineered CRISPR-Cas systems described herein.
  • the kit further comprises one or more guide RNAs.
  • the kit further comprises a donor DNA.
  • the kit further comprises a cell, such as a human cell.
  • kits may contain one or more additional components, such as containers, reagents, culturing media, cytokines, buffers, antibodies, and the like to allow propagation of an engineered cell.
  • the kits may also contain a device for administration of the composition.
  • the kit may further comprise instructions for using the engineered CRISPR-Cas system described herein, such as methods of detecting or modifying a target nucleic acid.
  • the kit comprises instructions for treating or diagnosing a disease or condition.
  • the instructions relating to the use of the components of the kit generally include information as to dosage, dosing schedule, and route of administration for the intended treatment.
  • the containers may be unit doses, bulk packages (e.g., multi-dose packages) or sub-unit doses.
  • kits may be provided that contain sufficient dosages of the composition as disclosed herein to provide effective treatment of an individual for an extended period. Kits may also include multiple unit doses of the composition and instructions for use, packaged in quantities sufficient for storage and use in pharmacies, for example, hospital pharmacies and compounding pharmacies.
  • kits of the invention are in suitable packaging.
  • suitable packaging includes, but is not limited to, vials, bottles, jars, flexible packaging (e.g., sealed Mylar or plastic bags) , and the like. Kits may optionally provide additional components such as buffers and interpretative information.
  • the present application thus also provides articles of manufacture, which include vials (such as sealed vials) , bottles, jars, flexible packaging, and the like.
  • the article of manufacture can comprise a container and a label or package insert on or associated with the container.
  • Suitable containers include, for example, bottles, vials, syringes, etc.
  • the containers may be formed from a variety of materials such as glass or plastic.
  • the container holds a composition which is effective for treating a disease or disorder described herein, and may have a sterile access port (for example the container may be an intravenous solution bag or a vial having a stopper pierceable by a hypodermic injection needle) .
  • the label or package insert indicates that the composition is used for treating the particular condition in an individual.
  • the label or package insert will further comprise instructions for administering the composition to the individual.
  • Package insert refers to instructions customarily included in commercial packages of therapeutic products that contain information about the indications, usage, dosage, administration, contraindications and/or warnings concerning the use of such therapeutic products.
  • the article of manufacture may further comprise a second container comprising a pharmaceutically-acceptable buffer, such as bacteriostatic water for injection (BWFI), phosphate-buffered saline, Ringer's solution and dextrose solution. It may further include other materials desirable from a commercial and user standpoint, including other buffers, diluents, filters, needles, and syringes.
  • This example provides a strategy for designing Cas enzymes with enhanced conformational transition dynamics that result in better catalytic efficiency of Cas endonucleases.
  • the exemplary method provided herein allows engineering of Cas proteins that have improved activity, such as target binding, double-strand cleavage activity, nickase activity, and/or gene-editing activity.
  • Fig. 1 shows the pipeline used to design variants of SaCas9 as an example of the design workflow.
  • flexibility was predicted from sequence with the DynaMine predictor of protein backbone dynamics (Cilia et al. The DynaMine webserver: predicting protein dynamics from sequence. Nucleic Acids Res. 2014 Jul 1; 42 (Web Server issue): W264-W270). From these predictions, we obtained the S2 order parameter score profile of the Cas protein.
  • the S2 order parameter can range from 0-1, with 1 corresponding to a rigid bond-vector and 0 corresponding to complete flexibility.
  • we defined peak amino acids (peak aa) as amino acids whose score is the lowest compared with the scores of the 5 neighboring amino acids on either side (preceding or following the peak aa). Based on this definition, we identified peak amino acids and selected regions containing a peak amino acid and its 2 neighboring amino acids on either side as candidate flexible regions for engineering.
  • within each candidate flexible region, the most flexible aa was selected based on the following priority order: G>S>N>D>H>M>T>E>Q>K>R>A>P.
  • where residues of equal priority were present, the one closer to the peak aa was taken as the most flexible aa.
  • the resulting Cas variants were cloned and purified as described in the methods section below, and the cleavage efficiency of the variants was measured in human cells.
  • the coding sequences of BhCas12bv4, Cas12i2 and GeoCas9 were human codon-optimized and synthesized.
  • the variants of Cas protein were created by PCR-based site-directed mutagenesis.
  • the coding sequences of Cas12i2 and its variants were ligated into a BPK2104-ccdB expression vector using T4 ligase, with the vector digested by XmaI and SpeI.
  • the fusion construct carries a 10xHis tag at the C terminus of the protein.
  • Cas effector proteins were expressed in human 293T cells from the pCAG-2A-eGFP vector.
  • the DNA encoding the Cas proteins was inserted between the XmaI and NheI sites.
  • vectors for expressing the sgRNAs or crRNAs of BhCas12bv4, GeoCas9 and Cas12i2 in 293T cells were constructed by ligating annealed oligos containing the target sequence into the BsaI-digested pUC19-U6-i2-crRNA scaffold.
  • the prokaryotic expression plasmid was transformed into E. coli strain BL21 (λDE3), and the transformants were plated on solid LB with chloramphenicol (CmR). Three to five clones were picked into 15 ml liquid LB with CmR and cultured overnight at 37 °C. Then 2 ml of the culture was transferred into 300 ml liquid LB and cultured until the OD600 reached 1.2, followed by induction with IPTG at 16 °C for 16 h. Cell pellets were resuspended in lysis buffer and sonicated, and the supernatant after centrifugation was kept for purification. Target protein was obtained by one-step purification on a Ni column and sterilized through a 0.22 μm filter before being stored in aliquots. Protein concentration was determined by Bradford assay with BSA as the standard.
  • PCR-amplified dsDNA containing a T7 promoter was used as the in vitro transcription template to produce crRNA with the HiScribe™ T7 Quick High Yield RNA Synthesis Kit (NEB).
  • the transcribed crRNAs were purified using the Oligo Clean & Concentrator (Zymo Research) and quantified on a NanoDrop™ 2000 (Thermo Fisher Scientific).
  • HEK293T cells were cultured in DMEM (Gibco) with 1% penicillin-streptomycin (Gibco) and 10% fetal bovine serum (Gibco). Cells were seeded in 24-well plates (Corning) for 16 h until the confluency reached 70%. 600 ng of plasmid encoding the Cas protein and 3000 ng of plasmid encoding the crRNA were transfected into each well using Lipofectamine 3000 (Invitrogen). For fluorescence-activated cell sorting (FACS), HEK293T cells were digested with Trypsin-EDTA (0.05%) (Gibco) 68 h after transfection and sorted on a MoFlo XDP (Beckman Coulter) based on GFP signal.
  • T7 endonuclease I (T7EI) assay and targeted deep sequencing analysis for genomic modifications
  • FACS-sorted GFP-positive 293FT cells were lysed with Buffer L and incubated at 55 °C for 3 h, followed by 95 °C for 10 min.
  • the dsDNA fragments containing target sites in different genomic loci were PCR-amplified using the corresponding primer.
  • for the T7E1 assay, 200-400 ng of PCR product was diluted with ddH2O to a final volume of 10 μL. The mix was then subjected to a reannealing procedure to form heteroduplex dsDNA and treated with 1/10 volume of NEBuffer™ 2.1 and 0.2 μL T7EI (NEB) at 37 °C for 50 min.
  • the digested product was analyzed by electrophoresis on an approximately 3% agarose gel. Indels were calculated based on previous methods.
  • the PCR products containing mutations identified by the T7E1 assay were cloned into a TA-cloning vector and transformed into competent E. coli. Colonies were randomly picked, cultured overnight, and sent for Sanger sequencing.
  • target sites were amplified by barcoded PCR directly using cell lysate as template.
  • the PCR products were purified and pooled into several libraries for high-throughput sequencing.
  • indel frequencies (%) were analyzed with CRISPResso2 by calculating the proportion of reads containing insertions or deletions. Reads whose counts were lower than 0.05% of the total reads were discarded.
  • This example describes design and characterization of a Bacillus hisashii Cas12bv4 (BhCas12bv4) engineered variant with improved gene-editing activity in human cells using the design pipeline described in Example 1.
  • BhCas12bv4 has no solved crystal structure, but the highly homologous BthCas12b (>98% homology) does, so the linker of BhCas12bv4 shown in Table 1 was determined based on the structure of BthCas12b and its homology to BhCas12bv4.
  • the resulting BhCas12bv4 variant (enBhCas12bv4 1.1, set forth in SEQ ID NO: 2) was cloned and purified as described in Example 1.
  • the most dramatic improvement in editing efficiency was observed at genomic sites where wild type BhCas12bv4 had low editing efficiency, such as RNF2-5, CCR5-8, and CCR5-1 (Fig. 3) .
  • This example describes design and characterization of a Cas12i2 engineered variant with improved gene-editing activity in human cells using the design pipeline described in Example 1.
  • Cas12i2 has no known 3D structure and no homologue whose structure has been resolved. We thus sought to test whether our method, which does not require a resolved structure, could be used to engineer Cas12i2 variants with improved activity, such as target binding, double-strand cleavage activity, nickase activity, and/or gene-editing activity.
  • the selected flexible region amino acid sequences and modified amino acid sequences of engineered flexible variant regions are shown in Table 2 below.
  • the amino acid positions are based on SEQ ID NO: 8.
  • four target sites in the human genome (CCR5-3, CCR5-2, RNF2-7, and CCR5-8) were selected to test the Cas12i2 variants.
  • the editing efficiency (%indels) of the Cas12i2 variants compared to wild type Cas12i2 was determined at each site.
  • Cas12i2-2.2 SEQ ID NO: 14
  • Cas12i2-6.1 SEQ ID NO: 18
  • variant 6.1 significantly improved editing efficiency at all four target sites (Fig. 6) .
  • we also compared the in vitro plasmid cleavage activity of enCas12i2 with that of wild-type Cas12i2. A detailed description of the in vitro cleavage assay can be found in Example 6. Briefly, the target plasmid was incubated with different concentrations of RNP formed by Cas12i2 or enCas12i2 plus crRNA, and the reaction was conducted at 37 °C for 10 min. As shown in FIG. 10, the purified enCas12i2 protein exhibited higher dsDNA cleavage activity at 37 °C than wild-type Cas12i2 protein.
  • both wildtype Cas12i2 and the engineered Cas12i2 variants were capable of detecting double-strand DNA (i.e., the trans-XBP activator) containing the XBP target site.
  • the ΔRn value corresponds to the level of detector nucleic acid cleavage.
  • the engineered Cas12i2 variants demonstrated higher detection activity than wildtype Cas12i2.
  • the RNA-based fluorescent reporter contains rU residues: 5′-6-FAM-UUUUU-BHQ1-3′.
  • This example describes design and characterization of GeoCas9 engineered variants with improved gene-editing activity in human cells using the design pipeline described in Example 1.
  • the selected flexible region amino acid sequences and modified amino acid sequences of engineered flexible variant regions are shown in Table 3.
  • the amino acid positions are based on SEQ ID NO: 25.
  • the targeted deep sequencing results are shown in Fig. 14.
  • Engineered GeoCas9 variants with significantly improved editing efficiency include 2.1, 3.1, 8.1, 9.1 and 12.1.
  • the dsDNA fragments containing target sites in different genomic loci were PCR-amplified using the corresponding primers (Table 4 below) .
  • the digested product was analyzed by electrophoresis on an approximately 3% agarose gel. Indels were calculated based on previous methods (Cong et al., 2013).
  • This example describes design and characterization of a SaCas9 engineered variant with improved gene-editing activity in human cells using the design pipeline described in Example 1.
  • flexible regions of SaCas9 were identified computationally using DynaMine as described in Example 1 above.
  • the flexibility (S2 score) profile of SaCas9 is shown in the exemplary design pipeline of Fig. 1. Based on the S2 score profile, we selected flexible regions from 13 aa peaks with S2 scores lower than 0.71 (peaks circled in Fig. 1) .
  • the locations of the selected flexible regions and corresponding domains of SaCas9 are shown in Fig. 15.
  • the selected flexible region amino acid sequences and modified amino acid sequences of engineered flexible variant regions are shown in Table 5.
  • the amino acid positions are based on SEQ ID NO: 53.
  • This example describes characterization of wild-type Cas12i1 and Cas12i2.
  • Cas12i1 and Cas12i2 can cleave dsDNA in vitro.
  • Cas12i1 and Cas12i2 generated cleaved dsDNA products in vitro, resolved by gel electrophoresis (Fig. 17A) .
  • the crRNAs used in the in vitro cleavage assay are shown in Fig. 17B.
  • Cas12i1 and Cas12i2 are capable of genome editing in human cells.
  • Cas12i2 was able to generate indels in a broader range of targets, suggesting that it is a promising candidate for engineering (Fig. 18) .
  • Cas12i2 is enzymatically active (i.e., able to cleave dsDNA) at a wide temperature range, from 12 °C to 67 °C (Fig. 20) .
  • wildtype Cas12i2 is capable of editing multiple genomic target loci in human cells using a crRNA array (Fig. 21) .
  • target recognition by CRISPR-Cas ribonucleoprotein effectors begins with recognition of double-stranded PAM motifs by the Cas protein moiety, followed by destabilization, localized melting, and interrogation of the target by the guide part of the CRISPR RNA moiety.
  • the latter process depends on seed sequences, parts of the target that must be strictly complementary to CRISPR RNA guide. Mismatches between the target and CRISPR RNA guide outside the seed have minor effects on target binding, thus contributing to off-target activity of CRISPR-Cas effectors.
  • the template for the in vitro cleavage assay was created by PCR and purified with the DNA Clean & Concentrator (Zymo Research).
  • 100nM template was used, with 1mM Cas12i protein and 2mM crRNA.
  • the reaction was conducted in 1x NEBuffer™ 3.1 for Cas12i1 and 1x NEBuffer™ 2 for Cas12i2 at 37 °C for 60 min.
  • the cleavage mix was incubated across a wide temperature range (4 °C to 67 °C) for 1 h in its cleavage buffer.
  • the plasmid was incubated with different concentrations of RNP formed by Cas12i2 or enCas12i2 plus crRNA.
  • the reaction was conducted at 37 °C for 10 min in 1x NEBuffer™ 2.
  • the reaction was stopped by adding RNase cocktail (Thermo Fisher Scientific) to digest crRNA at 37 °C for 20 min, followed by incubation with Proteinase (Takara) at 37 °C for 20 min.
  • the reaction was resolved by agarose gel electrophoresis and ethidium bromide staining.
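The deep-sequencing quantification described in the methods above (indel % as the proportion of reads containing insertions or deletions, after discarding reads whose counts fall below 0.05% of all reads) can be illustrated with a small sketch. It works on a hypothetical table of (has_indel, read_count) pairs, not on CRISPResso2's actual output files, and it assumes the percentage is taken over the reads that remain after filtering; the function name is likewise hypothetical.

```python
from typing import List, Tuple


def indel_percent(allele_counts: List[Tuple[bool, int]], min_fraction: float = 0.0005) -> float:
    """allele_counts: (has_indel, read_count) for each distinct read/allele.

    Entries whose counts are below `min_fraction` (0.05%) of all reads are
    discarded first; indel % is then computed over the retained reads
    (an assumption about the denominator).
    """
    total = sum(count for _, count in allele_counts)
    kept = [(has_indel, count) for has_indel, count in allele_counts
            if count >= min_fraction * total]
    kept_total = sum(count for _, count in kept)
    if kept_total == 0:
        raise ValueError("no reads left after filtering")
    indel_reads = sum(count for has_indel, count in kept if has_indel)
    return 100.0 * indel_reads / kept_total


if __name__ == "__main__":
    # Toy data: the 3-read allele is below 0.05% of 10,003 reads and is dropped.
    alleles = [(False, 8000), (True, 2000), (True, 3)]
    print(indel_percent(alleles))  # 20.0
```

Whether low-count reads are removed before or after computing the denominator changes the result only slightly at this threshold, but the choice should match whatever the analysis software actually does.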

Abstract

Provided herein are methods for engineering an enzyme such as a Cas nuclease to increase its enzymatic activity. Also provided are engineered Cas effector proteins, including engineered Cas12b, Cas12i and Cas9 nucleases and derivatives thereof, and methods of use thereof.

Description

ENGINEERED CAS EFFECTOR PROTEINS AND METHODS OF USE THEREOF
SUBMISSION OF SEQUENCE LISTING ON ASCII TEXT FILE
The content of the following submission on ASCII text file is incorporated herein by reference in its entirety: a computer readable form (CRF) of the Sequence Listing (file name: FD00283PCT-SEQListing1207.txt, date recorded: December 07, 2020, size: 666 KB).
FIELD
The present application relates generally to the field of biotechnology. More specifically, the present application relates to methods and compositions of engineered Cas effector proteins with improved activity (e.g., gene editing activity) .
BACKGROUND
Genome editing is an important and useful technology in genomic research and various applications. Various systems may be used for genome editing, including the clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system, the transcription activator-like effector nuclease (TALEN) system, and the zinc finger nuclease (ZFN) system.
The CRISPR-Cas system is an efficient and cost-effective genome-editing technology that is widely applicable in a range of eukaryotic organisms from yeast and plants to zebrafish and human (reviewed by Van der Oost 2013, Science 339: 768-770, and Charpentier and Doudna, 2013, Nature 495: 50-51) . The CRISPR-Cas system provides adaptive immunity in archaea and bacteria by employing a combination of Cas effector proteins and CRISPR RNAs (crRNAs) . To date, two classes (class 1 and 2) including six types (type I-VI) of CRISPR-Cas systems have been characterized according to prominent functional and evolutionary modularity of the systems. Among class 2 CRISPR-Cas systems, type II Cas9 systems and type V-A/B/E/J Cas12a/Cas12b/Cas12e/Cas12j systems have been harnessed for genome editing, and hold tremendous promise for biomedical research.
However, current CRISPR-Cas systems have various limitations, including limited gene-editing efficiency. Accordingly, there exists a need for improved methods and systems for effective genome editing across a variety of genetic loci.
BRIEF SUMMARY
To address the above and other needs, the present disclosure provides methods for engineering enzymes such as Cas nucleases to improve their enzymatic activity, engineered Cas effector proteins, and methods of using the engineered Cas effector proteins.
In one aspect, the present application provides a method of engineering an enzyme, comprising: (a) obtaining a plurality of engineered enzymes each comprising one or more mutations that increase flexibility of a flexible region in a plurality of flexible regions of a reference enzyme; and (b) selecting one or more engineered enzymes from the plurality of the engineered enzymes, wherein the one or more engineered enzymes have an increased activity compared to the reference enzyme. In some embodiments, the method further comprises determining a plurality of flexible regions in the reference enzyme.
In some embodiments according to any one of the methods described above, the plurality of flexible regions is determined based on the amino acid sequence of the reference enzyme. In some embodiments, the plurality of flexible regions is determined without reference to a three-dimensional structure of the reference enzyme or a homolog thereof. In some embodiments, the plurality of flexible regions is determined using a program selected from the group consisting of PredyFlexy, FoldUnfold, PROFbval, Flexserv, FlexPred, DynaMine and Disomine. In some embodiments, the plurality of flexible regions is determined using DynaMine.
In some embodiments according to any one of the methods described above, the method comprises: (i) calculating a flexibility score of each amino acid residue of the reference enzyme, wherein a higher flexibility score indicates lower conformational flexibility; (ii) selecting a plurality of peak amino acid residues at positions $X_{i}$, wherein each peak amino acid residue has a flexibility score that is below a pre-determined threshold value and is lower than the flexibility scores of amino acid residues at positions $X_{i-5}$ to $X_{i-1}$ and $X_{i+1}$ to $X_{i+5}$; and (iii) defining the plurality of flexible regions as amino acid residues $X_{i-2}$ to $X_{i+2}$.
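The peak-selection rule in steps (i) to (iii) can be expressed as a short Python sketch. It assumes the per-residue flexibility scores (for example, DynaMine S2 predictions, where a higher value means a more rigid residue) are already available as a list. Residues within five positions of either terminus are skipped because they lack a full neighborhood; the text does not address this case, so that handling is an assumption. Indices are 0-based (add 1 for residue numbering), and the function names are hypothetical.

```python
from typing import List, Tuple


def find_peak_residues(scores: List[float], threshold: float, window: int = 5) -> List[int]:
    """Return 0-based indices i whose score is below `threshold` and strictly
    lower than every score at positions i-window..i-1 and i+1..i+window."""
    peaks = []
    # Assumption: positions without `window` neighbors on both sides are skipped.
    for i in range(window, len(scores) - window):
        if scores[i] >= threshold:
            continue
        neighbors = scores[i - window:i] + scores[i + 1:i + window + 1]
        if all(scores[i] < s for s in neighbors):
            peaks.append(i)
    return peaks


def flexible_regions(
    scores: List[float], threshold: float, window: int = 5, half_width: int = 2
) -> List[Tuple[int, int]]:
    """Each region is the peak residue plus `half_width` residues on either
    side (X_{i-2} to X_{i+2}), returned as inclusive 0-based bounds."""
    return [(i - half_width, i + half_width)
            for i in find_peak_residues(scores, threshold, window)]


if __name__ == "__main__":
    # Toy profile: mostly rigid (0.9) with a single dip at index 10.
    profile = [0.9] * 20
    profile[10] = 0.60
    profile[11] = 0.70
    print(find_peak_residues(profile, threshold=0.71))  # [10]
    print(flexible_regions(profile, threshold=0.71))    # [(8, 12)]
```

The threshold of 0.71 mirrors the value used for SaCas9 in the examples; the application also describes flexible regions whose most flexible residue has a score of no more than about 0.8, which would simply be a different `threshold` argument.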
In some embodiments according to any one of the methods described above, the plurality of flexible regions are located in random coils.
In some embodiments according to any one of the methods described above, the one or more mutations comprise insertion of one or more Glycine (G) residues in a flexible region. In some embodiments, the one or more mutations comprise inserting two G residues in a flexible region. In some embodiments according to any one of the methods described above, the one or more G residues are inserted at the N-terminus of a flexible amino acid residue in the flexible region, wherein the flexible amino acid residue is selected from the group consisting of G, Serine (S), Asparagine (N), Aspartic acid (D), Histidine (H), Methionine (M), Threonine (T), Glutamic acid (E), Glutamine (Q), Lysine (K), Arginine (R), Alanine (A) and Proline (P). In some embodiments, the flexible amino acid residue is chosen according to the preference: G>S>N>D>H>M>T>E>Q>K>R>A>P.
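A minimal sketch of the glycine-insertion step, assuming a flexible region (with its peak residue) has already been defined, for instance by the peak rule described above. It picks the residue with the highest preference in G>S>N>D>H>M>T>E>Q>K>R>A>P and, among residues of equal preference, the one nearest the peak residue (the tie-break described in Example 1). If two such residues are equally close, the N-terminal one is taken; that last rule is our own assumption. Two glycines are then inserted on the N-terminal side of the chosen residue. The function names and toy sequence are hypothetical.

```python
from typing import Optional

# Preference order for the insertion site, most preferred first (from the text).
PRIORITY = "GSNDHMTEQKRAP"


def pick_insertion_site(seq: str, start: int, end: int, peak: int) -> Optional[int]:
    """Return the 0-based index of the preferred residue in seq[start..end]
    (inclusive), or None if the region contains no residue from PRIORITY."""
    candidates = [i for i in range(start, end + 1) if seq[i] in PRIORITY]
    if not candidates:
        return None
    # Rank by preference first, then by distance to the peak residue;
    # min() keeps the first (N-terminal) candidate on an exact tie.
    return min(candidates, key=lambda i: (PRIORITY.index(seq[i]), abs(i - peak)))


def insert_glycines(seq: str, site: int, n_gly: int = 2) -> str:
    """Insert `n_gly` glycines immediately N-terminal to position `site`."""
    return seq[:site] + "G" * n_gly + seq[site:]


if __name__ == "__main__":
    seq = "MAAAAAAATVSDNAAAAA"  # toy sequence; region 8..12 is "TVSDN", peak at 10
    site = pick_insertion_site(seq, 8, 12, peak=10)
    print(site)                        # 10 (the S residue)
    print(insert_glycines(seq, site))  # MAAAAAAATVGGSDNAAAAA
```

Keeping site selection and insertion as separate functions makes it easy to enumerate one candidate variant per flexible region and then screen them, as the method's step (b) requires.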
In some embodiments according to any one of the methods described above, the one or more mutations comprise substitution of one or more non-G residues with one or more G residues. In some embodiments, the one or more mutations comprise substitution of a hydrophobic amino acid residue in a flexible region with a G residue, wherein the hydrophobic amino acid residue is selected from the group consisting of leucine (L), isoleucine (I), valine (V), cysteine (C), tyrosine (Y), phenylalanine (F), and tryptophan (W).
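For the substitution alternative, the candidate variants within a flexible region can be enumerated directly. This sketch assumes each hydrophobic residue (L, I, V, C, Y, F or W, as listed above) is tried as a separate single substitution to G; the text does not say whether substitutions are combined. The function name and toy sequence are hypothetical, and variant labels follow the usual "original residue, 1-based position, new residue" convention.

```python
from typing import Dict

HYDROPHOBIC = set("LIVCYFW")  # L, I, V, C, Y, F, W as listed in the text


def glycine_substitution_variants(seq: str, start: int, end: int) -> Dict[str, str]:
    """Return single hydrophobic-to-G variants within seq[start..end]
    (inclusive, 0-based), keyed by labels such as 'V10G' (1-based position)."""
    variants = {}
    for i in range(start, end + 1):
        if seq[i] in HYDROPHOBIC:
            label = f"{seq[i]}{i + 1}G"
            variants[label] = seq[:i] + "G" + seq[i + 1:]
    return variants


if __name__ == "__main__":
    seq = "MAAAAAAATVSDNAAAAA"  # toy sequence; region 8..12 is "TVSDN"
    print(glycine_substitution_variants(seq, 8, 12))
    # {'V10G': 'MAAAAAAATGSDNAAAAA'}
```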
In some embodiments according to any one of the methods described above, the enzyme is a bacterial or archaeal enzyme. In some embodiments, the enzyme is a Cas nuclease. In some embodiments, the Cas nuclease is selected from the group consisting of Cas9, Cas12a, Cas12b, Cas12c, Cas12d, Cas12f, Cas12g, Cas12h, Cms1, Cas12i, Cas12j, Cas12k and CasX. In some embodiments, the plurality of flexible regions are in domains of the reference Cas nuclease that interact with DNA and/or RNA. In some embodiments, the activity described in step (b) is site-specific nuclease activity. In some embodiments, the activity is gene-editing activity in a eukaryotic cell (e.g., mammalian cell) . In some embodiments, the activity is gene-editing activity in a human cell. In some embodiments, a selected engineered Cas nuclease of step (b) has at least about 20% (e.g., at least about 30%, 40%, 50%, 60%, 70%, 80%, or more) higher gene-editing efficiency compared to the reference Cas nuclease at a genomic locus in the cell. In some embodiments, the average gene-editing efficiency of a selected engineered Cas nuclease of step (b) at a plurality of genomic loci in the cell is at least about 20% (e.g., at least about 30%, 40%, 50%, 60%, 70%, 80%, or more) higher than that of the reference Cas nuclease. In some embodiments, the gene-editing efficiency is measured using a T7 endonuclease 1 (T7E1) assay, sequencing of the target DNA, a Tracking of Indels by Decomposition (TIDE) assay, or Indel Detection by Amplicon Analysis (IDAA) assay.
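As a worked illustration of the efficiency comparison, the sketch below estimates indel frequency from T7E1 gel band intensities with the formula commonly used for mismatch-cleavage assays, $\text{indel}\,(\%) = 100 \times \left(1 - \sqrt{1 - \frac{b + c}{a + b + c}}\right)$, where $a$ is the intensity of the uncleaved band and $b$ and $c$ are the cleavage products. It then expresses the engineered nuclease's mean efficiency across loci as an increase over the reference. Whether "at least about 20% higher" means a relative increase or a difference in percentage points is not spelled out here; the sketch assumes the relative reading. All numbers are made up for illustration and the function names are hypothetical.

```python
import math
from statistics import mean
from typing import Sequence


def t7e1_indel_percent(uncut: float, cut_bands: Sequence[float]) -> float:
    """Estimate indel % from integrated band intensities of a T7E1 gel."""
    cleaved = sum(cut_bands)
    fraction_cleaved = cleaved / (uncut + cleaved)
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cleaved))


def relative_improvement(engineered: float, reference: float) -> float:
    """Percent increase of `engineered` over `reference` (relative, not points)."""
    return 100.0 * (engineered - reference) / reference


if __name__ == "__main__":
    print(round(t7e1_indel_percent(75, [15, 10]), 2))  # 13.4
    engineered_loci = [20.0, 30.0]  # hypothetical per-locus indel %
    reference_loci = [10.0, 15.0]
    gain = relative_improvement(mean(engineered_loci), mean(reference_loci))
    print(gain)          # 100.0
    print(gain >= 20.0)  # True
```

The square root converts the fraction of cleaved heteroduplex DNA back into an allele-level modification rate, under the usual assumption that edited and unedited strands reanneal randomly.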
Another aspect of the present application provides an engineered Cas nuclease obtained using the method of any one of the methods described above.
In some embodiments, the present application provides an engineered Cas nuclease comprising one or more mutations that increase flexibility of a flexible region comprising at least 5 consecutive amino acid residues of a reference Cas nuclease, wherein  the engineered Cas nuclease has an increased activity compared to the reference Cas nuclease. In some embodiments, the flexible region is determined based on the amino acid sequence of the reference Cas nuclease. In some embodiments, the flexible region is determined using a program selected from the group consisting of PredyFlexy, FoldUnfold, PROFbval, Flexserv, FlexPred, DynaMine and Disomine. In some embodiments, the flexible region is determined using DynaMine, and wherein the amino acid residue with the highest flexibility in the flexible region has a flexibility score S 2 pred of no more than about 0.8.
In some embodiments according to any one of the engineered Cas nucleases described above, the flexible region has 5 consecutive amino acid residues, wherein the 3rd amino acid residue has the lowest flexibility score, and wherein a higher flexibility score indicates lower conformational flexibility. In some embodiments according to any one of the engineered Cas nucleases described above, the flexible region is located in a random coil.
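The window criterion above can be expressed as a short scan over per-residue flexibility scores. The sketch assumes the scores have already been exported from a predictor such as DynaMine as a plain list of S²pred values (one per residue; lower means more flexible). The function name, the strict-minimum rule for ties, and the optional one-letter secondary-structure string (with "C" marking random coil) are illustrative assumptions, not part of any tool's interface.

```python
from typing import List, Optional, Tuple


def find_flexible_regions(
    s2: List[float],
    window: int = 5,
    max_s2: float = 0.8,
    ss: Optional[str] = None,
) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (start, end) windows whose central residue is
    strictly the most flexible (lowest S2) in the window and has S2 <= max_s2.

    If ss is given (one character per residue; 'C' = random coil, a
    hypothetical encoding), every residue in the window must be coil.
    """
    half = window // 2  # window is assumed to be odd, e.g. 5
    regions = []
    for center in range(half, len(s2) - half):
        win = s2[center - half:center + half + 1]
        others = win[:half] + win[half + 1:]
        # The central (3rd of 5) residue must have the lowest score.
        if any(v <= s2[center] for v in others) or s2[center] > max_s2:
            continue
        if ss is not None and set(ss[center - half:center + half + 1]) != {"C"}:
            continue
        regions.append((center - half + 1, center + half + 1))
    return regions


# Toy S2 profile for a 9-residue stretch (made-up values for illustration).
scores = [0.90, 0.90, 0.85, 0.82, 0.70, 0.83, 0.86, 0.90, 0.90]
print(find_flexible_regions(scores))                  # [(3, 7)]
print(find_flexible_regions(scores, ss="HHHCCCCHH"))  # []
```

The output is 1-based so that it can be compared directly with residue ranges such as 835 to 839 of SEQ ID NO: 1.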
In some embodiments according to any one of the engineered Cas nucleases described above, the flexible region is in a domain of the reference Cas nuclease that interacts with DNA and/or RNA.
In some embodiments according to any one of the engineered Cas nucleases described above, the reference Cas nuclease is selected from the group consisting of Cas9, Cas12a, Cas12b, Cas12c, Cas12d, Cas12f, Cas12g, Cas12h, Cms1, Cas12i, Cas12j, Cas12k and CasX.
In some embodiments, the present application provides an engineered Cas12b nuclease comprising one or more mutations that increase flexibility of a flexible region that corresponds to amino acid residues 835 to 839 in a reference Cas12b nuclease, wherein the amino acid residue numbering is based on SEQ ID NO: 1, wherein the engineered Cas12b nuclease has an increased activity compared to the reference Cas12b nuclease. In some embodiments, the engineered Cas12b nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) sequence identity to SEQ ID NO: 2.
In some embodiments, the present application provides an engineered Cas12i nuclease comprising one or more mutations that increase flexibility of a flexible region in a reference Cas12i nuclease that is selected from the group consisting of regions corresponding to amino acid residues 228-232, amino acid residues 439-443, amino acid residues 478-482, amino acid residues 500-504, amino acid residues 775-779, and amino acid residues 925-929, wherein the amino acid residue numbering is based on SEQ ID NO: 8, wherein the engineered Cas12i nuclease has an increased activity compared to the reference Cas12i nuclease. In some embodiments according to any one of the engineered Cas12i nucleases described above, the flexible region corresponds to amino acid residues 439-443 or amino acid residues 925-929, wherein the amino acid residue numbering is based on SEQ ID NO: 8. In some embodiments, the engineered Cas12i nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) sequence identity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 14, 18, and 20.
In some embodiments, the present application provides an engineered Cas9 nuclease comprising one or more mutations that increase flexibility of a flexible region in a reference Cas9 nuclease that is selected from the group consisting of regions corresponding to amino acid residues 39-43, amino acid residues 135-139, amino acid residues 176-180, amino acid residues 274-278, amino acid residues 351-355, amino acid residues 389-393, amino acid residues 521-525, amino acid residues 541-545, amino acid residues 755-759, amino acid residues 774-778, amino acid residues 786-790, amino acid residues 811-815, amino acid residues 848-852, amino acid residues 855-859, amino acid residues 874-878, amino acid residues 891-895, amino acid residues 1019-1023, and amino acid residues 1036-1040, wherein the amino acid residue numbering is based on SEQ ID NO: 25, wherein the engineered Cas9 nuclease has an increased activity compared to the reference Cas9 nuclease. In some embodiments according to any one of the engineered Cas9 nucleases described above, the flexible region is selected from the group consisting of regions corresponding to amino acid residues 135-139, amino acid residues 176-180, amino acid residues 541-545, amino acid residues 755-759, and amino acid residues 811-815, wherein the amino acid residue numbering is based on SEQ ID NO: 25. In some embodiments according to any one of the engineered Cas9 nucleases described above, the engineered nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) sequence identity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 27, 28, 33, 34, and 41.
In some embodiments, the present application provides an engineered Cas9 nuclease comprising one or more mutations that increase flexibility of a flexible region in a reference Cas9 nuclease that is selected from the group consisting of regions corresponding to amino acid residues 45-49, amino acid residues 84-88, amino acid residues 116-120, amino acid residues 128-132, amino acid residues 216-220, amino acid residues 318-322, amino acid residues 387-391, amino acid residues 497-501, amino acid residues 583-587, amino acid residues 594-598, amino acid residues 614-618, amino acid residues 696-700, and amino acid residues 739-743, wherein the amino acid residue numbering is based on SEQ ID NO: 53, wherein the engineered Cas9 nuclease has an increased activity compared to the reference Cas9 nuclease. In some embodiments, the flexible region corresponds to amino acid residues 45-49 or amino acid residues 116-120, wherein the amino acid residue numbering is based on SEQ ID NO: 53. In some embodiments, the engineered Cas9 nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) sequence identity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 54 and 58-59.
In some embodiments according to any one of the engineered Cas nucleases described above, the one or more mutations comprise insertion of one or more glycine (G) residues in the flexible region. In some embodiments, the one or more mutations comprise insertion of two G residues in a flexible region. In some embodiments, the one or more G residues are inserted at the N-terminus of a flexible amino acid residue in the flexible region, wherein the flexible amino acid residue is selected from the group consisting of Glycine (G), Serine (S), Asparagine (N), Aspartic acid (D), Histidine (H), Methionine (M), Threonine (T), Glutamic acid (E), Glutamine (Q), Lysine (K), Arginine (R), Alanine (A) and Proline (P), and wherein the flexible amino acid residue is chosen according to the preference: G>S>N>D>H>M>T>E>Q>K>R>A>P.
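The insertion rule above can be sketched as a small sequence-editing helper. This is illustrative only, not the applicants' code: it assumes that inserting "at the N-terminus of" a residue means placing the glycines immediately before that residue, and that when the most preferred residue occurs more than once in the region, the first (most N-terminal) occurrence is used. Both choices and the function name are assumptions.

```python
PREFERENCE = "GSNDHMTEQKRAP"  # G > S > N > D > H > M > T > E > Q > K > R > A > P


def insert_glycines(seq: str, start: int, end: int, n_gly: int = 2) -> str:
    """Insert n_gly glycines immediately N-terminal to the most preferred
    flexible residue within region start..end (1-based, inclusive)."""
    region = seq[start - 1:end]
    for residue in PREFERENCE:
        idx = region.find(residue)  # first occurrence (assumed tie-break)
        if idx != -1:
            pos = start - 1 + idx   # 0-based index of the chosen residue
            return seq[:pos] + "G" * n_gly + seq[pos:]
    raise ValueError("no residue from the preference list in this region")


# Toy sequence; region 3-7 is "LAVTD", so the highest-ranked residue present is D.
print(insert_glycines("MKLAVTDPQE", 3, 7))  # MKLAVTGGDPQE
```

The default of two glycines mirrors the embodiment in which two G residues are inserted into a flexible region.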
In some embodiments according to any one of the engineered Cas nucleases described above, the one or more mutations comprise substitution of a hydrophobic amino acid residue in a flexible region with a G residue, wherein the hydrophobic amino acid residue is selected from the group consisting of leucine (L), isoleucine (I), valine (V), cysteine (C), tyrosine (Y), phenylalanine (F), and tryptophan (W).
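A companion sketch enumerates the single glycine substitutions available at the hydrophobic positions listed above within a region. The helper name and the mutation-label format (for example "L3G") are illustrative choices. Alanine is not in the hydrophobic list given here, so it is left unchanged.

```python
from typing import List, Tuple

HYDROPHOBIC = set("LIVCYFW")  # L, I, V, C, Y, F, W as listed above


def glycine_substitutions(seq: str, start: int, end: int) -> List[Tuple[str, str]]:
    """Return (label, mutant_sequence) for each hydrophobic residue in
    start..end (1-based, inclusive) replaced by glycine, one at a time."""
    variants = []
    for pos in range(start, end + 1):
        aa = seq[pos - 1]
        if aa in HYDROPHOBIC:
            mutant = seq[:pos - 1] + "G" + seq[pos:]
            variants.append((f"{aa}{pos}G", mutant))
    return variants


for label, mutant in glycine_substitutions("MKLAVTDPQE", 3, 7):
    print(label, mutant)
# L3G MKGAVTDPQE
# V5G MKLAGTDPQE
```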
In some embodiments according to any one of the engineered Cas nucleases described above, the engineered Cas nuclease comprises one or more mutations that increase flexibility of two or more flexible regions in the reference Cas nuclease.
In some embodiments according to any one of the engineered Cas nucleases described above, the increased activity of the Cas nuclease is site-specific nuclease activity.
In some embodiments according to any one of the engineered Cas nucleases described above, the increased activity of the Cas nuclease is gene-editing activity in a eukaryotic cell (e.g., a mammalian cell). In some embodiments, the increased activity of the Cas nuclease is gene-editing activity in a human cell. In some embodiments, the engineered Cas nuclease has at least about 20% (e.g., at least about 30%, 40%, 50%, 60%, 70%, 80%, or more) higher gene-editing efficiency compared to the reference Cas nuclease at a genomic locus in the cell. In some embodiments, the average gene-editing efficiency of the engineered Cas nuclease at a plurality of genomic loci in the cell is at least about 20% (e.g., at least about 30%, 40%, 50%, 60%, 70%, 80%, or more) higher than that of the reference Cas nuclease. In some embodiments, the gene-editing efficiency is measured using a T7 endonuclease 1 (T7E1) assay, sequencing of the target DNA, a Tracking of Indels by Decomposition (TIDE) assay, or Indel Detection by Amplicon Analysis (IDAA) assay.
In some embodiments, the present application provides a Cas nuclease comprising an amino acid sequence selected from the group consisting of SEQ ID NOs: 2, 14, 18, 20, 27, 28, 33, 34, 41, 54 and 58-59.
In some embodiments, the present application provides an engineered Cas effector protein comprising any one of the engineered Cas nucleases described above, or a functional derivative thereof. In some embodiments, the engineered Cas nuclease or functional derivative thereof is enzymatically active. In some embodiments, the effector protein is capable of inducing a double-strand break in a DNA molecule. In some embodiments, the engineered Cas effector protein is capable of inducing a single-strand break in a DNA molecule. In some embodiments, the effector protein comprises an enzymatically inactive mutant of the engineered Cas nuclease.
In some embodiments according to any one of the engineered Cas effector proteins described above, the engineered Cas effector protein further comprises a functional domain fused to the engineered Cas nuclease or functional derivative thereof. In some embodiments, the functional domain is selected from the group consisting of a translation initiator domain, a transcription repressor domain, a transactivation domain, an epigenetic modification domain, a nucleobase-editing domain (e.g., a CBE or ABE domain), a reverse transcriptase domain, a reporter domain (e.g., a fluorescent domain), and a nuclease domain.
In some embodiments according to any one of the engineered Cas effector proteins described above, the engineered Cas effector protein comprises a first polypeptide comprising an N-terminal portion of the engineered Cas nuclease or functional derivative thereof, and a second polypeptide comprising a C-terminal portion of the engineered Cas nuclease or functional derivative thereof, wherein the first polypeptide and the second polypeptide are capable of associating with each other in the presence of a guide RNA comprising a guide sequence to form a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) complex that specifically binds to a target nucleic acid comprising a target sequence complementary to the guide sequence. In some embodiments, the first polypeptide comprises a first dimerization domain and the second polypeptide comprises a second dimerization domain. In some embodiments, the first dimerization domain and the second dimerization domain associate with each other in the presence of an inducer. In some embodiments, the first polypeptide and the second polypeptide do not comprise dimerization domains.
Another aspect of the present application provides an engineered CRISPR-Cas system comprising: (a) any one of the engineered Cas effector proteins described above; and (b) a guide RNA comprising a guide sequence complementary to a target sequence, or one or more nucleic acids encoding the guide RNA, wherein the engineered Cas effector protein and the guide RNA are capable of forming a CRISPR complex that specifically binds to a target nucleic acid comprising the target sequence and induces a modification of the target nucleic acid. In some embodiments, the guide RNA is a crRNA comprising the guide sequence. In some embodiments, the system comprises a precursor guide RNA array encoding a plurality of crRNAs. In some embodiments, the guide RNA comprises a crRNA and a tracrRNA. In some embodiments, the guide RNA is a single guide RNA (sgRNA). In some embodiments, the guide RNA comprises a crRNA and a scoutRNA. In some embodiments in which the engineered Cas effector protein is a prime editor, the guide RNA is a pegRNA.
In some embodiments according to any one of the CRISPR-Cas systems described above, the system comprises one or more vectors encoding the engineered Cas effector protein. In some embodiments, the one or more vectors are selected from the group consisting of retroviral vectors, lentiviral vectors, adenoviral vectors, adeno-associated viral vectors, and herpes simplex viral vectors. In some embodiments, the one or more vectors are adeno-associated viral (AAV) vectors. In some embodiments, the AAV vector further encodes the guide RNA (e.g., a crRNA, a sgRNA or a precursor guide RNA array).
Another aspect of the present application provides an engineered CRISPR-Cas system comprising: (a) a Cas12i effector protein comprising a Cas12i nuclease (e.g., Cas12i2) or a functional derivative thereof; and (b) a crRNA comprising a substitution of one or more Uridine (U) residues with a non-U residue in a repeat sequence comprising at least four U residues and a guide sequence complementary to a target sequence; wherein the Cas12i effector protein and the crRNA are capable of forming a CRISPR complex that specifically binds to a target nucleic acid comprising the target sequence and induces a modification of the target nucleic acid.
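One way to picture the crRNA modification above is a function that interrupts U-rich stretches in the repeat. This sketch rests on two assumptions that the text does not state: that the "at least four U residues" form a consecutive run, and that the substituted base is C (any non-U base would satisfy the description). It replaces every fourth U in each such run, so that no run of four or more U residues remains. The function name is hypothetical.

```python
import re


def break_u_runs(repeat: str, min_run: int = 4, replacement: str = "C") -> str:
    """Replace every min_run-th U inside runs of >= min_run consecutive U
    residues (RNA alphabet), leaving no run of min_run U's behind."""
    if len(replacement) != 1 or replacement == "U":
        raise ValueError("replacement must be a single non-U base")

    def _split_run(match: "re.Match[str]") -> str:
        run = match.group(0)
        return "".join(
            replacement if (i + 1) % min_run == 0 else "U"
            for i in range(len(run))
        )

    # Pattern such as "U{4,}": a run of at least min_run U residues.
    return re.sub(f"U{{{min_run},}}", _split_run, repeat)


print(break_u_runs("GUUUUAC"))     # GUUUCAC
print(break_u_runs("AUUUUUUUUG"))  # AUUUCUUUCG
```

Repeats with no qualifying run are returned unchanged, so the helper can be applied to any candidate repeat sequence.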
Another aspect of the present application provides a method of detecting a target nucleic acid in a sample, comprising: (a) contacting the sample with any one of the engineered CRISPR-Cas systems described above, and a labeled detector nucleic acid (e.g., DNA or RNA) that is single stranded and does not hybridize with the guide sequence of the guide RNA; and (b) measuring a detectable signal produced by cleavage of the labeled detector nucleic acid by the engineered Cas effector protein, thereby detecting the target nucleic acid.
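The measurement in step (b) ultimately requires deciding whether a reporter signal is above background. The sketch below uses a mean-plus-three-standard-deviations cutoff over no-target controls. This is a common convention assumed here for illustration only: the application does not specify a threshold, and the function name and fluorescence values are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def is_detected(sample_signal: float, negative_controls: Sequence[float],
                k: float = 3.0) -> bool:
    """Call the target present if the sample's reporter signal exceeds the
    mean of no-target controls by more than k sample standard deviations
    (assumed decision rule)."""
    if len(negative_controls) < 2:
        raise ValueError("need at least two negative controls to estimate spread")
    cutoff = mean(negative_controls) + k * stdev(negative_controls)
    return sample_signal > cutoff


# Hypothetical endpoint fluorescence (arbitrary units).
controls = [100.0, 110.0, 90.0]      # mean 100, SD 10 -> cutoff 130
print(is_detected(200.0, controls))  # True
print(is_detected(120.0, controls))  # False
```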
Another aspect of the present application provides a method of modifying a target nucleic acid comprising a target sequence, comprising contacting the target nucleic acid with any one of the engineered CRISPR-Cas systems described above. In some embodiments, the method is carried out in vitro. In some embodiments, the target nucleic acid is present in a cell. In some embodiments, the cell is a bacterial cell, a yeast cell, a mammalian cell, a plant cell, or an animal cell. In some embodiments, the method is carried out ex vivo. In some embodiments, the method is carried out in vivo.
In some embodiments according to any one of the methods of modifying a target nucleic acid described above, the target nucleic acid is cleaved or the target sequence in the target nucleic acid is altered by the engineered CRISPR-Cas system. In some embodiments, expression of the target nucleic acid is altered by the engineered CRISPR-Cas system. In some embodiments, the target nucleic acid is a genomic DNA. In some embodiments, the target sequence is associated with a disease or condition.
In some embodiments according to any one of the methods of modifying a target nucleic acid described above, the engineered CRISPR-Cas system comprises a precursor guide RNA array encoding a plurality of crRNAs, wherein each crRNA comprises a different guide sequence.
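A precursor guide RNA array of this kind can be assembled in silico by alternating a repeat with distinct spacers. The layout below (repeat then spacer for each unit, optionally closed by a terminal repeat), the function name, and the 17 to 25 nt spacer check (taken from the spacer lengths tested for Cas12i2 in this application) are assumptions for illustration. "NNNN" is a placeholder; an actual repeat sequence would come from the relevant SEQ ID NO.

```python
from typing import List


def build_pre_crrna_array(
    repeat: str,
    spacers: List[str],
    terminal_repeat: bool = True,
    min_len: int = 17,
    max_len: int = 25,
) -> str:
    """Concatenate repeat + spacer units into one precursor crRNA array."""
    if len(set(spacers)) != len(spacers):
        raise ValueError("each crRNA should carry a different guide sequence")
    for s in spacers:
        if not min_len <= len(s) <= max_len:
            raise ValueError(f"spacer {s!r} is outside {min_len}-{max_len} nt")
    units = [repeat + s for s in spacers]
    if terminal_repeat:
        units.append(repeat)  # assumed closing repeat after the last spacer
    return "".join(units)


array = build_pre_crrna_array("NNNN", ["A" * 20, "C" * 20])
print(len(array))  # 52 = 3 repeats x 4 nt + 2 spacers x 20 nt
```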
In some embodiments according to any one of the methods of modifying a target nucleic acid described above, the method is carried out at a temperature of about 4℃ to about 67℃ (e.g., about 4℃ to about 15℃, about 15℃ to about 40℃, about 4℃ to about 37℃, or about 40℃ to about 67℃).
Another aspect of the present application provides a method of treating a disease or condition associated with a target nucleic acid in cells of an individual, comprising modifying the target nucleic acid in the cells of the individual using any one of the methods employing the engineered CRISPR-Cas systems described above, thereby treating the disease or condition. In some embodiments, the disease or condition is selected from the group consisting of cancer, cardiovascular diseases, hereditary diseases, autoimmune diseases, metabolic diseases, neurodegenerative diseases, ocular diseases, bacterial infections and viral infections.
Another aspect of the present application provides an engineered cell comprising a modified target nucleic acid, wherein the target nucleic acid has been modified using one of the methods described above.
In some embodiments, the present application provides an engineered non-human animal comprising one or more of the engineered cells described above.
Another aspect of the present application provides a method of modifying a target sequence in a target nucleic acid, comprising contacting the target nucleic acid with an engineered CRISPR-Cas system at a temperature of about 40℃ to about 67℃, wherein the engineered CRISPR-Cas system comprises: (a) a Cas12i2 effector protein comprising a Cas12i2 nuclease or a functional derivative thereof; and (b) a crRNA comprising a guide sequence that is complementary to the target sequence, wherein the Cas12i2 effector protein and the crRNA are capable of forming a CRISPR complex that specifically binds to a target nucleic acid comprising the target sequence and induces a modification of the target nucleic acid.
Another aspect of the present application provides an engineered crRNA comprising a substitution of one or more Uridine (U) residues with a non-U residue in a repeat sequence comprising at least four U residues. In some embodiments, the engineered crRNA comprises a spacer sequence of about 17 to 25 nucleotides in length. In some embodiments, the engineered crRNA comprises a spacer sequence of about 20 nucleotides in length. In some embodiments, the engineered crRNA comprises a repeat sequence comprising the nucleic acid sequence of SEQ ID NO: 173.
Also provided are compositions, kits and articles of manufacture for use in any one of the methods described above.
It is appreciated that certain features of the disclosure, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the disclosure, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable sub-combination. All combinations of the embodiments pertaining to particular method steps, reagents, or conditions, or components of a composition are specifically embraced by the present disclosure and are disclosed herein just as if each and every combination was individually and explicitly disclosed.
BRIEF DESCRIPTION OF DRAWINGS
FIG. 1A shows an exemplary design pipeline for engineering Cas proteins with improved activity. In this example, candidate flexible regions are identified using DynaMine software, and glycine substitutions (single underlined) or insertions (double underlined) are introduced to generate Cas protein variants. Variants are then cloned into expression vectors together with eGFP, and cells are transfected with the variant Cas + GFP plasmid and an sgRNA or crRNA plasmid to test editing efficiency of the variants.
FIG. 1B shows an enlarged view of the third step of FIG. 1A.
FIG. 2 shows the flexibility (S2) score profile for BhCas12b4.
FIG. 3 shows the %indels generated by wild-type BhCas12b4 compared to enBhCas12b4 using the indicated sgRNA in human cells. The last graph of FIG. 3 shows the %indels generated for all loci tested.
FIG. 4 shows the flexibility (S2) score profile for Cas12i2.
FIG. 5 shows computationally determined secondary structure regions of Cas12i2.
FIG. 6 shows the gene editing efficiency (%indels) of the Cas12i2 variants compared to WT Cas12i2 using four different crRNAs.
FIG. 7 shows the gene editing efficiency (%indels) of a Cas12i2 variant combining the mutations of variants 2.2 and 6.1, designated enCas12i2, compared to WT Cas12i2 or to variants 2.2 and 6.1 alone. enCas12i2 (2.2+6.1) showed improved editing efficiency.
FIG. 8A shows the overall genome editing efficiency in human cells of enCas12i2 compared to SpCas9 and BhCas12b-v4. Editing efficiency (indel %) was analyzed at 46 loci for enCas12i2, 18 loci for SpCas9, and 23 loci for BhCas12b-v4.
FIG. 8B shows the average editing efficiency of enCas12i2 in human 293T cells at different protospacer adjacent motif (PAM) sites, including NTTA, NTTC, NTTG, and NTTT, and ATTN, CTTN, GTTN, and TTTN.
FIG. 9 shows enCas12i2 processing of pre-crRNA in vivo. enCas12i2 has comparable genome-editing activity using a pre-crRNA targeting 3 sites versus using a single crRNA.
FIG. 10 shows in vitro cleavage of DNA plasmid by Cas12i2 and enCas12i2.
FIG. 11A shows detection of dsDNA containing an XBP target sequence by wildtype Cas12i2 and engineered Cas12i2 variants.
FIG. 11B shows results demonstrating that wildtype Cas12i2 can cleave a fluorescent reporter connected via rU in nucleic acid detection experiments.
FIG. 12 shows the flexibility (S2) score profile for GeoCas9.
FIG. 13 shows computationally determined secondary structure regions of GeoCas9.
FIG. 14 shows the results of a targeted deep sequencing assay to determine the efficiency of indel generation by the engineered GeoCas9 variants. Engineered GeoCas9 variants with significantly improved editing efficiency are indicated by arrowheads.
FIG. 15 shows the locations of the selected flexible regions and corresponding domains of SaCas9.
FIG. 16 shows the results of a T7 enzyme cleavage assay to test editing efficiency of the SaCas9 variants. Engineered SaCas9 variants 1.1, 3.1, 3.2 (indicated by box outline) showed significantly improved gene editing efficiency.
FIG. 17A shows in vitro dsDNA cleavage by wild-type Cas12i1 and Cas12i2.
FIG. 17B shows the crRNA sequences used to test cleavage by Cas12i1 or Cas12i2.
FIG. 18 shows %indels generated by wild-type Cas12i1 and Cas12i2 using crRNAs at target sites in the human genome. The gene-editing efficiency is measured using a T7 endonuclease 1 (T7E1) assay. We found that Cas12i2 was able to generate indels in a broader range of targets, suggesting that it is a promising candidate for engineering.
FIG. 19 shows the editing efficiencies of the indicated crRNA mutants compared to wild type crRNA sequences using Cas12i1.
FIG. 20 shows results demonstrating that Cas12i2 is enzymatically active (i.e., able to cleave dsDNA) at a wide temperature range, from 12 ℃ to 67 ℃.
FIG. 21 shows Cas12i2 editing of multiple genomic target loci in human cells using a crRNA array (pre-crRNA) .
FIG. 22 shows results defining the seed sequence of Cas12i2 by testing the ability of Cas12i2 to generate indels using crRNAs with single base mismatches at one of bases 1-19 of the crRNA.
FIG. 23 shows identification of the optimal spacer length of 20 bp for Cas12i2. The ability of Cas12i2 to generate indels with two different crRNAs (CCR5-1 and CCR5-2) using spacer lengths ranging from 17 bp to 25 bp was tested.
DETAILED DESCRIPTION
The present application provides methods for engineering an enzyme by introducing amino acid mutations that enhance flexibility in flexible regions of the enzyme, which leads to increased enzymatic activity in vitro and in vivo. The methods described herein are applicable to a variety of Cas nucleases, including Cas12b, Cas12i and Cas9. Notably, the methods described herein do not rely on three-dimensional structures of the Cas nuclease. For example, the methods described herein have been successfully applied to Cas12i to create a number of engineered Cas12i proteins with improved genome editing efficiency across a wide range of genetic loci. Engineered Cas effector proteins (e.g., Cas12b, Cas12i and Cas9) and methods of using the engineered Cas effector proteins are also provided.
I. Definitions
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as is commonly understood by one of ordinary skill in the art to which this disclosure belongs.
As used herein, an “effector protein” refers to a protein having an activity, such as site-specific binding activity, single-strand DNA cleavage activity, double-strand DNA cleavage activity, single-strand RNA cleavage activity, or transcriptional regulation activity.
As used herein, “guide RNA” and “gRNA” are used interchangeably to refer to RNA that is capable of forming a complex with a Cas effector protein and a target nucleic acid (e.g., duplex DNA). A guide RNA may comprise a single RNA molecule or two or more RNA molecules associated with each other via hybridization of complementary regions in the two or more RNA molecules. When used in connection with a dual RNA-guided Cas nuclease, such as Cas9, a guide RNA comprises a crRNA and a tracrRNA. When used in connection with a single RNA-guided Cas nuclease, such as Cas12i, the guide RNA does not comprise a tracrRNA or another transactivating RNA. Also contemplated herein are precursor guide RNA arrays that can be processed into a plurality of crRNAs, and for some CRISPR/Cas systems, the processed crRNAs further associate with tracrRNA or another transactivating RNA (e.g., scoutRNA) to guide the Cas effector protein. The “crRNA” or “CRISPR RNA” comprises a guide sequence that has sufficient complementarity to a target sequence of a target nucleic acid (e.g., duplex DNA), which guides sequence-specific binding of the CRISPR complex to the target nucleic acid. The “tracrRNA” or “trans-activating CRISPR RNA” is partially complementary to and base pairs with the crRNA, and may play a role in the maturation of the crRNA. A “single guide RNA” or “sgRNA” is an engineered guide RNA having both crRNA and tracrRNA fused to each other in a single molecule.
The terms “nucleic acid,” “polynucleotide,” and “nucleotide sequence” are used interchangeably to refer to a polymeric form of nucleotides of any length, including deoxyribonucleotides, ribonucleotides, combinations thereof, and analogs thereof. “Oligonucleotide” and “oligo” are used interchangeably to refer to a short polynucleotide, having no more than about 50 nucleotides.
As used herein, “complementarity” refers to the ability of a nucleic acid to form hydrogen bond(s) with another nucleic acid by traditional Watson-Crick base-pairing. A percent complementarity indicates the percentage of residues in a nucleic acid molecule which can form hydrogen bonds (i.e., Watson-Crick base pairing) with a second nucleic acid (e.g., about 5, 6, 7, 8, 9, or 10 out of 10, being about 50%, 60%, 70%, 80%, 90%, and 100% complementary, respectively). “Perfectly complementary” means that all the contiguous residues of a nucleic acid sequence form hydrogen bonds with the same number of contiguous residues in a second nucleic acid sequence. “Substantially complementary” as used herein refers to a degree of complementarity that is at least about any one of 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99%, or 100% over a region of about 40, 50, 60, 70, 80, 100, 150, 200, 250 or more nucleotides, or refers to two nucleic acids that hybridize under stringent conditions.
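The percent-complementarity definition above can be illustrated with a position-by-position count. This minimal sketch assumes two equal-length sequences, both written 5'→3', that pair antiparallel (the first residue of one against the last residue of the other), and it counts only Watson-Crick pairs (A-T, A-U, G-C). The function name is illustrative.

```python
WATSON_CRICK = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(a: str, b: str) -> float:
    """Percent of positions in a that form a Watson-Crick pair with b,
    aligning a (5'->3') against b read 3'->5' (antiparallel)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = zip(a.upper(), reversed(b.upper()))
    matches = sum((x, y) in WATSON_CRICK for x, y in pairs)
    return 100.0 * matches / len(a)


print(percent_complementarity("ACGTACGTAC", "GTACGTACGT"))  # 100.0 (10 of 10)
print(percent_complementarity("ACGTACGTAC", "GTACGTACGA"))  # 90.0  (9 of 10)
```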
As used herein, “stringent conditions” for hybridization refer to conditions under which a nucleic acid having complementarity to a target sequence predominantly hybridizes with the target sequence, and substantially does not hybridize to non-target sequences. Stringent conditions are generally sequence-dependent, and vary depending on a number of factors. In general, the longer the sequence, the higher the temperature at which the sequence specifically hybridizes to its target sequence. Non-limiting examples of stringent conditions are described in detail in Tijssen (1993), Laboratory Techniques in Biochemistry and Molecular Biology - Hybridization with Nucleic Acid Probes, Part I, Chapter 2, “Overview of principles of hybridization and the strategy of nucleic acid probe assay,” Elsevier, N.Y.
“Hybridization” refers to a reaction in which one or more polynucleotides react to form a complex that is stabilized via hydrogen bonding between the bases of the nucleotide residues. The hydrogen bonding may occur by Watson-Crick base pairing, Hoogsteen binding, or in any other sequence-specific manner. A sequence capable of hybridizing with a given sequence is referred to as the “complement” of the given sequence.
“Percentage (%) sequence identity” with respect to a nucleic acid sequence is defined as the percentage of nucleotides in a candidate sequence that are identical with the nucleotides in the specific nucleic acid sequence, after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent sequence identity. “Percentage (%) sequence identity” with respect to a peptide, polypeptide or protein sequence is the percentage of amino acid residues in a candidate sequence that are identical to the amino acid residues in the specific peptide or amino acid sequence, after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent sequence identity. Alignment for purposes of determining percent amino acid sequence identity can be achieved in various ways that are within the skill in the art, for instance, using publicly available computer software such as BLAST, BLAST-2, ALIGN or MEGALIGN™ (DNASTAR) software. Those skilled in the art can determine appropriate parameters for measuring alignment, including any algorithms needed to achieve maximal alignment over the full length of the sequences being compared.
The terms "polypeptide" and "peptide" are used interchangeably herein to refer to polymers of amino acids of any length. The polymer may be linear or branched, it may comprise modified amino acids, and it may be interrupted by non-amino acids. A protein may have one or more polypeptides. The terms also encompass an amino acid polymer that has been modified, for example, by disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, or any other manipulation, such as conjugation with a labeling component.
As used herein, a “variant” is interpreted to mean a polynucleotide or polypeptide that differs from a reference polynucleotide or polypeptide, respectively, but retains essential properties. A typical variant of a polynucleotide differs in nucleic acid sequence from another, reference polynucleotide. Changes in the nucleic acid sequence of the variant may or may not alter the amino acid sequence of a polypeptide encoded by the reference polynucleotide. Nucleotide changes may result in amino acid substitutions, additions, deletions, fusions and truncations in the polypeptide encoded by the reference sequence, as discussed below. A typical variant of a polypeptide differs in amino acid sequence from another, reference polypeptide. Generally, differences are limited so that the sequences of the reference polypeptide and the variant are closely similar overall and, in many regions, identical. A variant and reference polypeptide may differ in amino acid sequence by one or more substitutions, additions, and deletions in any combination. A substituted or inserted amino acid residue may or may not be one encoded by the genetic code. A variant of a polynucleotide or polypeptide may be naturally occurring, such as an allelic variant, or it may be a variant that is not known to occur naturally. Non-naturally occurring variants of polynucleotides and polypeptides may be made by mutagenesis techniques, by direct synthesis, and by other recombinant methods known to skilled artisans.
As used herein, the term "wild type" has the meaning commonly understood by those skilled in the art, namely the typical form of an organism, strain, gene, or characteristic as it exists in nature, as distinguished from a mutant or variant form. A wild-type form can be isolated from natural sources and has not been intentionally modified.
As used herein, the terms "non-naturally occurring" and "engineered" are used interchangeably and indicate the involvement of human intervention. When these terms are used to describe a nucleic acid molecule or polypeptide, they mean that the nucleic acid molecule or polypeptide is at least substantially free from at least one other component with which it is associated in nature or as found in nature.
As used herein, the terms "orthologue" and "ortholog" have the meaning commonly understood by one of ordinary skill in the art. As further guidance, an "orthologue" of a protein as referred to herein is a protein from a different species that performs the same or a similar function as the protein of which it is an orthologue.
As used herein, the term "identity" refers to the sequence match between two polypeptides or between two nucleic acids. When a position in the two sequences being compared is occupied by the same base or amino acid monomer subunit (for example, a position in each of two DNA molecules is occupied by adenine, or a position in each of two polypeptides is occupied by lysine), the molecules are identical at that position. The "percent identity" between two sequences is the number of matching positions shared by the two sequences divided by the number of positions compared, multiplied by 100. For example, if 6 of 10 positions of two sequences match, the two sequences have 60% identity. Likewise, the DNA sequences CTGACT and CAGGTT share 50% identity (3 of 6 positions match). Typically, the comparison is made after the two sequences are aligned to produce maximum identity. Such an alignment can be achieved by, for example, the method of Needleman et al. (1970) J. Mol. Biol. 48: 443-453, which can be conveniently performed by a computer program such as the Align program (DNAstar, Inc.). It is also possible to use the algorithm of E. Meyers and W. Miller (Comput. Appl. Biosci., 4: 11-17 (1988)), which has been incorporated into the ALIGN program (version 2.0), using a PAM120 weight residue table, a gap length penalty of 12, and a gap penalty of 4 to determine the percent identity between two amino acid sequences. In addition, the Needleman and Wunsch (J. Mol. Biol. 48: 443-453 (1970)) algorithm, which has been incorporated into the GAP program in the GCG software package (available at www.gcg.com), can be used with either a Blosum 62 matrix or a PAM250 matrix, a gap weight of 16, 14, 12, 10, 8, 6 or 4, and a length weight of 1, 2, 3, 4, 5 or 6 to determine the percent identity between two amino acid sequences.
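The percent-identity arithmetic in this definition can be sketched as follows. This minimal illustration assumes the two sequences have already been aligned (for example, by one of the programs named above) and padded with "-" gaps to equal length; the function name `percent_identity` is hypothetical, and counting every aligned column, including gapped ones, as a compared position is an assumption, since alignment programs differ on this point.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Every aligned column counts as a compared position; a column matches
    only when both sequences carry the same residue (gap-gap columns
    do not count as matches).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must already be aligned to equal length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a == b and a != "-"
    )
    # matching positions / compared positions x 100
    return 100.0 * matches / len(seq_a)


# The two worked examples from the definition above.
print(percent_identity("CTGACT", "CAGGTT"))          # 50.0
print(percent_identity("AAAAAAAAAA", "AAAAAATTTT"))  # 60.0
```

The sketch covers only the final ratio; producing the maximum-identity alignment itself is the job of the alignment algorithms cited above.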
A “cell” as used herein is understood to refer not only to the particular individual cell but also to the progeny or potential progeny of that cell. Because certain modifications may occur in succeeding generations due to either mutation or environmental influences, such progeny may not, in fact, be identical to the parent cell, but are still included within the scope of the term as used herein.
The terms “transduction” and “transfection” as used herein include all methods known in the art that use an infectious agent (such as a virus) or other means to introduce DNA into cells for expression of a protein or molecule of interest. Besides viruses or virus-like agents, there are chemical-based transfection methods, such as those using calcium phosphate, dendrimers, liposomes, or cationic polymers (e.g., DEAE-dextran or polyethylenimine); non-chemical methods, such as electroporation, cell squeezing, sonoporation, optical transfection, impalefection, protoplast fusion, delivery of plasmids, or transposons; particle-based methods, such as using a gene gun, magnetofection or magnet-assisted transfection, or particle bombardment; and hybrid methods, such as nucleofection.
The terms “transfected,” “transformed,” and “transduced” as used herein refer to a process by which exogenous nucleic acid is transferred or introduced into a host cell. A “transfected,” “transformed,” or “transduced” cell is one that has been transfected, transformed, or transduced with exogenous nucleic acid.
The term “in vivo” refers to inside the body of the organism from which the cell is obtained. “Ex vivo” or “in vitro” means outside the body of the organism from which the cell is obtained.
As used herein, “treatment” or “treating” is an approach for obtaining beneficial or desired results including clinical results. For purposes of this invention, beneficial or desired clinical results include, but are not limited to, one or more of the following: alleviating one or more symptoms resulting from the disease, diminishing the extent of the disease, stabilizing the disease (e.g., preventing or delaying the worsening of the disease), preventing or delaying the spread (e.g., metastasis) of the disease, preventing or delaying the recurrence of the disease, reducing the recurrence rate of the disease, delaying or slowing the progression of the disease, ameliorating the disease state, providing a remission (partial or total) of the disease, decreasing the dose of one or more other medications required to treat the disease, delaying the progression of the disease, increasing the quality of life, and/or prolonging survival. Also encompassed by “treatment” is a reduction of the pathological consequences of cancer. The methods of the invention contemplate any one or more of these aspects of treatment.
The term “effective amount” as used herein refers to an amount of a compound or composition sufficient to treat a specified disorder, condition, or disease, such as to ameliorate, palliate, lessen, and/or delay one or more of its symptoms. As is understood in the art, an “effective amount” may be administered in one or more doses, i.e., a single dose or multiple doses may be required to achieve the desired treatment endpoint.
The terms “subject,” “individual,” and “patient” are used herein interchangeably for purposes of treatment and refer to any animal classified as a mammal, including humans, domestic and farm animals, and zoo, sports, or pet animals, such as dogs, horses, cats, cows, etc. In some embodiments, the individual is a human individual.
It is understood that embodiments of the invention described herein include “consisting of” and/or “consisting essentially of” embodiments.
Reference to “about” a value or parameter herein includes (and describes) variations that are directed to that value or parameter per se. For example, description referring to “about X” includes description of “X.”
As used herein, reference to “not” a value or parameter generally means and describes “other than” a value or parameter. For example, the method is not used to treat cancer of type X means the method is used to treat cancer of types other than X.
The term “about X-Y” used herein has the same meaning as “about X to about Y.”
As used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. It is further noted that the claims may be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as “solely,” “only” and the like in connection with the recitation of claim elements, or use of a “negative” limitation.
The term “and/or” as used herein in a phrase such as “A and/or B” is intended to include both A and B; A or B; A (alone); and B (alone). Likewise, the term “and/or” as used herein in a phrase such as “A, B, and/or C” is intended to encompass each of the following embodiments: A, B, and C; A, B, or C; A or C; A or B; B or C; A and C; A and B; B and C; A (alone); B (alone); and C (alone).
II. Methods of Engineering Enzymes
The present application provides a method of engineering an enzyme, comprising: (a) obtaining a plurality of engineered enzymes each comprising one or more mutations that increase flexibility of a flexible region in a plurality of flexible regions of a reference enzyme; and (b) selecting one or more engineered enzymes from the plurality of the engineered enzymes, wherein the one or more engineered enzymes have an increased activity compared to the reference enzyme. In some embodiments, the method further comprises determining a plurality of flexible regions in the reference enzyme. In some embodiments, a three-dimensional structure of the reference enzyme or a homolog thereof is not available. In some embodiments, the enzyme is a bacterial or archaeal enzyme. In some embodiments, the activity is measured in a eukaryotic cell, such as a mammalian cell, e.g., a human cell.
In some embodiments, there is provided a method of engineering a Cas nuclease, comprising: (a) determining a plurality of flexible regions in a reference Cas nuclease (e.g., based on the primary sequence of the reference Cas nuclease) ; (b) obtaining a plurality of engineered Cas nucleases each comprising one or more mutations that increase flexibility of a flexible region in the plurality of flexible regions of the reference Cas nuclease; and (c) selecting one or more engineered Cas nucleases from the plurality of the engineered Cas nucleases, wherein the one or more engineered Cas nucleases have an increased activity compared to the reference Cas nuclease. In some embodiments, the plurality of flexible regions is determined using a program selected from the group consisting of PredyFlexy, FoldUnfold, PROFbval, Flexserv, FlexPred, DynaMine and Disomine. In some embodiments, the Cas nuclease is selected from the group consisting of Cas9, Cas12a, Cas12b, Cas12c, Cas12d, Cas12f, Cas12g, Cas12h, Cms1, Cas12i, Cas12j, Cas12k and CasX. In some embodiments, the plurality of flexible regions are located in random coils. In some embodiments, the plurality of flexible regions are in domains of the reference Cas nuclease that interact with DNA and/or RNA. In some embodiments, the flexible region is at least about 5 (e.g., 5) amino acids long. In some embodiments, the one or more mutations comprise insertion of one or more (e.g., 2) Glycine (G) residues in a flexible region. In some embodiments, the one or more G residues are inserted at the N-terminus of a flexible amino acid residue in the flexible region, wherein the flexible amino acid residue is selected from the group consisting of G, Serine (S) , Asparagine (N) , Aspartic acid (D) , Histidine (H) , Methionine (M) , Threonine (T) , Glutamic acid (E) , Glutamine (Q) , Lysine (K) , Arginine (R) , Alanine (A) and Proline (P) . 
In some embodiments, the flexible amino acid residue is chosen according to the preference: G>S>N>D>H>M>T>E>Q>K>R>A>P. In some embodiments, the one or more mutations comprise substitution of one or more non-G residues with one or more G residues. In some embodiments, the one or more mutations comprise substitution of a hydrophobic amino acid residue in a flexible region with a G residue, wherein the hydrophobic amino acid residue is selected from the group consisting of leucine (L), isoleucine (I), valine (V), cysteine (C), tyrosine (Y), phenylalanine (F), and tryptophan (W). In some embodiments, the activity is site-specific nuclease activity. In some embodiments, the activity is gene-editing activity in a eukaryotic cell (e.g., a human cell). In some embodiments, the gene-editing efficiency is measured using a T7 endonuclease 1 (T7E1) assay, sequencing of the target DNA, a Tracking of Indels by Decomposition (TIDE) assay, or an Indel Detection by Amplicon Analysis (IDAA) assay.
In some embodiments, there is provided a method of engineering a Cas nuclease, comprising: (a) calculating a flexibility score of each amino acid residue of a reference Cas nuclease (e.g., based on the primary sequence of the reference Cas nuclease), wherein a higher flexibility score indicates lower conformational flexibility; (b) selecting a plurality of peak amino acid residues at positions $X_i$, wherein each peak amino acid residue has a flexibility score that is below a pre-determined threshold value and is lower than the flexibility scores of the amino acid residues at positions $X_{i-5}$ to $X_{i-1}$ and $X_{i+1}$ to $X_{i+5}$; (c) selecting a plurality of flexible regions as amino acid residues $X_{i-2}$ to $X_{i+2}$; (d) obtaining a plurality of engineered Cas nucleases each comprising one or more mutations that increase flexibility of a flexible region in the plurality of flexible regions of the reference Cas nuclease, wherein the one or more mutations comprise insertion of one or more (e.g., 2) G residues in a flexible region and/or substitution of one or more non-G residues with one or more G residues; and (e) selecting one or more engineered Cas nucleases from the plurality of the engineered Cas nucleases, wherein the one or more engineered Cas nucleases have an increased activity compared to the reference Cas nuclease. In some embodiments, the plurality of flexible regions is determined using a program selected from the group consisting of PredyFlexy, FoldUnfold, PROFbval, Flexserv, FlexPred, DynaMine and Disomine. In some embodiments, the Cas nuclease is selected from the group consisting of Cas9, Cas12a, Cas12b, Cas12c, Cas12d, Cas12f, Cas12g, Cas12h, Cms1, Cas12i, Cas12j, Cas12k and CasX. In some embodiments, the plurality of flexible regions are located in random coils. In some embodiments, the plurality of flexible regions are in domains of the reference Cas nuclease that interact with DNA and/or RNA.
In some embodiments, the one or more G residues are inserted at the N-terminus of a flexible amino acid residue in the flexible region, wherein the flexible amino acid residue is selected from the group consisting of G, Serine (S), Asparagine (N), Aspartic acid (D), Histidine (H), Methionine (M), Threonine (T), Glutamic acid (E), Glutamine (Q), Lysine (K), Arginine (R), Alanine (A) and Proline (P). In some embodiments, the flexible amino acid residue is chosen according to the preference: G>S>N>D>H>M>T>E>Q>K>R>A>P. In some embodiments, the one or more mutations comprise substitution of a hydrophobic amino acid residue in a flexible region with a G residue, wherein the hydrophobic amino acid residue is selected from the group consisting of leucine (L), isoleucine (I), valine (V), cysteine (C), tyrosine (Y), phenylalanine (F), and tryptophan (W). In some embodiments, the activity is site-specific nuclease activity. In some embodiments, the activity is gene-editing activity in a eukaryotic cell (e.g., a human cell). In some embodiments, the gene-editing efficiency is measured using a T7 endonuclease 1 (T7E1) assay, sequencing of the target DNA, a Tracking of Indels by Decomposition (TIDE) assay, or an Indel Detection by Amplicon Analysis (IDAA) assay.
Also provided are engineered enzymes (e.g., engineered Cas nucleases) obtained using any one of the methods described herein, and a library comprising the plurality of engineered enzymes described herein.
Determining a plurality of flexible regions
The plurality of flexible regions in the reference enzyme may be determined using any method known in the art. In some embodiments, the plurality of flexible regions are determined solely based on the amino acid sequence of the reference enzyme. In some embodiments, the plurality of flexible regions are determined based on structural information of the reference enzyme, including, for example, secondary structure, crystal structure, NMR structure, etc. In some embodiments, the plurality of flexible regions are determined without reference to the structural information, e.g., three-dimensional structures, of the reference enzyme. In some embodiments, a three-dimensional structure of the reference enzyme or homolog thereof is not available.
In some embodiments of the methods provided herein, the plurality of flexible regions is determined based on the amino acid sequence of the reference enzyme. In some embodiments, the plurality of flexible regions is determined using a program selected from the group consisting of PredyFlexy, FoldUnfold, PROFbval, Flexserv, FlexPred, and DynaMine. Methods of determining flexible regions based on amino acid sequence have been described, for example, by Yu et al. (Engineering proteins for thermostability through rigidifying flexible sites. Biotechnology Advances, Volume 32, Issue 2, March-April 2014, Pages 308-315).
In some embodiments, the plurality of flexible regions is determined based on NMR chemical shift data for proteins in solution. In some embodiments, the plurality of flexible regions is determined using DynaMine (Cilia et al. The DynaMine webserver: predicting protein dynamics from sequence. Nucleic Acids Res. 2014 Jul 1; 42 (Web Server issue): W264-W270). DynaMine leverages NMR chemical shift data for proteins in solution to obtain quantitative insight into the relationship between amino acid sequence and backbone dynamics, and to predict flexible regions. The DynaMine predictor developed from these data predicts the residue-level potential of a protein for backbone dynamics from sequence information alone, as opposed to approaches that require 3D structural information. This approach opens the vast number of available protein sequences that lack structural information to dynamics analysis. In some embodiments, the flexibility profile is an S2 score profile, wherein a lower S2 score indicates higher flexibility.
In some embodiments, the plurality of flexible regions is determined using molecular dynamics simulations (e.g., simulated Root Mean Square Fluctuation) . In some embodiments, the plurality of flexible regions is determined based on B-factor (e.g., crystallographic) data. In some embodiments, the plurality of flexible regions is determined using PredyFlexy, described by Tarun et al. (In silico prediction of protein flexibility with local structure approach. Biochimie (165) , October 2019, Pages 150-155) .
In some embodiments, the plurality of flexible regions is determined using the expected average number of contacts per residue as an indicator of whether the given region is folded or unfolded, such as with FoldUnfold (Oxana et al. FoldUnfold: web server for the prediction of disordered regions in protein chain, Bioinformatics, Volume 22, Issue 23, 1 December 2006, Pages 2948-2949).
In some embodiments, the plurality of flexible regions is determined by calculating normalized B-values from amino acid sequence, such as with PROFbval (Schlessinger et al. PROFbval: predict flexible and rigid residues in proteins. Bioinformatics, Volume 22, Issue 7, 1 April 2006, Pages 891-893).
In some embodiments, the plurality of flexible regions is determined using a combination of multiple coarse-grained approaches, structural databases, and atomistic models, such as with Flexserv (Camps et al. FlexServ: an integrated tool for the analysis of protein flexibility. Bioinformatics 2009 Jul 1 ; 25 (13) : 1709-10) .
In some embodiments, the plurality of flexible regions is determined using a trained supervised pattern-recognition method, a Support Vector Machine (SVM), such as with the FlexPred webserver (Kuznetsov et al. FlexPred: a web-server for predicting residue positions involved in conformational switches in proteins. Bioinformation. 2008; 3 (3): 134-136).
In some embodiments, the plurality of flexible regions is determined based on protein disorder calculated with recurrent neural networks, such as using DisoMine (Orlando et al. Prediction of disordered regions in proteins with recurrent Neural Networks and protein dynamics. bioRxiv 2020.05.25.115253; doi: https://doi.org/10.1101/2020.05.25.115253).
In some embodiments, determining the plurality of flexible regions comprises: (i) calculating a flexibility score of each amino acid residue of the reference enzyme, wherein a higher flexibility score indicates lower conformational flexibility; (ii) selecting a plurality of peak amino acid residues at positions $X_i$, wherein each peak amino acid residue has a flexibility score that is below a pre-determined threshold value and is lower than the flexibility scores of the amino acid residues at positions $X_{i-5}$ to $X_{i-1}$ and $X_{i+1}$ to $X_{i+5}$; and (iii) defining the plurality of flexible regions as amino acid residues $X_{i-2}$ to $X_{i+2}$. In some embodiments, the flexible region is determined using DynaMine, and the amino acid residue with the highest flexibility in the flexible region has a flexibility score $S^2_{\text{pred}}$ of no more than about 0.8 (e.g., no more than about 0.75 or 0.7).
In some embodiments, the flexible region has 5 consecutive amino acid residues, wherein the 3rd amino acid residue has the lowest flexibility score, and wherein a higher flexibility score indicates lower conformational flexibility.
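The peak-based procedure in steps (i) to (iii) can be sketched as below. The sketch assumes per-residue scores on a scale where a higher value means lower flexibility, such as DynaMine $S^2_{\text{pred}}$ values, and uses 0.8 as the default threshold from the embodiment above. The function name `find_flexible_regions`, the handling of the termini (neighbours beyond the sequence ends are skipped and regions are clipped), and the example scores are hypothetical illustrations, not output from any predictor.

```python
from typing import List, Tuple


def find_flexible_regions(
    scores: List[float],
    threshold: float = 0.8,
    window: int = 5,
    half_width: int = 2,
) -> List[Tuple[int, int, int]]:
    """Return (peak, start, end) tuples as 1-based residue positions.

    Residue X_i is a peak if its score is below `threshold` and strictly
    lower than every score at X_{i-window}..X_{i-1} and X_{i+1}..X_{i+window}.
    Neighbours beyond the sequence ends are skipped (an assumption), and the
    region X_{i-half_width}..X_{i+half_width} is clipped to the sequence.
    """
    n = len(scores)
    regions = []
    for i, score in enumerate(scores):
        if score >= threshold:
            continue
        lo, hi = max(0, i - window), min(n, i + window + 1)
        neighbours = scores[lo:i] + scores[i + 1:hi]
        if all(score < other for other in neighbours):
            start = max(0, i - half_width)
            end = min(n - 1, i + half_width)
            regions.append((i + 1, start + 1, end + 1))
    return regions


# Hypothetical per-residue scores (higher = more rigid, as with S2 values).
scores = [0.90, 0.88, 0.86, 0.84, 0.80, 0.72, 0.81, 0.85, 0.87, 0.89,
          0.90, 0.91, 0.86, 0.79, 0.70, 0.78, 0.85, 0.88, 0.90, 0.92]
print(find_flexible_regions(scores))  # [(6, 4, 8), (15, 13, 17)]
```

With `half_width=2`, each region away from the termini spans 5 consecutive residues with the peak as the 3rd residue, which corresponds to the 5-residue embodiment described above.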
In some embodiments, a flexible region is located in a random coil or linker region of the protein. In some embodiments, the random coil or linker region is determined based on available structural data. In some embodiments, the random coil or linker region is determined based on homology to orthologues with known structures. In some embodiments, the random coil or linker region is determined based on amino acid sequence. In some embodiments, no 3D structural data is available for the reference protein or an orthologue of the reference protein. In some embodiments, a flexible region is not located in a random coil or linker region. In some embodiments, a flexible region is located within about 10 amino acids, such as about any one of 9, 8, 7, 6, 5, 4, 3, 2, or 1 amino acids away from a random coil or linker region. In some embodiments, a flexible region is longer than a random coil or linker region. In some embodiments, a flexible region is shorter than a random coil or linker region. In some embodiments, at least a portion of a flexible region is located in an alpha helix or a beta strand.
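Whether a flexible region lies within a given number of amino acids of a random coil or linker region can be checked with simple interval arithmetic. The sketch below is a hypothetical helper, not part of the described method: it assumes regions and coil/linker segments are given as 1-based inclusive (start, end) positions, and it measures "distance" as the positional offset between the nearest ends of the two intervals (0 if they overlap), which is one possible reading of "amino acids away".

```python
from typing import Iterable, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based and inclusive


def residue_distance(a: Interval, b: Interval) -> int:
    """Positional offset between the nearest ends of two intervals (0 if they overlap)."""
    if a[1] < b[0]:
        return b[0] - a[1]
    if b[1] < a[0]:
        return a[0] - b[1]
    return 0


def near_coil(region: Interval, coils: Iterable[Interval], max_distance: int = 10) -> bool:
    """True if the region overlaps, or lies within max_distance of, any coil segment."""
    return any(residue_distance(region, coil) <= max_distance for coil in coils)


# Hypothetical annotations: one flexible region and two coil/linker segments.
region = (13, 17)
print(residue_distance(region, (20, 25)))       # 3
print(near_coil(region, [(30, 36)]))            # False (offset 13)
print(near_coil(region, [(30, 36), (20, 25)]))  # True
```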
In some embodiments, a flexible region is located in a functional domain of the reference enzyme. In some embodiments, a flexible region is located in proximity to the catalytic site of the reference enzyme. In some embodiments, a flexible region is not located in a functional domain of the reference enzyme. In some embodiments, the reference enzyme is a reference nuclease. In some embodiments, the flexible region is in a domain of the reference nuclease (e.g., Cas nuclease) that interacts with DNA and/or RNA. In some embodiments, the flexible region is not located in a domain of the reference nuclease that interacts with DNA and/or RNA. In some embodiments, the reference enzyme is a Cas nuclease selected from the group consisting of Cas9, Cas12a, Cas12b, Cas12c, Cas12d, Cas12f, Cas12g, Cas12h, Cms1, Cas12i, Cas12j, Cas12k and CasX. Other Cas nucleases that can be engineered using the methods described herein are discussed in the “Other engineered Cas effector proteins” subsection of Section III. In some embodiments, the reference enzyme is an Argonaute protein.
In some embodiments, the engineered enzyme (e.g., Cas nuclease) comprises one or more mutations that increase flexibility of two or more (e.g., 2, 3, 4, 5, 6) flexible regions. In some embodiments, the one or more mutations in different flexible regions have a synergistic effect in increasing the activity of the engineered enzyme with respect to the reference enzyme.
Generating variants with mutations that increase flexibility
In some embodiments, the method comprises generating an insertion of one or more G residues and/or G substitution of a hydrophobic amino acid residue in a flexible region comprising at least 5 consecutive amino acid residues of a reference Cas nuclease, wherein the engineered Cas nuclease has an increased activity compared to the reference Cas nuclease. In some embodiments, the flexible region has 5 consecutive amino acid residues, wherein the 3rd amino acid residue has the lowest flexibility score, and wherein a higher flexibility score indicates lower conformational flexibility. In some embodiments, the one or more mutations comprise inserting one or more (e.g., 2) G residues in the flexible region. In some embodiments, the one or more G residues are inserted at the N-terminus of a flexible amino acid residue in the flexible region, wherein the flexible amino acid residue is selected from the group consisting of G, Serine (S), Asparagine (N), Aspartic acid (D), Histidine (H), Methionine (M), Threonine (T), Glutamic acid (E), Glutamine (Q), Lysine (K), Arginine (R), Alanine (A) and Proline (P). In some embodiments, the flexible amino acid residue is chosen according to the preference: G>S>N>D>H>M>T>E>Q>K>R>A>P.
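One way to apply the insertion rule to a single flexible region is sketched below: pick the residue in the region that ranks highest in the preference order G>S>N>D>H>M>T>E>Q>K>R>A>P and place the glycines immediately N-terminal to it. The function name `insert_glycines`, the tie-breaking rule (the N-terminal-most residue wins among equally preferred residues), and the example sequence are hypothetical; a real library would typically enumerate several insertion sites per region rather than one.

```python
from typing import Tuple

# Preference order from the text: G > S > N > D > H > M > T > E > Q > K > R > A > P
PREFERENCE = "GSNDHMTEQKRAP"


def insert_glycines(seq: str, start: int, end: int, n_gly: int = 2) -> Tuple[str, int]:
    """Insert n_gly G residues N-terminal to the most preferred flexible residue.

    start and end are 1-based, inclusive bounds of the flexible region.
    Returns the mutant sequence and the 1-based position (in the reference)
    of the residue the glycines were placed in front of.
    """
    region = seq[start - 1:end]
    ranked = [
        (PREFERENCE.index(aa), offset)
        for offset, aa in enumerate(region)
        if aa in PREFERENCE
    ]
    if not ranked:
        raise ValueError("no flexible residue (G, S, N, ..., P) in the region")
    # Lowest preference index wins; ties go to the N-terminal-most residue.
    _, offset = min(ranked)
    pos = start + offset
    mutant = seq[:pos - 1] + "G" * n_gly + seq[pos - 1:]
    return mutant, pos


# Hypothetical 12-residue sequence with flexible region 5-9 ("AWFDS").
mutant, pos = insert_glycines("MKLVAWFDSLIE", 5, 9)
print(pos, mutant)  # 9 MKLVAWFDGGSLIE
```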
In some embodiments, the one or more mutations comprise substituting a hydrophobic amino acid residue in a flexible region with a G residue, wherein the hydrophobic amino acid residue is selected from the group consisting of L, I, V, C, Y, F and W. In some embodiments, the engineered Cas nuclease comprises one or more mutations that increase flexibility of two or more flexible regions. In some embodiments, the reference Cas nuclease is selected from the group consisting of Cas9, Cas12a, Cas12b, Cas12c, Cas12d, Cas12f, Cas12g, Cas12h, Cms1, Cas12i, Cas12j, Cas12k and CasX. In some embodiments, the method comprises generating insertion of one or more G or S residues and/or substitution of a hydrophobic amino acid residue in a flexible region with a G or S residue. In some embodiments, the method comprises generating a substitution of a hydrophobic residue (e.g., L, I, V, C, Y, F or W) or a less flexible amino acid residue with a more flexible amino acid residue. In some embodiments, the more flexible amino acid residue is chosen according to the preference: G>S>N>D>H>M>T>E>Q>K>R>A>P.
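The hydrophobic-to-glycine substitutions described above can be enumerated for one flexible region as in the sketch below, which produces one single-substitution variant per L, I, V, C, Y, F or W residue in the region. The helper name `glycine_substitutions` and the example sequence are hypothetical; positions are assumed to be 1-based and inclusive, and labels use the conventional wild-type residue, position, and new residue form (e.g., "W6G").

```python
from typing import List, Tuple

HYDROPHOBIC = set("LIVCYFW")  # L, I, V, C, Y, F and W, as listed above


def glycine_substitutions(seq: str, start: int, end: int) -> List[Tuple[str, str]]:
    """One single-substitution variant per hydrophobic residue in the region.

    start and end are 1-based, inclusive bounds of the flexible region.
    Returns (label, mutant sequence) pairs, e.g. ("W6G", ...).
    """
    variants = []
    for pos in range(start, end + 1):
        aa = seq[pos - 1]
        if aa in HYDROPHOBIC:
            variants.append((f"{aa}{pos}G", seq[:pos - 1] + "G" + seq[pos:]))
    return variants


for label, mutant in glycine_substitutions("MKLVAWFDSLIE", 5, 9):
    print(label, mutant)
# W6G MKLVAGFDSLIE
# F7G MKLVAWGDSLIE
```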
Selecting engineered enzymes with increased activity
In some embodiments, the method comprises selecting one or more engineered enzymes from the plurality of the engineered enzymes, wherein the one or more engineered enzymes have an increased activity compared to the reference enzyme, and wherein the activity is site-specific nuclease activity. In some embodiments, the activity is gene-editing activity in a eukaryotic cell. In some embodiments, the activity is gene-editing activity in a human cell.
In some embodiments, the increased activity is site-specific nuclease activity. In some embodiments, the site-specific nuclease activity is determined in vitro. In some embodiments, the site-specific nuclease activity is determined in a cell. Site-specific nuclease activity may be assessed using methods known in the art, including, for example, an in vitro cleavage assay based on agarose gel electrophoresis as described in the Examples provided herein. In some embodiments, engineered enzyme (e.g., engineered Cas nuclease) variants having at least about any one of 20%, 30%, 40%, 60%, 70%, 80%, 90%, 1.5 fold, 2 fold, 3 fold, 4 fold, 5 fold or higher increase in site-specific nuclease activity compared to that of the reference enzyme are selected.
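The selection step amounts to comparing each variant's measured activity with that of the reference enzyme. The sketch below assumes that activities are measured on the same scale in the same assay and that a threshold is expressed as a ratio to the reference, so "at least about 20% increase" becomes `min_fold=1.2`; reading "2 fold" as a ratio of 2.0 is likewise an assumption. The function name and the example values are hypothetical.

```python
from typing import Dict, List


def select_variants(
    activities: Dict[str, float], reference: float, min_fold: float = 1.2
) -> List[str]:
    """Names of variants whose activity / reference >= min_fold, best first.

    min_fold=1.2 corresponds to "at least about 20% increase";
    use e.g. 2.0 for a two-fold threshold.
    """
    if reference <= 0:
        raise ValueError("reference activity must be positive")
    ratios = {name: value / reference for name, value in activities.items()}
    hits = [name for name, ratio in ratios.items() if ratio >= min_fold]
    return sorted(hits, key=lambda name: ratios[name], reverse=True)


# Hypothetical cleavage percentages; the reference enzyme cleaves 20%.
measured = {"v1": 30.0, "v2": 55.0, "v3": 23.0, "v4": 18.0}
print(select_variants(measured, reference=20.0))                # ['v2', 'v1']
print(select_variants(measured, reference=20.0, min_fold=2.0))  # ['v2']
```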
In some embodiments, the activity is gene-editing activity in a cell, such as a prokaryotic cell or a eukaryotic cell. Gene-editing efficiency of an engineered enzyme in a cell may be determined using methods known in the art, including, for example, a T7 endonuclease 1 (T7E1) assay, sequencing of the target DNA (including, e.g., Sanger sequencing and next-generation sequencing), a Tracking of Indels by Decomposition (TIDE) assay, or an Indel Detection by Amplicon Analysis (IDAA) assay. See, for example, Sentmanat MF et al., “A survey of validation strategies for CRISPR-Cas9 editing,” Scientific Reports, 2018, 8, Article number 888, which is incorporated herein by reference in its entirety. In some embodiments, the gene-editing efficiency of an engineered enzyme in a cell is measured using targeted next-generation sequencing (NGS), for example, as described in the Examples herein.
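For targeted NGS, gene-editing efficiency is commonly reported as the percentage of reads at the target amplicon that carry an indel. The sketch below is a deliberately crude, hypothetical stand-in for that calculation and is not the analysis used in the Examples: it counts a read as edited when its length differs from the reference amplicon, so it ignores substitutions and misses combined insertions and deletions that cancel in length. Alignment-based analysis pipelines classify each read against the reference instead.

```python
from typing import Dict


def indel_frequency(allele_counts: Dict[str, int], reference_amplicon: str) -> float:
    """Crude indel percentage: reads whose length differs from the reference amplicon.

    This length-based proxy ignores substitutions and misses indels that
    cancel in length; alignment-based tools classify each read properly.
    """
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no reads")
    edited = sum(
        count
        for allele, count in allele_counts.items()
        if len(allele) != len(reference_amplicon)
    )
    return 100.0 * edited / total


# Hypothetical read counts for one target amplicon.
reference = "ACGTACGTAC"
counts = {
    "ACGTACGTAC": 70,   # unedited
    "ACGTCGTAC": 20,    # 1-bp deletion
    "ACGTAACGTAC": 10,  # 1-bp insertion
}
print(indel_frequency(counts, reference))  # 30.0
```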
In some embodiments, the activity is gene-editing activity in a eukaryotic cell, such as a plant cell or a mammalian cell. In some embodiments, engineered enzyme (e.g., engineered Cas nuclease) variants having at least about any one of 20%, 30%, 40%, 60%, 70%, 80%, 90%, 1.5 fold, 2 fold, 3 fold, 4 fold, 5 fold or higher increase of site-specific nuclease activity in a eukaryotic cell compared to that of the reference enzyme are selected.
In some embodiments, the activity is gene-editing activity in a human cell, such as a 293T cell. In some embodiments, engineered enzyme (e.g., engineered Cas nuclease) variants having at least about any one of 20%, 30%, 40%, 60%, 70%, 80%, 90%, 1.5 fold, 2 fold, 3 fold, 4 fold, 5 fold or higher increase of site-specific nuclease activity in a human cell compared to that of the reference enzyme are selected.
In some embodiments, the method further comprises a step of combining mutations from one or more engineered enzymes from the plurality of the engineered enzymes, wherein the one or more engineered enzymes have an increased activity compared to the reference enzyme. In some embodiments, the combined mutations comprise mutations in one or more (e.g., 2, 3, 4, or more) different flexible regions.
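Candidate combinations of beneficial mutations can be enumerated before they are built and assayed. The sketch below assumes each selected mutation is tagged with the flexible region it lies in and keeps only combinations with at most one mutation per region, which is one reading of combining mutations "in one or more different flexible regions". The function name, the mutation labels, and the region assignments are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def combine_across_regions(
    hits: Dict[str, int], max_size: int = 3
) -> List[Tuple[str, ...]]:
    """Combinations of beneficial mutations, at most one per flexible region.

    hits maps a mutation label to the identifier of the flexible region it
    lies in. Combinations of 2..max_size mutations are returned.
    """
    labels = sorted(hits)
    combos = []
    for size in range(2, max_size + 1):
        for combo in combinations(labels, size):
            regions = {hits[label] for label in combo}
            if len(regions) == size:  # every mutation from a different region
                combos.append(combo)
    return combos


# Hypothetical beneficial mutations and the flexible region each lies in.
beneficial = {"W6G": 1, "F7G": 1, "L40G": 2, "I88G": 3}
for combo in combine_across_regions(beneficial):
    print(combo)
```

Each printed combination is only a candidate; whether mutations in different regions act additively or synergistically still has to be determined experimentally.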
III. Engineered Cas effector proteins
The present application provides engineered Cas effector proteins that have improved activity, such as target binding, double-strand cleavage activity, nickase activity, and/or gene-editing activity. In some embodiments, the engineered Cas nucleases are obtained using any one of the methods of engineering described in Section II. In some embodiments, there is provided an engineered Cas effector protein (e.g., Cas nuclease, Cas nickase, Cas fusion effector protein, or split Cas effector protein) comprising any one of the engineered Cas nucleases described herein or a functional derivative thereof.
In some embodiments, there is provided an engineered Cas nuclease comprising one or more mutations that increase flexibility of a flexible region comprising at least 5 consecutive amino acid residues of a reference Cas nuclease, wherein the engineered Cas nuclease has an increased activity compared to the reference Cas nuclease. In some embodiments, the flexible region is determined based on the amino acid sequence of the reference Cas nuclease, e.g., using a program selected from the group consisting of PredyFlexy, FoldUnfold, PROFbval, Flexserv, FlexPred, DynaMine and Disomine. In some embodiments, the flexible region is determined using DynaMine, and the amino acid residue with the highest flexibility in the flexible region has a flexibility score $S^2_{\text{pred}}$ of no more than about 0.8 (e.g., no more than about 0.75 or 0.7). In some embodiments, the flexible region has 5 consecutive amino acid residues, wherein the 3rd amino acid residue has the lowest flexibility score, and wherein a higher flexibility score indicates lower conformational flexibility. In some embodiments, the flexible region is located in a random coil. In some embodiments, the flexible region is in a domain of the reference Cas nuclease that interacts with DNA and/or RNA. In some embodiments, the engineered Cas nuclease comprises one or more mutations that increase flexibility of two or more flexible regions. In some embodiments, the reference Cas nuclease is selected from the group consisting of Cas9, Cas12a, Cas12b, Cas12c, Cas12d, Cas12f, Cas12g, Cas12h, Cms1, Cas12i, Cas12j, Cas12k and CasX.
In some embodiments, there is provided an engineered Cas nuclease comprising insertion of one or more G residues and/or substitution of a hydrophobic amino acid residue in a flexible region comprising at least 5 consecutive amino acid residues of a reference Cas nuclease, wherein the engineered Cas nuclease has an increased activity compared to the reference Cas nuclease. In some embodiments, the flexible region has 5 consecutive amino acid residues, wherein the 3rd amino acid residue has the lowest flexibility score, and wherein a higher flexibility score indicates lower conformational flexibility. In some embodiments, the one or more mutations comprise inserting one or more (e.g., 2) G residues in the flexible region. In some embodiments, the one or more G residues are inserted at the N-terminus of a flexible amino acid residue in the flexible region, wherein the flexible amino acid residue is selected from the group consisting of G, S, N, D, H, M, T, E, Q, K, R, A and P. In some embodiments, the flexible amino acid residue is chosen according to the preference: G>S>N>D>H>M>T>E>Q>K>R>A>P. In some embodiments, the one or more mutations comprise substituting a hydrophobic amino acid residue in a flexible region with a G residue, wherein the hydrophobic amino acid residue is selected from the group consisting of L, I, V, C, Y, F and W. In some embodiments, the engineered Cas nuclease comprises one or more mutations that increase flexibility of two or more flexible regions. In some embodiments, the reference Cas nuclease is selected from the group consisting of Cas9, Cas12a, Cas12b, Cas12c, Cas12d, Cas12f, Cas12g, Cas12h, Cms1, Cas12i, Cas12j, Cas12k and CasX.
In some embodiments, there is provided an engineered Cas effector protein comprising an engineered Cas nuclease or a functional derivative thereof, wherein the engineered Cas nuclease comprises one or more mutations that increase flexibility of a flexible region comprising at least 5 consecutive amino acid residues of a reference Cas nuclease, and wherein the engineered Cas nuclease has an increased activity compared to the reference Cas nuclease. In some embodiments, the flexible region is determined using DynaMine, and the amino acid residue with the highest flexibility in the flexible region has a flexibility score S²pred of no more than about 0.8 (e.g., no more than about 0.75 or 0.7). In some embodiments, the flexible region has 5 consecutive amino acid residues, wherein the 3rd amino acid residue has the lowest flexibility score, and wherein a higher flexibility score indicates lower conformational flexibility. In some embodiments, the engineered Cas nuclease comprises insertion of one or more G residues and/or substitution of a hydrophobic amino acid residue in the flexible region. In some embodiments, the one or more mutations comprise inserting one or more (e.g., 2) G residues in the flexible region. In some embodiments, the one or more G residues are inserted at the N-terminus of a flexible amino acid residue in the flexible region, wherein the flexible amino acid residue is selected from the group consisting of G, S, N, D, H, M, T, E, Q, K, R, A and P. In some embodiments, the flexible amino acid residue is chosen according to the preference: G>S>N>D>H>M>T>E>Q>K>R>A>P. In some embodiments, the one or more mutations comprise substituting a hydrophobic amino acid residue in the flexible region with a G residue, wherein the hydrophobic amino acid residue is selected from the group consisting of L, I, V, C, Y, F and W.
In some embodiments, the engineered Cas nuclease comprises one or more mutations that increase flexibility of two or more flexible regions. In some embodiments, the reference Cas nuclease is selected from the group consisting of Cas9, Cas12a, Cas12b, Cas12c, Cas12d, Cas12f, Cas12g, Cas12h, Cms1, Cas12i, Cas12j, Cas12k and CasX. In some embodiments, the functional derivative of the engineered Cas nuclease is an enzymatically active Cas nuclease, a Cas nickase, an enzymatically dead Cas, a fusion effector protein (e.g., a transcriptional activator, a transcriptional repressor, a base editor, or a prime editor), or a split Cas effector protein (e.g., an inducible split Cas effector protein, or an auto-inducible split Cas effector protein).
Also provided are engineered CRISPR-Cas systems comprising any one of the engineered Cas effector proteins (e.g., engineered Cas nucleases) described herein and a guide RNA (including a precursor guide RNA array, a crRNA, a single guide RNA, or a crRNA and a tracrRNA). In some embodiments, the engineered CRISPR-Cas system comprises one or more nucleic acid molecules encoding the engineered Cas effector protein and/or the guide RNA. In some embodiments, the engineered CRISPR-Cas system comprises one or more vectors encoding the engineered Cas effector protein and/or the guide RNA. In some embodiments, there is provided a vector encoding the engineered Cas effector protein. In some embodiments, the vector further encodes the guide RNA. In some embodiments, the vector is selected from the group consisting of retroviral vectors, lentiviral vectors, adenoviral vectors, adeno-associated viral vectors, and herpes simplex viral vectors. In some embodiments, the vector is an adeno-associated viral (AAV) vector.
The engineered Cas nucleases described herein have mutations that increase the conformational flexibility of one or more (e.g., 1, 2, 3, or more) flexible regions in a reference Cas nuclease, such as a naturally occurring wildtype Cas nuclease. Mutations in two or more flexible regions may be combined to provide a synergistic increase in the activity of the engineered Cas nuclease as compared to the reference Cas nuclease. Flexible regions in a reference Cas nuclease may be determined using methods known in the art, for example, based on one or more of the methods described in Section II, “Methods of Engineering Enzymes,” above. In some embodiments, the flexible region(s) of a reference Cas nuclease is determined based on the amino acid sequence of the reference Cas nuclease. In some embodiments, the flexible region(s) of a reference Cas nuclease is determined using a program selected from the group consisting of PredyFlexy, FoldUnfold, PROFbval, Flexserv, FlexPred, DynaMine and Disomine.
The flexible region may comprise 5 or more (e.g., 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 or more) consecutive amino acid residues of the reference Cas nuclease. In some embodiments, the flexible region has 5 consecutive amino acid residues of the reference Cas nuclease. The flexible region may be defined by first selecting a peak amino acid residue at position X of the reference Cas nuclease, wherein the peak amino acid residue has a flexibility score at or below a pre-determined threshold value (e.g., an S²pred of 0.8 or less as determined by DynaMine), and wherein the peak amino acid residue has a flexibility score lower than the flexibility scores of the amino acid residues at positions X-5 to X-1 (i.e., the 5 amino acid residues flanking the N-terminal side of the peak amino acid residue) and X+1 to X+5 (i.e., the 5 amino acid residues flanking the C-terminal side of the peak amino acid residue), wherein a higher flexibility score indicates lower conformational flexibility. In some embodiments, the flexible region has 5 consecutive amino acid residues, wherein the 3rd amino acid residue has the lowest flexibility score, and wherein a higher flexibility score indicates lower conformational flexibility.
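The peak-based definition above can be sketched in Python as a local-minimum search over per-residue S²pred values. This is a minimal sketch under stated assumptions. The scores are supplied as a plain list, for example copied from DynaMine output, and no DynaMine interface is called. Indices are 0-based. Flanking windows are truncated at the protein termini, and peaks whose 5-residue region would run past a terminus are skipped. The function name `find_flexible_regions` is hypothetical.

```python
from typing import List, Tuple


def find_flexible_regions(
    s2: List[float], threshold: float = 0.8, flank: int = 5, half_width: int = 2
) -> List[Tuple[int, int, int]]:
    """Return (peak, start, end) tuples, all 0-based and inclusive.

    s2 holds per-residue S2pred values; a higher value means a more rigid residue.
    A position X is a peak if s2[X] <= threshold and s2[X] is strictly lower than
    every score at X-flank..X-1 and X+1..X+flank (flanks truncated at the termini).
    The flexible region is X-half_width..X+half_width, i.e. 5 residues with the
    peak as the 3rd residue when half_width == 2.
    """
    n = len(s2)
    regions = []
    for x in range(n):
        if s2[x] > threshold:
            continue  # the peak must be at or below the flexibility threshold
        start, end = x - half_width, x + half_width
        if start < 0 or end >= n:
            continue  # assumption: skip regions that would run past a terminus
        neighbors = s2[max(0, x - flank):x] + s2[x + 1:x + 1 + flank]
        if all(s2[x] < v for v in neighbors):
            regions.append((x, start, end))
    return regions


if __name__ == "__main__":
    scores = [0.9] * 20
    scores[9], scores[10], scores[11] = 0.75, 0.70, 0.78  # illustrative values only
    print(find_flexible_regions(scores))  # [(10, 8, 12)]
```

The sketch uses a strict inequality, so two adjacent residues with identical minimal scores produce no peak. The text does not say how such ties should be handled.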
In some embodiments, the flexible region(s) of a reference Cas nuclease is determined using DynaMine. In some embodiments, the amino acid residue with the highest flexibility in the flexible region (i.e., the peak amino acid residue) is in the context-dependent flexible zone as determined by DynaMine. In some embodiments, the amino acid residue with the highest flexibility in the flexible region (i.e., the peak amino acid residue) is in the flexible zone as determined by DynaMine. In some embodiments, the amino acid residue with the highest flexibility (i.e., the peak amino acid residue) in the flexible region has a flexibility score S²pred of no more than about 0.8, e.g., no more than about any one of 0.79, 0.78, 0.77, 0.76, 0.75, 0.74, 0.73, 0.72, 0.71, 0.7 or less, as calculated by DynaMine. In some embodiments, the amino acid residues of the flexible region are in the context-dependent flexible zone and/or the flexible zone as determined by DynaMine. In some embodiments, each amino acid residue in the flexible region has a flexibility score S²pred of no more than about 0.8, e.g., no more than about any one of 0.79, 0.78, 0.77, 0.76, 0.75, 0.74, 0.73, 0.72, 0.71, 0.7 or less, as calculated by DynaMine.
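The two threshold embodiments above can be compared directly from the S²pred values. In one, only the peak must be at or below the threshold. In the other, every residue in the region must be. The Python sketch below reports which of the listed thresholds (0.8 down to 0.7) a candidate region satisfies under either reading. It works on plain score lists and does not reproduce DynaMine's own zone classification. `satisfied_thresholds` is a hypothetical name.

```python
from typing import List

# Threshold values listed in the text: 0.8, 0.79, ..., 0.7
THRESHOLDS = [round(0.8 - 0.01 * k, 2) for k in range(11)]


def satisfied_thresholds(
    s2: List[float], start: int, end: int, require_all: bool = False
) -> List[float]:
    """Return the listed thresholds met by the region s2[start..end] (0-based, inclusive).

    With require_all=False only the peak (the lowest S2pred value, i.e. the most
    flexible residue) must be at or below the threshold; with require_all=True every
    residue in the region must be, so the highest S2pred value is tested instead.
    """
    scores = s2[start:end + 1]
    if not scores:
        raise ValueError("empty region")
    value = max(scores) if require_all else min(scores)
    return [t for t in THRESHOLDS if value <= t]


if __name__ == "__main__":
    s2 = [0.9, 0.74, 0.72, 0.76, 0.9]  # illustrative values only
    print(satisfied_thresholds(s2, 1, 3, require_all=True))  # [0.8, 0.79, 0.78, 0.77, 0.76]
```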
In some embodiments, the flexible region(s) of a reference Cas nuclease is located in a random coil. In some embodiments, the flexible region(s) of a reference Cas nuclease is not located in an alpha helix. In some embodiments, the flexible region(s) of a reference Cas nuclease is not located in a beta strand. In some embodiments, the flexible region(s) of a reference Cas nuclease is not located in a random coil. In some embodiments, at least a portion of the flexible region(s) of a reference Cas nuclease is located in an alpha helix or a beta strand.
In some embodiments, the flexible region(s) of a reference Cas nuclease is in a domain of the reference Cas nuclease that interacts with DNA. In some embodiments, the flexible region(s) of a reference Cas nuclease is in a domain of the reference Cas nuclease that interacts with RNA (e.g., guide RNA, such as crRNA and/or tracrRNA). In some embodiments, the flexible region(s) of a reference Cas nuclease is in a domain of the reference Cas nuclease that does not interact with DNA or RNA. In some embodiments, the flexible region(s) of a reference Cas nuclease is located between functional domains of the reference Cas nuclease.
Any mutations that can increase flexibility of the flexible region(s) can be used. The flexibility of the 20 naturally occurring amino acid residues has been characterized based on experimental data (e.g., crystal structures, NMR, and other protein dynamics studies). Flexible amino acid residues include G, T, R, S, N, Q, D, P, E, K, A and M, whereas rigid amino acid residues include W, Y, F, C, I, V, H and L. The flexibility of an amino acid residue may also depend on the identity of its neighboring amino acid residues. See, for example, Smith DK et al., “Improved amino acid flexibility parameters,” Protein Sci., 2003, 12(5): 1060-1072.
In some embodiments, the one or more mutations to the flexible region comprise insertion of one or more (e.g., 1, 2, 3, or more) amino acid residues associated with flexible conformation. The insertion may occur at any position(s) of the flexible region. In some embodiments, the insertion is at the N-terminus or the C-terminus of the peak amino acid residue in the flexible region. In some embodiments, the insertion is at the N-terminus or the C-terminus of a flexible amino acid residue, e.g., with the preference G>S>N>D>H>M>T>E>Q>K>R>A>P. In some embodiments, where more than one preferred flexible amino acid residue (e.g., two Gs) is present in the flexible region, the insertion is made at the N-terminus or the C-terminus of the preferred flexible amino acid residue that is closer to the peak amino acid residue. In some embodiments, where two preferred flexible amino acid residues (e.g., two Gs) in the flexible region are equidistant from the peak amino acid residue, the insertion is made at the N-terminus or the C-terminus of the preferred flexible amino acid residue whose neighboring amino acid residue has a higher flexibility preference (i.e., G>S>N>D>H>M>T>E>Q>K>R>A>P) than the neighboring amino acid residue of the other preferred flexible amino acid residue. In some embodiments, where two preferred flexible amino acid residues (e.g., two Gs) in the flexible region are equidistant from the peak amino acid residue and their neighboring amino acid residues have the same flexibility preference, the insertion is made at the N-terminus or the C-terminus of the preferred flexible amino acid residue that is closer to the N-terminus of the reference Cas nuclease. Glycine (G) is widely accepted as a flexible amino acid residue. In some embodiments, the one or more mutations to the flexible region comprise insertion of one or more (e.g., 1, 2, 3, or more) Gs in the flexible region.
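The tie-breaking rules in the preceding paragraph amount to a lexicographic ranking, sketched in Python below. Several points are assumptions because the text does not fix them. Indices are 0-based. Candidate residues are limited to the 13 residues in the preference list. "Neighboring residue" is read as the better-ranked of the two immediately adjacent residues, and residues absent from the list rank last. `choose_insertion_site` and its helpers are hypothetical names.

```python
from typing import Optional

# Preference order from the text, most to least preferred: G>S>N>D>H>M>T>E>Q>K>R>A>P
PREFERENCE = "GSNDHMTEQKRAP"


def rank(aa: str) -> int:
    """Lower is more preferred; residues outside the list rank after all listed ones."""
    return PREFERENCE.index(aa) if aa in PREFERENCE else len(PREFERENCE)


def neighbor_rank(seq: str, i: int) -> int:
    """Rank of the better-ranked immediate neighbor of position i (assumption)."""
    neighbors = [seq[j] for j in (i - 1, i + 1) if 0 <= j < len(seq)]
    return min((rank(aa) for aa in neighbors), default=len(PREFERENCE))


def choose_insertion_site(seq: str, start: int, end: int, peak: int) -> Optional[int]:
    """Pick the residue (0-based index) next to which flexible residues are inserted.

    1. Keep residues in [start, end] whose identity is the most preferred one present.
    2. Prefer the residue closest to the peak.
    3. On a distance tie, prefer the residue with the better-ranked neighbor.
    4. On a further tie, prefer the residue closer to the protein N-terminus.
    """
    candidates = [i for i in range(start, end + 1) if seq[i] in PREFERENCE]
    if not candidates:
        return None  # no flexible residue in the region
    best = min(rank(seq[i]) for i in candidates)
    candidates = [i for i in candidates if rank(seq[i]) == best]
    return min(candidates, key=lambda i: (abs(i - peak), neighbor_rank(seq, i), i))


if __name__ == "__main__":
    # Two Gs equidistant from the peak L; the G next to S wins the tie.
    print(choose_insertion_site("AGLGS", 0, 4, 2))  # 3
```

Python compares tuples element by element, so a single `min` over a tuple key applies the distance, neighbor and N-terminal rules in order.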
In some embodiments, the one or more mutations comprise inserting two G residues in the flexible region. In some embodiments, the one or more G residues are inserted at the N-terminus of a flexible amino acid residue in the flexible region, wherein the flexible amino acid residue is selected from the group consisting of G, S, N, D, H, M, T, E, Q, K, R, A and P. In some embodiments, the flexible amino acid residue is chosen according to the preference: G>S>N>D>H>M>T>E>Q>K>R>A>P. Insertion of other flexible amino acid residues or groups of flexible amino acid residues, such as S, T, GS, SG, GGS, etc., is also contemplated.
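Inserting two G residues N-terminal to a chosen residue is a simple string operation on the protein sequence, but it shifts the numbering of every downstream residue. The Python sketch below shows both steps using 1-based residue numbers. It accepts other flexible inserts such as "GS", as the text contemplates. The function names and the toy sequence are hypothetical.

```python
def insert_n_terminal(seq: str, position: int, insert: str = "GG") -> str:
    """Insert `insert` immediately N-terminal to the residue at 1-based `position`."""
    if not 1 <= position <= len(seq):
        raise ValueError("position out of range")
    i = position - 1
    return seq[:i] + insert + seq[i:]


def renumber(old_position: int, insert_position: int, n_inserted: int) -> int:
    """Map a 1-based reference residue number onto the mutant after the insertion."""
    return old_position + n_inserted if old_position >= insert_position else old_position


if __name__ == "__main__":
    mutant = insert_n_terminal("MKLSDE", 4)  # GG placed before S4
    print(mutant)                 # MKLGGSDE
    print(renumber(4, 4, 2))      # 6: S4 of the reference is S6 of the mutant
```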
In some embodiments, the one or more mutations to the flexible region comprise substitution of one or more amino acid residues associated with less flexible conformation (e.g., rigid amino acid residues) with one or more amino acid residues associated with more flexible conformation (e.g., flexible amino acid residues). In some embodiments, the one or more mutations of the flexible region comprise substitution of a hydrophobic amino acid residue in the flexible region with a flexible amino acid residue. In some embodiments, the hydrophobic amino acid residue is selected from the group consisting of L, I, V, C, Y, F and W. In some embodiments, the one or more mutations of the flexible region comprise substitution of two or more (e.g., 2, 3, or 4) hydrophobic amino acid residues in the flexible region with flexible amino acid residues. In some embodiments, the flexible amino acid residue is G, S, T, N, D, H, K or R. In some embodiments, the flexible amino acid residue is G, S, or T. In some embodiments, the one or more mutations comprise substituting a hydrophobic amino acid residue in the flexible region with a G residue, wherein the hydrophobic amino acid residue is selected from the group consisting of L, I, V, C, Y, F and W. Combinations of insertion(s) and substitution(s) in the flexible region as described herein are also contemplated, provided that the mutations do not adversely affect the folding and/or activity of the engineered Cas nuclease.
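The substitution embodiments can be enumerated mechanically. Each hydrophobic residue (L, I, V, C, Y, F or W) in the flexible region is a candidate for replacement by G, or by S or T. The Python sketch below lists every single-substitution variant in a 1-based, inclusive region and names each one in the usual wild-type/position/mutant notation (e.g., "L3G"). It does not decide which variant to keep, because the text leaves that to experimental testing. `glycine_substitutions` is a hypothetical name.

```python
from typing import Dict

HYDROPHOBIC = set("LIVCYFW")  # hydrophobic residues listed in the text


def glycine_substitutions(seq: str, start: int, end: int, replacement: str = "G") -> Dict[str, str]:
    """Return {mutation_name: mutant_sequence} for each hydrophobic residue in [start, end].

    Positions are 1-based and inclusive; `replacement` may be G, S or T.
    """
    variants = {}
    for pos in range(start, end + 1):
        aa = seq[pos - 1]
        if aa in HYDROPHOBIC:
            variants[f"{aa}{pos}{replacement}"] = seq[:pos - 1] + replacement + seq[pos:]
    return variants


if __name__ == "__main__":
    print(glycine_substitutions("MKLSVDE", 2, 6))  # {'L3G': 'MKGSVDE', 'V5G': 'MKLSGDE'}
```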
In some embodiments, the engineered Cas nuclease is a Cas effector protein of a Class 1 or Class 2 CRISPR-Cas system. In some embodiments, the engineered Cas nuclease is a Cas effector protein of a Type I-A, Type I-B, Type I-C, Type I-D, Type I-E, Type I-F, Type II-A, Type II-B, Type II-C, Type III-A, Type III-B, Type IV-A, Type IV-B, Type V-A, Type V-B, Type V-F, Type V-U3, Type V-U4, Type V-U2, Type V-U1, Type V-C, Type V-D, Type V-E, Type V-U5, Type V-G, Type V-H, Type V-I, Type V-K or Type VI CRISPR-Cas system. See, for example, Makarova KS and Koonin EV, “Annotation and Classification of CRISPR-Cas Systems,” Methods Mol. Biol., 2015, 1311: 47-75; Makarova KS et al., “An updated evolutionary classification of CRISPR-Cas systems,” Nat. Rev. Microbiol., 2015, 13: 722-736; Yan WX et al., “Functionally diverse type V CRISPR-Cas systems,” Science, 2019, 363(6422): 88-91; and Pinilla-Redondo R. et al., “Type IV CRISPR-Cas systems are highly diverse and involved in competition between plasmids,” Nucleic Acids Research, 2020, 48(4): 2000-2012, which are incorporated herein by reference in their entireties. In some embodiments, the engineered Cas nuclease is selected from the group consisting of Cas9, Cas12a, Cas12b, Cas12c, Cas12d, Cas12f, Cas12g, Cas12h, Cms1, Cas12i, Cas12j, Cas12k and CasX. In some embodiments, the engineered Cas nuclease is a Cas9. In some embodiments, the engineered Cas nuclease is a Cas12i. In some embodiments, the engineered Cas nuclease is a Cas12b.
The engineered Cas nuclease has increased activity compared to the reference Cas nuclease. In some embodiments, the activity is target-DNA binding activity. In some embodiments, the activity is site-specific nuclease activity. In some embodiments, the activity is double-strand DNA cleavage activity. In some embodiments, the activity is single-strand DNA cleavage activity, including, e.g., site-specific DNA cleavage activity or nonspecific DNA cleavage activity. In some embodiments, the activity is single-strand RNA cleavage activity, e.g., site-specific RNA cleavage activity or nonspecific RNA cleavage activity. In some embodiments, the activity is measured in vitro. In some embodiments, the activity is measured in a cell, such as a bacterial cell, a plant cell, or a eukaryotic cell. In some embodiments, the activity is measured in a mammalian cell, e.g., a rodent cell or a human cell. In some embodiments, the activity is measured in a human cell, such as a 293T cell. In some embodiments, the engineered Cas nuclease has at least about any one of 20%, 30%, 40%, 60%, 70%, 80%, 90%, 1.5 fold, 2 fold, 3 fold, 4 fold, 5 fold or higher increase of site-specific nuclease activity compared to that of the reference Cas nuclease. The site-specific nuclease activity of the engineered Cas nuclease may be measured using methods known in the art, including, for example, a gel-shift assay.
In some embodiments, the activity is gene-editing activity in a cell. In some embodiments, the cell is a bacterial cell, a plant cell, or a eukaryotic cell. In some embodiments, the cell is a mammalian cell, such as a rodent cell or a human cell. In some embodiments, the cell is a 293T cell. In some embodiments, the activity is indel formation activity in a cell at a target genomic locus, e.g., via site-specific cleavage of a target nucleic acid by the engineered Cas nuclease and DNA repair by the non-homologous end-joining (NHEJ) mechanism. In some embodiments, the activity is insertion of an exogenous nucleic acid sequence in a cell at a target genomic locus, e.g., via site-specific cleavage of a target nucleic acid by the engineered Cas nuclease and DNA repair by the homologous recombination (HR) mechanism. In some embodiments, the engineered Cas nuclease has at least about any one of 20%, 30%, 40%, 60%, 70%, 80%, 90%, 1.5 fold, 2 fold, 3 fold, 4 fold, 5 fold or higher increase of gene-editing (e.g., indel formation) activity compared to that of the reference Cas nuclease at a genomic locus in a cell (e.g., a human cell such as a 293T cell). In some embodiments, the engineered Cas nuclease has at least about any one of 20%, 30%, 40%, 60%, 70%, 80%, 90%, 1.5 fold, 2 fold, 3 fold, 4 fold, 5 fold or higher increase of gene-editing (e.g., indel formation) activity compared to that of the reference Cas nuclease at a plurality (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10 or more) of genomic loci in a cell (e.g., a human cell such as a 293T cell). In some embodiments, the engineered Cas nuclease is capable of editing a larger number of genomic loci than the reference Cas nuclease. In some embodiments, the consensus PAM sequence of the engineered Cas nuclease is the same as that of the reference Cas nuclease.
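The increases above are stated either as percentages or as fold changes. The text does not define either term, so the Python sketch below adopts one common reading as an assumption. "Fold" is taken as the ratio of engineered to reference activity, and "percent increase" as (ratio − 1) × 100. Under that reading, 45% versus 30% indels is a 1.5-fold ratio and a 50% increase. The per-locus helper applies the same ratio at each locus measured for both nucleases. Function names, locus labels and values are hypothetical and are not measured data.

```python
from typing import Dict, Tuple


def relative_increase(engineered: float, reference: float) -> Tuple[float, float]:
    """Return (fold change, percent increase) of engineered over reference activity."""
    if reference <= 0:
        raise ValueError("reference activity must be positive")
    fold = engineered / reference
    return fold, (fold - 1.0) * 100.0


def per_locus_fold(engineered: Dict[str, float], reference: Dict[str, float]) -> Dict[str, float]:
    """Fold change at each locus measured for both nucleases."""
    return {
        locus: relative_increase(engineered[locus], reference[locus])[0]
        for locus in engineered
        if locus in reference
    }


if __name__ == "__main__":
    print(relative_increase(45.0, 30.0))  # (1.5, 50.0); hypothetical indel percentages
```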
Gene-editing efficiency of an engineered Cas nuclease in a cell may be determined using methods known in the art, including, for example, a T7 endonuclease 1 (T7E1) assay, sequencing of the target DNA (including, e.g., Sanger sequencing and next-generation sequencing), a Tracking of Indels by Decomposition (TIDE) assay, or an Indel Detection by Amplicon Analysis (IDAA) assay. See, for example, Sentmanat MF et al., “A survey of validation strategies for CRISPR-Cas9 editing,” Scientific Reports, 2018, 8, Article number 888, which is incorporated herein by reference in its entirety. In some embodiments, the gene-editing efficiency of the engineered Cas nuclease in a cell is measured using targeted next-generation sequencing (NGS), for example, as described in the Examples herein. Exemplary genomic loci for determination of gene-editing efficiency of the engineered Cas nuclease include, but are not limited to, CCR5, AAVS, CD34, RNF2, and EMX1.
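For targeted NGS, editing efficiency is usually reported as the percentage of amplicon reads that carry an insertion or deletion near the expected cut site. The Python sketch below is deliberately simplified and is not the pipeline used in the Examples. It parses SAM-style CIGAR strings and assumes every read is aligned from amplicon offset 0. It ignores substitutions, sequencing errors and read-quality filtering, which dedicated tools handle. The cut-site coordinate, window size and function names are hypothetical.

```python
import re
from typing import List

# CIGAR operations as defined in the SAM format specification
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def indel_positions(cigar: str) -> List[int]:
    """0-based reference offsets (from the alignment start) of each I or D operation."""
    pos, out = 0, []
    for length, op in CIGAR_OP.findall(cigar):
        if op in "ID":
            out.append(pos)
        if op in "MDN=X":  # operations that consume the reference
            pos += int(length)
    return out


def indel_frequency(cigars: List[str], cut_site: int, window: int = 5) -> float:
    """Percent of reads with an insertion or deletion within `window` bp of `cut_site`.

    Assumes every read is aligned from amplicon offset 0, so offsets are amplicon coordinates.
    """
    if not cigars:
        raise ValueError("no reads")
    edited = sum(
        any(abs(p - cut_site) <= window for p in indel_positions(c)) for c in cigars
    )
    return 100.0 * edited / len(cigars)


if __name__ == "__main__":
    reads = ["150M", "70M2D80M", "75M1I74M", "10M1D139M"]  # toy alignments
    print(indel_frequency(reads, cut_site=72))  # 50.0
```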
The present application further provides engineered Cas effector proteins based on any one of the engineered Cas nucleases described herein. In some embodiments, the engineered Cas effector protein comprises a functional derivative of the engineered Cas nuclease, such as any one of the functional derivatives as described in the section “Functional Derivatives” below.
In some embodiments, the engineered Cas effector protein has site-specific nuclease activity. In some embodiments, the engineered Cas effector protein can induce double-strand breaks in a target DNA molecule. In some embodiments, the engineered Cas effector protein comprises an enzymatically active engineered Cas nuclease. In some embodiments, the engineered Cas effector protein is a Cas nickase that can induce a single-strand break in a target DNA molecule. In some embodiments, the engineered Cas effector protein comprises a nickase mutant of the engineered Cas nuclease.
In some embodiments, the engineered Cas effector protein comprises an enzymatically inactive (i.e., enzymatically dead) mutant of the engineered Cas nuclease. In some embodiments, the engineered Cas effector protein further comprises one or more functional domains fused to the engineered Cas nuclease or functional derivative thereof. In some embodiments, the engineered Cas effector protein comprises a functional domain fused to an enzymatically inactive mutant of the engineered Cas nuclease. In some embodiments, the functional domain is selected from the group consisting of a translation initiator domain, a transcription repressor domain, a transactivation domain, an epigenetic modification domain, a nucleobase-editing domain (e.g., CBE or ABE domain) , a reverse transcriptase domain, a reporter domain (e.g., a fluorescent domain) and a nuclease domain.
In some embodiments, the engineered Cas effector protein comprises split Cas polypeptides based on the engineered Cas nuclease or functional derivative thereof. In some embodiments, the engineered Cas effector protein comprises a first polypeptide comprising an N-terminal portion of the engineered Cas nuclease or functional derivative thereof, and a second polypeptide comprising a C-terminal portion of the engineered Cas nuclease or functional derivative thereof, wherein the first polypeptide and the second polypeptide are capable of associating with each other in the presence of a guide RNA comprising a guide sequence to form a CRISPR complex that specifically binds to a target nucleic acid comprising a target sequence complementary to the guide sequence. In some embodiments, the split Cas effector protein is inducible, e.g., each split Cas polypeptide comprises a dimerization domain, and the dimerization domains are capable of associating with each other in the presence of an inducer (e.g., rapamycin). In some embodiments, the split Cas effector protein is auto-inducible, e.g., the split Cas polypeptides do not comprise dimerization domains and are capable of associating with each other in the presence of a guide RNA.
Engineered Cas12b effector proteins
The present application provides engineered Cas12b effector proteins (e.g., Cas12b nucleases, Cas12b nickases, Cas12b fusion effector proteins, split Cas12b effector proteins) that have improved activity (e.g., target binding, double-strand cleavage activity, nickase activity, and/or gene-editing activity) .
In some embodiments, there is provided an engineered Cas12b nuclease comprising one or more mutations that increase flexibility of a flexible region that corresponds to amino acid residues 835 to 839, wherein the amino acid residue numbering is based on SEQ ID NO: 1, and wherein the engineered Cas12b nuclease has an increased activity compared to a reference Cas12b nuclease. In some embodiments, the reference Cas12b nuclease is BhCas12b (e.g., BhCas12bv4). In some embodiments, the one or more mutations comprise inserting one or more (e.g., 2) G residues in the flexible region. In some embodiments, the one or more G residues are inserted at the N-terminus of a flexible amino acid residue in the flexible region, wherein the flexible amino acid residue is selected from the group consisting of G, S, N, D, H, M, T, E, Q, K, R, A and P. In some embodiments, the flexible amino acid residue is chosen according to the preference: G>S>N>D>H>M>T>E>Q>K>R>A>P. In some embodiments, the one or more mutations comprise substituting a hydrophobic amino acid residue in the flexible region with a G residue, wherein the hydrophobic amino acid residue is selected from the group consisting of L, I, V, C, Y, F and W. In some embodiments, the engineered Cas12b nuclease comprises the amino acid sequence of SEQ ID NO: 83. In some embodiments, the engineered Cas12b nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more) sequence identity to SEQ ID NO: 2.
In some embodiments, there is provided an engineered Cas12b nuclease comprising the amino acid sequence of SEQ ID NO: 2.
SEQ ID NO: 1 Bacillus hisashii Cas12b version 4 (BhCas12b-v4) amino acid sequence
[Sequence provided as an image in the original publication]
SEQ ID NO: 2 enBhCas12bv4 (BhCas12b-v4 1.1) amino acid sequence
[Sequence provided as an image in the original publication]
The Type V-B CRISPR-Cas12b (also known as C2c1) system has been identified as a dual-RNA-guided (i.e., crRNA and tracrRNA) DNA endonuclease system with features distinct from those of Cas9 and Cas12a (Shmakov, S. et al. Mol. Cell 60, 385-397 (2015)). First, Cas12b was reported to generate staggered ends distal to the protospacer adjacent motif (PAM) site in vitro when reconstituted with the crRNA/tracrRNA duplex. Second, although the RuvC domain of Cas12b is similar to those of Cas9 and Cas12a, its putative Nuc domain shares no sequence or structural similarity with the HNH domain of Cas9 or the Nuc domain of Cas12a. Moreover, Cas12b proteins are smaller than the widely used SpCas9 and Cas12a (e.g., AacCas12b: 1,129 amino acids (aa); SpCas9: 1,369 aa; AsCas12a: 1,353 aa; LbCas12a: 1,228 aa), making Cas12b suitable for adeno-associated virus (AAV)-mediated in vivo delivery in gene therapy. Compared with small Cas9 proteins such as SaCas9 and CjCas9, Cas12b recognizes simpler PAM sequences (e.g., AacCas12b: 5′-TTN-3′ (SEQ ID NO: 3); compared to SaCas9: 5′-NNGRRT-3′ (SEQ ID NO: 4) and CjCas9: 5′-NNNNRYAC-3′ (SEQ ID NO: 5)), which significantly increases the targeting range of Cas12b in the genome. Additionally, Cas12b has been reported to have minimal off-target effects and thus may serve as a safer choice for therapeutic and clinical applications.
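The targeting-range argument can be made concrete with a back-of-the-envelope calculation. It assumes uniform, independent base composition on one strand, which real genomes do not follow. Under that assumption, each IUPAC code contributes (number of allowed bases)/4 to the chance that a given position matches. 5′-TTN-3′ then occurs at 1/16 of positions, while NNGRRT and NNNNRYAC each occur at 1/64, a 4-fold difference. The Python sketch below computes these expected frequencies and also counts overlapping PAM matches on one strand of a sequence with a regular-expression lookahead. The function names are hypothetical.

```python
import re
from fractions import Fraction

# IUPAC nucleotide codes mapped to the bases each one allows
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def pam_frequency(pam: str) -> Fraction:
    """Expected fraction of positions matching `pam` on one strand of uniform random DNA."""
    f = Fraction(1)
    for code in pam:
        f *= Fraction(len(IUPAC[code]), 4)
    return f


def count_pam_sites(seq: str, pam: str) -> int:
    """Count (possibly overlapping) matches of `pam` on the given strand only."""
    pattern = "".join(f"[{IUPAC[code]}]" for code in pam)
    return len(re.findall(f"(?={pattern})", seq.upper()))


if __name__ == "__main__":
    for name, pam in [("AacCas12b", "TTN"), ("SaCas9", "NNGRRT"), ("CjCas9", "NNNNRYAC")]:
        print(name, pam, pam_frequency(pam))  # 1/16, 1/64, 1/64
```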
Cas12b (C2c1) nucleases from various organisms may be used as the reference Cas12b nuclease to provide engineered Cas12b effector proteins of the present application. Exemplary Cas12b nucleases have been described, for example, in Shmakov, S. et al. Mol. Cell 60, 385-397 (2015) ; Shmakov, S. et al. Nat. Rev. Microbiol. 15, 169-182 (2017) ; WO2016205764, and WO2020/087631, which are incorporated herein by reference in their entirety.
In some embodiments, the engineered Cas12b effector protein is based on a reference Cas12b protein (e.g., Cas12b nuclease) selected from Cas12b from Alicyclobacillus acidiphilus (AaCas12b), Cas12b from Alicyclobacillus kakegawensis (AkCas12b), Cas12b from Alicyclobacillus macrosporangiidus (AmCas12b), Cas12b from Bacillus hisashii (BhCas12b), BsCas12b from Bacillus, Bs3Cas12b from Bacillus, Cas12b from Desulfovibrio inopinatus (DiCas12b), Cas12b from Laceyella sediminis (LsCas12b), Cas12b from Spirochaetes bacterium (SbCas12b), Cas12b from Tuberibacillus calidus (TcCas12b), and functional derivatives thereof. Sequences of naturally occurring Cas12b proteins are known, for example, under UniProtKB IDs T0D7A2, A0A6I3SPI6, and A0A6I7FUC4, which are incorporated herein by reference in their entirety.
In some embodiments, the reference Cas12b protein is a Cas12b nuclease from Bacillus hisashii (BhCas12b) or a functional derivative thereof. In some embodiments, the engineered Cas12b effector protein is based on a reference Cas12b protein comprising an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%) sequence identity to the amino acid sequence of SEQ ID NO: 6. In some embodiments, the engineered Cas12b effector protein is based on a reference Cas12b nuclease comprising the amino acid sequence of SEQ ID NO: 6. Exemplary flexible regions in BhCas12b and mutations that increase flexibility of the flexible regions are shown in FIGs. 2-3.
In some embodiments, the reference Cas12b protein is a Cas12b nuclease from Alicyclobacillus acidiphilus (AaCas12b) or a functional derivative thereof. In some embodiments, the engineered Cas12b effector protein is based on a reference Cas12b protein comprising an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%) sequence identity to the amino acid sequence of SEQ ID NO: 7. In some embodiments, the engineered Cas12b effector protein is based on a reference Cas12b nuclease comprising the amino acid sequence of SEQ ID NO: 7.
It is noted that orthologues having a certain sequence identity (e.g., at least about any one of 60%, 70%, 80%, 85%, 90%, 95%, 98% or higher) to the reference Cas12b proteins or fragments thereof may be used as a basis to design the engineered Cas12b effector proteins of the present application. The skilled artisan can determine, based on the purpose and application, the percentage of sequence identity of an orthologue of Cas12b or fragment thereof suitable for use in the present application. Methods for determining sequence identity values may be found in Computational Molecular Biology, Lesk, A.M., ed., Oxford University Press, New York, 1988; Biocomputing: Informatics and Genome Projects, Smith, D.W., ed., Academic Press, New York, 1993; Computer Analysis of Sequence Data, Part I, Griffin, A.M., and Griffin, H.G., eds., Humana Press, New Jersey, 1994; Sequence Analysis in Molecular Biology, von Heijne, G., Academic Press, 1987; and Sequence Analysis Primer, Gribskov, M. and Devereux, J., eds., M Stockton Press, New York, 1991. Various Cas12b orthologues have been described in WO2020/087631, which is incorporated herein by reference in its entirety.
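Identity thresholds such as "at least about 85%" presuppose both an alignment and a counting convention, and several conventions are in use in the references cited above. The Python sketch below takes an existing pairwise alignment (gaps written as "-"), for example one produced by a global aligner; it does not compute the alignment. It divides identical residue pairs by the number of columns that are not gapped in both sequences. That denominator is an assumption made for illustration. Other conventions divide by the length of the shorter or of the reference sequence. `percent_identity` is a hypothetical name.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pairwise alignment ('-' marks a gap).

    Convention used here (one of several in use): identical residue pairs divided by
    the number of alignment columns that are not gaps in both sequences.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("empty alignment")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    identity = percent_identity("MKL-SDEA", "MKLGSDEA")  # toy alignment
    print(identity, identity >= 85.0)  # 87.5 True
```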
Naturally occurring Cas12b nucleases have various structural domains. In some embodiments, a reference Cas12b nuclease comprises, from the N-terminus to the C-terminus: a first WED domain (WED-I; also known as OBD-I domain), a first REC domain (REC1), a second WED domain (WED-II; also known as OBD-II domain), a first RuvC domain (RuvC-I), a bridge helix (BH) domain, a second REC domain (REC2), a second RuvC domain (RuvC-II), a first Nuc domain (Nuc-I; also known as UK-I domain), a third RuvC domain (RuvC-III) and a second Nuc domain (Nuc-II; also known as UK-II domain). Domain boundaries may be determined using methods known in the art, such as based on crystal structures of a reference Cas12b nuclease (e.g., PDB ID Nos: 5U30, 5U31, 5U33, 5U34 and 5WQE for AaCas12b), and/or sequence homology to known functional domains in a reference Cas12b nuclease. In some embodiments, the AaCas12b has the following domains: WED-I domain (amino acid residues 1-14), REC1 domain (amino acid residues 15-386), WED-II domain (amino acid residues 387-518), RuvC-I domain (amino acid residues 519-628), BH domain (amino acid residues 629-658), REC2 domain (amino acid residues 659-784), RuvC-II domain (amino acid residues 785-900), Nuc-I domain (amino acid residues 901-974), RuvC-III domain (amino acid residues 975-993), and Nuc-II domain (amino acid residues 994-1129), wherein the amino acid numbering is based on SEQ ID NO: 7.
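The AaCas12b domain boundaries above are easy to query programmatically, for instance to see which domain a residue of interest falls in. The active-site residues R785, R911 and D977 named in this section are examples. The Python sketch below encodes the listed 1-based, inclusive boundaries (SEQ ID NO: 7 numbering) and looks residues up with a binary search. The table and function names are hypothetical.

```python
from bisect import bisect_right

# AaCas12b domain boundaries (1-based, inclusive), numbered per SEQ ID NO: 7
DOMAINS = [
    (1, 14, "WED-I"), (15, 386, "REC1"), (387, 518, "WED-II"),
    (519, 628, "RuvC-I"), (629, 658, "BH"), (659, 784, "REC2"),
    (785, 900, "RuvC-II"), (901, 974, "Nuc-I"), (975, 993, "RuvC-III"),
    (994, 1129, "Nuc-II"),
]
STARTS = [start for start, _, _ in DOMAINS]


def domain_of(residue: int) -> str:
    """Return the domain containing a 1-based residue number."""
    idx = bisect_right(STARTS, residue) - 1
    if idx < 0 or residue > DOMAINS[idx][1]:
        raise ValueError(f"residue {residue} is outside the annotated range")
    return DOMAINS[idx][2]


if __name__ == "__main__":
    for res in (785, 911, 977):
        print(res, domain_of(res))  # RuvC-II, Nuc-I, RuvC-III
```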
Crystal structures of Alicyclobacillus acidoterrestris Cas12b bound to sgRNA as a binary complex and to target DNAs as ternary complexes have been described in Yang H., et al. Cell 167: 1814-1828 (2016) and Liu L. et al. Mol. Cell 65: 310-322 (2017). Briefly, the crystal structures show two discontinuous lobes, REC (recognition; residues 15-386 and 658-784) and NUC (nuclease; residues 1-14, 387-658 and 785-1129), each composed of several domains. The crRNA (or single guide RNA, sgRNA) binds in a central channel between the two lobes. PAM recognition is sequence-specific and occurs mostly via interactions with the REC1 (helical-1) and WED-II (OBD-II) domains. The sgRNA-target DNA heteroduplex binds primarily to the REC lobe in a sequence-independent manner.
In some embodiments, the engineered Cas12b nuclease is based on a functional variant of a naturally occurring Cas12b nuclease. In some embodiments, the functional variant has one or more mutations, such as amino acid substitutions, insertions and deletions. By way of example, the functional variant may comprise any one of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more amino acid substitutions compared to a wild type naturally occurring Cas12b nuclease. In some embodiments, the one or more substitutions are conservative substitutions. In some embodiments, the functional variant has all domains of a naturally occurring Cas12b nuclease. In some embodiments, the functional variant does not have one or more domains of a naturally occurring Cas12b nuclease.
Also provided are engineered Cas12b effector proteins based on any one of the engineered Cas12b nucleases described herein. In some embodiments, the engineered Cas12b effector protein is enzymatically active. In some embodiments, the engineered Cas12b effector protein is a nuclease that cleaves both strands of a target duplex nucleic acid (e.g., duplex DNA). In some embodiments, the engineered Cas12b effector protein is a nickase, i.e., it cleaves a single strand of a target duplex nucleic acid (e.g., duplex DNA). In some embodiments, the engineered Cas12b effector protein comprises an enzymatically inactive mutant of the engineered Cas12b nuclease. Mutations at one or more amino acid residues in the active site of a Cas12b nuclease can yield an enzymatically dead Cas12b. For example, the R785A, R911A and D977A mutants of AaCas12b each lack nuclease activity in human cells. See, for example, Teng F. et al., Cell Discovery 4, Article number: 63 (2018), which is incorporated herein by reference in its entirety. In some embodiments, the engineered Cas12b effector protein comprises an engineered Cas12b having one or more mutations corresponding to R785A, R911A or D977A of AaCas12b. In some embodiments, the engineered Cas12b effector protein comprises an engineered Cas12b having a mutation corresponding to the R785A mutation of AaCas12b. In some embodiments, the engineered Cas12b effector protein comprises an engineered Cas12b having a mutation corresponding to the R911A mutation of AaCas12b. In some embodiments, the engineered Cas12b effector protein comprises an engineered Cas12b having a mutation corresponding to the D977A mutation of AaCas12b.
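Identifying a mutation "corresponding to" an AaCas12b substitution in another Cas12b requires mapping residue numbers through a sequence alignment. The minimal sketch below assumes a pairwise alignment has already been computed with a standard aligner and is supplied as two gapped rows; the function `map_position` and the toy sequences are hypothetical illustrations, not sequences from this application.

```python
from typing import Optional


def map_position(ref_aln: str, qry_aln: str, ref_pos: int) -> Optional[int]:
    """Map a 1-based residue number in the reference onto the aligned query.

    ref_aln and qry_aln are the two rows of a pairwise alignment of equal
    length, with '-' marking gaps. Returns None when the reference residue
    is aligned to a gap (i.e., the query has no corresponding residue).
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("alignment rows must have equal length")
    ref_i = qry_i = 0  # running residue counters for each sequence
    for a, b in zip(ref_aln, qry_aln):
        if a != "-":
            ref_i += 1
        if b != "-":
            qry_i += 1
        if a != "-" and ref_i == ref_pos:
            return qry_i if b != "-" else None
    raise ValueError(f"reference position {ref_pos} is not in the alignment")


# Toy alignment (hypothetical sequences, not real Cas12b orthologues).
ref = "MKR-DAL"
qry = "M-RGDSL"
print(map_position(ref, qry, 3))  # 2: reference R3 corresponds to query R2
print(map_position(ref, qry, 2))  # None: reference K2 is aligned to a gap
```

Checking that the mapped query residue is the same amino acid (here R) before introducing the substitution is a cheap guard against alignment errors.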
In some embodiments, there is provided an engineered Cas12b nickase. In some embodiments, there is provided an engineered Cas12b fusion effector protein comprising an engineered Cas12b nuclease or functional derivative thereof (e.g., an enzymatically inactive mutant of the engineered Cas12b nuclease) fused to a functional domain, such as a translation initiator domain, a transcription repressor domain, a transactivation domain, an epigenetic modification domain, a nucleobase-editing domain (e.g., a CBE or ABE domain), a reverse transcriptase domain, a reporter domain (e.g., a fluorescent domain) or a nuclease domain. In some embodiments, there is provided an engineered split Cas12b effector protein. Split Cas12b effector proteins have been described, for example, in PCT/CN2020/111057, which is incorporated herein by reference in its entirety.
Also provided are engineered CRISPR-Cas12b systems comprising any one of the engineered Cas12b effector proteins described herein and a guide RNA comprising a guide sequence complementary to a target sequence, or one or more nucleic acids encoding the guide RNA, wherein the engineered Cas12b effector protein and the guide RNA are capable of forming a CRISPR complex that specifically binds to a target nucleic acid comprising the target sequence and induces a modification of the target nucleic acid. In some embodiments, the guide RNA comprises a crRNA and a tracrRNA. In some embodiments, the guide RNA is an sgRNA. In some embodiments, the guide RNA is a precursor that can be processed into a crRNA and a tracrRNA. In some embodiments, the guide RNA is a precursor RNA array encoding a plurality of crRNAs, wherein each processed crRNA is associated with a tracrRNA. In some embodiments, the engineered Cas12b effector protein and/or the guide RNA are encoded by one or more vectors, such as AAV vectors. In some embodiments, the engineered CRISPR-Cas12b system is a ribonucleoprotein (RNP) complex comprising the engineered Cas12b effector protein bound to the guide RNA.
Engineered Cas12i effector proteins
The present application provides engineered Cas12i effector proteins that have improved activity (e.g., target binding, double-strand cleavage activity, nickase activity, and/or gene-editing activity).
In some embodiments, there is provided an engineered Cas12i nuclease comprising one or more mutations that increase the flexibility of a flexible region in a reference Cas12i nuclease, wherein the flexible region is selected from the group consisting of regions corresponding to amino acid residues 228-232, 439-443, 478-482, 500-504, 775-779 and 925-929, wherein the amino acid residue numbering is based on SEQ ID NO: 8, and wherein the engineered Cas12i nuclease has increased activity compared to the reference Cas12i nuclease. In some embodiments, the flexible region corresponds to amino acid residues 439-443 or amino acid residues 925-929, wherein the amino acid residue numbering is based on SEQ ID NO: 8. In some embodiments, the reference Cas12i nuclease is Cas12i2. In some embodiments, the one or more mutations comprise inserting one or more (e.g., 2) G residues into the flexible region. In some embodiments, the one or more G residues are inserted immediately N-terminal to a flexible amino acid residue in the flexible region, wherein the flexible amino acid residue is selected from the group consisting of G, S, N, D, H, M, T, E, Q, K, R, A and P. In some embodiments, the flexible amino acid residue is chosen according to the order of preference G > S > N > D > H > M > T > E > Q > K > R > A > P. In some embodiments, the one or more mutations comprise substituting a hydrophobic amino acid residue in the flexible region with a G residue, wherein the hydrophobic amino acid residue is selected from the group consisting of L, I, V, C, Y, F and W. In some embodiments, the engineered Cas12i nuclease comprises one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 96-105. In some embodiments, the engineered Cas12i nuclease comprises one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 99, 104 and 105.
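The two flexibility-increasing edits described above (inserting G residues N-terminal to the most flexible residue in a region, or replacing a hydrophobic residue with G) can be expressed as simple string operations on a protein sequence. This is a minimal sketch under stated assumptions: the function names are hypothetical, ties in the preference order are broken toward the N-terminus, and the first hydrophobic residue in the region is the one substituted; none of these choices is specified in the text.

```python
# Flexibility preference stated in the text, most to least preferred.
FLEX_ORDER = "GSNDHMTEQKRAP"
# Hydrophobic residues eligible for substitution with G, per the text.
HYDROPHOBIC = set("LIVCYFW")


def _check_region(seq: str, start: int, end: int) -> None:
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"region {start}-{end} is outside 1-{len(seq)}")


def insert_glycines(seq: str, start: int, end: int, n_gly: int = 2) -> str:
    """Insert n_gly G residues immediately N-terminal to the most flexible
    residue (ranked by FLEX_ORDER) in the 1-based inclusive region.
    Ties go to the N-terminal-most residue (an assumption)."""
    _check_region(seq, start, end)
    ranked = [(FLEX_ORDER.index(seq[i]), i)
              for i in range(start - 1, end) if seq[i] in FLEX_ORDER]
    if not ranked:
        raise ValueError("no residue from the flexibility list in this region")
    _, pos = min(ranked)  # pos is a 0-based index into seq
    return seq[:pos] + "G" * n_gly + seq[pos:]


def hydrophobic_to_glycine(seq: str, start: int, end: int) -> str:
    """Replace the first hydrophobic residue in the region with G
    (choosing the first one is an assumption)."""
    _check_region(seq, start, end)
    for i in range(start - 1, end):
        if seq[i] in HYDROPHOBIC:
            return seq[:i] + "G" + seq[i + 1:]
    raise ValueError("no hydrophobic residue in this region")


toy = "MKLVTEQRLA"  # hypothetical sequence, not SEQ ID NO: 8
print(insert_glycines(toy, 3, 7))         # MKLVGGTEQRLA
print(hydrophobic_to_glycine(toy, 3, 7))  # MKGVTEQRLA
```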
In some embodiments, the engineered Cas12i nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more) sequence identity to SEQ ID NO: 8.
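Sequence-identity thresholds such as "at least about 85%" depend on how identity is computed. The minimal sketch below uses one common convention (identical aligned residues divided by the number of reference residues) on a precomputed alignment; other conventions, such as dividing by the alignment length, give different values, and this section does not state which convention is intended. The function name and toy rows are hypothetical.

```python
def percent_identity(ref_aln: str, qry_aln: str) -> float:
    """Percent identity of a query to a reference over a pairwise alignment.

    Convention used here (one of several): identical aligned residues
    divided by the number of residues in the reference row.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("alignment rows must have equal length")
    ref_len = sum(1 for a in ref_aln if a != "-")
    if ref_len == 0:
        raise ValueError("reference row contains no residues")
    matches = sum(1 for a, b in zip(ref_aln, qry_aln) if a == b and a != "-")
    return 100.0 * matches / ref_len


pid = percent_identity("MKR-DAL", "M-RGDSL")  # toy alignment rows
print(round(pid, 2), pid >= 85.0)  # 66.67 False
```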
In some embodiments, there is provided an engineered Cas12i2 nuclease comprising the amino acid sequence of SEQ ID NO: 99. In some embodiments, the engineered Cas12i2 nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more) sequence identity to SEQ ID NO: 14. In some embodiments, there is provided an engineered Cas12i nuclease comprising the amino acid sequence of SEQ ID NO: 14.
In some embodiments, there is provided an engineered Cas12i2 nuclease comprising the amino acid sequence of SEQ ID NO: 104. In some embodiments, the engineered Cas12i2 nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more) sequence identity to SEQ ID NO: 18. In some embodiments, there is provided an engineered Cas12i nuclease comprising the amino acid sequence of SEQ ID NO: 18.
In some embodiments, there is provided an engineered Cas12i2 nuclease comprising the amino acid sequence of SEQ ID NO: 105. In some embodiments, the engineered Cas12i2 nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more) sequence identity to SEQ ID NO: 19. In some embodiments, there is provided an engineered Cas12i nuclease comprising the amino acid sequence of SEQ ID NO: 19.
In some embodiments, there is provided an engineered Cas12i2 nuclease comprising the amino acid sequence of SEQ ID NO: 99 and the amino acid sequence of SEQ ID NO: 104. In some embodiments, the engineered Cas12i2 nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more) sequence identity to SEQ ID NO: 20. In some embodiments, there is provided an engineered Cas12i nuclease comprising the amino acid sequence of SEQ ID NO: 20.
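enCas12i2 (SEQ ID NO: 20) combines the edits of Cas12i2-2.2 and Cas12i2-6.1. When several insertions are specified in wild-type numbering, applying them from the C-terminus toward the N-terminus keeps every remaining position valid, because an insertion only shifts residues downstream of it. The minimal sketch below shows that bookkeeping; the function name, toy sequence and insertion sites are hypothetical, since the actual sites are given only in the sequence figures.

```python
from typing import Dict


def apply_insertions(seq: str, insertions: Dict[int, str]) -> str:
    """Apply several insertions specified in wild-type (1-based) numbering.

    Each value is inserted immediately N-terminal to the residue at its key.
    Positions are processed from the C-terminus toward the N-terminus so
    that earlier (smaller) positions are not shifted by later insertions.
    """
    for pos in insertions:
        if not 1 <= pos <= len(seq):
            raise ValueError(f"position {pos} is outside 1-{len(seq)}")
    out = seq
    for pos in sorted(insertions, reverse=True):
        out = out[:pos - 1] + insertions[pos] + out[pos - 1:]
    return out


# Hypothetical toy example: two "GG" insertions before residues 5 and 9.
print(apply_insertions("MKLVTEQRLA", {5: "GG", 9: "GG"}))  # MKLVGGTEQRGGLA
```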
SEQ ID NO: 8 wild-type Cas12i2 amino acid sequence
Figure PCTCN2020134249-appb-000004
SEQ ID NO: 14 Cas12i2-2.2 amino acid sequence
Figure PCTCN2020134249-appb-000005
SEQ ID NO: 18 Cas12i2-6.1 amino acid sequence
Figure PCTCN2020134249-appb-000006
SEQ ID NO: 19 Cas12i2-6.2 amino acid sequence
Figure PCTCN2020134249-appb-000007
Figure PCTCN2020134249-appb-000008
SEQ ID NO: 20 enCas12i2 (Cas12i2-2.2+6.1) amino acid sequence
Figure PCTCN2020134249-appb-000009
Type V-I CRISPR-Cas12i has been identified as an RNA-guided DNA endonuclease system. Unlike CRISPR-Cas systems such as Cas12b or Cas9, the Cas12i-based CRISPR system does not require a tracrRNA. In some embodiments, the RNA guide includes a crRNA. Generally, the crRNAs described herein include a direct repeat (DR) sequence and a spacer sequence. In certain embodiments, the crRNA includes, consists essentially of, or consists of a direct repeat sequence linked to a guide sequence or spacer sequence. In some embodiments, the crRNA includes a direct repeat sequence, a spacer sequence and a direct repeat sequence (DR-spacer-DR), which is typical of precursor crRNA (pre-crRNA) configurations in other CRISPR systems. In some embodiments, the crRNA includes a truncated direct repeat sequence and a spacer sequence, which is typical of a processed or mature crRNA. In some embodiments, the CRISPR-Cas effector protein forms a complex with the RNA guide, and the spacer sequence directs the complex to bind, in a sequence-specific manner, a target nucleic acid that is complementary to the spacer sequence.
In some embodiments, the engineered Cas12i of the present application is an endonuclease that, under the guidance of a guide RNA, binds to and cleaves a specific site in a target sequence, and it has both DNA and RNA endonuclease activity. In some embodiments, the Cas12i is capable of autonomous crRNA biogenesis by processing pre-crRNA arrays. Autonomous pre-crRNA processing facilitates Cas12i delivery for double-nicking applications, because two separate genomic loci can be targeted from a single crRNA transcript: the Cas12i protein processes the CRISPR array into two cognate crRNAs, resulting in the formation of paired nicking complexes. Multiplexing of Type V-I (Cas12i) effectors likewise relies on their pre-crRNA processing capability, so that multiple targets with different sequences can be programmed on a single RNA guide. As such, multiple genes or DNA targets can be manipulated simultaneously for therapeutic applications. In some embodiments, the guide RNA comprises a pre-crRNA expressed from a CRISPR array consisting of target sequences interleaved with unprocessed DR sequences, repeated to enable simultaneous targeting of one, two or more loci through the intrinsic pre-crRNA processing activity of the effector.
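The DR-spacer-DR layout of a pre-crRNA array and its processing into individual crRNAs can be modeled as string assembly and splitting. This is a simplified sketch: the direct repeat and spacers are placeholders (not the actual Cas12i2 sequences), the number of repeat nucleotides retained in each mature crRNA (`dr_keep`) is an assumed parameter, and placing the retained repeat 5′ of the spacer follows the usual type V crRNA layout rather than a statement in this section.

```python
from typing import List


def build_pre_crrna(direct_repeat: str, spacers: List[str]) -> str:
    """Assemble a DR-spacer-DR-...-spacer-DR pre-crRNA array."""
    for s in spacers:
        # Only checks containment within a spacer, not across junctions.
        if direct_repeat in s:
            raise ValueError("a spacer contains the direct repeat sequence")
    return direct_repeat + "".join(s + direct_repeat for s in spacers)


def process_pre_crrna(pre_crrna: str, direct_repeat: str, dr_keep: int) -> List[str]:
    """Crude model of processing: cut the array at each DR and keep the
    3'-most dr_keep nucleotides of the repeat on the 5' side of each spacer."""
    if not 0 <= dr_keep <= len(direct_repeat):
        raise ValueError("dr_keep must be between 0 and the DR length")
    handle = direct_repeat[len(direct_repeat) - dr_keep:]
    spacers = [piece for piece in pre_crrna.split(direct_repeat) if piece]
    return [handle + s for s in spacers]


DR = "GUUUCAAUCCACGC"  # placeholder, NOT the actual Cas12i2 direct repeat
SPACERS = ["ACGUACGUACGUACGUACGU", "UGCAUGCAUGCAUGCAUGCA"]  # placeholders
array = build_pre_crrna(DR, SPACERS)
for crrna in process_pre_crrna(array, DR, dr_keep=6):
    print(crrna)
```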
In some embodiments, the Type V-I CRISPR-Cas effector protein is capable of recognizing a protospacer adjacent motif (PAM), and the target nucleic acid includes a PAM comprising or consisting of the nucleic acid sequence 5′-TTN-3′ (SEQ ID NO: 21), 5′-TTH-3′ (SEQ ID NO: 22), 5′-TTY-3′ (SEQ ID NO: 23) or 5′-TTC-3′ (SEQ ID NO: 24).
Cas12i nucleases from various organisms may be used as the reference Cas12i nuclease to provide the engineered Cas12i effector proteins of the present application. Exemplary Cas12i nucleases have been described, for example, in WO2019/201331A1 and US2020/0063126A1, which are incorporated herein by reference in their entirety. In some embodiments, the reference Cas12i protein is enzymatically active. In some embodiments, the reference Cas12i is a nuclease, i.e., it cleaves both strands of a target duplex nucleic acid (e.g., duplex DNA). In some embodiments, the reference Cas12i is a nickase, i.e., it cleaves a single strand of a target duplex nucleic acid (e.g., duplex DNA). In some embodiments, the reference Cas12i protein is enzymatically inactive. In some embodiments, the reference Cas12i nuclease is Cas12i1 (e.g., SEQ ID NO: 9), Cas12i2 (e.g., SEQ ID NO: 8) or Cas12i-Phi (e.g., SEQ ID NO: 10). In some embodiments, the reference Cas12i nuclease comprises an amino acid sequence selected from the group consisting of SEQ ID NOs: 8-10. Orthologues having a certain sequence identity (e.g., at least about any one of 60%, 70%, 80%, 85%, 90%, 95%, 98% or higher) to Cas12i, or functional derivatives thereof, may be used as a basis to design the engineered Cas12i effector proteins of the present application.
In some embodiments, the engineered Cas12i proteins are based on a functional variant of a naturally occurring Cas12i protein. In some embodiments, the functional variant has one or more mutations, such as amino acid substitutions, insertions and deletions. By way of example, the functional variant may comprise any one of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more amino acid substitutions compared to a wild-type, naturally occurring Cas12i protein. In some embodiments, the one or more substitutions are conservative substitutions. In some embodiments, the functional variant has all domains of a naturally occurring Cas12i protein. In some embodiments, the functional variant lacks one or more domains of a naturally occurring Cas12i protein.
Also provided are engineered Cas12i effector proteins based on any one of the engineered Cas12i nucleases described herein. In some embodiments, the engineered Cas12i effector protein is enzymatically active. In some embodiments, the engineered Cas12i effector protein is a nuclease that cleaves both strands of a target duplex nucleic acid (e.g., duplex DNA). In some embodiments, the engineered Cas12i effector protein is a nickase, i.e., it cleaves a single strand of a target duplex nucleic acid (e.g., duplex DNA). In some embodiments, the engineered Cas12i effector protein comprises an enzymatically inactive mutant of the engineered Cas12i nuclease. Mutations at one or more amino acid residues in the active site of a Cas12i nuclease can yield an enzymatically dead Cas12i. In some embodiments, the engineered Cas12i enzymes provided herein can be modified to have diminished nuclease activity, e.g., a reduction in nuclease activity of at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 97%, or 100% compared with the wild-type Cas12i enzyme. Nuclease activity can be diminished in several ways, e.g., by introducing mutations into the nuclease or PAM-interacting domains of the Cas12i enzyme. In some embodiments, the catalytic residues responsible for nuclease activity are identified and substituted with different amino acid residues (e.g., glycine or alanine) to diminish nuclease activity. Examples of such mutations include D647A, E894A and D948A for Cas12i1, and D599A, E833A and D886A for Cas12i2.
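Substitutions such as D599A, E833A and D886A of Cas12i2 use 1-based numbering with the wild-type residue written first. The minimal sketch below applies such substitutions and refuses to proceed if the stated wild-type residue does not match the sequence, which guards against numbering errors. The function names are hypothetical, and the toy sequence stands in for SEQ ID NO: 8, which is not reproduced in text here.

```python
import re
from typing import Iterable

_AA = "ACDEFGHIKLMNPQRSTVWY"
_MUTATION = re.compile(rf"^([{_AA}])(\d+)([{_AA}])$")


def apply_substitution(seq: str, mutation: str) -> str:
    """Apply one substitution written like 'D599A' (1-based numbering),
    raising an error if the wild-type residue does not match."""
    m = _MUTATION.match(mutation)
    if m is None:
        raise ValueError(f"cannot parse mutation {mutation!r}")
    wt, pos, new = m.group(1), int(m.group(2)), m.group(3)
    if not 1 <= pos <= len(seq):
        raise ValueError(f"position {pos} is outside 1-{len(seq)}")
    if seq[pos - 1] != wt:
        raise ValueError(f"expected {wt} at {pos}, found {seq[pos - 1]}")
    return seq[:pos - 1] + new + seq[pos:]


def apply_substitutions(seq: str, mutations: Iterable[str]) -> str:
    """Apply several substitutions; order does not matter because a
    substitution never shifts the numbering of other residues."""
    for mutation in mutations:
        seq = apply_substitution(seq, mutation)
    return seq


print(apply_substitutions("MKDAE", ["D3A", "E5A"]))  # MKAAA
```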
In some embodiments, there is provided an engineered Cas12i nickase. In some embodiments, there is provided an engineered Cas12i fusion effector protein comprising an engineered Cas12i nuclease or functional derivative thereof (e.g., an enzymatically inactive mutant of the engineered Cas12i nuclease) fused to a functional domain, such as a translation initiator domain, a transcription repressor domain, a transactivation domain, an epigenetic modification domain, a nucleobase-editing domain (e.g., a CBE or ABE domain), a reverse transcriptase domain, a reporter domain (e.g., a fluorescent domain) or a nuclease domain. In some embodiments, there is provided an engineered Cas12i base editor comprising a catalytically inactive variant of any one of the engineered Cas12i nucleases described herein (e.g., enCas12i2 or SEQ ID NO: 20) fused to a cytosine deaminase domain or an adenosine deaminase domain. In some embodiments, there is provided an engineered Cas12i prime editor comprising a catalytically inactive variant of any one of the engineered Cas12i nucleases described herein (e.g., enCas12i2 or SEQ ID NO: 20) fused to a reverse transcriptase domain. In some embodiments, there is provided an engineered split Cas12i effector protein.
Also provided are engineered CRISPR-Cas12i systems comprising any one of the engineered Cas12i effector proteins described herein and a guide RNA comprising a guide sequence complementary to a target sequence, or one or more nucleic acids encoding the guide RNA, wherein the engineered Cas12i effector protein and the guide RNA are capable of forming a CRISPR complex that specifically binds to a target nucleic acid comprising the target sequence and induces a modification of the target nucleic acid. In some embodiments, the guide RNA comprises a crRNA. In some embodiments, a tracrRNA is not required. In some embodiments, the guide RNA comprises a pre-crRNA expressed from a CRISPR array consisting of target sequences interleaved with unprocessed DR sequences, repeated to enable simultaneous targeting of one, two or more loci through the intrinsic pre-crRNA processing activity of the effector. In some embodiments, the guide RNA is a precursor RNA array encoding a plurality of crRNAs. In some embodiments, the engineered Cas12i effector protein and/or the guide RNA are encoded by one or more vectors, such as AAV vectors. In some embodiments, the engineered CRISPR-Cas12i system is a ribonucleoprotein (RNP) complex comprising the engineered Cas12i effector protein bound to the guide RNA.
Engineered Cas9 effector proteins
The present application provides engineered Cas9 effector proteins that have improved activity (e.g., target binding, double-strand cleavage activity, nickase activity, and/or gene-editing activity).
In some embodiments, there is provided an engineered Cas9 nuclease comprising one or more mutations that increase the flexibility of a flexible region in a reference Cas9 nuclease, wherein the flexible region is selected from the group consisting of regions corresponding to amino acid residues 39-43, 135-139, 176-180, 274-278, 351-355, 389-393, 521-525, 541-545, 755-759, 774-778, 786-790, 811-815, 848-852, 855-859, 874-878, 891-895, 1019-1023 and 1036-1040, wherein the amino acid residue numbering is based on SEQ ID NO: 25, and wherein the engineered Cas9 nuclease has increased activity compared to the reference Cas9 nuclease. In some embodiments, the reference Cas9 nuclease is GeoCas9. In some embodiments, the flexible region in the reference Cas9 nuclease is selected from the group consisting of regions corresponding to amino acid residues 135-139, 176-180, 541-545, 755-759 and 811-815, wherein the amino acid residue numbering is based on SEQ ID NO: 25. In some embodiments, the one or more mutations comprise inserting one or more (e.g., 2) G residues into the flexible region. In some embodiments, the one or more G residues are inserted immediately N-terminal to a flexible amino acid residue in the flexible region, wherein the flexible amino acid residue is selected from the group consisting of G, S, N, D, H, M, T, E, Q, K, R, A and P. In some embodiments, the flexible amino acid residue is chosen according to the order of preference G > S > N > D > H > M > T > E > Q > K > R > A > P.
In some embodiments, the one or more mutations comprise substituting a hydrophobic amino acid residue in the flexible region with a G residue, wherein the hydrophobic amino acid residue is selected from the group consisting of L, I, V, C, Y, F and W. In some embodiments, the engineered Cas9 nuclease comprises one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 138-175. In some embodiments, the engineered Cas9 nuclease comprises one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 139, 140, 145, 146 and 153. In some embodiments, the engineered Cas9 nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more) sequence identity to SEQ ID NO: 25.
In some embodiments, there is provided an engineered GeoCas9 nuclease comprising the amino acid sequence of SEQ ID NO: 139. In some embodiments, the engineered Cas9 nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more) sequence identity to SEQ ID NO: 27. In some embodiments, there is provided an engineered Cas9 nuclease comprising the amino acid sequence of SEQ ID NO: 27.
In some embodiments, there is provided an engineered GeoCas9 nuclease comprising the amino acid sequence of SEQ ID NO: 140. In some embodiments, the engineered Cas9 nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more) sequence identity to SEQ ID NO: 28. In some embodiments, there is provided an engineered Cas9 nuclease comprising the amino acid sequence of SEQ ID NO: 28.
In some embodiments, there is provided an engineered GeoCas9 nuclease comprising the amino acid sequence of SEQ ID NO: 145. In some embodiments, the engineered Cas9 nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more) sequence identity to SEQ ID NO: 33. In some embodiments, there is provided an engineered Cas9 nuclease comprising the amino acid sequence of SEQ ID NO: 33.
In some embodiments, there is provided an engineered GeoCas9 nuclease comprising the amino acid sequence of SEQ ID NO: 146. In some embodiments, the engineered Cas9 nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more) sequence identity to SEQ ID NO: 34. In some embodiments, there is provided an engineered Cas9 nuclease comprising the amino acid sequence of SEQ ID NO: 34.
In some embodiments, there is provided an engineered GeoCas9 nuclease comprising the amino acid sequence of SEQ ID NO: 153. In some embodiments, the engineered Cas9 nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more) sequence identity to SEQ ID NO: 41. In some embodiments, there is provided an engineered Cas9 nuclease comprising the amino acid sequence of SEQ ID NO: 41.
SEQ ID NO: 25 wild-type GeoCas9 amino acid sequence
Figure PCTCN2020134249-appb-000010
Figure PCTCN2020134249-appb-000011
SEQ ID NO: 27 GeoCas9-2.1 amino acid sequence
Figure PCTCN2020134249-appb-000012
SEQ ID NO: 28 GeoCas9-3.1 amino acid sequence
Figure PCTCN2020134249-appb-000013
SEQ ID NO: 33 GeoCas9-8.1 amino acid sequence
Figure PCTCN2020134249-appb-000014
Figure PCTCN2020134249-appb-000015
SEQ ID NO: 34 GeoCas9-9.1 amino acid sequence
Figure PCTCN2020134249-appb-000016
SEQ ID NO: 41 GeoCas9-12.1 amino acid sequence
Figure PCTCN2020134249-appb-000017
Figure PCTCN2020134249-appb-000018
In some embodiments, there is provided an engineered Cas9 nuclease comprising one or more mutations that increase the flexibility of a flexible region in a reference Cas9 nuclease, wherein the flexible region is selected from the group consisting of regions corresponding to amino acid residues 45-49, 84-88, 116-120, 128-132, 216-220, 318-322, 387-391, 497-501, 583-587, 594-598, 614-618, 696-700 and 739-743, wherein the amino acid residue numbering is based on SEQ ID NO: 53, and wherein the engineered Cas9 nuclease has increased activity compared to the reference Cas9 nuclease. In some embodiments, the flexible region in the reference Cas9 nuclease corresponds to amino acid residues 45-49 or amino acid residues 116-120, wherein the amino acid residue numbering is based on SEQ ID NO: 53. In some embodiments, the reference Cas9 nuclease is SaCas9. In some embodiments, the one or more mutations comprise inserting one or more (e.g., 2) G residues into the flexible region. In some embodiments, the one or more G residues are inserted immediately N-terminal to a flexible amino acid residue in the flexible region, wherein the flexible amino acid residue is selected from the group consisting of G, S, N, D, H, M, T, E, Q, K, R, A and P. In some embodiments, the flexible amino acid residue is chosen according to the order of preference G > S > N > D > H > M > T > E > Q > K > R > A > P. In some embodiments, the one or more mutations comprise substituting a hydrophobic amino acid residue in the flexible region with a G residue, wherein the hydrophobic amino acid residue is selected from the group consisting of L, I, V, C, Y, F and W. In some embodiments, the engineered Cas9 nuclease comprises one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 199-217.
In some embodiments, the engineered Cas9 nuclease comprises one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 199, 203 and 204. In some embodiments, the engineered Cas9 nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more) sequence identity to SEQ ID NO: 53.
In some embodiments, there is provided an engineered SaCas9 nuclease comprising the amino acid sequence of SEQ ID NO: 199. In some embodiments, the engineered Cas9 nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more) sequence identity to SEQ ID NO: 54. In some embodiments, there is provided an engineered Cas9 nuclease comprising the amino acid sequence of SEQ ID NO: 54.
In some embodiments, there is provided an engineered SaCas9 nuclease comprising the amino acid sequence of SEQ ID NO: 203. In some embodiments, the engineered Cas9 nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more) sequence identity to SEQ ID NO: 58. In some embodiments, there is provided an engineered Cas9 nuclease comprising the amino acid sequence of SEQ ID NO: 58.
In some embodiments, there is provided an engineered SaCas9 nuclease comprising the amino acid sequence of SEQ ID NO: 204. In some embodiments, the engineered Cas9 nuclease comprises an amino acid sequence having at least about 85% (e.g., at least about any one of 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more) sequence identity to SEQ ID NO: 59. In some embodiments, there is provided an engineered Cas9 nuclease comprising the amino acid sequence of SEQ ID NO: 59.
SEQ ID NO: 53 wild-type SaCas9 amino acid sequence
Figure PCTCN2020134249-appb-000019
SEQ ID NO: 54 SaCas9-1.1 amino acid sequence
Figure PCTCN2020134249-appb-000020
Figure PCTCN2020134249-appb-000021
SEQ ID NO: 58 SaCas9-3.1 amino acid sequence
Figure PCTCN2020134249-appb-000022
SEQ ID NO: 59 SaCas9-3.2 amino acid sequence
Figure PCTCN2020134249-appb-000023
Figure PCTCN2020134249-appb-000024
The Type II CRISPR-Cas9 system is a dual-RNA-guided (i.e., crRNA and tracrRNA) DNA endonuclease system. The mature crRNA, base-paired to a trans-activating crRNA (tracrRNA), forms a two-RNA structure that directs the CRISPR-associated protein Cas9 to introduce double-stranded (ds) breaks in target DNA. At sites complementary to the crRNA guide sequence, the Cas9 HNH nuclease domain cleaves the complementary strand, whereas the Cas9 RuvC-like domain cleaves the noncomplementary strand. The dual tracrRNA:crRNA, when engineered as a single RNA chimera, also directs sequence-specific Cas9 dsDNA cleavage. The DNA-targeting RNA (also referred to herein as “crRNA”, “guide RNA” or “gRNA”) comprises: i) a first segment comprising a nucleotide sequence that is complementary to a target sequence in a target DNA; ii) a second segment that interacts with a site-directed polypeptide; and iii) a transcriptional terminator.
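The strand assignment described above (HNH cleaves the strand complementary to the guide; the RuvC-like domain cleaves the noncomplementary strand) can be checked with a few sequence operations. In this minimal sketch the protospacer is a hypothetical sequence written on the noncomplementary strand and the guide segment is its RNA equivalent; PAM requirements and the tracrRNA/scaffold segments are deliberately omitted, and the function names are illustrative only.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def spacer_rna(protospacer: str) -> str:
    """RNA guide segment that reads like the noncomplementary strand (T -> U)."""
    return protospacer.upper().replace("T", "U")


def base_pairs_with(spacer: str, strand: str) -> bool:
    """True if the RNA spacer (5'->3') pairs fully, antiparallel, with the
    DNA strand (5'->3')."""
    return reverse_complement(spacer.upper().replace("U", "T")) == strand.upper()


protospacer = "GACGCATAAAGATGAGACGC"  # hypothetical; noncomplementary strand
complementary = reverse_complement(protospacer)
guide = spacer_rna(protospacer)

# Strand assignment as stated in the text.
cut_by = {"HNH": complementary, "RuvC-like": protospacer}
print(guide)
print(base_pairs_with(guide, complementary))  # True
print(base_pairs_with(guide, protospacer))    # False
print(cut_by)
```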
Cas9 nucleases from various organisms may be used as the reference Cas9 nuclease to provide engineered Cas9 effector proteins of the present application. Exemplary Cas9 proteins have been described, for example, in US8697359, US10266850, and US20170145425, which are incorporated herein by reference in their entirety.
In some embodiments, the engineered Cas9 effector protein is based on a reference Cas9 protein (e.g., a Cas9 nuclease) selected from Cas9 proteins from Streptococcus pneumoniae (Csn1), Streptococcus pyogenes (SpCas9), Streptococcus thermophilus (StCas9), Staphylococcus aureus (SaCas9), Neisseria meningitidis (Nm2Cas9), Campylobacter jejuni (CjCas9), Geobacillus stearothermophilus (GeoCas9) and Treponema denticola (TdCas9), and may include mutated Cas9 derived from these organisms. In some embodiments, the reference Cas9 protein has properties that are desirable for certain applications, such as use in thermophilic organisms. For example, GeoCas9 is active at temperatures up to 70 °C, compared with 45 °C for Streptococcus pyogenes Cas9 (SpCas9). In some embodiments, the reference Cas9 nuclease comprises an amino acid sequence selected from the group consisting of SEQ ID NOs: 25, 53, 72 and 73. Orthologues having a certain sequence identity (e.g., at least about any one of 60%, 70%, 80%, 85%, 90%, 95%, 98% or higher) to Cas9, or functional derivatives thereof, may be used as a basis to design the engineered Cas9 effector proteins of the present application.
Naturally occurring Cas9 nucleases have various structural domains. In some embodiments, a reference Cas9 nuclease has a domain architecture with a central HNH endonuclease domain and a split RuvC/RNase H domain. In some embodiments, a reference Cas9 nuclease comprises four key motifs with a conserved architecture: motifs 1, 2 and 4 are RuvC-like motifs, while motif 3 is an HNH motif. Domain boundaries may be determined using methods known in the art, such as from crystal structures of a reference Cas9 nuclease (e.g., PDB IDs 5CZZ, 4OGC, 5X2G and 6JOO).
Also provided are engineered Cas9 effector proteins based on any one of the engineered Cas9 nucleases described herein. In some embodiments, the engineered Cas9 effector protein is enzymatically active. In some embodiments, the engineered Cas9 effector protein is a nuclease that cleaves both strands of a target duplex nucleic acid (e.g., duplex DNA). In some embodiments, the engineered Cas9 effector protein is a nickase, i.e., it cleaves a single strand of a target duplex nucleic acid (e.g., duplex DNA). For example, an aspartate-to-alanine substitution (D10A) in the RuvC I catalytic domain of Cas9 from S. pyogenes converts Cas9 from a nuclease that cleaves both strands into a nickase that cleaves a single strand. Other mutations that render Cas9 a nickase include, without limitation, H840A, N854A and N863A. In some embodiments, the engineered Cas9 effector protein comprises an enzymatically inactive mutant of the engineered Cas9 nuclease. Mutations at one or more amino acid residues in the active site of a Cas9 nuclease can yield an enzymatically dead Cas9. For example, two or more catalytic domains of Cas9 (RuvC I, RuvC II and RuvC III, or the HNH domain) may be mutated to produce a mutated Cas9 substantially lacking all DNA cleavage activity. In some embodiments, a D10A mutation is combined with one or more of the H840A, N854A or N863A mutations to produce a Cas9 enzyme substantially lacking all DNA cleavage activity. In some embodiments, a CRISPR enzyme is considered to substantially lack all DNA cleavage activity when the DNA cleavage activity of the mutated enzyme is less than about 25%, 10%, 5%, 1%, 0.1% or 0.01% of that of its non-mutated form, or lower. Other mutations may also be useful; where the Cas9 or other CRISPR enzyme is from a species other than S. pyogenes, mutations at the corresponding amino acids may be made to achieve similar effects.
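The nickase and catalytically dead combinations listed for SpCas9 can be summarized as a simple rule. The minimal sketch below is a rule-of-thumb classifier, not a substitute for measuring cleavage: it groups D10A with the RuvC domain and H840A, N854A and N863A with the HNH domain (where these residues lie in SpCas9), and it uses 25%, one of the thresholds listed above, as the default cutoff for "substantially lacking" cleavage activity. The function names are hypothetical.

```python
from typing import Iterable

# SpCas9 substitutions named in the text, grouped by the nuclease domain
# in which the mutated residue lies.
RUVC_INACTIVATING = {"D10A"}
HNH_INACTIVATING = {"H840A", "N854A", "N863A"}


def predicted_activity(mutations: Iterable[str]) -> str:
    """Rule of thumb: one domain inactivated -> nickase; both -> dead."""
    muts = set(mutations)
    ruvc_off = bool(muts & RUVC_INACTIVATING)
    hnh_off = bool(muts & HNH_INACTIVATING)
    if ruvc_off and hnh_off:
        return "dead"
    if ruvc_off or hnh_off:
        return "nickase"
    return "nuclease"


def substantially_lacks_cleavage(mutant: float, wild_type: float,
                                 threshold: float = 0.25) -> bool:
    """True if measured mutant activity is below threshold x wild-type activity."""
    if wild_type <= 0:
        raise ValueError("wild-type activity must be positive")
    return mutant / wild_type < threshold


print(predicted_activity(["D10A"]))             # nickase
print(predicted_activity(["D10A", "H840A"]))    # dead
print(substantially_lacks_cleavage(3.0, 100.0)) # True
```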
In some embodiments, there is provided an engineered Cas9 nickase. In some embodiments, there is provided an engineered Cas9 fusion effector protein, comprising an engineered Cas9 nuclease or functional derivative thereof (e.g., an enzymatically inactive mutant of the engineered Cas9 nuclease) fused to a functional domain, such as a translation initiator domain, a transcription repressor domain, a transactivation domain, an epigenetic modification domain, a nucleobase-editing domain (e.g., CBE or ABE domain), a reverse transcriptase domain, a reporter domain (e.g., a fluorescent domain), or a nuclease domain. In some embodiments, there is provided an engineered split Cas9 effector protein. Split Cas9 effector proteins have been described, for example, in WO2016/112242, which is incorporated herein by reference in its entirety.
Also provided are engineered CRISPR-Cas9 systems comprising any one of the engineered Cas9 effector proteins described herein, and a guide RNA comprising a guide sequence complementary to a target sequence, or one or more nucleic acids encoding the guide RNA, wherein the engineered Cas9 effector protein and the guide RNA are capable of forming a CRISPR complex that specifically binds to a target nucleic acid comprising the target sequence and induces a modification of the target nucleic acid. In some embodiments, the guide RNA comprises a crRNA and a tracrRNA. In some embodiments, the guide RNA is a sgRNA. In some embodiments, the guide RNA is a precursor that can be processed into a crRNA and a tracrRNA. In some embodiments, the guide RNA is a precursor RNA array encoding a plurality of crRNAs, wherein each processed crRNA is associated with a tracrRNA. In some embodiments, the engineered Cas9 effector protein and/or the guide RNA are encoded by one or more vectors, such as AAV vectors. In some embodiments, the engineered CRISPR-Cas9 system is a ribonucleoprotein (RNP) complex comprising the engineered Cas9 effector protein bound to the guide RNA.
Other engineered Cas effector proteins
The present application provides engineered Cas effector proteins (e.g., Cas nucleases, Cas nickases, Cas fusion effector proteins, split Cas effector proteins) that have improved activity (e.g., target binding, double-strand cleavage activity, nickase activity, and/or gene-editing activity). Suitable reference Cas proteins for engineering may include, for example, Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9, Csn1, Csx12, Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, Cpf1 (also known as Cas12a), CasX (Cas12e), Cas12c, Cas12d, Cas12g, Cas12k, Cas12i, Cas12f, Cas12j, Cas12w, Cas12v, CasY, CasZ, or a homolog or modified version thereof. For descriptions of exemplary reference Cas proteins, see the review articles: Zhang F. Development of CRISPR-Cas systems for genome editing and beyond. Quarterly Reviews of Biophysics. 2019;52:e6, 1-31; and Li Y & Peng N. Endogenous CRISPR-Cas system-based genome editing and antimicrobials: review and prospects. Frontiers in Microbiology. 2019;10:2471.
In some embodiments, the reference Cas nuclease is a Cas12a, Cas12b (such as any one of the reference proteins described in the “Engineered Cas12b effector proteins” subsection), Cas9 (such as any one of the reference proteins described in the “Engineered Cas9 effector proteins” subsection), Cas12i (such as any one of the reference proteins described in the “Engineered Cas12i effector proteins” subsection), Cas12f, Cas12j, or CasX (Cas12e).
In some embodiments, the reference protein is a Cas12a protein (previously known as Cpf1) of a Type V-A CRISPR-Cas system. Type V-A systems do not require a tracrRNA, which simplifies guide RNA design. Cas12a (Cpf1) nucleases from various organisms may be used as the reference Cas12a nuclease to provide engineered Cas12a effector proteins of the present application. Exemplary Cas12a nucleases have been described, for example, in US10648020, US10669540, US9790490, US20180282713, and WO2018188571, which are incorporated herein by reference in their entirety.
In some embodiments, the engineered Cas12a effector protein is based on a reference Cas12a protein (e.g., Cas12a nuclease) selected from Cas12a proteins from Prevotella and Francisella, such as Francisella novicida (FnCas12a or FnCpf1), or from Acidaminococcus (AsCas12a or AsCpf1) or Lachnospiraceae bacterium (LbCas12a or LbCpf1). In some embodiments, the reference Cas12a nuclease comprises an amino acid sequence selected from the group consisting of SEQ ID NOs: 74-75. Orthologues having a certain sequence identity (e.g., at least about any one of 60%, 70%, 80%, 85%, 90%, 95%, 98% or higher) to Cas12a, or functional derivatives thereof, may be used as a basis to design the engineered Cas12a effector proteins of the present application.
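Sequence-identity thresholds such as "at least about 60%" depend on how identity is counted. The Python sketch below assumes two sequences that are already aligned (gaps written as "-") and counts identical residues over all aligned columns, excluding columns that are gaps in both sequences; other conventions, such as dividing by the length of the reference sequence, give different numbers. The function name and the toy sequences are illustrative only and are not SEQ ID NOs: 74-75.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical positions between two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    # A column counts as identical only if both residues match and are not gaps.
    identical = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * identical / len(columns)


identity = percent_identity("MKT-LV", "MKSALV")
print(round(identity, 1))  # 66.7
print(identity >= 60.0)    # True: would meet a 60% threshold
```

The alignment itself (e.g., from a global alignment tool) is assumed to be produced beforehand; the choice of aligner and scoring matrix also affects the result.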
A crystal structure of the Acidaminococcus Cas12a-crRNA-target DNA complex has been described in Yamano T, Nishimasu H, Zetsche B, et al. Crystal structure of Cpf1 in complex with guide RNA and target DNA. Cell. 2016;165:949-962, and is available under PDB code 5B43. Crystal structures are also available for LbCas12a (LbCpf1) (e.g., PDB codes 5XUU, 5XH6, and 5XUT).
In some embodiments, the engineered Cas12a effector protein comprises one or more mutations that reduce or eliminate a nuclease activity. For example, suitable mutations in the FnCpf1 RuvC domain include, but are not limited to, D917A, E1006A, E1028A, D1227A, D1255A, and N1257A. In some embodiments, the point mutations that substantially reduce nuclease activity include mutations in a putative second nuclease domain, such as N580A, N584A, T587A, W609A, D610A, K613A, E614A, D616A, K624A, D625A, K627A, and Y629A. In another example, the mutations in the AsCpf1 RuvC domain are D908A, E993A, and D1263A, wherein the D908A, E993A, and D1263A mutations completely inactivate the DNA cleavage activity of the AsCpf1 effector protein. In some embodiments, the mutations in the LbCpf1 RuvC domain include, but are not limited to, mutations at positions 832, 947, or 1180. In some embodiments, the mutation in the LbCpf1 RuvC domain is D832A, E925A, D947A, or D1180A, wherein the D832A, E925A, D947A, or D1180A mutation completely inactivates the DNA cleavage activity of the Cas12a protein. Mutations in other engineered Cas12a proteins at positions corresponding to those described for FnCpf1 and LbCpf1 are contemplated herein.
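Substitutions such as D917A follow the usual notation: wild-type residue, 1-based position, replacement residue. The Python sketch below parses that notation and applies substitutions to a protein sequence, refusing to proceed if the stated wild-type residue is not found at that position. That check is a useful sanity test when transferring mutations to corresponding positions in another ortholog. The helper names and the toy sequence are hypothetical; prefixed forms such as "LbD832A" would need to be reduced to "D832A" first.

```python
import re

# Wild-type residue, 1-based position, replacement residue (e.g. "D917A").
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(mutation: str):
    """Split a substitution like 'D917A' into ('D', 917, 'A')."""
    match = _SUBSTITUTION.match(mutation)
    if match is None:
        raise ValueError(f"unrecognized substitution: {mutation!r}")
    wild_type, position, replacement = match.groups()
    return wild_type, int(position), replacement


def apply_substitutions(sequence: str, mutations) -> str:
    """Apply substitutions, verifying the expected wild-type residue first."""
    residues = list(sequence)
    for mutation in mutations:
        wild_type, position, replacement = parse_substitution(mutation)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{mutation}: position out of range")
        found = residues[position - 1]  # convert 1-based to 0-based index
        if found != wild_type:
            raise ValueError(f"{mutation}: expected {wild_type}, found {found}")
        residues[position - 1] = replacement
    return "".join(residues)


print(apply_substitutions("MDKEN", ["D2A", "E4A"]))  # MAKAN
```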
In some embodiments, the reference protein is a CasX (also known as Cas12e) protein. CasX nucleases from various organisms may be used as the reference CasX nuclease to provide engineered CasX effector proteins of the present application. In some embodiments, the engineered CasX protein of the subject methods and/or compositions is (or is derived from) a naturally occurring (wild type) protein. Exemplary CasX nucleases have been described, for example, in US10570415, WO2018/202800, and WO2019/084148, which are incorporated herein by reference in their entirety.
In some embodiments, the engineered CasX effector protein is based on a reference CasX protein (e.g., CasX nuclease) selected from CasX proteins from Deltaproteobacteria (DpbCasX or CasX1) and Planctomycetes (PlmCasX or CasX2). In some embodiments, the reference CasX nuclease comprises an amino acid sequence selected from the group consisting of SEQ ID NOs: 76-77. Orthologues having a certain sequence identity (e.g., at least about any one of 60%, 70%, 80%, 85%, 90%, 95%, 98% or higher) to CasX, or functional derivatives thereof, may be used as a basis to design the engineered CasX effector proteins of the present application.
In some embodiments, CasX proteins are short compared to previously identified CRISPR-Cas endonucleases, and thus use of this protein as an alternative provides the advantage that the nucleotide sequence encoding the protein is relatively short. This is useful, for example, in cases where a nucleic acid encoding the engineered CasX protein is desirable, e.g., in situations that employ a viral vector (e.g., an AAV vector) for delivery to a cell such as a eukaryotic cell (e.g., mammalian cell, human cell, mouse cell, in vitro, ex vivo, in vivo) for research and/or clinical applications. It is also noted that bacteria harboring naturally occurring CasX CRISPR loci were present in environmental samples that were collected at low temperature (e.g., 10-17℃). Thus, CasX is expected to be able to function well at low temperatures (e.g., 10-14℃, 10-17℃, 10-20℃).
Analysis of cryo-EM structures of a DpbCasX-sgRNA-DNA complex (PDB codes 6NY2, 6NY1, and 6NY3) has revealed non-target-strand binding (NTSB) and target-strand loading (TSL) domains (Liu J-J, Orlova N, Oakes BL, et al. CasX enzymes comprise a distinct family of RNA-guided genome editors. Nature. 2019;566:218-223). The TSL domain is located in a position analogous to that of the so-called ‘Nuc’ domain in Cas12a (PDB 5B43). These domains perform similar functions in CasX and Cas12a enzymes: after cleavage of the non-target DNA strand by the RuvC domain, they bend the sgRNA-DNA duplex. This conformational change allows the target DNA strand to be cleaved by the same RuvC domain. Thus, both Cas12e and Cas12a rely on a single nuclease domain for double-stranded DNA cleavage, in contrast to Cas9, which uses distinct domains, HNH and RuvC, to cleave each DNA strand. In Cas12e and Cas12a, a large structural change alters the accessibility of the DNA strands to the RuvC nuclease and in this way compensates for the lack of a second nuclease domain.
Both Cas12a and CasX (Cas12e) generate products with staggered ends. In contrast, Cas9 proteins mainly produce blunt ends. Interestingly, Jun-Jie Liu et al. found that DpbCasX produces staggered ends with overhangs about 10 nucleotides long, which are longer than the 3-5-nt overhangs usually produced by Cas12a proteins. The 5′ overhangs produced by Cas12a and CasX could potentially be used for in vivo or in vitro insertion of DNA fragments into the genome through direct DNA ligation.
In contrast to Cas12a, CasX enzymes require tracrRNA in addition to crRNA for DNA target recognition.
In some embodiments, the reference Cas nuclease is a Cas12f. Exemplary Cas12f nucleases have been disclosed, for example, in WO2020088450. In some embodiments, the reference Cas nuclease is a Cas12i. Exemplary Cas12i nucleases have been disclosed, for example, in WO2020098772.
In some embodiments, the reference Cas nuclease is a Cas12j (e.g., SEQ ID NO: 78).
Variants
The present application provides engineered Cas effector proteins which comprise functional variants of the engineered Cas nucleases described herein. In some embodiments, the functional variant has an amino acid sequence that is different by at least one amino acid residue (e.g., has a deletion, insertion, substitution, and/or fusion) when compared to the amino acid sequence of the corresponding engineered Cas nuclease. In some embodiments, the functional variant has one or more mutations, such as amino acid substitutions, insertions, and/or deletions. By way of example, the functional variant may comprise any one of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more amino acid substitutions compared to an engineered Cas nuclease. In some embodiments, the one or more substitutions are conservative substitutions. In some embodiments, the functional variant has all domains of an engineered Cas nuclease. In some embodiments, the functional variant does not have one or more domains of an engineered Cas nuclease.
For any of the Cas variant proteins described herein (e.g., nickase Cas protein, dead or catalytically inactive Cas (dCas), fusion Cas), the Cas variant can include a Cas protein sequence with the same parameters described above (e.g., domains that are present, percent identity, and the like).
Catalytic activity
In some embodiments, the functional variant has a catalytic activity different from that of the non-mutated engineered Cas nuclease. In some embodiments, the mutations (e.g., amino acid substitutions, insertions, and/or deletions) are in a catalytic domain of the Cas effector protein (e.g., a RuvC domain). In some embodiments, the variant comprises mutations in multiple catalytic domains. A Cas effector protein that cleaves one strand but not the other of a double-stranded target nucleic acid is referred to herein as a “nickase” (e.g., a “nickase Cas”). A Cas protein that has substantially no nuclease activity is referred to herein as a dead Cas protein (“dCas”), with the caveat that, in the case of a fusion Cas effector protein (described in more detail below), nuclease activity can be provided by a heterologous polypeptide serving as a fusion partner. In some embodiments, a Cas effector protein is considered to substantially lack all DNA cleavage activity when the DNA cleavage activity of the mutated enzyme is less than about 25%, 10%, 5%, 1%, 0.1%, 0.01%, or lower relative to its non-mutated form.
Exemplary mutations in Cas functional variants are described in the Cas12b, Cas12i, and Cas9 subsections above, and in WO2016205764, WO2020/087631, WO2019/201331A1, US2020/0063126A1, US8697359, US10266850, US20170145425, US10648020, US10669540, US9790490, US20180282713, WO2018188571, US10570415, WO2018/202800, and WO2019/084148, which are herein incorporated by reference in their entirety.
Split Cas effector proteins
The present application also provides split Cas effector proteins based on any one of the engineered Cas effector proteins described herein. The split Cas effector proteins may be advantageous for delivery. In some embodiments, the engineered Cas effector protein is split into two parts, which can be reconstituted to provide a substantially functioning Cas effector protein. Split versions of Cas effector proteins (e.g., Cas12 and Cas9 proteins) have been described, for example, in WO2016/112242, WO2016/205749, and PCT/CN2020/111057, which are herein incorporated by reference in their entirety.
In some embodiments, there is provided a split Cas effector protein, comprising a first polypeptide comprising an N-terminal portion of any one of the engineered Cas nucleases described herein or functional derivative thereof, and a second polypeptide comprising a C-terminal portion of the engineered Cas nuclease or functional derivative thereof, wherein the first polypeptide and the second polypeptide are capable of associating with each other in the presence of a guide RNA comprising a guide sequence to form a CRISPR complex that specifically binds to a target nucleic acid comprising a target sequence complementary to the guide sequence. In some embodiments, the first polypeptide and the second polypeptide each comprise a dimerization domain. In some embodiments, the first dimerization domain and the second dimerization domain associate with each other in the presence of an inducer (e.g., rapamycin). In some embodiments, the first polypeptide and the second polypeptide do not comprise dimerization domains. In some embodiments, the split Cas effector protein is auto-inducing.
The split can be made such that the catalytic domain(s) are unaffected. The Cas effector proteins may function as a nuclease (including a nickase) or may be inactivated enzymes, which are essentially RNA-guided DNA-binding proteins with very little or no catalytic activity (e.g., due to mutation(s) in their catalytic domains).
In some embodiments, the nuclease lobe and the α-helical lobe of a Cas protein are expressed as separate polypeptides. Although the lobes do not interact on their own, the guide RNA recruits them into a complex that recapitulates the activity of the full-length Cas enzyme and catalyzes site-specific DNA cleavage. In some embodiments, a modified guide RNA may be used to abrogate split-enzyme activity by preventing dimerization, allowing for the development of an inducible dimerization system. The split enzyme is described, e.g., in Wright, Addison V., et al. “Rational design of a split-Cas9 enzyme complex,” Proc. Natl. Acad. Sci., 112.10 (2015): 2984-2989, which is incorporated herein by reference in its entirety.
The split Cas effector protein portions described herein can be designed by dividing (i.e., splitting) a reference engineered Cas effector protein (e.g., a full-length engineered Cas12b, Cas12i, Cas9, Cas12a, or CasX effector protein or a functional variant thereof) into two halves at a split position, which is the point at which the N-terminal portion of the reference Cas effector protein is separated from the C-terminal portion. In some embodiments, the N-terminal portion comprises amino acid residues 1 to X, whilst the C-terminal portion comprises amino acid residues X+1 to the C-terminal end of the reference Cas effector protein. In this example, the numbering is contiguous, but this is not always necessary: amino acids (or the nucleotides encoding them) could be trimmed from either of the split ends, and/or mutations (e.g., insertions, deletions, and substitutions) at internal regions of the polypeptide chain(s) are also contemplated, provided that sufficient DNA binding activity and, if required, DNA nickase or cleavage activity of the reconstituted Cas effector protein is retained, for example at least 40%, 50%, 60%, 70%, 80%, 90%, or 95% activity compared to the reference Cas effector protein.
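The residue bookkeeping for a split at position X can be sketched in Python as follows, assuming contiguous numbering with no trimmed residues: the N-terminal portion is residues 1 to X and the C-terminal portion is residues X+1 to the C terminus, which in 0-based Python slicing is `seq[:X]` and `seq[X:]`. The retention check uses the lowest listed threshold (40%) as its default. The function names and the toy sequence are hypothetical.

```python
def split_at(sequence: str, x: int):
    """Return (residues 1..x, residues x+1..end) using 1-based numbering."""
    if not 1 <= x < len(sequence):
        raise ValueError("split position must leave both portions non-empty")
    return sequence[:x], sequence[x:]


def retains_activity(reconstituted: float, reference: float,
                     minimum: float = 0.40) -> bool:
    """True if the reconstituted protein keeps >= `minimum` of reference activity."""
    if reference <= 0:
        raise ValueError("reference activity must be positive")
    return reconstituted / reference >= minimum


n_term, c_term = split_at("MKRNYILGLAIG", 5)
print(n_term, c_term)               # MKRNY ILGLAIG
print(retains_activity(0.55, 1.0))  # True
```

Because Python slicing excludes the stop index, no residue is lost or duplicated at the junction, and concatenating the two portions recovers the original sequence.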
The split point may be designed in silico and cloned into the constructs. During this process, mutations can be introduced into the split Cas effector protein, and non-functional domains can be removed. In some embodiments, the two parts or fragments of the split Cas effector protein (i.e., the N-terminal and C-terminal fragments) can form a full Cas effector protein comprising, e.g., at least 70%, at least 80%, at least 90%, at least 95%, or at least 99% of the sequence of the full Cas effector protein.
The split Cas effector proteins may each comprise one or more dimerization domains. In some embodiments, the first polypeptide comprises a first dimerization domain fused to the first split Cas effector portion, and the second polypeptide comprises a second dimerization domain fused to the second split Cas effector portion. The dimerization domain may be fused to the split Cas effector portion via a peptide linker (e.g., a flexible peptide linker such as a GS linker) or a chemical bond. In some embodiments, the dimerization domain is fused to the N-terminus of the split Cas effector portion. In some embodiments, the dimerization domain is fused to the C-terminus of the split Cas effector portion.
In some embodiments, the split Cas effector proteins do not comprise any dimerization domains.
In some embodiments, the dimerization domains promote association of the two split Cas effector portions. In some embodiments, the split Cas effector portions are induced to associate or dimerize into a functional Cas effector protein by an inducer. In some embodiments, the split Cas effector proteins comprise inducible dimerization domains. In some embodiments, the dimerization domains are not inducible dimerization domains, i.e., the dimerization domains dimerize without the presence of an inducer.
An inducer may be an inducing energy source or an inducing molecule other than a guide RNA (e.g., a sgRNA). The inducer acts to reconstitute two split Cas effector portions into a functional Cas effector protein via induced dimerization of the dimerization domains. In some embodiments, the inducer brings the two split Cas effector portions together through the action of induced association of the inducible dimerization domains. In some embodiments, without the inducer, the two split Cas effector portions do not associate with each other to reconstitute into a functional Cas effector protein. In some embodiments, without the inducer, the two split Cas effector portions may associate with each other to reconstitute into a functional Cas effector protein in the presence of a guide RNA (e.g., a sgRNA).
The inducer of the present application may be heat, ultrasound, electromagnetic energy, or a chemical compound. In some embodiments, the inducer is an antibiotic, a small molecule, a hormone, a hormone derivative, a steroid, or a steroid derivative. In some embodiments, the inducer is abscisic acid (ABA), doxycycline (DOX), cumate, rapamycin, 4-hydroxytamoxifen (4OHT), estrogen, or ecdysone. In some embodiments, the split Cas effector system is an inducer-controlled system selected from the group consisting of antibiotic-based inducible systems, electromagnetic energy-based inducible systems, small molecule-based inducible systems, nuclear receptor-based inducible systems, and hormone-based inducible systems. In some embodiments, the split Cas effector system is an inducer-controlled system selected from the group consisting of tetracycline (Tet)/DOX inducible systems, light inducible systems, ABA inducible systems, cumate repressor/operator systems, 4OHT/estrogen inducible systems, ecdysone-based inducible systems, and FKBP12/FRAP (FKBP12-rapamycin complex) inducible systems. Such inducers are also discussed herein and in PCT/US2013/051418, which is incorporated herein by reference in its entirety. FRB/FKBP/rapamycin systems have been described in Paulmurugan and Gambhir, Cancer Res, August 15, 2005, 65:7413; and Crabtree et al., Chemistry & Biology 13, 99-107, January 2006, which are incorporated herein by reference in their entirety.
In some embodiments, the pair of split Cas effector proteins are separate and inactive until induced dimerization of the dimerization domains (e.g., FRB and FKBP), which results in reassembly of a functional Cas effector nuclease. In some embodiments, the first split Cas effector protein comprising a first half of an inducible dimer (e.g., FRB) is delivered separately and/or is localized separately from the second split Cas effector protein comprising a second half of an inducible dimer (e.g., FKBP).
Other exemplary inducible dimerization systems that may be used in the inducer-controlled split Cas effector systems described herein include, but are not limited to: FKBP, which dimerizes with calcineurin A (CNA) in the presence of FK506; FKBP, which dimerizes with CyP-Fas in the presence of FKCsA; FKBP, which dimerizes with FRB in the presence of rapamycin; GyrB, which dimerizes with GyrB in the presence of coumermycin; GAI, which dimerizes with GID1 in the presence of gibberellin; and Snap-tag, which dimerizes with HaloTag in the presence of HaXS.
Alternatives within the FKBP family itself are also contemplated, for example, FKBP, which homodimerizes (i.e., one FKBP dimerizes with another FKBP) in the presence of FK1012.
In some embodiments, the dimerization domain is FKBP and the inducer is FK1012. In some embodiments, the dimerization domain is GyrB and the inducer is coumermycin. In some embodiments, the dimerization domain is GAI and the inducer is gibberellin.
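The inducer and partner pairings listed above form a simple lookup table. The Python sketch below encodes them as inducer-to-partner mappings and checks whether two dimerization domains would be expected to associate for a given inducer; homodimerizing systems (GyrB with coumermycin, FKBP with FK1012) are stored as single-member sets. This is a bookkeeping illustration only, with hypothetical names, and it does not model guide RNA-mediated (auto-induced) association.

```python
# Inducer -> unordered set of dimerization partners, as listed in the text.
INDUCIBLE_PAIRS = {
    "FK506": frozenset({"FKBP", "CNA"}),
    "FKCsA": frozenset({"FKBP", "CyP-Fas"}),
    "rapamycin": frozenset({"FKBP", "FRB"}),
    "coumermycin": frozenset({"GyrB"}),      # GyrB homodimer
    "gibberellin": frozenset({"GAI", "GID1"}),
    "HaXS": frozenset({"Snap-tag", "HaloTag"}),
    "FK1012": frozenset({"FKBP"}),           # FKBP homodimer
}


def reconstitutes(domain_a, domain_b, inducer):
    """True if the two domains are the listed partner pair for this inducer."""
    partners = INDUCIBLE_PAIRS.get(inducer)
    # frozenset({"X", "X"}) == frozenset({"X"}), so homodimers match naturally.
    return partners is not None and frozenset({domain_a, domain_b}) == partners


print(reconstitutes("FRB", "FKBP", "rapamycin"))  # True
print(reconstitutes("FRB", "FKBP", None))         # False: no inducer present
```

Using unordered sets means the order in which the two split portions carry their dimerization domains does not matter, mirroring the N-/C-terminal placement options described above.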
In some embodiments, the split Cas effector portions may be auto-induced (i.e., auto-activated or self-induced) to associate/dimerize into a functional Cas effector protein without the presence of an inducer. Without being bound by any theory or hypothesis, auto-induction of the split Cas effector portions may be mediated by binding to a guide RNA, such as sgRNA. In some embodiments, the first polypeptide and the second polypeptide do not comprise dimerization domains. In some embodiments, the first polypeptide and the second polypeptide comprise dimerization domains.
In some embodiments, the reconstituted Cas effector protein of the split Cas effector systems described herein (including inducer-controlled and auto-inducible systems) has an editing efficiency of at least 70% (such as at least about any of 70%, 75%, 80%, 85%, 90%, 95%, 98%, 99% or more, or 100%) of the editing efficiency of the reference Cas effector protein.
In some embodiments, in the absence of an inducer (i.e., due to auto-induction alone), the reconstituted Cas effector protein of an inducer-controlled split Cas effector system described herein has an editing efficiency of no more than 50% (such as no more than about any of 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, 5%, or less, or 0%) of the editing efficiency of the reference Cas effector protein.
Fusion Cas effector proteins
The present application also provides engineered Cas effector proteins comprising additional protein domains and/or components, such as linkers, nuclear localization/exportation sequences, functional domains, and/or reporter proteins.
In some embodiments, the engineered Cas effector protein is a protein complex comprising one or more heterologous protein domains (e.g., about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more domains) in addition to the nucleic acid-targeting domains of the engineered Cas nuclease or functional derivative thereof. In some embodiments, the engineered Cas effector protein is a fusion protein comprising one or more heterologous protein domains (e.g., about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more domains) fused to the engineered Cas nuclease.
In some embodiments, the engineered Cas effector proteins of the present application can comprise (e.g., as a fusion protein, such as via one or more peptide linkers, for example, GS peptide linkers) or be associated with (e.g., via co-expression of multiple proteins) one or more functional domains. In some embodiments, the one or more functional domains are enzymatic domains. These functional domains can have various activities, e.g., DNA and/or RNA methylase activity, demethylase activity, transcription activation activity, transcription repression activity, transcription release factor activity, histone modification activity, RNA cleavage activity, DNA cleavage activity, nucleic acid binding activity, and switch activity (e.g., light inducible). In some embodiments, the one or more functional domains are transcriptional activation domains (i.e., transactivation domains) or repressor domains. In some embodiments, the one or more functional domains are histone-modifying domains. In some embodiments, the one or more functional domains are transposase domains, HR (homologous recombination) machinery domains, recombinase domains, and/or integrase domains. In some embodiments, the functional domains are selected from Krüppel associated box (KRAB), VP64, VP16, Fok1, P65, HSF1, MyoD1, biotin-APEX, APOBEC1, AID, PmCDA1, Tad1, and M-MLV reverse transcriptase. In some embodiments, the functional domain is selected from the group consisting of a translation initiator domain, a transcription repressor domain, a transactivation domain, an epigenetic modification domain, a nucleobase-editing domain (e.g., CBE or ABE domain), a reverse transcriptase domain, a reporter domain (e.g., a fluorescent domain), and a nuclease domain.
In some embodiments, the one or more functional domains are positioned in the engineered Cas effector protein so that they have the correct spatial orientation to exert their functional effects on the target. For example, if the functional domain is a transcription activator (e.g., VP16, VP64, or p65), the transcription activator is placed in a spatial orientation that allows it to affect the transcription of the target. Likewise, a transcription repressor is positioned to affect the transcription of the target, and a nuclease (e.g., Fok1) is positioned to cleave or partially cleave the target. In some embodiments, the functional domain is positioned at the N-terminus of the engineered Cas effector protein. In some embodiments, the functional domain is positioned at the C-terminus of the engineered Cas effector protein. In some embodiments, the engineered Cas effector protein comprises a first functional domain at the N-terminus and a second functional domain at the C-terminus. In some embodiments, the engineered Cas effector protein comprises a catalytically inactive mutant of any one of the engineered Cas nucleases described herein fused to one or more functional domains.
In some embodiments, the engineered Cas effector protein is a transcriptional activator. In some embodiments, the engineered Cas effector protein comprises an enzymatically inactive variant of any one of the engineered Cas nucleases described herein fused to a transactivation domain. In some embodiments, the transactivation domain is selected from the group consisting of VP64, p65, HSF1, VP16, MyoD1, RTA, SET7/9, and combinations thereof. In some embodiments, the transactivation domain comprises VP64, p65, and HSF1. In some embodiments, the engineered Cas effector protein comprises two split Cas effector polypeptides, each fused to a transactivation domain.
In some embodiments, the engineered Cas effector protein is a transcriptional repressor. In some embodiments, the engineered Cas effector protein comprises an enzymatically inactive variant of any one of the engineered Cas nucleases described herein fused to a transcription repressor domain. In some embodiments, the transcription repressor domain is selected from the group consisting of Krüppel associated box (KRAB), EnR, NuE, NcoR, SID, SID4X, and combinations thereof. In some embodiments, the engineered Cas effector protein comprises two split Cas effector polypeptides, each fused to a transcription repressor domain.
In some embodiments, the engineered Cas effector protein is a base editor, such as a cytosine base editor or an adenosine base editor. In some embodiments, the engineered Cas effector protein comprises an enzymatically inactive variant of any one of the engineered Cas nucleases described herein fused to a nucleobase-editing domain, such as a cytosine base editing (CBE) domain or an adenosine base editing (ABE) domain. In some embodiments, the nucleobase-editing domain is a DNA-editing domain. In some embodiments, the nucleobase-editing domain has deaminase activity. In some embodiments, the nucleobase-editing domain is a cytosine deaminase domain. In some embodiments, the nucleobase-editing domain is an adenosine deaminase domain. Exemplary base editors based on Cas nucleases have been described, for example, in WO2018/165629A1 and WO2019/226953A1, which are incorporated herein by reference in their entirety. Exemplary CBE domains include, but are not limited to, activation-induced cytidine deaminase or AID (e.g., hAID), apolipoprotein B mRNA-editing enzyme, catalytic polypeptide-like or APOBEC (e.g., rat APOBEC1, hAPOBEC3A/B/C/D/E/F/G), and PmCDA1. Exemplary ABE domains include, but are not limited to, TadA, ABE8, and variants thereof (see, e.g., Gaudelli et al., 2017, Nature 551: 464-471; and Richter et al., 2020, Nature Biotechnology 38: 883-891). In some embodiments, the functional domain is an APOBEC1 domain, e.g., a rat APOBEC1 domain comprising the amino acid sequence of SEQ ID NO: 218. In some embodiments, the functional domain is a TadA domain, e.g., an E. coli TadA domain comprising the amino acid sequence of SEQ ID NO: 219. In some embodiments, the engineered Cas effector protein further comprises one or more nuclear localization sequences.
In some embodiments, the engineered Cas effector protein is a prime editor. Prime editors based on Cas9 have been described, for example, in A. Anzalone et al., Nature, 2019, 576(7785): 149-157, which is incorporated herein by reference in its entirety. In some embodiments, the engineered Cas effector protein comprises a nickase variant of any one of the engineered Cas nucleases described herein fused to a reverse transcriptase domain. In some embodiments, the functional domain is a reverse transcriptase domain. In some embodiments, the reverse transcriptase domain is an M-MLV reverse transcriptase or a variant thereof, e.g., an M-MLV reverse transcriptase having one or more of the mutations D200N, T306K, W313F, T330P, and L603W. In some embodiments, the reverse transcriptase domain comprises the amino acid sequence of SEQ ID NO: 220 or 221. In some embodiments, there is provided an engineered CRISPR/Cas system comprising the prime editor. In some embodiments, the engineered CRISPR/Cas system further comprises a second Cas nickase, e.g., based on the same engineered Cas nuclease as the prime editor. In some embodiments, the engineered CRISPR/Cas system comprises a prime editor guide RNA (pegRNA), which comprises a primer binding site and a reverse transcriptase (RT) template sequence.
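The primer binding site (PBS) and RT template of a pegRNA follow from base pairing with the nicked strand. The Python sketch below assumes the convention used for Cas9 prime editors: the PBS anneals to the bases immediately 5′ of the nick on the nicked strand, the RT template encodes the desired sequence synthesized from the nick onward, and the 3′ extension reads RT template then PBS (5′ to 3′). The nick index, the 13-nt default PBS length, and the function names are assumptions for illustration; the nick position for any engineered Cas nickase would need to be determined separately, and sequences are written as DNA letters (T in place of U).

```python
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def pegrna_extension(nicked_strand: str, nick: int, edited_downstream: str,
                     pbs_len: int = 13) -> str:
    """3' extension (5'->3') of a pegRNA: RT template followed by PBS.

    nicked_strand: 5'->3' sequence of the strand that is nicked.
    nick: the nick lies between nicked_strand[nick - 1] and nicked_strand[nick].
    edited_downstream: desired 5'->3' sequence to be synthesized from the nick,
        including the edit.
    """
    if not pbs_len <= nick <= len(nicked_strand):
        raise ValueError("not enough sequence 5' of the nick for the PBS")
    # The PBS pairs with the free 3' end of the nicked strand.
    pbs = reverse_complement(nicked_strand[nick - pbs_len:nick])
    # Reverse transcription copies this template into edited_downstream.
    rt_template = reverse_complement(edited_downstream)
    return rt_template + pbs


print(pegrna_extension("GACTGACTGACTGACT", 10, "GG", pbs_len=4))  # CCTCAG
```

Placing the RT template 5′ of the PBS reflects that reverse transcription starts at the primed 3′ end of the nicked strand and proceeds toward the 5′ end of the pegRNA.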
In some embodiments, the present application provides a split Cas effector system having one or more (e.g., 1, 2, 3, 4, 5, 6, or more) functional domains associated with (i.e., bound to or fused to) one or both split Cas effector portions. The functional domain(s) may be provided as part of the first and/or second split Cas effector proteins, as fusions within that construct. The functional domains are typically fused to other parts of the split Cas effector proteins (e.g., the split Cas effector portions) via a peptide linker, such as a GS linker. When the split Cas effector system is based on a catalytically dead Cas effector, the functional domains can be used to repurpose its function.
In some embodiments, the engineered Cas effector proteins comprise one or more nuclear localization sequences (NLSs) and/or one or more nuclear export sequences (NESs). Exemplary NLS sequences include, for example, PKKKRKVPG (SEQ ID NO: 79) and ASPKKKRKV (SEQ ID NO: 80). The NLS(s) and/or NES(s) may be operably linked to the N-terminus and/or the C-terminus of the engineered Cas effector proteins or of the polypeptide chains in the engineered Cas effector proteins.
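A minimal Python sketch of how NLS tags and a peptide linker might be joined to the termini of a Cas effector sequence at the protein level is shown below. The two NLS sequences are those listed above (SEQ ID NOs: 79 and 80); the (GGGGS)n repeat unit used for the GS linker, the placeholder core sequence in the usage example, and the helper names are assumptions for illustration.

```python
NLS_SEQ_79 = "PKKKRKVPG"  # NLS listed above as SEQ ID NO: 79
NLS_SEQ_80 = "ASPKKKRKV"  # NLS listed above as SEQ ID NO: 80


def gs_linker(repeats=1):
    """Return a (GGGGS)n linker; the repeat unit is an assumption for this sketch."""
    return "GGGGS" * repeats


def assemble_fusion(core, n_terminal=(), c_terminal=(), linker_repeats=1):
    """Join N-terminal tags, the core protein and C-terminal tags with GS linkers."""
    parts = [*n_terminal, core, *c_terminal]
    protein = gs_linker(linker_repeats).join(parts)
    # Translation needs an initiator methionine; add one if the first part lacks it.
    return protein if protein.startswith("M") else "M" + protein
```

For example, `assemble_fusion("CAS", n_terminal=[NLS_SEQ_79], c_terminal=[NLS_SEQ_80])` places one NLS at each terminus of the hypothetical core "CAS", each separated by a single GGGGS unit, and prepends a methionine.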
In some embodiments, the engineered Cas effector proteins may comprise additional components, such as reporter proteins. In some embodiments, the engineered Cas effector protein comprises a fluorescent protein, e.g., GFP. Such a system could allow imaging of genomic loci (see, for example, Chen B et al., “Dynamic Imaging of Genomic Loci in Living Human Cells by an Optimized CRISPR/Cas System,” Cell 2013). In some embodiments, the engineered Cas effector protein is part of an inducible split Cas effector system that can be used to image genomic loci.
Engineered CRISPR-Cas systems
Also provided are engineered CRISPR-Cas systems comprising: (a) any one of the engineered Cas effector proteins (e.g., engineered Cas nuclease, nickase, split Cas, transcriptional repressor, transcriptional activator, base editor, or prime editor) described herein; and (b) a guide RNA comprising a guide sequence complementary to a target sequence, or one or more nucleic acids encoding the guide RNA, wherein the engineered Cas effector protein and the guide RNA are capable of forming a CRISPR complex that specifically binds to a target nucleic acid comprising the target sequence and induces a modification of the target nucleic acid. In some embodiments, the engineered CRISPR-Cas system comprises one or more nucleic acids encoding the engineered Cas effector protein and/or the guide RNA. In some embodiments, the engineered CRISPR-Cas system comprises a precursor guide RNA array that can be processed, e.g., by the engineered Cas effector protein, into a plurality of crRNAs. In some embodiments, the engineered CRISPR-Cas system comprises one or more vectors encoding the engineered Cas effector protein and/or the guide RNA. In some embodiments, the engineered CRISPR-Cas system comprises a ribonucleoprotein (RNP) complex comprising the engineered Cas effector protein bound to the guide RNA.
The engineered CRISPR-Cas systems of the present application may comprise any suitable guide RNAs. A guide RNA (gRNA) may comprise a guide sequence capable of hybridizing to a target sequence in a target nucleic acid of interest, such as a genomic locus of interest in a cell. In some embodiments, the gRNA comprises a CRISPR RNA (crRNA) sequence comprising the guide sequence. In some embodiments, the gRNA comprises a trans-activating CRISPR RNA (tracrRNA) sequence. In some embodiments, the guide RNA is a single-guide RNA (sgRNA). In some embodiments, the sgRNA comprises a tracrRNA and a crRNA.
In some embodiments, the CRISPR-Cas systems provided herein do not require tracrRNA sequences (e.g., CRISPR-Cas12i or CRISPR-Cas12a systems). In some embodiments, the guide RNA comprises a crRNA. Generally, the crRNAs described herein include a direct repeat sequence and a spacer sequence. In certain embodiments, the crRNA includes, consists essentially of, or consists of a direct repeat sequence linked to a guide sequence or spacer sequence. In some embodiments, the crRNA includes a direct repeat sequence, a spacer sequence, and a direct repeat sequence (DR-spacer-DR), which is typical of precursor crRNA (pre-crRNA) configurations. In some embodiments, the crRNA includes a truncated direct repeat sequence and a spacer sequence, which is typical of processed or mature crRNA. In some embodiments, the CRISPR-Cas effector protein forms a complex with the guide RNA, and the spacer sequence directs the complex to bind, in a sequence-specific manner, to the target nucleic acid that is complementary to the spacer sequence.
Some Type V Cas enzymes, such as Cas12c/d enzymes, require a scoutRNA. See Harrington et al., 2020, Mol. Cell, 79: 1-9. In some embodiments, the guide RNA comprises a crRNA and a scoutRNA.
In some embodiments, the guide RNA is a crRNA comprising the guide sequence. In some embodiments, the engineered CRISPR-Cas system comprises a precursor guide RNA array encoding a plurality of crRNAs. In some embodiments, the Cas effector protein cleaves the precursor guide RNA array to produce a plurality of crRNAs. In some embodiments, the engineered CRISPR-Cas system comprises a precursor guide RNA array encoding a plurality of crRNAs, wherein each crRNA comprises a different guide sequence. In some embodiments, the crRNA encoded by the precursor guide RNA array is associated with a tracrRNA or scoutRNA.
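The relationship between a precursor guide RNA array (DR-spacer-DR) and the mature crRNAs it yields can be sketched in Python. The direct repeat and spacers in the usage example are hypothetical placeholders (not SEQ ID NO sequences from this application), and processing is simplified: each mature crRNA is assumed to retain a fixed-length 3' portion of the preceding repeat (`handle_len`, a hypothetical parameter) followed by the full spacer, whereas the actual cleavage position depends on the Cas effector protein.

```python
def build_array(direct_repeat, spacers):
    """Assemble a DR-spacer-DR-...-spacer-DR precursor array (5'-to-3')."""
    return direct_repeat + "".join(spacer + direct_repeat for spacer in spacers)


def process_array(array, direct_repeat, handle_len):
    """Split a precursor array into mature crRNAs (simplified model).

    Each mature crRNA is modeled as the 3'-terminal `handle_len` nucleotides of
    the preceding repeat (a truncated DR) followed by the full spacer.
    """
    if not 1 <= handle_len <= len(direct_repeat):
        raise ValueError("handle_len must be between 1 and the repeat length")
    if not (array.startswith(direct_repeat) and array.endswith(direct_repeat)):
        raise ValueError("array must begin and end with the direct repeat")
    handle = direct_repeat[len(direct_repeat) - handle_len:]
    spacers = [piece for piece in array.split(direct_repeat) if piece]
    return [handle + spacer for spacer in spacers]
```

With the placeholder repeat "GGAACC" and two spacers, `build_array` yields a three-repeat precursor, and `process_array(..., handle_len=3)` returns two crRNAs, each beginning with the truncated repeat "ACC".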
The guide sequence may have any suitable length. In some embodiments, the guide sequence is about 18 to about 35 nucleotides long, including, for example, any one of 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, or 35 nucleotides. The guide sequence may have at least about 70%, 75%, 80%, 85%, 90%, 95%, 98%, 99%, or 100% complementarity to a target sequence of the target nucleic acid.
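A small Python check of the two parameters in this paragraph, guide length and percent complementarity, might look as follows. It assumes the guide is given 5'-to-3' (RNA or DNA letters) and the target strand is given 5'-to-3' over the same length, so that a perfectly complementary guide equals the reverse complement of the target; the 18 to 35 nucleotide bounds mirror the range above, and the function name is hypothetical.

```python
# Watson-Crick complements for DNA letters.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def guide_report(guide, target, min_len=18, max_len=35):
    """Return (length_ok, percent_complementarity) for a guide/target pair.

    guide: 5'-to-3', RNA or DNA letters.
    target: the strand the guide hybridizes to, 5'-to-3', same length as guide.
    """
    guide_dna = guide.upper().replace("U", "T")
    target = target.upper()
    if len(guide_dna) != len(target):
        raise ValueError("guide and target must have the same length")
    # Antiparallel pairing: a perfect guide equals the target's reverse complement.
    expected = target.translate(DNA_COMPLEMENT)[::-1]
    matches = sum(g == e for g, e in zip(guide_dna, expected))
    percent = 100.0 * matches / len(guide_dna)
    return min_len <= len(guide_dna) <= max_len, percent
```

A 20-nucleotide guide with a single mismatch, for instance, is reported as within the length range and 95% complementary, which satisfies every threshold listed above except 98%, 99%, and 100%.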
Constructs and vectors
Also provided herein are constructs, vectors and expression systems encoding any one of the engineered Cas effector proteins described herein. In some embodiments, the construct, vector, or expression system further comprises one or more gRNAs (e.g., sgRNAs) or crRNA arrays.
A "vector" is a composition of matter which comprises an isolated nucleic acid and which can be used to deliver the isolated nucleic acid to the interior of a cell. Numerous vectors are known in the art including, but not limited to, linear polynucleotides, polynucleotides associated with ionic or amphiphilic compounds, plasmids, and viruses. In general, a suitable vector contains an origin of replication functional in at least one organism, a promoter sequence, convenient restriction endonuclease sites, and one or more selectable markers. The term “vector” should also be construed to include non-plasmid and non-viral compounds, which facilitate transfer of nucleic acid into cells, such as, for example, polylysine compounds, liposomes, and the like.
In some embodiments, the vector is a viral vector. Examples of viral vectors include, but are not limited to, adenoviral vectors, adeno-associated virus vectors, lentiviral vectors, retroviral vectors, vaccinia vectors, herpes simplex viral vectors, and derivatives thereof. In some embodiments, the vector is a phage vector. Viral vector technology is well known in the art and is described, for example, in Sambrook et al. (2001, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, New York), and in other virology and molecular biology manuals.
A number of viral based systems have been developed for gene transfer into mammalian cells. For example, retroviruses provide a convenient platform for gene delivery systems. The heterologous nucleic acid can be inserted into a vector and packaged in retroviral particles using techniques known in the art. The recombinant virus can then be isolated and delivered to the engineered mammalian cell in vitro or ex vivo. A number of retroviral systems are known in the art. In some embodiments, adenovirus vectors are used. A number of adenovirus vectors are known in the art. In some embodiments, lentivirus vectors are used. In some embodiments, self-inactivating lentiviral vectors are used.
In certain embodiments, the vector is an adeno-associated virus (AAV) vector, e.g., AAV2, AAV8, or AAV9, which can be administered in a single dose containing at least 1×10⁵ particles (also referred to as particle units, pu) of adenoviruses or adeno-associated viruses. In some embodiments, the dose is at least about 1×10⁶ particles, at least about 1×10⁷ particles, at least about 1×10⁸ particles, or at least about 1×10⁹ particles of the adeno-associated viruses. The delivery methods and the doses are described, e.g., in WO 2016205764 and U.S. Pat. No. 8,454,972, both of which are incorporated herein by reference in their entirety.
In some embodiments, the vector is a recombinant adeno-associated virus (rAAV) vector. For example, in some embodiments, a modified AAV vector may be used for delivery. Modified AAV vectors can be based on one or more of several capsid types, including AAV1, AAV2, AAV5, AAV6, AAV8, AAV8.2, AAV9, AAVrh10, modified AAV vectors (e.g., modified AAV2, modified AAV3, modified AAV6), and pseudotyped AAV (e.g., AAV2/8, AAV2/5, and AAV2/6). Exemplary AAV vectors and techniques that may be used to produce rAAV particles are known in the art (see, e.g., Aponte-Ubillus et al. (2018) Appl. Microbiol. Biotechnol. 102(3): 1045-54; Zhong et al. (2012) J. Genet. Syndr. Gene Ther. S1: 008; West et al. (1987) Virology 160: 38-47; Tratschin et al. (1985) Mol. Cell. Biol. 5: 3251-60; U.S. Pat. Nos. 4,797,368 and 5,173,414; and International Publication Nos. WO 2015/054653 and WO 93/24641, each of which is incorporated by reference).
Any one of the known AAV vectors for delivering Cas9 and other Cas proteins may be used for delivery of the engineered Cas systems of the present application.
Methods of introducing vectors into a mammalian cell are known in the art. The vectors can be transferred into a host cell by physical, chemical, or biological methods.
Physical methods for introducing the vector into a host cell include calcium phosphate precipitation, lipofection, particle bombardment, microinjection, electroporation, and the like. Methods for producing cells comprising vectors and/or exogenous nucleic acids are well known in the art. See, for example, Sambrook et al. (2001) Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, New York. In some embodiments, the vector is introduced into the cell by electroporation.
Biological methods for introducing the heterologous nucleic acid into a host cell include the use of DNA and RNA vectors. Viral vectors have become the most widely used method for inserting genes into mammalian, e.g., human cells.
Chemical means for introducing the vector into a host cell include colloidal dispersion systems, such as macromolecule complexes, nanocapsules, microspheres, beads, and lipid-based systems including oil-in-water emulsions, micelles, mixed micelles, and liposomes. An exemplary colloidal system for use as a delivery vehicle in vitro is a liposome (e.g., an artificial membrane vesicle) . In some embodiments, the engineered CRISPR-Cas system is delivered as an RNP in a nanoparticle.
In some embodiments, the vector (s) or expression system encoding the CRISPR-Cas systems or components thereof comprise one or more selectable or detectable markers that provide a means to isolate or efficiently select cells that contain and/or have been modified by the CRISPR-Cas system, e.g., at an early stage and on a large scale.
Reporter genes may be used for identifying potentially transfected cells and for evaluating the functionality of regulatory sequences. In general, a reporter gene is a gene that is not present in or expressed by the recipient organism or tissue and that encodes a polypeptide whose expression is manifested by some easily detectable property, e.g., enzymatic activity. Expression of the reporter gene is assayed at a suitable time after the DNA has been introduced into the recipient cells. Suitable reporter genes may include genes encoding luciferase, beta-galactosidase, chloramphenicol acetyl transferase, secreted alkaline phosphatase, or the green fluorescent protein gene (e.g., Ui-Tei et al. FEBS Letters 479: 79-82 (2000) ) .
Other methods to confirm the presence of the heterologous nucleic acid in a host cell include, for example, molecular biological assays well known to those of skill in the art, such as Southern and Northern blotting, RT-PCR, and PCR; and biochemical assays, such as detecting the presence or absence of a particular peptide, e.g., by immunological methods (such as ELISAs and Western blots).
In some embodiments, the nucleic acid sequences encoding the engineered Cas effector protein(s) and/or the guide RNA are operably linked to a promoter. In some embodiments, the promoter is an endogenous promoter with respect to a cell that is engineered using the engineered CRISPR-Cas system. For example, the nucleic acid encoding the engineered Cas effector protein may be knocked into the genome of an engineered mammalian cell downstream of an endogenous promoter using any method known in the art. In some embodiments, the endogenous promoter is a promoter for an abundant protein, such as beta-actin. In some embodiments, the endogenous promoter is an inducible promoter, for example, one inducible by an endogenous activation signal of an engineered mammalian cell. In some embodiments where the engineered mammalian cell is a T cell, the promoter is a T cell activation-dependent promoter (such as an IL-2 promoter, an NFAT promoter, or an NFκB promoter).
In some embodiments, the promoter is a heterologous promoter with respect to a cell that is engineered using the engineered CRISPR-Cas system. A variety of promoters have been explored for gene expression in mammalian cells, and any of the promoters known in the art may be used in the present application. Promoters may be roughly categorized as constitutive promoters or regulated promoters, such as inducible promoters.
In some embodiments, the nucleic acid sequences encoding the engineered Cas effector protein and/or the guide RNA are operably linked to a constitutive promoter. Constitutive promoters allow heterologous genes (also referred to as transgenes) to be expressed constitutively in the host cells. Exemplary constitutive promoters contemplated herein include, but are not limited to, cytomegalovirus (CMV) promoters, the human elongation factor-1 alpha (hEF1α) promoter, the ubiquitin C (UbiC) promoter, the phosphoglycerate kinase (PGK) promoter, the simian virus 40 early promoter (SV40), and the chicken β-actin promoter coupled with the CMV early enhancer (CAG). In some embodiments, the promoter is a CAG promoter comprising a cytomegalovirus (CMV) early enhancer element, the promoter, the first exon, and the first intron of the chicken beta-actin gene, and the splice acceptor of the rabbit beta-globin gene.
In some embodiments, the nucleic acid sequences encoding the engineered CRISPR-Cas protein(s) and/or the guide RNA are operably linked to an inducible promoter. Inducible promoters belong to the category of regulated promoters. The inducible promoter can be induced by one or more conditions, such as a physical condition, the microenvironment or physiological state of a host cell, an inducer (i.e., an inducing agent), or a combination thereof. In some embodiments, the inducing condition is selected from the group consisting of an inducer, irradiation (such as ionizing radiation or light), temperature (such as heat), redox state, tumor environment, and the activation state of a cell to be engineered by the engineered CRISPR-Cas system. In some embodiments, the promoter is inducible by a small molecule inducer, such as a chemical compound. In some embodiments, the small molecule is selected from the group consisting of doxycycline, tetracycline, alcohol, metals, and steroids. Chemically induced promoters have been the most widely explored. Such promoters include promoters whose transcriptional activity is regulated by the presence or absence of a small molecule chemical, such as doxycycline, tetracycline, alcohol, steroids, metals, and other compounds. The doxycycline-inducible system using the reverse tetracycline-controlled transactivator (rtTA) and a tetracycline-responsive element (TRE) promoter is currently the most mature system. WO9429442 describes the tight control of gene expression in eukaryotic cells by tetracycline-responsive promoters. WO9601313 discloses tetracycline-regulated transcriptional modulators. Additionally, Tet technology, such as the Tet-On system, has been described, for example, on the TetSystems.com website. Any of the known chemically regulated promoters may be used to drive expression of the engineered CRISPR-Cas protein(s) and/or the guide RNA in the present application.
In some embodiments, the nucleic acid sequence encoding the engineered Cas effector protein (e.g., enCas12i2) is codon optimized.
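Codon optimization can take many forms; the simplest is back-translation with one preferred codon per amino acid, sketched below in Python. The codon table is illustrative: each codon does encode the listed amino acid under the standard genetic code, but the choices are assumptions rather than values taken from a measured codon-usage table for human cells, and practical optimization tools typically also weigh factors such as GC content and repeated sequences.

```python
# Illustrative one-codon-per-residue table (an assumption for this sketch);
# "*" denotes a stop codon.
PREFERRED_CODON = {
    "A": "GCC", "R": "AGG", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
    "*": "TGA",
}


def back_translate(protein):
    """Back-translate a protein sequence using one preferred codon per residue."""
    try:
        return "".join(PREFERRED_CODON[aa] for aa in protein.upper())
    except KeyError as err:
        raise ValueError(f"unsupported residue: {err.args[0]!r}") from None
```

Applied to the NLS core "MPKKKRKV" followed by a stop, this yields a 27-nucleotide open reading frame; because every codon in the table is distinct, the result can be decoded back to the original protein as a consistency check.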
In some embodiments, there is provided an expression construct comprising the codon optimized sequence encoding the engineered Cas effector protein ligated into a BPK2104-ccdB vector. In some embodiments, the expression construct encodes a tag (e.g., a 10xHis tag) operably linked to the C terminus of the engineered Cas effector protein.
In some embodiments, each engineered split Cas construct encodes a fluorescent protein, such as GFP or RFP. The reporter proteins may be used to assess co-localization and/or dimerization of the engineered split Cas proteins, e.g., using microscopy. A nucleic acid sequence encoding an engineered Cas effector protein may be fused to a nucleic acid sequence encoding an additional component via a sequence encoding a self-cleaving peptide, such as a T2A, P2A, E2A, or F2A peptide.
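The behavior of a self-cleaving 2A peptide in a Cas-2A-reporter construct can be modeled at the protein level in Python. 2A peptides act by ribosome skipping at their C-terminal glycine-proline junction, so the upstream product keeps the 2A residues up to that glycine and the downstream product begins with proline. The P2A and T2A sequences shown are commonly used versions included for illustration (without the optional GSG spacer), and the Cas and reporter sequences in the usage example are placeholders.

```python
# Commonly used 2A peptide sequences (illustrative; shown without a GSG spacer).
P2A = "ATNFSLLKQAGDVEENPGP"
T2A = "EGRGSLLTCGDVEENPGP"


def polyprotein(*orfs, linker=P2A):
    """Concatenate protein-coding segments separated by a 2A peptide."""
    return linker.join(orfs)


def skip_at_2a(sequence, linker=P2A):
    """Model ribosome skipping at the final Gly-Pro junction of each 2A peptide.

    Upstream products keep the 2A residues up to the glycine; each downstream
    product starts with the 2A proline.
    """
    segments = sequence.split(linker)
    products = []
    for i, segment in enumerate(segments):
        prefix = linker[-1] if i > 0 else ""
        suffix = linker[:-1] if i < len(segments) - 1 else ""
        products.append(prefix + segment + suffix)
    return products
```

For a placeholder construct `polyprotein("MCAS", "MGFP")`, the model returns the Cas product carrying a C-terminal 2A remnant and a reporter product with an extra N-terminal proline, which is why 2A-linked reporters are usually placed where such residual residues are tolerated.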
In some embodiments, there is provided an expression construct for mammalian cells (e.g., human cells) comprising a nucleic acid sequence encoding the engineered Cas effector protein. In some embodiments, the expression construct comprises the codon-optimized sequence encoding the engineered Cas effector protein inserted into a pCAG-2A-eGFP vector, such that the Cas protein is operably linked to eGFP. In some embodiments, a second vector is provided for expression of a guide RNA (e.g., an sgRNA, crRNA, or pre-crRNA array) in mammalian cells (e.g., human cells). In some embodiments, the sequence encoding the guide RNA is expressed in a pUC19-U6-i2-crRNA vector backbone. An exemplary two-vector expression system is shown in Fig. 1.
IV. Methods of use
One aspect of the present application provides methods of using any one of the engineered Cas effector proteins or CRISPR-Cas systems described herein for detecting a target nucleic acid or modifying a nucleic acid in vitro, ex vivo, or in vivo, as well as methods of treatment or diagnosis using the engineered Cas effector proteins or CRISPR-Cas systems. Also provided is the use of the engineered Cas effector proteins or CRISPR-Cas systems described herein for detecting or modifying a nucleic acid in a cell and for treating or diagnosing a disease or condition in a subject, as well as compositions comprising any one of the engineered Cas effector proteins or one or more components of the engineered CRISPR-Cas systems for use in the manufacture of a medicament for detecting or modifying a nucleic acid in a cell and for treating or diagnosing a disease or condition in a subject.
Methods of using Cas12i
One aspect of the present application provides methods of cleaving target nucleic acids and genome-editing in mammalian cells (e.g., human cells) using Cas12i, including wildtype or engineered Cas12i effector proteins.
In some embodiments, the present application provides a method of modifying a target sequence in a target nucleic acid, comprising contacting the target nucleic acid with an engineered CRISPR-Cas system at a temperature of about 40℃ to about 67℃, wherein the engineered CRISPR-Cas system comprises: (a) a Cas12i effector protein comprising a Cas12i nuclease or a functional derivative thereof; and (b) a crRNA comprising a guide sequence that is complementary to the target sequence, wherein the Cas12i effector protein and the crRNA are capable of forming a CRISPR complex that specifically binds to a target nucleic acid comprising the target sequence and induces a modification of the target nucleic acid. In some embodiments, the Cas12i effector protein is a Cas12i2 nuclease or a functional derivative thereof. In some embodiments, the method is carried out at an elevated temperature, e.g., at a temperature of about any one of 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, or 67 ℃. In some embodiments, the method is carried out at a temperature of about 40℃ to 50℃, 50℃ to 60℃, 45℃ to 55℃, 55℃ to 65℃, 40℃ to 60℃, or 50℃ to 67℃. In some embodiments, the Cas12i effector protein has non-specific single-strand RNA cleavage activity.
In some embodiments, the method of using a Cas12i effector protein described herein (e.g., an engineered Cas12i2 protein) is carried out at a temperature of about 4℃ to about 40℃, such as about any one of 4-10, 10-20, 20-30, 30-40, 15-37, 4-20, or 20-40℃.
The present application further provides engineered crRNAs that improve the gene-editing efficacy of Cas12i nucleases (e.g., Cas12i1 nuclease). In some embodiments, the engineered crRNA increases the gene-editing activity of a Cas12i2 nuclease in a human cell by at least about 20% (e.g., at least about 30%, 40%, 50%, 60%, 70%, 80%, or more) compared to a crRNA comprising endogenous repeat sequences (e.g., SEQ ID NO: 171) corresponding to the Cas12i2 nuclease.
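Whether "at least about 20%" is read as a relative increase or as percentage points changes the threshold considerably. Assuming a relative increase (an interpretation, not stated in the text), the comparison reduces to the arithmetic below; with hypothetical editing efficiencies of 30% for the endogenous-repeat crRNA and 36% for an engineered crRNA, the increase is exactly 20%, whereas a 20 percentage-point gain would require 50%. The function names are illustrative.

```python
def relative_increase(engineered, reference):
    """Percent increase of an engineered crRNA's editing activity over a reference.

    Interprets the comparison as a relative change, e.g. 30% -> 36% editing is +20%.
    """
    if reference <= 0:
        raise ValueError("reference activity must be positive")
    return 100.0 * (engineered - reference) / reference


def meets_threshold(engineered, reference, threshold=20.0):
    """True if the relative increase is at least `threshold` percent."""
    return relative_increase(engineered, reference) >= threshold
```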
In some embodiments, there is provided an engineered crRNA comprising a substitution of one or more uridine (U) residues with a non-U residue in a repeat sequence comprising at least four U residues. In some embodiments, there is provided an engineered precursor guide RNA array encoding a plurality of the engineered crRNAs described herein.
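One way to apply such substitutions is sketched below in Python. The repeat in the usage example is a hypothetical placeholder (not SEQ ID NO: 171 or 173), and the helper names are illustrative. A plausible, though not stated, motivation for removing U runs is that a run of four or more T residues in the DNA template acts as a termination signal for RNA polymerase III promoters such as U6, so `longest_u_run` is included as a quick check on the edited repeat.

```python
def substitute_uridines(repeat, positions, replacement="A"):
    """Replace U at the given 0-based positions of a repeat with a non-U base."""
    repeat = repeat.upper()
    if repeat.count("U") < 4:
        raise ValueError("repeat must contain at least four U residues")
    if replacement not in {"A", "C", "G"}:
        raise ValueError("replacement must be a non-U base")
    bases = list(repeat)
    for pos in positions:
        if bases[pos] != "U":
            raise ValueError(f"position {pos} is not a U residue")
        bases[pos] = replacement
    return "".join(bases)


def longest_u_run(seq):
    """Length of the longest run of consecutive U residues."""
    longest = current = 0
    for base in seq.upper():
        current = current + 1 if base == "U" else 0
        longest = max(longest, current)
    return longest
```

For the placeholder repeat "GAUUUUCAGU", which contains a UUUU run, substituting the U at index 3 with A gives "GAUAUUCAGU", whose longest U run is 2.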
In some embodiments, the engineered crRNA comprises a spacer sequence of about 17 to 25 (e.g., any one of 17, 18, 19, 20, 21, 22, 23, 24, or 25) nucleotides in length. In some embodiments, the engineered crRNA comprises a spacer sequence of about 20 nucleotides in length.
In some embodiments, the engineered crRNA comprises a repeat sequence comprising the nucleic acid sequence of SEQ ID NO: 173.
In some embodiments, there is provided an engineered CRISPR-Cas system comprising: (a) a Cas12i effector protein comprising a Cas12i1 nuclease or a functional derivative thereof; and (b) a crRNA comprising a substitution of one or more uridine (U) residues with a non-U residue in a repeat sequence comprising at least four U residues, and a guide sequence complementary to a target sequence; wherein the Cas12i effector protein and the crRNA are capable of forming a CRISPR complex that specifically binds to a target nucleic acid comprising the target sequence and induces a modification of the target nucleic acid. In some embodiments, there is provided a method of modifying a target sequence in a target nucleic acid using the engineered CRISPR-Cas system.
Also contemplated are methods of using a Cas12i (including wildtype or engineered Cas12i effector proteins) and/or the engineered crRNAs described herein to modify a target nucleic acid in a mammalian cell, methods of treatment, methods of detection, and the like, according to any one of the methods described in this Section IV, “Methods of use.”
Methods of modification
In some embodiments, the present application provides a method of modifying a target nucleic acid comprising a target sequence, comprising contacting the target nucleic acid with any one of the engineered CRISPR-Cas systems described herein. In some embodiments, the method is carried out in vitro. In some embodiments, the target nucleic acid is present in a cell. In some embodiments, the cell is a bacterial cell, a yeast cell, a mammalian cell, a plant cell, or an animal cell. In some embodiments, the method is carried out ex vivo. In some embodiments, the method is carried out in vivo.
In some embodiments, the target nucleic acid is cleaved or the target sequence in the target nucleic acid is altered by the engineered CRISPR-Cas system. In some embodiments, expression of the target nucleic acid is altered by the engineered CRISPR-Cas system. In some embodiments, the target nucleic acid is a genomic DNA. In some embodiments, the target sequence is associated with a disease or condition. In some embodiments, the engineered CRISPR-Cas system comprises a precursor guide RNA array encoding a plurality of crRNAs, wherein each crRNA comprises a different guide sequence. In some embodiments, the method is carried out at a temperature of about 4℃ to about 67℃, such as about any one of 12℃ to 67℃, 4℃ to 25℃, 25℃ to 37℃, 37℃ to 45℃, 45℃ to 50℃, 50℃ to 60℃, or 40℃ to 67℃. In some embodiments, the method is carried out at a low temperature, such as about 4℃ to about 12℃. In some embodiments, the method is carried out at a high temperature, such as about 40℃ to about 67℃.
In some embodiments, the present application provides a method of treating a disease or condition associated with a target nucleic acid in cells of an individual, comprising modifying the target nucleic acid in the cells of the individual using any one of the methods described herein, thereby treating the disease or condition. In some embodiments, the disease or condition is selected from the group consisting of cancer, cardiovascular diseases, hereditary diseases, autoimmune diseases, metabolic diseases, neurodegenerative diseases, ocular diseases, bacterial infections and viral infections.
The engineered CRISPR-Cas systems described herein can modify a target nucleic acid in a cell in a variety of ways, depending on the types of engineered Cas effector protein in the CRISPR-Cas system. In some embodiments, the method induces a site-specific cleavage in the target nucleic acid. In some embodiments, the method cleaves a genomic DNA in a cell, such as a bacterial cell, a plant cell, or an animal cell (e.g., a mammalian cell) . In some embodiments, the method kills a cell by cleaving a genomic DNA in the cell. In some embodiments, the method cleaves a viral nucleic acid in a cell.
In some embodiments, the method alters (e.g., increases or decreases) the expression level of the target nucleic acid in the cell. In some embodiments, the method increases the expression level of the target nucleic acid in the cell, e.g., using an engineered Cas effector protein based on an enzymatically inactive Cas protein fused to one or more transactivation domains. In some embodiments, the method reduces the expression level of the target nucleic acid in the cell, e.g., using an engineered Cas effector protein based on an enzymatically inactive Cas protein fused to one or more transcription repressor domains. In some embodiments, the method introduces epigenetic modifications to the target nucleic acid in the cell, e.g., using an engineered Cas effector protein based on an enzymatically inactive Cas protein fused to epigenetic modification domains. The engineered Cas systems described herein may be used to introduce other modifications to the target nucleic acid, depending on the functional domains comprised by the engineered Cas effector proteins.
In some embodiments, the method alters a target sequence in the target nucleic acid in the cell. In some embodiments, the method introduces a mutation to the target nucleic acid in the cell. In some embodiments, the method uses one or more endogenous DNA repair pathways, such as non-homologous end joining (NHEJ) or homology-directed repair (HDR), in the cell to repair a double-strand break induced in a target DNA as a result of sequence-specific cleavage by the CRISPR complex. Exemplary mutations include, but are not limited to, insertions, deletions, substitutions, and frameshifts. In some embodiments, the method inserts a donor DNA at the target locus. In some embodiments, the insertion of the donor DNA results in introduction of a selection marker or a reporter protein to the cell. In some embodiments, the insertion of the donor DNA results in knock-in of a gene. In some embodiments, the insertion of the donor DNA results in a knockout mutation. In some embodiments, the insertion of the donor DNA results in a substitution mutation, such as a single nucleotide substitution. In some embodiments, the method induces a phenotypic change to the cell.
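For HDR-mediated insertion of a donor DNA, the donor typically carries the desired insert flanked by sequences homologous to either side of the break. The Python sketch below builds such a donor from a reference locus under simplifying assumptions: the break is treated as a single position (`cut_index`), ignoring any staggered cut produced by the nuclease, and symmetric homology arms of a chosen length are used. The locus, insert, arm length, and function name are hypothetical.

```python
def design_hdr_donor(locus, cut_index, insert, arm_len):
    """Return left homology arm + insert + right homology arm.

    locus: reference sequence, 5'-to-3'.
    cut_index: 0-based position of the break; the insert is placed between
        locus[cut_index - 1] and locus[cut_index].
    arm_len: length of each homology arm (an assumed design parameter).
    """
    if not 0 < cut_index < len(locus):
        raise ValueError("cut site must lie inside the locus")
    if arm_len > cut_index or cut_index + arm_len > len(locus):
        raise ValueError("homology arms extend past the supplied locus")
    left_arm = locus[cut_index - arm_len:cut_index]
    right_arm = locus[cut_index:cut_index + arm_len]
    return left_arm + insert + right_arm
```

With a 16-nucleotide placeholder locus, a break at index 8, 4-nucleotide arms, and a 3-nucleotide insert, the donor is 11 nucleotides long; real donors generally use much longer arms.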
In some embodiments, the engineered CRISPR-Cas system is used as part of a genetic circuit, or for inserting a genetic circuit into the genomic DNA of a cell. The inducer-controlled engineered split Cas effector proteins described herein may be especially useful as a component of a genetic circuit. Genetic circuits can be useful for gene therapy. Methods and techniques of designing and using genetic circuits are known in the art. Further reference may be made to, for example, Brophy, Jennifer A. N., and Christopher A. Voigt. "Principles of genetic circuit design." Nature Methods 11.5 (2014): 508.
The engineered CRISPR-Cas systems described herein are useful for modifying a wide range of target nucleic acids. In some embodiments, the target nucleic acid is in a cell. In some embodiments, the target nucleic acid is a genomic DNA. In some embodiments, the target nucleic acid is an extrachromosomal DNA. In some embodiments, the target nucleic acid is exogenous to a cell. In some embodiments, the target nucleic acid is a viral nucleic acid, such as viral DNA. In some embodiments, the target nucleic acid is a plasmid in a cell. In some embodiments, the target nucleic acid is a horizontally transferred plasmid. In some embodiments, the target nucleic acid is an RNA.
In some embodiments, the target nucleic acid is an isolated nucleic acid, such as an isolated DNA. In some embodiments, the target nucleic acid is present in a cell-free environment. In some embodiments, the target nucleic acid is an isolated vector, such as a plasmid. In some embodiments, the target nucleic acid is an isolated linear DNA fragment.
The methods described herein are applicable to any suitable cell type. In some embodiments, the cell is a bacterium, a yeast cell, a fungal cell, an algal cell, a plant cell, or an animal cell (e.g., a mammalian cell, such as a human cell). In some embodiments, the cell is a cell isolated from natural sources, such as a tissue biopsy. In some embodiments, the cell is a cell isolated from an in vitro cultured cell line. In some embodiments, the cell is from a primary cell line. In some embodiments, the cell is from an immortalized cell line. In some embodiments, the cell is a genetically engineered cell.
In some embodiments, the cell is an animal cell from an organism selected from the group consisting of cattle, sheep, goat, horse, pig, deer, chicken, duck, goose, rabbit, and fish.
In some embodiments, the cell is a plant cell from an organism selected from the group consisting of maize, wheat, barley, oat, rice, soybean, oil palm, safflower, sesame, tobacco, flax, cotton, sunflower, pearl millet, foxtail millet, sorghum, canola, cannabis, a vegetable crop, a forage crop, an industrial crop, a woody crop, and a biomass crop.
In some embodiments, the cell is a mammalian cell. In some embodiments, the cell is a human cell. In some embodiments, the human cell is a human embryonic kidney 293T (HEK293T or 293T) cell or a HeLa cell. In some embodiments, the cell is a human embryonic kidney (HEK293T) cell. In some embodiments, the mammalian cell is selected from the group consisting of an immune cell, a hepatic cell, a tumor cell, a stem cell, a zygote, a muscle cell, and a skin cell.
In some embodiments, the cell is an immune cell selected from the group consisting of a cytotoxic T cell, a helper T cell, a natural killer (NK) T cell, an iNK-T cell, an NK-T like cell, a γδ T cell, a tumor-infiltrating T cell and a dendritic cell (DC) -activated T cell. In some embodiments, the method produces a modified immune cell, such as a CAR-T cell or a TCR-T cell.
In some embodiments, the cell is an embryonic stem (ES) cell, an induced pluripotent stem (iPS) cell, a progenitor cell of a gamete, a gamete, a zygote, or a cell in an embryo.
The methods described herein can be used to modify a target cell in vivo, ex vivo, or in vitro, and may be conducted in a manner that alters the cell such that, once modified, the progeny or cell line of the modified cell retains the altered phenotype. The modified cells and progeny may be part of a multicellular organism, such as a plant or animal, with ex vivo or in vivo applications such as genome editing and gene therapy.
In some embodiments, the method is carried out ex vivo. In some embodiments, the modified cell (e.g., mammalian cell) is propagated ex vivo after introduction of the engineered CRISPR-Cas system into the cell. In some embodiments, the modified cell is cultured to propagate for at least about any of 1 day, 2 days, 3 days, 4 days, 5 days, 6 days, 7 days, 10 days, 12 days, or 14 days. In some embodiments, the modified cell is cultured for no more than about any of 1 day, 2 days, 3 days, 4 days, 5 days, 6 days, 7 days, 10 days, 12 days, or 14 days. In some embodiments, the modified cell is further evaluated or screened to select cells with one or more desirable phenotypes or properties.
In some embodiments, the target sequence is a sequence associated with a disease or condition. Exemplary diseases or conditions include, but are not limited to, cancer, cardiovascular diseases, hereditary diseases, autoimmune diseases, metabolic diseases, neurodegenerative diseases, ocular diseases, bacterial infections and viral infections. In some embodiments, the disease or condition is a genetic disease. In some embodiments, the disease or condition is a monogenic disease or condition. In some embodiments, the disease or condition is a polygenic disease or condition.
In some embodiments, the target sequence has a mutation compared to a wildtype sequence. In some embodiments, the target sequence has a single-nucleotide polymorphism (SNP) associated with a disease or condition.
In some embodiments, the donor DNA that is inserted into the target nucleic acid encodes a biological product selected from the group consisting of a reporter protein, an antigen-specific receptor, a therapeutic protein, an antibiotic resistance protein, an RNAi molecule, a cytokine, a kinase, an antigen, a cytokine receptor, and a suicide polypeptide. In some embodiments, the donor DNA encodes a therapeutic protein. In some embodiments, the donor DNA encodes a therapeutic protein useful for gene therapy. In some embodiments, the donor DNA encodes a therapeutic antibody. In some embodiments, the donor DNA encodes an engineered receptor, such as a chimeric antigen receptor (CAR), or an engineered TCR. In some embodiments, the donor DNA encodes a therapeutic RNA, such as a small RNA (e.g., siRNA, shRNA, or miRNA), or a long non-coding RNA (lncRNA).
The methods described herein may be used for multiplex gene editing or regulation at two or more (e.g., 2, 3, 4, 5, 6, 8, 10 or more) different target loci. In some embodiments, the method detects or modifies a plurality of target nucleic acids or target nucleic acid sequences. In some embodiments, the method comprises contacting the target nucleic acid with a guide RNA comprising a plurality (e.g., 2, 3, 4, 5, 6, 8, 10 or more) of crRNA sequences, wherein each crRNA comprises a different target sequence.
Also provided are engineered cells comprising a modified target nucleic acid, which are produced using any one of the methods described herein. The engineered cells may be used for cell therapy. Autologous or allogeneic cells may be used to prepare engineered cells using the methods described herein for cell therapy.
The methods described herein may also be used to generate isogenic lines of cells (e.g., mammalian cells) to study genetic variants.
Also provided are engineered non-human animals comprising the engineered cells described herein. In some embodiments, the engineered non-human animals are genome-edited non-human animals. The engineered non-human animals can be used as disease models.
Techniques for producing non-human genome-edited or transgenic animals are well known in the art and include, but are not limited to, pronuclear microinjection, viral infection, and transformation of embryonic stem cells and induced pluripotent stem (iPS) cells. Detailed methods that can be used include, but are not limited to, those described in Sundberg and Ichiki (2006, Genetically Engineered Mice Handbook, CRC Press) and Gibson (2004, A Primer Of Genome Science 2nd ed. Sunderland, Mass.: Sinauer) .
The engineered animals may be of any suitable species, including, but not limited to, bovids, equids, ovids, canids, cervids, felids, goats, swine, and primates, as well as less commonly known mammals such as elephants, deer, zebras, or camels.
Methods of treatment
Further provided are methods of treatment using any one of the methods of modifying a target nucleic acid in a cell described herein, and methods of diagnosis using any one of the methods of detecting a target nucleic acid described herein.
In some embodiments, the present application provides a method of treating a disease or condition associated with a target nucleic acid in cells of an individual, comprising contacting the target nucleic acid with any one of the engineered CRISPR-Cas systems described herein, wherein the guide sequence of the guide RNA is complementary to a target sequence of the target nucleic acid, wherein the engineered Cas effector protein and the guide RNA associate with each other to bind to the target nucleic acid to modify the target nucleic acid, thereby treating the disease or condition. In some embodiments, a mutation (e.g., knockout or knock-in mutation) is introduced to the target nucleic acid. In some embodiments, expression of the target nucleic acid is enhanced. In some embodiments, expression of the target nucleic acid is inhibited.
In some embodiments, the present application provides a method of treating a disease or condition in an individual, comprising administering to the individual an effective amount of any one of the engineered CRISPR-Cas systems described herein, and a donor DNA encoding a therapeutic agent, wherein the guide sequence of the guide RNA is complementary to a target sequence of a target nucleic acid of the individual, wherein the engineered Cas effector protein and the guide RNA associate with each other to bind to the target nucleic acid and insert the donor DNA into the target sequence, thereby treating the disease or condition.
In some embodiments, the present application provides a method of treating a disease or condition in an individual, comprising administering to the individual an effective amount of engineered cells comprising a modified target nucleic acid, wherein the engineered cells are prepared by contacting the cell with any one of the engineered CRISPR-Cas systems described herein, wherein the guide sequence of the guide RNA is complementary to a target sequence of the target nucleic acid, wherein the engineered Cas effector protein and the guide RNA associate with each other to bind to the target nucleic acid to modify the target nucleic acid. In some embodiments, the engineered cells are immune cells.
In some embodiments, the individual is a human being. In some embodiments, the individual is an animal, e.g., a model animal such as a rodent, a pet, or a farm animal. In some embodiments, the individual is a mammal.
In some embodiments, the disease or condition is selected from the group consisting of cancer, cardiovascular diseases, hereditary diseases, autoimmune diseases, metabolic diseases, neurodegenerative diseases, ocular diseases, bacterial infections and viral infections. In some embodiments, the target nucleic acid is PCSK9. In some embodiments, the disease or condition is a cardiovascular disease. In some embodiments, the disease or condition is a coronary artery disease. In some embodiments, the method reduces cholesterol levels in an individual. In some embodiments, the method treats diabetes in the individual.
Methods of detection
The present application also provides methods of using any one of the engineered Cas effector proteins with improved activity, or CRISPR-Cas systems, for detection of a target nucleic acid. The use of Cas effector proteins as detection agents takes advantage of the discovery that type V CRISPR/Cas proteins (e.g., Cas12 proteins such as Cas12a, Cas12b, Cas12c, Cas12d, Cas12e (CasX), and Cas12i) can promiscuously cleave non-targeted single-stranded DNA (ssDNA) once activated by detection of a target DNA. Methods of using Cas proteins as detection agents have been described, for example, in US10253365 and WO2020/056924, which are herein incorporated by reference in their entirety.
In some embodiments, once a type V CRISPR/Cas effector protein (e.g., a Cas12 protein such as Cas12a, Cas12b, Cas12c, Cas12d, Cas12e (CasX), or Cas12i) is activated by a guide RNA, which occurs when a sample includes a target DNA to which the guide RNA hybridizes (i.e., the sample includes the targeted DNA), the Cas effector protein becomes a nuclease that promiscuously cleaves single-stranded nucleic acids (e.g., non-target ssDNAs or RNAs, i.e., single-stranded nucleic acids to which the guide sequence of the guide RNA does not hybridize). Thus, when the targeted DNA (double- or single-stranded) is present in the sample (e.g., in some cases above a threshold amount), the result is cleavage of single-stranded nucleic acids in the sample, which can be detected using any convenient detection method (e.g., using a labeled single-stranded detector nucleic acid, such as DNA or RNA). Cas12i can cleave both ssDNA and ssRNA.
In some embodiments, there is provided a method of detecting a target DNA (e.g., double-stranded or single-stranded) in a sample, comprising: (a) contacting the sample with: (i) any one of the engineered type V CRISPR/Cas effector proteins described herein (e.g., a Cas12 protein such as Cas12a, Cas12b, Cas12c, Cas12d, Cas12e (CasX), or Cas12i); (ii) a guide RNA comprising a guide sequence that hybridizes with the target DNA; and (iii) a detector nucleic acid that is single stranded (i.e., a “single stranded detector nucleic acid”) and does not hybridize with the guide sequence of the guide RNA; and (b) measuring a detectable signal produced by cleavage of the single stranded detector nucleic acid by the engineered type V CRISPR/Cas effector protein. In some cases, the single stranded detector nucleic acid includes a fluorescence-emitting dye pair (e.g., a fluorescence resonance energy transfer (FRET) pair or a quencher/fluor pair). In some cases, the target DNA is a viral DNA (e.g., from a papovavirus, hepadnavirus, herpesvirus, adenovirus, poxvirus, parvovirus, and the like). In some embodiments, the single stranded detector nucleic acid is a DNA. In some embodiments, the single stranded detector nucleic acid is an RNA. In some embodiments, the engineered Cas effector protein is an engineered Cas12i nuclease, such as enCas12i2.
A method of the present disclosure for detecting a target DNA (single-stranded or double-stranded) in a sample can detect a target DNA with a high degree of sensitivity. In some cases, a method of the present disclosure can be used to detect a target DNA present in a sample comprising a plurality of DNAs (including the target DNA and a plurality of non-target DNAs), where the target DNA is present at one or more copies per 10^7 non-target DNAs (e.g., one or more copies per 10^6 non-target DNAs, one or more copies per 10^5 non-target DNAs, one or more copies per 10^4 non-target DNAs, one or more copies per 10^3 non-target DNAs, one or more copies per 10^2 non-target DNAs, one or more copies per 50 non-target DNAs, one or more copies per 20 non-target DNAs, one or more copies per 10 non-target DNAs, or one or more copies per 5 non-target DNAs).
In some embodiments, the engineered Cas effector proteins described herein can detect a target DNA with a higher degree of sensitivity compared to the reference Cas nuclease. In some embodiments, the engineered Cas effector protein can detect a target DNA with 10%, 15%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90% or higher sensitivity compared to the reference Cas nuclease.
Methods of delivery
In some embodiments, the engineered CRISPR-Cas systems described herein, or components thereof, nucleic acid molecules thereof, or nucleic acid molecules encoding or providing components thereof, can be delivered to host cells by various delivery systems such as plasmid or viral vectors (e.g., any one of the vectors described in the “Constructs and Vectors” subsection above) . In some embodiments or methods, the engineered CRISPR-Cas systems can be delivered by other methods, such as nucleofection or electroporation of ribonucleoprotein complexes consisting of the engineered Cas effector proteins and their cognate RNA guide or guides.
In some embodiments, the delivery is via nanoparticles or exosomes.
In some embodiments, paired Cas nickase complexes can be delivered directly using nanoparticle or other direct protein delivery methods, such that complexes containing both paired crRNA elements are co-delivered. Furthermore, protein can be delivered to cells by viral vector or directly, followed by the direct delivery of a CRISPR array containing two paired spacers for double nicking. In some instances, for direct RNA delivery the RNA may be conjugated to at least one sugar moiety, such as N-acetyl galactosamine (GalNAc) (particularly, triantennary GalNAc) .
V. Kits and articles of manufacture
Further provided are compositions, kits, unit dosages, and articles of manufacture comprising one or more components of any one of the engineered Cas effector proteins or engineered CRISPR-Cas systems described herein.
In some embodiments, there is provided a kit comprising: one or more AAV vectors encoding any one of the engineered Cas effector proteins or engineered CRISPR-Cas systems described herein. In some embodiments, the kit further comprises one or more guide RNAs. In some embodiments, the kit further comprises a donor DNA. In some embodiments, the kit further comprises a cell, such as a human cell.
The kits may contain one or more additional components, such as containers, reagents, culturing media, cytokines, buffers, antibodies, and the like to allow propagation of an engineered cell. The kits may also contain a device for administration of the composition.
The kit may further comprise instructions for using the engineered CRISPR-Cas system described herein, such as methods of detecting or modifying a target nucleic acid. In some embodiments, the kit comprises instructions for treating or diagnosing a disease or condition. The instructions relating to the use of the components of the kit generally include information as to dosage, dosing schedule, and route of administration for the intended treatment. The containers may be unit doses, bulk packages (e.g., multi-dose packages) or sub-unit doses. For example, kits may be provided that contain sufficient dosages of the composition as disclosed herein to provide effective treatment of an individual for an extended period. Kits may also include multiple unit doses of the composition and instructions for use, packaged in quantities sufficient for storage and use in pharmacies, for example, hospital pharmacies and compounding pharmacies.
The kits of the invention are in suitable packaging. Suitable packaging includes, but is not limited to, vials, bottles, jars, flexible packaging (e.g., sealed Mylar or plastic bags) , and the like. Kits may optionally provide additional components such as buffers and interpretative information. The present application thus also provides articles of manufacture, which include vials (such as sealed vials) , bottles, jars, flexible packaging, and the like.
The article of manufacture can comprise a container and a label or package insert on or associated with the container. Suitable containers include, for example, bottles, vials, syringes, etc. The containers may be formed from a variety of materials such as glass or plastic. Generally, the container holds a composition which is effective for treating a disease or disorder described herein, and may have a sterile access port (for example the container may be an intravenous solution bag or a vial having a stopper pierceable by a hypodermic injection needle) . The label or package insert indicates that the composition is used for treating the particular condition in an individual. The label or package insert will further comprise instructions for administering the composition to the individual.
Package insert refers to instructions customarily included in commercial packages of therapeutic products that contain information about the indications, usage, dosage, administration, contraindications and/or warnings concerning the use of such therapeutic products.
Additionally, the article of manufacture may further comprise a second container comprising a pharmaceutically acceptable buffer, such as bacteriostatic water for injection (BWFI), phosphate-buffered saline, Ringer's solution and dextrose solution. It may further include other materials desirable from a commercial and user standpoint, including other buffers, diluents, filters, needles, and syringes.
EXAMPLES
The examples below are intended to be purely exemplary of the invention and should therefore not be considered to limit the invention in any way. The following examples and detailed description are offered by way of illustration and not by way of limitation.
EXAMPLE 1: Pipeline for engineering enzymes with improved efficiency
This example provides a strategy for designing Cas enzymes with enhanced conformational transition dynamics that result in better catalytic efficiency of Cas endonucleases. The exemplary method provided herein allows engineering of Cas proteins that have improved activity, such as target binding, double-strand cleavage activity, nickase activity, and/or gene-editing activity.
We reasoned that the conditions for Cas conformational transition may not be optimal in human cells, and that enhanced conformational transition dynamics might result in better catalytic efficiency of Cas enzymes. To design Cas proteins with enhanced conformational transition dynamics, we designed a strategy of increasing the flexibility of flexible regions in the Cas enzyme. Fig. 1 shows the pipeline used to design variants of SaCas9 as an example of the design workflow.
Enhanced Cas Enzyme Design Pipeline
First, we identified flexible regions of the Cas enzymes using the DynaMine protein backbone dynamics predictor (Cilia et al. The DynaMine webserver: predicting protein dynamics from sequence. Nucleic Acids Res. 2014 Jul 1; 42 (Web Server issue): W264-W270). From the dynamics predictions, we obtained the S2 order parameter score profile of the Cas protein. The S2 order parameter ranges from 0 to 1, with 1 corresponding to a rigid bond vector and 0 corresponding to complete flexibility.
Based on the S2 order parameter score profile obtained from DynaMine, we defined peak amino acids (peak aa) as amino acids whose score is the lowest compared with the scores of the 5 neighboring amino acids on either side (preceding or following the peak aa). Based on this definition, we identified peak amino acids. We selected regions containing a peak amino acid and its 2 neighboring amino acids on either side (preceding or following the peak aa) as candidate flexible regions for engineering.
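The peak-selection rule above can be sketched in Python as follows. This is an illustrative sketch, not the authors' code. It assumes the per-residue S2 scores have already been obtained (e.g., exported from DynaMine) as a list aligned with the protein sequence. It treats "lowest" as strictly lower than every neighbor, so ties are not peaks, and near the termini it uses however many neighbors exist. An optional `max_s2` cutoff mirrors the S2 < 0.71 threshold applied in Examples 3 and 4. The function names are hypothetical.

```python
from typing import List, Optional, Tuple


def find_peak_positions(s2: List[float], window: int = 5,
                        max_s2: Optional[float] = None) -> List[int]:
    """Return 0-based indices whose S2 score is strictly lower than the scores
    of up to `window` residues on each side (fewer near the termini)."""
    peaks = []
    n = len(s2)
    for i in range(n):
        neighbours = [s2[j] for j in range(max(0, i - window), min(n, i + window + 1))
                      if j != i]
        if neighbours and all(s2[i] < v for v in neighbours):
            # Optional cutoff, e.g. 0.71 as used in Examples 3 and 4.
            if max_s2 is None or s2[i] < max_s2:
                peaks.append(i)
    return peaks


def candidate_regions(seq: str, s2: List[float], window: int = 5, flank: int = 2,
                      max_s2: Optional[float] = None) -> List[Tuple[int, int, int, str]]:
    """Return (peak_index, start, end, region) tuples: the peak residue plus
    `flank` residues on each side, as a half-open 0-based slice [start, end)."""
    if len(seq) != len(s2):
        raise ValueError("sequence and S2 profile must have the same length")
    regions = []
    for p in find_peak_positions(s2, window, max_s2):
        start, end = max(0, p - flank), min(len(seq), p + flank + 1)
        regions.append((p, start, end, seq[start:end]))
    return regions


# Toy profile (illustrative only): a single dip at index 5.
seq = "ACDEFGHIKLMN"
s2 = [0.90, 0.85, 0.80, 0.75, 0.70, 0.60, 0.70, 0.75, 0.80, 0.85, 0.90, 0.88]
print(candidate_regions(seq, s2))  # [(5, 3, 8, 'EFGHI')]
```

Returning slice coordinates alongside the region sequence keeps it simple to map any engineered change back onto full-length numbering.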
We next engineered Cas variants with enhanced flexibility in the candidate flexible regions. In this example, we used the following strategies for engineering candidate flexible regions with enhanced flexibility:
(a) Glycine substitution
If a highly hydrophobic amino acid (aa; L, I, V, C, Y, F, W) was present in the candidate flexible region, the hydrophobic residue was substituted with a glycine (G).
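Strategy (a) can be expressed as a small helper. The text does not specify what happens when a candidate region contains several highly hydrophobic residues. This hypothetical sketch therefore enumerates one single-glycine variant per hydrophobic residue. Each mutation is labeled with its 1-based position in the full-length protein, computed from the region's 0-based start index.

```python
from typing import List, Tuple

# Highly hydrophobic residues listed in strategy (a).
HYDROPHOBIC = set("LIVCYFW")


def glycine_substitutions(region: str, region_start: int) -> List[Tuple[str, str]]:
    """Return (mutation_label, mutated_region) for each single X->G substitution
    of a highly hydrophobic residue in the region.

    region_start is the 0-based index of the region's first residue in the
    full-length protein; labels use 1-based positions (e.g. 'Y123G')."""
    variants = []
    for offset, aa in enumerate(region):
        if aa in HYDROPHOBIC:
            position = region_start + offset + 1
            mutated = region[:offset] + "G" + region[offset + 1:]
            variants.append((f"{aa}{position}G", mutated))
    return variants


print(glycine_substitutions("EFGHI", 3))  # [('F5G', 'EGGHI'), ('I8G', 'EFGHG')]
```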
(b) Glycine insertion
Within the candidate flexible region, we inserted two G in front of the most flexible aa of the flexible region. The most flexible aa was selected based on the following priority order G>S>N>D>H>M>T>E>Q>K>R>A>P.
If there was more than one equally most flexible aa in the flexible region, we defined the most flexible aa as the one closest to the peak aa.
If there was more than one equally most flexible aa in the region and these residues were equidistant from the peak aa, we defined the most flexible aa as the one whose neighboring aa has higher priority.
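The insertion rule and its tie-breakers can be sketched as a single sort key. Several points are assumptions, not statements from the source. Residues absent from the priority list (the hydrophobic residues, for example) are ranked below P. "In front of" is read as immediately N-terminal. The neighbor tie-breaker compares the best priority among each candidate's immediate neighbors within the region, and any remaining tie goes to the N-terminal candidate. All names are hypothetical.

```python
from typing import Tuple

# Flexibility priority from strategy (b), most to least flexible.
FLEX_ORDER = "GSNDHMTEQKRAP"


def flex_rank(aa: str) -> int:
    """Lower rank = more flexible. Residues absent from the list (e.g. the
    hydrophobic ones) are assumed to rank below P."""
    idx = FLEX_ORDER.find(aa)
    return idx if idx >= 0 else len(FLEX_ORDER)


def neighbour_rank(region: str, i: int) -> int:
    """Best (lowest) rank among the immediate neighbours of position i inside
    the region; used only to break ties (an interpretation of the text)."""
    ranks = [flex_rank(region[j]) for j in (i - 1, i + 1) if 0 <= j < len(region)]
    return min(ranks) if ranks else len(FLEX_ORDER)


def most_flexible_index(region: str, peak_offset: int) -> int:
    """Highest-priority residue, then closest to the peak, then the one with the
    higher-priority neighbour, then the N-terminal one."""
    return min(range(len(region)),
               key=lambda i: (flex_rank(region[i]), abs(i - peak_offset),
                              neighbour_rank(region, i), i))


def glycine_insertion(region: str, peak_offset: int, n_gly: int = 2) -> Tuple[int, str]:
    """Insert n_gly glycines immediately N-terminal to the most flexible residue.
    peak_offset is the 0-based index of the peak aa within the region."""
    i = most_flexible_index(region, peak_offset)
    return i, region[:i] + "G" * n_gly + region[i:]


print(glycine_insertion("EFGHI", 2))  # (2, 'EFGGGHI')
print(glycine_insertion("SAKDS", 2))  # (4, 'SAKDGGS')
```

In the second call the two serines tie on priority and on distance from the peak, so the serine next to D (which outranks A) is chosen.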
The resulting Cas variants were cloned and purified as described in the methods section below, and the cleavage efficiency of the variants was measured in human cells.
Methods
Plasmid construction
The coding sequences of BhCas12bv4, Cas12i2 and GeoCas9 were codon-optimized for human expression and synthesized. Variants of the Cas proteins were created by PCR-based site-directed mutagenesis. For prokaryotic expression, the coding sequences of Cas12i2 and its variants were ligated using T4 ligase into the BPK2104-ccdB expression vector digested with XmaI and SpeI. The fusion construct contains a 10xHis tag fused to the C terminus of the protein. Cas effector proteins were expressed in human 293T cells using the pCAG-2A-eGFP vector, with the DNA encoding the Cas protein inserted between the XmaI and NheI sites. Vectors for expression of the sgRNA or crRNA of BhCas12bv4, GeoCas9 and Cas12i2 in 293T cells were constructed by ligating annealed oligos containing the targeted sequence into the BsaI-digested pUC19-U6-i2-crRNA scaffold.
Protein purification
The prokaryotic expression plasmid was transformed into E. coli strain BL21 (λDE3), and the transformants were plated on solid LB with chloramphenicol (CmR). Three to five clones were picked into 15 ml liquid LB with CmR and cultivated overnight at 37 ℃. Then 2 ml of the culture was transferred into 300 ml liquid LB and cultivated until the OD600 reached 1.2, followed by induction with IPTG at 16 ℃ for 16 h. Cell pellets were resuspended in lysis buffer and sonicated. The supernatant after centrifugation was kept for purification. Target protein was obtained by one-step purification using a Ni column, sterilized through a 0.22 μm filter, and stored in aliquots. The protein concentration was determined by Bradford protein assay with BSA as the standard.
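The Bradford quantification step relies on a BSA standard curve. Below is a minimal sketch, assuming the absorbance readings (typically at 595 nm) fall within the assay's linear range, so an ordinary least-squares line is adequate. The numbers in the usage lines are made-up illustrative values, not data from this application, and the function names are hypothetical.

```python
from typing import List, Tuple


def fit_standard_curve(conc: List[float], absorbance: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of absorbance = slope * conc + intercept."""
    n = len(conc)
    if n < 2 or n != len(absorbance):
        raise ValueError("need at least two paired standards")
    mean_x = sum(conc) / n
    mean_y = sum(absorbance) / n
    sxx = sum((x - mean_x) ** 2 for x in conc)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(conc, absorbance))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def protein_concentration(a_sample: float, slope: float, intercept: float,
                          dilution: float = 1.0) -> float:
    """Interpolate a sample's concentration from the curve, corrected for dilution."""
    return (a_sample - intercept) / slope * dilution


# Made-up BSA standards (ug/ml) and A595 readings, for illustration only.
bsa = [0.0, 250.0, 500.0, 1000.0]
a595 = [0.05, 0.20, 0.35, 0.65]
slope, intercept = fit_standard_curve(bsa, a595)
print(round(protein_concentration(0.50, slope, intercept, dilution=2.0)))  # 1500
```

If the standards visibly curve at the high end, a narrower standard range or a nonlinear fit would be more appropriate than extrapolating this line.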
In vitro RNA transcription
PCR-amplified dsDNA containing a T7 promoter was used as the in vitro transcription template to produce crRNA with the HiScribe™ T7 Quick High Yield RNA Synthesis Kit (NEB). The transcribed crRNA was purified using the Oligo Clean & Concentrator (Zymo Research) and quantified on a NanoDrop™ 2000 (Thermo Fisher Scientific).
Cell culture, transfection, and fluorescence-activated cell sorting (FACS)
HEK293T cells were cultured in DMEM (Gibco) with 1% Penicillin-Streptomycin (Gibco) and 10% fetal bovine serum (Gibco). Cells were seeded in 24-well plates (Corning) and grown for 16 h until confluency reached 70%. 600 ng of plasmid encoding the Cas protein and 3000 ng of plasmid encoding the crRNA were transfected into each well using Lipofectamine 3000 (Invitrogen). For fluorescence-activated cell sorting (FACS), HEK293T cells were dissociated with Trypsin-EDTA (0.05%) (Gibco) 68 h after transfection, and cells were sorted on a MoFlo XDP (Beckman Coulter) based on GFP signal.
T7 endonuclease I (T7EI) assay and targeted deep sequencing analysis for genomic modifications
FACS-sorted GFP-positive 293FT cells were lysed with Buffer L and incubated at 55 ℃ for 3 h followed by 95 ℃ for 10 min. The dsDNA fragments containing target sites in different genomic loci were PCR-amplified using the corresponding primers. For the T7E1 assay, 200-400 ng of PCR product was used, with ddH2O added to a final volume of 10 μL. The mix was then subjected to a reannealing procedure to form heteroduplex dsDNA, and treated with 1/10 volume of NEBuffer™ 2.1 and 0.2 μL T7EI (NEB) at 37 ℃ for 50 min. The digested product was analyzed by ~3% agarose gel electrophoresis. Indels were calculated based on previous methods. PCR products containing mutations identified by the T7E1 assay were cloned into a TA-cloning vector, which was then transformed into competent E. coli. Colonies were randomly picked and sent for Sanger sequencing after overnight culture. For targeted deep sequencing, target sites were amplified by barcoded PCR directly using cell lysate as template. The PCR products were purified and pooled into several libraries for high-throughput sequencing. Indel frequencies (%) were analyzed with CRISPResso2 as the ratio of reads containing insertions or deletions. Reads whose counts were lower than 0.05% of the total reads were discarded.
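The text states that T7EI indels were "calculated based on previous methods" but does not give the formula. One widely used formula, assumed here rather than confirmed by the source, estimates the modified fraction from band intensities as indel% = 100 × (1 − √(1 − f_cut)), where f_cut = (b + c)/(a + b + c), a is the uncut band, and b and c are the cleavage products. The second function sketches the deep-sequencing ratio with the 0.05% read-count filter. Representing unique reads as (count, has_indel) pairs and computing the ratio over the retained reads are assumptions, and CRISPResso2 performs this step itself.

```python
import math
from typing import Iterable, List, Tuple


def t7e1_indel_percent(uncut: float, cleaved: Iterable[float]) -> float:
    """Estimate indel % from T7EI band intensities (uncut band a, cleaved bands
    b, c, ...) with the commonly used formula 100 * (1 - sqrt(1 - f_cut))."""
    cut = sum(cleaved)
    f_cut = cut / (uncut + cut)
    return 100.0 * (1.0 - math.sqrt(1.0 - f_cut))


def deep_seq_indel_percent(alleles: List[Tuple[int, bool]],
                           min_fraction: float = 0.0005) -> float:
    """alleles: (read_count, has_indel) per unique read.
    Unique reads below min_fraction (0.05%) of all reads are discarded first;
    the indel ratio is then computed over the retained reads (assumption)."""
    total = sum(count for count, _ in alleles)
    kept = [(c, indel) for c, indel in alleles if c >= min_fraction * total]
    kept_total = sum(c for c, _ in kept)
    indel_reads = sum(c for c, indel in kept if indel)
    return 100.0 * indel_reads / kept_total


print(round(t7e1_indel_percent(75, [15, 10]), 2))  # 13.4
print(round(deep_seq_indel_percent([(9000, False), (990, True), (4, True), (6, False)]), 2))  # 9.9
```

In the second example, the 4-read allele falls below 0.05% of 10,000 reads and is dropped before the ratio is taken.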
EXAMPLE 2: Design and characterization of engineered BhCas12b
This example describes design and characterization of a Bacillus hisashii Cas12bv4 (BhCas12bv4) engineered variant with improved gene-editing activity in human cells using the design pipeline described in Example 1.
Flexible regions of BhCas12bv4 were identified computationally using DynaMine as described in Example 1 above. A peak aa and candidate flexible region for engineering were identified as described in Example 1 above. Fig. 2 shows the flexibility (S2 score) profile of BhCas12bv4, with the selected peak aa indicated by a circle. The flexible region sequence (SEQ ID NO: 81) and Y>G substitution of the engineered variant (SEQ ID NO: 82) are shown in Table 1 below. The amino acid positions are based on SEQ ID NO: 1. BhCas12bv4 has no solved crystal structure, but the highly homologous BthCas12b (>98% homology) has an available crystal structure, so the linker of BhCas12bv4 shown in Table 1 was determined based on the structure and homology of BthCas12b.
Table 1. Candidate flexible region and modified sequence of enBhCas12b4.
Figure PCTCN2020134249-appb-000025
The resulting BhCas12bv4 variant (enBhCas12bv4 1.1, set forth in SEQ ID NO: 2) was cloned and purified as described in Example 1. We then tested editing efficiency of enBhCas12bv4 1.1 in human 293T cells for 11 genomic loci compared to the efficiency of wild type BhCas12bv4, and found that the editing efficiency of the engineered variant was significantly improved (Fig. 3) . The most dramatic improvement in editing efficiency was observed at genomic sites where wild type BhCas12bv4 had low editing efficiency, such as RNF2-5, CCR5-8, and CCR5-1 (Fig. 3) .
EXAMPLE 3: Design and characterization of engineered Cas12i2
This example describes design and characterization of a Cas12i2 engineered variant with improved gene-editing activity in human cells using the design pipeline described in Example 1.
Cas12i2 has no known 3D structure and no homologue whose structure has been resolved. We thus sought to test whether our method, which does not require a resolved structure, could be used to engineer Cas12i2 variants with improved activity, such as target binding, double-strand cleavage activity, nickase activity, and/or gene-editing activity.
Flexible regions of Cas12i2 were identified computationally using DynaMine as described in Example 1 above. The flexibility (S2 score) profile of Cas12i2 is shown in Fig. 4. Based on the S2 score profile, we selected peaks with S2 scores of less than 0.71 as flexible regions. Although no resolved structural information is available for Cas12i2 or a closely related homologue, we found that the selected flexible regions were associated with linker regions of Cas12i2 identified by computational secondary structure prediction, as shown in Fig. 5 (Buchan DWA, Jones DT (2019). The PSIPRED Protein Analysis Workbench: 20 years on. Nucleic Acids Research).
The selected flexible region amino acid sequences and modified amino acid sequences of engineered flexible variant regions are shown in Table 2 below. The amino acid positions are based on SEQ ID NO: 8.
Table 2. Selected candidate flexible regions and modified sequences of engineered Cas12i2.
Figure PCTCN2020134249-appb-000026
Figure PCTCN2020134249-appb-000027
Genome-editing assay
To analyze the activity of the engineered Cas12i2 variants in human cells, four target sites were selected in the human genome (CCR5-3, CCR5-2, RNF2-7, and CCR5-8) . The editing efficiency (%indels) of the Cas12i2 variants compared to wild type Cas12i2 was determined at each site. Of the variants tested, Cas12i2-2.2 (SEQ ID NO: 14) and Cas12i2-6.1 (SEQ ID NO: 18) significantly improved gene editing efficiency. Notably, variant 6.1 significantly improved editing efficiency at all four target sites (Fig. 6) .
Based on the improved editing efficiency of Cas12i2 variants 2.2 and 6.1, we then tested whether the editing efficiency at five target sites (CCR5-3, CCR5-2, AAVS1-1, CCR5-15, RNF2-7) could be further improved by the combination of variant 2.2 and 6.1 mutations. The combined variant, 2.2+6.1 further improved the editing efficiency of Cas12i2 (Fig. 7) . Herein, we refer to the combination variant Cas12i2-2.2+6.1 as “enCas12i2” (SEQ ID NO: 20) .
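Combining variants such as 2.2 and 6.1 means applying several edits to one sequence, and some of them (glycine insertions) change its length. One safe approach, sketched below as a hypothetical helper, is to express every edit in wild-type coordinates and apply the edits from the C-terminal end backwards. That way an applied edit never shifts the position of an edit still waiting to be applied. The actual mutations making up variants 2.2 and 6.1 are listed in Table 2 and are not reproduced here. The example uses a toy sequence, and the edits are assumed to sit at distinct positions.

```python
from typing import List, Tuple

# An edit is (position, ref, alt) in 1-based wild-type coordinates.
#   substitution:             (123, "Y", "G")
#   insertion before residue: (200, "", "GG")
Edit = Tuple[int, str, str]


def apply_edits(seq: str, edits: List[Edit]) -> str:
    """Apply edits from the C-terminal end backwards so that length-changing
    edits never shift the coordinates of edits not yet applied.
    Edits are assumed to sit at distinct positions."""
    out = seq
    for position, ref, alt in sorted(edits, key=lambda e: e[0], reverse=True):
        i = position - 1
        if out[i:i + len(ref)] != ref:
            raise ValueError(f"reference mismatch at position {position}: expected {ref!r}")
        out = out[:i] + alt + out[i + len(ref):]
    return out


# Toy example with hypothetical edits (not the actual variant 2.2/6.1 mutations).
variant_a = [(5, "Y", "G")]   # hypothetical substitution
variant_b = [(8, "", "GG")]   # hypothetical double-glycine insertion
print(apply_edits("MKTAYIAKQR", variant_a + variant_b))  # MKTAGIAGGKQR
```

The reference check catches edits written against the wrong parent sequence before they silently produce a misnumbered construct.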
We next tested the overall genome editing efficiency in human cells of enCas12i2 compared to SpCas9 and BhCas12b-v4. Editing efficiency (indel %) was analyzed at 46 loci for enCas12i2, 18 loci for SpCas9, and 23 loci for BhCas12bv4 (Fig. 8A) . We demonstrated that enCas12i2 can efficiently recognize the protospacer adjacent motif (PAM) sites NTTA, NTTC, NTTG, and NTTT, and ATTN, CTTN, GTTN, and TTTN (Fig. 8B) . Finally, we demonstrated that enCas12i2 retains the ability to process pre-crRNA (Fig. 9) . The efficiency of gene editing in the form of pre-crRNA was equivalent to the efficiency of gene editing in the form of single crRNA.
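Candidate target sites matching the NTTN PAMs recognized by enCas12i2 can be enumerated with a simple motif scan. This sketch rests on assumptions not stated in this section. It assumes the PAM lies immediately 5' of the protospacer on the same strand, as is typical of type V effectors. The spacer length is a free parameter (20 nt by default, chosen arbitrarily), and minus-strand hits are reported in reverse-complement coordinates. The scan finds motif matches only and says nothing about editing efficiency. The function names are hypothetical.

```python
import re
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(s: str) -> str:
    """Reverse complement of a DNA string."""
    return s.translate(COMPLEMENT)[::-1]


def find_nttn_sites(seq: str, spacer_len: int = 20) -> List[Tuple[str, int, str, str]]:
    """Return (strand, pam_start, pam, protospacer) for every 5'-NTTN PAM that is
    followed directly by spacer_len nt. pam_start is 0-based in the 5'->3'
    orientation of that strand (reverse-complement coordinates for '-')."""
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        # Zero-width lookahead so overlapping PAMs are all reported.
        for m in re.finditer(r"(?=([ACGT]TT[ACGT]))", s):
            start = m.start()
            proto = s[start + 4:start + 4 + spacer_len]
            if len(proto) == spacer_len:
                hits.append((strand, start, m.group(1), proto))
    return hits


print(find_nttn_sites("CTTAGGGCCC", spacer_len=4))  # [('+', 0, 'CTTA', 'GGGC')]
```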
In vitro cleavage assay
We also compared the in vitro plasmid cleavage activity of enCas12i2 with that of wild-type Cas12i2. A detailed description of the in vitro cleavage assay can be found in Example 6. Briefly, the target plasmid was incubated with different concentrations of RNP formed by Cas12i2 or enCas12i2 plus crRNA. The reaction was conducted at 37 ℃ for 10 min. As shown in FIG. 10, the purified enCas12i2 protein exhibited higher dsDNA cleavage activity at 37 ℃ than the wild-type Cas12i2 protein.
Nucleic acid detection assay
Finally, we used wild-type and engineered Cas12i2 in a nucleic acid detection assay. Briefly, 60 nM Cas12i2 or its engineered variant was mixed with 108 nM crRNA, 40 nM activator, RNase inhibitor, and 200 nM synthesized FQ-ssDNA-5T detector (5'-6-FAM-TTTTT-BHQ1-3') in 1X NEBuffer 2 in a single 20 μl reaction. The reactions were run at 37 ℃ for 15 cycles on an Applied Biosystems 7500 real-time PCR system (Thermo Fisher), with the FAM channel measured every minute. ΔRn values were exported and analyzed with SigmaPlot software. The sequence of the trans-XBP activator is shown in SEQ ID NO: 230, and the XBP1 target site is shown in SEQ ID NO: 231.
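ΔRn follows the usual real-time instrument definition. Rn is the reporter signal (here FAM) normalized to a passive reference dye, and ΔRn is Rn minus a baseline. The instrument software computes ΔRn itself, so the sketch below only illustrates the arithmetic. It assumes the baseline is the mean of the first few reads, since the baseline window actually used is not stated. The passive reference is optional because this section does not say whether one was present. The function name is hypothetical.

```python
from typing import List, Optional


def delta_rn(reporter: List[float], passive: Optional[List[float]] = None,
             baseline_points: int = 3) -> List[float]:
    """Rn = reporter / passive reference (or raw reporter if no passive dye);
    delta Rn = Rn minus the mean Rn of the first `baseline_points` reads."""
    if not 0 < baseline_points <= len(reporter):
        raise ValueError("baseline_points must be between 1 and the number of reads")
    if passive is None:
        rn = list(reporter)
    else:
        rn = [r / p for r, p in zip(reporter, passive)]
    baseline = sum(rn[:baseline_points]) / baseline_points
    return [v - baseline for v in rn]


# Illustrative numbers only: FAM reads taken once per minute.
print(delta_rn([100, 100, 100, 150, 200], passive=[50] * 5))  # [0.0, 0.0, 0.0, 1.0, 2.0]
```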
As shown in FIG. 11A, both wildtype Cas12i2 and the engineered Cas12i2 variants (2.2, 6.1, and 2.2+6.1) were capable of detecting double-strand DNA (i.e., the trans-XBP activator) containing the XBP target site. The ΔRn value corresponds to the level of detector nucleic acid cleavage. The engineered Cas12i2 variants demonstrated higher detection activity than wildtype Cas12i2.
Furthermore, using detector nucleic acids containing RNA nucleotides, we found that wild-type Cas12i2 can cleave an RNA-based fluorescent reporter containing rUs (5'-6-FAM-UUUUU-BHQ1-3') in the nucleic acid detection experiment (Fig. 11B).
Together, these results demonstrated that Cas12i2 can be effectively modified by our method of engineering Cas proteins with improved activity. The method does not rely on three-dimensional structural information of Cas12i2.
EXAMPLE 4: Design and characterization of engineered GeoCas9
This example describes design and characterization of GeoCas9 engineered variants with improved gene-editing activity in human cells using the design pipeline described in Example 1.
In order to show that our method is broadly applicable to Cas proteins other than Cas12 proteins, we next generated and tested engineered variants of GeoCas9.
Flexible regions of GeoCas9 were identified computationally using DynaMine as described in Example 1 above. The flexibility (S2 score) profile of GeoCas9 is shown in Fig. 12. Based on the S2 score profile, we selected 18 flexible regions centered on peaks with S2 scores lower than 0.71. The selected flexible regions were associated with linker regions of GeoCas9 identified by computational secondary structure prediction, as shown in Fig. 13 (Buchan DWA, Jones DT (2019). The PSIPRED Protein Analysis Workbench: 20 years on. Nucleic Acids Research).
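The peak-selection rule stated in claim 6 (an S2 threshold, comparison with the five residues on either side, and a five-residue region centered on each peak) can be sketched as follows. This is an illustrative re-implementation assuming the per-residue DynaMine S2 values have already been exported as a list; it is not the pipeline used in Example 1, the handling of sequence ends is an assumption, and the toy profile is hypothetical.

```python
def find_flexible_regions(s2, threshold=0.71, window=5, half_width=2):
    """Return 1-based (start, end) flexible regions from a per-residue S2 profile.

    Lower S2 means more flexible. Residue i is treated as a peak when its
    score is below `threshold` and strictly lower than every score within
    `window` residues on each side; neighbours beyond the sequence ends are
    ignored (an assumption). Each region spans i - half_width .. i + half_width.
    """
    n = len(s2)
    regions = []
    for i, score in enumerate(s2):
        if score >= threshold:
            continue
        neighbours = s2[max(0, i - window):i] + s2[i + 1:i + window + 1]
        if all(score < other for other in neighbours):
            pos = i + 1  # convert to 1-based residue numbering
            regions.append((max(1, pos - half_width), min(n, pos + half_width)))
    return regions


# Hypothetical 20-residue profile with two dips; only residue 10 is below
# 0.71 and lower than every score within 5 residues on each side.
profile = [0.85] * 20
profile[9] = 0.60   # residue 10
profile[12] = 0.65  # residue 13, within 5 residues of a lower score
print(find_flexible_regions(profile))
```

Requiring a strict local minimum over ±5 residues keeps neighbouring dips from producing overlapping five-residue regions, which matches the non-overlapping regions listed in claims 34, 37 and 40.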
The selected flexible region amino acid sequences and modified amino acid sequences of engineered flexible variant regions are shown in Table 3. The amino acid positions are based on SEQ ID NO: 25.
Table 3. Selected candidate flexible regions and modified sequences of engineered GeoCas9.
Figure PCTCN2020134249-appb-000028
Figure PCTCN2020134249-appb-000029
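The two kinds of flexibility-increasing mutation recited in claims 8-12 (inserting Gly residues N-terminal to a flexible residue, or replacing a hydrophobic residue with Gly) can be illustrated with a small sequence-editing sketch. The actual modified sequences are those given in Table 3; the function names, the variant labels and the toy sequence below are hypothetical, and enumerating every eligible position in a region is an assumption made for illustration.

```python
# Residue classes taken from claims 10 and 12.
FLEXIBLE = set("GSNDHMTEQKRAP")
HYDROPHOBIC = set("LIVCYFW")


def insert_gg(seq, position):
    """Insert two Gly residues N-terminal to the residue at 1-based `position`."""
    residue = seq[position - 1]
    if residue not in FLEXIBLE:
        raise ValueError(f"{residue}{position} is not a listed flexible residue")
    return seq[:position - 1] + "GG" + seq[position - 1:]


def hydrophobic_to_gly(seq, position):
    """Replace the hydrophobic residue at 1-based `position` with Gly."""
    residue = seq[position - 1]
    if residue not in HYDROPHOBIC:
        raise ValueError(f"{residue}{position} is not a listed hydrophobic residue")
    return seq[:position - 1] + "G" + seq[position:]


def candidate_variants(seq, region):
    """Enumerate one single-site variant per eligible position in a 1-based region."""
    start, end = region
    variants = {}
    for pos in range(start, end + 1):
        residue = seq[pos - 1]
        if residue in FLEXIBLE:
            variants[f"GG^{residue}{pos}"] = insert_gg(seq, pos)
        elif residue in HYDROPHOBIC:
            variants[f"{residue}{pos}G"] = hydrophobic_to_gly(seq, pos)
    return variants


toy = "MKTLLSDPAG"  # hypothetical sequence, not a real Cas protein
for name, variant in candidate_variants(toy, (4, 6)).items():
    print(name, variant)
```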
Targeted deep sequencing assay
To screen the engineered GeoCas9 variants for improved gene editing activity, we performed a targeted deep sequencing assay to determine the efficiency of indel generation by GeoCas9. Cells expressing the engineered GeoCas9 variants were first isolated by FACS sorting of GFP-positive 293T cells. The cells were lysed in Buffer L and incubated at 55 ℃ for 3 h, followed by 95 ℃ for 10 min. Regions containing the CD34-1 or CD34-2 target sites were amplified by barcoded PCR directly using the cell lysate as template. The PCR products were purified and pooled into several libraries for high-throughput sequencing. Indel frequencies (%) were analyzed with CRISPResso2 as the ratio of reads containing insertions or deletions to total reads; reads whose counts were lower than 0.05% of total reads were discarded. The targeted deep sequencing results are shown in Fig. 14. Engineered GeoCas9 variants with significantly improved editing efficiency include 2.1, 3.1, 8.1, 9.1 and 12.1.
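The read-level calculation described above can be sketched as follows, assuming reads have already been aligned and classified (for example by CRISPResso2) into distinct read sequences, each with a count and an indel flag. This input format is a hypothetical simplification; CRISPResso2 reports its own quantification, and the sketch only illustrates the ratio and the 0.05% filter.

```python
def indel_percentage(alleles, min_fraction=0.0005):
    """Percent of reads carrying an indel after discarding rare read sequences.

    `alleles` is a list of (read_count, has_indel) pairs, one per distinct read
    sequence. Sequences seen in fewer than `min_fraction` of all reads
    (0.05% by default) are dropped before the ratio is computed.
    """
    total = sum(count for count, _ in alleles)
    if total == 0:
        raise ValueError("no reads")
    kept = [(c, indel) for c, indel in alleles if c >= min_fraction * total]
    kept_total = sum(c for c, _ in kept)
    if kept_total == 0:
        raise ValueError("all reads were filtered out")
    indel_reads = sum(c for c, indel in kept if indel)
    return 100.0 * indel_reads / kept_total


# Hypothetical counts: the 4-read sequence is below 0.05% of 10,000 reads.
alleles = [(9000, False), (990, True), (4, True), (6, False)]
print(round(indel_percentage(alleles), 2))
```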
Methods
HEK293T cells were cultured in DMEM (Gibco) supplemented with 1% penicillin-streptomycin (Gibco) and 10% fetal bovine serum (Gibco). Cells were seeded in 24-well plates (Corning) and grown for 16 h until they reached 70% confluency. For each well, 600 ng of plasmid encoding the Cas protein and 3000 ng of plasmid encoding the crRNA were transfected using Lipofectamine 3000 (Invitrogen).
Fluorescence-activated cell sorting (FACS)
HEK293T cells were digested with Trypsin-EDTA (0.05%) (Gibco) 68 h after transfection and sorted on a MoFlo XDP (Beckman Coulter) based on GFP signal.
FACS-sorted GFP-positive 293FT cells were lysed with Buffer L and incubated at 55 ℃ for 3 h, followed by 95 ℃ for 10 min. dsDNA fragments containing target sites at different genomic loci were PCR-amplified using the corresponding primers (Table 4 below). For the T7E1 assay, 200-400 ng of PCR product was diluted with ddH2O to a final volume of 10 μL. The mix was then subjected to a reannealing procedure to form heteroduplex dsDNA and treated with 1/10 volume of NEBuffer™ 2.1 and 0.2 μL T7EI (NEB) at 37 ℃ for 50 min. The digested product was analyzed by ~3% agarose gel electrophoresis. Indel frequencies were calculated based on previous methods (Cong et al., 2013). PCR products containing mutations identified by the T7E1 assay were cloned into a TA-cloning vector, which was then transformed into competent E. coli. Colonies were randomly picked and sent for Sanger sequencing after overnight culture.
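The T7E1 gel quantification can be sketched with the formula commonly used for mismatch-cleavage assays, indel (%) = 100 × (1 − √(1 − (b + c)/(a + b + c))), where a is the intensity of the undigested band and b and c are the cleavage products. We assume this is the form used in the cited method (Cong et al., 2013). The square root follows from assuming random reannealing: a duplex escapes cleavage only if both strands are unmodified, so the uncut fraction is (1 − f)², where f is the indel frequency. The band intensities below are hypothetical densitometry values.

```python
import math


def t7e1_indel_percent(parent, *cleaved):
    """Estimate indel frequency (%) from T7E1 band intensities.

    parent: intensity of the undigested band (a)
    cleaved: intensities of the cleavage-product bands (b, c, ...)
    """
    cut = sum(cleaved)
    total = parent + cut
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    uncut_fraction = parent / total  # equals 1 - (b + c) / (a + b + c)
    # Uncut duplexes need both strands unmodified: uncut = (1 - f)^2.
    return 100.0 * (1.0 - math.sqrt(uncut_fraction))


# Hypothetical bands: a = 60, b = 25, c = 15 (40% of signal cleaved).
print(round(t7e1_indel_percent(60, 25, 15), 1))
```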
Table 4. Primers used.
Primer Sequence SEQ ID NO
AAVS1-F1 TCAGTCTGAAGAGCAGAGCCAGGAA 222
AAVS1-R1 TTATATTGTTCCTCCGTGCGTCAGT 223
CCR5-F1 CTTGTCATGGTCATCTGCTACTCGG 224
CCR5-R1 ATATTTCCTGCTCCCCAGTGGATCG 225
CD34-F1 TTGAAATGAGTTTGGTCAGGGATGG 226
CD34-R1 AACTGTGTATTTCCGTGCTGATTCC 227
RNF2-F1 GGAGCTGTAGGCGATTATAGTTGAA 228
RNF2-R1 TTCTCAAACCCTGGAAAGCACTTT 229
EXAMPLE 5: Design and characterization of engineered SaCas9
This example describes design and characterization of a SaCas9 engineered variant with improved gene-editing activity in human cells using the design pipeline described in Example 1.
Flexible regions of SaCas9 were identified computationally using DynaMine as described in Example 1 above. The flexibility (S2 score) profile of SaCas9 is shown in the exemplary design pipeline of Fig. 1. Based on the S2 score profile, we selected 13 flexible regions centered on peaks with S2 scores lower than 0.71 (peaks circled in Fig. 1).
The locations of the selected flexible regions and corresponding domains of SaCas9 are shown in Fig. 15. The selected flexible region amino acid sequences and modified amino acid sequences of engineered flexible variant regions are shown in Table 5. The amino acid positions are based on SEQ ID NO: 53.
Table 5. Selected flexible region amino acid sequences and modified amino acid sequences of engineered SaCas9.
Figure PCTCN2020134249-appb-000030
Figure PCTCN2020134249-appb-000031
To screen the engineered SaCas9 variants for improved gene editing activity, we performed a T7 endonuclease I (T7E1) cleavage assay (Fig. 16) using the same method described above for GeoCas9. Engineered SaCas9 variants 1.1 (SEQ ID NO: 54), 3.1 (SEQ ID NO: 58), and 3.2 (SEQ ID NO: 59) (indicated by box outline in Fig. 16) showed significantly improved gene editing efficiency.
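Claims 20, 21, 51 and 52 express improvement as gene-editing efficiency at least about 20% higher than that of the reference nuclease, at a single locus or on average over several loci. A minimal sketch of that comparison follows. It assumes "20% higher" means a relative increase rather than 20 percentage points, and the indel percentages are hypothetical.

```python
from statistics import mean


def relative_gain(variant_pct, reference_pct):
    """Relative increase of variant editing efficiency over the reference."""
    if reference_pct <= 0:
        raise ValueError("reference efficiency must be positive")
    return (variant_pct - reference_pct) / reference_pct


def meets_gain(variant_pcts, reference_pcts, min_gain=0.20):
    """Compare mean efficiencies over matched loci against a relative threshold."""
    if len(variant_pcts) != len(reference_pcts):
        raise ValueError("need one reference value per locus")
    return relative_gain(mean(variant_pcts), mean(reference_pcts)) >= min_gain


print(meets_gain([30.0], [24.0]))              # single locus: 25% relative gain
print(meets_gain([30.0, 20.0], [24.0, 18.0]))  # two loci: 25% vs 21% on average
```

Under the percentage-point reading, the same data would be compared by simple subtraction instead, so the interpretation should be fixed before screening.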
EXAMPLE 6: Genome editing of Cas12i in mammalian cells
This example describes characterization of wild-type Cas12i1 and Cas12i2.
Here, we show that Cas12i1 and Cas12i2 can cleave dsDNA in vitro. Both nucleases generated cleaved dsDNA products, which were resolved by gel electrophoresis (Fig. 17A). The crRNAs used in the in vitro cleavage assay are shown in Fig. 17B.
We tested whether Cas12i1 and Cas12i2 are capable of genome editing in human cells. We designed multiple crRNAs targeting AAVS1 (2 crRNAs), CCR5 (7 crRNAs), CD34 (2 crRNAs), and RNF2 (10 crRNAs), and tested the ability of Cas12i1 and Cas12i2 to generate indels using each of the crRNAs. We found that Cas12i2 generated indels at a broader range of targets than Cas12i1, suggesting that it is a promising candidate for engineering (Fig. 18).
Based on the wildtype crRNA sequences of Cas12i1, we designed three mutant crRNA sequences:
crRNA Sequence (mutated bases shown in lowercase) SEQ ID NO
wt ATTTTTGTGCCCATCGTTGGCAC 171
t2c ATcTTTGTGCCCATCGTTGGCAC 172
t3c ATTcTTGTGCCCATCGTTGGCAC 173
tt23aa ATaaTTGTGCCCATCGTTGGCAC 174
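The positions altered in each mutant crRNA can be read off mechanically by comparing it with the wildtype sequence. The short sketch below does this for the four sequences in the table and reports 1-based positions; the helper name is hypothetical.

```python
CRRNAS = {
    "wt": "ATTTTTGTGCCCATCGTTGGCAC",      # SEQ ID NO: 171
    "t2c": "ATCTTTGTGCCCATCGTTGGCAC",     # SEQ ID NO: 172
    "t3c": "ATTCTTGTGCCCATCGTTGGCAC",     # SEQ ID NO: 173
    "tt23aa": "ATAATTGTGCCCATCGTTGGCAC",  # SEQ ID NO: 174
}


def point_differences(reference, variant):
    """List (1-based position, reference base, variant base) for each mismatch."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    return [
        (i + 1, ref, alt)
        for i, (ref, alt) in enumerate(zip(reference.upper(), variant.upper()))
        if ref != alt
    ]


for name, seq in CRRNAS.items():
    if name != "wt":
        print(name, point_differences(CRRNAS["wt"], seq))
```

For these sequences the comparison reports a T-to-C change at position 3 for t2c, a T-to-C change at position 4 for t3c, and T-to-A changes at positions 3 and 4 for tt23aa.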
We found that Cas12i1 generated indels with significantly higher efficiency using crRNAs carrying the t3c point mutation than using the wildtype crRNA sequence (Fig. 19).
Based on our finding that Cas12i2 is able to generate indels at a broad range of targets, we next sought to further characterize Cas12i2. Notably, we found that Cas12i2 is enzymatically active (i.e., able to cleave dsDNA) over a wide temperature range, from 12 ℃ to 67 ℃ (Fig. 20). These data suggest that Cas12i2 could be used in a variety of organisms that live at different temperatures or have different body temperatures.
Additionally, we found that wildtype Cas12i2 is capable of editing multiple genomic target loci in human cells using a crRNA array (Fig. 21).
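A crRNA array of the kind used here for multiplexed editing places several spacers between copies of the direct repeat, so that a single precursor transcript encodes a plurality of crRNAs (compare claims 68 and 85). The sketch below assembles such an array from a repeat and a list of spacers. The repeat-spacer ordering with an optional terminal repeat is an assumed layout, and the repeat and spacer strings are hypothetical placeholders, not the sequences used in Fig. 21.

```python
def build_crrna_array(direct_repeat, spacers, terminal_repeat=False):
    """Concatenate repeat-spacer units: DR-S1-DR-S2-...-DR-Sn, optionally ending in DR."""
    if not spacers:
        raise ValueError("at least one spacer is required")
    array = "".join(direct_repeat + spacer for spacer in spacers)
    return array + direct_repeat if terminal_repeat else array


# Hypothetical placeholder sequences (not real Cas12i2 repeats or spacers).
dr = "GTTTCAAAAC"
spacers = ["ACGTACGTACGTACGTACGT", "TTGGCCAATTGGCCAATTGG"]
print(build_crrna_array(dr, spacers))
```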
We next determined the seed sequence for wildtype Cas12i2. Target binding by CRISPR-Cas ribonucleoprotein effectors is initiated by recognition of a double-stranded PAM motif by the Cas protein moiety, followed by destabilization, localized melting, and interrogation of the target by the guide portion of the CRISPR RNA moiety. The latter process depends on seed sequences, the parts of the target that must be strictly complementary to the CRISPR RNA guide. Mismatches between the target and the CRISPR RNA guide outside the seed have minor effects on target binding and thus contribute to the off-target activity of CRISPR-Cas effectors. Here, we define the seed sequence of Cas12i2.
To define the seed sequence of Cas12i2, we tested the ability of Cas12i2 to generate indels using AAVS1-1 and RNF2-1 crRNAs carrying a single-base mismatch at one of positions 1-19 of the crRNA. We found that the seed sequence of Cas12i2 lies within positions 1-10 (Fig. 22).
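The mismatch scan used to map the seed can be sketched as generating one spacer per position, each carrying a single substituted base. The sketch assumes positions are numbered from the PAM-proximal 5' end of the spacer and that each mismatch is introduced by replacing a base with its Watson-Crick complement; the source does not state which substitution was used, and the 20-nt spacer below is hypothetical, not the AAVS1-1 or RNF2-1 spacer.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def single_mismatch_spacers(spacer, positions=range(1, 20)):
    """Return {position: spacer with the base at that 1-based position complemented}."""
    spacer = spacer.upper()
    mutants = {}
    for pos in positions:
        base = spacer[pos - 1]
        mutants[pos] = spacer[:pos - 1] + COMPLEMENT[base] + spacer[pos:]
    return mutants


spacer = "GACGTACGTTAGCATGCAAT"  # hypothetical 20-nt spacer
for pos, mutant in single_mismatch_spacers(spacer).items():
    print(pos, mutant)
```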
Finally, we determined the optimal spacer length for Cas12i2. We tested the ability of Cas12i2 to generate indels with two different crRNAs (CCR5-1 and CCR5-2) using spacer lengths ranging from 17 bp to 25 bp. We found that the optimal spacer length for Cas12i2 gene editing is 20 bp (Fig. 23).
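The spacer-length series can be generated by trimming a longer target-matching sequence. This sketch assumes that, as for other Cas12 nucleases with a 5' PAM, position 1 is PAM-proximal, so shorter spacers are produced by removing bases from the 3' (PAM-distal) end; the 25-nt sequence below is hypothetical, not the CCR5-1 or CCR5-2 target.

```python
def spacer_length_series(target, lengths=range(17, 26)):
    """Return {length: spacer} keeping the PAM-proximal `length` nucleotides."""
    lengths = list(lengths)
    if len(target) < max(lengths):
        raise ValueError("target is shorter than the longest requested spacer")
    return {n: target[:n] for n in lengths}


target = "TGCAGTCCATGGACTTAGCAGTCCA"  # hypothetical 25-nt sequence adjacent to a PAM
for n, spacer in spacer_length_series(target).items():
    print(n, spacer)
```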
Methods
Cas12i Expression
For prokaryotic expression, the coding sequences of Cas12i2 and its variants were ligated using T4 DNA ligase into the BPK2104-ccdB expression vector, which had been digested with XmaI and SpeI.
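Because the vector is opened with XmaI and SpeI, an insert carrying an internal XmaI (CCCGGG) or SpeI (ACTAGT) site could be cut if it is also digested with these enzymes (an assumption; the insert preparation is not described here). The sketch below scans a coding sequence for both recognition sequences. Both sites are palindromic, so scanning one strand suffices; the function name and example sequence are hypothetical.

```python
SITES = {"XmaI": "CCCGGG", "SpeI": "ACTAGT"}  # both palindromic recognition sequences


def find_sites(seq, sites=SITES):
    """Return {enzyme: [1-based start positions]} for each recognition sequence."""
    seq = seq.upper()
    hits = {}
    for enzyme, motif in sites.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start + 1)
            start = seq.find(motif, start + 1)  # step by 1 so overlapping hits are found
        hits[enzyme] = positions
    return hits


print(find_sites("aaCCCGGGttACTAGTcc"))
```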
In vitro cleavage assay
The template for the in vitro cleavage assay was generated by PCR and purified with DNA Clean & Concentrator (ZYMO Research). In a single in vitro cleavage reaction, 100 nM template was used with 1 mM Cas12i protein and 2 mM crRNA. The reaction was conducted in 1x NEBuffer™ 3.1 for Cas12i1 or 1x NEBuffer™ 2 for Cas12i2 at 37 ℃ for 60 min. To determine the thermostability of Cas12i2, the cleavage mix was incubated in its cleavage buffer for 1 h at temperatures ranging from 4 ℃ to 67 ℃. For the plasmid cleavage assay, the plasmid was incubated with different concentrations of RNP formed by Cas12i2 or enCas12i2 plus crRNA at 37 ℃ for 10 min in 1x NEBuffer™ 2. Reactions were stopped by adding RNase cocktail (Thermo Fisher Scientific) to digest the crRNA at 37 ℃ for 20 min, followed by incubation with Proteinase (Takara) at 37 ℃ for 20 min. The products were resolved by agarose gel electrophoresis and visualized by ethidium bromide staining.

Claims (90)

  1. A method of engineering an enzyme, comprising:
    (a) obtaining a plurality of engineered enzymes each comprising one or more mutations that increase flexibility of a flexible region in a plurality of flexible regions of a reference enzyme; and
    (b) selecting one or more engineered enzymes from the plurality of the engineered enzymes, wherein the one or more engineered enzymes have an increased activity compared to the reference enzyme.
  2. The method of claim 1, wherein the plurality of flexible regions is determined based on the amino acid sequence of the reference enzyme.
  3. The method of claim 2, wherein the plurality of flexible regions is determined without reference to a three-dimensional structure of the reference enzyme or a homolog thereof.
  4. The method of claim 3, wherein the plurality of flexible regions is determined using a program selected from the group consisting of PredyFlexy, FoldUnfold, PROFbval, Flexserv, FlexPred, DynaMine and Disomine.
  5. The method of claim 4, wherein the plurality of flexible regions is determined using DynaMine.
  6. The method of any one of claims 1-5, further comprising:
    (i) calculating a flexibility score of each amino acid residue of the reference enzyme, wherein a higher flexibility score indicates lower conformational flexibility;
    (ii) selecting a plurality of peak amino acid residues at positions X_i, wherein each peak amino acid residue has a flexibility score that is below a pre-determined threshold value and is lower than the flexibility scores of amino acid residues at positions X_(i-5) to X_(i-1) and X_(i+1) to X_(i+5); and
    (iii) defining the plurality of flexible regions as amino acid residues X_(i-2) to X_(i+2).
  7. The method of any one of claims 1-6, wherein the plurality of flexible regions are located in random coils.
  8. The method of any one of claims 1-7, wherein the one or more mutations comprise insertion of one or more Glycine (G) residues in a flexible region.
  9. The method of claim 8, wherein the one or more mutations comprise insertion of two G residues in a flexible region.
  10. The method of claim 8 or 9, wherein the one or more G residues are inserted at the N-terminus of a flexible amino acid residue in the flexible region, wherein the flexible amino acid residue is selected from the group consisting of G, Serine (S), Asparagine (N), Aspartic acid (D), Histidine (H), Methionine (M), Threonine (T), Glutamic acid (E), Glutamine (Q), Lysine (K), Arginine (R), Alanine (A) and Proline (P).
  11. The method of any one of claims 1-10, wherein the one or more mutations comprise substitution of one or more non-G residues with one or more G residues.
  12. The method of claim 11, wherein the one or more mutations comprise substitution of a hydrophobic amino acid residue in a flexible region with a G residue, wherein the hydrophobic amino acid residue is selected from the group consisting of leucine (L), isoleucine (I), valine (V), cysteine (C), tyrosine (Y), phenylalanine (F), and tryptophan (W).
  13. The method of any one of claims 1-12, wherein the enzyme is a bacterial or archaeal enzyme.
  14. The method of claim 13, wherein the enzyme is a Cas nuclease.
  15. The method of claim 14, wherein the Cas nuclease is selected from the group consisting of Cas9, Cas12a, Cas12b, Cas12c, Cas12d, Cas12f, Cas12g, Cas12h, Cms1, Cas12i, Cas12j, Cas12k and CasX.
  16. The method of claim 14 or 15, wherein the plurality of flexible regions are in domains of the reference Cas nuclease that interact with DNA and/or RNA.
  17. The method of any one of claims 14-16, wherein the activity is site-specific nuclease activity.
  18. The method of any one of claims 14-17, wherein the activity is gene-editing activity in a eukaryotic cell.
  19. The method of claim 18, wherein the activity is gene-editing activity in a human cell.
  20. The method of claim 18 or 19, wherein a selected engineered Cas nuclease of step (b) has at least about 20% higher gene-editing efficiency compared to the reference Cas nuclease at a genomic locus in the cell.
  21. The method of any one of claims 18-20, wherein the average gene-editing efficiency of a selected engineered Cas nuclease of step (b) at a plurality of genomic loci in the cell is at least about 20% higher than that of the reference Cas nuclease.
  22. The method of claim 20 or 21, wherein the gene-editing efficiency is measured using a T7 endonuclease 1 (T7E1) assay, sequencing of the target DNA, a Tracking of Indels by Decomposition (TIDE) assay, or Indel Detection by Amplicon Analysis (IDAA) assay.
  23. An engineered Cas nuclease obtained using the method of any one of claims 14-22.
  24. An engineered Cas nuclease comprising one or more mutations that increase flexibility of a flexible region comprising at least 5 consecutive amino acid residues of a reference Cas nuclease, wherein the engineered Cas nuclease has an increased activity compared to the reference Cas nuclease.
  25. The engineered Cas nuclease of claim 24, wherein the flexible region is determined based on the amino acid sequence of the reference Cas nuclease.
  26. The engineered Cas nuclease of claim 25, wherein the flexible region is determined using a program selected from the group consisting of PredyFlexy, FoldUnfold, PROFbval, Flexserv, FlexPred, DynaMine and Disomine.
  27. The engineered Cas nuclease of claim 26, wherein the flexible region is determined using DynaMine, and wherein the amino acid residue with the highest flexibility in the flexible region has a flexibility score S2pred of no more than about 0.8.
  28. The engineered Cas nuclease of any one of claims 24-27, wherein the flexible region has 5 consecutive amino acid residues, wherein the 3rd amino acid residue has the lowest flexibility score, and wherein a higher flexibility score indicates lower conformational flexibility.
  29. The engineered Cas nuclease of any one of claims 24-28, wherein the flexible region is located in a random coil.
  30. The engineered Cas nuclease of any one of claims 24-29, wherein the flexible region is in a domain of the reference Cas nuclease that interacts with DNA and/or RNA.
  31. The engineered Cas nuclease of any one of claims 24-30, wherein the reference Cas nuclease is selected from the group consisting of Cas9, Cas12a, Cas12b, Cas12c, Cas12d, Cas12f, Cas12g, Cas12h, Cms1, Cas12i, Cas12j, Cas12k and CasX.
  32. An engineered Cas12b nuclease comprising one or more mutations that increase flexibility of a flexible region that corresponds to amino acid residues 835 to 839 in a reference Cas12b nuclease, wherein the amino acid residue numbering is based on SEQ ID NO: 1, wherein the engineered Cas12b nuclease has an increased activity compared to the reference Cas12b nuclease.
  33. The engineered Cas12b nuclease of claim 32, comprising an amino acid sequence having at least about 85% sequence identity to SEQ ID NO: 2.
  34. An engineered Cas12i nuclease comprising one or more mutations that increase flexibility of a flexible region in a reference Cas12i nuclease that is selected from the group consisting of regions corresponding to amino acid residues 228-232, amino acid residues 439-443, amino acid residues 478-482, amino acid residues 500-504, amino acid residues 775-779, and amino acid residues 925-929, wherein the amino acid residue numbering is based on SEQ ID NO: 8, wherein the engineered Cas12i nuclease has an increased activity compared to the reference Cas12i nuclease.
  35. The engineered Cas12i nuclease of claim 34, wherein the flexible region corresponds to amino acid residues 439-443 or amino acid residues 925-929, wherein the amino acid residue numbering is based on SEQ ID NO: 8.
  36. The engineered Cas12i nuclease of claim 34 or 35, comprising an amino acid sequence having at least about 85% sequence identity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 14, 18, and 20.
  37. An engineered Cas9 nuclease comprising one or more mutations that increase flexibility of a flexible region in a reference Cas9 nuclease that is selected from the group consisting of regions corresponding to amino acid residues 39-43, amino acid residues 135-139, amino acid residues 176-180, amino acid residues 274-278, amino acid residues 351-355, amino acid residues 389-393, amino acid residues 521-525, amino acid residues 541-545, amino acid residues 755-759, amino acid residues 774-778, amino acid residues 786-790, amino acid residues 811-815, amino acid residues 848-852, amino acid residues 855-859, amino acid residues 874-878, amino acid residues 891-895, amino acid residues 1019-1023, and amino acid residues 1036-1040, wherein the amino acid residue numbering is based on SEQ ID NO: 25, wherein the engineered Cas9 nuclease has an increased activity compared to the reference Cas9 nuclease.
  38. The engineered Cas9 nuclease of claim 37, wherein the flexible region is selected from the group consisting of regions corresponding to amino acid residues 135-139, amino acid residues 176-180, amino acid residues 541-545, amino acid residues 755-759, and amino acid residues 811-815, wherein the amino acid residue numbering is based on SEQ ID NO: 25.
  39. The engineered Cas9 nuclease of claim 37 or 38, comprising an amino acid sequence having at least about 85% sequence identity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 27, 28, 33, 34, and 41.
  40. An engineered Cas9 nuclease comprising one or more mutations that increase flexibility of a flexible region in a reference Cas9 nuclease that is selected from the group consisting of regions corresponding to amino acid residues 45-49, amino acid residues 84-88, amino acid residues 116-120, amino acid residues 128-132, amino acid residues 216-220, amino acid residues 318-322, amino acid residues 387-391, amino acid residues 497-501, amino acid residues 583-587, amino acid residues 594-598, amino acid residues 614-618, amino acid residues 696-700, and amino acid residues 739-743, wherein the amino acid residue numbering is based on SEQ ID NO: 53, wherein the engineered Cas9 nuclease has an increased activity compared to the reference Cas9 nuclease.
  41. The engineered Cas9 nuclease of claim 40, wherein the flexible region corresponds to amino acid residues 45-49, or amino acid residues 116-120, wherein the amino acid residue numbering is based on SEQ ID NO: 53.
  42. The engineered Cas9 nuclease of claim 40 or 41, comprising an amino acid sequence having at least about 85% sequence identity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 54 and 58-59.
  43. The engineered Cas nuclease of any one of claims 24-42, wherein the one or more mutations comprise inserting one or more G residues in the flexible region.
  44. The engineered Cas nuclease of claim 43, wherein the one or more mutations comprise inserting two G residues in a flexible region.
  45. The engineered Cas nuclease of claim 43 or 44, wherein the one or more G residues are inserted at the N-terminus of a flexible amino acid residue in the flexible region, wherein the flexible amino acid residue is selected from the group consisting of G, Serine (S), Asparagine (N), Aspartic acid (D), Histidine (H), Methionine (M), Threonine (T), Glutamic acid (E), Glutamine (Q), Lysine (K), Arginine (R), Alanine (A) and Proline (P).
  46. The engineered Cas nuclease of any one of claims 24-45, wherein the one or more mutations comprise substituting a hydrophobic amino acid residue in a flexible region with a G residue, wherein the hydrophobic amino acid residue is selected from the group consisting of leucine (L), isoleucine (I), valine (V), cysteine (C), tyrosine (Y), phenylalanine (F), and tryptophan (W).
  47. The engineered Cas nuclease of any one of claims 24-46, comprising one or more mutations that increase flexibility of two or more flexible regions in the reference Cas nuclease.
  48. The engineered Cas nuclease of any one of claims 24-47, wherein the activity is site-specific nuclease activity.
  49. The engineered Cas nuclease of any one of claims 24-48, wherein the activity is gene-editing activity in a eukaryotic cell.
  50. The engineered Cas nuclease of claim 49, wherein the activity is gene-editing activity in a human cell.
  51. The engineered Cas nuclease of claim 49 or 50, wherein the engineered Cas nuclease has at least about 20% higher gene-editing efficiency compared to the reference Cas nuclease at a genomic locus in the cell.
  52. The engineered Cas nuclease of any one of claims 49-51, wherein the average gene-editing efficiency of the engineered Cas nuclease at a plurality of genomic loci in the cell is at least about 20% higher than that of the reference Cas nuclease.
  53. The engineered Cas nuclease of any one of claims 49-52, wherein the gene-editing efficiency is measured using a T7 endonuclease 1 (T7E1) assay, sequencing of the target DNA, a Tracking of Indels by Decomposition (TIDE) assay, or Indel Detection by Amplicon Analysis (IDAA) assay.
  54. An engineered Cas nuclease comprising an amino acid sequence selected from the group consisting of SEQ ID NOs: 2, 14, 18, 20, 27, 28, 33, 34, 41, 54 and 58-59.
  55. An engineered Cas effector protein comprising the engineered Cas nuclease of any one of claims 23-54, or a functional derivative thereof.
  56. The engineered Cas effector protein of claim 55, wherein the engineered Cas nuclease or functional derivative thereof is enzymatically active.
  57. The engineered Cas effector protein of claim 56, wherein the effector protein is capable of inducing a double-strand break in a DNA molecule.
  58. The engineered Cas effector protein of claim 56, wherein the effector protein is capable of inducing a single-strand break in a DNA molecule.
  59. The engineered Cas effector protein of claim 55, wherein the effector protein comprises an enzymatically inactive mutant of the engineered Cas nuclease.
  60. The engineered Cas effector protein of any one of claims 55-59, further comprising a functional domain fused to the engineered Cas nuclease or functional derivative thereof.
  61. The engineered Cas effector protein of claim 60, wherein the functional domain is selected from the group consisting of a translation initiator domain, a transcription repressor domain, a transactivation domain, an epigenetic modification domain, a nucleobase-editing domain, a reverse transcriptase domain, a reporter domain, and a nuclease domain.
  62. The engineered Cas effector protein of any one of claims 55-61, comprising a first polypeptide comprising an N-terminal portion of the engineered Cas nuclease or functional derivative thereof, and a second polypeptide comprising a C-terminal portion of the engineered Cas nuclease or functional derivative thereof, wherein the first polypeptide and the second polypeptide are capable of associating with each other in the presence of a guide RNA comprising a guide sequence to form a Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR) complex that specifically binds to a target nucleic acid comprising a target sequence complementary to the guide sequence.
  63. The engineered Cas effector protein of claim 62, wherein the first polypeptide and the second polypeptide each comprises a dimerization domain.
  64. The engineered Cas effector protein of claim 63, wherein the first dimerization domain and the second dimerization domain associate with each other in the presence of an inducer.
  65. The engineered Cas effector protein of claim 62, wherein the first polypeptide and the second polypeptide do not comprise dimerization domains.
  66. An engineered CRISPR-Cas system comprising:
    (a) the engineered Cas effector protein of any one of claims 55-65; and
    (b) a guide RNA comprising a guide sequence complementary to a target sequence, or one or more nucleic acids encoding the guide RNA,
    wherein the engineered Cas effector protein and the guide RNA are capable of forming a CRISPR complex that specifically binds to a target nucleic acid comprising the target sequence and induces a modification of the target nucleic acid.
  67. The engineered CRISPR-Cas system of claim 66, wherein the guide RNA is a crRNA comprising the guide sequence.
  68. The engineered CRISPR-Cas system of claim 66 or 67, comprising a precursor guide RNA array encoding a plurality of crRNAs.
  69. The engineered CRISPR-Cas system of claim 66, wherein the guide RNA comprises a crRNA and a tracrRNA.
  70. The engineered CRISPR-Cas system of claim 69, wherein the guide RNA is a single guide RNA (sgRNA).
  71. The engineered CRISPR-Cas system of any one of claims 66-70, comprising one or more vectors encoding the engineered Cas effector protein.
  72. The engineered CRISPR-Cas system of claim 71, wherein the one or more vector is an adeno-associated viral (AAV) vector.
  73. The engineered CRISPR-Cas system of claim 72, wherein the AAV vector further encodes the guide RNA.
  74. A method of detecting a target nucleic acid in a sample, comprising:
    (a) contacting the sample with the engineered CRISPR-Cas system of any one of claims 66-68, and a labeled detector nucleic acid that is single stranded and does not hybridize with the guide sequence of the guide RNA; and
    (b) measuring a detectable signal produced by cleavage of the labeled detector nucleic acid by the engineered Cas effector protein, thereby detecting the target nucleic acid.
  75. A method of modifying a target nucleic acid comprising a target sequence, comprising contacting the target nucleic acid with the engineered CRISPR-Cas system of any one of claims 66-73.
  76. The method of claim 75, wherein the method is carried out in vitro.
  77. The method of claim 75, wherein the target nucleic acid is present in a cell.
  78. The method of claim 77, wherein the cell is a bacterial cell, a yeast cell, a mammalian cell, a plant cell, or an animal cell.
  79. The method of claim 77 or 78, wherein the method is carried out ex vivo.
  80. The method of claim 77 or 78, wherein the method is carried out in vivo.
  81. The method of any one of claims 75-80, wherein the target nucleic acid is cleaved or the target sequence in the target nucleic acid is altered by the engineered CRISPR-Cas system.
  82. The method of any one of claims 75-80, wherein expression of the target nucleic acid is altered by the engineered CRISPR-Cas system.
  83. The method of any one of claims 75-82, wherein the target nucleic acid is a genomic DNA.
  84. The method of any one of claims 75-83, wherein the target sequence is associated with a disease or condition.
  85. The method of any one of claims 75-84, wherein the engineered CRISPR-Cas system comprises a precursor guide RNA array encoding a plurality of crRNAs, wherein each crRNA comprises a different guide sequence.
  86. The method of any one of claims 75-85, wherein the method is carried out at a temperature of about 4℃ to about 67℃.
  87. A method of treating a disease or condition associated with a target nucleic acid in cells of an individual, comprising modifying the target nucleic acid in the cells of the individual using the method of any one of claims 77-86, thereby treating the disease or condition.
  88. The method of claim 87, wherein the disease or condition is selected from the group consisting of cancer, cardiovascular diseases, hereditary diseases, autoimmune diseases, metabolic diseases, neurodegenerative diseases, ocular diseases, bacterial infections and viral infections.
  89. An engineered cell comprising a modified target nucleic acid, wherein the target nucleic acid has been modified using the method of any one of claims 75-86.
  90. An engineered non-human animal comprising one or more engineered cells of claim 89.
PCT/CN2020/134249 2020-12-07 2020-12-07 Engineered cas effector proteins and methods of use thereof WO2022120520A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202080107728.8A CN116601293A (en) 2020-12-07 2020-12-07 Engineered Cas effector proteins and methods of use thereof
PCT/CN2020/134249 WO2022120520A1 (en) 2020-12-07 2020-12-07 Engineered cas effector proteins and methods of use thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/134249 WO2022120520A1 (en) 2020-12-07 2020-12-07 Engineered cas effector proteins and methods of use thereof

Publications (1)

Publication Number Publication Date
WO2022120520A1 true WO2022120520A1 (en) 2022-06-16

Family

ID=81972837

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/134249 WO2022120520A1 (en) 2020-12-07 2020-12-07 Engineered cas effector proteins and methods of use thereof

Country Status (2)

Country Link
CN (1) CN116601293A (en)
WO (1) WO2022120520A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11649444B1 (en) 2021-11-02 2023-05-16 Huidagene Therapeutics Co., Ltd. CRISPR-CAS12i systems

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018188571A1 (en) * 2017-04-10 2018-10-18 中国科学院动物研究所 System and method for genome editing
WO2019118463A1 (en) * 2017-12-15 2019-06-20 Danisco Us Inc Cas9 variants and methods of use
WO2019126577A2 (en) * 2017-12-22 2019-06-27 The Broad Institute, Inc. Crispr effector system based multiplex diagnostics
WO2019127087A1 (en) * 2017-12-27 2019-07-04 中国科学院动物研究所 System and method for genome editing
WO2019178427A1 (en) * 2018-03-14 2019-09-19 Arbor Biotechnologies, Inc. Novel crispr dna targeting enzymes and systems
WO2020160514A1 (en) * 2019-01-31 2020-08-06 Beam Therapeutics Inc. Nucleobase editors having reduced non-target deamination and assays for characterizing nucleobase editors
WO2020214610A1 (en) * 2019-04-16 2020-10-22 Arizona Board Of Regents On Behalf Of Arizona State University Cas9 fusion proteins and related methods

Non-Patent Citations (5)
Title
AACH JOHN, MALI PRASHANT, CHURCH GEORGE M: "CasFinder: Flexible algorithm for identifying specific Cas9 targets in genomes", BIORXIV, 12 May 2014 (2014-05-12), XP055941141, Retrieved from the Internet <URL:https://www.biorxiv.org/content/10.1101/005074v1.full.pdf> DOI: 10.1101/005074 *
CILIA ELISA, PANCSA RITA, TOMPA PETER, LENAERTS TOM, VRANKEN WIM F.: "The DynaMine webserver: predicting protein dynamics from sequence", NUCLEIC ACIDS RESEARCH, OXFORD UNIVERSITY PRESS, GB, vol. 42, no. W1, 1 July 2014 (2014-07-01), GB , pages W264 - W270, XP055941143, ISSN: 0305-1048, DOI: 10.1093/nar/gku270 *
JEAN-BAPTISTE RENAUD, CHARLOTTE BOIX, MARINE CHARPENTIER, ANNE DE CIAN, JULIEN COCHENNEC, EVELYNE DUVERNOIS-BERTHET, LOï: "Improved Genome Editing Efficiency and Flexibility Using Modified Oligonucleotides with TALEN and CRISPR-Cas9 Nucleases", CELL REPORTS, ELSEVIER INC, US, vol. 14, no. 9, 1 March 2016 (2016-03-01), US , pages 2263 - 2272, XP055566276, ISSN: 2211-1247, DOI: 10.1016/j.celrep.2016.02.018 *
LEGUT MATEUSZ, DANILOSKI ZHARKO, XUE XINHE, MCKENZIE DAYNA, GUO XINYI, WESSELS HANS-HERMANN, SANJANA NEVILLE E.: "High-Throughput Screens of PAM-Flexible Cas9 Variants for Gene Knockout and Transcriptional Modulation", CELL REPORTS, ELSEVIER INC, US, vol. 30, no. 9, 1 March 2020 (2020-03-01), US , pages 2859 - 2868.e5, XP055941144, ISSN: 2211-1247, DOI: 10.1016/j.celrep.2020.02.010 *
SAKI OSUKA, ISOMURA KAZUSHI, KAJIMOTO SHOHEI, KOMORI TOMOTAKA, NISHIMASU HIROSHI, SHIMA TOMOHIRO, NUREKI OSAMU, UEMURA SOTARO: "Real‐time observation of flexible domain movements in CRISPR–Cas9", THE EMBO JOURNAL, vol. 37, no. 10, 15 May 2018 (2018-05-15), pages e96941, XP055535303, ISSN: 1460-2075, DOI: 10.15252/embj.201796941 *

Also Published As

Publication number Publication date
CN116601293A (en) 2023-08-15

Similar Documents

Publication Publication Date Title
CN113308451B (en) Engineered Cas effector proteins and methods of use thereof
CN113151215B (en) Engineered Cas12i nuclease, effector protein thereof and application thereof
JP7083364B2 (en) Optimized CRISPR-Cas dual nickase system, method and composition for sequence manipulation
EP2927318B1 (en) Methods and compositions for targeted cleavage and recombination
KR102494449B1 (en) Engineered cas9 systems for eukaryotic genome modification
US20140086885A1 (en) Methods and compositions for targeted cleavage and recombination
CA2989834A1 (en) Crispr enzymes and systems
WO2016014837A1 (en) Gene editing for hiv gene therapy
EP4349979A1 (en) Engineered cas12i nuclease, effector protein and use thereof
JP7389135B2 (en) CRISPR/CAS dropout screening platform to reveal genetic vulnerabilities associated with tau aggregation
JP7461368B2 (en) CRISPR/CAS Screening Platform to Identify Genetic Modifiers of Tau Seeding or Aggregation
WO2022042557A1 (en) Split cas12 systems and methods of use thereof
WO2023104185A1 (en) Engineered cas12b effector proteins and methods of use thereof
WO2023138617A1 (en) Engineered casx nuclease, effector protein and use thereof
US20230279442A1 (en) Engineered cas9-nucleases and method of use thereof
WO2023122663A2 (en) Effector proteins and methods of use
Odabas et al. Exon7 Targeted CRISPR-Prime Editing Approaches for SMN2 Gene Editing in Spinal Muscular Atrophy (SMA) Disease Can Increase In Vitro SMN Expression
US20230348873A1 (en) Nuclease-mediated nucleic acid modification

Legal Events

Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 20964467; Country of ref document: EP; Kind code of ref document: A1

WWE WIPO information: entry into national phase
Ref document number: 202080107728.8; Country of ref document: CN

NENP Non-entry into the national phase
Ref country code: DE

122 EP: PCT application non-entry in European phase
Ref document number: 20964467; Country of ref document: EP; Kind code of ref document: A1