CN115851665A - Engineered Cas12i nuclease, effector protein thereof and application thereof - Google Patents


Info

Publication number
CN115851665A
Authority
CN
China
Prior art keywords
cas12i
engineered
nuclease
amino acid
amino acids
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211325001.4A
Other languages
Chinese (zh)
Inventor
李伟
周琪
陈阳灿
胡艳萍
王鑫阁
陈逸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute Of Stem Cell And Regenerative Medicine
Institute of Zoology of CAS
Original Assignee
Beijing Institute Of Stem Cell And Regenerative Medicine
Institute of Zoology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute Of Stem Cell And Regenerative Medicine, Institute of Zoology of CAS filed Critical Beijing Institute Of Stem Cell And Regenerative Medicine
Priority to CN202211325001.4A priority Critical patent/CN115851665A/en
Publication of CN115851665A publication Critical patent/CN115851665A/en
Pending legal-status Critical Current

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]

Abstract

The present application provides an engineered Cas12i nuclease; it comprises one, two, three or four of the following mutations based on the reference Cas12i nuclease: (1) Replacing one or more amino acids in the reference Cas12i nuclease that interact with PAM with a positively charged amino acid; and/or (2) replacing one or more amino acids involved in opening a DNA double strand in the reference Cas12i nuclease with an amino acid with an aromatic ring; and/or (3) replacing one or more amino acids in the reference Cas12i nuclease that are located in the RuvC domain and interact with the single-stranded DNA substrate with a positively charged amino acid; and/or (4) replacing one or more amino acids in the reference Cas12i nuclease that interact with the DNA-RNA duplex with a positively charged amino acid; in particular, the reference Cas12i nuclease is a native Cas12i nuclease, e.g. a native Cas12i2 nuclease, the amino acid sequence of which is defined as SEQ ID No.1.

Description

Engineered Cas12i nuclease, effector protein thereof and application thereof
The application is a divisional application of an invention patent application filed on May 27, 2021, with application number 2021105812903 and the title "Engineered Cas12i nuclease, effector protein and application thereof".
Technical Field
The application belongs to the field of biotechnology. More specifically, the application relates to Cas12i nucleases, effector proteins and uses thereof with improved catalytic activity (e.g., gene editing activity).
Background
Genome editing is an important and useful technique in genome research. A number of systems are available for genome editing, including Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-Cas systems, transcription activator-like effector nuclease (TALEN) systems, and zinc finger nuclease (ZFN) systems.
The CRISPR-Cas system is a highly efficient and cost-effective genome editing technology that can be applied to a wide range of eukaryotes, from yeast and plants to zebrafish and humans (see reviews: van der Oost 2013, Science 339). The CRISPR-Cas system provides adaptive immunity in archaea and bacteria through the combined action of Cas effector proteins and CRISPR RNA (crRNA). To date, two classes (class 1 and class 2) comprising six types (I-VI) of CRISPR-Cas systems have been characterized, based on the pronounced functional and evolutionary modularity of the systems. Among the class 2 CRISPR-Cas systems, the type II Cas9 system and the type V-A/B/E/J Cas12a/Cas12b/Cas12e/Cas12j systems have been used for genome editing and offer broad prospects for biomedical research.
However, current CRISPR-Cas systems have several limitations, including limited gene editing efficiency. Thus, there is a need for improved methods and systems for efficient genome editing across multiple loci.
Summary of The Invention
The application provides the following technical solutions:
1. An engineered Cas12i nuclease comprising one, two, three or four mutations relative to a reference Cas12i nuclease, selected from the group consisting of:
(1) Replacing one or more amino acids in the reference Cas12i nuclease that interact with PAM with a positively charged amino acid; and/or
(2) Replacing one or more amino acids in the reference Cas12i nuclease that are involved in opening the DNA double strand with an amino acid with an aromatic ring; and/or
(3) Replacing one or more amino acids in the reference Cas12i nuclease that are located in the RuvC domain and interact with the single-stranded DNA substrate with a positively charged amino acid; and/or
(4) Replacing one or more amino acids in the reference Cas12i nuclease that interact with the DNA-RNA duplex with a positively charged amino acid.
In some embodiments, the reference Cas12i nuclease is a native wild-type Cas12i enzyme. In some embodiments, the reference Cas12i nuclease is Cas12i1, Cas12i2, or a homologous nuclease thereof. In some embodiments, the reference enzyme is a native Cas12i2 nuclease, the amino acid sequence of which is defined as SEQ ID No.1. In some embodiments, the reference enzyme is a native Cas12i1 enzyme, the amino acid sequence of which is defined as SEQ ID No. 13. In some embodiments, the reference Cas12i nuclease is an engineered Cas12i nuclease.
2. The engineered Cas12i nuclease of item 1, wherein the one or more amino acids that interact with PAM are amino acids within 9 angstroms of the PAM in the three-dimensional structure. In some embodiments, the one or more amino acids that interact with PAM are one or more amino acids at the following positions: 176, 178, 226, 227, 229, 237, 238, 264, 447, and/or 563. In some embodiments, the one or more amino acids that interact with PAM are one or more of the following: E176, E178, Y226, A227, N229, E237, K238, K264, T447, and/or E563. In some embodiments, the one or more amino acids that interact with PAM are one or more of the following: E176, K238, T447, and/or E563. The amino acid position numbering is as defined in SEQ ID NO.1.
3. The engineered Cas12i nuclease of item 2, wherein the positively charged amino acid is R or K. In some embodiments, the positively charged amino acid is R.
4. The engineered Cas12i nuclease of item 3, wherein the replacement of one or more amino acids in the reference Cas12i nuclease that interact with PAM with positively charged amino acids refers to one or more of the following substitutions: E176R, K238R, T447R and/or E563R. In some embodiments, the Cas12i nuclease comprises any one or combination of the following mutations: (1) E563R; (2) E176R, T447R, E R and E563R; (3) K238R and E563R; (4) E176R, K238R and T447R; (5) E176R, K238R and E563R; (6) E176R, T447R and E563R; and/or (7) E176R, K238R, T447R and E563R. The amino acid position numbering is as defined in SEQ ID NO.1.
5. The engineered Cas12i nuclease of any one of items 1 to 4, wherein the one or more amino acids involved in opening the DNA double strand are the amino acids that interact with the last base pair of the PAM relative to the 3' end of the target strand. In some embodiments, the one or more amino acids involved in opening the DNA double strand are at one or more of the following positions: 163 and/or 164. In some embodiments, the one or more amino acids involved in opening the DNA double strand are one or more of the following: Q163 and/or N164. In some embodiments, the amino acid involved in opening the DNA double strand is N164. The amino acid position numbering is as defined in SEQ ID NO.1.
6. The engineered Cas12i nuclease of item 5, wherein the one or more amino acids involved in opening a DNA double strand are replaced with an aromatic ring-bearing amino acid that is F, Y or W. In some embodiments, the amino acid is F or Y.
7. The engineered Cas12i nuclease of item 6, wherein the replacement of one or more amino acids involved in opening the DNA double strand in the reference Cas12i enzyme with an aromatic ring-containing amino acid means: Q163F, Q163Y, Q163W, N164F and/or N164Y. In some embodiments, the Cas12i nuclease comprises an N164Y or N164F mutation, e.g., N164Y.
8. The engineered Cas12i nuclease of any one of items 1 to 7, wherein the one or more amino acids located in the RuvC domain and interacting with the single-stranded DNA substrate are amino acids within 9 angstroms of the single-stranded DNA substrate in the three-dimensional structure. In some embodiments, the one or more amino acids located in the RuvC domain and interacting with the single-stranded DNA substrate are one or more amino acids at the following positions: 323, 362, 425, 925, 926, 390, 391, 392, 751, 755, 840, 848, 851, 856, 885, 897, 929, 932, 327, 355, 359, 360, 361, 414, 421, 650, 652, 705, 708, 709, 752, 928, 388, 393, 417, 418, 424, 653, 696, and/or 1022. In some embodiments, the one or more amino acids located in the RuvC domain that interact with the single-stranded DNA substrate are one or more of the following: E323, D362, Q425, N925, I926, N390, N391, F392, L751, E755, N840, N848, S851, A856, Q885, M897, G929, Y932, L327, V355, G359, G360, K361, Q414, K421, S650, E652, K705, K708, E709, S752, T928, L388, K393, L417, A418, Q424, G653, I696, and/or A1022. In some embodiments, the one or more amino acids located in the RuvC domain that interact with the single-stranded DNA substrate are one or more of the following: E323, D362, Q425, N925, I926, N391, Q424 and/or G929. The amino acid position numbering is as defined in SEQ ID NO.1.
9. An engineered Cas12i nuclease as described in clause 8, wherein one or more amino acids in a reference Cas12i nuclease that are involved in cleavage of double-stranded DNA are replaced with positively charged amino acids that are R or K. In some embodiments, the amino acid is R.
10. The engineered Cas12i nuclease of item 9, wherein the replacement of one or more amino acids of the reference Cas12i nuclease that are located in the RuvC domain and interact with the single-stranded DNA substrate with positively charged amino acids comprises one or more of the following substitutions: E323R, D362R, N391R, Q424R, Q425R, N925R, I926R and/or G929R. In some embodiments, the Cas12i nuclease comprises any one or combination of the following mutations: (1) E323R; (2) D362R; (3) Q425R; (4) N925R; (5) I926R; (6) E323R and D362R; (7) E323R and Q425R; (8) E323R and I926R; (9) Q425R and I926R; (10) D362R and I926R; (11) N925R and I926R; (12) E323R, D362R and Q425R; (13) E323R, D362R and I926R; (14) E323R, Q425R and I926R; (15) D362R, N925R and I926R; and/or (16) E323R, D362R, Q425R and I926R. The amino acid position numbering is as defined in SEQ ID NO.1.
11. The engineered Cas12i nuclease of any one of items 1 to 10, wherein the one or more amino acids that interact with the DNA-RNA duplex are amino acids within 9 angstroms of the DNA-RNA duplex in the three-dimensional structure. In some embodiments, the one or more amino acids that interact with the DNA-RNA duplex are at one or more of the following positions: 116, 117, 156, 159, 161, 301, 305, 306, 308, 312, 313, 427, 433, 438, 441, 442, 852, 855, 861, 865, 160, 316, 319, 320, 247, 343, 348, 349, 679, 683, 691, 782, 783, 797, 800, 853, 957, 958, 293, 294 and/or 297. In some embodiments, the one or more amino acids that interact with the DNA-RNA duplex are one or more of the following: G116, E117, A156, T159, S161, T301, I305, K306, T308, N312, F313, D427, K433, V438, N441, Q442, M852, L855, N861, Q865, E160, Q316, E319, Q320, E247, E343, E348, E349, N679, E683, E691, D782, E783, E797, E800, D853, S957, D958, G293, E294 and/or N297. In some embodiments, the one or more amino acids that interact with the DNA-RNA duplex are one or more of the following: G116, E117, T159, S161, E319, E343, and/or D958. The amino acid position numbering is as defined in SEQ ID NO.1.
12. The engineered Cas12i nuclease of item 11, wherein the positively charged amino acid is R or K. In some embodiments, the positively charged amino acid is R.
13. The engineered Cas12i nuclease of item 12, wherein the replacement of one or more amino acids interacting with the DNA-RNA duplex in the reference Cas12i nuclease with positively charged amino acids refers to one or more of the following substitutions: G116R, E117R, T159R, S161R, E319R, E343R and/or D958R. In some embodiments, the Cas12i nuclease comprises D958R. The amino acid position numbering is as defined in SEQ ID NO.1.
14. The engineered Cas12i nuclease of any one of items 1 to 13, comprising one or more flexible region mutations that increase the flexibility of a flexible region in the reference Cas12i nuclease, the flexible region being selected from the regions corresponding to amino acid residues 439-443 or amino acid residues 925-929, wherein the amino acid position numbering is as defined in SEQ ID No.1. In some embodiments, the flexible region mutation is located at one or more of the following positions: 439 and/or 926. In some embodiments, the flexible region mutation is at one or more of the following amino acids: L439 and/or I926.
15. The engineered Cas12i nuclease of item 14, wherein the one or more flexible region mutations are: the amino acid is replaced by a G, and/or one or two G's are inserted thereafter. In some embodiments, the one or more flexible region mutations comprises I926G. In some embodiments, the one or more flexible region mutations comprises 439G or 439GG. Wherein the amino acid position numbering is as defined in SEQ ID NO.1.
16. An engineered Cas12i nuclease (e.g., Cas12i2 nuclease) having an amino acid position numbering as defined in SEQ ID No. 1; the engineered Cas12i nuclease comprises any one or more (e.g., 2, 3, 4, 5, 6, or more) of the following sets of mutations: (1) E563R; (2) E176R and T447R; (3) E176R and E563R; (4) K238R and E563R; (5) E176R, K238R and T447R; (6) E176R, T447R and E563R; (7) E176R, K238R and E563R; (8) E176R, K238R, T447R and E563R; (9) N164Y; (10) N164F; (11) E323R; (12) D362R; (13) Q425R; (14) N925R; (15) I926R; (16) D958R; (17) E323R and D362R; (18) E323R and Q425R; (19) E323R and I926R; (20) Q425R and I926R; (21) D362R and I926R; (22) N925R and I926R; (23) E323R, D362R and Q425R; (24) E323R, D362R and I926R; (25) E323R, Q425R and I926R; (26) D362R, N925R and I926R; (27) E323R, D362R, Q425R and I926R; (28) D362R and I926G; (29) N925R and I926G; (30) D362R, N925R and I926G; (31) I926R and 439G; (32) I926R and 439GG; and/or (33) E323R, D362R and I926G. In some embodiments, the engineered Cas12i nuclease comprises any one or more (e.g., 2, 3, 4, or 5) of the following sets of mutations: (1) E176R, K238R, T447R and E563R; (2) N164Y; (3) I926R; (4) E323R and D362R; (5) I926G; (6) I926R and 439G; (7) I926R and 439GG; and/or (8) D958R.
In some embodiments, the engineered Cas12i nuclease comprises any one of the following sets of mutations: (1) E176R, K238R, T447R, E563R and N164Y; (2) E176R, K238R, T447R, E563R and I926R; (3) N164Y, E323R and D362R; (4) E176R, K238R, T447R, E563R, E323R and D362R; (5) N164Y and I926R; (6) E176R, K238R, T447R, E563R, N164Y and I926R; (7) E176R, K238R, T447R, E563R, N164Y, E323R and D362R; (8) E176R, K238R, T447R, E563R, N164Y, I926R, E323R and D362R; (9) E176R, K238R, T447R, E563R, N164Y, E323R, D362R and I926G; (10) E176R, K238R, T447R, E563R, N164Y, E323R, D362R, I926G and 439GG; (11) E176R, K238R, T447R, E563R, N164Y, E323R, D362R, I926G and 439G; (12) E176R, K238R, T447R, E563R, N164Y and D958R; (13) E176R, K238R, T447R, E563R, I926R and D958R; (14) E176R, K238R, T447R, E563R, E323R, D362R and D958R; (15) N164Y, I926R and D958R; (16) N164Y, E323R, D362R and D958R; (17) E176R, K238R, T447R, E563R, N164Y, I926R and D958R; (18) E176R, K238R, T447R, E563R, N164Y, E323R, D362R and D958R; (19) E176R, K238R, T447R, E563R, N164Y, I926R, E323R, D362R and D958R; (20) E176R, K238R, T447R, E563R, N164Y, E323R, D362R, I926G and D958R; (21) E176R, K238R, T447R, E563R, N164Y, E323R, D362R, I926G, 439GG and D958R; (22) E176R, K238R, T447R, E563R, N164Y, E323R, D362R, I926G, 439G and D958R. The Cas12i nuclease here is Cas12i2, with amino acid position numbering as defined in SEQ ID NO.1.
17. An engineered Cas12i nuclease comprising an amino acid sequence as set forth in any one of SEQ ID NOs. 2 to 12. In some embodiments, the engineered Cas12i nuclease comprises an amino acid sequence having at least 85% (e.g., at least 87%, 89%, 91%, 93%, 95%, 97%, or 99%) sequence identity to the amino acid sequence set forth in any one of SEQ ID NOs. 2 to 12.
18. An engineered Cas12i effector protein comprising an engineered Cas12i nuclease or a functional derivative thereof according to any one of items 1 to 17. In some embodiments, the engineered Cas12i nuclease or functional derivative thereof has enzymatic activity. In some embodiments, the effector protein comprises an enzyme-inactivating mutant of the engineered Cas12i nuclease.
19. The engineered Cas12i effector protein of clause 18, wherein the effector protein is capable of inducing a double-strand break or a single-strand break in a DNA molecule.
20. The engineered Cas12i effector protein of clause 18, wherein the functional derivative of the engineered Cas12i nuclease is an enzyme inactivating mutant, for example an enzyme inactivating mutant containing D599A, E833A, S883A, H884A, D886A, R a and/or D1019A.
21. The engineered Cas12i effector protein of clauses 18-20, further comprising a functional domain fused to the engineered Cas12i nuclease.
22. The engineered Cas12i effector protein of item 21, wherein the functional domain is selected from the group consisting of: a translation initiation domain, a transcription repression domain, a transactivation domain, an epigenetic modification domain, a nucleobase editing domain (e.g., a CBE or ABE domain), a reverse transcriptase domain, a reporter domain (e.g., a fluorescent domain), and a nuclease domain.
23. An engineered Cas12i effector protein as described in items 18-22, comprising: a first polypeptide comprising an N-terminal portion of the engineered Cas12i nuclease or functional derivative thereof and a second polypeptide comprising a C-terminal portion of the engineered Cas12i nuclease or functional derivative thereof, wherein the first polypeptide and the second polypeptide are capable of associating with each other in the presence of a guide RNA comprising a guide sequence to form a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) complex that specifically binds to a target nucleic acid comprising a target sequence complementary to the guide sequence. In some embodiments, the first polypeptide comprises amino acid residues 1 to X of the N-terminal portion of the engineered Cas12i nuclease of any one of items 1-17 and the second polypeptide comprises amino acid residues X +1 of the engineered Cas12i nuclease of any one of items 1-17 to the C-terminus of the Cas12i nuclease, wherein the first polypeptide and the second polypeptide are capable of associating with each other in the presence of a guide RNA comprising a guide sequence to form a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) complex that specifically binds to a target nucleic acid comprising a target sequence complementary to the guide sequence. In some embodiments, the first polypeptide and the second polypeptide each comprise a dimerization domain. In some embodiments, the first dimeric domain and the second dimeric domain are associated with each other in the presence of an inducing agent. In some embodiments, the first polypeptide and the second polypeptide do not comprise a dimerization domain.
24. An engineered CRISPR-Cas12i system comprising:
(a) An engineered Cas12i effector protein of any one of items 18-23; and
(b) A guide RNA comprising a guide sequence complementary to a target sequence, or one or more nucleic acids encoding the guide RNA,
wherein the engineered Cas12i effector protein and the guide RNA are capable of forming a CRISPR complex that specifically binds to a target nucleic acid comprising the target sequence and induces a modification of the target nucleic acid. In some embodiments, the guide RNA is a crRNA comprising the guide sequence. In some embodiments, the system comprises a precursor guide RNA array encoding a plurality of crRNAs. In some embodiments, the engineered Cas effector protein is a prime editor and the guide RNA is a pegRNA.
25. The engineered CRISPR-Cas12i system of item 24, comprising one or more vectors encoding the engineered Cas12i effector protein. In some embodiments, the one or more vectors are selected from the group consisting of: retroviral vectors, lentiviral vectors, adenoviral vectors, adeno-associated virus vectors and herpes simplex virus vectors. In some embodiments, the one or more vectors are adeno-associated virus (AAV) vectors. In some embodiments, the AAV vector further encodes the guide RNA (e.g., a crRNA, or a precursor guide RNA array).
26. A method of detecting a target nucleic acid in a sample, comprising:
(a) Contacting the sample with the engineered CRISPR-Cas12i system in item 24 and a tagged detection nucleic acid that is single stranded and does not hybridize to the guide sequence of the guide RNA; and
(b) Measuring a detectable signal generated by cleavage of the tagged detection nucleic acid by the engineered Cas12i effector protein, thereby detecting the target nucleic acid.
27. A method of modifying a target nucleic acid comprising a target sequence comprising contacting the target nucleic acid with the engineered CRISPR-Cas12i system described in item 24. In some embodiments, the method is performed in vitro. In some embodiments, the target nucleic acid is present in a cell. In some embodiments, the cell is a bacterial cell, a yeast cell, a mammalian cell, a plant cell, or an animal cell. In some embodiments, the method is performed ex vivo. In some embodiments, the method is performed in vivo.
In some embodiments according to any of the above methods of modifying a target nucleic acid, the target nucleic acid is cleaved or a target sequence in the target nucleic acid is altered by the engineered CRISPR-Cas12i system. In some embodiments, expression of the target nucleic acid is altered by the engineered CRISPR-Cas12i system. In some embodiments, the target nucleic acid is genomic DNA. In some embodiments, the target sequence is associated with a disease or condition.
In some embodiments according to any of the above methods of modifying a target nucleic acid, the engineered CRISPR-Cas12i system comprises a precursor guide RNA array encoding a plurality of crRNAs, wherein each crRNA comprises a different guide sequence.
28. Use of the engineered CRISPR-Cas12i system of item 24 in the manufacture of a medicament for treating a disease or disorder associated with a target nucleic acid in a cell of an individual. In some embodiments, the disease or disorder is selected from the group consisting of: cancer, cardiovascular disease, genetic disease, autoimmune disease, metabolic disease, neurodegenerative disease, eye disease, bacterial infection and viral infection.
29. A method of treating a disease or disorder associated with a target nucleic acid in a cell of a subject; the method comprises modifying a target nucleic acid in a cell of the subject with any of the methods using the engineered CRISPR-Cas12i system of item 27, thereby treating the disease or disorder. In some embodiments, the disease or disorder is selected from the group consisting of: cancer, cardiovascular disease, genetic disease, autoimmune disease, metabolic disease, neurodegenerative disease, ocular disease, bacterial infection, and viral infection.
30. An engineered cell comprising a modified target nucleic acid, wherein the target nucleic acid has been modified using the method described in clause 27.
31. An engineered non-human animal comprising one or more engineered cells according to item 30.
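Items 2, 8 and 11 above select candidate residues by a distance cutoff (within 9 angstroms of the PAM, the single-stranded DNA substrate, or the DNA-RNA duplex). The following is a minimal sketch of such a filter, not the applicants' procedure. It assumes that distance means the smallest atom-to-atom distance, which the text does not specify. All coordinates are invented toy values, and residue G600 is a hypothetical placeholder for a distant residue.

```python
import math

# Hypothetical toy coordinates in angstroms. A real analysis would read atom
# positions from a solved structure of the Cas12i-crRNA-DNA complex.
protein_atoms = {
    ("E", 176): [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    ("K", 238): [(6.0, 6.0, 0.0)],
    ("G", 600): [(30.0, 0.0, 0.0)],  # hypothetical distant residue
}
pam_atoms = [(4.0, 0.0, 0.0), (5.0, 8.0, 0.0)]


def min_distance(atoms_a, atoms_b):
    """Smallest Euclidean distance between any atom pair from the two sets."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


def residues_within(protein, ligand_atoms, cutoff=9.0):
    """Residues with at least one atom within `cutoff` angstroms of the ligand."""
    hits = [res for res, atoms in protein.items()
            if min_distance(atoms, ligand_atoms) <= cutoff]
    return sorted(hits, key=lambda res: res[1])  # order by residue number


near_pam = residues_within(protein_atoms, pam_atoms)
print(near_pam)
```

With these toy coordinates only the first two residues pass the 9 angstrom filter. Other distance definitions (for example C-alpha to nearest nucleotide atom) would change which residues qualify.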
It is to be understood that certain features of the disclosure, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the disclosure which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination. All combinations of embodiments involving particular method steps, reagents or conditions or composition components are specifically contemplated by the present disclosure and disclosed herein as if each and every combination were individually and specifically disclosed.
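Because the mutation sets in the items above are meant to be combinable, it can help to read the notation mechanically. The sketch below applies a combined set of substitutions (such as E176R) and glycine insertions (such as 439GG) to a sequence. It assumes that "439G" and "439GG" mean one or two glycines inserted after position 439, and that all positions refer to the unmodified reference. The ten-residue reference is a hypothetical toy string, not SEQ ID NO.1.

```python
import re

SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")  # e.g. E176R
GLY_INSERTION = re.compile(r"^(\d+)(G{1,2})$")        # e.g. 439G, 439GG


def apply_mutations(reference, mutations):
    """Apply substitutions and glycine insertions, numbered 1-based on the reference."""
    residues = list(reference)
    inserts = {}
    for mutation in mutations:
        sub = SUBSTITUTION.match(mutation)
        ins = GLY_INSERTION.match(mutation)
        if sub:
            old, pos, new = sub.group(1), int(sub.group(2)), sub.group(3)
            if residues[pos - 1] != old:
                raise ValueError(
                    f"{mutation}: reference has {residues[pos - 1]} at {pos}")
            residues[pos - 1] = new
        elif ins:
            inserts[int(ins.group(1))] = ins.group(2)
        else:
            raise ValueError(f"unrecognized mutation: {mutation}")
    # Insertions are added last so every position still refers to the reference.
    return "".join(aa + inserts.get(i, "")
                   for i, aa in enumerate(residues, start=1))


toy_reference = "MKENLQTAGD"  # hypothetical 10-residue sequence
toy_variant = apply_mutations(toy_reference, ["E3R", "N4Y", "6GG"])
print(toy_variant)
```

Checking the original residue before substituting catches numbering mistakes early, which matters when positions are defined against one reference (SEQ ID NO.1) but applied to a homolog.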
Beneficial effects of the technical solution of the application
The engineered Cas12i nucleases and effector proteins of the present application have higher activity, for example higher catalytic efficiency in cleaving nucleic acid substrates and higher gene editing efficiency in cells. The engineered Cas12i nucleases of the present application show better gene editing efficiency in mammalian cells (such as human cells) than existing conventional Cas gene editing tools. For example, some exemplary Cas12i2 nuclease mutants of the present application were tested for gene editing efficiency at multiple sites (e.g., 62 sites) in human cells; 57 of these sites showed a gene editing efficiency above about 60%, with an average gene editing efficiency approaching 70%. In some embodiments, the engineered Cas12i nucleases and effector proteins of the present application also have one or more of the following advantages: the protein is small (1,054 aa), the crRNA component is simple, the PAM sequence is simple, and the protein itself can process precursor crRNA. These advantages make the highly efficient engineered Cas12i nucleases and their effector proteins of the present application well suited for in vivo gene editing or gene regulation.
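As a quick check of the site counts quoted above, the share of tested sites exceeding about 60% editing follows directly from the two numbers in the text. The variable names are illustrative only.

```python
sites_tested = 62     # human genomic sites tested, from the text
sites_above_60 = 57   # sites with editing efficiency above about 60%

fraction_above_60 = sites_above_60 / sites_tested
print(f"{fraction_above_60:.1%} of tested sites exceeded about 60% editing")
```

This puts roughly nine in ten tested sites above the 60% threshold. The per-site values needed to recompute the reported average are not given in this section.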
Drawings
FIG. 1a: amino acids interacting with PAM in the reference Cas12i nuclease are replaced with positively charged amino acids, thereby improving gene editing efficiency. As shown in the figure, the four mutants of E176R, K238R, T447R and E563R can significantly improve the gene editing efficiency in human 293T cells.
FIG. 1b: combining the amino acid mutations (E176R, K238R, T447R, E563R) in fig. 1a that significantly improved gene editing efficiency, it was found that the combined mutants could exhibit higher gene editing efficiency in human 293T cells.
FIG. 2: the amino acids involved in opening the DNA double strand in the reference Cas12i nuclease are replaced with amino acids with aromatic rings to improve the efficiency of gene editing. As shown, the mutants Q163F, Q163Y, Q163W, N164F and N164Y can significantly improve gene editing efficiency in human 293T cells.
FIGS. 3a, 3b, 3c: amino acids in the reference Cas12i nuclease that are located in the RuvC domain and interact with single-stranded DNA substrates are replaced with positively charged amino acids, thereby increasing the efficiency of gene editing. As shown in the figures, mutants such as E323R, L327R, V355R, G359R, G360R, D362R, N391R, Q424R, Q425R, N925R, I926R and G929R can significantly improve the gene editing efficiency in human 293T cells.
FIG. 3d: the point mutations with improved efficiency in FIGS. 3a and 3b were combined, and it was found that the combined mutants could exhibit higher gene editing efficiency in human 293T cells.
FIG. 3e: the point mutations for improving efficiency in FIGS. 3a and 3b and the modified mutations (439GG and I926G) based on the principle of molecular flexibility were combined, and it was found that the combined mutants could exhibit higher gene editing efficiency in human 293T cells.
FIG. 4: amino acids interacting with the DNA-RNA duplex in the reference Cas12i nuclease are replaced with positively charged amino acids, thereby improving the gene editing efficiency. As shown in the figure, the mutants G116R, E117R, T159R, S161R, E319R, E343R and D958R can significantly improve gene editing efficiency in human 293T cells. Among them, D958R is most preferred.
FIG. 5: the high-efficiency mutants obtained by the three engineering strategies in FIG. 1a to FIG. 3e and the engineered mutations based on the principle of molecular flexibility (439GG and I926G) were combined, and the combined mutants showed higher gene editing efficiency in human 293T cells. The gene editing efficiency is greatly improved after combination. The mutant with the best gene editing effect was selected and named CasXX for subsequent experiments.
FIG. 6a: gene editing efficiency of CasXX at 62 human genomic loci was summarized. PAM = NTTN. Herein CasXX represents an engineered enzyme (based on the reference Cas12i2 of amino acid sequence SEQ ID No. 1) having a combination of E176R + K238R + T447R + E563R + N164Y + E323R + D362R mutations.
FIG. 6b: comparison of the efficiency of CasXX gene editing with AsCas12a, bhCas12b v.
FIG. 6c: comparison of the gene editing efficiency of CasXX with that of SpCas9, SaCas9, and SaCas9-KKH.
FIG. 6d: statistics of the gene editing efficiency of CasXX in the mouse Hepa1-6 cell line. CasXX exhibits strong gene editing ability at 65 sites, with an average gene editing efficiency above 60%.
FIG. 7: homology alignment of Cas12i2 (SEQ ID No. 1) with the amino acid sequence of Cas12i1 (SEQ ID No. 13). Shaded amino acids represent the same amino acids of the 2 Cas12i proteins, and amino acids marked with white boxes represent amino acids of the 2 Cas12i proteins with similar properties.
Detailed Description
It should be noted that certain terms are used throughout the description and claims to refer to particular components. As one skilled in the art will appreciate, various names may be used to refer to a component. This specification and claims do not intend to distinguish between components that differ in name but not function. In the following description and in the claims, the terms "include" and "comprise" are used in an open-ended fashion, and thus should be interpreted to mean "include, but not limited to". The description which follows is a preferred embodiment of the invention, but is made for the purpose of illustrating the general principles of the invention and not for the purpose of limiting the scope of the invention. The scope of the present invention is defined by the appended claims. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
I. Terms
As used herein, "effector protein" refers to a protein having an activity such as site-specific binding activity, single-stranded DNA cleavage activity, double-stranded DNA cleavage activity, single-stranded RNA cleavage activity, or transcriptional regulatory activity.
As used herein, "guide RNA" and "gRNA" are used interchangeably and refer to an RNA that is capable of forming a complex with a Cas12i effector protein and a target nucleic acid (e.g., double-stranded DNA). Also contemplated herein are precursor guide RNA arrays that can be processed into multiple crRNAs. "crRNA" or "CRISPR RNA" comprises a guide sequence with sufficient complementarity to a target sequence of a target nucleic acid (e.g., double-stranded DNA) to direct sequence-specific binding of a CRISPR complex to the target nucleic acid.
The terms "nucleic acid," "polynucleotide," and "nucleotide sequence" are used interchangeably to refer to a polymeric form of nucleotides of any length, including deoxyribonucleotides, ribonucleotides, combinations thereof, and analogs thereof. "Oligonucleotide" refers to a short polynucleotide having no more than about 50 nucleotides.
As used herein, "complementarity" refers to the ability of a nucleic acid to form hydrogen bonds with another nucleic acid through traditional Watson-Crick base pairing. Percent complementarity refers to the percentage of residues in a nucleic acid molecule that can form hydrogen bonds (i.e., Watson-Crick base pairing) with a second nucleic acid (e.g., 5, 6, 7, 8, 9, or 10 out of 10 residues correspond to about 50%, 60%, 70%, 80%, 90%, and 100% complementarity, respectively). "Completely complementary" means that all consecutive residues of a nucleic acid sequence form hydrogen bonds with the same number of consecutive residues in a second nucleic acid sequence. As used herein, "substantially complementary" refers to a degree of complementarity of at least any one of about 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99%, or 100% over a region of about 40, 50, 60, 70, 80, 100, 150, 200, 250 or more nucleotides, or to two nucleic acids that hybridize under stringent conditions.
As used herein, "stringent conditions" for hybridization refer to conditions under which a nucleic acid having complementarity to a target sequence predominantly hybridizes to the target sequence, but not substantially to non-target sequences. Stringent conditions are generally sequence dependent and vary depending on a number of factors. Generally, the longer the sequence, the higher the temperature at which the sequence specifically hybridizes to its target sequence. Non-limiting examples of stringent conditions are described in detail in Tijssen (1993), Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization with Nucleic Acid Probes, Chapter I, "Overview of principles of hybridization and the strategy of nucleic acid probe assay," Elsevier, N.Y.
"Hybridization" refers to the reaction of one or more polynucleotides to form a complex that is stabilized by hydrogen bonding between the bases of the nucleotide residues. Hydrogen bonding can occur by Watson-Crick base pairing, Hoogsteen binding, or in any other sequence-specific manner. Sequences that are capable of hybridizing to a given sequence are referred to as "complements" of the given sequence.
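As an illustration of the percent complementarity arithmetic above, the following is a minimal sketch rather than a method taken from this application. It assumes two equal-length DNA strands written 5' to 3' and counts only Watson-Crick pairs; the function name `percent_complementarity` is hypothetical.

```python
# Watson-Crick partners for DNA bases (an RNA strand would pair A with U).
WC_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def percent_complementarity(strand: str, other: str) -> float:
    """Percent of positions in `strand` that can pair with `other`.

    Both strands are written 5'->3', so `other` is read in reverse
    to line the two strands up antiparallel.
    """
    if not strand or len(strand) != len(other):
        raise ValueError("this sketch only compares non-empty, equal-length strands")
    paired = sum(
        WC_PAIR.get(a) == b
        for a, b in zip(strand.upper(), reversed(other.upper()))
    )
    return 100.0 * paired / len(strand)


# 10 of 10 residues pair: completely complementary.
full = percent_complementarity("ACGTACGTAC", "GTACGTACGT")
# 8 of 10 residues pair: about 80% complementarity.
partial = percent_complementarity("ACGTACGTAC", "CAACGTACGT")
```

Under these assumptions, 8 paired residues out of 10 give 80%, matching the worked figures in the definition.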
"percent (%) sequence identity" with respect to a nucleic acid sequence is defined as the percentage of nucleotides in a candidate sequence that are identical to the nucleotides in the particular nucleic acid sequence after aligning the sequences, if necessary, by allowing gaps (gaps) to achieve the maximum percent sequence identity. "percent (%) sequence identity" with respect to a peptide, polypeptide or protein sequence is the percentage of amino acid residues in a candidate sequence that are identically substituted with amino acid residues in the particular peptide or amino acid sequence after aligning the sequences by allowing gaps, if necessary, to achieve the maximum percent sequence homology. For the purpose of determining percent amino acid sequence identity, alignments can be performed in a variety of ways within the skill in the art, e.g., using techniques such as BLAST, BLAST-2, ALIGN, or MEGALIGN TM Publicly available computer software such as (DNASTAR) software. One skilled in the art can determine suitable parameters for measuring alignment, including any algorithms required to achieve maximum alignment over the full length of the sequences being compared.
The terms "polypeptide" and "peptide" are used interchangeably herein to refer to a polymer of amino acids of any length. The polymer may be linear or branched, it may comprise modified amino acids, and it may be interrupted by non-amino acids. The protein may have one or more polypeptides. The term also encompasses amino acid polymers that have been modified; for example, disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, or any other manipulation (such as conjugation to a labeling component).
As used herein, "variant" is to be construed as a polynucleotide or polypeptide that differs from a reference polynucleotide or polypeptide, respectively, but retains the necessary properties. A typical variant of a polynucleotide differs in nucleic acid sequence from another, reference polynucleotide. Changes in the variant nucleic acid sequence may or may not alter the amino acid sequence of a polypeptide encoded by the reference polynucleotide. Nucleotide changes can result in amino acid substitutions, additions, deletions, fusions and truncations in the polypeptide encoded by the reference sequence, as described below. A typical variant of a polypeptide differs in amino acid sequence from another reference polypeptide. Typically, the differences are limited such that the sequences of the reference polypeptide and the variant are very similar overall and identical in many regions. The amino acid sequences of the variant and reference polypeptides may differ by any combination of one or more substitutions, additions, deletions. The substituted or inserted amino acid residue may or may not be an amino acid residue encoded by the genetic code. Variants of a polynucleotide or polypeptide may be naturally occurring (such as allelic variants), or may be variants that are not known to occur naturally. Non-naturally occurring variants of polynucleotides and polypeptides can be prepared by mutagenesis techniques, by direct synthesis, and by other recombinant methods known to those skilled in the art.
As used herein, the term "wild-type" has the meaning commonly understood by those skilled in the art, meaning a typical form of an organism, strain, gene or characteristic that distinguishes it from a mutant or variant when it exists in nature. It can be isolated from the resources in nature and not deliberately modified.
As used herein, the terms "non-naturally occurring" or "engineered" are used interchangeably and refer to artificial participation. When these terms are used to describe a nucleic acid molecule or polypeptide, it is meant that the nucleic acid molecule or polypeptide is at least substantially free of at least one other component with which it is naturally associated or naturally occurring.
As used herein, the term "ortholog" has the meaning commonly understood by one of ordinary skill in the art. By way of further guidance, an "ortholog" of a protein as referred to herein refers to a protein belonging to a different species which performs the same or similar function as the protein being an ortholog thereof.
As used herein, the term "identity" is used to indicate a sequence match between two polypeptides or between two nucleic acids. When a position in two compared sequences is occupied by the same base or amino acid monomer subunit (e.g., a position in each of two DNA molecules is occupied by adenine, or a position in each of two polypeptides is occupied by lysine), then the molecules are identical at that position. The "percent identity" between two sequences is the number of matching positions shared by the two sequences divided by the number of positions compared, multiplied by 100. For example, if 6 of 10 positions of two sequences match, then the two sequences have 60% identity. For example, the DNA sequences CTGACT and CAGGTT are 50% identical (3 matches out of 6 positions in total). Typically, such a comparison is made when two sequences are aligned to yield maximum identity. Such alignment can be achieved, for example, by the method of Needleman et al. (1970) J. Mol. Biol. 48:443-453, which can conveniently be performed by computer programs such as the Align program (DNAstar, Inc.). The percent identity between two amino acid sequences can also be determined using the algorithm of E. Meyers and W. Miller (Comput. Appl. Biosci., 4), which has been integrated into the ALIGN program (version 2.0), using a PAM120 weight residue table, a gap length penalty of 12, and a gap penalty of 4. In addition, the percent identity between two amino acid sequences can be determined using the Needleman and Wunsch (J. Mol. Biol. 48:444-453 (1970)) algorithm integrated into the GAP program of the GCG software package (available from www.gcg.com), using either a BLOSUM62 matrix or a PAM250 matrix with gap weights of 16, 14, 12, 10, 8, 6, or 4 and length weights of 1, 2, 3, 4, 5, or 6.
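The percent identity calculation defined above can be sketched as follows. This is only an illustration for two sequences that are already aligned and of equal length; it does not perform the alignment step itself, and the function name `percent_identity` is hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Matching positions divided by positions compared, times 100.

    Assumes the two sequences are already aligned position by position.
    """
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# The example from the text: 3 matches out of 6 positions.
example = percent_identity("CTGACT", "CAGGTT")
```

For the two DNA sequences given in the text, positions 1, 3, and 6 match, so the sketch returns 50.0.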
As used herein, "cell" is understood to refer not only to a particular individual cell, but also to the progeny or potential progeny of such a cell. Because certain modifications may occur in the progeny due to mutation or environmental influences, such progeny may not, in fact, be identical to the parent cell, but are still included within the scope of the term.
As used herein, the terms "transduction" and "transfection" include methods known in the art for introducing DNA into cells using infectious agents (e.g., viruses) or other means to express a protein or molecule of interest. In addition to viral or virus-like agents, there are chemical-based transfection methods, such as those using calcium phosphate, dendrimers, liposomes or cationic polymers (e.g., DEAE-dextran or polyethyleneimine); non-chemical methods such as electroporation, cell squeezing, sonoporation, optical transfection, impalefection, protoplast fusion, plasmid delivery, or transposons; particle-based methods such as the use of a gene gun, magnetofection or magnet-assisted transfection, and particle bombardment; and hybrid methods (such as nucleofection).
As used herein, the term "transfected," "transformed," or "transduced" refers to the process of transferring or introducing an exogenous nucleic acid into a host cell. A "transfected", "transformed" or "transduced" cell is a cell that has been transfected, transformed or transduced with an exogenous nucleic acid.
The term "in vivo" refers to inside the organism from which the cells are obtained. "Ex vivo" or "in vitro" refers to outside the organism from which the cells are obtained.
As used herein, "treatment" is a method for obtaining beneficial or desired results, including clinical results. For purposes of the present invention, beneficial or desired clinical results include, but are not limited to, one or more of the following: alleviating one or more symptoms caused by a disease, alleviating the extent of a disease, stabilizing a disease (e.g., preventing or delaying the worsening of a disease), preventing or delaying the spread (e.g., metastasis) of a disease, preventing or delaying the recurrence of a disease, reducing the rate of recurrence of a disease, delaying or slowing the progression of a disease, ameliorating the state of a disease, providing (partial or total) remission of a disease, reducing the dose of one or more other drugs required to treat the disease, delaying the progression of a disease, improving the quality of life, and/or prolonging survival. "treating" also includes reducing the pathological consequences of a disorder, condition, or disease. The methods of the invention contemplate any one or more of these therapeutic aspects.
As used herein, the term "effective amount" refers to an amount of a compound or composition sufficient to treat a particular disorder, condition, or disease (e.g., ameliorate, alleviate, reduce, and/or delay one or more symptoms thereof). As understood in the art, an "effective amount" may be administered in one or more administrations, i.e., a single administration or multiple administrations may be required to achieve the desired therapeutic endpoint.
"subject," "individual," or "patient" are used interchangeably herein for therapeutic purposes and refer to any animal classified as a mammal, including humans, domestic and farm animals, as well as zoo, farm or pet animals such as dogs, horses, cats, cattle, and the like. In some embodiments, the subject is a human subject.
It is to be understood that the embodiments of the invention described herein include embodiments that "consist of and/or" consist essentially of. Reference herein to "about" a value or parameter includes (and describes) variations that are directed to that value or parameter itself. For example, reference to a description of "about X" includes a description of "X".
As used herein, reference to a "not" value or parameter generally means and describes a "value or parameter other than …. For example, the method is not used to treat type X cancer, meaning that the method is used to treat cancers other than type X.
As used herein, the term "about X-Y" has the same meaning as "about X to about Y".
As used herein and in the appended claims, the singular forms "a", "an" and "the" include plural referents unless the context clearly dictates otherwise. It should also be noted that the claims may be drafted to exclude any optional element. This statement is thus intended to serve as antecedent basis for the use of exclusive terminology such as "solely," "only," and the like in connection with the recitation of claim elements, or for the use of a "negative" limitation.
As used herein, the term "and/or" in a phrase such as "A and/or B" is intended to include both A and B; A or B; A (alone); and B (alone). Likewise, the term "and/or" in a phrase such as "A, B, and/or C" is intended to include each of the following embodiments: A, B, and C; A, B, or C; A or C; A or B; B or C; A and C; A and B; B and C; A (alone); B (alone); and C (alone).
Cas12i nuclease and effector proteins
Engineered Cas12i nucleases
In some embodiments, an engineered Cas12i nuclease is provided; the engineered Cas12i nuclease comprises one, two, three, or four of the following types of mutations relative to a reference Cas12i nuclease:
(1) Replacing an amino acid in the reference Cas12i nuclease that interacts with PAM with a positively charged amino acid; and/or
(2) Replacing amino acids involved in opening a DNA double strand in a reference Cas12i nuclease with amino acids with aromatic rings; and/or
(3) Replacing an amino acid in the reference Cas12i nuclease that is located in the RuvC domain and interacts with the single-stranded DNA substrate with a positively charged amino acid; and/or
(4) Replacing amino acids interacting with a DNA-RNA duplex in a reference Cas12i nuclease with positively charged amino acids; wherein the reference Cas12i nuclease is a native Cas12i nuclease or an engineered Cas12i nuclease, e.g., a native Cas12i2 nuclease (e.g., whose amino acid sequence is defined as SEQ ID NO. 1).
The present application provides methods for engineering enzymes by introducing amino acid mutations based on any one or a combination of the four engineering principles described above, which result in increased enzyme activity in vitro and in vivo. The engineered Cas12i nuclease contains one or more specific mutations as described in sections 1) to 4) below. In some embodiments, any one or more of the mutations described herein can be combined with existing Cas12i mutations (e.g., mutations described in section 5 below) to provide an engineered Cas12i nuclease with higher activity.
In some embodiments, an engineered Cas12i nuclease is provided; the engineered Cas12i nuclease comprises a mutation that replaces one or more amino acids in a reference Cas12i nuclease that interact with PAM with a positively charged amino acid. In some embodiments, an engineered Cas12i nuclease is provided; the engineered Cas12i nuclease comprises replacing one or more amino acids involved in opening a DNA double strand in a reference Cas12i nuclease with an amino acid with an aromatic ring. In some embodiments, an engineered Cas12i nuclease is provided; the engineered Cas12i nuclease comprises replacing one or more amino acids in a reference Cas12i nuclease that are located in the RuvC domain and interact with a single-stranded DNA substrate with positively charged amino acids. In some embodiments, an engineered Cas12i nuclease is provided; the engineered Cas12i nuclease comprises a substitution of one or more amino acids in a reference Cas12i nuclease that interact with a DNA-RNA duplex with a positively charged amino acid.
In some embodiments, an engineered Cas12i nuclease is provided; the engineered Cas12i nuclease comprises: 1) A mutation that replaces one or more amino acids in the reference Cas12i nuclease that interact with PAM with a positively charged amino acid; and 2) replacing one or more amino acids involved in opening the DNA double strand in the reference Cas12i nuclease with an amino acid with an aromatic ring. In some embodiments, an engineered Cas12i nuclease is provided; the engineered Cas12i nuclease comprises: 1) A mutation that replaces one or more amino acids in the reference Cas12i nuclease that interact with PAM with a positively charged amino acid; and 2) replacing one or more amino acids in the reference Cas12i nuclease that are located in the RuvC domain and interact with the single-stranded DNA substrate with a positively charged amino acid. In some embodiments, an engineered Cas12i nuclease is provided; the engineered Cas12i nuclease comprises: 1) Replacing one or more amino acids involved in opening a DNA double strand in a reference Cas12i nuclease with amino acids with aromatic rings; and 2) replacing one or more amino acids in the reference Cas12i nuclease that are located in the RuvC domain and interact with the single-stranded DNA substrate with a positively charged amino acid.
In some embodiments, an engineered Cas12i nuclease is provided; the engineered Cas12i nuclease comprises: 1) A mutation that replaces one or more amino acids in the reference Cas12i nuclease that interact with PAM with a positively charged amino acid; and 2) replacing one or more amino acids in the reference Cas12i nuclease that interact with the DNA-RNA duplex with a positively charged amino acid. In some embodiments, an engineered Cas12i nuclease is provided; the engineered Cas12i nuclease comprises: 1) Replacing one or more amino acids involved in opening a DNA double strand in a reference Cas12i nuclease with amino acids with aromatic rings; and 2) replacing one or more amino acids in the reference Cas12i nuclease that interact with the DNA-RNA duplex with a positively charged amino acid. In some embodiments, an engineered Cas12i nuclease is provided; the engineered Cas12i nuclease comprises: 1) Replacing one or more amino acids in the reference Cas12i nuclease that are located in the RuvC domain and interact with the single-stranded DNA substrate with a positively charged amino acid; and 2) replacing one or more amino acids in the reference Cas12i nuclease that interact with the DNA-RNA duplex with a positively charged amino acid.
In some embodiments, an engineered Cas12i nuclease is provided; the engineered Cas12i nuclease comprises: 1) A mutation that replaces one or more amino acids in the reference Cas12i nuclease that interact with PAM with a positively charged amino acid; 2) Replacing one or more amino acids involved in opening a DNA double strand in a reference Cas12i nuclease with amino acids with aromatic rings; and 3) replacing one or more amino acids in the reference Cas12i nuclease that are located in the RuvC domain and interact with the single-stranded DNA substrate with a positively charged amino acid. In some embodiments, an engineered Cas12i nuclease is provided; the engineered Cas12i nuclease comprises: 1) A mutation that replaces one or more amino acids in the reference Cas12i nuclease that interact with PAM with a positively charged amino acid; 2) Replacing one or more amino acids in the reference Cas12i nuclease that are involved in opening the DNA double strand with an amino acid with an aromatic ring; and 3) replacing one or more amino acids in the reference Cas12i nuclease that interact with the DNA-RNA duplex with a positively charged amino acid. In some embodiments, an engineered Cas12i nuclease is provided; the engineered Cas12i nuclease comprises: 1) A mutation that replaces one or more amino acids in the reference Cas12i nuclease that interact with PAM with a positively charged amino acid; 2) Replacing one or more amino acids in the reference Cas12i nuclease that are located in the RuvC domain and interact with the single-stranded DNA substrate with a positively charged amino acid; and 3) replacing one or more amino acids in the reference Cas12i nuclease that interact with the DNA-RNA duplex with a positively charged amino acid. 
In some embodiments, an engineered Cas12i nuclease is provided; the engineered Cas12i nuclease comprises: 1) Replacing one or more amino acids involved in opening a DNA double strand in a reference Cas12i nuclease with amino acids with aromatic rings; 2) Replacing one or more amino acids in the reference Cas12i nuclease that are located in the RuvC domain and interact with the single-stranded DNA substrate with a positively charged amino acid; and 3) replacing one or more amino acids in the reference Cas12i nuclease that interact with the DNA-RNA duplex with a positively charged amino acid.
In some embodiments, an engineered Cas12i nuclease is provided; the engineered Cas12i nuclease comprises: 1) A mutation that replaces one or more amino acids in the reference Cas12i nuclease that interact with PAM with a positively charged amino acid; 2) Replacing one or more amino acids involved in opening a DNA double strand in a reference Cas12i nuclease with amino acids with aromatic rings; 3) Replacing one or more amino acids in the reference Cas12i nuclease that reside in the RuvC domain and interact with a single-stranded DNA substrate with a positively charged amino acid; and 4) replacing one or more amino acids in the reference Cas12i nuclease that interact with the DNA-RNA duplex with a positively charged amino acid.
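The single, pairwise, three-way, and four-way embodiments recited above appear intended to cover every non-empty combination of the four mutation strategies (1) to (4). A minimal sketch of that enumeration follows; the short strategy labels and the function name are illustrative shorthand, not wording from the claims.

```python
from itertools import combinations

# Shorthand labels for the four engineering strategies (1)-(4).
STRATEGIES = {
    1: "PAM-interacting residues -> positively charged",
    2: "DNA-double-strand-opening residues -> aromatic",
    3: "RuvC ssDNA-interacting residues -> positively charged",
    4: "DNA-RNA-duplex-interacting residues -> positively charged",
}


def strategy_combinations():
    """List every non-empty subset of the four strategies as a tuple of keys."""
    keys = sorted(STRATEGIES)
    return [
        combo
        for size in range(1, len(keys) + 1)
        for combo in combinations(keys, size)
    ]


for combo in strategy_combinations():
    print(" + ".join(f"({k})" for k in combo))
```

This yields 15 combinations in total: 4 single strategies, 6 pairs, 4 triples, and 1 combination of all four.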
1) Replacement of amino acids interacting with PAM in the reference Cas12i nuclease with positively charged amino acids
In some embodiments, the engineered Cas12i nuclease comprises one or more mutations, relative to a reference Cas12i nuclease (e.g., Cas12i2), that replace amino acids in the reference Cas12i nuclease that interact with the PAM with positively charged amino acids. In some embodiments, the engineered Cas12i nuclease comprises one, two, three, four, five, or six substitutions of such amino acid residues.
In some embodiments, the amino acid that interacts with PAM is an amino acid within 9 angstroms of the PAM in the three-dimensional structure, for example an amino acid within 9, 8, 7, 6, 5, 4, or 3 angstroms of the PAM in the three-dimensional structure.
In some embodiments, the one or more reference Cas12i nuclease-based mutations are at one or more amino acids at the following positions: 176, 178, 226, 227, 229, 237, 238, 264, 447, and 563. In some embodiments, the one or more reference Cas12i nuclease-based mutations are at one or more of the following amino acids: E176, E178, Y226, A227, N229, E237, K238, K264, T447, E563. In some embodiments, the one or more reference Cas12i nuclease-based mutations are at one or more of the following amino acids: E176, K238, T447, E563. In some embodiments, the reference Cas12i nuclease-based mutation is at amino acid residue 563, e.g., E563. In some embodiments, the amino acid position numbering is as defined in SEQ ID NO. 1.
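The distance criterion above can be sketched as a simple filter over atomic coordinates. The example below is a sketch under stated assumptions: "within" is taken as the minimum atom-to-atom distance being less than or equal to the cutoff, the function name `residues_near_pam` is hypothetical, and the coordinates are toy values, not taken from any Cas12i structure.

```python
import math


def residues_near_pam(residue_atoms, pam_atoms, cutoff=9.0):
    """Return labels of residues with any atom within `cutoff` angstroms
    of any PAM atom.

    residue_atoms: dict mapping a residue label to a list of (x, y, z) tuples.
    pam_atoms: list of (x, y, z) tuples for the PAM nucleotides.
    """
    near = []
    for label, atoms in residue_atoms.items():
        closest = min(math.dist(a, p) for a in atoms for p in pam_atoms)
        if closest <= cutoff:
            near.append(label)
    return near


# Toy coordinates for illustration only (assumption, not structural data).
toy_pam = [(0.0, 0.0, 0.0)]
toy_residues = {
    "res_A": [(1.0, 2.0, 2.0)],                       # 3 angstroms away
    "res_B": [(3.0, 4.0, 0.0), (20.0, 0.0, 0.0)],     # closest atom 5 angstroms away
    "res_C": [(10.0, 10.0, 10.0)],                    # about 17 angstroms away
}
print(residues_near_pam(toy_residues, toy_pam, cutoff=9.0))
print(residues_near_pam(toy_residues, toy_pam, cutoff=4.0))
```

Tightening the cutoff from 9 to 4 angstroms narrows the candidate set, which mirrors the progressively stricter distances listed in the paragraph.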
In the context of the present specification, E176 means amino acid No. 176, E (glutamic acid), in the cited amino acid sequence. The common amino acids and their three-letter and one-letter abbreviations are listed here by way of example:
alanine Ala A; arginine Arg R;
aspartic acid Asp D; cysteine Cys C;
glutamine Gln Q; glutamic acid Glu E;
histidine His H; isoleucine Ile I;
glycine Gly G; asparagine Asn N;
leucine Leu L; lysine Lys K;
methionine Met M; phenylalanine Phe F;
proline Pro P; serine Ser S;
threonine Thr T; tryptophan Trp W;
tyrosine Tyr Y; valine Val V.
As used herein, "the amino acid is at position X, wherein the amino acid position numbering is as defined in SEQ ID NO. 1" means that the amino acid residue is located at the position of the reference Cas12i nuclease that is equivalent to position X of SEQ ID NO. 1 when the amino acid sequence of the reference Cas12i nuclease and the amino acid sequence of SEQ ID NO. 1 are aligned based on sequence homology. For example, FIG. 7 shows a homology alignment of Cas12i2 (SEQ ID NO. 1) with the amino acid sequence of Cas12i1 (SEQ ID NO. 13). The amino acid positions in any reference Cas12i nuclease corresponding to the amino acid positions defined in the present application based on SEQ ID NO. 1 can be obtained by the skilled person by aligning the amino acid sequence of said reference Cas12i nuclease with SEQ ID NO. 1 using software commonly used in the art, such as Clustal Omega.
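Once an alignment is available (for example, one exported from an alignment tool), the position correspondence described above reduces to counting non-gap characters column by column. The following is a minimal sketch under that assumption; the function name `map_position` and the toy aligned strings are hypothetical and are not the sequences of SEQ ID NO. 1 or SEQ ID NO. 13.

```python
def map_position(aligned_query: str, aligned_ref: str, ref_pos: int):
    """Map a 1-based position in the reference (numbering sequence) to the
    equivalent 1-based position in the query.

    Both inputs are rows of the same pairwise alignment, with '-' for gaps.
    Returns None if the query has a gap opposite the reference position.
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned rows must have the same length")
    q_idx = r_idx = 0
    for q_char, r_char in zip(aligned_query, aligned_ref):
        if q_char != "-":
            q_idx += 1
        if r_char != "-":
            r_idx += 1
            if r_idx == ref_pos:
                return q_idx if q_char != "-" else None
    raise ValueError("ref_pos is beyond the end of the reference sequence")


# Toy alignment rows for illustration only.
toy_ref = "MKT-AE"    # plays the role of the numbering sequence
toy_query = "M-TGAE"  # plays the role of another reference nuclease
print(map_position(toy_query, toy_ref, 4))  # reference A4 lines up with query A4
```

In this toy case, reference position 2 falls opposite a gap in the query and therefore has no equivalent position.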
In some embodiments, the reference Cas12i nuclease-based mutation is a substitution of the corresponding amino acid residue in the reference Cas12i nuclease to R or K. In some embodiments, the reference Cas12i nuclease-based mutation is a substitution of a corresponding amino acid residue in a reference Cas12i nuclease to R.
In some embodiments, the engineered Cas12i nuclease comprises one or more of the following amino acid residues: R176, R238, R447 and R563, wherein the amino acid position numbering is as defined in SEQ ID NO. 1. In some embodiments, the engineered Cas12i nuclease comprises one or more of the following reference Cas12i nuclease-based mutations: E176R, K238R, T447R and E563R; wherein the amino acid position numbering is as defined in SEQ ID NO. 1. In some embodiments, the engineered Cas12i nuclease comprises an E563R mutation; wherein the amino acid position numbering is as defined in SEQ ID NO. 1. In some embodiments, engineered enzymes having at least 85% sequence identity to the engineered Cas12i nuclease described above can also be used for the purpose of improving gene editing efficiency; in some embodiments, engineered enzymes having at least 87%, 89%, 91%, 93%, 95%, 97%, or 99% sequence identity thereto may be used.
In the context of the present specification, E176R means that the amino acid E (glutamic acid) No. 176 is replaced by R (arginine) in the amino acid sequence cited.
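The mutation notation explained above (original residue, position, replacement residue, with "+" joining combined mutations) can be parsed and applied mechanically. The sketch below is illustrative only: the helper names are hypothetical, and the four-residue toy sequence stands in for a real reference sequence such as SEQ ID NO. 1.

```python
import re

# One-letter original residue, 1-based position, one-letter replacement.
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label: str):
    """Split a label such as 'E176R' into ('E', 176, 'R')."""
    match = MUTATION_RE.match(label.strip())
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_mutations(sequence: str, combination: str) -> str:
    """Apply a '+'-joined combination such as 'E176R + K238R' to a sequence."""
    residues = list(sequence)
    for label in combination.split("+"):
        original, position, replacement = parse_mutation(label)
        if position < 1 or position > len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != original:
            raise ValueError(f"expected {original} at position {position}")
        residues[position - 1] = replacement
    return "".join(residues)


print(parse_mutation("E176R"))
# Toy sequence for illustration only: E at position 2, T at position 4.
print(apply_mutations("MEKT", "E2R + T4R"))
```

Checking the original residue before substituting guards against applying a label to a sequence whose numbering does not match the numbering sequence.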
In some embodiments, the engineered Cas12i nuclease comprises a mutation or combination of mutations at any one of the following amino acid residue positions or position combinations: 176, 238, 264, 447, 563, 176 and 238, 176 and 447, 176 and 563, 238 and 447, 238 and 563, 447 and 563, 176 and 238 and 447, 176 and 238 and 563, 176 and 447 and 563, 238 and 447 and 563, and 176 and 238 and 447 and 563; wherein the amino acid position numbering is as defined in SEQ ID NO. 1. In some embodiments, the mutation is a substitution of the corresponding amino acid residue in a reference Cas12i nuclease to R or K, such as R. In some embodiments, the engineered Cas12i nuclease comprises any one of the following amino acid residues or combinations: R176, R238, R264, R447, R563, R176 + R238, R176 + R447, R176 + R563, R238 + R447, R238 + R563, R447 + R563, R176 + R238 + R447, R176 + R238 + R563, R176 + R447 + R563, R238 + R447 + R563, R176 + R238 + R447 + R563; wherein the amino acid position numbering is as defined in SEQ ID NO. 1.
In some embodiments, the engineered Cas12i nuclease comprises any one of the following mutations or mutation combinations: E176R, K238R, E264R, T447R, E563R, E176R + K238R, E176R + T447R, E176R + E563R, K238R + T447R, K238R + E563R, T447R + E563R, E176R + K238R + T447R, E176R + K238R + E563R, E176R + T447R + E563R, K238R + T447R + E563R, E176R + K238R + T447R + E563R; wherein the amino acid position numbering is as defined in SEQ ID NO. 1. In some embodiments, the engineered Cas12i nuclease comprises any one of the following mutations or mutation combinations: E563R, E176R + T447R, E176R + E563R, K238R + E563R, E176R + K238R + T447R, E176R + K238R + E563R, E176R + T447R + E563R, E176R + K238R + T447R + E563R; wherein the amino acid position numbering is as defined in SEQ ID NO. 1. In some embodiments, the engineered Cas12i nuclease comprises an E176R + K238R + T447R + E563R mutation combination; wherein the amino acid position numbering is as defined in SEQ ID NO. 1. For the purpose of improving gene editing efficiency, engineered enzymes having at least 85% sequence identity to the above-described engineered Cas12i nuclease may also be used; in some embodiments, engineered enzymes having at least 87%, 89%, 91%, 93%, 95%, 97%, or 99% sequence identity thereto may be used.
2) Substitution of amino acids involved in opening the DNA double strand in a reference Cas12i nuclease with amino acids having an aromatic ring
In some embodiments, the engineered Cas12i nuclease comprises one or more mutations, based on a reference Cas12i nuclease (e.g., Cas12i2), that replace amino acids involved in opening the DNA double strand in the reference Cas12i nuclease with amino acids having aromatic rings. In some embodiments, the engineered Cas12i nuclease comprises one, two, three, four, five, or six substitutions of the amino acid residues.
Wherein the one or more amino acids involved in opening the DNA duplex are amino acids that interact with the last base pair at the 3' end of the PAM relative to the target strand. For example, the PAM sequence recognized by Cas12i2 is 5'-NTTN-3', where the base pair formed between the N base at the 3' end of the PAM sequence and the target strand is the "last base pair at the 3' end of the PAM relative to the target strand" described herein, and this base pair is followed by the sequence of the target site.
In some embodiments, the one or more amino acids are at the following positions: 163 and/or 164. In some embodiments, the one or more amino acids are one or more of the following: Q163, N164. In some embodiments, the amino acid is N164; wherein the amino acid position numbering is as defined in SEQ ID NO.1.
In some embodiments, the amino acid involved in opening a DNA double strand is replaced with F, Y or W. In some embodiments, the amino acid involved in opening the DNA duplex is replaced with F. In some embodiments, the amino acid involved in opening the DNA duplex is replaced with Y.
In some embodiments, the engineered Cas12i nuclease comprises any one or more of the following amino acid residues: F163, Y163, W163, F164, or Y164; wherein the amino acid position numbering is as defined in SEQ ID NO.1.
In some embodiments, the engineered Cas12i nuclease comprises any one of the mutations Q163F, Q163Y, Q163W, N164F or N164Y; wherein the amino acid position numbering is as defined in SEQ ID NO.1. In some embodiments, the engineered Cas12i nuclease comprises an N164Y or N164F mutation. In some embodiments, the engineered Cas12i nuclease comprises an N164Y mutation. In some embodiments, for the purpose of improving gene editing efficiency, engineered enzymes having at least 85% sequence identity to the above engineered enzymes may also be used; in some embodiments, an enzyme having at least 87%, 89%, 91%, 93%, 95%, 97%, or 99% sequence identity thereto may be used.
3) Substitution of amino acids in a reference Cas12i nuclease that are located in the RuvC domain and interact with the single-stranded DNA substrate with positively charged amino acids
In some embodiments, the engineered Cas12i nuclease comprises one or more mutations, based on a reference Cas12i nuclease (e.g., Cas12i2), that replace an amino acid in the reference Cas12i enzyme that is located in the RuvC domain and that interacts with a single-stranded DNA substrate with a positively charged amino acid. In some embodiments, the engineered Cas12i enzyme comprises one, two, three, four, five, or six substitutions of the amino acid residues.
Wherein the one or more amino acids located in the RuvC domain and interacting with the single-stranded DNA substrate are amino acids within 9 angstroms of the single-stranded DNA substrate in the three-dimensional structure, for example amino acids within 8, 7, 6, 5, 4, or 3 angstroms of the single-stranded DNA substrate in the three-dimensional structure. The RuvC domain is the enzymatically active domain of the Cas12i protein responsible for cleaving single-stranded or double-stranded DNA. In the primary sequence of the protein, the RuvC domain of Cas12i is divided into 3 parts: RuvC-1, RuvC-2, and RuvC-3. These 3 parts are adjacent in the three-dimensional structure and together form a catalytic pocket with enzymatic activity. The three-dimensional crystal structure of Cas12i2, its domain composition, and its interaction with DNA substrates are described in Huang X. et al., Nature Communications 11, Article number: 5241 (2020). The three-dimensional crystal structure of Cas12i1, its domain composition, and its interaction with DNA substrates are described in Zhang H. et al., Nature Structural & Molecular Biology 27, 1069-1076 (2020). A three-dimensional structural model of the interaction between the reference Cas12i and the substrate can be obtained from known three-dimensional crystal structures of Cas12i by homologous structure comparison and modeling (homology modeling).
One way of modeling to obtain the amino acids in Cas12i2 that are located in the RuvC domain and within 9 angstroms of the single-stranded DNA substrate is described in Example 3.
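The distance criterion can be illustrated with a short sketch. This is not the procedure of Example 3; it assumes that atomic coordinates (in angstroms) have already been extracted from a structural model, and the function name `residues_within`, the toy coordinates, and the input layout are hypothetical choices made only for illustration.

```python
import math

def residues_within(protein_atoms, dna_atoms, cutoff=9.0):
    """Return sorted residue numbers having at least one atom within
    `cutoff` angstroms of any atom of the DNA substrate.

    protein_atoms: iterable of (residue_number, (x, y, z))
    dna_atoms:     iterable of (x, y, z)
    """
    dna_atoms = list(dna_atoms)
    hits = set()
    for resnum, coord in protein_atoms:
        if resnum in hits:
            continue  # residue already selected through another atom
        if any(math.dist(coord, d) <= cutoff for d in dna_atoms):
            hits.add(resnum)
    return sorted(hits)

# Toy coordinates (hypothetical, not taken from any real structure).
protein = [(323, (0.0, 0.0, 0.0)), (362, (5.0, 0.0, 0.0)), (500, (30.0, 0.0, 0.0))]
ssdna = [(0.0, 4.0, 0.0)]

print(residues_within(protein, ssdna))       # 9 angstrom cutoff
print(residues_within(protein, ssdna, 5.0))  # stricter 5 angstrom cutoff
```

Lowering `cutoff` from 9 to 3 angstroms narrows the candidate set in the same way as the successively tighter distance ranges listed in the text.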
In some embodiments, the one or more amino acids are at one or more of the following positions: 323, 362, 425, 925, 926, 390, 391, 392, 751, 755, 840, 848, 851, 856, 885, 897, 929, 932, 327, 355, 359, 360, 361, 414, 421, 650, 652, 705, 708, 709, 752, 928, 388, 393, 417, 418, 424, 653, 696, 1022. In some embodiments, the one or more amino acids are one or more of the following: E323, D362, Q425, N925, I926, N390, N391, F392, L751, E755, N840, N848, S851, A856, Q885, M897, G929, Y932, L327, V355, G359, G360, K361, Q414, K421, S650, E652, K705, K708, E709, S752, T928, L388, K393, L417, A418, Q424, G653, I696, A1022. In some embodiments, the one or more amino acids are one or more of the following: E323, D362, Q425, N925, I926, N391, Q424, G929, L388, L417. In some embodiments, the amino acid position numbering is as defined in SEQ ID NO.1.
In some embodiments, the engineered Cas12i nuclease comprises a mutation that replaces one or more amino acids involved in cleavage of double-stranded DNA in a reference Cas12i nuclease with R or K. In some embodiments, the engineered Cas12i nuclease comprises a mutation that replaces one or more amino acids involved in cleavage of double-stranded DNA in a reference Cas12i nuclease with R.
In some embodiments, the engineered Cas12i nuclease comprises a mutation or combination of mutations at any one of the following amino acid residue positions: 323, 362, 425, 925, 926, and 929; wherein the amino acid position numbering is as defined in SEQ ID NO.1. In some embodiments, the engineered Cas12i nuclease comprises a mutation or combination of mutations at any one of the following amino acid residue positions: 323, 362, 425, 925, 926, 323 and 362, 323 and 425, 323 and 926, 362 and 425, 362 and 926, 425 and 926, 925 and 926, 323 and 362 and 425, 323 and 362 and 926, 323 and 425 and 926, 362 and 925 and 926, 362 and 425 and 926; wherein the amino acid position numbering is as defined in SEQ ID NO.1. In some embodiments, the mutation is a mutation that replaces the amino acid residue at the position with R or K (e.g., R). In some embodiments, the engineered Cas12i nuclease comprises any one or combination of the following amino acid residues: R323, R362, R425, R925, R926, R323+R362, R323+R425, R323+R926, R362+R425, R362+R926, R425+R926, R925+R926, R323+R362+R425, R323+R362+R926, R323+R425+R926, R362+R925+R926, R362+R425+R926; wherein the amino acid position numbering is as defined in SEQ ID NO.1. In some embodiments, the engineered Cas12i nuclease comprises any one or combination of the following mutations: E323R, D362R, Q424R, Q425R, N925R, I926R and G929R; wherein the amino acid position numbering is as defined in SEQ ID NO.1. In some embodiments, the engineered Cas12i nuclease comprises any one or combination of the following mutations: E323R, D362R, Q425R, N925R, I926R, E323R + D362R, E323R + Q425R, E323R + I926R, Q425R + I926R, D362R + I926R, N925R + I926R, E323R + D362R + Q425R, E323R + D362R + I926R, E323R + Q425R + I926R, D362R + N925R + I926R, E323R + D362R + Q425R + I926R; wherein the amino acid position numbering is as defined in SEQ ID NO.1.
In some embodiments, the engineered Cas12i nuclease comprises an I926R mutation; wherein the amino acid position numbering is as defined in SEQ ID NO.1. In some embodiments, the engineered Cas12i nuclease comprises an E323R + D362R mutation combination; wherein the amino acid position numbering is as defined in SEQ ID NO.1. In some embodiments, engineered enzymes having at least 85% sequence identity to the above engineered enzymes may also be used for the purpose of improving gene editing efficiency. In some embodiments, an enzyme having at least 87%, 89%, 91%, 93%, 95%, 97%, or 99% sequence identity thereto may be used.
4) Replacement of one or more amino acids in a reference Cas12i nuclease that interact with a DNA-RNA duplex with positively charged amino acids
In some embodiments, the engineered Cas12i nuclease comprises one or more mutations, based on a reference Cas12i nuclease (e.g., Cas12i2), that replace one or more amino acids in the reference Cas12i nuclease that interact with a DNA-RNA duplex with a positively charged amino acid. In some embodiments, the engineered Cas12i enzyme comprises one, two, three, four, five, or six substitutions of the amino acid residues.
Wherein the one or more amino acids that interact with the DNA-RNA duplex are amino acids that are within 9 angstroms of the DNA-RNA duplex in the three-dimensional structure, for example amino acids within 8, 7, 6, 5, 4, or 3 angstroms of the DNA-RNA duplex in the three-dimensional structure. The working principle of certain Cas nucleases is as follows: the Cas nuclease and a guide RNA (e.g., crRNA) form a complex, wherein the crRNA and the target DNA pair with each other to form a DNA-RNA duplex, which interacts with the Cas nuclease to open the double-stranded target DNA and form an R-loop, such that the dsDNA is cleaved by the cleavage active site of the Cas nuclease. The three-dimensional crystal structure of Cas12i2, its domain composition, and its interaction with DNA-RNA duplexes are described in Huang X. et al., Nature Communications 11, Article number: 5241 (2020).
In some embodiments, the one or more amino acids are at one or more of the following positions: 116, 117, 156, 159, 161, 301, 305, 306, 308, 312, 313, 427, 433, 438, 441, 442, 852, 855, 861, 865, 160, 316, 319, 320, 247, 343, 348, 349, 679, 683, 691, 782, 783, 797, 800, 853, 957, 958, 293, 294, or 297. In some embodiments, the one or more amino acids are one or more of the following: G116, E117, A156, T159, S161, T301, I305, K306, T308, N312, F313, D427, K433, V438, N441, Q442, M852, L855, N861, Q865, E160, Q316, E319, Q320, E247, E343, E348, E349, N679, E683, E691, D782, E783, E797, E800, D853, S957, D958, G293, E294, or N297. In some embodiments, the one or more amino acids are one or more of the following: G116, E117, T159, S161, E319, E343, or D958. In some embodiments, the one or more amino acids is D958. In some embodiments, the amino acid position numbering is as defined in SEQ ID NO.1.
In some embodiments, the engineered Cas12i nuclease comprises a mutation that replaces one or more amino acids interacting with a DNA-RNA duplex in a reference Cas12i nuclease with R or K. In some embodiments, the engineered Cas12i nuclease comprises a mutation that replaces one or more amino acids involved in DNA-RNA duplex interactions in a reference Cas12i nuclease with R.
In some embodiments, the engineered Cas12i nuclease comprises a mutation or combination of mutations at any one of the following amino acid residue positions: 116, 117, 159, 161, 319, 343, or 958; wherein the amino acid position numbering is as defined in SEQ ID NO.1. In some embodiments, the mutation is a mutation that replaces an amino acid residue at the position with R or K (e.g., R). In some embodiments, the engineered Cas12i nuclease comprises any one or combination of the following amino acid residues: R116, R117, R159, R161, R319, R343, or R958; wherein the amino acid position numbering is as defined in SEQ ID NO.1. In some embodiments, the engineered Cas12i nuclease comprises any one or combination of the following mutations: E323R, D362R, Q424R, Q425R, N925R, I926R and G929R; wherein the amino acid position numbering is as defined in SEQ ID NO.1. In some embodiments, the engineered Cas12i nuclease comprises any one or combination of the following mutations: G116R, E117R, T159R, S161R, E319R, E343R, or D958R; wherein the amino acid position numbering is as defined in SEQ ID NO.1. In some embodiments, the engineered Cas12i nuclease comprises the I926R mutation; wherein the amino acid position numbering is as defined in SEQ ID NO.1. In some embodiments, the engineered Cas12i nuclease comprises a D958R mutation; wherein the amino acid position numbering is as defined in SEQ ID NO.1. In some embodiments, engineered enzymes having at least 85% sequence identity to the above engineered enzymes may also be used for the purpose of improving gene editing efficiency. In some embodiments, an enzyme having at least 87%, 89%, 91%, 93%, 95%, 97%, or 99% sequence identity thereto may be used.
5) Other mutations
Any one or more of the mutations described in sections 1) to 4) can be combined with any one or more known mutations that increase Cas12i activity, such as target binding, double-strand cleavage activity, nickase activity, and/or gene editing activity. Exemplary mutations can be found, for example, in PCT/CN2020/0134249 and CN 112195164A, which are incorporated herein by reference in their entirety.
In some embodiments, the engineered Cas12i nuclease further comprises one or more flexibility region mutations that increase the flexibility of a flexible region in a reference Cas12i nuclease. The flexible region in the reference Cas12i nuclease can be determined using any method known in the art. In some embodiments, the plurality of flexible regions is determined based only on the amino acid sequence of the reference enzyme. In some embodiments, a plurality of flexible regions are determined based on structural information of the reference enzyme, including, for example, secondary structure, crystal structure, NMR structure, and the like.
The methods described herein for engineering the Cas12i nuclease flexible region include: (a) obtaining a plurality of engineered Cas12i nucleases, each engineered Cas12i nuclease comprising one or more mutations that increase the flexibility of a flexible region of a plurality of flexible regions of a reference Cas12i nuclease; and (b) selecting one or more engineered Cas12i nucleases from the plurality of engineered Cas12i nucleases, wherein the one or more engineered Cas12i nucleases have increased activity compared to the reference Cas12i nuclease. In some embodiments, the method further comprises determining a plurality of flexible regions in the reference Cas12i nuclease. In some embodiments, the activity is measured in a eukaryotic cell, such as a mammalian cell (e.g., a human cell).
In some embodiments, the plurality of flexible regions is determined using a program selected from the group consisting of: PredyFlexy, FoldUnfold, PROFbval, FlexServ, FlexPred, DynaMine, and DisoMine. In some embodiments, the plurality of flexible regions are located in random coils. In some embodiments, the plurality of flexible regions are in DNA and/or RNA interacting domains of a reference Cas12i nuclease. In some embodiments, the flexible region is at least about 5 (e.g., 5) amino acids in length.
In some embodiments, the one or more mutations comprise the insertion of one or more (e.g., 2) glycine (G) residues in the flexible region. In some embodiments, the one or more G residues are inserted N-terminally of a flexible amino acid residue in the flexible region, wherein the flexible amino acid residue is selected from the group consisting of: G, serine (S), asparagine (N), aspartic acid (D), histidine (H), methionine (M), threonine (T), glutamic acid (E), glutamine (Q), lysine (K), arginine (R), alanine (A) and proline (P). In some embodiments, the flexible amino acid residues are selected according to the following priority: G > S > N > D > H > M > T > E > Q > K > R > A > P. In some embodiments, the one or more mutations comprise a substitution of one or more G residues for one or more non-G residues.
In some embodiments, the one or more mutations comprise a substitution of a hydrophobic amino acid residue in the flexible region with a G residue, wherein the hydrophobic amino acid residue is selected from the group consisting of: leucine (L), isoleucine (I), valine (V), cysteine (C), tyrosine (Y), phenylalanine (F), and tryptophan (W).
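The two flexible-region strategies described above (inserting glycines N-terminally of a flexible residue chosen by priority, or replacing a hydrophobic residue with glycine) can be sketched as follows. This is only an illustration on a toy sequence: the function names are hypothetical, and two behaviours are assumptions not stated in the text, namely that ties are resolved by taking the leftmost residue and that only the first hydrophobic residue in the region is replaced.

```python
# Priority order as listed in the text: G > S > N > D > H > M > T > E > Q > K > R > A > P
PRIORITY = "GSNDHMTEQKRAP"
HYDROPHOBIC = set("LIVCYFW")

def insert_glycines(seq, start, end, n_gly=2):
    """Insert `n_gly` glycines N-terminal of the highest-priority flexible
    residue in seq[start..end] (1-based, inclusive)."""
    region = seq[start - 1:end]
    for aa in PRIORITY:
        offset = region.find(aa)  # assumption: leftmost occurrence wins
        if offset != -1:
            pos = start - 1 + offset  # 0-based index of the chosen residue
            return seq[:pos] + "G" * n_gly + seq[pos:]
    return seq  # no flexible residue found: sequence left unchanged

def hydrophobic_to_glycine(seq, start, end):
    """Replace the first hydrophobic residue in seq[start..end] with glycine."""
    for pos in range(start - 1, end):
        if seq[pos] in HYDROPHOBIC:
            return seq[:pos] + "G" + seq[pos + 1:]
    return seq

toy = "MKLVNQTAEW"  # hypothetical sequence; region 3-7 is "LVNQT"
print(insert_glycines(toy, 3, 7))         # GG placed before N, the top-priority residue present
print(hydrophobic_to_glycine(toy, 3, 7))  # L at position 3 replaced by G
```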
In some embodiments, the activity is a site-specific nuclease activity. In some embodiments, the activity is a gene editing activity in a eukaryotic cell (e.g., a human cell). In some embodiments, the gene editing efficiency is measured using one of the following methods: T7 endonuclease 1 (T7E1) assay, sequencing of the target DNA, Tracking of Indels by Decomposition (TIDE) assay, or Indel Detection by Amplicon Analysis (IDAA) assay.
In some embodiments, the engineered Cas12i nuclease comprises one or more mutations that increase the flexibility of a flexible region in a reference Cas12i nuclease (such as a Cas12i2 nuclease) selected from the group of regions corresponding to: amino acid residues 228-232, amino acid residues 439-443, amino acid residues 478-482, amino acid residues 500-504, amino acid residues 775-779, and amino acid residues 925-929, wherein the amino acid residue numbering is based on SEQ ID NO.1. In some embodiments, the flexible region corresponds to amino acid residues 439-443 or amino acid residues 925-929, wherein the amino acid residue numbering is based on SEQ ID NO.1. In some embodiments, the reference Cas12i enzyme is Cas12i2. In some embodiments, the one or more mutations comprise insertion of one or more (e.g., 2) G residues in the flexible region. In some embodiments, the one or more G residues are inserted N-terminally of a flexible amino acid residue in the flexible region, wherein the flexible amino acid residue is selected from the group consisting of: G, S, N, D, H, M, T, E, Q, K, R, A and P. In some embodiments, the flexible amino acid residues are selected according to the following priority: G > S > N > D > H > M > T > E > Q > K > R > A > P. In some embodiments, the one or more mutations comprise a substitution of a hydrophobic amino acid residue in the flexible region with a G residue, wherein the hydrophobic amino acid residue is selected from the group consisting of: L, I, V, C, Y, F and W.
In some embodiments, the flexible region mutation is located at one or more of the following positions: 439, 926. In some embodiments, the mutated amino acids are one or more of the following: L439 and I926.
In some embodiments, the engineered Cas12i nuclease comprises amino acid residues G926 and/or G439, the amino acid residue numbering being based on SEQ ID NO.1. In some embodiments, the engineered Cas12i nuclease comprises one or more of the following flexible region mutations: I926G; and/or 439G or 439GG. In some embodiments, the engineered Cas12i nuclease comprises the I926G mutation. In some embodiments, the engineered Cas12i nuclease comprises a 439G mutation. In some embodiments, the engineered Cas12i nuclease comprises a 439GG mutation.
In the context of the present specification, 439G means that one glycine (G) is inserted after amino acid 439 of the referenced amino acid sequence, and 439GG means that two glycines (GG) are inserted after amino acid 439 of the referenced amino acid sequence.
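The two notations used in this specification, substitutions such as N164Y and insertions such as 439G or 439GG, can be applied to a sequence programmatically. The sketch below is a minimal illustration on a toy sequence; the function name `apply_mutations` and the regular expressions are hypothetical, and all positions are interpreted in the numbering of the unmodified reference sequence, so an insertion does not shift the positions used by other mutations in the same list.

```python
import re

SUB = re.compile(r"^([A-Z])(\d+)([A-Z])$")  # substitution, e.g. N164Y
INS = re.compile(r"^(\d+)(G+)$")            # glycine insertion, e.g. 439G or 439GG

def apply_mutations(seq, mutations):
    """Apply substitutions and glycine insertions given in reference
    (1-based) numbering and return the engineered sequence."""
    slots = [[aa] for aa in seq]  # one slot per reference position
    for m in mutations:
        sub, ins = SUB.match(m), INS.match(m)
        if sub:
            old, pos, new = sub.group(1), int(sub.group(2)), sub.group(3)
            if slots[pos - 1][0] != old:
                raise ValueError(f"{m}: reference has {slots[pos - 1][0]} at position {pos}")
            slots[pos - 1][0] = new
        elif ins:
            pos, gly = int(ins.group(1)), ins.group(2)
            slots[pos - 1].extend(gly)  # inserted AFTER residue `pos`
        else:
            raise ValueError(f"unrecognised mutation: {m}")
    return "".join("".join(slot) for slot in slots)

# Toy reference sequence (hypothetical, not SEQ ID NO.1).
print(apply_mutations("MENLK", ["N3Y", "4GG"]))
```

Checking the wild-type residue before substituting guards against applying a mutation list to a sequence with a different numbering.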
In some embodiments, the engineered Cas12i nuclease comprises a mutation or combination of mutations at any one of the following amino acid residue positions: 926, 439, 925 and 926, 326 and 925, 926 and 439, 323 and 362 and 926; wherein the amino acid position numbering is as defined in SEQ ID NO.1. In some embodiments, the mutation at amino acid position 323, 362, 925 or 926 is a mutation that replaces the amino acid residue at that position with R or K (e.g., R). In some embodiments, the mutation at amino acid position 439 or 926 is a mutation replacing the amino acid residue at said position with G or inserting G or GG after said amino acid residue.
In some embodiments, the engineered Cas12i nuclease comprises any one of the following amino acid residues or amino acid residue combinations: G926, 439GG, R925+G926, R326+R925+G926, R926+439GG, or R323+R362+G926; wherein the amino acid position numbering is as defined in SEQ ID NO.1.
In some embodiments, the engineered Cas12i nuclease comprises any one or combination of the following mutations: I926G, 439GG, I926R + 439GG, N925R + I926G, D362R + N925R + I926G, E323R + D362R + I926G; wherein the amino acid position numbering is as defined in SEQ ID NO.1.
In some embodiments, engineered enzymes having at least 85% sequence identity to the above engineered enzymes may also be used for the purpose of improving gene editing efficiency. In some embodiments, an enzyme having at least 87%, 89%, 91%, 93%, 95%, 97%, 99% sequence identity thereto may be used.
6) Combinatorial mutagenesis
Engineered enzymes obtained using the mutations described in sections 1) -5) of the present specification and combinations of the various amino acid substitutions/insertions in tables 1-5 are all within the scope of the claimed application.
In some embodiments, the engineered Cas12i nuclease comprises a mutation or combination of mutations at any one of the following amino acid residue positions: 176, 238, 447, 563, 164, 926, 323, 362, 439, 958, 176 and 238 and 447 and 563, 323 and 362, 176 and 238 and 447 and 563 and 164, 176 and 238 and 447 and 563 and 926, 176 and 238 and 447 and 563 and 362, 164 and 926, 164 and 323 and 362, 176 and 238 and 447 and 563 and 164 and 926, 176 and 238 and 447 and 164 and 323, 176 and 238 and 447 and 563 and 164 and 362, 176 and 238 and 563 and 164 and 926 and 323 and 362, 176 and 238 and 447 and 563 and 164 and 323 and 362, 176 and 238 and 447 and 563 and 323 and 362 and 323 and 926 and 439; wherein the amino acid position numbering is as defined in SEQ ID NO.1. In some embodiments, the mutation at amino acid position 176, 238, 447, 563, 926, 323, 362 or 958 is a mutation that replaces the amino acid residue at that position with R or K (e.g., R). In some embodiments, the mutation at amino acid position 164 is a mutation that replaces the amino acid residue at that position with Y or F (e.g., Y). In some embodiments, the mutation at amino acid position 439 or 926 is a mutation replacing the amino acid residue at said position with G or inserting G or GG after said amino acid residue.
In some embodiments, the engineered Cas12i nuclease comprises any one of the following amino acid residues or amino acid residue combinations: R176, R238, R447, R563, Y164, R926, G926, R958, R323, R362, 439G, 439GG, R176+R238+R447+R563, R323+R362, R176+R238+R447+R563+Y164, R176+R238+R447+R563+R926, R176+R238+R447+R563+R362, Y164+R926, Y164+R323+R362, R176+R238+R447+R563+Y164+R926, R176+R238+R447+R563+Y164+R323+R362, R176+R238+R447+R563+Y164+R926+R323+R362, R176+R238+R447+R563+Y164+R323+R362+G926, R176+R238+R447+R563+Y164+R323+R362+G926+439G, R176+R238+R447+R563+Y164+R323+R362+G926+439GG, R176+R238+R447+R563+Y164+R958, R176+R238+R447+R563+R926+R958, R176+R238+R447+R563+R323+R362+R958, Y164+R926+R958, Y164+R323+R362+R958, R176+R238+R447+R563+Y164+R958, R176+R238+R447+R563+Y164+R926+R958, R176+R238+R447+R563+Y164+R323+R362+R958, R176+R238+R447+R563+Y164+R926+R323+R362+R958, R176+R238+R447+R563+Y164+R323+R362+G926+R958, R176+R238+R447+R563+Y164+R323+R362+G926+439GG+R958, or R176+R238+R447+R563+Y164+R323+R362+G926+439G+R958;
wherein the amino acid position numbering is as defined in SEQ ID NO.1.
In some embodiments, the engineered Cas12i nuclease comprises any one or combination of the following mutations: E176R + K238R + T447R + E563R, N164Y, I926R, E323R + D362R, E176R + K238R + T447R + E563R + N164Y, E176R + K238R + T447R + E563R + I926R, N164Y + E323R + D362R, E176R + K238R + T447R + E563R + E323R + D362R, N164Y + I926R, E176R + K238R + T447R + E563R + N164Y + I926R, E176R + K238R + T447R + E563R + N164Y + E323R + D362R, E176R + K238R + T447R + E563R + N164Y + I926R + E323R + D362R, E176R + K238R + T447R + E563R + N164Y + E323R + D362R + I926G, E176R + K238R + T447R + E563R + N164Y + E323R + D362R + I926G + 439GG, E176R + K238R + T447R + E563R + N164Y + E323R + D362R + I926G + 439G, E176R + K238R + T447R + E563R + N164Y + D958R, E176R + K238R + T447R + E563R + I926R + D958R, E176R + K238R + T447R + E563R + E323R + D362R + D958R, N164Y + I926R + D958R, N164Y + E323R + D362R + D958R, E176R + K238R + T447R + E563R + N164Y + D958R, E176R + K238R + T447R + E563R + N164Y + I926R + D958R, E176R + K238R + T447R + E563R + N164Y + E323R + D362R + D958R, E176R + K238R + T447R + E563R + N164Y + I926R + E323R + D362R + D958R, E176R + K238R + T447R + E563R + N164Y + E323R + D362R + I926G + D958R, E176R + K238R + T447R + E563R + N164Y + E323R + D362R + I926G + 439GG + D958R, E176R + K238R + T447R + E563R + N164Y + E323R + D362R + I926G + 439G + D958R; wherein the amino acid position numbering is as defined in SEQ ID NO.1.
In some embodiments, the engineered Cas12i nuclease comprises E176R, K238R, T447R, E563R, N164Y, E323R, and D362R mutations. In some embodiments, the engineered Cas12i nuclease comprises E176R, K238R, T447R, E563R, N164Y, and I926R mutations. In some embodiments, the engineered Cas12i nuclease comprises E176R, K238R, T447R, E563R, E323R, and D362R mutations. In some embodiments, the engineered Cas12i nuclease comprises E176R, K238R, T447R, E563R, N164Y, E323R, D362R, and I926G mutations. In some embodiments, the engineered Cas12i nuclease comprises E176R, K238R, T447R, E563R, N164Y, E323R, D362R, I926G, and 439GG mutations. In some embodiments, the engineered Cas12i nuclease comprises E176R, K238R, T447R, E563R, N164Y, E323R, D362R, I926G, and 439G mutations. In some embodiments, the engineered Cas12i nuclease comprises E176R, K238R, T447R, E563R, N164Y, and D958R mutations. In some embodiments, the engineered Cas12i nuclease comprises E176R, K238R, T447R, E563R, N164Y, I926R, and D958R mutations. In some embodiments, the engineered Cas12i nuclease comprises E176R, K238R, T447R, E563R, N164Y, E323R, D362R, and D958R mutations. In some embodiments, the engineered Cas12i nuclease comprises E176R, K238R, T447R, E563R, N164Y, I926R, E323R, D362R, and D958R mutations. In some embodiments, the engineered Cas12i nuclease comprises E176R, K238R, T447R, E563R, N164Y, E323R, D362R, I926G, and D958R mutations. In some embodiments, the engineered Cas12i nuclease comprises E176R, K238R, T447R, E563R, N164Y, E323R, D362R, I926G, 439GG, and D958R mutations.
In some embodiments, for the purpose of improving gene editing efficiency, an engineered enzyme having at least 85% sequence identity to the above engineered enzyme may also be used; in some embodiments, enzymes having at least 87%, 89%, 91%, 93%, 95%, 97%, 99% sequence identity thereto may be used.
In some embodiments, an engineered Cas12i nuclease is provided that comprises any one of the amino acid sequences set forth in SEQ ID NOs. 1-12, or an amino acid sequence having at least 85% (e.g., at least 87%, 89%, 91%, 93%, 95%, 97%, or 99%) sequence identity to any one of the amino acid sequences set forth in SEQ ID NOs. 1-12.
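As an illustration of how a sequence identity threshold such as 85% might be checked, the sketch below computes percent identity over two sequences that have already been aligned (gaps written as "-"). The denominator convention used here, identical columns divided by all aligned columns, is an assumption; other conventions exist and this section does not specify one. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences of equal length.

    Assumed convention: identical columns / all alignment columns,
    ignoring columns where both sequences have a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)

# Toy alignments (hypothetical sequences).
print(percent_identity("MENLKQ", "MEYLKQ"))      # one substitution in six columns
print(percent_identity("MENL--KQ", "MENLGGKQ"))  # a two-residue insertion
print(percent_identity("MENLKQ", "MEYLKQ") >= 85.0)
```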
6) Reference Cas12i nuclease
In some embodiments, the reference Cas12i nuclease is Cas12i1, Cas12i2, or an ortholog thereof. In some embodiments, the reference Cas12i nuclease is a native Cas12i1. In some embodiments, the reference Cas12i nuclease is a native Cas12i2. In some embodiments, the reference Cas12i nuclease is an engineered Cas12i nuclease.
Type V-I CRISPR-Cas12i has been identified as an RNA-guided DNA endonuclease system. Unlike CRISPR-Cas systems such as Cas12b or Cas9, Cas12i-based CRISPR systems do not require a tracrRNA sequence. In some embodiments, the RNA guide sequence comprises crRNA. Typically, the crRNA described herein includes a direct repeat sequence and a spacer sequence. In certain embodiments, the crRNA comprises, consists essentially of, or consists of a direct repeat sequence linked to a guide sequence or a spacer sequence. In some embodiments, the crRNA includes a direct repeat, a spacer sequence, and a direct repeat (DR-spacer-DR), which is a typical feature of precursor crRNA (pre-crRNA) configurations in other CRISPR systems. In some embodiments, the crRNA includes a truncated direct repeat sequence and a spacer sequence, which is a typical feature of processed or mature crRNA. In some embodiments, the CRISPR-Cas12i effector protein forms a complex with an RNA guide sequence, and the spacer sequence directs the complex to bind sequence-specifically to a target nucleic acid that is complementary to the spacer sequence.
In some embodiments, the engineered Cas12i of the present application is an endonuclease that binds to a specific site of a target sequence and cleaves it under the direction of a guide RNA, and has DNA and RNA endonuclease activity. In some embodiments, the Cas12i is capable of autonomous crRNA biogenesis by processing a precursor crRNA array. Autonomous precursor crRNA processing can facilitate delivery of Cas12i and enables double-nicking applications, since two separate genomic sites can be targeted by a single crRNA transcript. The Cas12i protein then processes the CRISPR array into two cognate crRNAs, forming a paired nicking complex. Multiplexing of type V-I (Cas12i) effector proteins is accomplished using the precursor crRNA processing capability of the effector protein, whereby a single RNA guide sequence can be programmed for multiple targets with different sequences. In this way, multiple genes or DNA targets can be manipulated simultaneously for therapeutic applications. In some embodiments, the guide RNA comprises a precursor crRNA expressed from a CRISPR array consisting of target sequences interleaved with unprocessed DR sequences, which is processed by the intrinsic precursor crRNA processing activity of the effector protein to enable simultaneous targeting of one, two or more sites.
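The layout of such a multiplexed precursor crRNA, direct repeats interleaved with spacers, can be sketched as simple string assembly. The direct repeat and spacer sequences below are placeholders invented for illustration (they are not the Cas12i direct repeat or real targets), the function names are hypothetical, and `recover_spacers` is only a consistency check on the layout; it does not model how the effector protein actually processes the transcript.

```python
def build_pre_crrna_array(direct_repeat, spacers, trailing_repeat=True):
    """Assemble DR-spacer-DR-spacer-...(-DR) as one RNA string."""
    parts = []
    for spacer in spacers:
        parts.append(direct_repeat)
        parts.append(spacer)
    if trailing_repeat:
        parts.append(direct_repeat)
    return "".join(parts)

def recover_spacers(array, direct_repeat):
    """Split the array at each direct repeat and return the spacers.
    Assumes no spacer contains the direct repeat sequence."""
    return [segment for segment in array.split(direct_repeat) if segment]

# Placeholder sequences (hypothetical).
DR = "GUUGCAAC"
SPACERS = ["ACGUACGUAC", "UUGGCCAAUU"]

array = build_pre_crrna_array(DR, SPACERS)
print(array)
print(recover_spacers(array, DR))
```

With two spacers, a single transcript carries guides for two sites, which is the arrangement the text describes for double-nicking and multiplexed targeting.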
In some embodiments, the type V-I CRISPR-Cas12i effector protein is capable of recognizing a protospacer adjacent motif (PAM), and the target nucleic acid comprises a PAM that comprises or consists of the nucleic acid sequence 5'-TTN-3', 5'-TTH-3', 5'-TTY-3', or 5'-TTC-3'.
Cas12i nucleases from a variety of organisms can be used as the reference Cas12i nuclease to provide the engineered Cas12i nucleases and effector proteins of the present application. Exemplary Cas12i nucleases have been described, for example, in WO2019/201331A1 and US2020/0063126A1, which are incorporated herein by reference in their entirety. In some embodiments, the reference Cas12i nuclease has enzymatic activity. In some embodiments, the reference Cas12i is a nuclease, i.e., cleaves both strands of a target duplex nucleic acid (e.g., duplex DNA). In some embodiments, the reference Cas12i is a nickase, i.e., cleaves a single strand of a target duplex nucleic acid (e.g., duplex DNA). In some embodiments, the reference Cas12i nuclease is enzymatically inactive. In some embodiments, the reference Cas12i enzyme is Cas12i1, Cas12i2, or Cas12i-Phi. Orthologs with a certain sequence identity (e.g., at least any of about 60%, 70%, 80%, 85%, 90%, 95%, 98% or more) to Cas12i or a functional derivative thereof can be used as a basis for designing engineered Cas12i nucleases or effector proteins of the present application.
In some embodiments, the engineered Cas12i nuclease is based on a functional variant of a naturally occurring Cas12i nuclease. In some embodiments, the functional variant has one or more mutations, such as amino acid substitutions, insertions, and deletions. For example, the functional variant can comprise any of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more amino acid substitutions as compared to a wild-type naturally occurring Cas12i nuclease. In some embodiments, the one or more substitutions are conservative substitutions. In some embodiments, the functional variant has all of the domains of a naturally occurring Cas12i nuclease. In some embodiments, the functional variant does not have one or more domains of a naturally occurring Cas12i nuclease.
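The PAM motifs named above use IUPAC ambiguity codes (N = any base, H = A, C or T, Y = C or T). A minimal sketch of scanning one DNA strand for such motifs is given below; the function name `find_pam_sites` and the example sequence are hypothetical, only the given strand is searched, and the returned values are 0-based start positions of each PAM match.

```python
# IUPAC codes needed for the PAM motifs mentioned in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "H": "ACT", "Y": "CT"}

def find_pam_sites(seq, pam="TTN"):
    """Return 0-based start indices where `pam` matches `seq` (5'->3')."""
    seq = seq.upper()
    k = len(pam)
    return [i for i in range(len(seq) - k + 1)
            if all(seq[i + j] in IUPAC[code] for j, code in enumerate(pam))]

dna = "GATTCAGTTGACTTA"  # hypothetical sequence
for motif in ("TTN", "TTH", "TTY", "TTC"):
    print(motif, find_pam_sites(dna, motif))
```

The progressively stricter motifs select progressively fewer sites on the same sequence.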
Also provided are engineered Cas12i effector proteins based on any of the engineered Cas12i2 nucleases described herein. In some embodiments, the engineered Cas12i effector protein has enzymatic activity. In some embodiments, the engineered Cas12i effector protein is a nuclease that cleaves both strands of a target duplex nucleic acid (e.g., duplex DNA). In some embodiments, the engineered Cas12i effector protein is a nickase, i.e., cleaves a single strand of a target duplex nucleic acid (e.g., duplex DNA). In some embodiments, the engineered Cas12i effector protein comprises an enzyme inactivating mutant of the engineered Cas12i nuclease. Mutation of one or more amino acid residues in the active site of Cas12i nuclease results in a Cas12i that is enzymatically inactive. In some embodiments, an engineered Cas12i enzyme provided herein can be modified to have reduced nuclease activity, e.g., a nuclease that is inactivated by at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 97%, or 100% as compared to a wild-type Cas12i enzyme. The nuclease activity can be reduced by several methods, for example, introduction of mutations into the nuclease or PAM interaction domain of Cas12i enzyme. In some embodiments, catalytic residues for nuclease activity are identified, and these amino acid residues may be replaced with different amino acid residues (e.g., glycine or alanine) to reduce the nuclease activity. Examples of such mutations of Cas12i1 include D647A, E894A or D948A. Examples of such mutations of Cas12i2 include D599A, E833A, S883A, H884A, D886A, R a and/or D1019A.
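The substitution labels used above (for example D647A, meaning aspartate at position 647 replaced by alanine) can be applied to a sequence programmatically. The sketch below is a minimal illustration under stated assumptions: the 7-residue "protein" is a toy placeholder rather than a Cas12i sequence, and the function name is hypothetical. It checks that the stated wild-type residue is actually present before substituting.

```python
import re

# One-letter wild-type residue, 1-based position, one-letter replacement.
MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(sequence: str, mutations: list[str]) -> str:
    """Apply point substitutions such as 'D3A' to a protein sequence."""
    residues = list(sequence)
    for mutation in mutations:
        match = MUTATION_PATTERN.match(mutation)
        if match is None:
            raise ValueError(f"Unrecognized mutation label: {mutation!r}")
        wild_type = match.group(1)
        position = int(match.group(2))
        replacement = match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"Position {position} is outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"{mutation}: expected {wild_type} at position {position}, "
                f"found {residues[position - 1]}")
        residues[position - 1] = replacement
    return "".join(residues)


# Toy placeholder sequence (hypothetical, not a Cas12i sequence).
toy_protein = "MKDELSH"
toy_variant = apply_substitutions(toy_protein, ["D3A", "E4A"])
print(toy_variant)
```

Verifying the wild-type residue guards against applying a label to the wrong reference sequence or numbering scheme, which matters when mutations defined for one ortholog are transferred to another.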
7) Activity of engineered Cas12i nucleases
The engineered Cas12i nuclease has increased activity compared to the reference Cas12i nuclease. In some embodiments, the activity is a target DNA binding activity. In some embodiments, the activity is a site-specific nuclease activity. In some embodiments, the activity is a double-stranded DNA cleavage activity. In some embodiments, the activity is a single-stranded DNA cleavage activity, including, for example, a site-specific DNA cleavage activity or a non-specific DNA cleavage activity. In some embodiments, the activity is a single-stranded RNA cleavage activity, e.g., a site-specific RNA cleavage activity or a non-specific RNA cleavage activity. In some embodiments, the activity is measured in vitro. In some embodiments, the activity is measured in a cell, such as a bacterial cell, a plant cell, or a eukaryotic cell. In some embodiments, the activity is measured in a mammalian cell, such as a rodent cell or a human cell. In some embodiments, the activity is measured in human cells such as 293T cells. In some embodiments, the activity is measured in mouse cells, e.g., Hepa1-6 cells. In some embodiments, the engineered Cas12i nuclease has a site-specific nuclease activity that is increased by any of at least about 20%, 30%, 40%, 60%, 70%, 80%, 90%, 1.5-fold, 2-fold, 3-fold, 4-fold, 5-fold, 10-fold, or more, compared to a reference Cas12i nuclease. The site-specific nuclease activity of the engineered Cas12i nuclease can be measured using methods known in the art, including, for example, gel mobility shift assays and in vitro cleavage assays based on agarose gel electrophoresis, as described in the examples provided herein.
In some embodiments, the activity is a gene editing activity in a cell. In some embodiments, the cell is a bacterial cell, a plant cell, or a eukaryotic cell. In some embodiments, the cell is a mammalian cell, such as a rodent cell or a human cell. In some embodiments, the cell is a 293T cell. In some embodiments, the activity is measured in mouse cells, e.g., Hepa1-6 cells. In some embodiments, the activity is an indel forming activity at a target genomic site in the cell, e.g., site-specific cleavage of a target nucleic acid by the engineered Cas12i nuclease and DNA repair by a non-homologous end joining (NHEJ) mechanism. In some embodiments, the activity is insertion of an exogenous nucleic acid sequence at a target genomic site in the cell, e.g., site-specific cleavage of the target nucleic acid by the engineered Cas12i nuclease and DNA repair by a homologous recombination (HR) mechanism. In some embodiments, the engineered Cas12i nuclease increases gene editing (e.g., indel formation) activity by any of at least about 20%, 30%, 40%, 60%, 70%, 80%, 90%, 1.5-fold, 2-fold, 3-fold, 4-fold, 5-fold, 10-fold, or more at a genomic site of a cell (e.g., a human cell such as a 293T cell or a mouse Hepa1-6 cell) compared to a reference Cas12i nuclease. In some embodiments, the engineered Cas12i nuclease increases gene editing (e.g., indel formation) activity by any of at least about 20%, 30%, 40%, 60%, 70%, 80%, 90%, 1.5-fold, 2-fold, 3-fold, 4-fold, 5-fold, 10-fold, or more at a plurality (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10 or more) of genomic sites of a cell (e.g., a human cell such as a 293T cell or a mouse Hepa1-6 cell) as compared to a reference Cas12i2 nuclease. In some embodiments, the engineered Cas12i nuclease is capable of editing a greater number of genomic sites than the reference Cas12i nuclease. In some embodiments, the consensus PAM sequence of the engineered Cas12i nuclease is the same as that of the reference Cas12i nuclease.
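The percentage and fold thresholds recited above can be placed on one scale with a short calculation. The sketch below assumes that "N-fold" denotes the ratio of engineered to reference activity, so that 1.5-fold corresponds to a 50% increase. This reading, the function names, and the numerical inputs are illustrative assumptions, not data from the examples.

```python
def fold_change(engineered: float, reference: float) -> float:
    """Ratio of engineered activity to reference activity."""
    if reference <= 0:
        raise ValueError("Reference activity must be positive")
    return engineered / reference


def percent_increase(engineered: float, reference: float) -> float:
    """Increase of engineered over reference, in percent of the reference."""
    return (fold_change(engineered, reference) - 1.0) * 100.0


# Hypothetical indel rates (percent of reads), for illustration only.
reference_indel_rate = 30.0
engineered_indel_rate = 45.0

print(fold_change(engineered_indel_rate, reference_indel_rate))
print(percent_increase(engineered_indel_rate, reference_indel_rate))
```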
Gene editing efficiency of an engineered Cas12i nuclease in a cell can be determined using methods known in the art, including, for example, T7 endonuclease 1 (T7E1) assays, sequencing of target DNA (including, for example, Sanger sequencing and next-generation sequencing), Tracking of Indels by Decomposition (TIDE) assays, or indel detection by amplicon analysis (IDAA) assays. See, e.g., Sentmanat M.F. et al., "A survey of validation strategies for CRISPR-Cas9 editing," Scientific Reports, 2018, 8, Article number 888, which is incorporated by reference herein in its entirety. In some embodiments, for example, as described in the examples herein, targeted next-generation sequencing (NGS) is used to measure gene editing efficiency of the engineered Cas12i nuclease in a cell. Exemplary genomic sites for determining gene editing efficiency of the engineered Cas12i nuclease include, but are not limited to, CCR5, AAVS, CD34, RNF2, and EMX1. In some embodiments, the gene editing efficiency of the engineered Cas12i nuclease is the average gene editing efficiency of the engineered Cas12i nuclease at at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, or more sites (e.g., sites in the genome of a human cell). In some embodiments, the engineered Cas12i nuclease achieves a gene editing efficiency (e.g., indel rate) of at least 20%, 30%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85% or more.
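A minimal sketch of how an average editing efficiency across several sites might be computed from targeted sequencing read counts is given below. It assumes that reads have already been classified as indel-containing or unmodified by an upstream analysis step. The site labels and read counts are hypothetical placeholders, not results from the examples.

```python
from statistics import mean


def indel_rate(indel_reads: int, total_reads: int) -> float:
    """Percentage of aligned reads at one site that carry an indel."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= indel_reads <= total_reads:
        raise ValueError("indel_reads must lie between 0 and total_reads")
    return 100.0 * indel_reads / total_reads


def mean_editing_efficiency(read_counts: dict[str, tuple[int, int]]) -> float:
    """Unweighted mean of the per-site indel rates (each site counts once)."""
    return mean(indel_rate(indel, total)
                for indel, total in read_counts.values())


# Hypothetical (indel_reads, total_reads) per site, for illustration only.
hypothetical_counts = {
    "site_A": (300, 1000),
    "site_B": (500, 1000),
    "site_C": (100, 1000),
}
average_efficiency = mean_editing_efficiency(hypothetical_counts)
print(f"{average_efficiency:.1f}%")
```

The unweighted mean treats every site equally regardless of sequencing depth. Pooling reads across sites would instead weight deeply sequenced sites more heavily, so the choice should be stated when averages are reported.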
Engineered Cas12i effector proteins
The present application provides engineered Cas12i (e.g., Cas12i2) effector proteins with improved activity, e.g., target binding, double strand cleavage activity, nickase activity, and/or gene editing activity. In some embodiments, engineered Cas12i effector proteins (e.g., Cas12i nuclease, Cas12i nickase, Cas12i fusion effector protein, or split Cas12i effector protein) are provided that comprise any of the engineered Cas12i nucleases or functional derivatives thereof described herein.
Variants
The present application provides engineered Cas12i effector proteins comprising functional variants of the engineered Cas12i nucleases described herein. In some embodiments, the amino acid sequence of the functional variant is different (e.g., has a deletion, insertion, substitution, and/or fusion) by at least one amino acid residue when compared to the amino acid sequence of the corresponding engineered Cas12i nuclease. In some embodiments, the functional variant has one or more mutations, such as amino acid substitutions, insertions, and/or deletions. For example, a functional variant can comprise any one of 1, 2, 3,4, 5, 6, 7, 8, 9, 10 or more amino acid substitutions as compared to an engineered Cas12i nuclease. In some embodiments, the one or more substitutions are conservative substitutions. In some embodiments, the functional variant has all of the domains of an engineered Cas12i nuclease. In some embodiments, the functional variant does not have one or more domains of an engineered Cas12i nuclease.
For any of the Cas12i variant proteins described herein (e.g., nickase Cas12i protein, inactivated or catalytically inactivated Cas12i (dCas12i), fused Cas12i), the Cas12i variant can include a Cas12i protein sequence with the same parameters described above (e.g., domains present, percent identity, etc.).
Catalytic activity
In some embodiments, the functional variant has a different catalytic activity compared to the non-mutated form of the Cas12i nuclease from which it is engineered. In some embodiments, the mutation (e.g., amino acid substitution, insertion, and/or deletion) is in a catalytic domain (e.g., RuvC domain) of the Cas12i effector protein. In some embodiments, the variant comprises mutations in a plurality of catalytic domains. Cas12i effector proteins that cleave one strand of a double-stranded target nucleic acid without cleaving the other strand are referred to herein as "nickases" (e.g., "Cas12i nickases"). A Cas12i protein having substantially no nuclease activity is referred to herein as an inactivated Cas12i protein ("dCas12i") (note that in the case of a fused Cas12i effector protein, a heterologous polypeptide (fusion partner) can provide nuclease activity, as will be described in detail below). In some embodiments, a Cas12i effector protein is considered to lack substantially all DNA cleavage activity when the DNA cleavage activity of the mutant enzyme is less than about 25%, 10%, 5%, 1%, 0.1%, 0.01% or less relative to its non-mutated form.
Split Cas12i effector proteins
The present application also provides split Cas12i effector proteins based on any one of the engineered Cas12i effector proteins described herein. A split Cas12i effector protein may be advantageous for delivery. In some embodiments, the engineered Cas12i effector protein is split into two portions of an enzyme that can be reconstituted together to provide a substantially functional Cas12i effector protein. Split Cas effector proteins can be provided using known methods; for example, split versions of Cas12 and Cas9 proteins have been described in WO2016/112242, WO2016/205749, and PCT/CN2020/111057, which are incorporated herein by reference in their entirety.
In some embodiments, a split Cas12i effector protein is provided, comprising a first polypeptide comprising an N-terminal portion of any one of the engineered Cas12i nucleases described herein, or a functional derivative thereof, and a second polypeptide comprising a C-terminal portion of the engineered Cas12i nuclease or a functional derivative thereof, wherein the first and second polypeptides are capable of associating with each other in the presence of a guide RNA comprising a guide sequence to form a CRISPR complex that specifically binds to a target nucleic acid comprising a target sequence complementary to the guide sequence. In some embodiments, the first and second polypeptides each comprise a dimerization domain. In some embodiments, the first dimerization domain and the second dimerization domain are associated with each other in the presence of an inducing agent (e.g., rapamycin). In some embodiments, the first and second polypeptides do not comprise a dimerization domain. In some embodiments, the split-type Cas12i effector protein is self-induced.
The partitioning can be performed in a manner that does not affect the catalytic domain. Cas12i effector proteins can be used as nucleases (including nickases) or can be inactivated enzymes, which are essentially RNA-guided DNA binding proteins with little or no catalytic activity (e.g., due to mutations in their catalytic domains).
In some embodiments, the nuclease lobe and the α-helical lobe of the Cas12i protein are expressed as separate polypeptides. Although the lobes do not interact by themselves, the RNA guide sequence recruits them into a complex that recapitulates the activity of the full-length Cas12i enzyme and catalyzes site-specific DNA cleavage. In some embodiments, a modified RNA guide sequence may be used to abrogate the activity of the split enzyme by preventing dimerization, thereby allowing the development of an inducible dimerization system. Such split enzymes are described, for example, in Wright, Addison V., et al., "Rational design of a split-Cas9 enzyme complex," Proc. Nat'l. Acad. Sci., 112.10 (2015): 2984-2989, which is incorporated herein by reference in its entirety.
The split Cas12i effector protein portions described herein can be designed by dividing (i.e., splitting) a reference engineered Cas12i effector protein (e.g., a full-length engineered Cas12i) in two at a split position, which is the point at which the N-terminal portion is separated from the C-terminal portion of the reference Cas12i effector protein. In some embodiments, the N-terminal portion comprises amino acid residues 1 to X and the C-terminal portion comprises amino acid residues X+1 to the C-terminus of the reference Cas12i effector protein. In this example, the numbering is consecutive, but this is not necessary, as it is also contemplated that amino acids (or nucleotides encoding them) may be trimmed from any of the split ends and/or that mutations (e.g., insertions, deletions and substitutions) may be introduced in internal regions of the polypeptide chain, provided that the reconstituted Cas12i effector protein retains sufficient DNA binding activity (if desired), DNA nickase or cleavage activity, e.g., has at least 40%, 50%, 60%, 70%, 80%, 90% or 95% activity compared to the reference Cas12i effector protein.
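The residue numbering in this scheme (residues 1 to X in the N-terminal portion, X+1 onward in the C-terminal portion) maps directly onto string slicing. The sketch below uses a toy placeholder sequence and a hypothetical function name. It covers only the consecutive-numbering case, without trimming or internal mutations.

```python
def split_protein(sequence: str, split_position: int) -> tuple[str, str]:
    """Split a protein after residue X (1-based).

    Returns (residues 1..X, residues X+1..C-terminus). Both portions must
    be non-empty, so X must lie between 1 and len(sequence) - 1.
    """
    if not 1 <= split_position < len(sequence):
        raise ValueError("split_position must leave two non-empty portions")
    return sequence[:split_position], sequence[split_position:]


# Toy placeholder sequence (hypothetical, not a Cas12i sequence).
toy_reference = "MKDELSHAVQ"
n_portion, c_portion = split_protein(toy_reference, 4)
print(n_portion, c_portion)

# With consecutive numbering the two portions rebuild the reference exactly.
reconstituted = n_portion + c_portion
```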
The partitioning point can be designed in silico and cloned into the construct. In this process, mutations can be introduced into the split-type Cas12i effector protein, and the non-functional domains can be removed. In some embodiments, the two portions or fragments (i.e., N-terminal and C-terminal fragments) of the split Cas12i effector protein may form a complete Cas12i effector protein comprising, for example, at least 70%, at least 80%, at least 90%, at least 95%, or at least 99% of the complete Cas12i effector protein sequence.
The split Cas12i effector proteins may each comprise one or more dimerization domains. In some embodiments, the first polypeptide comprises a first dimerization domain fused to a first split Cas12i effector protein portion and the second polypeptide comprises a second dimerization domain fused to a second split Cas12i effector protein portion. The dimerization domain may be fused to the split Cas12i effector protein portion by a peptide linker (e.g., a flexible peptide linker such as a GS linker) or a chemical bond. In some embodiments, the dimerization domain is fused to the N-terminus of the split Cas12i effector protein portion. In some embodiments, the dimerization domain is fused to the C-terminus of the split Cas12i effector protein portion.
In some embodiments the split Cas12i effector protein does not comprise any dimerization domain.
In some embodiments, the dimerization domain promotes association of two split Cas12i effector protein moieties. In some embodiments, the split-type Cas12i effector protein portion is induced by an inducing agent to associate or dimerize to a functional Cas12i effector protein. In some embodiments, the split Cas12i effector protein comprises an inducible dimerization domain. In some embodiments, the dimerization domain is not an inducible dimerization domain, i.e., the dimerization domain dimerizes in the absence of an inducing agent.
The inducing agent can be an inducing energy source or inducing molecule other than a guide RNA (e.g., sgRNA). The inducing agent reconstitutes the two split Cas12i effector protein portions into a functional Cas12i effector protein through induced dimerization of the dimerization domains. In some embodiments, the inducing agent aggregates the two split Cas12i effector protein moieties together by the action of an induced association of an inducible dimerization domain. In some embodiments, the two split-type Cas12i effector protein portions do not associate with each other to reconstitute into a functional Cas12i effector protein in the absence of an inducing agent. In some embodiments, in the absence of an inducing agent, two separate Cas12i effector protein portions may associate with each other in the presence of a guide RNA (e.g., crRNA) to reconstitute a functional Cas12i effector protein.
The inducing agent of the present application may be heat, ultrasound, electromagnetic energy, or a chemical compound. In some embodiments, the inducing agent is an antibiotic, a small molecule, a hormone, a hormone derivative, a steroid, or a steroid derivative. In some embodiments, the inducing agent is abscisic acid (ABA), doxycycline (DOX), 4-isopropylbenzoic acid (cumate), rapamycin, 4-hydroxytamoxifen (4OHT), estrogen, or ecdysone. In some embodiments, the split Cas12i effector system is an inducer controlled system selected from the group consisting of: antibiotic-based induction systems, electromagnetic energy-based induction systems, small molecule-based induction systems, nuclear receptor-based induction systems, and hormone-based induction systems. In some embodiments, the split Cas12i effector system is an inducer controlled system selected from the group consisting of: tetracycline (Tet)/DOX induction system, light induction system, ABA induction system, cumate repressor/operator system, 4OHT/estrogen induction system, ecdysone-based induction system, and FKBP12/FRAP (FKBP12-rapamycin complex) induction system. Such inducers are also discussed herein and in PCT/US2013/051418, which is incorporated herein by reference in its entirety. The FRB/FKBP/rapamycin system has been described in Paulmurugan and Gambhir, Cancer Res., August 15, 2005, 65; 7413, and Crabtree et al., Chemistry & Biology 13, 99-107, Jan 2006, which are incorporated herein by reference in their entirety.
In some embodiments, the paired split-type Cas12i effector proteins are separated and inactive until dimerization of the dimerization domains (e.g., FRB and FKBP) is induced, which leads to reassembly of a functional Cas12i effector protein nuclease. In some embodiments, a first split Cas12i effector protein comprising a first half of an inducible dimer (e.g., FRB) is delivered separately and/or at a separate location from a second split Cas12i effector protein comprising a second half of an inducible dimer (e.g., FKBP).
Other exemplary FKBP-based induction systems that can be used in the inducer controlled split Cas12i effector systems described herein include, but are not limited to: FKBP that dimerizes with calcineurin (CNA) in the presence of FK506; FKBP that dimerizes with CyP-Fas in the presence of FKCsA; FKBP that dimerizes with FRB in the presence of rapamycin; GyrB that dimerizes with GyrB in the presence of coumermycin; GAI that dimerizes with GID1 in the presence of gibberellin; or Snap-tag that dimerizes with HaloTag in the presence of HaXS.
Alternatives within the FKBP family itself are also contemplated. For example, FKBP homodimerizes in the presence of FK1012 (i.e., one FKBP dimerizes with another FKBP).
In some embodiments, the dimerization domain is FKBP and the inducing agent is FK1012. In some embodiments, the dimerization domain is GyrB and the inducing agent is coumermycin. In some embodiments, the dimerization domain is ABA and the inducing agent is gibberellin.
In some embodiments, the split-type Cas12i effector protein portion can be automatically induced (i.e., automatically activated or self-induced) to associate/dimerize to a functional Cas12i effector protein in the absence of an inducing agent. Without being bound by any theory or hypothesis, auto-induction of the split-type Cas12i effector protein portion may be mediated by binding to a guide RNA, such as crRNA. In some embodiments, the first and second polypeptides do not comprise a dimerization domain. In some embodiments, the first and second polypeptides comprise a dimerization domain.
In some embodiments, a reconstituted Cas12i effector protein of a split Cas12i effector system (including inducer controlled and auto-induced systems) described herein has an editing efficiency of at least 70% (such as at least about any one of 70%, 75%, 80%, 85%, 90%, 95%, 98%, 99% or more efficiency, or 100% efficiency) relative to a reference Cas12i effector protein editing efficiency.
In some embodiments, a reconstituted Cas12i effector protein of an inducer-controlled split-type Cas12i effector system described herein has an editing efficiency of no more than 50% (such as no more than about any of 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, 5% or less efficiency, or 0% efficiency) relative to a reference Cas12i effector protein editing efficiency in the absence of an inducer (i.e., due to auto-induction).
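The two thresholds above (at least 70% of reference editing efficiency after reconstitution, and no more than 50% in the absence of an inducer) can be checked together for an inducer-controlled system. The sketch below is one reading of those criteria. The function names and the numerical inputs are hypothetical and are not measurements from the examples.

```python
def relative_efficiency(split_efficiency: float,
                        reference_efficiency: float) -> float:
    """Editing efficiency of the split system as a percent of the reference."""
    if reference_efficiency <= 0:
        raise ValueError("Reference efficiency must be positive")
    return 100.0 * split_efficiency / reference_efficiency


def meets_inducible_criteria(induced: float, uninduced: float,
                             reference: float,
                             min_induced: float = 70.0,
                             max_uninduced: float = 50.0) -> bool:
    """True if induced activity is high enough and background is low enough."""
    return (relative_efficiency(induced, reference) >= min_induced
            and relative_efficiency(uninduced, reference) <= max_uninduced)


# Hypothetical indel rates (percent of reads), for illustration only.
print(relative_efficiency(40.0, 50.0))             # with inducer
print(relative_efficiency(5.0, 50.0))              # without inducer
print(meets_inducible_criteria(40.0, 5.0, 50.0))
```

Expressing both values relative to the same reference keeps the induced activity and the uninduced background directly comparable.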
Fusion of Cas12i effector proteins
The present application also provides engineered Cas12i effector proteins comprising additional protein domains and/or components, such as linkers, nuclear localization/export sequences, functional domains, and/or reporter proteins.
In some embodiments, the engineered Cas12i effector protein is a protein complex comprising one or more heterologous protein domains (e.g., about or greater than about 1, 2, 3,4, 5, 6, 7, 8, 9, 10 or more domains) and a nucleic acid targeting domain of the engineered Cas12i nuclease or a functional derivative thereof. In some embodiments, the engineered Cas12i effector protein is a fusion protein comprising one or more heterologous protein domains (e.g., about or more than about 1, 2, 3,4, 5, 6, 7, 8, 9, 10 or more domains) fused to the engineered Cas12i nuclease.
In some embodiments, an engineered Cas12i effector protein of the present application may comprise one or more functional domains (e.g., as a fusion protein, such as via one or more peptide linkers, e.g., a GS peptide linker) or be associated with one or more functional domains (e.g., via co-expression of multiple proteins). In some embodiments, the one or more functional domains are enzymatic domains. These functional domains can have a variety of activities, such as DNA and/or RNA methylase activity, demethylase activity, transcriptional activation activity, transcriptional repression activity, transcriptional release factor activity, histone modification activity, RNA cleavage activity, DNA cleavage activity, nucleic acid binding activity, and switching activity (e.g., photoinduced). In some embodiments, the one or more functional domains are transcriptional activation domains (i.e., transactivation domains) or repressor domains. In some embodiments, the one or more functional domains are histone modification domains. In some embodiments, the one or more functional domains are transposase domains, HR (homologous recombination) mechanism domains, recombinase domains, and/or integrase domains. In some embodiments, the functional domain is Krüppel-associated box (KRAB), VP64, VP16, FokI, p65, HSF1, MyoD1, biotin-APEX, APOBEC1, AID, PmCDA1, Tad1, or M-MLV reverse transcriptase. In some embodiments, the functional domain is selected from the group consisting of: a translation initiation domain, a transcription repression domain, a transactivation domain, an epigenetic modification domain, a nucleobase editing domain (e.g., a CBE or ABE domain), a reverse transcriptase domain, a reporter domain (e.g., a fluorescent domain), and a nuclease domain.
In some embodiments, the positioning of one or more functional domains in the engineered Cas12i effector protein allows for the correct spatial orientation of the functional domains to affect targets with the conferred functional effect. For example, if the functional domain is a transcriptional activator (e.g., VP16, VP64, or p65), the transcriptional activator is placed in a spatial orientation such that it is capable of affecting transcription of the target. Likewise, a transcription repressor is positioned to affect transcription of the target, and a nuclease (e.g., FokI) is positioned to cleave, or partially cleave, the target. In some embodiments, the functional domain is located N-terminal to the engineered Cas12i effector protein. In some embodiments, the functional domain is located C-terminal to the engineered Cas12i effector protein. In some embodiments, the engineered Cas12i effector protein comprises a first functional domain at the N-terminus and a second functional domain at the C-terminus. In some embodiments, the engineered Cas12i effector protein comprises a catalytically inactive mutant of any one of the engineered Cas12i nucleases described herein fused to one or more functional domains.
In some embodiments, the engineered Cas12i effector protein is a transcriptional activator. In some embodiments, the engineered Cas12i effector protein comprises an enzymatically inactive variant of any one of the engineered Cas12i nucleases described herein fused to a transactivation domain. In some embodiments, the transactivation domain is selected from the group consisting of: VP64, p65, HSF1, VP16, MyoD1, RTA, SET7/9, and combinations thereof. In some embodiments, the transactivation domain comprises VP64, p65, and HSF1. In some embodiments, the engineered Cas12i effector protein comprises two split Cas12i effector polypeptides, each fused to a transactivation domain.
In some embodiments, the engineered Cas12i effector protein is a transcription repressor. In some embodiments, the engineered Cas12i effector protein comprises an enzymatically inactive variant of any one of the engineered Cas12i nucleases described herein fused to a transcription repression domain. In some embodiments, the transcription repression domain is selected from the group consisting of: Krüppel-associated box (KRAB), EnR, NuE, NcoR, SID, SID4X, and combinations thereof. In some embodiments, the engineered Cas12i effector protein comprises two split Cas12i effector polypeptides, each fused to a transcription repression domain.
In some embodiments, the engineered Cas12i effector protein is a base editor, such as a cytosine editor or an adenosine editor. In some embodiments, the engineered Cas12i effector protein comprises an enzymatically inactive variant of any of the engineered Cas12i nucleases described herein fused to a nucleobase editing domain, such as a cytosine base editing (CBE) domain or an adenosine base editing (ABE) domain. In some embodiments, the nucleobase editing domain is a DNA editing domain. In some embodiments, the nucleobase editing domain has deaminase activity. In some embodiments, the nucleobase editing domain is a cytosine deaminase domain. In some embodiments, the nucleobase editing domain is an adenosine deaminase domain. Exemplary base editors based on Cas nucleases are described, for example, in WO2018/165629A1 and WO2019/226953A1, which are incorporated herein by reference in their entirety. Exemplary CBE domains include, but are not limited to: activation-induced cytidine deaminase or AID (e.g., hAID), apolipoprotein B mRNA editing complex or APOBEC (e.g., rat APOBEC1, hAPOBEC3A/B/C/D/E/F/G), and PmCDA1. Exemplary ABE domains include, but are not limited to: TadA, ABE8 and variants thereof (see, e.g., Gaudelli et al., 2017, Nature 551, 464-471 and Richter et al., 2020, Nature Biotechnology 38). In some embodiments, the functional domain is an APOBEC1 domain, e.g., a rat APOBEC1 domain. In some embodiments, the functional domain is a TadA domain, for example an Escherichia coli (E. coli) TadA domain. In some embodiments, the engineered Cas12i effector protein further comprises one or more nuclear localization sequences.
In some embodiments, the engineered Cas12i effector protein is a prime editor. Cas9-based prime editors are described, for example, in A. Anzalone et al., Nature, 2019, 576(7785): 149-157, which is incorporated herein by reference in its entirety. In some embodiments, the engineered Cas12i effector protein comprises a nickase variant of any of the engineered Cas12i nucleases described herein fused to a reverse transcriptase domain. In some embodiments, the functional domain is a reverse transcriptase domain. In some embodiments, the reverse transcriptase domain is an M-MLV reverse transcriptase or a variant thereof, e.g., an M-MLV reverse transcriptase having one or more mutations of D200N, T32306K, W313F, T P and L603W. In some embodiments, an engineered CRISPR/Cas12i system comprising the prime editor is provided. In some embodiments, the engineered CRISPR/Cas12i system further comprises a second Cas12i nickase, e.g., based on the same engineered Cas12i nuclease as the prime editor. In some embodiments, the engineered CRISPR/Cas12i system comprises a prime editing guide RNA (pegRNA) comprising a primer binding site and a reverse transcriptase (RT) template sequence.
In some embodiments, the present application provides a split Cas12i effector system having one or more (e.g., 1, 2, 3, 4, 5, 6, or more) functional domains associated with (i.e., bound to or fused to) one or both split Cas12i effector protein portions. The functional domain may be provided as part of the first and/or second split Cas12i effector protein, as a fusion within the construct. The functional domain is typically fused to other portions of the split Cas12i effector protein (e.g., the split Cas12i effector protein portion) by a peptide linker (such as a GS linker). These functional domains can be used to repurpose the function of the split Cas12i effector system based on catalytically inactive Cas12i effector proteins.
In some embodiments, the engineered Cas12i effector protein comprises one or more Nuclear Localization Sequences (NLS) and/or one or more Nuclear Export Sequences (NES). Exemplary NLS sequences include, for example, PKKKRKVPG and ASPKKKRKV. The NLS and/or NES may be operably linked to the N-terminus and/or C-terminus of the engineered Cas12i effector protein or a polypeptide chain in the engineered Cas12i effector protein.
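A terminal NLS fusion of the kind described above amounts to sequence concatenation at the protein level. The sketch below uses the two exemplary NLS sequences given in the text. Which NLS is placed at which terminus, the toy effector sequence, and the function name are arbitrary choices made for illustration, and practical construct details such as the start methionine and linkers are ignored.

```python
# Exemplary NLS sequences taken from the text above.
NLS_A = "PKKKRKVPG"
NLS_B = "ASPKKKRKV"


def add_nls(protein: str, n_terminal: str = "", c_terminal: str = "") -> str:
    """Return the protein with optional NLS peptides fused at either end."""
    return n_terminal + protein + c_terminal


# Toy placeholder effector sequence (hypothetical, not a Cas12i sequence).
toy_effector = "MKDELSHAVQ"

n_only = add_nls(toy_effector, n_terminal=NLS_A)
both_ends = add_nls(toy_effector, n_terminal=NLS_A, c_terminal=NLS_B)
print(n_only)
print(both_ends)
```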
In some embodiments, the engineered Cas12i effector protein may comprise additional components, such as a reporter protein. In some embodiments, the engineered Cas12i effector protein comprises a fluorescent protein, such as GFP. Such a system may allow for imaging of genomic sites (see, e.g., Chen B. et al., "Dynamic Imaging of Genomic Loci in Living Human Cells by an Optimized CRISPR/Cas System"). In some embodiments, the engineered Cas12i effector protein is an inducible split Cas12i effector system that can be used to image genomic sites.
In yet another specific embodiment, an engineered Cas12i effector protein is provided, wherein the effector protein is capable of inducing a double strand break or a single strand break in a DNA molecule.
In yet another specific embodiment, an engineered Cas12i effector protein is provided, wherein said functional derivative of an engineered Cas12i nuclease is an enzymatically inactive mutant, such as a Cas12i2 nuclease inactivating mutant comprising D599A, E833A, S883A, H884A, D886A, R a and/or D1019A, and a Cas12i1 nuclease inactivating mutant comprising D647A, E894A and/or D948A. Known enzymatically inactive mutants of Cas12i2 nuclease, such as any of the enzymatically inactive mutants of Cas12i2 nuclease described in US10808245B2 and Huang X. et al., Nature Communications, 11, Article number 5241 (2020), can be combined with the mutants herein to provide functional derivatives of an engineered Cas12i nuclease and its corresponding effector proteins.
In yet another specific embodiment, an engineered Cas12i effector protein is provided, further comprising a functional domain fused to the engineered Cas12i nuclease.
In yet another specific embodiment, an engineered Cas12i effector protein is provided, wherein the functional domain is selected from the group consisting of: a translation initiation domain, a transcription repression domain, a transactivation domain, an epigenetic modification domain, a nucleobase editing domain, a reverse transcriptase domain, a reporter domain, and a nuclease domain.
Engineered CRISPR-Cas12i systems
In some embodiments, an engineered CRISPR-Cas12i system is provided, comprising: (a) Any of the engineered Cas12i effector proteins described herein (e.g., engineered Cas12i nucleases); and (b) a guide RNA comprising a guide sequence complementary to the target sequence, or one or more nucleic acids encoding the guide RNA,
wherein the engineered Cas12i effector protein and the guide RNA are capable of forming a CRISPR complex that specifically binds to a target nucleic acid comprising the target sequence and induces a modification of the target nucleic acid.
In some embodiments, the engineered CRISPR-Cas12i system comprises: (a) any of the engineered Cas12i effector proteins described herein (e.g., an engineered Cas12i nuclease, nickase, split Cas12i, transcription repressor, transcription activator, base editor, or prime editor); and (b) a guide RNA comprising a guide sequence complementary to a target sequence, or one or more nucleic acids encoding the guide RNA, wherein the engineered Cas12i effector protein and the guide RNA are capable of forming a CRISPR complex that specifically binds to and induces a modification of a target nucleic acid comprising the target sequence. In some embodiments, the engineered CRISPR-Cas12i system comprises one or more nucleic acids encoding the engineered Cas12i effector protein and/or the guide RNA. In some embodiments, the engineered CRISPR-Cas12i system comprises a precursor guide RNA array that can be processed into multiple crRNAs, e.g., by the engineered Cas12i effector protein. In some embodiments, the engineered CRISPR-Cas12i system comprises one or more vectors encoding the engineered Cas12i effector protein and/or the guide RNA. In some embodiments, the engineered CRISPR-Cas12i system comprises a ribonucleoprotein (RNP) complex comprising the engineered Cas12i effector protein bound to the guide RNA.
The engineered CRISPR-Cas12i system of the present application may comprise any suitable guide RNA. The guide RNA (gRNA) can comprise a guide sequence capable of hybridizing to a target sequence in a target nucleic acid of interest, such as a genomic site of interest in a cell. In some embodiments, the gRNA comprises a CRISPR RNA (crRNA) sequence that contains the guide sequence.
Typically, the crRNA described herein includes a direct repeat sequence and a spacer sequence. In certain embodiments, the crRNA comprises, consists essentially of, or consists of a direct repeat sequence linked to a guide sequence or a spacer sequence. In some embodiments, the crRNA includes a direct repeat sequence, a spacer sequence, and a direct repeat sequence (DR-spacer sequence-DR), which are typical features of the precursor crRNA (pre-crRNA) configuration. In some embodiments, the crRNA includes a truncated direct repeat sequence and a spacer sequence, which are typical features of processed or mature crRNA. In some embodiments, the CRISPR-Cas12i effector protein forms a complex with an RNA guide sequence and the spacer sequence directs the complex to sequence-specific binding to a target nucleic acid that is complementary to the spacer sequence.
In some embodiments, the guide RNA is a crRNA comprising a guide sequence. In some embodiments, the engineered CRISPR-Cas12i system comprises a precursor guide RNA array encoding a plurality of crRNAs. In some embodiments, the Cas12i effector protein cleaves the precursor guide RNA array to generate a plurality of crRNAs. In some embodiments, the engineered CRISPR-Cas12i system comprises a precursor guide RNA array encoding a plurality of crRNAs, wherein each crRNA comprises a different guide sequence.
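The crRNA layouts described above (a direct repeat linked to a spacer, and a DR-spacer-DR precursor array in which each crRNA carries a different guide sequence) can be sketched as simple string assembly. This is an illustrative sketch only: the direct repeat and spacer sequences below are hypothetical placeholders rather than actual Cas12i sequences, and the function names are assumptions introduced for this example.

```python
from typing import List


def mature_crrna(direct_repeat: str, spacer: str) -> str:
    """Return a crRNA laid out as direct repeat followed by one spacer (DR-spacer)."""
    return direct_repeat + spacer


def pre_crrna_array(direct_repeat: str, spacers: List[str]) -> str:
    """Return a precursor array laid out as DR-spacer-DR-spacer-...-DR."""
    # Each crRNA in the array is expected to carry a different guide sequence.
    if len(set(spacers)) != len(spacers):
        raise ValueError("spacers in the array must all be different")
    units = [direct_repeat + spacer for spacer in spacers]
    # A closing direct repeat follows the last spacer.
    return "".join(units) + direct_repeat


# Hypothetical placeholder sequences (NOT real Cas12i direct repeats or guides).
DR = "GUUGCAAAAC"
SPACERS = ["ACGUACGUACGUACGUACGU", "UUGGCCAAUUGGCCAAUUGG"]

single = mature_crrna(DR, SPACERS[0])
array = pre_crrna_array(DR, SPACERS)
```

Under these assumptions, a two-spacer array contains three copies of the direct repeat, and processing at the repeats would yield one crRNA per spacer.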
The guide sequence may be of suitable length. In some embodiments, the guide sequence is between about 18 to about 35 nucleotides, including, for example, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, or 35 nucleotides. The guide sequence may have at least about 70%, 75%, 80%, 85%, 90%, 95%, 98%, 99%, or 100% complementarity to the target sequence of the target nucleic acid.
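The length and complementarity ranges above can be checked with a short helper. This is a minimal sketch under stated assumptions: complementarity is computed as the percentage of guide positions that form a Watson-Crick pair with an equal-length target strand read antiparallel, with no allowance for G-U wobble pairs or gaps; the function names and example sequences are hypothetical.

```python
# DNA base on the target strand -> RNA base that pairs with it.
PAIRING_RNA_BASE = {"A": "U", "C": "G", "G": "C", "T": "A"}


def guide_length_ok(guide: str, min_len: int = 18, max_len: int = 35) -> bool:
    """Check that the guide falls in the approximately 18 to 35 nt range."""
    return min_len <= len(guide) <= max_len


def percent_complementarity(guide_rna: str, target_dna: str) -> float:
    """Percent of guide positions paired with the target strand (both given 5'->3')."""
    if len(guide_rna) != len(target_dna):
        raise ValueError("this sketch only compares sequences of equal length")
    # The guide (5'->3') pairs with the target read 3'->5', so reverse the target.
    paired = sum(
        PAIRING_RNA_BASE[t] == g for g, t in zip(guide_rna, reversed(target_dna))
    )
    return 100.0 * paired / len(guide_rna)


# Hypothetical 20-nt example sequences.
target = "ACGTACGTACGTACGTACGT"
perfect_guide = "ACGUACGUACGUACGUACGU"      # reverse complement of the target
two_mismatches = "UCGUACGUACGUACGUACGA"     # first and last positions altered
```

With these inputs, the perfect guide scores 100% and the guide with two altered positions scores 90%, which would still satisfy a threshold of at least about 85% but not one of at least about 95%.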
Constructs and vectors
Also provided herein are constructs, vectors, and expression systems encoding any of the engineered Cas12i effector proteins (e.g., engineered Cas12i nucleases) described herein. In some embodiments, the construct, vector, or expression system further comprises one or more gRNAs or crRNA arrays.
A "vector" is a composition of matter that comprises an isolated nucleic acid and can be used to deliver the isolated nucleic acid to the interior of a cell. Many vectors are known in the art, including but not limited to: linear polynucleotides, polynucleotides associated with ions or amphiphilic compounds, plasmids, and viruses. Generally, suitable vectors comprise an origin of replication functional in at least one organism, a promoter sequence, a convenient restriction endonuclease site and one or more selectable markers. The term "vector" should also be construed to include non-plasmid and non-viral compounds that facilitate transfer of nucleic acids into cells, such as, for example, polylysine compounds, liposomes, and the like.
In some embodiments, the vector is a viral vector. Examples of viral vectors include, but are not limited to: adenovirus vectors, adeno-associated virus vectors, lentivirus vectors, retrovirus vectors, vaccinia vectors, herpes simplex virus vectors, and derivatives thereof. In some embodiments, the vector is a phage vector. Viral vector technology is well known in the art and is described, for example, in Sambrook et al. (2001, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, New York).
Many virus-based systems have been developed for gene transfer into mammalian cells. For example, retroviruses provide a convenient platform for gene delivery systems. Heterologous nucleic acids can be inserted into vectors and packaged into retroviral particles using techniques known in the art. Recombinant viruses can then be isolated and delivered to the engineered mammalian cells in vitro or ex vivo. Many retroviral systems are known in the art. In some embodiments, an adenoviral vector is used. Many adenoviral vectors are known in the art. In some embodiments, a lentiviral vector is used. In some embodiments, a self-inactivating lentiviral vector is used.
In certain embodiments, the vector is an adeno-associated virus (AAV) vector, e.g., AAV2, AAV8, or AAV9, which may be administered in a single dose comprising at least 1 × 10^5 particles (also referred to as particle units, pu) of adenovirus or adeno-associated virus. In some embodiments, the amount administered is at least about 1 × 10^6 particles, at least about 1 × 10^7 particles, at least about 1 × 10^8 particles, or at least about 1 × 10^9 particles of adeno-associated virus. Delivery methods and dosage amounts are described, for example, in WO 2016205764 and U.S. Pat. No. 8,454,972, which are incorporated herein by reference in their entirety.
In some embodiments, the vector is a recombinant adeno-associated virus (rAAV) vector. For example, in some embodiments, modified AAV vectors can be used for delivery. The modified AAV vector may be based on one or more of several capsid types, including AAV1, AAV2, AAV5, AAV6, AAV8, AAV8.2, AAV9, AAVrh10, modified AAV vectors (e.g., modified AAV2, modified AAV3, modified AAV6) and pseudotyped AAV (e.g., AAV2/8, AAV2/5 and AAV2/6). Exemplary AAV vectors and techniques that can be used to produce rAAV particles are known in the art (see, e.g., Aponte-Ubillos et al. (2018) Appl. Microbiol. Biotechnol. 102(3): 1045-54).
Any known AAV vector for delivery of Cas9 and other Cas proteins can be used to deliver the engineered Cas12i system of the present application.
Methods for introducing vectors into mammalian cells are known in the art. The vector may be transferred into the host cell by physical, chemical or biological means.
Physical methods for introducing vectors into host cells include: calcium phosphate precipitation, lipofection, particle bombardment, microinjection, electroporation, etc. Methods for producing cells comprising vectors and/or exogenous nucleic acids are well known in the art. See, e.g., Sambrook et al. (2001) Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, New York. In some embodiments, the vector is introduced into the cell by electroporation.
Biological methods for introducing heterologous nucleic acids into host cells include the use of DNA and RNA vectors. Viral vectors have become the most widely used method for inserting genes into mammalian, e.g., human, cells.
Chemical methods for introducing vectors into host cells include colloidally dispersed systems such as macromolecular complexes, nanocapsules, microspheres, beads and lipid-based systems including oil-in-water emulsions, micelles, mixed micelles and liposomes. An exemplary colloidal system for use as an in vitro delivery vehicle is a liposome (e.g., an artificial membrane vesicle). In some embodiments, the engineered CRISPR-Cas12i system is delivered in the form of RNPs in nanoparticles.
In some embodiments, the vector or expression system encoding the CRISPR-Cas12i system or components thereof comprises one or more selectable or detectable markers that provide a means to isolate or efficiently select cells that contain and/or have been modified by the CRISPR-Cas12i system (e.g., at an early stage and large scale).
Reporter genes can be used to identify potentially transfected cells and to evaluate the function of regulatory sequences. Typically, a reporter gene is a gene that is not present or expressed in the recipient organism or tissue and whose encoded polypeptide expression is evidenced by certain easily detectable properties (e.g., enzymatic activity). Expression of the reporter gene is determined at a suitable time after introduction of the DNA into the recipient cells. Suitable reporter genes may include genes encoding luciferase, β-galactosidase, chloramphenicol acetyltransferase, secreted alkaline phosphatase, or green fluorescent protein (e.g., Ui-Tei et al., FEBS Letters 479 (2000)).
Other methods of confirming the presence of a heterologous nucleic acid in a host cell include, for example, molecular biological assays well known to those skilled in the art, such as Southern and Northern blots, RT-PCR and PCR; biochemical assays, for example, to detect the presence or absence of a particular peptide by immunological methods such as ELISA and Western blotting.
In some embodiments, the nucleic acid sequence encoding the engineered Cas12i effector protein and/or the guide RNA is operably linked to a promoter. In some embodiments, the promoter is an endogenous promoter of a cell engineered with the engineered CRISPR-Cas12i system. For example, the nucleic acid encoding the engineered Cas12i effector protein can be knocked into the genome of the engineered mammalian cell downstream of the endogenous promoter using any method known in the art. In some embodiments, the endogenous promoter is a promoter for an abundant protein (e.g., β-actin). In some embodiments, the endogenous promoter is an inducible promoter, e.g., inducible by endogenous activation signals of the engineered mammalian cell. In some embodiments in which the engineered mammalian cell is a T cell, the promoter is a T cell activation-dependent promoter (such as an IL-2 promoter, NFAT promoter, or NF-κB promoter).
In some embodiments, the promoter is a heterologous promoter relative to a cell engineered with the engineered CRISPR-Cas12i system. A variety of promoters have been explored to express genes in mammalian cells, and any promoter known in the art can be used in the present application. Promoters can be broadly classified as constitutive promoters or regulated promoters, such as inducible promoters.
In some embodiments, the nucleic acid sequence encoding the engineered Cas12i effector protein and/or the guide RNA is operably linked to a constitutive promoter. Constitutive promoters allow constitutive expression of a heterologous gene (also known as a transgene) in a host cell. Exemplary constitutive promoters contemplated herein include, but are not limited to: the cytomegalovirus (CMV) promoter, the human elongation factor-1α (hEF1α) promoter, the ubiquitin C (UbiC) promoter, the phosphoglycerate kinase (PGK) promoter, the simian virus 40 (SV40) early promoter, and the chicken β-actin promoter coupled with the CMV early enhancer (CAG). In some embodiments, the promoter is a CAG promoter comprising a cytomegalovirus (CMV) early enhancer element; the promoter, first exon and first intron of the chicken β-actin gene; and a splice acceptor of the rabbit β-globin gene.
In some embodiments, the nucleic acid sequence encoding the engineered CRISPR-Cas12i effector protein and/or the guide RNA is operably linked to an inducible promoter. Inducible promoters are one type of regulated promoter. The inducible promoter may be induced by one or more conditions such as physical conditions, the microenvironment or physiological state of the host cell, an inducer (i.e., an inducing agent), or a combination thereof. In some embodiments, the induction conditions are selected from the group consisting of: inducers, radiation (e.g., ionizing radiation, light), temperature (e.g., heat), redox status, tumor environment and activation status of the cell to be engineered by the engineered CRISPR-Cas12i system. In some embodiments, the promoter may be induced by a small molecule inducer, such as a chemical compound. In some embodiments, the small molecule is selected from the group consisting of: doxycycline, tetracycline, alcohol, metal, or steroid. Chemically induced promoters have been the most extensively studied. Such promoters include those whose transcriptional activity is regulated by the presence or absence of small molecule chemicals such as doxycycline, tetracycline, alcohols, steroids, metals, and other compounds. The doxycycline-inducible system with a reverse tetracycline-controlled transactivator (rtTA) and a tetracycline-responsive element (TRE) promoter is currently the most mature system. WO9429442 describes the tight control of gene expression in eukaryotic cells by tetracycline-responsive promoters. WO9601313 discloses tetracycline-regulated transcriptional modulators. Tet technologies such as the Tet-On system have also been described on publicly available websites. In the present application, any known chemically regulated promoter can be used to drive expression of the gene encoding the engineered CRISPR-Cas12i protein and/or the guide RNA.
In some embodiments, the nucleic acid sequence encoding the engineered Cas12i effector protein is codon optimized.
In some embodiments, expression constructs are provided comprising a codon optimized sequence encoding the engineered Cas12i effector protein linked to a BPK2104-ccdB vector. In some embodiments, the expression construct encodes a tag (e.g., a 10×His tag) operably linked to the C-terminus of the engineered Cas12i effector protein.
In some embodiments, each engineered split Cas12i construct encodes a fluorescent protein such as GFP or RFP. The reporter protein can be used to assess co-localization and/or dimerization of the engineered Cas12i protein, e.g., using a microscope. The nucleic acid sequence encoding the engineered Cas12i effector protein may be fused to nucleic acid sequences encoding additional components using sequences encoding self-cleaving peptides such as T2A, P2A, E2A or F2A peptides.
In some embodiments, expression constructs for mammalian cells (e.g., human cells) are provided that comprise a nucleic acid sequence encoding the engineered Cas12i effector protein. In some embodiments, the expression construct comprises a codon optimized sequence encoding the engineered Cas12i effector protein inserted into a pCAG-2A-eGFP vector, thereby operably linking the Cas12i protein to the eGFP. In some embodiments, a second vector is provided for expressing a guide RNA (e.g., a crRNA or precursor crRNA array) in a mammalian cell (e.g., a human cell). In some embodiments, the sequence encoding the guide RNA is expressed in a pUC19-U6-i2-crRNA vector backbone.
Method of use
The present application provides methods of detecting a target nucleic acid or modifying a nucleic acid in vitro, ex vivo, or in vivo using any of the engineered Cas12i effector proteins or CRISPR-Cas12i systems described herein, as well as methods of treatment or diagnosis using the engineered Cas12i effector proteins or CRISPR-Cas12i systems. Also provided is the use of an engineered Cas12i effector protein or CRISPR-Cas12i system described herein for detecting or modifying nucleic acids in a cell, and for treating or diagnosing a disease or condition in a subject; and the use of a composition comprising any of the engineered Cas12i effector proteins or one or more components of the engineered CRISPR-Cas12i system in the manufacture of a medicament for detecting or modifying nucleic acids in a cell and for treating or diagnosing a disease or condition in a subject.
Method for detecting target nucleic acid in sample
The present application also provides methods of detecting a target nucleic acid using any of the engineered Cas12i effector proteins or CRISPR-Cas12i systems with improved activity. The use of Cas12i effector proteins as detection reagents relies on the following finding: once activated by detection of target DNA, a type V CRISPR/Cas protein (e.g., Cas12i) can promiscuously cleave non-targeted single-stranded nucleic acid (ssDNA or ssRNA, i.e., single-stranded nucleic acid to which the guide sequence of the guide RNA does not hybridize). Thus, when target DNA (double-stranded or single-stranded) is present in the sample (e.g., exceeding a threshold amount in some cases), the result is cleavage of single-stranded nucleic acid in the sample, which can be detected using any convenient detection method (e.g., using a tagged single-stranded detection nucleic acid such as DNA or RNA). Cas12i can cleave ssDNA and ssRNA. Methods of using Cas proteins as detection reagents are described, for example, in US10253365 and WO2020/056924, which are incorporated herein by reference in their entirety.
In some embodiments, methods of detecting a target DNA (e.g., double-stranded or single-stranded) in a sample are provided, comprising: (a) contacting the sample with: (i) any one of the engineered Cas12i effector proteins described herein; (ii) a guide RNA comprising a guide sequence that hybridizes to the target DNA; and (iii) a detection nucleic acid that is single-stranded (i.e., a "single-stranded detection nucleic acid") and does not hybridize to the guide sequence of the guide RNA; and (b) measuring a detectable signal generated by cleavage of the single-stranded detection nucleic acid by the engineered Cas12i effector protein. In certain instances, the single-stranded detection nucleic acid comprises a fluorescence-emitting dye pair (e.g., the fluorescence-emitting dye pair is a fluorescence resonance energy transfer (FRET) pair or a quencher/fluorophore pair). In some cases, the target DNA is viral DNA (e.g., papillomavirus, hepadnavirus, herpesvirus, adenovirus, poxvirus, parvovirus, etc.). In some embodiments, the single-stranded detection nucleic acid is DNA. In some embodiments, the single-stranded detection nucleic acid is RNA.
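Step (b) above, measuring the signal generated by cleavage of the single-stranded detection nucleic acid, might be reduced to a positive or negative call as in the following sketch. Everything here is an assumption made for illustration: the fluorescence readings are invented numbers in arbitrary units, the comparison against a no-target control and the two-fold cutoff are hypothetical choices, and the application itself does not prescribe a particular analysis.

```python
from statistics import mean
from typing import Sequence


def background_subtracted_signal(
    sample: Sequence[float], no_target_control: Sequence[float]
) -> float:
    """Mean reporter fluorescence of the sample minus that of the no-target control."""
    return mean(sample) - mean(no_target_control)


def is_positive(
    sample: Sequence[float],
    no_target_control: Sequence[float],
    min_fold: float = 2.0,  # hypothetical cutoff, not taken from the application
) -> bool:
    """Call the sample positive if its mean signal reaches min_fold times the control."""
    return mean(sample) >= min_fold * mean(no_target_control)


# Invented replicate readings (arbitrary fluorescence units).
control = [100.0, 110.0, 90.0]
with_target = [450.0, 500.0, 550.0]
without_target = [120.0, 130.0, 110.0]
```

In practice the cutoff would be set empirically, for example from the spread of replicate no-target controls.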
The methods of the present disclosure for detecting a target DNA (single-stranded or double-stranded) in a sample can detect the target DNA with high sensitivity. In some cases, the methods of the present disclosure can be used to detect target DNA present in a sample comprising a plurality of DNAs (including the target DNA and a plurality of non-target DNAs), wherein the target DNA is present at one or more copies per 10^7 non-target DNAs (e.g., one or more copies per 10^6 non-target DNAs, one or more copies per 10^5 non-target DNAs, one or more copies per 10^4 non-target DNAs, one or more copies per 10^3 non-target DNAs, one or more copies per 10^2 non-target DNAs, one or more copies per 50 non-target DNAs, one or more copies per 20 non-target DNAs, one or more copies per 10 non-target DNAs, or one or more copies per 5 non-target DNAs). In some embodiments, the engineered Cas12i effector proteins described herein can detect target DNA with greater sensitivity than the reference Cas12i nuclease. In some embodiments, the engineered Cas12i effector protein can detect target DNA with a sensitivity that is 10%, 15%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90% or more higher than that of the reference Cas12i nuclease.
Method of modification
In some embodiments, the present application provides methods of modifying a target nucleic acid comprising a target sequence, the methods comprising contacting the target nucleic acid with any of the engineered CRISPR-Cas12i systems described herein. In some embodiments, the method is performed in vitro. In some embodiments, the target nucleic acid is present in a cell. In some embodiments, the cell is a bacterial cell, a yeast cell, a mammalian cell, a plant cell, or an animal cell. In some embodiments, the method is performed ex vivo. In some embodiments, the method is performed in vivo.
In some embodiments, the target nucleic acid is cleaved or a target sequence in the target nucleic acid is altered by the engineered CRISPR-Cas12i system. In some embodiments, expression of the target nucleic acid is altered by the engineered CRISPR-Cas12i system. In some embodiments, the target nucleic acid is genomic DNA. In some embodiments, the target sequence is associated with a disease or condition. In some embodiments, the engineered CRISPR-Cas12i system comprises a precursor guide RNA array encoding a plurality of crRNAs, wherein each crRNA comprises a different guide sequence.
In some embodiments, the present application provides methods of treating a disease or condition associated with a target nucleic acid in a cell of a subject, comprising modifying the target nucleic acid in the cell of the subject using any of the methods described herein, thereby treating the disease or condition. In some embodiments, the disease or condition is selected from the group consisting of: cancer, cardiovascular disease, genetic disease, autoimmune disease, metabolic disease, neurodegenerative disease, eye disease, bacterial infection and viral infection.
The engineered CRISPR-Cas12i systems described herein can modify a target nucleic acid in a cell in a variety of ways, depending on the type of engineered Cas12i effector protein in the CRISPR-Cas12i system. In some embodiments, the method induces site-specific cleavage in the target nucleic acid. In some embodiments, the method cleaves genomic DNA in a cell, such as a bacterial cell, a plant cell, or an animal cell (e.g., a mammalian cell). In some embodiments, the method kills the cell by cleaving genomic DNA in the cell. In some embodiments the method cleaves viral nucleic acid in a cell.
In some embodiments, the method alters (e.g., increases or decreases) the expression level of the target nucleic acid in the cell. In some embodiments, the methods use an engineered Cas12i effector protein to increase the expression level of the target nucleic acid in the cell, e.g., based on an enzymatically inactive Cas12i protein fused to a transactivation domain. In some embodiments, the methods use an engineered Cas12i effector protein to reduce the expression level of the target nucleic acid in a cell, e.g., based on an enzymatically inactive Cas12i protein fused to a transcriptional repression domain. In some embodiments, the methods introduce epigenetic modifications into the target nucleic acid in the cell using an engineered Cas12i effector protein, e.g., based on an enzymatically inactive Cas12i protein fused to an epigenetic modification domain. The engineered Cas12i systems described herein can be used to introduce other modifications to the target nucleic acid, depending on the functional domains comprised by the engineered Cas12i effector protein.
In some embodiments, the method alters a target sequence in the target nucleic acid in the cell. In some embodiments, the method introduces a mutation into the target nucleic acid in the cell. In some embodiments, the methods use one or more endogenous DNA repair pathways, such as non-homologous end joining (NHEJ) or homology-directed repair (HDR), to repair double-strand breaks induced in target DNA in a cell as a result of sequence-specific cleavage by a CRISPR complex. Exemplary mutations include, but are not limited to: insertions, deletions, substitutions and frameshifts. In some embodiments, the method inserts donor DNA at the target site. In some embodiments, insertion of the donor DNA results in introduction of a selectable marker or reporter protein into the cell. In some embodiments, insertion of the donor DNA results in knock-in of the gene. In some embodiments, insertion of the donor DNA results in a knockout mutation. In some embodiments, insertion of the donor DNA results in a substitution mutation such as a single nucleotide substitution. In some embodiments, the method induces a phenotypic change in the cell.
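The mutation classes listed above (insertions, deletions, substitutions and frameshifts) can be distinguished, to a first approximation, by comparing the length of an edited allele with its reference. The sketch below is a deliberately simplified illustration: it assumes both sequences cover the same in-frame coding window, it classifies by net length change only without performing an alignment, and the function name and example sequences are hypothetical.

```python
from typing import Dict, Union


def classify_edit(reference: str, edited: str) -> Dict[str, Union[str, int, bool]]:
    """Classify an edited allele relative to its reference by net length change."""
    delta = len(edited) - len(reference)
    if delta > 0:
        kind = "insertion"
    elif delta < 0:
        kind = "deletion"
    elif edited != reference:
        kind = "substitution"
    else:
        kind = "unmodified"
    # A net length change that is not a multiple of 3 shifts the reading frame.
    return {"kind": kind, "length_change": delta, "frameshift": delta % 3 != 0}


# Hypothetical 12-nt coding window and three edited alleles.
REF = "ATGGCCAAGTTC"
one_bp_deletion = "ATGGCAAGTTC"        # one C removed
three_bp_insertion = "ATGGCCGGAAAGTTC"  # GGA inserted
single_substitution = "ATGGCCAAATTC"    # G changed to A
```

A real analysis of sequencing reads would align each read to the reference before calling the edit, since a combined insertion and deletion can leave the length unchanged.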
In some embodiments, the engineered CRISPR-Cas12i system is used as part of a genetic circuit or for inserting a genetic circuit into genomic DNA of a cell. The inducer-controlled engineered split Cas12i effector proteins described herein are particularly useful as components of genetic circuits. Genetic circuits can be used for gene therapy. Methods and techniques for designing and using genetic circuits are known in the art. Reference may further be made to, for example, Brophy, Jennifer A. N., and Christopher A. Voigt, "Principles of genetic circuit design," Nature Methods 11.5 (2014): 508.
The engineered CRISPR-Cas12i systems described herein can be used to modify a variety of target nucleic acids. In some embodiments, the target nucleic acid is in a cell. In some embodiments, the target nucleic acid is genomic DNA. In some embodiments, the target nucleic acid is extrachromosomal DNA. In some embodiments, the target nucleic acid is exogenous to the cell. In some embodiments, the target nucleic acid is a viral nucleic acid such as viral DNA. In some embodiments, the target nucleic acid is a plasmid in a cell. In some embodiments, the target nucleic acid is a horizontally transferred plasmid. In some embodiments, the target nucleic acid is RNA.
In some embodiments, the target nucleic acid is an isolated nucleic acid such as an isolated DNA. In some embodiments, the target nucleic acid is present in a cell-free environment. In some embodiments, the target nucleic acid is an isolated vector such as a plasmid. In some embodiments, the target nucleic acid is an isolated linear DNA fragment.
The methods described herein are applicable to any suitable cell type. In some embodiments, the cell is a bacterium, a yeast cell, a fungal cell, an algal cell, a plant cell, or an animal cell (e.g., a mammalian cell, such as a human cell). In some embodiments, the cells are of natural origin, such as cells isolated from a tissue biopsy. In some embodiments, the cell is a cell isolated from a cell line cultured in vitro. In some embodiments, the cell is from a primary cell line. In some embodiments, the cell is from an immortalized cell line. In some embodiments, the cell is a genetically engineered cell.
In some embodiments, the cell is an animal cell of an organism selected from the group consisting of: cattle, sheep, goats, horses, pigs, deer, chickens, ducks, geese, rabbits and fish.
In some embodiments, the cell is a plant cell of an organism selected from the group consisting of: corn, wheat, barley, oats, rice, soybean, oil palm, safflower, sesame, tobacco, flax, cotton, sunflower, pearl millet, sorghum, oilseed rape, hemp, vegetable crops, forage crops, industrial crops, woody crops, and biomass crops.
In some embodiments, the cell is a mammalian cell. In some embodiments, the cell is a human cell. In some embodiments, the human cell is a human embryonic kidney 293T (HEK 293T or 293T) cell or a HeLa cell. In some embodiments, the cell is a human embryonic kidney (HEK 293T) cell. In some embodiments, the cell is a mouse Hepa1-6 cell. In some embodiments, the mammalian cell is selected from the group consisting of: immune cells, liver cells, tumor cells, stem cells, zygotes, muscle cells, and skin cells.
In some embodiments, the cell is an immune cell selected from the group consisting of: cytotoxic T cells, helper T cells, natural killer (NK) T cells, iNK-T cells, NK-T like cells, γδ T cells, tumor infiltrating T cells, and dendritic cell (DC)-activated T cells. In some embodiments, the methods produce modified immune cells, such as CAR-T cells or TCR-T cells.
In some embodiments, the cell is an Embryonic Stem (ES) cell, an Induced Pluripotent Stem (iPS) cell, a gamete progenitor, a gamete, a zygote, or a cell in an embryo.
The methods described herein can be used to modify a target cell in vivo, ex vivo, or in vitro, and can be performed in a manner that alters the cell such that, once modified, the progeny or cell line of the modified cell retains the altered phenotype. The modified cells and progeny may be part of a multicellular organism, such as a plant or animal having ex vivo or in vivo applications (e.g., genome editing and gene therapy).
In some embodiments, the method is performed ex vivo. In some embodiments, the modified cell (e.g., mammalian cell) is propagated ex vivo following introduction of the engineered CRISPR-Cas12i system into the cell. In some embodiments, the modified cells are cultured to propagate for at least any one of about 1 day, 2 days, 3 days, 4 days, 5 days, 6 days, 7 days, 10 days, 12 days, or 14 days. In some embodiments, the modified cell is cultured for no more than about any one of 1 day, 2 days, 3 days, 4 days, 5 days, 6 days, 7 days, 10 days, 12 days, or 14 days. In some embodiments, the modified cells are further evaluated or screened to select cells having one or more desired phenotypes or characteristics.
In some embodiments, the target sequence is a sequence associated with a disease or condition. Exemplary diseases or conditions include, but are not limited to: cancer, cardiovascular disease, genetic disease, autoimmune disease, metabolic disease, neurodegenerative disease, ocular disease, bacterial infection, and viral infection. In some embodiments, the disease or condition is a genetic disease. In some embodiments, the disease or condition is a monogenic disease or condition. In some embodiments, the disease or condition is a polygenic disease or condition.
In some embodiments, the target sequence has a mutation compared to the wild-type sequence. In some embodiments, the target sequence has a Single Nucleotide Polymorphism (SNP) associated with a disease or condition.
In some embodiments, the donor DNA inserted into the target nucleic acid encodes a biological product selected from the group consisting of: reporter proteins, antigen-specific receptors, therapeutic proteins, antibiotic resistance proteins, RNAi molecules, cytokines, kinases, antigens, cytokine receptors, and suicide polypeptides. In some embodiments, the donor DNA encodes a therapeutic protein. In some embodiments, the donor DNA encodes a therapeutic protein useful for gene therapy. In some embodiments, the donor DNA encodes a therapeutic antibody. In some embodiments, the donor DNA encodes an engineered receptor, such as a chimeric antigen receptor (CAR) or an engineered TCR. In some embodiments, the donor DNA encodes a therapeutic RNA, such as a small RNA (e.g., siRNA, shRNA, or miRNA) or a long non-coding RNA (lincRNA).
The methods described herein can be used for multiplex gene editing or modulation at two or more (e.g., 2, 3, 4, 5, 6, 8, 10 or more) different target sites. In some embodiments, the method detects or modifies a plurality of target nucleic acids or target nucleic acid sequences. In some embodiments, the method comprises contacting the target nucleic acid with a guide RNA comprising a plurality (e.g., 2, 3, 4, 5, 6, 8, 10, or more) of crRNA sequences, wherein each crRNA comprises a different target sequence.
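By way of illustration only, the following Python sketch shows one way a guide RNA comprising a plurality of crRNA sequences could be laid out in silico. It assumes that each crRNA unit is a direct repeat followed by a spacer; that ordering, the function name `build_crrna_array`, and the toy repeat and spacer strings are assumptions made for the example and are not taken from the present application.

```python
def build_crrna_array(direct_repeat: str, spacers: list[str]) -> str:
    """Concatenate repeat-spacer units into a single array sequence.

    Assumption: each unit is laid out as <direct repeat><spacer>.
    """
    if len(spacers) < 2:
        raise ValueError("multiplexing needs at least two spacers")
    if len(set(spacers)) != len(spacers):
        raise ValueError("each crRNA must carry a different target sequence")
    return "".join(direct_repeat + spacer for spacer in spacers)


# Placeholder sequences, purely hypothetical (not real Cas12i2 repeats or targets).
TOY_REPEAT = "ACGTACGTAC"
TOY_SPACERS = ["AAAACCCC", "GGGGTTTT", "ACACACAC"]

array = build_crrna_array(TOY_REPEAT, TOY_SPACERS)
print(len(TOY_SPACERS), "crRNA units, array length", len(array))
```

The uniqueness check mirrors the requirement that each crRNA comprises a different target sequence.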
Also provided are engineered cells comprising a modified target nucleic acid, the cells produced using any of the methods described herein. The engineered cells may be used in cell therapy. Autologous or allogeneic cells may be used to prepare the engineered cells using the methods described herein for cell therapy.
The methods described herein can also be used to generate an isogenic line of cells (e.g., mammalian cells) to study genetic variants.
Also provided are engineered non-human animals comprising the engineered cells described herein. In some embodiments, the engineered non-human animal is a genome edited non-human animal. The engineered non-human animals can be used as disease models.
Techniques for producing genome-edited or transgenic non-human animals are well known in the art and include, but are not limited to: pronuclear microinjection, viral infection, and transformation of embryonic stem cells and induced pluripotent stem (iPS) cells. Detailed methods that may be used include, but are not limited to, those described by Sundberg and Ichiki (2006, Genetically Engineered Mice Handbook, CRC Press) and by Gibson (2004, A Primer of Genome Science, 2nd ed., Sunderland, Mass.: Sinauer).
The engineered animal may be of any suitable species, including but not limited to: cattle, horses, sheep, dogs, deer, felines, goats, pigs, primates, and less well understood mammals such as elephants, deer, zebras or camels.
Method of treatment
In some embodiments, there is provided a use of the aforementioned engineered CRISPR-Cas12i system in the manufacture of a medicament for treating a disease or disorder associated with a target nucleic acid in a cell of an individual.
Further provided are methods of treatment using any of the methods of modifying a target nucleic acid in a cell according to the description herein. In some embodiments, the present application provides methods of treating a disease or condition associated with a target nucleic acid in a cell of an individual comprising contacting the target nucleic acid with any of the engineered CRISPR-Cas12i systems described herein, wherein the guide sequence of the guide RNA is complementary to the target sequence of the target nucleic acid, wherein the engineered Cas12i effector protein and the guide RNA associate with each other to bind to the target nucleic acid to modify the target nucleic acid, thereby treating the disease or condition. In some embodiments, a mutation (e.g., a knockout or knock-in mutation) is introduced into the target nucleic acid. In some embodiments, the expression of the target nucleic acid is enhanced. In some embodiments, expression of the target nucleic acid is inhibited. In some embodiments, the application provides a method of treating a disease or condition in an individual comprising administering to the individual an effective amount of any one of the engineered CRISPR-Cas12i systems described herein and a donor DNA encoding a therapeutic agent, wherein the guide sequence of the guide RNA is complementary to a target sequence of a target nucleic acid of the individual, wherein the engineered Cas12i effector protein and the guide RNA bind to each other to bind to the target nucleic acid and insert donor DNA into the target sequence, thereby causing the disease or condition to be treated.
In some embodiments, the present application provides methods of treating a disease or condition in a subject comprising administering to the subject an effective amount of an engineered cell comprising a modified target nucleic acid, wherein the engineered cell is prepared by contacting the cell with any one of the engineered CRISPR-Cas12i systems described herein, wherein the guide sequence of the guide RNA is complementary to the target sequence of the target nucleic acid, wherein the engineered Cas12i effector protein and the guide RNA associate with each other to bind to the target nucleic acid to modify the target nucleic acid. In some embodiments, the engineered cell is an immune cell. In some embodiments, the individual is a human. In some embodiments, the individual is an animal, e.g., a model animal such as a rodent, pet, or farm animal. In some embodiments, the individual is a mammal.
In some embodiments, the disease or condition is selected from the group consisting of: cancer, cardiovascular disease, genetic disease, autoimmune disease, metabolic disease, neurodegenerative disease, ocular disease, bacterial infection, and viral infection. In some embodiments, the target nucleic acid is PCSK9. In some embodiments, the disease or condition is a cardiovascular disease. In some embodiments, the disease or condition is coronary artery disease. In some embodiments, the method reduces cholesterol levels in the subject. In some embodiments, the method treats diabetes in the individual.
Delivery method
In some embodiments, the engineered CRISPR-Cas12i systems described herein or components thereof, nucleic acid molecules thereof, or nucleic acid molecules encoding or providing components thereof can be delivered to a host cell by a variety of delivery systems such as plasmids or viruses (e.g., any of the vectors described in the "constructs and vectors" section above). In some embodiments or methods, the engineered CRISPR-Cas12i system can be delivered by other methods, such as nucleofection or electroporation of ribonucleoprotein complexes consisting of the engineered Cas12i effector protein and its one or more cognate guide RNAs.
In some embodiments, the delivery is by nanoparticles or exosomes.
In some embodiments, the paired Cas12i nickase complexes can be delivered directly using nanoparticles or other direct protein delivery methods, such that the complex comprising the two paired crRNA elements is co-delivered. Furthermore, the protein can be delivered to the cell either by a viral vector or directly, followed by the direct delivery of a CRISPR array comprising two paired spacers for a double nick. In certain instances, for direct RNA delivery, the RNA can be conjugated to at least one sugar moiety such as N-acetylgalactosamine (GalNAc), particularly triantennary GalNAc.
Kits and articles of manufacture
Also provided are compositions, kits, unit medicaments, and articles of manufacture comprising one or more components of any of the engineered Cas12i nucleases, engineered Cas12i effector proteins, or engineered CRISPR-Cas12i systems described herein.
In some embodiments, a kit is provided comprising: one or more AAV vectors encoding any of the engineered Cas12i nucleases, engineered Cas12i effector proteins, or engineered CRISPR-Cas12i systems described herein. In some embodiments, the kit further comprises one or more guide RNAs. In some embodiments, the kit further comprises donor DNA. In some embodiments, the kit further comprises a cell, such as a human cell.
The kit may comprise one or more additional components, such as containers, reagents, media, cytokines, buffers, antibodies, and the like, to allow propagation of the engineered cells. The kit may further comprise a device for administering the composition.
The kit can further comprise instructions for using the engineered CRISPR-Cas12i systems described herein, such as methods of detecting or modifying a target nucleic acid. In some embodiments, the kit comprises instructions for treating or diagnosing a disease or condition. Instructions regarding the use of the kit components typically include information regarding the amount, schedule and route of administration for the intended treatment. The container may be a unit dose, a bulk package (e.g., a multi-dose package), or a sub-unit dose. For example, a kit comprising a sufficient dose of a composition disclosed herein can be provided to provide effective treatment of an individual over an extended period of time. The kit may also include a plurality of unit doses of the composition and instructions for use, packaged in quantities sufficient for storage and use in pharmacies (e.g., hospital pharmacies and compounding pharmacies).
The kit of the invention is in a suitable package. Suitable packaging includes, but is not limited to: vials, bottles, jars, flexible packaging (e.g., sealed mylar or plastic bags), and the like. The kit may optionally provide additional components, such as buffers and explanatory information. Thus, the present application also provides an article of manufacture comprising a vial (e.g., a sealed vial), a bottle, a jar, a flexible package, and the like.
The article may comprise a container and a label or package insert on or adhered to the container. Suitable containers include, for example, bottles, vials, syringes, and the like. The container may be formed from a variety of materials, such as glass or plastic. Typically, the container contains a composition effective to treat the disease or condition described herein, and may have a sterile access port (e.g., the container may be an intravenous solution bag or a vial having a stopper pierceable by a hypodermic injection needle). The label or package insert indicates that the composition is used to treat a particular condition in an individual. The label or package insert will further comprise instructions for administering the composition to an individual.
A package insert refers to instructions typically included in commercial packages of therapeutic products that contain information regarding indications, usage, dosage, administration, contraindications, and/or warnings for use of such therapeutic products.
In addition, the article of manufacture may further comprise a second container comprising a pharmaceutically acceptable buffer, such as bacteriostatic water for injection (BWFI), phosphate buffered saline, Ringer's solution, or dextrose solution. From a commercial and user perspective, it may also include other materials, including other buffers, diluents, filters, needles, and syringes.
Examples section
Specific embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While specific embodiments of the invention are shown in the drawings, it should be understood that the invention may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
Example 1: amino acids interacting with PAM in the reference Cas12i2 enzyme were replaced with positively charged amino acids, and their gene editing efficiency was verified.
Plasmid construction
The coding sequence of Cas12i2 was codon-optimized (human) and synthesized. Variants of the Cas protein were generated by PCR-based site-directed mutagenesis. Specifically, the DNA sequence encoding the Cas12i2 protein was divided into two parts centered on the mutation site, two pairs of primers were designed to amplify the two parts separately, the sequence to be mutated was introduced into the primers, and the two fragments were finally assembled into the pCAG-2A-eGFP vector by Gibson cloning. Combinations of mutations were constructed by splitting the DNA encoding the Cas12i2 protein into multiple fragments by PCR and assembling them by Gibson cloning. The mutation positions were determined by analyzing the structural information of Cas12i2 using protein structure visualization software commonly used in the art (e.g., PyMOL or Chimera). Structural information for Cas12i2 was taken from PDB IDs 6LTU, 6LTR, 6LU0 and 6LTP. The Cas12i2 effector protein was expressed in human 293T cells from the pCAG-2A-eGFP vector. DNA encoding the Cas12i2 protein was inserted between the XmaI and NheI sites. Vectors for expressing Cas12i2 crRNA in 293T cells were constructed by ligating annealed oligonucleotides containing the target sequences into the BsaI-digested pUC19-U6-i2-crRNA backbone.
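By way of illustration only, the two-fragment strategy described above can be sketched in Python as follows. This is a simplified in silico model, not the actual primer design used in the present application: the function name `split_for_mutagenesis`, the overlap length, the toy coding sequence and the choice of AGA as the arginine codon are all assumptions introduced for the example.

```python
def split_for_mutagenesis(cds: str, aa_pos: int, new_codon: str, overlap: int = 20):
    """Return two fragments of a mutated coding sequence that overlap at the mutation site.

    cds       : coding DNA sequence, length divisible by 3
    aa_pos    : 1-based amino acid position to mutate
    new_codon : replacement codon (would be carried by the mutagenic primers)
    overlap   : approximate number of flanking nucleotides shared by both fragments
    """
    if len(cds) % 3 != 0 or len(new_codon) != 3:
        raise ValueError("cds must be in frame and new_codon must be 3 nt")
    start = 3 * (aa_pos - 1)
    if not 0 <= start <= len(cds) - 3:
        raise ValueError("aa_pos is outside the coding sequence")
    mutated = cds[:start] + new_codon + cds[start + 3:]
    half = overlap // 2
    left_end = min(len(mutated), start + 3 + half)   # left fragment runs past the codon
    right_start = max(0, start - half)               # right fragment starts before it
    return mutated[:left_end], mutated[right_start:]


# Toy 9-codon sequence (hypothetical); codon 3 (GAA, Glu) is changed to AGA (Arg).
toy_cds = "ATGGCTGAAGCTGCTGCTGCTGCTTAA"
left, right = split_for_mutagenesis(toy_cds, aa_pos=3, new_codon="AGA", overlap=6)
print(left)
print(right)
```

Both fragments contain the mutated codon, so the shared region can serve as the homology arm joining them during assembly.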
Cell culture, transfection and Fluorescence Activated Cell Sorting (FACS)
HEK293T cells were cultured in DMEM (Gibco) containing 1% penicillin-streptomycin (Gibco) and 10% fetal bovine serum (Gibco). Cells were seeded in 24-well culture plates (Corning) and grown for 16 hours until they reached 70% confluence. 600 ng of plasmid encoding the Cas12i2 protein and 3000 ng of plasmid encoding the crRNA were transfected into each well using Lipofectamine 3000 (Invitrogen). 68 h after transfection, the HEK293T cells were digested with trypsin-EDTA (0.05%) (Gibco) for fluorescence-activated cell sorting (FACS). Cells were sorted on the GFP signal using a MoFlo XDP (Beckman Coulter).
Targeted deep sequencing analysis for genome modification
FACS-sorted GFP-positive 293FT cells were lysed with buffer L and incubated at 55 °C for 3 hours and then at 95 °C for 10 minutes. dsDNA fragments containing the target sites at the different genomic loci were PCR-amplified using the corresponding primers. For targeted deep sequencing, the target sites were amplified by barcoded PCR using the cell lysate directly as template. PCR products were purified and pooled into several libraries for high-throughput sequencing. The indel frequency (%) was analyzed with CRISPResso2 software by calculating the proportion of reads containing insertions or deletions. Throughout the present application, indel frequency (%) is used as the uniform metric for comparing gene editing efficiency. Reads accounting for less than 0.05% of the total reads were discarded.
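By way of illustration only, the indel frequency calculation can be sketched in Python as follows. This is not the CRISPResso2 implementation. It assumes that the 0.05% threshold is applied per distinct allele relative to the total read count and that the frequency is then computed over the reads that remain; the function name `indel_frequency` and the read counts are hypothetical.

```python
def indel_frequency(allele_counts, min_fraction=0.0005):
    """Percentage of retained reads that carry an insertion or deletion.

    allele_counts : list of (read_count, has_indel) tuples, one per distinct allele
    min_fraction  : alleles below this fraction of all reads are discarded
                    (0.0005 corresponds to 0.05%)
    """
    total = sum(count for count, _ in allele_counts)
    if total == 0:
        raise ValueError("no reads supplied")
    kept = [(c, indel) for c, indel in allele_counts if c / total >= min_fraction]
    kept_total = sum(c for c, _ in kept)
    if kept_total == 0:
        raise ValueError("all alleles fell below the threshold")
    edited = sum(c for c, indel in kept if indel)
    return 100.0 * edited / kept_total


# Hypothetical read counts for one amplicon (not data from the present application).
example = [(9000, False), (990, True), (6, False), (4, True)]
print(round(indel_frequency(example), 2))
```

In this toy input the allele with 4 reads (0.04% of 10,000) is discarded, while the allele with 6 reads (0.06%) is retained.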
Example 1-A: Selection of four engineered Cas12i2 variants with single amino acid substitutions
Engineered Cas12i2 enzymes with a single mutation in the amino acid sequence were each expressed according to the method described in Example 1; the preferred amino acid substitutions and their corresponding gene editing efficiencies are shown in FIG. 1 and Table 1. As shown in FIG. 1, we first selected the 10 amino acids of Cas12i2 lying within the distance from the PAM DNA given in Figure SMS_1 (E176, E178, Y226, A227, N229, E237, K238, K264, T447 and E563) and tested point mutations to arginine (R). The gene editing efficiency of these mutants was compared with that of wild-type Cas12i2 at 2 genomic sites in 293T cells, CCR5-3 and RNF2-7. We found that mutants with the following amino acid substitutions effectively improved gene editing efficiency: E176R, K238R, T447R and E563R. In FIG. 1, all four substitutions achieved indel rates higher than about 10% (the gene editing efficiency of the reference enzyme at CCR5-3) and higher than about 12% (the gene editing efficiency of the reference enzyme at RNF2-7), clearly outperforming the other single amino acid substitutions. The remaining 6 mutants did not improve gene editing efficiency, and some even had a severe negative effect.
Example 1-B: Comparison of engineered Cas12i2 variants carrying multiple preferred amino acid substitutions simultaneously
Engineered Cas12i2 enzymes whose amino acid sequences carried two or more of the preferred amino acid substitutions simultaneously were each expressed according to the method described in Example 1; the combinations and their gene editing efficiencies are compared in Table 1 and FIG. 2. As shown in FIG. 2, we combined the point mutations of the 4 efficiency-improving mutants screened in Example 1-A: E176R, K238R, T447R and E563R. The gene editing efficiency of these mutants was compared with that of wild-type Cas12i2 at 3 genomic sites in 293T cells (CCR5-3, CCR5-5 and RNF2-7), and we found that combining point mutations yielded mutants with even higher efficiency. In particular, combining all 4 mutations (E176R + K238R + T447R + E563R) gave the most efficient combination.
Summary of the experimental results of example 1
Figure SMS_2
TABLE 1 summary of the results (Gene editing efficiency) of example 1
Example 2: replacing amino acids involved in opening DNA double chains in reference Cas12i2 enzyme with amino acids with aromatic rings, and respectively verifying the gene editing efficiency of the amino acids
Plasmid construction
Variants of the Cas12i2 protein were generated by PCR-based site-directed mutagenesis. Specifically, the DNA sequence encoding the Cas12i2 protein was divided into two parts centered on the mutation site, two pairs of primers were designed to amplify the two parts separately, the sequence to be mutated was introduced into the primers, and the two fragments were finally assembled into the pCAG-2A-eGFP vector by Gibson cloning. The amino acid substitution positions were determined by analyzing the structural information of Cas12i2 using common protein structure visualization software (e.g., PyMOL or Chimera). Structural information for Cas12i2 was taken from PDB IDs 6LTU, 6LTR, 6LU0 and 6LTP. The Cas12i2 effector protein was expressed in human 293T cells from the pCAG-2A-eGFP vector. DNA encoding the Cas12i2 protein was inserted between the XmaI and NheI sites. Vectors for expressing Cas12i2 crRNA in 293T cells were constructed by ligating annealed oligonucleotides containing the target sequence into the BsaI-digested pUC19-U6-i2-crRNA backbone.
Cell culture, transfection and Fluorescence Activated Cell Sorting (FACS)
HEK293T cells were cultured in DMEM (Gibco) containing 1% penicillin-streptomycin (Gibco) and 10% fetal bovine serum (Gibco). Cells were seeded in 24-well culture plates (Corning) and grown for 16 hours until they reached 70% confluence. 600 ng of plasmid encoding the Cas12i2 protein and 3000 ng of plasmid encoding the crRNA were transfected into each well using Lipofectamine 3000 (Invitrogen). 68 h after transfection, the HEK293T cells were digested with trypsin-EDTA (0.05%) (Gibco) for fluorescence-activated cell sorting (FACS). Cells were sorted on the GFP signal using a MoFlo XDP (Beckman Coulter).
Targeted deep sequencing analysis for genome modification
FACS-sorted GFP-positive 293FT cells were lysed with buffer L and incubated at 55 °C for 3 hours and then at 95 °C for 10 minutes. dsDNA fragments containing the target sites at the different genomic loci were PCR-amplified using the corresponding primers. For targeted deep sequencing, the target sites were amplified by barcoded PCR using the cell lysate directly as template. PCR products were purified and pooled into several libraries for high-throughput sequencing. The indel frequency (%) was analyzed with CRISPResso2 software by calculating the proportion of reads containing insertions or deletions. Reads accounting for less than 0.05% of the total reads were discarded.
As shown in FIG. 2, we first selected the amino acids of the Cas12i2 enzyme that are involved in opening the DNA double strand, Q163 and N164, and tested point mutations to amino acids with an aromatic ring (Y, F, W). The gene editing efficiency of these mutants was compared with that of wild-type Cas12i2 at 3 genomic sites in 293T cells (CCR5-3, CCR5-5 and RNF2-7). We found that 5 mutants, Q163W, Q163Y, Q163F, N164Y and N164F, effectively increased gene editing efficiency at one or more genomic sites. Both N164Y and N164F showed excellent gene editing efficiency at all 3 genomic sites, whereas N164W did not improve gene editing efficiency compared with the reference enzyme.
Summary of the experimental results of example 2
Figure SMS_3
Table 2 summary of the results (gene editing efficiency) of example 2
Example 3: replacing amino acids in the reference Cas12i2 enzyme, which are located in the RuvC domain and interact with a single-stranded DNA substrate, with positively charged amino acids, and verifying the efficiency of gene editing
Plasmid construction
Variants of the Cas12i2 protein were generated by PCR-based site-directed mutagenesis. Specifically, the DNA sequence encoding the Cas12i2 protein was divided into two parts centered on the mutation site, two pairs of primers were designed to amplify the two parts separately, the sequence to be mutated was introduced into the primers, and the two fragments were finally assembled into the pCAG-2A-eGFP vector by Gibson cloning. Combinations of mutations were constructed by splitting the DNA encoding the Cas12i2 protein into multiple fragments by PCR and assembling them by Gibson cloning. The mutation positions were determined by analyzing the structural information of Cas12i2 using commonly used protein structure visualization software (e.g., PyMOL or Chimera). Structural information for Cas12i2 was taken from PDB IDs 6LTU, 6LTR, 6LU0 and 6LTP. The ssDNA substrate shown in these Cas12i2 structures is only 5 nt long. To obtain information on the interaction of a longer ssDNA with Cas12i2, we aligned the structure of Cas12i1 (PDB IDs 6W5C, 6W62 and 6W64; Zhang H. et al., Nature Structural & Molecular Biology 27, 1069-1076 (2020)) with the structure of Cas12i2 so as to place the ssDNA substrate (9 nt) from the Cas12i1 structure into the RuvC catalytic pocket of Cas12i2, and then used this model to search for amino acids within 9 Å of the substrate. The Cas12i2 effector protein was expressed in human 293T cells from the pCAG-2A-eGFP vector. DNA encoding the Cas12i2 protein was inserted between the XmaI and NheI sites. Vectors for expressing Cas12i2 crRNA in 293T cells were constructed by ligating annealed oligonucleotides containing the target sequence into the BsaI-digested pUC19-U6-i2-crRNA backbone.
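By way of illustration only, the distance-based search for amino acids near the modeled ssDNA substrate can be sketched in Python as follows. The sketch assumes that atomic coordinates have already been extracted from the aligned model; the function name `residues_within`, the residue labels and the coordinates are hypothetical and do not come from the PDB entries cited above.

```python
import math


def residues_within(protein_atoms, ligand_atoms, cutoff=9.0):
    """Return labels of residues having at least one atom within `cutoff` of the ligand.

    protein_atoms : list of (residue_label, (x, y, z)) tuples
    ligand_atoms  : list of (x, y, z) tuples, e.g. atoms of the modeled ssDNA
    cutoff        : distance threshold in the coordinate units (angstroms here)
    """
    hits = set()
    for label, xyz in protein_atoms:
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_atoms):
            hits.add(label)
    return sorted(hits)


# Toy coordinates, purely hypothetical.
toy_protein = [
    ("RES1", (0.0, 0.0, 0.0)),   # 5.0 from the ligand atom
    ("RES2", (20.0, 0.0, 0.0)),  # about 20.6 away
    ("RES3", (0.0, 7.0, 0.0)),   # about 8.6 away
]
toy_ssdna = [(0.0, 0.0, 5.0)]

print(residues_within(toy_protein, toy_ssdna))
```

A residue is reported once even if several of its atoms fall inside the cutoff, which matches selecting candidate positions rather than individual atoms.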
Cell culture, transfection and Fluorescence Activated Cell Sorting (FACS)
HEK293T cells were cultured in DMEM (Gibco) containing 1% penicillin-streptomycin (Gibco) and 10% fetal bovine serum (Gibco). Cells were seeded in 24-well culture plates (Corning) and grown for 16 hours until they reached 70% confluence. 600 ng of plasmid encoding the Cas12i2 protein and 3000 ng of plasmid encoding the crRNA were transfected into each well using Lipofectamine 3000 (Invitrogen). 68 h after transfection, the HEK293T cells were digested with trypsin-EDTA (0.05%) (Gibco) for fluorescence-activated cell sorting (FACS). Cells were sorted on the GFP signal using a MoFlo XDP (Beckman Coulter).
Targeted deep sequencing analysis for genome modification
FACS-sorted GFP-positive 293FT cells were lysed with buffer L and incubated at 55 °C for 3 hours and then at 95 °C for 10 minutes. dsDNA fragments containing the target sites at the different genomic loci were PCR-amplified using the corresponding primers. For targeted deep sequencing, the target sites were amplified by barcoded PCR using the cell lysate directly as template. PCR products were purified and pooled into several libraries for high-throughput sequencing. The indel frequency (%) was analyzed with CRISPResso2 software by calculating the proportion of reads containing insertions or deletions. Reads accounting for less than 0.05% of the total reads were discarded.
In FIG. 3a, we replaced amino acids in the reference Cas12i2 enzyme that are located in the RuvC domain and interact with the single-stranded DNA substrate with positively charged amino acids. The gene editing efficiency of these mutants was compared with that of wild-type Cas12i2 at 2 genomic sites in 293T cells, CCR5-3 and RNF2-7. We found that 3 mutants, N391R, I926R and G929R, effectively improved gene editing efficiency at one or more genomic sites. Among them, I926R showed excellent gene editing efficiency at both genomic sites.
In FIGS. 3b and 3c, we replaced further amino acids in the reference Cas12i2 enzyme that are located in the RuvC domain and interact with the single-stranded DNA substrate with positively charged amino acids. By comparing the gene editing efficiency of these mutants with that of wild-type Cas12i2 at 2 genomic sites in 293T cells, CCR5-3 and RNF2-7, we found that many mutants effectively improved gene editing efficiency at one or more genomic sites. The single amino acid substitutions that improved gene editing efficiency rank as follows: D362R > E323R > Q425R > N925R > other mutants with improved efficiency.
In FIG. 3d, we combined the point mutations of the 4 efficiency-improving mutants screened in FIGS. 3a, 3b and 3c: E323R, D362R, Q425R and I926R. The gene editing efficiency of these mutants was compared with that of wild-type Cas12i2 at 2 genomic sites in 293T cells, CCR5-3 and RNF2-7, and we found that combining point mutations yielded mutants with even higher efficiency.
In FIG. 3e, we combined some of the efficiency-improving point mutations screened in FIGS. 3a, 3b and 3c with I926G and 439GG. The gene editing efficiency of these mutants was compared with that of wild-type Cas12i2 at 2 genomic sites in 293T cells, CCR5-3 and RNF2-7, and we found that combining point mutations yielded mutants with even higher efficiency.
Summary of the experimental results of example 3
Figure SMS_4
Figure SMS_5
Figure SMS_6
Table 3 summary of the results (gene editing efficiency) of example 3. *439GG means that two glycines are inserted after amino acid 439.
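By way of illustration only, the mutation notation used in the tables (substitutions such as E323R and insertions such as 439GG, joined by "+") can be applied to a protein sequence with the following Python sketch. The function name `apply_mutations` and the toy sequence are hypothetical; positions are 1-based and refer to the unmodified reference sequence.

```python
import re

SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")  # e.g. E323R
INSERTION = re.compile(r"(\d+)([A-Z]+)")           # e.g. 439GG: insert GG after residue 439


def apply_mutations(seq: str, spec: str) -> str:
    """Apply a '+'-joined list of substitutions and insertions to `seq`."""
    residues = list(seq)
    insertions = []
    for token in (t.strip() for t in spec.split("+")):
        sub = SUBSTITUTION.fullmatch(token)
        ins = INSERTION.fullmatch(token)
        if sub:
            wild_type, pos, new = sub.group(1), int(sub.group(2)), sub.group(3)
            if not 1 <= pos <= len(seq) or seq[pos - 1] != wild_type:
                raise ValueError(f"{token}: reference residue does not match")
            residues[pos - 1] = new
        elif ins:
            pos = int(ins.group(1))
            if not 1 <= pos <= len(seq):
                raise ValueError(f"{token}: position outside the sequence")
            insertions.append((pos, ins.group(2)))
        else:
            raise ValueError(f"unrecognized mutation token: {token!r}")
    # Insert from the highest position down so earlier indices stay valid.
    for pos, extra in sorted(insertions, reverse=True):
        residues[pos:pos] = list(extra)
    return "".join(residues)


# Toy 5-residue sequence (hypothetical, not Cas12i2).
print(apply_mutations("MEKTA", "E2R + 3GG"))  # MRKGGTA
```

Checking the reference residue before substituting guards against numbering mistakes when combining mutations from different screens.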
Example 4: amino acids interacting with DNA-RNA double helix in reference Cas12i2 enzyme are replaced by positively charged amino acids, and the gene editing efficiency is verified
Plasmid construction
Variants of the Cas12i2 protein were generated by PCR-based site-directed mutagenesis. Specifically, the DNA sequence encoding the Cas12i2 protein was divided into two parts centered on the mutation site, two pairs of primers were designed to amplify the two parts separately, the sequence to be mutated was introduced into the primers, and the two fragments were finally assembled into the pCAG-2A-eGFP vector by Gibson cloning. Combinations of mutations were constructed by splitting the DNA encoding the Cas12i2 protein into multiple fragments by PCR and assembling them by Gibson cloning. The mutation positions were determined by analyzing the structural information of Cas12i2 using commonly used protein structure visualization software (e.g., PyMOL or Chimera). Structural information for Cas12i2 was taken from PDB IDs 6LTU, 6LTR, 6LU0 and 6LTP. The Cas12i2 effector protein was expressed in human 293T cells from the pCAG-2A-eGFP vector. DNA encoding the Cas12i2 protein was inserted between the XmaI and NheI sites. Vectors for expressing Cas12i2 crRNA in 293T cells were constructed by ligating annealed oligonucleotides containing the target sequence into the BsaI-digested pUC19-U6-i2-crRNA backbone.
Cell culture, transfection and Fluorescence Activated Cell Sorting (FACS)
HEK293T cells were cultured in DMEM (Gibco) containing 1% penicillin-streptomycin (Gibco) and 10% fetal bovine serum (Gibco). Cells were seeded in 24-well culture plates (Corning) and grown for 16 hours until they reached 70% confluence. 600 ng of plasmid encoding the Cas12i2 protein and 3000 ng of plasmid encoding the crRNA were transfected into each well using Lipofectamine 3000 (Invitrogen). 68 h after transfection, the HEK293T cells were digested with trypsin-EDTA (0.05%) (Gibco) for fluorescence-activated cell sorting (FACS). Cells were sorted on the GFP signal using a MoFlo XDP (Beckman Coulter).
Targeted deep sequencing analysis for genome modification
FACS-sorted GFP-positive 293FT cells were lysed with buffer L and incubated at 55 °C for 3 hours and then at 95 °C for 10 minutes. dsDNA fragments containing the target sites at the different genomic loci were PCR-amplified using the corresponding primers. For targeted deep sequencing, the target sites were amplified by barcoded PCR using the cell lysate directly as template. PCR products were purified and pooled into several libraries for high-throughput sequencing. The indel frequency (%) was analyzed with CRISPResso2 software by calculating the proportion of reads containing insertions or deletions. Reads accounting for less than 0.05% of the total reads were discarded.
FIG. 4 and Table 4 summarize the comparison of the gene editing efficiency of the Cas12i2 mutants in this example with that of wild-type Cas12i2 at 2 genomic sites in 293T cells, CCR5-3 and RNF2-7. We found that 7 mutants, G116R, E R, T159R, S R, E319R, E R and D958R, effectively increased gene editing efficiency at one or more genomic sites. Among them, D958R showed excellent gene editing efficiency at both genomic sites.
Summary of experimental results of example 4
Figure SMS_7
Figure SMS_8
Figure SMS_9
Table 4 summary of the results (gene editing efficiency) of example 4.
Example 5: cas12i2 engineered amino acid mutations screened in examples 1-4, which partially improve gene editing efficiency, were combined and their gene editing efficiency was verified.
Plasmid construction
Combinations of mutations were constructed by splitting the DNA encoding the Cas12i2 protein into multiple fragments by PCR and assembling them by Gibson cloning. The mutation positions were determined by analyzing the structural information of Cas12i2 using common protein structure visualization software (e.g., PyMOL or Chimera). Structural information for Cas12i2 was taken from PDB IDs 6LTU, 6LTR, 6LU0 and 6LTP. The Cas12i2 effector protein was expressed in human 293T cells from the pCAG-2A-eGFP vector. DNA encoding the Cas12i2 protein was inserted between the XmaI and NheI sites. Vectors for expressing Cas12i2 crRNA in 293T cells were constructed by ligating annealed oligonucleotides containing the target sequence into the BsaI-digested pUC19-U6-i2-crRNA backbone.
Cell culture, transfection and Fluorescence Activated Cell Sorting (FACS)
HEK293T cells were cultured in DMEM (Gibco) containing 1% penicillin-streptomycin (Gibco) and 10% fetal bovine serum (Gibco). Cells were seeded in 24-well culture plates (Corning) and grown for 16 hours until they reached 70% confluence. 600 ng of plasmid encoding the Cas12i2 protein and 3000 ng of plasmid encoding the crRNA were transfected into each well using Lipofectamine 3000 (Invitrogen). 68 h after transfection, the HEK293T cells were digested with trypsin-EDTA (0.05%) (Gibco) for fluorescence-activated cell sorting (FACS). Cells were sorted on the GFP signal using a MoFlo XDP (Beckman Coulter).
Targeted deep sequencing analysis for genome modification
FACS-sorted GFP-positive 293FT cells were lysed with buffer L and incubated at 55 °C for 3 hours and then at 95 °C for 10 minutes. dsDNA fragments containing the target sites at the different genomic loci were PCR-amplified using the corresponding primers. For targeted deep sequencing, the target sites were amplified by barcoded PCR using the cell lysate directly as template. PCR products were purified and pooled into several libraries for high-throughput sequencing. The indel frequency (%) was analyzed with CRISPResso2 software by calculating the proportion of reads containing insertions or deletions. Reads accounting for less than 0.05% of the total reads were discarded.
In FIG. 5, we tested the following amino acid mutations and combinations of amino acid mutations screened in Examples 1, 2 and 3: E176R + K238R + T447R + E563R, N164Y, E323R + D362R, I926R, E323R + D362R + I926R, E323R + D362R + I926G, E323R + D362R + I926G + 439G, and E323R + D362R + I926G + 439GG. The gene editing efficiency of these mutants was compared with that of wild-type Cas12i2 at 5 genomic sites in 293T cells (CCR5-3, CCR5-5, CD34-8, CD34-9 and RNF2-14), and we found that combining point mutations yielded mutants with even higher efficiency. The mutant we considered the most efficient (E176R + K238R + T447R + E563R + N164Y + E323R + D362R) was named CasXX.
Results of example 5 (only the gene editing efficiency data at RNF2-14 are shown as an example)
Figure SMS_10
Figure SMS_11
Table 5. Summary of the results (gene editing efficiency) of Example 5
*439G means that a glycine is inserted after amino acid 439.
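The mutation notation used here (a substitution such as E176R, or an insertion such as 439G placed after the numbered residue) can be applied to a sequence programmatically. The sketch below is illustrative only: `apply_mutations` and the five-residue toy sequence are hypothetical, and real use would take the reference sequence of SEQ ID NO.1 as input.

```python
import re

SUB = re.compile(r"^([A-Z])(\d+)([A-Z])$")  # substitution, e.g. E176R
INS = re.compile(r"^(\d+)([A-Z]+)$")        # insertion after a residue, e.g. 439G or 439GG


def apply_mutations(reference: str, mutations: str) -> str:
    """Apply '+'-separated mutations numbered on the reference sequence (1-based)."""
    parsed = []
    for token in (t.strip() for t in mutations.split("+")):
        sub = SUB.match(token)
        ins = INS.match(token)
        if sub:
            wild_type, pos, new = sub.group(1), int(sub.group(2)), sub.group(3)
            if not 1 <= pos <= len(reference):
                raise ValueError(f"{token}: position outside the reference")
            if reference[pos - 1] != wild_type:
                raise ValueError(f"{token}: reference has {reference[pos - 1]} at {pos}")
            parsed.append((pos, "sub", new))
        elif ins:
            pos = int(ins.group(1))
            if not 1 <= pos <= len(reference):
                raise ValueError(f"{token}: position outside the reference")
            parsed.append((pos, "ins", ins.group(2)))
        else:
            raise ValueError(f"unrecognized mutation: {token!r}")

    residues = list(reference)
    # Work from the C-terminus backwards so insertions do not shift
    # the numbering of positions that are still to be edited.
    for pos, kind, new in sorted(parsed, reverse=True):
        if kind == "sub":
            residues[pos - 1] = new
        else:
            residues[pos:pos] = list(new)
    return "".join(residues)


# Toy example (hypothetical sequence): E3R substitution plus GG inserted after residue 4.
print(apply_mutations("MKEAT", "E3R + 4GG"))  # MKRAGGT
```

Checking the wild-type residue before substituting guards against applying a mutation list to the wrong reference numbering.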
Furthermore, we tested the gene editing efficiency of the following combinations of mutations by T7 endonuclease 1 (T7E1) assay and targeted deep sequencing: E176R + K238R + T447R + E563R + N164Y + D958R; E176R + K238R + T447R + E563R + I926R + D958R; E176R + K238R + T447R + E563R + E323R + D362R + D958R; N164Y + I926R + D958R; N164Y + E323R + D362R + D958R; E176R + K238R + T447R + E563R + N164Y + I926R + D958R; E176R + K238R + T447R + E563R + N164Y + E323R + D362R + D958R; E176R + K238R + T447R + E563R + N164Y + I926R + E323R + D362R + D958R; E176R + K238R + T447R + E563R + N164Y + E323R + D362R + I926G + D958R; E176R + K238R + T447R + E563R + N164Y + E323R + D362R + I926G + 439GG + D958R; and E176R + K238R + T447R + E563R + N164Y + E323R + D362R + I926G + 439G + D958R.
Example 6: Comparison of CasXX with conventional gene editing tools to verify its gene editing efficiency
Plasmid construction
The coding sequences of AsCas12a, BhCas12b v, SpCas9, SaCas9 and SaCas9-KKH were codon-optimized (human) and synthesized. The Cas effector proteins were expressed in human 293T cells from the pCAG-2A-eGFP vector, with the Cas protein-encoding DNA inserted between the XmaI and NheI sites. Vectors expressing the sgRNA or crRNA of AsCas12a, BhCas12b v, SpCas9, SaCas9-KKH and Cas12i2 in 293T cells were constructed by ligating annealed oligonucleotides containing the target sequences into the BsaI-digested pUC19-U6-i2-crRNA backbone.
Cell culture, transfection and Fluorescence Activated Cell Sorting (FACS)
HEK293T cells were cultured in DMEM (Gibco) containing 1% penicillin-streptomycin (Gibco) and 10% fetal bovine serum (Gibco). Cells were seeded in 24-well culture plates (Corning) and grown for 16 hours until they reached 70% confluence. Each well was transfected with 600 ng of Cas protein-encoding plasmid and 3000 ng of crRNA-encoding plasmid using Lipofectamine 3000 (Invitrogen). 68 h after transfection, the HEK293T cells were digested with trypsin-EDTA (0.05%) (Gibco) for fluorescence-activated cell sorting (FACS). Cells were sorted on a MoFlo XDP (Beckman Coulter) based on the GFP signal.
Targeted deep sequencing analysis for genome modification
FACS-sorted GFP-positive 293FT cells were lysed with buffer L, incubated at 55 ℃ for 3 hours and then at 95 ℃ for 10 minutes. The dsDNA fragments containing the target sites at the different genomic loci were PCR-amplified using the corresponding primers. For targeted deep sequencing, the target sites were amplified by barcoded PCR using the cell lysates directly as templates. PCR products were purified and pooled into several libraries for high-throughput sequencing. The indel frequency (%) was analyzed with the CRISPResso2 software by calculating the ratio of reads containing insertions or deletions. Reads accounting for less than 0.05% of the total reads were discarded.
In FIG. 6a, we first tested the gene editing efficiency of CasXX at 62 human genomic loci. CasXX exhibited very strong gene editing capacity, with an average gene editing efficiency of over 60% and an efficiency of over 50% at almost all sites tested. Moreover, the gene editing efficiency was high for any NTTN PAM (N = A, T, G or C), which wild-type Cas12i does not achieve.
In FIG. 6b, to further demonstrate the gene editing capability of our engineered CasXX, we compared CasXX with AsCas12a at TTTN PAM sites; CasXX showed a higher average gene editing efficiency. CasXX also showed a higher average gene editing efficiency than BhCas12b v at TTN PAM sites.
In FIG. 6c, to further demonstrate the gene editing capability of our engineered CasXX, we compared CasXX with SpCas9 at the same sites; CasXX showed a higher average gene editing efficiency. CasXX also showed a higher average gene editing efficiency than SaCas9 and SaCas9-KKH at the same sites.
In FIG. 6d, we analyzed the gene editing efficiency of CasXX in the mouse Hepa1-6 cell line. CasXX exhibited strong gene editing ability at 65 sites, with an average gene editing efficiency of over 60%.
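The PAM patterns mentioned above (NTTN, TTTN, TTN, where N is any of A, T, G or C) can be matched against a DNA sequence with a few lines of code. This is a minimal sketch under stated assumptions: the function names `matches_pam` and `find_pam_sites` and the example sequence are hypothetical, only the strand given is scanned, and the position of the protospacer relative to the PAM is not modeled.

```python
from typing import List


def matches_pam(site: str, pattern: str = "NTTN") -> bool:
    """Return True if site matches pattern, where N stands for A, C, G or T."""
    site = site.upper()
    if len(site) != len(pattern):
        return False
    return all((p == "N" and s in "ACGT") or p == s
               for s, p in zip(site, pattern))


def find_pam_sites(sequence: str, pattern: str = "NTTN") -> List[int]:
    """Return the 0-based start positions of every PAM match on the given strand."""
    sequence = sequence.upper()
    width = len(pattern)
    return [i for i in range(len(sequence) - width + 1)
            if matches_pam(sequence[i:i + width], pattern)]


# Hypothetical sequence: ATTC, GTTT and TTTA match NTTN; only TTTA matches TTTN.
print(find_pam_sites("GATTCAGTTTA"))          # [1, 6, 7]
print(find_pam_sites("GATTCAGTTTA", "TTTN"))  # [7]
```

Because NTTN accepts any base at the first position, every TTTN site is also an NTTN site, which is why the relaxed pattern yields more candidate sites on the same sequence.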
Although embodiments of the present invention have been described above with reference to the accompanying drawings, the present invention is not limited to the specific embodiments and applications described above, which are illustrative, instructive, and not restrictive. Those skilled in the art, having the benefit of this disclosure, may effect numerous modifications thereto without departing from the scope of the invention as defined by the appended claims.
Sequence listing
Figure SMS_12
Figure SMS_13
Figure SMS_14
Figure SMS_15
Figure SMS_16
Figure SMS_17
Figure SMS_18
Figure SMS_19
Figure SMS_20

Claims (28)

1. An engineered Cas12i nuclease, comprising one, two, three or four mutations relative to a reference Cas12i nuclease selected from the group consisting of:
(1) Replacing one or more amino acids in the reference Cas12i nuclease that interact with PAM with a positively charged amino acid; and/or
(2) Replacing one or more amino acids involved in opening a DNA double strand in the reference Cas12i nuclease with amino acids with aromatic rings; and/or
(3) Replacing one or more amino acids in the reference Cas12i nuclease that reside in the RuvC domain and interact with a single-stranded DNA substrate with a positively charged amino acid; and/or
(4) Replacing one or more amino acids in the reference Cas12i nuclease that interact with the DNA-RNA duplex with positively charged amino acids.
2. The engineered Cas12i nuclease of claim 1, wherein the one or more amino acids that interact with PAM are one or more amino acids at the following positions: 176, 238, 447, and/or 563; wherein the amino acid position number is defined as SEQ ID NO.1.
3. An engineered Cas12i nuclease as claimed in claim 2 wherein the positively charged amino acids are R or K.
4. An engineered Cas12i nuclease as claimed in claim 3 wherein the Cas12i nuclease comprises any one or combination of mutations of: (1) E563R; (2) E176R, T447R, E R and E563R; (3) K238R and E563R; (4) E176R, K238R and T447R; (5) E176R, K238R and E563R; (6) E176R, T447R and E563R; and/or (7) E176R, K238R, T447R and E563R; wherein the amino acid position number is defined as SEQ ID NO.1.
5. An engineered Cas12i nuclease as claimed in claim 1 wherein the one or more amino acids involved in opening a DNA double strand are one or more amino acids at the following positions: 163 and/or 164; wherein the amino acid position number is defined as SEQ ID NO.1.
6. An engineered Cas12i nuclease as claimed in claim 5, wherein the one or more amino acids involved in opening a DNA double strand are replaced with an aromatic ring-bearing amino acid which is F, Y or W.
7. An engineered Cas12i nuclease as claimed in claim 6 wherein the replacement of one or more amino acids involved in opening a DNA double strand in a reference Cas12i nuclease by an aromatic ring-bearing amino acid is: Q163F, Q163Y, Q163W, and/or N164F or N164Y.
8. An engineered Cas12i nuclease as claimed in claim 1 wherein the one or more amino acids located in the RuvC domain that interact with a single stranded DNA substrate are one or more of the amino acids at the following positions: 323, 362, 425, 925, 926, 391, 424, and/or 929; wherein the amino acid position number is defined as SEQ ID NO.1.
9. An engineered Cas12i nuclease as claimed in claim 8 wherein one or more amino acids in a reference Cas12i nuclease that are involved in cleavage of double-stranded DNA are replaced with positively charged amino acids that are R or K.
10. An engineered Cas12i nuclease as claimed in claim 9 wherein the Cas12i nuclease comprises any one or combination of mutations of: (1) E323R; (2) D362R; (3) Q425R; (4) N925R; (5) I926R; (6) E323R and D362R; (7) E323R and Q425R; (8) E323R and I926R; (9) Q425R and I926R; (10) D362R and I926R; (11) N925R and I926R; (12) E323R, D362R and Q425R; (13) E323R, D362R and I926R; (14) E323R, Q425R and I926R; (15) D362R, N925R and I926R; and/or (16) E323R, D362R, Q425R and I926R; wherein the amino acid position number is defined as SEQ ID NO.1.
11. An engineered Cas12i nuclease as claimed in claim 1, wherein the one or more amino acids that interact with the DNA-RNA duplex are one or more of the following: 116, 117, 159, 161, 319, 343, and/or 958; wherein the amino acid position number is defined as SEQ ID NO.1.
12. An engineered Cas12i nuclease as claimed in claim 11 in which one or more amino acids in a reference Cas12i nuclease that interact with a DNA-RNA duplex are replaced with positively charged amino acids that are R or K.
13. An engineered Cas12i nuclease as claimed in claim 12, wherein the Cas12i nuclease comprises any one or combination of mutations of: G116R, E117R, T159R, S161R, E319R, E343R, and/or D958R; wherein the amino acid position number is defined as SEQ ID NO.1.
14. The engineered Cas12i nuclease of any one of claims 1-13, comprising one or more flexible region mutations at one or more of the following positions: 439 and/or 926; wherein the amino acid position number is defined as SEQ ID NO.1.
15. The engineered Cas12i nuclease of claim 14, wherein the one or more flexible region mutations are: I926G; and/or 439G or 439GG.
16. An engineered Cas12i nuclease;
the engineered Cas12i nuclease comprises any one or more of the following sets of mutations: (1) E563R; (2) E176R and T447R; (3) E176R and E563R; (4) K238R and E563R; (5) E176R, K238R and T447R; (6) E176R, T447R and E563R; (7) E176R, K238R and E563R; (8) E176R, K238R, T447R and E563R; (9) N164Y; (10) N164F; (11) E323R; (12) D362R; (13) Q425R; (14) N925R; (15) I926R; (16) D958R; (17) E323R and D362R; (18) E323R and Q425R; (19) E323R and I926R; (20) Q425R and I926R; (21) D362R and I926R; (22) N925R and I926R; (23) E323R, D362R and Q425R; (24) E323R, D362R and I926R; (25) E323R, Q425R and I926R; (26) D362R, N925R and I926R; (27) E323R, D362R, Q425R and I926R; (28) D362R and I926G; (29) N925R and I926G; (30) D362R, N925R and I926G; (31) I926R and 439G; (32) I926R and 439GG; and/or (33) E323R, D362R and I926G; wherein the amino acid position number is defined as SEQ ID NO.1.
17. An engineered Cas12i nuclease, said engineered Cas12i nuclease comprising any one of the following sets of mutations:
(1) E176R, K238R, T447R, E563R and N164Y; (2) E176R, K238R, T447R, E563R and I926R; (3) N164Y, E323R and D362R; (4) E176R, K238R, T447R, E563R, E323R and D362R; (5) N164Y and I926R; (6) E176R, K238R, T447R, E563R, N164Y and I926R; (7) E176R, K238R, T447R, E563R, N164Y, E323R and D362R; (8) E176R, K238R, T447R, E563R, N164Y, I926R, E323R and D362R; (9) E176R, K238R, T447R, E563R, N164Y, E323R, D362R and I926G; (10) E176R, K238R, T447R, E563R, N164Y, E323R, D362R, I926G and 439GG; (11) E176R, K238R, T447R, E563R, N164Y, E323R, D362R, I926G and 439G; (12) E176R, K238R, T447R, E563R, N164Y and D958R; (13) E176R, K238R, T447R, E563R, I926R and D958R; (14) E176R, K238R, T447R, E563R, E323R, D362R and D958R; (15) N164Y, I926R and D958R; (16) N164Y, E323R, D362R and D958R; (17) E176R, K238R, T447R, E563R, N164Y, I926R and D958R; (18) E176R, K238R, T447R, E563R, N164Y, E323R, D362R and D958R; (19) E176R, K238R, T447R, E563R, N164Y, I926R, E323R, D362R and D958R; (20) E176R, K238R, T447R, E563R, N164Y, E323R, D362R, I926G and D958R; (21) E176R, K238R, T447R, E563R, N164Y, E323R, D362R, I926G, 439GG and D958R; or (22) E176R, K238R, T447R, E563R, N164Y, E323R, D362R, I926G, 439G and D958R;
wherein the amino acid position number is defined as SEQ ID NO.1.
18. An engineered Cas12i nuclease comprising an amino acid sequence as set forth in any one of SEQ ID NOs. 2 to 12.
19. An engineered Cas12i effector protein comprising an engineered Cas12i nuclease of any one of claims 1-18.
20. The engineered Cas12i effector protein of claim 19, wherein the effector protein is capable of inducing a double-strand break or a single-strand break in a DNA molecule.
21. The engineered Cas12i effector protein of claim 19, wherein said engineered Cas12i nuclease is an enzyme-inactivating mutant with D599A, E833A, S883A, H884A, R a and/or D1019A.
22. An engineered Cas12i effector protein as set forth in claim 19 further comprising a functional domain fused to said engineered Cas12i nuclease.
23. The engineered Cas12i effector protein of claim 22, wherein said functional domain is selected from the group consisting of: a translation initiation domain, a transcription repression domain, a transactivation domain, an epigenetic modification domain, a nucleobase editing domain, a reverse transcriptase domain, a reporter domain, and a nuclease domain.
24. An engineered Cas12i effector protein as claimed in claim 19 comprising a first polypeptide comprising amino acid residues 1 to X of the N-terminal portion of the engineered Cas12i nuclease of any one of claims 1-18 and a second polypeptide comprising amino acid residues X +1 of the engineered Cas12i nuclease of any one of claims 1-18 to the C-terminus of the Cas12i nuclease, wherein the first and second polypeptides are capable of associating with each other in the presence of a guide RNA comprising a guide sequence to form a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) complex that specifically binds to a target nucleic acid comprising a target sequence complementary to the guide sequence.
25. An engineered CRISPR-Cas12i system comprising:
(a) An engineered Cas12i effector protein of any one of claims 19-24; and
(b) A guide RNA comprising a guide sequence complementary to a target sequence, or one or more nucleic acids encoding the guide RNA,
wherein the engineered Cas12i effector protein and the guide RNA are capable of forming a CRISPR complex that specifically binds to a target nucleic acid comprising the target sequence and induces modification of the target nucleic acid.
26. A method of detecting a target nucleic acid in a sample, comprising:
(a) Contacting the sample with the engineered CRISPR-Cas12i system of claim 25 and a tagged detector nucleic acid that is single stranded and does not hybridize to the guide sequence of the guide RNA; and
(b) Measuring a detectable signal generated by cleavage of the tagged detection nucleic acid by the engineered Cas12i effector protein, thereby detecting the target nucleic acid.
27. Use of the engineered CRISPR-Cas12i system of claim 25 in the manufacture of a medicament for treating a disease or disorder associated with a target nucleic acid in a cell of an individual.
28. A method of modifying a target nucleic acid comprising a target sequence comprising contacting the target nucleic acid with an engineered CRISPR-Cas12i system as set forth in claim 25.
CN202211325001.4A 2021-05-27 2021-05-27 Engineered Cas12i nuclease, effector protein thereof and application thereof Pending CN115851665A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211325001.4A CN115851665A (en) 2021-05-27 2021-05-27 Engineered Cas12i nuclease, effector protein thereof and application thereof

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110581290.3A CN113151215B (en) 2021-05-27 2021-05-27 Engineered Cas12i nuclease, effector protein thereof and application thereof
CN202211325001.4A CN115851665A (en) 2021-05-27 2021-05-27 Engineered Cas12i nuclease, effector protein thereof and application thereof

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN202110581290.3A Division CN113151215B (en) 2021-05-27 2021-05-27 Engineered Cas12i nuclease, effector protein thereof and application thereof

Publications (1)

Publication Number Publication Date
CN115851665A true CN115851665A (en) 2023-03-28

Family

ID=76877801

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202110581290.3A Active CN113151215B (en) 2021-05-27 2021-05-27 Engineered Cas12i nuclease, effector protein thereof and application thereof
CN202211325001.4A Pending CN115851665A (en) 2021-05-27 2021-05-27 Engineered Cas12i nuclease, effector protein thereof and application thereof

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202110581290.3A Active CN113151215B (en) 2021-05-27 2021-05-27 Engineered Cas12i nuclease, effector protein thereof and application thereof

Country Status (1)

Country Link
CN (2) CN113151215B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112877314B (en) * 2021-03-08 2023-06-13 四川大学 Inducible base editing system and application thereof
EP4349979A1 (en) * 2021-05-27 2024-04-10 Institute Of Zoology, Chinese Academy Of Sciences Engineered cas12i nuclease, effector protein and use thereof
US20230203539A1 (en) * 2021-08-11 2023-06-29 Arbor Biotechnologies, Inc. Gene editing systems comprising an rna guide targeting stathmin 2 (stmn2) and uses thereof
WO2023018856A1 (en) * 2021-08-11 2023-02-16 Arbor Biotechnologies, Inc. Gene editing systems comprising an rna guide targeting polypyrimidine tract binding protein 1 (ptbp1) and uses thereof
WO2023034475A1 (en) * 2021-09-01 2023-03-09 Arbor Biotechnologies, Inc. Cells modified by a cas12i polypeptide
CN114015674A (en) 2021-11-02 2022-02-08 辉二(上海)生物科技有限公司 Novel CRISPR-Cas12i system
WO2023078314A1 (en) * 2021-11-02 2023-05-11 Huidagene Therapeutics Co., Ltd. Novel crispr-cas12i systems and uses thereof
CN114085873A (en) * 2021-11-16 2022-02-25 珠海中科先进技术研究院有限公司 Cancer cell state identification gene circuit group and preparation method thereof
WO2023104185A1 (en) * 2021-12-09 2023-06-15 Beijing Institute For Stem Cell And Regenerative Medicine Engineered cas12b effector proteins and methods of use thereof
CN116497002A (en) * 2022-01-19 2023-07-28 中国科学院动物研究所 Engineered CasX nucleases, effector proteins and uses thereof
WO2023138685A1 (en) * 2022-01-24 2023-07-27 Huidagene Therapeutics Co., Ltd. Novel crispr-cas12i systems and uses thereof
CN117460822A (en) * 2022-04-25 2024-01-26 辉大基因治疗(新加坡)私人有限公司 Novel CRISPR-Cas12i system and application thereof

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
PT3765615T (en) * 2018-03-14 2023-08-28 Arbor Biotechnologies Inc Novel crispr dna targeting enzymes and systems
CN112195164B (en) * 2020-12-07 2021-04-23 中国科学院动物研究所 Engineered Cas effector proteins and methods of use thereof

Also Published As

Publication number Publication date
CN113151215B (en) 2022-11-18
CN113151215A (en) 2021-07-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination