CN118202044A - Base editing enzyme - Google Patents


Info

Publication number
CN118202044A
Authority
CN
China
Prior art keywords
seq
sequence
polypeptide
endonuclease
residue
Legal status
Pending
Application number
CN202280074006.6A
Other languages
Chinese (zh)
Inventor
布莱恩·C·托马斯
林俊良
亚伦·布鲁克斯
克里斯蒂娜·布特弗尔德
克利斯多佛·布朗
辛迪·卡斯泰勒
莫拉伊玛·特莫彻-迪亚兹
本杰明·弗里曼
克里斯汀·罗马诺
丽贝卡·拉莫特
Current Assignee
Metagenomi, Inc.
Original Assignee
Metagenomi, Inc.
Application filed by Metagenomi, Inc.
Priority claimed from PCT/US2022/079345 (published as WO2023081855A1)
Publication of CN118202044A

Landscapes

  • Enzymes And Modification Thereof (AREA)

Abstract

The present disclosure provides endonucleases having distinguishing domain features, and methods of using such enzymes or variants thereof.

Description

Base editing enzyme
Cross reference to related applications
The present application claims the benefit of U.S. Provisional Application No. 63/276,461, filed November 5, 2021; U.S. Provisional Application No. 63/289,998, filed December 15, 2021; U.S. Provisional Application No. 63/342,824, filed May 17, 2022; U.S. Provisional Application No. 63/356,888, filed June 29, 2022; and U.S. Provisional Application No. 63/378,171, filed October 3, 2022; each of these provisional applications is entitled "BASE EDITING ENZYMES" and is incorporated herein by reference in its entirety. The present application is related to PCT Patent Application No. PCT/US2021/049962, which is incorporated herein by reference in its entirety.
Background
Cas enzymes and their associated clustered regularly interspaced short palindromic repeat (CRISPR) guide ribonucleic acids (RNAs) appear to be a common component of prokaryotic immune systems (present in about 45% of bacteria and about 84% of archaea), protecting these microorganisms from non-self nucleic acids, such as infectious viruses and plasmids, through CRISPR RNA-guided nucleic acid cleavage. Although the deoxyribonucleic acid (DNA) elements encoding CRISPR RNAs may be relatively conserved in structure and length, their CRISPR-associated (Cas) proteins are highly diverse and contain a variety of nucleic acid-interacting domains. Although CRISPR DNA elements were observed as early as 1987, the programmable endonuclease cleavage capability of CRISPR complexes was recognized only recently, leading to the use of recombinant CRISPR systems in a variety of DNA manipulation and gene editing applications.
Sequence listing
The present application contains a Sequence Listing that has been submitted electronically in XML format and is hereby incorporated by reference in its entirety. The XML copy, created in month 4 of 2022, is named 55921-742_601_SL.xml and is 2,274,288 KB in size.
Disclosure of Invention
In some aspects, the present disclosure provides a method of deaminating a cytosine residue in a eukaryotic nucleic acid sequence in a cell, the method comprising contacting the eukaryotic nucleic acid sequence with a polypeptide having cytosine deaminase activity, wherein the polypeptide comprises a sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to any one of SEQ ID NOs: 1-49, 444-447, 599-675, 744-835, or 970-982, or a variant thereof. In some embodiments, the eukaryotic nucleic acid sequence is a mammalian, primate, or human nucleic acid sequence. In some embodiments, the cell is a mammalian, primate, or human cell. In some embodiments, the eukaryotic nucleic acid sequence comprises single-stranded DNA (ssDNA) or ribonucleic acid (RNA). In some embodiments, the polypeptide having cytosine deaminase activity comprises a sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to any one of SEQ ID NOs: 809-811, 819, 826, 752, 777, 823, 668-671, 675, 650, 774, 806, 812, 816, 817, 818, 825, 827, 832, or 970-982, or a variant thereof.
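Every embodiment is framed as a percent-identity threshold, and the result depends on how the two sequences are aligned and how identity is counted. This section does not name an alignment program or scoring scheme, so the Python sketch below assumes a plain Needleman-Wunsch global alignment (match +1, mismatch -1, linear gap -1) and counts identity as identical aligned residues divided by the length of the reference sequence. The function names and sequences are hypothetical toy examples, not entries from the Sequence Listing.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty; returns gapped strings."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(query: str, reference: str) -> float:
    """Identical aligned residues divided by the reference length, as a percentage."""
    aligned_q, aligned_r = global_align(query, reference)
    matches = sum(1 for x, y in zip(aligned_q, aligned_r) if x == y and x != "-")
    return 100.0 * matches / len(reference)


print(percent_identity("MKTAYIAR", "MKTAYIAK"))  # 87.5
print(percent_identity("MKTAYAK", "MKTAYIAK") >= 80)  # True
```

Other conventions (dividing by alignment length, or by the shorter sequence) can give different percentages for the same pair, which is why the denominator is stated explicitly here.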
In some embodiments, the polypeptide having cytosine deaminase activity comprises a sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to any one of SEQ ID NOs: 808, 810-811, 819, 826, 752, 777, or 823, or a variant thereof. In some embodiments, the eukaryotic nucleic acid sequence comprises double-stranded DNA (dsDNA). In some embodiments, the polypeptide having cytosine deaminase activity comprises a sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to any one of SEQ ID NOs: 810-811 or a variant thereof. In some embodiments, the polypeptide having cytosine deaminase activity further comprises a nucleic acid binding domain, an endonuclease, or a nicking enzyme. In some embodiments, the polypeptide having cytosine deaminase activity further comprises the endonuclease or the nicking enzyme, wherein the endonuclease or the nicking enzyme comprises a sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to any one of SEQ ID NOs: 70-78, 596, 597, 1120, 1122-1127, or 1647, or a variant thereof.
In some embodiments, the polypeptide having cytosine deaminase activity further comprises a nicking enzyme, wherein the nicking enzyme comprises a mutation from aspartic acid to alanine at residue 9 relative to SEQ ID NO: 70; residue 13 relative to SEQ ID NO: 71, 72, or 74; residue 12 relative to SEQ ID NO: 73; residue 17 relative to SEQ ID NO: 75; residue 23 relative to SEQ ID NO: 76; or residue 10 relative to SEQ ID NO: 597; or any combination thereof. In some embodiments, the polypeptide having cytosine deaminase activity further comprises a uracil DNA glycosylase inhibitor sequence. In some embodiments, the uracil DNA glycosylase inhibitor comprises a sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to any one of SEQ ID NOs: 52-56 or SEQ ID NO: 67, or a variant thereof. In some embodiments, the polypeptide having cytosine deaminase activity further comprises a FAM72A sequence. In some embodiments, the FAM72A sequence has at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO: 1121 or a variant thereof.
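The nicking-enzyme embodiments pair each reference SEQ ID NO with a single aspartic acid position. The Python sketch below applies that D-to-A change, assuming the input sequence is already numbered exactly like the named reference; in practice a position "relative to" a SEQ ID NO would first be mapped through an alignment, which is not shown. The function name and the toy sequence are hypothetical.

```python
# Aspartic acid residues changed to alanine in the nicking-enzyme embodiments:
# reference SEQ ID NO -> 1-based residue position (values taken from the text above).
NICKASE_D_TO_A = {70: 9, 71: 13, 72: 13, 74: 13, 73: 12, 75: 17, 76: 23, 597: 10}


def make_nickase(sequence: str, seq_id: int) -> str:
    """Return `sequence` with the listed aspartic acid (D) replaced by alanine (A).

    Assumes `sequence` uses the same residue numbering as the reference SEQ ID NO.
    """
    position = NICKASE_D_TO_A[seq_id]
    index = position - 1  # convert the 1-based residue number to a 0-based index
    if sequence[index] != "D":
        raise ValueError(f"expected D at residue {position}, found {sequence[index]!r}")
    return sequence[:index] + "A" + sequence[index + 1:]


toy = "MKRNYILGDLLKK"  # hypothetical sequence with D at residue 9
print(make_nickase(toy, 70))  # MKRNYILGALLKK
```

Raising an error when the expected residue is not D guards against applying the mutation to a sequence whose numbering does not match the reference.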
In some aspects, the disclosure provides a method of deaminating a cytosine residue in a primate nucleic acid sequence in a cell, the method comprising contacting the primate nucleic acid sequence with a polypeptide having cytosine deaminase activity, wherein the polypeptide comprises a sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to any one of SEQ ID NOs: 599-638, 660-675, or 828-835, or a variant thereof. In some embodiments, the primate nucleic acid sequence comprises double-stranded DNA (dsDNA), single-stranded DNA (ssDNA), or ribonucleic acid (RNA). In some embodiments, the polypeptide having cytosine deaminase activity further comprises a nucleic acid binding domain, an endonuclease, or a nicking enzyme. In some embodiments, the polypeptide having cytosine deaminase activity further comprises the endonuclease or the nicking enzyme, wherein the endonuclease or the nicking enzyme comprises a sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to any one of SEQ ID NOs: 70-78, 596, 597, 1120, 1122-1127, or 1647, or a variant thereof. In some embodiments, the polypeptide having cytosine deaminase activity further comprises a nicking enzyme, wherein the nicking enzyme comprises a mutation from aspartic acid to alanine at residue 9 relative to SEQ ID NO: 70; residue 13 relative to SEQ ID NO: 71, 72, or 74; residue 12 relative to SEQ ID NO: 73; residue 17 relative to SEQ ID NO: 75; residue 23 relative to SEQ ID NO: 76; or residue 10 relative to SEQ ID NO: 597; or any combination thereof. In some embodiments, the polypeptide having cytosine deaminase activity further comprises a uracil DNA glycosylase inhibitor sequence. In some embodiments, the uracil DNA glycosylase inhibitor comprises a sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to any one of SEQ ID NOs: 52-56 or SEQ ID NO: 67, or a variant thereof. In some embodiments, the polypeptide having cytosine deaminase activity further comprises a FAM72A sequence. In some embodiments, the FAM72A sequence has at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO: 1121 or a variant thereof.
In some aspects, the disclosure provides a nucleic acid comprising an engineered nucleic acid sequence optimized for expression in a mammalian organism, wherein the nucleic acid encodes a sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to any one of SEQ ID NOs: 1-49, 444-447, 599-675, 744-835, or 970-982, or a variant thereof. In some embodiments, the nucleic acid encodes a sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to any one of SEQ ID NOs: 809-811, 819, 826, 752, 777, 823, 668-671, 675, 650, 774, 806, 812, 816, 817, 818, 825, 827, 832, or 970-982, or a variant thereof.
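The text does not say how the mammalian codon optimization is carried out. One simple approach is to back-translate each residue into a single frequently used human codon; the Python sketch below does only that. The codon choices are an illustrative assumption, not a measured codon-usage table and not the method used for the sequences in this disclosure; practical optimization tools often also weigh GC content, repeats, and other sequence features.

```python
# One frequently used human codon per amino acid (an illustrative assumption).
HUMAN_CODON = {
    "A": "GCC", "R": "AGA", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
    "*": "TGA",  # stop codon
}


def back_translate(protein: str, add_stop: bool = True) -> str:
    """Return a DNA coding sequence for `protein`, one HUMAN_CODON codon per residue."""
    codons = [HUMAN_CODON[aa] for aa in protein.upper()]
    # Append a stop codon unless the protein string already ends with one
    if add_stop and not protein.endswith("*"):
        codons.append(HUMAN_CODON["*"])
    return "".join(codons)


print(back_translate("MKD"))  # ATGAAGGACTGA
```

Using one fixed codon per residue is the crudest form of optimization; it is shown only to make the idea of "optimized for expression" concrete.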
In some aspects, the disclosure provides a nucleic acid encoding any of the polypeptides described herein.
In some aspects, the disclosure provides a vector comprising any of the nucleic acids described herein.
In some aspects, the present disclosure provides a fusion polypeptide comprising: (a) a domain having cytosine deaminase activity, comprising a sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to any one of SEQ ID NOs: 1-49, 444-447, 599-675, 744-835, or 970-982, or a variant thereof; and (b) a nucleic acid binding domain, an endonuclease domain, or a nicking enzyme domain. In some embodiments, the domain having cytosine deaminase activity comprises a sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to any one of SEQ ID NOs: 809-811, 819, 826, 752, 777, 823, 668-671, 675, 650, 774, 806, 812, 816, 817, 818, 825, 827, 832, or 970-982, or a variant thereof. In some embodiments, the domain having cytosine deaminase activity comprises a sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to any one of SEQ ID NOs: 809-811, 819, 826, 752, 777, or 823, or a variant thereof.
In some embodiments, the fusion polypeptide comprises the endonuclease domain or the nicking enzyme domain, wherein the endonuclease domain or the nicking enzyme domain comprises a sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to any one of SEQ ID NOs: 70-78, 596, 597-598, 1120, 1122-1127, or 1647, or a variant thereof. In some embodiments, the fusion polypeptide comprises the nicking enzyme domain, wherein the nicking enzyme domain comprises a mutation from aspartic acid to alanine at residue 9 relative to SEQ ID NO: 70; residue 13 relative to SEQ ID NO: 71, 72, or 74; residue 12 relative to SEQ ID NO: 73; residue 17 relative to SEQ ID NO: 75; residue 23 relative to SEQ ID NO: 76; or residue 10 relative to SEQ ID NO: 597; or any combination thereof. In some embodiments, the fusion polypeptide comprises a sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to any one of SEQ ID NOs: 877-916 or 968-969, or a variant thereof.
In some aspects, the present disclosure provides a system comprising: (a) any of the fusion polypeptides described herein (e.g., an endonuclease-base editor fusion or an endonuclease-deaminase fusion); and (b) an engineered guide polynucleotide configured to form a complex with the endonuclease domain, the engineered guide polynucleotide comprising: (i) a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and (ii) a ribonucleic acid sequence configured to bind to the endonuclease domain. In some embodiments, the engineered guide polynucleotide further comprises a sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to the non-degenerate nucleotides of any one of SEQ ID NOs: 88-96, 917-931, 963-967, or 1099-1105, or a variant thereof.
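The guide ribonucleic acid sequence base-pairs with the target strand, so the matching site on the opposite (non-target) strand reads like the guide with T in place of U. The Python sketch below locates such sites on either strand of a DNA sequence. It deliberately ignores PAM requirements and mismatch tolerance, which this section does not specify for these endonucleases; the function names and the DNA sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an uppercase A/C/G/T DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def find_target_sites(spacer_rna: str, dna: str) -> list[tuple[str, int]]:
    """Find where the DNA form of `spacer_rna` occurs on either strand of `dna`.

    Returns (strand, start) pairs; `start` is the 0-based leftmost coordinate
    on the strand given as `dna`. PAM and mismatches are not considered.
    """
    site = spacer_rna.upper().replace("U", "T")
    hits: list[tuple[str, int]] = []
    start = dna.find(site)
    while start != -1:
        hits.append(("+", start))
        start = dna.find(site, start + 1)
    rc = reverse_complement(dna)
    start = rc.find(site)
    while start != -1:
        # map the hit on the reverse complement back onto the given strand's coordinates
        hits.append(("-", len(dna) - start - len(site)))
        start = rc.find(site, start + 1)
    return hits


dna = "TTTACGGATCCAAA"  # hypothetical target region
print(find_target_sites("ACGGA", dna))  # [('+', 3)]
print(find_target_sites("UUGGA", dna))  # [('-', 8)]
```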
In some aspects, the present disclosure provides a polypeptide having adenosine deaminase activity, comprising a sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to any one of SEQ ID NOs: 50, 51, 385-443, or 448-475, or a variant thereof, wherein, when optimally aligned, the polypeptide comprises a substitution at one or more of the following residues of SEQ ID NO: 50: T2, D7, E10, M13, W24, G32, K38, G45, G51, A63, E66, R75, C91, G93, H97, A107, E108, D109, P110, H124, A126, H129, F150, or S165. In some embodiments, when optimally aligned, the substitution comprises T2X1, D7X1, E10X1, M13X4, W24X1, G32X1, K38X2, G45X2, G51X5, A63X7, E66X5, E66X2, R75H, C91R, G93X6, H97X6, H97X5, A107X5, E108X2, D109N, P110H, H124X6, A126X2, H129R, H129N, F150P, F150S, or S165X5 relative to SEQ ID NO: 50 or MG68-4, or any combination thereof, wherein X1 is A or G; X2 is D or E; X3 is N or Q; X4 is R or K; X5 is I, L, M, or V; X6 is F, Y, or W; and X7 is S or T. In some embodiments, the polypeptide comprises a sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to any one of SEQ ID NOs: 836-860 or a variant thereof. In some embodiments, the polypeptide comprises any one of SEQ ID NOs: 839, 841, 843, 844, 847, 848, 849, 850, 851, 852, or 859, or a variant thereof. In some embodiments, when optimally aligned, the substitutions comprise W24G, G51V, E108D, P110H, F150P, D7G, E10G, or H129N, or any combination thereof, relative to SEQ ID NO: 50 or MG68-4.
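The degenerate substitution notation (for example T2X1, where X1 stands for A or G) can be checked mechanically. The Python sketch below tests whether a variant carries one such substitution, using the residue classes X1 through X7 defined above. It assumes the variant and reference are already aligned without gaps so that residue numbers coincide; the helper names and toy sequences are hypothetical, and the toy reference is not SEQ ID NO: 50.

```python
import re

# Residue classes defined in the text for substitutions relative to SEQ ID NO: 50 / MG68-4.
X_CLASSES = {
    "X1": set("AG"), "X2": set("DE"), "X3": set("NQ"), "X4": set("RK"),
    "X5": set("ILMV"), "X6": set("FYW"), "X7": set("ST"),
}
# wild-type residue, 1-based position, then either a class (X1..X7) or a single residue
PATTERN = re.compile(r"^([A-Z])(\d+)(X\d|[A-Z])$")


def satisfies(variant: str, reference: str, substitution: str) -> bool:
    """True if `variant` carries `substitution` (e.g. 'T2X1' or 'R75H') relative to `reference`.

    Assumes the two sequences are aligned without gaps (identical residue numbering).
    """
    match = PATTERN.match(substitution)
    if match is None:
        raise ValueError(f"unrecognized substitution: {substitution}")
    wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
    if reference[position - 1] != wild_type:
        raise ValueError(f"reference has {reference[position - 1]!r} at {position}, not {wild_type!r}")
    allowed = X_CLASSES.get(new, {new})  # a class expands to its residues; a letter stands alone
    return variant[position - 1] in allowed


toy_reference = "MTKQLSD"  # hypothetical: T2 and D7 as in the residue list
toy_variant = "MGKQLSG"
print(satisfies(toy_variant, toy_reference, "T2X1"))  # True
print(satisfies(toy_variant, toy_reference, "D7X1"))  # True
```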
In some embodiments, the polypeptide further comprises a nucleic acid binding domain, an endonuclease domain, or a nicking enzyme domain. In some embodiments, the polypeptide comprises the endonuclease domain or the nicking enzyme domain, wherein the endonuclease domain or the nicking enzyme domain comprises a sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to any one of SEQ ID NOs: 70-78, 596, 597-598, 1120, 1122-1127, or 1647, or a variant thereof. In some embodiments, the polypeptide comprises the nicking enzyme domain, wherein the nicking enzyme domain comprises a mutation from aspartic acid to alanine at residue 9 relative to SEQ ID NO: 70; residue 13 relative to SEQ ID NO: 71, 72, or 74; residue 12 relative to SEQ ID NO: 73; residue 17 relative to SEQ ID NO: 75; residue 23 relative to SEQ ID NO: 76; or residue 10 relative to SEQ ID NO: 597; or any combination thereof.
In some aspects, the present disclosure provides a system comprising: (a) any of the polypeptides or fusion polypeptides described herein; and (b) an engineered guide polynucleotide configured to form a complex with the endonuclease domain, the engineered guide polynucleotide comprising: (i) a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and (ii) a ribonucleic acid sequence configured to bind to the endonuclease domain. In some embodiments, the engineered guide polynucleotide further comprises a sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to the non-degenerate nucleotides of any one of SEQ ID NOs: 88-96, 917-931, 963-967, or 1099-1105, or a variant thereof.
In some aspects, the present disclosure provides a method of deaminating a cytosine residue in a cell, the method comprising introducing into the cell: (a) a vector encoding a polypeptide having cytosine deaminase activity; and (b) a vector encoding a FAM72A protein. In some embodiments, the vector encoding the FAM72A protein comprises a nucleotide sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO: 1115 or a variant thereof, or encodes a sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO: 1121 or a variant thereof. In some embodiments, the polypeptide having cytosine deaminase activity comprises a sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to any one of SEQ ID NOs: 1-49, 444-447, 599-675, 744-835, or 970-982, or a variant thereof. In some embodiments, the polypeptide having cytosine deaminase activity further comprises a nucleic acid binding domain, an endonuclease domain, or a nicking enzyme domain.
In some embodiments, the polypeptide having cytosine deaminase activity comprises the endonuclease domain or the nicking enzyme domain, wherein the endonuclease domain or the nicking enzyme domain comprises a sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to any one of SEQ ID NOs: 70-78, 596, 597-598, 1120, 1122-1127, or 1647, or a variant thereof. In some embodiments, the polypeptide having cytosine deaminase activity comprises the nicking enzyme domain, wherein the nicking enzyme domain comprises a mutation from aspartic acid to alanine at residue 9 relative to SEQ ID NO: 70; residue 13 relative to SEQ ID NO: 71, 72, or 74; residue 12 relative to SEQ ID NO: 73; residue 17 relative to SEQ ID NO: 75; residue 23 relative to SEQ ID NO: 76; or residue 10 relative to SEQ ID NO: 597; or any combination thereof.
In some aspects, the present disclosure provides an engineered nucleic acid editing polypeptide comprising (i) a sequence having cytosine deaminase activity; and (ii) a sequence derived from a FAM72A protein. In some embodiments, the sequence having cytosine deaminase activity has at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to any one of SEQ ID NOs: 1-49, 444-447, 599-675, 744-835, or 970-982, or a variant thereof. In some embodiments, the sequence derived from the FAM72A protein has at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO: 1121 or a variant thereof. In some embodiments, the polypeptide further comprises an endonuclease sequence comprising a RuvC domain and an HNH domain, wherein the endonuclease sequence is the sequence of a class 2, type II endonuclease. In some embodiments, the RuvC domain lacks nuclease activity. In some embodiments, the endonuclease comprises a nicking enzyme. In some embodiments, the class 2, type II endonuclease sequence has at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to any one of SEQ ID NOs: 70-78, 596, 597-598, 1120, 1122-1127, or 1647, or a variant thereof.
In some embodiments, when optimally aligned, the class 2, type II endonuclease comprises a mutation from aspartic acid to alanine at residue 9 relative to SEQ ID NO: 70; residue 13 relative to SEQ ID NO: 71, 72, or 74; residue 12 relative to SEQ ID NO: 73; residue 17 relative to SEQ ID NO: 75; residue 23 relative to SEQ ID NO: 76; or residue 10 relative to SEQ ID NO: 597.
In some aspects, the present disclosure provides a method of editing a cytosine residue in a cell to a thymine residue, the method comprising contacting any of the cytosine deaminase fusion polypeptides described herein with the cell. In some embodiments, the cell is a prokaryotic cell, eukaryotic cell, mammalian cell, primate cell, or human cell.
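A cytosine deaminase converts C to U, which is read as T after replication or repair; an adenosine deaminase converts A to inosine, which is read as G. The Python sketch below applies those net conversions to every target base inside an editing window of a protospacer. The window is a hypothetical parameter: this section does not define an editing window for the disclosed enzymes, and real editing is probabilistic rather than all-or-none. The function name and sequence are likewise hypothetical.

```python
# Net DNA-level outcome for each deaminase type: (base deaminated, base read after repair).
CONVERSIONS = {"cytosine": ("C", "T"), "adenosine": ("A", "G")}


def predict_edit(protospacer: str, window: tuple[int, int], deaminase: str) -> str:
    """Convert every target base inside the 1-based, inclusive `window`."""
    source, product = CONVERSIONS[deaminase]
    start, end = window
    edited = [
        product if start <= i <= end and base == source else base
        for i, base in enumerate(protospacer, start=1)
    ]
    return "".join(edited)


print(predict_edit("GACCTCAG", (2, 5), "cytosine"))   # GATTTCAG
print(predict_edit("GACCTCAG", (2, 5), "adenosine"))  # GGCCTCAG
```

Bases of the right type outside the window (here C6 and A7) are left unchanged, which is the point of modeling a window at all.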
In some aspects, the present disclosure provides an engineered nucleic acid editing polypeptide comprising: a plurality of domains derived from a class 2, type II endonuclease, wherein the domains comprise a RuvC-I domain, a REC domain, an HNH domain, a RuvC-III domain, and a WED domain; and a domain comprising a base editor sequence, wherein the base editor sequence is inserted: (a) within the RuvC-I domain; (b) within the REC domain; (c) within the HNH domain; (d) within the RuvC-III domain; (e) within the WED domain; (f) before the HNH domain; (g) before the RuvC-III domain; or (h) between the RuvC-III domain and the WED domain. In some embodiments, the class 2, type II endonuclease comprises a sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to any one of SEQ ID NOs or variants thereof. In some embodiments, the class 2, type II endonuclease comprises a sequence having at least 80% sequence identity to SEQ ID NO: 1647 or a variant thereof. In some embodiments, the base editor sequence comprises a deaminase sequence.
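Insertion options (a) through (h) amount to choosing a splice point relative to annotated domain boundaries. The Python sketch below computes such a point from a domain table and splices in an insert with an optional flanking linker. The domain coordinates, the midpoint rule used for "within", and the linker are hypothetical placeholders; the text gives neither the domain boundaries nor the exact insertion residues.

```python
# Hypothetical domain map entries: (name, start, end), 1-based inclusive residue numbers.
Domain = tuple[str, int, int]


def splice_index(domains: list[Domain], name: str, mode: str) -> int:
    """0-based index at which to insert: 'before', 'within' (midpoint), or 'after' a domain."""
    for domain_name, start, end in domains:
        if domain_name == name:
            if mode == "before":
                return start - 1
            if mode == "within":
                return (start - 1 + end) // 2
            if mode == "after":
                return end
            raise ValueError(f"unknown mode: {mode}")
    raise KeyError(name)


def insert_domain(host: str, insert: str, index: int, linker: str = "") -> str:
    """Splice `insert` into `host` at `index`, flanked on both sides by `linker`."""
    return host[:index] + linker + insert + linker + host[index:]


toy_domains = [("RuvC-I", 1, 4), ("REC", 5, 8), ("HNH", 9, 12)]
toy_host = "AAAARRRRHHHH"  # hypothetical host sequence matching toy_domains
idx = splice_index(toy_domains, "HNH", "before")
print(insert_domain(toy_host, "dd", idx))  # AAAARRRRddHHHH
```

When two domains are contiguous, option (h), "between the RuvC-III domain and the WED domain", corresponds to the "after" mode on RuvC-III in this sketch.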
In some embodiments, the deaminase sequence has at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to any one of SEQ ID NOs: 1-49, 444-447, 599-675, 744-835, 970-982, 50, 51, 385-443, or 448-475, or a variant thereof. In some embodiments, the deaminase sequence has at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to any one of SEQ ID NOs: 1-49, 444-447, 599-675, 744-835, or 970-982, or a variant thereof. In some embodiments, the deaminase sequence has at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to any one of SEQ ID NOs: 50, 51, 385-443, or 448-475, or a variant thereof. In some embodiments, the deaminase has at least 80% sequence identity to SEQ ID NO: 386 or a variant thereof. In some embodiments, when optimally aligned, the deaminase sequence comprises a substitution at one or any combination of the following residues of SEQ ID NO: 50 or MG68-4: T2, D7, E10, M13, W24, G32, K38, G45, G51, A63, E66, R75, C91, G93, H97, A107, E108, D109, P110, H124, A126, H129, F150, or S165.
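The SEQ ID NO ranges in this paragraph split the deaminase sequences into a cytosine deaminase group and an adenosine deaminase group. The small Python sketch below parses range lists written as in this text and reports which group a given SEQ ID NO falls into. The grouping comes directly from the two lists above; the helper names are hypothetical.

```python
def parse_ranges(text: str) -> set[int]:
    """Expand a list such as '1-49, 444-447, 50' into a set of integers."""
    numbers: set[int] = set()
    for part in text.split(","):
        part = part.strip()
        if "-" in part:
            low, high = (int(x) for x in part.split("-"))
            numbers.update(range(low, high + 1))  # ranges are inclusive
        elif part:
            numbers.add(int(part))
    return numbers


CYTOSINE = parse_ranges("1-49, 444-447, 599-675, 744-835, 970-982")
ADENOSINE = parse_ranges("50, 51, 385-443, 448-475")


def deaminase_group(seq_id: int) -> str:
    """Classify a SEQ ID NO using the two deaminase lists above."""
    if seq_id in CYTOSINE:
        return "cytosine deaminase"
    if seq_id in ADENOSINE:
        return "adenosine deaminase"
    return "not in either list"


print(deaminase_group(386))  # adenosine deaminase
print(deaminase_group(809))  # cytosine deaminase
```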
In some embodiments, the engineered nucleic acid editing polypeptide comprises a sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to any one of SEQ ID NOs: 1128-1160 or a variant thereof. In some embodiments, the engineered nucleic acid editing polypeptide comprises a sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to any one of SEQ ID NOs: 1137, 1140, 1142, 1143, 1146, 1149, or 1151-1158, or a variant thereof. In some embodiments, the engineered nucleic acid editing polypeptide comprises a sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to any one of SEQ ID NOs: 1139, 1152, or 1158, or a variant thereof.
In some aspects, the present disclosure provides a polypeptide having adenosine deaminase activity, comprising: a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs 50, 51, 385-443, 448-475 or a variant thereof, wherein, when optimally aligned, the polypeptide comprises, relative to SEQ ID NO: 386, a substitution of the wild-type residue at position 109 with a non-wild-type residue and a substitution at at least one other residue selected from any one or any combination of: 24, 37, 49, 52, 83, 85, 107, 110, 112, 120, 123, 124, 147, 148, 150, 156, 157, 158, 166, 167, or 129. In some embodiments, the sequence has at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to SEQ ID NO 386. In some embodiments, when optimally aligned, the polypeptide comprises, relative to SEQ ID NO: 386, the substitution 109N and at least one other substitution comprising any one or any combination of: 24R, 37L, 49A, 52L, 83S, 85F, 107V, 110S, 112R, 120N, 123N, 124Y, 147C, 148Y, 148R, 150Y, 156V, 157F, 158N, 166I, or 129N. In some embodiments, the polypeptide includes any of the substitutions depicted in FIG. 34B. In some embodiments, the polypeptide has at least 80% sequence identity to any one of SEQ ID NOS 1161-1183 or variants thereof.
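Substitutions in this section are written either with the wild-type residue (for example "W24G") or with only position and new residue (for example "109N"), numbered from residue 1 of the reference. A minimal sketch of applying such a list to a reference sequence is shown below; because SEQ ID NO: 386 is not reproduced here, the example uses a hypothetical toy sequence, and the function name is illustrative only.

```python
import re

# Matches "W24G" (wild-type residue given) or "109N" (wild-type omitted).
_SUBSTITUTION = re.compile(r"^([A-Z]?)(\d+)([A-Z])$")


def apply_substitutions(reference: str, substitutions: list[str]) -> str:
    """Return a copy of `reference` carrying the listed point substitutions.

    Positions are 1-based. When a wild-type letter is supplied it is checked
    against the reference residue at that position.
    """
    residues = list(reference)
    for sub in substitutions:
        match = _SUBSTITUTION.match(sub)
        if match is None:
            raise ValueError(f"unrecognized substitution: {sub!r}")
        wild_type, position, mutant = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the reference")
        if wild_type and residues[position - 1] != wild_type:
            raise ValueError(
                f"{sub}: reference has {residues[position - 1]} at {position}"
            )
        residues[position - 1] = mutant
    return "".join(residues)
```

With the toy reference "MKTAYIAK", applying ["K2R", "7N"] yields "MRTAYINK", while "A2R" is rejected because residue 2 is K.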
In some embodiments, the polypeptide has at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs 1170, 1179, or 1166, or variants thereof. In some embodiments, the polypeptide further comprises an endonuclease or a nicking enzyme. In some embodiments, the polypeptide comprises the endonuclease or the nicking enzyme, wherein the endonuclease or the nicking enzyme comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs 70-78, 596, 597-598, 1120, 1122-1127, or 1647, or a variant thereof. In some embodiments, the polypeptide comprises the nicking enzyme, wherein the nicking enzyme comprises a mutation from aspartic acid to alanine at: residue 9 relative to SEQ ID NO. 70, residue 13 relative to SEQ ID NO. 71, 72 or 74, residue 12 relative to SEQ ID NO. 73, residue 17 relative to SEQ ID NO. 75, residue 23 relative to SEQ ID NO. 76, or residue 10 relative to SEQ ID NO. 597, or any combination thereof.
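The nickase embodiments name one aspartate per reference endonuclease that is changed to alanine. A minimal sketch that records those positions and applies the D-to-A change is given below; it assumes the input is numbered like the corresponding reference SEQ ID NO, and because those sequences are not reproduced here it is exercised on a hypothetical toy sequence. The dictionary and function names are illustrative.

```python
# D-to-A nickase positions recited in the text, keyed by SEQ ID NO of the
# reference endonuclease; positions are 1-based.
NICKASE_D_TO_A = {70: 9, 71: 13, 72: 13, 73: 12, 74: 13, 75: 17, 76: 23, 597: 10}


def to_nickase(seq_id: int, sequence: str) -> str:
    """Introduce the D-to-A substitution listed for `seq_id` into `sequence`.

    `sequence` is assumed to share the numbering of the reference SEQ ID NO,
    so the residue at the recited position must be aspartic acid (D).
    """
    position = NICKASE_D_TO_A[seq_id]
    if len(sequence) < position or sequence[position - 1] != "D":
        raise ValueError(f"expected D at residue {position} for SEQ ID NO {seq_id}")
    return sequence[: position - 1] + "A" + sequence[position:]
```

Checking for D before substituting guards against applying a position to a sequence with a different numbering.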
In some aspects, the present disclosure provides a polypeptide having cytosine deaminase activity, the polypeptide comprising: a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs 1-49, 444-447, 599-675, 744-835, 970-982 or variants thereof; wherein the polypeptide comprises at least one of the changes described in Table 12C. In some embodiments, the polypeptide has at least one substitution of a wild-type amino acid with a non-wild-type amino acid, optionally relative to an APOBEC polypeptide, comprising any one or any combination of the following: W90A, W90F, W90H, W90Y, Y120F, Y120H, Y121F, Y121H, Y121Q, Y121A, Y121D, Y121W, H122Y, H122F, H122I, H122A, H122W, H122D, Y121T, R33A, R34A, R34K, H122A, R33A, R34A, R52A, N57G, H122A, E123A, E123Q, W127F, W127H, W127Q, W127A, W127D, R39A, K40A, H128A, N63G, R58A, H121F, H121Y, H121Q, H121A, H121D, H121W, R33A, K34A, H122A, H121A, R52A, P26R, P26A, N27R, N27A, W44A, W45A, K49G, S50G, R51G, R121A, I122A, N123A, Y88F, Y120F, P22R, P22A, K23A, K41R, K41A, E54A, E54A, E55A, K30A, K30R, M32A, M32K, Y117A, K118A, I119A, I119H, R120A, R121A, P46A, P46R, N29A, R27A, or N50G. In some embodiments, the polypeptide comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs 1208-1315 or variants thereof.
In some aspects, the present disclosure provides a polypeptide having cytosine deaminase activity, the polypeptide comprising: a cytosine deaminase sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity with any one of SEQ ID NOs 835, 1275, 668, 774, 818, 671, 667, 650, 827, 819, 823, 814, 813, 817, 628, 826, 1223, 834, 618, 621, 669, 833, 830, or variants thereof; and an endonuclease or a nicking enzyme. In some embodiments, the endonuclease or the nicking enzyme comprises a sequence having at least 80% identity to any one of SEQ ID NOs 70-78, 596, 597-598, 1120, 1122-1127, or 1647, or variants thereof. In some embodiments, the polypeptide comprises the nicking enzyme, wherein the nicking enzyme comprises a mutation from aspartic acid to alanine at: residue 9 relative to SEQ ID NO. 70, residue 13 relative to SEQ ID NO. 71, 72 or 74, residue 12 relative to SEQ ID NO. 73, residue 17 relative to SEQ ID NO. 75, residue 23 relative to SEQ ID NO. 76, or residue 10 relative to SEQ ID NO. 597, or any combination thereof. In some embodiments, the cytosine deaminase sequence has at least 80% sequence identity with any one of SEQ ID NOs 1275, 835 or 774, or a combination thereof.
In some aspects, the present disclosure provides a polypeptide having adenosine deaminase activity, comprising: a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs 50, 51, 385-443, 448-475, 1015-1098 or variants thereof, wherein said polypeptide comprises any combination of substitutions of wild-type residues to non-wild-type residues recited in Table 12D. In some embodiments, the polypeptide has at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOS 1556-1638 or variants thereof. In some embodiments, the polypeptide further comprises an endonuclease or a nicking enzyme. In some embodiments, the polypeptide comprises the endonuclease or the nicking enzyme, wherein the endonuclease or the nicking enzyme comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs 70-78, 596, 597-598, 1120, 1122-1127, or 1647, or variants thereof. In some embodiments, the polypeptide comprises the nicking enzyme, wherein the nicking enzyme comprises a mutation from aspartic acid to alanine at: residue 9 relative to SEQ ID NO. 70, residue 13 relative to SEQ ID NO. 71, 72 or 74, residue 12 relative to SEQ ID NO. 73, residue 17 relative to SEQ ID NO. 75, residue 23 relative to SEQ ID NO. 76, or residue 10 relative to SEQ ID NO. 597, or any combination thereof.
In some aspects, the present disclosure provides a polypeptide having adenosine deaminase activity, comprising: a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs 50, 51, 385-443, 448-475, 1015-1098 or variants thereof, wherein said polypeptide comprises any combination of substitutions of wild-type residues to non-wild-type residues listed in Table 13. In some embodiments, the sequence has at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to SEQ ID NO 386 or variants thereof. In some embodiments, the polypeptide further comprises an endonuclease or a nicking enzyme. In some embodiments, the polypeptide comprises the endonuclease or the nicking enzyme, wherein the endonuclease or the nicking enzyme comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs 70-78, 596, 597-598, 1120, 1122-1127, or 1647, or variants thereof. In some embodiments, the polypeptide comprises the nicking enzyme, wherein the nicking enzyme comprises a mutation from aspartic acid to alanine at: residue 9 relative to SEQ ID NO. 70, residue 13 relative to SEQ ID NO. 71, 72 or 74, residue 12 relative to SEQ ID NO. 73, residue 17 relative to SEQ ID NO. 75, residue 23 relative to SEQ ID NO. 76, or residue 10 relative to SEQ ID NO. 597, or any combination thereof.
In some aspects, the present disclosure provides a method of editing an APOA1 locus in a cell, the method comprising contacting the cell with: (a) an RNA-guided endonuclease; and (b) an engineered guide nucleic acid structure, wherein the engineered guide nucleic acid structure is configured to form a complex with the endonuclease and comprises a spacer sequence configured to hybridize to a region of the APOA1 locus, wherein the engineered guide nucleic acid structure comprises a targeting sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to at least 18, 19, 20, 21, 22, 23, 24, 25, or 26 consecutive nucleotides of any of SEQ ID NOs 1455-1478, or the reverse complement thereof. In some embodiments, the engineered guide nucleic acid structure has at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any of SEQ ID NOs 1431-1454. In some embodiments, the engineered guide nucleic acid structure comprises any of the nucleotide modifications listed in Table 13A. In some embodiments, the RNA-guided endonuclease is a class 2, type II endonuclease.
In some embodiments, the RNA guided endonuclease has at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs 70-78, 596, 597-598, 1120, 1122-1127, 1647, or a variant thereof.
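The targeting-sequence limitation refers to a run of at least 18 to 26 consecutive nucleotides of a listed SEQ ID NO or of its reverse complement. A minimal sketch of that window check is given below. It covers only exact matches (the 100% case); handling the lower identity thresholds would additionally require an alignment-based comparison. RNA uracil is treated as equivalent to DNA thymine, and the target and spacer strings in the usage are hypothetical, not the sequences of SEQ ID NOs 1455-1478.

```python
def _to_dna(seq: str) -> str:
    """Upper-case a nucleotide string and write uracil as thymine."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA or RNA string, returned as DNA."""
    return _to_dna(seq).translate(str.maketrans("ACGT", "TGCA"))[::-1]


def shares_consecutive_run(spacer: str, target: str, min_length: int = 18) -> bool:
    """True if some window of `min_length` consecutive spacer nucleotides occurs
    exactly in `target` or in its reverse complement."""
    spacer_dna = _to_dna(spacer)
    strands = (_to_dna(target), reverse_complement(target))
    for start in range(len(spacer_dna) - min_length + 1):
        window = spacer_dna[start:start + min_length]
        if any(window in strand for strand in strands):
            return True
    return False
```

Raising `min_length` from 18 toward 26 tightens the check in the same way the recited lengths do.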
In some aspects, the present disclosure provides a method of editing an ANGPTL3 locus in a cell, the method comprising contacting the cell with: (a) an RNA-guided endonuclease; and (b) an engineered guide nucleic acid structure, wherein the engineered guide nucleic acid structure is configured to form a complex with the endonuclease and comprises a spacer sequence configured to hybridize to a region of the ANGPTL3 locus, wherein the engineered guide nucleic acid structure comprises a targeting sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to at least 18, 19, 20, 21, 22, 23, 24, 25, or 26 consecutive nucleotides of any of SEQ ID NOs 1484-1488, or the reverse complement thereof. In some embodiments, the engineered guide nucleic acid structure has at least 80% identity to any one of SEQ ID NOs 1479-1483. In some embodiments, the engineered guide nucleic acid structure comprises any of the nucleotide modifications listed in Table 13A. In some embodiments, the RNA-guided endonuclease is a class 2, type II endonuclease. In some embodiments, the RNA-guided endonuclease has at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs 70-78, 596, 597-598, 1120, 1122-1127, or 1647, or a variant thereof.
In some aspects, the present disclosure provides a method of editing a TRAC locus in a cell, the method comprising contacting the cell with: (a) an RNA-guided endonuclease; and (b) an engineered guide nucleic acid structure, wherein the engineered guide nucleic acid structure is configured to form a complex with the endonuclease and comprises a spacer sequence configured to hybridize to a region of the TRAC locus, wherein the engineered guide nucleic acid structure comprises a targeting sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to at least 18, 19, 20, 21, 22, 23, 24, 25, or 26 consecutive nucleotides of any of SEQ ID NOs 1491-1492, or a complement thereof. In some embodiments, the engineered guide nucleic acid structure has at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any of SEQ ID NOs. In some embodiments, the engineered guide nucleic acid structure comprises any of the nucleotide modifications listed in Table 13A. In some embodiments, the RNA-guided endonuclease is a class 2, type II endonuclease.
In some embodiments, the RNA guided endonuclease has at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs 70-78, 596, 597-598, 1120, 1122-1127, 1647, or a variant thereof.
In some aspects, the disclosure provides an engineered adenosine base editor polypeptide, wherein the polypeptide comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs 1647-1653.
In some aspects, the present disclosure provides a method of deaminating a cytosine residue in a eukaryotic nucleic acid sequence in a cell, the method comprising: contacting the eukaryotic nucleic acid sequence with a polypeptide having cytosine deaminase activity, the polypeptide comprising an amino acid sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity with any one of SEQ ID NOs 1-49, 444-447, 599-675, 744-835, 970-982 or variants thereof. In some embodiments, the eukaryotic nucleic acid sequence is a mammalian, primate, or human nucleic acid sequence. In some embodiments, the cell is a mammalian cell, primate cell, or human cell. In some embodiments, the eukaryotic nucleic acid sequence comprises single-stranded DNA (ssDNA) or ribonucleic acid (RNA). In some embodiments, the polypeptide having cytosine deaminase activity comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity with any one of SEQ ID NOs 809-811, 819, 826, 752, 777, 823, 668-671, 675, 650, 752, 774, 777, 806, 812, 816, 817, 818, 825, 827, 832, 970-982 or variants thereof. In some embodiments, the polypeptide having cytosine deaminase activity comprises a sequence with at least 80% identity to any one of SEQ ID NOS 808, 810-811, 819, 826, 752, 777, or 823, or variants thereof. In some embodiments, the eukaryotic nucleic acid sequence comprises double-stranded DNA (dsDNA).
In some embodiments, the polypeptide having cytosine deaminase activity comprises a sequence with at least 80% identity to any one of SEQ ID NOS 810-811. In some embodiments, the polypeptide having cytosine deaminase activity further comprises a nucleic acid binding domain, an endonuclease, or a nicking enzyme. In some embodiments, the polypeptide having cytosine deaminase activity further comprises the endonuclease or the nicking enzyme, wherein the endonuclease or the nicking enzyme comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs or variants thereof. In some embodiments, the polypeptide having cytosine deaminase activity further comprises a nicking enzyme, wherein the nicking enzyme comprises a mutation from aspartic acid to alanine at: residue 9 relative to SEQ ID NO. 70, residue 13 relative to SEQ ID NO. 71, 72 or 74, residue 12 relative to SEQ ID NO. 73, residue 17 relative to SEQ ID NO. 75, residue 23 relative to SEQ ID NO. 76, or residue 10 relative to SEQ ID NO. 597, or any combination thereof. In some embodiments, the polypeptide having cytosine deaminase activity further comprises a uracil DNA glycosylase inhibitor sequence. In some embodiments, the uracil DNA glycosylase inhibitor comprises a sequence having at least 70%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOS: 52-56 or SEQ ID NO:67, or a variant thereof.
In some embodiments, the polypeptide having cytosine deaminase activity further comprises a FAM72A sequence. In some embodiments, the FAM72A sequence has at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to SEQ ID No. 1121 or a variant thereof.
In some aspects, the disclosure provides a method of deaminating a cytosine residue in a primate nucleic acid sequence in a cell, the method comprising: contacting the primate nucleic acid sequence with a polypeptide having cytosine deaminase activity, the polypeptide comprising a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity with any one of SEQ ID NOs 599-638, 660-675 or 828-835, or variants thereof. In some embodiments, the primate nucleic acid sequence comprises double-stranded DNA (dsDNA), single-stranded DNA (ssDNA), or ribonucleic acid (RNA). In some embodiments, the polypeptide having cytosine deaminase activity further comprises a nucleic acid binding domain, an endonuclease, or a nicking enzyme. In some embodiments, the polypeptide having cytosine deaminase activity further comprises the endonuclease or the nicking enzyme, wherein the endonuclease or the nicking enzyme comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs or variants thereof. In some embodiments, the polypeptide having cytosine deaminase activity further comprises a nicking enzyme, wherein the nicking enzyme comprises a mutation from aspartic acid to alanine at: residue 9 relative to SEQ ID NO. 70, residue 13 relative to SEQ ID NO. 71, 72 or 74, residue 12 relative to SEQ ID NO. 73, residue 17 relative to SEQ ID NO. 75, residue 23 relative to SEQ ID NO. 76, or residue 10 relative to SEQ ID NO. 597, or any combination thereof. In some embodiments, the polypeptide having cytosine deaminase activity further comprises a uracil DNA glycosylase inhibitor sequence. In some embodiments, the uracil DNA glycosylase inhibitor comprises a sequence having at least 70%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOS: 52-56 or SEQ ID NO:67, or a variant thereof. In some embodiments, the polypeptide having cytosine deaminase activity further comprises a FAM72A sequence. In some embodiments, the FAM72A sequence has at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to SEQ ID No. 1121 or a variant thereof.
In some aspects, the disclosure provides a nucleic acid comprising an engineered nucleic acid sequence optimized for expression in a mammalian organism, wherein the nucleic acid encodes a sequence having at least 70%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs 1-49, 444-447, 599-675, 744-835, 970-982, or variants thereof. In some embodiments, the nucleic acid encodes a sequence that has at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs 809-811, 819, 826, 752, 777, 823, 668-671, 675, 650, 752, 774, 777, 806, 812, 816, 817, 818, 825, 827, 832, 832, 970-982 or variants thereof.
In some aspects, the disclosure provides a vector comprising any of the nucleic acids described herein. In some embodiments, the vector is a non-viral or viral vector. In some embodiments, the vector is a plasmid, a minicircle, or a plasmid vector. In some embodiments, the viral vector is an AAV vector.
In some aspects, the present disclosure provides a fusion polypeptide comprising: (a) a domain having cytosine deaminase activity comprising a sequence having at least 80% identity to any one of SEQ ID NOs 1-49, 444-447, 599-675, 744-835, 970-982 or variants thereof; and (b) a nucleic acid binding domain, an endonuclease domain, or a nicking enzyme domain. In some embodiments, the domain having cytosine deaminase activity comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs 809-811, 819, 826, 752, 777, 823, 668-671, 675, 650, 752, 774, 777, 806, 812, 816, 817, 818, 825, 827, 832, 832, 970-982 or variants thereof. In some embodiments, the domain having cytosine deaminase activity comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity with any one of SEQ ID NOs 809-811, 819, 826, 752, 777, 823, or variants thereof.
In some embodiments, the fusion polypeptide comprises the endonuclease domain or the nicking enzyme domain, wherein the endonuclease domain or the nicking enzyme domain comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOS: 70-78, 596, 597-598, 1120, or 1122-1127, or variants thereof. In some embodiments, the fusion polypeptide comprises the nicking enzyme domain, wherein the nicking enzyme domain comprises a mutation from aspartic acid to alanine at: residue 9 relative to SEQ ID NO. 70, residue 13 relative to SEQ ID NO. 71, 72 or 74, residue 12 relative to SEQ ID NO. 73, residue 17 relative to SEQ ID NO. 75, residue 23 relative to SEQ ID NO. 76, or residue 10 relative to SEQ ID NO. 597, or any combination thereof. In some embodiments, the fusion polypeptide comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs 877-916 or 968-969 or variants thereof.
In some aspects, the present disclosure provides a system comprising: (a) any of the fusion polypeptides described herein; and (b) an engineered guide polynucleotide configured to form a complex with the endonuclease domain, the engineered guide polynucleotide comprising: (i) a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and (ii) a ribonucleic acid sequence configured to bind to the endonuclease domain. In some embodiments, the engineered guide polynucleotide further comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to the non-degenerate nucleotides of any one of SEQ ID NOs 88-96, 917-931, 963-967, or 1099-1105, or a variant thereof.
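Identity to the non-degenerate nucleotides of a guide sequence can be read as counting only positions where the reference carries a defined base. A minimal sketch under that reading is shown below: positions holding an IUPAC ambiguity code (such as N, R or Y) in the reference are skipped, T and U are treated as equivalent, and the two strings are assumed to be pre-aligned to equal length. The function name and example strings are hypothetical.

```python
def identity_over_defined_positions(query: str, reference: str) -> float:
    """Percent identity counted only where the reference has A, C, G or T/U.

    Reference positions holding IUPAC ambiguity codes (N, R, Y, ...) are
    excluded from both numerator and denominator.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be pre-aligned to equal length")
    q = query.upper().replace("U", "T")
    r = reference.upper().replace("U", "T")
    compared = matched = 0
    for q_base, r_base in zip(q, r):
        if r_base in "ACGT":
            compared += 1
            matched += q_base == r_base
    if compared == 0:
        raise ValueError("reference has no non-degenerate positions")
    return 100.0 * matched / compared
```

With the reference "ACGUNNNNACGU" and the query "ACGUGGAAACGA", the four N positions are ignored and 7 of the remaining 8 positions match, giving 87.5%.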
In some aspects, the present disclosure provides a polypeptide having adenosine deaminase activity, comprising: a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs 50, 51, 385-443, 448-475 or a variant thereof, wherein, when optimally aligned, the polypeptide comprises a substitution at one or more of the following residues of SEQ ID NO: 50, or any combination thereof: T2, D7, E10, M13, W24, G32, K38, G45, G51, A63, E66, R75, C91, G93, H97, A107, E108, D109, P110, H124, A126, H129, F150 or S165. In some embodiments, when optimally aligned, the substitution comprises T2X1, D7X1, E10X1, M13X4, W24X1, G32X1, K38X2, G45X2, G51X5, A63X7, E66X5, E66X2, R75H, C91R, G93X6, H97X6, H97X5, A107X5, E108X2, D109N, P110H, H124X6, A126X2, H129R, H129N, F150P, F150S, or S165X5 relative to SEQ ID NO: 50, or any combination thereof, wherein X1 is A or G; X2 is D or E; X3 is N or Q; X4 is R or K; X5 is I, L, M or V; X6 is F, Y or W; and X7 is S or T. In some embodiments, the polypeptide comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs 836-860 or variants thereof. In some embodiments, the polypeptide comprises any of SEQ ID NOs 839, 841, 843, 844, 847, 848, 849, 850, 851, 852, or 859. In some embodiments, when optimally aligned, the substitutions comprise W24G, G51V, E108D, P110H, F150P, D7G, E10G, or H129N, or any combination thereof, relative to SEQ ID NO: 50.
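The substitution patterns for SEQ ID NO: 50 use class codes X1 to X7 for sets of similar residues. A minimal sketch of checking whether a concrete substitution (for example "G51V") is an instance of a recited pattern (for example "G51X5") is given below. The class table is copied from the definitions above; the function and constant names are hypothetical.

```python
import re

# Residue classes as defined for the substitution patterns of SEQ ID NO: 50.
RESIDUE_CLASSES = {
    "X1": set("AG"),
    "X2": set("DE"),
    "X3": set("NQ"),
    "X4": set("RK"),
    "X5": set("ILMV"),
    "X6": set("FYW"),
    "X7": set("ST"),
}

_PATTERN = re.compile(r"^([A-Z])(\d+)(X\d|[A-Z])$")
_CONCRETE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def substitution_matches(pattern: str, substitution: str) -> bool:
    """True if a concrete substitution such as 'G51V' is an instance of a
    pattern such as 'G51X5' (class code) or 'R75H' (single residue)."""
    p = _PATTERN.match(pattern)
    s = _CONCRETE.match(substitution)
    if p is None or s is None:
        raise ValueError("unrecognized substitution notation")
    if p.group(1) != s.group(1) or p.group(2) != s.group(2):
        return False  # different wild-type residue or position
    allowed = RESIDUE_CLASSES.get(p.group(3), {p.group(3)})
    return s.group(3) in allowed
```

Used this way, each of the specific substitutions W24G, G51V, E108D, D7G and E10G falls within its corresponding class pattern (W24X1, G51X5, E108X2, D7X1 and E10X1).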
In some embodiments, the polypeptide further comprises a nucleic acid binding domain, an endonuclease domain, or a nicking enzyme domain. In some embodiments, the polypeptide comprises the endonuclease domain or the nicking enzyme domain, wherein the endonuclease domain or the nicking enzyme domain comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOS: 70-78, 596, 597-598, 1120, or 1122-1127, or variants thereof. In some embodiments, the polypeptide comprises the nicking enzyme domain, wherein the nicking enzyme domain comprises a mutation from aspartic acid to alanine at: residue 9 relative to SEQ ID NO. 70, residue 13 relative to SEQ ID NO. 71, 72 or 74, residue 12 relative to SEQ ID NO. 73, residue 17 relative to SEQ ID NO. 75, residue 23 relative to SEQ ID NO. 76, or residue 10 relative to SEQ ID NO. 597, or any combination thereof.
In some aspects, the present disclosure provides a system comprising: (a) any of the polypeptides described herein for use in base editor fusions (e.g., endonuclease-deaminase fusions); and (b) an engineered guide polynucleotide configured to form a complex with the endonuclease domain, the engineered guide polynucleotide comprising: (i) a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and (ii) a ribonucleic acid sequence configured to bind to the endonuclease domain. In some embodiments, the engineered guide polynucleotide further comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to the non-degenerate nucleotides of any one of SEQ ID NOs 88-96, 917-931, 963-967, or 1099-1105.
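One reading of "sequence identity to the non-degenerate nucleotides" is that identity is scored only at positions where the reference carries a definite base (A, C, G, or T/U) rather than an IUPAC ambiguity code such as N. The Python sketch below implements that reading under stated assumptions: both sequences are already aligned to equal length without gaps, U and T are treated as equivalent, and the function name is hypothetical.

```python
# IUPAC ambiguity codes; reference positions carrying these are not scored.
DEGENERATE = set("NRYSWKMBDHV")


def identity_to_non_degenerate(query: str, reference: str) -> float:
    """Percent identity over the reference's non-degenerate positions.

    Assumes both sequences are pre-aligned to equal length with no gaps.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be pre-aligned to equal length")
    q = query.upper().replace("U", "T")
    r = reference.upper().replace("U", "T")
    scored = [(a, b) for a, b in zip(q, r) if b not in DEGENERATE]
    if not scored:
        raise ValueError("reference has no non-degenerate positions")
    matches = sum(a == b for a, b in scored)
    return 100.0 * matches / len(scored)


print(identity_to_non_degenerate("GUUUCAGA", "GTTTNNGA"))  # 100.0
```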
In some aspects, the present disclosure provides a method of deaminating a cytosine residue in a cell, the method comprising introducing into the cell: (a) a vector encoding a polypeptide having cytosine deaminase activity; and (b) a vector encoding a FAM72A protein. In some embodiments, the vector encoding the FAM72A protein comprises a sequence having at least 80% identity to SEQ ID NO. 1115 or a sequence having at least 80% identity to SEQ ID NO. 1121. In some embodiments, the polypeptide having cytosine deaminase activity comprises a sequence having at least 80% identity to any one of SEQ ID NOs 1-49, 444-447, 599-675, 744-835, or 970-982, or a variant thereof. In some embodiments, the polypeptide having cytosine deaminase activity further comprises a nucleic acid binding domain, an endonuclease domain, or a nicking enzyme domain. In some embodiments, the polypeptide having cytosine deaminase activity comprises the endonuclease domain or the nicking enzyme domain, wherein the endonuclease domain or the nicking enzyme domain comprises a sequence having at least 80% identity to any one of SEQ ID NOs 70-78, 596, 597-598, 1120, or 1122-1127, or a variant thereof. In some embodiments, the polypeptide having cytosine deaminase activity comprises the nicking enzyme domain, wherein the nicking enzyme domain comprises a mutation from aspartic acid to alanine at: residue 9 relative to SEQ ID NO. 70; residue 13 relative to SEQ ID NO. 71, 72, or 74; residue 12 relative to SEQ ID NO. 73; residue 17 relative to SEQ ID NO. 75; residue 23 relative to SEQ ID NO. 76; residue 10 relative to SEQ ID NO. 597; or any combination thereof.
In some aspects, the present disclosure provides an engineered nucleic acid editing system comprising: an endonuclease comprising a RuvC domain and an HNH domain, wherein the endonuclease is derived from an uncultured microorganism, wherein the endonuclease is a class 2, type II endonuclease, and wherein the endonuclease is configured to lack nuclease activity; a base editor coupled to the endonuclease; and an engineered guide ribonucleic acid structure configured to form a complex with the endonuclease, the engineered guide ribonucleic acid structure comprising: a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and a ribonucleic acid sequence configured to bind to the endonuclease. In some embodiments, the RuvC domain lacks nuclease activity. In some embodiments, the class 2, type II endonuclease comprises a nicking enzyme mutation. In some embodiments, when optimally aligned, the class 2, type II endonuclease comprises a mutation from aspartic acid to alanine at: residue 9 relative to SEQ ID NO. 70; residue 13 relative to SEQ ID NO. 71, 72, or 74; residue 12 relative to SEQ ID NO. 73; residue 17 relative to SEQ ID NO. 75; residue 23 relative to SEQ ID NO. 76; or residue 10 relative to SEQ ID NO. 597. In some embodiments, when optimally aligned, the endonuclease comprises a mutation from aspartic acid to alanine at: residue 9 relative to SEQ ID NO. 70; residue 13 relative to SEQ ID NO. 72; or residue 17 relative to SEQ ID NO. 75. In some embodiments, the endonuclease comprises a sequence having at least 95% sequence identity to any one of SEQ ID NOS: 70-78 or 597, or a variant thereof.
In some aspects, the present disclosure provides an engineered nucleic acid editing system comprising: an endonuclease having at least 95% sequence identity to any one of SEQ ID NOs 70-78 or 597, or a variant thereof; a base editor coupled to the endonuclease; and an engineered guide ribonucleic acid structure configured to form a complex with the endonuclease, the engineered guide ribonucleic acid structure comprising: a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and a ribonucleic acid sequence configured to bind to the endonuclease. In some aspects, the present disclosure provides an engineered nucleic acid editing system comprising: an endonuclease configured to bind to a Protospacer Adjacent Motif (PAM) sequence comprising any one of SEQ ID NOs 360-368 or 598, or a variant thereof, wherein the endonuclease is a class 2, type II endonuclease, and wherein the endonuclease is configured to lack nuclease activity; a base editor coupled to the endonuclease; and an engineered guide ribonucleic acid structure configured to form a complex with the endonuclease, the engineered guide ribonucleic acid structure comprising: a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and a ribonucleic acid sequence configured to bind to the endonuclease. In some embodiments, the endonuclease comprises a nicking enzyme mutation. In some embodiments, the endonuclease is configured to cleave one strand of a double-stranded target deoxyribonucleic acid. In some embodiments, when optimally aligned, the class 2, type II endonuclease comprises a mutation from aspartic acid to alanine at: residue 9 relative to SEQ ID NO. 70; residue 13 relative to SEQ ID NO. 71, 72, or 74; residue 12 relative to SEQ ID NO. 73; residue 17 relative to SEQ ID NO. 75; residue 23 relative to SEQ ID NO. 76; or residue 10 relative to SEQ ID NO. 597.
In some embodiments, the base editor comprises a sequence having at least 70%, 80%, 90%, or 95% identity to any one of SEQ ID NOs 1-51, 57-66, 385-443, 444-475, 594-595, or 599-675, or a variant thereof. In some embodiments, the base editor comprises a sequence having at least 70%, 80%, 90%, or 95% identity to any one of SEQ ID NOs 50-51 or 385-390. In some embodiments, the RuvC domain lacks nuclease activity. In some embodiments, the endonuclease is derived from an uncultured microorganism. In some embodiments, the endonuclease has less than 80% identity to a Cas9 endonuclease. In some embodiments, the endonuclease further comprises an HNH domain. In some embodiments, the engineered guide ribonucleic acid structure comprises a sequence having at least 80% sequence identity to the non-degenerate nucleotides of any one of SEQ ID NOS 88-96, 488-489, or 679-680, or a variant thereof. In some aspects, the present disclosure provides an engineered nucleic acid editing system comprising: an engineered guide ribonucleic acid structure, the engineered guide ribonucleic acid structure comprising: a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and a ribonucleic acid sequence configured to bind to an endonuclease, wherein the engineered guide ribonucleic acid structure comprises a sequence having at least 80% sequence identity to the non-degenerate nucleotides of any one of SEQ ID NOs 88-96, 488-489, or 679-680, or a variant thereof; a class 2, type II endonuclease configured to bind to the engineered guide ribonucleic acid structure; and a base editor coupled to the endonuclease. In some embodiments, the base editor comprises a sequence having at least 70%, 80%, 90%, or 95% identity to any one of SEQ ID NOs 50-51 or 385-390. In some embodiments, the endonuclease is configured to bind to a Protospacer Adjacent Motif (PAM) sequence selected from the group consisting of SEQ ID NOS: 360-368 or 598.
In some embodiments, the base editor comprises a sequence having at least 70%, 80%, 90%, or 95% identity to any one of SEQ ID NOs 1-51, 57-66, 385-443, 444-475, 594-595, or 599-675, or a variant thereof. In some embodiments, the base editor is an adenine deaminase. In some embodiments, the adenine deaminase comprises a sequence having at least 70%, 80%, 90%, or 95% identity to any one of SEQ ID NOs 50-51, 57, 385-443, 448-475, or 595, or a variant thereof. In some embodiments, the base editor is a cytosine deaminase. In some embodiments, the cytosine deaminase comprises a sequence having at least 70%, 80%, 90%, or 95% identity to any one of SEQ ID NOs 1-49, 444-447, 594, or 58-66, or a variant thereof. In some embodiments, the system further comprises a uracil DNA glycosylase inhibitor coupled to the endonuclease or the base editor. In some embodiments, the uracil DNA glycosylase inhibitor comprises a sequence having at least 70%, 80%, 90%, or 95% identity to any one of SEQ ID NOs 52-56 or SEQ ID NO 67. In some embodiments, the engineered guide ribonucleic acid structure comprises at least two ribonucleic acid polynucleotides. In some embodiments, the engineered guide ribonucleic acid structure comprises one ribonucleic acid polynucleotide comprising the guide ribonucleic acid sequence and a tracr ribonucleic acid sequence. In some embodiments, the guide ribonucleic acid sequence is complementary to a prokaryotic, bacterial, archaeal, eukaryotic, fungal, plant, mammalian, or human genomic sequence. In some embodiments, the guide ribonucleic acid sequence is 15-24 nucleotides in length. In some embodiments, the endonuclease comprises one or more nuclear localization sequences (NLSs) near the N-terminus or C-terminus of the endonuclease. In some embodiments, the NLS comprises a sequence having at least 90% identity to a sequence selected from SEQ ID NOS 369-384 or variants thereof.
In some embodiments, the endonuclease is covalently coupled to the base editor, either directly or through a linker. In some embodiments, when optimally aligned, the endonuclease comprises a mutation from aspartic acid to alanine at: residue 9 relative to SEQ ID NO. 70; residue 13 relative to SEQ ID NO. 71, 72, or 74; residue 12 relative to SEQ ID NO. 73 or 78; residue 17 relative to SEQ ID NO. 75; residue 23 relative to SEQ ID NO. 76; residue 8 relative to SEQ ID NO. 77; or residue 10 relative to SEQ ID NO. 597. In some embodiments, when optimally aligned, the endonuclease comprises a mutation from aspartic acid to alanine at: residue 9 relative to SEQ ID NO. 70; residue 13 relative to SEQ ID NO. 72; or residue 17 relative to SEQ ID NO. 75. In some embodiments, the polypeptide comprises the endonuclease and the base editor. In some embodiments, the endonuclease is configured to cleave one strand of a double-stranded target deoxyribonucleic acid. In some embodiments, the system further comprises a source of Mg2+. In some embodiments: (a) the endonuclease comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to any one of SEQ ID NOs 70, 71, 73, 74, 76, 78, 77, or 78, or a variant thereof; (b) the guide RNA structure comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to the non-degenerate nucleotides of any one of SEQ ID NOs 88, 89, 91, 92, 94, 96, 95, or 488; (c) the endonuclease is configured to bind a PAM comprising any one of SEQ ID NOs 360, 361, 363, 365, 367, or 368; or (d) the base editor comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO 58 or 595, or a variant thereof.
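Covalent coupling of the endonuclease to the base editor "directly or through a linker", together with optional NLSs and a uracil DNA glycosylase inhibitor, describes a modular fusion. A minimal Python sketch of how such a fusion's amino acid string could be concatenated follows; the domain order, the SGGS linker, and the placeholder domain strings are hypothetical, and the SV40 NLS (PKKKRKV) is a commonly used example that is not asserted to be any of SEQ ID NOs 369-384.

```python
# Commonly used SV40 large T antigen NLS; an example only, not asserted to
# correspond to any of SEQ ID NOs 369-384.
SV40_NLS = "PKKKRKV"


def assemble_fusion(
    domains: list[str], linker: str = "", n_nls: str = "", c_nls: str = ""
) -> str:
    """Concatenate domains N- to C-terminally into one polypeptide string.

    An empty linker models direct coupling; a non-empty linker is placed
    between every pair of adjacent domains. Optional NLSs cap either terminus.
    """
    return n_nls + linker.join(domains) + c_nls


# Hypothetical placeholder strings stand in for the deaminase, endonuclease and UGI.
fusion = assemble_fusion(["DEAMINASE", "NICKASE", "UGI"], linker="SGGS", c_nls=SV40_NLS)
print(fusion)  # DEAMINASESGGSNICKASESGGSUGIPKKKRKV
```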
In some embodiments: (a) the endonuclease comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to any one of SEQ ID NOs 70, 71, or 78, or a variant thereof; (b) the guide RNA structure comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to the non-degenerate nucleotides of at least one of SEQ ID NOS 88, 89, or 96; (c) the endonuclease is configured to bind a PAM comprising any one of SEQ ID NOs 360, 362, or 368; or (d) the base editor comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO 594 or a variant thereof. In some embodiments, the sequence identity is determined by the BLASTP, CLUSTALW, MUSCLE, MAFFT, or Smith-Waterman homology search algorithm. In some embodiments, the sequence identity is determined using the BLASTP homology search algorithm with a word length (W) of 3, an expectation value (E) of 10, and the BLOSUM62 scoring matrix with gap costs of 11 for existence and 1 for extension, together with conditional compositional score matrix adjustment. In some embodiments, the endonuclease is configured to be catalytically dead. In some embodiments, the endonuclease is configured to cleave one strand of a double-stranded target deoxyribonucleic acid.
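The BLASTP settings recited above map onto NCBI BLAST+ command-line options. The Python sketch below assembles the argument list and optionally runs it when `blastp` is installed; the option names follow the BLAST+ `blastp` help text, where `-comp_based_stats 2` selects composition-based score adjustment conditioned on sequence properties, but they should be verified against the installed version. File names are placeholders, and the reported `pident` covers only the aligned region, not necessarily the full length of either sequence.

```python
import shutil
import subprocess


def blastp_identity_args(query_fasta: str, subject_fasta: str) -> list[str]:
    """Build a blastp command using the parameters recited in the text."""
    return [
        "blastp",
        "-query", query_fasta,
        "-subject", subject_fasta,
        "-word_size", "3",          # W = 3
        "-evalue", "10",            # E = 10
        "-matrix", "BLOSUM62",
        "-gapopen", "11",           # gap existence cost
        "-gapextend", "1",          # gap extension cost
        "-comp_based_stats", "2",   # conditional compositional score matrix adjustment
        # pident is reported over the aligned region only.
        "-outfmt", "6 qseqid sseqid pident length",
    ]


def run_blastp(query_fasta: str, subject_fasta: str) -> str:
    """Run blastp if BLAST+ is installed and return its tabular output."""
    if shutil.which("blastp") is None:
        raise RuntimeError("NCBI BLAST+ 'blastp' was not found on PATH")
    result = subprocess.run(
        blastp_identity_args(query_fasta, subject_fasta),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```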
In some aspects, the disclosure provides a nucleic acid comprising an engineered nucleic acid sequence optimized for expression in an organism, wherein the nucleic acid encodes a class 2, type II endonuclease coupled to a base editor, and wherein the endonuclease is derived from an uncultured microorganism.
In some aspects, the disclosure provides a nucleic acid comprising an engineered nucleic acid sequence optimized for expression in an organism, wherein the nucleic acid encodes an endonuclease having at least 70% sequence identity to any one of SEQ ID NOS: 70-78 coupled to a base editor. In some embodiments, the nucleic acid comprises a sequence encoding one or more nuclear localization sequences (NLSs) near the N-terminus or C-terminus of the endonuclease. In some embodiments, the NLS comprises a sequence having at least 90% identity to a sequence selected from SEQ ID NOS 369-384 or variants thereof. In some embodiments, the organism is a prokaryote, bacterium, eukaryote, fungus, plant, mammal, rodent, or human.
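Codon optimization for a host organism changes the DNA sequence without changing the encoded protein. A simple check that an optimized open reading frame is a synonymous recoding of the original is to translate both with the standard genetic code (NCBI translation table 1) and compare. The Python sketch below assumes a host using the standard code; organisms or organelles with alternative codes would need a different table, and the function names are hypothetical.

```python
# NCBI translation table 1 (standard code); codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(orf: str) -> str:
    """Translate an open reading frame; '*' marks stop codons."""
    orf = orf.upper().replace("U", "T")
    if len(orf) % 3:
        raise ValueError("ORF length is not a multiple of 3")
    return "".join(CODON_TABLE[orf[n:n + 3]] for n in range(0, len(orf), 3))


def is_synonymous_recoding(original: str, optimized: str) -> bool:
    """True if the optimized ORF encodes exactly the same protein."""
    return translate(original) == translate(optimized)


print(translate("ATGAAATAA"))                            # MK*
print(is_synonymous_recoding("ATGAAATAA", "ATGAAGTAA"))  # True (AAA and AAG both encode K)
```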
In some aspects, the present disclosure provides a vector comprising a nucleic acid sequence encoding a class 2, type II endonuclease coupled to a base editor, wherein the endonuclease is derived from an uncultured microorganism.
In some aspects, the present disclosure provides a vector comprising a nucleic acid of any of the aspects or embodiments described herein. In some embodiments, the vector further comprises a nucleic acid encoding an engineered guide ribonucleic acid structure configured to form a complex with the endonuclease, the engineered guide ribonucleic acid structure comprising: a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and a ribonucleic acid sequence configured to bind to the endonuclease. In some embodiments, the vector is a plasmid, a minicircle, CELiD, an adeno-associated virus (AAV) derived virion, or a lentivirus.
In some aspects, the present disclosure provides a cell comprising a vector of any of the aspects or embodiments described herein.
In some aspects, the present disclosure provides a method of preparing an endonuclease comprising culturing a cell of any of the aspects or embodiments described herein.
In some aspects, the present disclosure provides a method for modifying a double-stranded deoxyribonucleic acid polynucleotide, the method comprising contacting the double-stranded deoxyribonucleic acid polynucleotide with a complex comprising: an endonuclease comprising a RuvC domain and an HNH domain, wherein the endonuclease is derived from an uncultured microorganism, wherein the endonuclease is a class 2, type II endonuclease, and wherein the endonuclease is configured to lack nuclease activity; a base editor coupled to the endonuclease; and an engineered guide ribonucleic acid structure configured to bind to the endonuclease and the double-stranded deoxyribonucleic acid polynucleotide; wherein the double-stranded deoxyribonucleic acid polynucleotide comprises a Protospacer Adjacent Motif (PAM). In some embodiments, the endonuclease comprising the RuvC domain and the HNH domain is covalently coupled to the base editor, either directly or through a linker. In some embodiments, the endonuclease comprising the RuvC domain and the HNH domain comprises a sequence having at least 95% sequence identity to any one of SEQ ID NOs 70-78 or 597, or a variant thereof.
In some aspects, the present disclosure provides a method for modifying a double-stranded deoxyribonucleic acid polynucleotide, the method comprising contacting the double-stranded deoxyribonucleic acid polynucleotide with a complex comprising: a class 2, type II endonuclease; a base editor coupled to the endonuclease; and an engineered guide ribonucleic acid structure configured to bind to the endonuclease and the double-stranded deoxyribonucleic acid polynucleotide; wherein the double-stranded deoxyribonucleic acid polynucleotide comprises a Protospacer Adjacent Motif (PAM); and wherein the PAM comprises a sequence selected from the group consisting of SEQ ID NOS: 360-368 or 598. In some embodiments, the class 2, type II endonuclease is coupled to the base editor covalently or through a linker. In some embodiments, the base editor comprises a sequence having at least 70%, at least 80%, at least 90%, or at least 95% identity to a sequence selected from the group consisting of SEQ ID NOS: 1-51, 57-66, 385-443, 444-475, 594-595, or 599-675, or variants thereof. In some embodiments, the base editor comprises an adenine deaminase; the double-stranded deoxyribonucleic acid polynucleotide comprises adenine; and modifying the double-stranded deoxyribonucleic acid polynucleotide comprises converting the adenine to guanine. In some embodiments, the adenine deaminase comprises a sequence having at least 70%, 80%, 90%, or 95% sequence identity to any one of SEQ ID NOs 50-51, 57, 385-443, 448-475, or 595, or a variant thereof. In some embodiments, the base editor comprises a cytosine deaminase; the double-stranded deoxyribonucleic acid polynucleotide comprises cytosine; and modifying the double-stranded deoxyribonucleic acid polynucleotide comprises converting the cytosine to uracil. In some embodiments, the cytosine deaminase comprises a sequence having at least 70%, 80%, 90%, or 95% sequence identity to any one of SEQ ID NOs 1-49, 444-447, 594, or 58-66, or a variant thereof.
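The two conversions recited above (adenine to guanine for an adenine deaminase, cytosine to uracil for a cytosine deaminase) can be illustrated on a protospacer string. This Python sketch is a simplification: it assumes every target base inside an editing window is converted, and the default window of positions 4-8 (counted from the PAM-distal end) is borrowed from what is often reported for SpCas9-based base editors, not determined for the endonucleases described here. Names are hypothetical.

```python
# Target base and product for each editor class, as recited in the text.
CONVERSIONS = {
    "adenine_deaminase": ("A", "G"),
    "cytosine_deaminase": ("C", "U"),  # uracil; read as T after replication or repair
}


def predicted_edit(protospacer: str, editor: str, window: tuple[int, int] = (4, 8)) -> str:
    """Convert every target base inside the editing window.

    Positions are 1-based, counted from the PAM-distal (5') end of the
    protospacer. The default window is an assumption, not a measured value.
    """
    source, product = CONVERSIONS[editor]
    start, end = window
    bases = list(protospacer.upper())
    for i in range(start - 1, min(end, len(bases))):
        if bases[i] == source:
            bases[i] = product
    return "".join(bases)


print(predicted_edit("GACTACAGAAGTACAACTGG", "adenine_deaminase"))  # GACTGCGGAAGTACAACTGG
```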
In some embodiments, the complex further comprises a uracil DNA glycosylase inhibitor coupled to the endonuclease or the base editor. In some embodiments, the uracil DNA glycosylase inhibitor comprises a sequence having at least 70%, 80%, 90%, or 95% identity to any one of SEQ ID NOs 52-56 or SEQ ID NO 67, or a variant thereof. In some embodiments, the double-stranded deoxyribonucleic acid polynucleotide comprises a first strand comprising a sequence complementary to the sequence of the engineered guide ribonucleic acid structure; and a second strand comprising the PAM. In some embodiments, the PAM is immediately adjacent to the 3' end of the sequence complementary to the sequence of the engineered guide ribonucleic acid structure. In some embodiments, the class 2, type II endonuclease is not a Cas9, Cas14, Cas12a, Cas12b, Cas12c, Cas12d, Cas12e, Cas13a, Cas13b, Cas13c, or Cas13d endonuclease. In some embodiments, the class 2, type II endonuclease is derived from an uncultured microorganism. In some embodiments, the double-stranded deoxyribonucleic acid polynucleotide is a eukaryotic, plant, fungal, mammalian, rodent, or human double-stranded deoxyribonucleic acid polynucleotide.
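The PAM geometry described above can be searched computationally: on the PAM-bearing strand, a protospacer matching the guide sequence is followed immediately (3') by the PAM. The Python sketch below scans one strand for such sites using IUPAC-coded PAMs; because SEQ ID NOs 360-368 and 598 are not reproduced here, the example uses NGG (the SpCas9 PAM) purely as a placeholder, and the 15-24 nucleotide bound on spacer length mirrors the guide lengths mentioned in some embodiments. Run it on the reverse complement to cover the other strand.

```python
import re

# IUPAC nucleotide codes as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_protospacers(strand: str, pam: str, spacer_len: int = 20) -> list[tuple[int, str, str]]:
    """Return (start, protospacer, PAM) for each site on one strand, read 5'->3'.

    The PAM must lie immediately 3' of the protospacer. A zero-width
    lookahead is used so that overlapping sites are all reported.
    """
    if not 15 <= spacer_len <= 24:
        raise ValueError("spacer length outside the 15-24 nt range")
    pam_regex = "".join(IUPAC[base] for base in pam.upper())
    pattern = re.compile(f"(?=([ACGT]{{{spacer_len}}})({pam_regex}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(strand.upper())]


site = "AAAA" + "ACGTACGTACGTACGTACGT" + "TGG" + "AAAA"
print(find_protospacers(site, "NGG"))  # [(4, 'ACGTACGTACGTACGTACGT', 'TGG')]
```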
In some aspects, the present disclosure provides a method of modifying a target nucleic acid locus, the method comprising delivering to the target nucleic acid locus the engineered nucleic acid editing system of any of the aspects or embodiments described herein, wherein the endonuclease is configured to form a complex with the engineered guide ribonucleic acid structure, and wherein the complex is configured such that, upon binding of the complex to the target nucleic acid locus, the complex modifies a nucleotide of the target nucleic acid locus. In some embodiments, the engineered nucleic acid editing system comprises an adenine deaminase, the nucleotide is adenine, and modifying the target nucleic acid locus comprises converting the adenine to guanine. In some embodiments, the engineered nucleic acid editing system comprises a cytidine deaminase and a uracil DNA glycosylase inhibitor, the nucleotide is cytosine, and modifying the target nucleic acid locus comprises converting the cytosine to uracil. In some embodiments, the target nucleic acid locus comprises genomic DNA, viral DNA, or bacterial DNA. In some embodiments, the target nucleic acid locus is in vitro. In some embodiments, the target nucleic acid locus is within a cell. In some embodiments, the cell is a prokaryotic cell, a bacterial cell, a eukaryotic cell, a fungal cell, a plant cell, an animal cell, a mammalian cell, a rodent cell, a primate cell, or a human cell. In some embodiments, the cell is in an animal. In some embodiments, the cell is within the cochlea. In some embodiments, the cell is within an embryo. In some embodiments, the embryo is a two-cell embryo. In some embodiments, the embryo is a mouse embryo. In some embodiments, delivering the engineered nucleic acid editing system to the target nucleic acid locus comprises delivering a nucleic acid of any of the aspects or embodiments described herein or a vector of any of the aspects or embodiments described herein.
In some embodiments, delivering the engineered nucleic acid editing system to the target nucleic acid locus comprises delivering a nucleic acid comprising an open reading frame encoding the endonuclease. In some embodiments, the nucleic acid comprises a promoter operably linked to the open reading frame encoding the endonuclease. In some embodiments, delivering the engineered nucleic acid editing system to the target nucleic acid locus comprises delivering a capped mRNA containing the open reading frame encoding the endonuclease. In some embodiments, delivering the engineered nucleic acid editing system to the target nucleic acid locus comprises delivering a translated polypeptide. In some embodiments, delivering the engineered nucleic acid editing system to the target nucleic acid locus comprises delivering deoxyribonucleic acid (DNA) encoding the engineered guide ribonucleic acid structure operably linked to a ribonucleic acid (RNA) pol III promoter.
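When the guide is delivered as DNA under an RNA polymerase III promoter, two common design checks apply: a run of four or more thymidines acts as a pol III termination signal, and the human U6 promoter prefers to initiate transcription at a G. The Python sketch below flags both; these are general pol III conventions rather than requirements stated in this disclosure, and the function name is hypothetical.

```python
def pol3_guide_issues(guide_dna: str) -> list[str]:
    """Flag common problems for a guide transcribed from a pol III (e.g. U6) promoter."""
    guide = guide_dna.upper().replace("U", "T")
    issues = []
    if "TTTT" in guide:
        issues.append("contains TTTT, which can terminate pol III transcription")
    if not guide.startswith("G"):
        issues.append("does not start with G, the preferred U6 transcription start")
    return issues


print(pol3_guide_issues("GACGTACGTACGTACGTACG"))  # []
```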
In some aspects, the present disclosure provides an engineered nucleic acid editing polypeptide comprising: an endonuclease comprising a RuvC domain and an HNH domain, wherein the endonuclease is derived from an uncultured microorganism, wherein the endonuclease is a class 2, type II endonuclease, and wherein the endonuclease is configured to lack nuclease activity; and a base editor coupled to the endonuclease. In some embodiments, the endonuclease comprises a sequence having at least 95% sequence identity to any one of SEQ ID NOS: 70-78 or 597, or a variant thereof.
In some aspects, the present disclosure provides an engineered nucleic acid editing polypeptide comprising: an endonuclease having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs 70-78 or 597, or a variant thereof, wherein the endonuclease is configured to lack nuclease activity; and a base editor coupled to the endonuclease. In some aspects, the present disclosure provides an engineered nucleic acid editing polypeptide comprising: an endonuclease configured to bind to a Protospacer Adjacent Motif (PAM) sequence comprising any one of SEQ ID NOs 360-368 or 598, wherein the endonuclease is a class 2, type II endonuclease, and wherein the endonuclease is configured to lack nuclease activity; and a base editor coupled to the endonuclease. In some embodiments, the endonuclease is derived from an uncultured microorganism. In some embodiments, the endonuclease has less than 80% identity to a Cas9 endonuclease. In some embodiments, the endonuclease further comprises an HNH domain. In some embodiments, the tracr ribonucleic acid sequence comprises a sequence having at least 80% sequence identity to about 60 to 90 consecutive nucleotides selected from any one of SEQ ID NOs 88-96, 488, 489, and 679-680. In some embodiments, the base editor comprises a sequence having at least 70%, 80%, 90%, or 95% identity to any one of SEQ ID NOs 1-51, 57-66, 385-443, 444-475, 594-595, or 599-675, or a variant thereof. In some embodiments, the base editor is an adenine deaminase. In some embodiments, the adenine deaminase comprises a sequence having at least 70%, 80%, 90%, or 95% sequence identity to any one of SEQ ID NOs 50-51, 57, 385-443, 448-475, or 595, or a variant thereof.
In some embodiments, the base editor is a cytosine deaminase. In some embodiments, the cytosine deaminase comprises a sequence having at least 70%, 80%, 90%, or 95% sequence identity with any one of SEQ ID NOs 1-49, 444-447, 594, or 58-66, or a variant thereof.
In some aspects, the present disclosure provides an engineered nucleic acid editing polypeptide comprising: an endonuclease, wherein the endonuclease is configured to lack endonuclease activity; and a base editor coupled to the endonuclease, wherein the base editor comprises a sequence having at least 70%, 80%, 90%, or 95% sequence identity to any one of SEQ ID NOs 1-51, 385-386, 387-443, 444-447, 448-475, or 595, or a variant thereof. In some embodiments, the endonuclease is configured to cleave one strand of a double-stranded target deoxyribonucleic acid. In some embodiments, the endonuclease is configured to be catalytically dead. In some embodiments, the endonuclease is a type II endonuclease or a type V endonuclease. In some embodiments, the endonuclease comprises a sequence having at least 70%, 80%, 90%, or 95% sequence identity to any one of SEQ ID NOS 70-78 or 597, or a variant thereof. In some embodiments, the endonuclease comprises a nicking enzyme mutation. In some embodiments, when optimally aligned, the endonuclease comprises a mutation from aspartic acid to alanine at: residue 9 relative to SEQ ID NO. 70; residue 13 relative to SEQ ID NO. 71, 72, or 74; residue 12 relative to SEQ ID NO. 73; residue 17 relative to SEQ ID NO. 75; residue 23 relative to SEQ ID NO. 76; or residue 10 relative to SEQ ID NO. 597. In some embodiments, the endonuclease is configured to bind to a Protospacer Adjacent Motif (PAM) sequence selected from the group consisting of SEQ ID NOS: 360-368 or 598. In some embodiments, the base editor is an adenine deaminase.
In some embodiments, the adenine deaminase comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs 50-51, 385-443, or 448-475, or a variant thereof. In some embodiments, the adenine deaminase comprises a sequence having at least 70%, 80%, 90%, or 95% identity to any one of SEQ ID NOs 50-51, 385-390, or 595, or a variant thereof. In some embodiments, the base editor is a cytosine deaminase. In some embodiments, the cytosine deaminase comprises a sequence having at least 70%, 80%, 90%, or 95% identity to any one of SEQ ID NOs 1-49 or 444-447, or a variant thereof. In some embodiments, the polypeptide further comprises a uracil DNA glycosylase inhibitor coupled to the endonuclease or the base editor. In some embodiments, the uracil DNA glycosylase inhibitor comprises a sequence having at least 70%, 80%, 90%, or 95% identity to any one of SEQ ID NOs 52-56 or SEQ ID NO 67, or a variant thereof. In some embodiments, the endonuclease comprises one or more nuclear localization sequences (NLSs) near the N-terminus or C-terminus of the endonuclease. In some embodiments, the NLS comprises a sequence having at least 90% identity to a sequence selected from SEQ ID NOS 369-384 or variants thereof. In some embodiments, the endonuclease is covalently coupled to the base editor, either directly or through a linker.
In some aspects, the disclosure provides a nucleic acid comprising an engineered nucleic acid sequence that is optimized for expression in an organism, wherein the nucleic acid encodes a sequence having at least 70%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs 1-51, 385-386, 387-443, 444-447, or 448-475, or a variant thereof. In some embodiments, the organism is a prokaryote, bacterium, eukaryote, fungus, plant, mammal, rodent, or human.
In some aspects, the present disclosure provides a vector comprising a nucleic acid of any of the aspects or embodiments described herein. In some embodiments, the vector is a plasmid, a minicircle, CELiD, an adeno-associated virus (AAV) derived virion, or a lentivirus.
In some aspects, the present disclosure provides a cell comprising a vector of any one of the aspects or embodiments described herein.
In some aspects, the disclosure provides a method of making a base editor comprising culturing the cell of any one of the aspects or embodiments described herein.
In some aspects, the present disclosure provides a system comprising: (a) a nucleic acid editing polypeptide of any of the aspects or embodiments described herein; and (b) an engineered guide ribonucleic acid structure configured to form a complex with the nucleic acid editing polypeptide, the engineered guide ribonucleic acid structure comprising: a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and a ribonucleic acid sequence configured to bind to the endonuclease. In some embodiments, the engineered guide ribonucleic acid structure comprises a sequence having at least 80% sequence identity to the non-degenerate nucleotides of any one of SEQ ID NOS 88-96, 488-489, or 679-680.
In some aspects, the disclosure provides a method of modifying a target nucleic acid locus, the method comprising delivering to the target nucleic acid locus the engineered nucleic acid editing polypeptide of any of the aspects or embodiments described herein or the system of any of the aspects or embodiments described herein, wherein the complex is configured such that upon binding of the complex to the target nucleic acid locus, the complex modifies a nucleotide of the target nucleic acid locus.
In some aspects, the present disclosure provides an engineered nucleic acid editing system comprising: (a) an endonuclease comprising a RuvC domain and an HNH domain, wherein the endonuclease is derived from an uncultured microorganism, wherein the endonuclease is a class 2, type II Cas endonuclease, and wherein the RuvC domain lacks nuclease activity; (b) a base editor coupled to the endonuclease; and (c) an engineered guide ribonucleic acid structure configured to form a complex with the endonuclease, the engineered guide ribonucleic acid structure comprising: (i) a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and (ii) a tracr ribonucleic acid sequence configured to bind to the endonuclease. In some embodiments, the endonuclease comprises a sequence having at least 95% sequence identity to any one of SEQ ID NOS: 70-78.
In some aspects, the present disclosure provides an engineered nucleic acid editing system comprising: (a) an endonuclease having at least 95% sequence identity to any one of SEQ ID NOs: 70-78 or variants thereof, wherein the endonuclease comprises a RuvC domain lacking nuclease activity; (b) a base editor coupled to the endonuclease; and (c) an engineered guide ribonucleic acid structure configured to form a complex with the endonuclease, the engineered guide ribonucleic acid structure comprising: (i) a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and (ii) a tracr ribonucleic acid sequence configured to bind to the endonuclease.
In some aspects, the present disclosure provides an engineered nucleic acid editing system comprising: (a) an endonuclease configured to bind to a Protospacer Adjacent Motif (PAM) sequence comprising any one of SEQ ID NOs: 360-368, wherein the endonuclease is a class 2, type II Cas endonuclease, and wherein the endonuclease comprises a RuvC domain lacking nuclease activity; (b) a base editor coupled to the endonuclease; and (c) an engineered guide ribonucleic acid structure configured to form a complex with the endonuclease, the engineered guide ribonucleic acid structure comprising: (i) a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and (ii) a tracr ribonucleic acid sequence configured to bind to the endonuclease.
In some embodiments, the endonuclease is derived from an uncultured microorganism. In some embodiments, the endonuclease has less than 80% identity to a Cas9 endonuclease. In some embodiments, the endonuclease further comprises an HNH domain. In some embodiments, the tracr ribonucleic acid sequence comprises a sequence having at least 80% sequence identity to about 60 to 90 consecutive nucleotides selected from any one of SEQ ID NOs 88-96, 488, 489 and 679-680.
In some aspects, the present disclosure provides an engineered nucleic acid editing system comprising: (a) an engineered guide ribonucleic acid structure comprising: (i) a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and (ii) a tracr ribonucleic acid sequence configured to bind to an endonuclease, wherein the tracr ribonucleic acid sequence comprises a sequence having at least 80% sequence identity to about 60 to 90 consecutive nucleotides selected from any one of SEQ ID NOs: 88-96, 488, 489, and 679-680; and (b) a class 2, type II endonuclease configured to bind to the engineered guide ribonucleic acid structure.
In some embodiments, the endonuclease is configured to bind to a Protospacer Adjacent Motif (PAM) sequence selected from the group consisting of SEQ ID NOs: 360-368. In some embodiments, the base editor comprises a sequence having at least 70%, 80%, 90%, or 95% identity to any one of SEQ ID NOs: 1-51 and 385-475. In some embodiments, the base editor is an adenine deaminase. In some embodiments, the adenine deaminase comprises a sequence having at least 95% identity to SEQ ID NO: 57. In some embodiments, the base editor is a cytosine deaminase. In some embodiments, the cytosine deaminase comprises a sequence having at least 95% identity to SEQ ID NO: 58. In some embodiments, the cytosine deaminase comprises a sequence having at least 95% identity to any one of SEQ ID NOs: 59-66.
In some embodiments, the engineered nucleic acid editing system further comprises a uracil DNA glycosylase inhibitor. In some embodiments, the uracil DNA glycosylase inhibitor comprises a sequence having at least 70%, 80%, 90% or 95% identity to any of SEQ ID NOs 52-56 or SEQ ID NO 67.
In some embodiments, the engineered guide ribonucleic acid structure comprises at least two ribonucleic acid polynucleotides. In some embodiments, the engineered guide ribonucleic acid structure comprises one ribonucleic acid polynucleotide comprising the guide ribonucleic acid sequence and the tracr ribonucleic acid sequence. In some embodiments, the guide ribonucleic acid sequence is complementary to a prokaryotic, bacterial, archaeal, eukaryotic, fungal, plant, mammalian, or human genomic sequence. In some embodiments, the guide ribonucleic acid sequence is 15-24 nucleotides in length. In some embodiments, the endonuclease comprises one or more nuclear localization sequences (NLSs) near its N-terminus or C-terminus. In some embodiments, the endonuclease is covalently coupled to the base editor, either directly or through a linker. In some embodiments, a polypeptide comprises both the endonuclease and the base editor. In some embodiments, the endonuclease is configured to cleave one strand of a double-stranded target deoxyribonucleic acid. In some embodiments, the endonuclease comprises SEQ ID NO: 370. In some embodiments, the system further comprises a source of Mg2+.
In some embodiments, the endonuclease comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 70; the guide RNA structure comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 88; and the endonuclease is configured to bind to a PAM comprising SEQ ID NO: 360.
In some embodiments, the endonuclease comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 71; the guide RNA structure comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 89; and the endonuclease is configured to bind to a PAM comprising SEQ ID NO: 361.
In some embodiments, the endonuclease comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 73; the guide RNA structure comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 91; and the endonuclease is configured to bind to a PAM comprising SEQ ID NO: 363.
In some embodiments, the endonuclease comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 75; the guide RNA structure comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 93; and the endonuclease is configured to bind to a PAM comprising SEQ ID NO: 365.
In some embodiments, the endonuclease comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 76; the guide RNA structure comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 94; and the endonuclease is configured to bind to a PAM comprising SEQ ID NO: 366.
In some embodiments, the endonuclease comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 77; the guide RNA structure comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 95; and the endonuclease is configured to bind to a PAM comprising SEQ ID NO: 367.
In some embodiments, the endonuclease comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 78; the guide RNA structure comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 96; and the endonuclease is configured to bind to a PAM comprising SEQ ID NO: 368.
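The paired embodiments above can be tracked as a small lookup table, for example when planning which guide and PAM to test with each endonuclease. The sketch below is illustrative only: the names `EMBODIMENT_PAIRS` and `paired_components` are hypothetical, the table holds SEQ ID numbers rather than sequences, and the constant offsets it checks are an observation about the numbering in this passage, not a rule stated in the disclosure.

```python
# Hypothetical lookup table built from the paired embodiments in this passage:
# endonuclease SEQ ID NO -> (guide RNA SEQ ID NO, PAM SEQ ID NO).
# SEQ ID NOs 72 and 74 have no paired embodiment here, so they are omitted.
EMBODIMENT_PAIRS: dict[int, tuple[int, int]] = {
    70: (88, 360),
    71: (89, 361),
    73: (91, 363),
    75: (93, 365),
    76: (94, 366),
    77: (95, 367),
    78: (96, 368),
}


def paired_components(endonuclease_seq_id: int) -> dict[str, int]:
    """Return the guide RNA and PAM SEQ ID NOs paired with an endonuclease SEQ ID NO."""
    if endonuclease_seq_id not in EMBODIMENT_PAIRS:
        raise KeyError(f"no paired embodiment listed for SEQ ID NO: {endonuclease_seq_id}")
    guide, pam = EMBODIMENT_PAIRS[endonuclease_seq_id]
    return {"endonuclease": endonuclease_seq_id, "guide_rna": guide, "pam": pam}


# In every listed pairing, guide = endonuclease + 18 and PAM = endonuclease + 290.
assert all(g == e + 18 and p == e + 290 for e, (g, p) in EMBODIMENT_PAIRS.items())

print(paired_components(75))  # {'endonuclease': 75, 'guide_rna': 93, 'pam': 365}
```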
In some embodiments, the base editor comprises an adenine deaminase. In some embodiments, the adenine deaminase comprises SEQ ID NO: 57. In some embodiments, the base editor comprises a cytosine deaminase. In some embodiments, the cytosine deaminase comprises SEQ ID NO: 58. In some embodiments, the engineered nucleic acid editing systems described herein further comprise a uracil DNA glycosylase inhibitor. In some embodiments, the uracil DNA glycosylase inhibitor comprises SEQ ID NO: 67.
In some embodiments, the sequence identity is determined by the BLASTP, CLUSTALW, MUSCLE, MAFFT, or Smith-Waterman homology search algorithm. In some embodiments, the sequence identity is determined by the BLASTP homology search algorithm using a word length (W) of 3, an expectation value (E) of 10, and the BLOSUM62 scoring matrix with gap costs of 11 (existence) and 1 (extension), together with conditional compositional score matrix adjustment.
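To make the notion of percent identity concrete, the sketch below computes identity from a Smith-Waterman local alignment, one of the algorithms named above. It is a simplified stand-in, not the BLASTP configuration described: it uses a flat match/mismatch score instead of BLOSUM62, a linear rather than affine gap penalty, and no compositional adjustment. The scoring values and the choice to count gap columns in the denominator are assumptions; tools define percent identity differently, so results can differ from BLASTP.

```python
def smith_waterman_identity(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> float:
    """Locally align a and b (Smith-Waterman, linear gap penalty) and return percent identity.

    Percent identity = identical aligned positions / aligned columns (gap columns included) * 100.
    Scoring values are illustrative assumptions, not BLOSUM62.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best-scoring cell until a zero cell ends the local alignment.
    i, j = best_pos
    identical = columns = 0
    while i > 0 and j > 0 and score[i][j] > 0:
        s = score[i][j]
        if s == score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif s == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    return 100.0 * identical / columns if columns else 0.0


print(round(smith_waterman_identity("ACDEFGHIK", "ACDEFGWIK"), 2))  # 88.89
```

A candidate could then be screened with, for example, `smith_waterman_identity(candidate, reference) >= 95.0`; note that identity over a local alignment can exceed identity over the full length of either sequence.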
In some aspects, the disclosure provides a nucleic acid comprising an engineered nucleic acid sequence optimized for expression in an organism, wherein the nucleic acid encodes a class 2, type II endonuclease coupled to a base editor, and wherein the endonuclease is derived from an uncultured microorganism.
In some aspects, the disclosure provides a nucleic acid comprising an engineered nucleic acid sequence optimized for expression in an organism, wherein the nucleic acid encodes an endonuclease that has at least 70% sequence identity to any one of SEQ ID NOs: 70-78 and is coupled to a base editor. In some embodiments, the nucleic acid further comprises a sequence encoding one or more nuclear localization sequences (NLSs) near the N-terminus or C-terminus of the endonuclease. In some embodiments, the organism is a prokaryote, bacterium, eukaryote, fungus, plant, mammal, rodent, or human.
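Codon optimization can be illustrated with a minimal back-translation sketch: each residue is encoded by a preferred codon, and the result is checked by translating it back. This is a simplification under stated assumptions: the preference table `HYPOTHETICAL_PREFERRED` is a placeholder, not a measured codon-usage table for any organism, and practical optimization also weighs factors such as GC content and unwanted sequence motifs. Only the standard genetic code is taken as given.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, ..., GGG; "*" marks stops.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def back_translate(protein: str, preferred: dict[str, str]) -> str:
    """Encode a protein with one preferred codon per residue, else the first synonymous codon."""
    codons = []
    for aa in protein + "*":  # append a stop codon
        codon = preferred.get(aa) or next(c for c, a in CODON_TABLE.items() if a == aa)
        if CODON_TABLE[codon] != aa:
            raise ValueError(f"preferred codon {codon} does not encode {aa}")
        codons.append(codon)
    return "".join(codons)


def translate(dna: str) -> str:
    """Translate DNA in frame 0, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical preference table for illustration only; not a real codon-usage table.
HYPOTHETICAL_PREFERRED = {"L": "CTG", "R": "CGT", "K": "AAA", "*": "TAA"}
orf = back_translate("MKRLLE", HYPOTHETICAL_PREFERRED)
print(orf)  # ATGAAACGTCTGCTGGAATAA
```

The round trip through `translate` confirms that the optimized open reading frame still encodes the intended protein.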
In some aspects, the present disclosure provides a vector comprising a nucleic acid sequence encoding a class 2, type II endonuclease coupled to a base editor, wherein the endonuclease is derived from an uncultured microorganism. In some embodiments, the vector comprises a nucleic acid described herein. In some embodiments, the vector further comprises a nucleic acid encoding an engineered guide ribonucleic acid structure configured to form a complex with the endonuclease, the engineered guide ribonucleic acid structure comprising: a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and a tracr ribonucleic acid sequence configured to bind to the endonuclease. In some embodiments, the vector is a plasmid, a minicircle, a CELiD, an adeno-associated virus (AAV)-derived virion, or a lentivirus. In some aspects, the present disclosure provides a cell comprising a vector described herein. In some aspects, the present disclosure provides a method of preparing an endonuclease comprising culturing a cell described herein.
In some aspects, the present disclosure provides a method for modifying a double-stranded deoxyribonucleic acid polynucleotide, the method comprising contacting the double-stranded deoxyribonucleic acid polynucleotide with a complex comprising: an endonuclease comprising a RuvC domain and an HNH domain, wherein the endonuclease is derived from an uncultured microorganism, wherein the endonuclease is a class 2, type II endonuclease, and wherein the RuvC domain lacks nuclease activity; a base editor coupled to the endonuclease; and an engineered guide ribonucleic acid structure configured to bind to the endonuclease and the double-stranded deoxyribonucleic acid polynucleotide; wherein the double-stranded deoxyribonucleic acid polynucleotide comprises a Protospacer Adjacent Motif (PAM).
In some embodiments, the endonuclease comprising a RuvC domain and an HNH domain is covalently coupled to the base editor, either directly or through a linker. In some embodiments, the endonuclease comprising a RuvC domain and an HNH domain comprises a sequence having at least 95% sequence identity to any one of SEQ ID NOs: 70-78.
In some aspects, the present disclosure provides a method for modifying a double-stranded deoxyribonucleic acid polynucleotide, the method comprising contacting the double-stranded deoxyribonucleic acid polynucleotide with a complex comprising: a class 2, type II endonuclease; a base editor coupled to the endonuclease; and an engineered guide ribonucleic acid structure configured to bind to the endonuclease and the double-stranded deoxyribonucleic acid polynucleotide; wherein the double-stranded deoxyribonucleic acid polynucleotide comprises a Protospacer Adjacent Motif (PAM); and wherein the PAM comprises a sequence selected from the group consisting of SEQ ID NOs: 360-368.
In some embodiments, the class 2, type II endonuclease is covalently coupled to the base editor, either directly or through a linker. In some embodiments, the base editor comprises a sequence having at least 70%, at least 80%, at least 90%, or at least 95% identity to a sequence selected from the group consisting of SEQ ID NOs: 1-51 and 385-475. In some embodiments, the base editor comprises an adenine deaminase; the double-stranded deoxyribonucleic acid polynucleotide comprises adenine; and modifying the double-stranded deoxyribonucleic acid polynucleotide comprises converting the adenine to guanine. In some embodiments, the adenine deaminase comprises a sequence having at least 95% identity to SEQ ID NO: 57.
In some embodiments, the base editor comprises a cytosine deaminase; the double-stranded deoxyribonucleic acid polynucleotide comprises cytosine; and modifying the double-stranded deoxyribonucleic acid polynucleotide comprises converting the cytosine to uracil. In some embodiments, the cytosine deaminase comprises a sequence having at least 95% identity to SEQ ID NO: 58. In some embodiments, the cytosine deaminase comprises a sequence having at least 95% identity to any one of SEQ ID NOs: 59-66.
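The expected sequence-level outcome of the two deaminase types can be sketched as follows. An adenine deaminase converts A to inosine, which is read as G; a cytosine deaminase converts C to U, which is replicated as T (uracil DNA glycosylase inhibitors, as described herein, help preserve the uracil intermediate). The editing window is a hypothetical parameter, since this passage does not specify one for these enzymes, and converting every target base in the window is a simplification: real editing is partial and position-dependent.

```python
def predict_base_edit(protospacer: str, editor: str, window: tuple[int, int]) -> str:
    """Return the protospacer after converting each target base inside a 1-based, inclusive window.

    editor="ABE": A -> G (adenine is deaminated to inosine, which is read as guanine).
    editor="CBE": C -> T (cytosine is deaminated to uracil, which is replicated as thymine).
    """
    conversions = {"ABE": ("A", "G"), "CBE": ("C", "T")}
    if editor not in conversions:
        raise ValueError(f"unknown editor type: {editor}")
    target, replacement = conversions[editor]
    start, end = window
    return "".join(
        replacement if start <= pos <= end and base == target else base
        for pos, base in enumerate(protospacer.upper(), start=1)
    )


# Hypothetical protospacer and window (positions 4-8), for illustration only.
print(predict_base_edit("GACCTAAGTCA", "ABE", (4, 8)))  # GACCTGGGTCA
print(predict_base_edit("GACCTAAGTCA", "CBE", (4, 8)))  # GACTTAAGTCA
```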
In some embodiments, the complex further comprises a uracil DNA glycosylase inhibitor. In some embodiments, the uracil DNA glycosylase inhibitor comprises a sequence having at least 70%, 80%, 90%, or 95% identity to any one of SEQ ID NOs: 52-56 or SEQ ID NO: 67. In some embodiments, the double-stranded deoxyribonucleic acid polynucleotide comprises a first strand comprising a sequence complementary to the sequence of the engineered guide ribonucleic acid structure, and a second strand comprising the PAM. In some embodiments, the PAM is immediately adjacent to the 3' end of the sequence complementary to the sequence of the engineered guide ribonucleic acid structure.
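Locating candidate target sites next to a PAM is a simple pattern search. The sketch below assumes the common Cas9-style layout in which the PAM lies immediately 3' of the protospacer on the strand being scanned. The PAM sequences of SEQ ID NOs: 360-368 are not reproduced in this passage, so the example uses the well-known SpCas9 PAM, NGG, purely as a placeholder. The default protospacer length of 22 nucleotides follows the spacer length given for FIG. 6; the function name is hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_protospacers(dna: str, pam: str, spacer_len: int = 22) -> list[tuple[int, str, str]]:
    """Return (0-based start, protospacer, PAM) for each site whose PAM lies immediately 3' of it.

    Only the given strand is scanned.
    """
    dna = dna.upper()
    pam_regex = "".join(IUPAC[base] for base in pam.upper())
    # A zero-width lookahead lets overlapping sites all be reported.
    pattern = re.compile(rf"(?=([ACGT]{{{spacer_len}}})({pam_regex}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]


# Placeholder PAM (NGG) and a short spacer length, for illustration only.
print(find_protospacers("TTACGTAAGGCT", "NGG", spacer_len=5))  # [(2, 'ACGTA', 'AGG')]
```

Running the same search on the reverse complement covers the opposite strand.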
In some embodiments, the class 2, type II endonuclease is not a Cas9 endonuclease, a Cas14 endonuclease, a Cas12a endonuclease, a Cas12b endonuclease, a Cas12c endonuclease, a Cas12d endonuclease, a Cas12e endonuclease, a Cas13a endonuclease, a Cas13b endonuclease, a Cas13c endonuclease, or a Cas13d endonuclease. In some embodiments, the class 2, type II endonuclease is derived from an uncultured microorganism. In some embodiments, the double-stranded deoxyribonucleic acid polynucleotide is a eukaryotic, plant, fungal, mammalian, rodent, or human double-stranded deoxyribonucleic acid polynucleotide.
In some aspects, the disclosure provides a method of modifying a target nucleic acid locus, the method comprising delivering an engineered nucleic acid editing system described herein to the target nucleic acid locus, wherein the endonuclease is configured to form a complex with the engineered guide ribonucleic acid structure, and wherein the complex is configured such that upon binding of the complex to the target nucleic acid locus, the complex modifies a nucleotide of the target nucleic acid locus.
In some embodiments, the engineered nucleic acid editing system comprises an adenine deaminase, the nucleotide is adenine, and modifying the target nucleic acid locus comprises converting the adenine to guanine. In some embodiments, the engineered nucleic acid editing system comprises a cytidine deaminase and a uracil DNA glycosylase inhibitor, the nucleotide is cytosine, and modifying the target nucleic acid locus comprises converting the cytosine to uracil. In some embodiments, the target nucleic acid locus comprises genomic DNA, viral DNA, or bacterial DNA. In some embodiments, the target nucleic acid locus is in vitro. In some embodiments, the target nucleic acid locus is within a cell. In some embodiments, the cell is a prokaryotic cell, a bacterial cell, a eukaryotic cell, a fungal cell, a plant cell, an animal cell, a mammalian cell, a rodent cell, a primate cell, or a human cell. In some embodiments, the cell is in an animal.
In some embodiments, the cell is within the cochlea. In some embodiments, the cell is within an embryo. In some embodiments, the embryo is a two-cell embryo. In some embodiments, the embryo is a mouse embryo. In some embodiments, delivering the engineered nucleic acid editing system to the target nucleic acid locus comprises delivering a nucleic acid described herein or a vector described herein. In some embodiments, delivering the engineered nucleic acid editing system to the target nucleic acid locus comprises delivering a nucleic acid comprising an open reading frame encoding the endonuclease.
In some embodiments, the nucleic acid comprises a promoter operably linked to the open reading frame encoding the endonuclease. In some embodiments, delivering the engineered nucleic acid editing system to the target nucleic acid locus comprises delivering a capped mRNA containing the open reading frame encoding the endonuclease. In some embodiments, delivering the engineered nucleic acid editing system to the target nucleic acid locus comprises delivering a translated polypeptide. In some embodiments, delivering the engineered nucleic acid editing system to the target nucleic acid locus comprises delivering deoxyribonucleic acid (DNA) encoding the engineered guide ribonucleic acid structure operably linked to a ribonucleic acid (RNA) pol III promoter.
In some aspects, the present disclosure provides an engineered nucleic acid editing polypeptide comprising: an endonuclease comprising a RuvC domain and an HNH domain, wherein the endonuclease is derived from an uncultured microorganism, wherein the endonuclease is a class 2, type II endonuclease, and wherein the RuvC domain lacks nuclease activity; and a base editor coupled to the endonuclease. In some embodiments, the endonuclease comprises a sequence having at least 95% sequence identity to any one of SEQ ID NOs: 70-78.
In some aspects, the present disclosure provides an engineered nucleic acid editing polypeptide comprising: an endonuclease having at least 95% sequence identity to any one of SEQ ID NOs 70-78 or variants thereof, wherein said endonuclease comprises a RuvC domain lacking nuclease activity; and a base editor coupled to the endonuclease.
In some aspects, the present disclosure provides an engineered nucleic acid editing polypeptide comprising: an endonuclease configured to bind to a Protospacer Adjacent Motif (PAM) sequence comprising any one of SEQ ID NOs: 360-368, wherein the endonuclease is a class 2, type II endonuclease, and wherein the endonuclease comprises a RuvC domain lacking nuclease activity; and a base editor coupled to the endonuclease.
In some embodiments, the endonuclease is derived from an uncultured microorganism. In some embodiments, the endonuclease has less than 80% identity to a Cas9 endonuclease. In some embodiments, the endonuclease further comprises an HNH domain. In some embodiments, the tracr ribonucleic acid sequence comprises a sequence having at least 80% sequence identity to about 60 to 90 consecutive nucleotides selected from any one of SEQ ID NOs: 88-96, 488, 489, and 679-680. In some embodiments, the base editor comprises a sequence having at least 70%, 80%, 90%, or 95% identity to any one of SEQ ID NOs: 1-51 and 385-475. In some embodiments, the base editor is an adenine deaminase. In some embodiments, the adenine deaminase comprises a sequence having at least 95% identity to SEQ ID NO: 57. In some embodiments, the base editor is a cytosine deaminase. In some embodiments, the cytosine deaminase comprises a sequence having at least 95% identity to SEQ ID NO: 58. In some embodiments, the cytosine deaminase comprises a sequence having at least 95% identity to any one of SEQ ID NOs: 59-66.
Additional aspects and advantages of the present disclosure will become readily apparent to those skilled in the art from the following detailed description, wherein only illustrative embodiments of the present disclosure are shown and described. As will be realized, the present disclosure is capable of other different embodiments and its several details are capable of modification in various obvious respects, all without departing from the present disclosure. Accordingly, the drawings and description are to be regarded as illustrative in nature and not as restrictive.
Incorporation by reference
All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.
Drawings
The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description, which sets forth illustrative embodiments in which the principles of the invention are utilized, and to the accompanying drawings (also referred to herein as "Figure" and "FIG."), of which:
FIG. 1 depicts example organizations of CRISPR loci of different classes and types.
FIG. 2 shows the structure of a base editor plasmid containing the T7 promoter driving expression of the system described herein.
FIG. 3 shows plasmid maps of the systems described herein. MGA contains TadA* (from ABE8.17m)-SV40 NLS, and MGC contains APOBEC1 (from BE3) linked to a uracil glycosylase inhibitor and an SV40 NLS.
FIG. 4 shows predicted catalytic residues in the RuvC-I domain of selected endonucleases described herein; these residues were mutated to disrupt nuclease activity and produce nickases.
FIG. 5 depicts an example method for cloning a single guide RNA (sgRNA) expression cassette into the systems described herein. One fragment includes the T7 promoter plus a spacer. Another fragment includes the spacer plus a single guide scaffold sequence and a bidirectional terminator. The fragments are assembled into an expression plasmid, resulting in a functional construct that can express both the sgRNA and the base editor.
FIGS. 6A and 6B show lacZ-targeting sgRNA designs in Escherichia coli (E. coli). The spacer for the systems described herein is 22 nucleotides in length. For selected systems described herein, three sgRNAs targeting lacZ in E. coli were designed to determine the editing window.
FIG. 7 shows nickase activity of selected mutant effectors. A 600-bp double-stranded DNA fragment labeled with a fluorophore (6-FAM) at both 5' ends was incubated with purified enzyme supplemented with its cognate sgRNA. The reaction products were resolved on a 10% TBE-urea denaturing gel. Double-strand cleavage yields bands of 400 and 200 bases, whereas nickase activity produces bands of 600 and 200 bases.
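The band sizes in FIG. 7 follow from where each labeled 5' end sits relative to the cut. The sketch below assumes, as the 400- and 200-base products imply, a single cut site 400 nt from the labeled 5' end of one strand (called "top" here) and therefore 200 nt from the labeled 5' end of the other; staggered cuts are ignored. The function name and strand labels are hypothetical.

```python
def labeled_band_sizes(length: int, cut_from_top_5p: int, nicked_strands: set[str]) -> list[int]:
    """Predict labeled fragment sizes (nt) on a denaturing gel for a duplex 5'-labeled on both strands.

    cut_from_top_5p: cut position in nt from the labeled 5' end of the top strand.
    nicked_strands: subset of {"top", "bottom"}; cutting both strands is a double-strand break.
    """
    top = cut_from_top_5p if "top" in nicked_strands else length
    # The bottom strand's labeled 5' end is at the opposite end of the duplex.
    bottom = length - cut_from_top_5p if "bottom" in nicked_strands else length
    return sorted({top, bottom})


print(labeled_band_sizes(600, 400, {"top", "bottom"}))  # [200, 400]  double-strand break
print(labeled_band_sizes(600, 400, {"bottom"}))         # [200, 600]  nick in one strand
```

Under these assumptions, a nick in the other strand alone would instead give 400- and 600-base bands, so the observed 600/200 pattern also indicates which strand the nickase cuts.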
FIGS. 8A, 8B, and 8C show Sanger sequencing results demonstrating base editing by selected systems described herein.
FIG. 9 shows how the system described herein extends base editing capabilities with the endonucleases and base editors described herein.
FIGS. 10A and 10B show the base editing efficiency of adenine base editors (ABEs) comprising TadA (ABE8.17m) and MG nickases. TadA is a tRNA adenosine deaminase, and TadA (ABE8.17m) is an engineered variant of E. coli TadA. Twelve MG nickases fused to TadA (ABE8.17m) were constructed and tested in E. coli. Three guides were designed to target lacZ. The numbers shown in the boxes indicate the percentage of A-to-G conversion quantified by EditR. ABE8.17m was used as the positive control for the experiments.
FIGS. 11A and 11B show the base editing efficiency of cytosine base editors (CBEs) comprising rat APOBEC1 (rAPOBEC1), an MG nickase, and the uracil glycosylase inhibitor from Bacillus subtilis bacteriophage PBS1 (UGI (PBS1)). APOBEC1 is a cytosine deaminase. Twelve MG nickases fused at their N-terminus to rAPOBEC1 and at their C-terminus to UGI were constructed and tested in E. coli. Three guides were designed to target lacZ. The numbers shown in the boxes indicate the percentage of C-to-T conversion quantified by EditR. BE3 was used as a positive control for the experiment.
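The conversion percentages in FIGS. 10 and 11 are quantified from Sanger traces with the EditR tool. As a rough illustration of the underlying idea only, the sketch below estimates conversion at one position as the edited-base peak height divided by the sum of the reference and edited peaks. EditR also accounts for background signal across the trace, so this simplified ratio is not its algorithm, and the peak heights in the example are hypothetical.

```python
def percent_conversion(peak_heights: dict[str, float], ref_base: str, edited_base: str) -> float:
    """Estimate percent conversion at one trace position as edited / (reference + edited) * 100.

    Signal in the other two channels is ignored (treated as background).
    """
    ref = peak_heights.get(ref_base, 0.0)
    edited = peak_heights.get(edited_base, 0.0)
    total = ref + edited
    if total <= 0:
        raise ValueError("no signal in the reference or edited channel")
    return 100.0 * edited / total


# Hypothetical peak heights at a target A position after ABE editing (A-to-G).
print(percent_conversion({"A": 300.0, "G": 900.0, "C": 20.0, "T": 15.0}, "A", "G"))  # 75.0
```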
FIGS. 12A and 12B show the effect of MG uracil glycosylase inhibitors (UGIs) on the base editing activity of CBEs. FIG. 12A is a graph showing the base editing activity of MGC15-1 and variants comprising an N-terminal APOBEC1, the MG15-1 nickase, and a C-terminal UGI. Three MG UGIs were tested for improvement of cytosine base editing activity in E. coli. FIG. 12B is a graph showing the base editing activity of BE3, which comprises an N-terminal rAPOBEC1, an SpCas9 nickase, and a C-terminal UGI. Two MG UGIs were tested for improvement of cytosine base editing activity in HEK293T cells. Editing efficiency was quantified by EditR.
FIGS. 13A and 13B depict graphs showing the editing efficiency at edited sites of cytosine base editors comprising A0A2K5RDN7, an MG nickase, and an MG UGI. The constructs included an N-terminal A0A2K5RDN7, an MG nickase, and a C-terminal MG69-1. For simplicity, only the identity of the MG nickase is shown in the figure. BE3 was used as a positive control for base editing, and an empty vector was used as a negative control. Three independent experiments were performed on different days. Abbreviations: R, replicate; NEG, negative control.
FIGS. 14A and 14B show a positive selection method for TadA characterization in E. coli. FIG. 14A shows a diagram of a single-plasmid system for TadA selection. The vector includes CAT (H193Y), a CAT-targeting sgRNA expression cassette, and an ABE expression cassette. In this figure, an N-terminal TadA from Escherichia coli and a C-terminal SpCas9 (D10A) from Streptococcus pyogenes are shown. FIG. 14B shows a sequencing trace demonstrating editing at the A2 position of the CAT (H193Y) template strand, which reverts the H193Y mutant to wild type and restores CAT activity when the construct is transformed into E. coli cells. Abbreviations: CAT, chloramphenicol acetyltransferase.
FIGS. 15A and 15B show that mutations introduced by TadA confer high chloramphenicol (Cm) tolerance. FIG. 15A shows photographs of growth plates in which different concentrations of chloramphenicol were used to select for antibiotic resistance in E. coli. In this example, the wild type and two variants of TadA from E. coli (EcTadA) were tested. FIG. 15B shows a summary of results demonstrating that ABEs carrying mutant TadA show higher editing efficiency than the wild type. In these experiments, colonies were selected from plates with Cm ≥ 0.5 μg/mL. For simplicity, only the identity of the deaminase is shown in the table.
FIG. 16A shows photographs of growth plates used to study MG TadA activity in positive selection. Eight MG68 TadA candidates (ABEs comprising an N-terminal TadA variant and a C-terminal SpCas9 (D10A) nickase) were tested against chloramphenicol at 0 to 2 μg/mL. For simplicity, only the identity of the deaminase is shown. In this experiment, colonies were selected from plates with Cm ≥ 0.5 μg/mL.
FIG. 16B summarizes the editing efficiency of MG TadA candidates and demonstrates that MG68-3 and MG68-4 drive adenine base editing.
FIGS. 17A and 17B show improvement in the base editing efficiency of MG68-4_nSpCas9 conferred by the D109N mutation in MG68-4. FIG. 17A shows photographs of growth plates in which wild-type MG68-4 and variants thereof were tested against chloramphenicol at 0 to 4 μg/mL. For simplicity, only the identity of the deaminase is shown. The adenine base editor in this experiment included an N-terminal TadA variant and a C-terminal SpCas9 (D10A) nickase. Panel (b) shows a summary table depicting the editing efficiency of MG TadA candidates. FIG. 17B demonstrates that MG68-4 and MG68-4 (D109N) show base editing of adenine, with the D109N mutant showing increased activity. In this experiment, colonies were selected from plates with Cm ≥ 0.5 μg/mL.
FIGS. 18A and 18B show base editing by MG68-4 (D109N)_nMG34-1. FIG. 18A shows photographs of experimental growth plates in which ABEs comprising an N-terminal MG68-4 (D109N) and a C-terminal SpCas9 (D10A) nickase were tested against chloramphenicol at 0 to 2 μg/mL. FIG. 18B shows a summary table depicting editing efficiency with and without sgRNA. In this experiment, colonies were selected from plates with Cm ≥ 1 μg/mL.
FIG. 19 shows 28 MG68-4 variants (SEQ ID NOs: 448-475) designed to improve the base editing activity of MG68-4_nMG34-1. Twelve residues were selected for targeted mutagenesis to improve editing.
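Variant names used in these figures, such as D109N, H193Y, and D10A, follow the standard notation of wild-type residue, 1-based position, and substituted residue. A minimal helper for applying such a substitution, checking the wild-type residue first, might look like the following; the function name is hypothetical, and the example uses a toy sequence because the MG68-4 sequence is not reproduced in this passage.

```python
import re

SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def apply_substitution(protein: str, mutation: str) -> str:
    """Apply a substitution such as 'D109N' (wild-type residue, 1-based position, new residue)."""
    match = SUBSTITUTION.fullmatch(mutation)
    if match is None:
        raise ValueError(f"unrecognized substitution: {mutation}")
    wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(protein):
        raise ValueError(f"position {position} is outside a {len(protein)}-residue protein")
    if protein[position - 1] != wild_type:
        raise ValueError(f"expected {wild_type} at position {position}, found {protein[position - 1]}")
    return protein[:position - 1] + new + protein[position:]


# Toy sequence for illustration; D3N replaces the aspartate at position 3 with asparagine.
print(apply_substitution("MKDLE", "D3N"))  # MKNLE
```

Applying several substitutions in turn builds a multi-mutant, for example when assembling a panel of targeted variants.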
FIG. 20 shows the results of a gel-based deaminase assay demonstrating the activity of deaminases from several selected families (MG93, MG138, and MG139). The enzymes (E. coli codon-optimized) were expressed in a bacterial cell lysate-derived PURExpress in vitro transcription-translation system and incubated with 5'-FAM-labeled ssDNA and USER enzyme (uracil DNA glycosylase and endonuclease VIII) for 2.5 hours at 37 °C. The resulting DNA was resolved on a denaturing polyacrylamide gel and imaged. The positive control is a sequence synthesized with U incorporated at the position of the target C, and the negative control is a sequence containing neither U nor C.
FIG. 21 shows a graph of the base editing efficiency, at specific nucleotide sites, of adenine base editors comprising MG68-4v1 fused to nMG34-1 or nSpCas9. Nine guides were designed to target genomic loci in HEK293T cells. Abbreviations: MG68-4v1, MG68-4 (D109N); nMG34-1, MG34-1 nickase; nSpCas9, SpCas9 nickase.
FIGS. 22A, 22B, 22C, 22D, 22E, and 22F show in vivo base editing with engineered MG34-1 and MG35-1 nickases. FIGS. 22A and 22B show base editing at four target loci in the E. coli genome. FIG. 22A shows the ABE-MG34-1 base editor compared with the reference ABE-SpCas9 (both with the TadA* (8.8m) deaminase). FIG. 22B shows the CBE-MG34-1 base editor compared with the reference CBE-SpCas9 (both with the rAPOBEC1 deaminase and PBS1 UGI). FIG. 22C shows base editing with the ABE-MG34-1 nickase at three target loci in human HEK293T cells. The target sequence for each locus in FIGS. 22A, 22B, and 22C is shown above each heat map; the intended edit positions are indicated by subscript numerals in the sequence and correspond to the positions (squares) on the heat map. Values in FIGS. 22A-22C represent the percentage of NGS reads supporting editing; values in FIGS. 22A and 22B are the average of two independent experiments, while values in FIG. 22C are the average of three independent biological replicates. FIG. 22D shows an E. coli survival assay. E. coli was transformed with a plasmid containing an ABE, a nonfunctional chloramphenicol acetyltransferase gene (CAT H193Y), and an sgRNA either targeting the CAT gene (target spacer) or not targeting it (non-target spacer). E. coli survival under chloramphenicol selection depends on ABE-mediated base editing that restores the nonfunctional CAT gene to its wild-type sequence. FIG. 22E, top panel, shows a diagram of an ABE construct in which the engineered MG35-1 nickase carries a C-terminal TadA*-(7.10) monomer and a C-terminal SV40 NLS. FIG. 22E, bottom panel: the transformed E. coli was grown on plates containing chloramphenicol at 0, 2, 3, 4, and 8 μg/mL. The plates also contained 100 μg/mL carbenicillin and 0.1 mM IPTG. Colonies grown on plates containing 0, 2, 3, and 4 μg/mL chloramphenicol were sequenced to assess reversion of the CAT gene. Experiments were performed in duplicate.
Fig. 23A and 23B depict gel-based deaminase assays showing activity of deaminase from a selected family (MG 139). The enzyme was expressed in bacterial (E.coli codon optimized) Purexpress cell lysate derived in vitro transcription translation system and incubated with 5' FAM labeled ssDNA and USER enzyme (uracil DNA glycosylase and endonuclease VIII) for 2.5 hours at 37 ℃. The resulting DNA was resolved and imaged on a denaturing polyacrylamide gel, as shown in fig. 23A. The positive control is a sequence synthesized at the same position as target C that incorporates U, and the negative control is a sequence without U or C. FIG. 23B depicts the percent deamination activity of all active cytidine deaminase on ssDNA. The taxonomic classification of cytidine deaminase is shown.
FIG. 24 depicts a gel-based deaminase assay showing the ssDNA and dsDNA activity of deaminases from several selected families (MG93, MG138 and MG139). The enzymes were expressed (from E. coli codon-optimized sequences) in a bacterial PURExpress cell lysate-derived in vitro transcription-translation system and incubated with 5' FAM-labeled ssDNA or dsDNA and USER enzyme (uracil DNA glycosylase and endonuclease VIII) for 2.5 hours at 37 °C. The resulting DNA was resolved on a denaturing polyacrylamide gel and imaged. The positive control for ssDNA activity is a substrate synthesized with U in place of the target C, and the negative control is a substrate containing neither U nor C. The positive control for dsDNA activity was the DddA toxin deaminase, which has been documented to be selective for dsDNA substrates (Mok, B.Y., de Moraes, M.H., Zeng, J. et al. A bacterial cytidine deaminase toxin enables CRISPR-free mitochondrial base editing. Nature 583, 631-637 (2020). https://doi.org/10.1038/s41586-020-2477-4).
FIGS. 25A, 25B and 25C depict data demonstrating that cytosine base editors (CBEs) containing novel cytidine deaminases with SpCas9, MG3-6 or MG34-1 effectors show different levels of editing in HEK293 cells. Each novel cytidine deaminase is fused via a linker to the N-terminus of the effector (SpCas9, MG3-6 or MG34-1). A uracil glycosylase inhibitor domain (UGI or MG69-1) is fused to the C-terminus of the effector, followed by a nuclear localization signal (NLS). Each CBE was transiently transfected into HEK293 cells and targeted to 5 different genomic positions with the corresponding sgRNA (spacer sequences are indicated; targeted cytosines are underlined). C-to-T editing levels (%) at cytosines within and around the spacer sequence are shown for the CBE carrying each cytidine deaminase (n=3).
FIGS. 26A, 26B and 26C depict the activity of cytidine deaminases (CDAs) fused to MG3-6. Each cytidine deaminase was fused to MG3-6, and its activity was assessed by targeting an engineered site in a reporter cell line. FIG. 26A shows the relative activity of various CDAs; the controls were the high-activity CDA A0A2K5RDN7 and rAPOBEC1 from the literature. FIG. 26B shows quantification of the activity of various CDAs relative to the high-activity CDA A0A2K5RDN7. FIG. 26C shows that MG139-52 activity includes prominent G-to-A conversion, indicating editing of the opposite strand, i.e., the DNA strand of the DNA/RNA heteroduplex in the R-loop.
FIGS. 27A and 27B depict toxicity assays in mammalian cells. CDA toxicity was measured by stably expressing each CDA as a CBE (fused to MG3-6). HEK293T cells stably expressing a CBE were grown in puromycin for 3 days, and live cells were stained with crystal violet. The crystal violet dye was then solubilized with 1% SDS and quantified on a microplate reader by absorbance at 570 nm. FIG. 27A shows images of the crystal violet-stained cells; FIG. 27B shows the quantification of FIG. 27A.
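The quantification in FIG. 27B can be sketched as follows, assuming that wells are blank-subtracted and that the crystal violet signal of each CBE line is expressed relative to a reference line. The blank well, the choice of reference and the readings are hypothetical. A lower relative signal would indicate fewer surviving cells, i.e., greater toxicity.

```python
def relative_viability(a570_sample: float, a570_reference: float,
                       a570_blank: float = 0.0) -> float:
    """Blank-subtracted crystal violet A570 of a sample relative to a reference line."""
    ref = a570_reference - a570_blank
    if ref <= 0:
        raise ValueError("reference signal must exceed blank")
    return (a570_sample - a570_blank) / ref


# Hypothetical absorbance readings (not data from this disclosure).
readings = {"reference_CBE": 1.05, "candidate_CBE": 0.55}
blank = 0.05
for name, a570 in readings.items():
    print(name, round(relative_viability(a570, readings["reference_CBE"], blank), 3))
```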
FIG. 28 depicts mutations identified by chloramphenicol selection in E. coli. Variant r1v1 is the starting variant for the evolution experiments. Twenty-four variants were identified, and their mutations are shown in the table.
FIG. 29 depicts beneficial mutations identified from the variant screen in HEK293T cells. The predicted structure of MG68-4 was aligned with the Staphylococcus aureus (S. aureus) TadA-tRNAArg2 complex (PDB 2B3J). Key mutated residues are highlighted on the structure.
FIG. 30 depicts selection of MG68-4 variants in HEK293T cells. Four guides were used to screen the engineered variants for activity, editing window and sequence preference.
FIG. 31 depicts the sequencing results from the ABE-MG35-1 E. coli survival assay. For the first experimental replicate, surviving colonies were picked from plates under chloramphenicol selection and Sanger sequenced. Four of the five selected colonies showed an A-to-G mutation on the negative strand (boxed nucleotides), reverting Y193 to H on the positive strand and restoring CAT function. Bystander base editing was observed in two of the five sequenced colonies.
FIG. 32 depicts increased cytosine base editing efficiency upon Fam72a expression.
FIG. 33 depicts data demonstrating that structurally optimized adenine base editors (ABEs) show different levels of editing in HEK293 cells. Each of the 33 ABEs was constructed by inserting the MG68-4 (D109N) deaminase upstream of, downstream of, or within the MG3-6_3-8 (D13A) nickase and cloning the fusion into the pCMV vector. These plasmids were co-transfected with a plasmid containing one of 8 sgRNAs targeting the HEK293 genome. The data shown are from the sgRNA targeting the sequence ACAGACAAAACTGTGCTAGACA. A-to-G editing levels (%) at A5, A7, A8, A9 and A10 within the spacer sequence are shown, together with cell viability for each individual experiment (n=2).
FIGS. 34A-34B depict the rational design of MG68-4 variants. FIG. 34A depicts a structural alignment of E. coli TadA (PDB: 1Z3A) with the predicted structure of MG68-4. The tRNA structure was taken from S. aureus TadA (PDB: 2B3J). FIG. 34B depicts the EcTadA mutations identified during development of adenine base editors (ABE7.10, ABE8.8m, ABE8.17m and ABE8e) and the equivalent residues in MG68-4. The EcTadA mutations were installed at the corresponding positions in MG68-4. H129N was identified by bacterial selection in E. coli. In general, the nuclear localization signal (SV40) is located at the C-terminus. For the 2NLS constructs, one SV40 NLS was placed at the N-terminus and one at the C-terminus. For simplicity, only the deaminase sequences of the adenine base editors are shown in the table. Abbreviations: MGA0.1, MG68-4; MGA1.1, MG68-4 (D109N); MGA2.1, MG68-4 (D109N/H129N); RD, rationally designed variant.
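Installing an EcTadA mutation at the equivalent MG68-4 position requires a residue-to-residue correspondence. The disclosure derives it from structural alignment (FIG. 34A); the sketch below instead assumes a precomputed pairwise alignment given as two gapped strings. The function name and the toy sequences are hypothetical and are not EcTadA or MG68-4.

```python
from typing import Optional


def map_residue(aligned_a: str, aligned_b: str, pos_a: int) -> Optional[int]:
    """Map 1-based residue pos_a of sequence A to its aligned residue in sequence B.

    Returns None when the residue in A is aligned to a gap in B.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    idx_a = idx_b = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a != "-":
            idx_a += 1
        if res_b != "-":
            idx_b += 1
        if res_a != "-" and idx_a == pos_a:
            return idx_b if res_b != "-" else None
    raise ValueError("pos_a lies beyond the end of sequence A")


# Hypothetical toy alignment: residue 5 (E) of A aligns with residue 5 (Q) of B.
a = "MS-EVEF"
b = "M-KEVQF"
print(map_residue(a, b, 5), map_residue(a, b, 2))  # 5 None
```

A residue that falls opposite a gap has no equivalent, so such a mutation cannot be transferred directly.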
FIG. 35 depicts screening of adenine base editors in HEK293T cells. The top three variants are highlighted. The starting variant is MGA1.1. For the 2NLS constructs, one SV40 NLS was placed at the N-terminus and one at the C-terminus. Abbreviations: MGA0.1, MG68-4; MGA1.1, MG68-4 (D109N); MGA2.1, MG68-4 (D109N/H129N); RD, rationally designed variant.
FIG. 36 depicts a table summarizing the base editing activity of rationally designed ABE variants described herein.
FIG. 37 depicts the results of a gel-based deaminase assay showing the activity of variant deaminases from several selected families (MG93, MG139 and MG152). The enzymes were expressed (from E. coli codon-optimized sequences) in a bacterial PURExpress cell lysate-derived in vitro transcription-translation system and incubated with 5' FAM-labeled ssDNA and USER enzyme (uracil DNA glycosylase and endonuclease VIII) for 2.5 hours at 37 °C. The resulting DNA was resolved on a denaturing polyacrylamide gel and imaged. The positive control is a substrate synthesized with U in place of the target C, and the negative control is a substrate containing neither U nor C.
FIGS. 38A-38C depict a gel-based deaminase assay with dual fluorophores. FIG. 38A depicts a schematic of the substrate design. The substrates are designed to minimize spectral overlap between the two fluorophores: the emission peak of Cy3 is about 560 nm and that of Cy5.5 is about 700 nm. FIGS. 38B and 38C depict TBE-urea gel images acquired using the Cy3 and Cy5.5 filters, respectively. RF157 is a single-nucleotide substrate labeled with FAM that serves as a positive control, confirming that the USER enzyme cleaves in the reaction and that the filters function and can distinguish between the fluorophores. The master mix serves as a negative control, providing a baseline measurement of uncleaved substrate. FIG. 38B: a deaminase that preferentially acts on the target preceded by T at the -1 position gives a 65-nt fluorescent cleavage product; deamination of the target preceded by C at the -1 position gives a 45-nt product; and a deaminase active with either C or T at the -1 position gives a 30-nt product. FIG. 38C: a deaminase that preferentially acts on the target preceded by G at the -1 position gives a 65-nt fluorescent product; cleavage of the substrate at the site with C at the -1 position gives a 45-nt product; and a deaminase active with either A or G at the -1 position gives a 30-nt product.
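The size-to-preference logic for the Cy3 channel of FIG. 38B can be written as a lookup table. This is a sketch only: band sizes are assumed to have already been called from the gel, and the table and function names are hypothetical. The Cy5.5 channel (FIG. 38C) would use an analogous table.

```python
# Cy3-channel product sizes (nt) and their interpretation, as described for FIG. 38B.
CY3_PRODUCTS = {
    65: "prefers T at -1",
    45: "acts on C at -1",
    30: "active with C or T at -1",
}


def interpret_cy3_bands(band_sizes_nt):
    """Translate observed Cy3 product sizes into -1 nucleotide calls.

    Sizes not in the table (e.g. uncleaved substrate) are reported as unassigned.
    """
    return [CY3_PRODUCTS.get(size, "unassigned") for size in band_sizes_nt]


print(interpret_cy3_bands([45, 30]))
```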
FIG. 39 depicts, for each variant tested in this study (MG93 and MG152 families), the percent deamination for each nucleotide at the -1 position relative to the target cytidine.
FIG. 40 depicts, for each variant tested in this study (MG139 family), the percent deamination for each nucleotide at the -1 position relative to the target cytidine.
FIGS. 41A-41C depict a summary of activity data for novel and engineered CDAs as CBEs in mammalian cells. FIG. 41A depicts the maximum editing efficiency detected across 5 engineered spacers for all tested CDAs. FIG. 41B depicts the maximum activity detected across 5 engineered spacers, normalized to the internal positive control, the high-activity CDA "A0A2K5RDN7". FIG. 41C depicts a side-by-side comparison of one of the lead candidates, "139-52-V6", with the high-activity positive control "A0A2K5RDN7" using 2 guides. 139-52-V6 shows editing efficiency similar to that of the high-activity control CDA.
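The normalization in FIG. 41B could be computed as below, assuming that the maximum editing across the five engineered spacers is taken for each CDA and divided by the maximum for the internal positive control A0A2K5RDN7. Using maxima rather than per-spacer ratios is an assumption here, and the spacer names and values are hypothetical.

```python
def max_editing(per_spacer):
    """Highest editing percentage observed across spacers."""
    return max(per_spacer.values())


def normalized_max_activity(candidate, control):
    """Candidate maximum editing divided by the control CDA's maximum editing."""
    control_max = max_editing(control)
    if control_max <= 0:
        raise ValueError("control shows no editing")
    return max_editing(candidate) / control_max


# Hypothetical C-to-T editing (%) at five engineered spacers.
control = {"s1": 40.0, "s2": 35.0, "s3": 20.0, "s4": 10.0, "s5": 5.0}
candidate = {"s1": 12.0, "s2": 20.0, "s3": 8.0, "s4": 2.0, "s5": 1.0}
print(normalized_max_activity(candidate, control))  # 0.5
```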
FIG. 42 depicts the -1 nt preference of CDAs that show greater than 1% editing activity as CBEs in mammalian cells, compared with their -1 nt preference in vitro. The -1 preference observed as CBEs in mammalian cells is largely comparable to the in vitro preference, although the in vitro pattern is more relaxed than that of CBE activity in mammalian cells.
FIGS. 43A-43C depict MG139-52 WT and its N27A mutant, MG139-52v6, which differ in activity on ssDNA and/or on RNA:DNA duplexes. FIG. 43A depicts the predicted structure of MG139-52 using A3H (PDB: 5W3V) as the template. The N27 mutation site, indicated by an arrow, lies away from the catalytic center and recognition loop 7. FIG. 43B depicts a schematic of the DNA/RNA heteroduplex in the R-loop targeted by 139-52 WT; the CRISPResso output shows G-to-A conversion, indicating deamination in the DNA strand forming the DNA/RNA heteroduplex. FIG. 43C depicts CRISPResso outputs showing elimination of G-to-A changes in the DNA/RNA heteroduplex with the N27A variant, whereas edits still occur outside the DNA/RNA heteroduplex, indicating that deamination within the DNA/RNA heteroduplex has been impaired.
FIG. 44 depicts the editing windows of the lead CDAs compared with that of the high-activity CDA A0A2K5RDN7. The region shown spans about 110 nt. The R-loop (Cas9 target) is shown as a box. The lead candidates 152-6 and 139-52-V6 have smaller editing windows than A0A2K5RDN7, a feature that is advantageous for avoiding off-target editing. The engineered CDA 139-52-V6 shows a smaller editing window than its WT counterpart 139-52.
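One simple way to compare editing windows is sketched below, assuming the window is defined as the span of positions whose editing meets a chosen threshold. This definition is an assumption, not taken from the disclosure, and the threshold, positions and percentages are hypothetical.

```python
def editing_window(edit_pct_by_pos, threshold):
    """Return (first, last, width) of positions with editing at or above threshold."""
    hits = [pos for pos, pct in edit_pct_by_pos.items() if pct >= threshold]
    if not hits:
        return None
    first, last = min(hits), max(hits)
    return first, last, last - first + 1


# Hypothetical C-to-T editing (%) by position relative to the protospacer.
profile = {-5: 0.2, 1: 3.0, 4: 12.0, 6: 15.0, 9: 4.0, 20: 0.5}
print(editing_window(profile, threshold=1.0))  # (1, 9, 9)
```

A narrower width at the same threshold corresponds to fewer edited positions outside the intended site.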
FIG. 45 depicts the mammalian cytotoxicity of CDAs stably expressed as CBEs. Each CDA, as a CBE, was stably expressed in mammalian cells by lentiviral integration. Cytotoxicity was measured as fold change relative to a low-activity, low-cytotoxicity CDA (rAPOBEC1). The lead candidates (high editing efficiency) showed moderate cytotoxicity under these conditions. Cytotoxicity is expected to be lower when the system is expressed transiently.
FIGS. 46A-46B depict dimer designs for MG68-4 variants. FIG. 46A depicts the predicted structure of MG68-4 and its structural alignment with SaTadA (PDB code: 2B3J). The distance between the N-terminus of the first monomer and the C-terminus of the second monomer is shown. FIG. 46B depicts the base editing efficiency of monomer and dimer designs. TadA*8.8m was used as the baseline. The target sequence is shown in the bar graph. A-to-G conversion is reported at A8, the most highly edited position. All deaminases were fused to the N-terminus of MG34-1 (D10A). Editing was evaluated in HEK293T cells.
FIG. 47 depicts the effect of the D109Q mutation on C-to-G base substitution. A-to-G and C-to-G conversions were obtained from target sequences 633 and 634, respectively. The editing efficiency at residue C6 of target sequence 633 and residue A8 of target sequence 634 is shown. All deaminases were fused to the N-terminus of MG34-1 (D10A). Editing efficiency was assessed in HEK293T cells.
FIG. 48 depicts base editing efficiency of combinatorial libraries in HEK293T cells. Beneficial mutations identified from rational design and directed evolution were installed into MG68-4 to prepare combinatorial libraries. The variants were inserted into 3-68_div30_m_rdr1v1_b. Editing efficiency was assessed in HEK293T cells.
FIG. 49 depicts the effect of MG68-4 dimerization within the 3-68_DIV30 scaffold and/or MG68-4 amino acid sequence variants on the percent A to G conversion in HEK293T cells.
FIGS. 50A-50B depict data demonstrating that the MG35-1 nickase can act as a scaffold for adenine base editors in E. coli cells. FIG. 50A depicts a schematic of the MG35-1 adenine base editor (ABE) with a TadA*(7.10) monomer and an SV40 NLS fused to the C-terminus. FIG. 50B depicts a chloramphenicol selection experiment evaluating MG35-1 ABE base editing. Plasmids containing the MG35-1 ABE, a nonfunctional chloramphenicol acetyltransferase (CAT) gene and an sgRNA either targeting the CAT gene (targeting sgRNA) or not targeting it (non-targeting sgRNA) were transformed into BL21(DE3) E. coli cells (Lucigen). E. coli survival under chloramphenicol selection depended on the MG35-1 ABE editing the nonfunctional CAT gene back to its wild-type sequence. The transformed E. coli was plated on plates containing chloramphenicol at 0, 2, 3, 4, and 8 μg/mL. The plates also contained 100 μg/mL carbenicillin and 0.1 mM IPTG. Colonies grown on plates containing 0, 2, 3, 4, and 8 μg/mL chloramphenicol were sequenced to assess reversion of the CAT gene. Experiments were performed with n=2.
FIG. 51 depicts the activity of the 3-6/8 ABE at Apoa1. High A-to-G conversion was observed with 26 Apoa1 guides. For each spacer shown in the figure, base conversion at every A position within the spacer region is shown.
FIG. 52 depicts the activity of the 3-6/8 ABE at Angptl3. High A-to-G conversion was observed with 5 Angptl3 guides. For each spacer shown in the figure, base conversion at every A position within the spacer region is shown.
FIG. 53 depicts the activity of the 3-6/8 ABE at Trac. High A-to-G conversion was observed with 2 Trac guides. For each spacer shown in the figure, base conversion at every A position within the spacer region is shown.
FIG. 54 depicts background 3-6/8 ABE activity at Apoa1. Primer pairs for the active guides were tested on mock-nucleofected samples to determine background editing at the targeted regions. The scale is 0 to 1%.
FIGS. 55A-55E depict E. coli survival assays with nMG35-1 ABE. E. coli was transformed with a plasmid containing nMG35-1-ABE, a nonfunctional chloramphenicol acetyltransferase gene (CAT Y193) and an sgRNA either targeting the CAT gene (targeting spacer) or not targeting it (scrambled spacer). FIG. 55A depicts a diagram of the target sequence with the expected TAM. Cell growth depended on ABE base editing of the nonfunctional CAT gene (the A at position 17 from the TAM/PAM, boxed) to restore activity. FIGS. 55B-55E depict base editing activity in E. coli of base editors comprising nMG35-1 fused to TadA deaminase via linkers of various lengths. The x-axis shows the linkers listed in Table 14.
FIGS. 56A-56D depict evaluation of nMG35-1 ABE base editing in an E. coli survival assay under chloramphenicol selection, in which cell growth depends on ABE base editing that reverts a stop codon in the nonfunctional CAT gene and restores its activity. FIGS. 56A-56B depict diagrams showing target sequences with the expected TAMs. The A at position 11 (FIG. 56A) or position 10 (FIG. 56B) from the expected TAM (boxed) is edited to G, reverting the stop codon to a glutamine codon and restoring chloramphenicol (Cm) resistance. FIG. 56C: E. coli was transformed with a plasmid containing nMG35-1-ABE, a nonfunctional chloramphenicol acetyltransferase (CAT) gene and an sgRNA either targeting the CAT gene (targeting spacer) or not targeting the CAT gene (no spacer). The transformed E. coli was grown on plates containing chloramphenicol at 0, 2, 4, and 8 μg/mL. The plates also contained 100 μg/mL carbenicillin and 0.1 mM IPTG. For the nMG35-1-ABE targeting both STOP98Q and STOP122Q, the CAT gene contains two stop codons, both of which must be reverted to restore CAT function. MIC: minimum inhibitory concentration. FIG. 56D depicts Sanger sequencing chromatograms of five of the 18 colonies grown at 2 μg/mL chloramphenicol in the STOP98Q/STOP122Q double-reversion experiment. The chromatogram of the colony that did not show reversion (colony 3) revealed a smaller A-to-G peak, which may be masked by co-transformation with unedited plasmid.
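The codon logic behind these stop-to-glutamine reversions (and the Y193H reversion described for FIG. 31) can be checked with a short sketch. An ABE deaminates an A on the strand complementary to the coding sequence, so after replication the paired T at codon position 1 reads as C. The codon table covers only the codons used here, the function names are hypothetical, and the actual CAT sequences are not modeled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Subset of the standard genetic code needed for this illustration.
CODON_TABLE = {
    "TAG": "*", "TAA": "*", "CAG": "Q", "CAA": "Q",
    "TAT": "Y", "TAC": "Y", "CAT": "H", "CAC": "H",
}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def edit_first_base_via_template(codon: str) -> str:
    """Model A-to-G editing of the template-strand A paired with codon position 1.

    After replication, the coding-strand T at position 1 reads as C.
    """
    template = revcomp(codon)  # e.g. TAG -> CTA; its last base pairs with codon position 1
    if template[-1] != "A":
        raise ValueError("codon position 1 is not T; no paired A to edit")
    return revcomp(template[:-1] + "G")


for codon in ("TAG", "TAA", "TAT"):
    edited = edit_first_base_via_template(codon)
    print(codon, CODON_TABLE[codon], "->", edited, CODON_TABLE[edited])
```

The loop shows TAG and TAA becoming glutamine codons (CAG, CAA) and TAT (Tyr) becoming CAT (His).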
FIG. 57 depicts data demonstrating that truncation of the predicted PLMP domain at the N-terminus of MG35-1 abolishes the function of MG35-1 ABE in E. coli. E. coli was transformed with a plasmid containing nMG35-1-ABE, a nonfunctional chloramphenicol acetyltransferase (CAT) gene and an sgRNA: top row, WT MG35-1 ABE with a CAT-targeting sgRNA; middle row, WT MG35-1 ABE with a scrambled (non-targeting) spacer; bottom row, PLMP domain-truncated MG35-1 ABE with a CAT-targeting sgRNA. The transformed E. coli was grown on plates containing chloramphenicol at 0, 2, and 4 μg/mL. The plates also contained 100 μg/mL carbenicillin and 0.1 mM IPTG. MIC: minimum inhibitory concentration.
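The MIC readout used in FIGS. 56C and 57 can be summarized as the lowest chloramphenicol concentration at which no colonies grow. A minimal sketch, assuming growth is scored as present or absent per plate and decreases monotonically with concentration; the function name and plate readouts are hypothetical.

```python
def minimum_inhibitory_concentration(growth_by_conc):
    """Lowest tested concentration (ug/mL) with no growth, or None if growth on all plates.

    Assumes growth decreases monotonically with concentration.
    """
    inhibitory = [conc for conc, grew in growth_by_conc.items() if not grew]
    return min(inhibitory) if inhibitory else None


# Hypothetical plate readouts for chloramphenicol at 0, 2 and 4 ug/mL.
targeting = {0: True, 2: True, 4: False}
non_targeting = {0: True, 2: False, 4: False}
print(minimum_inhibitory_concentration(targeting))      # 4
print(minimum_inhibitory_concentration(non_targeting))  # 2
```

A higher MIC with the targeting sgRNA than with the non-targeting control is the signature of successful CAT reversion.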
Brief description of the sequence listing
The sequence listing filed herewith provides exemplary polynucleotide and polypeptide sequences for use in methods, compositions and systems according to the present disclosure. The following is an exemplary description of sequences therein.
SEQ ID NOs: 1-47 show full-length peptide sequences of MG66 deaminases suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 48-49 show full-length peptide sequences of MG67 deaminases suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 50-51 show full-length peptide sequences of MG68 deaminases suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 52-56 show sequences of uracil DNA glycosylase inhibitors suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 57-66 show sequences of reference deaminases.
SEQ ID NO: 67 shows the sequence of a reference uracil DNA glycosylase inhibitor.
SEQ ID NO: 68 shows the sequence of an adenine base editor.
SEQ ID NO: 69 shows the sequence of a cytosine base editor.
SEQ ID NOs: 70-78 show full-length peptide sequences of MG nickases suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 79-87 show protospacers and PAMs used in the in vitro nickase assays described herein.
SEQ ID NOs: 88-96 show sequences of single guide RNAs used in the in vitro nickase assays described herein.
SEQ ID NOs: 97-156 show spacer sequences for targeting E. coli lacZ.
SEQ ID NOs: 157-176 show sequences of primers used for site-directed mutagenesis.
SEQ ID NOs: 177-178 show sequences of primers used for lacZ sequencing.
SEQ ID NOs: 179-342 show sequences of primers used for amplification.
SEQ ID NOs: 343-345 show sequences of primers used for lacZ sequencing.
SEQ ID NOs: 346-359 show sequences of primers used for amplification.
SEQ ID NOs: 360-368 show protospacer adjacent motifs (PAMs) suitable for use in the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 369-384 show nuclear localization sequences (NLSs) suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 385-443 show full-length peptide sequences of MG68 deaminases suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 444-447 show full-length peptide sequences of MG121 deaminases suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 448-475 show full-length peptide sequences of MG68 deaminases suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 476 and 477 show sequences of adenine base editors.
SEQ ID NOs: 478-482 show sequences of cytosine base editors.
SEQ ID NOs: 483-487 show sequences of plasmids suitable for encoding the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 488 and 489 show the sgRNA scaffold sequences of MG15-1 and MG34-1.
SEQ ID NOs: 490-522 show spacer sequences for targeting genomic loci in E. coli and HEK293T cells.
SEQ ID NOs: 523-585 show sequences of primers used for amplification and Sanger sequencing.
SEQ ID NOs: 584-585 show sequences of primers used for amplification.
SEQ ID NO: 586 shows the sequence of an adenine base editor.
SEQ ID NO: 587 shows the sequence of a cytosine base editor.
SEQ ID NOs: 588-589 show sequences of adenine base editors.
SEQ ID NOs: 590-593 show full-length peptide sequences of linkers suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NO: 594 shows the sequence of a cytosine deaminase.
SEQ ID NO: 595 shows the sequence of an adenosine deaminase.
SEQ ID NO: 596 shows the sequence of an MG34 active effector suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NO: 597 shows the sequence of an MG34 nickase suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NO: 598 shows the sequence of the MG34 PAM.
SEQ ID NOs: 599-638 show full-length peptide sequences of MG138 cytidine deaminases suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 639-659 show full-length peptide sequences of MG139 cytidine deaminases suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 660-662 show full-length peptide sequences of MG141 cytidine deaminases suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 663-664 show full-length peptide sequences of MG142 cytidine deaminases suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 665-675 show full-length peptide sequences of MG93 cytidine deaminases suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 676-678 show sequences of adenine base editors.
SEQ ID NOs: 679-680 show the sgRNA scaffold sequences of MG34-1 and SpCas9.
SEQ ID NOs: 681-689 show spacer sequences used in guide RNAs to target genomic loci.
SEQ ID NOs: 690-707 show sequences of primers used to amplify adenine base editor (ABE) genomic targets for next-generation sequencing (NGS) analysis.
SEQ ID NO: 708 shows the sequence of the blasticidin (BSD) resistance cassette.
SEQ ID NOs: 709-719 show spacer sequences used in guide RNAs to target genomic loci.
SEQ ID NOs: 720-726 show sequences of plasmids suitable for encoding the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 728-729 show sequences of adenine base editors.
SEQ ID NOs: 730-736 show spacer sequences used in guide RNAs to target genomic loci.
SEQ ID NOs: 737-738 show sequences of plasmids suitable for encoding the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 739-740 show sequences of cytidine base editors.
SEQ ID NO: 741 shows the sequence of a plasmid encoding the A1CF gene.
SEQ ID NO: 742 shows a sequence used to test the RNA activity of CDAs.
SEQ ID NO: 743 shows the sequence of the labeled primer for the poisoned primer extension assay used to test the RNA activity of CDAs.
SEQ ID NOs: 744-827 show full-length peptide sequences of MG139 cytidine deaminases suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NO: 828 shows the full-length peptide sequence of an MG93 cytidine deaminase suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NO: 829 shows the full-length peptide sequence of an MG142 cytidine deaminase suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 830-835 show full-length peptide sequences of MG152 cytidine deaminases suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 836-860 show sequences of adenine base editors.
SEQ ID NOs: 861-864 show spacer sequences used in guide RNAs to target genomic loci.
SEQ ID NOs: 865-872 show sequences of primers used to amplify adenine base editor (ABE) genomic targets for next-generation sequencing (NGS) analysis.
SEQ ID NOs: 873-875 show sequences of plasmids suitable for encoding the engineered nucleic acid editing systems described herein.
SEQ ID NO: 876 shows the sgRNA scaffold sequence of MG34-1.
SEQ ID NOs: 877-916 show sequences of cytosine base editors.
SEQ ID NOs: 917-931 show sequences of sgRNAs suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 932-961 show sequences of primers used to amplify adenine base editor (ABE) genomic targets for next-generation sequencing (NGS) analysis.
SEQ ID NO: 962 shows five engineered sites, with PAMs compatible with both Cas9 and MG3-6, for editing in mammalian cell lines.
SEQ ID NOs: 963-967 show sequences of sgRNAs suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 968-969 show sequences of cytosine base editors.
SEQ ID NO: 970 shows the full-length peptide sequence of an MG139 cytidine deaminase suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 971-977 show full-length peptide sequences of MG93 cytidine deaminases suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 978-981 show full-length peptide sequences of MG138 cytidine deaminases suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NO: 982 shows the full-length peptide sequence of an MG142 cytidine deaminase suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 983-1014 show full-length peptide sequences of MG128 deaminases suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 1015-1026 show full-length peptide sequences of MG129 deaminases suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 1027-1031 show full-length peptide sequences of MG130 deaminases suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 1032-1040 show full-length peptide sequences of MG131 deaminases suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 1041-1043 show full-length peptide sequences of MG132 deaminases suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 1044-1057 show full-length peptide sequences of MG133 deaminases suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 1058-1061 show full-length peptide sequences of MG134 deaminases suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 1062-1069 show full-length peptide sequences of MG135 deaminases suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 1070-1081 show full-length peptide sequences of MG136 deaminases suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 1082-1098 show full-length peptide sequences of MG137 deaminases suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 1099-1105 show sequences of sgRNAs suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 1106-1111 show sequences of MG35 PAMs.
SEQ ID NO: 1112 shows the DNA sequence of the gene encoding the ABE-MG35-1 adenine base editor.
SEQ ID NO: 1113 shows the protein sequence of the ABE-MG35-1 adenine base editor.
SEQ ID NO: 1114 shows the nucleotide sequence of a plasmid encoding a Cas9-based cytosine base editor (CBE).
SEQ ID NO: 1115 shows the nucleotide sequence of the plasmid encoding Fam72a.
SEQ ID NOs: 1116-1117 show sequences of Cas9-CBE target sites.
SEQ ID NOs: 1118-1119 show sequences of NGS amplicons.
SEQ ID NO: 1120 shows the full-length peptide sequence of the MG35 nuclease.
SEQ ID NO: 1121 shows the full-length peptide sequence of Fam72A.
SEQ ID NOs: 1121-1127 show full-length peptide sequences of MG35 nucleases.
SEQ ID NOs: 1128-1160 show full-length peptide sequences of MG3-6/3-8 adenine base editors.
SEQ ID NOs: 1161-1186 show full-length peptide sequences of MG34-1 adenine base editors.
SEQ ID NOs: 1187-1195 show sequences of sgRNAs suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 1196-1204 show spacer sequences used in guide RNAs to target genomic loci.
SEQ ID NO: 1205 shows the nucleotide sequence of the plasmid encoding the MG3-6/3-8 adenine base editor.
SEQ ID NO: 1206 shows the nucleotide sequence of a plasmid encoding an sgRNA suitable for the MG3-6/3-8 adenine base editor described herein.
SEQ ID NO: 1207 shows the nucleotide sequence of the plasmid encoding the MG34-1 adenine base editor.
SEQ ID NOs: 1208-1269 show full-length peptide sequences of MG93 deaminases suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 1270-1296 show full-length peptide sequences of MG139 deaminases suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 1297-1311 show full-length peptide sequences of MG152 deaminases suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 1312-1313 show full-length peptide sequences of MG138 deaminases suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 1314-1315 show full-length peptide sequences of MG139 deaminases suitable for the engineered nucleic acid editing systems described herein.
SEQ ID NOs: 1316-1319 show nucleotide sequences of 5'-FAM-labeled ssDNAs.
SEQ ID NOs: 1320-1321 show nucleotide sequences of Cy5.5-labeled ssDNAs.
SEQ ID NOs: 1322-1355 show sequences of cytidine base editors.
SEQ ID NOs: 1356-1362 show full-length peptide sequences of MG34-1 adenine base editors.
SEQ ID NOs: 1363-1415 show full-length peptide sequences of MG3-6/3-8 adenine base editors.
SEQ ID NOs: 1416-1417 show nucleotide sequences of sgRNAs suitable for use with the MG34-1 adenine base editor described herein.
SEQ ID NO: 1418 shows the nucleotide sequence of an sgRNA suitable for use with the MG3-6/3-8 adenine base editor described herein.
SEQ ID NOs: 1419-1420 show DNA sequences of target sites suitable for targeting by the MG34-1 adenine base editor described herein.
SEQ ID NO: 1421 shows the DNA sequence of a target site suitable for targeting by the MG3-6/3-8 adenine base editor described herein.
SEQ ID NO: 1422 shows the nucleotide sequence of a plasmid suitable for expressing the MG34-1 adenine base editor described herein.
SEQ ID NO: 1423 shows the nucleotide sequence of a plasmid suitable for expressing the MG3-6/3-8 adenine base editor described herein.
SEQ ID NO: 1424 shows the full-length peptide sequence of the MG35-1 adenine base editor.
SEQ ID NOs: 1425-1426 show nucleotide sequences of plasmids suitable for expressing the MG35-1 adenine base editor and sgRNA described herein.
SEQ ID NOs: 1427-1428 show nucleotide sequences of sgRNAs suitable for use with the MG35-1 adenine base editor described herein.
SEQ ID NOs: 1429-1430 show DNA sequences of target sites suitable for targeting by the MG35-1 adenine base editor described herein.
SEQ ID NOs: 1431-1454 show nucleotide sequences of sgRNAs engineered to function with the MG3-6/3-8 adenine base editor to target APOA1.
SEQ ID NOs: 1455-1478 show DNA sequences of APOA1 target sites.
SEQ ID NOs: 1479-1483 show nucleotide sequences of sgRNAs engineered to function with the MG3-6/3-8 adenine base editor to target ANGPTL3.
SEQ ID NOs: 1484-1488 show DNA sequences of ANGPTL3 target sites.
SEQ ID NOs: 1489-1490 show nucleotide sequences of sgRNAs engineered to function with the MG3-6/3-8 adenine base editor to target TRAC.
SEQ ID NOs: 1491-1492 show DNA sequences of TRAC target sites.
SEQ ID NOs: 1493-1516 show nucleotide sequences of NGS primers suitable for assessing base editing of APOA1.
SEQ ID NOs: 1517-1521 show nucleotide sequences of NGS primers suitable for assessing base editing of ANGPTL3.
SEQ ID NOs: 1522-1523 show nucleotide sequences of NGS primers suitable for assessing base editing of TRAC.
SEQ ID NOs: 1524-1547 show nucleotide sequences of NGS primers suitable for assessing base editing of APOA1.
SEQ ID NOs: 1548-1552 show nucleotide sequences of NGS primers suitable for assessing base editing of ANGPTL3.
SEQ ID NOs: 1553-1554 show nucleotide sequences of NGS primers suitable for assessing base editing of TRAC.
SEQ ID NO: 1555 shows the nucleotide sequence of a plasmid suitable for mRNA production.
SEQ ID NOs: 1556-1562 show full-length peptide sequences of MG131 adenine deaminase variants.
SEQ ID NOs: 1563-1566 show full-length peptide sequences of MG134 adenine deaminase variants.
SEQ ID NOs: 1567-1574 show full-length peptide sequences of MG135 adenine deaminase variants.
SEQ ID NOs: 1575-1589 show full-length peptide sequences of MG137 adenine deaminase variants.
SEQ ID NOS 1590-1599 shows the full-length peptide sequence of the MG68 adenine deaminase variant.
SEQ ID NOS.1600-1602 shows the full-length peptide sequence of the MG132 adenine deaminase variant.
SEQ ID NOS.1603-1616 shows the full-length peptide sequence of the MG133 adenine deaminase variant.
SEQ ID NOS.1617-1624 shows the full-length peptide sequence of the MG136 adenine deaminase variant.
SEQ ID NO. 1625-1633 shows the full-length peptide sequence of the MG129 adenine deaminase variant.
SEQ ID NOS 1634-1638 show the full-length peptide sequence of the MG130 adenine deaminase variant.
SEQ ID NO. 1639-1644 shows the full-length peptide sequence of the MG34-1 adenine base editor.
SEQ ID NOS.1645-1646 shows the nucleotide sequence of ssDNA substrates suitable for testing adenine deaminase activity in vitro.
Detailed Description
While various embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed.
Practice of some of the methods disclosed herein employs, unless otherwise indicated, techniques of immunology, biochemistry, chemistry, molecular biology, microbiology, cell biology, genomics, and recombinant DNA. See, e.g., Sambrook and Green, Molecular Cloning: A Laboratory Manual, 4th edition (2012); the series Current Protocols in Molecular Biology (F. M. Ausubel et al., eds.); the series Methods in Enzymology (Academic Press, Inc.); PCR 2: A Practical Approach (M. J. MacPherson, B. D. Hames, and G. R. Taylor, eds. (1995)); Harlow and Lane, eds. (1988), Antibodies, A Laboratory Manual; and Culture of Animal Cells: A Manual of Basic Technique and Specialized Applications, 6th edition (R. I. Freshney, ed. (2010)) (each of which is incorporated herein by reference in its entirety).
As used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Furthermore, to the extent that the terms "include" or "having," or variations thereof, are used in the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term "comprising."
The term "about" or "approximately" means within an acceptable error range for a particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, i.e., the limitations of the measurement system. For example, "about" may mean within one or more than one standard deviation, per the practice in the art. Alternatively, "about" may mean a range of up to 20%, up to 15%, up to 10%, up to 5%, or up to 1% of a given value.
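Both readings of "about" reduce to simple arithmetic. The following is a minimal Python sketch, assuming the percentage tolerance is expressed as a fraction of the stated value; the helper names `within_about` and `within_sd` are hypothetical and are not part of the disclosure.

```python
def within_about(measured: float, stated: float, tolerance: float = 0.10) -> bool:
    """True if `measured` lies within +/- tolerance * |stated| of `stated`.

    `tolerance` is a fraction, e.g. 0.20 for "up to 20%" (hypothetical helper).
    """
    if not 0.0 <= tolerance <= 1.0:
        raise ValueError("tolerance must be a fraction between 0 and 1")
    return abs(measured - stated) <= tolerance * abs(stated)


def within_sd(measured: float, mean: float, sd: float, k: float = 1.0) -> bool:
    """True if `measured` lies within k standard deviations of `mean`."""
    return abs(measured - mean) <= k * sd


# "about 100" read as +/-10% covers 90-110; read as +/-20% it covers 80-120.
print(within_about(108, 100, 0.10))  # True
print(within_about(115, 100, 0.10))  # False
print(within_about(115, 100, 0.20))  # True
```

Which tolerance applies is a matter of the claim language and context; the sketch simply makes the arithmetic of each reading explicit.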
As used herein, "cell" generally refers to a biological cell. A cell may be the basic structural, functional, or biological unit of a living organism. Cells may be derived from any organism having one or more cells. Some non-limiting examples include: prokaryotic cells, eukaryotic cells, bacterial cells, archaeal cells, cells of single-celled eukaryotic organisms, protozoal cells, cells from plants (e.g., cells from crops, fruits, vegetables, grains, soybeans, corn, maize, wheat, seeds, tomatoes, rice, cassava, sugarcane, pumpkin, hay, potatoes, cotton, hemp, tobacco, flowering plants, conifers, gymnosperms, ferns, clubmosses, hornworts, liverworts, and mosses), algal cells (e.g., Botryococcus braunii, Chlamydomonas reinhardtii, Nannochloropsis gaditana, Chlorella pyrenoidosa, Sargassum patens C. Agardh, and seaweeds (e.g., kelp)), fungal cells (e.g., yeast cells, cells from mushrooms), animal cells, cells from invertebrate animals (e.g., fruit flies, cnidarians, echinoderms, nematodes, etc.), cells from vertebrate animals (e.g., fish, amphibians, reptiles, birds, rodents, rats, humans, etc.), non-human cells, and the like. Sometimes, the cells are not derived from a natural organism (e.g., the cells may be synthetically manufactured, sometimes referred to as artificial cells).
As used herein, the term "nucleotide" generally refers to a base-sugar-phosphate combination. Nucleotides may include synthetic nucleotides. Nucleotides may include synthetic nucleotide analogs. Nucleotides may be monomeric units of nucleic acid sequences such as deoxyribonucleic acid (DNA) and ribonucleic acid (RNA). The term nucleotide may include ribonucleoside triphosphates, such as adenosine triphosphate (ATP), uridine triphosphate (UTP), cytidine triphosphate (CTP), and guanosine triphosphate (GTP), and deoxyribonucleoside triphosphates, such as dATP, dCTP, dITP, dUTP, dGTP, and dTTP, or derivatives thereof. Such derivatives may include, for example, [αS]dATP, 7-deaza-dGTP, and 7-deaza-dATP, as well as nucleotide derivatives that confer nuclease resistance on the nucleic acid molecules containing them. As used herein, the term nucleotide may refer to dideoxyribonucleoside triphosphates (ddNTPs) and derivatives thereof. Illustrative examples of dideoxyribonucleoside triphosphates may include, but are not limited to: ddATP, ddCTP, ddGTP, ddITP, and ddTTP. Nucleotides may be unlabeled or detectably labeled, such as with moieties comprising optically detectable moieties (e.g., fluorophores). Labeling may also be carried out with quantum dots. Detectable labels may include, for example, radioisotopes, fluorescent labels, chemiluminescent labels, bioluminescent labels, and enzyme labels. Fluorescent labels for nucleotides may include, but are not limited to, fluorescein, 5-carboxyfluorescein (FAM), 2',7'-dimethoxy-4',5'-dichloro-6-carboxyfluorescein (JOE), rhodamine, 6-carboxyrhodamine (R6G), N,N,N',N'-tetramethyl-6-carboxyrhodamine (TAMRA), 6-carboxy-X-rhodamine (ROX), 4-(4'-dimethylaminophenylazo)benzoic acid (DABCYL), Cascade Blue, Oregon Green, Texas Red, cyanine, and 5-(2'-aminoethyl)aminonaphthalene-1-sulfonic acid (EDANS).
Specific examples of fluorescently labeled nucleotides may include [R6G]dUTP, [TAMRA]dUTP, [R110]dCTP, [R6G]dCTP, [TAMRA]dCTP, [JOE]ddATP, [R6G]ddATP, [FAM]ddCTP, [R110]ddCTP, [TAMRA]ddGTP, [ROX]ddTTP, [dR6G]ddATP, [dR110]ddCTP, [dTAMRA]ddGTP, and [dROX]ddTTP, available from PerkinElmer (Foster City, Calif.); FluoroLink DeoxyNucleotides, FluoroLink Cy3-dCTP, FluoroLink Cy5-dCTP, FluoroLink Fluor X-dCTP, FluoroLink Cy3-dUTP, and FluoroLink Cy5-dUTP, available from Amersham (Arlington Heights, Ill.); fluorescein-15-dATP, fluorescein-12-dUTP, tetramethylrhodamine-6-dUTP, IR770-9-dATP, fluorescein-12-ddUTP, fluorescein-12-UTP, and fluorescein-15-2'-dATP, available from Boehringer Mannheim (Indianapolis, Ind.); and ChromaTide-labeled nucleotides, BODIPY-FL-14-UTP, BODIPY-FL-4-UTP, BODIPY-TMR-14-UTP, BODIPY-TMR-14-dUTP, BODIPY-TR-14-UTP, BODIPY-TR-14-dUTP, Cascade Blue-7-UTP, Cascade Blue-7-dUTP, fluorescein-12-UTP, fluorescein-12-dUTP, Oregon Green 488-5-dUTP, Rhodamine Green-5-UTP, Rhodamine Green-5-dUTP, tetramethylrhodamine-6-UTP, tetramethylrhodamine-6-dUTP, Texas Red-5-UTP, Texas Red-5-dUTP, and Texas Red-12-dUTP, available from Molecular Probes, Inc. (Eugene, Oreg.). Nucleotides may also be labeled or tagged by chemical modification. A chemically modified single nucleotide may be a biotin-dNTP. Some non-limiting examples of biotinylated dNTPs may include biotin-dATP (e.g., bio-N6-ddATP, biotin-14-dATP), biotin-dCTP (e.g., biotin-11-dCTP, biotin-14-dCTP), and biotin-dUTP (e.g., biotin-11-dUTP, biotin-16-dUTP, biotin-20-dUTP).
The terms "polynucleotide," "oligonucleotide," and "nucleic acid" are used interchangeably to generally refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof, in single-stranded, double-stranded, or multi-stranded form. A polynucleotide may be exogenous or endogenous to a cell. A polynucleotide may exist in a cell-free environment. A polynucleotide may be a gene or fragment thereof. A polynucleotide may be DNA. A polynucleotide may be RNA. A polynucleotide may have any three-dimensional structure and may perform any function. A polynucleotide may comprise one or more analogs (e.g., altered backbones, sugars, or nucleobases). If present, modifications to the nucleotide structure may be imparted before or after assembly of the polymer. Some non-limiting examples of analogs include: 5-bromouracil, peptide nucleic acid, xeno nucleic acid, morpholinos, locked nucleic acid, glycol nucleic acid, threose nucleic acid, dideoxynucleotides, cordycepin, 7-deaza-GTP, fluorophores (e.g., rhodamine or fluorescein linked to the sugar), thiol-containing nucleotides, biotin-linked nucleotides, fluorescent base analogs, CpG islands, methyl-7-guanosine, methylated nucleotides, inosine, thiouridine, pseudouridine, dihydrouridine, queuosine, and wyosine. Non-limiting examples of polynucleotides include coding or non-coding regions of a gene or gene fragment, loci (locus) defined from linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), short interfering RNA (siRNA), short hairpin RNA (shRNA), microRNA (miRNA), ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, cell-free polynucleotides including cell-free DNA (cfDNA) and cell-free RNA (cfRNA), nucleic acid probes, and primers.
The sequence of nucleotides may be interspersed with non-nucleotide components.
The term "transfection" or "transfected" generally refers to the introduction of a nucleic acid into a cell by non-viral or viral-based methods. The nucleic acid molecule may be a gene sequence encoding a complete protein or a functional portion thereof. See, e.g., Sambrook et al. (1989), Molecular Cloning: A Laboratory Manual, 18.1-18.88.
The terms "peptide," "polypeptide," and "protein" are used interchangeably herein to generally refer to a polymer of at least two amino acid residues joined by peptide bonds. This term does not denote a specific length of polymer, nor is it intended to imply or distinguish whether the peptide is produced using recombinant techniques, chemical or enzymatic synthesis, or is naturally occurring. The term applies to naturally occurring amino acid polymers as well as amino acid polymers comprising at least one modified amino acid. In some cases, the polymer may be interrupted by non-amino acids. The term encompasses amino acid chains of any length, including full-length proteins, and proteins with or without secondary or tertiary structure (e.g., domains). The term also encompasses amino acid polymers that have been modified, for example, by disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, oxidation, and any other manipulation, such as conjugation with a labeling component. As used herein, the terms "amino acid" and "amino acids" generally refer to natural and non-natural amino acids, including, but not limited to, modified amino acids and amino acid analogs. Modified amino acids may include natural amino acids and non-natural amino acids that have been chemically modified to include a group or a chemical moiety not naturally present on the amino acid. Amino acid analogs may refer to amino acid derivatives. The term "amino acid" includes both D-amino acids and L-amino acids.
As used herein, "non-native" may generally refer to a nucleic acid or polypeptide sequence that is not found in a native nucleic acid or protein. Non-native may refer to an affinity tag. Non-native may refer to a fusion. Non-native may refer to a naturally occurring nucleic acid or polypeptide sequence that includes mutations, insertions, or deletions. A non-native sequence may exhibit or encode an activity (e.g., enzymatic activity, methyltransferase activity, acetyltransferase activity, kinase activity, ubiquitination activity, etc.) that may also be exhibited by the nucleic acid or polypeptide sequence to which the non-native sequence is fused. A non-native nucleic acid or polypeptide sequence may be joined to a naturally occurring nucleic acid or polypeptide sequence (or a variant thereof) by genetic engineering to generate a chimeric nucleic acid or polypeptide sequence encoding a chimeric nucleic acid or polypeptide.
As used herein, the term "promoter" generally refers to a regulatory DNA region that controls transcription or expression of a gene and that may be located adjacent to or overlapping a nucleotide or region of nucleotides at which RNA transcription is initiated. A promoter may contain specific DNA sequences that bind protein factors, often referred to as transcription factors, which facilitate binding of RNA polymerase to the DNA, leading to transcription of the gene. A "basal promoter," also referred to as a "core promoter," may generally refer to a promoter that contains all the basic elements needed to promote transcriptional expression of an operably linked polynucleotide. A eukaryotic basal promoter may contain a TATA box or a CAAT box.
As used herein, the term "expression" generally refers to the process by which a nucleic acid sequence or polynucleotide is transcribed from a DNA template (e.g., into mRNA or another RNA transcript) and/or the process by which a transcribed mRNA is subsequently translated into a peptide, polypeptide, or protein. Transcripts and encoded polypeptides may be collectively referred to as "gene products." If the polynucleotide is derived from genomic DNA, expression in a eukaryotic cell may include splicing of the mRNA.
As used herein, "operably linked," "operatively linked," or grammatical equivalents thereof generally refer to a juxtaposition of genetic elements, such as promoters, enhancers, polyadenylation sequences, and the like, in which the elements are in a relationship permitting them to operate in the expected manner. For example, a regulatory element, which may include a promoter or enhancer sequence, is operably linked to a coding region if the regulatory element helps initiate transcription of the coding sequence. There may be intervening residues between the regulatory element and the coding region, so long as this functional relationship is maintained.
As used herein, "vector" generally refers to a macromolecule or association of macromolecules that include or are associated with a polynucleotide and that can be used to mediate the delivery of the polynucleotide to a cell. Examples of vectors include plasmids, viral vectors, liposomes, and other gene delivery vehicles. Vectors typically include genetic elements (e.g., regulatory elements) operably linked to a gene to facilitate expression of the gene in a target.
As used herein, an "expression cassette" and a "nucleic acid cassette" are generally used interchangeably to refer to a combination of nucleic acid sequences or elements that are expressed together or operably linked for expression. In some cases, an expression cassette refers to a combination of a regulatory element and one or more genes that are operably linked for expression.
"Functional fragment" of a DNA or protein sequence generally refers to a fragment that retains a biological activity (function or structure) substantially similar to that of the full-length DNA or protein sequence. The biological activity of a DNA sequence may be its ability to affect expression in a manner attributed to the full length sequence.
As used herein, an "engineered" object generally indicates that the object has been modified by human intervention. As non-limiting examples: a nucleic acid may be modified by changing its sequence to a sequence that does not occur in nature; a nucleic acid may be modified by ligating it to a nucleic acid with which it is not associated in nature, such that the ligated product possesses a function not present in the original nucleic acid; an engineered nucleic acid may be synthesized in vitro with a sequence that does not occur in nature; a protein may be modified by changing its amino acid sequence to a sequence that does not occur in nature; an engineered protein may acquire a new function or property. An "engineered" system comprises at least one engineered component.
As used herein, "synthetic" and "artificial" are used interchangeably to refer to a protein or domain thereof that has low sequence identity (e.g., less than 50% sequence identity, less than 25% sequence identity, less than 10% sequence identity, less than 5% sequence identity, less than 1% sequence identity) to a naturally occurring human protein. For example, the VPR and VP64 domains are synthetic transactivation domains.
As used herein, the term "tracrRNA" or "tracrRNA sequence" may generally refer to a nucleic acid with at least about 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, or 100% sequence identity or sequence similarity to a wild-type exemplary tracrRNA sequence (e.g., a tracrRNA from Streptococcus pyogenes, Staphylococcus aureus, etc.). tracrRNA may refer to a nucleic acid with at most about 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% sequence identity or sequence similarity to a wild-type exemplary tracrRNA sequence (e.g., a tracrRNA from Streptococcus pyogenes, Staphylococcus aureus, etc.). tracrRNA may refer to a modified form of a tracrRNA that may comprise a nucleotide change such as a deletion, insertion, or substitution, or a variant, mutation, or chimera. tracrRNA may refer to a nucleic acid that is at least about 60% identical to a wild-type exemplary tracrRNA sequence (e.g., a tracrRNA from Streptococcus pyogenes, Staphylococcus aureus, etc.) over a stretch of at least 6 contiguous nucleotides. For example, the tracrRNA sequence may be at least about 60% identical, at least about 65% identical, at least about 70% identical, at least about 75% identical, at least about 80% identical, at least about 85% identical, at least about 90% identical, at least about 95% identical, at least about 98% identical, at least about 99% identical, or 100% identical to a wild-type exemplary tracrRNA sequence (e.g., a tracrRNA from Streptococcus pyogenes, Staphylococcus aureus, etc.) over a stretch of at least 6 contiguous nucleotides. Type II tracrRNA sequences can be predicted from genomic sequences by identifying regions complementary to part of the repeat sequence in an adjacent CRISPR array.
As used herein, a "guide nucleic acid" may generally refer to a nucleic acid that can hybridize to another nucleic acid. A guide nucleic acid may be RNA. A guide nucleic acid may be DNA. The guide nucleic acid may be programmed to bind a nucleic acid sequence site-specifically. The nucleic acid to be targeted, or target nucleic acid, may comprise nucleotides. The guide nucleic acid may comprise nucleotides. A portion of the target nucleic acid may be complementary to a portion of the guide nucleic acid. The strand of a double-stranded target polynucleotide that is complementary to and hybridizes with the guide nucleic acid may be called the complementary strand. The strand of the double-stranded target polynucleotide that is complementary to the complementary strand, and therefore may not be complementary to the guide nucleic acid, may be called the non-complementary strand. A guide nucleic acid may comprise a single polynucleotide strand and may be called a "single guide nucleic acid." A guide nucleic acid may comprise two polynucleotide strands and may be called a "double guide nucleic acid." If not otherwise specified, the term "guide nucleic acid" may be inclusive, referring to both single guide nucleic acids and double guide nucleic acids. A guide nucleic acid may comprise a segment that may be referred to as a "nucleic acid-targeting segment" or a "nucleic acid-targeting sequence." The nucleic acid-targeting segment may comprise a sub-segment that may be referred to as a "protein binding segment," a "protein binding sequence," or a "Cas protein binding segment."
In the context of two or more nucleic acid or polypeptide sequences, the term "sequence identity" or "percent identity" generally refers to two (e.g., in a pairwise alignment) or more (e.g., in a multiple sequence alignment) sequences that are the same, or that have a specified percentage of amino acid residues or nucleotides that are the same, when compared and aligned for maximum correspondence over a local or global comparison window, as measured using a sequence comparison algorithm. Suitable sequence comparison algorithms for polypeptide sequences include: BLASTP using parameters of a word length (W) of 3, an expectation value (E) of 10, gap costs set at 11 to open and 1 to extend, and conditional compositional score matrix adjustment, for polypeptide sequences longer than 30 residues; BLASTP using parameters of a word length (W) of 2, an expectation value (E) of 1000000, and the PAM30 scoring matrix, with gap costs set at 9 to open and 1 to extend, for sequences of fewer than 30 residues (these are the default parameters for BLASTP in the BLAST suite available at https:// BLAST.); CLUSTALW with parameters; the Smith-Waterman homology search algorithm with parameters of a match of 2, a mismatch of -1, and a gap of -1; MUSCLE with default parameters; MAFFT with parameters of retree of 2 and maxiterations of 1000; Novafold with default parameters; and HMMER hmmalign with default parameters.
As used herein, the term "RuvC_III domain" generally refers to the third discontinuous segment of a RuvC endonuclease domain (the RuvC nuclease domain comprises three discontinuous segments: RuvC_I, RuvC_II, and RuvC_III). RuvC domains or segments thereof can generally be identified by alignment to documented domain sequences, by structural alignment to proteins with annotated domains, or by comparison to hidden Markov models (HMMs) built from documented domain sequences (e.g., Pfam HMM PF18541 for RuvC_III).
As used herein, the term "HNH domain" generally refers to an endonuclease domain having characteristic histidine and asparagine residues. HNH domains can generally be identified by alignment to documented domain sequences, by structural alignment to proteins with annotated domains, or by comparison to hidden Markov models (HMMs) built from documented domain sequences (e.g., Pfam HMM PF01844 for the HNH domain).
As used herein, the term "base editor" generally refers to an enzyme that catalyzes the conversion of one target base or base pair to another (e.g., A:T to G:C, or C:G to T:A) without requiring the creation and repair of a double-strand break. In some embodiments, the base editor is a deaminase.
As used herein, the term "deaminase" generally refers to a protein or enzyme that catalyzes a deamination reaction. In some embodiments, the deaminase is an adenosine deaminase that catalyzes the hydrolytic deamination of adenine or adenosine (e.g., an engineered adenosine deaminase that deaminates adenosine in DNA). In some embodiments, the deaminase or deaminase domain is a cytidine (or cytosine) deaminase that catalyzes the hydrolytic deamination of cytidine (or cytosine) or deoxycytidine to uridine (or uracil) or deoxyuridine, respectively. In some embodiments, the deaminase or deaminase domain is a cytidine (or cytosine) deaminase domain that catalyzes the hydrolytic deamination of cytidine (or cytosine) to uridine (or uracil). In some embodiments, the deaminase or deaminase domain is a naturally occurring deaminase from an organism, such as a human, chimpanzee, gorilla, monkey, cow, dog, rat, mouse, or bacterium (e.g., E. coli). In some embodiments, the deaminase or deaminase domain is a variant of a naturally occurring deaminase from an organism, and the variant does not occur in nature.
In the context of two or more nucleic acid or polypeptide sequences, the term "optimal alignment" generally refers to two (e.g., in a pairwise alignment) or more (e.g., in a multiple sequence alignment) sequences that have been aligned for maximum correspondence of amino acid residues or nucleotides, e.g., as determined by the alignment that yields the highest or "optimal" percent identity score.
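The criterion "at least about 60% identical over a stretch of at least 6 contiguous nucleotides" can be checked mechanically. A minimal Python sketch follows, assuming an ungapped comparison and plain character identity (U and T count as different characters, so both inputs should use the same alphabet); `has_identity_stretch` is a hypothetical name and the example sequences are made up.

```python
def has_identity_stretch(query: str, reference: str,
                         min_identity_pct: int = 60, min_len: int = 6) -> bool:
    """Return True if some ungapped stretch of >= min_len aligned positions
    between query and reference is at least min_identity_pct identical.

    Every relative offset of the two sequences is tried; gaps are not allowed.
    """
    q, r = query.upper(), reference.upper()
    for offset in range(-(len(q) - 1), len(r)):
        # 1 for a match, 0 for a mismatch, at each position where q and r overlap.
        cols = [int(q[i] == r[i + offset])
                for i in range(len(q)) if 0 <= i + offset < len(r)]
        prefix = [0]
        for c in cols:
            prefix.append(prefix[-1] + c)
        n = len(cols)
        for start in range(n):
            for end in range(start + min_len, n + 1):
                matches = prefix[end] - prefix[start]
                # Integer comparison avoids floating-point rounding at the threshold.
                if matches * 100 >= min_identity_pct * (end - start):
                    return True
    return False


print(has_identity_stretch("GUUAAC", "GUUUUC"))  # 4/6 identical, True
print(has_identity_stretch("GAAAAC", "GUUUUC"))  # 2/6 identical, False
```

The exhaustive start/end loop is quadratic per offset, which is acceptable for RNAs of a few hundred nucleotides; a gapped aligner would be the more usual tool for real comparisons.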
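Which strand of a double-stranded target is the complementary strand follows directly from the guide sequence. In the minimal Python sketch below, the guide is written 5' to 3' as RNA and the duplex is given by its top strand 5' to 3'; only perfect, ungapped matches are considered, and `classify_strands` and the example sequences are hypothetical.

```python
# Watson-Crick complements for DNA; N (any base) maps to itself.
DNA_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(DNA_COMPLEMENT)[::-1]


def classify_strands(guide_rna: str, top_strand: str) -> dict:
    """Report which strand of the duplex the guide hybridizes to.

    A guide hybridizes to the strand that is its reverse complement, so if the
    guide (with U read as T) appears verbatim in the top strand, the bottom
    strand is the complementary strand, and vice versa.
    """
    guide_dna = guide_rna.upper().replace("U", "T")
    top = top_strand.upper()
    if guide_dna in top:
        return {"complementary": "bottom", "non_complementary": "top",
                "top_index": top.find(guide_dna)}
    rc = reverse_complement(guide_dna)
    if rc in top:
        return {"complementary": "top", "non_complementary": "bottom",
                "top_index": top.find(rc)}
    raise ValueError("guide does not perfectly match either strand")


top = "ATGCCGTACGGATCCTTAGC"
print(classify_strands("CCGUACGGAU", top))
# {'complementary': 'bottom', 'non_complementary': 'top', 'top_index': 3}
```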
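Of the algorithms listed, Smith-Waterman local alignment is short enough to sketch directly. The Python below uses the stated scores (match 2, mismatch -1, gap -1, applied as a linear gap penalty) and then computes percent identity as identical columns divided by all aligned columns, gaps included. That denominator is one common convention and is an assumption here, since tools differ on it; the function names are hypothetical.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -1):
    """Local alignment with a linear gap penalty.

    Returns (best_score, aligned_a, aligned_b), with '-' marking gaps.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(0, H[i - 1][j - 1] + sub(i, j),
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical, non-gap columns as a percentage of all aligned columns."""
    if not aligned_a:
        return 0.0
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


score, top, bottom = smith_waterman("ACGTACGT", "ACGAACGT")
print(score, top, bottom, percent_identity(top, bottom))
```

For the example, seven matches and one mismatch give a score of 7 x 2 - 1 = 13 and 87.5% identity.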
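The base-pair conversions attributed to base editors follow from the deamination chemistry: adenine deamination yields inosine, which polymerases read as guanine, and cytosine deamination yields uracil, which is read as thymine. The minimal Python sketch below models only that outcome on one strand. The editing window (protospacer positions 4-8, counted 1-based from the PAM-distal end) is an illustrative assumption, not a property of any editor described herein, and the helper names are hypothetical.

```python
# Target base and deamination product for each editor class (one strand only).
DEAMINATION = {"ABE": ("A", "I"),   # adenine -> inosine
               "CBE": ("C", "U")}   # cytosine -> uracil
# How replication or repair ultimately reads each deaminated base.
READ_AS = {"I": "G", "U": "T"}


def deaminate(protospacer: str, editor: str, window: tuple = (4, 8)) -> str:
    """Deaminate every target base inside the 1-based, inclusive window."""
    target, product = DEAMINATION[editor]
    start, end = window
    return "".join(product if base == target and start <= pos <= end else base
                   for pos, base in enumerate(protospacer.upper(), start=1))


def resolve(intermediate: str) -> str:
    """Replace deaminated bases with the bases they are read as."""
    return "".join(READ_AS.get(base, base) for base in intermediate)


seq = "TTAACGATCG"
print(deaminate(seq, "ABE"), resolve(deaminate(seq, "ABE")))  # TTAICGITCG TTAGCGGTCG
print(deaminate(seq, "CBE"), resolve(deaminate(seq, "CBE")))  # TTAAUGATCG TTAATGATCG
```

After replication, the T paired with each edited A is replaced by C, completing the A:T to G:C conversion; the same logic gives C:G to T:A for a cytidine deaminase.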
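For ungapped comparisons, the optimal alignment can be found by brute force: slide one sequence along the other and keep the placement with the highest percent identity. The Python sketch below is illustrative only. It ignores gaps, which real aligners do not, and its `min_overlap` guard is an added assumption that keeps a one-residue overlap from scoring a trivial 100%. The function name is hypothetical.

```python
def best_ungapped_alignment(a: str, b: str, min_overlap: int = 5):
    """Return (offset, percent_identity, overlap) for the best ungapped placement.

    `offset` is the index in `b` aligned with a[0] (it may be negative).
    Ties in identity are broken in favor of the longer overlap.
    """
    best = None
    for offset in range(-(len(a) - 1), len(b)):
        pairs = [(a[i], b[i + offset]) for i in range(len(a))
                 if 0 <= i + offset < len(b)]
        if len(pairs) < min_overlap:
            continue
        identity = 100.0 * sum(x == y for x, y in pairs) / len(pairs)
        candidate = (identity, len(pairs), offset)
        if best is None or candidate[:2] > best[:2]:
            best = candidate
    if best is None:
        raise ValueError("no placement meets min_overlap")
    identity, overlap, offset = best
    return offset, identity, overlap


print(best_ungapped_alignment("GATTACA", "CCGATTACAGG"))  # (2, 100.0, 7)
```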
The present disclosure includes variants of any of the enzymes described herein with one or more conservative amino acid substitutions. Such conservative substitutions can be made in the amino acid sequence of a polypeptide without disrupting the three-dimensional structure or function of the polypeptide. Conservative substitutions can be accomplished by substituting amino acids with similar hydrophobicity, polarity, and R-chain length for one another. Additionally or alternatively, by comparing aligned sequences of homologous proteins from different species, conservative substitutions can be identified by locating amino acid residues that vary between species (e.g., non-conserved residues) without altering the essential function of the encoded protein. Such conservatively substituted variants may include variants with at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to any of the endonuclease protein sequences described herein. In some embodiments, such conservatively substituted variants are functional variants. Such functional variants can encompass sequences with substitutions such that the activity of one or more critical active site residues or guide RNA binding residues of the endonuclease is not disrupted.
The disclosure also includes variants of any of the enzymes described herein in which one or more catalytic residues are substituted to reduce or eliminate the activity of the enzyme (e.g., reduced-activity variants). In some embodiments, reduced-activity variants of the proteins described herein include disrupting substitutions of at least one, at least two, or all three catalytic residues. In some embodiments, any of the endonucleases described herein can include nickase mutations. In some embodiments, any of the endonucleases described herein can include a RuvC domain lacking nuclease activity. In some embodiments, any of the endonucleases described herein can be configured to cleave one strand of a double-stranded target deoxyribonucleic acid. In some embodiments, any of the endonucleases described herein can be configured to lack endonuclease activity (e.g., to be catalytically dead).
Conservative substitution tables providing functionally similar amino acids are available from a variety of references (see, e.g., Creighton, Proteins: Structures and Molecular Properties (W. H. Freeman & Co.; 2nd edition, December 1993)). The following eight groups each contain amino acids that are conservative substitutions for one another:
1) Alanine (A), glycine (G);
2) Aspartic acid (D), glutamic acid (E);
3) Asparagine (N), glutamine (Q);
4) Arginine (R), lysine (K);
5) Isoleucine (I), leucine (L), methionine (M), valine (V);
6) Phenylalanine (F), tyrosine (Y), tryptophan (W);
7) Serine (S), threonine (T); and
8) Cysteine (C), methionine (M).
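The eight groups above translate directly into a lookup table. The minimal Python sketch below classifies each position of two equal-length, ungapped sequences as identical, conservative, or non-conservative. It assumes the eight groups listed are the only criterion (methionine appears in both group 5 and group 8), and the function names are hypothetical.

```python
# The eight conservative-substitution groups listed above.
CONSERVATIVE_GROUPS = [
    {"A", "G"}, {"D", "E"}, {"N", "Q"}, {"R", "K"},
    {"I", "L", "M", "V"}, {"F", "Y", "W"}, {"S", "T"}, {"C", "M"},
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if two different residues share at least one group."""
    if original == replacement:
        return False  # identical residues are not a substitution
    return any(original in g and replacement in g for g in CONSERVATIVE_GROUPS)


def classify_variant(reference: str, variant: str) -> dict:
    """Count identical, conservative, and non-conservative positions."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned and of equal length")
    counts = {"identical": 0, "conservative": 0, "non_conservative": 0}
    for ref, var in zip(reference.upper(), variant.upper()):
        if ref == var:
            counts["identical"] += 1
        elif is_conservative(ref, var):
            counts["conservative"] += 1
        else:
            counts["non_conservative"] += 1
    return counts


print(classify_variant("MKDA", "LRDW"))
# {'identical': 1, 'conservative': 2, 'non_conservative': 1}
```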
SUMMARY
The discovery of new CRISPR enzymes with unique functions and structures may offer the potential to further transform deoxyribonucleic acid (DNA) editing technologies, improving speed, specificity, functionality, and ease of use. Relative to the predicted prevalence of clustered regularly interspaced short palindromic repeats (CRISPR) systems in microorganisms and the sheer diversity of microbial species, relatively few functionally characterized CRISPR enzymes exist in the literature. This is partly because a large number of microbial species may not be readily cultivated under laboratory conditions. Metagenomic sequencing of natural environmental niches that represent large numbers of microbial species may offer the potential to greatly increase the number of new CRISPR systems documented and to accelerate the discovery of new oligonucleotide editing functionalities. A recent example of the success of this approach is the discovery in 2016 of the CasX/CasY CRISPR systems through metagenomic analysis of natural microbial communities.
CRISPR systems are RNA-directed nuclease complexes that have been described as functioning as adaptive immune systems in microorganisms. In their natural context, CRISPR systems occur in CRISPR (clustered regularly interspaced short palindromic repeats) operons or loci, which generally comprise two parts: (i) an array of short repetitive sequences (30-40 bp) separated by equally short spacer sequences, which encode the RNA-based targeting elements; and (ii) ORFs encoding the nuclease polypeptide, which is directed by the RNA-based targeting element, together with accessory proteins/enzymes. Efficient nuclease targeting of a particular target nucleic acid sequence generally requires both (i) complementary hybridization between the first 6-8 nucleotides of the target (the target seed) and the crRNA guide; and (ii) the presence of a protospacer adjacent motif (PAM) sequence within a defined vicinity of the target seed (the PAM is usually a sequence not commonly represented within the host genome). Depending on the exact function and organization of the system, CRISPR systems are generally classified into 2 classes, 5 types, and 16 subtypes based on shared functional characteristics and evolutionary similarity (see FIG. 1).
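The two targeting requirements, seed complementarity and an adjacent PAM, can be expressed as a scan over a DNA sequence. The Python sketch below is a simplification under stated assumptions. The PAM is taken to lie immediately 3' of the protospacer on the strand whose sequence matches the guide, as for Streptococcus pyogenes Cas9 with its NGG PAM. The seed is taken as the PAM-proximal `seed_len` nucleotides and must match perfectly, and only one strand is scanned. The function names and example sequences are hypothetical, and other CRISPR types place the PAM and seed differently.

```python
# IUPAC nucleotide codes and the bases each one allows.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
         "S": "CG", "W": "AT", "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
         "H": "ACT", "V": "ACG", "N": "ACGT"}


def pam_matches(site: str, pam: str) -> bool:
    """True if `site` satisfies the IUPAC PAM pattern position by position."""
    return len(site) == len(pam) and all(b in IUPAC[p] for b, p in zip(site, pam))


def find_targets(dna: str, spacer: str, pam: str = "NGG", seed_len: int = 8,
                 max_distal_mismatches: int = 0) -> list:
    """Return start indices of protospacers on the given strand of `dna`.

    A hit needs (i) a PAM immediately 3' of the protospacer, (ii) a perfect
    match over the PAM-proximal `seed_len` nucleotides, and (iii) at most
    `max_distal_mismatches` mismatches in the PAM-distal remainder.
    """
    dna = dna.upper()
    spacer = spacer.upper().replace("U", "T")
    pam = pam.upper()
    n = len(spacer)
    hits = []
    for i in range(len(dna) - n - len(pam) + 1):
        protospacer = dna[i:i + n]
        if not pam_matches(dna[i + n:i + n + len(pam)], pam):
            continue
        seed_ok = protospacer[n - seed_len:] == spacer[n - seed_len:]
        distal_mm = sum(x != y for x, y in zip(protospacer[:n - seed_len],
                                              spacer[:n - seed_len]))
        if seed_ok and distal_mm <= max_distal_mismatches:
            hits.append(i)
    return hits


spacer = "GACGTTACCGGATCAAGCTT"  # hypothetical 20-nt spacer (DNA alphabet)
print(find_targets("TTT" + spacer + "TGG" + "AAAA", spacer))  # [3]
```

Raising `max_distal_mismatches` loosens only the PAM-distal region, reflecting the greater tolerance of mismatches outside the seed.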
Class I CRISPR systems have large multi-subunit effector complexes and include type I, type III and type IV.
Type I CRISPR systems are considered to be of moderate complexity in terms of components. In a type I CRISPR system, the array of RNA targeting elements is transcribed as a long precursor crRNA (pre-crRNA), which is processed at the repeat elements to release short mature crRNAs; these direct the nuclease complex to nucleic acid targets that are followed by a suitable short consensus sequence called a Protospacer Adjacent Motif (PAM). This processing is performed by an endoribonuclease subunit (Cas6) of a large endonuclease complex called Cascade, which also includes the nuclease (Cas3) protein component of the crRNA-guided nuclease complex. Type I nucleases function primarily as DNA nucleases.
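Repeat-directed processing of a pre-crRNA can be pictured as string cleavage. The Python sketch below is a simplification that assumes the processing enzyme cuts every repeat copy at the same offset `cut`; the repeat and spacer strings are hypothetical, the function name is illustrative, and real mature crRNAs may be trimmed further.

```python
from typing import List


def process_pre_crrna(pre_crrna: str, repeat: str, cut: int) -> List[str]:
    """Cleave each repeat copy at offset `cut` and return the spacer-containing units.

    Each unit = repeat[cut:] + spacer + repeat[:cut], i.e. the 3' part of one
    repeat, the spacer, and the 5' part of the next repeat. Leader and trailer
    sequences outside the first and last repeat are discarded.
    """
    if not 0 <= cut <= len(repeat):
        raise ValueError("cut must lie within the repeat")
    pieces = pre_crrna.split(repeat)
    # pieces[0] is the leader and pieces[-1] the trailer; the rest are spacers.
    return [repeat[cut:] + spacer + repeat[:cut] for spacer in pieces[1:-1]]


if __name__ == "__main__":
    repeat = "AAAAGGGGTT"  # hypothetical 10-nt repeat
    array = "CC" + repeat + "CCCC" + repeat + "TTTT" + repeat + "GC"
    print(process_pre_crrna(array, repeat, cut=6))
    # ['GGTTCCCCAAAAGG', 'GGTTTTTTAAAAGG']
```

The sketch assumes spacers never contain the repeat sequence itself, which keeps the split unambiguous.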
Type III CRISPR systems may be characterized by the presence of a central nuclease called Cas10, together with repeat-associated mysterious proteins (RAMPs) comprising Csm or Cmr protein subunits. As in type I systems, mature crRNAs are processed from pre-crRNAs by Cas6-like enzymes. Unlike type I and type II systems, type III systems appear to target and cleave DNA-RNA duplexes (e.g., a DNA strand that serves as a template for an RNA polymerase).
Type IV CRISPR systems have an effector complex comprising a highly reduced large subunit nuclease (csf1), two genes for RAMP proteins of the Cas5 (csf3) and Cas7 (csf2) groups, and, in some cases, a gene for a predicted small subunit; such systems are typically found on endogenous plasmids.
Class II CRISPR systems typically have single-polypeptide, multi-domain nuclease effectors and include type II, type V, and type VI.
Type II CRISPR systems are considered to be the simplest in terms of components. In a type II CRISPR system, processing of the CRISPR array into mature crRNAs does not require a specific endonuclease subunit, but instead requires a small trans-activating crRNA (tracrRNA) having a region complementary to the array repeat sequence. The tracrRNA interacts with both its corresponding effector nuclease (e.g., Cas9) and the repeat sequence to form a precursor dsRNA structure, which is cleaved by endogenous RNase III to produce a mature effector enzyme loaded with both tracrRNA and crRNA. Type II nucleases are known to act as DNA nucleases. Type II effectors typically exhibit a structure comprising a RuvC-like endonuclease domain that adopts an RNase H fold, with an unrelated HNH nuclease domain inserted within the fold of the RuvC-like domain. The HNH domain is responsible for cleaving the target DNA strand (the strand complementary to the crRNA), while the RuvC-like domain is responsible for cleaving the displaced (non-target) DNA strand.
Type V CRISPR systems are characterized by a nuclease effector (e.g., Cas12) whose structure, like that of type II effectors, comprises a RuvC-like domain. As in type II systems, most (but not all) type V CRISPR systems use a tracrRNA to process crRNA precursors into mature crRNAs; however, unlike type II systems, which require RNase III to cleave a crRNA precursor into multiple crRNAs, type V systems can cleave the crRNA precursor using the effector nuclease itself. Like type II CRISPR systems, type V CRISPR systems are identified as DNA nucleases. Unlike type II CRISPR systems, some type V enzymes (e.g., Cas12a) appear to have a strong single-stranded, non-specific deoxyribonuclease activity that is activated by the initial crRNA-directed cleavage of a double-stranded target sequence.
Type VI CRISPR systems have RNA-guided RNA endonucleases. Instead of a RuvC-like domain, the single-polypeptide effector of a type VI system (e.g., Cas13) comprises two HEPN ribonuclease domains. Unlike both type II and type V systems, type VI systems also appear, in some cases, not to require a tracrRNA to process pre-crRNA into crRNA. However, like type V systems, some type VI systems (e.g., C2c2) appear to have a strong single-stranded, non-specific nuclease (ribonuclease) activity that is activated by the initial crRNA-directed cleavage of a target RNA.
Because of their simpler architecture, class II CRISPR systems have been the most widely used for engineering and development as designer nucleases for genome editing applications.
One of the early adaptations of such a system for in vitro use can be found in Jinek et al. (Science. 2012 Aug 17; 337(6096):816-21), which is incorporated herein by reference in its entirety. The Jinek study first described a system involving (i) recombinantly expressed, purified full-length Cas9 (e.g., a class II enzyme) isolated from Streptococcus pyogenes SF370; (ii) a purified mature crRNA of about 42 nt carrying a 5' sequence of about 20 nt complementary to the target DNA sequence to be cleaved, followed by a 3' tracr-binding sequence (the entire crRNA being transcribed in vitro from a synthetic DNA template carrying a T7 promoter sequence); (iii) purified tracrRNA transcribed in vitro from a synthetic DNA template carrying a T7 promoter sequence; and (iv) Mg2+. Jinek later described an improved, engineered system in which the crRNA of (ii) is joined to the 5' end of the tracrRNA of (iii) by a linker (e.g., GAAA) to form a single fused synthetic guide RNA (sgRNA) capable of guiding Cas9 to a target by itself.
Mali et al. (Science. 2013 Feb 15; 339(6121):823-826), which is incorporated herein by reference in its entirety, later adapted this system for use in mammalian cells by providing DNA vectors encoding: (i) an ORF encoding a codon-optimized Cas9 (e.g., a class II, type II enzyme) under a suitable mammalian promoter, with a C-terminal nuclear localization sequence (e.g., SV40 NLS) and a suitable polyadenylation signal (e.g., TK pA signal); and (ii) an ORF encoding an sgRNA (having a 5' sequence beginning with G, followed by a 20-nt complementary targeting nucleic acid sequence joined to a 3' tracr-binding sequence, a linker, and the tracrRNA sequence) under a suitable RNA polymerase III promoter (e.g., the U6 promoter).
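The sgRNA layout described by Jinek and Mali (a 5' G, a 20-nt spacer, the tracr-binding repeat sequence, a GAAA linker, then the tracrRNA) can be sketched as plain string assembly. The repeat and tracrRNA fragments in the example are short hypothetical placeholders, not the sequences used in those studies or in this disclosure, and the helper name is illustrative.

```python
VALID_RNA = set("ACGU")


def build_sgrna(spacer: str, tracr_binding: str, tracrrna: str,
                linker: str = "GAAA", ensure_5prime_g: bool = True) -> str:
    """Assemble a single guide RNA: [G] + spacer + tracr-binding sequence + linker + tracrRNA.

    DNA letters are converted to RNA (T -> U). If `ensure_5prime_g` is set and the
    spacer does not already start with G, a G is prepended, mirroring the 5'-G
    start used for U6-driven transcription.
    """
    spacer, tracr_binding, linker, tracrrna = (
        part.upper().replace("T", "U") for part in (spacer, tracr_binding, linker, tracrrna)
    )
    if ensure_5prime_g and not spacer.startswith("G"):
        spacer = "G" + spacer
    sgrna = spacer + tracr_binding + linker + tracrrna
    if set(sgrna) - VALID_RNA:
        raise ValueError("sgRNA contains non-RNA characters")
    return sgrna


if __name__ == "__main__":
    # Hypothetical 20-nt target plus placeholder repeat/tracrRNA fragments.
    print(build_sgrna("ACGTACGTACGTACGTACGT", "GUUUUAGAGC", "GCUCUAAAAC"))
```

A G is prepended only when the spacer lacks one, so a spacer that already starts with G keeps its 20-nt length.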
Base editing
Base editing is a genome editing approach that converts one target base or base pair into another (e.g., A:T to G:C, or C:G to T:A) without creating and repairing a double-strand break. Base editing can be accomplished with DNA and RNA base editors, which introduce point mutations into DNA or RNA at specific sites. In general, a DNA base editor may comprise a fusion of a catalytically inactive nuclease and a catalytically active base-modifying enzyme that acts on single-stranded DNA (ssDNA). An RNA base editor may include an analogous RNA-specific enzyme. Base editing can increase the efficiency of gene modification while reducing off-target and random mutations in DNA.
DNA base editors are engineered ribonucleoprotein complexes that serve as tools for single-base substitution in cells and organisms. They can be produced by fusing an engineered base-modifying enzyme to a catalytically deficient CRISPR endonuclease variant that cannot cleave dsDNA but can still unwind it in a Protospacer Adjacent Motif (PAM)-dependent manner, so that the guide RNA can find its complementary target and expose a stretch of ssDNA at the editing site. The guide RNA anneals to the complementary DNA, displacing a stretch of ssDNA and directing the CRISPR "scissors" to the base modification site. The cellular repair machinery then repairs the nicked, unedited strand using the complementary edited strand as a template.
To date, two types of DNA base editors have been developed: cytosine base editors (CBEs) and adenine base editors (ABEs). They have been shown to edit point mutations in DNA efficiently and precisely with minimal off-target DNA editing (see Nat Biotechnol. 2017;35:435-437; Nat Biotechnol. 2017;35:438-440; and Nat Biotechnol. 2017;35:475-480, each of which is incorporated herein by reference in its entirety). However, recent findings suggest that DNA base editors do cause off-target modifications in DNA and also introduce many off-target modifications into RNA.
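The A:T-to-G:C and C:G-to-T:A outcomes of ABEs and CBEs can be illustrated with a toy model that edits every target base inside an editing window of the protospacer (non-target strand, 5'->3'). The 4-8 window (position 1 being the PAM-distal 5' end) is an assumption chosen for illustration; actual windows depend on the deaminase, the effector, and the site, and real editors do not convert every base in the window. The function name is hypothetical.

```python
from typing import Tuple


def base_edit(protospacer: str, editor: str, window: Tuple[int, int] = (4, 8)) -> str:
    """Return the protospacer (non-target strand, 5'->3') after idealized editing.

    "ABE": A -> G (inosine is read as G after repair/replication).
    "CBE": C -> T (uracil is read as T after repair/replication).
    `window` is a 1-based inclusive range counted from the PAM-distal end.
    """
    conversions = {"ABE": ("A", "G"), "CBE": ("C", "T")}
    if editor not in conversions:
        raise ValueError(f"unknown editor: {editor}")
    source, product = conversions[editor]
    start, end = window
    bases = list(protospacer.upper())
    for i in range(max(start, 1) - 1, min(end, len(bases))):
        if bases[i] == source:
            bases[i] = product
    return "".join(bases)


if __name__ == "__main__":
    protospacer = "GACGCATAAAGATGAGACGC"  # hypothetical target
    print(base_edit(protospacer, "ABE"))  # GACGCGTGAAGATGAGACGC
    print(base_edit(protospacer, "CBE"))  # GACGTATAAAGATGAGACGC
```

Bases outside the window (for example the C at position 3) are left unchanged, which is the behavior the window concept is meant to capture.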
MG base editor
In some aspects, the present disclosure provides an engineered nucleic acid editing system comprising: (a) an endonuclease comprising a RuvC domain and an HNH domain, wherein the endonuclease is derived from an uncultured microorganism, wherein the endonuclease is a class 2, type II Cas endonuclease, and wherein the endonuclease is configured to lack nuclease activity; (b) a base editor coupled to the endonuclease; and (c) an engineered guide ribonucleic acid structure configured to form a complex with the endonuclease, the engineered guide ribonucleic acid structure comprising: (i) a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and (ii) a ribonucleic acid sequence configured to bind to the endonuclease. In some embodiments, the endonuclease comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs 70-78 or 597, or a variant thereof. In some cases, the RuvC domain lacks nuclease activity. In some cases, the endonuclease comprises a nickase mutation. In some cases, the endonuclease is configured to cleave one strand of a double-stranded target deoxyribonucleic acid. In some cases, the ribonucleic acid sequence configured to bind to the endonuclease comprises a tracr sequence.
In some aspects, the present disclosure provides an engineered nucleic acid editing system comprising: (a) an endonuclease having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs 70-78 or 597, or a variant thereof, wherein the endonuclease is configured to lack nuclease activity; (b) a base editor coupled to the endonuclease; and (c) an engineered guide ribonucleic acid structure configured to form a complex with the endonuclease, the engineered guide ribonucleic acid structure comprising: (i) a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and (ii) a ribonucleic acid sequence configured to bind to the endonuclease. In some cases, the ribonucleic acid sequence configured to bind to the endonuclease comprises a tracr sequence. In some cases, a RuvC domain of the endonuclease lacks nuclease activity. In some cases, the endonuclease comprises a nickase mutation. In some cases, the endonuclease is configured to cleave one strand of a double-stranded target deoxyribonucleic acid.
In some aspects, the present disclosure provides an engineered nucleic acid editing system comprising: (a) an endonuclease configured to bind to a Protospacer Adjacent Motif (PAM) sequence comprising any one of SEQ ID NOs 360-368 or 598, wherein the endonuclease is a class 2, type II Cas endonuclease and is configured to lack nuclease activity; (b) a base editor coupled to the endonuclease; and (c) an engineered guide ribonucleic acid structure configured to form a complex with the endonuclease, the engineered guide ribonucleic acid structure comprising: (i) a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and (ii) a ribonucleic acid sequence configured to bind to the endonuclease. In some cases, the ribonucleic acid sequence configured to bind to the endonuclease comprises a tracr sequence. In some cases, the endonuclease comprises a nickase mutation. In some cases, a RuvC domain of the endonuclease lacks nuclease activity. In some cases, the endonuclease is configured to cleave one strand of a double-stranded target deoxyribonucleic acid.
In some embodiments, the endonuclease is derived from an uncultured microorganism. In some embodiments, the endonuclease has less than 80% identity to a Cas9 endonuclease. In some embodiments, the endonuclease further comprises an HNH domain. In some embodiments, the tracr ribonucleic acid sequence comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to about 60 to 90 consecutive nucleotides selected from any one of SEQ ID NOs 88-96, 488-489, or 679-680, or a variant thereof. In some embodiments, the tracr ribonucleic acid sequence comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the non-degenerate nucleotides of any one of SEQ ID NOs 88-96, 488-489, or 679-680, or a variant thereof.
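The recited identity thresholds (at least 80%, at least 81%, and so on up to 100%) presuppose a way of scoring two aligned sequences. The Python sketch below computes percent identity over the columns of an alignment that is taken as given, then reports the highest recited threshold met. Identity conventions differ (for example, whether gap columns count in the denominator); counting them here is an assumption, and the function names are illustrative.

```python
from typing import Optional

THRESHOLDS = list(range(80, 101))  # 80%, 81%, ..., 100%


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns with identical residues ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Drop columns that are gaps in both sequences; they carry no information.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def highest_threshold_met(pid: float) -> Optional[int]:
    """Return the largest recited threshold that `pid` satisfies, or None."""
    met = [t for t in THRESHOLDS if pid >= t]
    return max(met) if met else None


if __name__ == "__main__":
    pid = percent_identity("ACDEFGHIKL", "ACDEFGHIKM")
    print(pid, highest_threshold_met(pid))  # 90.0 90
```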
In some aspects, the present disclosure provides an engineered nucleic acid editing system comprising: (a) an engineered guide ribonucleic acid structure comprising: (i) a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and (ii) a tracr ribonucleic acid sequence configured to bind to an endonuclease, wherein the tracr ribonucleic acid sequence comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the non-degenerate nucleotides of any one of SEQ ID NOs 88-96, 488-489, or 679-680, or a variant thereof; and (b) a class 2, type II endonuclease configured to bind to the engineered guide ribonucleic acid structure.
In some embodiments, the endonuclease is configured to bind to a Protospacer Adjacent Motif (PAM) sequence comprising any one of SEQ ID NOs 360, 362, or 368. In some embodiments, the base editor comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs 1-51, 57-66, 385-443, 444-475, 594-595, or 599-675, or a variant thereof. In some embodiments, the base editor is an adenine deaminase. In some embodiments, the adenine deaminase comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs 50-51, 57, 385-443, 448-475, or 595, or a variant thereof. In some embodiments, the base editor is a cytosine deaminase. In some embodiments, the cytosine deaminase comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs 1-49, 444-447, 594, or 58-66, or a variant thereof.
In some embodiments, the engineered nucleic acid editing system further comprises a uracil DNA glycosylase inhibitor. In some embodiments, the uracil DNA glycosylase inhibitor comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs 52-56 or SEQ ID NO 67, or a variant thereof.
In some embodiments, the engineered guide ribonucleic acid structure comprises at least two ribonucleic acid polynucleotides. In some embodiments, the engineered guide ribonucleic acid structure comprises one ribonucleic acid polynucleotide comprising the guide ribonucleic acid sequence and the tracr ribonucleic acid sequence. In some embodiments, the guide ribonucleic acid sequence is complementary to a prokaryotic, bacterial, archaeal, eukaryotic, fungal, plant, mammalian, or human genomic sequence. In some embodiments, the guide ribonucleic acid sequence is 15-24 nucleotides in length. In some embodiments, the endonuclease includes one or more Nuclear Localization Sequences (NLS) near the N-terminus or C-terminus of the endonuclease.
NLS may include any sequence in Table 1 below, or a combination thereof:
Table 1: example NLS sequences that may be used with effectors in accordance with the present disclosure
In some embodiments, the endonuclease is covalently coupled to the base editor, either directly or through a linker. In some embodiments, a linker joining any enzyme or domain described herein may comprise one or more copies of a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SGGSSGGSSGSETPGTSESATPESSGGSSGGS, SGSETPGTSESATPESA, GSGGS, SGSETPGTSESATPES, SGGSS, or GAAA, or to any other linker sequence described herein. In some embodiments, a single polypeptide comprises the endonuclease and the base editor. In some embodiments, the endonuclease is configured to cleave one strand of a double-stranded target deoxyribonucleic acid. In some embodiments, the endonuclease comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs 70-78 or 597, or a variant thereof. In some embodiments, the system further comprises a source of Mg2+.
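Coupling domains directly or through a linker amounts to concatenating them in N- to C-terminal order. The Python sketch below joins hypothetical domain sequences with one or more copies of a linker recited above. The domain strings are placeholders, the deaminase-then-endonuclease order is an assumption, and PKKKRKV (the SV40 large T-antigen NLS) stands in for whichever NLS is actually used.

```python
from typing import Optional, Sequence

# Linker sequences recited in the text (one-letter amino-acid code).
LINKER_32 = "SGGSSGGSSGSETPGTSESATPESSGGSSGGS"
LINKER_GSGGS = "GSGGS"


def assemble_fusion(domains: Sequence[str], linker: str = LINKER_32,
                    copies: int = 1, nls: Optional[str] = None) -> str:
    """Join protein domains N->C, inserting `copies` of `linker` between neighbours.

    An empty linker models direct coupling. If `nls` is given, it is added at
    both the N- and C-terminus.
    """
    if copies < 0:
        raise ValueError("copies must be non-negative")
    core = (linker * copies).join(domains)
    return f"{nls}{core}{nls}" if nls else core


if __name__ == "__main__":
    deaminase = "MSEVEF"  # hypothetical placeholder, not a real deaminase
    dead_cas = "MKTAYI"   # hypothetical placeholder, not a real endonuclease
    print(assemble_fusion([deaminase, dead_cas], LINKER_GSGGS, copies=2, nls="PKKKRKV"))
    # PKKKRKVMSEVEFGSGGSGSGGSMKTAYIPKKKRKV
```

Passing `linker=""` gives the direct-coupling case; multiple copies model the "one or more copies" language.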
In some embodiments, the endonuclease comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 70 or a variant thereof; the guide RNA structure comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 88; and the endonuclease is configured to bind to a PAM comprising SEQ ID NO: 360.
In some embodiments, the endonuclease comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 71 or a variant thereof; the guide RNA structure comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 89; and the endonuclease is configured to bind to a PAM comprising SEQ ID NO: 361.
In some embodiments, the endonuclease comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 73 or a variant thereof; the guide RNA structure comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 91; and the endonuclease is configured to bind to a PAM comprising SEQ ID NO: 363.
In some embodiments, the endonuclease comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 75 or a variant thereof; the guide RNA structure comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 93; and the endonuclease is configured to bind to a PAM comprising SEQ ID NO: 365.
In some embodiments, the endonuclease comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 76 or a variant thereof; the guide RNA structure comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 94; and the endonuclease is configured to bind to a PAM comprising SEQ ID NO: 366.
In some embodiments, the endonuclease comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 77 or a variant thereof; the guide RNA structure comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 95; and the endonuclease is configured to bind to a PAM comprising SEQ ID NO: 367.
In some embodiments, the endonuclease comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 78 or a variant thereof; the guide RNA structure comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 96; and the endonuclease is configured to bind to a PAM comprising SEQ ID NO: 368.
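The embodiments above pair each endonuclease with a guide RNA and a PAM by SEQ ID NO. A small lookup table makes those pairings easy to check programmatically. The Python sketch below encodes only the pairings recited in this passage (SEQ ID NOs 72 and 74 are not paired here, so they are absent); the helper name is illustrative.

```python
from typing import Dict, Tuple

# Endonuclease SEQ ID NO -> (guide RNA SEQ ID NO, PAM SEQ ID NO), as recited above.
PAIRINGS: Dict[int, Tuple[int, int]] = {
    70: (88, 360),
    71: (89, 361),
    73: (91, 363),
    75: (93, 365),
    76: (94, 366),
    77: (95, 367),
    78: (96, 368),
}


def components_for(endonuclease_seq_id: int) -> Dict[str, int]:
    """Return the guide and PAM SEQ ID NOs paired with an endonuclease SEQ ID NO."""
    try:
        guide, pam = PAIRINGS[endonuclease_seq_id]
    except KeyError:
        raise KeyError(f"no pairing recited for SEQ ID NO: {endonuclease_seq_id}") from None
    return {"endonuclease": endonuclease_seq_id, "guide": guide, "pam": pam}


if __name__ == "__main__":
    print(components_for(70))  # {'endonuclease': 70, 'guide': 88, 'pam': 360}
```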
In some embodiments, the base editor comprises an adenine deaminase. In some embodiments, the adenine deaminase comprises SEQ ID NO: 57 or a variant thereof. In some embodiments, the base editor comprises a cytosine deaminase. In some embodiments, the cytosine deaminase comprises SEQ ID NO: 58 or a variant thereof. In some embodiments, the engineered nucleic acid editing systems described herein further comprise a uracil DNA glycosylase inhibitor. In some embodiments, the uracil DNA glycosylase inhibitor comprises SEQ ID NO: 67 or a variant thereof.
In some embodiments, the sequence identity is determined by the BLASTP, CLUSTALW, MUSCLE, MAFFT, or Smith-Waterman homology search algorithm. In some embodiments, the sequence identity is determined by the BLASTP homology search algorithm using a word length (W) of 3, an expectation value (E) of 10, and the BLOSUM62 scoring matrix with gap costs set to existence 11 and extension 1, together with conditional compositional score matrix adjustment.
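The BLASTP settings recited above map onto NCBI BLAST+ `blastp` options. The Python sketch below only builds the argument list and does not run BLAST; the file names are hypothetical, the tabular output format is an assumption, and `-comp_based_stats 2` is the BLAST+ setting for conditional compositional score matrix adjustment. Actually running the command requires BLAST+ on the PATH.

```python
import shlex
from typing import List


def blastp_command(query_fasta: str, subject_fasta: str) -> List[str]:
    """Build a blastp argument list using the parameters recited in the text."""
    return [
        "blastp",
        "-query", query_fasta,
        "-subject", subject_fasta,
        "-word_size", "3",          # word length W = 3
        "-evalue", "10",            # expectation value E = 10
        "-matrix", "BLOSUM62",      # scoring matrix
        "-gapopen", "11",           # gap existence cost
        "-gapextend", "1",          # gap extension cost
        "-comp_based_stats", "2",   # conditional compositional score matrix adjustment
        "-outfmt", "6 qseqid sseqid pident length",  # tabular output incl. percent identity
    ]


if __name__ == "__main__":
    cmd = blastp_command("query.faa", "candidate.faa")  # hypothetical file names
    print(shlex.join(cmd))
    # To run: subprocess.run(cmd, check=True, capture_output=True, text=True)
```

Note that `pident` is reported over each local alignment (HSP), so a full-length identity figure may also need to account for alignment length.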
In some aspects, the disclosure provides a nucleic acid comprising an engineered nucleic acid sequence optimized for expression in an organism, wherein the nucleic acid encodes a class 2, type II endonuclease coupled to a base editor, and wherein the endonuclease is derived from an uncultured microorganism.
In some aspects, the disclosure provides a nucleic acid comprising an engineered nucleic acid sequence optimized for expression in an organism, wherein the nucleic acid encodes an endonuclease coupled to a base editor, the endonuclease having at least 70%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs 70-78 or 597, or a variant thereof. In some embodiments, the nucleic acid comprises a sequence encoding one or more Nuclear Localization Sequences (NLS) near the N-terminus or C-terminus of the endonuclease. In some embodiments, the organism is a prokaryote, bacterium, eukaryote, fungus, plant, mammal, rodent, or human.
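Optimizing an encoding sequence for expression is often approached by back-translating the protein with codons favored in the host organism. The Python sketch below shows only the mechanics: it uses one fixed codon per amino acid, which are valid standard-genetic-code codons but NOT a measured codon-usage table for any organism, and it ignores other factors (GC content, repeats, restriction sites, mRNA structure) that practical optimization tools weigh. The table and function names are hypothetical.

```python
# Illustrative choice of ONE codon per amino acid (standard genetic code);
# not a real codon-usage table for any organism.
PREFERRED_CODON = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
    "*": "TGA",
}


def codon_optimize(protein: str, add_stop: bool = True) -> str:
    """Back-translate a protein (one-letter code) using PREFERRED_CODON."""
    protein = protein.upper()
    unknown = set(protein) - set(PREFERRED_CODON)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    dna = "".join(PREFERRED_CODON[aa] for aa in protein)
    if add_stop and not protein.endswith("*"):
        dna += PREFERRED_CODON["*"]
    return dna


def translate(dna: str) -> str:
    """Translate using the inverse of PREFERRED_CODON (round-trip check only)."""
    inverse = {codon: aa for aa, codon in PREFERRED_CODON.items()}
    return "".join(inverse[dna[i:i + 3]] for i in range(0, len(dna) - len(dna) % 3, 3))


if __name__ == "__main__":
    print(codon_optimize("MKV"))                     # ATGAAGGTGTGA
    print(translate(codon_optimize("PKKKRKV")))      # PKKKRKV*
```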
In some aspects, the present disclosure provides a vector comprising a nucleic acid sequence encoding a class 2, type II endonuclease coupled to a base editor, wherein the endonuclease is derived from an uncultured microorganism. In some embodiments, the vector comprises a nucleic acid described herein. In some embodiments, the vector further comprises a nucleic acid encoding an engineered guide ribonucleic acid structure configured to form a complex with the endonuclease, the engineered guide ribonucleic acid structure comprising: a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and a tracr ribonucleic acid sequence configured to bind to the endonuclease. In some embodiments, the vector is a plasmid, a minicircle, a CELiD, an adeno-associated virus (AAV)-derived virion, or a lentivirus. In some aspects, the present disclosure provides a cell comprising a vector described herein. In some aspects, the present disclosure provides a method of preparing an endonuclease, comprising culturing a cell described herein.
In some aspects, the present disclosure provides a method for modifying a double-stranded deoxyribonucleic acid polynucleotide, the method comprising contacting the double-stranded deoxyribonucleic acid polynucleotide with a complex comprising: an endonuclease comprising a RuvC domain and an HNH domain, wherein the endonuclease is derived from an uncultured microorganism, wherein the endonuclease is a class 2, type II endonuclease, and wherein the RuvC domain lacks nuclease activity; a base editor coupled to the endonuclease; and an engineered guide ribonucleic acid structure configured to bind to the endonuclease and the double-stranded deoxyribonucleic acid polynucleotide; wherein the double-stranded deoxyribonucleic acid polynucleotide comprises a Protospacer Adjacent Motif (PAM).
In some embodiments, the endonuclease comprising the RuvC domain and the HNH domain is covalently coupled to the base editor, either directly or through a linker. In some embodiments, the endonuclease comprising the RuvC domain and the HNH domain comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs 70-78 or 597, or a variant thereof.
In some aspects, the present disclosure provides a method for modifying a double-stranded deoxyribonucleic acid polynucleotide, the method comprising contacting the double-stranded deoxyribonucleic acid polynucleotide with a complex comprising: a class 2, type II endonuclease; a base editor coupled to the endonuclease; and an engineered guide ribonucleic acid structure configured to bind to the endonuclease and the double-stranded deoxyribonucleic acid polynucleotide; wherein the double-stranded deoxyribonucleic acid polynucleotide comprises a Protospacer Adjacent Motif (PAM); and wherein the PAM comprises a sequence selected from the group consisting of SEQ ID NOs 360-368 or 598, or a variant thereof.
In some embodiments, the class 2, type II endonuclease is covalently coupled to the base editor, either directly or through a linker. In some embodiments, the base editor comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to a sequence selected from the group consisting of SEQ ID NOs 1-51, 57-66, 385-443, 444-475, 594-595, or 599-675, or a variant thereof. In some embodiments, the base editor comprises an adenine deaminase; the double-stranded deoxyribonucleic acid polynucleotide comprises adenine; and modifying the double-stranded deoxyribonucleic acid polynucleotide comprises converting the adenine to guanine. In some embodiments, the adenine deaminase comprises a sequence having at least 95% identity to SEQ ID NO: 57 or a variant thereof.
In some embodiments, the base editor comprises a cytosine deaminase; the double-stranded deoxyribonucleic acid polynucleotide comprises cytosine; and modifying the double-stranded deoxyribonucleic acid polynucleotide comprises converting the cytosine to uracil. In some embodiments, the cytosine deaminase comprises a sequence having at least 95% identity to SEQ ID NO: 58 or a variant thereof. In some embodiments, the cytosine deaminase comprises a sequence having at least 95% identity to any one of SEQ ID NOs 59-66 or a variant thereof.
In some embodiments, the complex further comprises a uracil DNA glycosylase inhibitor. In some embodiments, the uracil DNA glycosylase inhibitor comprises a sequence having at least 70%, 80%, 90%, or 95% identity to any one of SEQ ID NOs 52-56 or SEQ ID NO 67, or a variant thereof. In some embodiments, the double-stranded deoxyribonucleic acid polynucleotide comprises a first strand comprising a sequence complementary to a sequence of the engineered guide ribonucleic acid structure, and a second strand comprising the PAM. In some embodiments, the PAM is immediately adjacent to the 3' end of the sequence complementary to the sequence of the engineered guide ribonucleic acid structure.
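The strand arrangement described above (the guide-complementary sequence on one strand and the PAM on the other) can be sketched as a scan of both strands of a DNA sequence. The Python example assumes the conventional 3' PAM layout, in which the PAM sits immediately 3' of the protospacer on the PAM-bearing strand; the PAM is given as an IUPAC pattern (`NGG` is used purely as a familiar example), hits are reported in top-strand coordinates, and the function names are hypothetical.

```python
from typing import List, Tuple

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def pam_ok(site: str, pattern: str) -> bool:
    """True if `site` matches the IUPAC `pattern` position by position."""
    return len(site) == len(pattern) and all(b in IUPAC[c] for b, c in zip(site, pattern))


def find_sites(dna: str, spacer: str, pam_pattern: str) -> List[Tuple[str, int]]:
    """Return (strand, top-strand start) for each protospacer followed 3' by the PAM."""
    spacer = spacer.upper().replace("U", "T")
    pam_pattern = pam_pattern.upper()
    n, k, p = len(dna), len(spacer), len(pam_pattern)
    hits = []
    for strand, seq in (("+", dna.upper()), ("-", revcomp(dna))):
        for i in range(n - k - p + 1):
            if seq[i:i + k] == spacer and pam_ok(seq[i + k:i + k + p], pam_pattern):
                # Minus-strand index i maps to top-strand positions [n - i - k, n - i).
                hits.append((strand, i if strand == "+" else n - i - k))
    return hits


if __name__ == "__main__":
    spacer = "GACGCATAAAGATGAGACGC"  # hypothetical target
    print(find_sites("TT" + spacer + "AGG" + "CC", spacer, "NGG"))           # [('+', 2)]
    print(find_sites("AA" + revcomp(spacer + "TGG") + "AA", spacer, "NGG"))  # [('-', 5)]
```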
In some embodiments, the class 2, type II endonuclease is not a Cas9 endonuclease, a Cas14 endonuclease, a Cas12a endonuclease, a Cas12b endonuclease, a Cas12c endonuclease, a Cas12d endonuclease, a Cas12e endonuclease, a Cas13a endonuclease, a Cas13b endonuclease, a Cas13c endonuclease, or a Cas13d endonuclease. In some embodiments, the class 2, type II endonuclease is derived from an uncultured microorganism. In some embodiments, the double-stranded deoxyribonucleic acid polynucleotide is a eukaryotic, plant, fungal, mammalian, rodent, or human double-stranded deoxyribonucleic acid polynucleotide.
In some aspects, the disclosure provides a method of modifying a target nucleic acid locus, the method comprising delivering an engineered nucleic acid editing system described herein to the target nucleic acid locus, wherein the endonuclease is configured to form a complex with the engineered guide ribonucleic acid structure, and wherein the complex is configured such that upon binding of the complex to the target nucleic acid locus, the complex modifies a nucleotide of the target nucleic acid locus.
In some embodiments, the engineered nucleic acid editing system comprises an adenine deaminase, the nucleotide is adenine, and modifying the target nucleic acid locus comprises converting the adenine to guanine. In some embodiments, the engineered nucleic acid editing system comprises a cytidine deaminase and a uracil DNA glycosylase inhibitor, the nucleotide is cytosine, and modifying the target nucleic acid locus comprises converting the cytosine to uracil. In some embodiments, the target nucleic acid locus comprises genomic DNA, viral DNA, or bacterial DNA. In some embodiments, the target nucleic acid locus is in vitro. In some embodiments, the target nucleic acid locus is within a cell. In some embodiments, the cell is a prokaryotic cell, a bacterial cell, a eukaryotic cell, a fungal cell, a plant cell, an animal cell, a mammalian cell, a rodent cell, a primate cell, or a human cell. In some embodiments, the cell is in an animal.
In some embodiments, the cell is within the cochlea. In some embodiments, the cell is within an embryo. In some embodiments, the embryo is a two-cell embryo. In some embodiments, the embryo is a mouse embryo. In some embodiments, delivering the engineered nucleic acid editing system to the target nucleic acid locus comprises delivering a nucleic acid described herein or a vector described herein. In some embodiments, delivering the engineered nucleic acid editing system to the target nucleic acid locus comprises delivering a nucleic acid comprising an open reading frame encoding the endonuclease.
In some embodiments, the nucleic acid comprises a promoter operably linked to the open reading frame encoding the endonuclease. In some embodiments, delivering the engineered nucleic acid editing system to the target nucleic acid locus comprises delivering a capped mRNA containing the open reading frame encoding the endonuclease. In some embodiments, delivering the engineered nucleic acid editing system to the target nucleic acid locus comprises delivering a translated polypeptide. In some embodiments, delivering the engineered nucleic acid editing system to the target nucleic acid locus comprises delivering deoxyribonucleic acid (DNA) encoding the engineered guide ribonucleic acid structure operably linked to a ribonucleic acid (RNA) pol III promoter.
In some aspects, the present disclosure provides an engineered nucleic acid editing polypeptide comprising: an endonuclease comprising a RuvC domain and an HNH domain, wherein the endonuclease is derived from an uncultured microorganism, wherein the endonuclease is a class 2, type II endonuclease, and wherein the endonuclease is configured to lack nuclease activity. In some embodiments, the endonuclease comprises a sequence that has at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to any one of SEQ ID NOs 70-78 or 597, or a variant thereof.
In some aspects, the present disclosure provides an engineered nucleic acid editing polypeptide comprising: an endonuclease having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to any one of SEQ ID NOs 70-78 or 597, or a variant thereof, wherein the endonuclease is configured to lack nuclease activity; and a base editor coupled to the endonuclease.
In some aspects, the present disclosure provides an engineered nucleic acid editing polypeptide comprising: an endonuclease configured to bind to a protospacer adjacent motif (PAM) sequence comprising any one of SEQ ID NOs 360-368 or 598, wherein the endonuclease is a class 2, type II endonuclease, and wherein the endonuclease is configured to lack nuclease activity; and a base editor coupled to the endonuclease.
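As a rough illustration of PAM-dependent targeting, the sketch below scans the forward strand of a DNA sequence for a PAM written in IUPAC code and returns the protospacer lying immediately 5' of each match. It is a minimal sketch under stated assumptions: the actual PAM sequences (SEQ ID NOs 360-368 or 598) are not reproduced in this section, so the "NGG" pattern, the spacer length and the 3'-PAM orientation used below are hypothetical placeholders, and the reverse strand is not searched.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def pam_regex(pam: str) -> str:
    """Convert an IUPAC PAM string (e.g. 'NGG') into a regex pattern."""
    return "".join(IUPAC[base] for base in pam.upper())

def find_protospacers(seq: str, pam: str, spacer_len: int = 20) -> list:
    """Return (start, protospacer, pam_match) for each forward-strand PAM.

    Assumes (hypothetically) a layout in which the protospacer sits directly
    5' of the PAM. The lookahead reports overlapping PAM occurrences too.
    """
    seq = seq.upper()
    hits = []
    for m in re.finditer(f"(?=({pam_regex(pam)}))", seq):
        pam_start = m.start()
        start = pam_start - spacer_len
        if start >= 0:  # skip PAMs too close to the 5' end for a full spacer
            hits.append((start, seq[start:pam_start], m.group(1)))
    return hits

# Hypothetical toy target with a 10-nt spacer for brevity
print(find_protospacers("ACGTACGTACTGG", "NGG", spacer_len=10))
```

Searching both strands would mean repeating the scan on the reverse complement and converting coordinates back.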
In some embodiments, the endonuclease is derived from an uncultured microorganism. In some embodiments, the endonuclease has less than 80% identity to a Cas9 endonuclease. In some embodiments, the endonuclease further comprises an HNH domain. In some embodiments, the ribonucleic acid sequence is configured to bind to the endonuclease and comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to a contiguous nucleotide sequence of any one of SEQ ID NOs 88-96, 488-489 or 679-680. In some embodiments, the ribonucleic acid sequence is configured to bind to an endonuclease comprising a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity. In some embodiments, the base editor comprises a sequence having at least 70%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to any one of SEQ ID NOs 70-78 or 597, or variants thereof.
In some embodiments, the base editor is an adenine deaminase. In some embodiments, the adenosine deaminase comprises a sequence with at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to any of SEQ ID NOs 50-51, 57, 385-443, 448-475 or 595, or variants thereof. In some embodiments, the base editor is a cytosine deaminase. In some embodiments, the cytosine deaminase comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity with any one of SEQ ID NOs 1-49, 444-447, 594, or 58-66, or variants thereof.
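The percent-identity thresholds recited above presuppose an alignment between two sequences. The sketch below computes percent identity from a pair of already-aligned sequences (gaps written as "-"). The convention used here, identical positions divided by alignment columns that are not gaps in both sequences, is an assumption for illustration only; this section does not specify the alignment algorithm or identity definition behind the recited percentages, and the sequences shown are made up.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical positions / columns not gapped in both sequences, as a percentage."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # a column that is a gap in both sequences carries no information
        columns += 1
        if a == b:
            matches += 1
    return 100.0 * matches / columns if columns else 0.0

def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """True if the aligned pair reaches the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold

# Hypothetical aligned fragments
print(percent_identity("MKTAYIAK", "MKTAYLAK"))  # 87.5
print(meets_threshold("MKT-A", "MKTGA"))         # True (4 of 5 columns identical)
```

Other conventions (for example, dividing by the length of the shorter or the reference sequence) can give different values for gapped alignments.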
The systems of the present disclosure can be used in a variety of applications, such as nucleic acid editing (e.g., gene editing) and binding to nucleic acid molecules (e.g., sequence-specific binding). Such systems can be used, for example, to address (e.g., remove or replace) genetic mutations that may cause disease in a subject; to inactivate genes in order to determine their function in cells; as diagnostic tools for detecting pathogenic genetic elements (e.g., by cleaving retroviral RNA or amplified DNA sequences encoding pathogenic mutations); as inactivated enzymes combined with probes to target and detect specific nucleotide sequences (e.g., sequences encoding bacterial antibiotic resistance); to inactivate viruses by targeting viral genomes or disabling their ability to infect host cells; to engineer organisms to produce valuable small molecules, macromolecules or secondary metabolites; to create gene drive elements for evolutionary selection; and as biosensors to detect cellular perturbations caused by foreign small molecules and nucleotides.
Table 2: Sequence listing of protein and nucleic acid sequences mentioned herein
Examples
Example 1 - Plasmid construction of base editors
To generate base editing enzymes whose editing is targeted by CRISPR function, effector enzymes were fused in various configurations to the exemplary deaminases described herein. The first stage of this process was the construction of vectors suitable for producing the fusion enzymes, beginning with two entry plasmid vectors, MGA and MGC.
To construct the MGA (Metagenomi adenine base editor) entry plasmid containing T7 promoter-His tag-TadA* (ABE8.17m)-SV40 NLS, three DNA fragments were amplified from pAL6. To construct the MGC (Metagenomi cytosine base editor) entry plasmid containing T7 promoter-His tag-APOBEC1 (BE3)-UGI-SV40 NLS, APOBEC1 and UGI-SV40 NLS were amplified from pAL9, and two vector backbone fragments were amplified from pAL6 (see Fig. 3).
To introduce mutations into the effectors, source plasmids containing the MG1-4, MG1-6, MG3-7, MG3-8, MG4-5, MG14-1, MG15-1 or MG18-1 effector gene sequences were amplified with Q5 DNA polymerase using forward and reverse primers incorporating the appropriate mutations. The linear DNA fragments were then phosphorylated and ligated, and the template DNA was digested with DpnI, using the KLD Enzyme Mix (New England Biolabs) according to the manufacturer's instructions.
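The nickase designations used in this example (e.g., D9A) follow standard notation: the aspartate at residue 9 is replaced by alanine. The sketch below shows the intended sequence change at the DNA level by swapping one codon in an open reading frame after checking that it encodes the expected wild-type residue. It is an illustration under assumptions: the ORF, residue number and replacement codon "GCT" are hypothetical, and in the workflow described above the change is carried by the mutagenic primers rather than applied in software.

```python
# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(orf: str) -> str:
    """Translate an in-frame ORF; a trailing partial codon is ignored."""
    orf = orf.upper()
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf) - 2, 3))

def point_mutate(orf: str, residue: int, wt_aa: str, new_codon: str) -> str:
    """Replace the codon of 1-based `residue`, verifying the wild-type amino acid first."""
    start = 3 * (residue - 1)
    codon = orf[start:start + 3].upper()
    if CODON_TABLE.get(codon) != wt_aa:
        raise ValueError(f"residue {residue} is {CODON_TABLE.get(codon)}, expected {wt_aa}")
    return orf[:start] + new_codon + orf[start + 3:]

# Hypothetical 4-codon ORF (Met-Lys-Asp-Trp) and a "D3A" substitution
wt = "ATGAAAGATTGG"
mutant = point_mutate(wt, 3, "D", "GCT")
print(translate(wt), "->", translate(mutant))  # MKDW -> MKAW
```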
To generate the pMGA and pMGC expression plasmids, the effector genes were amplified from the plasmids carrying the mutant effectors and cloned into MGA and MGC, respectively, via the XhoI and SacII sites. To clone an sgRNA expression cassette comprising T7 promoter-sgRNA-bidirectional terminator into the BE expression plasmids, one primer set (with P366 as the forward primer) was used to amplify the T7 promoter-spacer sequence, and another primer set (with P367 as the reverse primer) was used to amplify the spacer sequence-sgRNA scaffold-bidirectional terminator, with the pTCM plasmid as template (see Fig. 2). The two fragments were assembled into pMGA and pMGC via the XbaI site to give pMGA-sgRNA and pMGC-sgRNA, respectively.
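Cloning through the XhoI/SacII and XbaI sites assumes each site occurs where intended and not elsewhere in the insert or vector. The sketch below locates these sites using their standard recognition sequences (XhoI C^TCGAG, SacII CCGC^GG, XbaI T^CTAGA); all three are palindromic, so scanning one strand finds every occurrence. The test sequence is hypothetical; the pMGA and pMGC sequences are not given in this section.

```python
# Recognition sequence and top-strand cut offset (bases 5' of the cut) for each enzyme
ENZYMES = {
    "XhoI": ("CTCGAG", 1),   # C^TCGAG
    "SacII": ("CCGCGG", 4),  # CCGC^GG
    "XbaI": ("TCTAGA", 1),   # T^CTAGA
}

def find_cut_sites(seq: str, enzymes: dict = ENZYMES) -> dict:
    """Return 0-based top-strand cut positions per enzyme (overlapping matches included)."""
    seq = seq.upper()
    result = {}
    for name, (site, offset) in enzymes.items():
        cuts = []
        i = seq.find(site)
        while i != -1:
            cuts.append(i + offset)
            i = seq.find(site, i + 1)
        result[name] = cuts
    return result

def unique_sites(seq: str, enzymes: dict = ENZYMES) -> list:
    """Enzymes that cut exactly once in `seq`, i.e. candidates for directional cloning."""
    return [name for name, cuts in find_cut_sites(seq, enzymes).items() if len(cuts) == 1]

# Hypothetical sequence containing one XhoI and one SacII site
construct = "AAACTCGAGTTTCCGCGGAAA"
print(find_cut_sites(construct))  # {'XhoI': [4], 'SacII': [16], 'XbaI': []}
```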
Table 3: Summary of constructs for the ABE screening systems described herein
All amplified DNA fragments were purified using the QIAquick Gel Extraction Kit (Qiagen) and assembled by NEBuilder HiFi DNA Assembly (New England Biolabs), and the resulting assemblies were propagated in Endura electrocompetent cells (Lucigen) according to the manufacturers' instructions (see Figs. 4 and 5). The DNA sequences of all cloned genes were confirmed by Elim Biopharm.
Table 4: Conserved catalytic residues mutated in selected systems described herein
Nickase candidate   Length (aa)   Related full-length protein sequence
nMG1-4(D9A) 1025 SEQ ID NO:70
nMG1-6(D13A) 1059 SEQ ID NO:71
nMG3-6(D13A) 1134 SEQ ID NO:72
nMG3-7(D12A) 1131 SEQ ID NO:73
nMG3-8(D13A) 1132 SEQ ID NO:74
nMG4-5(D17A) 1055 SEQ ID NO:75
nMG14-1(D23A) 1003 SEQ ID NO:76
nMG15-1(D8A) 1082 SEQ ID NO:77
nMG18-1(D12A) 1348 SEQ ID NO:78
Example 2 - Protein expression and purification
The mutant effector genes, driven by the T7 promoter in the pMGA and pMGC plasmids, were expressed in E. coli BL21(DE3) cells transformed with each of the corresponding plasmids described in Example 1 and grown in MagicMedia (Thermo Fisher Scientific) according to the manufacturer's instructions. After incubation at 16 °C for 40 hours, the cells were harvested, resuspended in lysis buffer (HisTrap equilibration buffer: 20 mM Tris (Sigma T2319-100ML), 300 mM sodium chloride (VWR VWRVE529-500ML), 5% glycerol, 10 mM MgCl2 and 10 mM imidazole (Sigma 68268-100ML-F), pH 7.5) containing EDTA-free protease inhibitor (Pierce), and frozen at -80 °C. The cells were then thawed on ice, sonicated, clarified and filtered prior to affinity purification. Proteins were applied to a Cytiva HisTrap FF column on an ÄKTA avant FPLC according to the manufacturer's instructions and eluted isocratically in 20 mM Tris (Sigma T2319-100ML), 300 mM sodium chloride (VWR VWRVE529-500ML), 5% glycerol, 10 mM MgCl2 and 250 mM imidazole (Sigma 68268-100ML-F), pH 7.5. Eluted fractions containing the His-tagged effector protein were concentrated and buffer-exchanged into 50 mM Tris-HCl, 300 mM NaCl, 1 mM TCEP, 5% glycerol, pH 7.5. Protein concentration was determined by bicinchoninic acid (BCA) assay (Thermo Fisher Scientific) and adjusted for relative purity as determined by SDS-PAGE densitometry in Image Lab (Bio-Rad) (see Fig. 7).
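The buffer compositions above can be prepared from concentrated stocks with the dilution relation C1V1 = C2V2. The sketch below computes stock volumes for 100 mL of the HisTrap lysis/equilibration buffer; the stock concentrations (1 M Tris-HCl pH 7.5, 5 M NaCl, 100% glycerol, 1 M MgCl2, 2 M imidazole) are assumptions chosen for illustration, not values stated in this example, and the pH should still be checked after mixing.

```python
def stock_volumes(final_volume_ml: float, components: dict) -> dict:
    """components maps name -> (stock_conc, final_conc) in matching units.

    Uses C1*V1 = C2*V2, so V1 = V2 * C2 / C1; water makes up the remainder.
    """
    volumes = {name: final_volume_ml * final / stock for name, (stock, final) in components.items()}
    volumes["water"] = final_volume_ml - sum(volumes.values())
    return volumes

# Hypothetical stock concentrations (mM, or % v/v for glycerol) and the stated final concentrations
LYSIS_BUFFER = {
    "Tris-HCl pH 7.5": (1000, 20),
    "NaCl": (5000, 300),
    "glycerol": (100, 5),
    "MgCl2": (1000, 10),
    "imidazole": (2000, 10),
}

for name, ml in stock_volumes(100, LYSIS_BUFFER).items():
    print(f"{name}: {ml:.2f} mL")
```

The elution buffer differs only in its imidazole concentration (250 mM), which with the same assumed 2 M stock corresponds to 12.5 mL per 100 mL.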
Example 3 - In vitro nickase assay
A linear LacZ fragment containing the effector target sequence was amplified with Q5 DNA polymerase using primers P141 and P146 (SEQ ID NOs: 179 and 180), synthesized by IDT and labeled with 6-carboxyfluorescein (6-FAM). sgRNAs containing a 20-nt or 22-nt spacer sequence were transcribed in vitro from DNA templates containing the T7 promoter using the HiScribe T7 High Yield RNA Synthesis Kit (New England Biolabs) according to the manufacturer's instructions. The synthesized sgRNAs, whose sequences correspond to the named sgRNAs in the sequence listing, were purified using the Monarch RNA Cleanup Kit (New England Biolabs) according to the user manual, and their concentrations were measured by NanoDrop.
To assay DNA nickase activity, each purified mutant effector was first supplemented with its cognate sgRNA. Reactions were initiated by adding the linear DNA substrate to a 15 µL reaction mixture containing 10 mM Tris pH 7.5, 10 mM MgCl2, 100 mM NaCl, 150 nM enzyme, 150 nM RNA and 15 nM DNA, and were incubated at 37 °C for 2 hours. Digested DNA was purified using AMPure XP SPRI paramagnetic beads (Beckman Coulter) and eluted in 6 µL TE buffer (10 mM Tris, 1 mM EDTA; pH 8.0). Nicked DNA was resolved on a 10% TBE-urea denaturing gel (Bio-Rad) and imaged on a ChemiDoc (Bio-Rad) (see Fig. 7, which shows that the depicted enzymes display nickase activity, producing bands of 600 and 200 bases, compared with bands of 400 and 200 bases for the wild-type enzymes). The results indicate that all of the nickase mutants tested in Fig. 7, except MG4-5 (D17A), whose activity was inconclusive, showed the expected nickase activity rather than wild-type cleavage activity.
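The band pattern described above follows from the labeling scheme: both amplification primers carry a 5' 6-FAM label, so a denaturing gel reports only the 5'-labeled piece of each strand. The sketch below predicts the labeled species for a full double-strand cut versus a single-strand nick. The 600-nt substrate and a break 200 nt from one labeled end are assumptions chosen because they reproduce the 600/200 versus 400/200 pattern; the actual substrate length and cut site are not stated here, and a blunt break is assumed.

```python
def labeled_bands(substrate_len: int, cut_pos: int, mode: str) -> list:
    """Sizes (nt) of 5'-labeled single strands expected on a denaturing gel.

    Assumes a 5' label on both strands and a blunt break `cut_pos` nt from the
    top-strand 5' end. mode: 'cleave' (both strands cut), 'nick_top' or 'nick_bottom'.
    """
    top_piece = cut_pos                     # labeled top-strand fragment if the top strand is cut
    bottom_piece = substrate_len - cut_pos  # labeled bottom-strand fragment if the bottom strand is cut
    if mode == "cleave":
        bands = [top_piece, bottom_piece]
    elif mode == "nick_top":
        bands = [top_piece, substrate_len]     # bottom strand stays full length
    elif mode == "nick_bottom":
        bands = [substrate_len, bottom_piece]  # top strand stays full length
    else:
        raise ValueError(f"unknown mode: {mode}")
    return sorted(bands)

# Hypothetical 600-nt substrate broken 200 nt from the top-strand label
print(labeled_bands(600, 200, "cleave"))    # [200, 400]
print(labeled_bands(600, 200, "nick_top"))  # [200, 600]
```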
Example 4 - Introduction of base editors into E. coli
Plasmids were transformed into Lucigen electrocompetent BL21(DE3) cells according to the manufacturer's instructions. After electroporation, cells were recovered in Expression Recovery Medium at 37 °C for 1 hour and plated on LB plates containing 100 µg/mL ampicillin and 0.1 mM IPTG. After overnight growth at 37 °C, colonies were picked, and the lacZ gene was amplified with Q5 DNA polymerase (New England Biolabs) using primers P137 and P360. The resulting PCR products were purified and Sanger sequenced by Elim Biopharm. Base editing was determined by examining the presence or absence of C-to-T or A-to-G conversions in the targeted protospacer region for the cytosine base editors or adenine base editors, respectively.
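Because each sequenced PCR product above comes from a single picked colony, editing can be scored by a position-by-position comparison of the protospacer read against the reference. The sketch below flags C-to-T positions for a CBE or A-to-G positions for an ABE. The sequences are hypothetical, positions are numbered from the 5' end of the protospacer by assumption, and mixed Sanger peaks are not handled.

```python
# Expected conversion for each editor class: (reference base, edited base)
CONVERSIONS = {"CBE": ("C", "T"), "ABE": ("A", "G")}

def detect_edits(reference: str, observed: str, editor: str) -> list:
    """1-based protospacer positions that show the editor's expected conversion."""
    if len(reference) != len(observed):
        raise ValueError("reference and observed protospacers must be the same length")
    ref_base, edited_base = CONVERSIONS[editor]
    return [
        i
        for i, (r, o) in enumerate(zip(reference.upper(), observed.upper()), start=1)
        if r == ref_base and o == edited_base
    ]

# Hypothetical 6-nt protospacer fragments
print(detect_edits("GACTCA", "GATTCA", "CBE"))  # [3]
print(detect_edits("GACTCA", "GGCTCA", "ABE"))  # [2]
```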
To evaluate editing efficiency in E. coli, plasmids were transformed into electrocompetent BL21(DE3) cells (Lucigen), and the electroporated cells were recovered in Expression Recovery Medium at 37 °C for 1 hour. Then 10 µL of the recovered cells were inoculated into 990 µL SOB containing 100 µg/mL ampicillin and 0.1 mM IPTG in a 96-well deep-well plate and grown at 37 °C for 20 hours. One microliter of cells expressing the induced base editor was used as template to amplify the lacZ gene in a 20 µL PCR reaction (Q5 DNA polymerase) with primers P137 and P360. The resulting PCR products were purified and Sanger sequenced by Elim Biopharm. Editing efficiency was quantified using EditR as described in Example 12.
Table 5: MG base editors with associated PAMs and deaminases described herein
Example 5 - Protein nucleofection and amplicon sequencing in mammalian cells (prophetic)
Nucleofection was performed in mammalian cells (e.g., K-562, Neuro-2a or RAW 264.7) using the Lonza 4D-Nucleofector and the Lonza SF Cell Line 4D-Nucleofector X Kit S (catalog number V4XC-2032) according to the manufacturer's recommendations. After preparation of the SF nucleofection buffer, 200,000 cells were resuspended in 5 µl of buffer per nucleofection. In the remaining 15 µl of buffer per nucleofection, 20 pmol of chemically modified sgRNA (Synthego) was combined with 18 pmol of base editor enzyme (e.g., ABE8e) and incubated for 5 minutes at room temperature to form a complex. Cells were added to a 20 µl nucleofection cuvette, followed by the protein solution, and the mixture was pipetted up and down to mix. Cells were nucleofected using program CM-130, and 80 µl of warmed medium was immediately added to each well for recovery. After 5 minutes, 25 µl from each sample was added to 250 µl of fresh medium in a 48-well poly-D-lysine plate (Corning). After three additional days of culture, the cells were processed for genomic DNA extraction in the same manner as the lipofected cells above.
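Setting up the 20 pmol sgRNA to 18 pmol base editor complex is a matter of unit conversion. Because 1 µM equals 1 pmol/µL, the volume of a stock needed for a given amount is pmol divided by µM; and because pmol multiplied by g/mol gives pg, the mass in µg is pmol times molecular weight divided by 10^6. The stock concentrations and the 200 kDa molecular weight below are hypothetical placeholders, not values from this example.

```python
def volume_ul(pmol: float, stock_um: float) -> float:
    """Volume (µL) of a stock at `stock_um` µM that contains `pmol` picomoles (1 µM = 1 pmol/µL)."""
    return pmol / stock_um

def pmol_to_ug(pmol: float, mw_da: float) -> float:
    """Mass in µg: pmol × g/mol gives pg, and 1 µg = 10^6 pg."""
    return pmol * mw_da / 1e6

# Hypothetical stocks: 100 µM sgRNA and a 10 µM base editor of ~200 kDa
sgrna_ul = volume_ul(20, 100)
editor_ul = volume_ul(18, 10)
editor_ug = pmol_to_ug(18, 200_000)
print(f"sgRNA {sgrna_ul} µL, editor {editor_ul} µL ({editor_ug:.1f} µg); sgRNA:protein = {20 / 18:.2f}")
```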
After Illumina barcoding, PCR products were pooled and purified by electrophoresis on a 2% agarose gel using the Monarch DNA Gel Extraction Kit (New England Biolabs), eluting in 30 µl H2O. The DNA concentration was quantified using the Qubit dsDNA High Sensitivity Assay Kit (Thermo Fisher Scientific), and the libraries were sequenced on an Illumina MiSeq instrument (single-end reads; R1: 250-280 cycles, R2: 0 cycles) according to the manufacturer's protocol.
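Pooling barcoded amplicons in roughly equimolar amounts is commonly done by converting mass concentration to molarity with the usual approximation of 660 g/mol per base pair of double-stranded DNA, and noting that 1 nM equals 1 fmol/µL. The sketch below shows that conversion; the amplicon concentrations, lengths and the 500 fmol per-sample target are hypothetical, and this section does not state how pooling ratios were chosen.

```python
def dsdna_nm(ng_per_ul: float, length_bp: int, g_per_mol_per_bp: float = 660.0) -> float:
    """Molar concentration (nM) of dsDNA from ng/µL and length in bp."""
    return ng_per_ul * 1e6 / (g_per_mol_per_bp * length_bp)

def equimolar_volumes(samples: dict, fmol_each: float) -> dict:
    """µL of each sample that delivers `fmol_each` femtomoles (1 nM = 1 fmol/µL)."""
    return {name: fmol_each / dsdna_nm(conc, bp) for name, (conc, bp) in samples.items()}

# Hypothetical barcoded amplicons: name -> (ng/µL, length in bp)
amplicons = {"site_A": (6.6, 100), "site_B": (13.2, 200)}
print(equimolar_volumes(amplicons, fmol_each=500))
```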
Sequencing reads were demultiplexed using MiSeq Reporter (Illumina), and FASTQ files were analyzed using CRISPResso. Double editing within individual alleles was analyzed with a Python script. Base editing values represent n = 3 independent biological replicates collected by different researchers, shown as mean ± s.d. Base editing values are reported as the percentage of reads with adenine mutagenesis relative to total aligned reads.
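The reported metric, the percentage of aligned reads carrying the adenine edit summarized as mean ± s.d. over n = 3 biological replicates, can be computed as in the sketch below. The read counts are invented for illustration, and the sample (n - 1) standard deviation is assumed because the text does not say which form was used.

```python
from statistics import mean, stdev

def percent_edited(edited_reads: int, aligned_reads: int) -> float:
    """Percentage of aligned reads carrying the edit of interest."""
    return 100.0 * edited_reads / aligned_reads

def summarize(replicates: list) -> tuple:
    """replicates: list of (edited_reads, aligned_reads); returns (mean, sample s.d.) in percent."""
    values = [percent_edited(e, t) for e, t in replicates]
    return mean(values), stdev(values)

# Hypothetical counts for three biological replicates
m, sd = summarize([(5_000, 10_000), (6_000, 10_000), (7_000, 10_000)])
print(f"{m:.1f} ± {sd:.1f} %")
```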
Example 6 - Plasmid nucleofection and whole-genome sequencing in mammalian cells (prophetic)
All plasmids were assembled by uracil-specific excision reagent (USER) cloning. Guide RNA plasmids for SpCas9, SaCas9 and all engineered variants were assembled. Plasmids for mammalian cell transfection were prepared using the ZymoPURE Plasmid Midiprep Kit (Zymo Research Corporation). HEK293T cells (ATCC CRL-3216) were cultured in Dulbecco's Modified Eagle's Medium (Corning) supplemented with 10% fetal bovine serum (Thermo Fisher Scientific) and maintained at 37 °C with 5% CO2.
HEK293T cells were seeded on 48-well poly-D-lysine plates (Corning) in the same medium. At 12-16 hours after plating, cells were transfected with 750 ng of base editor plasmid, 250 ng of guide RNA plasmid and 10 ng of green fluorescent protein (GFP) plasmid as a transfection control, using 1.5 µl Lipofectamine 2000 (Thermo Fisher Scientific). Cells were cultured for 3 days, with the medium exchanged after the first day, then with (Thermo Fisher Scientific), after which genomic DNA was extracted by adding 100 µl of freshly prepared lysis buffer (10 mM Tris-HCl, pH 7.5, 0.05% SDS, 25 µg/mL proteinase K (Thermo Fisher Scientific)) directly to each transfected well. The mixture was incubated at 37 °C for 1 hour and then heat-inactivated at 80 °C for 30 minutes. The genomic DNA lysate was then used immediately for high-throughput sequencing (HTS).
HTS of genomic DNA from HEK293T cells was performed as follows. After Illumina barcoding, PCR products were pooled and purified by electrophoresis on a 2% agarose gel using the Monarch DNA Gel Extraction Kit (NEB), eluting in 30 µl H2O. The DNA concentration was quantified using the Qubit dsDNA High Sensitivity Assay Kit (Thermo Fisher Scientific), and the libraries were sequenced on an Illumina MiSeq instrument (single-end reads; R1: 250-280 cycles, R2: 0 cycles) according to the manufacturer's protocol.
Example 7 - Determination of the editing window (prophetic)
To examine the editing window, the cytosine showing the highest C-to-T conversion frequency for a given sgRNA was normalized to 1, and the conversion frequencies of the other cytosines within the same sgRNA target region, spanning from 30 nt upstream to 10 nt downstream of the PAM sequence (43 bp in total), were normalized accordingly. The normalized C-to-T conversion frequencies for all tested sgRNAs of a given base editor were then grouped and compared by position. A composite editing window (CEW) was defined as the span of positions at which the average normalized C-to-T conversion efficiency exceeds 0.6.
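The normalization and window definition above can be written compactly. In the sketch below, each sgRNA contributes a map from position (relative to the PAM) to C-to-T frequency; frequencies are scaled so that the best-edited cytosine for that sgRNA equals 1, averaged per position across the sgRNAs that have a cytosine there, and positions whose average exceeds 0.6 form the composite editing window. The position convention (negative numbers upstream of the PAM) and the example data are assumptions.

```python
from collections import defaultdict
from statistics import mean

def normalize(freqs: dict) -> dict:
    """Scale one sgRNA's C-to-T frequencies so its best-edited cytosine equals 1."""
    top = max(freqs.values())
    return {pos: (f / top if top > 0 else 0.0) for pos, f in freqs.items()}

def composite_editing_window(per_guide: list, threshold: float = 0.6):
    """Return (positions whose mean normalized frequency exceeds threshold, per-position means)."""
    by_position = defaultdict(list)
    for freqs in per_guide:
        for pos, value in normalize(freqs).items():
            by_position[pos].append(value)
    averages = {pos: mean(values) for pos, values in by_position.items()}
    window = sorted(pos for pos, avg in averages.items() if avg > threshold)
    return window, averages

# Hypothetical C-to-T frequencies for two sgRNAs of one base editor
guides = [{-15: 0.2, -16: 0.4, -20: 0.05}, {-15: 0.3, -16: 0.3, -20: 0.1}]
window, averages = composite_editing_window(guides)
print(window)  # [-16, -15]
```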
To examine the substrate preference of each cytidine deaminase, the C sites were first classified according to their position in the sgRNA target region, and only positions containing at least one C site with a normalized C-to-T conversion frequency of ≥0.8 were included in the subsequent analysis. The selected C sites were then compared according to the identity of the base immediately upstream or downstream of the edited cytosine (NC or CN). For cytidine deaminases showing efficient C-to-T conversion when fused to either the N- or C-terminus of the endonuclease, substrate preference was assessed by pooling the data from the corresponding N-terminal (NT-CBE) and C-terminal (CT-CBE) fusions. For statistical analysis, one-way ANOVA was used, and p < 0.05 was considered significant.
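The site selection and NC/CN comparison can be sketched as below: positions are kept only if at least one cytosine there reaches a normalized frequency of 0.8, the kept sites are grouped by the identity of the flanking base, and a one-way ANOVA F statistic is computed across groups. This is an illustration on assumed data. Converting F to a p-value requires the F distribution; a library routine such as scipy.stats.f_oneway, which returns both the statistic and the p-value, could be used instead of the hand-written formula.

```python
from collections import defaultdict
from statistics import mean

def select_and_group(sites: list, cutoff: float = 0.8) -> dict:
    """sites: (position, flanking_base, normalized_freq) tuples; returns flanking_base -> [freqs]."""
    best = defaultdict(float)
    for pos, _, freq in sites:
        best[pos] = max(best[pos], freq)
    kept_positions = {pos for pos, top in best.items() if top >= cutoff}
    groups = defaultdict(list)
    for pos, base, freq in sites:
        if pos in kept_positions:
            groups[base].append(freq)
    return dict(groups)

def one_way_anova_f(groups: list) -> float:
    """F = between-group mean square / within-group mean square."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical sites: position 9 never reaches 0.8, so it is excluded
print(select_and_group([(5, "T", 0.9), (5, "G", 0.3), (9, "T", 0.5)]))  # {'T': [0.9], 'G': [0.3]}
print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))  # 13.5
```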
Example 8a - Whole-genome sequencing and transcriptome off-target analysis in mammalian cells (prophetic)
HEK293T cells were seeded on 48-well poly-D-lysine-coated plates at a density of 3 × 10^4 cells/well in antibiotic-free DMEM + GlutaMAX medium (Thermo Fisher Scientific) and cultured for 16 to 20 hours. For each well, 750 ng of nickase or base editor expression plasmid DNA was combined with 250 ng of sgRNA expression plasmid DNA in 15 µl of Opti-MEM + GlutaMAX. This was combined with 10 µl of lipid mixture per well, consisting of 1.5 µl Lipofectamine 2000 and 8.5 µl Opti-MEM + GlutaMAX. Cells were harvested 3 days after transfection for DNA or RNA isolation. For DNA analysis, cells were washed once in PBS and then lysed in 100 µl QuickExtract buffer (Lucigen) according to the manufacturer's instructions. For RNA harvest, the MagMAX mirVana Total RNA Isolation Kit (Thermo Fisher Scientific) was used with a KingFisher Flex.
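For a multi-well transfection, the per-well amounts above are usually scaled into master mixes. The sketch below multiplies the per-well quantities stated in the protocol by the number of wells plus an overage; the 10% overage and the 24-well example are assumptions, not part of the described protocol.

```python
# Per-well amounts taken from the protocol text
PER_WELL = {
    "editor plasmid (ng)": 750,
    "sgRNA plasmid (ng)": 250,
    "Opti-MEM + GlutaMAX, DNA mix (uL)": 15,
    "Lipofectamine 2000 (uL)": 1.5,
    "Opti-MEM + GlutaMAX, lipid mix (uL)": 8.5,
}

def master_mix(n_wells: int, overage: float = 0.10) -> dict:
    """Scale per-well amounts to n_wells, adding a fractional overage for pipetting loss."""
    factor = n_wells * (1 + overage)
    return {item: round(amount * factor, 2) for item, amount in PER_WELL.items()}

for item, amount in master_mix(24).items():
    print(item, amount)
```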
Genomic DNA from mammalian cells was fragmented and adaptor-ligated using the Nextera DNA Flex Library Prep Kit (Illumina) with Nextera index primers in 96-well plate format (Illumina) according to the manufacturer's instructions. Library size and concentration were confirmed on a Fragment Analyzer (Agilent), and the DNA was sent to Novogene for WGS on an Illumina HiSeq system.
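The sequencing depth needed for WGS-based off-target analysis is typically planned from the mean coverage relation C = N × L / G (reads times read length divided by genome size); under the Lander-Waterman model, the expected fraction of the genome covered at least once is 1 - e^(-C). The sketch below applies these relations; the read count, 150-nt read length and 3.1 Gb genome size are illustrative assumptions, not the parameters of the sequencing run described above.

```python
import math

def mean_coverage(n_reads: float, read_length: int, genome_size: float) -> float:
    """Mean fold-coverage C = N * L / G."""
    return n_reads * read_length / genome_size

def reads_for_coverage(target: float, read_length: int, genome_size: float) -> int:
    """Reads required to reach a target mean coverage (rounded up)."""
    return math.ceil(target * genome_size / read_length)

def fraction_covered(coverage: float) -> float:
    """Lander-Waterman expectation of the fraction of bases covered at least once."""
    return 1 - math.exp(-coverage)

c = mean_coverage(6.2e8, 150, 3.1e9)
print(f"{c:.1f}x mean coverage, {fraction_covered(c):.6f} of bases covered at least once")
```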
All targeted NGS data were analyzed in four general steps: (1) alignment; (2) duplicate marking; (3) variant calling; and (4) background filtering of variants to remove artifacts and germline mutations. The reference and alternate alleles of each variant are reported on the strand of the reference genome.
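Step (4) can be illustrated with a simple set-based filter: calls that also appear in a matched control sample without the editor are treated as germline or background variants and removed, as are calls with low depth or low variant allele fraction. The control-subtraction idea follows the text; the Variant record and the cutoffs (depth 10, allele fraction 0.1) are hypothetical choices for illustration.

```python
from typing import List, NamedTuple

class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str
    depth: int
    alt_reads: int

def background_filter(treated: List[Variant], control: List[Variant],
                      min_depth: int = 10, min_vaf: float = 0.1) -> List[Variant]:
    """Drop calls present in the control and calls with weak read support."""
    control_keys = {(v.chrom, v.pos, v.ref, v.alt) for v in control}
    kept = []
    for v in treated:
        if (v.chrom, v.pos, v.ref, v.alt) in control_keys:
            continue  # also present without the editor: germline or background
        if v.depth < min_depth or v.alt_reads / v.depth < min_vaf:
            continue  # too little evidence; likely an artifact
        kept.append(v)
    return kept

# Hypothetical calls
treated = [Variant("chr1", 100, "A", "G", 30, 10),
           Variant("chr1", 200, "C", "T", 30, 15),
           Variant("chr2", 50, "G", "A", 5, 3)]
control = [Variant("chr1", 200, "C", "T", 28, 14)]
print(background_filter(treated, control))
```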
For whole-transcriptome sequencing, mRNA selection was performed using the NEBNext Poly(A) mRNA Magnetic Isolation Module (New England Biolabs). RNA libraries were prepared using the NEBNext Ultra II RNA Library Prep Kit (New England Biolabs). Based on the RNA input, 12 cycles were used for PCR enrichment of adaptor-ligated DNA. NEBNext Sample Purification Beads (New England Biolabs) were used for all size selections in this method. NEBNext Multiplex Oligos (New England Biolabs) were used for multiplex indexing, following the PCR setup outlined in the protocol. Prior to sequencing, samples were quality-checked using High Sensitivity D1000 ScreenTape on a 4200 TapeStation system (Agilent). The libraries were pooled and sequenced on a NovaSeq (Novogene). Targeted RNA sequencing was then performed: complementary DNA was generated from the isolated RNA by reverse transcription PCR (RT-PCR) using the SuperScript IV One-Step RT-PCR System with ezDNase (Thermo Fisher Scientific) according to the manufacturer's instructions.
The following program was used: 58 °C for 12 minutes; 98 °C for 2 minutes; followed by PCR cycling that varied by amplicon: for CTNNB1 and IP90, 32 cycles of 98 °C for 10 seconds, 60 °C for 10 seconds and 72 °C for 30 seconds. After the one-step RT-PCR, the amplicons were barcoded and sequenced on an Illumina MiSeq as described above. The first 125 nucleotides of each amplicon (starting from the first base after the end of the forward primer) were aligned to the reference sequence and used to determine the maximum A-to-I frequency in each amplicon. Off-target DNA sequencing was performed by preparing samples with a two-step PCR and barcoding method and sequencing them on the Illumina MiSeq, as described above.
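Because reverse transcriptase reads inosine as guanosine, A-to-I RNA editing appears as A-to-G in the cDNA amplicons. The sketch below takes per-position base counts for an amplicon (starting after the forward primer), restricts the analysis to the first 125 positions, and returns the highest A-to-G fraction at reference-A positions. The input format is an assumption; in practice the counts would come from the alignment step described above.

```python
def max_a_to_i(reference: str, counts: list, window: int = 125) -> float:
    """Maximum fraction of G calls at reference-A positions within the first `window` bases.

    counts[i] maps base -> read count at amplicon position i (after the forward primer).
    """
    best = 0.0
    for ref_base, pos_counts in zip(reference[:window], counts[:window]):
        if ref_base.upper() != "A":
            continue
        total = sum(pos_counts.values())
        if total:
            best = max(best, pos_counts.get("G", 0) / total)
    return best

# Hypothetical 3-nt amplicon with base counts per position
counts = [{"A": 90, "G": 10}, {"A": 50, "G": 50}, {"C": 100}]
print(max_a_to_i("AAC", counts))  # 0.5
```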
Example 8b - Off-target editing by whole-genome sequencing and transcriptome analysis (prophetic)
Transfected cells prepared as in Example 8a were harvested after 3 days, and genomic DNA was isolated using the Agencourt DNAdvance Genomic DNA Isolation Kit (Beckman Coulter) according to the manufacturer's instructions. The on-target and off-target genomic regions of interest were amplified by PCR with flanking HTS primer pairs. PCR amplification was performed with Phusion High-Fidelity DNA Polymerase (Thermo Fisher Scientific) using 5 ng of genomic DNA as template according to the manufacturer's instructions. The number of cycles for each primer pair was determined separately to ensure that the reaction was stopped within the linear range of amplification (30, 28, 32 and 32 cycles for the EMX1, FANCF, HEK293 site 2, HEK293 site 3, HEK293 site 4 and RNF2 primers, respectively). PCR products were purified using RapidTips (Diffinity Genomics). The purified DNA was amplified by PCR with primers containing sequencing adaptors. The products were gel-purified and quantified using the Quant-iT PicoGreen dsDNA Assay Kit (Thermo Fisher Scientific) and the KAPA Library Quantification Kit for Illumina (KAPA Biosystems). Samples were sequenced on an Illumina MiSeq as previously described.
Sequencing reads were automatically demultiplexed using MiSeq Reporter (Illumina), and individual FASTQ files were analyzed with custom MATLAB scripts. Each read was aligned to the appropriate reference sequence using the Smith-Waterman algorithm. Base calls with a Q score below 31 were replaced with N and were therefore excluded when calculating nucleotide frequencies. This treatment yields an expected MiSeq base-calling error rate of approximately 1 in 1,000. Aligned sequences in which neither the read nor the reference sequence contained gaps were stored in an alignment table, from which base frequencies at each locus were tabulated. Indel frequencies were quantified with a custom MATLAB script.
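The quality filter above relies on the Phred scale, in which a score Q corresponds to an error probability of 10^(-Q/10); Q30 is 1 in 1,000 and Q31 about 1 in 1,260. The sketch below decodes Phred+33 quality strings (the encoding used in current Illumina FASTQ output), masks calls below Q31 as N, and tabulates per-position base frequencies with N excluded from the denominator. It is a Python illustration of the described logic, not the custom MATLAB scripts themselves, and it assumes gap-free, equal-length aligned reads.

```python
def phred_error(q: float) -> float:
    """Error probability for a Phred quality score."""
    return 10 ** (-q / 10)

def decode_phred33(qual: str) -> list:
    """Convert a FASTQ quality string (Phred+33) to integer scores."""
    return [ord(ch) - 33 for ch in qual]

def mask_low_quality(seq: str, quals: list, min_q: int = 31) -> str:
    """Replace base calls with Q < min_q by 'N'."""
    return "".join(base if q >= min_q else "N" for base, q in zip(seq, quals))

def base_frequencies(reads: list) -> list:
    """Per-position A/C/G/T frequencies over gap-free, equal-length reads; N is ignored."""
    table = []
    for column in zip(*reads):
        counts = {b: column.count(b) for b in "ACGT"}
        total = sum(counts.values())
        table.append({b: (c / total if total else 0.0) for b, c in counts.items()})
    return table

# Hypothetical read with qualities 40, 20, 31, 30
masked = mask_low_quality("ACGT", [40, 20, 31, 30])
print(masked)  # ANGN
```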
Sequencing reads were scanned for exact matches to the two 10-bp sequences flanking either side of a window in which indels might occur. If no exact match was found, the read was excluded from the analysis. If the length of this indel window exactly matched that of the reference sequence, the read was classified as not containing an indel. If the indel window was two or more bases longer or shorter than in the reference sequence, the read was classified as containing an insertion or a deletion, respectively.
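The classification rules above translate directly into code. The sketch assumes the two flank sequences are known and occur once in the amplicon; the flanks and reads in the test data are hypothetical, and reads whose window differs from the reference by exactly one base are returned as "unclassified" because the stated rules do not assign them.

```python
def window_length(seq: str, left_flank: str, right_flank: str):
    """Length of the region between exact matches of the two flanks, or None if either is absent."""
    i = seq.find(left_flank)
    if i == -1:
        return None
    start = i + len(left_flank)
    j = seq.find(right_flank, start)
    if j == -1:
        return None
    return j - start

def classify_read(read: str, reference: str, left_flank: str, right_flank: str) -> str:
    """Apply the window-length rules to one read."""
    read_len = window_length(read, left_flank, right_flank)
    if read_len is None:
        return "excluded"  # no exact flank match
    diff = read_len - window_length(reference, left_flank, right_flank)
    if diff == 0:
        return "no_indel"
    if diff >= 2:
        return "insertion"
    if diff <= -2:
        return "deletion"
    return "unclassified"  # 1-bp differences are not assigned by the stated rules

# Hypothetical 10-bp flanks around an 8-bp indel window
LEFT, RIGHT = "GATTACAGAT", "CCTTGGAACC"
REF = LEFT + "ACGTACGT" + RIGHT
print(classify_read(LEFT + "ACGTAC" + RIGHT, REF, LEFT, RIGHT))  # deletion
```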
Example 9 - Mouse editing experiments (prophetic)
It is contemplated that a base editor comprising a novel DNA-targeting nuclease domain fused to a novel deaminase domain can be validated as a therapeutic candidate by testing in an appropriate mouse model of disease.
One example of a suitable model is mice engineered to express the human PCSK9 protein, e.g., as described by Herbert et al. (10.1161/atvbaha.110.204040). The PCSK9 protein regulates LDL receptor (LDLR) levels and affects serum cholesterol levels. Mice expressing human PCSK9 exhibit elevated cholesterol levels and faster progression of atherosclerosis. PCSK9 is a validated drug target for lowering lipid levels in humans at increased risk of cardiovascular disease due to abnormally high plasma lipid levels (https://doi.org/10.1038/s41569-018-0107-8). Reducing PCSK9 levels by genome editing is expected to lower lipid levels permanently over an individual's lifetime, providing a lifelong reduction in cardiovascular risk. One genome editing approach involves targeting the PCSK9 coding sequence to create a premature stop codon and thereby prevent translation of PCSK9 mRNA into a functional protein. Targeting a region near the 5' end of the coding sequence can block translation of most of the protein. To generate a stop codon (TGA, TAA or TAG) with high efficiency and specificity, it would be desirable to target a region of the PCSK9 coding sequence in which the editing window is placed over an appropriate sequence, such that the most frequent editing event generates the stop codon. Thus, the availability of multiple base editing systems recognizing a broad range of PAMs, or of base editing systems with degenerate PAMs, can provide access to a large number of potential target sites in the PCSK9 gene. In addition, editing systems with a low frequency of off-target editing (e.g., 1% or less of on-target editing events) may also be used for gene editing in such cases.
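The codon logic behind this strategy can be sketched for a cytosine base editor: on the coding strand, C-to-T converts CAA, CAG and CGA into TAA, TAG and TGA, and a C-to-T change on the template strand (seen as G-to-A on the coding strand) converts TGG into TAG, TGA or TAA. An adenine base editor cannot create a stop codon in this way: a coding-strand A-to-G change yields TAG or TGA only from TAA, which is already a stop, and a template-strand A-to-G change introduces a C, which no stop codon contains. The in-frame scan below is a sketch on a hypothetical coding sequence; PAM availability and editing-window placement are not checked.

```python
# Coding-strand C->T changes that create a stop codon
SENSE_CBE_TARGETS = {"CAA": "TAA", "CAG": "TAG", "CGA": "TGA"}

def stop_opportunities(cds: str) -> list:
    """Return (codon_number, codon, possible_stop, strand_edited) for in-frame CBE stop sites."""
    cds = cds.upper()
    hits = []
    for start in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[start:start + 3]
        if codon in SENSE_CBE_TARGETS:
            hits.append((start // 3 + 1, codon, SENSE_CBE_TARGETS[codon], "coding"))
        elif codon == "TGG":
            # Template-strand C->T appears as G->A at codon position 2 and/or 3
            hits.append((start // 3 + 1, codon, "TAG/TGA/TAA", "template"))
    return hits

# Hypothetical 5-codon sequence: ATG CAG TGG CGA AAA
for hit in stop_opportunities("ATGCAGTGGCGAAAA"):
    print(hit)
```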
The base editing efficiency required for a therapeutic effect is in the range of 50% or higher in order to achieve a significant reduction in plasma lipid levels. One example of using a base editor to generate stop codons in the PCSK9 gene is described by Carreras et al. (https://doi.org/10.1186/s12915-018-0624-2), in which 10% to 34% of PCSK9 alleles were edited to generate stop codons. While such editing levels were sufficient to produce a measurable decrease in plasma lipid levels in mice, therapeutic use in humans would require greater editing efficiency.
To identify the base editing (BE) system and guide best suited for introducing a stop codon into the PCSK9 gene, screening can be performed in a mouse liver cell line such as Hepa1-6. In silico screening can first be used to identify guides targeting the PCSK9 gene with the various available BE systems. To select among the large number of possible guides, computational analysis can determine which guides have an editing window covering a sequence that can produce a stop codon when edited. Guides closer to the 5' end of the coding sequence are then preferred. The resulting set of guides and BE proteins can be combined to form ribonucleoprotein complexes (RNPs) and nucleofected into Hepa1-6 cells. After 72 hours, editing efficiency at the target site can be determined by NGS analysis. Based on these in vitro results, one or more BE/guide combinations that yield the highest frequency of stop codon formation can be selected for in vivo testing.
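The selection rule above (keep guides whose editing window covers an editable position, and prefer those nearest the 5' end of the coding sequence) can be sketched as a filter followed by a sort. Guide names, window coordinates (0-based positions in the coding sequence) and target positions below are hypothetical; in practice the windows would come from each BE system's PAM and editing window, and the target positions from a codon scan of the PCSK9 coding sequence.

```python
from typing import List, NamedTuple

class Guide(NamedTuple):
    name: str
    window_start: int  # first CDS position inside the editing window (0-based)
    window_end: int    # last CDS position inside the editing window (inclusive)

def rank_guides(guides: List[Guide], target_positions: List[int]) -> List[str]:
    """Keep guides whose window covers a target; order by the most 5' covered target."""
    ranked = []
    for guide in guides:
        covered = [p for p in target_positions if guide.window_start <= p <= guide.window_end]
        if covered:
            ranked.append((min(covered), guide.name))
    return [name for _, name in sorted(ranked)]

candidates = [Guide("g1", 30, 36), Guide("g2", 3, 9), Guide("g3", 50, 56)]
print(rank_guides(candidates, target_positions=[5, 33]))  # ['g2', 'g1']
```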
For use in a human therapeutic setting, a safe and effective method of delivering the base editing components, including base editors and guide RNAs, is needed. In vivo delivery methods can be categorized as viral or non-viral. Among viral vectors, adeno-associated virus (AAV) is the virus of choice for clinical use because of its safety record, efficient delivery to a variety of tissues and cell types, and established manufacturing processes. The large size of base editors (BEs) exceeds the packaging capacity of AAV, which prevents packaging in a single adeno-associated virus. While packaging a BE into two AAVs using split-intein technology has proven successful in mice (https://doi.org/10.1038/s41551-019-0501-5), the need for two viruses can complicate development and manufacturing. An additional disadvantage of AAV is that, although the virus has no mechanism for promoting integration into the host cell genome and most AAV genomes remain episomal, a fraction of AAV genomes do integrate at random double-strand breaks that occur naturally in cells (Curr Opin Mol Ther. 2009 Aug;11(4):442-447). This may result in the BE-expressing gene sequence persisting over the lifetime of the organism. Furthermore, AAV genomes persist as episomes within the nuclei of transduced cells and can last for years, which may lead to long-term BE expression in these cells and thus an increased risk of off-target effects, because the risk of off-target events scales with the length of time the editing enzyme is active. Adenoviruses (Ad) such as Ad5 can efficiently deliver DNA to the mammalian liver and can package up to 45 kb of DNA. However, adenovirus is understood to induce a strong immune response in mammals (http://dx.doi.org/10.1136/gut.48.5.733), including in patients (https://doi.org/10.1016/j.ymthe.2020.02.010), which may lead to serious adverse events, including death.
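The packaging constraint can be checked with simple arithmetic: coding length is three nucleotides per amino acid plus a stop codon, added to the non-coding elements of the cassette, and compared with the roughly 4.7 kb commonly cited for AAV. Using the lengths in Table 4, even the shortest nickase listed, nMG14-1 at 1003 residues, needs about 3 kb of coding sequence before a deaminase is added. The example below uses nMG18-1 (1348 residues, Table 4); the deaminase and linker lengths and all element sizes are hypothetical placeholders.

```python
AAV_LIMIT_BP = 4700  # approximate, commonly cited AAV packaging limit

def cassette_bp(protein_aa: int, elements_bp: dict) -> int:
    """Coding length (3 nt per residue plus a stop codon) plus non-coding elements."""
    return 3 * protein_aa + 3 + sum(elements_bp.values())

def fits_single_aav(protein_aa: int, elements_bp: dict, limit: int = AAV_LIMIT_BP) -> bool:
    """True if the whole cassette is within the assumed packaging limit."""
    return cassette_bp(protein_aa, elements_bp) <= limit

# Hypothetical non-coding elements (bp); sizes are placeholders, not from the source
elements = {"promoter": 400, "polyA": 200, "U6-sgRNA cassette": 400}
editor_aa = 1348 + 170 + 40  # nMG18-1 (Table 4) + assumed deaminase + linker/NLS
print(cassette_bp(editor_aa, elements), fits_single_aav(editor_aa, elements))  # 5677 False
```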
Non-viral delivery vectors, including lipid nanoparticles and polymer nanoparticles (discussed in doi: 10.1038/mt.2012.79), have several advantages over viral delivery vectors, including lower immunogenicity and transient expression of the nucleic acid cargo. The transient expression afforded by non-viral delivery vectors is particularly well suited to genome editing applications, as it is expected to minimize off-target events. In addition, unlike viral vectors, non-viral delivery vectors can potentially be administered repeatedly to achieve a therapeutic effect. There may also be no theoretical limit to the size of nucleic acid molecules that can be packaged in a non-viral vector, although in practice packaging becomes less efficient, and particle size may increase, as the size of the nucleic acid increases.
Non-viral vectors such as lipid nanoparticles (LNPs) can be used to deliver a BE in vivo by encapsulating synthetic mRNA encoding the BE in the LNP together with a guide RNA. This may be done using any suitable method, for example as described by Finn et al. (DOI: 10.1016/j.celrep.2018.02.014) or Yin et al. (DOI: 10.1038/nbt.3471). LNPs tend to deliver their cargo preferentially to hepatocytes in the liver, which is also the target organ and cell type when seeking to disrupt PCSK9 gene expression. To demonstrate proof of concept for this approach, a BE consisting of a novel genome editing protein fused to a deaminase domain may be encoded in synthetic mRNA and packaged in LNPs together with an appropriate guide RNA targeting a selected site in the mouse PCSK9 gene. In mice engineered to express the human PCSK9 gene, the guide may be designed to target selectively the human PCSK9 gene or both the human and mouse PCSK9 genes. After injection of these LNPs, editing efficiency at the on-target site in the hepatocyte genome can be analyzed by amplicon sequencing or by other methods such as tracking of indels by decomposition (TIDE; doi: 10.1093/nar/gku). Physiological effects can be determined by measuring blood lipid levels, including total cholesterol and triglycerides, in the mice using standard methods.
Another example of a disease that can be modeled in mice to evaluate novel BEs is primary hyperoxaluria type 1. Primary hyperoxaluria type 1 (PH1) is a rare autosomal recessive disease caused by deficiency of the AGXT gene, which encodes the enzyme alanine-glyoxylate aminotransferase. This deficiency results in defective glyoxylate metabolism and accumulation of oxalate, a toxic metabolite. One approach to treating this disease is to reduce expression of glycolate oxidase (GO), the enzyme that produces glyoxylate from glycolate, thereby reducing the amount of substrate (glyoxylate) available for oxalate formation. PH1 can be modeled in mice in which both copies of the Agxt gene have been knocked out (Agxt-/- mice), resulting in a significant 3-fold increase in urinary oxalate compared with wild-type controls. Agxt-/- mice can therefore be used to assess the efficacy of novel base editors designed to generate stop codons in the coding sequence of the endogenous mouse GO gene. To identify the BE system and the best guide for introducing stop codons into the GO gene, screening can be performed in a mouse liver cell line such as Hepa1-6. In silico screening can first be used to identify guides targeting the GO gene for the various available BE systems. To choose among the large number of possible guides, a computational analysis may be performed to determine which guides have an editing window covering a sequence that can produce a stop codon when edited. In some cases, guides closer to the 5' end of the coding sequence may be preferred. The resulting set of guides and BE proteins can be combined to form ribonucleoprotein complexes (RNPs) and nucleofected into Hepa1-6 cells. After 72 hours, editing efficiency at the target site can be determined by NGS analysis. Based on these in vitro results, one or more BE/guide combinations that yield the highest frequency of stop codon formation can be selected for in vivo testing in mice.
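The in silico step of asking which positions in a coding sequence can become a stop codon can be sketched as a scan of in-frame codons. This minimal sketch assumes a cytosine base editor, modeling either a C-to-T edit on the coding strand or a C-to-T edit on the template strand (seen as G-to-A on the coding strand) that turns a sense codon into TAA, TAG or TGA. The function name `single_base_stop_edits` and the toy input are hypothetical, and a real screen would still need to intersect these positions with each guide's PAM and editing window.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def single_base_stop_edits(cds: str):
    """List in-frame codons that become a stop codon after one CBE-type edit.

    Returns tuples of (codon number, original codon, edited codon,
    1-based nucleotide position in the CDS, strand carrying the edited C).
    """
    cds = cds.upper()
    hits = []
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    for start in range(0, usable, 3):
        codon = cds[start:start + 3]
        if codon in STOP_CODONS:
            continue  # already a stop codon
        for offset, base in enumerate(codon):
            if base == "C":
                new_base, strand = "T", "coding (C>T)"
            elif base == "G":
                # C>T on the template strand reads as G>A on the coding strand
                new_base, strand = "A", "template (C>T)"
            else:
                continue
            edited = codon[:offset] + new_base + codon[offset + 1:]
            if edited in STOP_CODONS:
                hits.append((start // 3 + 1, codon, edited, start + offset + 1, strand))
    return hits


# Hits come out 5'-to-3', so the first entries are closest to the start codon.
for hit in single_base_stop_edits("ATGCAGTGGCGATAA"):
    print(hit)
```

Because results are ordered by position, preferring guides near the 5' end of the coding sequence amounts to taking the earliest hits that fall inside a guide's editing window.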
The BE and guide can be delivered to mice using two AAVs that express the BE through a split-intein system and a third AAV that delivers the guide. Alternatively, adenovirus type 5 can be used to deliver both the BE and the guide in a single virus, because of its packaging capacity of >40 kb. Further, the BE can be delivered as mRNA together with the guide RNA packaged in an appropriate LNP. After intravenous injection of the LNPs into Agxt-/- mice, urinary oxalate levels can be monitored over time to determine whether they decrease, which would indicate that the BE is active and has the desired therapeutic effect. To determine whether the BE has introduced a stop codon, the relevant region of the GO gene can be PCR-amplified from genomic DNA extracted from the livers of treated and control mice. The resulting PCR products can be analyzed by next-generation sequencing to determine the frequency of sequence changes.
EXAMPLE 10 Gene discovery of novel deaminases
Four terabase pairs (Tbp) of proprietary and publicly available assembled metagenomic sequencing data from diverse environments (soil, sediment, groundwater, thermophilic environments, and human and non-human microbiomes) were mined to discover novel deaminases. HMMER3 (hmmer.org) was used to build profile HMMs of documented deaminases and to search all predicted proteins with them to identify deaminases in the database. The predicted and reference deaminases (e.g., eukaryotic APOBEC1 and bacterial TadA) were aligned with MAFFT, and phylogenetic trees were inferred with FastTree 2. Novel families and subfamilies were identified as clades composed of the sequences disclosed herein. Candidates were selected based on the presence of key catalytic residues indicative of enzymatic function (see, e.g., SEQ ID NOs: 1-51, 385-386, 387-443, 444-447, 488-475, 599-675, 744-835 or 970-982).
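Selecting candidates by catalytic residues can be sketched as a motif filter over the predicted proteins. This sketch assumes the zinc-dependent deaminase signature often written as (H/C)-x-E ... P-C-x(2-4)-C; the regex `ZDD_MOTIF`, its spacer bounds (20 to 40 residues) and the helper names are illustrative assumptions, not the criteria used in this disclosure, and genuine family members may deviate from the pattern.

```python
import re

# Illustrative zinc-dependent deaminase signature: (H/C)-x-E, a variable spacer,
# then P-C-x(2-4)-C. The spacer bounds are assumptions chosen to be permissive.
ZDD_MOTIF = re.compile(r"[HC].E.{20,40}PC.{2,4}C")


def has_deaminase_motif(protein: str) -> bool:
    """Return True if the protein contains the illustrative catalytic motif."""
    return ZDD_MOTIF.search(protein.upper()) is not None


def filter_candidates(proteins: dict) -> list:
    """Keep the IDs of predicted proteins that carry the motif."""
    return [pid for pid, seq in proteins.items() if has_deaminase_motif(seq)]


candidates = {
    "hit_1": "MAHAE" + "A" * 26 + "PCVMCAG",  # toy sequence with the motif
    "hit_2": "MAHAA" + "A" * 26 + "PCVMCAG",  # catalytic glutamate missing
}
print(filter_candidates(candidates))  # ['hit_1']
```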
EXAMPLE 11 plasmid construction
DNA fragments of genes were synthesized by Twist Bioscience or Integrated DNA Technologies (IDT). Plasmid DNA was amplified in Endura electrocompetent cells (Lucigen) and isolated with the QIAprep Spin Miniprep Kit (Qiagen). Vector backbones were prepared by restriction enzyme digestion of the plasmids. Inserts were amplified with Q5 High-Fidelity DNA Polymerase (New England Biolabs) using primers ordered from either the tourmaline biomedical company or IDT (SEQ ID NOs: 690-707). Both the vector backbone and the insert were purified by gel extraction using a gel DNA recovery kit (Zymo Research). One or more DNA fragments were assembled into the vector by NEBuilder HiFi DNA Assembly (New England Biolabs) (SEQ ID NOs: 483-487, 720-726 or 737-738).
Example 12 evaluation of base editing efficiency in E.coli by sequencing
PCR amplification was performed using 5 ng of the extracted DNA prepared in Example 4 as a template and primers P137 and P360, and the resulting product was submitted for Sanger sequencing at Lelin Biomedicine. Primers used for sequencing are shown in Tables 6 and 7 (SEQ ID NOs: 523-531).
TABLE 6 primers for base editing analysis of lacZ gene in E.coli
TABLE 7 primers for base editing analysis of Uracil Glycosylase Inhibitor (UGI) action in E.coli
Name | SEQ ID NO | Description | Sequence (5' to 3')
P137 | 523 | Forward primer for amplifying lacZ | CCAGGCTTTACACTTTATGCT
P360 | 524 | Reverse primer for amplifying lacZ | CGAACATCCAAAAGTTTGTGTTTTT
P461 | 530 | Sanger sequencing primer for the lacZ site | GGATTGAAAATGGTCTGCTG
FIGS. 8A-8C show examples of base editing by the enzymes interrogated in this experiment, as assessed by Sanger sequencing.
FIGS. 10A-10B show the base editing efficiency of adenine base editors (ABEs) comprising TadA (ABE8.17m) (SEQ ID NO: 596) and MG nickases according to Table 3. TadA is a tRNA adenosine deaminase; TadA (ABE8.17m) is an engineered variant of E. coli TadA. Twelve MG nickases fused to TadA (ABE8.17m) were constructed and tested in E. coli, with three guides designed to target lacZ. The numbers in the boxes indicate the percentage of A-to-G conversion at each position, quantified by EditR. ABE8.17m was used as a positive control.
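The percentages in these figures come from Sanger traces. As a simplified illustration only, the conversion at one position can be approximated as the edited base's share of the reference-plus-edited peak signal. EditR also models background noise across the trace to decide which signals are real edits; this sketch omits that step, and the function name and peak heights are hypothetical.

```python
def percent_conversion(peak_heights: dict, ref_base: str, alt_base: str) -> float:
    """Share of the Sanger signal at one position attributable to the edited base.

    peak_heights maps each base ("A", "C", "G", "T") to its peak height at the
    position of interest. Background-noise modeling is deliberately omitted.
    """
    ref = peak_heights.get(ref_base, 0.0)
    alt = peak_heights.get(alt_base, 0.0)
    total = ref + alt
    if total == 0:
        raise ValueError("no signal for either the reference or the edited base")
    return 100.0 * alt / total


# A-to-G conversion at one adenine in an ABE-treated sample (made-up peak heights)
print(percent_conversion({"A": 300, "G": 100, "C": 5, "T": 5}, "A", "G"))  # 25.0
```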
FIGS. 11A-11B show the base editing efficiency of cytosine base editors (CBEs) comprising rat APOBEC1 (rAPOBEC1), an MG nickase, and the uracil glycosylase inhibitor of Bacillus subtilis phage PBS1 (UGI (PBS1)). APOBEC1 is a cytosine deaminase. Twelve MG nickases fused to rAPOBEC1 at the N-terminus and to UGI at the C-terminus were constructed and tested in E. coli, with three guides designed to target lacZ. The numbers in the boxes indicate the percentage of C-to-T conversion, quantified by EditR. BE3 was used as a positive control.
FIG. 12 shows the effect of MG uracil glycosylase inhibitors (UGIs) on base editing activity when added to a CBE. (a) MGC15-1 consists of an N-terminal APOBEC1, the MG15-1 nickase, and a C-terminal UGI. Three MG UGIs were tested for improvement of cytosine base editing activity in E. coli. (b) BE3 consists of an N-terminal rAPOBEC1, the SpCas9 nickase, and a C-terminal UGI. Two MG UGIs were tested for improvement of cytosine base editing activity in HEK293T cells. Editing efficiency was quantified by EditR.
Example 13 cell culture, transfection, next Generation sequencing and base editing analysis
HEK293T cells were grown and passaged at 37 °C with 5% CO2 in Dulbecco's Modified Eagle Medium (DMEM) plus GlutaMAX (Gibco) supplemented with 10% (v/v) fetal bovine serum (Gibco). Cells (5 × 10⁴) were seeded onto 96-well cell culture plates treated for cell adhesion (Costar), grown for 20 to 24 hours, and the spent medium was replaced with fresh medium before transfection. Each well was transfected with 200 ng of expression plasmid and 1 µL of Lipofectamine 2000 (Thermo Fisher Scientific) according to the manufacturer's instructions. Transfected cells were grown for 3 days and harvested, and gDNA was extracted with QuickExtract (Lucigen) according to the manufacturer's instructions. The base-edited target region was amplified from the extracted DNA as template using Q5 High-Fidelity DNA Polymerase (New England Biolabs) and the primers listed in Tables 8 and 9 (SEQ ID NOs: 538-585).
TABLE 8 primers for base edit analysis of UGI effects in HEK293T
Name | SEQ ID NO | Description | Sequence (5' to 3')
P577 | 536 | Forward primer for amplifying the targeting region | GAGGCTGGAGAGGCCCGT
P578 | 537 | Reverse primer for amplifying the targeting region | GATTTTCATGCAGGTGCTGAAA
P577 | 536 | Sanger sequencing primer | GAGGCTGGAGAGGCCCGT
TABLE 9a Primers for amplifying the targeting region in HEK293T cells transfected with A0A2K5RND7-MG nickase-MG69-1
The PCR products were purified with the HighPrep PCR Clean-up System (MagBio) according to the manufacturer's instructions. The effect of uracil glycosylase inhibitors (UGIs) on base editing by the candidate enzymes was analyzed by Sanger sequencing of the PCR products at the tourmaline biomedical company, and efficiency was quantified with EditR. For analysis of base editing by A0A2K5RND7-MG nickase-MG69-1, adapters for next-generation sequencing (NGS) were attached to the PCR products in a subsequent PCR reaction using the KAPA HiFi HotStart ReadyMix PCR Kit (Roche) and primers compatible with the TruSeq DNA library preparation kit (Illumina). The DNA concentration of the resulting products was quantified by TapeStation (Agilent), and the samples were pooled to prepare a library for NGS analysis. The resulting library was quantified by qPCR on an Aria real-time PCR system (Agilent), and high-throughput sequencing was performed on an Illumina MiSeq instrument according to the manufacturer's instructions. Base editing in the sequencing data was analyzed with CRISPResso2.
FIGS. 13A-13B show the base editing efficiency, at the targeted sites, of a cytosine base editor comprising a protein containing a CMP/dCMP-type deaminase domain (UniProt accession A0A2K5RDN7), an MG nickase, and an MG UGI. The construct comprises an N-terminal A0A2K5RDN7, an MG nickase, and a C-terminal MG69-1. For simplicity, only the identity of the MG nickase is shown in the figure. BE3 (APOBEC1) was used as a positive control for base editing, and an empty vector as a negative control. Three independent experiments were performed on different days. Abbreviations: r, replicate; NEG, negative control.
Table 9b: protein domains used in the constructs in example 13
EXAMPLE 14 Positive selection of base editor mutants in E.coli
FIG. 14 shows a positive selection method for characterizing TadA in E. coli. Panel (a) shows a diagram of the single-plasmid system used for TadA selection. The vector includes CAT (H193Y), an sgRNA expression cassette targeting CAT, and an ABE expression cassette. In this figure, an ABE with an N-terminal E. coli TadA and a C-terminal Streptococcus pyogenes SpCas9 (D10A) is shown. Panel (b) shows a sequencing trace demonstrating editing at the A2 position of the CAT (H193Y) template strand, which reverts the H193Y mutant to wild type and restores CAT activity when the plasmid is transformed into E. coli cells. Abbreviations: CAT, chloramphenicol acetyltransferase.
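Why an A-to-G edit on the template strand can revert H193Y follows from codon arithmetic. This sketch assumes, for illustration only, that the mutant codon is TAC or TAT (tyrosine) and that H193Y arose from a change at the first base of a CAC/CAT histidine codon; the actual CAT sequence is not given here, and `template_a_to_g` and `CODON_TO_AA` are hypothetical helpers.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
CODON_TO_AA = {"CAT": "H", "CAC": "H", "TAT": "Y", "TAC": "Y"}  # subset used here


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def template_a_to_g(codon: str, codon_pos: int) -> str:
    """Apply an A>G edit on the template strand opposite codon_pos (0-based)
    of the coding-strand codon and return the edited coding-strand codon."""
    template = revcomp(codon)  # template strand written 5'->3'
    tpos = len(codon) - 1 - codon_pos  # same base pair, template coordinates
    if template[tpos] != "A":
        raise ValueError("no adenine on the template strand at this position")
    template = template[:tpos] + "G" + template[tpos + 1:]
    return revcomp(template)


for mutant in ("TAC", "TAT"):
    reverted = template_a_to_g(mutant, 0)
    print(mutant, CODON_TO_AA[mutant], "->", reverted, CODON_TO_AA[reverted])
```

An A-to-G edit on the template strand appears as T-to-C on the coding strand, so the tyrosine codon's leading T becomes C and histidine is restored.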
1 µL of plasmid solution (10 ng/µL) was transformed into 25 µL of BL21 (DE3) electrocompetent cells (Lucigen), and the cells were recovered in 975 µL of expression recovery medium at 37 °C for 1 hour. µL of the resulting cells were spread on LB agar plates containing 100 µg/mL carbenicillin, 0.1 mM IPTG and an appropriate amount of chloramphenicol. Plates were incubated at 37 °C until colonies were picked. Genomic regions containing base edits were amplified by colony PCR, and the resulting products were submitted for Sanger sequencing at the tourmaline biomedical company. Primers used for PCR and sequencing are listed in Table 10 (SEQ ID NOs: 532-537).
TABLE 10 primers for base edit analysis of CAT (H193Y)
FIG. 15 shows that mutations introduced by TadA confer high chloramphenicol (Cm) tolerance. Panel (a) shows photographs of growth plates on which different concentrations of chloramphenicol were used to select for antibiotic-resistant E. coli. In this example, wild-type E. coli TadA (EcTadA) and two variants were tested. Panel (b) shows a summary of results demonstrating that ABEs carrying mutant TadA show higher editing efficiency than wild type. In these experiments, colonies were picked from plates with Cm ≥ 0.5 µg/mL. For simplicity, only the identity of the deaminase is shown in the table; the effector (SpCas9) and construct organization are shown in the upper panel.
FIGS. 16A-16B show investigation of MG TadA activity by positive selection. FIG. 16A shows photographs of growth plates from an experiment in which 8 MG68 TadA candidates (ABEs comprising an N-terminal TadA variant and a C-terminal SpCas9 (D10A) nickase) were tested on 0 to 2 µg/mL chloramphenicol. For simplicity, only the identity of the deaminase is shown. Panel (b) shows a summary table of the editing efficiency of the MG TadA candidates. FIG. 16B demonstrates adenine base editing driven by MG68-3 and MG68-4. In this experiment, colonies were picked from plates with Cm ≥ 0.5 µg/mL.
FIG. 17 shows the improvement in base editing efficiency of MG68-4_nSpCas9 conferred by the D109N mutation in MG68-4. Panel (a) shows photographs of growth plates on which wild-type MG68-4 and its variants were tested on 0 to 4 µg/mL chloramphenicol. For simplicity, only the identity of the deaminase is shown. The adenine base editors in this experiment comprise an N-terminal TadA variant and a C-terminal SpCas9 (D10A) nickase. Panel (b) shows a summary table of the editing efficiency of the MG TadA candidates, demonstrating that both MG68-4 and MG68-4 (D109N) edit adenine, with the D109N mutant showing increased activity. In this experiment, colonies were picked from plates with Cm ≥ 0.5 µg/mL.
FIG. 18 shows base editing by MG68-4 (D109N)_nMG34-1. Panel (a) shows photographs of growth plates from an experiment in which an ABE comprising an N-terminal MG68-4 (D109N) and a C-terminal SpCas9 (D10A) nickase was tested on 0 to 2 µg/mL chloramphenicol. Panel (b) shows a summary table of editing efficiency with and without sgRNA. In this experiment, colonies were picked from plates with Cm ≥ 1 µg/mL.
FIG. 19 shows 28 MG68-4 variants designed to improve the base editing activity of MG68-4-nMG34-1. Twelve residues were selected for targeted mutagenesis to improve editing.
EXAMPLE 15 Plasmid construction of E. coli-optimized constructs
All plasmids for cytidine deaminase expression were prepared by Twist Bioscience. Each construct was codon-optimized for E. coli expression and inserted between the XhoI and BamHI restriction sites of the pET-21(+) vector. Sequences were designed to exclude BsaI restriction sites. The following sequence was appended to the beginning of each construct: 5'-GAAATAATTTTGTTTAACTTTAAGAAGGAGATATACATATGGGCAGCAGTCATCATCATCACCATCAC-3'. This sequence encodes a ribosome binding site and an N-terminal hexahistidine tag. A stop codon was added at the end of each CDA sequence to prevent incorporation of the C-terminal His-tag encoded by pET-21(+).
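The two design claims for the appended leader (an N-terminal hexahistidine tag and no BsaI site) can be checked directly. This sketch translates from the first ATG using the standard genetic code and scans for the BsaI recognition sequence GGTCTC on both strands; `translate_from_first_atg` and `has_bsai_site` are hypothetical helper names.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base slowest)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

LEADER = ("GAAATAATTTTGTTTAACTTTAAGAAGGAGATATACATATGGGCAGCA"
          "GTCATCATCATCACCATCAC")


def translate_from_first_atg(seq: str) -> str:
    """Translate from the first ATG until a stop codon or the end of seq."""
    start = seq.find("ATG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def has_bsai_site(seq: str) -> bool:
    # BsaI recognizes GGTCTC; GAGACC is the same site on the opposite strand.
    return "GGTCTC" in seq or "GAGACC" in seq


print(translate_from_first_atg(LEADER))  # MGSSHHHHHH
print(has_bsai_site(LEADER))             # False
```

The leader's ATG sits in the CATATG (NdeI) context downstream of the AGGAG ribosome binding site, and the translated peptide ends in six histidines.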
EXAMPLE 16 Plasmid construction of mammalian-optimized constructs
All plasmids used for cytidine deaminase expression in mammalian cells were codon-optimized and ordered from Twist Bioscience. Each construct was codon-optimized for Homo sapiens expression. The restriction sites avoided were BsaI, SphI, EcoRI, BmtI, BstXI, BlpI and BamHI. The following sequence was appended to the 5' end of the codon-optimized sequence: ACCGGTGCTAGCCCACC. This sequence contains a BmtI restriction site for downstream cloning and a Kozak sequence for maximal translation. The following sequence was appended to the 3' end of the codon-optimized CDA: AGCGCATGC. This sequence contains an SphI restriction site for downstream cloning. The stop codon was removed from all constructs.
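A quick check that a codon-optimized CDA avoids the listed sites, while the 5' and 3' flanks carry the BmtI and SphI cloning sites, can be sketched as a two-strand scan. The recognition sequences below are the standard ones for these enzymes (N = any base); `find_sites` is a hypothetical helper, not part of any vendor's design tool.

```python
import re

# Recognition sequences (N = any base) for the enzymes listed above
SITES = {
    "BsaI": "GGTCTC",
    "SphI": "GCATGC",
    "EcoRI": "GAATTC",
    "BmtI": "GCTAGC",
    "BstXI": "CCANNNNNNTGG",
    "BlpI": "GCTNAGC",
    "BamHI": "GGATCC",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq: str) -> dict:
    """Map enzyme name -> sorted 0-based start positions on the given strand.

    Both strands are scanned; hits on the reverse strand are converted back to
    forward-strand coordinates, so palindromic sites are reported once.
    """
    seq = seq.upper()
    rc = revcomp(seq)
    hits = {}
    for name, site in SITES.items():
        # Zero-width lookahead so overlapping occurrences are all found
        pattern = re.compile("(?=" + site.replace("N", "[ACGT]") + ")")
        starts = {m.start() for m in pattern.finditer(seq)}
        starts |= {len(seq) - m.start() - len(site) for m in pattern.finditer(rc)}
        if starts:
            hits[name] = sorted(starts)
    return hits


print(find_sites("ACCGGTGCTAGCCCACC"))  # {'BmtI': [6]}
print(find_sites("AGCGCATGC"))          # {'SphI': [3]}
```

A codon-optimized CDA body that passes this check should return an empty dictionary.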
EXAMPLE 17 cell culture, transfection, next Generation sequencing and base editing analysis
HEK293T cells were grown and passaged at 37 °C with 5% CO2 in Dulbecco's Modified Eagle Medium (DMEM) plus GlutaMAX (Gibco) supplemented with 10% (v/v) fetal bovine serum (Gibco). Cells (2.5 × 10⁴) were seeded onto 96-well cell culture plates treated for cell adhesion (Costar), grown for 20 to 24 hours, and the spent medium was replaced with fresh medium before transfection. Each well was transfected with 300 ng of expression plasmid and 1 µL of Lipofectamine 2000 (Thermo Fisher Scientific) according to the manufacturer's instructions. Transfected cells were grown for 3 days and harvested, and gDNA was extracted with QuickExtract (Lucigen) according to the manufacturer's instructions. The base-edited target region was amplified from the extracted DNA as template with primers (SEQ ID NOs: 690-707, 865-872, and 932-961) using Q5 High-Fidelity DNA Polymerase (New England Biolabs). The PCR products were purified with the HighPrep PCR Clean-up System (MagBio) according to the manufacturer's instructions. For analysis of base substitutions by adenine base editors, adapters for next-generation sequencing (NGS) were attached to the PCR products in a subsequent PCR reaction using the KAPA HiFi HotStart ReadyMix PCR Kit (Roche) and primers compatible with the TruSeq DNA library preparation kit (Illumina). The DNA concentration of the resulting products was quantified by TapeStation (Agilent), and the samples were pooled to prepare a library for NGS analysis. The resulting library was quantified by qPCR on an Aria real-time PCR system (Agilent), and high-throughput sequencing was performed on an Illumina MiSeq instrument according to the manufacturer's instructions. Base editing in the sequencing data was analyzed with CRISPResso2.
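The per-position base substitution frequency reported from amplicon sequencing can be illustrated with a much simplified counter. This sketch assumes reads that are already aligned to the amplicon without insertions or deletions; CRISPResso2 itself performs the alignment and handles indels, which this sketch does not. `conversion_frequency` and the toy reads are hypothetical.

```python
def conversion_frequency(reference: str, reads: list, ref_base: str = "A",
                         alt_base: str = "G") -> dict:
    """Percent of reads carrying alt_base at each reference position holding ref_base.

    Assumes reads are already aligned to the amplicon without gaps; reads whose
    length differs from the reference are skipped.
    """
    usable = [r for r in reads if len(r) == len(reference)]
    if not usable:
        return {}
    freqs = {}
    for i, base in enumerate(reference):
        if base == ref_base:
            edited = sum(1 for r in usable if r[i] == alt_base)
            freqs[i + 1] = 100.0 * edited / len(usable)  # 1-based position
    return freqs


reference = "GAAC"
reads = ["GAAC", "GGAC", "GGGC", "GAAC", "GAA"]  # the last read is skipped
print(conversion_frequency(reference, reads))  # {2: 50.0, 3: 25.0}
```

For a cytosine base editor the same function applies with `ref_base="C"` and `alt_base="T"`.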
EXAMPLE 18 In vitro gel-based deaminase assay
Linear DNA constructs containing cytidine deaminases were amplified by PCR from the Twist Bioscience plasmids described above. All constructs were cleaned up by SPRI purification (SPRI Cleanup, Lucigen) and eluted in 10 mM Tris buffer. Enzymes were expressed from the PCR templates in the PURExpress in vitro transcription-translation system (NEB) at 37 °C for 2 hours. Deamination reactions were prepared by mixing 2 µL of PURExpress reaction with 2 µM 5'-FAM-labeled ssDNA (IDT) and 1 U of USER enzyme (NEB) in 1× CutSmart buffer (NEB). Reactions were incubated at 37 °C for 2 hours and then quenched by adding 4 units of proteinase K (NEB) and incubating for 10 minutes at 55 °C. The reactions were then mixed with 11 µL of 2× RNA loading dye and incubated for 10 minutes at 75 °C. All reactions were analyzed by electrophoresis on a 10% denaturing gel (Bio-Rad). DNA bands were visualized with a ChemiDoc imager (Bio-Rad), and band intensities were quantified using Bio-Rad Image Lab v6.0. Successful deamination was indicated by a 10 bp fluorescently labeled band in the gel (FIG. 20). The results indicate that MG93-3 to MG93-7, MG93-11, MG138-17, MG138-20, MG138-23, MG139-12 and MG139-19 to MG139-21 can deaminate the cytidine-containing substrate.
The in vitro activity of more than 90 novel cytidine deaminases was measured on cytosine-containing ssDNA substrates in all four possible 5'-NC contexts (FIG. 23). Of these cytidine deaminases, 38 showed ssDNA deamination activity, including 5 that substantially completely deaminated the target cytidine (MG139-84/SEQ ID NO: 808, MG139-86/SEQ ID NO: 810, MG139-87/SEQ ID NO: 811, MG139-95/SEQ ID NO: 819 and MG139-102/SEQ ID NO: 826; see, e.g., FIG. 23). In addition, several deaminases showed more than 50% deamination of the target cytosine (MG139-30/SEQ ID NO: 752, MG139-55/SEQ ID NO: 777 and MG139-99/SEQ ID NO: 823). Although most reported DNA cytidine deaminases act primarily on ssDNA and generally prefer particular bases immediately 5' of the target C, a corresponding dsDNA substrate was also included as a control (FIG. 24), which showed that MG139-86 and MG139-87 can also deaminate dsDNA.
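In this assay, USER enzyme excises the uracil left by deamination and cleaves the strand, so deaminated substrate runs as the shorter FAM-labeled product band. A minimal sketch of turning the two band intensities into a deamination fraction is shown below; the optional uniform background subtraction and the function name are assumptions, not part of the Image Lab workflow described above.

```python
def fraction_deaminated(product: float, substrate: float, background: float = 0.0) -> float:
    """Fraction of labeled substrate converted to the cleaved product band.

    product and substrate are integrated band intensities; background is an
    assumed per-band background value subtracted from both.
    """
    p = max(product - background, 0.0)
    s = max(substrate - background, 0.0)
    total = p + s
    if total == 0:
        raise ValueError("no band signal above background")
    return p / total


print(fraction_deaminated(750, 250))  # 0.75
```

Under this definition, "substantially complete" deamination corresponds to a fraction near 1 and the ">50%" group to fractions above 0.5.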
EXAMPLE 19 NGS-based in vitro deep deamination assay
ssDNA libraries with a single target C were created to determine cytosine deaminase activity and target-site sequence preferences. Briefly, ssDNA substrate oligonucleotides (Integrated DNA Technologies) were synthesized containing a 5'-NNNCNNN variable region flanked by two 21-nt adenine-containing regions, an upstream 20-nt random barcode, and two conserved primer binding sites.
This produced an oligonucleotide pool with 4096 unique substrate sequences. Each oligonucleotide carried a unique barcode so that its original variable region could be determined after sequencing even if non-target C deamination occurred. First, deaminases were expressed from PCR templates in the PURExpress in vitro transcription-translation system (NEB) at 37 °C for 2 hours. The PURExpress reaction was then incubated with 0.5 pmol of the substrate oligonucleotide pool in 50 mM Tris, pH 7.5, 75 mM NaCl for 1 hour at 37 °C.
A. Half of each treated pool was amplified using Accel-NGS 1S Plus (Swift Biosciences) to generate a dsDNA pool. This pool was then further amplified with unique dual indexes, and >15,000 reads per sample were sequenced on a MiSeq.
B. The other half of each treated pool was annealed to the appropriate 3' barcoded adapter (IDT) and treated with T4 DNA polymerase at 12 °C for 20 minutes to create a dsDNA pool. Using the conserved regions, this pool was amplified with unique dual indexes (IDT), and >15,000 reads per sample were sequenced on a MiSeq.
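The library size follows from the six randomized positions: 4^6 = 4096 variable regions, each with the target C at the center. The sketch below enumerates them and pools edited/total read counts by the base immediately 5' of the target C, which is how a -1 nucleotide preference would be read out. It assumes reads have already been assigned to their original context via the barcode; the helper names and counts are hypothetical.

```python
from collections import defaultdict
from itertools import product

BASES = "ACGT"


def substrate_contexts() -> list:
    """All 5'-NNNCNNN variable regions; the target C sits at index 3."""
    return ["".join(n[:3]) + "C" + "".join(n[3:]) for n in product(BASES, repeat=6)]


def edit_fraction_by_minus1(counts: dict) -> dict:
    """Pool edited/total read counts by the base immediately 5' of the target C.

    counts maps a 7-nt context to (edited_reads, total_reads).
    """
    edited, total = defaultdict(int), defaultdict(int)
    for context, (n_edited, n_total) in counts.items():
        minus1 = context[2]
        edited[minus1] += n_edited
        total[minus1] += n_total
    return {base: edited[base] / total[base] for base in total if total[base]}


contexts = substrate_contexts()
print(len(contexts))  # 4096
print(edit_fraction_by_minus1({"AATCAAA": (8, 10), "GGTCGGG": (2, 10), "AAACAAA": (1, 10)}))
# {'T': 0.5, 'A': 0.1}
```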
EXAMPLE 20 lentivirus production and transduction
HEK293T cells were grown and passaged at 37 °C with 5% CO2 in Dulbecco's Modified Eagle Medium (DMEM) plus GlutaMAX (Gibco) supplemented with 10% (v/v) fetal bovine serum (Gibco). The day before transfection, cells were seeded at 5 × 10⁶ cells per dish. On the day of transfection, 8 µg of PsPax, 1 µg of pMD2-G and 9 µg of a plasmid containing a cytidine deaminase fused to MG3-6 or Cas9 were mixed together with Mirus LT1 transfection reagent (Mirus Bio), and the mixture was transfected into HEK293T cells. Lentiviruses were collected 3 days after transfection, filtered through a 0.4 µm filter, and used immediately to transduce cells. Transduction was performed by adding 1/2 volume of virus-containing supernatant to cells in the presence of 8 µg/mL polybrene.
EXAMPLE 21 adenine and cytidine base editors in E.coli and mammalian cells
To demonstrate that MG34-1, a small type II CRISPR nuclease, can be used as a base editor, a construct comprising TadA*(8.17m)-nMG34-1 (ABE-MG34-1, SEQ ID NO: 727), in which TadA*(8.17m) is an engineered E. coli TadA, was generated together with a construct comprising rAPOBEC1-nMG34-1-UGI (PBS) (CBE-MG34-1, SEQ ID NO: 739), in which rAPOBEC1 is rat APOBEC1 and UGI (PBS) is a uracil glycosylase inhibitor from a B. subtilis phage. TadA*(8.17m)-nSpCas9 (SEQ ID NO: 728) and rAPOBEC1-nSpCas9-UGI (PBS) (SEQ ID NO: 740) were generated as positive controls for comparing editing profiles. Four guides targeting the lacZ gene in E. coli were designed and prepared for each base editor construct (SEQ ID NOs: 729-736). The plasmids were transformed into BL21 (DE3), the cells were recovered in recovery medium at 37 °C for 1 hour and plated on LB agar plates containing 100 µg/mL carbenicillin and 0.1 mM IPTG. After culturing at 37 °C for 16 to 20 hours, the targeted region of the E. coli genome was amplified by colony PCR, and the resulting products were analyzed by Sanger sequencing at the tourmaline biomedical company (FIGS. 22A-22C). Sequencing results showed that both ABE-MG34-1 and CBE-MG34-1 edited the target loci in the E. coli genome at levels and within editing windows comparable to the positive-control SpCas9 base editors (FIGS. 22A and 22B). Furthermore, TadA*(8.17m)-nMG34-1 showed higher base substitution at both targeted loci. ABE-MG34-1 also showed base editing in human cells, with editing efficiencies of up to 22% across three different genomic targets (FIG. 22C).
To determine whether SMART HNH endonuclease-related RNA and ORF (HEARO) enzymes could be used as base editors, an ABE was constructed by fusing a TadA*(7.10) deaminase monomer to the C-terminus of engineered MG35-1 containing the D59A mutation (FIG. 22E). A-to-G editing by this ABE was tested in a positive selection single-plasmid E. coli system in which the ABE must revert a chloramphenicol acetyltransferase (CAT) gene carrying the Y193 mutation back to H193 for the cells to survive chloramphenicol selection (FIG. 22D). The plasmid contains an sgRNA with either a spacer targeting the mutant CAT gene or a scrambled non-targeting spacer (control). Colony enrichment was observed for E. coli transformed with the CAT-targeting ABE-MG35-1 and grown on plates containing 2, 3 and 4 µg/mL chloramphenicol, whereas no colonies grew on plates containing 8 µg/mL chloramphenicol (FIG. 22E). Sanger sequencing confirmed that 26 of 30 colonies picked from the 2, 3 and 4 µg/mL plates transformed with the targeting spacer contained the expected Y193H reversion (Table 11 and FIG. 31).
TABLE 11 survival assay of E.coli with ABE-MG35-1
Colonies grown on plates containing chloramphenicol concentrations of 0, 2, 3, and 4 μg/mL were sequenced to confirm the reversal of CAT gene function. Experiments were performed with n=2.
It will be appreciated that the four colonies without a reverted CAT sequence contained more unedited than edited copies of the selection construct, because a single reverted CAT gene is sufficient to confer colony survival. For E. coli cells transformed with the non-targeting spacer, no colonies were observed on the 2, 3, 4 and 8 µg/mL plates. When the 0 µg/mL condition was used as a transformation control, 1 of 10 colonies picked from the 0 µg/mL plates of cells transformed with the targeting spacer contained the Y193H reversion, indicating a detectable level of editing without chloramphenicol selection. However, the enrichment of colony growth under chloramphenicol selection with the targeting ABE-MG35-1 confirmed that the MG35-1 nickase is a functional component of a base editor. At 623 aa, ABE-MG35-1 is the smallest nickase-based adenine base editor to date (Table 12).
TABLE 12 Size comparison of SMART nucleases with reference enzymes
Enzyme | Length (aa) | ABE length (aa)* | CBE length (aa)
MG34-1 | 748 | 969 | 1104
MG35-1 | 429 | 623 | -
SpCas9 | 1376 | 1588 | 1723
CasMINI (type V) | 529 | - | -
Base editor (ABE and CBE) sizes are approximate and depend on the linkers and NLS sequences added. *For ABEs, size was estimated with a single TadA monomer.
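The AAV packaging constraint raised for in vivo delivery can be checked against the approximate sizes in Table 12. This sketch assumes an AAV packaging limit of about 4.7 kb that must also hold the ITRs, promoter and polyA signal, so comparing the open reading frame alone against it is an optimistic upper bound; `AAV_CAPACITY_NT` and the helper names are illustrative.

```python
# Approximate AAV packaging limit (~4.7 kb). This budget must also hold the
# ITRs, promoter and polyA signal, so the ORF alone has to fit well below it.
AAV_CAPACITY_NT = 4700

# Base editor sizes (aa) taken from Table 12
EDITORS_AA = {
    "ABE (MG35-1)": 623,
    "ABE (MG34-1)": 969,
    "CBE (MG34-1)": 1104,
    "ABE (SpCas9)": 1588,
    "CBE (SpCas9)": 1723,
}


def orf_length_nt(protein_aa: int) -> int:
    """Coding sequence length: 3 nt per residue plus a stop codon."""
    return 3 * protein_aa + 3


def remaining_capacity(protein_aa: int, other_elements_nt: int = 0) -> int:
    """nt left after the ORF and any other cassette elements (negative = too large)."""
    return AAV_CAPACITY_NT - orf_length_nt(protein_aa) - other_elements_nt


for name, aa in EDITORS_AA.items():
    print(f"{name}: ORF {orf_length_nt(aa)} nt, {remaining_capacity(aa)} nt left")
```

Under these assumptions the SpCas9-based editors already exceed the budget on their coding sequence alone, whereas ABE-MG35-1 leaves roughly 2.8 kb for regulatory elements and, potentially, a guide cassette.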
EXAMPLE 22 adenine base editor in mammalian cells
In previous experiments, MG68-4v1 (predicted to be a tRNA adenosine deaminase) converted adenine to guanine, allowing bacteria to survive chloramphenicol selection. Next, two base editors, MG68-4v1-nMG34-1 and MG68-4v1-nSpCas9, were constructed by fusing the deaminase to a nickase. As a positive control for deaminase activity, TadA*(8.8m), an active variant engineered by Gaudelli et al., was used to produce TadA*(8.8m)-nMG34-1. To ensure that the genomic loci were accessible to the base editors, guides already shown to support SpCas9 activity in mammalian cells were selected. Of the 9 sites tested, MG68-4v1-nMG34-1 showed an editing efficiency of 11.3% at position 8 of site 2. When MG68-4v1 was fused to nSpCas9, the base editor showed 22.3% efficiency at position 5 of site 1 and 4.4% efficiency at position 6 of site 8. Replacing MG68-4v1 with TadA*(8.8m) in MG68-4v1-nMG34-1 gave 7.3% and 9.7% editing at positions 5 and 7 of site 1, respectively. At positions 6 and 8 of site 2, the efficiency increased to 16.5% and 19.5%, respectively. Furthermore, 4.1% and 3.4% editing was observed at positions 7 and 8 when targeting site 7. Taken together, these results indicate that MG68-4v1 and nMG34-1 exhibit base editing activity in mammalian cells (FIG. 21).
EXAMPLE 23 Activity in mammalian cells (cytidine deaminase assay in tissue culture cells) (prophetic)
The in-cell cytidine deaminase assay is designed such that, when a mutated ACG start codon is converted to ATG by a cytidine deaminase, the cell can translate the blasticidin resistance gene and thus becomes resistant to this antibiotic. When a reporter cell line (carrying the ACG codon) is transduced with a library of cytidine deaminases fused to Cas9 or MG3-6, a fraction of the cells is expected to convert ACG to ATG and thus become resistant to blasticidin. Cells that acquire this resistance and survive selection are then analyzed by next-generation sequencing (NGS) to identify the successful cytidine deaminases that show cytidine base editor activity.
EXAMPLE 24 mammalian constructs for Cytosine Base Editor (CBE)
Plasmids for nickase versions of CBEs based on SpCas9, MG3-6 and MG34-1 were constructed using the NEB HiFi assembly mix and DNA fragments containing novel cytidine deaminase, nuclease and UNG sequences. For SpCas9-containing constructs, pAL318 was digested with NotI and XmaI. For MG3-6-containing constructs, pAL320 was digested with NcoI. For MG34-1-containing constructs, pAL226 was digested with NotI and BamHI.
For experiments targeting the engineered cell line (SEQ ID NO: 962), CDAs were fused to the MG3-6 nickase. To clone CDA constructs into the MG3-6 nickase backbone, CDAs were ordered as gene fragments from Twist Bioscience and digested with SphI and BmtI. The plasmid backbone containing MG3-6 was digested with SphI and BmtI, and the gene fragments were ligated into it using T4 DNA ligase. The plasmid backbone contains the mU6 promoter for cloning gRNAs targeting the engineered sites. Spacers targeting the engineered sites with MG3-6 are shown in SEQ ID NOs: 963-967.
CBEs were constructed using various combinations of cytidine deaminases, nickase effectors and uracil glycosylase inhibitors (FIGS. 25A-25C). In total, 14 cytidine deaminases, namely 13 novel cytidine deaminases shown to be active in vitro (MG139-12 (SEQ ID NO: 970), MG93-3 (SEQ ID NO: 971), MG93-4 (SEQ ID NO: 972), MG93-5 (SEQ ID NO: 973), MG93-6 (SEQ ID NO: 974), MG93-7 (SEQ ID NO: 975), MG93-9 (SEQ ID NO: 976), MG93-11 (SEQ ID NO: 977), MG138-17 (SEQ ID NO: 978), MG138-20 (SEQ ID NO: 979), MG138-23 (SEQ ID NO: 980), MG138-32 (SEQ ID NO: 981) and MG142-1 (SEQ ID NO: 982)) and the A0A2K5RDN7 cytidine deaminase, were each fused to 3 effectors (SpCas9 (SEQ ID NOs: 877-889 and 968), MG3-6 (SEQ ID NOs: 890-902 and 969) or MG34-1 (SEQ ID NOs: 903-916)) to generate 42 different CBEs. SpCas9 fusions carried a C-terminal UGI, and MG3-6 and MG34-1 fusions carried a C-terminal MG69-1 UGI. Each CBE was tested with 5 sgRNAs targeting the HEK293 genome (SpCas9 (SEQ ID NOs: 917-921), MG3-6 (SEQ ID NOs: 922-926) or MG34-1 (SEQ ID NOs: 927-931)). The level of editing (C to T (%)) of all cytosines within 5 bp of the spacer is shown. Many CBEs showed detectable editing when transiently transfected into HEK293 cells. When fused to SpCas9, MG93-4 and MG138-20 exceeded 5% editing at certain sites, and MG93-3, MG93-7 and A0A2K5RDN7 exceeded 10%. When fused to MG3-6, MG93-4 and A0A2K5RDN7 exceeded 5% editing at some sites. When fused to MG34-1, MG93-4, MG93-6 and MG93-9 exceeded 5% editing at some sites, MG93-3, MG93-7 and MG139-12 exceeded 10%, and MG93-11 and A0A2K5RDN7 exceeded 20%. Thus, many novel cytidine deaminases were identified that are compatible with SpCas9, MG3-6 and MG34-1 and can deaminate cytosines in mammalian cells.
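The size of this screen follows from its combinatorial design: 14 deaminases × 3 nickase effectors gives 42 CBEs, and testing each with 5 effector-matched sgRNAs gives 210 CBE/sgRNA combinations (before replicates). The sketch below enumerates that design matrix; the list and dictionary names are illustrative.

```python
from itertools import product

# The 13 novel CDAs plus A0A2K5RDN7 listed above
DEAMINASES = [
    "MG139-12", "MG93-3", "MG93-4", "MG93-5", "MG93-6", "MG93-7", "MG93-9",
    "MG93-11", "MG138-17", "MG138-20", "MG138-23", "MG138-32", "MG142-1",
    "A0A2K5RDN7",
]
# C-terminal UGI used with each nickase effector
EFFECTOR_UGI = {"SpCas9": "UGI", "MG3-6": "MG69-1 UGI", "MG34-1": "MG69-1 UGI"}
GUIDES_PER_EFFECTOR = 5


def build_cbe_matrix():
    """(deaminase, nickase effector, UGI) for every deaminase-effector pair."""
    return [(cda, eff, EFFECTOR_UGI[eff]) for cda, eff in product(DEAMINASES, EFFECTOR_UGI)]


cbes = build_cbe_matrix()
print(len(cbes))                        # 42 CBEs
print(len(cbes) * GUIDES_PER_EFFECTOR)  # 210 CBE/sgRNA combinations
```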
To test the novel CDAs and assay their -1 nucleotide bias, each CDA was fused to MG3-6 and targeted to a reporter cell line carrying 5 engineered PAMs in tandem (SEQ ID NO: 962). Fourteen CDAs were tested in this system, and many showed >1% editing (panel (a) of FIG. 26). Among the novel CDAs fused to MG3-6, MG152-6 showed the highest activity (38.4%), followed by MG139-52 (17.6%). Their activity relative to A0A2K5RDN7 is shown in panel (b) of FIG. 26. Interestingly, the highly active MG139-52 was also observed to deaminate the DNA strand that forms part of the DNA/RNA heteroduplex in the R-loop, in addition to ssDNA. An example is shown in panel (c) of FIG. 26. This activity (deamination of DNA within a DNA/RNA heteroduplex) could substantially improve the off-target profile and the editing window, and both improvements may also be beneficial with respect to cytotoxicity.
EXAMPLE 25 cytosine base editor toxicity in mammalian cells
HEK293T cells were transduced with lentiviruses carrying the newly discovered CDAs fused to MG3-6. Successfully transduced cells were selected with 2 µg/mL puromycin for 3 days. Dead cells were washed away with PBS, and the surviving cells were fixed and stained with 50% methanol and 1% crystal violet (panel (a) of FIG. 27). The cells were then imaged on a ChemiDoc imager. Absorbance was measured at 570 nm after the crystal violet was dissolved in 1% SDS (panel (b) of FIG. 27).
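Crystal violet absorbance is commonly expressed relative to a control well. The text does not state how the A570 values were normalized, so this sketch assumes blank subtraction followed by normalization to a non-toxic control (for example, untransduced cells). The control name, blank value, and readings are all hypothetical.

```python
def relative_viability(a570_sample, a570_control, a570_blank=0.0):
    """Blank-subtract A570 readings and express the sample as a fraction
    of the control signal (1.0 = as viable as the control)."""
    signal = a570_sample - a570_blank
    reference = a570_control - a570_blank
    if reference <= 0:
        raise ValueError("control signal must exceed the blank")
    return signal / reference

# Hypothetical A570 readings for illustration only (not data from this study).
readings = {"no-CBE control": 1.05, "deaminase X": 0.55, "deaminase Y": 0.30}
blank = 0.05
viability = {name: relative_viability(a, readings["no-CBE control"], blank)
             for name, a in readings.items()}
for name, v in sorted(viability.items(), key=lambda kv: kv[1]):
    print(f"{name}: {v:.2f}")
```

A lower relative value corresponds to fewer surviving, stained cells and therefore to higher toxicity of the stably expressed CBE.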
The highly active CDA A0A2K5RDN7 showed high editing efficiency but also high cytotoxicity (panel (a) of FIG. 27). The deaminases were assayed as base editors (fused to MG3-6) stably expressed in HEK293T cells. MG93-3 and MG93-4 both showed much less cytotoxicity than A0A2K5RDN7. Quantification of the toxicity measurements (panel (b) of FIG. 27) showed that MG93-3 and MG93-4 were also less toxic than rAPOBEC.
EXAMPLE 26 directed evolution of adenosine deaminase in E.coli
The D109N mutation in MG68-4 improves DNA editing efficiency in E. coli. For simplicity, this variant is designated r1v1. To further increase editing efficiency in mammalian cells, the deaminase portion of MG68-4 (D109N)-nMG34-1 was randomly mutagenized by error-prone PCR. The resulting library was screened for editing activity by positive selection in E. coli, using a chloramphenicol acetyltransferase (CAT) that carries the inactivating H193Y mutation.
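The CAT (H193Y) selection works because an A•T-to-G•C edit can restore the histidine codon, which rescues chloramphenicol resistance. Below is a worked sketch of that logic. It assumes the Tyr193 codon is TAC and that the edited A lies on the strand complementary to the coding sequence; the actual reporter codon and guide orientation are not given in the text.

```python
# Minimal codon table covering only the codons used here (assumption: the
# reporter uses TAC for Tyr193; the true codon is not stated in the text).
CODONS = {"CAT": "His", "CAC": "His", "TAT": "Tyr", "TAC": "Tyr"}

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def abe_edit(seq, index):
    """Apply an adenine base editor A->G change at 0-based `index`."""
    if seq[index] != "A":
        raise ValueError("an ABE converts A to G only")
    return seq[:index] + "G" + seq[index + 1:]

coding = "TAC"                           # Tyr193 (the inactivating H193Y codon)
template = revcomp(coding)               # "GTA": its 3' A pairs with the codon's first T
edited_template = abe_edit(template, 2)  # "GTG"
restored = revcomp(edited_template)      # after repair/replication: "CAC"
print(CODONS[coding], "->", CODONS[restored])  # Tyr -> His
```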
To conduct this experiment, the MG68-4 (D109N) gene fragment was mutagenized with the GeneMorph II random mutagenesis kit according to the manufacturer's instructions. Typically, 500 ng of DNA template and 20 PCR cycles were used to obtain mutation frequencies in the range of 0 to 4.5 mutations/kb. Vector pAL478, which carries nMG34-1, CAT (H193Y) and the unidirectional expression cassette, was linearized by SacII and KpnI digestion. The PCR products from random mutagenesis were then cloned into the linearized vector with the NEBuilder HiFi DNA assembly kit. The assembled product was transformed into BL21(DE3) cells (Lucigen), recovered in recovery medium, and plated on LB agar plates containing 100 µg/mL carbenicillin, 0.1 mM IPTG, and chloramphenicol at 2, 4 or 8 µg/mL. After selection, 260 colonies were picked from the 4 and 8 µg/mL chloramphenicol plates and Sanger sequenced by the tourmaline biomedical company. Colonies carrying point mutations in MG68-4 (D109N) were grown in 96-well deep-well plates and pooled. Plasmids were isolated from these cells with the QIAprep Spin Miniprep kit (QIAGEN). The MG68-4 variants were then subcloned into pAL478 by restriction digestion (SacII and KpnI) and ligation with T4 DNA ligase. The resulting library was transformed into Endura electrocompetent cells (Lucigen), amplified, and isolated by miniprep. The collected DNA was transformed into BL21(DE3), and deaminase activity was tested under chloramphenicol selection at 2, 16, 32, 64 and 128 µg/mL. A total of 128 colonies were picked from the 32, 64 and 128 µg/mL chloramphenicol plates and Sanger sequenced. These colonies are understood to carry mutations that enhance the deaminase activity of the MG68 enzyme, and therefore survival under chloramphenicol selection.
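To relate the 0 to 4.5 mutations/kb range to library composition, the number of mutations per clone can be approximated with a Poisson distribution. This is a common modeling assumption, not something stated in the text, and the 500 bp fragment length used below is hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events for a Poisson mean `lam`."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def mutation_distribution(rate_per_kb, fragment_bp, max_k=4):
    """Expected fraction of clones carrying k = 0..max_k mutations, assuming
    mutations occur independently at `rate_per_kb` across the fragment."""
    lam = rate_per_kb * fragment_bp / 1000
    return {k: poisson_pmf(k, lam) for k in range(max_k + 1)}

# Hypothetical 500 bp deaminase fragment at the upper end of the stated range.
dist = mutation_distribution(4.5, 500)
for k, p in dist.items():
    print(f"{k} mutations: {p:.3f}")
print(f"at least one mutation: {1 - dist[0]:.3f}")
```

At the low end of the range most clones are expected to be wild type, whereas at the high end most clones carry multiple mutations. This is one reason the number of PCR cycles and the template amount are tuned.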
Through this evolution process, a total of 25 variants (r2v1 to r2v24 (SEQ ID NOS: 837-860)) were uncovered, and their mutations were confirmed by Sanger sequencing. Twenty-four residues mutated to other amino acids were identified (FIG. 28). These variants contain mutations at: T2 (e.g., T2A), D7 (e.g., D7G), E10 (e.g., E10G), M13 (e.g., M13R), W24 (e.g., W24G), G32 (e.g., G32A), K38 (e.g., K38E), G45 (e.g., G45D), G51 (e.g., G51V), A63 (e.g., A63S), E66 (e.g., E66V or E66D), R75 (e.g., R75H), C91 (e.g., C91R), G93 (e.g., G93W), H97 (e.g., H97Y or H97L), A107 (e.g., A107V), E108 (e.g., E108D), D109 (e.g., D109N), P110 (e.g., P110H), H124 (e.g., H124Y), A126 (e.g., A126D), H129 (e.g., H129R or H129N), F150 (e.g., F150P or F150S), and S165 (e.g., S165L).
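The variants above are described in standard substitution notation (for example, D109N means Asp at position 109 replaced by Asn). This sketch parses that notation and applies the substitutions to a sequence, checking each wild-type residue first. The toy sequence is hypothetical and is not MG68-4.

```python
import re

# Wild-type residue, 1-based position, mutant residue (e.g. "D109N").
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")

def apply_mutations(seq, mutations):
    """Apply substitutions such as 'D109N', verifying that the wild-type
    residue matches the sequence before substituting."""
    residues = list(seq)
    for m in mutations:
        match = MUTATION.match(m)
        if not match:
            raise ValueError(f"unrecognized mutation: {m}")
        wt, pos, mut = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"{m}: position outside the sequence")
        if residues[pos - 1] != wt:
            raise ValueError(f"{m}: expected {wt} at {pos}, found {residues[pos - 1]}")
        residues[pos - 1] = mut
    return "".join(residues)

# Hypothetical 10-residue toy sequence containing D7 and E10 (as in r2v7).
toy = "MTAKLSDGAE"
print(apply_mutations(toy, ["D7G", "E10G"]))  # MTAKLSGGAG
```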
EXAMPLE 27 adenine base editor in mammalian cells
The adenine base editor variants identified in E. coli in Example 26 were codon-optimized for mammalian cell expression and tested in HEK293T cells. Four guides were designed to test A-to-G conversion in cells (spacers, SEQ ID NOS: 861-864; MG34-1 guide scaffold, SEQ ID NO: 876). Eleven variants (r2v3, r2v5, r2v7, r2v8, r2v11, r2v12, r2v13, r2v14, r2v15, r2v16 and r2v23 (SEQ ID NOS: 839, 841, 843, 844, 847, 848, 849, 850, 851, 852 and 859)) performed better than r1v1 with the first three guides screened. When the mutations were mapped onto the predicted structure of MG68-4, five of the changed residues (W24, G51, E108, P110 and F150) were found to lie around the active site. Notably, r2v7 (D7G and E10G (SEQ ID NO: 843)) and r2v16 (H129N (SEQ ID NO: 852)) contained mutations distant from the active site, yet improved editing efficiency more than the other variants did (FIG. 29). In this round of screening with guide 2, editing efficiency rose to 8.9% for r2v7 and to 7.9% for r2v16, compared with approximately 2.8% to 2.9% for r1v1.
EXAMPLE 28 deaminase activity on ssRNA (prophetic)
The protocol was adapted from Wolfe et al. (NAR Cancer, 2020, vol. 2, doi:10.1093/narcan/zcaa027). Linear DNA constructs containing the CDA and A1CF (cofactor) were amplified from constructs synthesized by Twist Bioscience (SEQ ID NO: 741), using the same primers developed for the ssDNA in-gel assay. The constructs were purified by PCR spin column cleanup (Qiagen) and analyzed by gel electrophoresis. The enzymes were expressed from the PCR templates in the PURExpress in vitro transcription-translation system (NEB) at 37 °C for 2.5 hours. Deamination reactions were prepared by mixing 2 µL of PURExpress reaction (CDA and A1CF) with 2 µM ssRNA substrate (IDT, SEQ ID NO: 742) in the presence of RNase inhibitor and incubating at 37 °C for 2 hours. A 5' FAM-labeled DNA primer (IDT, SEQ ID NO: 743) was then added to a concentration of 1.3 µM. The reaction was heated at 95 °C for 10 minutes and then allowed to cool gradually to room temperature over at least 30 minutes. A reverse transcription master mix was then added, containing 5 mM DTT, ProtoScript II RT (NEB) (5 U/µL), ProtoScript II buffer (NEB) (1X), RNaseOUT (Thermo Fisher) (0.4 U/µL), dTTP (0.25 mM), dCTP (0.25 mM), dATP (0.25 mM) and ddGTP (5 mM). When the RNA substrate is deaminated, full-length cDNA is produced. When no deamination occurs, the "C" remains in the RNA substrate, and reverse transcription terminates upon incorporation of ddGTP opposite this C. The reaction was incubated at 42 °C for one hour and then at 65 °C for 10 minutes. Aliquots were then mixed with 2x RNA loading dye (NEB), heated at 75 °C for 10 minutes, and cooled on ice for two minutes. Samples were loaded onto 10% or 15% TBE-urea denaturing gels (Bio-Rad), and DNA bands were visualized with a ChemiDoc imager (Bio-Rad). Successful deamination is indicated by a full-length (55-nt) fluorescently labeled band in the gel. The non-deaminated product appears as a shorter (43-nt) fluorescently labeled band.
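The readout above works as follows: reverse transcription reads through a U produced by deamination but terminates with ddG opposite an intact C. This can be modeled in a few lines. The template and primer length below are hypothetical toy values, not SEQ ID NOS: 742-743. The model assumes the primer anneals to the 3' end of the RNA and that the only G source is ddGTP, as in the master mix described.

```python
def rt_product_length(rna_template, primer_len):
    """Length of the cDNA made when a primer covering the 3'-most
    `primer_len` nt of `rna_template` (written 5'->3') is extended with
    dATP, dCTP, dTTP and ddGTP only. The first template C forces ddG
    incorporation and stops synthesis; a U (e.g. from deaminated C) is
    copied as A and read through."""
    unprimed = rna_template[: len(rna_template) - primer_len]
    length = primer_len
    # Reverse transcriptase copies the template 3'->5', i.e. right to left.
    for base in reversed(unprimed):
        length += 1
        if base == "C":
            return length  # ddG incorporated opposite C: chain terminated
    return length  # run-off to the template 5' end: full-length product

# Hypothetical toy template with a single target C; primer covers "GGGG".
intact = "AAUCAAAGGGG"
deaminated = "AAUUAAAGGGG"
print(rt_product_length(intact, 4), rt_product_length(deaminated, 4))  # 8 11
```

In the actual assay, the same logic yields the 43-nt truncated band for unedited substrate and the 55-nt full-length band for deaminated substrate.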
Example 29 increased efficiency of cytosine base editing upon Fam72a expression
Fam72a has been documented as an antagonist of uracil DNA glycosylase (UDG) during B cell somatic hypermutation and class switch recombination, where it prevents mismatch repair-based correction of mutated immunoglobulin alleles. Expressing Fam72a during engineered cytosine base editing may therefore inhibit UDG activity and thereby increase targeted C-to-T conversion.
HEK293 cells (150,000) were lipofected with jetOPTIMUS according to the manufacturer's instructions. The transfections used a plasmid encoding a Cas9-CBE fusion (pMG3078; 500 ng) and a plasmid encoding sgRNA PE266 or PE691 (250 ng), with or without a plasmid encoding Fam72a (pMG3072; 500 ng). Cells were harvested 72 hours after transfection, genomic DNA was prepared, and the extent of base editing was determined by computational analysis of next-generation sequencing reads (FIG. 32). When co-expressed with a Cas9-based cytosine base editor, the CMV-driven Fam72a expression construct increased CBE activity at both loci. It is contemplated that Fam72a could be used to improve cytosine base editing (CBE) with any type of cytosine base editor, not only Cas9-based constructs.
Example 30 structural optimization of adenine base editor
Thirty-three rationally designed ABE variants under the control of the CMV promoter (SEQ ID NOS: 1128-1160) were constructed for use in mammalian cells. Eight constructs contained ABEs in which the MG68-4 (D109N) adenine deaminase was fused to the N-terminus or C-terminus of the MG3-6/3-8 nickase (D13A) through a linker of 20, 36, 48 or 62 amino acid residues. The remaining 25 constructs contained ABEs in which the MG68-4 (D109N) adenine deaminase was embedded within the RuvC-I, REC, HNH, RuvC-III or WED domain, with an 18-amino-acid linker fused to either end. These constructs are summarized in Table 12A.
Table 12A: rationally designed ABE variants from example 30
* The insertion position denotes the native residue immediately upstream of the inserted deaminase. For example, "insert 887AA" indicates that the deaminase is inserted between amino acids 887 and 888.
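The insertion notation in the table footnote can be expressed as a small sequence operation. In this sketch, "insert 887AA" corresponds to `after_residue=887` (1-based), and the same linker flanks both sides of the deaminase, as described for the domain-embedded constructs. The function name and the toy sequences are hypothetical.

```python
def insert_domain(host, after_residue, domain, linker=""):
    """Insert `domain`, flanked on both sides by `linker`, between residues
    `after_residue` and `after_residue + 1` (1-based) of `host`.
    For example, 'insert 887AA' corresponds to after_residue=887."""
    if not 0 < after_residue < len(host):
        raise ValueError("insertion point must lie inside the host sequence")
    return host[:after_residue] + linker + domain + linker + host[after_residue:]

# Hypothetical toy host, domain, and linker (not the MG3-6/3-8 or MG68-4 sequences).
print(insert_domain("MKTAYIAK", 3, "WW", "GS"))  # MKTGSWWGSAYIAK
```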
Plasmids expressing each of the 33 ABE variants were transiently co-transfected into HEK293 cells with plasmids expressing 8 sgRNAs (SEQ ID NOS: 1188-1195) that target specific loci in the human genome. After 72 hours, cells were harvested and analyzed for on-target editing (FIG. 36 and Table 12B).
Sequencing results showed that 19 of the 33 ABEs achieved on-target editing of at least 1% when co-expressed with sgRNAs targeting the TRAC locus (FIG. 33). The constructs used in this experiment were 3-68_DIV1_M_RDr1v1_B through 3-68_DIV33_M_RDr1v1_B (FIG. 36). The construct with the highest editing at any A residue within the spacer region was 3-68_DIV30_M_RDr1v1_B, with a maximum on-target editing rate of 13.3% (n=2) (FIG. 33). Notably, 3-68_DIV12_M_RDr1v1_B showed similar editing levels at A5 (5.86%) and A10 (6.18%). This suggests that DIV12 may have an altered base editing window within the spacer region relative to other active ABEs. In addition to on-target editing, cell viability was assessed visually for each base editor/sgRNA co-transfection. Cells transfected with many constructs, including 3-68_DIV30_M_RDr1v1_B and 3-68_DIV12_M_RDr1v1_B, had high viability. In contrast, many cells transfected with the N-terminal or C-terminal fusion constructs had low viability.
EXAMPLE 31 engineering of adenosine deaminase
E. coli tRNA adenosine deaminase (TadA) has been engineered to target DNA and to improve base editing activity in mammalian cells. It was therefore hypothesized that transplanting the analogous editing-enhancing mutations from EcTadA into MG68-4 (D109N) might increase its deaminase activity. EcTadA mutations from ABE7.10, ABE8.8m, ABE8.17m and ABE8e were compiled from the literature, and the equivalent residues in MG68-4 were identified by multiple sequence and structural alignments. Twenty-two rationally designed variants were generated on the MG68-4 (D109N) background and fused to the N-terminus of MG34-1 (D10A) (SEQ ID NOS: 1161-1183). To import the base editor into the nucleus, a nuclear localization signal (NLS) was incorporated at the C-terminus of the enzyme. The effect of a dual-NLS system (e.g., NLSs at both the N- and C-termini) on editing efficiency was also evaluated (FIGS. 34A and 34B) (SEQ ID NOS: 1184-1186). The base editor and guide RNA genes were expressed from the CMV and U6 promoters, respectively. In this experiment, a single plasmid carrying the desired editing components (SEQ ID NOS: 1187 and 1207) was transfected into HEK293T cells, and editing efficiency was assessed by NGS. The top three performers (RD9, RD18 and RD5) achieved A-to-G conversions of 27.4%, 26.6% and 23.8% at A8, respectively. Compared with MGA1.1 (MG68-4 (D109N)), RD9 (MG68-4 (D109N/T112R)) increased editing efficiency by 45%. The dual-NLS design did not outperform the single-NLS design: MGA1.1_2NLS achieved 11.4% conversion, lower than the 19.2% achieved by MGA1.1 (FIG. 35).
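Transplanting a mutation requires mapping an EcTadA residue number onto the equivalent MG68-4 residue through the alignment. For example, the text later pairs EcTadA D108 with MG68-4 D109. This sketch maps positions between two gapped rows of an alignment. The toy alignment is hypothetical and only illustrates a one-residue offset.

```python
def map_position(aln_a, aln_b, pos_a):
    """Map 1-based residue `pos_a` of sequence A to the aligned residue of
    sequence B. Inputs are gapped rows of the same alignment ('-' = gap).
    Returns None if B has a gap at that column."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned rows must have equal length")
    i_a = i_b = 0
    for col_a, col_b in zip(aln_a, aln_b):
        if col_a != "-":
            i_a += 1
        if col_b != "-":
            i_b += 1
        if col_a != "-" and i_a == pos_a:
            return i_b if col_b != "-" else None
    raise ValueError("position beyond the end of sequence A")

# Hypothetical toy alignment: B has one extra N-terminal residue, so A's D3
# corresponds to B's D4 (the same kind of offset as EcTadA D108 / MG68-4 D109).
print(map_position("-MAD", "MMAD", 3))  # 4
```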
EXAMPLE 32 engineering CBEs to relax the sequence selectivity of CDA at the -1 position of the target cytosine and to improve on-target activity on DNA
Two mutagenesis approaches were employed to improve the editing activity and selectivity of the cytosine base editor (CBE). First, point mutagenesis of the cytidine deaminase was used to alter its intrinsic DNA/RNA affinity. This approach was based on the hypothesis that the low or moderate editing efficiency of wild-type CBEs, as well as nickase-independent deamination events, may be caused by the intrinsic DNA/RNA binding affinity of the cytidine deaminase. Second, in related deaminase families, a loop adjacent to the active site (loop 7; Kohli et al., J. Biol. Chem. 2009, 284, 22898-22904) has been identified as important for determining selectivity at the -1 position relative to the target cytosine. Experiments exchanging the loop 7 sequence in cytosine base editors were therefore designed.
The putative loop 7 of the novel cytidine deaminases described herein was predicted and identified using structure-based homology models of APOBEC1 (Wolfe et al., NAR Cancer 2020, 2, 1-15), AID (Kohli et al., J. Biol. Chem. 2009, 284, 22898-22904) and APOBEC3A (Shi et al., Nat. Struct. Mol. Biol. 2017, 24, 131-139). These models were used to design loop 7 exchange experiments that relax the sequence selectivity of the candidates. Several residues were also targeted for mutation to increase activity on DNA and decrease activity on RNA (Yu et al., Nature Communications 2020, 11, 2052). A total of 108 CDA variants from the MG93, MG139 and MG152 families were designed (SEQ ID NOS: 1208-1315). Each variant carries either point mutations or a loop 7 exchanged with that of AID deaminase, which has been documented to have 5'-RC selectivity.
Table 12C: cytosine base editor mutants studied in example 32
EXAMPLE 33 in vitro activity of novel CDA variants from the MG93, MG139 and MG152 families
In vitro deaminase in-gel assay
Linear DNA constructs containing each CDA were PCR-amplified from the previously mentioned plasmids from Twist Bioscience. All constructs were cleaned up by SPRI purification (Lucigen) and eluted in 10 mM Tris buffer. The enzymes were expressed from the PCR templates in the PURExpress in vitro transcription-translation system (NEB) at 37 °C for 2 hours. Each deamination reaction mixed 2 µL of PURExpress reaction with one of two substrate types, together with 1 U USER enzyme (NEB) in 1x CutSmart buffer (NEB). The first type was 5'-FAM-labeled ssDNA (IDT): 4 different substrates that differ in the -1 nucleobase (A, C, T or G) next to the target cytidine (SEQ ID NOS: 1316-1319; FIG. 37). The second type was 0.5 µL of Cy3- and Cy5.5-labeled ssDNA (IDT): 2 different substrates, AC versus GC or CC versus TC (SEQ ID NOS: 1320-1321; FIG. 38). The reactions were incubated at 37 °C for 2 hours, quenched by adding 4 units of proteinase K (NEB), and incubated at 55 °C for 10 minutes. The reactions were further treated by adding 11 µL of 2x RNA loading dye and incubating at 75 °C for 10 minutes. All reaction conditions were analyzed by gel electrophoresis on 10% denaturing gels (Bio-Rad). The bands were imaged by fluorescence, and band intensities were quantified.
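Band intensities from this assay can be converted to percent deamination. This sketch assumes that USER excision of the deamination-derived uracil cleaves the labeled substrate, so the shorter band represents deaminated molecules. It also assumes that a single local background value is subtracted from both bands. The intensities are hypothetical.

```python
def percent_deamination(cleaved, uncleaved, background=0.0):
    """Percent of substrate cleaved by USER (i.e. deaminated), computed from
    background-subtracted band intensities of the cleaved and intact bands."""
    c = max(cleaved - background, 0.0)
    u = max(uncleaved - background, 0.0)
    total = c + u
    if total == 0:
        raise ValueError("no signal in either band")
    return 100.0 * c / total

# Hypothetical band intensities (arbitrary units).
print(percent_deamination(250, 750))  # 25.0
```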
Deamination of cytosine (C) is catalyzed by cytidine deaminase and produces uracil (U), which has the base-pairing properties of thymine (T). Most documented cytidine deaminases act on RNA, and the few documented examples that accept DNA require single-stranded DNA (ssDNA). The in vitro activity of the 108 CDAs was measured on 4 cytosine-containing ssDNA substrates covering all four possible 5'-NC contexts (FIGS. 37 and 38). The percent deamination for each nucleobase at the -1 position was also calculated, to assess whether the selected mutations altered the sequence selectivity of the designed variants in vitro (FIGS. 39 and 40). Notably, several variants from the MG93 and MG139 families showed more relaxed sequence selectivity (FIGS. 39 and 40). These variants were selected for downstream testing of mammalian cell activity in vivo as complete CBEs.
EXAMPLE 34 mammalian editing activity of novel and engineered CDAs as CBEs
To test the activity of the novel CDAs and of engineered variants, an engineered cell line carrying 5 consecutive PAMs compatible with MG3-6 and Cas9 was designed. This cell line allows gRNAs to be tiled across the target to test editing efficiency and to determine -1 nt selectivity.
To test novel and engineered CDAs, each CDA was cloned into the MG3-6-containing plasmid backbone at the N-terminus of MG3-6. Once the clones of the novel and variant CDAs were confirmed, they were transiently transfected into the engineered HEK293T cells using Lipofectamine 2000. A total of 32 novel CDAs and 2 engineered variants (139-52-V6 and 93-4-V16) were tested in the gRNA tiling experiments described above (SEQ ID NOS: 1322-1355). Of the 34 CDAs tested, 22 showed editing activity higher than 1% (FIG. 41A). The best-performing CDAs were MG152-6, MG139-52v6, MG93-4, MG139-52, MG139-94, MG93-7, MG93-3, MG139-12, MG139-103, MG139-95, MG139-99, MG139-90, MG139-89, MG139-93, MG138-30, MG139-102, MG93-4v16, MG152-5, MG138-20, MG138-23, MG93-5, MG152-4 and MG152-1. Editing activity was then normalized to the positive control for each experimental condition (the documented high-activity CDA A0A2K5RDN7). After normalization, 9 candidates showed at least 20% of the activity of the A0A2K5RDN7 positive control (FIG. 41B). Of these 9 candidates, 3 showed at least 50% of the activity of A0A2K5RDN7: 139-52-V6, 152-6 and 139-52 showed 95%, 65% and 60% activity, respectively. FIG. 41C shows a side-by-side comparison for 2 targeting spacers. As shown in FIG. 41C, 139-52-V6 has substantially the same editing activity as A0A2K5RDN7.
To characterize -1 nt selectivity, 16 candidates of interest were selected. Mammalian-cell -1 nt selectivity was calculated by selecting the first 4 edited cytosines for each guide RNA and calculating the proportion of each nucleotide at the -1 position. The analysis was limited to cytosines with >1% editing, and the average proportions across all 5 guides are plotted. In vitro -1 nt selectivity was determined by summing the percent cleavage (measured as percent deamination) for each -1 nucleotide and then calculating the proportion for each -1 nucleotide. Mammalian-cell and in vitro -1 nt selectivities are shown in FIG. 42. Notably, different CDA families have different -1 nt selectivities, and selectivity tends to be conserved among proteins of the same family. For example, the MG93 family is selective for T at -1, while the MG139 family is selective for C at -1. Importantly, active candidates with different -1 nt selectivities were identified: 152-6 is selective for T at the -1 position, while 139-52 (WT and engineered variant) is strongly selective for C at the -1 position. Candidates with strong -1 nt selectivity are advantageous because tighter nucleotide selectivity improves on-target specificity. Candidates with distinct and strong -1 nt selectivities allow different loci to be targeted with minimal off-target activity. Notably, candidates with unusual -1 selectivity were also identified. Candidates with purine selectivity include 139-12 and 138-20, which are selective for A and G. These properties could be used to produce variants with G and/or A selectivity at -1 and high editing efficiency.
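The in vitro -1 nt selectivity calculation described above (sum percent deamination per -1 nucleotide, then take proportions) can be written as a short function. The input values are hypothetical. The mammalian-cell calculation is not reproduced here, because the text does not specify whether its proportions are count-based or activity-weighted.

```python
def minus1_selectivity(percent_by_base):
    """Convert summed percent deamination per -1 nucleotide into proportions.
    `percent_by_base` maps each -1 base ('A', 'C', 'G', 'T') to the list of
    percent-deamination values measured with that base at -1."""
    totals = {base: sum(values) for base, values in percent_by_base.items()}
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no activity in any -1 context")
    return {base: total / grand_total for base, total in totals.items()}

# Hypothetical in vitro measurements for a T(-1)-preferring deaminase.
example = {"T": [30.0, 10.0], "C": [5.0], "A": [5.0], "G": []}
print(minus1_selectivity(example))
```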
Candidate 139-52 was shown to have deaminase activity both on ssDNA and on the DNA strand that forms the DNA/RNA heteroduplex (also shown in FIG. 43B). Activity exclusively on DNA within a DNA/RNA heteroduplex may be advantageous for reducing guide-dependent off-target activity and for a smaller editing window, so engineering this feature is of particular interest. Interestingly, the mutation introduced to generate 139-52-V6 eliminated deaminase activity on the DNA/RNA heteroduplex. This reveals the potential importance of the mutated residue for that activity.
The 139-52-V6, 152-6 and 139-52 candidates have high editing efficiency (FIGS. 41A, 41B and 41C) and different -1 nt selectivities (FIG. 42). To characterize them further, the width of each candidate's editing window relative to the R-loop (the spacer-targeted region) was analyzed. Two of the 3 candidates (152-6 and 139-52-V6) showed a tighter editing window than the high-editing positive control A0A2K5RDN7 (FIG. 44). A tighter editing window may help prevent off-target activity. The engineered candidate 139-52-V6 has a smaller editing window than its WT counterpart (FIG. 44), which reveals the importance of this mutation: it improves on-target editing efficiency (FIGS. 41A and 41B) while narrowing the editing window (FIG. 44).
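The text does not define how editing-window width was scored. One simple, hypothetical definition is the span of protospacer positions whose editing meets a chosen threshold. The sketch below uses that definition, and both the threshold and the per-position values are illustrative assumptions.

```python
def editing_window(edit_by_position, threshold):
    """Return (first, last, width) of protospacer positions whose editing
    meets `threshold` (same units as the input values), or None if no
    position qualifies."""
    hits = sorted(p for p, v in edit_by_position.items() if v >= threshold)
    if not hits:
        return None
    return hits[0], hits[-1], hits[-1] - hits[0] + 1

# Hypothetical per-position C-to-T editing (%) across a spacer.
profile = {2: 0.5, 4: 3.0, 5: 12.0, 7: 8.0, 10: 0.8}
print(editing_window(profile, threshold=1.0))  # (4, 7, 4)
```

Under this definition, a narrower window means fewer positions flanking the target cytosine are edited above the threshold, which is the property compared across candidates in FIG. 44.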
In addition, the cytotoxicity of all CDA candidates was measured by stably expressing the candidates in mammalian cells via lentiviral transduction. Each CDA candidate was cloned as a CBE (with MG3-6 as the partner), lentiviruses were produced, and cells were transduced. Cells with viral integration and CBE expression were selected with puromycin 3 days after transduction. The puromycin resistance cassette lies downstream of the CBE and is linked to it by a 2A peptide, so the surviving selected cells express the CBE. Surviving cells were stained with crystal violet. The crystal violet was then dissolved with SDS, and absorbance was read in a microplate reader. Different CDAs were found to have different levels of cytotoxicity (FIG. 45). The 139-52-V6, 152-6 and 139-52 candidates showed favorable cytotoxicity profiles under these conditions. Cytotoxicity is expected to be greatly reduced when the candidates are transiently expressed.
EXAMPLE 35 low-activity CDAs combined with a nickase with improved target binding affinity (prophetic)
Analysis of the editing windows and cytotoxicity profiles suggested that it may be advantageous to combine CDAs that have slower deamination kinetics with effector enzymes that have a longer residence time at the target. To create such a system, long forms of tracrRNA (see, e.g., Workman et al., Cell 2021, 184, 675-688, incorporated herein by reference in its entirety) are used in gRNAs with CDAs of various kinetics (low, medium, and high). These systems could increase the on-target editing efficiency of low- and medium-activity CDAs while producing a narrower editing window and a more favorable cytotoxicity profile.
EXAMPLE 36 engineering of adenine deaminase (prophetic)
To improve on-target activity on ssDNA and minimize unguided deamination of cellular RNAs, all beneficial mutations previously identified through directed evolution, rational design and the literature were used to design novel adenine deaminase (ADA) variants from the novel deaminase families (the MG129-MG137 and MG68 families, SEQ ID NOS: 1556-1638).
Table 12D: the adenosine deaminase mutant designed in example 36
In vitro Activity of novel ADA variants from the MG129-MG137 and MG68 families
In vitro deaminase in-gel assay
Linear templates of the candidate deaminases were PCR-amplified from plasmids from Twist Bioscience. The products were cleaned up with SPRI beads (Lucigen) and eluted in 10 mM Tris. The enzyme was then expressed in PURExpress (NEB) hours at 37 °C. Deamination reactions were prepared by mixing PURExpress reaction (2 µL) with 10 µM Cy5.5-labeled DNA substrate (IDT, SEQ ID NO: 1645), 1 U Endonuclease V (NEB) and 10X NEB4 buffer. The reactions were incubated at 37 °C for 20 hours. Samples were quenched by adding 4 units of proteinase K (NEB) and incubating at 55 °C for 10 minutes. The reactions were further treated by adding 11 µL of 2x RNA loading dye and incubating at 75 °C for 10 minutes. All reaction conditions were analyzed by gel electrophoresis on 10% TBE-urea denaturing gels (Bio-Rad). DNA bands were visualized with a ChemiDoc imager (Bio-Rad), and band intensities were quantified using Image Lab v6.0 (Bio-Rad). Successful deamination is indicated by the intermediate fluorescently labeled band in the gel.
In vitro deamination screening based on in vitro NGS
Linear templates of the candidate deaminases were PCR-amplified from plasmids from Twist Bioscience. The products were cleaned up with SPRI beads (Lucigen) and eluted in 10 mM Tris. The enzyme was then expressed in PURExpress (NEB) hours at 37 °C. Deamination reactions were prepared by mixing PURExpress reaction (2 µL) with 250 nM single-stranded DNA substrate (IDT, SEQ ID NO: 1646) in 1X NEB4 buffer. The reactions were incubated at 37 °C for 2 hours and quenched by incubation at 95 °C for 10 minutes. Then 90 µL of water was added at 95 °C, and the samples were placed on ice for 2 minutes. Each PCR reaction (oligonucleotides from IDT) used 1 µL of the quenched reaction as template. The PCR products were then cleaned up by column purification (Zymo), eluted in 10 mM Tris, and sequenced.
EXAMPLE 37 engineering of ABEs using the MG34-1 (D10A) nickase
Plasmid construction
Gene DNA fragments were synthesized by Twist Bioscience or Integrated DNA Technologies (IDT). Plasmid DNA was amplified in Endura electrocompetent cells (Lucigen) and isolated with the QIAprep Spin Miniprep kit (Qiagen). Vector backbones were prepared by restriction enzyme digestion of the plasmids. Inserts were amplified with Q5 High-Fidelity DNA Polymerase (New England Biolabs) using primers ordered from the tourmaline biomedical company or IDT. Both vector backbones and inserts were purified by gel extraction with a gel DNA recovery kit (Zymo Research). One or more DNA fragments were assembled into a vector by NEBuilder HiFi DNA assembly (New England Biolabs). The plasmid sequence for expressing the MG34-1 (D10A) adenine base editor and sgRNA is shown in SEQ ID NO: 1422.
Cell culture, transfection, next generation sequencing and base editing analysis
HEK293T cells were grown and passaged at 37 °C with 5% CO2 in Dulbecco's Modified Eagle Medium plus GlutaMAX (Gibco) supplemented with 10% (v/v) fetal bovine serum (Gibco). Cells (2.5 × 10^4; passages 3 to 8) were seeded in 96-well plates treated for cell adhesion (Ke Shi to company) and grown for 20 to 24 hours. The spent medium was replaced with fresh medium before transfection. For the two-plasmid system, 300 ng of expression plasmid and 100 ng of guide plasmid were transfected per well using 1 µL Lipofectamine 2000 (Thermo Fisher Scientific) according to the manufacturer's instructions. For the single-plasmid system, 1 µL Lipofectamine was used to transfect 300 ng of plasmid carrying the base editor gene and guide RNA. Transfected cells were grown for 3 days (72 hours). The viability of individual wells was then assessed visually, based on cell growth and the presence of floating cells in the medium. Cells were then harvested, and genomic DNA was extracted with QuickExtract (Lucigen) according to the manufacturer's instructions. PCR primers suitable for NGS-based DNA sequencing were generated and optimized. They were used with Q5 High-Fidelity DNA Polymerase (New England Biolabs) to amplify, from the extracted DNA, the separate base-edited target region for each guide RNA. The PCR products were purified with the HighPrep PCR Clean-up System (MagBio) according to the manufacturer's instructions. Amplicons were sequenced on an Illumina MiSeq and analyzed with proprietary Python scripts to measure gene editing.
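The proprietary analysis scripts are not described. As a hypothetical, simplified stand-in, the core A-to-G measurement can be sketched as the fraction of reads that carry G at each reference A. This sketch assumes the reads are already aligned to the amplicon without indels. Real pipelines also handle quality filtering, indels, and read merging.

```python
def a_to_g_by_position(reference, reads):
    """Percent A->G at each reference 'A' (1-based position), computed from
    reads assumed to be pre-aligned to `reference` with no indels.
    Reads whose length differs from the reference are skipped."""
    usable = [r for r in reads if len(r) == len(reference)]
    results = {}
    for i, ref_base in enumerate(reference):
        if ref_base != "A" or not usable:
            continue
        calls = [r[i] for r in usable]
        results[i + 1] = 100.0 * calls.count("G") / len(calls)
    return results

# Hypothetical toy amplicon and reads.
reference = "CAGTAC"
reads = ["CAGTAC", "CGGTAC", "CGGTGC", "CAGTAC"]
print(a_to_g_by_position(reference, reads))  # {2: 50.0, 5: 25.0}
```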
Results
MG68-4 is predicted to be a tRNA adenosine deaminase. Because the native enzymes of both E. coli TadA (EcTadA) and Staphylococcus aureus TadA (SaTadA) are dimers, MG68-4 was also suspected to be a dimer. Protein fusion using engineered EcTadA homodimers has been shown to improve editing efficiency (Gaudelli, N.M. et al., Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage, Nature 2017, 551, 464-471). Therefore, a series of MG68-4 (D109N) homodimers was designed and fused to MG34-1 (D10A). To design the linker between the two monomers, Visual Molecular Dynamics (VMD) was used to estimate the distance between the N-terminus of the first monomer and the C-terminus of the second monomer (Humphrey, W. et al., VMD: Visual Molecular Dynamics, J. Mol. Graph. 1996, 14, 33-38). The model suggested a distance of 5.2 nm (FIG. 46A). The fusion was optimized by varying the linker length from 32 to 64 amino acids, with a 5-amino-acid linker included as a negative control (SEQ ID NOS: 1356-1362). The results show that the optimal linker length is 64 amino acids, which may provide sufficient flexibility to accommodate the distance between the monomers. With this optimized linker, editing increased by 87% compared with the monomeric design of MG68-4 (D109N) fused with nMG34-1 (FIG. 46B).
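The linker-design reasoning above can be checked with simple arithmetic. This sketch assumes a contour length of about 0.38 nm per residue (the approximate Cα-Cα spacing of a fully extended chain); the text does not state this value. It also assumes the terminal coordinates come from a structure file in Angstroms. The function names and coordinates are hypothetical.

```python
import math

CA_SPACING_NM = 0.38  # assumed contour length per residue (fully extended chain)

def terminus_distance_nm(xyz_a_angstrom, xyz_b_angstrom):
    """Straight-line distance between two atom coordinates given in Angstrom."""
    return math.dist(xyz_a_angstrom, xyz_b_angstrom) / 10.0

def min_linker_residues(distance_nm, spacing_nm=CA_SPACING_NM):
    """Fewest residues whose fully extended contour length spans `distance_nm`."""
    return math.ceil(distance_nm / spacing_nm)

# Contour lengths for the negative-control and tested linker lengths.
for n in (5, 32, 64):
    print(n, "aa ->", round(n * CA_SPACING_NM, 2), "nm contour length")
print(min_linker_residues(5.2))  # 14
```

Under this assumption, the 5-residue negative-control linker (about 1.9 nm) cannot span the estimated 5.2 nm, whereas 32 to 64 residues (about 12 to 24 nm) can, with slack. The choice of 64 residues in the text was made experimentally, not from this estimate.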
Previously, MG68-4 (D109N)-nMG34-1 (D10A) was observed to produce C-to-G edits at the sixth position when guide 633 (SEQ ID NO: 1416) was used. To reduce this confounding activity on cytosine, the approach of Jeong et al. was applied, in which Q was installed at position D108 of EcTadA (Jeong, Y.K. et al., Adenine base editor engineering reduces editing of bystander cytosines, Nature Biotechnology 2021, 39, 1426-1433). Incorporating Q at the D109 position of MG68-4 reduced C-to-G editing by the ABE at the C6 position by 64% with guide 633. A-to-G editing at the A8 position with guide 634 (SEQ ID NO: 1417) remained comparable. To increase editing efficiency, two beneficial mutations (H129N and D7G/E10G) were combined with D109Q. The editing efficiency of the new mutants was reduced, suggesting that these mutations are incompatible (SEQ ID NOS: 1639-1644) (FIG. 47).
Example 38. Engineering of ABEs using the nMG3-6/3-8 (D13A) nickase
Plasmid construction
DNA fragments of the genes were synthesized by Twist Bioscience or Integrated DNA Technologies (IDT). Plasmid DNA was amplified in Endura electrocompetent cells (Lucigen) and isolated with the QIAprep Spin Miniprep kit (Qiagen). The vector backbone was prepared by restriction enzyme digestion of the plasmid. Inserts were amplified with Q5 High-Fidelity DNA Polymerase (New England Biolabs) using primers ordered from either the tourmaline biomedical company or IDT. Both the vector backbone and the inserts were purified by gel extraction using a gel DNA recovery kit (Zymo Research). One or more DNA fragments were assembled into the vector by NEBuilder HiFi DNA Assembly (New England Biolabs). The plasmid sequence for expression of the nMG3-6/3-8 adenine base editor and sgRNA is shown in SEQ ID NO: 1423.
Cell culture, transfection, next generation sequencing and base editing analysis
HEK293T cells were grown and passaged at 37 °C with 5% CO2 in Dulbecco's Modified Eagle Medium plus GlutaMAX (Gibco) supplemented with 10% (v/v) fetal bovine serum (Gibco). A total of 2.5×10^4 cells (passages 3 to 8) were seeded in 96-well cell culture plates treated for cell adhesion (Costar) and grown for 20 to 24 hours, and the spent medium was replaced with fresh medium before transfection. For the two-plasmid system, 300 ng of expression plasmid and 100 ng of guide plasmid were transfected per well using 1 µL of Lipofectamine 2000 (Thermo Fisher Scientific) according to the manufacturer's instructions. For the single-plasmid system, 300 ng of a plasmid carrying both the base editor gene and the guide RNA was transfected using 1 µL of Lipofectamine 2000. Transfected cells were grown for 3 days (72 hours), after which the viability of individual wells was assessed visually based on cell growth and the presence of floating cells in the medium. Cells were then harvested, and gDNA was extracted with QuickExtract (Lucigen) according to the manufacturer's instructions. PCR primers suitable for NGS-based DNA sequencing were generated, optimized, and used to amplify each guide RNA's target sequence separately with Q5 High-Fidelity DNA Polymerase (New England Biolabs), using the extracted DNA as template. The PCR products were purified with the HighPrep PCR Clean-up System (MagBio) according to the manufacturer's instructions. Amplicons were sequenced on an Illumina MiSeq and analyzed with proprietary Python scripts to quantify gene editing.
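The per-well amounts above scale linearly when a transfection mix is prepared for many wells at once. The sketch below is a minimal illustration. The 10% pipetting overage and the 100 ng/µL plasmid stock concentrations are hypothetical values, not values from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerWell:
    """Reagent amounts per 96-well well, as stated in the protocol above."""
    expression_ng: float     # base editor expression plasmid
    guide_ng: float          # guide plasmid (0 for the single-plasmid system)
    lipofectamine_ul: float  # Lipofectamine 2000


TWO_PLASMID = PerWell(expression_ng=300, guide_ng=100, lipofectamine_ul=1.0)
SINGLE_PLASMID = PerWell(expression_ng=300, guide_ng=0, lipofectamine_ul=1.0)


def master_mix(per_well: PerWell, n_wells: int, overage: float = 0.10,
               expression_stock_ng_per_ul: float = 100.0,
               guide_stock_ng_per_ul: float = 100.0) -> dict[str, float]:
    """Scale per-well amounts to n_wells plus an overage fraction.

    The default overage and stock concentrations are illustrative
    assumptions, not values from the source.
    """
    scale = n_wells * (1 + overage)
    expression_ng = per_well.expression_ng * scale
    guide_ng = per_well.guide_ng * scale
    return {
        "expression_ng": expression_ng,
        "expression_ul": expression_ng / expression_stock_ng_per_ul,
        "guide_ng": guide_ng,
        "guide_ul": guide_ng / guide_stock_ng_per_ul,
        "lipofectamine_ul": per_well.lipofectamine_ul * scale,
    }


if __name__ == "__main__":
    for label, layout in (("two-plasmid", TWO_PLASMID), ("single-plasmid", SINGLE_PLASMID)):
        mix = master_mix(layout, n_wells=8)
        print(label, {k: round(v, 2) for k, v in mix.items()})
```

For 8 wells with 10% overage, the two-plasmid mix needs 2,640 ng of expression plasmid, 880 ng of guide plasmid and 8.8 µL of Lipofectamine 2000.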
Results
Directed evolution of the predicted tRNA adenosine deaminase in the MG68-4 (D109N)-nMG34-1 (D10A) construct in E. coli yielded two mutants (D109N/D7G/E10G and D109N/H129N) with higher A-to-G editing efficiency in HEK293T cells than the D109N mutant. In addition, rational design based on reported EcTadA mutations (Gaudelli, N.M. et al., Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage, Nature 2017, 551, 464-471; Gaudelli, N.M. et al., Directed evolution of adenine base editors with increased activity and therapeutic application, Nature Biotechnology 2020, 38, 892-900; and Richter, M.F. et al., Phage-assisted evolution of an adenine base editor with improved Cas domain compatibility and activity, Nature Biotechnology 2020, 38, 883-891) identified five mutations (V83S, L85F, T R, D R and A155R) that were beneficial relative to the D109N mutation when fused to MG34-1 (D10A). All identified mutations were combined, and a combinatorial library was designed to interrogate the enzymatic properties of the adenosine deaminase (Table 13) (SEQ ID NOs: 1363-1409).
Table 13: Mutations installed in the combinatorial library of MG68-4. All MG68-4 variants were inserted into 3-68_div30_m_rdr1v1_b.
All variants were inserted into the 3-68_DIV30_M nickase chassis, where 3-68, DIV30 and M denote the MG3-6/3-8 nickase, the domain insertion at position 30, and the monomer, respectively. Screening of the resulting ABEs identified 27 variants that outperformed CL2 (MG68-4 (D109N)). The highest editing efficiency was observed when V83S, L85F and D109N were combined, an improvement consistent with the increased activity of the V83S/D109N (CL4) and L85F/D109N (CL5) double mutants. In addition to CL16, CL22 also exhibited high editing efficiency; in this variant, the V83S mutation of the V83S/L85F/D109N triple mutant was replaced by T112R (FIG. 48).
To increase the A-to-G base editing percentage of the 3-68_DIV30_M adenine base editor, a 3-68_DIV30_D ABE was designed in which two MG68-4 (D109N) monomers were linked by a 65-amino-acid linker and embedded within the 3-68 scaffold at the same V30 insertion site as in 3-68_DIV30_M (SEQ ID NOs: 1410-1411). When co-transfected with a plasmid expressing sgRNA68 (SEQ ID NO: 1421), this dimeric form of the 3-68 ABE increased editing at position A10 of the sgRNA68 target site within the TRAC gene from 8% (3-68_DIV30_M) to 18% (3-68_DIV30_D). The effect of two different MG68-4 variants (H129N or D7G/E10G) was also tested in 3-68_DIV30_M and 3-68_DIV30_D, both of which already contained D109N (SEQ ID NOs: 1412-1415). For 3-68_DIV30_D, the H129N or D7G/E10G mutation was installed in the second MG68-4 D109N monomer, and the first deaminase retained only D109N. The H129N and D7G/E10G variants had been identified from error-prone PCR libraries of MG68-4 fused to MG34-1 by selecting for A-to-G conversion in E. coli. Adding either the H129N or the D7G/E10G mutation to monomeric or dimeric MG68-4 D109N resulted in slightly lower editing than the equivalent monomeric or dimeric 3-68_DIV30 MG68-4 D109N ABE (FIG. 49).
Example 39. Engineering of nMG35-1 as a base editor
E. coli selection
A construct comprising the MG35-1 nickase (D59A mutation), a C-terminally fused TadA*(7.10) monomer, and a C-terminal SV40 NLS was built to test MG35-1 adenine base editor (ABE) activity (SEQ ID NOs: 1424-1426). This ABE (SEQ ID NOs: 1429-1430) was tested with its compatible sgRNA containing either a 20-nucleotide spacer sequence targeting the chloramphenicol acetyltransferase (CAT) gene or a non-targeting spacer consisting of the same 20 nucleotides in scrambled order. The CAT gene carries an H193Y mutation that inactivates it, so it cannot confer chloramphenicol resistance. The ABE, sgRNA and nonfunctional CAT gene were cloned into a pET-21 backbone carrying ampicillin resistance. For both constructs, 10 ng of plasmid was transformed into 25 µL of BL21(DE3) E. coli cells (Lucigen), and the cells were shaken in 450 µL of recovery medium for 90 min at 37 °C. Next, 70 µL of the recovery medium containing the transformed cells was plated onto plates containing chloramphenicol at 0, 2, 3, 4, and 8 µg/mL. The 0 µg/mL plate served as a transformation control. The plates also contained 100 µg/mL carbenicillin and 0.1 mM IPTG and were incubated at 37 °C for 40 hours. Colonies were sequenced by the tourmaline biomedical company.
Results
To determine whether the SMART II enzyme could be used as a base editor, an adenine base editor (ABE) was constructed by fusing a TadA*(7.10) monomer to the C-terminus of the nickase form of MG35-1 containing the D59A mutation (SEQ ID NO: 1424). A-to-G editing by this ABE was tested in a positive-selection, single-plasmid E. coli system in which the ABE must revert the Y193 mutation of the chloramphenicol acetyltransferase (CAT) gene back to H193 for E. coli cells to survive chloramphenicol selection. The plasmid carries an sgRNA with either a spacer targeting the mutant CAT gene or a scrambled non-targeting spacer. On plates containing 2, 3 and 4 µg/mL chloramphenicol, E. coli transformed with the CAT-targeting MG35-1 ABE showed enrichment of colonies, whereas no colonies grew on plates containing 8 µg/mL chloramphenicol. Sanger sequencing confirmed that 26 of 30 colonies picked from the 2, 3 and 4 µg/mL plates transformed with the targeting MG35-1 ABE contained the expected Y193H reversion. The 4 colonies without a reverted CAT sequence may carry more unedited than edited copies of the selection construct, because a single reverted CAT gene is sufficient for colony survival. No colonies were seen on 2, 3, 4 or 8 µg/mL plates of E. coli transformed with the non-targeting MG35-1 ABE. On the 0 µg/mL transformation-control plates, Sanger sequencing found that 1 of 10 colonies transformed with the targeting MG35-1 ABE contained the Y193H reversion, indicating a detectable level of editing without chloramphenicol selection. The enrichment under chloramphenicol selection of colonies carrying the reverted CAT gene (Y193H) for the targeting MG35-1 ABE demonstrates that the MG35-1 nickase can act as an ABE in E. coli cells (FIG. 50).
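The colony genotyping counts above (26 of 30 under selection, 1 of 10 without selection) are small, so a confidence interval helps convey their uncertainty. The sketch below uses the Wilson score interval, which is our choice of method and not one used in the source. It describes only the fraction of sequenced colonies carrying the reversion, not the underlying per-cell editing rate.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials and trials > 0")
    p = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    centre = (p + z2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    # Colony genotyping counts reported in this example.
    for label, k, n in (("with selection (2-4 µg/mL)", 26, 30),
                        ("no selection (0 µg/mL)", 1, 10)):
        lo, hi = wilson_interval(k, n)
        print(f"{label}: {k}/{n} = {k / n:.0%}, 95% CI {lo:.0%}-{hi:.0%}")
```

The Wilson interval is a reasonable default for small counts because, unlike the simple normal approximation, it does not collapse to zero width or extend below 0 when only one colony is positive.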
Example 40. Guide screening of nMG3-6/3-8 ABE in mouse hepatocytes
Cell culture, transfection, next generation sequencing and base editing analysis for screening
Hepa1-6 cells were grown and passaged at 37 °C with 5% CO2 in Dulbecco's Modified Eagle Medium plus 1× NEAA (Gibco) supplemented with 10% (v/v) fetal bovine serum (Gibco) and 1% penicillin-streptomycin. A total of 1×10^5 cells were nucleofected with 500 ng of IVT mRNA and 150 pmol of chemically synthesized sgRNA (IDT) using a 4D-Nucleofector (Lonza; program EH-100). Cells were grown for 3 days and harvested, and gDNA was extracted with QuickExtract (Lucigen) according to the manufacturer's instructions. The base-edited target regions were amplified from the extracted DNA template with Q5 High-Fidelity DNA Polymerase (New England Biolabs) using primers suitable for NGS-based DNA sequencing (SEQ ID NOs: 1493-1554). The PCR products were purified with the HighPrep PCR Clean-up System (MagBio) according to the manufacturer's instructions. Amplicons were sequenced on an Illumina MiSeq and analyzed with proprietary Python scripts to quantify gene editing.
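The proprietary Python scripts used to quantify editing are not disclosed. As a hedged illustration only, the sketch below shows one common way to tabulate A-to-G conversion from amplicon reads. It assumes the reads have already been aligned to the reference window (for example, the spacer) without insertions or deletions. The function names and the definition of an "edited read" are hypothetical.

```python
from collections import Counter


def a_to_g_by_position(reference: str, reads: list[str]) -> dict[int, float]:
    """Percent of reads with G at each reference A (1-based positions).

    Assumes reads are pre-aligned to `reference` without indels (same length).
    """
    counts: Counter[int] = Counter()
    usable = 0
    for read in reads:
        if len(read) != len(reference):
            continue  # skip reads that would need a gapped alignment
        usable += 1
        for pos, (ref_base, read_base) in enumerate(zip(reference, read), start=1):
            if ref_base == "A" and read_base == "G":
                counts[pos] += 1
    if usable == 0:
        return {}
    return {pos: 100.0 * counts[pos] / usable
            for pos, base in enumerate(reference, start=1) if base == "A"}


def percent_reads_edited(reference: str, reads: list[str]) -> float:
    """Percent of usable reads carrying at least one A-to-G change."""
    usable = [r for r in reads if len(r) == len(reference)]
    if not usable:
        return 0.0
    edited = sum(any(a == "A" and b == "G" for a, b in zip(reference, r)) for r in usable)
    return 100.0 * edited / len(usable)


if __name__ == "__main__":
    reference = "GACAT"                            # toy window, not a real target
    reads = ["GACAT", "GGCAT", "GGCGT", "GACGT"]   # toy reads for illustration
    print(a_to_g_by_position(reference, reads))
    print(percent_reads_edited(reference, reads))
```

Production pipelines also have to handle indels, sequencing errors and low-quality base calls, which this sketch deliberately ignores.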
mRNA production
The base editor mRNA sequence was codon-optimized for human expression (GeneArt), then synthesized and cloned into a high-copy ampicillin plasmid (Twist Bioscience). The synthetic construct encoding the T7 promoter, UTRs, base editor ORF and NLS sequences was excised from the Twist backbone with HindII and BamHI (NEB) and ligated into the pUC19 plasmid backbone (SEQ ID NO: 1555) using T4 DNA ligase and 1× reaction buffer (NEB). The complete base editor mRNA plasmid includes the origin of replication, an ampicillin resistance cassette, the synthetic construct and an encoded poly(A) tail. Base editor mRNA was synthesized by in vitro transcription (IVT) from the linearized base editor mRNA plasmid. The plasmid was linearized by incubation with SapI (NEB) for 16 hours at 37 °C in a 50 µL reaction containing 10 µg of pDNA, 50 units of SapI and 1× reaction buffer. The linearized plasmid was purified with phenol:chloroform:isoamyl alcohol (25:24:1, v/v), precipitated in EtOH, and resuspended in nuclease-free water at an adjusted concentration of 500 ng/µL. The IVT reaction to produce base editor mRNA was performed at 50 °C for 1 hour under the following conditions: 1 µg of linearized plasmid; 5 mM each of ATP, CTP, GTP (NEB) and N1-methylpseudo-UTP (TriLink); 18,750 U/mL Hi-T7 RNA polymerase (NEB); 4 mM CleanCap AG (TriLink); 2.5 U/mL E. coli inorganic pyrophosphatase (NEB); 1,000 U/mL murine RNase inhibitor (NEB); and 1× transcription buffer. After 1 hour, the IVT reaction was stopped and the plasmid DNA was digested by adding 250 U/mL DNase I (NEB) and incubating for 10 minutes at 37 °C. Base editor mRNA was purified with the RNeasy Maxi kit (Qiagen) following the manufacturer's standard protocol. Transcript concentration was determined by UV absorbance (NanoDrop), and transcripts were further analyzed by capillary gel electrophoresis on a Fragment Analyzer (Agilent).
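Setting up the IVT reaction means converting the final concentrations listed above into pipetting volumes with C1·V1 = C2·V2. In the sketch below, the final concentrations and the 500 ng/µL template come from the protocol. Every stock concentration is a hypothetical placeholder and should be replaced with the value printed on the actual reagent vial.

```python
# Final concentrations are taken from the IVT conditions described above.
# Stock concentrations are PLACEHOLDERS, not vendor specifications.
FINAL_AND_STOCK = {
    # component: (final concentration, stock concentration) in matching units
    "ATP (mM)": (5.0, 100.0),
    "CTP (mM)": (5.0, 100.0),
    "GTP (mM)": (5.0, 100.0),
    "N1-methylpseudo-UTP (mM)": (5.0, 100.0),
    "CleanCap AG (mM)": (4.0, 100.0),
    "Hi-T7 RNA polymerase (U/mL)": (18750.0, 50000.0),
    "inorganic pyrophosphatase (U/mL)": (2.5, 100.0),
    "murine RNase inhibitor (U/mL)": (1000.0, 40000.0),
    "transcription buffer (x)": (1.0, 10.0),
}


def ivt_setup(total_ul: float, template_ng: float = 1000.0,
              template_stock_ng_per_ul: float = 500.0) -> dict[str, float]:
    """Volumes (µL) for one IVT reaction, from C1*V1 = C2*V2."""
    volumes = {name: final / stock * total_ul
               for name, (final, stock) in FINAL_AND_STOCK.items()}
    # 1 µg of linearized plasmid resuspended at 500 ng/µL
    volumes["linearized template"] = template_ng / template_stock_ng_per_ul
    water = total_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks too dilute for this reaction volume")
    volumes["nuclease-free water"] = water
    return volumes


if __name__ == "__main__":
    for name, vol in ivt_setup(20.0).items():
        print(f"{name:36s} {vol:6.2f} µL")
```

Raising an error when the water volume goes negative flags combinations of stock concentrations and reaction volume that cannot reach the stated final concentrations.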
Results
To test the activity of the engineered dimeric form of the 3-68 ABE described above, 527 chemically synthesized MG3-6/3-8 guides targeting four therapeutically relevant loci in the mouse genome were designed and purchased from IDT. These guides were co-transfected with in vitro synthesized mRNA into Hepa1-6 cells (an immortalized mouse hepatocyte cell line) by nucleofection, and A-to-G conversion was measured three days after nucleofection. The guides were ranked by total deamination percentage within the spacer region, and more in-depth analysis of active guides was limited to guides with >80% deamination within the spacer and a large number of NGS reads. In summary, more than 10% total A-to-G deamination within the spacer was observed for 31 different guides across the three loci (SEQ ID NOs: 1431-1492; FIGS. 51-53), two of which showed 89% and 95% conversion (Apoa1 D11 and Apoa1 F12, respectively).
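The ranking and filtering step described above can be expressed compactly. The sketch below uses the source's >80% deamination cutoff. The source does not specify a read-depth threshold, so `min_reads` is a hypothetical parameter, and the example guides and values are toy placeholders, not data from the screen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuideResult:
    name: str
    spacer_deamination_pct: float  # total A-to-G deamination within the spacer
    ngs_reads: int


def rank_guides(results: list[GuideResult]) -> list[GuideResult]:
    """Order guides from highest to lowest spacer deamination."""
    return sorted(results, key=lambda g: g.spacer_deamination_pct, reverse=True)


def select_active(results: list[GuideResult], min_pct: float,
                  min_reads: int) -> list[GuideResult]:
    """Keep ranked guides above a deamination cutoff that also have enough reads."""
    return [g for g in rank_guides(results)
            if g.spacer_deamination_pct > min_pct and g.ngs_reads >= min_reads]


if __name__ == "__main__":
    # Toy placeholder values for illustration only.
    toy = [GuideResult("toy_1", 12.0, 5000), GuideResult("toy_2", 90.0, 200),
           GuideResult("toy_3", 85.0, 8000), GuideResult("toy_4", 4.0, 9000)]
    print([g.name for g in rank_guides(toy)])
    print([g.name for g in select_active(toy, min_pct=80.0, min_reads=1000)])
```

Requiring adequate read depth keeps a guide with a high percentage computed from very few reads (toy_2 here) from being treated as a top performer.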
Table 13A: Guide sequences used in Example 40
r = natural ribonucleotide, m = 2'-O-methyl-modified nucleotide, f = 2'-fluoro-modified nucleotide, * = phosphorothioate linkage
Although the pattern of base conversion varies across the spacer, detectable conversion was observed from A4 to A15. To assess the background in these genomic regions, the NGS primer pairs used for the experimental samples were also applied to mock-nucleofected samples, which showed background conversion ranging from undetectable to 0.12% (FIG. 54). In summary, the engineered dimeric 3-68 ABE exhibits high editing activity at three independent loci in mammalian cells and across a large number of guides.
Example 41. mRNA cytidine base editor
To test the activity of the engineered cytidine deaminase at scale, 527 synthetic guides compatible with MG3-6/3-8 and targeting four therapeutically relevant loci in the mouse genome were designed and purchased from IDT. These guides were co-transfected with in vitro synthesized mRNA into Hepa1-6 cells (an immortalized mouse hepatocyte cell line) by nucleofection, and C-to-T conversion was measured three days after nucleofection. Before harvest, the viability of individual wells was assessed visually based on cell growth and the presence of floating cells in the medium. Compared with mock samples, the 3-68 152-6 CBE showed no significant cytotoxicity.
Cell culture, transfection, next generation sequencing and base editing analysis for screening (prophetic)
Hepa1-6 cells were grown and passaged at 37 °C with 5% CO2 in Dulbecco's Modified Eagle Medium plus 1× NEAA (Gibco) supplemented with 10% (v/v) fetal bovine serum (Gibco) and 1% penicillin-streptomycin. A total of 1×10^5 cells were nucleofected with 500 ng of IVT mRNA and 150 pmol of chemically synthesized sgRNA (IDT) using a 4D-Nucleofector (Lonza; program EH-100). Cells were grown for 3 days, visually assessed for viability, and harvested, and gDNA was extracted with QuickExtract (Lucigen) according to the manufacturer's instructions. The base-edited target regions were amplified from the extracted DNA template with Q5 High-Fidelity DNA Polymerase (New England Biolabs) using primers suitable for NGS-based DNA sequencing. The PCR products were purified with the HighPrep PCR Clean-up System (MagBio) according to the manufacturer's instructions. Amplicons were sequenced on an Illumina MiSeq and analyzed with proprietary Python scripts to quantify gene editing.
Example 42. Base editing preference of nMG35-1 ABE
E. coli was transformed with a plasmid containing nMG35-1 ABE, a nonfunctional chloramphenicol acetyltransferase gene (CAT Y193), and an sgRNA that either targets the CAT gene (targeting spacer) or does not (scrambled spacer), as described in Example 39. Cell growth depended on the ABE editing the nonfunctional CAT gene (at the A located at position 17 relative to the TAM; FIG. 55A) back to its wild-type variant (H193), thereby restoring CAT activity. Multiple linkers were evaluated for fusing nMG35-1 to the TadA deaminase monomer (Table 14).
Table 14: Linkers evaluated for fusing nMG35-1 to the TadA deaminase.
Results
Base editing was tested in an E. coli positive selection assay targeting the chloramphenicol acetyltransferase (CAT) gene, which was expressed from the same plasmid that co-expresses MG35-1 ABE with each of the various linkers. In these experiments, the nMG35-1 ABE construct with the 17-amino-acid linker (XTEN) outperformed the other linkers (FIGS. 55B-55E). In addition, analysis of the adenine positions across the targeting spacer showed that nMG35-1 ABE editing in E. coli was highest at position 9, in the middle of the spacer (FIG. 55D).
Example 43. nMG35-1 ABE editing of additional target sites in E. coli
E.coli positive selection
A single-plasmid construct comprising the MG35-1 nickase (D59A mutation), a C-terminally fused TadA*(7.10) monomer and a C-terminal SV40 NLS (SEQ ID NO: 369) was tested as a base editor with a compatible sgRNA containing a 20 bp spacer sequence targeting the chloramphenicol acetyltransferase (CAT) gene, as described in Example 39. A non-targeting sgRNA lacking a spacer sequence served as a negative control. The CAT gene contains either an engineered stop codon (at amino acid position 98 or 122) or an H193Y mutation, each of which renders the gene nonfunctional (FIGS. 56A and 56B). The ABE construct, sgRNA and nonfunctional CAT gene were cloned into the ampicillin-resistant pET-21 backbone. Ten ng of plasmid was transformed into 25 µL of BL21(DE3) E. coli cells (Lucigen), and the cells were incubated in 450 µL of recovery medium for 90 min at 37 °C. Next, 70 µL of the recovery medium containing the transformed cells was plated onto plates containing chloramphenicol at 0, 2, 3, 4, and 8 µg/mL. The 0 µg/mL plate served as a transformation control. The plates also contained 100 µg/mL carbenicillin and 0.1 mM IPTG and were incubated at 37 °C for 40 hours. CAT mutations in the resulting colonies were verified by Sanger sequencing (the company of Lelin Biomedicine).
Results
A-to-G editing by nMG35-1 ABE was tested in a positive-selection, single-plasmid E. coli system in which the ABE must revert a stop codon mutation in the chloramphenicol acetyltransferase (CAT) gene back to glutamine, or a tyrosine mutation back to histidine (FIGS. 56A and 56B), for E. coli to survive and grow under chloramphenicol selection. Four different nonfunctional CAT gene reversions were tested with nMG35-1 ABE: three single mutations (stop codon at residue 98 reverted to Q; stop codon at residue 122 reverted to Q; Y at residue 193 reverted to H) and a double mutation in which the CAT gene contained stop codons at both residues 98 and 122, both of which must be reverted to Q to restore CAT function. Each condition was tested alongside a paired negative control in which the nonfunctional CAT gene was co-expressed with an sgRNA lacking the spacer sequence. nMG35-1 ABE successfully edited all four conditions, including the double-mutant reversion, as shown by enrichment of E. coli colonies grown on plates containing 2 and 4 µg/mL chloramphenicol (FIG. 56C, "targeting" row). A small number of colonies also grew on plates containing 8 µg/mL chloramphenicol for reversion of the single stop codon mutations at residues 98 and 122 (FIG. 56C, "targeting" row). Sanger sequencing of colonies grown on 2 µg/mL plates from the CAT double-mutant reversion showed that 17 of 18 colonies carried the expected A-to-G edits at both target sites (FIG. 56D). No colonies were seen on 2, 4 or 8 µg/mL plates of E. coli transformed with the non-targeting guide (FIG. 56C, "no spacer" row), confirming that nMG35-1 ABE is a functional base editor in E. coli.
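The source does not give the codons used at CAT residues 98, 122 and 193. Assuming standard codons, the sketch below shows why a single adenine edit can perform each reversion. A T-to-C change at the first codon position on the coding strand turns TAT/TAC (Y) into CAT/CAC (H) and TAG/TAA (stop) into CAG/CAA (Q). On the complementary strand, that same change is an A-to-G conversion, which is the type of edit an ABE makes. The sketch also shows that a TGA stop would become CGA (R) rather than Q under the same edit.

```python
# Subset of the standard genetic code covering the codons discussed here.
CODON_TABLE = {
    "TAT": "Y", "TAC": "Y",
    "CAT": "H", "CAC": "H",
    "TAA": "*", "TAG": "*", "TGA": "*",
    "CAA": "Q", "CAG": "Q",
    "CGA": "R",
}

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_base_t_to_c(codon: str) -> str:
    """Apply a T-to-C change at codon position 1 on the coding strand."""
    if codon[0] != "T":
        raise ValueError(f"{codon} does not start with T")
    return "C" + codon[1:]


def template_strand_change(coding_before: str, coding_after: str) -> tuple[str, str]:
    """Base on the complementary strand at codon position 1, before and after."""
    return COMPLEMENT[coding_before[0]], COMPLEMENT[coding_after[0]]


for codon in ("TAC", "TAT", "TAG", "TAA", "TGA"):
    edited = first_base_t_to_c(codon)
    before, after = template_strand_change(codon, edited)
    print(f"{codon} ({CODON_TABLE[codon]}) -> {edited} ({CODON_TABLE[edited]}); "
          f"complementary strand {before} -> {after}")
```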
When the predicted 3D structure of MG35-1 was aligned with the cryo-EM structure of IscB nuclease (PDB: 7UTN), the PLMP domain of IscB aligned with amino acid (AA) positions 1-53 of MG35-1. A nickase nMG35-1 ABE with AA 1-53 deleted was tested in a bacterial positive selection assay in which the ABE must revert the Y193 mutation in the CAT gene to H to restore CAT function (FIG. 57). When these residues were truncated from nMG35-1 ABE, E. coli could not survive chloramphenicol selection at the minimum inhibitory concentration of 2 µg/mL. These results indicate that AA 1-53 of MG35-1 are required for efficient base editing by MG35-1 ABE in E. coli cells.
Example 44. Base editing in human cells with nMG35-1 ABE (prophetic)
To demonstrate that the nMG35-1 ABE system is capable of base editing in human cells, a fusion of the MG35-1 nickase (D59A mutation), a C-terminally fused TadA (8.8m) deaminase monomer and a C-terminal SV40 NLS was constructed. HEK293T cells were grown and passaged at 37 °C with 5% CO2 in Dulbecco's Modified Eagle Medium plus GlutaMAX (Gibco) supplemented with 10% (v/v) fetal bovine serum (Gibco). Approximately 2.5×10^4 cells were seeded in 96-well cell culture plates treated for cell adhesion (Costar) and grown for 20 to 24 hours, and the spent medium was replaced with fresh medium before transfection. Each well was transfected with 300 ng of expression plasmid using 1 µL of Lipofectamine 2000 (Thermo Fisher Scientific) according to the manufacturer's instructions. Transfected cells were grown for three days and harvested, and genomic DNA was extracted with QuickExtract (Lucigen) according to the manufacturer's instructions. The target region for base editing was amplified with target-specific primers using Q5 High-Fidelity DNA Polymerase (New England Biolabs), and the PCR products were purified with the HighPrep PCR Clean-up System (MagBio) according to the manufacturer's instructions. To analyze nMG35-1 ABE base editing in human cells, next-generation sequencing (NGS) adapters were attached to the PCR products in a subsequent PCR using the KAPA HiFi HotStart ReadyMix PCR kit (Roche) and primers compatible with the TruSeq DNA library preparation kit (Illumina). The DNA concentration of the resulting products was quantified by TapeStation (Agilent), and the samples were pooled to prepare a library for NGS analysis. The resulting library was quantified by qPCR on an Aria real-time PCR system (Agilent), and high-throughput sequencing was performed on an Illumina MiSeq instrument according to the manufacturer's instructions.
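Pooling samples from TapeStation readings generally requires converting mass concentration (ng/µL) to molarity so that each amplicon contributes equal molar amounts. The sketch below uses the common approximation of 660 g/mol per base pair of dsDNA. The target of femtomoles per sample and the example readings are hypothetical values chosen for illustration.

```python
AVG_G_PER_MOL_PER_BP = 660.0  # common approximation for one dsDNA base pair


def ng_per_ul_to_nm(conc_ng_per_ul: float, length_bp: int) -> float:
    """Convert a dsDNA concentration in ng/µL to nM (equivalently fmol/µL)."""
    return conc_ng_per_ul * 1e6 / (AVG_G_PER_MOL_PER_BP * length_bp)


def equimolar_volumes(samples: dict[str, tuple[float, int]],
                      fmol_each: float) -> dict[str, float]:
    """Volume (µL) of each sample needed to contribute fmol_each to the pool.

    samples maps a sample name to (concentration in ng/µL, product length in bp).
    """
    return {name: fmol_each / ng_per_ul_to_nm(conc, length)
            for name, (conc, length) in samples.items()}


if __name__ == "__main__":
    # Hypothetical TapeStation readings, for illustration only.
    readings = {"sample_1": (33.0, 500), "sample_2": (6.6, 500)}
    print(equimolar_volumes(readings, fmol_each=200.0))
```

Because 1 nM equals 1 fmol/µL, dividing the target femtomoles by the molarity gives the volume directly. The 660 g/mol figure is an average that ignores sequence composition, which is one reason the pooled library is still checked by qPCR before sequencing.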
While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. The invention is not intended to be limited by the specific examples provided within the specification. While the invention has been described with reference to the foregoing specification, the descriptions and illustrations of the embodiments herein are not meant to be construed in a limiting sense. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. Furthermore, it shall be understood that all aspects of the invention are not limited to the specific depictions, configurations, or relative proportions set forth herein, which depend upon a variety of conditions and variables. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is therefore contemplated that the invention shall also cover any such alternatives, modifications, variations, or equivalents. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.
Embodiments
The following embodiments are not intended to be limiting in any sense.
Embodiment 1. An engineered nucleic acid editing system comprising:
(a) An endonuclease comprising a RuvC domain and an HNH domain, wherein the endonuclease is derived from an uncultured microorganism, wherein the endonuclease is a class 2 type II endonuclease, and wherein the endonuclease is configured to lack nuclease activity;
(b) A base editor coupled to the endonuclease; and
(c) An engineered guide ribonucleic acid structure configured to form a complex with the endonuclease, the engineered guide ribonucleic acid structure comprising:
i. a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and
ii. a ribonucleic acid sequence configured to bind to the endonuclease.
Embodiment 2. The engineered nucleic acid editing system of embodiment 1, wherein the RuvC domain lacks nuclease activity.
Embodiment 3. The engineered nucleic acid editing system of embodiment 1, wherein the endonuclease is configured to cleave one strand of a double stranded target deoxyribonucleic acid.
Embodiment 4. The engineered nucleic acid editing system of embodiment 1 or embodiment 2, wherein the class 2 type II endonuclease comprises a nicking enzyme mutation.
Embodiment 5. The engineered nucleic acid editing system of any of embodiments 1 to 4, wherein the endonuclease comprises a sequence having at least 95% sequence identity to any of SEQ ID NOs 70-78 or 597 or variants thereof.
Embodiment 6. The engineered nucleic acid editing system of any of embodiments 1 to 5, wherein when optimally aligned, the class 2 type II endonuclease comprises a mutation from aspartic acid to alanine at: residue 9 relative to SEQ ID NO. 70, residue 13 relative to SEQ ID NO. 71, 72 or 74, residue 12 relative to SEQ ID NO. 73, residue 17 relative to SEQ ID NO. 75, residue 23 relative to SEQ ID NO. 76 or residue 10 relative to SEQ ID NO. 597.
Embodiment 7. The engineered nuclease system of any one of embodiments 1-5, wherein when optimally aligned, the endonuclease comprises a mutation from aspartic acid to alanine at: residue 9 relative to SEQ ID NO. 70, residue 13 relative to SEQ ID NO. 72 or residue 17 relative to SEQ ID NO. 75.
Embodiment 8. An engineered nucleic acid editing system comprising:
(a) An endonuclease having at least 95% sequence identity to any one of SEQ ID NOs 70-78, 596 or 597-598 or variants thereof;
(b) A base editor coupled to the endonuclease; and
(c) An engineered guide ribonucleic acid structure configured to form a complex with the endonuclease, the engineered guide ribonucleic acid structure comprising:
i. a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and
ii. a ribonucleic acid sequence configured to bind to the endonuclease.
Embodiment 9. An engineered nucleic acid editing system comprising:
(a) An endonuclease configured to bind to a protospacer adjacent motif (PAM) sequence comprising any one of SEQ ID NOs: 360-368 or 598 or variants thereof,
wherein the endonuclease is a class 2 type II endonuclease, and
wherein the endonuclease is configured to lack nuclease activity;
(b) A base editor coupled to the endonuclease; and
(c) An engineered guide ribonucleic acid structure configured to form a complex with the endonuclease, the engineered guide ribonucleic acid structure comprising:
i. a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and
ii. a ribonucleic acid sequence configured to bind to the endonuclease.
Embodiment 10. The engineered nucleic acid editing system of embodiment 9, wherein the endonuclease comprises a nicking enzyme mutation.
Embodiment 11. The engineered nucleic acid editing system of embodiment 9, wherein the endonuclease is configured to cleave one strand of a double-stranded target deoxyribonucleic acid.
Embodiment 12. The engineered nucleic acid editing system of embodiment 9, wherein when optimally aligned, the class 2 type II endonuclease comprises a mutation from aspartic acid to alanine at: residue 9 relative to SEQ ID NO. 70, residue 13 relative to SEQ ID NO. 71, 72 or 74, residue 12 relative to SEQ ID NO. 73, residue 17 relative to SEQ ID NO. 75, residue 23 relative to SEQ ID NO. 76 or residue 10 relative to SEQ ID NO. 597.
Embodiment 13. The engineered nucleic acid editing system of embodiment 9, wherein the base editor comprises a sequence having at least 70%, 80%, 90% or 95% identity to any one of SEQ ID NOs 1-51, 57-66, 385-443, 444-475, 594-595, or 599-675, or variants thereof.
Embodiment 14. The engineered nucleic acid editing system of embodiment 9, wherein the base editor comprises a sequence having at least 70%, 80%, 90% or 95% identity to any one of SEQ ID NOs 50-51 or 385-390.
Embodiment 15. The engineered nucleic acid editing system of any of embodiments 8 to 14, wherein the endonuclease comprises a RuvC domain lacking nuclease activity.
Embodiment 16. The engineered nucleic acid editing system of any of embodiments 8 to 15, wherein the endonuclease is derived from an uncultured microorganism.
Embodiment 17. The engineered nucleic acid editing system of any of embodiments 8 to 16, wherein the endonuclease has less than 80% identity to a Cas9 endonuclease.
Embodiment 18. The engineered nucleic acid editing system of any of embodiments 8 to 17, wherein the endonuclease further comprises a HNH domain.
Embodiment 19. The engineered nucleic acid editing system of any of embodiments 1 to 18, wherein the engineered guide ribonucleic acid sequence comprises a sequence having at least 80% sequence identity to a non-degenerate nucleotide of any one of SEQ ID NOs 88-96, 488-489, or 679-680, or variants thereof.
Embodiment 20. An engineered nucleic acid editing system comprising:
(a) An engineered guide ribonucleic acid structure, the engineered guide ribonucleic acid structure comprising:
(i) A guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and
(ii) A ribonucleic acid sequence configured to bind to an endonuclease,
wherein the engineered ribonucleic acid sequence comprises a sequence having at least 80% sequence identity to a non-degenerate nucleotide of any one of SEQ ID NOs 88-96, 488-489 or 679-680, or a variant thereof; and
(b) A class 2 type II endonuclease, the class 2 type II endonuclease configured to bind to the engineered guide ribonucleic acid; and
(c) A base editor coupled to the endonuclease.
Embodiment 21. The engineered nucleic acid editing system of embodiment 20, wherein the endonuclease is configured to bind to a Protospacer Adjacent Motif (PAM) sequence selected from the group consisting of SEQ ID NOs 360-368 or 598.
Embodiment 22. The engineered nucleic acid editing system of any of embodiments 1 to 21, wherein the base editor comprises a sequence having at least 70%, 80%, 90% or 95% identity to any one of SEQ ID NOs 1-51, 57-66, 385-443, 444-475, 594-595 or 599-675 or variants thereof.
Embodiment 23. The engineered nucleic acid editing system of any of embodiments 1 to 22, wherein the base editor comprises a sequence having at least 70%, 80%, 90% or 95% identity to any of SEQ ID NOs 50-51 or 385-390.
Embodiment 24. The engineered nucleic acid editing system of any of embodiments 1 to 22, wherein the base editor is an adenine deaminase.
Embodiment 25. The engineered nucleic acid editing system of embodiment 23, wherein the adenosine deaminase comprises a sequence with at least 70%, 80%, 90% or 95% identity to any one of SEQ ID NOs 50-51, 57, 385-443, 448-475, or 595, or variants thereof.
Embodiment 26. The engineered nucleic acid editing system of any of embodiments 1 to 22, wherein the base editor is a cytosine deaminase.
Embodiment 27. The engineered nucleic acid editing system of embodiment 26, wherein the cytosine deaminase comprises a sequence with at least 70%, 80%, 90%, or 95% identity to any one of SEQ ID NOs 1-49, 444-447, 594, 58-66, or 599-675, or variants thereof.
Embodiment 28. The engineered nucleic acid editing system of any of embodiments 1 to 27, comprising a uracil DNA glycosylase inhibitor (UGI) coupled to the endonuclease or the base editor.
Embodiment 29. The engineered nucleic acid editing system of embodiment 28, wherein the uracil DNA glycosylase inhibitor (UGI) comprises a sequence having at least 70%, 80%, 90% or 95% identity to any one of SEQ ID NOs 52-56 or 67.
Embodiment 30. The engineered nucleic acid editing system of any of embodiments 1 to 29, wherein the engineered guide ribonucleic acid structure comprises at least two ribonucleic acid polynucleotides.
Embodiment 31. The engineered nucleic acid editing system of any of embodiments 1 to 29, wherein the engineered guide ribonucleic acid structure comprises one ribonucleic acid polynucleotide comprising both the guide ribonucleic acid sequence and the ribonucleic acid sequence configured to bind to the endonuclease.
Embodiment 32. The engineered nucleic acid editing system of any of embodiments 1 to 31, wherein the guide ribonucleic acid sequence is complementary to a prokaryotic, bacterial, archaebacterial, eukaryotic, fungal, plant, mammalian, or human genomic sequence.
Embodiment 33. The engineered nucleic acid editing system of any of embodiments 1 to 32, wherein the guide ribonucleic acid sequence is 15-24 nucleotides in length.
Embodiment 34. The engineered nucleic acid editing system of any of embodiments 1 to 33, further comprising one or more Nuclear Localization Sequences (NLS) proximal to the N-terminus or C-terminus of the endonuclease.
Embodiment 35. The engineered nucleic acid editing system of embodiment 34, wherein the NLS comprises a sequence with at least 90% identity to a sequence selected from SEQ ID NOs 369-384 or variants thereof.
Embodiment 36. The engineered nucleic acid editing system of any of embodiments 1 to 35, wherein the endonuclease is covalently coupled to the base editor directly or through a linker.
Embodiment 37. The engineered nucleic acid editing system of embodiment 36, wherein a single polypeptide comprises the endonuclease and the base editor.
Embodiment 38. The engineered nucleic acid editing system of any of embodiments 1 to 37, wherein the endonuclease is configured to cleave one strand of a double stranded target deoxyribonucleic acid.
Embodiment 39. The engineered nucleic acid editing system of any of embodiments 1 to 38, wherein the system further comprises a source of Mg2+.
Embodiment 40. The engineered nucleic acid editing system of any of embodiments 1 to 39, wherein:
a) the endonuclease comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to any one of SEQ ID NOs 70, 71, 73, 74, 76, 78, 77, or 78, or a variant thereof;
b) the guide RNA structure comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to the non-degenerate nucleotides of any one of SEQ ID NOs 88, 89, 91, 92, 94, 96, 95, or 488;
c) the endonuclease is configured to bind a PAM comprising any one of SEQ ID NOs 360, 361, 363, 365, 367, or 368; or
d) the base editor comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO 58 or 595, or a variant thereof.
Embodiment 41. The engineered nucleic acid editing system of any of embodiments 1 to 39, wherein:
a) the endonuclease comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to any one of SEQ ID NOs 70, 71, or 78, or a variant thereof;
b) the guide RNA structure comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to the non-degenerate nucleotides of any one of SEQ ID NOs 88, 89, or 96;
c) the endonuclease is configured to bind a PAM comprising any one of SEQ ID NOs 360, 362, or 368; or
d) the base editor comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO 594, or a variant thereof.
Embodiment 42. The engineered nucleic acid editing system of any of embodiments 1 to 41, wherein the sequence identity is determined by a BLASTP, CLUSTALW, MUSCLE, MAFFT, or Smith-Waterman homology search algorithm.
Embodiment 43. The engineered nucleic acid editing system of embodiment 42, wherein the sequence identity is determined using the BLASTP homology search algorithm with a word length (W) of 3, an expectation value (E) of 10, the BLOSUM62 scoring matrix with gap costs of 11 for existence and 1 for extension, and conditional compositional score matrix adjustment.
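As a non-limiting illustration, the BLASTP settings of Embodiment 43 can be expressed as a BLAST+ `blastp` argument list, alongside one common way of computing percent identity from an alignment (identical columns divided by alignment length, with gap columns counted as non-identical). This is a sketch under stated assumptions: the helper names and file names are hypothetical, the `-outfmt` choice is arbitrary, and mapping "conditional compositional score matrix adjustment" to `-comp_based_stats 2` reflects the BLAST+ option semantics rather than anything stated in this disclosure.

```python
def blastp_args(query_fasta: str, subject_fasta: str) -> list[str]:
    """Hypothetical helper: assemble (not run) a BLAST+ blastp command line
    using the parameters recited in Embodiment 43."""
    return [
        "blastp",
        "-query", query_fasta,
        "-subject", subject_fasta,
        "-word_size", "3",          # word length W = 3
        "-evalue", "10",            # expectation value E = 10
        "-matrix", "BLOSUM62",      # scoring matrix
        "-gapopen", "11",           # gap existence cost
        "-gapextend", "1",          # gap extension cost
        "-comp_based_stats", "2",   # assumed: conditional compositional score matrix adjustment
        "-outfmt", "6 qseqid sseqid pident length",
    ]


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pairwise alignment.

    Identical, non-gap columns count as matches; every column (including
    gap columns) counts toward the alignment length."""
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)
```

The command is only assembled, not executed, so the sketch runs without BLAST+ installed; the resulting list could be passed to `subprocess.run` on a system where BLAST+ is available.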
Embodiment 44. The engineered nucleic acid editing system of any of embodiments 1 to 43, wherein the endonuclease is catalytically dead.
Embodiment 45. A nucleic acid comprising an engineered nucleic acid sequence optimized for expression in an organism, wherein the nucleic acid encodes a class 2, type II endonuclease coupled to a base editor, and wherein the endonuclease is derived from an uncultured microorganism.
Embodiment 46. A nucleic acid comprising an engineered nucleic acid sequence optimized for expression in an organism, wherein the nucleic acid encodes an endonuclease coupled to a base editor, the endonuclease having at least 70% sequence identity to any one of SEQ ID NOs 70-78.
Embodiment 47. The nucleic acid of any one of embodiments 44 to 46, wherein the nucleic acid comprises a sequence encoding one or more Nuclear Localization Sequences (NLS) proximal to the N-terminus or C-terminus of the endonuclease.
Embodiment 48. The nucleic acid of embodiment 47, wherein the NLS comprises a sequence having at least 90% identity to a sequence selected from SEQ ID NOs 369-384, or a variant thereof.
Embodiment 49. The nucleic acid of any one of embodiments 44 to 48, wherein the organism is a prokaryote, a bacterium, a eukaryote, a fungus, a plant, a mammal, a rodent, or a human.
Embodiment 50. A vector comprising a nucleic acid sequence encoding a class 2, type II endonuclease coupled to a base editor, wherein the endonuclease is derived from an uncultured microorganism.
Embodiment 51. A vector comprising a nucleic acid according to any one of embodiments 44 to 49.
Embodiment 52. The vector of any one of embodiments 50 to 51, further comprising a nucleic acid encoding an engineered guide ribonucleic acid structure configured to form a complex with the endonuclease, the engineered guide ribonucleic acid structure comprising:
a) a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and
b) a ribonucleic acid sequence configured to bind to the endonuclease.
Embodiment 53. The vector of any one of embodiments 50 to 52, wherein the vector is a plasmid, a minicircle, a CELiD, an adeno-associated virus (AAV)-derived virion, or a lentivirus.
Embodiment 54. A cell comprising the vector of any one of embodiments 50-53.
Embodiment 55. A method of making an endonuclease, the method comprising culturing the cell of embodiment 54.
Embodiment 56. A method for modifying a double-stranded deoxyribonucleic acid polynucleotide, the method comprising contacting the double-stranded deoxyribonucleic acid polynucleotide with a complex comprising:
a) an endonuclease comprising a RuvC domain and an HNH domain, wherein the endonuclease is derived from an uncultured microorganism, wherein the endonuclease is a class 2, type II endonuclease, and wherein the RuvC domain lacks nuclease activity;
b) a base editor coupled to the endonuclease; and
c) an engineered guide ribonucleic acid structure configured to bind to the endonuclease and the double-stranded deoxyribonucleic acid polynucleotide;
wherein the double-stranded deoxyribonucleic acid polynucleotide comprises a Protospacer Adjacent Motif (PAM).
Embodiment 57. The method of embodiment 56, wherein the endonuclease comprising a RuvC domain and an HNH domain is covalently coupled to the base editor directly or through a linker.
Embodiment 58. The method of embodiment 56 or embodiment 57, wherein the endonuclease comprising a RuvC domain and an HNH domain comprises a sequence having at least 95% sequence identity to any one of SEQ ID NOs 70-78 or 597, or a variant thereof.
Embodiment 59. The method of any one of embodiments 56-57, wherein when optimally aligned, the endonuclease comprises a mutation from aspartic acid to alanine at: residue 9 relative to SEQ ID NO. 70, residue 13 relative to SEQ ID NO. 71, 72 or 74, residue 12 relative to SEQ ID NO. 73 or 78, residue 17 relative to SEQ ID NO. 75, residue 23 relative to SEQ ID NO. 76, residue 8 relative to SEQ ID NO. 77 or residue 10 relative to SEQ ID NO. 597.
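The aspartic acid-to-alanine positions of Embodiment 59 can be collected into a lookup table keyed by reference SEQ ID NO, as in the following Python sketch. The function names are hypothetical, residues are numbered from 1, and the aspartic acid check assumes the input is the reference sequence itself (or a variant already optimally aligned to it); the actual sequences of SEQ ID NOs 70-78 and 597 are not reproduced here.

```python
# D->A residue positions listed in Embodiment 59, keyed by the SEQ ID NO
# of the reference endonuclease (1-based residue numbering).
NICKASE_D_TO_A = {
    70: 9,
    71: 13, 72: 13, 74: 13,
    73: 12, 78: 12,
    75: 17,
    76: 23,
    77: 8,
    597: 10,
}


def nickase_mutation_label(seq_id: int) -> str:
    """Return the substitution in standard notation, e.g. 'D9A' for SEQ ID NO 70."""
    return f"D{NICKASE_D_TO_A[seq_id]}A"


def apply_d_to_a(protein: str, seq_id: int) -> str:
    """Replace the listed aspartic acid with alanine in a reference sequence.

    Raises ValueError if the residue at that position is not D, which would
    indicate the wrong reference or a sequence that was not aligned first."""
    pos = NICKASE_D_TO_A[seq_id]
    idx = pos - 1
    if idx >= len(protein) or protein[idx] != "D":
        raise ValueError(f"expected D at residue {pos} for SEQ ID NO {seq_id}")
    return protein[:idx] + "A" + protein[idx + 1:]
```

For example, `nickase_mutation_label(75)` returns `"D17A"`.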
Embodiment 60. The method of any one of embodiments 56 to 57, wherein when optimally aligned, the endonuclease comprises a mutation from aspartic acid to alanine at: residue 9 relative to SEQ ID NO. 70, residue 13 relative to SEQ ID NO. 72 or residue 17 relative to SEQ ID NO. 75.
Embodiment 61. A method for modifying a double-stranded deoxyribonucleic acid polynucleotide, the method comprising contacting the double-stranded deoxyribonucleic acid polynucleotide with a complex comprising:
a class 2, type II endonuclease;
a base editor coupled to the endonuclease; and
an engineered guide ribonucleic acid structure configured to bind to the endonuclease and the double-stranded deoxyribonucleic acid polynucleotide;
wherein the double-stranded deoxyribonucleic acid polynucleotide comprises a Protospacer Adjacent Motif (PAM); and
wherein the PAM comprises a sequence selected from the group consisting of SEQ ID NOs 70-78 or 597.
Embodiment 62. The method of embodiment 61, wherein the class 2, type II endonuclease is coupled to the base editor covalently or via a linker.
Embodiment 63. The method of embodiment 61 or embodiment 62, wherein the base editor comprises a sequence having at least 70%, at least 80%, at least 90% or at least 95% identity to a sequence selected from the group consisting of SEQ ID NOs 1-51, 57-66, 385-443, 444-475, 594-595 or 599-675 or a variant thereof.
Embodiment 64. The method of any one of embodiments 61 to 63, wherein:
the base editor comprises an adenine deaminase;
the double-stranded deoxyribonucleic acid polynucleotide comprises adenine; and
modifying the double-stranded deoxyribonucleic acid polynucleotide comprises converting the adenine to guanine.
Embodiment 65. The method of embodiment 64, wherein the adenine deaminase comprises a sequence having at least 70%, 80%, 90% or 95% sequence identity with any one of SEQ ID NOs 50-51, 57, 385-443, 448-475 or 595, or a variant thereof.
Embodiment 66. The method of any one of embodiments 61 to 63, wherein:
the base editor comprises a cytosine deaminase;
the double-stranded deoxyribonucleic acid polynucleotide comprises cytosine; and
modifying the double-stranded deoxyribonucleic acid polynucleotide comprises converting the cytosine to uracil.
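The base conversions of Embodiments 64 and 66 can be modeled, in a highly simplified and strand-agnostic way, as per-position substitutions. In the following hypothetical Python sketch the editor keys and the use of explicit 0-based target positions are assumptions; actual editing outcomes also depend on the editing window and sequence context, and for adenine deamination the inosine intermediate is read as guanine.

```python
from collections.abc import Iterable

# Simplified conversions (hypothetical keys):
#   adenine deaminase:  A -> G (inosine intermediate read as G)
#   cytosine deaminase: C -> U (uracil shown directly in the DNA string)
CONVERSIONS = {
    "adenine_deaminase": ("A", "G"),
    "cytosine_deaminase": ("C", "U"),
}


def deaminate(seq: str, positions: Iterable[int], editor: str) -> str:
    """Return seq with the editor's target base converted at each 0-based position.

    Positions that do not hold the target base are left unchanged."""
    source, product = CONVERSIONS[editor]
    bases = list(seq)
    for p in positions:
        if bases[p] == source:
            bases[p] = product
    return "".join(bases)
```

For example, `deaminate("GACATC", [1, 3], "adenine_deaminase")` returns `"GGCGTC"`, and `deaminate("GACATC", [2, 5], "cytosine_deaminase")` returns `"GAUATU"`.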
Embodiment 67. The method of embodiment 66, wherein the cytosine deaminase comprises a sequence having at least 70%, 80%, 90%, or 95% sequence identity with any one of SEQ ID NOs 1-49, 444-447, 594, 58-66, or 599-675, or a variant thereof.
Embodiment 68. The method of any one of embodiments 61 to 67, wherein the complex further comprises a uracil DNA glycosylase inhibitor coupled to the endonuclease or the base editor.
Embodiment 69. The method of embodiment 68, wherein the uracil DNA glycosylase inhibitor comprises a sequence having at least 70%, 80%, 90% or 95% identity to any one of SEQ ID NOs 52-56 or SEQ ID NO 67, or a variant thereof.
Embodiment 70. The method of any one of embodiments 61 to 69, wherein the double-stranded deoxyribonucleic acid polynucleotide comprises a first strand comprising a sequence complementary to the sequence of the engineered guide ribonucleic acid structure, and a second strand comprising the PAM.
Embodiment 71. The method of embodiment 70 wherein the PAM is immediately adjacent to the 3' end of the sequence complementary to the sequence of the engineered guide ribonucleic acid structure.
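A simple scan for candidate target sites along one strand is sketched below in Python. It assumes the PAM lies immediately 3' of a protospacer on the 5'-to-3' strand being scanned (the common type II arrangement), accepts the PAM as an IUPAC string, and leaves the spacer length to the caller (for example, a value within the 15-24 nucleotide range of Embodiment 33). The function name is hypothetical, and no PAM from SEQ ID NOs 360-368 or 598 is reproduced.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_sites(strand: str, pam: str, spacer_len: int) -> list[tuple[int, str, str]]:
    """Return (start, protospacer, pam) for every position where a PAM match
    lies immediately 3' of a spacer_len-nucleotide window on the scanned strand."""
    pam_re = "".join(IUPAC[c] for c in pam.upper())
    # A lookahead makes the match zero-width, so overlapping sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{spacer_len}}})({pam_re}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(strand.upper())]
```

For example, `find_sites("TTACGTAGGCC", "NGG", spacer_len=5)` returns `[(1, 'TACGT', 'AGG')]`; `NGG` here is only a placeholder PAM.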
Embodiment 72. The method of any one of embodiments 61 to 71, wherein the class 2, type II endonuclease is not a Cas9 endonuclease, a Cas14 endonuclease, a Cas12a endonuclease, a Cas12b endonuclease, a Cas12c endonuclease, a Cas12d endonuclease, a Cas12e endonuclease, a Cas13a endonuclease, a Cas13b endonuclease, a Cas13c endonuclease, or a Cas13d endonuclease.
Embodiment 73. The method of any one of embodiments 61 to 72, wherein the class 2, type II endonuclease is derived from an uncultured microorganism.
Embodiment 74. The method of any one of embodiments 61-73, wherein the double-stranded deoxyribonucleic acid polynucleotide is a eukaryotic, plant, fungal, mammalian, rodent, or human double-stranded deoxyribonucleic acid polynucleotide.
Embodiment 75. A method of modifying a target nucleic acid locus, the method comprising delivering the engineered nucleic acid editing system of any of embodiments 1 to 44 to the target nucleic acid locus, wherein the endonuclease is configured to form a complex with the engineered guide ribonucleic acid structure, and wherein the complex is configured such that upon binding of the complex to the target nucleic acid locus, the complex modifies a nucleotide of the target nucleic acid locus.
Embodiment 76. The method of embodiment 75, wherein the engineered nucleic acid editing system comprises an adenine deaminase, the nucleotide is adenine, and modifying the target nucleic acid locus comprises converting the adenine to guanine.
Embodiment 77. The method of embodiment 75, wherein the engineered nucleic acid editing system comprises a cytidine deaminase and a uracil DNA glycosylase inhibitor, the nucleotide is cytosine, and modifying the target nucleic acid locus comprises converting the cytosine to uracil.
Embodiment 78. The method of any one of embodiments 75 to 77, wherein the target nucleic acid locus comprises genomic DNA, viral DNA, or bacterial DNA.
Embodiment 79. The method of any one of embodiments 75 to 78, wherein the target nucleic acid locus is in vitro.
Embodiment 80. The method of any one of embodiments 75 to 78, wherein the target nucleic acid locus is intracellular.
Embodiment 81. The method of embodiment 80, wherein the cell is a prokaryotic cell, a bacterial cell, a eukaryotic cell, a fungal cell, a plant cell, an animal cell, a mammalian cell, a rodent cell, a primate cell, or a human cell.
Embodiment 82. The method of any one of embodiments 80 to 81, wherein the cell is in an animal.
Embodiment 83. The method of embodiment 82, wherein the cell is within a cochlea.
Embodiment 84. The method of any one of embodiments 80-81, wherein the cell is within an embryo.
Embodiment 85. The method of embodiment 84, wherein the embryo is a two-cell embryo.
Embodiment 86. The method of embodiment 84, wherein the embryo is a mouse embryo.
Embodiment 87. The method of any of embodiments 75 to 86, wherein delivering the engineered nucleic acid editing system to the target nucleic acid locus comprises delivering the nucleic acid of any of embodiments 46 to 49 or the vector of any of embodiments 50 to 53.
Embodiment 88. The method of any one of embodiments 75 to 87, wherein delivering the engineered nucleic acid editing system to the target nucleic acid locus comprises delivering a nucleic acid comprising an open reading frame encoding the endonuclease.
Embodiment 89. The method of embodiment 88, wherein the nucleic acid comprises a promoter operably linked to the open reading frame encoding the endonuclease.
Embodiment 90. The method of any one of embodiments 75 to 89, wherein delivering the engineered nucleic acid editing system to the target nucleic acid locus comprises delivering a capped mRNA comprising an open reading frame encoding the endonuclease.
Embodiment 91. The method of any one of embodiments 75 to 86, wherein delivering the engineered nucleic acid editing system to the target nucleic acid locus comprises delivering a polypeptide.
Embodiment 92. The method of any of embodiments 75-86, wherein delivering the engineered nucleic acid editing system to the target nucleic acid locus comprises delivering deoxyribonucleic acid (DNA) encoding the engineered guide ribonucleic acid structure operably linked to a ribonucleic acid (RNA) pol III promoter.
Embodiment 93. An engineered nucleic acid editing polypeptide comprising:
an endonuclease comprising a RuvC domain and an HNH domain, wherein the endonuclease is derived from an uncultured microorganism, wherein the endonuclease is a class 2, type II endonuclease, and
wherein the RuvC domain lacks nuclease activity; and
a base editor coupled to the endonuclease.
Embodiment 94. The engineered nucleic acid editing polypeptide of embodiment 93, wherein the endonuclease comprises a sequence having at least 95% sequence identity to any one of SEQ ID NOs 70-78 or 597, or a variant thereof.
Embodiment 95. An engineered nucleic acid editing polypeptide comprising:
an endonuclease having at least 95% sequence identity to any one of SEQ ID NOs 70-78 or 597, or a variant thereof,
wherein the endonuclease comprises a RuvC domain lacking nuclease activity; and
a base editor coupled to the endonuclease.
Embodiment 96. An engineered nucleic acid editing polypeptide comprising:
an endonuclease configured to bind to a Protospacer Adjacent Motif (PAM) sequence comprising any one of SEQ ID NOs 360-368 or 598,
wherein the endonuclease is a class 2, type II endonuclease, and
wherein the endonuclease comprises a RuvC domain lacking nuclease activity; and
a base editor coupled to the endonuclease.
Embodiment 97. The engineered nucleic acid editing polypeptide of embodiment 95 or embodiment 96, wherein the endonuclease is derived from an uncultured microorganism.
Embodiment 98. The engineered nucleic acid editing polypeptide of any of embodiments 93 to 97, wherein the endonuclease has less than 80% identity to a Cas9 endonuclease.
Embodiment 99. The engineered nucleic acid editing polypeptide of any of embodiments 95 to 98, wherein the endonuclease further comprises an HNH domain.
Embodiment 100. The engineered nucleic acid editing polypeptide of any of embodiments 95 to 99, wherein the tracr ribonucleic acid sequence comprises a sequence having at least 80% sequence identity to about 60 to 90 consecutive nucleotides selected from any of SEQ ID NOs 88-96, 488, 489, and 679-680.
Embodiment 101. The engineered nucleic acid editing polypeptide of any of embodiments 93 to 100, wherein the base editor comprises a sequence having at least 70%, 80%, 90% or 95% identity to any one of SEQ ID NOs 1-51, 57-66, 385-443, 444-475, 594-595 or 599-675 or variants thereof.
Embodiment 102. The engineered nucleic acid editing polypeptide of any of embodiments 93 to 101, wherein the base editor is an adenine deaminase.
Embodiment 103. The engineered nucleic acid editing polypeptide of embodiment 102, wherein the adenosine deaminase comprises a sequence with at least 70%, 80%, 90% or 95% sequence identity to any one of SEQ ID NOs 50-51, 57, 385-443, 448-475, or 595, or variants thereof.
Embodiment 104. The engineered nucleic acid editing polypeptide of any of embodiments 93 to 101, wherein the base editor is a cytosine deaminase.
Embodiment 105. The engineered nucleic acid editing polypeptide of embodiment 104, wherein the cytosine deaminase comprises a sequence with at least 70%, 80%, 90%, or 95% sequence identity with any one of SEQ ID NOs 1-49, 444-447, 594, or 58-66, or variants thereof.
Embodiment 106. An engineered nucleic acid editing polypeptide comprising:
an endonuclease, wherein the endonuclease is configured to lack endonuclease activity; and
a base editor coupled to the endonuclease,
wherein the base editor comprises a sequence having at least 70%, 80%, 90%, or 95% sequence identity to any one of SEQ ID NOs 1-51, 385-386, 387-443, 444-447, 448-475, 595, or 599-675, or a variant thereof.
Embodiment 107. The engineered nucleic acid editing polypeptide of embodiment 106, wherein the endonuclease is configured to cleave one strand of a double-stranded target deoxyribonucleic acid.
Embodiment 108. The engineered nucleic acid editing polypeptide of embodiment 106, wherein the endonuclease is catalytically dead.
Embodiment 109. The engineered nucleic acid editing polypeptide of any of embodiments 106 to 108, wherein the endonuclease is a Cas endonuclease.
Embodiment 110. The engineered nucleic acid editing polypeptide of embodiment 109, wherein the endonuclease is a type II endonuclease or a type V endonuclease.
Embodiment 111. The engineered nucleic acid editing polypeptide of embodiment 106, wherein the endonuclease comprises a sequence having at least 70%, 80%, 90%, or 95% sequence identity to any one of SEQ ID NOs 70-78 or 597, or a variant thereof.
Embodiment 112. The engineered nucleic acid editing polypeptide of any of embodiments 109 to 111, wherein the endonuclease comprises a nickase mutation.
Embodiment 113. The engineered nucleic acid editing polypeptide of embodiment 112, wherein the endonuclease comprises a mutation from aspartic acid to alanine at: residue 9 relative to SEQ ID NO. 70, residue 13 relative to SEQ ID NO. 71, 72 or 74, residue 12 relative to SEQ ID NO. 73, residue 17 relative to SEQ ID NO. 75, residue 23 relative to SEQ ID NO. 76 or residue 10 relative to SEQ ID NO. 597.
Embodiment 114. The engineered nucleic acid editing polypeptide of any of embodiments 109 to 113, wherein the endonuclease is configured to bind to a Protospacer Adjacent Motif (PAM) sequence selected from the group consisting of SEQ ID NOs 360-368 or 598.
Embodiment 115. The engineered nucleic acid editing polypeptide of any of embodiments 106 to 114, wherein the base editor is an adenine deaminase.
Embodiment 116. The engineered nucleic acid editing polypeptide of embodiment 115, wherein the adenosine deaminase comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to any one of SEQ ID NOs 50-51, 385-443, 448-475, or 595, or variants thereof.
Embodiment 117. The engineered nucleic acid editing polypeptide of embodiment 116, wherein the adenosine deaminase comprises a sequence with at least 70%, 80%, 90% or 95% identity to any one of SEQ ID NOs 50-51, 385-390 or 595, or variants thereof.
Embodiment 118. The engineered nucleic acid editing polypeptide of any of embodiments 106 to 114, wherein the base editor is a cytosine deaminase.
Embodiment 119. The engineered nucleic acid editing polypeptide of embodiment 118, wherein the cytosine deaminase comprises a sequence with at least 70%, 80%, 90%, or 95% identity to any one of SEQ ID NOs 1-49, 444-447, or variants thereof.
Embodiment 120. The engineered nucleic acid editing polypeptide of any of embodiments 106 to 119, further comprising a uracil DNA glycosylase inhibitor (UGI) coupled to the endonuclease or the base editor.
Embodiment 121. The engineered nucleic acid editing polypeptide of embodiment 120, wherein the uracil DNA glycosylase inhibitor comprises a sequence having at least 70%, 80%, 90% or 95% identity to any one of SEQ ID NOs 52-56 or 67 or variants thereof.
Embodiment 122. The engineered nucleic acid editing polypeptide of any of embodiments 106 to 121, wherein a polypeptide comprising the endonuclease comprises one or more Nuclear Localization Sequences (NLS) proximal to the N-terminus or C-terminus of the endonuclease.
Embodiment 123. The engineered nucleic acid editing polypeptide of embodiment 122, wherein the NLS comprises a sequence with at least 90% identity to a sequence selected from SEQ ID NOs 369-384 or variants thereof.
Embodiment 124. The engineered nucleic acid editing polypeptide of any of embodiments 106 to 123, wherein the endonuclease is covalently coupled to the base editor directly or through a linker.
Embodiment 125. A nucleic acid comprising an engineered nucleic acid sequence optimized for expression in an organism, wherein the nucleic acid encodes a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs 1-51, 385-386, 387-443, 444-447, 448-475, or 595, or a variant thereof.
Embodiment 126. The nucleic acid of embodiment 125, wherein the organism is a prokaryote, bacterium, eukaryote, fungus, plant, mammal, rodent, or human.
Embodiment 127. A vector comprising the nucleic acid of any one of embodiments 125-126.
Embodiment 128 the vector of embodiment 127, wherein the vector is a plasmid, a minicircle, CELiD, an adeno-associated virus (AAV) derived virion, or a lentivirus.
Embodiment 129. A cell comprising the vector of any one of embodiments 127-128.
Embodiment 130. A method of making a base editor, the method comprising culturing the cell of embodiment 129.
Embodiment 131. A system comprising:
(a) the nucleic acid editing polypeptide of any one of embodiments 106 to 124; and
(b) an engineered guide ribonucleic acid structure configured to form a complex with the nucleic acid editing polypeptide, the engineered guide ribonucleic acid structure comprising:
i. a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and
ii. a ribonucleic acid sequence configured to bind to the endonuclease.
Embodiment 132. The system of embodiment 131, wherein the engineered guide ribonucleic acid sequence comprises a sequence having at least 80% sequence identity to the non-degenerate nucleotides of any one of SEQ ID NOs 88-96, 488-489, or 679-680.
Embodiment 133. A method of modifying a target nucleic acid locus, the method comprising delivering to the target nucleic acid locus the engineered nucleic acid editing polypeptide of any one of embodiments 106-124 or the system of any one of embodiments 131-132, wherein the complex is configured such that upon binding of the complex to the target nucleic acid locus, the complex modifies a nucleotide of the target nucleic acid locus.
Embodiment 134. A nucleic acid editing polypeptide comprising:
an adenosine deaminase comprising a polypeptide sequence having a substitution of at least one residue of SEQ ID NO: 386 selected from the group consisting of residue 24, residue 83, residue 85, residue 107, residue 109, residue 112, residue 124, residue 143, residue 147, residue 148, residue 154, and residue 158.
Embodiment 135. The nucleic acid editing polypeptide of embodiment 134, wherein the substituted residue is selected from the group consisting of W24, V83, L85, A107, D109, T112, H124, A143, S147, D148, R154, and K158.
Embodiment 136. The nucleic acid editing polypeptide of embodiment 134 or embodiment 135 wherein the substitution is a conservative substitution.
Embodiment 137. The nucleic acid editing polypeptide of embodiment 134 or embodiment 135, wherein the substitution is a non-conservative substitution.
Embodiment 138. The nucleic acid editing polypeptide of any of embodiments 134 to 137, comprising a substitution at W24, wherein the substitution is W24R.
Embodiment 139. The nucleic acid editing polypeptide of any of embodiments 134 to 138, comprising a substitution at V83, wherein the substitution is V83S.
Embodiment 140. The nucleic acid editing polypeptide of any of embodiments 134 to 139, comprising a substitution at L85, wherein the substitution is L85F.
Embodiment 141. The nucleic acid editing polypeptide of any of embodiments 134 to 140, comprising a substitution at A107, wherein the substitution is A107V.
Embodiment 142. The nucleic acid editing polypeptide of any of embodiments 134 to 141, comprising a substitution at D109, wherein the substitution is D109N.
Embodiment 143. The nucleic acid editing polypeptide of any of embodiments 134 to 142, comprising a substitution at T112, wherein the substitution is T112R.
Embodiment 144. The nucleic acid editing polypeptide of any of embodiments 134 to 143, comprising a substitution at H124, wherein the substitution is H124Y.
Embodiment 145. The nucleic acid editing polypeptide of any of embodiments 134 to 144, comprising a substitution at A143, wherein the substitution is A143N.
Embodiment 146. The nucleic acid editing polypeptide of any of embodiments 134 to 145, comprising a substitution at S147, wherein the substitution is S147C.
Embodiment 147. The nucleic acid editing polypeptide of any of embodiments 134 to 146, comprising a substitution at D148, wherein the substitution is D148Y or D148R.
Embodiment 148. The nucleic acid editing polypeptide of any of embodiments 134 to 147, comprising a substitution at R154, wherein the substitution is R154P.
Embodiment 149. The nucleic acid editing polypeptide of any of embodiments 134 to 148, comprising a substitution at K158, wherein the substitution is K158N.
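The substitutions of Embodiments 138 to 149 use standard wild-type/position/mutant notation relative to SEQ ID NO: 386. The Python sketch below parses such codes and applies them to a reference sequence, rejecting any code whose wild-type residue does not match the reference. It is illustrative only: the function names are hypothetical, SEQ ID NO: 386 is not reproduced here, and where Embodiment 147 allows D148Y or D148R the list arbitrarily uses D148Y.

```python
import re

# Substitutions recited in Embodiments 138-149, relative to SEQ ID NO: 386.
SUBSTITUTIONS = ["W24R", "V83S", "L85F", "A107V", "D109N", "T112R",
                 "H124Y", "A143N", "S147C", "D148Y", "R154P", "K158N"]

_AA = "ACDEFGHIKLMNPQRSTVWY"
_SUB_RE = re.compile(rf"^([{_AA}])(\d+)([{_AA}])$")


def parse_substitution(code: str) -> tuple[str, int, str]:
    """Split a code such as 'A107V' into (wild-type, 1-based position, mutant)."""
    m = _SUB_RE.match(code)
    if m is None:
        raise ValueError(f"not a substitution code: {code!r}")
    return m.group(1), int(m.group(2)), m.group(3)


def apply_substitutions(reference: str, codes: list[str]) -> str:
    """Apply each substitution to the reference, checking the wild-type residue."""
    residues = list(reference)
    for code in codes:
        wild_type, pos, mutant = parse_substitution(code)
        if pos < 1 or pos > len(reference) or reference[pos - 1] != wild_type:
            raise ValueError(f"{code}: reference residue {pos} is not {wild_type}")
        residues[pos - 1] = mutant
    return "".join(residues)
```

On a toy sequence, `apply_substitutions("AWKD", ["W2R", "D4N"])` returns `"ARKN"`; applying `SUBSTITUTIONS` would require the full SEQ ID NO: 386 sequence.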
Embodiment 150. The nucleic acid editing polypeptide of any of embodiments 134 to 149, wherein the adenosine deaminase comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any of SEQ ID NOs 50-51 or 385-443.
Embodiment 151. The engineered nucleic acid editing polypeptide of any of embodiments 134 to 150, further comprising an endonuclease, wherein the endonuclease is configured to lack endonuclease activity.
Embodiment 152. The engineered nucleic acid editing polypeptide of embodiment 151, wherein the endonuclease is configured to cleave one strand of a double-stranded target deoxyribonucleic acid.
Embodiment 153. The engineered nucleic acid editing polypeptide of embodiment 151, wherein the endonuclease is catalytically dead.
Embodiment 154. The engineered nucleic acid editing polypeptide of any of embodiments 151 to 153, wherein the endonuclease is a Cas endonuclease.
Embodiment 155. The engineered nucleic acid editing polypeptide of embodiment 154, wherein the endonuclease is a type II endonuclease or a type V endonuclease.
Embodiment 156. The engineered nucleic acid editing polypeptide of embodiment 155, wherein the endonuclease comprises a sequence having at least 70%, 80%, 90% or 95% sequence identity to any one of SEQ ID NOs 70-78 or 597 or variants thereof.
Embodiment 157. The engineered nucleic acid editing polypeptide of any of embodiments 151 to 156, wherein the endonuclease comprises a nickase mutation.
Embodiment 158. The engineered nucleic acid editing polypeptide of embodiment 157, wherein the endonuclease comprises a mutation from aspartic acid to alanine at: residue 9 relative to SEQ ID NO. 70, residue 13 relative to SEQ ID NO. 71, 72 or 74, residue 12 relative to SEQ ID NO. 73, residue 17 relative to SEQ ID NO. 75, residue 23 relative to SEQ ID NO. 76 or residue 10 relative to SEQ ID NO. 597.
Embodiment 159. The engineered nucleic acid editing polypeptide of any of embodiments 151 to 156, wherein the endonuclease is configured to bind to a Protospacer Adjacent Motif (PAM) sequence selected from the group consisting of SEQ ID NOs 360-368 or 598.

Claims (115)

1. A method of deaminating a cytosine residue in a eukaryotic nucleic acid sequence in a cell, the method comprising:
contacting the eukaryotic nucleic acid sequence with a polypeptide having cytosine deaminase activity, the polypeptide comprising a sequence having at least 80% identity to any one of SEQ ID NOs 1-49, 444-447, 599-675, 744-835, or 970-982, or a variant thereof.
2. The method of claim 1, wherein the eukaryotic nucleic acid sequence is a mammalian, primate, or human nucleic acid sequence.
3. The method of claim 1 or 2, wherein the cell is a mammalian cell, primate cell, or human cell.
4. A method according to any one of claims 1 to 3, wherein the eukaryotic nucleic acid sequence comprises single stranded DNA (ssDNA) or ribonucleic acid (RNA).
5. The method of claim 4, wherein the polypeptide having cytosine deaminase activity comprises a sequence having at least 80% identity to any one of SEQ ID NOs 809-811, 819, 826, 752, 777, 823, 668-671, 675, 650, 752, 774, 777, 806, 812, 816, 817, 818, 825, 827, 832, or 970-982, or a variant thereof.
6. The method of claim 5, wherein the polypeptide having cytosine deaminase activity comprises a sequence having at least 80% identity to any one of SEQ ID NOs 808, 810-811, 819, 826, 752, 777, or 823, or variants thereof.
7. The method of any one of claims 1 to 3, wherein the eukaryotic nucleic acid sequence comprises double-stranded DNA (dsDNA).
8. The method of claim 7, wherein the polypeptide having cytosine deaminase activity comprises a sequence at least 80% identical to any one of SEQ ID NOs 810-811.
9. The method of any one of claims 1 to 8, wherein the polypeptide having cytosine deaminase activity further comprises a nucleic acid binding domain, an endonuclease, or a nicking enzyme.
10. The method of claim 9, wherein the polypeptide having cytosine deaminase activity further comprises the endonuclease or the nicking enzyme, wherein the endonuclease or the nicking enzyme comprises a sequence having at least 80% identity to any one of SEQ ID NOs 70-78, 596, 597, 1120, 1122-1127, 1647, or variants thereof.
11. The method of claim 9 or 10, wherein the polypeptide having cytosine deaminase activity further comprises a nicking enzyme, wherein the nicking enzyme comprises a mutation from aspartic acid to alanine at: residue 9 relative to SEQ ID NO. 70, residue 13 relative to SEQ ID NO. 71, 72 or 74, residue 12 relative to SEQ ID NO. 73, residue 17 relative to SEQ ID NO. 75, residue 23 relative to SEQ ID NO. 76 or residue 10 relative to SEQ ID NO. 597 or any combination thereof.
12. The method of any one of claims 1 to 11, wherein the polypeptide having cytosine deaminase activity further comprises a uracil DNA glycosylase inhibitor sequence.
13. The method of claim 12, wherein the uracil DNA glycosylase inhibitor comprises a sequence having at least 70%, 80%, 90% or 95% identity to any of SEQ ID NOs 52-56 or 67 or variants thereof.
14. The method of any one of claims 1 to 13, wherein the polypeptide having cytosine deaminase activity further comprises a FAM72A sequence.
15. The method of claim 14, wherein the FAM72A sequence has at least 80% identity to SEQ ID No. 1121 or a variant thereof.
16. A method of deaminating a cytosine residue in a primate nucleic acid sequence in a cell, the method comprising:
contacting the primate nucleic acid sequence with a polypeptide having cytosine deaminase activity, the polypeptide comprising a sequence having at least 80% identity to any one of SEQ ID NOs 599-638, 660-675, 828-835 or a variant thereof.
17. The method of claim 16, wherein the primate nucleic acid sequence comprises double-stranded DNA (dsDNA), single-stranded DNA (ssDNA), or ribonucleic acid (RNA).
18. The method of claim 16 or 17, wherein the polypeptide having cytosine deaminase activity further comprises a nucleic acid binding domain, an endonuclease, or a nicking enzyme.
19. The method of claim 18, wherein the polypeptide having cytosine deaminase activity further comprises the endonuclease or the nicking enzyme, wherein the endonuclease or the nicking enzyme comprises a sequence having at least 80% identity to any one of SEQ ID NOs 70-78, 596, 597, 1120, 1122-1127, 1647, or variants thereof.
20. The method of claim 19, wherein the polypeptide having cytosine deaminase activity further comprises a nicking enzyme, wherein the nicking enzyme comprises a mutation from aspartic acid to alanine at: residue 9 relative to SEQ ID NO. 70, residue 13 relative to SEQ ID NO. 71, 72 or 74, residue 12 relative to SEQ ID NO. 73, residue 17 relative to SEQ ID NO. 75, residue 23 relative to SEQ ID NO. 76 or residue 10 relative to SEQ ID NO. 597 or any combination thereof.
21. The method of any one of claims 16 to 20, wherein the polypeptide having cytosine deaminase activity further comprises a uracil DNA glycosylase inhibitor sequence.
22. The method of claim 21, wherein the uracil DNA glycosylase inhibitor comprises a sequence having at least 70%, 80%, 90% or 95% identity to any of SEQ ID NOs 52-56 or 67 or variants thereof.
23. The method of any one of claims 16 to 20, wherein the polypeptide having cytosine deaminase activity further comprises a FAM72A sequence.
24. The method of claim 23, wherein the FAM72A sequence has at least 80% identity to SEQ ID No. 1121 or a variant thereof.
25. A nucleic acid comprising an engineered nucleic acid sequence optimized for expression in a mammalian organism, wherein the nucleic acid encodes a sequence having at least 70%, 80%, 90% or 95% sequence identity to any one of SEQ ID NOs 1-49, 444-447, 599-675, 744-835, 970-982 or variants thereof.
26. The nucleic acid of claim 25, wherein the nucleic acid encodes a sequence having at least 70%, 80%, 90% or 95% sequence identity to any one of SEQ ID NOs: 809-811, 819, 826, 752, 777, 823, 668-671, 675, 650, 752, 774, 777, 806, 812, 816, 817, 818, 825, 827, 832, 832, 970-982 or a variant thereof.
27. A vector comprising the nucleic acid of claim 25 or 26.
28. A fusion polypeptide, comprising:
(a) a domain having cytosine deaminase activity comprising a sequence having at least 80% identity to any one of SEQ ID NOs 1-49, 444-447, 599-675, 744-835, 970-982 or variants thereof; and
(b) a nucleic acid binding domain, an endonuclease domain or a nicking enzyme domain.
29. The fusion polypeptide of claim 28, wherein the domain having cytosine deaminase activity comprises a sequence at least 80% identical to any one of SEQ ID NOs: 809-811, 819, 826, 752, 777, 823, 668-671, 675, 650, 752, 774, 777, 806, 812, 816, 817, 818, 825, 827, 832, 832, 970-982 or a variant thereof.
30. The fusion polypeptide of claim 28 or 29, wherein the domain having cytosine deaminase activity comprises a sequence with at least 80% identity to any one of SEQ ID NOs 809-811, 819, 826, 752, 777, 823 or variants thereof.
31. The fusion polypeptide of any one of claims 28-30, wherein the fusion polypeptide comprises the endonuclease domain or the nicking enzyme domain, wherein the endonuclease domain or the nicking enzyme domain comprises a sequence having at least 80% identity to any one of SEQ ID NOs 70-78, 596, 597-598, 1120, 1122-1127, 1647, or variants thereof.
32. The fusion polypeptide of any one of claims 28-31, wherein the fusion polypeptide comprises the nicking enzyme domain, wherein the nicking enzyme domain comprises a mutation from aspartic acid to alanine at: residue 9 relative to SEQ ID NO. 70, residue 13 relative to SEQ ID NO. 71, 72 or 74, residue 12 relative to SEQ ID NO. 73, residue 17 relative to SEQ ID NO. 75, residue 23 relative to SEQ ID NO. 76 or residue 10 relative to SEQ ID NO. 597 or any combination thereof.
33. The fusion polypeptide of any one of claims 28-32, wherein the fusion polypeptide comprises a sequence having at least 80% sequence identity to any one of SEQ ID NOs 877-916 or 968-969 or variants thereof.
34. A system, comprising:
(a) the fusion polypeptide of any one of claims 28 to 33; and
(b) an engineered guide polynucleotide configured to form a complex with the endonuclease domain, the engineered guide polynucleotide comprising:
i. a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and
ii. a ribonucleic acid sequence configured to bind to the endonuclease domain.
35. The system of claim 34, wherein the engineered guide polynucleotide further comprises a sequence having at least 80% identity to the non-degenerate nucleotides of any one of SEQ ID NOs 88-96, 917-931, 963-967, 1099-1105, or a variant thereof.
36. A polypeptide having adenosine deaminase activity, the polypeptide comprising:
a sequence having at least 80% identity to any one of SEQ ID NOs 50, 51, 385-443, 448-475 or a variant thereof,
wherein, when optimally aligned, the polypeptide comprises a substitution at at least one of the following residues of SEQ ID NO: 50, or any combination thereof: T2, D7, E10, M13, W24, G32, K38, G45, G51, A63, E66, R75, C91, G93, H97, A107, E108, D109, P110, H124, A126, H129, F150 or S165.
37. The polypeptide of claim 36, wherein, when optimally aligned, the substitution comprises T2X₁, D7X₁, E10X₁, M13X₄, W24X₁, G32X₁, K38X₂, G45X₂, G51X₅, A63X₇, E66X₅, E66X₂, R75H, C91R, G93X₆, H97X₆, H97X₅, A107X₅, E108X₂, D109N, P110H, H124X₆, A126X₂, H129R, H129N, F150P, F150S or S165X₅ relative to SEQ ID NO. 50, or any combination thereof,
wherein X₁ is A or G;
X₂ is D or E;
X₃ is N or Q;
X₄ is R or K;
X₅ is I, L, M or V;
X₆ is F, Y or W; and
X₇ is S or T.
38. The polypeptide of claim 37, wherein the polypeptide comprises any one of SEQ ID NOs 836-860 or variants thereof.
39. The polypeptide of claim 38, wherein the polypeptide comprises any one of SEQ ID NOs 839, 841, 843, 844, 847, 848, 849, 850, 851, 852, 859, or variants thereof.
40. The polypeptide of claim 37, wherein when optimally aligned, the substitution comprises W24G, G51V, E108D, P110H, F150P, D G, E G or H129N relative to SEQ ID No. 50, or any combination thereof.
41. The polypeptide of any one of claims 36 to 40, wherein the polypeptide further comprises a nucleic acid binding domain, an endonuclease domain, or a nicking enzyme domain.
42. The polypeptide according to claim 41, wherein the polypeptide comprises the endonuclease domain or the nicking enzyme domain, wherein the endonuclease domain or the nicking enzyme domain comprises a sequence having at least 80% identity to any one of SEQ ID NOs 70-78, 596, 597-598, 1120, 1122-1127, 1647, or variants thereof.
43. The polypeptide of claim 41 or 42, wherein the polypeptide comprises the nicking enzyme domain, wherein the nicking enzyme domain comprises a mutation from aspartic acid to alanine at: residue 9 relative to SEQ ID NO. 70, residue 13 relative to SEQ ID NO. 71, 72 or 74, residue 12 relative to SEQ ID NO. 73, residue 17 relative to SEQ ID NO. 75, residue 23 relative to SEQ ID NO. 76 or residue 10 relative to SEQ ID NO. 597 or any combination thereof.
44. A system, comprising:
(a) the polypeptide of any one of claims 36 to 43; and
(b) an engineered guide polynucleotide configured to form a complex with the endonuclease domain, the engineered guide polynucleotide comprising:
i. a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and
ii. a ribonucleic acid sequence configured to bind to the endonuclease domain.
45. The system of claim 44, wherein the engineered guide polynucleotide further comprises a sequence having at least 80% identity to the non-degenerate nucleotides of any one of SEQ ID NOs 88-96, 917-931, 963-967, 1099-1105 or a variant thereof.
46. A method of deaminating a cytosine residue in a cell, the method comprising introducing into the cell:
(a) a vector encoding a polypeptide having cytosine deaminase activity; and
(b) a vector encoding a FAM72A protein.
47. The method of claim 46, wherein the vector encoding the FAM72A protein comprises a sequence that has at least 80% identity to SEQ ID No. 1115 or a variant thereof, or encodes a sequence that has at least 80% identity to SEQ ID No. 1121 or a variant thereof.
48. The method of claim 46 or 47, wherein the polypeptide having cytosine deaminase activity comprises a sequence having at least 80% identity to any one of SEQ ID NOs 1-49, 444-447, 599-675, 744-835, 970-982 or a variant thereof.
49. The method of any one of claims 46 to 48, wherein the polypeptide having cytosine deaminase activity further comprises a nucleic acid binding domain, an endonuclease domain, or a nicking enzyme domain.
50. The method of claim 49, wherein the polypeptide having cytosine deaminase activity comprises the endonuclease domain or the nicking enzyme domain, wherein the endonuclease domain or the nicking enzyme domain comprises a sequence having at least 80% identity to any one of SEQ ID NOs 70-78, 596, 597-598, 1120, 1122-1127, 1647, or variants thereof.
51. The method of claim 49 or 50, wherein the polypeptide having cytosine deaminase activity comprises the nicking enzyme domain, wherein the nicking enzyme domain comprises a mutation from aspartic acid to alanine at: residue 9 relative to SEQ ID NO. 70, residue 13 relative to SEQ ID NO. 71, 72 or 74, residue 12 relative to SEQ ID NO. 73, residue 17 relative to SEQ ID NO. 75, residue 23 relative to SEQ ID NO. 76 or residue 10 relative to SEQ ID NO. 597 or any combination thereof.
52. An engineered nucleic acid editing polypeptide comprising:
(i) a sequence having cytosine deaminase activity; and
(ii) a sequence derived from a FAM72A protein.
53. The polypeptide of claim 52, wherein the sequence having cytosine deaminase activity has at least 80% identity with any one of SEQ ID NOs 1-49, 444-447, 599-675, 744-835, 970-982 or variants thereof.
54. The polypeptide of claim 52 or 53, wherein the sequence derived from the FAM72A protein has at least 80% identity to SEQ ID No. 1121 or a variant thereof.
55. The polypeptide of any one of claims 52 to 54, further comprising an endonuclease sequence comprising a RuvC domain and an HNH domain, wherein the endonuclease sequence is a sequence of a class 2 type II endonuclease.
56. The polypeptide of claim 55, wherein said RuvC domain lacks nuclease activity.
57. The polypeptide according to claim 55, wherein the endonuclease comprises a nicking enzyme.
58. The polypeptide according to any one of claims 55 to 57, wherein the class 2 type II endonuclease sequence has at least 80% sequence identity to any one of SEQ ID NOs 70-78, 596, 597-598, 1120, 1122-1127, 1647 or a variant thereof.
59. The polypeptide of any one of claims 56 to 58, wherein, when optimally aligned, the class 2 type II endonuclease comprises a mutation from aspartic acid to alanine at: residue 9 relative to SEQ ID NO. 70, residue 13 relative to SEQ ID NO. 71, 72 or 74, residue 12 relative to SEQ ID NO. 73, residue 17 relative to SEQ ID NO. 75, residue 23 relative to SEQ ID NO. 76 or residue 10 relative to SEQ ID NO. 597.
60. A method of editing a cytosine residue to a thymine residue in a cell, the method comprising contacting the cell with the polypeptide of any one of claims 52 to 59.
61. The method of claim 60, wherein the cell is a prokaryotic cell, a eukaryotic cell, a mammalian cell, a primate cell, or a human cell.
62. An engineered nucleic acid editing polypeptide comprising:
a plurality of domains derived from a class 2 type II endonuclease, wherein the domains comprise a RuvC-I domain, a REC domain, an HNH domain, a RuvC-III domain, and a WED domain; and
a domain comprising a base editor sequence, wherein:
(a) the base editor sequence is inserted within the RuvC-I domain;
(b) the base editor sequence is inserted within the REC domain;
(c) the base editor sequence is inserted within the HNH domain;
(d) the base editor sequence is inserted within the RuvC-III domain;
(e) the base editor sequence is inserted within the WED domain;
(f) the base editor sequence is inserted before the HNH domain;
(g) the base editor sequence is inserted before the RuvC-III domain; or
(h) the base editor sequence is inserted between the RuvC-III domain and the WED domain.
63. The engineered nucleic acid editing polypeptide of claim 62, wherein the class 2 type II endonuclease comprises a sequence that has at least 80% sequence identity to any one of SEQ ID NOs 70-78, 596, 597-598, 1120, 1122-1127, 1647, or variants thereof.
64. The engineered nucleic acid editing polypeptide according to claim 62 or 63, wherein the class 2 type II endonuclease comprises a sequence having at least 80% sequence identity to SEQ ID No. 1647 or a variant thereof.
65. The engineered nucleic acid editing polypeptide of any of claims 62 to 64, wherein the base editor sequence comprises a deaminase sequence.
66. The engineered nucleic acid editing polypeptide of claim 65 wherein the deaminase sequence has at least 80% sequence identity to any one of SEQ ID NOs 1-49, 444-447, 599-675, 744-835, 970-982, 50, 51, 385-443, 448-475, or variants thereof.
67. The engineered nucleic acid editing polypeptide of claim 66, wherein the deaminase sequence has at least 80% sequence identity to any one of SEQ ID NOs 1-49, 444-447, 599-675, 744-835, 970-982 or variants thereof.
68. The engineered nucleic acid editing polypeptide of claim 66, wherein the deaminase sequence has at least 80% sequence identity to any of SEQ ID NOs 50, 51, 385-443, 448-475, or variants thereof.
69. The engineered nucleic acid editing polypeptide of claim 66 or 68, wherein the deaminase has at least 80% sequence identity with SEQ ID NO 386 or a variant thereof.
70. The engineered nucleic acid editing polypeptide of any one of claims 66, 68, or 69, wherein, when optimally aligned, the deaminase sequence comprises a substitution at one or any combination of the following residues of SEQ ID NO: 50: T2, D7, E10, M13, W24, G32, K38, G45, G51, A63, E66, R75, C91, G93, H97, A107, E108, D109, P110, H124, A126, H129, F150 or S165.
71. The engineered nucleic acid editing polypeptide of any of claims 66 to 70, wherein the engineered nucleic acid editing polypeptide comprises a sequence having at least 80% sequence identity to any one of SEQ ID NOs 1128-1160, or variants thereof.
72. The engineered nucleic acid editing polypeptide of claim 71, wherein the engineered nucleic acid editing polypeptide comprises a sequence having at least 80% sequence identity to any one of SEQ ID NOs 1137, 1140, 1142, 1143, 1146, 1149, 1151-1158 or variants thereof.
73. The engineered nucleic acid editing polypeptide of claim 72, wherein the engineered nucleic acid editing polypeptide comprises a sequence having at least 80% sequence identity to any one of SEQ ID NOs 1139, 1152, 1158 or variants thereof.
74. A polypeptide having adenosine deaminase activity, the polypeptide comprising:
a sequence having at least 80% sequence identity to any one of SEQ ID NOs 50, 51, 385-443, 448-475 or a variant thereof,
wherein, when optimally aligned, the polypeptide comprises a substitution of the wild-type residue at residue 109 relative to SEQ ID NO: 386 with a non-wild-type residue, and at least one other substitution at any one or any combination of the following residues: 24, 37, 49, 52, 83, 85, 107, 110, 112, 120, 123, 124, 147, 148, 150, 156, 157, 158, 166, 167 or 129.
75. The polypeptide of claim 74, wherein the sequence has at least 80% sequence identity to SEQ ID No. 386.
76. The polypeptide of claim 74 or 75, wherein, when optimally aligned, the polypeptide comprises the substitution 109N relative to SEQ ID No. 386 and at least one other substitution comprising any one or any combination of: 24R, 37L, 49A, 52L, 83S, 85F, 107V, 110S, 112R, 120N, 123N, 124Y, 147C, 148Y, 148R, 150Y, 156V, 157F, 158N, 166I, or 129N.
77. The polypeptide of claim 74, comprising any of the substitutions depicted in figure 34B.
78. The polypeptide of any one of claims 74 to 77, wherein the polypeptide has at least 80% sequence identity to any one of SEQ ID NOs 1161-1183 or variants thereof.
79. The polypeptide of claim 78, wherein the polypeptide has at least 80% sequence identity to any one of SEQ ID NOs 1170, 1179 or 1166 or a variant thereof.
80. The polypeptide of any one of claims 74 to 79, wherein the polypeptide further comprises an endonuclease or a nicking enzyme.
81. The polypeptide of claim 80, wherein the polypeptide comprises the endonuclease or the nicking enzyme, wherein the endonuclease or the nicking enzyme comprises a sequence having at least 80% identity to any one of SEQ ID NOs 70-78, 596, 597-598, 1120, 1122-1127, 1647, or variants thereof.
82. The polypeptide of claim 41 or 42, wherein the polypeptide comprises the nicking enzyme, wherein the nicking enzyme comprises a mutation from aspartic acid to alanine at: residue 9 relative to SEQ ID NO. 70, residue 13 relative to SEQ ID NO. 71, 72 or 74, residue 12 relative to SEQ ID NO. 73, residue 17 relative to SEQ ID NO. 75, residue 23 relative to SEQ ID NO. 76 or residue 10 relative to SEQ ID NO. 597 or any combination thereof.
83. A polypeptide having cytosine deaminase activity, the polypeptide comprising:
A sequence having at least 80% sequence identity to any one of SEQ ID NOs 1-49, 444-447, 599-675, 744-835, 970-982 or variants thereof;
wherein the polypeptide comprises at least one of the changes described in table 12C.
84. The polypeptide of claim 83, wherein the polypeptide comprises at least one substitution of a wild-type amino acid with a non-wild-type amino acid, the substitution comprising any one or any combination of the following: W90A, W90F, W90H, W90Y, Y120F, Y120H, Y121F, Y121H, Y121Q, Y121A, Y121D, Y121W, H122Y, H122F, H122I, H122A, H122W, H122D, Y121T, R33A, R34A, R34K, H122A, R33A, R34A, R52A, N57G, H122A, E123A, E123Q, W127F, W127H, W127Q, W127A, W127D, R39A, K40A, H128A, N63G, R58A, H121F, H121Y, H121Q, H121A, H121D, H121W, R33A, K34A, H122A, H121A, R52A, P26R, P26A, N27R, N27A, W44A, W45A, K49G, S50G, R51G, R121A, I122A, N123A, Y88F, Y120F, P22R, P22A, K23A, K41R, K41A, E54A, E54A, E55A, K30A, K30R, M32A, M32K, Y117A, K118A, I119A, I119H, R120A, R121A, P46A, P46R, N29A, R27A or N50G.
85. The polypeptide of claim 83 or 84, comprising a sequence having at least 80% sequence identity to any one of SEQ ID NOs 1208-1315 or variants thereof.
86. A polypeptide having cytosine deaminase activity, the polypeptide comprising:
a cytosine deaminase sequence having at least 80% sequence identity with any one of SEQ ID NOs: 835, 1275, 668, 774, 818, 671, 667, 650, 827, 819, 823, 814, 813, 817, 628, 826, 1223, 834, 618, 621, 669, 833, 830 or a variant thereof; and
an endonuclease or a nicking enzyme.
87. The polypeptide of claim 86, wherein the endonuclease or the nicking enzyme comprises a sequence that is at least 80% identical to any one of SEQ ID NOs 70-78, 596, 597-598, 1120, 1122-1127, 1647, or a variant thereof.
88. The polypeptide of claim 87, wherein the polypeptide comprises the nicking enzyme, wherein the nicking enzyme comprises a mutation from aspartic acid to alanine at: residue 9 relative to SEQ ID NO. 70, residue 13 relative to SEQ ID NO. 71, 72 or 74, residue 12 relative to SEQ ID NO. 73, residue 17 relative to SEQ ID NO. 75, residue 23 relative to SEQ ID NO. 76 or residue 10 relative to SEQ ID NO. 597 or any combination thereof.
89. The polypeptide of any one of claims 86 to 88, wherein the cytosine deaminase sequence has at least 80% sequence identity with any one of SEQ ID NOs 1275, 835 or 774 or a combination thereof.
90. A polypeptide having adenosine deaminase activity, the polypeptide comprising:
a sequence having at least 80% sequence identity to any one of SEQ ID NOs 50, 51, 385-443, 448-475, 1015-1098 or a variant thereof;
wherein the polypeptide comprises any combination of the substitutions of wild-type residues with non-wild-type residues listed in table 12D.
91. The polypeptide of claim 90, wherein the polypeptide has at least 80% sequence identity to any one of SEQ ID NOs 1556-1638 or variants thereof.
92. The polypeptide of claim 90 or 91, wherein the polypeptide further comprises an endonuclease or a nicking enzyme.
93. The polypeptide of claim 92, wherein the polypeptide comprises the endonuclease or the nicking enzyme, wherein the endonuclease or the nicking enzyme comprises a sequence having at least 80% identity to any one of SEQ ID NOs 70-78, 596, 597-598, 1120, 1122-1127, 1647, or variants thereof.
94. The polypeptide of claim 92 or 93, wherein the polypeptide comprises the nicking enzyme, wherein the nicking enzyme comprises a mutation from aspartic acid to alanine at: residue 9 relative to SEQ ID NO. 70, residue 13 relative to SEQ ID NO. 71, 72 or 74, residue 12 relative to SEQ ID NO. 73, residue 17 relative to SEQ ID NO. 75, residue 23 relative to SEQ ID NO. 76 or residue 10 relative to SEQ ID NO. 597 or any combination thereof.
95. A polypeptide having adenosine deaminase activity, the polypeptide comprising:
a sequence having at least 80% sequence identity to any one of SEQ ID NOs 50, 51, 385-443, 448-475, 1015-1098 or a variant thereof;
wherein the polypeptide comprises any combination of the substitutions of wild-type residues with non-wild-type residues listed in table 13.
96. The polypeptide of claim 95, wherein the sequence has at least 80% sequence identity to SEQ ID No. 386 or a variant thereof.
97. The polypeptide of claim 95 or 96, wherein the polypeptide further comprises an endonuclease or a nicking enzyme.
98. The polypeptide of claim 97, wherein the polypeptide comprises the endonuclease or the nicking enzyme, wherein the endonuclease or the nicking enzyme comprises a sequence having at least 80% identity to any one of SEQ ID NOs 70-78, 596, 597-598, 1120, 1122-1127, 1647, or variants thereof.
99. The polypeptide of claim 97 or 98, wherein the polypeptide comprises the nicking enzyme, wherein the nicking enzyme comprises a mutation from aspartic acid to alanine at: residue 9 relative to SEQ ID NO. 70, residue 13 relative to SEQ ID NO. 71, 72 or 74, residue 12 relative to SEQ ID NO. 73, residue 17 relative to SEQ ID NO. 75, residue 23 relative to SEQ ID NO. 76 or residue 10 relative to SEQ ID NO. 597 or any combination thereof.
100. A method of editing an APOA1 locus in a cell, the method comprising contacting the cell with:
(a) an RNA-guided endonuclease; and
(b) an engineered guide nucleic acid structure, wherein the engineered guide nucleic acid structure is configured to form a complex with the endonuclease and comprises a spacer sequence configured to hybridize to a region of the APOA1 locus,
wherein the engineered guide nucleic acid structure comprises a targeting sequence having at least 80% sequence identity to at least 18 consecutive nucleotides of any one of SEQ ID NOs 1455-1478 or the reverse complement thereof.
101. The method of claim 100, wherein the engineered guide nucleic acid structure has at least 80% identity to any one of SEQ ID NOs 1431-1454.
102. The method of claim 100, wherein the engineered guide nucleic acid structure comprises any of the nucleotide modifications listed in table 13A.
103. The method of any one of claims 100 to 102, wherein the RNA-guided endonuclease is a class 2 type II endonuclease.
104. The method of claim 103, wherein the RNA guided endonuclease has at least 80% sequence identity to any one of SEQ ID NOs 70-78, 596, 597-598, 1120, 1122-1127, 1647 or variants thereof.
105. A method of editing an ANGPTL3 locus in a cell, the method comprising contacting the cell with:
(a) an RNA-guided endonuclease; and
(b) an engineered guide nucleic acid structure, wherein the engineered guide nucleic acid structure is configured to form a complex with the endonuclease and comprises a spacer sequence configured to hybridize to a region of the ANGPTL3 locus,
wherein the engineered guide nucleic acid structure comprises a targeting sequence having at least 80% sequence identity to at least 18 consecutive nucleotides of any one of SEQ ID NOs 1484-1488 or the reverse complement thereof.
106. The method of claim 105, wherein the engineered guide nucleic acid structure has at least 80% identity to any one of SEQ ID NOs 1479-1483.
107. The method of claim 105, wherein the engineered guide nucleic acid structure comprises any of the nucleotide modifications listed in table 13A.
108. The method of any one of claims 105 to 107, wherein the RNA-guided endonuclease is a class 2 type II endonuclease.
109. The method of claim 108, wherein the RNA guided endonuclease has at least 80% sequence identity to any one of SEQ ID NOs 70-78, 596, 597-598, 1120, 1122-1127, 1647 or variants thereof.
110. A method of editing a TRAC locus in a cell, the method comprising contacting the cell with:
(a) an RNA-guided endonuclease; and
(b) an engineered guide nucleic acid structure, wherein the engineered guide nucleic acid structure is configured to form a complex with the endonuclease and comprises a spacer sequence configured to hybridize to a region of the TRAC locus,
wherein the engineered guide nucleic acid structure comprises a targeting sequence having at least 80% sequence identity to at least 18 consecutive nucleotides of any one of SEQ ID NOs 1491-1492 or the reverse complement thereof.
111. The method of claim 110, wherein the engineered guide nucleic acid structure has at least 80% identity to any one of SEQ ID NOs 1489-1490.
112. The method of claim 111, wherein the engineered guide nucleic acid structure comprises any of the nucleotide modifications listed in table 13A.
113. The method of any one of claims 110 to 112, wherein the RNA-guided endonuclease is a class 2 type II endonuclease.
114. The method of claim 113, wherein the RNA guided endonuclease has at least 80% sequence identity to any one of SEQ ID NOs 70-78, 596, 597-598, 1120, 1122-1127, 1647 or variants thereof.
115. An engineered adenosine base editor polypeptide, wherein the polypeptide comprises a sequence having at least 80% sequence identity to any one of SEQ ID NOs 1647-1653.
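Claims 37, 40 and 84 write substitutions as wild-type residue, 1-based position, replacement residue (for example W24G), and claim 37 also lets the replacement be one of the residue classes X1 to X7. The Python sketch below shows one way such tokens could be expanded into concrete substitutions and applied to a sequence. It is illustrative only: the function names and the regular expression are hypothetical, positions are assumed to already follow the reference numbering after optimal alignment (no alignment is performed here), and the sketch says nothing about which variants are active.

```python
import re

# Residue classes X1-X7 as defined in claim 37 (one-letter amino-acid codes).
RESIDUE_CLASSES = {
    "X1": "AG",
    "X2": "DE",
    "X3": "NQ",
    "X4": "RK",
    "X5": "ILMV",
    "X6": "FYW",
    "X7": "ST",
}

# Wild-type letter, 1-based position, then a class name (X1-X7) or a single letter.
TOKEN = re.compile(r"^([A-Z])(\d+)(X[1-7]|[A-Z])$")


def expand(token: str) -> list[str]:
    """Expand a token into concrete substitutions (hypothetical helper).

    'G51X5' -> ['G51I', 'G51L', 'G51M', 'G51V']; 'W24G' -> ['W24G'].
    """
    m = TOKEN.match(token)
    if m is None:
        raise ValueError(f"unrecognised substitution: {token!r}")
    wt, pos, new = m.groups()
    targets = RESIDUE_CLASSES.get(new, new)  # a plain letter expands to itself
    return [f"{wt}{pos}{aa}" for aa in targets]


def apply_substitution(seq: str, sub: str) -> str:
    """Apply one concrete substitution (e.g. 'W24G') to a gap-free, 1-based sequence."""
    m = TOKEN.match(sub)
    if m is None or len(m.group(3)) != 1:
        raise ValueError(f"not a concrete substitution: {sub!r}")
    wt, pos, new = m.group(1), int(m.group(2)), m.group(3)
    # Refuse to edit if the position is out of range or the wild-type residue differs.
    if not 1 <= pos <= len(seq) or seq[pos - 1] != wt:
        raise ValueError(f"reference residue {wt}{pos} not found in sequence")
    return seq[: pos - 1] + new + seq[pos:]
```

For example, `expand("A63X7")` yields `['A63S', 'A63T']`. Keeping the class table separate from the parser means a different class definition can be swapped in without touching the token handling.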
CN202280074006.6A 2021-11-05 2022-11-04 Base editing enzyme Pending CN118202044A (en)

Applications Claiming Priority (7)

Application Number Priority Date Filing Date Title
US63/276,461 2021-11-05
US63/289,998 2021-12-15
US63/342,824 2022-05-17
US63/356,888 2022-06-29
US202263378171P 2022-10-03 2022-10-03
US63/378,171 2022-10-03
PCT/US2022/079345 WO2023081855A1 (en) 2021-11-05 2022-11-04 Base editing enzymes

Publications (1)

Publication Number Publication Date
CN118202044A true CN118202044A (en) 2024-06-14

Family

ID=91410382

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280074006.6A Pending CN118202044A (en) 2021-11-05 2022-11-04 Base editing enzyme

Country Status (1)

Country Link
CN (1) CN118202044A (en)

Legal Events

Date Code Title Description
PB01 Publication