CN116635524A - DNA modifying enzyme, active fragment and variant thereof and using method - Google Patents

DNA modifying enzyme, active fragment and variant thereof and using method

Info

Publication number
CN116635524A
Authority
CN
China
Prior art keywords
sequence
polypeptide
rgn
fusion protein
seq
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202180075570.5A
Other languages
Chinese (zh)
Inventor
T·D·博文
A·B·克拉维利
T·D·埃里驰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Life Editor Pharmaceutical Co ltd
Original Assignee
Life Editor Pharmaceutical Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Life Editor Pharmaceutical Co ltd
Priority claimed from PCT/US2021/049853 (WO2022056254A2)
Publication of CN116635524A
Legal status: Pending

Landscapes

  • Enzymes And Modification Thereof (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)

Abstract

Compositions and methods are provided that include novel deaminase polypeptides for targeted editing of nucleic acids. The compositions include deaminase polypeptides. Fusion proteins comprising the DNA binding polypeptides and deaminase of the present application are also provided. The fusion protein comprises an RNA-guided nuclease fused to a deaminase, optionally complexed with a guide RNA. Compositions also include nucleic acid molecules encoding deaminase or fusion proteins. Vectors and host cells comprising nucleic acid molecules encoding deaminase or fusion proteins are also provided.

Description

DNA modifying enzyme, active fragment and variant thereof and using method
Cross-reference to related applications
The present application claims priority to U.S. Provisional Application No. 63/077,089, filed September 11, 2020, and U.S. Provisional Application No. 63/146,840, filed February 8, 2021, each of which is incorporated herein by reference in its entirety.
[ Statement Regarding Sequence Listing ]
The sequence listing associated with the present application is provided in ASCII format in lieu of a paper copy and is incorporated herein by reference. The ASCII copy is named L103438_1230wo_0108_1_sl.txt, is 1,071,246 bytes in size, was created at 9/2021, and was submitted electronically via EFS-Web.
[ Technical Field ]
The present invention relates to the field of molecular biology and gene editing.
[ Background Art ]
Targeted genome editing or modification is rapidly becoming an important tool for basic and applied research. Initial approaches involved engineered nucleases such as meganucleases, zinc finger fusion proteins, or TALENs, which required producing a chimeric nuclease with an engineered, programmable, sequence-specific DNA-binding domain for each particular target sequence. RNA-guided nucleases (RGNs), such as the clustered regularly interspaced short palindromic repeats (CRISPR)-associated (Cas) proteins of CRISPR-Cas bacterial systems, allow specific sequences to be targeted by complexing the nuclease with a guide RNA that specifically hybridizes to the particular target sequence. Generating target-specific guide RNAs is less costly and more efficient than generating a chimeric nuclease for each target sequence. Such RNA-guided nucleases can be used to edit a genome by introducing sequence-specific double-strand breaks that are repaired via error-prone non-homologous end joining (NHEJ), which introduces mutations at specific genomic locations.
In addition, RGNs can be used for targeted DNA editing approaches. Targeted editing of nucleic acid sequences, for example targeted cleavage that allows specific modifications to be introduced into genomic DNA, enables highly nuanced approaches to studying gene function and gene expression. RGNs can also be used to generate chimeric proteins that combine the RNA-guided activity of the RGN with a DNA modifying enzyme, such as a deaminase, for targeted base editing. Targeted editing can be deployed to target genetic diseases in humans or to introduce agronomically beneficial mutations into crop genomes. The development of genome editing tools provides new approaches to gene editing-based mammalian therapeutics and agricultural biotechnology.
[ invention ]
Compositions and methods for modifying a target DNA molecule are provided. The compositions can be used to modify a target DNA molecule of interest. Compositions provided include deaminase polypeptides. Fusion proteins comprising a nucleic acid molecule binding polypeptide (e.g., a DNA binding polypeptide) and a deaminase polypeptide, and ribonucleoprotein complexes comprising fusion proteins comprising an RNA-guided nuclease and a deaminase polypeptide and ribonucleic acids are also provided. The provided compositions also include nucleic acid molecules encoding deaminase polypeptides or fusion proteins, and vectors and host cells comprising the nucleic acid molecules. The methods disclosed herein are designed to bind to and modify a target sequence of interest within a target DNA molecule of interest.
[ detailed description of the invention ]
Many modifications and other embodiments of the inventions set forth herein will come to mind to one skilled in the art to which these inventions pertain having the benefit of the teachings presented in the foregoing descriptions. Therefore, it is to be understood that the inventions are not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.
I. Summary of the invention
The present disclosure provides novel adenine deaminases and fusion proteins that include a nucleic acid molecule-binding polypeptide (such as a DNA-binding polypeptide) and a novel deaminase polypeptide. In certain embodiments, the DNA-binding polypeptide is a sequence-specific DNA-binding polypeptide, in that it binds to a sequence of interest at a higher frequency than it binds to a randomized background sequence. In some embodiments, the DNA-binding polypeptide is or is derived from a meganuclease, a zinc finger fusion protein, or a TALEN. In some embodiments, the fusion protein comprises an RNA-guided DNA-binding polypeptide and a deaminase polypeptide. In some embodiments, the RNA-guided DNA-binding polypeptide is an RNA-guided nuclease, such as a Cas9 polypeptide domain that binds to a guide RNA (also referred to as a gRNA), which in turn binds to a target nucleic acid sequence via strand hybridization.
The deaminase polypeptides disclosed herein can deaminate a nucleobase, such as adenine. Deamination of a nucleobase by a deaminase can result in a point mutation at the corresponding residue, which is referred to herein as "nucleic acid editing" or "base editing". Thus, fusion proteins comprising an RNA-guided nuclease (RGN) polypeptide and a deaminase may be used for targeted editing of nucleic acid sequences.
Such fusion proteins are useful for in vitro targeted editing of DNA, e.g., for the production of genetically modified cells. Such genetically modified cells may be plant cells or animal cells. Such fusion proteins may also be useful for the introduction of targeted mutations, e.g., for correcting gene defects in mammalian cells ex vivo, such as in cells obtained from a subject that are subsequently reintroduced into the same or another subject; and can be used for targeted mutation introduction, such as correction of gene defects or introduction of inactivating mutations in disease-associated genes in mammalian subjects. Such fusion proteins may also be useful for the introduction of targeted mutations in plant cells, for example, for the introduction of beneficial or agronomically valuable traits or alleles.
The terms "protein," "peptide," and "polypeptide" are used interchangeably herein and refer to a polymer of amino acid residues linked together by peptide (amide) bonds. The terms refer to a protein, peptide, or polypeptide of any size, structure, or function. Typically, a protein, peptide, or polypeptide will be at least three amino acids in length. A protein, peptide, or polypeptide may refer to an individual protein or a collection of proteins. One or more of the amino acids in a protein, peptide, or polypeptide may be modified, for example, by the addition of a chemical entity such as a carbohydrate group, a hydroxyl group, a phosphate group, a farnesyl group, an isofarnesyl group, a fatty acid group, or a linker for conjugation, functionalization, or other modification. A protein, peptide, or polypeptide may be a single molecule or a multi-molecular complex. A protein, peptide, or polypeptide may be a fragment of a naturally occurring protein or peptide. A protein, peptide, or polypeptide may be naturally occurring, recombinant, synthetic, or any combination thereof.
Any of the proteins provided herein can be produced by any method known in the art. For example, the proteins provided herein can be produced via recombinant protein expression and purification, which is particularly suitable for fusion proteins comprising a peptide linker. Methods for recombinant protein expression and purification are well known and include those described by Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)), which is incorporated herein by reference in its entirety.
Deaminase enzyme
The term "deaminase" refers to an enzyme that catalyzes a deamination reaction. The deaminase of the present invention is a nucleobase deaminase and the terms "deaminase" and "nucleobase deaminase" are used interchangeably herein. The deaminase may be a naturally occurring deaminase enzyme or an active fragment or variant thereof. Deaminase may be active on single stranded nucleic acids such as ssDNA or ssRNA or on double stranded nucleic acids such as dsDNA or dsRNA. In some embodiments, deaminase only deaminates ssDNA, but has no effect on dsDNA.
The presently disclosed methods and compositions include adenine deaminases. In some embodiments, the deaminase is an ADAT family deaminase or a variant thereof. Deamination of adenine, adenosine, or deoxyadenosine yields inosine, which is treated as guanine by polymerases. To date, no naturally occurring adenine deaminase that deaminates adenine in DNA is known. Several methods have been employed to evolve and optimize adenine deaminase acting on tRNA (ADAT) proteins in mammalian cells (Gaudelli et al., 2017; Koblan, L.W. et al., 2018, Nat Biotechnol 36, 843-846; Richter, M.F. et al., 2020, Nat Biotechnol, doi:10.1038/s41587-020-0562-8, each of which is incorporated herein by reference in its entirety). One such method uses a bacterial selection assay in which only cells able to activate antibiotic resistance through an A:T>G:C transition survive.
The present invention relates to novel adenine deaminase polypeptides produced by evolution and optimization of bacterial deaminases. The novel adenine deaminases are presently disclosed and set forth as SEQ ID NOs: 1-10 and 399-441. The deaminases of the present invention may be useful for editing DNA or RNA molecules. In some embodiments, the deaminases of the present invention may be useful for editing ssDNA or ssRNA molecules. The adenine deaminases described herein are useful as deaminases alone or as a component of a fusion protein. Fusion proteins comprising a DNA-targeting polypeptide and an adenine deaminase polypeptide are referred to herein as "A-base editors," "adenine base editors," or "ABEs," and may be useful for targeted editing of nucleic acid sequences.
A "base editor" is a fusion protein comprising a DNA-targeting polypeptide (such as an RGN) and a deaminase. Adenine base editors (ABEs) include a DNA-targeting polypeptide, such as an RGN, and an adenine deaminase. An ABE acts on a DNA target molecule by deaminating adenine to form inosine (Gaudelli, N.M. et al., 2017). Inosine is recognized as guanine by polymerases, which allows cytosine to be incorporated in the complementary DNA strand across from the inosine. After deamination and one round of replication, the resulting A:T to G:C base pair change is present in the genome. In some embodiments, the presently disclosed adenine deaminases, or active variants or fragments thereof, introduce A>N mutations into a DNA molecule, wherein N is C, G, or T. In further embodiments, they introduce A>G mutations into a DNA molecule.
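The deamination and replication steps described above can be sketched as a toy string model. This is only an illustration under simplifying assumptions: strands are plain strings, "I" stands for inosine, and the function names and the example sequence are hypothetical and do not come from the application.

```python
# Toy model only: strings stand in for DNA strands and "I" marks inosine.
# Inosine is paired with cytosine, i.e. it is read like guanine.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "I": "C"}


def deaminate(strand, positions):
    """Convert adenine to inosine at the given 0-based positions.

    Positions that do not hold an adenine are left unchanged.
    """
    bases = list(strand)
    for pos in positions:
        if bases[pos] == "A":
            bases[pos] = "I"
    return "".join(bases)


def copy_strand(template):
    """Return the complementary strand, written 5'->3' (reverse complement)."""
    return "".join(COMPLEMENT[base] for base in reversed(template))


def edit_outcome(strand, positions):
    """Return the strand sequence after deamination and two template copies."""
    deaminated = deaminate(strand, positions)
    new_complement = copy_strand(deaminated)  # C is placed across from inosine
    return copy_strand(new_complement)        # G now occupies the former A site


original = "GATTACA"  # hypothetical sequence
edited = edit_outcome(original, positions=[4])
print(original, "->", edited)
```

Copying the deaminated strand places a cytosine across from the inosine, and copying that new strand in turn places a guanine at the original adenine position, which is the A:T to G:C change described in the text.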
In those embodiments in which the deaminase has been targeted to a particular region of a nucleic acid molecule by fusion with a DNA-binding polypeptide, the mutation rate of adenines within or adjacent to the target sequence to which the DNA-binding polypeptide binds can be measured using methods known in the art, including polymerase chain reaction (PCR), restriction fragment length polymorphism (RFLP) analysis, or DNA sequencing.
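As one way to picture a sequencing-based measurement, the sketch below computes the fraction of reads that carry G at a reference adenine. It assumes the reads are already aligned to the reference with no insertions or deletions; the function name and the read data are hypothetical and serve only as an illustration, not as the assay used in the application.

```python
def a_to_g_frequency(reference, reads, position):
    """Fraction of informative reads with G at a reference A (0-based position).

    Assumes every read is already aligned to the reference without indels.
    Reads that do not cover the position, or that carry an ambiguous base
    there, are ignored.
    """
    if reference[position] != "A":
        raise ValueError("reference base at this position is not A")
    informative = [
        read[position]
        for read in reads
        if len(read) > position and read[position] in "ACGT"
    ]
    if not informative:
        return 0.0
    return informative.count("G") / len(informative)


# Hypothetical aligned reads covering a target site
reference = "GATTACA"
reads = ["GATTGCA", "GATTACA", "GATTGCA", "GATTNCA"]
frequency = a_to_g_frequency(reference, reads, position=4)
print(f"A>G frequency at position 4: {frequency:.2f}")
```

In practice, reads would come from amplicon sequencing of the target region and would need alignment and quality filtering first; those steps are outside this sketch.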
The novel deaminases of the present disclosure, or active variants or fragments thereof that retain deaminase activity, may be introduced into a cell as part of a DNA-binding polypeptide-deaminase fusion and/or may be co-expressed with such a fusion to increase the efficiency of introducing the desired A>G mutation into a target DNA molecule. The presently disclosed deaminases have the amino acid sequence of any one of SEQ ID NOs: 1-10 and 399-441, or a variant or fragment thereof that retains deaminase activity. In some embodiments, the deaminase has an amino acid sequence that is at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to the amino acid sequence of any one of SEQ ID NOs: 1-10 and 399-441. In particular embodiments, the deaminase comprises the amino acid sequence of any one of SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441.
In some embodiments, the deaminase comprises an amino acid sequence having at least 80% sequence identity to SEQ ID NO: 407. For example, the deaminase comprises an amino acid sequence that is at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to SEQ ID NO: 407. In some embodiments, the deaminase comprises an amino acid sequence that is at least 80% identical, at least 90% identical, at least 95% identical, at least 96% identical, at least 97% identical, at least 98% identical, at least 99% identical, at least 99.5% identical, or at least 99.9% identical to SEQ ID NO: 407. In some embodiments, the deaminase comprises SEQ ID NO: 407.
In some embodiments, the deaminase comprises an amino acid sequence having at least 80% sequence identity to SEQ ID NO: 399. For example, the deaminase comprises an amino acid sequence that is at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to SEQ ID NO: 399. In some embodiments, the deaminase comprises an amino acid sequence that is at least 80% identical, at least 90% identical, at least 95% identical, at least 96% identical, at least 97% identical, at least 98% identical, at least 99% identical, at least 99.5% identical, or at least 99.9% identical to SEQ ID NO: 399. In some embodiments, the deaminase comprises SEQ ID NO: 399.
In some embodiments, the deaminase comprises an amino acid sequence having at least 80% sequence identity to SEQ ID NO: 405. For example, the deaminase comprises an amino acid sequence that is at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to SEQ ID NO: 405. In some embodiments, the deaminase comprises an amino acid sequence that is at least 80% identical, at least 90% identical, at least 95% identical, at least 96% identical, at least 97% identical, at least 98% identical, at least 99% identical, at least 99.5% identical, or at least 99.9% identical to SEQ ID NO: 405. In some embodiments, the deaminase comprises SEQ ID NO: 405.
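To make the percent-identity thresholds above concrete, the following sketch computes identity between two sequences that have already been aligned. It is an assumption-laden illustration: this passage does not specify the alignment method or the denominator, so the sketch simply counts identical residues over all aligned columns, with "-" as the gap character. The function names and the short peptide strings are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences ("-" marks a gap).

    Identity is counted here as identical residues divided by the number of
    aligned columns. Other conventions (e.g. dividing by the shorter
    sequence length) exist and give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a, aligned_b, threshold):
    """True if the pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical aligned peptide fragments, not sequences from the listing
print(percent_identity("MKT-AL", "MKTGAV"))
print(meets_threshold("MKT-AL", "MKTGAV", 80.0))
```

A real comparison against a sequence such as SEQ ID NO: 407 would first require a global or local alignment produced by an established alignment tool; only the counting step is shown here.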
Nucleic acid molecule binding polypeptides
Some aspects of the disclosure provide fusion proteins comprising a nucleic acid molecule binding polypeptide and a deaminase polypeptide. Although binding to RNA molecules and targeted editing of RNA molecules are contemplated by the present invention, in some embodiments, the nucleic acid molecule binding polypeptide of the fusion protein is a DNA binding polypeptide. Such fusion proteins can be used for targeted editing of DNA in vitro, ex vivo, or in vivo. The novel fusion proteins are active in mammalian cells and can be used for targeted editing of DNA molecules.
The term "fusion protein" as used herein is meant to include protein domains from at least two different proteins. The fusion protein may include more than one distinct domain, for example, a DNA binding domain and a deaminase. In some embodiments, the fusion protein is a complex with a nucleic acid (e.g., RNA) or is associated with a nucleic acid (e.g., RNA).
In some embodiments, the fusion proteins of the present disclosure include a DNA-binding polypeptide. The term "DNA-binding polypeptide" as used herein refers to any polypeptide capable of binding to DNA. In certain embodiments, the DNA-binding polypeptide portion of the fusion proteins of the present disclosure binds to double-stranded DNA. In certain embodiments, the DNA-binding polypeptide binds to DNA in a sequence-specific manner. The term "sequence-specific" or "sequence-specific manner" as used herein refers to selective interaction with a specific nucleotide sequence.
Two polynucleotide sequences are considered substantially complementary when they hybridize to each other under stringent conditions. Likewise, a DNA-binding polypeptide is considered to bind to a particular sequence of interest in a sequence-specific manner if it binds to that sequence under stringent conditions. "Stringent conditions" or "stringent hybridization conditions" refer to conditions under which two polynucleotide sequences will bind to each other (or a polypeptide will bind to its specific target sequence) to a detectably greater degree than to other sequences (e.g., at least 2-fold over background). Stringent conditions are sequence-dependent and will differ in different circumstances. Typically, stringent conditions are those in which the salt concentration is less than about 1.5 M Na ion, typically about 0.01 to 1.0 M Na ion concentration (or other salts), at pH 7.0 to 8.3, and the temperature is at least about 30 °C for short sequences (e.g., 10 to 50 nucleotides) and at least about 60 °C for long sequences (e.g., greater than 50 nucleotides). Stringent conditions may also be achieved by adding destabilizing agents such as formamide. Exemplary low stringency conditions include hybridization in a buffer solution of 30 to 35% formamide, 1 M NaCl, 1% SDS (sodium dodecyl sulfate) at 37 °C and a wash in 1X to 2X SSC (20X SSC = 3.0 M NaCl/0.3 M trisodium citrate) at 50 to 55 °C. Exemplary moderate stringency conditions include hybridization in 40 to 45% formamide, 1.0 M NaCl, 1% SDS at 37 °C and a wash in 0.5X to 1X SSC at 55 to 60 °C. Exemplary high stringency conditions include hybridization in 50% formamide, 1 M NaCl, 1% SDS at 37 °C and a wash in 0.1X SSC at 60 to 65 °C. Optionally, wash buffers may comprise about 0.1% to about 1% SDS. The duration of hybridization is generally less than about 24 hours, usually about 4 to about 12 hours. The duration of the wash is at least a length of time sufficient to reach equilibrium.
The Tm is the temperature (under defined ionic strength and pH) at which 50% of a complementary target sequence hybridizes to a perfectly matched sequence. For DNA-DNA hybrids, the Tm can be approximated from the equation of Meinkoth and Wahl (1984) Anal. Biochem. 138:267-284: Tm = 81.5 °C + 16.6 (log M) + 0.41 (%GC) - 0.61 (% form) - 500/L; where M is the molarity of monovalent cations, %GC is the percentage of guanosine and cytosine nucleotides in the DNA, % form is the percentage of formamide in the hybridization solution, and L is the length of the hybrid in base pairs. Generally, stringent conditions are selected to be about 5 °C lower than the thermal melting point (Tm) for the specific sequence and its complement at a defined ionic strength and pH. However, extremely stringent conditions can use hybridization and/or a wash at 1, 2, 3, or 4 °C lower than the Tm; moderately stringent conditions can use hybridization and/or a wash at 6, 7, 8, 9, or 10 °C lower than the Tm; and low stringency conditions can use hybridization and/or a wash at 11, 12, 13, 14, 15, or 20 °C lower than the Tm. Using the equation, the hybridization and wash compositions, and the desired Tm, those of ordinary skill in the art will understand that variations in the stringency of hybridization and/or wash solutions are inherently described. An extensive guide to the hybridization of nucleic acids is found in Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology-Hybridization with Nucleic Acid Probes, Part I, Chapter 2 (Elsevier, New York); and Ausubel et al., eds. (1995) Current Protocols in Molecular Biology, Chapter 2 (Greene Publishing and Wiley-Interscience, New York). See also Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual (2d ed., Cold Spring Harbor Laboratory Press, Plainview, New York).
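The Tm approximation above lends itself to a short worked calculation. The sketch below evaluates the stated equation, taking "log" as the base-10 logarithm, and derives a hybridization or wash temperature a chosen number of degrees below the Tm. The function names and the input values are illustrative assumptions rather than conditions taken from the application.

```python
import math


def melting_temperature(monovalent_molar, percent_gc, percent_formamide, length_bp):
    """Approximate Tm (deg C) of a DNA-DNA hybrid from the equation in the text.

    Tm = 81.5 + 16.6*log10(M) + 0.41*(%GC) - 0.61*(% formamide) - 500/L
    """
    if monovalent_molar <= 0 or length_bp <= 0:
        raise ValueError("molarity and hybrid length must be positive")
    return (
        81.5
        + 16.6 * math.log10(monovalent_molar)
        + 0.41 * percent_gc
        - 0.61 * percent_formamide
        - 500.0 / length_bp
    )


def wash_temperature(tm, degrees_below=5.0):
    """Temperature a chosen number of degrees below the Tm.

    The text selects about 5 deg C below the Tm for stringent conditions,
    with smaller or larger offsets for other stringency levels.
    """
    return tm - degrees_below


# Illustrative inputs: 1 M monovalent cation, 50% GC, 50% formamide, 500 bp hybrid
tm = melting_temperature(1.0, 50.0, 50.0, 500)
print(f"Tm = {tm:.1f} C; stringent wash near {wash_temperature(tm):.1f} C")
```

With these example inputs the salt term vanishes because log10(1.0) is zero, so the result is driven by GC content, formamide concentration, and hybrid length.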
In certain embodiments, the sequence-specific DNA-binding polypeptide is an RNA-guided DNA-binding polypeptide (RGDBP). The terms "RNA-guided DNA-binding polypeptide" and "RGDBP" as used herein refer to polypeptides that are capable of binding to DNA by hybridization of a cognate RNA molecule to a target DNA sequence.
In some embodiments, the DNA-binding polypeptide of the fusion protein is a nuclease, such as a sequence-specific nuclease. The term "nuclease" as used herein refers to an enzyme that catalyzes the cleavage of phosphodiester bonds between nucleotides in a nucleic acid molecule. In some embodiments, the DNA-binding polypeptide is an endonuclease, which is capable of cleaving phosphodiester bonds between nucleotides within a nucleic acid molecule, while in certain embodiments, the DNA-binding polypeptide is an exonuclease, which is capable of cleaving nucleotides at either end (5' or 3') of a nucleic acid molecule. In some embodiments, the sequence-specific nuclease is selected from the group consisting of a meganuclease, a zinc finger nuclease, a TAL effector DNA-binding domain-nuclease fusion protein (TALEN), and an RNA-guided nuclease (RGN), or a variant thereof in which nuclease activity has been reduced or inhibited.
The term "meganuclease" or "homing endonuclease" as used herein refers to an endonuclease that binds a recognition site of 12 to 40 bp in length within double-stranded DNA. Non-limiting examples of meganucleases are those belonging to the LAGLIDADG family, which comprise the conserved amino acid motif LAGLIDADG (SEQ ID NO: 49). The term "meganuclease" may refer to a dimeric or a single-chain meganuclease.
The term "zinc finger nuclease" or "ZFN" as used herein refers to a chimeric protein comprising a zinc finger DNA binding domain and a nuclease domain.
The term "TAL effector DNA binding domain nuclease fusion protein" or "TALEN" as used herein refers to a chimeric protein comprising a TAL effector DNA binding domain and a nuclease domain.
The term "RNA-guided nuclease" or "RGN" as used herein refers to an RNA-guided DNA-binding polypeptide having nuclease activity. An RGN is considered "RNA-guided" because a guide RNA forms a complex with the RNA-guided nuclease to direct it to bind to a target sequence and, in some embodiments, to introduce a single-strand or double-strand break at the target sequence. The RGN can be CasX, CasY, C2c1, C2c2, C2c3, GeoCas9, SpCas9, SaCas9, Nme2Cas9, CjCas9, Cas12a (previously referred to as Cpf1), Cas12b, Cas12g, Cas12h, Cas12i, aLbCas12a, AsCas12a, CasMINI, Cas13b, Cas13c, Cas13d, Cas14, Csn2, xCas9, SpCas9-NG, LbCas12a, AsCas12a, Cas9-KKH, circularly permuted Cas9, Argonaute (Ago), SmacCas9, a Spy-macCas9 domain, or an RGN having the amino acid sequence set forth in any one of SEQ ID NOs: 41, 60, 366, or 368. In some embodiments, as described below, the RGN provided herein is an RGN nickase.
According to the present invention, an RGN protein that has been mutated to become nuclease-inactive or "dead" (such as dCas9) may be referred to as an RNA-guided DNA-binding polypeptide, a nuclease-inactive RGN, or a nuclease-dead RGN. In addition, suitable nuclease-inactive domains of other known RNA-guided nucleases (RGNs) can be determined (e.g., a nuclease-inactive variant of the RGN APG08290.1 disclosed in U.S. Patent Publication No. 2019/0367949, the entire contents of which are incorporated herein by reference).
In some embodiments, the fusion protein comprises an RGN fused to a deaminase described herein. In those embodiments of the fusion protein, the deaminase has an amino acid sequence with at least 80% sequence identity to any one of SEQ ID NOs: 1-10 and 399-441. In some embodiments, the deaminase comprises an amino acid sequence having at least 80% sequence identity to SEQ ID NO: 407. In some embodiments, the deaminase comprises an amino acid sequence having at least 80% sequence identity to SEQ ID NO: 399. In some embodiments, the deaminase comprises an amino acid sequence having at least 80% sequence identity to SEQ ID NO: 405. In those embodiments of the fusion protein, the RGN is CasX, CasY, C2c1, C2c2, C2c3, GeoCas9, SpCas9, SaCas9, Nme2Cas9, CjCas9, Cas12a (previously referred to as Cpf1), Cas12b, Cas12g, Cas12h, Cas12i, aLbCas12a, AsCas12a, CasMINI, Cas13b, Cas13c, Cas13d, Cas14, Csn2, xCas9, SpCas9-NG, LbCas12a, AsCas12a, Cas9-KKH, circularly permuted Cas9, Argonaute (Ago), SmacCas9, a Spy-macCas9 domain, or an RGN having the amino acid sequence set forth in any one of SEQ ID NOs: 41, 60, 366, or 368. In a particular embodiment, the fusion protein comprises a Cas9 nickase fused to a deaminase having an amino acid sequence with at least 80% sequence identity to SEQ ID NO: 407. In some embodiments, the fusion protein comprises a Cas9 nickase fused to a deaminase having an amino acid sequence with at least 80% sequence identity to SEQ ID NO: 399. In a particular embodiment, the fusion protein comprises a Cas9 nickase fused to a deaminase having an amino acid sequence with at least 80% sequence identity to SEQ ID NO: 405. The Cas9 nickase may be any Cas9 nickase disclosed in PCT Patent Publication No. WO2020181195, the entire contents of which are incorporated herein by reference.
The term "RGN polypeptide" encompasses RGN polypeptides that cleave only a single strand of a target nucleotide sequence, which are referred to herein as nickases. Such RGNs have a single functional nuclease domain. An RGN nickase may be a naturally occurring nickase, or may be an RGN protein that naturally cleaves both strands of a double-stranded nucleic acid molecule and that has been mutated within one or more nuclease domains such that the nuclease activity of the mutated domains is reduced or eliminated. In some embodiments, the nickase RGN of the fusion protein comprises a mutation (e.g., a D10A mutation) that renders the RGN capable of cleaving only the non-base-edited target strand of a nucleic acid duplex (the strand that comprises the PAM and base pairs with the gRNA). The D10A mutation mutates the first aspartic acid residue in the split RuvC nuclease domain of the RGN. The present application discloses several D10A nickase variants, or homologous nickase variants, of the described RGNs (see Example 4). nAPG07433.1 and nAPG08290.1 (set forth as SEQ ID NOs: 42 and 61, respectively) are nickase variants of APG07433.1 and APG08290.1, which are set forth as SEQ ID NOs: 41 and 60, respectively, and are described in WO 2019/236566 (the entire contents of which are incorporated herein by reference). nAPG00969 (set forth as SEQ ID NO: 52) and nAPG09748 (set forth as SEQ ID NO: 54) are nickase variants of APG00969 and APG09748, respectively, which are described in WO 2020/139783 (the entire contents of which are incorporated herein by reference). nAPG06646 (set forth as SEQ ID NO: 53) and nAPG09882 (set forth as SEQ ID NO: 55) are nickase variants of APG06646 and APG09882, respectively, which are described in PCT Publication No. WO 2021/030344, the entire contents of which are incorporated herein by reference. nAPG03850, nAPG07553, nAPG055886, and nAPG01604 are set forth as SEQ ID NOs: 56-59, respectively, and are nickase variants of APG03850, APG07553, APG055886, and APG01604, which are described in pending PCT Application No. PCT/US2021/028843, the entire contents of which are incorporated herein by reference. Various RGN nickases, variants thereof, and their sequences are disclosed in PCT Patent Publication No. WO2020181195, which is incorporated herein by reference in its entirety. One exemplary suitable nuclease-inactive Cas9 is a D10A/H840A Cas9 mutant (see, e.g., Qi et al., Cell. 2013; 152(5): 1173-83, the entire contents of which are incorporated herein by reference).
In some embodiments, the nickase RGN of the fusion protein comprises a mutation (e.g., an H840A mutation) that renders the RGN capable of cleaving only the base-edited, non-target strand of a nucleic acid duplex (the strand that does not comprise the PAM and does not base pair with the gRNA). The H840A mutation mutates the first histidine of the HNH nuclease domain. A nickase RGN comprising an H840A mutation, or an equivalent mutation, has an inactivated HNH domain and cleaves the non-target strand. A nickase comprising a D10A mutation, or an equivalent mutation, has an inactivated RuvC nuclease domain and cleaves the target strand. The D10A nickase is unable to cleave the non-target strand of the DNA, i.e., the strand on which base editing is desired.
Additional exemplary suitable nuclease-inactive Cas9 domains include, but are not limited to, D10A/D839A/H840A and D10A/D839A/H840A/N863A mutant domains (see, e.g., Mali et al., Nature Biotechnology. 2013; 31(9): 833-838, the entire contents of which are incorporated herein by reference). Based on this disclosure and knowledge in the art (e.g., the RGNs disclosed in PCT Publication Nos. WO 2019/236566 and WO 2020/181195, the entire contents of each of which are incorporated herein by reference), additional suitable RGN proteins mutated to be nicking enzymes will be apparent to those of skill in the art and are within the scope of this disclosure. In a preferred embodiment, an RGN with nicking enzyme activity on the target strand nicks the target strand, while the complementary non-target strand is modified by the deaminase. Using the modified non-target strand as a template, the DNA repair machinery of the cell may repair the nicked target strand, thereby introducing a mutation into the DNA.
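The nick-and-repair outcome described above can be illustrated with a toy string model. This is a sketch only and is not part of the disclosed sequences or methods: it assumes an adenine deaminase acting on the non-target strand, treats the resulting inosine as G (as it is read during repair and replication), and uses a hypothetical editing window given as 0-based, half-open indices. The function names and the example sequence are invented for illustration.

```python
from typing import Tuple

# Watson-Crick complements for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def simulate_adenine_base_edit(non_target: str, window: Tuple[int, int]) -> Tuple[str, str]:
    """Model deamination of A on the non-target strand inside `window`.

    `window` is a hypothetical (start, end) pair, 0-based and half-open.
    Deaminated adenine (inosine) is written as G. The nicked target strand
    is then "repaired" using the edited non-target strand as the template,
    so it becomes the reverse complement of the edited strand.
    """
    start, end = window
    edited = "".join(
        "G" if start <= i < end and base == "A" else base
        for i, base in enumerate(non_target)
    )
    repaired_target = reverse_complement(edited)
    return edited, repaired_target


if __name__ == "__main__":
    edited, repaired = simulate_adenine_base_edit("TTACGATT", (2, 6))
    print(edited)    # TTGCGGTT
    print(repaired)  # AACCGCAA
```

In this model the A-to-G change on the non-target strand becomes a T-to-C change on the repaired target strand, which is the A•T to G•C conversion expected from adenine base editing.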
In some embodiments, the RGN nickase that retains nickase activity comprises an amino acid sequence that is at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to any one of SEQ ID NOs: 42, 52-59, 61, 397, and 398.
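A percent-identity threshold of this kind can be checked computationally. The following is a minimal sketch, assuming a simple global (Needleman-Wunsch) alignment with illustrative scoring values (match +1, mismatch -1, gap -1) and defining identity as identical positions divided by alignment length. These choices are assumptions made for illustration; the disclosure does not specify them here, and established alignment tools would normally be used in practice. The short peptide strings are placeholders, not sequences from the sequence listing.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions as a percentage of alignment length."""
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    aligned_a, aligned_b = global_align(a, b)
    identical = sum(x == y for x, y in zip(aligned_a, aligned_b) if x != "-")
    return 100.0 * identical / len(aligned_a)


def meets_identity_threshold(candidate: str, reference: str, threshold: float) -> bool:
    """True if the candidate is at least `threshold` percent identical."""
    return percent_identity(candidate, reference) >= threshold


if __name__ == "__main__":
    reference = "MKTAYIAKQR"  # placeholder peptide
    variant = "MKTAYLAKQR"    # one substitution relative to the placeholder
    print(percent_identity(reference, variant))                # 90.0
    print(meets_identity_threshold(variant, reference, 95.0))  # False
```

Other identity definitions (for example, dividing by the length of the shorter sequence) give different values for the same pair, so the definition should be fixed before comparing against a threshold.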
Any method known in the art for introducing mutations into amino acid sequences, such as PCR-mediated mutagenesis and site-directed mutagenesis, can be used to generate RGNs that are nicking enzymes or that lack nuclease activity (nuclease-dead RGNs). See, for example, U.S. Publication No. 2014/0068797 and U.S. Patent No. 9,790,490, the entire contents of each of which are incorporated herein by reference. RNA-guided nucleases (RGNs) allow targeted manipulation of a single site within a genome and are useful in therapeutic and research applications in the context of gene targeting. RNA-guided nucleases have been used for genome engineering in a variety of organisms, including mammals, by stimulating non-homologous end joining or homologous recombination. RGNs include CRISPR-Cas proteins, which are RNA-guided nucleases that are directed to a target sequence by a guide RNA (gRNA) as part of a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) RNA-guided nuclease system, and active variants or fragments thereof.
Further provided herein are RGN polypeptides (and nucleic acid molecules encoding the RGN polypeptides) comprising the amino acid sequence set forth as SEQ ID NO: 41 or 60, or an active variant or fragment thereof, but lacking amino acid residues 590 to 597 of SEQ ID NO: 41 or 60. In certain embodiments, the RGN polypeptide comprises the amino acid sequence set forth as SEQ ID NO: 366, 368, 397, or 398, or an active variant or fragment thereof.
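The deletion of residues 590 to 597 can be expressed as a simple string operation. The sketch below assumes that residue positions are 1-based and that both endpoints are included, which is the usual convention for protein numbering; the function name and the placeholder sequence are hypothetical and do not reproduce SEQ ID NO: 41 or 60.

```python
def delete_residues(sequence: str, first: int, last: int) -> str:
    """Remove residues first..last (1-based, inclusive) from a protein sequence."""
    if not 1 <= first <= last <= len(sequence):
        raise ValueError("residue range falls outside the sequence")
    # Python slices are 0-based and half-open, hence first - 1 and last.
    return sequence[: first - 1] + sequence[last:]


if __name__ == "__main__":
    # Placeholder: 589 residues, an 8-residue segment, then 100 more residues.
    placeholder = "A" * 589 + "CDEFGHIK" + "L" * 100
    shortened = delete_residues(placeholder, 590, 597)
    print(len(placeholder) - len(shortened))  # 8
```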
Some aspects of this disclosure provide fusion proteins that include an RNA-guided DNA-binding polypeptide and a deaminase polypeptide (specifically, an adenine deaminase polypeptide). In some embodiments, the RNA-guided DNA-binding polypeptide is an RNA-guided nuclease. In further embodiments, the RNA-guided nuclease is a naturally occurring CRISPR-Cas protein or an active variant or fragment thereof. CRISPR-Cas systems are classified as class 1 or class 2 systems. Class 2 systems comprise a single effector nuclease and include types II, V, and VI. Class 1 and class 2 systems are subdivided into types (I, II, III, IV, V, VI), and some types are further divided into subtypes (e.g., II-A, II-B, II-C, V-A, V-B).
In certain embodiments, the CRISPR-Cas protein is a naturally occurring type II CRISPR-Cas protein or an active variant or fragment thereof. The term "type II CRISPR-Cas protein", "type II CRISPR-Cas effector protein", or "Cas9" as used herein refers to a CRISPR-Cas effector that requires a trans-activating CRISPR RNA (tracrRNA) and includes two nuclease domains (i.e., RuvC and HNH), each of which is responsible for cleaving a single strand of a double-stranded DNA molecule. In some embodiments, the invention provides a fusion protein comprising the presently disclosed deaminase fused to Streptococcus pyogenes Cas9 (SpCas9) or an SpCas9 nickase, the sequences of which are set forth as SEQ ID NOs: 555 and 556, respectively, and are described in U.S. Patent Nos. 10,000,772 and 8,697,359, the entire contents of each of which are incorporated herein by reference. In some embodiments, the present invention provides a fusion protein comprising the presently disclosed deaminase fused to Streptococcus thermophilus Cas9 (StCas9) or an StCas9 nickase, the sequences of which are set forth as SEQ ID NOs: 557 and 558, respectively, and are disclosed in U.S. Patent No. 10,113,167, which is incorporated herein by reference in its entirety. In some embodiments, the invention provides a fusion protein comprising the presently disclosed deaminase fused to Staphylococcus aureus Cas9 (SaCas9) or an SaCas9 nickase, the sequences of which are set forth as SEQ ID NOs: 559 and 560, respectively, and are disclosed in U.S. Patent No. 9,752,132, which is incorporated herein by reference in its entirety.
In some embodiments, the CRISPR-Cas protein is a naturally occurring type V CRISPR-Cas protein or an active variant or fragment thereof. The term "type V CRISPR-Cas protein", "type V CRISPR-Cas effector protein", or "Cas12" as used herein refers to a CRISPR-Cas effector protein that cleaves dsDNA and includes a single RuvC nuclease domain or a split RuvC nuclease domain and lacks an HNH domain (Zetsche et al. 2015, Cell doi:10.1016/j.cell.2015.09.038; Shmakov et al. 2017, Nat Rev Microbiol doi:10.1038/nrmicro.2016.184; Yan et al. 2018, Science doi:10.1126/science.aav7271; Harrington et al. 2018, Science doi:10.1126/science.aav4294). It should be noted that Cas12a is also referred to as Cpf1 and does not require a tracrRNA, although other type V CRISPR-Cas proteins, such as Cas12b, do require a tracrRNA. Most type V effectors can also target single-stranded DNA (ssDNA), generally without a PAM requirement (Zetsche et al. 2015; Yan et al. 2018; Harrington et al. 2018). The term "type V CRISPR-Cas protein" encompasses unique RGNs that include a split RuvC nuclease domain, such as those disclosed in U.S. Provisional Application Nos. 62/955,014 and 63/058,169, filed on December 30, 2019 and July 29, 2020, respectively, and PCT International Application No. PCT/US2020/067138, each of which is incorporated herein by reference in its entirety. In some embodiments, the present invention provides a fusion protein comprising the presently disclosed deaminase fused to Francisella novicida Cas12a (FnCas12a), the sequence of which is set forth as SEQ ID NO: 561 and is disclosed in U.S. Patent No. 9,790,490, or to any of the nuclease-inactivating mutants of FnCas12a disclosed in U.S. Patent No. 9,790,490, the entire contents of which are incorporated herein by reference.
In some embodiments, the CRISPR-Cas protein is a naturally occurring type VI CRISPR-Cas protein or an active variant or fragment thereof. The term "type VI CRISPR-Cas protein", "type VI CRISPR-Cas effector protein", or "Cas13" as used herein refers to a CRISPR-Cas effector protein that does not require a tracrRNA and includes two HEPN domains that cleave RNA.
The term "guide RNA" refers to a nucleotide sequence that is sufficiently complementary to a target nucleotide sequence to hybridize to the target sequence and direct sequence-specific binding of a cognate RGN to the target nucleotide sequence. For CRISPR-Cas RGNs, the corresponding guide RNA is one or more RNA molecules (generally one or two) that can bind to the RGN and guide the RGN to bind to a particular target nucleotide sequence and, in those instances where the RGN has nickase or nuclease activity, also to cleave the target nucleotide sequence. A guide RNA comprises a CRISPR RNA (crRNA) and, in some embodiments, a trans-activating CRISPR RNA (tracrRNA).
CRISPR RNA includes spacer sequences and CRISPR repeats. A "spacer" is a nucleotide sequence that hybridizes directly to a nucleotide sequence of interest. The spacer sequence is engineered to be fully or partially complementary to the target sequence of interest. In various embodiments, the spacer sequence comprises from about 8 nucleotides to about 30 nucleotides or more. For example, the spacer sequence may be about 8, about 9, about 10, about 11, about 12, about 13, about 14, about 15, about 16, about 17, about 18, about 19, about 20, about 21, about 22, about 23, about 24, about 25, about 26, about 27, about 28, about 29, about 30, or more nucleotides in length. In some embodiments, the spacer sequence is 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or more nucleotides in length. In some embodiments, the spacer sequence is about 10 to about 26 nucleotides in length, or about 12 to about 30 nucleotides in length. In some embodiments, the spacer sequence is 10 to 26 nucleotides in length, or 12 to 30 nucleotides in length. In a particular embodiment, the spacer sequence is about 30 nucleotides in length. In a particular embodiment, the spacer sequence is 30 nucleotides in length. In some embodiments, when optimally aligned using a suitable alignment algorithm, the degree of complementarity between a spacer sequence and its corresponding target sequence is between 50% and 99% or greater, including but not limited to about or greater than about 50%, about 60%, about 70%, about 75%, about 80%, about 81%, about 82%, about 83%, about 84%, about 85%, about 86%, about 87%, about 88%, about 89%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, about 99%, or greater. 
In particular embodiments, the degree of complementarity between a spacer sequence and its corresponding target sequence is 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or higher when optimally aligned using a suitable alignment algorithm. In particular embodiments, the spacer sequence is free of secondary structure, which can be predicted using any suitable polynucleotide folding algorithm known in the art, including, but not limited to, mFold (see, e.g., Zuker and Stiegler (1981) Nucleic Acids Res. 9:133-148) and RNAfold (see, e.g., Gruber et al. (2008) Cell 106(1): 23-24).
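The degree of complementarity between a spacer and its target can be illustrated as follows. This is a simplified sketch, assuming an ungapped comparison of an RNA spacer and a DNA target strand of equal length, both written 5' to 3', with only Watson-Crick RNA:DNA pairs counted. The text above refers to optimal alignment using a suitable alignment algorithm, which this sketch does not implement; the function name and sequences are invented for illustration.

```python
# Watson-Crick pairs between an RNA base (spacer) and a DNA base (target strand).
RNA_DNA_PAIRS = {("A", "T"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(spacer_rna: str, target_strand_dna: str) -> float:
    """Percentage of spacer positions that base pair with the target strand.

    Both inputs are written 5'->3'. Because the strands pair antiparallel,
    the target strand is reversed before the position-by-position comparison.
    """
    if not spacer_rna or len(spacer_rna) != len(target_strand_dna):
        raise ValueError("this sketch requires non-empty, equal-length sequences")
    paired = sum(
        (rna, dna) in RNA_DNA_PAIRS
        for rna, dna in zip(spacer_rna.upper(), reversed(target_strand_dna.upper()))
    )
    return 100.0 * paired / len(spacer_rna)


if __name__ == "__main__":
    spacer = "GACGUUAGCA"  # 10-nt spacer, within the 8 to 30 nt range above
    print(percent_complementarity(spacer, "TGCTAACGTC"))  # 100.0
    print(percent_complementarity(spacer, "TGCTAACGTA"))  # 90.0
```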
CRISPR RNA repeat sequences include nucleotide sequences that either alone or in combination with the hybridized tracrRNA form a structure recognized by the RGN molecule. In various embodiments, the CRISPR RNA repeat sequence comprises from about 8 nucleotides to about 30 nucleotides or more. In particular embodiments, the CRISPR RNA repeat sequence comprises 8 nucleotides to 30 nucleotides or more. For example, the CRISPR repeat can be about 8, about 9, about 10, about 11, about 12, about 13, about 14, about 15, about 16, about 17, about 18, about 19, about 20, about 21, about 22, about 23, about 24, about 25, about 26, about 27, about 28, about 29, about 30, or more nucleotides in length. In particular embodiments, the CRISPR repeat is 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or more nucleotides in length. In some embodiments, the degree of complementarity between a CRISPR repeat and its corresponding tracrRNA sequence is between 50% and 99% or greater when optimally aligned using a suitable alignment algorithm, including but not limited to about or greater than about 50%, about 60%, about 70%, about 75%, about 80%, about 81%, about 82%, about 83%, about 84%, about 85%, about 86%, about 87%, about 88%, about 89%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, about 99% or greater. In particular embodiments, the degree of complementarity between a CRISPR repeat and its corresponding tracrRNA sequence is 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more when optimally aligned using a suitable alignment algorithm.
In some embodiments, the guide RNA further comprises a tracrRNA molecule. A trans-activating CRISPR RNA or tracrRNA molecule comprises a nucleotide sequence that includes a region of sufficient complementarity to hybridize to the CRISPR repeat of a crRNA, which is referred to herein as the anti-repeat region. In some embodiments, the tracrRNA molecule further comprises a region with secondary structure (e.g., a stem-loop), or forms secondary structure upon hybridization to its corresponding crRNA. In certain embodiments, the region of the tracrRNA that is fully or partially complementary to the CRISPR repeat is at the 5' end of the molecule, and the 3' end of the tracrRNA comprises secondary structure. This region of secondary structure generally includes several hairpin structures, including the nexus hairpin, which is found adjacent to the anti-repeat sequence. The tracrRNA often has terminal hairpins at the 3' end, which can vary in structure and number, but often include a GC-rich Rho-independent transcriptional terminator hairpin followed by a string of U's at the 3' end. See, e.g., Briner et al. (2014) Molecular Cell 56:333-339; Briner and Barrangou (2016) Cold Spring Harb Protoc; doi:10.1101/pdb.top090902; and U.S. Patent Publication No. 2017/0275648, each of which is incorporated herein by reference in its entirety.
In various embodiments, the region of the anti-repeat sequence of the tracrRNA that is fully or partially complementary to the CRISPR repeat comprises from about 6 nucleotides to about 30 nucleotides or more. For example, the length of the base pairing region between the tracrRNA anti-repeat sequence and the CRISPR repeat sequence can be about 6, about 7, about 8, about 9, about 10, about 11, about 12, about 13, about 14, about 15, about 16, about 17, about 18, about 19, about 20, about 21, about 22, about 23, about 24, about 25, about 26, about 27, about 28, about 29, about 30, or more nucleotides. In particular embodiments, the length of the base pairing region between the tracrRNA anti-repeat sequence and the CRISPR repeat sequence can be 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or more nucleotides. In a particular embodiment, the anti-repeat region of the tracrRNA that is fully or partially complementary to the CRISPR repeat is about 10 nucleotides in length. In a particular embodiment, the anti-repeat region of the tracrRNA that is fully or partially complementary to the CRISPR repeat is 10 nucleotides in length. In some embodiments, the degree of complementarity between a CRISPR repeat and its corresponding tracrRNA anti-repeat sequence is between about 50% and about 99% or more, including but not limited to about or greater than about 50%, about 60%, about 70%, about 75%, about 80%, about 81%, about 82%, about 83%, about 84%, about 85%, about 86%, about 87%, about 88%, about 89%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, about 99% or more, when optimally aligned using a suitable alignment algorithm. 
In particular embodiments, the degree of complementarity between a CRISPR repeat and its corresponding tracrRNA anti-repeat sequence is 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more when optimally aligned using a suitable alignment algorithm.
In various embodiments, the entire tracrRNA comprises from about 60 nucleotides to more than about 210 nucleotides. In particular embodiments, the entire tracrRNA comprises 60 nucleotides to more than 210 nucleotides. For example, the tracrRNA can be about 60, about 65, about 70, about 75, about 80, about 85, about 90, about 95, about 100, about 105, about 110, about 115, about 120, about 125, about 130, about 135, about 140, about 150, about 160, about 170, about 180, about 190, about 200, about 210, or more nucleotides in length. In particular embodiments, the tracrRNA is 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 150, 160, 170, 180, 190, 200, 210 or more nucleotides in length. In particular embodiments, the tracrRNA is about 100 to about 210 nucleotides in length, including about 95, about 96, about 97, about 98, about 99, about 100, about 105, about 106, about 107, about 108, about 109, and about 110 nucleotides. In particular embodiments, the tracrRNA is 100 to 110 nucleotides in length, including 95, 96, 97, 98, 99, 100, 105, 106, 107, 108, 109, and 110 nucleotides.
The guide RNA forms a complex with an RNA-guided DNA-binding polypeptide or an RNA-guided nuclease to direct the RNA-guided nuclease to bind to the target sequence. If the guide RNA is complexed with an RGN, the bound RGN introduces a single-stranded or double-stranded break at the target sequence. After the target sequence has been cleaved, the break can be repaired such that the DNA sequence of the target sequence is modified during the repair process. Provided herein are methods of modifying a sequence of interest in the DNA of a host cell using a mutant variant of an RNA-guided nuclease, which is either nuclease-inactive or a nicking enzyme, linked to a deaminase. Mutant variants of RNA-guided nucleases in which nuclease activity is inactivated or significantly reduced may be referred to as RNA-guided DNA-binding polypeptides, as the polypeptides are capable of binding to, but not necessarily cleaving, a sequence of interest. RNA-guided nucleases that cleave only a single strand of a double-stranded nucleic acid molecule are referred to herein as nicking enzymes.
The nucleotide sequence of interest is bound by an RNA-guided DNA-binding polypeptide (RGDBP) and hybridizes to the guide RNA associated with the RGDBP. If the RGDBP possesses nuclease activity (i.e., is an RGN), which encompasses nicking enzyme activity, then the sequence of interest can be cleaved.
The guide RNA may be a single guide RNA or a dual guide RNA system. A single guide RNA includes the crRNA and, optionally, the tracrRNA on a single RNA molecule, while a dual guide RNA system includes a crRNA and a tracrRNA present on two distinct RNA molecules that hybridize to each other through at least a portion of the CRISPR repeat of the crRNA and at least a portion of the tracrRNA, which may be fully or partially complementary to the CRISPR repeat of the crRNA. In some of those embodiments wherein the guide RNA is a single guide RNA, the crRNA and the optional tracrRNA are separated by a linker nucleotide sequence.
In general, the linker nucleotide sequence is one that does not comprise complementary bases, in order to avoid formation of secondary structure within, or comprising, nucleotides of the linker nucleotide sequence. In some embodiments, the linker nucleotide sequence between the crRNA and the tracrRNA is at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12 or more nucleotides in length. In particular embodiments, the linker nucleotide sequence between the crRNA and the tracrRNA is 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 or more nucleotides in length. In certain embodiments, the linker nucleotide sequence of the single guide RNA is at least 4 nucleotides in length. In a particular embodiment, the linker nucleotide sequence of the single guide RNA is 4 nucleotides in length.
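The single guide RNA layout and the linker constraint can be sketched in code. The sketch below makes several assumptions that are not stated in the text: the spacer is placed 5' of the CRISPR repeat (an arrangement typical of type II systems but not of all RGNs), "does not comprise complementary bases" is read as "does not contain both members of a Watson-Crick pair (A/U or G/C)", and the default linker "AAAG" and all example sequences are hypothetical placeholders.

```python
from typing import Optional

# Watson-Crick RNA base pairs checked within the linker (G-U wobble is ignored).
WATSON_CRICK_PAIRS = (("A", "U"), ("G", "C"))


def linker_has_complementary_bases(linker: str) -> bool:
    """True if the linker contains both members of any Watson-Crick pair."""
    bases = set(linker.upper())
    return any(x in bases and y in bases for x, y in WATSON_CRICK_PAIRS)


def build_single_guide(
    spacer: str,
    crispr_repeat: str,
    tracr: Optional[str] = None,
    linker: str = "AAAG",
    min_linker_len: int = 3,
) -> str:
    """Concatenate crRNA (spacer + repeat), linker, and optional tracrRNA, 5'->3'."""
    crrna = spacer + crispr_repeat
    if tracr is None:
        # A single guide RNA may consist of the crRNA alone.
        return crrna
    if len(linker) < min_linker_len:
        raise ValueError("linker is shorter than the minimum length")
    if linker_has_complementary_bases(linker):
        raise ValueError("linker contains complementary bases")
    return crrna + linker + tracr


if __name__ == "__main__":
    guide = build_single_guide("GACGUUAGCA", "GUUUUAGA", "UCUAAAAC")
    print(guide)  # GACGUUAGCAGUUUUAGAAAAGUCUAAAAC
```

A stricter check could run a folding algorithm over the assembled guide, since complementarity between the linker and neighboring sequence is not examined here.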
In certain embodiments, the guide RNA may be introduced into the target cell, organelle, or embryo as an RNA molecule. The guide RNA may be transcribed in vitro or chemically synthesized. In some embodiments, a nucleotide sequence encoding the guide RNA is introduced into the cell, organelle, or embryo. In some embodiments, the nucleotide sequence encoding the guide RNA is operably linked to a promoter (e.g., an RNA polymerase III promoter). The promoter may be a native promoter or may be heterologous to the nucleotide sequence encoding the guide RNA.
In various embodiments, the guide RNA can be introduced into the target cell, organelle, or embryo as described herein as a ribonucleoprotein complex, wherein the guide RNA binds to an RNA-guided nuclease polypeptide.
The guide RNA directs the cognate RNA-guided nuclease to a particular target nucleotide sequence of interest by hybridization of the guide RNA to the target nucleotide sequence. The nucleotide sequence of interest may comprise DNA, RNA, or a combination of both, and may be single-stranded or double-stranded. The nucleotide sequence of interest may be genomic DNA (i.e., chromosomal DNA), plasmid DNA, or an RNA molecule (e.g., messenger RNA, ribosomal RNA, transfer RNA, microRNA, small interfering RNA). The nucleotide sequence of interest may be bound (and in some embodiments, cleaved) by an RNA-guided DNA-binding polypeptide in vitro or in a cell. The chromosomal sequence targeted by the RGDBP may be a nuclear, plastid, or mitochondrial chromosomal sequence. In some embodiments, the nucleotide sequence of interest is unique within the genome of interest.
In some embodiments, the nucleotide sequence of interest is adjacent to a protospacer adjacent motif (PAM). A PAM is generally within about 1 to about 10 nucleotides of the target nucleotide sequence, including about 1, about 2, about 3, about 4, about 5, about 6, about 7, about 8, about 9, or about 10 nucleotides from the target nucleotide sequence. In particular embodiments, a PAM is within 1 to 10 nucleotides of the target nucleotide sequence, including 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotides from the target nucleotide sequence. The PAM may be 5' or 3' of the target sequence. In some embodiments, the PAM is 3' of the target sequence. Generally, the PAM is a consensus sequence of about 2-6 nucleotides, but in particular embodiments the PAM may be 1, 2, 3, 4, 5, 6, 7, 8, 9 or more nucleotides in length.
The PAM limits which sequences a given RGDBP or RGN can target, as its PAM needs to be close to the target nucleotide sequence. Upon recognition of its corresponding PAM sequence, the RGN can cleave the nucleotide sequence of interest at a specific cleavage site. As used herein, a cleavage site consists of the two particular nucleotides within a target nucleotide sequence between which the nucleotide sequence is cleaved by an RGN. The cleavage site may comprise the 1st and 2nd, 2nd and 3rd, 3rd and 4th, 4th and 5th, 5th and 6th, 7th and 8th, or 8th and 9th nucleotides from the PAM in either the 5' or 3' direction. Because an RGN can cleave a target nucleotide sequence so as to produce staggered ends, in some embodiments the cleavage site is defined based on the distance of the two nucleotides from the PAM on the plus (+) strand of the polynucleotide and the distance of the two nucleotides from the PAM on the minus (-) strand of the polynucleotide.
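How a PAM restricts the targetable sequences can be illustrated by scanning a DNA string for candidate target sites. The sketch below is illustrative only: it assumes a PAM located immediately 3' of the target sequence, uses NGG (the well-known SpCas9 PAM) purely as an example consensus, scans only the strand supplied, and takes a hypothetical target length as a parameter. The function name and test sequence are invented.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_target_sites(dna: str, pam: str = "NGG", target_len: int = 20):
    """Return (start, target, pam_match) for every site followed by the PAM.

    `start` is the 0-based index of the first target nucleotide. A lookahead
    is used so that overlapping candidate sites are all reported.
    """
    pam_regex = "".join(IUPAC[code] for code in pam.upper())
    pattern = re.compile(r"(?=([ACGT]{%d})(%s))" % (target_len, pam_regex))
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna.upper())]


if __name__ == "__main__":
    sites = find_target_sites("TTGACGTTAGCATCGGAA", pam="NGG", target_len=10)
    print(sites)  # [(3, 'ACGTTAGCAT', 'CGG')]
```

To cover both strands, the same function can be applied to the reverse complement of the input; a 5' PAM would require placing the PAM group before the target group in the pattern.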
RGDBP and RGN can be used to deliver fusion polypeptides, polynucleotides, or small molecule payloads (payload) to specific genomic locations.
In those embodiments in which the DNA-binding polypeptide comprises a meganuclease, the sequence of interest may comprise a pair of inverted 9 base pair "half-sites" separated by four base pairs. In the case of a single-chain meganuclease, the N-terminal domain of the protein contacts the first half-site and the C-terminal domain of the protein contacts the second half-site. Cleavage by a meganuclease produces four base pair 3' overhangs. In those embodiments in which the DNA-binding polypeptide comprises a compact TALEN, the recognition sequence comprises a first CNNNGN sequence that is recognized by the I-TevI domain, followed by a non-specific spacer 4-16 base pairs in length, followed by a second sequence 16-22 bp in length that is recognized by the TAL effector domain (this sequence typically has a 5' T base). In those embodiments in which the DNA-binding polypeptide comprises a zinc finger, the DNA-binding domains typically recognize an 18-bp recognition sequence comprising a pair of nine base pair "half-sites" separated by 2-10 base pairs, and cleavage by the nuclease creates a blunt end or a 5' overhang of variable length (frequently four base pairs).
Fusion proteins
In some embodiments, a DNA-binding polypeptide (e.g., a nuclease-inactive RGN or a nicking enzyme RGN) is operably linked to a deaminase of the present invention. In some embodiments, a DNA-binding polypeptide (e.g., a nuclease-inactive RGN or a nicking enzyme RGN) fused to a deaminase of the present invention can be targeted to a specific location (in some embodiments, a specific genomic locus) of a nucleic acid molecule (i.e., a nucleic acid molecule of interest) to alter expression of a desired sequence. In some embodiments, binding of the fusion protein to the target sequence results in deamination of a nucleobase, resulting in conversion of one nucleobase to another. In some embodiments, binding of the fusion protein to the target sequence results in deamination of a nucleobase adjacent to the target sequence. A nucleobase adjacent to the target sequence that is deaminated and mutated using the presently disclosed compositions and methods can be 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100 base pairs from the 5' or 3' end of the target sequence (bound by the gRNA) within the target nucleic acid molecule. Some aspects of this disclosure provide fusion proteins comprising: (i) a DNA-binding polypeptide (e.g., a nuclease-inactive or nicking enzyme RGN polypeptide); (ii) a deaminase polypeptide; and optionally (iii) a second deaminase. The second deaminase may be the same deaminase as the first deaminase, or may be a different deaminase. In some embodiments, both the first and second deaminases are adenine deaminases of the present invention.
The present disclosure provides fusion proteins of various configurations. In some embodiments, the deaminase polypeptide is fused to the N-terminus of a DNA-binding polypeptide (e.g., an RGN polypeptide). In some embodiments, the deaminase polypeptide is fused to the C-terminus of a DNA-binding polypeptide (e.g., an RGN polypeptide).
In some embodiments, the deaminase and the DNA-binding polypeptide (e.g., RNA-guided DNA-binding polypeptide) are fused to each other via a peptide linker. The linker between the deaminase and the DNA-binding polypeptide (e.g., an RNA-guided DNA-binding polypeptide) can determine the editing window of the fusion protein, thereby increasing deaminase specificity and reducing off-target mutations. Various linker lengths and flexibilities may be employed, ranging from very flexible linkers of the form (GGGGS)n and (G)n to more rigid linkers of the form (EAAAK)n and (XP)n, to achieve the optimal length and rigidity for deaminase activity for a specific application. As used herein, the term "linker" refers to a chemical group or molecule that connects two molecules or moieties (e.g., a binding domain and a cleavage domain of a nuclease). In some embodiments, the linker connects the RNA-guided nuclease and the deaminase. In some embodiments, the linker connects a nuclease-dead or nuclease-inactive RGN and a deaminase. In a further embodiment, the linker connects two deaminases. Typically, the linker is positioned between, or flanked by, two groups, molecules, or other moieties and is connected to each one via a covalent bond, thereby linking the two. In some embodiments, the linker is an amino acid or a plurality of amino acids (e.g., a peptide or protein). In some embodiments, the linker is an organic molecule, group, polymer, or chemical moiety. In some embodiments, the linker is 3-100 amino acids in length, e.g., 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30-35, 35-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, or 150-200 amino acids in length. Longer or shorter linkers are also contemplated. In some embodiments, a shorter linker is preferred in order to reduce the overall size or length of the fusion protein or its coding sequence.
In some embodiments, the linker comprises a (GGGGS)n, (G)n, (EAAAK)n, or (XP)n motif, or a combination of any of these, wherein n is independently an integer between 1 and 30. In some embodiments, n is independently 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30, or any combination thereof (if more than one linker or more than one linker motif is present). Additional suitable linker motifs and linker configurations will be apparent to those skilled in the art. In some embodiments, suitable linker motifs and configurations include those described in Chen et al., 2013 (Adv Drug Deliv Rev. 65(10): 1357-69, the entire contents of which are incorporated herein by reference). Additional suitable linker sequences will be apparent to those skilled in the art. In some embodiments, the linker sequence comprises the sequence set forth as SEQ ID NO: 45 or 442.
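The repeat-motif notation used for these linkers can be expanded programmatically. The following is a minimal sketch: the function name is hypothetical, X in the (XP)n motif is supplied by the caller because the text leaves it as any amino acid, and the default of alanine for X is an arbitrary choice made only for illustration.

```python
# Linker motifs named in the text; X in "XP" stands for any amino acid.
LINKER_MOTIFS = ("GGGGS", "G", "EAAAK", "XP")


def build_linker(motif: str, n: int, x: str = "A") -> str:
    """Expand a motif of the form (motif)n into an amino acid sequence."""
    if motif not in LINKER_MOTIFS:
        raise ValueError("unknown linker motif: %r" % motif)
    if not 1 <= n <= 30:
        raise ValueError("n must be an integer between 1 and 30")
    if len(x) != 1:
        raise ValueError("x must be a single amino acid letter")
    return motif.replace("X", x) * n


if __name__ == "__main__":
    flexible = build_linker("GGGGS", 3)
    rigid = build_linker("EAAAK", 2)
    print(flexible)          # GGGGSGGGGSGGGGS
    print(rigid)             # EAAAKEAAAK
    print(flexible + rigid)  # a combination of motifs, as the text permits
```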
In some embodiments, the general configuration of the exemplary fusion proteins provided herein comprises the structure: [NH2]-[deaminase]-[DBP]-[COOH]; [NH2]-[DBP]-[deaminase]-[COOH]; [NH2]-[DBP]-[deaminase]-[deaminase]-[COOH]; [NH2]-[deaminase]-[DBP]-[deaminase]-[COOH]; or [NH2]-[deaminase]-[deaminase]-[DBP]-[COOH], wherein DBP is a DNA-binding polypeptide, NH2 is the N-terminus of the fusion protein, and COOH is the C-terminus of the fusion protein. In some embodiments, the fusion protein comprises more than two deaminase polypeptides.
In certain embodiments, the general configuration of the exemplary fusion proteins provided herein comprises the structure: [NH2]-[deaminase]-[RGN]-[COOH]; [NH2]-[RGN]-[deaminase]-[COOH]; [NH2]-[RGN]-[deaminase]-[deaminase]-[COOH]; [NH2]-[deaminase]-[RGN]-[deaminase]-[COOH]; or [NH2]-[deaminase]-[deaminase]-[RGN]-[COOH], wherein NH2 is the N-terminus of the fusion protein, and COOH is the C-terminus of the fusion protein. In some embodiments, the fusion protein comprises more than two deaminase polypeptides.
In some embodiments, the fusion protein comprises the structure: [NH2]-[deaminase]-[nuclease-inactive RGN]-[COOH]; [NH2]-[deaminase]-[deaminase]-[nuclease-inactive RGN]-[COOH]; [NH2]-[nuclease-inactive RGN]-[deaminase]-[COOH]; [NH2]-[deaminase]-[nuclease-inactive RGN]-[deaminase]-[COOH]; or [NH2]-[nuclease-inactive RGN]-[deaminase]-[deaminase]-[COOH]. It should be understood that "nuclease-inactive RGN" represents any RGN, including any CRISPR-Cas protein, that has been mutated to be nuclease-inactive. In some embodiments, the fusion protein comprises more than two deaminase polypeptides.
In some embodiments, the fusion protein comprises the structure: [NH2]-[deaminase]-[RGN nicking enzyme]-[COOH]; [NH2]-[deaminase]-[deaminase]-[RGN nicking enzyme]-[COOH]; [NH2]-[RGN nicking enzyme]-[deaminase]-[COOH]; [NH2]-[deaminase]-[RGN nicking enzyme]-[deaminase]-[COOH]; or [NH2]-[RGN nicking enzyme]-[deaminase]-[deaminase]-[COOH]. It should be understood that "RGN nicking enzyme" represents any RGN, including any CRISPR-Cas protein, that has been mutated to be active as a nicking enzyme.
In some embodiments, "-" as used in the general configuration above indicates the presence of an optional linker sequence. In some embodiments, the fusion proteins provided herein do not include a linker sequence. In some embodiments, at least one optional linker sequence is present.
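The general configurations above, with "-" standing for an optional linker, can be sketched as a small assembly routine. The sketch is illustrative only: the function and constant names are hypothetical, the domain sequences are short placeholders rather than real deaminase or DNA-binding polypeptide sequences, and only the five listed orders are accepted even though the text also contemplates fusions with more than two deaminases.

```python
from typing import List, Tuple

# Domain orders from N-terminus to C-terminus, as listed in the text.
ALLOWED_CONFIGURATIONS = {
    ("deaminase", "DBP"),
    ("DBP", "deaminase"),
    ("DBP", "deaminase", "deaminase"),
    ("deaminase", "DBP", "deaminase"),
    ("deaminase", "deaminase", "DBP"),
}


def assemble_fusion(domains: List[Tuple[str, str]], linker: str = "") -> str:
    """Join (kind, sequence) pairs in order, with an optional linker between them.

    An empty `linker` models a fusion protein without a linker sequence.
    """
    kinds = tuple(kind for kind, _ in domains)
    if kinds not in ALLOWED_CONFIGURATIONS:
        raise ValueError("unsupported configuration: %s" % "-".join(kinds))
    return linker.join(sequence for _, sequence in domains)


if __name__ == "__main__":
    # Placeholder sequences, for illustration only.
    fusion = assemble_fusion(
        [("deaminase", "MDEAM"), ("DBP", "MRGNX")],
        linker="GGGGS",
    )
    print(fusion)  # MDEAMGGGGSMRGNX
```

Passing (kind, sequence) pairs rather than a name-to-sequence mapping allows the two deaminases in a three-domain fusion to differ, consistent with the statement that the second deaminase may be the same as or different from the first.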
Other exemplary features that may be present are localization sequences (such as nuclear localization sequences or cytoplasmic localization sequences), export sequences (such as nuclear export sequences), or other localization sequences, as well as sequence tags that are useful for solubilization, purification, or detection of the fusion proteins. Suitable localization signal sequences and protein sequence tags provided herein include, but are not limited to: biotin carboxylase carrier protein (BCCP) tags, myc tags, calmodulin tags, FLAG tags, hemagglutinin (HA) tags, polyhistidine tags (also known as histidine tags or His tags), maltose binding protein (MBP) tags, Nus tags, glutathione-S-transferase (GST) tags, green fluorescent protein (GFP) tags, thioredoxin tags, S tags, Softags (e.g., Softag 1, Softag 3), Strep tags, biotin ligase tags, FlAsH tags, V5 tags, and SBP tags. Additional suitable sequences will be apparent to those skilled in the art.
In certain embodiments, the presently disclosed fusion proteins include at least one cell penetrating domain that facilitates cellular uptake of the fusion protein. Cell penetrating domains are known in the art and generally include stretches of positively charged amino acid residues (i.e., polycationic cell penetrating domains), alternating polar and non-polar amino acid residues (i.e., amphipathic cell penetrating domains), or hydrophobic amino acid residues (i.e., hydrophobic cell penetrating domains) (see, e.g., Milletti F. (2012) Drug Discov Today 17:850-860). A non-limiting example of a cell penetrating domain is the trans-activating transcriptional activator (TAT) from human immunodeficiency virus 1.
In some embodiments, a deaminase or fusion protein provided herein further comprises a Nuclear Localization Sequence (NLS). The nuclear localization signal, plastid localization signal, mitochondrial localization signal, dual targeting localization signal, and/or cell penetrating domain may be disposed in the amino terminus (N-terminus), carboxy terminus (C-terminus), or internal position of the fusion protein.
In some embodiments, the NLS is fused to the N-terminus of a fusion protein or deaminase. In some embodiments, the NLS is fused to the C-terminus of a fusion protein or deaminase. In some embodiments, the NLS is fused to the N-terminus of the deaminase of the fusion protein. In some embodiments, the NLS is fused to the C-terminus of the deaminase of the fusion protein. In some embodiments, the NLS is fused to the N-terminus of a DNA-binding polypeptide (e.g., an RGN polypeptide) of the fusion protein. In some embodiments, the NLS is fused to the C-terminus of a DNA-binding polypeptide (e.g., an RGN polypeptide) of the fusion protein. In some embodiments, the NLS is fused to the N-terminus of the deaminase polypeptide of the fusion protein. In some embodiments, the NLS is fused to the C-terminus of the deaminase polypeptide of the fusion protein. In some embodiments, the NLS is fused to the fusion protein via one or more linkers. In some embodiments, the NLS is fused to the fusion protein without a linker. In some embodiments, the NLS comprises the amino acid sequence of any one of the NLS sequences provided or referenced herein. In some embodiments, the NLS comprises the amino acid sequence set forth as SEQ ID NO:43 or SEQ ID NO:46. In some embodiments, the fusion protein or deaminase comprises SEQ ID NO:43 at its N-terminus and SEQ ID NO:46 at its C-terminus.
In some embodiments, a fusion protein as provided herein includes a full-length sequence of a deaminase, e.g., SEQ ID NO:1-10 and 399-441. However, in some embodiments, fusion proteins as provided herein do not include the full length sequence of a deaminase, but only fragments thereof. For example, in some embodiments, the fusion proteins provided herein further include a DNA binding polypeptide (e.g., RNA-guided DNA binding) domain and a deaminase domain.
In some embodiments, the fusion proteins of the invention include a DNA-binding polypeptide (e.g., RGN) and a deaminase, wherein the deaminase has an amino acid sequence that is at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identical to any one of SEQ ID NOs: 1-10 and 399-441. Examples of such fusion proteins are described in the Examples section herein.
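As a rough illustration of how an identity threshold of this kind might be checked, the sketch below computes percent identity for two sequences that have already been aligned. It assumes identity is counted over aligned columns, with a gap counting as a mismatch; the function names and the toy sequences are hypothetical, and the definition of sequence identity that governs this disclosure may differ.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the columns of a pairwise alignment.

    Both inputs must be the same length (already aligned). Columns where
    both sequences have a gap are ignored; a gap opposite a residue counts
    as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y) for x, y in zip(aligned_a, aligned_b) if not (x == gap and y == gap)
    ]
    if not columns:
        raise ValueError("alignment has no comparable columns")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the aligned pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy aligned sequences (hypothetical, not from any SEQ ID NO).
print(percent_identity("ACDE", "ACDF"))
print(meets_threshold("AC-E", "ACDE", 70.0))
```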
In some embodiments, the fusion protein comprises one deaminase polypeptide. In some embodiments, the fusion protein comprises at least two deaminase polypeptides operably linked, either directly or via a peptide linker. In some embodiments, the fusion protein comprises one deaminase polypeptide, and a second deaminase polypeptide is co-expressed with the fusion protein.
Also provided herein are ribonucleoprotein complexes comprising a fusion protein (comprising a deaminase and an RGDBP) and a guide RNA, either as a single guide RNA or as a dual guide RNA (also collectively referred to as gRNA).
V. Nucleotides Encoding Deaminases, Fusion Proteins, and/or gRNA
The present disclosure provides polynucleotides (SEQ ID NOs: 11-20 and 443-485) encoding the deaminase polypeptides of the present disclosure. The disclosure further provides polynucleotides encoding fusion proteins comprising a deaminase and a DNA-binding polypeptide (e.g., a meganuclease, a zinc finger fusion protein, or a TALEN). The present disclosure further provides polynucleotides encoding fusion proteins comprising a deaminase domain and an RNA-guided DNA-binding polypeptide. Such an RNA-guided DNA-binding polypeptide may be an RGN or an RGN variant. The protein variant may be nuclease-inactive or a nickase. The RGN can be a CRISPR-Cas protein or an active variant or fragment thereof. SEQ ID NOs: 41 and 42 are non-limiting examples of an RGN and a nickase RGN variant, respectively. Examples of CRISPR-Cas nucleases are well known in the art, and similar corresponding mutations can create mutant variants that are also nickases or are nuclease-inactive.
Embodiments of the present invention provide a polynucleotide encoding a fusion protein comprising an RGDBP and a deaminase described herein (SEQ ID NOs: 1-10 and 399-441, or variants thereof). In some embodiments, a second polynucleotide encodes the guide RNA required by the RGDBP for targeting to the nucleotide sequence of interest. In some embodiments, the guide RNA and the fusion protein are encoded by the same polynucleotide.
The use of the term "polynucleotide" is not intended to limit the present disclosure to polynucleotides comprising DNA, but such DNA polynucleotides are contemplated. One of ordinary skill in the art will recognize that polynucleotides may include Ribonucleotides (RNAs) and combinations of ribonucleotides and deoxyribonucleotides. Such deoxyribonucleotides and ribonucleotides include both naturally occurring molecules and synthetic analogs. Polynucleotides disclosed herein also encompass all forms of sequences, including, but not limited to, single stranded forms, double stranded forms, stem loop structures, circular forms (e.g., comprising circular RNAs), and the like.
An embodiment of the invention is a nucleic acid molecule comprising a sequence having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to any one of SEQ ID NOs: 11-20 and 443-485, wherein the nucleic acid molecule encodes a deaminase having adenine deaminase activity. The nucleic acid molecule may further comprise a heterologous promoter or terminator. The nucleic acid molecule may encode a fusion protein and optionally a second deaminase, wherein the encoded deaminase is operably linked to a DNA-binding polypeptide. In some embodiments, the nucleic acid molecule encodes a fusion protein, wherein the encoded deaminase is operably linked to an RGN and optionally a second deaminase.
In some embodiments, nucleic acid molecules comprising polynucleotides encoding the deaminases of the present invention may be codon optimized for expression in an organism of interest. A "codon optimized" coding sequence is a polynucleotide coding sequence whose frequency of codon usage is designed to mimic the preferred frequency of codon usage or the transcription conditions of a particular host cell. Expression in the particular host cell or organism is enhanced as a result of altering one or more codons at the nucleic acid level such that the translated amino acid sequence is unchanged. The nucleic acid molecule may be codon optimized in whole or in part. Codon tables and other references providing preference information for a wide range of organisms are available in the art (see, e.g., Campbell and Gowri (1990) Plant Physiol. 92:1-11 for a discussion of plant-preferred codon usage). Methods are available in the art for synthesizing plant-preferred genes. See, for example, U.S. Pat. Nos. 5,380,831 and 5,436,391 and Murray et al. (1989) Nucleic Acids Res. 17:477-498, which are incorporated herein by reference.
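The idea that codon optimization changes codons without changing the encoded protein can be sketched as follows. The sketch uses the standard genetic code; the `preferred` codon choices and the short input sequence are hypothetical and do not reflect the codon usage of any particular host or any sequence of the disclosure.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence codon by codon ('*' marks a stop)."""
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def recode(dna: str, preferred: dict) -> str:
    """Swap each codon for the host-preferred synonymous codon, if one is given."""
    for aa, codon in preferred.items():
        if CODON_TABLE[codon] != aa:
            raise ValueError(f"{codon} does not encode {aa}")
    protein = translate(dna)  # also validates the input length
    original = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return "".join(preferred.get(aa, codon) for aa, codon in zip(protein, original))


# Hypothetical host preference: GCC for alanine, AAG for lysine.
preferred = {"A": "GCC", "K": "AAG"}
original = "ATGGCTAAATAA"
optimized = recode(original, preferred)
print(optimized)
print(translate(original) == translate(optimized))
```

Codons with no entry in `preferred` are left as they are, which corresponds to optimizing the sequence only in part.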
In some embodiments, polynucleotides encoding the deaminases, fusion proteins, and/or gRNAs described herein may be provided in an expression cassette for expression in vitro or in a cell, organelle, embryo, or organism of interest. The expression cassette may comprise 5' and 3' regulatory sequences operably linked to a polynucleotide provided herein that encodes a deaminase and/or a fusion protein (comprising a deaminase, an RNA-guided DNA-binding polypeptide, and optionally a second deaminase) and/or a gRNA, such that the polynucleotide can be expressed. The expression cassette may additionally contain at least one additional gene or genetic element to be co-transformed into the organism. Where additional genes or elements are included, the components are operably linked. The term "operably linked" is intended to mean a functional linkage between two or more elements. For example, an operable linkage between a promoter and a coding region of interest (e.g., a region encoding a deaminase, RNA-guided DNA-binding polypeptide, and/or gRNA) is a functional linkage that allows expression of the coding region of interest. Operably linked elements may be contiguous or non-contiguous. When used to refer to the joining of two protein coding regions, operably linked means that the coding regions are in the same reading frame. In some embodiments, the additional gene(s) or element(s) are provided on multiple expression cassettes. For example, the nucleotide sequence encoding a deaminase of the present disclosure, either alone or as a component of a fusion protein, may be present on one expression cassette, while the nucleotide sequence encoding the gRNA may be on a separate expression cassette.
In another example, a nucleotide sequence encoding the deaminase of the present disclosure alone is on a first expression cassette, a nucleotide sequence encoding a fusion protein comprising the deaminase is on a second expression cassette, and a nucleotide sequence encoding the gRNA is on a third expression cassette. Such an expression cassette is provided with a plurality of restriction sites and/or recombination sites so that the inserted polynucleotides are placed under the transcriptional regulation of the regulatory regions. The expression cassette may additionally contain a selectable marker gene.
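The statement that two operably linked protein coding regions are in the same reading frame can be illustrated with a small check. This is a sketch under simple assumptions: both inputs are complete coding sequences given as DNA strings, the upstream stop codon is removed before joining, and no linker is inserted. The function names and sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_codons(cds: str) -> list:
    """Split a coding sequence into codons, requiring a whole number of codons."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def join_in_frame(upstream: str, downstream: str) -> str:
    """Join two coding regions so the downstream one stays in frame.

    A terminal stop codon on the upstream region is removed; any other
    in-frame stop codon in the upstream region is rejected because it
    would terminate translation before the downstream region.
    """
    up = split_codons(upstream)
    if up and up[-1] in STOP_CODONS:
        up = up[:-1]
    if any(codon in STOP_CODONS for codon in up):
        raise ValueError("upstream region contains an internal stop codon")
    return "".join(up) + "".join(split_codons(downstream))


# Toy coding regions (hypothetical).
fused = join_in_frame("ATGGCTTAA", "ATGAAATGA")
print(fused)
```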
The expression cassette may comprise in the 5'-3' direction of transcription: a transcription (and in some embodiments, translation) initiation region (i.e., a promoter), a deaminase encoding polynucleotide of the present invention, and a transcription (and in some embodiments, translation) termination region (i.e., a termination region) that is functional in an organism of interest. The promoters of the invention are capable of directing or driving expression of the coding sequences in the host cell. The regulatory regions (e.g., promoters, transcriptional regulatory regions, and translational termination regions) may be endogenous or heterologous to the host cell or to each other. As used herein, "heterologous" with respect to a sequence is a sequence that originates from a foreign species, or, if from the same species, is substantially modified from its native form in the composition and/or genomic locus by deliberate human intervention. As used herein, a chimeric gene includes a coding sequence operably linked to a transcription initiation region that is heterologous to the coding sequence.
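The 5'-3' ordering of cassette elements described above can be modeled as a small data structure. The class name `ExpressionCassette`, its fields, and the short element sequences are assumptions for illustration; real promoters, coding sequences, and termination regions are far longer and are not represented here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ExpressionCassette:
    """Cassette elements held in the 5'-3' direction of transcription."""

    promoter: str         # transcription initiation region
    coding_sequence: str  # e.g., a deaminase-encoding polynucleotide
    terminator: str       # transcription termination region

    def elements_5_to_3(self) -> List[Tuple[str, str]]:
        """Return (name, sequence) pairs in 5'-3' order."""
        return [
            ("promoter", self.promoter),
            ("coding_sequence", self.coding_sequence),
            ("terminator", self.terminator),
        ]

    def sequence(self) -> str:
        """Concatenate the elements in 5'-3' order."""
        return "".join(seq for _, seq in self.elements_5_to_3())


# Toy element sequences (hypothetical placeholders).
cassette = ExpressionCassette(
    promoter="TATAAT", coding_sequence="ATGGCTTAA", terminator="AATAAA"
)
print([name for name, _ in cassette.elements_5_to_3()])
print(cassette.sequence())
```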
Suitable termination regions may be obtained from the Ti plasmid of Agrobacterium tumefaciens (A. tumefaciens), such as the octopine synthase and nopaline synthase termination regions. See also Guerineau et al. (1991) Mol. Gen. Genet. 262:141-144; Proudfoot (1991) Cell 64:671-674; Sanfacon et al. (1991) Genes Dev. 5:141-149; Mogen et al. (1990) Plant Cell 2:1261-1272; Munroe et al. (1990) Gene 91:151-158; Ballas et al. (1989) Nucleic Acids Res. 17:7891-7903; Joshi et al. (1987) Nucleic Acids Res. 15:9627-9639.
Additional regulatory signals include, but are not limited to: transcriptional initiation start sites, operators, activators, enhancers, other regulatory elements, ribosome binding sites, start codons, termination signals, and the like. See, for example, U.S. Pat. Nos. 5,039,523 and 4,853,331; EPO 0480762A2; Sambrook et al. (1992) Molecular Cloning: A Laboratory Manual, ed. Maniatis et al. (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.), hereinafter "Sambrook 11"; Davis et al., eds. (1980) Advanced Bacterial Genetics (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.), and the references cited therein.
In preparing the expression cassette, the various DNA fragments can be manipulated so as to provide the DNA sequences in the proper orientation and, as appropriate, in the proper reading frame. To this end, adaptors or linkers may be employed to join the DNA fragments, or other manipulations may be involved to provide convenient restriction sites, remove superfluous DNA, remove restriction sites, and the like. For this purpose, in vitro mutagenesis, primer repair, restriction, annealing, and resubstitutions (e.g., transitions and transversions) may be involved.
Many promoters may be used in the practice of the present invention. Promoters may be selected based on the desired outcome. The nucleic acid may be combined with constitutive, inducible, growth stage-specific, cell type-specific, tissue-preferred, tissue-specific, or other promoters for expression in the organism of interest. See, for example, the promoters set forth in WO 99/43838 and in U.S. Pat. Nos. 8,575,425; 7,790,846; 8,147,856; 8,586,832; 7,772,369; 7,534,939; 6,072,050; 5,659,026; 5,608,149; 5,608,144; 5,604,121; 5,569,597; 5,466,785; 5,399,680; 5,268,463; 5,608,142; and 6,177,611; these are incorporated herein by reference.
For expression in plants, constitutive promoters also include the CaMV 35S promoter (Odell et al (1985) Nature 313:810-812); rice actin (rice actin) (McElroy et al (1990) Plant Cell 2:163-171); ubiquitin (Christensen et al (1989) Plant mol. Biol.12:619-632 and Christensen et al (1992) Plant mol. Biol. 18:675-689); pEMU (Last et al (1991) Theor. Appl. Genet. 81:581-588); MAS (Velten et al (1984) EMBO J.3:2723-2730).
Examples of inducible promoters are: the Adh1 promoter, which is inducible by hypoxia or cold stress; the Hsp70 promoter, which is inducible by heat stress; and the PPDK promoter and the PEP carboxylase promoter, which are inducible by light. Also useful are chemically inducible promoters such as the safener-induced In2-2 promoter (U.S. Pat. No. 5,364,780), the Axig1 promoter, which is auxin-induced and tapetum-specific but also active in callus (PCT US 01/22169), steroid-responsive promoters (see, e.g., the estrogen-induced ERE promoter and the glucocorticoid-inducible promoter in Schena et al. (1991) Proc. Natl. Acad. Sci. USA 88:10421-10425 and McNellis et al. (1998) Plant J. 14(2):247-257), and tetracycline-inducible and tetracycline-repressible promoters (see, e.g., Gatz et al. (1991) Mol. Gen. Genet. 227:229-237 and U.S. Pat. Nos. 5,814,618 and 5,789,156), which are incorporated herein by reference.
In some embodiments, tissue-specific or tissue-preferred promoters may be employed to target expression of the expression construct within a particular tissue. In certain embodiments, the tissue-specific or tissue-preferred promoter is active in plant tissue. Examples of promoters under developmental control in plants include promoters that preferentially initiate transcription in certain tissues, such as leaves, roots, fruits, seeds, or flowers. A "tissue-specific" promoter is a promoter that initiates transcription only in certain tissues. Unlike constitutive expression of genes, tissue-specific expression is the result of several interacting levels of gene regulation. Thus, promoters from homologous or closely related plant species may be preferred for achieving efficient and reliable expression of transgenes in specific tissues. In some embodiments, the expression cassette includes a tissue-preferred promoter. A "tissue-preferred" promoter is a promoter that initiates transcription preferentially, but not necessarily entirely or solely, in certain tissues.
In some embodiments, the nucleic acid molecule encoding a deaminase described herein comprises a cell type-specific promoter. A "cell type-specific" promoter is a promoter that primarily drives expression in certain cell types in one or more organs. Some examples of plant cells in which cell type-specific promoters functional in plants may be primarily active include BETL cells, vascular cells in roots and leaves, stalk cells, and stem cells. The nucleic acid molecule may also comprise a cell type-preferred promoter. A "cell type-preferred" promoter is a promoter that primarily drives expression mostly, but not necessarily entirely or solely, in certain cell types in one or more organs. Some examples of plant cells in which cell type-preferred promoters functional in plants may be preferentially active include BETL cells, vascular cells in roots and leaves, stalk cells, and stem cells.
In some embodiments, the nucleic acid sequence encoding a deaminase, fusion protein, and/or gRNA may be operably linked to a promoter sequence recognized, for example, by a bacteriophage RNA polymerase for in vitro mRNA synthesis. In such embodiments, the in vitro transcribed RNA can be purified for use in the methods described herein. For example, the promoter sequence may be a T7, T3 or SP6 promoter sequence, or a variant of a T7, T3 or SP6 promoter sequence. In such embodiments, the expressed proteins and/or RNAs may be purified for use in the genome modification methods described herein.
In certain embodiments, polynucleotides encoding deaminase, fusion proteins, and/or grnas may be linked to a polyadenylation signal (e.g., SV40 polyA signal and other signals functional in plants) and/or at least one transcription termination sequence. In some embodiments, the sequence encoding a deaminase or fusion protein may be linked to a sequence(s) encoding at least one nuclear localization signal, at least one cell penetrating domain, and/or at least one signal peptide capable of transporting a protein to a particular subcellular location, as described elsewhere herein.
In some embodiments, the polynucleotide encoding the deaminase, fusion protein, and/or gRNA may be present in a vector or in multiple vectors. A "vector" refers to a polynucleotide composition for transferring, delivering, or introducing a nucleic acid into a host cell. Suitable vectors include plasmid vectors, phagemids, cosmids, artificial/minichromosomes, transposons, and viral vectors (e.g., lentiviral vectors, adeno-associated viral vectors, baculovirus vectors). In some embodiments, the vector includes additional expression control sequences (e.g., enhancer sequences, Kozak sequences, polyadenylation sequences, transcription termination sequences), selectable marker sequences (e.g., antibiotic resistance genes), origins of replication, and the like. Additional information may be found in "Current Protocols in Molecular Biology," Ausubel et al., John Wiley & Sons, New York, 2003; or "Molecular Cloning: A Laboratory Manual," Sambrook & Russell, Cold Spring Harbor Press, Cold Spring Harbor, N.Y., 3rd edition, 2001.
In some embodiments, the vector comprises a selectable marker gene for selection of transformed cells. Selection of transformed cells or tissues employs a selectable marker gene. The marker gene comprises: genes encoding antibiotic resistance, such as genes encoding neomycin phosphotransferase II (NEO) and Hygromycin Phosphotransferase (HPT); and genes conferring resistance to herbicidal compounds such as glufosinate (glufosinate ammonium), bromoxynil, imidazolinone and 2, 4-dichlorophenoxyacetate (2, 4-D).
In some embodiments, the expression cassette or vector comprising a sequence encoding a fusion protein comprising an RNA-guided DNA-binding polypeptide such as RGN further comprises a sequence encoding a gRNA. In some embodiments, the sequence(s) encoding the gRNA is operably linked to at least one transcription control sequence for expression of the gRNA in an organism or host cell of interest. For example, a polynucleotide encoding a gRNA may be operably linked to a promoter sequence recognized by RNA polymerase III (Pol III). Examples of suitable Pol III promoters include, but are not limited to, mammalian U6, U3, H1 and 7SL RNA promoters and rice U6 and U3 promoters.
As indicated, expression constructs comprising nucleotide sequences encoding the deaminases, fusion proteins, and/or gRNAs can be used to transform organisms of interest. Methods for transformation involve introducing a nucleotide construct into an organism of interest. By "introducing" is intended introducing the nucleotide construct into the host cell in such a manner that the construct gains access to the interior of the host cell. The methods of the invention do not require a particular method for introducing a nucleotide construct into a host organism, only that the nucleotide construct gains access to the interior of at least one cell of the host organism. The host cell may be eukaryotic or prokaryotic. In particular embodiments, the eukaryotic host cell is a plant cell, a mammalian cell, or an insect cell. Methods for introducing nucleotide constructs into plants and other host cells are known in the art and include, but are not limited to: stable transformation methods, transient transformation methods, and virus-mediated methods.
Such methods result in a transformed organism, such as a plant, including whole plants as well as plant organs (e.g., leaves, stems, roots, etc.), seeds, plant cells, propagules, embryos, and progeny of the same. Plant cells can be differentiated or undifferentiated (e.g., callus, suspension culture cells, protoplasts, leaf cells, root cells, phloem cells, pollen).
A "transgenic organism" or "transformed organism" or "stably transformed" organism or cell or tissue refers to an organism into which a polynucleotide encoding a deaminase of the present invention has been incorporated or integrated. It should be appreciated that other exogenous or endogenous nucleic acid sequences or DNA fragments may also be incorporated into the host cell. Agrobacterium-mediated and biolistic-mediated transformation remain the two predominant approaches for plant cell transformation. However, transformation of host cells may be performed by infection, transfection, microinjection, electroporation, microprojection, biolistics or particle bombardment, silica/carbon fibers, ultrasound-mediated, PEG-mediated, calcium phosphate co-precipitation, polycation DMSO technique, DEAE dextran procedure, viral-mediated, liposome-mediated, and the like. Viral-mediated introduction of polynucleotides encoding the deaminases, fusion proteins, and/or gRNAs includes retroviral, lentiviral, adenoviral, and adeno-associated viral mediated introduction and expression, as well as the use of caulimoviruses (e.g., cauliflower mosaic virus), geminiviruses (e.g., bean golden yellow mosaic virus or maize streak virus), and RNA plant viruses (e.g., tobacco mosaic virus).
Transformation protocols, as well as protocols for introducing polypeptides or polynucleotide sequences into plants, may vary depending on the type of host cell (e.g., monocot or dicot plant cell) targeted for transformation. Methods for transformation are known in the art and include those set forth in U.S. Pat. Nos. 8,575,425, 7,692,068, 8,802,934, and 7,541,517, each of which is incorporated herein by reference. See also Rakoczy-Trojanowska, M. (2002) Cell Mol Biol Lett. 7:849-858; Jones et al. (2005) Plant Methods 1:5; Rivera et al. (2012) Physics of Life Reviews 9:308-345; Bartlett et al. (2008) Plant Methods 4:1-12; Bates, G.W. (1999) Methods in Molecular Biology 111:359-366; Binns and Thomashow (1988) Annual Reviews in Microbiology 42:575-606; Christou, P. (1992) The Plant Journal 2:275-281; Christou, P. (1995) Euphytica 85:13-27; Tzfira et al. (2004) TRENDS in Genetics: 375-383; Yao et al. (2006) Journal of Experimental Botany 57:3737-3746; Zupan and Zambryski (1995) Plant Physiology 107:1041-1047.
Transformation may result in stable or transient incorporation of the nucleic acid into the cell. "Stable transformation" is intended to mean that the nucleotide construct introduced into a host cell integrates into the genome of the host cell and is capable of being inherited by the progeny thereof. "Transient transformation" is intended to mean that a polynucleotide is introduced into the host cell and does not integrate into the genome of the host cell.
Methods for chloroplast transformation are known in the art. See, e.g., Svab et al. (1990) Proc. Natl. Acad. Sci. USA 87:8526-8530; Svab and Maliga (1993) Proc. Natl. Acad. Sci. USA 90:913-917; Svab and Maliga (1993) EMBO J. 12:601-606. The method relies on particle gun delivery of DNA containing a selectable marker and targeting of the DNA to the plastid genome through homologous recombination. Alternatively, plastid transformation may be accomplished by transactivation of a silent plastid-borne transgene through tissue-preferred expression of a nuclear-encoded and plastid-directed RNA polymerase. Such a system has been reported in McBride et al. (1994) Proc. Natl. Acad. Sci. USA 91:7301-7305.
The cells that have been transformed can be grown in a conventional manner into a transgenic organism, such as a plant. See, for example, McCormick et al. (1986) Plant Cell Reports 5:81-84. These plants may then be grown and pollinated with either the same transformed strain or a different strain, and the resulting hybrid having the deaminase or fusion protein polynucleotide identified. Two or more generations may be grown to ensure that the deaminase or fusion protein polynucleotide is stably maintained and inherited, and the seeds then harvested to ensure that the deaminase or fusion protein polynucleotide is present. In this way, the present invention provides transformed seed (also referred to as "transgenic seed") having a nucleotide construct of the present invention (e.g., an expression cassette of the present invention) stably incorporated into its genome.
In some embodiments, the cells that have been transformed may be introduced into an organism. Such cells may originate from organisms, wherein the cells are transformed in an ex vivo procedure.
The sequences provided herein can be used for transformation of any plant species, including but not limited to monocots and dicots. Examples of plants of interest include, but are not limited to: maize (corn), sorghum, wheat, sunflower, tomato, crucifers, peppers, potato, cotton, rice, soybean, sugar beet, sugarcane, tobacco, barley, canola (Brassica sp.), alfalfa, rye, millet, safflower, peanut, sweet potato, cassava, coffee, coconut, pineapple, citrus trees, cocoa, tea, banana, avocado, fig, guava, mango, olive, papaya, cashew, macadamia, almond, oats, vegetables, ornamental plants, and conifers.
Vegetables include, but are not limited to: tomatoes, lettuce, green beans, lima beans, peas, and members of the genus Cucumis such as cucumber, cantaloupe, and musk melon. Ornamental plants include, but are not limited to: azalea, hydrangea, hibiscus, roses, tulips, daffodils, petunias, carnation, poinsettia, and chrysanthemum. Preferably, the plant of the invention is a crop plant (e.g., maize, sorghum, wheat, sunflower, tomato, crucifers, peppers, potato, cotton, rice, soybean, sugar beet, sugarcane, tobacco, barley, canola, etc.).
As used herein, the term "plant" includes: plant cells, plant protoplasts, plant cell tissue cultures from which plants can be regenerated, plant calli, plant clumps, and plant cells that are intact in plants or parts of plants, such as embryos, pollen, ovules, seeds, leaves, flowers, branches, fruit, kernels, ears, cobs, husks, stalks, roots, root tips, anthers, and the like. Grain is intended to mean the mature seed produced by commercial growers for purposes other than growing or reproducing the species. Progeny, variants, and mutants of the regenerated plants are also included within the scope of the invention, provided that these parts comprise the introduced polynucleotides. Further provided is a processed plant product or byproduct that retains the sequences disclosed herein, including, for example, soybean meal.
In some embodiments, polynucleotides encoding the deaminases, fusion proteins, and/or gRNAs are used to transform any eukaryotic species, including but not limited to: animals (e.g., mammals, insects, fish, birds, and reptiles), fungi, amoebas, algae, and yeast. In some embodiments, polynucleotides encoding the deaminases, fusion proteins, and/or gRNAs are used to transform any prokaryotic species, including but not limited to: archaea and bacteria (e.g., Bacillus spp., Klebsiella spp., Streptomyces spp., Rhizobium spp., Escherichia spp., Pseudomonas spp., Salmonella spp., Shigella spp., Vibrio spp., Yersinia spp., Mycoplasma spp., Agrobacterium spp., and Lactobacillus spp.).
In some embodiments, conventional viral and non-viral based gene transfer methods are used to introduce nucleic acids into mammalian cells or target tissues. Such methods can be used to administer a nucleic acid encoding a deaminase or fusion protein of the present invention, and optionally a gRNA, to a cell in culture or to a cell in a host organism. Non-viral vector delivery systems include: DNA plasmids, RNA (e.g., transcripts of the vectors described herein), naked nucleic acids, and nucleic acids complexed with a delivery vehicle such as a liposome. Viral vector delivery systems include DNA and RNA viruses that have either episomal or integrated genomes after delivery to the cell. Non-limiting examples include vectors employing caulimoviruses (e.g., cauliflower mosaic virus), geminiviruses (e.g., bean golden yellow mosaic virus or maize streak virus), and RNA plant viruses (e.g., tobacco mosaic virus). For a review of gene therapy procedures, see Anderson, Science 256:808-813 (1992); Nabel & Felgner, TIBTECH 11:211-217 (1993); Mitani & Caskey, TIBTECH 11:162-166 (1993); Dillon, TIBTECH 11:167-175 (1993); Miller, Nature 357:455-460 (1992); Van Brunt, Biotechnology 6(10):1149-1154 (1988); Vigne, Restorative Neurology and Neuroscience 8:35-36 (1995); Kremer & Perricaudet, British Medical Bulletin (1):31-44 (1995); Haddada et al. in Current Topics in Microbiology and Immunology, Doerfler and Bohm (eds) (1995); and Yu et al., Gene Therapy 1:13-26 (1994).
Methods of non-viral delivery of nucleic acids include lipofection, Agrobacterium-mediated transformation, nucleofection, microinjection, biolistics, virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced uptake of DNA. Lipofection is described in, for example, U.S. Pat. Nos. 5,049,386, 4,946,787, and 4,897,355, and lipofection reagents are sold commercially (e.g., Transfectam™ and Lipofectin™). Cationic and neutral lipids that are suitable for efficient receptor-recognition lipofection of polynucleotides include those of Felgner, WO 91/17424 and WO 91/16024. Delivery can be to cells (e.g., in vitro or ex vivo administration) or to target tissues (e.g., in vivo administration). The preparation of lipid:nucleic acid complexes, including targeted liposomes such as immunolipid complexes, is well known to those of ordinary skill in the art (see, e.g., Crystal, Science 270:404-410 (1995); Blaese et al., Cancer Gene Ther. 2:291-297 (1995); Behr et al., Bioconjugate Chem. 5:382-389 (1994); Remy et al., Bioconjugate Chem. 5:647-654 (1994); Gao et al., Gene Therapy 2:710-722 (1995); Ahmad et al., Cancer Res. 52:4817-4820 (1992); and U.S. Pat. Nos. 4,186,183, 4,217,344, 4,235,871, 4,261,975, 4,485,054, 4,501,728, 4,774,085, 4,837,028, and 4,946,787).
The use of RNA or DNA virus-based systems for the delivery of nucleic acids takes advantage of highly evolved processes for targeting a virus to specific cells in the body and trafficking the viral payload to the nucleus. Viral vectors can be administered directly to patients (in vivo), or they can be used to treat cells in vitro, and the modified cells may optionally be administered to patients (ex vivo). Conventional virus-based systems for gene transfer include retroviral, lentiviral, adenoviral, adeno-associated, and herpes simplex virus vectors. Integration into the host genome is possible with the retrovirus, lentivirus, and adeno-associated virus gene transfer methods, often resulting in long-term expression of the inserted transgene. In addition, high transduction efficiencies have been observed in many different cell types and target tissues.
The tropism of a retrovirus can be altered by incorporating foreign envelope proteins, expanding the potential population of target cells. Lentiviral vectors are retroviral vectors that are able to transduce or infect non-dividing cells and typically produce high viral titers. Selection of a retroviral gene transfer system therefore depends on the target tissue. Retroviral vectors are composed of cis-acting long terminal repeats (LTRs) with packaging capacity for up to 6-10 kb of foreign sequence. The minimal cis-acting LTRs are sufficient for replication and packaging of the vector, which is then used to integrate the therapeutic gene into the target cell to provide permanent transgene expression. Widely used retroviral vectors include those based on murine leukemia virus (MuLV), gibbon ape leukemia virus (GaLV), simian immunodeficiency virus (SIV), human immunodeficiency virus (HIV), and combinations thereof (see, e.g., Buchscher et al., J. Virol. 66:2731-2739 (1992); Johann et al., J. Virol. 66:1635-1640 (1992); Sommerfelt et al., Virol. 176:58-59 (1990); Wilson et al., J. Virol. 63:2374-2378 (1989); Miller et al., J. Virol. 65:2220-2224 (1991); PCT/US 94/05700).
In applications where transient expression is preferred, adenovirus-based systems may be used. Adenovirus-based vectors are capable of very high transduction efficiency in many cell types and do not require cell division. With such vectors, high titers and high levels of expression have been obtained. This vector can be produced in large quantities in a relatively simple system. Adeno-associated virus ("AAV") vectors may also be used to transduce cells with target nucleic acids, e.g., in the in vitro production of nucleic acids and peptides, and for in vivo and ex vivo gene therapy procedures (see, e.g., West et al., Virology 160:38-47 (1987); U.S. Pat. No. 4,797,368; WO 93/24641; Kotin, Human Gene Therapy 5:793-801 (1994); Muzyczka, J. Clin. Invest. 94:1351 (1994)). Construction of recombinant AAV vectors is described in a number of publications, including U.S. Pat. No. 5,173,414; Tratschin et al., Mol. Cell. Biol. 5:3251-3260 (1985); Tratschin et al., Mol. Cell. Biol. 4:2072-2081 (1984); Hermonat & Muzyczka, PNAS 81:6466-6470 (1984); and Samulski et al., J. Virol. 63:03822-3828 (1989). Packaging cells are typically used to form virus particles that are capable of infecting a host cell. Such cells include 293 cells, which package adenovirus, and ψ2 cells or PA317 cells, which package retrovirus.
Viral vectors used in gene therapy are usually generated by producing a cell line that packages a nucleic acid vector into a viral particle. The vectors typically contain the minimal viral sequences required for packaging and subsequent integration into a host, with the other viral sequences replaced by an expression cassette for the polynucleotide(s) to be expressed. The missing viral functions are typically supplied in trans by the packaging cell line. For example, AAV vectors used in gene therapy typically possess only the ITR sequences from the AAV genome, which are required for packaging and integration into the host genome. Viral DNA is packaged in a cell line that contains a helper plasmid encoding the other AAV genes, namely rep and cap, but lacking ITR sequences.
The cell line may also be infected with adenovirus as a helper. The helper virus promotes replication of the AAV vector and expression of the AAV genes from the helper plasmid. Because it lacks ITR sequences, the helper plasmid is not packaged in significant amounts. Contamination with adenovirus can be reduced by, for example, heat treatment, to which adenovirus is more sensitive than AAV. Additional methods for the delivery of nucleic acids to cells are known to those of ordinary skill in the art. See, for example, US20030087817, which is incorporated herein by reference.
In some embodiments, a host cell is transiently or non-transiently transfected with one or more vectors described herein. In some embodiments, a cell is transfected as it naturally occurs in a subject. In some embodiments, a cell that is transfected is taken from a subject.
In some embodiments, the transfected cell is a eukaryotic cell. In some embodiments, the eukaryotic cell is an animal cell (e.g., a mammalian, insect, fish, bird, or reptile cell). In some embodiments, the transfected cell is a human cell. In some embodiments, the transfected cell is a cell of hematopoietic origin, such as an immune cell (i.e., a cell of the innate or adaptive immune system), including but not limited to: B cells, T cells, natural killer (NK) cells, pluripotent stem cells, induced pluripotent stem cells, chimeric antigen receptor T (CAR-T) cells, monocytes, macrophages, and dendritic cells.
In some embodiments, the cell is derived from cells taken from a subject, such as a cell line. In some embodiments, the cell or cell line is prokaryotic. In some embodiments, the cell or cell line is eukaryotic. In further embodiments, the cell or cell line is derived from an insect, avian, plant, or fungal species. In some embodiments, the cell or cell line may be mammalian, such as, for example, human, monkey, mouse, cow, pig, goat, hamster, rat, cat, or dog. A wide variety of cell lines for tissue culture are known in the art. Examples of cell lines include, but are not limited to: C8161, CCRF-CEM, MOLT, mIMCD-3, NHDF, HeLa-S3, Huh1, Huh4, Huh7, HUVEC, HASMC, HEKn, HEKa, MiaPaCell, Panc1, PC-3, TF1, CTLL-2, C1R, Rat6, CV1, RPTE, A10, T24, J82, A375, ARH-77, Calu1, SW480, SW620, SKOV3, SK-UT, CaCo2, P388D1, SEM-K2, WEHI-231, HB56, TIB55, Jurkat, J45.01, LRMB, Bcl-1, BC-3, IC21, DLD2, Raw264.7, NRK-52E, MRC5, MEF, Hep G2, HeLa B, HeLa T4, COS, COS-1, COS-6, COS-M6A, BS-C-1 monkey kidney epithelial cells, BALB/3T3 mouse embryonic fibroblasts, 3T3 Swiss, 3T3-L1, 132-d5 human fetal fibroblasts, 10.1 mouse fibroblasts, 293-T, 3T3, 721, 9L, A2780, A2780ADR, A2780cis, A172, A20, A253, A431, A-549, ALC, B16, B35, BCP-1 cells, BEAS-2B, bEnd.3, BHK-21, BR 293, BxPC3, C3H-10T1/2, C6/36, Cal-27, CHO-7, CHO-IR, CHO-K1, CHO-K2, CHO-T, CHO Dhfr-/-, COR-L23/CPR, COR-L23/5010, COR-L23/R23, EL-7, COV-434, CML T1, CT26, D17, DH82, DU145, DuCaP, duEL 4, EM2, EM3, EMT6/AR1, EMT6/AR10.0, CHO-9, HB-55, HCH 69, HB-55, HCH 2, HEK-293, HeLa, Hepa1c1c7, HL-60, HMEC, HT-29, Jurkat, JY cells, K562 cells, Ku812, KCL22, KG1, KYO1, LNCap, Ma-Mel 1-48, MC-38, MCF-7, MCF-10A, MDA-MB-231, MDA-MB-468, MDA-MB-435, MDCK II, MOR/0.2R, MONO-MAC 6, MTD-1A, MyEnd, NCI-H69/CPR, NCI-H69/LX10, NCI-H69/LX20, NCI-H69/LX4, NIH-3T3, NALM-1, NW-145, OPCN/OPCT cell lines, Peer, PNT-1A/PNT 2, RenCa, RIN-5F, RMA/RMAS, Saos-2 cells, Sf-9, SkBr3, T2, T-47D, T84, THP1 cell line, U373, U87, U937, VCaP, Vero, YAO-39, YAR 1, and YAO-W, and transgenic varieties thereof. Cell lines are available from a variety of sources known to those of ordinary skill in the art (see, e.g., the American Type Culture Collection (ATCC) (Manassas, Va.)).
In some embodiments, a cell transfected with one or more vectors described herein is used to establish a new cell line comprising one or more vector-derived sequences. In some embodiments, a cell that has been transiently transfected with a fusion protein of the invention and optionally a gRNA, or with a ribonucleoprotein complex of the invention, and modified through the activity of the fusion protein or ribonucleoprotein complex, is used to establish a new cell line comprising cells that contain the modification but lack any other exogenous sequence. In some embodiments, cells transiently or non-transiently transfected with one or more vectors described herein, or cell lines derived from such cells, are used in assessing one or more test compounds.
In some embodiments, one or more vectors described herein are used to generate a transgenic non-human animal or transgenic plant. In some embodiments, the transgenic animal is an insect. In further embodiments, the insect is a pest, such as a mosquito or tick. In some embodiments, the insect is a plant pest, such as corn rootworm or fall armyworm. In some embodiments, the transgenic animal is a bird, such as a chicken, turkey, goose, or duck. In some embodiments, the transgenic animal is a mammal, such as a human, mouse, rat, hamster, monkey, ape, rabbit, pig, cow, horse, goat, sheep, cat, or dog.
VI. Variants and Fragments of Polypeptides and Polynucleotides
The present disclosure provides deaminases active on DNA molecules, the amino acid sequences of which are set forth in SEQ ID NOs: 1-10 and 399-441, active variants or fragments thereof, and polynucleotides encoding the same.
Although the activity of the variants or fragments may be altered compared to the polynucleotide or polypeptide of interest, the variants and fragments should retain the function of the polynucleotide or polypeptide of interest. For example, a variant or fragment may have increased activity, decreased activity, a different spectrum of activity, or any other alteration in activity when compared to a polynucleotide or polypeptide of interest.
Fragments and variants of the deaminases of the invention that have adenine deaminase activity retain that activity when they are part of a fusion protein further comprising a DNA-binding polypeptide or a fragment thereof.
The term "fragment" refers to a portion of a polynucleotide or polypeptide sequence of the invention. "Fragments" or "biologically active portions" include polynucleotides comprising a sufficient number of contiguous nucleotides to retain the biological activity (i.e., deaminase activity on a nucleic acid). "Fragments" or "biologically active portions" also include polypeptides comprising a sufficient number of contiguous amino acid residues to retain the biological activity. Fragments of the deaminases disclosed herein include those that are shorter than the full-length sequence because an alternative downstream start site is used. In some embodiments, a biologically active portion of a deaminase may be a polypeptide comprising, for example, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, or more contiguous amino acid residues of any one of SEQ ID NOs: 1-10 and 399-441, or a variant thereof. Such biologically active portions can be prepared by recombinant techniques and evaluated for activity.
In general, "variants" is intended to mean substantially similar sequences. For polynucleotides, a variant comprises a deletion and/or addition of one or more nucleotides at one or more internal sites within the native polynucleotide and/or a substitution of one or more nucleotides at one or more sites in the native polynucleotide. As used herein, a "native" or "wild-type" polynucleotide or polypeptide comprises a naturally occurring nucleotide sequence or amino acid sequence, respectively. For polynucleotides, conservative variants include those sequences that, because of the degeneracy of the genetic code, encode the native amino acid sequence of the gene of interest. Naturally occurring allelic variants such as these can be identified using well-known molecular biology techniques, for example, the polymerase chain reaction (PCR) and hybridization techniques outlined below. Variant polynucleotides also include synthetically derived polynucleotides, such as those generated by site-directed mutagenesis, that still encode the polypeptide or polynucleotide of interest. Generally, variants of a particular polynucleotide disclosed herein have at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or more sequence identity to that particular polynucleotide, as determined by the sequence alignment programs and parameters described elsewhere herein.
Variants of a particular polynucleotide disclosed herein (i.e., the reference polynucleotide) can also be evaluated by comparing the percent sequence identity between the polypeptide encoded by the variant polynucleotide and the polypeptide encoded by the reference polynucleotide. Percent sequence identity between any two polypeptides can be calculated using the sequence alignment programs and parameters described elsewhere herein. Where any given pair of polynucleotides disclosed herein is evaluated by comparing the percent sequence identity shared by the two polypeptides they encode, the percent sequence identity between the two encoded polypeptides is at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or more.
In particular embodiments, the disclosed polynucleotides encode an adenine deaminase comprising an amino acid sequence having at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or more identity to the amino acid sequence of any one of SEQ ID NOs: 1-10 and 399-441.
A biologically active variant of an adenine deaminase of the invention may differ by as few as 1-15 amino acid residues, as few as 1-10 (such as 6-10), as few as 5, as few as 4, as few as 3, as few as 2, or as few as 1 amino acid residue. In particular embodiments, the polypeptide comprises an N-terminal or C-terminal truncation, which may comprise a deletion of at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, or more amino acids from the N-terminus or C-terminus of the polypeptide. In some embodiments, the polypeptide comprises an internal deletion, which may comprise a deletion of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 55, 60, or more amino acids.
It will be appreciated that the deaminases provided herein can be modified to create variant proteins and polynucleotides. Changes designed by man can be introduced through the application of site-directed mutagenesis techniques. In some embodiments, naturally occurring, as yet unknown or as yet unidentified polynucleotides and/or polypeptides that are structurally and/or functionally related to the sequences disclosed herein may also be considered to fall within the scope of the invention. Conservative amino acid substitutions may be made in non-conserved regions such that the adenine deaminase function of the polypeptide is not altered. In some embodiments, modifications may be made that improve the adenine deaminase activity of the deaminase.
Variant polynucleotides and proteins also encompass sequences and proteins derived from mutagenic and recombinogenic procedures such as DNA shuffling. With such a procedure, one or more of the different deaminases disclosed herein (e.g., SEQ ID NOs: 1-10 and 399-441) are manipulated to create a new adenine deaminase possessing the desired properties. In this manner, libraries of recombinant polynucleotides are generated from a population of related sequence polynucleotides comprising sequence regions that have substantial sequence identity and can be homologously recombined in vitro or in vivo. For example, using this approach, sequence motifs encoding a domain of interest may be shuffled between the deaminase sequences provided herein and other subsequently identified deaminase genes to obtain a new gene coding for a protein with an improved property of interest, such as an increased Km in the case of an enzyme. Strategies for such DNA shuffling are known in the art. See, for example, Stemmer (1994) Proc. Natl. Acad. Sci. USA 91:10747-10751; Stemmer (1994) Nature 370:389-391; Crameri et al. (1997) Nature Biotech. 15:436-438; Moore et al. (1997) J. Mol. Biol. 272:336-347; Zhang et al. (1997) Proc. Natl. Acad. Sci. USA 94:4504-4509; Crameri et al. (1998) Nature 391:288-291; and U.S. Pat. Nos. 5,605,793 and 5,837,458. A "shuffled" nucleic acid is a nucleic acid produced by a shuffling procedure, such as any shuffling procedure set forth herein. Shuffled nucleic acids are produced by recombining (physically or virtually) two or more nucleic acids (or character strings), for example in an artificial and optionally recursive fashion. Generally, one or more screening steps are used in shuffling processes to identify nucleic acids of interest; this screening step can be performed before or after any recombination step. In some (but not all) shuffling embodiments, it is desirable to perform multiple rounds of recombination prior to selection to increase the diversity of the pool to be screened. The overall process of recombination and selection is optionally repeated recursively. Depending on context, shuffling can refer to the overall process of recombination and selection or, alternatively, only to the recombinational portion of the overall process.
As used herein, "sequence identity" or "identity" in the context of two polynucleotide or polypeptide sequences refers to the residues in the two sequences that are the same when aligned for maximum correspondence over a specified comparison window. When percentage of sequence identity is used in reference to proteins, it is recognized that residue positions that are not identical often differ by conservative amino acid substitutions, in which an amino acid residue is replaced by another amino acid residue with similar chemical properties (e.g., charge or hydrophobicity), so that the functional properties of the molecule are not changed. When sequences differ by conservative substitutions, the percent sequence identity may be adjusted upward to correct for the conservative nature of the substitution. Sequences that differ by such conservative substitutions are said to have "sequence similarity" or "similarity". Means for making this adjustment are well known to those of ordinary skill in the art. Typically, this involves scoring a conservative substitution as a partial rather than a full mismatch, thereby increasing the percentage of sequence identity. Thus, for example, where an identical amino acid is given a score of 1 and a non-conservative substitution is given a score of zero, a conservative substitution is given a score between zero and 1. The scoring of conservative substitutions is calculated, for example, as implemented in the program PC/GENE (Intelligenetics, Mountain View, California).
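The partial-mismatch scoring described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the conservative-substitution groups and the 0.5 partial score below are hypothetical choices made for the example, and they are not the scoring scheme of PC/GENE or of any program named in this disclosure.

```python
# Hypothetical groups of chemically similar residues (an assumption for
# illustration only; real programs use a full substitution matrix).
CONSERVATIVE_GROUPS = [
    set("ILVM"),  # aliphatic / hydrophobic
    set("FYW"),   # aromatic
    set("KRH"),   # positively charged
    set("DE"),    # negatively charged
    set("ST"),    # small hydroxyl
    set("NQ"),    # amide
]

def position_score(a: str, b: str, partial: float = 0.5, gap: str = "-") -> float:
    """Score one aligned column: 1 for identity, `partial` for a
    conservative substitution, 0 for anything else (including gaps)."""
    if a == gap or b == gap:
        return 0.0
    if a == b:
        return 1.0
    if any(a in group and b in group for group in CONSERVATIVE_GROUPS):
        return partial
    return 0.0

def percent_similarity(aligned_a: str, aligned_b: str) -> float:
    """Percent similarity over two pre-aligned, equal-length sequences."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be aligned, non-empty, and equal in length")
    total = sum(position_score(a, b) for a, b in zip(aligned_a, aligned_b))
    return 100.0 * total / len(aligned_a)
```

For the aligned pair `KDEL` and `RDEV`, two positions are identical and two are conservative substitutions under these assumed groups, so the adjusted value is 75% while the unadjusted identity would be 50%.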
As used herein, "percentage of sequence identity" means the value determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleobase or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the comparison window, and multiplying the result by 100 to yield the percentage of sequence identity.
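The calculation just described (matched positions divided by the total number of positions in the comparison window, times 100) can be written as a short Python sketch. It assumes the two sequences have already been optimally aligned and that gaps are written as `-`; the function name and the gap symbol are illustrative choices, not part of the disclosure.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent sequence identity over a comparison window.

    Both inputs are rows of the same alignment, so they have equal length.
    A column counts as matched only when both rows carry the same residue
    (a gap never matches). The denominator is the total number of columns
    in the window, gaps included.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be aligned, non-empty, and equal in length")
    matched = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * matched / len(aligned_a)
```

For example, the aligned rows `ACGT-A` and `ACCTGA` share four matched positions out of six columns, giving roughly 66.7% identity.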
Unless otherwise stated, sequence identity/similarity values provided herein refer to the value obtained using GAP Version 10 with the following parameters: % identity and % similarity for a nucleotide sequence using a GAP Weight of 50 and a Length Weight of 3, and the nwsgapdna.cmp scoring matrix; % identity and % similarity for an amino acid sequence using a GAP Weight of 8 and a Length Weight of 2, and the BLOSUM62 scoring matrix; or any equivalent program thereof. By "equivalent program" is meant any sequence comparison program that, for any two sequences in question, generates an alignment having identical nucleotide or amino acid residue matches and an identical percent sequence identity when compared to the corresponding alignment generated by GAP Version 10.
Two sequences are "optimally aligned" when they are aligned for similarity scoring using a defined amino acid substitution matrix (e.g., BLOSUM62), gap existence penalty, and gap extension penalty so as to arrive at the highest score possible for that pair of sequences. Amino acid substitution matrices and their use in quantifying the similarity between two sequences are well known in the art and are described, for example, in Dayhoff et al. (1978) "A model of evolutionary change in proteins," in Atlas of Protein Sequence and Structure, Vol. 5, Suppl. 3 (ed. M. O. Dayhoff), pp. 345-352, Natl. Biomed. Res. Found., Washington, D.C.; and Henikoff et al. (1992) Proc. Natl. Acad. Sci. USA 89:10915-10919. The BLOSUM62 matrix is often used as the default scoring substitution matrix in sequence alignment protocols. The gap existence penalty is imposed for the introduction of a single amino acid gap in one of the aligned sequences, and the gap extension penalty is imposed for each additional empty amino acid position inserted into an already opened gap. The alignment is defined by the amino acid positions of each sequence at which the alignment begins and ends, and optionally by the insertion of a gap or multiple gaps in one or both sequences, so as to arrive at the highest possible score. While optimal alignment and scoring can be accomplished manually, the process is facilitated by the use of a computer-implemented alignment algorithm, e.g., gapped BLAST 2.0, described in Altschul et al. (1997) Nucleic Acids Res. 25:3389-3402 and made available to the public at the National Center for Biotechnology Information website (www.ncbi.nlm.nih.gov). Optimal alignments, including multiple alignments, can be prepared using, e.g., PSI-BLAST, available through www.ncbi.nlm.nih.gov and described by Altschul et al. (1997) Nucleic Acids Res. 25:3389-3402.
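To make the roles of the gap existence and gap extension penalties concrete, the following Python sketch computes the best global alignment score with a standard three-state dynamic program (often attributed to Gotoh). It is a minimal illustration, not the GAP or BLAST implementation: it uses a simple match/mismatch score in place of a substitution matrix such as BLOSUM62, and the default penalty values are arbitrary assumptions. A gap of length k costs `gap_open + (k - 1) * gap_extend`, matching the description above.

```python
NEG_INF = float("-inf")

def global_alignment_score(a: str, b: str, match: float = 1.0,
                           mismatch: float = -1.0, gap_open: float = 5.0,
                           gap_extend: float = 1.0) -> float:
    """Highest achievable global alignment score under affine gap penalties."""
    n, m = len(a), len(b)
    # M: column pairs a[i-1] with b[j-1]
    # X: column pairs a[i-1] with a gap; Y: column pairs a gap with b[j-1]
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):  # leading gap in b
        X[i][0] = -(gap_open + (i - 1) * gap_extend)
    for j in range(1, m + 1):  # leading gap in a
        Y[0][j] = -(gap_open + (j - 1) * gap_extend)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a gap pays gap_open; lengthening one pays gap_extend.
            X[i][j] = max(M[i - 1][j] - gap_open,
                          X[i - 1][j] - gap_extend,
                          Y[i - 1][j] - gap_open)
            Y[i][j] = max(M[i][j - 1] - gap_open,
                          Y[i][j - 1] - gap_extend,
                          X[i][j - 1] - gap_open)
    return max(M[n][m], X[n][m], Y[n][m])
```

With these assumed defaults, aligning `ACGT` against `AGT` yields three matches and one single-position gap (3 - 5 = -2), while aligning `AAAA` against `A` yields one match and one gap of length three (1 - 5 - 1 - 1 = -6). Replacing the match/mismatch rule with a lookup into a substitution matrix would bring the sketch closer to the protein case described in the text.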
With respect to an amino acid sequence that is optimally aligned with a reference sequence, an amino acid residue "corresponds to" the position in the reference sequence with which the residue is paired in the alignment. The "position" is denoted by a number that sequentially identifies each amino acid in the reference sequence based on its position relative to the N-terminus. Owing to deletions, insertions, truncations, fusions, and the like that must be taken into account when determining an optimal alignment, the amino acid residue number in a test sequence, as determined by simply counting from the N-terminus, will not necessarily be the same as the number of its corresponding position in the reference sequence. For example, where there is a deletion in an aligned test sequence, there will be no amino acid in the test sequence that corresponds to the position in the reference sequence at the site of the deletion. Where there is an insertion in an aligned test sequence, that insertion will not correspond to any amino acid position in the reference sequence. In the case of truncations or fusions, there can be stretches of amino acids in either the reference sequence or the aligned sequence that do not correspond to any amino acid in the corresponding sequence.
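The correspondence between test-sequence residues and reference positions can be illustrated with a small Python sketch. It assumes the two sequences are supplied as rows of an existing alignment with gaps written as `-`; the function name and the toy sequences are hypothetical and serve only to show how deletions and insertions shift the numbering.

```python
from typing import Dict, Optional

def map_test_to_reference(aligned_ref: str, aligned_test: str,
                          gap: str = "-") -> Dict[int, Optional[int]]:
    """Map each test residue number (1-based, counted from the N-terminus)
    to its corresponding reference position, or None when the residue sits
    in an insertion that has no counterpart in the reference."""
    if len(aligned_ref) != len(aligned_test):
        raise ValueError("alignment rows must have equal length")
    ref_pos = 0
    test_pos = 0
    mapping: Dict[int, Optional[int]] = {}
    for r, t in zip(aligned_ref, aligned_test):
        if r != gap:
            ref_pos += 1
        if t != gap:
            test_pos += 1
            mapping[test_pos] = ref_pos if r != gap else None
    return mapping
```

With the reference row `MKT-LV` and the test row `M-TALV`, test residue 2 corresponds to reference position 3 (reference position 2 is deleted in the test sequence), and test residue 3 lies in an insertion and maps to no reference position.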
VII. Antibodies
Antibodies to the deaminases, fusion proteins, or ribonucleoproteins comprising the deaminases of the present invention (including those having the amino acid sequence set forth in any one of SEQ ID NOs: 1-10 and 399-441, or active variants or fragments thereof) are also encompassed. Methods for producing antibodies are well known in the art (see, e.g., Harlow and Lane (1988) Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.; and U.S. Pat. No. 4,196,265). Such antibodies can be used in kits for the detection and isolation of deaminases, or of fusion proteins or ribonucleoproteins comprising the deaminases described herein. Thus, the present disclosure provides a kit comprising an antibody that specifically binds to a polypeptide or ribonucleoprotein described herein, including, for example, a polypeptide or ribonucleoprotein that comprises the polypeptide sequence of any one of SEQ ID NOs: 1-10 and 399-441.
VIII. Systems and Ribonucleoprotein Complexes for Binding and/or Modifying a Target Sequence of Interest and Methods of Making the Same
The present disclosure provides systems that target a nucleic acid sequence and modify the target nucleic acid sequence. In some embodiments, an RNA-guided DNA-binding polypeptide (RGDBP), such as an RGN, and a gRNA are responsible for targeting the ribonucleoprotein complex to a nucleic acid sequence of interest, and the deaminase polypeptide fused to the RGDBP is responsible for modifying the targeted nucleic acid sequence from A>N. In some embodiments, the deaminase converts A>G. The guide RNA hybridizes to the target sequence of interest and also forms a complex with the RNA-guided DNA-binding polypeptide, thereby directing the RNA-guided DNA-binding polypeptide to bind to the target sequence. The RNA-guided DNA-binding polypeptide is one domain of a fusion protein; the second domain is a deaminase described herein. In some embodiments, the RNA-guided DNA-binding polypeptide is an RGN, such as Cas9. Other examples of RNA-guided DNA-binding polypeptides include RGNs such as those described in International Patent Application Publication Nos. WO 2019/236566 and WO 2020/139783. In some embodiments, the RNA-guided DNA-binding polypeptide is a type II CRISPR-Cas polypeptide or an active variant or fragment thereof. In some embodiments, the RNA-guided DNA-binding polypeptide is a type V CRISPR-Cas polypeptide or an active variant or fragment thereof. In some embodiments, the RNA-guided DNA-binding polypeptide is a type VI CRISPR-Cas polypeptide. In some embodiments, the DNA-binding domain of the fusion protein does not require an RNA guide, such as a zinc finger nuclease, TALEN, or meganuclease polypeptide. In some embodiments, the nuclease activity of the DNA-binding domain has been partially or completely inactivated.
In further embodiments, the RNA-guided DNA-binding polypeptide comprises the amino acid sequence of an RGN, such as, for example, APG07433.1 (SEQ ID NO: 41), or an active variant or fragment thereof, such as the nickase nAPG07433.1 (SEQ ID NO: 42) or the other nickase RGN variants described in the Examples (SEQ ID NOs: 52-59, 61, 397, and 398).
In some embodiments, the system for binding and modifying a target sequence of interest provided herein is a ribonucleoprotein complex, which is at least one molecule of an RNA bound to at least one protein. The ribonucleoprotein complexes provided herein comprise at least one guide RNA as the RNA component and a fusion protein comprising a deaminase of the invention and an RNA-guided DNA-binding polypeptide as the protein component. In some embodiments, the ribonucleoprotein complex is purified from a cell or organism that has been transformed with polynucleotides encoding the fusion protein and the guide RNA and cultured under conditions that allow expression of the fusion protein and the guide RNA.
In various embodiments, ribonucleoprotein complexes are provided that comprise any of the fusion proteins described herein and a guide RNA bound to the DNA-binding polypeptide of the fusion protein. For example, provided herein is a ribonucleoprotein complex comprising a fusion protein with a deaminase comprising an amino acid sequence having at least 80% sequence identity to SEQ ID NO: 407. In another example, a ribonucleoprotein complex is provided that comprises a fusion protein with a deaminase comprising an amino acid sequence having at least 80% sequence identity to SEQ ID NO: 399. In yet another example, a ribonucleoprotein complex is provided that comprises a fusion protein with a deaminase comprising an amino acid sequence having at least 80% sequence identity to SEQ ID NO: 405. In some of the above embodiments of the ribonucleoprotein complex, the fusion protein comprises an RGN selected from the group consisting of: a CasX, CasY, C2c1, C2c2, C2c3, GeoCas9, asCas9, SaCas9, Nme2Cas9, CjCas9, Cas12a (previously referred to as Cpf1), Cas12b, Cas12g, Cas12h, Cas12i, LbCas12a, AsCas12a, CasMINI, Cas13b, Cas13c, Cas13d, Cas14, Csn2, xCas9, SpCas9-NG, LbCas12a, AsCas12a, Cas9-KKH, circularly permuted Cas9, Argonaute (Ago), SmacCas9, or Spy-macCas9 domain, or an RGN having the amino acid sequence set forth in any one of SEQ ID NOs: 41, 60, 366, or 368. In some embodiments, the ribonucleoprotein complex comprises a nickase having an amino acid sequence with at least 95% sequence identity to any one of SEQ ID NOs: 42, 52-59, 61, 397, and 398, fused to a deaminase comprising an amino acid sequence having at least 80% sequence identity to SEQ ID NO: 407. In some embodiments, the ribonucleoprotein complex comprises a nickase having an amino acid sequence with at least 95% sequence identity to any one of SEQ ID NOs: 42, 52-59, 61, 397, and 398, fused to a deaminase comprising an amino acid sequence having at least 80% sequence identity to SEQ ID NO: 399. In some embodiments, the ribonucleoprotein complex comprises a nickase having an amino acid sequence with at least 95% sequence identity to any one of SEQ ID NOs: 42, 52-59, 61, 397, and 398, fused to a deaminase comprising an amino acid sequence having at least 80% sequence identity to SEQ ID NO: 405. In some embodiments, the ribonucleoprotein complex comprises a Cas9 nickase fused to a deaminase comprising an amino acid sequence having at least 80% sequence identity to SEQ ID NO: 407. In some embodiments, the ribonucleoprotein complex comprises a Cas9 nickase fused to a deaminase comprising an amino acid sequence having at least 80% sequence identity to SEQ ID NO: 399. In some embodiments, the ribonucleoprotein complex comprises a Cas9 nickase fused to a deaminase comprising an amino acid sequence having at least 80% sequence identity to SEQ ID NO: 405. The Cas9 nickase may be any Cas9 nickase disclosed in PCT Patent Publication No. WO2020181195, the entire contents of which are incorporated herein by reference. In the various embodiments described herein, the ribonucleoprotein complex may further comprise a gRNA described herein.
Methods for making a deaminase, a fusion protein, or a fusion protein ribonucleoprotein complex are provided. Such methods comprise culturing a cell comprising a nucleotide sequence encoding the deaminase or fusion protein, and in some embodiments a guide RNA, under conditions in which the deaminase or fusion protein (and in some embodiments, the guide RNA) is expressed. The deaminase, fusion protein, or fusion ribonucleoprotein complex may then be purified from a lysate of the cultured cells.
Methods for purifying a deaminase, fusion protein, or fusion ribonucleoprotein complex from a lysate of a biological sample are known in the art (e.g., size exclusion and/or affinity chromatography, 2D-PAGE, HPLC, reverse phase chromatography, immunoprecipitation). In particular methods, the deaminase or fusion protein is recombinantly produced and includes a purification tag to aid in its purification, including but not limited to: glutathione-S-transferase (GST), chitin binding protein (CBP), maltose binding protein, thioredoxin (TRX), poly(NANP), tandem affinity purification (TAP) tag, Myc, AcV5, AU1, AU5, E, ECS, E2, FLAG, HA, Nus, Softag 1, Softag 3, Strep, SBP, Glu-Glu, HSV, KT3, S, S1, T7, V5, VSV-G, 6xHis, biotin carboxyl carrier protein (BCCP), and calmodulin. In general, the tagged deaminase, fusion protein, or fusion ribonucleoprotein complex is purified using immunoprecipitation or other similar methods known in the art.
An "isolated" or "purified" polypeptide, or biologically active portion thereof, is substantially or essentially free of components that normally accompany or interact with the polypeptide as found in its naturally occurring environment. Thus, an isolated or purified polypeptide is substantially free of other cellular material or culture medium when produced by recombinant techniques, or substantially free of chemical precursors or other chemicals when chemically synthesized. A protein that is substantially free of cellular material includes protein preparations having less than 30%, less than 20%, less than 10%, less than 5%, or less than 1% (by dry weight) of contaminating protein. When a protein of the invention or biologically active portion thereof is recombinantly produced, optimally the culture medium represents less than 30%, less than 20%, less than 10%, less than 5%, or less than 1% (by dry weight) of chemical precursors or chemicals other than the protein of interest.
The specific methods provided herein for binding and/or cleaving a target sequence of interest involve the use of ribonucleoprotein complexes. In some embodiments, the ribonucleoprotein complex is assembled in vitro. In vitro assembly of ribonucleoprotein complexes can be performed using methods known in the art, wherein the RGDBP polypeptide or fusion protein comprising the same is contacted with a guide RNA under conditions that allow the RGDBP polypeptide or fusion protein comprising the same to bind to the guide RNA. As used herein, "contacting (contact, contacting, contacted)" refers to bringing together the components of a desired reaction under conditions suitable for the desired reaction. In some embodiments of the described methods for modifying a DNA molecule of interest, the contacting step is performed in vitro. In some embodiments, the contacting step is performed in vivo. In some embodiments, the contacting step is performed within a subject (e.g., a human subject or a non-human animal subject). In some embodiments, the contacting step is performed in a cell, such as a human or non-human animal cell. RGDBP polypeptides or fusion proteins comprising the RGDBP polypeptides may be purified from biological samples, cell lysates or culture media, produced via in vitro translation, or chemically synthesized. Guide RNAs may be purified from biological samples, cell lysates or culture media, transcribed in vitro, or chemically synthesized. The RGDBP polypeptide or fusion protein comprising the RGDBP polypeptide, and the guide RNA may be contacted in a solution (e.g., buffered saline solution) to allow for in vitro assembly of the ribonucleoprotein complex.
IX. Methods for modifying a target sequence
The present disclosure provides methods for modifying a target nucleic acid molecule of interest (e.g., a target DNA molecule). The methods comprise delivering, to a target sequence or to a cell, organelle, or embryo comprising a target sequence, a fusion protein comprising a DNA-binding polypeptide and at least one deaminase of the invention, or a polynucleotide encoding the same. In certain embodiments, the methods comprise delivering, to a target sequence or to a cell, organelle, or embryo comprising a target sequence, a system comprising at least one guide RNA or a polynucleotide encoding the same and at least one fusion protein comprising a deaminase of the invention and an RNA-guided DNA-binding polypeptide, or a polynucleotide encoding the same. In some embodiments, the fusion protein comprises any one of SEQ ID NOs: 1-10 and 399-441, or an active variant or fragment thereof.
In some embodiments, the methods comprise contacting the DNA molecule with: (a) a fusion protein comprising a deaminase and an RNA-guided DNA-binding polypeptide, such as, for example, a nuclease-inactive or nickase Cas9 domain; and (b) a gRNA that targets the fusion protein of (a) to a target nucleotide sequence of the DNA molecule; wherein the DNA molecule is contacted with the fusion protein and the gRNA in an effective amount and under conditions suitable for deamination of a nucleobase. In some embodiments, the DNA molecule of interest comprises a sequence associated with a disease or disorder, and deamination of the nucleobase results in a sequence that is not associated with a disease or disorder. In some embodiments, the disease or disorder affects an animal. In further embodiments, the disease or disorder affects a mammal, such as a human, cow, horse, dog, cat, goat, sheep, pig, monkey, rat, mouse, or hamster. In some embodiments, the DNA sequence of interest resides in an allele of a crop plant, wherein the particular allele of the trait of interest results in a plant of lesser agronomic value. Deamination of the nucleobase results in an allele that improves the trait and increases the agronomic value of the plant.
In those embodiments in which the method comprises delivering a polynucleotide encoding a guide RNA and/or fusion protein, the cell or embryo may then be cultured under conditions in which the guide RNA and/or fusion protein is expressed. In various embodiments, the method comprises contacting the sequence of interest with a ribonucleoprotein complex comprising a gRNA and a fusion protein (which includes a deaminase of the invention and an RNA-guided DNA-binding polypeptide). In certain embodiments, the method comprises introducing a ribonucleoprotein complex of the invention into a cell, organelle, or embryo comprising a sequence of interest. The ribonucleoprotein complexes of the invention can be complexes that have been purified from a biological sample, recombinantly produced and subsequently purified, or assembled in vitro as described herein. In those embodiments in which the ribonucleoprotein complex contacted with the sequence of interest or with the cell, organelle, or embryo has been assembled in vitro, the method may further comprise assembling the complex in vitro prior to contact with the sequence of interest, cell, organelle, or embryo.
Purified or in vitro assembled ribonucleoprotein complexes of the invention can be introduced into cells, organelles, or embryos using any method known in the art, including but not limited to electroporation. In some embodiments, fusion proteins comprising deaminase of the present invention and RNA-guided DNA binding polypeptides, as well as polynucleotides encoding or comprising guide RNAs, are introduced into cells, organelles, or embryos using any method known in the art (e.g., electroporation).
Upon delivery to or contact with a target sequence or a cell, organelle, or embryo comprising a target sequence, the guide RNA directs the fusion protein to bind to the target sequence in a sequence-specific manner. The sequence of interest may then be modified via the deaminase domain of the fusion protein. In some embodiments, binding of the fusion protein to the target sequence results in modification of a nucleotide adjacent to the target sequence. The nucleotide adjacent to the target sequence that is modified by the deaminase may be 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100 base pairs from the 5' or 3' end of the target sequence. Fusion proteins comprising a deaminase of the invention and an RNA-guided DNA-binding polypeptide can introduce targeted A>N (and preferably targeted A>G) mutations into the targeted DNA molecule.
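The A>G outcome described above can be pictured with a short Python sketch. It is an idealized illustration only and not a model of any particular fusion protein: it assumes every adenine inside a chosen window is converted, whereas actual deamination is typically partial and position-dependent. The function name and the window coordinates are hypothetical and are not taken from this disclosure.

```python
def deaminate_adenines(target: str, window_start: int, window_end: int) -> str:
    """Return `target` with every A inside the window rewritten as G.

    `window_start` and `window_end` are 1-based, inclusive positions on the
    strand given in `target`. This is a deliberately idealized picture: a real
    deaminase edits only a fraction of the adenines it can reach.
    """
    if not 1 <= window_start <= window_end <= len(target):
        raise ValueError("window must lie within the target sequence")
    bases = list(target.upper())
    for i in range(window_start - 1, window_end):
        if bases[i] == "A":
            bases[i] = "G"  # adenine -> inosine, which is read as guanine
    return "".join(bases)
```

For example, `deaminate_adenines("TTACGATAGC", 3, 6)` changes the adenines at positions 3 and 6 and leaves the adenine at position 8, which lies outside the window, untouched.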
In some embodiments of the described methods for modifying a DNA molecule of interest, the contacting step is performed in vitro. In certain embodiments, the contacting step is performed in vivo. In some embodiments, the contacting step is performed within a subject (e.g., a human subject or a non-human animal subject). In some embodiments, the contacting step is performed in a cell, such as a human or non-human animal cell.
Methods for measuring binding of fusion proteins to target sequences are known in the art and include chromatin immunoprecipitation assays, gel mobility shift assays, DNA pull-down assays, reporter assays, and microplate capture and detection assays. Likewise, methods of measuring cleavage or modification of a target sequence are known in the art and include in vitro or in vivo cleavage assays, wherein cleavage is confirmed using PCR, sequencing, or gel electrophoresis, with or without the attachment of an appropriate label (e.g., radioisotope, fluorescent substance) to the target sequence to facilitate detection of degradation products. In some embodiments, a nicking triggered exponential amplification reaction (NTEXPAR) assay is used (see, e.g., Zhang et al. (2016) Chem. Sci. 7:4951-4957). In vivo cleavage can be assessed using the Surveyor assay (Guschin et al. (2010) Methods Mol Biol 649:247-256).
In some embodiments, the methods involve the use of an RNA-guided DNA-binding polypeptide, as part of the fusion protein, complexed with more than one guide RNA. The more than one guide RNA may target different regions of a single gene, or may target multiple genes. Such multiplexed targeting allows the deaminase domain of the fusion protein to modify the nucleic acid at multiple sites, thereby introducing multiple mutations into the target nucleic acid molecule of interest (e.g., a genome).
In those embodiments in which the method involves the use of an RNA-guided nuclease (RGN) that is a nickase RGN (i.e., one capable of cleaving only a single strand of a double-stranded polynucleotide, e.g., nAG 07433.1 (SEQ ID NO:42 or SEQ ID NOs: 50-57)), the method may include introducing two different RGNs or RGN variants that target identical or overlapping target sequences and cleave different strands of the polynucleotide. For example, an RGN nickase that cleaves only the plus (+) strand of a double-stranded polynucleotide may be introduced along with a second RGN nickase that cleaves only the minus (-) strand of the double-stranded polynucleotide. In some embodiments, two different fusion proteins are provided, wherein each fusion protein includes a different RGN with a different PAM recognition sequence, such that a greater diversity of nucleotide sequences can be targeted for mutation.
One of ordinary skill in the art will appreciate that any of the presently disclosed methods may be used to target a single target sequence or multiple target sequences. Thus, these methods include the use of a fusion protein comprising a single RNA-guided DNA-binding polypeptide in combination with multiple distinct guide RNAs, which can target multiple distinct sequences within a single gene and/or within multiple genes. The deaminase domain of the fusion protein then introduces mutations at each targeted sequence. Also encompassed herein are methods in which multiple distinct guide RNAs are introduced in combination with multiple distinct RNA-guided DNA-binding polypeptides. Such RNA-guided DNA-binding polypeptides may be multiple RGNs or RGN variants. Such guide RNAs and guide RNA/fusion protein systems may target multiple distinct sequences within a single gene and/or within multiple genes.
In some embodiments, fusion proteins comprising an RNA-guided DNA-binding polypeptide and a deaminase polypeptide of the invention may be used to generate mutations in a targeted gene or in a targeted region of a gene of interest. In some embodiments, the fusion proteins of the invention may be used for saturation mutagenesis of a targeted gene or a targeted region of a gene of interest, followed by high-throughput forward genetic screening to identify new mutations and/or phenotypes. In some embodiments, the fusion proteins described herein may be used to generate mutations at targeted genomic locations, which may or may not include coding DNA sequences. Libraries of cell lines generated by the targeted mutagenesis described above may also be useful for studying gene function or gene expression.
X. Target polynucleotides
In one aspect, the invention provides a method of modifying a polynucleotide of interest in a eukaryotic cell, which may be in vivo, ex vivo or in vitro. In some embodiments, the method comprises: sampling cells or cell populations from human or non-human animals or plants (including microalgae); and modifying the cell or cells. Culturing may occur ex vivo at any stage. The cell or cells may even be reintroduced into a non-human animal or plant (including microalgae).
Using natural variability, plant breeders combine the most useful genes for desirable qualities such as yield, quality, uniformity, hardiness, and pest resistance. Such desirable qualities also include growth, day length preference, temperature requirements, date of initiation of floral or reproductive development, fatty acid content, insect resistance, disease resistance, nematode resistance, fungal resistance, herbicide resistance, and tolerance to various environmental factors including drought, heat, humidity, cold, wind, and adverse soil conditions including high salinity. Sources of such useful genes include native or foreign varieties, heirloom varieties, wild plant relatives, and induced mutations, such as those produced by treating plant material with mutagenic agents. The present invention provides plant breeders with a new tool for inducing mutations. Accordingly, one of ordinary skill in the art can employ the present invention to induce the rise of useful genes with greater precision than previous mutagenic agents, and thereby accelerate and improve plant breeding programs.
The target polynucleotide of a deaminase or fusion protein of the invention may be any polynucleotide that is endogenous or exogenous to the eukaryotic cell. For example, the polynucleotide of interest may be a polynucleotide residing in the nucleus of a eukaryotic cell. In some embodiments, the polynucleotide of interest is a sequence encoding a gene product (e.g., a protein) or a non-coding sequence (e.g., a regulatory polynucleotide or junk DNA). In some embodiments, the target sequence of the fusion protein of the invention is associated with a PAM (protospacer adjacent motif); that is, a short sequence recognized by the RNA-guided DNA-binding polypeptide. The exact sequence and length requirements of the PAM vary depending on the RNA-guided DNA-binding polypeptide used, but PAMs are typically 2-5 base pair sequences adjacent to the protospacer (i.e., the target sequence).
The target polynucleotide of the fusion protein of the invention may include a number of disease-associated genes and polynucleotides as well as signaling biochemical pathway-associated genes and polynucleotides. Examples of polynucleotides of interest include sequences associated with a signaling biochemical pathway, e.g., a signaling biochemical pathway-associated gene or polynucleotide. Examples of polynucleotides of interest also include disease-associated genes or polynucleotides. A "disease-associated" gene or polynucleotide refers to any gene or polynucleotide that yields transcription or translation products at an abnormal level or in an abnormal form in cells derived from disease-affected tissue, compared with tissues or cells of a non-disease control. It may be a gene that becomes expressed at an abnormally high level, or it may be a gene that becomes expressed at an abnormally low level, where the altered expression correlates with the occurrence and/or progression of the disease. A disease-associated gene also refers to a gene possessing one or more mutations or genetic variations that are directly responsible for the etiology of a disease (e.g., a causal mutation) or that are in linkage disequilibrium with one or more genes responsible for the etiology of a disease. The transcribed or translated products may be known or unknown, and may be at a normal or abnormal level.
Non-limiting examples of disease-associated genes that can be targeted using the presently disclosed methods and compositions are provided in Table 34. In some embodiments, the disease-associated genes targeted are those disclosed in Table 34 having a G>A mutation. Additional examples of disease-associated genes and polynucleotides are available from the McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University (Baltimore, Md.) and the National Center for Biotechnology Information, National Library of Medicine (Bethesda, Md.), available on the World Wide Web.
In some embodiments, the polynucleotide of interest comprises a cystic fibrosis transmembrane conductance regulator (CFTR) gene.
The term "cystic fibrosis transmembrane conductance regulator" or "CFTR" as used herein refers to cAMP-mediated chloride ion channels located in the apical membrane of epithelial cells that catalyze the passage of small ions through the membrane. Non-limiting examples of CFTR genes are set forth in SEQ ID NO: 51.
The term "target" or "targets" as used herein with respect to spacer sequences and target sequences refers to the positioning of an RNA-guided nuclease relative to a target sequence based on the ability of the spacer sequences within the associated guide RNAs to sufficiently hybridize to the target sequence.
A CRISPR RNA (crRNA) or a nucleic acid molecule encoding a crRNA is provided, wherein the crRNA comprises a spacer sequence targeting a CFTR target sequence. Also provided are guide RNAs comprising such crRNAs, one or more nucleic acid molecules encoding guide RNAs comprising such crRNAs, vectors comprising one or more nucleic acid molecules encoding guide RNAs comprising such crRNAs, and systems comprising such crRNAs. Methods of binding to, cleaving, and/or modifying a target sequence using such crRNAs or nucleic acid molecules encoding such crRNAs, guide RNAs comprising such crRNAs, one or more nucleic acid molecules encoding guide RNAs comprising such crRNAs, vectors comprising one or more nucleic acid molecules encoding guide RNAs comprising such crRNAs, and systems comprising such crRNAs are also provided.
In some embodiments, the CFTR target sequence of the crRNA or guide RNA has the sequence set forth in any one of SEQ ID NOs: 98-115, 140-151, 186-202, 235-250, 287-304, 345-364, 562, and 563, or the complement thereof. In some embodiments, a single guide RNA (sgRNA) comprising a crRNA having a spacer sequence targeting the CFTR target sequence has at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to any one of SEQ ID NOs: 98-115, 140-151, 186-202, 235-250, 287-304, 345-364, and 564.
In some embodiments, the CFTR target sequence of the crRNA or guide RNA has the sequence set forth in any one of SEQ ID NOs: 62-68, 80-85, 116-119, 128-131, 163, 164, 180, 181, 203-209, 219-225, 256-258, 274-276, 310-313, and 330-333, or the complement thereof, and the associated RGN polypeptide has an amino acid sequence with at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to SEQ ID NO:53. In some embodiments, the sgRNA comprising a crRNA having a spacer sequence that targets the CFTR target sequence has at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to any one of SEQ ID NOs: 98-104, 140-143, 197, 198, 235-241, 292-294, and 350-353, and the associated RGN polypeptide has an amino acid sequence with at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to SEQ ID NO:53.
In some embodiments, the CFTR target sequence of the crRNA or guide RNA has the sequence set forth in any one of SEQ ID NOs: 68-71, 86-89, 120-122, 132-134, 152-156, 169-173, 213-215, 229-231, 251-255, 269-273, 305-309, and 325-329, or the complement thereof, and the associated RGN polypeptide has an amino acid sequence with at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to SEQ ID NO:55. In some embodiments, the sgRNA comprising a crRNA having a spacer sequence that targets the CFTR target sequence has at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to any one of SEQ ID NOs: 104-107, 144-146, 186-190, 245-247, 287-291, and 345-349, and the associated RGN polypeptide has an amino acid sequence with at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to SEQ ID NO:55.
In some embodiments, the CFTR target sequence of the crRNA or guide RNA has the sequence set forth in any one of SEQ ID NOs: 72, 73, 90, 91, 161, 162, 178, 179, 265, 266, 283, and 284, or the complement thereof, and the associated RGN polypeptide has an amino acid sequence with at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to SEQ ID NO:52. In some embodiments, the sgRNA comprising a crRNA having a spacer sequence that targets the CFTR target sequence has at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to any one of SEQ ID NOs: 108, 109, 195, 196, 301, and 302, and the associated RGN polypeptide has an amino acid sequence with at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to SEQ ID NO:52.
In some embodiments, the CFTR target sequence of the crRNA or guide RNA has the sequence set forth in any one of SEQ ID NOs: 74, 75, 92, 93, 123, 124, 135, 136, 167, 184, 216-218, 232-234, 259-261, 277-279, 314-317, and 334-337, or the complement thereof, and the associated RGN polypeptide has an amino acid sequence with at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to SEQ ID NO:56. In some embodiments, the sgRNA comprising a crRNA having a spacer sequence that targets the CFTR target sequence has at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to any one of SEQ ID NOs: 110, 111, 147, 148, 201, 248-250, 295-297, and 354-357, and the associated RGN polypeptide has an amino acid sequence with at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to SEQ ID NO:56.
In some embodiments, the CFTR target sequence of the crRNA or guide RNA has the sequence set forth in any one of SEQ ID NOs: 76, 94, 210-212, 226-228, 322, 342, 562, and 563, or the complement thereof, and the associated RGN polypeptide has an amino acid sequence with at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to SEQ ID NO:42. In some embodiments, the sgRNA comprising a crRNA having a spacer sequence that targets the CFTR target sequence has at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to any one of SEQ ID NOs: 112, 242-244, 362, and 564, and the associated RGN polypeptide has an amino acid sequence with at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to SEQ ID NO:42.
In some embodiments, the CFTR target sequence of the crRNA or guide RNA has the sequence set forth in any one of SEQ ID NOs: 77, 95, 125, 137, 157-160, 174-177, 323, and 343, or the complement thereof, and the associated RGN polypeptide has an amino acid sequence with at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to SEQ ID NO:54. In some embodiments, the sgRNA comprising a crRNA having a spacer sequence that targets the CFTR target sequence has at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to any one of SEQ ID NOs: 113, 149, 191-194, and 363, and the associated RGN polypeptide has an amino acid sequence with at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to SEQ ID NO:54.
In some embodiments, the CFTR target sequence of the crRNA or guide RNA has the sequence set forth in any one of SEQ ID NOs: 78, 96, 126, 138, 168, 185, 267, 285, 318, 319, 338, and 339, or the complement thereof, and the associated RGN polypeptide has an amino acid sequence with at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to SEQ ID NO:57. In some embodiments, the sgRNA comprising a crRNA having a spacer sequence that targets the CFTR target sequence has at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to any one of SEQ ID NOs: 114, 150, 202, 303, 358, and 359, and the associated RGN polypeptide has an amino acid sequence with at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to SEQ ID NO:57.
In some embodiments, the CFTR target sequence of the crRNA or guide RNA has the sequence set forth in any one of SEQ ID NOs: 79, 97, 127, 139, 262-264, 280-282, 324, and 344, or the complement thereof, and the associated RGN polypeptide has an amino acid sequence with at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to SEQ ID NO:58. In some embodiments, the sgRNA comprising a crRNA having a spacer sequence that targets the CFTR target sequence has at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to any one of SEQ ID NOs: 115, 151, 298-300, and 364, and the associated RGN polypeptide has an amino acid sequence with at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to SEQ ID NO:58.
In some embodiments, the CFTR target sequence of the crRNA or guide RNA has the sequence set forth in any one of SEQ ID NOs: 165, 166, 182, 183, 268, 286, 320, 321, 340, and 341, or the complement thereof, and the associated RGN polypeptide has an amino acid sequence with at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to SEQ ID NO:59. In some embodiments, the sgRNA comprising a crRNA having a spacer sequence that targets the CFTR target sequence has at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to any one of SEQ ID NOs: 199, 200, 304, 360, and 361, and the associated RGN polypeptide has an amino acid sequence with at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to SEQ ID NO:59.
In some embodiments, the methods comprise contacting a DNA molecule comprising a DNA sequence of interest with a DNA-binding polypeptide-deaminase fusion protein of the invention, wherein the DNA molecule is contacted with the fusion protein in an effective amount and under conditions suitable for deamination of a nucleobase. In certain embodiments, the methods comprise contacting a DNA molecule comprising a DNA sequence of interest with: (a) an RGN-deaminase fusion protein of the invention; and (b) a gRNA that targets the fusion protein of (a) to a target nucleotide sequence of a DNA strand; wherein the DNA molecule is contacted with the fusion protein and the gRNA in an effective amount and under conditions suitable for deamination of a nucleobase. In some embodiments, the DNA sequence of interest comprises a sequence associated with a disease or disorder, and deamination of the nucleobase results in a sequence that is not associated with a disease or disorder. In some embodiments, the DNA sequence of interest resides in an allele of a crop plant, wherein the particular allele of the trait of interest results in a plant of lesser agronomic value. Deamination of the nucleobase results in an allele that improves the trait and increases the agronomic value of the plant.
In some embodiments, the DNA sequence of interest comprises a G>A point mutation associated with a disease or disorder, and deamination of the mutant A base results in a sequence that is not associated with the disease or disorder. In some embodiments, deamination corrects a point mutation in the sequence associated with the disease or disorder. In some embodiments, the sequence associated with the disease or disorder encodes a protein, and deamination introduces a stop codon into the sequence associated with the disease or disorder, resulting in truncation of the encoded protein. In some embodiments, the contacting is performed in vivo in a subject susceptible to, suffering from, or diagnosed with the disease or disorder. In some embodiments, the disease or disorder is a disease associated with a point mutation or single-base mutation in the genome. In some embodiments, the disease is a genetic disease, cancer, metabolic disease, or lysosomal storage disease.
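Whether a given point mutation is, in principle, a candidate for reversal by A>G editing can be illustrated as follows. The Python sketch is a simplified aid and not a disclosed method: it assumes the wild-type and mutant sequences are the same length and differ only by substitutions, and it ignores PAM availability, editing windows, and bystander edits. A G>A change on the strand supplied is labeled "+"; a C>T change is labeled "-" because it corresponds to a G>A change on the opposite strand, where the mutant adenine resides. The function name is hypothetical.

```python
def a_to_g_correctable_sites(wild_type: str, mutant: str):
    """Classify each substitution between `wild_type` and `mutant`.

    Returns a list of (position, strand) tuples, with 0-based positions:
      "+"  : G>A on the given strand (mutant A can be deaminated back to G)
      "-"  : C>T on the given strand (mutant A lies on the opposite strand)
      None : a substitution that A>G editing cannot revert
    """
    if len(wild_type) != len(mutant):
        raise ValueError("sequences must have equal length (substitutions only)")
    sites = []
    for pos, (wt, mut) in enumerate(zip(wild_type.upper(), mutant.upper())):
        if wt == mut:
            continue
        if wt == "G" and mut == "A":
            sites.append((pos, "+"))
        elif wt == "C" and mut == "T":
            sites.append((pos, "-"))
        else:
            sites.append((pos, None))
    return sites
```

A site flagged "+" or "-" would still need a suitably positioned PAM and guide RNA before it could be targeted in practice.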
XI. Pharmaceutical compositions and methods of treatment
Provided herein are methods of treating a disease in a subject in need thereof. The method comprises administering to a subject in need thereof an effective amount of a presently disclosed fusion protein or a polynucleotide encoding the fusion protein, a presently disclosed gRNA or a polynucleotide encoding the gRNA, a presently disclosed fusion protein system, a presently disclosed ribonucleoprotein complex, or a cell modified by or comprising any of such compositions.
In some embodiments, the treatment comprises in vivo gene editing by administering to a subject in need thereof a fusion protein, gRNA, or fusion protein system of the present disclosure, or polynucleotide(s) encoding the fusion protein, gRNA, or fusion protein system of the present disclosure. In some embodiments, the treatment comprises in vitro gene editing, wherein the cell is genetically modified in vitro with the fusion protein, gRNA, or fusion protein system of the present disclosure, or polynucleotide(s) encoding the fusion protein, gRNA, or fusion protein system of the present disclosure, and then administering the modified cell to the subject. In some embodiments, the genetically modified cells originate from a subject to whom the modified cells are later administered, and the transplanted cells are referred to herein as autologous. In some embodiments, the genetically modified cells originate from a different subject (i.e., donor) in the same species as the subject (i.e., recipient) to which the modified cells were administered, and the transplanted cells are referred to herein as allogeneic. In some examples described herein, the cells may be expanded in culture prior to administration to a subject in need thereof.
For example, in some embodiments, a method is provided that includes administering to a subject having such a disease (e.g., a disease associated with a gene defect in the CFTR gene) an effective amount of a ribonucleoprotein complex comprising a fusion protein that comprises a deaminase having an amino acid sequence with at least 80% identity to the sequence set forth in any one of SEQ ID NOs: 399 and 405-407. In embodiments described herein, administration of the ribonucleoprotein complex corrects a point mutation in, or introduces a deactivating mutation into, the disease-associated CFTR gene. Other diseases treatable by correction of point mutations or introduction of deactivating mutations into disease-associated genes will be known to those of skill in the art, and the disclosure is not limited in this respect.
In some embodiments, the disease to be treated with the presently disclosed compositions is a disease that can be treated with immunotherapy, such as with Chimeric Antigen Receptor (CAR) T cells. Such diseases include, but are not limited to, cancer.
In some embodiments, deamination of the nucleobase of interest results in correction of a gene defect (e.g., correction of the CFTR gene), or correction of a point mutation that causes loss of function of the gene product. In some embodiments, the gene defect is associated with a disease or disorder, such as a lysosomal storage disorder or a metabolic disease (e.g., type I diabetes). Thus, in some embodiments, a disease to be treated with the presently disclosed compositions is associated with a sequence (i.e., the sequence is causal to the disease or disorder, or causal to a symptom associated with the disease or disorder), and that sequence is mutated in order to treat the disease or disorder or to reduce the symptoms associated with it.
In some embodiments, the disease to be treated with the presently disclosed compositions is associated with a causal mutation. As used herein, "causal mutation" refers to a particular nucleotide, nucleotides, or nucleotide sequence in the genome that contributes to the severity or appearance of a disease or disorder in a subject. Correction of the causal mutation results in an improvement in at least one symptom caused by the disease or disorder. In some embodiments, the causal mutation is adjacent to a PAM site recognized by the RGDBP (e.g., RGN) of a deaminase fusion disclosed herein. Causal mutations can be corrected with fusion polypeptides comprising an RGDBP (e.g., an RGN) and a deaminase of the present disclosure. Non-limiting examples of diseases associated with causal mutations include: cystic fibrosis, Hurler syndrome, Friedreich's ataxia, Huntington's disease, and sickle cell disease. Additional non-limiting examples of disease-associated genes and mutations are available from the McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University (Baltimore, Md.) and the National Center for Biotechnology Information, National Library of Medicine (Bethesda, Md.), both of which are available on the World Wide Web.
In some embodiments, the methods provided herein are used to introduce an inactivating point mutation into a gene or allele encoding a gene product associated with a disease or disorder. For example, in some embodiments, provided herein are methods of using a fusion protein to introduce an inactivating point mutation into an oncogene (e.g., in the treatment of a proliferative disease). In some embodiments, the inactivating mutation can produce a premature stop codon in the coding sequence, which results in expression of a truncated gene product (e.g., a truncated protein lacking the function of the full-length protein). In some embodiments, the purpose of the methods provided herein is to restore the function of a dysfunctional gene via genome editing. The fusion proteins provided herein may be effective for in vitro gene editing-based human therapies, such as by correcting disease-associated mutations in human cell culture. It will be appreciated by those skilled in the art that the fusion proteins provided herein (e.g., fusion proteins comprising an RNA-guided DNA-binding polypeptide and a deaminase polypeptide) can be used to correct any single point G>A mutation. Deamination of the mutant A to G results in correction of the mutation.
As used herein, the terms "treatment," "treating," "alleviation," and "improvement" are used interchangeably. Such terms refer to an approach for obtaining a beneficial or desired result, including but not limited to a therapeutic benefit and/or a prophylactic benefit. By therapeutic benefit is meant any treatment-related improvement in, or effect on, one or more diseases, conditions, or symptoms under treatment. For prophylactic benefit, the composition may be administered to a subject at risk of developing a particular disease, disorder, or symptom, or to a subject reporting one or more physiological symptoms of a disease, even though the disease, disorder, or symptom may not yet have manifested. In some embodiments, the treatment is administered after one or more symptoms have developed and/or after the disease has been diagnosed. In particular embodiments, treatment may be administered in the absence of symptoms, for example, to prevent or delay the onset of symptoms or to inhibit the onset or progression of disease. For example, a treatment may be administered to a susceptible individual prior to the onset of symptoms (e.g., in light of a history of symptoms and/or in light of genetic or other susceptibility factors). Treatment may also be continued after symptoms have resolved, for example, to prevent or delay their recurrence.
The term "effective amount" or "therapeutically effective amount" refers to an amount of a drug sufficient to achieve a beneficial or desired result. The therapeutically effective amount may vary depending on one or more of the following: the subject and disease condition being treated, the weight and age of the subject, the severity of the disease condition, the manner of administration, and the like, as readily determined by one of skill in the art. The particular dose may vary depending on one or more of the following: the particular agent selected, the dosing regimen to be followed, whether to administer in combination with other compounds, the timing of administration, and the delivery system in which it is delivered.
The term "administering" refers to placing an active ingredient into a subject by a method or route that results in at least partial localization of the introduced active ingredient at a desired site (such as a site of injury or repair) such that the desired effect(s) is produced. In some embodiments, the present disclosure provides methods comprising delivering any of the isolated polypeptides, nucleic acid molecules, fusion proteins, ribonucleoprotein complexes, vectors, pharmaceutical compositions, and/or gRNAs described herein. In some embodiments, the disclosure further provides cells produced by such methods and organisms (such as animals or plants) comprising or produced from such cells. In some embodiments, a deaminase, fusion protein, and/or nucleic acid molecule as described herein, combined with (and optionally complexed with) a guide sequence, is delivered to a cell.
In some embodiments, the administering comprises administering by viral delivery. Viral vectors comprising nucleic acids encoding the fusion proteins, ribonucleoprotein complexes, or vectors disclosed herein can be administered directly to a patient (i.e., in vivo), or they can be used to treat cells in vitro, and the modified cells can optionally be administered to a patient (i.e., ex vivo). Conventional virus-based systems for gene transfer may include, but are not limited to, retroviral, lentiviral, adenoviral, adeno-associated, and herpes simplex virus vectors. Integration into the host genome is possible with retrovirus, lentivirus, and adeno-associated virus gene transfer methods, often resulting in long-term expression of the inserted transgene. Lentiviral vectors are retroviral vectors that are capable of transducing or infecting non-dividing cells and that typically produce high viral titers. In applications where transient expression is preferred, adenovirus-based systems may be used. Adenovirus-based vectors are capable of very high transduction efficiencies in many cell types and do not require cell division.
In some embodiments, administering comprises administering by electroporation. In some embodiments, the administering comprises administering by nanoparticle delivery. In some embodiments, the administering comprises administering by liposome delivery. Any effective route of administration may be used to administer the effective amounts of the pharmaceutical compositions described herein.
In some embodiments, administering includes administering by other non-viral delivery of the nucleic acid. Exemplary non-viral delivery methods include, but are not limited to, RNP complexes, lipofection, nucleofection, microinjection, biolistics (gene guns), virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced uptake of DNA. Lipofection is described, for example, in U.S. Pat. Nos. 5,049,386, 4,946,787, and 4,897,355, and lipofection reagents are commercially available (e.g., Transfectam™ and Lipofectin™). Cationic and neutral lipids suitable for efficient receptor-recognition lipofection of polynucleotides include those of Felgner, WO 1991/17424 and WO 1991/16024. Delivery may be to cells (e.g., administered in vitro or ex vivo) or to a target tissue (e.g., administered in vivo).
As used herein, the term "subject" refers to any individual for whom diagnosis, treatment or therapy is desired. In some embodiments, the subject is an animal. In some embodiments, the subject is a mammal. In some embodiments, the subject is a human.
The efficacy of the treatment may be determined by a skilled clinician. However, a treatment is considered an "effective treatment" if any or all of the signs or symptoms of the disease or disorder are altered in a beneficial manner (e.g., reduced by at least 10%), or if other clinically accepted symptoms or signs of the disease are improved or ameliorated. Efficacy can also be measured by the individual not experiencing deterioration, as assessed by hospitalization or by the need for medical intervention (e.g., progression of the disease is stopped or at least slowed). Methods of measuring such indicators are known to those skilled in the art. Treatment comprises: (1) inhibiting the disease, e.g., arresting or slowing the progression of symptoms; or (2) relieving the disease, e.g., causing regression of symptoms; and (3) preventing or reducing the likelihood of symptom development.
Pharmaceutical compositions are provided that comprise a pharmaceutically acceptable carrier and any one of: the presently disclosed RGN polypeptides or polynucleotides encoding the RGN polypeptides; the presently disclosed gRNAs or polynucleotides encoding the gRNAs; the presently disclosed deaminases or polynucleotides encoding the deaminases; the presently disclosed fusion proteins; the presently disclosed systems (such as those comprising fusion proteins); the presently disclosed ribonucleoprotein complexes; or cells comprising any of the RGN polypeptides or RGN-encoding polynucleotides, gRNAs or gRNA-encoding polynucleotides, fusion protein-encoding polynucleotides, or systems.
As used herein, a "pharmaceutically acceptable carrier" refers to a material that does not cause significant irritation to an organism and does not abrogate the activity and properties of the active ingredient (i.e., the deaminase or fusion protein or nucleic acid molecule encoding the deaminase or fusion protein). The carrier must be of sufficiently high purity and sufficiently low toxicity to be suitable for administration to the subject being treated. The carrier may be inert, or it may possess pharmaceutical benefits of its own. In some embodiments, the pharmaceutically acceptable carrier comprises one or more compatible solid or liquid fillers, diluents, or encapsulating substances suitable for administration to a human or other vertebrate. In some embodiments, the pharmaceutical composition includes a pharmaceutically acceptable carrier that is not naturally occurring. In some embodiments, the pharmaceutically acceptable carrier and the active ingredient are not found together in nature and are thus heterologous to each other.
The pharmaceutical compositions used in the presently disclosed methods may be formulated with suitable carriers, excipients, and other agents that provide suitable transfer, delivery, tolerability, and the like. Numerous suitable formulations are known to those skilled in the art. See, e.g., Remington, The Science and Practice of Pharmacy (21st ed. 2005). Non-limiting examples include: sterile diluents, such as water for injection, physiological saline solution, non-volatile oils, polyethylene glycols, glycerol, propylene glycol, or other synthetic solvents; antimicrobial agents, such as benzyl alcohol or methyl parabens; antioxidants, such as ascorbic acid or sodium bisulfite; chelating agents, such as ethylenediaminetetraacetic acid; buffers, such as acetate, citrate, or phosphate; and agents for adjusting tonicity, such as sodium chloride or glucose. A particular carrier for intravenous administration is physiological saline or phosphate buffered saline (PBS). Pharmaceutical compositions for oral or parenteral use may be prepared in unit dosage forms suitable for administration of a dose of the active ingredient. Such unit dosage forms include, for example, lozenges, pills, capsules, injections (ampoules), suppositories, and the like. Such compositions may also contain adjuvants, including preserving, wetting, emulsifying, and dispersing agents. Prevention of the action of microorganisms can be ensured by various antibacterial and antifungal agents (e.g., parabens, chlorobutanol, phenol, sorbic acid, and the like). It may also be desirable to include isotonic agents, for example, sugars, sodium chloride, and the like. Prolonged absorption of the injectable pharmaceutical form can be brought about by the use of agents that delay absorption, for example, aluminum monostearate and gelatin.
In some embodiments in which cells comprising or modified with the presently disclosed RGN, gRNA, deaminase, fusion proteins, systems (including those comprising fusion proteins), or polynucleotides encoding the same are administered to a subject, such cells are administered as a suspension with a pharmaceutically acceptable carrier. One of ordinary skill in the art will recognize that a pharmaceutically acceptable carrier to be used in a cell composition will not contain buffers, compounds, cryopreservation agents, preservatives, or other agents in amounts that substantially interfere with the viability of the cells to be delivered to the subject. Formulations comprising cells may contain, for example, an osmotically buffered solution that allows the cell membrane to remain intact and, optionally, a nutrient that maintains cell viability or enhances engraftment upon administration. Such formulations and suspensions are known to those of ordinary skill in the art and/or may be adapted for use with the cells described herein using routine experimentation.
The cell composition may also be emulsified or presented as a liposome composition, provided that the emulsification procedure does not adversely affect cell viability. The cells and other active ingredients may be admixed with excipients that are pharmaceutically acceptable and compatible with the active ingredients, and in amounts suitable for use in the methods of treatment described herein.
The additional agents included in the cell composition may include pharmaceutically acceptable salts of the components therein. Pharmaceutically acceptable salts include acid addition salts (formed with the free amino groups of the polypeptide) formed with, for example, inorganic acids such as hydrochloric or phosphoric acid, or with organic acids such as acetic, tartaric, or mandelic acid, and the like. Salts formed with the free carboxyl groups may also be derived from, for example, inorganic bases such as sodium, potassium, ammonium, calcium, or iron hydroxides, and from organic bases such as isopropylamine, trimethylamine, 2-ethylaminoethanol, histidine, procaine, and the like.
Suitable routes of administration of the pharmaceutical compositions described herein include, but are not limited to: topical, subcutaneous, transdermal, intradermal, intralesional, intra-articular, intraperitoneal, intravesical, mucosal, gingival, intra-dental, intra-cochlear, transtympanic, intra-organ, epidural, intrathecal, intramuscular, intravenous, intravascular, intraosseous, periocular, intratumoral, intracerebral, and intracerebroventricular administration.
In some embodiments, the pharmaceutical compositions described herein are administered locally to the site of disease (e.g., the lung). In some embodiments, the pharmaceutical compositions described herein are administered to a subject by injection, by inhalation (e.g., of an aerosol), by means of a catheter, by means of a suppository, or by means of an implant, the implant being of a porous, non-porous, or gel material, including a membrane (such as a silicone rubber membrane) or a fiber. In some embodiments, the pharmaceutical composition is formulated for delivery to a subject (e.g., for gene editing).
In some embodiments, the pharmaceutical composition is formulated according to conventional procedures as a composition adapted for intravenous or subcutaneous administration to a subject (e.g., a human). In some embodiments, the pharmaceutical composition for administration by injection is a solution in a sterile isotonic aqueous buffer. If necessary, the composition may also contain a co-solvent and a local anesthetic, such as lignocaine, to reduce pain at the injection site. In general, the ingredients are supplied either separately or mixed together in unit dosage form, for example as a lyophilized powder or an anhydrous concentrate in an airtight container, such as an ampoule or sachet, that indicates the amount of active agent. If the composition is to be administered by infusion, it may be dispensed with an infusion bottle containing sterile pharmaceutical grade water or saline. If the pharmaceutical composition is to be administered by injection, an ampoule of sterile water for injection or physiological saline may be provided so that the ingredients may be mixed prior to administration.
In some embodiments, the pharmaceutical composition may be contained within a lipid particle or vesicle (such as a liposome or microcrystal), which is also suitable for parenteral administration.
Although the description of the pharmaceutical compositions provided herein is primarily directed to pharmaceutical compositions suitable for administration to humans, it will be understood by those skilled in the art that such compositions are generally suitable for administration to all classes of animals or organisms.
Modification of causal mutations using base editing
An example of a genetic disease that can be corrected using an approach that relies on the RGN deaminase fusion proteins of the present invention is cystic fibrosis. Cystic fibrosis (CF) is an autosomal recessive disorder caused by mutations in the cystic fibrosis transmembrane conductance regulator (CFTR) gene (shown in SEQ ID NO: 51). CFTR encodes a cAMP-regulated chloride channel located in the apical membrane of epithelial cells that catalyzes the passage of small ions through the membrane. Dysregulation of this mechanism causes a loss of salt and fluid homeostasis that leads to multiple organ dysfunction and ultimately death from respiratory failure.
Nearly 2,000 mutations in the CFTR gene have been found to cause CF. CFTR mutations are classified into six classes based on the functional defect in CFTR protein synthesis, trafficking, function, or stability, although it is recognized that many CFTR mutants exhibit more than one defect. Class I mutations cause severely defective protein production. They are mainly nonsense or frameshift mutations that introduce premature termination codons (PTCs), leading to unstable messenger RNAs (mRNAs) that are degraded via the nonsense-mediated decay (NMD) pathway. Nonsense mutations due to single nucleotide changes constitute a major subset of class I mutations (Marangi, M. and Pistritto, G., 2018, Front Pharmacol 9, 396, doi:10.3389/fphar.2018.00396; Pranke, I., et al., 2019, Front Pharmacol 10, 121, doi:10.3389/fphar.2019.00121, both of which are incorporated herein by reference). Treatment of patients with class I cystic fibrosis mutations can be difficult because no functional CFTR protein is produced. Notably, a significant fraction of such nonsense mutations is potentially addressable with an A-to-G base editor (Geurts, M.H. et al., 2020, Cell Stem Cell 26, 503-510e507, doi:10.1016/j.stem.2020.01.019, incorporated herein by reference).
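Why only a fraction of nonsense mutations is addressable by an A-to-G editor can be illustrated with a short sketch. It is not part of the disclosed methods: it assumes the standard genetic code and a simplified model in which the editor converts a single A to G on either strand (an edit on the opposite strand appears as T to C on the coding strand). The function names are hypothetical, and the sketch reports only whether the stop codon is removed, not whether the wild-type amino acid is restored.

```python
# Standard stop codons, written as DNA on the coding strand.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def single_base_edits(codon):
    """Yield (position, new_codon, strand) for each single edit an A-to-G
    editor could make in this codon.

    An A on the coding strand becomes G. An A on the opposite strand is a
    T on the coding strand, so that edit reads as T to C.
    """
    for i, base in enumerate(codon):
        if base == "A":
            yield i, codon[:i] + "G" + codon[i + 1:], "coding"
        elif base == "T":
            yield i, codon[:i] + "C" + codon[i + 1:], "opposite"


def rescuing_edits(stop_codon):
    """Return the single edits that turn a stop codon into a sense codon."""
    return [
        (pos, new, strand)
        for pos, new, strand in single_base_edits(stop_codon)
        if new not in STOP_CODONS
    ]


if __name__ == "__main__":
    for stop in sorted(STOP_CODONS):
        print(stop, rescuing_edits(stop))
```

Under these assumptions, some single edits of a stop codon simply produce another stop codon, so whether a given nonsense mutation can be addressed depends on which base and which strand the editor can reach.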
Geurts et al. were the first group to use fusion proteins comprising an adenine deaminase operably linked to an RGN, namely SpyCas9 or the xSpyCas9 variant, to perform precise base editing in cultured lung epithelial cells with class I mutations from cystic fibrosis patients. SpyCas9 recognizes a 5'-NGG-3' PAM, while the xSpyCas9 variant recognizes the less restrictive 5'-NG-3' PAM. The authors state that the main limitation of base editing technology is the PAM requirement of the Cas protein being used. They found that, for fusion proteins comprising the RGN SpyCas9, most of the nonsense mutations identified in the CFTR gene were not within the required targeting window. The PAM is a short motif, typically one to four nucleotides long, on the target DNA sequence recognized by the RGN. The PAM sequence is inherent to each RGN protein, such that an RGN can only access the genomic space around an appropriate PAM. In addition, the base editing window of a base editor is often limited to only a portion of the nucleotides in the target sequence. If the nucleotide of interest is too close to the PAM, the RGN blocks access to the nucleotide. If the nucleotide is too far from the PAM, the deaminase tethered to the RGN cannot reach it. Furthermore, the amount of ssDNA exposed by the R-loop limits accessibility to the deaminase. The present invention comprises RGN deaminase fusion proteins in which the RGN recognizes a PAM near a class I mutation of the CFTR gene and the deaminase is able to successfully modify the targeted causal mutation.
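The combined PAM and editing-window constraint described above can be sketched as a simple sequence scan. This is an illustration only: the 20-nucleotide protospacer, the placement of the PAM immediately 3' of the protospacer, and the editing window of positions 4 to 8 (counted from the PAM-distal end) are assumptions chosen for the example and are not parameters of the RGNs disclosed herein. Only the given strand is scanned, and all names are hypothetical.

```python
import re


def pam_to_regex(pam):
    """Translate a PAM containing 'N' wildcards into a regular expression."""
    return "".join("[ACGT]" if c == "N" else c for c in pam.upper())


def editable_sites(seq, target_index, pam="NGG", spacer_len=20, window=(4, 8)):
    """List protospacers on the given strand that place seq[target_index]
    inside the editing window.

    Positions are counted 1..spacer_len from the PAM-distal end of the
    protospacer; the PAM is assumed to lie immediately 3' of it.
    """
    seq = seq.upper()
    # Zero-width lookahead so that overlapping PAM occurrences are all found.
    pam_re = re.compile("(?=(" + pam_to_regex(pam) + "))")
    hits = []
    for match in pam_re.finditer(seq):
        pam_start = match.start()
        spacer_start = pam_start - spacer_len
        if spacer_start < 0:
            continue  # not enough sequence 5' of this PAM
        position = target_index - spacer_start + 1  # 1-based within protospacer
        if window[0] <= position <= window[1]:
            hits.append({
                "spacer": seq[spacer_start:pam_start],
                "pam": match.group(1),
                "position": position,
            })
    return hits


if __name__ == "__main__":
    # Toy sequence: the target A is at index 5, and one TGG lies downstream.
    toy = "C" * 5 + "A" + "C" * 14 + "TGG" + "CC"
    print(editable_sites(toy, 5, pam="NGG"))
    print(editable_sites(toy, 5, pam="NG"))
```

In this toy sequence the less restrictive PAM yields more candidate protospacers for the same target base, which is the practical effect of a relaxed PAM requirement.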
Another limitation of RGN deaminase fusion proteins known in the art is that the vector constructs encoding the fusion proteins are too large for in vivo delivery methods. AAV delivery is not an option for SpyCas9-based fusion proteins because their size exceeds the limit for effective AAV packaging. The RGN component of the fusion proteins described herein is small in size and thus a viable candidate for AAV vector delivery strategies. The invention also discloses guide RNAs that are specific for the RGNs described herein and that direct the fusion proteins of the invention to previously inaccessible target sites of nonsense mutations in the CFTR gene. The invention also teaches methods of using the fusion proteins for targeted base editing by in vivo AAV vector delivery.
Ideally, the coding sequence of the RGN deaminase fusion protein of the present invention and the corresponding guide RNA for targeting the fusion protein to the CFTR gene can all be packaged within a single AAV vector. The generally accepted size limit for an AAV vector is 4.7 kb, although larger sizes are contemplated at the cost of reduced packaging efficiency. The RGN nickases in Table 28 have coding sequences ranging in length from about 3.15 kb to 3.45 kb. To ensure that the expression cassettes for both the fusion protein and its corresponding guide RNA can be loaded into an AAV vector, novel active deletion variants of the RGNs are described herein. In addition to shortening the amino acid sequence, and thus the coding sequence, of the RGN of the fusion protein, the peptide linker joining the RGN and the deaminase may also be shortened. Finally, genetic elements such as promoters, enhancers, and/or terminators may also be engineered, via deletion analysis, to determine the minimum size each requires to remain functional.
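The packaging constraint above amounts to a size budget, sketched below. Only the 4.7 kb limit and the 3.15 kb to 3.45 kb range for the RGN nickase coding sequence come from the text; every other component size is a hypothetical placeholder chosen for illustration and does not describe any construct disclosed herein.

```python
AAV_LIMIT_BP = 4700  # packaging limit stated in the text (4.7 kb)


def example_cassette(rgn_bp):
    """Build an illustrative cassette; all sizes except rgn_bp are placeholders."""
    return {
        "rgn_nickase": rgn_bp,       # 3150-3450 bp per the text
        "deaminase": 500,            # hypothetical
        "linker": 50,                # hypothetical
        "promoter_terminator": 500,  # hypothetical
        "grna_cassette": 350,        # hypothetical
    }


def remaining_capacity(components, limit_bp=AAV_LIMIT_BP):
    """Return spare capacity in bp; a negative value means the cassette is too large."""
    return limit_bp - sum(components.values())


if __name__ == "__main__":
    for rgn_bp in (3150, 3450):
        spare = remaining_capacity(example_cassette(rgn_bp))
        status = "fits" if spare >= 0 else "exceeds limit"
        print(f"RGN {rgn_bp} bp: {spare:+d} bp ({status})")
```

With these placeholder sizes, a cassette built on the shortest RGN fits with little room to spare and one built on the longest RGN does not, which illustrates why shortening the RGN, the linker, and the regulatory elements all matter.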
Some embodiments of the present disclosure provide methods of editing nucleic acids using the deaminases or RGN complexes described herein to achieve nucleobase changes (e.g., an A:T base pair to a G:C base pair). In some embodiments, the method is a method of editing a nucleobase of a nucleic acid (e.g., a base pair of a double-stranded DNA sequence). In some embodiments, the deaminases or RGN complexes described herein are used to introduce point mutations into nucleic acids by deaminating and excising the "A" nucleobase of interest. In some embodiments, deamination and excision of the nucleobase of interest results in correction of a gene defect, such as correction of a point mutation in the CFTR gene. In some embodiments, the gene defect is associated with a disease, disorder, or condition (e.g., cystic fibrosis). For example, in some embodiments, provided herein are methods of correcting a gene associated with a gene defect, such as correcting a point mutation in the CFTR gene (e.g., in the treatment of a proliferative disease), using a base editing RGN complex comprising a fusion protein with a deaminase having an amino acid sequence with at least 80% identity to any one of SEQ ID NOs: 399 and 405-407. In particular embodiments, the target sequence in the CFTR gene is any one of SEQ ID NOs: 62-97, 116-139, 152-185, 203-234, 251-286, 305-344, 562, or 563.
In some embodiments, the purpose of the methods provided herein is to restore the function of a dysfunctional gene via genome editing. The base editing proteins provided herein may be effective for in vitro gene editing-based human therapies, such as by correcting disease-associated mutations in human cell culture. It will be appreciated by those of skill in the art that the fusion proteins and/or RGN complexes provided herein, comprising a nucleic acid binding protein (e.g., nCas9) and a nucleobase modification domain (e.g., a deaminase having the amino acid sequence set forth in SEQ ID NO: 407, 399, or 405), can be used to correct any single point mutation that is reversed by changing an A:T base pair to a G:C base pair.
In some embodiments, provided herein are methods for treating a subject diagnosed with a disease associated with or caused by a point mutation (e.g., a mutation in the CFTR gene) that is correctable by a fusion protein or RGN complex described herein. For example, in some embodiments, a method is provided that includes administering to a subject having such a disease (e.g., cystic fibrosis) an effective amount of a fusion protein or RGN complex disclosed herein that corrects the point mutation or introduces a deactivating mutation into a disease-associated gene. In some embodiments, a method is provided that includes administering to a subject having such a disease (e.g., cancer associated with the point mutation described above) an effective amount of a fusion protein, RGN complex, or pharmaceutical composition disclosed herein that corrects the point mutation or introduces a deactivating mutation into a disease-associated gene. In particular embodiments, methods of reducing at least one symptom of cystic fibrosis by administering an effective amount of a pharmaceutical composition disclosed herein are provided along with methods of treating cystic fibrosis. A pharmaceutical composition in an amount effective to treat or reduce symptoms of cystic fibrosis can reduce symptoms of cystic fibrosis (i.e., treatment) by about 5%, 10%, 15%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, or more when compared to a control patient; or by about 10-20%, 15-25%, 20-40%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-95%, or 90-95%. In particular embodiments, the control patient may be the same patient prior to administration of an effective amount of the pharmaceutical composition disclosed herein.
Symptoms of cystic fibrosis may include, but are not limited to: sneezing, persistent coughing with mucus or sputum production, shortness of breath (especially when exercising), recurrent lung infections, nasal congestion, stuffy sinuses, greasy malodorous feces, constipation, nausea, swollen abdomen, loss of appetite, etc. Methods for identifying and measuring symptoms of cystic fibrosis are known in the art.
In some embodiments of the described methods for modifying a DNA molecule of interest, the contacting step is performed in vitro. In certain embodiments, the contacting step is performed in vivo. In some embodiments, the contacting step is performed within a subject (e.g., a human subject or a non-human animal subject). In some embodiments, the contacting step is performed in a cell, such as a human or non-human animal cell.
XII cells comprising polynucleotide gene modifications
Provided herein are cells and organisms comprising target nucleic acid molecules of interest that have been modified using a fusion protein (optionally with a gRNA) mediated process as described herein. In some embodiments, the fusion protein comprises a polypeptide comprising SEQ ID NO:1-10 and 399-441, or an active variant or fragment thereof. In some embodiments, the fusion protein comprises an adenine deaminase comprising a nucleotide sequence identical to SEQ ID NO: any one of 1-10 and 399-441 has an amino acid sequence that is at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical. In some embodiments, the fusion protein includes a deaminase and a DNA-binding polypeptide (e.g., an RNA-guided DNA-binding polypeptide). In a further embodiment, the fusion protein comprises deaminase and RGN or variants thereof, such as, for example, APG07433.1 (SEQ ID NO: 41) or its nicking enzyme variant nAPG07433.1 (SEQ ID NO: 42). In some embodiments, the fusion protein includes deaminase and Cas9 or a variant thereof, e.g., such as dCas9 or nickase Cas9. In some embodiments, the fusion protein comprises a nuclease-inactivating or nickase variant of a type II CRISPR-Cas polypeptide. In some embodiments, the fusion protein comprises a nuclease-inactivating or nickase variant of a V-type CRISPR-Cas polypeptide. In some embodiments, the fusion protein comprises a nuclease-inactivating or nickase variant of a type VI CRISPR-Cas polypeptide.
The modified cells can be eukaryotic (e.g., mammalian, plant, insect, or avian cells) or prokaryotic. Also provided are organelles and embryos comprising at least one nucleotide sequence that has been modified by a process employing a fusion protein as described herein. The genetically modified cells, organisms, organelles, and embryos can be heterozygous or homozygous for the modified nucleotide sequence. The mutation(s) introduced by the deaminase domain of the fusion protein can result in altered expression (up-regulation or down-regulation), inactivation, or expression of an altered protein product or an integrated sequence. In those instances in which the mutation(s) result in inactivation of a gene or expression of a nonfunctional protein product, the genetically modified cell, organism, organelle, or embryo is referred to as a "knockout". The knockout phenotype can be the result of a deletion mutation (i.e., a deletion of at least one nucleotide), an insertion mutation (i.e., an insertion of at least one nucleotide), or a nonsense mutation (i.e., a substitution of at least one nucleotide such that a stop codon is introduced).
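The three knockout mutation types defined above can be sketched in Python as follows. This is a simplified illustration only: the function name is hypothetical, the sequences are toy examples chosen solely to illustrate the definitions, the input is assumed to be an in-frame coding sequence starting at the first codon position, and indels are detected by length comparison alone.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_knockout_mutation(reference: str, edited: str) -> str:
    """Classify an edit as 'deletion', 'insertion', 'nonsense', or 'other'.

    Assumptions: both inputs are in-frame coding sequences, and a change in
    length is treated as a pure deletion or a pure insertion.
    """
    reference, edited = reference.upper(), edited.upper()
    if len(edited) < len(reference):
        return "deletion"      # at least one nucleotide removed
    if len(edited) > len(reference):
        return "insertion"     # at least one nucleotide added
    # Same length: look for a substitution that introduces a stop codon.
    for i in range(0, len(reference) - 2, 3):
        ref_codon, new_codon = reference[i:i + 3], edited[i:i + 3]
        if new_codon in STOP_CODONS and ref_codon not in STOP_CODONS:
            return "nonsense"
    return "other"


print(classify_knockout_mutation("ATGCAAGGC", "ATGTAAGGC"))   # CAA -> TAA
print(classify_knockout_mutation("ATGCAAGGC", "ATGCAGGC"))    # one base removed
print(classify_knockout_mutation("ATGCAAGGC", "ATGCAAAGGC"))  # one base added
```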
In some embodiments, the mutation(s) introduced through the deaminase domain of the fusion protein results in the production of a variant protein product. The expressed variant protein product may have at least one amino acid substitution and/or addition or deletion of at least one amino acid. Variant protein products may exhibit modified characteristics or activities when compared to wild-type proteins, including but not limited to altered enzyme activity or substrate specificity.
In some embodiments, the mutation(s) introduced through the deaminase domain of the fusion protein results in an altered protein expression pattern. As non-limiting examples, mutation(s) in the regulatory region controlling expression of a protein product may result in over-expression or down-regulation or altered tissue or temporal expression patterns of the protein product.
Cells that have been modified can be grown into an organism, such as a plant, in accordance with conventional methods. See, for example, McCormick et al. (1986) Plant Cell Reports 5:81-84. These plants can then be grown and pollinated with either the same modified strain or a different strain, with the resulting hybrid having the genetic modification. The present invention provides genetically modified seed. Progeny, variants, and mutants of the regenerated plants are also included within the scope of the invention, provided that these comprise the genetic modification. Further provided are processed plant products or byproducts that retain the genetic modification, e.g., soybean meal.
The methods provided herein can be used to modify any plant species, including but not limited to monocots and dicots. Examples of plants of interest include, but are not limited to: maize (corn), sorghum, wheat, sunflower, tomato, crucifers, pepper, potato, cotton, rice, soybean, sugar beet, sugarcane, tobacco, barley, canola, alfalfa, rye, millet, safflower, peanut, sweet potato, cassava, coffee, coconut, pineapple, citrus, cocoa, tea, banana, avocado, fig, guava, mango, olive, papaya, cashew, macadamia, almond, oat, vegetables, ornamentals, and conifers.
Vegetables include, but are not limited to: tomatoes, lettuce, green beans, lima beans, peas, and members of the genus Cucumis such as cucumber, cantaloupe, and musk melon. Ornamentals include, but are not limited to: azalea, hydrangea, hibiscus, rose, tulip, daffodil, petunia, carnation, poinsettia, and chrysanthemum. Preferably, the plant of the invention is a crop plant (e.g., maize, sorghum, wheat, sunflower, tomato, crucifers, pepper, potato, cotton, rice, soybean, sugar beet, sugarcane, tobacco, barley, canola, etc.).
The methods provided herein can also be used to genetically modify any prokaryotic species, including but not limited to: archaea and bacteria (e.g., Bacillus, Klebsiella, Streptomyces, Rhizobium, Escherichia, Pseudomonas, Salmonella, Shigella, Vibrio, Yersinia, Mycoplasma, Agrobacterium, Lactobacillus).
The methods provided herein can be used to genetically modify any eukaryotic species or cells derived therefrom, including but not limited to: animals (e.g., mammals, insects, fish, birds, and reptiles), fungi, amoebae, algae, and yeast. In some embodiments, the cells modified by the presently disclosed methods comprise cells of hematopoietic origin, such as immune cells (i.e., cells of the innate or adaptive immune system), including, but not limited to: B cells, T cells, natural killer (NK) cells, pluripotent stem cells, induced pluripotent stem cells, chimeric antigen receptor T (CAR-T) cells, monocytes, macrophages, and dendritic cells.
The modified cells may be introduced into an organism. In the case of autologous cell transplantation, such cells may be derived from the same organism (e.g., a human), wherein the cells are modified ex vivo. In some embodiments, in the case of allogeneic cell transplantation, the cells are derived from another organism (e.g., another person) of the same species.
XIII. Kits
Some aspects of this disclosure provide kits comprising a deaminase of the present invention. In certain embodiments, the disclosure provides kits comprising a fusion protein comprising a deaminase of the present invention and a DNA-binding polypeptide (e.g., an RNA-guided DNA-binding polypeptide, such as an RGN polypeptide, e.g., a nuclease-inactive Cas9 domain) and, optionally, a linker positioned between the DNA-binding polypeptide domain and the deaminase. In addition, in some embodiments, the kit comprises suitable reagents, buffers, and/or instructions for using the fusion protein, e.g., for in vitro or in vivo DNA or RNA editing. In some embodiments, the kit comprises instructions regarding the design and use of suitable gRNAs for targeted editing of a nucleotide sequence.
In some embodiments, the pharmaceutical composition can be provided as a pharmaceutical kit comprising (a) a container containing a composition of the present disclosure in lyophilized form and (b) a second container containing a pharmaceutically acceptable diluent (e.g., sterile water) for injection. The pharmaceutically acceptable diluent can be used for reconstitution or dilution of the lyophilized compounds of the present disclosure. Optionally associated with such container(s) can be a notice in the form prescribed by a governmental agency regulating the manufacture, use, or sale of pharmaceuticals or biological products, which notice reflects approval by the agency of manufacture, use, or sale for human administration.
The articles "a" and "an" are used herein to refer to one or to more than one (i.e., to at least one) of the grammatical object of the article. By way of example, "a polypeptide" means one or more polypeptides.
All publications and patent applications mentioned in the specification are indicative of the level of skill of those skilled in the art to which this disclosure pertains. All publications and patent applications are herein incorporated by reference to the same extent as if each individual publication or patent application was specifically and individually indicated to be incorporated by reference.
Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, it will be apparent that certain changes and modifications may be practiced within the scope of the appended claims.
Non-limiting embodiments include:
1. An isolated polypeptide comprising a sequence identical to SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441, wherein the polypeptide has deaminase activity.
2. An isolated polypeptide of embodiment 1, comprising a sequence that hybridizes to SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441.
3. An isolated polypeptide of embodiment 1, comprising a sequence that hybridizes to SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441.
4. A nucleic acid molecule comprising a polynucleotide encoding a deaminase polypeptide, wherein the deaminase is encoded by a nucleotide sequence that:
(a) And SEQ ID NOs: 451, 449, 443, 11-20, 444-448, 450, and 452-485, or
(b) Encoding a sequence corresponding to SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441.
5. The nucleic acid molecule of embodiment 4, wherein the deaminase is purified by hybridization with SEQ ID NOs: 451, 449, 443, 11-20, 444-448, 450, and 452-485.
6. The nucleic acid molecule of embodiment 4, wherein the deaminase is encoded by a nucleotide sequence having at least 95% sequence identity to any one of SEQ ID NOs: 451, 449, 443, 11-20, 444-448, 450, and 452-485.
7. The nucleic acid molecule of embodiment 4, wherein the deaminase is purified by hybridization with SEQ ID NOs: 451, 449, 443, 11-20, 444-448, 450, and 452-485.
8. The nucleic acid molecule of embodiment 4, wherein the deaminase polypeptide has a sequence that matches SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441.
9. The nucleic acid molecule of embodiment 4, wherein the deaminase polypeptide has a sequence that matches SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441.
10. The nucleic acid molecule of any one of embodiments 4-9, wherein the nucleic acid molecule further comprises a heterologous promoter operably linked to the polynucleotide.
11. A pharmaceutical composition comprising a pharmaceutically acceptable carrier and the polypeptide of any one of embodiments 1-3 or the nucleic acid molecule of any one of embodiments 4-10.
12. The pharmaceutical composition of embodiment 11, wherein the pharmaceutically acceptable carrier is heterologous to the polypeptide or the nucleic acid molecule.
13. The pharmaceutical composition of embodiment 11 or 12, wherein the pharmaceutically acceptable carrier is not naturally occurring.
14. A fusion protein comprising a DNA binding polypeptide and a polypeptide that hybridizes to SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441.
15. The fusion protein of embodiment 14, wherein the deaminase has at least 95% sequence identity to any one of SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441.
16. The fusion protein of embodiment 14, wherein the deaminase has 100% sequence identity to any one of SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441.
17. The fusion protein of any one of embodiments 14-16, wherein the deaminase is adenine deaminase.
18. The fusion protein of any of embodiments 14-17, wherein the DNA-binding polypeptide is a meganuclease, a zinc finger fusion protein, or a TALEN.
19. The fusion protein of any one of embodiments 14-17, wherein the DNA-binding polypeptide is an RNA-guided DNA-binding polypeptide.
20. The fusion protein of embodiment 19, wherein the RNA-guided DNA-binding polypeptide is an RNA-guided nuclease (RGN) polypeptide.
21. The fusion protein of embodiment 20, wherein the RGN is a type II CRISPR-Cas polypeptide.
22. The fusion protein of embodiment 20, wherein the RGN is a type V CRISPR-Cas polypeptide.
23. The fusion protein of any one of embodiments 20-22, wherein the RGN is an RGN nickase.
24. The fusion protein of embodiment 20, wherein the RGN has a sequence identical to SEQ ID NOs: 41, 60, 366, and 368.
25. The fusion protein of embodiment 20, wherein the RGN has the amino acid sequence of any one of SEQ ID NOs: 41, 60, 366, and 368.
26. The fusion protein of embodiment 23, wherein the RGN nickase is any one of SEQ ID NOs: 42, 52-59, 61, 397, and 398.
27. The fusion protein of any one of embodiments 14-26, wherein the fusion protein further comprises at least one Nuclear Localization Signal (NLS).
28. A nucleic acid molecule comprising a polynucleotide encoding a fusion protein comprising a DNA-binding polypeptide and a deaminase, wherein the deaminase is encoded by a nucleotide sequence that:
(a) And SEQ ID NOs: 451, 449, 443, 11-20, 444-448, 450, and 452-485, or
(b) Encoding a sequence corresponding to SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441.
29. The nucleic acid molecule of embodiment 28, wherein the nucleotide sequence has at least 90% sequence identity to any one of SEQ ID NOs: 451, 449, 443, 11-20, 444-448, 450, and 452-485.
30. The nucleic acid molecule of embodiment 28, wherein the nucleotide sequence has at least 95% sequence identity to any one of SEQ ID NOs: 451, 449, 443, 11-20, 444-448, 450, and 452-485.
31. The nucleic acid molecule of embodiment 28, wherein the nucleotide sequence has 100% sequence identity to any one of SEQ ID NOs: 451, 449, 443, 11-20, 444-448, 450, and 452-485.
32. The nucleic acid molecule of embodiment 28, wherein the nucleotide sequence encodes a sequence that hybridizes with SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441.
33. The nucleic acid molecule of embodiment 28, wherein the nucleotide sequence encodes a sequence that hybridizes with SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441.
34. The nucleic acid molecule of any one of embodiments 28-33, wherein the deaminase is adenine deaminase.
35. The nucleic acid molecule of any one of embodiments 28-34, wherein the DNA-binding polypeptide is a meganuclease, a zinc finger fusion protein, or a TALEN.
36. The nucleic acid molecule of any one of embodiments 28-34, wherein the DNA-binding polypeptide is an RNA-guided DNA-binding polypeptide.
37. The nucleic acid molecule of embodiment 36, wherein the RNA-guided DNA-binding polypeptide is an RNA-guided nuclease (RGN) polypeptide.
38. The nucleic acid molecule of embodiment 37, wherein the RGN is a type II CRISPR-Cas polypeptide.
39. The nucleic acid molecule of embodiment 37, wherein the RGN is a type V CRISPR-Cas polypeptide.
40. The nucleic acid molecule of any one of embodiments 37-39, wherein the RGN is an RGN nickase.
41. The nucleic acid molecule of embodiment 37, wherein the RGN has a sequence identical to SEQ ID NOs: 41, 60, 366, and 368.
42. The nucleic acid molecule of embodiment 37, wherein the RGN is any one of SEQ ID NOs: 41, 60, 366, and 368.
43. The nucleic acid molecule of embodiment 40, wherein the RGN nickase is any one of SEQ ID NOs: 42, 52-59, 61, 397, and 398.
44. The nucleic acid molecule of any of embodiments 28-43, wherein the polynucleotide encoding the fusion protein is operably linked at its 5' end to a heterologous promoter.
45. The nucleic acid molecule of any of embodiments 28-44, wherein the polynucleotide encoding the fusion protein is operably linked at its 3' end to a heterologous terminator.
46. The nucleic acid molecule of any one of embodiments 28-45, wherein the fusion protein comprises one or more nuclear localization signals.
47. The nucleic acid molecule of any one of embodiments 28-46, wherein the fusion protein is codon optimized for expression in a eukaryotic cell.
48. The nucleic acid molecule of any one of embodiments 28-46, wherein the fusion protein is codon optimized for expression in a prokaryotic cell.
49. A vector comprising the nucleic acid molecule of any one of embodiments 28-48.
50. A vector comprising the nucleic acid molecule of any one of embodiments 28-48, further comprising at least one nucleotide sequence encoding a guide RNA (gRNA) capable of hybridizing to a target sequence.
51. The vector of embodiment 50, wherein the gRNA is a single guide RNA.
52. The vector of embodiment 50, wherein the gRNA is a double guide RNA.
53. A cell comprising the fusion protein of any one of embodiments 14-27.
54. A cell comprising the fusion protein of any one of embodiments 14-27, wherein the cell further comprises a guide RNA.
55. A cell comprising the nucleic acid molecule of any one of embodiments 28-48.
56. A cell comprising the vector of any one of embodiments 49-52.
57. The cell of any one of embodiments 53-56, wherein the cell is a prokaryotic cell.
58. The cell of any one of embodiments 53-56, wherein the cell is a eukaryotic cell.
59. The cell of embodiment 58, wherein the eukaryotic cell is a mammalian cell.
60. The cell of embodiment 59, wherein the mammalian cell is a human cell.
61. The cell of embodiment 60, wherein the human cell is an immune cell.
62. The cell of embodiment 61, wherein the immune cell is a stem cell.
63. The cell of embodiment 62, wherein the stem cell is an induced pluripotent stem cell.
64. The cell of embodiment 58, wherein the eukaryotic cell is an insect or avian cell.
65. The cell of embodiment 58, wherein the eukaryotic cell is a fungal cell.
66. The cell of embodiment 58, wherein the eukaryotic cell is a plant cell.
67. A plant comprising the cells of embodiment 66.
68. A seed comprising the cell of embodiment 66.
69. A pharmaceutical composition comprising a pharmaceutically acceptable carrier and a fusion protein of any one of embodiments 14-27, a nucleic acid molecule of any one of embodiments 28-48, a vector of any one of embodiments 49-52, or a cell of any one of embodiments 59-63.
70. A method of making a fusion protein comprising culturing the cell of any one of embodiments 53-56 under conditions that express the fusion protein.
71. A method of making a fusion protein comprising introducing the nucleic acid molecule of any of embodiments 28-48 or the vector of any of embodiments 49-52 into a cell, and culturing the cell under conditions that express the fusion protein.
72. The method of embodiment 70 or 71, further comprising purifying the fusion protein.
73. A method of making an RGN fusion ribonucleoprotein complex, comprising introducing into a cell a nucleic acid molecule of any of embodiments 37-43 and a nucleic acid molecule comprising an expression cassette encoding a guide RNA, or a vector of any of embodiments 50-52, and culturing the cell under conditions that express the fusion protein and the gRNA and form the RGN fusion ribonucleoprotein complex.
74. The method of embodiment 73, further comprising purifying the RGN fusion ribonucleoprotein complex.
75. A system for modifying a target DNA molecule comprising a target DNA sequence, the system comprising:
a) A fusion protein comprising an RNA-guided nuclease polypeptide (RGN) and a deaminase, or a nucleotide sequence encoding said fusion protein, wherein the deaminase has a sequence identical to SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441; and
b) One or more guide RNAs (gRNAs) capable of hybridizing to the target DNA sequence, or one or more nucleotide sequences encoding the one or more gRNAs;
wherein the one or more guide RNAs are capable of forming a complex with the fusion protein so as to direct the fusion protein to bind to the target DNA sequence and modify the target DNA molecule.
76. The system of embodiment 75, wherein the deaminase has a sequence that matches SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441.
77. The system of embodiment 75, wherein the deaminase has a sequence that matches SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441.
78. The system of any one of embodiments 75-77, wherein at least one of said nucleotide sequences encoding one or more guide RNAs and said nucleotide sequence encoding a fusion protein are operably linked to a promoter heterologous to said nucleotide sequences.
79. The system of any of embodiments 75-78, wherein the target DNA sequence is a eukaryotic target DNA sequence.
80. The system of any of embodiments 75-79, wherein the target DNA sequence is located adjacent to a protospacer adjacent motif (PAM) recognized by the RGN.
81. The system of any one of embodiments 75-80, wherein the target DNA molecule is intracellular.
82. The system of embodiment 81, wherein the cell is a eukaryotic cell.
83. The system of embodiment 82, wherein the eukaryotic cell is a plant cell.
84. The system of embodiment 82, wherein the eukaryotic cell is a mammalian cell.
85. The system of embodiment 84, wherein the mammalian cell is a human cell.
86. The system of embodiment 85, wherein the human cell is an immune cell.
87. The system of embodiment 86, wherein the immune cell is a stem cell.
88. The system of embodiment 87, wherein the stem cells are induced pluripotent stem cells.
89. The system of embodiment 82, wherein the eukaryotic cell is an insect cell.
90. The system of embodiment 81, wherein the cell is a prokaryotic cell.
91. The system of any of embodiments 75-90, wherein the RGN of the fusion protein is a type II CRISPR-Cas polypeptide.
92. The system of any of embodiments 75-90, wherein the RGN of the fusion protein is a type V CRISPR-Cas polypeptide.
93. The system of any one of embodiments 75-90, wherein the RGN of the fusion protein has an amino acid sequence having at least 95% sequence identity to any one of SEQ ID NOs: 41, 60, 366, and 368.
94. The system of any one of embodiments 75-90, wherein the RGN of the fusion protein has the amino acid sequence of any one of SEQ ID NOs: 41, 60, 366, and 368.
95. The system of any one of embodiments 75-90, wherein the RGN of the fusion protein is an RGN nickase.
96. The system of embodiment 95, wherein the RGN nickase is any one of SEQ ID NOs: 42, 52-59, 61, 397, and 398.
97. The system of any one of embodiments 75-96, wherein the fusion protein comprises one or more nuclear localization signals.
98. The system of any of embodiments 75-97, wherein the fusion protein is codon optimized for expression in a eukaryotic cell.
99. The system of any one of embodiments 75-98, wherein the nucleotide sequence encoding one or more guide RNAs and the nucleotide sequence encoding the fusion protein are located on one vector.
100. A pharmaceutical composition comprising a pharmaceutically acceptable carrier and the system of any one of embodiments 75-99.
101. A method for modifying a target DNA molecule comprising a target DNA sequence, the method comprising delivering a system according to any one of embodiments 75-99 to the target DNA molecule or a cell comprising the target DNA molecule.
102. The method of embodiment 101, wherein the modified target DNA molecule comprises an A>N mutation of at least one nucleotide within the target DNA molecule, wherein N is C, G, or T.
103. The method of embodiment 102, wherein the modified target DNA molecule comprises an A>G mutation of at least one nucleotide within the target DNA molecule.
104. A method for modifying a target DNA molecule comprising a target sequence, comprising:
a) Assembling an RGN-deaminase ribonucleoprotein complex in vitro by combining the following under conditions suitable for formation of the RGN-deaminase ribonucleoprotein complex:
i) One or more guide RNAs capable of hybridizing to the target DNA sequence; and
ii) a fusion protein comprising an RNA-guided nuclease polypeptide (RGN) and at least one deaminase, wherein the deaminase has a sequence identical to SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441; and
b) Contacting the target DNA molecule or a cell comprising the target DNA molecule with the in vitro-assembled RGN-deaminase ribonucleoprotein complex;
wherein the one or more guide RNAs hybridize to the target DNA sequence, thereby directing the fusion protein to bind to the target DNA sequence, and modification of the target DNA molecule occurs.
105. The method of embodiment 104, wherein the deaminase has a sequence that matches SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441.
106. The method of embodiment 104, wherein the deaminase has a sequence that matches SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441.
107. The method of any of embodiments 104-106, wherein the modified target DNA molecule comprises an A>N mutation of at least one nucleotide within the target DNA molecule, wherein N is C, G, or T.
108. The method of embodiment 107, wherein the modified target DNA molecule comprises an A>G mutation of at least one nucleotide within the target DNA molecule.
109. The method of any of embodiments 104-108, wherein the RGN of the fusion protein is a type II CRISPR-Cas polypeptide.
110. The method of any of embodiments 104-108, wherein the RGN of the fusion protein is a type V CRISPR-Cas polypeptide.
111. The method of any of embodiments 104-108, wherein the RGN of the fusion protein has an amino acid sequence having at least 95% sequence identity to any one of SEQ ID NOs: 41, 60, 366, and 368.
112. The method of any of embodiments 104-108, wherein the RGN of the fusion protein has the amino acid sequence of any one of SEQ ID NOs: 41, 60, 366, and 368.
113. The method of any of embodiments 104-108, wherein the RGN of the fusion protein is an RGN nickase.
114. The method of embodiment 113, wherein the RGN nickase is any one of SEQ ID NOs: 42, 52-59, 61, 397, and 398.
115. The method of any of embodiments 104-114, wherein the fusion protein comprises one or more nuclear localization signals.
116. The method of any of embodiments 104-115, wherein the fusion protein is codon optimized for expression in a eukaryotic cell.
117. The method of any one of embodiments 104-116, wherein the target DNA sequence is a eukaryotic target DNA sequence.
118. The method of any one of embodiments 104-117, wherein the target DNA sequence is located adjacent to a protospacer adjacent motif (PAM).
119. The method of any one of embodiments 104-118, wherein the target DNA molecule is intracellular.
120. The method of embodiment 119, wherein the cell is a eukaryotic cell.
121. The method of embodiment 120, wherein the eukaryotic cell is a plant cell.
122. The method of embodiment 120, wherein the eukaryotic cell is a mammalian cell.
123. The method of embodiment 122, wherein the mammalian cell is a human cell.
124. The method of embodiment 123, wherein the human cell is an immune cell.
125. The method of embodiment 124, wherein the immune cell is a stem cell.
126. The method of embodiment 125, wherein the stem cell is an induced pluripotent stem cell.
127. The method of embodiment 120, wherein the eukaryotic cell is an insect cell.
128. The method of embodiment 119, wherein the cell is a prokaryotic cell.
129. The method of any of embodiments 119-128, further comprising selecting a cell comprising the modified DNA molecule.
130. A cell comprising a modified DNA sequence of interest according to the method of embodiment 129.
131. The cell of embodiment 130, wherein the cell is a eukaryotic cell.
132. The cell of embodiment 131, wherein the eukaryotic cell is a plant cell.
133. A plant comprising the cells of embodiment 132.
134. A seed comprising the cell of embodiment 132.
135. The cell of embodiment 131, wherein the eukaryotic cell is a mammalian cell.
136. The cell of embodiment 135, wherein the mammalian cell is a human cell.
137. The cell of embodiment 136, wherein the human cell is an immune cell.
138. The cell of embodiment 137, wherein the immune cell is a stem cell.
139. The cell of embodiment 138, wherein the stem cell is an induced pluripotent stem cell.
140. The cell of embodiment 131, wherein the eukaryotic cell is an insect cell.
141. The cell of embodiment 130, wherein the cell is a prokaryotic cell.
142. A pharmaceutical composition comprising the cells of any one of embodiments 135-139 and a pharmaceutically acceptable carrier.
143. A method for producing a genetically modified cell with a correction in a causal mutation for a genetic disease, the method comprising introducing into the cell:
a) A fusion protein comprising an RNA-guided nuclease polypeptide (RGN) and a deaminase, or a polynucleotide encoding said fusion protein, wherein the deaminase has a sequence identical to SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441, wherein said polynucleotide encoding the fusion protein is operably linked to a promoter such that the fusion protein is expressed in the cell; and
b) One or more guide RNAs (gRNAs) capable of hybridizing to a target DNA sequence, or a polynucleotide encoding the gRNA, wherein the polynucleotide encoding the gRNA is operably linked to a promoter such that the gRNA is expressed in the cell,
whereby the fusion protein and the gRNA target the genomic location of the causal mutation and modify the genomic sequence to remove the causal mutation.
144. The method of embodiment 143, wherein the deaminase has a sequence that matches SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441.
145. The method of embodiment 143, wherein the deaminase has a sequence that matches SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441.
146. The method of any of embodiments 143-145, wherein the RGN of the fusion protein is an RGN nickase.
147. The method of embodiment 146, wherein the RGN nickase is any one of SEQ ID NOs: 42, 52-59, 61, 397, and 398.
148. The method of any of embodiments 143-147, wherein the genomic modification comprises introducing an A>G mutation of at least one nucleotide into the target DNA sequence.
149. The method of any one of embodiments 143-148, wherein the cell is an animal cell.
150. The method of embodiment 149, wherein the animal cell is a mammalian cell.
151. The method of embodiment 150, wherein the cell is obtained from a dog, cat, mouse, rat, rabbit, horse, sheep, goat, cow, pig, or human.
152. The method of any one of embodiments 143-151, wherein correction of the causal mutation comprises correction of a nonsense mutation.
153. The method of embodiment 149, wherein the genetic disorder is a disorder listed in Table 34.
154. The method of embodiment 149, wherein the genetic disorder is cystic fibrosis.
155. The method of embodiment 154, wherein the gRNA further comprises a spacer sequence that targets any one of SEQ ID NOs: 62-97, 116-139, 152-185, 203-234, 251-286, 305-344, 562, and 563, or the complement thereof.
156. The method of embodiment 155, wherein the gRNA comprises any one of SEQ ID NOs: 98-115, 140-151, 186-202, 235-250, 287-304, 345-364, and 564.
157. A CRISPR RNA (crRNA), or a nucleic acid molecule encoding the crRNA, wherein the crRNA comprises a spacer sequence that targets a target DNA sequence within a cystic fibrosis transmembrane conductance regulator (CFTR) gene, wherein the target DNA sequence has the sequence of any one of SEQ ID NOs: 98-115, 140-151, 186-202, 235-250, 287-304, 345-364, 562, and 563, or the complement thereof.
158. A guide RNA comprising the crRNA of embodiment 157.
159. The guide RNA of embodiment 158, wherein the guide RNA is a double guide RNA.
160. The guide RNA of embodiment 158, wherein the guide RNA is a single guide RNA (sgRNA).
161. The guide RNA of embodiment 160, wherein the sgRNA has at least 90% sequence identity to any one of SEQ ID NOs: 98-115, 140-151, 186-202, 235-250, 287-304, 345-364, and 564.
162. The guide RNA of embodiment 160, wherein the sgRNA has at least 95% sequence identity to any one of SEQ ID NOs: 98-115, 140-151, 186-202, 235-250, 287-304, 345-364, and 564.
163. The guide RNA of embodiment 160, wherein the sgRNA has the sequence of any one of SEQ ID NOs: 98-115, 140-151, 186-202, 235-250, 287-304, 345-364, and 564.
164. A vector comprising one or more nucleic acid molecules encoding the guide RNA of any of embodiments 158-163.
165. A system for binding a target DNA sequence of a DNA molecule, the system comprising:
a) One or more guide RNAs (gRNAs) capable of hybridizing to the target DNA sequence, or one or more polynucleotides comprising one or more nucleotide sequences encoding the one or more gRNAs; and
b) A fusion protein comprising an RNA-guided nuclease polypeptide (RGN) and an adenine deaminase, or a polynucleotide comprising a nucleotide sequence encoding the fusion protein;
wherein the one or more guide RNAs are capable of hybridizing to the target DNA sequence,
wherein the one or more guide RNAs are capable of forming a complex with the RGN polypeptide so as to direct the RGN polypeptide to bind to the target DNA sequence of the DNA molecule, and
wherein at least one guide RNA comprises a CRISPR RNA (crRNA) comprising a spacer sequence that targets a target DNA sequence within a cystic fibrosis transmembrane conductance regulator (CFTR) gene, wherein the target DNA sequence has the sequence set forth in any one of SEQ ID NOs: 98-115, 140-151, 186-202, 235-250, 287-304, 345-364, 562, and 563, or the complement thereof.
166. The system of embodiment 165, wherein at least one of said nucleotide sequence encoding one or more guide RNAs and said nucleotide sequence encoding a fusion protein is operably linked to a promoter heterologous to said nucleotide sequence.
167. A system for binding a target DNA sequence of a DNA molecule, the system comprising:
a) One or more guide RNAs (gRNAs) capable of hybridizing to the target DNA sequence, or one or more polynucleotides comprising one or more nucleotide sequences encoding the one or more guide RNAs; and
b) A fusion protein comprising an RNA-guided nuclease (RGN) polypeptide and an adenine deaminase;
wherein the one or more guide RNAs are capable of hybridizing to the target DNA sequence,
wherein the one or more guide RNAs are capable of forming a complex with the RGN polypeptide so as to direct the RGN polypeptide to bind to the target DNA sequence of the DNA molecule, and
wherein at least one guide RNA comprises a CRISPR RNA (crRNA) comprising a spacer sequence that targets a target DNA sequence within a cystic fibrosis transmembrane conductance regulator (CFTR) gene, wherein the target DNA sequence has the sequence set forth in any one of SEQ ID NOs: 98-115, 140-151, 186-202, 235-250, 287-304, 345-364, 562, and 563, or a complement thereof.
168. The system of embodiment 167, wherein at least one of the nucleotide sequences encoding one or more guide RNAs is operably linked to a promoter heterologous to the nucleotide sequence.
169. The system of any one of embodiments 165-168, wherein the deaminase has a sequence that hybridizes to SEQ ID NO:1-10 and 399-441.
170. The system of any one of embodiments 165-168, wherein the deaminase has a sequence that hybridizes to SEQ ID NO:1-10 and 399-441.
171. The system of any one of embodiments 165-168, wherein the deaminase comprises a polypeptide having the amino acid sequence of any one of SEQ ID NOs: 1-10 and 399-441.
172. The system of any one of embodiments 165-171, wherein the RGN polypeptide and the one or more guide RNAs are not found complexed with each other in nature.
173. The system of any of embodiments 165-172, wherein
a) The target DNA sequence has the sequence set forth in any one of SEQ ID NOs: 62-68, 80-85, 116-119, 128-131, 163, 164, 180, 181, 203-209, 219-225, 256-258, 274-276, 310-313, and 330-333, or a complement thereof, and the RGN polypeptide has a sequence having at least 90% sequence identity to SEQ ID NO: 53;
b) The target DNA sequence has the sequence set forth in any one of SEQ ID NOs: 68-71, 86-89, 120-122, 132-134, 152-156, 169-173, 213-215, 229-231, 251-255, 269-273, 305-309, and 325-329, and the RGN polypeptide has a sequence having at least 90% sequence identity to SEQ ID NO: 55;
c) The target DNA sequence has the sequence set forth in any one of SEQ ID NOs: 72, 73, 90, 91, 161, 162, 178, 179, 265, 266, 283, and 284, or a complement thereof, and the RGN polypeptide has a sequence having at least 90% sequence identity to SEQ ID NO: 52;
d) The target DNA sequence has the sequence set forth in any one of SEQ ID NOs: 74, 75, 92, 93, 123, 124, 135, 136, 167, 184, 216-218, 232-234, 259-261, 277-279, 314-317, and 334-337, or a complement thereof, and the RGN polypeptide has a sequence having at least 90% sequence identity to SEQ ID NO: 56;
e) The target DNA sequence has the sequence set forth in any one of SEQ ID NOs: 76, 94, 210-212, 226-228, 322, 342, 562, and 563, or a complement thereof, and the RGN polypeptide has a sequence having at least 90% sequence identity to SEQ ID NO: 42;
f) The target DNA sequence has the sequence set forth in any one of SEQ ID NOs: 77, 95, 125, 137, 157-160, 174-177, 323, and 343, or a complement thereof, and the RGN polypeptide has a sequence having at least 90% sequence identity to SEQ ID NO: 54;
g) The target DNA sequence has the sequence set forth in any one of SEQ ID NOs: 78, 96, 126, 138, 168, 185, 267, 285, 318, 319, 338, and 339, or a complement thereof, and the RGN polypeptide has a sequence having at least 90% sequence identity to SEQ ID NO: 57;
h) The target DNA sequence has the sequence set forth in any one of SEQ ID NOs: 79, 97, 127, 139, 262-264, 280-282, 324, and 344, or a complement thereof, and the RGN polypeptide has a sequence having at least 90% sequence identity to SEQ ID NO: 58; and
i) The target DNA sequence has the sequence set forth in any one of SEQ ID NOs: 165, 166, 182, 183, 268, 286, 320, 321, 340, and 341, or a complement thereof, and the RGN polypeptide has a sequence having at least 90% sequence identity to SEQ ID NO: 59.
174. The system of any of embodiments 165-172, wherein
a) The target DNA sequence has the sequence set forth in any one of SEQ ID NOs: 62-68, 80-85, 116-119, 128-131, 163, 164, 180, 181, 203-209, 219-225, 256-258, 274-276, 310-313, and 330-333, or a complement thereof, and the RGN polypeptide has a sequence having at least 95% sequence identity to SEQ ID NO: 53;
b) The target DNA sequence has the sequence set forth in any one of SEQ ID NOs: 68-71, 86-89, 120-122, 132-134, 152-156, 169-173, 213-215, 229-231, 251-255, 269-273, 305-309, and 325-329, and the RGN polypeptide has a sequence having at least 95% sequence identity to SEQ ID NO: 55;
c) The target DNA sequence has the sequence set forth in any one of SEQ ID NOs: 72, 73, 90, 91, 161, 162, 178, 179, 265, 266, 283, and 284, or a complement thereof, and the RGN polypeptide has a sequence having at least 95% sequence identity to SEQ ID NO: 52;
d) The target DNA sequence has the sequence set forth in any one of SEQ ID NOs: 74, 75, 92, 93, 123, 124, 135, 136, 167, 184, 216-218, 232-234, 259-261, 277-279, 314-317, and 334-337, or a complement thereof, and the RGN polypeptide has a sequence having at least 95% sequence identity to SEQ ID NO: 56;
e) The target DNA sequence has the sequence set forth in any one of SEQ ID NOs: 76, 94, 210-212, 226-228, 322, 342, 562, and 563, or a complement thereof, and the RGN polypeptide has a sequence having at least 95% sequence identity to SEQ ID NO: 42;
f) The target DNA sequence has the sequence set forth in any one of SEQ ID NOs: 77, 95, 125, 137, 157-160, 174-177, 323, and 343, or a complement thereof, and the RGN polypeptide has a sequence having at least 95% sequence identity to SEQ ID NO: 54;
g) The target DNA sequence has the sequence set forth in any one of SEQ ID NOs: 78, 96, 126, 138, 168, 185, 267, 285, 318, 319, 338, and 339, or a complement thereof, and the RGN polypeptide has a sequence having at least 95% sequence identity to SEQ ID NO: 57;
h) The target DNA sequence has the sequence set forth in any one of SEQ ID NOs: 79, 97, 127, 139, 262-264, 280-282, 324, and 344, or a complement thereof, and the RGN polypeptide has a sequence having at least 95% sequence identity to SEQ ID NO: 58; and
i) The target DNA sequence has the sequence set forth in any one of SEQ ID NOs: 165, 166, 182, 183, 268, 286, 320, 321, 340, and 341, or a complement thereof, and the RGN polypeptide has a sequence having at least 95% sequence identity to SEQ ID NO: 59.
175. The system of any of embodiments 165-172, wherein
a) The target DNA sequence has the sequence set forth in any one of SEQ ID NOs: 62-68, 80-85, 116-119, 128-131, 163, 164, 180, 181, 203-209, 219-225, 256-258, 274-276, 310-313, and 330-333, or a complement thereof, and the RGN polypeptide has a sequence having 100% sequence identity to SEQ ID NO: 53;
b) The target DNA sequence has the sequence set forth in any one of SEQ ID NOs: 68-71, 86-89, 120-122, 132-134, 152-156, 169-173, 213-215, 229-231, 251-255, 269-273, 305-309, and 325-329, and the RGN polypeptide has a sequence having 100% sequence identity to SEQ ID NO: 55;
c) The target DNA sequence has the sequence set forth in any one of SEQ ID NOs: 72, 73, 90, 91, 161, 162, 178, 179, 265, 266, 283, and 284, or a complement thereof, and the RGN polypeptide has a sequence having 100% sequence identity to SEQ ID NO: 52;
d) The target DNA sequence has the sequence set forth in any one of SEQ ID NOs: 74, 75, 92, 93, 123, 124, 135, 136, 167, 184, 216-218, 232-234, 259-261, 277-279, 314-317, and 334-337, or a complement thereof, and the RGN polypeptide has a sequence having 100% sequence identity to SEQ ID NO: 56;
e) The target DNA sequence has the sequence set forth in any one of SEQ ID NOs: 76, 94, 210-212, 226-228, 322, 342, 562, and 563, or a complement thereof, and the RGN polypeptide has a sequence having 100% sequence identity to SEQ ID NO: 42;
f) The target DNA sequence has the sequence set forth in any one of SEQ ID NOs: 77, 95, 125, 137, 157-160, 174-177, 323, and 343, or a complement thereof, and the RGN polypeptide has a sequence having 100% sequence identity to SEQ ID NO: 54;
g) The target DNA sequence has the sequence set forth in any one of SEQ ID NOs: 78, 96, 126, 138, 168, 185, 267, 285, 318, 319, 338, and 339, or a complement thereof, and the RGN polypeptide has a sequence having 100% sequence identity to SEQ ID NO: 57;
h) The target DNA sequence has the sequence set forth in any one of SEQ ID NOs: 79, 97, 127, 139, 262-264, 280-282, 324, and 344, or a complement thereof, and the RGN polypeptide has a sequence having 100% sequence identity to SEQ ID NO: 58; and
i) The target DNA sequence has the sequence set forth in any one of SEQ ID NOs: 165, 166, 182, 183, 268, 286, 320, 321, 340, and 341, or a complement thereof, and the RGN polypeptide has a sequence having 100% sequence identity to SEQ ID NO: 59.
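The pairing of target-sequence identifiers with RGN polypeptide identifiers in embodiments 173-175 is effectively a lookup table. The sketch below shows one way to expand the recited ranges and find which RGN polypeptide SEQ ID NO is paired with a given target SEQ ID NO. Only items a), b), and e) are encoded, copied from the lists above; the names `expand_seq_ids`, `TARGETS_BY_RGN`, and `rgn_for_target` are hypothetical, and the table says nothing about sequence content or identity thresholds.

```python
# Minimal sketch: expand SEQ ID NO range lists and look up the paired RGN.
# Only items a), b), and e) of embodiments 173-175 are encoded here.


def expand_seq_ids(spec: str) -> set:
    """Expand a list such as '62-68, 80-85, 163' into a set of integers."""
    ids = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            ids.update(range(int(low), int(high) + 1))  # ranges are inclusive
        else:
            ids.add(int(part))
    return ids


# Key: RGN polypeptide SEQ ID NO. Value: paired target DNA SEQ ID NOs.
TARGETS_BY_RGN = {
    53: "62-68, 80-85, 116-119, 128-131, 163, 164, 180, 181, 203-209, "
        "219-225, 256-258, 274-276, 310-313, 330-333",
    55: "68-71, 86-89, 120-122, 132-134, 152-156, 169-173, 213-215, "
        "229-231, 251-255, 269-273, 305-309, 325-329",
    42: "76, 94, 210-212, 226-228, 322, 342, 562, 563",
}


def rgn_for_target(target_id: int) -> list:
    """Return every RGN SEQ ID NO whose item lists the given target."""
    return sorted(
        rgn for rgn, spec in TARGETS_BY_RGN.items()
        if target_id in expand_seq_ids(spec)
    )


print(rgn_for_target(68))
```

The function returns a list rather than a single value because, as recited, SEQ ID NO: 68 falls within both the 62-68 range of item a) and the 68-71 range of item b).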
176. The system of any one of embodiments 165-175, wherein at least one guide RNA is a double guide RNA.
177. The system of any one of embodiments 165-175, wherein at least one guide RNA is a single guide RNA (sgRNA).
178. The system of embodiment 177, wherein
a) The sgRNA has at least 90% sequence identity to any one of SEQ ID NOs: 98-104, 140-143, 197, 198, 235-241, 292-294, and 350-353, and the RGN polypeptide has a sequence having at least 90% sequence identity to SEQ ID NO: 53;
b) The sgRNA has at least 90% sequence identity to any one of SEQ ID NOs: 104-107, 144-146, 186-190, 245-247, 287-291, and 345-349, and the RGN polypeptide has a sequence having at least 90% sequence identity to SEQ ID NO: 55;
c) The sgRNA has at least 90% sequence identity to any one of SEQ ID NOs: 108, 109, 195, 196, 301, and 302, and the RGN polypeptide has a sequence having at least 90% sequence identity to SEQ ID NO: 52;
d) The sgRNA has at least 90% sequence identity to any one of SEQ ID NOs: 110, 111, 147, 148, 201, 248-250, 295-297, and 354-357, and the RGN polypeptide has a sequence having at least 90% sequence identity to SEQ ID NO: 56;
e) The sgRNA has at least 90% sequence identity to any one of SEQ ID NOs: 112, 242-244, 362, and 564, and the RGN polypeptide has a sequence having at least 90% sequence identity to SEQ ID NO: 42;
f) The sgRNA has at least 90% sequence identity to any one of SEQ ID NOs: 113, 149, 191-194, and 363, and the RGN polypeptide has a sequence having at least 90% sequence identity to SEQ ID NO: 54;
g) The sgRNA has at least 90% sequence identity to any one of SEQ ID NOs: 114, 150, 202, 303, 358, and 359, and the RGN polypeptide has a sequence having at least 90% sequence identity to SEQ ID NO: 57;
h) The sgRNA has at least 90% sequence identity to any one of SEQ ID NOs: 115, 151, 298-300, and 364, and the RGN polypeptide has a sequence having at least 90% sequence identity to SEQ ID NO: 58; and
i) The sgRNA has at least 90% sequence identity to any one of SEQ ID NOs: 199, 200, 304, 360, and 361, and the RGN polypeptide has a sequence having at least 90% sequence identity to SEQ ID NO: 59.
179. The system of embodiment 177, wherein
a) The sgRNA has at least 95% sequence identity to any one of SEQ ID NOs: 98-104, 140-143, 197, 198, 235-241, 292-294, and 350-353, and the RGN polypeptide has a sequence having at least 95% sequence identity to SEQ ID NO: 53;
b) The sgRNA has at least 95% sequence identity to any one of SEQ ID NOs: 104-107, 144-146, 186-190, 245-247, 287-291, and 345-349, and the RGN polypeptide has a sequence having at least 95% sequence identity to SEQ ID NO: 55;
c) The sgRNA has at least 95% sequence identity to any one of SEQ ID NOs: 108, 109, 195, 196, 301, and 302, and the RGN polypeptide has a sequence having at least 95% sequence identity to SEQ ID NO: 52;
d) The sgRNA has at least 95% sequence identity to any one of SEQ ID NOs: 110, 111, 147, 148, 201, 248-250, 295-297, and 354-357, and the RGN polypeptide has a sequence having at least 95% sequence identity to SEQ ID NO: 56;
e) The sgRNA has at least 95% sequence identity to any one of SEQ ID NOs: 112, 242-244, 362, and 564, and the RGN polypeptide has a sequence having at least 95% sequence identity to SEQ ID NO: 42;
f) The sgRNA has at least 95% sequence identity to any one of SEQ ID NOs: 113, 149, 191-194, and 363, and the RGN polypeptide has a sequence having at least 95% sequence identity to SEQ ID NO: 54;
g) The sgRNA has at least 95% sequence identity to any one of SEQ ID NOs: 114, 150, 202, 303, 358, and 359, and the RGN polypeptide has a sequence having at least 95% sequence identity to SEQ ID NO: 57;
h) The sgRNA has at least 95% sequence identity to any one of SEQ ID NOs: 115, 151, 298-300, and 364, and the RGN polypeptide has a sequence having at least 95% sequence identity to SEQ ID NO: 58; and
i) The sgRNA has at least 95% sequence identity to any one of SEQ ID NOs: 199, 200, 304, 360, and 361, and the RGN polypeptide has a sequence having at least 95% sequence identity to SEQ ID NO: 59.
180. The system of embodiment 177, wherein
a) The sgRNA has 100% sequence identity to any one of SEQ ID NOs: 98-104, 140-143, 197, 198, 235-241, 292-294, and 350-353, and the RGN polypeptide has a sequence having 100% sequence identity to SEQ ID NO: 53;
b) The sgRNA has 100% sequence identity to any one of SEQ ID NOs: 104-107, 144-146, 186-190, 245-247, 287-291, and 345-349, and the RGN polypeptide has a sequence having 100% sequence identity to SEQ ID NO: 55;
c) The sgRNA has 100% sequence identity to any one of SEQ ID NOs: 108, 109, 195, 196, 301, and 302, and the RGN polypeptide has a sequence having 100% sequence identity to SEQ ID NO: 52;
d) The sgRNA has 100% sequence identity to any one of SEQ ID NOs: 110, 111, 147, 148, 201, 248-250, 295-297, and 354-357, and the RGN polypeptide has a sequence having 100% sequence identity to SEQ ID NO: 56;
e) The sgRNA has 100% sequence identity to any one of SEQ ID NOs: 112, 242-244, 362, and 564, and the RGN polypeptide has a sequence having 100% sequence identity to SEQ ID NO: 42;
f) The sgRNA has 100% sequence identity to any one of SEQ ID NOs: 113, 149, 191-194, and 363, and the RGN polypeptide has a sequence having 100% sequence identity to SEQ ID NO: 54;
g) The sgRNA has 100% sequence identity to any one of SEQ ID NOs: 114, 150, 202, 303, 358, and 359, and the RGN polypeptide has a sequence having 100% sequence identity to SEQ ID NO: 57;
h) The sgRNA has 100% sequence identity to any one of SEQ ID NOs: 115, 151, 298-300, and 364, and the RGN polypeptide has a sequence having 100% sequence identity to SEQ ID NO: 58; and
i) The sgRNA has 100% sequence identity to any one of SEQ ID NOs: 199, 200, 304, 360, and 361, and the RGN polypeptide has a sequence having 100% sequence identity to SEQ ID NO: 59.
181. A cell comprising a crRNA or nucleic acid molecule of embodiment 157, a guide RNA of any of embodiments 158-163, a vector of embodiment 164, or a system of any of embodiments 165-180.
182. A pharmaceutical composition comprising a crRNA or nucleic acid molecule of embodiment 157, a guide RNA of any of embodiments 158-163, a vector of embodiment 164, a cell of embodiment 181, or a system of any of embodiments 165-180, and a pharmaceutically acceptable carrier.
183. A composition comprising:
a) A fusion protein comprising a DNA-binding polypeptide and a first adenine deaminase, or a nucleic acid molecule encoding the fusion protein; and
b) A second adenine deaminase having at least 90% sequence identity to any one of SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441, or a nucleic acid molecule encoding the second adenine deaminase.
184. The composition of embodiment 183, wherein the second adenine deaminase has at least 90% sequence identity to any one of SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441.
185. The composition of embodiment 183, wherein the second adenine deaminase has 100% sequence identity to any one of SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441.
186. The composition of any one of embodiments 183-185, wherein the first adenine deaminase has at least 90% sequence identity to any one of SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441.
187. The composition of any one of embodiments 183-186, wherein the first adenine deaminase has at least 95% sequence identity to any one of SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441.
188. The composition of any one of embodiments 183-186, wherein the first adenine deaminase has 100% sequence identity to any one of SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441.
189. The composition of any one of embodiments 183-188, wherein the DNA-binding polypeptide is a meganuclease, a zinc finger fusion protein, or a TALEN.
190. The composition of any one of embodiments 183-189, wherein the DNA-binding polypeptide is an RNA-guided DNA-binding polypeptide.
191. The composition of embodiment 190, wherein the RNA-guided DNA-binding polypeptide is an RNA-guided nuclease (RGN) polypeptide.
192. The composition of embodiment 191, wherein the RGN is an RGN nickase.
193. A vector comprising a nucleic acid molecule encoding a fusion protein and a nucleic acid molecule encoding a second adenine deaminase, wherein the fusion protein comprises a DNA-binding polypeptide and a first adenine deaminase, and wherein the second adenine deaminase has at least 90% sequence identity to any one of SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441.
194. The vector of embodiment 193, wherein the second adenine deaminase has at least 90% sequence identity to any one of SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441.
195. The vector of embodiment 193, wherein the second adenine deaminase has 100% sequence identity to any one of SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441.
196. The vector of any one of embodiments 193-195, wherein the first adenine deaminase has at least 90% sequence identity to any one of SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441.
197. The vector of any one of embodiments 193-195, wherein the first adenine deaminase has at least 95% sequence identity to any one of SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441.
198. The vector of any one of embodiments 193-195, wherein the first adenine deaminase has 100% sequence identity to any one of SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441.
199. The vector of any one of embodiments 193-198, wherein the DNA-binding polypeptide is a meganuclease, a zinc finger fusion protein, or a TALEN.
200. The vector of any one of embodiments 193-198, wherein the DNA-binding polypeptide is an RNA-guided DNA-binding polypeptide.
201. The vector of embodiment 200, wherein the RNA-guided DNA-binding polypeptide is an RNA-guided nuclease (RGN) polypeptide.
202. The vector of embodiment 201, wherein the RGN is an RGN nickase.
203. A cell comprising the vector of any one of embodiments 193-202.
204. A cell, comprising:
a) A fusion protein comprising a DNA binding polypeptide and a first adenine deaminase or a nucleic acid molecule encoding the fusion protein; and
b) A second adenine deaminase having at least 90% sequence identity to any one of SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441, or a nucleic acid molecule encoding the second adenine deaminase.
205. The cell of embodiment 204, wherein the second adenine deaminase has at least 90% sequence identity to any one of SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441.
206. The cell of embodiment 204, wherein the second adenine deaminase has 100% sequence identity to any one of SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441.
207. The cell of any one of embodiments 204-206, wherein the first adenine deaminase has at least 90% sequence identity to any one of SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441.
208. The cell of any one of embodiments 204-206, wherein the first adenine deaminase has at least 95% sequence identity to any one of SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441.
209. The cell of any one of embodiments 204-206, wherein the first adenine deaminase has 100% sequence identity to any one of SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441.
210. The cell of any one of embodiments 204-209, wherein the DNA-binding polypeptide is a meganuclease, a zinc finger fusion protein, or a TALEN.
211. The cell of any one of embodiments 204-209, wherein the DNA-binding polypeptide is an RNA-guided DNA-binding polypeptide.
212. The cell of embodiment 211, wherein the RNA-guided DNA-binding polypeptide is an RNA-guided nuclease (RGN) polypeptide.
213. The cell of embodiment 212, wherein the RGN is an RGN nickase.
214. A pharmaceutical composition comprising a pharmaceutically acceptable carrier and a composition of any one of embodiments 183-192, a vector of any one of embodiments 193-202, or a cell of any one of embodiments 203-213.
215. A method of treating a disease, the method comprising administering to a subject in need thereof an effective amount of the pharmaceutical composition of any one of embodiments 69, 100, 142, and 214.
216. The method of embodiment 215, wherein said disease is associated with a causal mutation and said effective amount of said pharmaceutical composition corrects said causal mutation.
217. Use of the fusion protein of any of embodiments 14-27, the nucleic acid molecule of any of embodiments 28-48, the vector of any of embodiments 49-52 and 193-202, the cell of any of embodiments 59-63, 135-139, and 203-213, the system of any of embodiments 75-99, or the composition of any of embodiments 183-192 for treating a disease in a subject.
218. The use of embodiment 217, wherein the disease is associated with a causal mutation, and the treating comprises correcting the causal mutation.
219. Use of the fusion protein of any of embodiments 14-27, the nucleic acid molecule of any of embodiments 28-48, the vector of any of embodiments 49-52 and 193-202, the cell of any of embodiments 59-63, 135-139, and 203-213, the system of any of embodiments 75-99, or the composition of any of embodiments 183-192 in the manufacture of a medicament useful in the treatment of a disease.
220. The use of embodiment 219, wherein the disease is associated with a causal mutation and an effective amount of the medicament corrects the causal mutation.
221. A nucleic acid molecule comprising a polynucleotide encoding an RNA-guided nuclease (RGN) polypeptide, wherein the polynucleotide comprises a nucleotide sequence encoding an RGN polypeptide comprising an amino acid sequence identical to SEQ ID NO: 41 or 60 but lacking amino acid residues 590 to 597 of SEQ ID NO: 41 or 60;
wherein the RGN polypeptide is capable of binding to a target DNA sequence in an RNA-directed sequence-specific manner when bound to a guide RNA (gRNA) capable of hybridizing to the target DNA sequence.
222. The nucleic acid molecule of embodiment 221, wherein the polynucleotide encoding the RGN polypeptide is operably linked to a promoter heterologous to the polynucleotide.
223. The nucleic acid molecule of embodiment 221 or 222, wherein the RGN polypeptide comprises an amino acid sequence having at least 95% sequence identity to SEQ ID NO: 366 or 368.
224. The nucleic acid molecule of embodiment 221 or 222, wherein the RGN polypeptide comprises the sequence of SEQ ID NO: 366 or 368.
225. The nucleic acid molecule of any one of embodiments 221-223, wherein the RGN polypeptide is nuclease-inactive or functions as a nicking enzyme.
226. The nucleic acid molecule of embodiment 225, wherein the nicking enzyme has the sequence of SEQ ID NO:397 or 398.
227. The nucleic acid molecule of any one of embodiments 221-226, wherein the RGN polypeptide is operably linked to a base editing polypeptide.
228. A vector comprising the nucleic acid molecule of any one of embodiments 221-227.
229. An isolated polypeptide comprising an amino acid sequence identical to SEQ ID NO: 41 or 60 but lacking amino acid residues 590 to 597 of SEQ ID NO: 41 or 60, wherein the polypeptide is an RNA-guided nuclease (RGN) polypeptide.
230. The isolated polypeptide of embodiment 229, wherein the RGN polypeptide comprises an amino acid sequence having at least 95% sequence identity to SEQ ID NO: 366 or 368.
231. The isolated polypeptide of embodiment 230, wherein the RGN polypeptide comprises SEQ ID NO:366 or 368.
232. The isolated polypeptide of embodiment 229 or 230, wherein the RGN polypeptide is nuclease-inactive or functions as a nicking enzyme.
233. The isolated polypeptide of embodiment 232, wherein the nicking enzyme has the amino acid sequence of SEQ ID NO:397 or 398.
234. The isolated polypeptide of any of embodiments 229-233, wherein the RGN polypeptide is operably fused to a base editing polypeptide.
235. A cell comprising the nucleic acid molecule of any one of embodiments 221-227, the vector of embodiment 228, or the polypeptide of any one of embodiments 229-234.
236. An isolated polypeptide comprising a sequence identical to SEQ ID NO:407, wherein the polypeptide has deaminase activity.
237. An isolated polypeptide of embodiment 236, comprising an amino acid sequence that hybridizes to SEQ ID NO:407, wherein the polypeptide has deaminase activity.
238. The isolated polypeptide of embodiment 236, wherein the polypeptide comprises SEQ ID NO: 407.
239. A nucleic acid molecule comprising a polynucleotide encoding a deaminase polypeptide, wherein the deaminase is encoded by a nucleotide sequence that:
a) Has at least 80% sequence identity to SEQ ID NO: 451, or
b) Encodes an amino acid sequence having at least 90% sequence identity to SEQ ID NO: 407.
240. The nucleic acid molecule of embodiment 239, wherein the deaminase is encoded by a nucleotide sequence having at least 90% sequence identity to SEQ ID NO: 451.
241. The nucleic acid molecule of embodiment 239, wherein the deaminase is encoded by a nucleotide sequence having at least 95% sequence identity to SEQ ID NO: 451.
242. The nucleic acid molecule of embodiment 239, wherein the deaminase is encoded by a nucleotide sequence having 100% sequence identity to SEQ ID NO: 451.
243. The nucleic acid molecule of any one of embodiments 239-242, wherein the nucleic acid molecule further comprises a heterologous promoter operably linked to the polynucleotide.
244. A pharmaceutical composition comprising a pharmaceutically acceptable carrier and a polypeptide of any one of embodiments 236-238 or a nucleic acid molecule of any one of embodiments 239-242.
245. A fusion protein comprising a DNA-binding polypeptide and a polypeptide having at least 90% sequence identity to SEQ ID NO: 407.
246. The fusion protein of embodiment 245, comprising a DNA-binding polypeptide and a polypeptide having at least 95% sequence identity to SEQ ID NO: 407.
247. The fusion protein of embodiment 245, comprising a DNA-binding polypeptide and a polypeptide having 100% sequence identity to SEQ ID NO: 407.
248. The fusion protein of any one of embodiments 245-247, wherein the DNA binding polypeptide is an RNA-guided nuclease (RGN) polypeptide.
249. The fusion protein of embodiment 248, wherein the RGN polypeptide is a type II CRISPR-Cas polypeptide or a type V CRISPR-Cas polypeptide.
250. The fusion protein of any of embodiments 248-249, wherein the RGN polypeptide is Cas9, CasX, CasY, Cpf1, C2c1, C2c2, C2c3, GeoCas9, CjCas9, Cas12a, Cas12b, Cas12g, Cas12h, Cas12i, Cas13b, Cas13c, Cas13d, Cas14, Csn2, xCas9, SpCas9-NG, LbCas12a, AsCas12a, Cas9-KKH, circularly permuted Cas9, Argonaute (Ago), SmacCas9, a Spy-macCas9 domain, or a polypeptide having the sequence of any one of SEQ ID NOs: 41, 60, 366, and 368.
251. The fusion protein of any one of embodiments 248-250, wherein the RGN polypeptide is a nicking enzyme.
252. The fusion protein of embodiment 251, wherein the nicking enzyme has a sequence having at least 95% sequence identity to any one of SEQ ID NOs: 42, 52-59, 61, 397, and 398.
253. The fusion protein of embodiment 251, wherein the nicking enzyme has a sequence having 100% sequence identity to any one of SEQ ID NOs: 42, 52-59, 61, 397, and 398.
254. A nucleic acid molecule comprising a polynucleotide encoding a fusion protein comprising a DNA-binding polypeptide and a deaminase, wherein the deaminase is encoded by a nucleotide sequence that:
a) Has at least 80% sequence identity to SEQ ID NO: 451, or
b) Encodes an amino acid sequence having at least 90% sequence identity to SEQ ID NO: 407.
255. The nucleic acid molecule of embodiment 254, wherein the deaminase is encoded by a nucleotide sequence having at least 90% sequence identity to SEQ ID NO: 451.
256. The nucleic acid molecule of embodiment 254, wherein the deaminase is encoded by a nucleotide sequence having at least 95% sequence identity to SEQ ID NO: 451.
257. The nucleic acid molecule of embodiment 254, wherein the deaminase is encoded by a nucleotide sequence having 100% sequence identity to SEQ ID NO: 451.
258. The nucleic acid molecule of any one of embodiments 254-257, wherein the DNA-binding polypeptide is an RGN polypeptide.
259. The nucleic acid molecule of embodiment 258, wherein the RGN is a type II CRISPR-Cas polypeptide or a type V CRISPR-Cas polypeptide.
260. The nucleic acid molecule of any one of embodiments 258-259, wherein the RGN polypeptide is Cas9, CasX, CasY, Cpf1, C2c1, C2c2, C2c3, GeoCas9, CjCas9, Cas12a, Cas12b, Cas12g, Cas12h, Cas12i, Cas13b, Cas13c, Cas13d, Cas14, Csn2, xCas9, SpCas9-NG, LbCas12a, AsCas12a, Cas9-KKH, circularly permuted Cas9, Argonaute (Ago), SmacCas9, a Spy-macCas9 domain, or a polypeptide having the sequence of any one of SEQ ID NOs: 41, 60, 366, and 368.
261. The nucleic acid molecule of any of embodiments 258-260, wherein the RGN polypeptide is a nicking enzyme.
262. The nucleic acid molecule of embodiment 261, wherein the nicking enzyme has a sequence having at least 95% sequence identity to any one of SEQ ID NOs: 42, 52-59, 61, 397, and 398.
263. The nucleic acid molecule of embodiment 262, wherein the nicking enzyme has a sequence having 100% sequence identity to any one of SEQ ID NOs: 42, 52-59, 61, 397, and 398.
264. A vector comprising the nucleic acid molecule of any one of embodiments 254-263.
265. The vector of embodiment 264, further comprising at least one nucleotide sequence encoding a guide RNA (gRNA) capable of hybridizing to the target sequence.
266. A Ribonucleoprotein (RNP) complex comprising the fusion protein of any of embodiments 245-253 and a guide RNA that binds to a DNA-binding polypeptide of the fusion protein.
267. A cell comprising the fusion protein of any one of embodiments 245-253, the nucleic acid molecule of any one of embodiments 254-263, the vector of any one of embodiments 264-265, or the RNP complex of embodiment 266.
268. A system for modifying a target DNA molecule comprising a target DNA sequence, the system comprising:
a) a fusion protein comprising an RNA-guided nuclease (RGN) polypeptide and a deaminase, or a nucleotide sequence encoding said fusion protein, wherein the deaminase has an amino acid sequence having at least 90% sequence identity to SEQ ID NO: 407; and
b) one or more guide RNAs (gRNAs) capable of hybridizing to the target DNA sequence, or one or more nucleotide sequences encoding the one or more gRNAs;
wherein the one or more guide RNAs are capable of forming a complex with the fusion protein so as to direct the fusion protein to bind to the target DNA sequence and modify the target DNA molecule.
269. The system of embodiment 268, wherein the deaminase has an amino acid sequence having at least 95% sequence identity to SEQ ID NO: 407.
270. The system of embodiment 268, wherein the deaminase has an amino acid sequence having 100% sequence identity to SEQ ID NO: 407.
271. The system of any one of embodiments 268-270, wherein at least one of said nucleotide sequences encoding one or more guide RNAs and said nucleotide sequence encoding a fusion protein are operably linked to a promoter heterologous to said nucleotide sequences.
272. The system of any one of embodiments 268-271, wherein the target DNA sequence is located adjacent to a protospacer adjacent motif (PAM) recognized by the RGN polypeptide.
273. The system of any one of embodiments 268-272, wherein the DNA sequence of interest comprises a sequence selected from the group consisting of SEQ ID NOs: 62-97, 116-139, 152-185, 203-234, 251-286, 305-344, 562 and 563 or a complement thereof.
274. The system of any one of embodiments 268-273, wherein the gRNA sequence comprises a sequence selected from the group consisting of SEQ ID NOs: 98-115, 140-151, 186-202, 235-250, 287-304, 345-364 and 564.
275. The system of any one of embodiments 268-274, wherein the RGN polypeptide of the fusion protein is a type II CRISPR-Cas polypeptide or a type V CRISPR-Cas polypeptide.
276. The system of any of embodiments 272-275, wherein the RGN polypeptide is Cas9, CasX, CasY, Cpf1, C2c1, C2c2, C2c3, GeoCas9, CjCas9, Cas12a, Cas12b, Cas12g, Cas12h, Cas12i, Cas13b, Cas13c, Cas13d, Cas14, Csn2, xCas9, SpCas9-NG, LbCas12a, AsCas12a, Cas9-KKH, circularly permuted Cas9, Argonaute (Ago), SmacCas9, a Spy-macCas9 domain, or an RGN having the amino acid sequence set forth in any one of SEQ ID NOs: 41, 60, 366, or 368.
277. The system of embodiment 276, wherein the RGN polypeptide is a nicking enzyme.
278. The system of embodiment 277, wherein the nicking enzyme has an amino acid sequence having at least 95% sequence identity to any one of SEQ ID NOs: 42, 52-59, 61, 397, and 398.
279. A pharmaceutical composition comprising a pharmaceutically acceptable carrier and the fusion protein of any one of embodiments 245-253, the nucleic acid molecule of any one of embodiments 254-263, the vector of any one of embodiments 264-265, the RNP complex of embodiment 266, the cell of embodiment 267, or the system of any one of embodiments 268-278.
280. A method for modifying a target DNA molecule comprising a target sequence, comprising:
a) Assembling the RGN deaminase ribonucleotide complex by combining under conditions suitable for forming the RGN deaminase ribonucleotide complex:
i) One or more guide RNAs capable of hybridizing to a DNA sequence of interest; and
ii) a fusion protein comprising an RNA-guided nuclease (RGN) polypeptide and at least one deaminase, wherein the deaminase has an amino acid sequence having at least 90% sequence identity to SEQ ID NO: 407; and
b) contacting the target DNA molecule or a cell comprising the target DNA molecule with the assembled RGN deaminase ribonucleotide complex;
wherein the one or more guide RNAs hybridize to the target DNA sequence, thereby directing the fusion protein to bind to the target DNA sequence, whereby modification of the target DNA molecule occurs.
281. The method of embodiment 280, wherein the DNA sequence of interest comprises a sequence selected from the group consisting of SEQ ID NOs: 62-97, 116-139, 152-185, 203-234, 251-286, 305-344, 562 and 563 or a complement thereof.
282. The method of any one of embodiments 280-281, wherein the gRNA sequence comprises a sequence selected from the group consisting of SEQ ID NOs: 98-115, 140-151, 186-202, 235-250, 287-304, 345-364 and 564.
283. The method of any one of embodiments 280-282, wherein the method is performed in vitro, in vivo, or ex vivo.
284. A method of treating a subject having or at risk of developing a disease, disorder or condition, the method comprising:
administering to the subject the fusion protein of any one of embodiments 245-253, the nucleic acid molecule of any one of embodiments 254-263, the vector of any one of embodiments 264-265, the RNP complex of embodiment 266, the cell of embodiment 267, the system of any one of embodiments 268-278, or the pharmaceutical composition of embodiment 279.
285. The method of embodiment 284, further comprising administering a gRNA comprising a sequence selected from the group consisting of SEQ ID NOs: 98-115, 140-151, 186-202, 235-250, 287-304, 345-364 and 564.
286. An isolated polypeptide comprising a sequence identical to SEQ ID NO:405, wherein the polypeptide has deaminase activity.
287. An isolated polypeptide of embodiment 286, comprising an amino acid sequence that hybridizes to SEQ ID NO:405, wherein the polypeptide has deaminase activity.
288. The isolated polypeptide of embodiment 286, wherein the polypeptide comprises SEQ ID NO: 405.
289. A nucleic acid molecule comprising a polynucleotide encoding a deaminase polypeptide, wherein the deaminase is encoded by a nucleotide sequence that:
a) has at least 80% sequence identity to SEQ ID NO: 449, or
b) encodes an amino acid sequence having at least 90% sequence identity to SEQ ID NO: 405.
290. The nucleic acid molecule of embodiment 289, wherein the deaminase is encoded by a nucleotide sequence having at least 90% sequence identity to SEQ ID NO: 449.
291. The nucleic acid molecule of embodiment 289, wherein the deaminase is encoded by a nucleotide sequence having at least 95% sequence identity to SEQ ID NO: 449.
292. The nucleic acid molecule of embodiment 289, wherein the deaminase is encoded by a nucleotide sequence having 100% sequence identity to SEQ ID NO: 449.
293. The nucleic acid molecule of any one of embodiments 289-292, wherein the nucleic acid molecule further comprises a heterologous promoter operably linked to the polynucleotide.
294. A pharmaceutical composition comprising a pharmaceutically acceptable carrier and a polypeptide of any one of embodiments 286-288 or a nucleic acid molecule of any one of embodiments 289-293.
295. A fusion protein comprising a DNA-binding polypeptide and a deaminase having at least 90% sequence identity to SEQ ID NO: 405.
296. The fusion protein of embodiment 295, comprising a DNA-binding polypeptide and a deaminase having at least 95% sequence identity to SEQ ID NO: 405.
297. The fusion protein of embodiment 295, comprising a DNA-binding polypeptide and a deaminase having 100% sequence identity to SEQ ID NO: 405.
298. The fusion protein of any of embodiments 295-297, wherein the DNA-binding polypeptide is an RNA-guided nuclease (RGN) polypeptide.
299. The fusion protein of embodiment 298, wherein the RGN polypeptide is a type II CRISPR-Cas polypeptide or a type V CRISPR-Cas polypeptide.
300. The fusion protein of any of embodiments 298-299, wherein the RGN polypeptide is Cas9, CasX, CasY, Cpf1, C2c1, C2c2, C2c3, GeoCas9, CjCas9, Cas12a, Cas12b, Cas12g, Cas12h, Cas12i, Cas13b, Cas13c, Cas13d, Cas14, Csn2, xCas9, SpCas9-NG, LbCas12a, AsCas12a, Cas9-KKH, circularly permuted Cas9, Argonaute (Ago), SmacCas9, a Spy-macCas9 domain, or an RGN having the amino acid sequence set forth in any one of SEQ ID NOs: 41, 60, 366, or 368.
301. The fusion protein of any one of embodiments 298-300, wherein the RGN polypeptide is a nicking enzyme.
302. The fusion protein of embodiment 301, wherein the nicking enzyme has an amino acid sequence having at least 95% sequence identity to any one of SEQ ID NOs: 42, 52-59, 61, 397, and 398.
303. The fusion protein of embodiment 301, wherein the nicking enzyme has an amino acid sequence having 100% sequence identity to any one of SEQ ID NOs: 42, 52-59, 61, 397, and 398.
304. A nucleic acid molecule comprising a polynucleotide encoding a fusion protein comprising a DNA-binding polypeptide and a deaminase, wherein the deaminase is encoded by a nucleotide sequence that:
a) has at least 80% sequence identity to SEQ ID NO: 449, or
b) encodes an amino acid sequence having at least 90% sequence identity to SEQ ID NO: 405.
305. The nucleic acid molecule of embodiment 304, wherein the deaminase is encoded by a nucleotide sequence having at least 90% sequence identity to SEQ ID NO: 449.
306. The nucleic acid molecule of embodiment 304, wherein the deaminase is encoded by a nucleotide sequence having at least 95% sequence identity to SEQ ID NO: 449.
307. The nucleic acid molecule of embodiment 304, wherein the deaminase is encoded by a nucleotide sequence having 100% sequence identity to SEQ ID NO: 449.
308. The nucleic acid molecule of any one of embodiments 304-307, wherein the DNA-binding polypeptide is an RGN polypeptide.
309. The nucleic acid molecule of embodiment 308, wherein the RGN is a type II CRISPR-Cas polypeptide or a type V CRISPR-Cas polypeptide.
310. The nucleic acid molecule of any one of embodiments 308-309, wherein the RGN polypeptide is Cas9, CasX, CasY, Cpf1, C2c1, C2c2, C2c3, GeoCas9, CjCas9, Cas12a, Cas12b, Cas12g, Cas12h, Cas12i, Cas13b, Cas13c, Cas13d, Cas14, Csn2, xCas9, SpCas9-NG, LbCas12a, AsCas12a, Cas9-KKH, circularly permuted Cas9, Argonaute (Ago), SmacCas9, a Spy-macCas9 domain, or an RGN having the amino acid sequence set forth in any one of SEQ ID NOs: 41, 60, 366, or 368.
311. The nucleic acid molecule of any of embodiments 308-310, wherein the RGN polypeptide is a nicking enzyme.
312. The nucleic acid molecule of embodiment 311, wherein the nicking enzyme has an amino acid sequence having at least 95% sequence identity to any one of SEQ ID NOs: 42, 52-59, 61, 397, and 398.
313. The nucleic acid molecule of embodiment 312, wherein the nicking enzyme has an amino acid sequence having 100% sequence identity to any one of SEQ ID NOs: 42, 52-59, 61, 397, and 398.
314. A vector comprising the nucleic acid molecule of any one of embodiments 304-313.
315. The vector of embodiment 314, further comprising at least one nucleotide sequence encoding a guide RNA (gRNA) capable of hybridizing to the target sequence.
316. A Ribonucleoprotein (RNP) complex comprising the fusion protein of any of embodiments 295-303 and a guide RNA that binds to a DNA-binding polypeptide of the fusion protein.
317. A cell comprising the fusion protein of any of embodiments 295-303, the nucleic acid molecule of any of embodiments 304-313, the vector of any of embodiments 314-315, or the RNP complex of embodiment 316.
318. A system for modifying a target DNA molecule comprising a target DNA sequence, the system comprising:
a) a fusion protein comprising an RNA-guided nuclease (RGN) polypeptide and a deaminase, or a nucleotide sequence encoding said fusion protein, wherein the deaminase has an amino acid sequence having at least 90% sequence identity to SEQ ID NO: 405; and
b) one or more guide RNAs (gRNAs) capable of hybridizing to the target DNA sequence, or one or more nucleotide sequences encoding the one or more gRNAs;
wherein the one or more guide RNAs are capable of forming a complex with the fusion protein so as to direct the fusion protein to bind to the target DNA sequence and modify the target DNA molecule.
319. The system of embodiment 318, wherein the deaminase has an amino acid sequence having at least 95% sequence identity to SEQ ID NO: 405.
320. The system of embodiment 318, wherein the deaminase has an amino acid sequence having 100% sequence identity to SEQ ID NO: 405.
321. The system of any one of embodiments 318-320, wherein at least one of said nucleotide sequences encoding one or more guide RNAs and said nucleotide sequence encoding a fusion protein are operably linked to a promoter heterologous to said nucleotide sequences.
322. The system of any one of embodiments 318-321, wherein the target DNA sequence is located adjacent to a protospacer adjacent motif (PAM) recognized by the RGN polypeptide.
323. The system of any of embodiments 318-322, wherein the DNA sequence of interest comprises a sequence selected from the group consisting of SEQ ID NOs: 62-97, 116-139, 152-185, 203-234, 251-286, 305-344, 562 and 563 or a complement thereof.
324. The system of any one of embodiments 318-323, wherein the gRNA sequence comprises a sequence selected from the group consisting of SEQ ID NOs: 98-115, 140-151, 186-202, 235-250, 287-304, 345-364 and 564.
325. The system of any one of embodiments 318-324, wherein the RGN polypeptide of the fusion protein is a type II CRISPR-Cas polypeptide or a type V CRISPR-Cas polypeptide.
326. The system of any of embodiments 322-325, wherein the RGN polypeptide is Cas9, CasX, CasY, Cpf1, C2c1, C2c2, C2c3, GeoCas9, CjCas9, Cas12a, Cas12b, Cas12g, Cas12h, Cas12i, Cas13b, Cas13c, Cas13d, Cas14, Csn2, xCas9, SpCas9-NG, LbCas12a, AsCas12a, Cas9-KKH, circularly permuted Cas9, Argonaute (Ago), SmacCas9, a Spy-macCas9 domain, or an RGN having the amino acid sequence set forth in any one of SEQ ID NOs: 41, 60, 366, or 368.
327. The system of embodiment 326, wherein the RGN polypeptide is a nicking enzyme.
328. The system of embodiment 327, wherein the nicking enzyme has an amino acid sequence having at least 95% sequence identity to any one of SEQ ID NOs: 42, 52-59, 61, 397, and 398.
329. A pharmaceutical composition comprising a pharmaceutically acceptable carrier and the fusion protein of any one of embodiments 295-303, the nucleic acid molecule of any one of embodiments 304-313, the vector of any one of embodiments 314-315, the RNP complex of embodiment 316, the cell of embodiment 317, or the system of any one of embodiments 318-328.
330. A method for modifying a target DNA molecule comprising a target sequence, comprising:
a) Assembling the RGN deaminase ribonucleotide complex by combining under conditions suitable for forming the RGN deaminase ribonucleotide complex:
i) One or more guide RNAs capable of hybridizing to a DNA sequence of interest; and
ii) a fusion protein comprising an RNA-guided nuclease (RGN) polypeptide and at least one deaminase, wherein the deaminase has an amino acid sequence having at least 90% sequence identity to SEQ ID NO: 405; and
b) contacting the target DNA molecule or a cell comprising the target DNA molecule with the assembled RGN deaminase ribonucleotide complex;
wherein the one or more guide RNAs hybridize to the target DNA sequence, thereby directing the fusion protein to bind to the target DNA sequence, whereby modification of the target DNA molecule occurs.
331. The method of embodiment 330, wherein the target DNA sequence comprises a sequence selected from the group consisting of SEQ ID NOs: 62-97, 116-139, 152-185, 203-234, 251-286, 305-344, 562 and 563 or a complement thereof.
332. The method of any one of embodiments 330-331, wherein the gRNA sequence comprises a sequence selected from the group consisting of SEQ ID NOs: 98-115, 140-151, 186-202, 235-250, 287-304, 345-364 and 564.
333. The method of any one of embodiments 330-332, wherein the method is performed in vitro, in vivo, or ex vivo.
334. A method of treating a subject having or at risk of developing a disease, disorder or condition, the method comprising:
administering to a subject the fusion protein of any of embodiments 295-303, the nucleic acid molecule of any of embodiments 304-313, the vector of any of embodiments 314-315, the RNP complex of embodiment 316, the cell of embodiment 317, the system of any of embodiments 318-328, or the pharmaceutical composition of embodiment 329.
335. The method of embodiment 334, further comprising administering a gRNA comprising a sequence selected from the group consisting of SEQ ID NOs: 98-115, 140-151, 186-202, 235-250, 287-304, 345-364 and 564.
336. An isolated polypeptide comprising a sequence identical to SEQ ID NO:399, wherein said polypeptide has deaminase activity.
337. An isolated polypeptide of embodiment 336, comprising an amino acid sequence that hybridizes to SEQ ID NO:399, wherein said polypeptide has deaminase activity.
338. The isolated polypeptide of embodiment 336, wherein the polypeptide comprises SEQ ID NO: 399.
339. A nucleic acid molecule comprising a polynucleotide encoding a deaminase polypeptide, wherein the deaminase is encoded by a nucleotide sequence that:
a) has at least 80% sequence identity to SEQ ID NO: 443, or
b) encodes an amino acid sequence having at least 90% sequence identity to SEQ ID NO: 399.
340. The nucleic acid molecule of embodiment 339, wherein the deaminase is encoded by a nucleotide sequence having at least 90% sequence identity to SEQ ID NO: 443.
341. The nucleic acid molecule of embodiment 339, wherein the deaminase is encoded by a nucleotide sequence having at least 95% sequence identity to SEQ ID NO: 443.
342. The nucleic acid molecule of embodiment 339, wherein the deaminase is encoded by a nucleotide sequence having 100% sequence identity to SEQ ID NO: 443.
343. The nucleic acid molecule of any one of embodiments 339-342, wherein the nucleic acid molecule further comprises a heterologous promoter operably linked to the polynucleotide.
344. A pharmaceutical composition comprising a pharmaceutically acceptable carrier and a polypeptide of any one of embodiments 336-338 or a nucleic acid molecule of any one of embodiments 339-342.
345. A fusion protein comprising a DNA-binding polypeptide and a deaminase having at least 90% sequence identity to SEQ ID NO: 399.
346. The fusion protein of embodiment 345, comprising a DNA-binding polypeptide and a deaminase having at least 95% sequence identity to SEQ ID NO: 399.
347. The fusion protein of embodiment 345, comprising a DNA-binding polypeptide and a deaminase having 100% sequence identity to SEQ ID NO: 399.
348. The fusion protein of any one of embodiments 345-347, wherein the DNA-binding polypeptide is an RNA-guided nuclease (RGN) polypeptide.
349. The fusion protein of embodiment 348, wherein the RGN polypeptide is a type II CRISPR-Cas polypeptide or a type V CRISPR-Cas polypeptide.
350. The fusion protein of any of embodiments 348-349, wherein the RGN polypeptide is Cas9, CasX, CasY, Cpf1, C2c1, C2c2, C2c3, GeoCas9, CjCas9, Cas12a, Cas12b, Cas12g, Cas12h, Cas12i, Cas13b, Cas13c, Cas13d, Cas14, Csn2, xCas9, SpCas9-NG, LbCas12a, AsCas12a, Cas9-KKH, circularly permuted Cas9, Argonaute (Ago), SmacCas9, a Spy-macCas9 domain, or an RGN having the amino acid sequence set forth in any one of SEQ ID NOs: 41, 60, 366, or 368.
351. The fusion protein of any one of embodiments 348-350, wherein the RGN polypeptide is a nicking enzyme.
352. The fusion protein of embodiment 351, wherein the nicking enzyme has an amino acid sequence having at least 95% sequence identity to any one of SEQ ID NOs: 42, 52-59, 61, 397, and 398.
353. The fusion protein of embodiment 351, wherein the nicking enzyme has an amino acid sequence having 100% sequence identity to any one of SEQ ID NOs: 42, 52-59, 61, 397, and 398.
354. A nucleic acid molecule comprising a polynucleotide encoding a fusion protein comprising a DNA-binding polypeptide and a deaminase, wherein the deaminase is encoded by a nucleotide sequence that:
a) has at least 80% sequence identity to SEQ ID NO: 443, or
b) encodes an amino acid sequence having at least 90% sequence identity to SEQ ID NO: 399.
355. The nucleic acid molecule of embodiment 354, wherein the deaminase is encoded by a nucleotide sequence having at least 90% sequence identity to SEQ ID NO: 443.
356. The nucleic acid molecule of embodiment 354, wherein the deaminase is encoded by a nucleotide sequence having at least 95% sequence identity to SEQ ID NO: 443.
357. The nucleic acid molecule of embodiment 354, wherein the deaminase is encoded by a nucleotide sequence having 100% sequence identity to SEQ ID NO: 443.
358. The nucleic acid molecule of any one of embodiments 354-357, wherein the DNA-binding polypeptide is an RGN polypeptide.
359. The nucleic acid molecule of embodiment 358, wherein the RGN is a type II CRISPR-Cas polypeptide or a type V CRISPR-Cas polypeptide.
360. The nucleic acid molecule of any of embodiments 358-359, wherein the RGN polypeptide is Cas9, CasX, CasY, Cpf1, C2c1, C2c2, C2c3, GeoCas9, CjCas9, Cas12a, Cas12b, Cas12g, Cas12h, Cas12i, Cas13b, Cas13c, Cas13d, Cas14, Csn2, xCas9, SpCas9-NG, LbCas12a, AsCas12a, Cas9-KKH, circularly permuted Cas9, Argonaute (Ago), SmacCas9, a Spy-macCas9 domain, or an RGN having the amino acid sequence set forth in any one of SEQ ID NOs: 41, 60, 366, or 368.
361. The nucleic acid molecule of any of embodiments 358-360, wherein the RGN polypeptide is a nicking enzyme.
362. The nucleic acid molecule of embodiment 361, wherein the nicking enzyme has an amino acid sequence having at least 95% sequence identity to any one of SEQ ID NOs: 42, 52-59, 61, 397, and 398.
363. The nucleic acid molecule of embodiment 362, wherein the nicking enzyme has an amino acid sequence having 100% sequence identity to any one of SEQ ID NOs: 42, 52-59, 61, 397, and 398.
364. A vector comprising the nucleic acid molecule of any one of embodiments 354-363.
365. The vector of embodiment 364, further comprising at least one nucleotide sequence encoding a guide RNA (gRNA) capable of hybridizing to the target sequence.
366. A Ribonucleoprotein (RNP) complex comprising the fusion protein of any of embodiments 345-353 and a guide RNA that binds to a DNA-binding polypeptide of the fusion protein.
367. A cell comprising the fusion protein of any one of embodiments 345-353, the nucleic acid molecule of any one of embodiments 354-363, the vector of any one of embodiments 364-365, or the RNP complex of embodiment 366.
368. A system for modifying a target DNA molecule comprising a target DNA sequence, the system comprising:
a) a fusion protein comprising an RNA-guided nuclease (RGN) polypeptide and a deaminase, or a nucleotide sequence encoding said fusion protein, wherein the deaminase has an amino acid sequence having at least 90% sequence identity to SEQ ID NO: 399; and
b) one or more guide RNAs (gRNAs) capable of hybridizing to the target DNA sequence, or one or more nucleotide sequences encoding the one or more gRNAs;
wherein the one or more guide RNAs are capable of forming a complex with the fusion protein so as to direct the fusion protein to bind to the target DNA sequence and modify the target DNA molecule.
369. The system of embodiment 368, wherein the deaminase has an amino acid sequence having at least 95% sequence identity to SEQ ID NO: 399.
370. The system of embodiment 368, wherein the deaminase has an amino acid sequence having 100% sequence identity to SEQ ID NO: 399.
371. The system of any one of embodiments 368-370, wherein at least one of the nucleotide sequences encoding one or more guide RNAs and the nucleotide sequence encoding the fusion protein are operably linked to a promoter heterologous to the nucleotide sequences.
372. The system of any one of embodiments 368-371, wherein the target DNA sequence is located adjacent to a protospacer adjacent motif (PAM) recognized by the RGN polypeptide.
373. The system of any one of embodiments 368-372, wherein the DNA sequence of interest comprises a sequence selected from the group consisting of SEQ ID NOs: 62-97, 116-139, 152-185, 203-234, 251-286, 305-344, 562 and 563 or a complement thereof.
374. The system of any one of embodiments 368-373, wherein the gRNA sequence comprises a sequence selected from the group consisting of SEQ ID NOs: 98-115, 140-151, 186-202, 235-250, 287-304, 345-364 and 564.
375. The system of any one of embodiments 368-374, wherein the RGN polypeptide of the fusion protein is a type II CRISPR-Cas polypeptide or a type V CRISPR-Cas polypeptide.
376. The system of any one of embodiments 372-375, wherein the RGN polypeptide is Cas9, CasX, CasY, Cpf1, C2c1, C2c2, C2c3, GeoCas9, CjCas9, Cas12a, Cas12b, Cas12g, Cas12h, Cas12i, Cas13b, Cas13c, Cas13d, Cas14, Csn2, xCas9, SpCas9-NG, LbCas12a, AsCas12a, Cas9-KKH, circularly permuted Cas9, Argonaute (Ago), SmacCas9, a Spy-macCas9 domain, or an RGN having the amino acid sequence set forth in any one of SEQ ID NOs: 41, 60, 366, or 368.
377. The system of embodiment 376, wherein the RGN polypeptide is a nicking enzyme.
378. The system of embodiment 377, wherein the nicking enzyme has an amino acid sequence having at least 95% sequence identity to any one of SEQ ID NOs: 42, 52-59, 61, 397, and 398.
379. A pharmaceutical composition comprising a pharmaceutically acceptable carrier and the fusion protein of any one of embodiments 345-353, the nucleic acid molecule of any one of embodiments 354-363, the vector of any one of embodiments 364-365, the RNP complex of embodiment 366, the cell of embodiment 367, or the system of any one of embodiments 368-378.
380. A method for modifying a target DNA molecule comprising a target sequence, comprising:
a) Assembling the RGN deaminase ribonucleotide complex by combining under conditions suitable for forming the RGN deaminase ribonucleotide complex:
i) One or more guide RNAs capable of hybridizing to a DNA sequence of interest; and
ii) a fusion protein comprising an RNA-guided nuclease (RGN) polypeptide and at least one deaminase, wherein the deaminase has an amino acid sequence having at least 90% sequence identity to SEQ ID NO: 399; and
b) contacting the target DNA molecule or a cell comprising the target DNA molecule with the assembled RGN deaminase ribonucleotide complex;
wherein the one or more guide RNAs hybridize to the target DNA sequence, thereby directing the fusion protein to bind to the target DNA sequence, whereby modification of the target DNA molecule occurs.
381. The method of embodiment 380, wherein the DNA sequence of interest comprises a sequence selected from the group consisting of SEQ ID NOs: 62-97, 116-139, 152-185, 203-234, 251-286, 305-344, 562 and 563 or a complement thereof.
382. The method of any one of embodiments 380-381, wherein the gRNA sequence comprises a sequence selected from the group consisting of SEQ ID NOs: 98-115, 140-151, 186-202, 235-250, 287-304, 345-364 and 564.
383. The method of any one of embodiments 380-382, wherein the method is performed in vitro, in vivo, or ex vivo.
384. A method of treating a subject having or at risk of developing a disease, disorder or condition, the method comprising:
administering to the subject the fusion protein of any of embodiments 345-353, the nucleic acid molecule of any of embodiments 354-363, the vector of any of embodiments 364-365, the RNP complex of embodiment 366, the cell of embodiment 367, the system of any of embodiments 368-378, or the pharmaceutical composition of embodiment 379.
385. The method of embodiment 384, further comprising administering a gRNA comprising a sequence selected from the group consisting of SEQ ID NOs: 98-115, 140-151, 186-202, 235-250, 287-304, 345-364 and 564.
386. A method for treating or reducing at least one symptom of cystic fibrosis, the method comprising administering to a subject in need thereof an effective amount of:
a) a fusion protein comprising an RNA-guided nuclease (RGN) polypeptide and a deaminase, or a polynucleotide encoding said fusion protein, wherein the deaminase has a sequence identical to any one of SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441, and wherein said polynucleotide encoding the fusion protein is operably linked to a promoter such that the fusion protein is expressed in a cell; and
b) one or more guide RNAs (gRNAs) capable of hybridizing to a target DNA sequence, or polynucleotides encoding the gRNAs, wherein the polynucleotides encoding the gRNAs are operably linked to a promoter such that the gRNAs are expressed in a cell;
whereby the fusion protein and the gRNA target the genomic location of the causal mutation and modify the genomic sequence to remove the causal mutation.
387. The method of embodiment 386, wherein the gRNA comprises a spacer sequence that targets any one of SEQ ID NOs: 62-97, 116-139, 152-185, 203-234, 251-286, 305-344, 562, and 563, or a complement thereof.
388. The method of embodiment 386 or 387, wherein the gRNA comprises SEQ ID NO: any of 98-115, 140-151, 186-202, 235-250, 287-304, 345-364, and 564.
389. The method of any one of embodiments 386-388, wherein the RGN has a sequence that matches any one of SEQ ID NOs: 41, 60, 366, and 368.
390. The method of any one of embodiments 386-389, wherein the RGN has an amino acid sequence having at least 90% sequence identity to any one of SEQ ID NOs: 42, 52-59, 61, 397, and 398.
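The percent-identity thresholds recited throughout the embodiments above (at least 80%, 90%, 95%, or 100%) can be illustrated with a minimal sketch. It assumes that the two sequences have already been aligned to equal length with "-" marking gaps, and that identity is counted over all alignment columns. Both the alignment method and the denominator convention are assumptions of this sketch rather than definitions taken from this specification, and the function names and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same residue.

    Assumption: inputs are pre-aligned, equal-length strings with '-' for gaps.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("inputs must be non-empty and of equal aligned length")
    # A column counts as a match only if the residues agree and it is not a gap.
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the pair reaches 'at least <threshold>% sequence identity'."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical toy peptides differing at one of ten positions (not from any SEQ ID NO).
reference = "MKTAYIAKQR"
variant = "MKTAYLAKQR"
identity = percent_identity(reference, variant)
```

With these toy inputs the variant satisfies a 90% threshold but not a 95% threshold, which mirrors how the narrower dependent embodiments exclude sequences that the broader ones admit.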
The following examples are provided by way of illustration and not by way of limitation.
Examples
Example 1: Demonstration of base editing in mammalian cells
The deaminases shown in Table 1 below were produced from naturally occurring deaminases, which were then mutated and selected for adenine deaminase activity in prokaryotic cells.
Table 1: Deaminase sequences
To determine whether the deaminases of Table 1 can perform adenine base editing in mammalian cells, each deaminase was operably fused to an RGN nickase to produce a fusion protein. Residues predicted to inactivate the RuvC domain of the RGN APG07433.1 (set forth as SEQ ID NO: 41; described in PCT publication WO 2019/236566, incorporated herein by reference) were identified, and the RGN was modified into a nickase variant (nAPG07433.1; SEQ ID NO: 42). A nickase variant of an RGN is referred to herein as "nRGN". It is to be understood that nickase variants of RGNs may be used to produce the fusion proteins of the present invention.
Deaminase and nRGN nucleotide sequences, codon-optimized for mammalian expression, were synthesized as sequences encoding fusion proteins with an N-terminal nuclear localization tag and cloned into the pTwist CMV (Twist Biosciences) expression plasmid. Each fusion protein comprises, starting at the amino terminus, the SV40 NLS (SEQ ID NO: 43) operably linked at its C-terminal end to a 3X FLAG tag (SEQ ID NO: 44), which is operably linked at its C-terminal end to a deaminase, which is operably linked at its C-terminal end to a peptide linker (SEQ ID NO: 45), which is operably linked at its C-terminal end to an nRGN (e.g., nAPG07433.1, which is SEQ ID NO: 42), which is finally operably linked at its C-terminal end to the nucleoplasmin NLS (SEQ ID NO: 45). All fusion proteins comprise at least one NLS and a 3X FLAG tag, as described above.
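The N-terminus to C-terminus domain order described above can be written out as a short sketch. The domain labels, the helper name `assemble_fusion`, and the bracketed placeholder strings are hypothetical; they stand in for the real sequences, which are not reproduced here.

```python
from typing import Dict, List

# Domain order from N-terminus to C-terminus, as described in the paragraph above.
DOMAIN_ORDER: List[str] = [
    "SV40_NLS",
    "3xFLAG",
    "deaminase",
    "linker",
    "nRGN",
    "nucleoplasmin_NLS",
]


def assemble_fusion(parts: Dict[str, str]) -> str:
    """Concatenate domain sequences in N-to-C order; raise if any domain is missing."""
    missing = [name for name in DOMAIN_ORDER if name not in parts]
    if missing:
        raise KeyError(f"missing domains: {missing}")
    return "".join(parts[name] for name in DOMAIN_ORDER)


# Hypothetical placeholders, NOT the sequences of the SEQ ID NOs cited in the text.
placeholder_parts = {
    "nRGN": "[nRGN]",
    "deaminase": "[DEAM]",
    "SV40_NLS": "[SV40]",
    "3xFLAG": "[FLAG]",
    "linker": "[LINK]",
    "nucleoplasmin_NLS": "[NPM]",
}
fusion = assemble_fusion(placeholder_parts)
```

Keeping the order in one list means that swapping in a different deaminase from Table 1 changes only the `deaminase` entry, while the tags, linker, and nRGN stay fixed.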
Expression plasmids comprising an expression cassette encoding an sgRNA expressed from the human U6 promoter (SEQ ID NO: 50) were also produced. The sequences of the human genomic targets and the sgRNA sequences used to direct the fusion proteins to those targets are indicated in Table 2.
Table 2: Guide RNA sequences
In 24-well culture plates, 500 ng of a plasmid comprising an expression cassette comprising the coding sequence for each deaminase fusion protein described in Table 1 and 500 ng of a plasmid comprising an expression cassette encoding an sgRNA shown in Table 2 were co-transfected into HEK293FT cells at 75-90% confluency using Lipofectamine 2000 reagent (Life Technologies). The cells were then incubated at 37 °C for 72 hours. After incubation, genomic DNA was extracted using NucleoSpin 96 Tissue (Macherey-Nagel) according to the manufacturer's protocol. Using the primers in Table 2, the genomic regions flanking the targeted genomic sites were amplified by PCR and the products were purified using a ZR-96 DNA Clean and Concentrator (Zymo Research) according to the manufacturer's protocol. The purified PCR products were subjected to next-generation sequencing on an Illumina MiSeq. Typically, 100,000 paired-end reads of 250 bp (2×100,000 reads) are generated per amplicon. The reads were analyzed using CRISPResso (Pinello et al., 2016, Nature Biotech 34:695-697) to calculate the editing rate. Output alignments were analyzed for INDEL formation or the introduction of specific adenine mutations. Tables 3 to 7 show adenine base editing for each fusion protein comprising nAPG07433.1 and a deaminase of Table 1, with each guide RNA of Table 2. The deaminase component of each fusion protein is indicated. The editing rate of adenines within or near the target sequence is indicated. For example, "A5" indicates the adenine at position 5 of the target sequence. The position of each nucleotide in the target sequence is determined by counting the nucleotide in the target sequence closest to the PAM as position 1, with position numbers increasing in the 3' direction away from the PAM sequence. The tables also show the rate at which adenine is changed to each other nucleotide. For example, Table 3 shows that for the APG09982-nAPG07433.1 fusion protein, the adenine at position 13 was mutated to guanine at a rate of 1.2%.
Table 3: A>N editing rate using guide SGN000139
All fusion proteins showed detectable A>G transitions at positions A12 and A13. APG09982 and APG0633 showed at least 1% editing at position A13.
Table 4: A>N editing rate using guide SGN000143
All fusion proteins showed A>G transitions at positions A11 and A14. APG09982 showed a 4.5% A>G transition at A11 and a 1.7% A>G transition at A14.
Table 5: A>N editing rate using guide SGN000186
All fusion proteins showed more than 1% base editing at multiple positions in target SGN000186. APG09102 showed a 6.2% A>G transition at position A16; it also showed more than 2% base editing at positions A9 and A18. For all fusion proteins tested, position A16 was the most highly edited.
Table 6: A>N editing rate using guide SGN000194
With SGN000194, all fusion proteins showed 0.9% to 1.8% A>G editing at position A15. No detectable editing was seen at positions A21, A23, A26, and A27.
Table 7: A>N editing rate using guide SGN000930
For all fusion proteins tested, A14 was the most highly edited position in SGN000930. For the A>G transition, the editing rate ranged from 0.3% to 1.2%.
Example 2: targeted adenine base editing fluorometry
Vectors were constructed with Enhanced Green Fluorescent Protein (EGFP) containing a W58X mutation causing a premature STOP codon (GFP-STOP, SEQ ID NO: 47) such that the W58 codon could be reverted from the STOP codon (TGA) to a wild-type Tryptophan (TGG) residue using adenine deaminase to change the third position A to G. Successful transition of a to G results in expression of EGFP that can be quantified. A second vector capable of expressing guide RNA targeting the deaminase RGN fusion protein to the region surrounding the W58X mutation (SEQ ID NO: 48) was also generated.
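The codon logic of the GFP-STOP reporter can be illustrated with a minimal Python sketch. The helper name `a_to_g_at` and the reduced codon table are illustrative assumptions and are not part of the described vectors; only the TGA stop codon and the TGG tryptophan codon come from the description above.

```python
# Reduced codon table: only the codons needed for this illustration.
CODON_TO_AA = {"TGG": "Trp", "TGA": "Stop", "TAG": "Stop", "TAA": "Stop"}


def a_to_g_at(codon: str, position: int) -> str:
    """Return the codon after converting the A at a 1-based position to G."""
    index = position - 1
    if codon[index] != "A":
        raise ValueError(f"position {position} of {codon} is not an adenine")
    return codon[:index] + "G" + codon[index + 1:]


stop_codon = "TGA"                  # W58X codon in the GFP-STOP reporter
edited = a_to_g_at(stop_codon, 3)   # deamination of the third-position A
print(stop_codon, CODON_TO_AA[stop_codon], "->", edited, CODON_TO_AA[edited])
```

A single A-to-G conversion at the third codon position is therefore sufficient to restore the tryptophan codon, which is why fluorescence serves as a readout of adenine base editing in this assay.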
This GFP-STOP reporter vector was transfected into HEK293T cells by lipofection or electroporation, along with vectors capable of expressing the deaminase-nRGN fusion protein and the corresponding guide RNA. For lipofection, cells were seeded at 1×10⁵ cells/well in 24-well plates in growth medium (DMEM + 10% fetal calf serum + 1% penicillin/streptomycin) the day prior to transfection. 500 ng each of the GFP-STOP reporter vector, the deaminase-RGN expression vector, and the guide RNA expression vector were transfected using the 3000 reagent (Thermo Fisher Scientific) according to the manufacturer's instructions. For electroporation, cells were electroporated using the transfection system (Thermo Fisher Scientific).
In addition to transient transfection of the fluorescent GFP-STOP reporter, stable cell lines with a chromosomally integrated GFP-STOP expression cassette were generated. Once a stable cell line was established, cells were seeded at 1×10⁵ cells/well in 24-well plates in growth medium (DMEM + 10% fetal calf serum + 1% penicillin/streptomycin) the day prior to transfection. 500 ng of the deaminase-RGN expression vector and the guide RNA expression vector were transfected using the 3000 reagent (Thermo Fisher Scientific) according to the manufacturer's instructions. For electroporation, cells were electroporated using the transfection system (Thermo Fisher Scientific).
24-48 hours after lipofection or electroporation, GFP expression was determined by microscopic examination of the cells for the presence of GFP+ cells. After visual inspection, the ratio of GFP+ cells to GFP- cells can be determined. Fluorescence was observed in mammalian cells expressing each deaminase-nRGN fusion protein, indicating that the fusion protein was successfully targeted to the GFP-STOP mutation and edited the mutation to restore GFP fluorescence.
Following microscopic analysis, cells were lysed in RIPA buffer and the resulting lysates were analyzed on a fluorescence plate reader to determine the fluorescence intensity of GFP (Table 8). Those skilled in the art will appreciate that cells can also be analyzed by flow cytometry or fluorescence-activated cell sorting to determine the ratio of GFP+ cells to GFP- cells.
Table 8: GFP-STOP assay results
N.D. = not detected; + = few GFP+ cells were detected; ++ = several GFP+ cells were detected; +++ = many GFP+ cells were detected
Example 3: demonstration of A base editing in mammalian cells
The deaminases shown in Table 9 below were generated based on naturally occurring deaminases, which were then mutated and selected for adenine deaminase activity in prokaryotic cells.
Table 9: deaminase sequences
To determine whether the deaminases of Table 9 are capable of performing adenine base editing in mammalian cells, each deaminase was operably fused to an RGN nickase to produce a fusion protein. Residues predicted to deactivate the RuvC domain of RGN APG07433.1 (shown in SEQ ID NO: 41; described in PCT publication WO 2019/236566, incorporated herein by reference) were identified and the RGN was modified to a nickase variant (nAPG07433.1; SEQ ID NO: 42). The nickase variant of an RGN is referred to herein as "nRGN". It will be appreciated that a nickase variant of any RGN may be used to generate fusion proteins of the present invention.
Deaminase and nRGN nucleotide sequences, codon optimized for mammalian expression, were synthesized as fusion proteins with an N-terminal nuclear localization tag and cloned into the pTwist CMV (Twist Biosciences) expression plasmid. Each fusion protein comprises, beginning at the amino terminus, an SV40 NLS (SEQ ID NO: 43) operably linked at its C terminus to a 3X FLAG tag (SEQ ID NO: 44), which is operably linked at its C terminus to a deaminase, which is operably linked at its C terminus to a peptide linker (SEQ ID NO: 442), which is operably linked at its C terminus to an nRGN (e.g., nAPG07433.1, which is SEQ ID NO: 42), and finally, at the C terminus, a nucleoplasmin NLS (SEQ ID NO: 46). Nucleotide sequences of the peptide linker, codon optimized for mammalian expression, are set forth in SEQ ID NOs: 486 and 487. Table 10 shows the fusion proteins that were generated and tested for activity. All fusion proteins included at least one NLS and a 3X FLAG tag, as described above.
Table 10: fusion protein sequence with N-terminal SV40 NLS, 3X FLAG tag and C-terminal nucleoplasmin NLS
Expression plasmids comprising expression cassettes encoding sgRNAs were also generated. The sequences of the human genomic targets and the sgRNA sequences used to direct the fusion proteins to the genomic targets are indicated in Table 11.
Table 11: guide RNA sequences
In 24-well plates, 500 ng of a plasmid comprising an expression cassette comprising the coding sequence for a fusion protein shown in Table 10 and 500 ng of a plasmid comprising an expression cassette encoding an sgRNA shown in Table 11 were co-transfected into HEK293FT cells at 75-90% confluency using Lipofectamine 2000 reagent (Life Technologies). The cells were then incubated at 37 °C for 72 hours. After incubation, genomic DNA was extracted using NucleoSpin 96 Tissue (Macherey-Nagel) according to the manufacturer's protocol. Using the primers in Table 11, the genomic regions flanking the targeted genomic sites were amplified by PCR and the products were purified using a ZR-96 DNA Clean and Concentrator (Zymo Research) according to the manufacturer's protocol. The purified PCR products were subjected to next-generation sequencing on an Illumina MiSeq. Typically, 100,000 paired-end reads of 250 bp (2×100,000 reads) are generated per amplicon. The reads were analyzed using CRISPResso (Pinello et al., 2016, Nature Biotech 34:695-697) to calculate the editing rate. Output alignments were analyzed for INDEL formation or the introduction of specific adenine mutations.
For each adenine deaminase in Table 10 and guide RNA in Table 11, Table 12 shows the overall adenine base editing. Tables 13-27 show specific nucleotide mutation profiles for selected exemplary samples. The editing rate of adenines within or near the target sequence is indicated. For example, "A5" indicates the adenine at position 5 of the target sequence. The position of each nucleotide in the target sequence is determined by counting the nucleotide in the target sequence closest to the PAM (which is 3' of the target for APG07433.1) as position 1, with position numbers increasing in the 5' direction away from the PAM sequence. The tables also show the rate at which adenine is changed to each other nucleotide. For example, Table 13 shows that for the LPG50148-nAPG07433.1 fusion protein, the adenine at position 13 was mutated to guanine at a rate of 9.7%.
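The position-numbering convention just described can be sketched in Python as follows. This is an illustration only: the function name `adenine_positions` and the 10-nucleotide example target are hypothetical and are not taken from Table 11, and the sketch assumes the target is written 5' to 3' with the PAM immediately 3' of it.

```python
def adenine_positions(target: str) -> list[str]:
    """Label each adenine in a 5'->3' target whose PAM lies 3' of it.

    Position 1 is the nucleotide adjacent to the PAM (the last character
    of the string); position numbers increase toward the 5' end.
    """
    target = target.upper()
    length = len(target)
    positions = sorted(
        length - index for index, base in enumerate(target) if base == "A"
    )
    return [f"A{position}" for position in positions]


# Hypothetical 10-nt target written 5'->3'; not a sequence from Table 11.
example_target = "GATTACAGCA"
print(adenine_positions(example_target))
```

Under this convention a label such as "A13" refers to the adenine thirteen nucleotides 5' of the PAM-proximal end of the target.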
Table 12: estimated value of base editing rate of each adenine deaminase
Table 13: A>N editing rate using deaminase LPG50148 and guide SGN000139
LPG50140, LPG50146, and LPG50148 show detectable A>G transitions at positions A12 and A13. LPG50148 shows more than 9% editing at position A13.
Table 14: A>N editing rate using deaminase LPG50148 and guide SGN000143
LPG50140, LPG50146, and LPG50148 show detectable A>G transitions at positions A9, A11, and A14. LPG50148 shows more than 11% editing at position A11.
Table 15: A>N editing rate using deaminase LPG50148 and guide SGN000186
LPG50140, LPG50146, and LPG50148 show detectable A>G transitions at positions A9, A16, and A18. LPG50148 shows more than 23% editing at positions A9 and A16.
Table 16: A>N editing rate using deaminase LPG50148 and guide SGN000194
LPG50140, LPG50146, and LPG50148 show detectable A>G transitions at positions A13 and A15. LPG50148 shows more than 12% editing at positions A13 and A15.
Table 17: A>N editing rate using deaminase LPG50148 and guide SGN000930
LPG50140, LPG50146, and LPG50148 show detectable A>G transitions at positions A10, A14, A15, A16, A20, and A21. LPG50148 shows more than 2% editing at positions A10, A14, A16, A20, and A21.
Table 18: A>N editing rate using deaminase LPG50146 and guide SGN000139
LPG50140, LPG50146, and LPG50148 show detectable A>G transitions at positions A12 and A13. LPG50146 shows more than 4% editing at position A13.
Table 19: A>N editing rate using deaminase LPG50146 and guide SGN000143
LPG50140, LPG50146, and LPG50148 show detectable A>G transitions at positions A9, A11, and A14. LPG50146 shows more than 8% editing at position A11.
Table 20: A>N editing rate using deaminase LPG50146 and guide SGN000186
LPG50140, LPG50146, and LPG50148 show detectable A>G transitions at positions A9, A16, and A18. LPG50146 shows more than 13% editing at position A16.
Table 21: A>N editing rate using deaminase LPG50146 and guide SGN000194
LPG50140, LPG50146, and LPG50148 show detectable A>G transitions at positions A13 and A15. LPG50146 shows more than 3% editing at positions A13 and A15.
Table 22: A>N editing rate using deaminase LPG50146 and guide SGN000930
LPG50140, LPG50146, and LPG50148 show detectable A>G transitions at positions A10, A14, A15, A16, A20, and A21. LPG50146 shows more than 2% editing at positions A14 and A16.
Table 23: A>N editing rate using deaminase LPG50140 and guide SGN000139
LPG50140, LPG50146, and LPG50148 show detectable A>G transitions at positions A12 and A13. LPG50140 shows more than 5% editing at position A13.
Table 24: A>N editing rate using deaminase LPG50140 and guide SGN000143
LPG50140, LPG50146, and LPG50148 show detectable A>G transitions at positions A9, A11, and A14. LPG50140 shows more than 14% editing at position A11.
Table 25: A>N editing rate using deaminase LPG50140 and guide SGN000186
LPG50140, LPG50146, and LPG50148 show detectable A>G transitions at positions A9, A16, and A18. LPG50140 shows more than 9% editing at positions A9 and A16.
Table 26: A>N editing rate using deaminase LPG50140 and guide SGN000194
LPG50140, LPG50146, and LPG50148 show detectable A>G transitions at positions A13 and A15. LPG50140 shows more than 6% editing at positions A13 and A15.
Table 27: A>N editing rate using deaminase LPG50140 and guide SGN000930
LPG50140, LPG50146, and LPG50148 show detectable A>G transitions at positions A10, A14, A15, A16, A20, and A21. LPG50140 shows more than 1% editing at positions A14 and A16.
Table 28 below shows the average editing rate of LPG50148-nAPG07433.1 at several guides tested in HEK293T cells by lipofection of two plasmids. The base editor is encoded on one plasmid and the guide RNA is encoded on a second plasmid. The total substitution rate in the target was used to measure the base editing rate.
Table 28: average editing rate of LPG50148-nAPG07433.1
LPG50148-nAPG07433.1 shows editing at many different guides across the genome.
Table 29 shows the editing rate of adenine bases in each guide by LPG50148-nAPG07433.1. Only adenine positions are shown below. Where appropriate, the adenine conversion rate is the average of several experiments.
Table 29: editing rate of A nucleotides in mammalian cells for the first 10 guides
LPG50148-nAPG07433.1 shows adenine base editing at positions 6 to 21 in the target region, depending on the guide RNA used. The editing rate also varies with the guide RNA used.
Example 4: correction of class I cystic fibrosis nonsense mutations
Example 4.1: identification of RGN and guide RNA
Cystic fibrosis is generally caused by deleterious mutations in the CFTR gene (SEQ ID NO: 51). Six of the most common nonsense mutations are G542X, W1282X, R553X, R1162X, E60X, R785X, and Q493X. Each of these termination mutations can be edited by an RGN deaminase fusion protein described herein to restore the coding codon. To target each mutation, the following must be determined: 1) RGN with PAM recognition site near nonsense mutation; and 2) optimally targeting RGN deaminase fusion proteins to guide RNAs of the target DNA. Table 30 below shows the nicking enzyme variants with RGNs near each of the six nonsense mutations and the number of guide RNAs available for each RGN. Table 31 depicts the loci of the genes for each guide RNA. PAM recognition sites for the loci of each gene are underlined. The target sequence of the guide RNA and the guide RNA sequence itself are also indicated.
Table 30: RGN nicking enzyme and number of guide RNAs for nonsense mutations in CFTR
Table 31: guide RNA for nonsense mutations in CFTR
Table 28 in example 3 provides edit data for SGN001101 sgrnas targeting CFTR.
To determine the activity of the other guide RNAs, the guide RNAs of Table 31 are provided with the corresponding nickase variant of each RGN described in Table 30, operably linked to a deaminase of the present invention to generate a fusion protein. It should be appreciated that nuclease-inactive variants of each RGN may also be similarly tested. Each guide and fusion protein combination was assayed for editing capacity at the target site in 16HBE14o- immortalized bronchial epithelial cells. Currently, three HBE cell lines containing nonsense mutations in CFTR are available (Cystic Fibrosis Foundation, Lexington, MA). Such cell lines were used to assay the G542X, W1282X, and R1162X nonsense mutation targets and were compared to the 16HBE14o- line. The fusion proteins and guide RNAs are delivered to cells as ribonucleoproteins (RNPs), which are nucleofected into the 16HBE14o- cell lines according to the culture and transformation methods provided by Valley et al. (Valley et al., 2019, J Cyst Fibros 18, 476-483, incorporated herein by reference). The guide RNA is provided as a single guide RNA or as a tracrRNA:crRNA duplex, in a 1:1 or 1:1.2 molar ratio with the RGN protein. Nucleofection of the RNPs into cells was performed on a Lonza 4D-Nucleofector. The cells were then cultured at 37 °C for 72 hours. In some embodiments, the fusion protein and the gRNA are delivered to the cell as RNA molecules, with the fusion protein encoded by an mRNA.
Because there were no cell lines available for E60X, R553X, and Q493X, such mutations were assayed in HEK293 cells using a modification of the GFP recovery assay described in Example 2, in which the mutant loci containing the nonsense mutations were cloned into GFP reading frame 2.
After incubation, genomic DNA was extracted using NucleoSpin 96 Tissue (Macherey-Nagel) according to the manufacturer's protocol. Genomic regions flanking the targeted genomic sites were amplified by PCR and the products were purified using a ZR-96 DNA Clean and Concentrator (Zymo Research) according to the manufacturer's protocol. The purified PCR products were submitted for next-generation sequencing on an Illumina MiSeq. Typically, 100,000 paired-end reads of 250 bp (2×100,000 reads) are generated per amplicon. The reads were analyzed using CRISPResso (Pinello et al., 2016) to calculate editing rates. Output alignments were hand-curated to confirm the introduction of the base editing mutations of interest and to screen for undesired INDEL formation.
In addition to the efficiency of base editing, the protein product of the base-edited CFTR gene was evaluated for function. For two of the nonsense mutations (Glu60X and Gly542X), base editing of adenine to guanine does not restore the wild-type sequence, because these mutations are caused by guanine-to-thymine transversions. The targeting activity of the fusion protein changes Glu60X to Glu60Gln and Gly542X to Gly542Arg. Although such mutations do allow production of a full-length protein, the stability and functionality of the resulting CFTR protein are also confirmed.
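The reason adenine editing yields Gln or Arg at these two sites, rather than the wild-type residue, follows from the standard genetic code and can be sketched in Python. Under that code, a G-to-T transversion converts a Glu codon (GAA or GAG) into a stop codon only as TAA or TAG, and a Gly codon (GGN) into a stop codon only as TGA. The function name `single_abe_outcomes` and the reduced codon table are illustrative assumptions; the sketch also ignores whether a given adenine falls within the editing window of any particular guide.

```python
# Reduced codon table: only the codons reachable in this illustration.
CODON_TO_AA = {
    "CAA": "Gln", "CAG": "Gln", "CGA": "Arg", "TGG": "Trp",
    "TAA": "Stop", "TAG": "Stop", "TGA": "Stop",
}


def single_abe_outcomes(codon: str) -> dict[str, str]:
    """Map each codon reachable by one adenine deamination to its amino acid.

    A->G on the coding strand is applied directly. A->G on the opposite
    strand is written as T->C on the coding strand.
    """
    outcomes = {}
    for index, base in enumerate(codon):
        if base == "A":
            new_base = "G"
        elif base == "T":
            new_base = "C"
        else:
            continue
        edited = codon[:index] + new_base + codon[index + 1:]
        outcomes[edited] = CODON_TO_AA[edited]
    return outcomes


for stop in ("TAA", "TAG", "TGA"):
    print(stop, single_abe_outcomes(stop))
```

In this sketch no single adenine deamination returns TAA or TAG to a Glu codon, or TGA to a Gly codon; editing the opposite strand at the first codon position gives CAA or CAG (Gln) and CGA (Arg), which matches the Glu60Gln and Gly542Arg outcomes stated above.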
Example 4.2: size-reduced engineered RGN
Ideally, the coding sequence of an RGN-deaminase fusion protein of the present invention and the corresponding guide RNA for targeting the fusion protein to the CFTR gene can all be packaged within a single AAV vector. The generally accepted size limit for AAV vectors is 4.7 kb; larger sizes are contemplated, but at the cost of reduced packaging efficiency. The RGN nickases in Table 30 have coding sequences of about 3.15-3.45 kb in length. To ensure that expression cassettes for both the fusion protein and its corresponding guide RNA fit into an AAV vector, it is desirable to shorten the RGN amino acid sequence and its corresponding nucleic acid coding sequence.
A unique 8-amino-acid region at positions 590-597 was identified in APG07433.1 and its close homolog APG08290.1 by alignment with closely related homologs (described in WO 2019/236566 and shown herein as SEQ ID NO: 60). This region, shown as SEQ ID NO: 365 for APG07433.1 and SEQ ID NO: 367 for APG08290.1, was removed from both proteins to give the variants APG07433.1-del (SEQ ID NO: 366) and APG08290.1-del (SEQ ID NO: 368). These deletion variants and their corresponding wild-type RGNs were assayed for editing activity in HEK293T cells with the guide RNAs indicated in Tables 32 and 33, following a method similar to that described in Example 1. The editing rates at the target sequences are shown in Tables 32 and 33 below.
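The deletion described above amounts to removing residues 590-597 (1-based, inclusive) from the protein sequence. A minimal Python sketch follows; the function name `delete_region` is hypothetical, and the 1,000-residue placeholder string stands in for the actual RGN sequences, which are not reproduced here.

```python
def delete_region(sequence: str, start: int, end: int) -> str:
    """Remove residues start..end (1-based, inclusive) from a sequence."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("region lies outside the sequence")
    return sequence[:start - 1] + sequence[end:]


# Placeholder protein of 1,000 residues; not the real RGN sequence.
placeholder = "M" + "A" * 999
deleted = delete_region(placeholder, 590, 597)
residues_removed = len(placeholder) - len(deleted)
print(residues_removed, "residues removed;", residues_removed * 3, "nt shorter")
```

An 8-residue deletion shortens the coding sequence by only 24 nucleotides, so on its own it makes a small contribution toward the 4.7 kb packaging limit.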
Table 32: editing rate of APG07433.1 protein deletion variants
For the targets SGN000169, SGN000173, SGN000186, SGN000927, SGN000930, and SGN001101, the editing rates of the wild-type APG07433.1 protein and the engineered variant were similar. For the targets SGN000139, SGN000143, and SGN000194, the editing rate was reduced with the engineered variant compared to the wild-type protein. For SGN000929 and SGN000935, the editing rate was increased with the engineered APG07433.1 variant compared to the wild-type sequence.
Table 33: editing rate of APG08290.1 protein deletion variants
N.D. = not determined
The APG08290.1 deletion variant showed editing in all samples in which the wild-type APG08290.1 protein also showed editing. With the engineered protein, the lowest editing rate detected was 0.13%. Target SGN000926 showed the highest editing rate, 9.17%.
Using a method similar to example 1, fusion proteins comprising APG07433.1-del or APG08290.1-del and deaminase of the present invention were generated and assayed for base editing activity.
The fusion proteins comprise an RGN and a deaminase joined by a flexible peptide linker, such as that set forth as SEQ ID NO: 45. SEQ ID NO: 45 is 16 amino acids in length; this can be shortened to reduce the size of the coding sequence of the fusion protein. Using a method similar to Example 1, peptide linkers of fewer than 16 amino acids can be generated, operably linked to RGN APG07433.1-del or APG08290.1-del and a deaminase of the present invention, and tested for base editing activity. Because the peptide linker between the RGN and the deaminase can determine the editing window of the fusion protein, testing alternative linkers of different lengths and rigidities can also lead to improved editing efficiency while reducing off-target mutations. The fusion proteins with the highest editing rates were then assayed to determine the editing efficiency at each of the CFTR target sequences in a manner similar to Example 4.1. The fusion protein/gRNA combination with the highest editing efficiency was selected as the preferred combination for editing at that location and was used for AAV vector design.
Example 4.3: AAV delivery
The coding sequence of the effective fusion protein/gRNA combination with the highest editing rate was packaged into AAV vectors. AAV delivery has many advantages, including lack of pathogenicity, low immunogenicity, high transduction rates, and defined manufacturing pathways. Furthermore, AAV administration to the lung has been shown to be safe and, at least to some extent, effective in both single and repeated administrations (Guggino et al., 2017, Expert Opin Biol Ther 17, 1265-1273). After the fusion protein/gRNA combination is cloned into an AAV vector, it can be packaged into several different serotypes to optimize tissue-specific infectivity. For the treatment of CF, the target of base editing is the apical epithelial progenitor cells of the lung, which allows the correction to persist through cell turnover. To target the respiratory epithelium, capsids of serotypes AAV1, AAV5, or AAV6 are employed, as such serotypes have been shown to have high infectivity in respiratory epithelial cells (Zabner et al., 2000, J Virol 74, 3852-3858).
Once AAV vectors are produced, they are transduced into cultured human airway epithelial cells. Three HBE cell lines containing the CFTR G542X, R1162X, and W1282X nonsense mutation targets were used to test the effectiveness of the constructs in correcting those mutations. The 16HBE14o- line was used to test constructs correcting the other nonsense mutations. A range of multiplicities of infection (MOI) was tested. In either case, reversion of the nonsense mutation to the wild-type CFTR sequence was assessed. After 2-3 days of culture, genomic DNA was harvested, amplicons surrounding the targeted sites were generated by PCR, and NGS was performed in a manner similar to that described in Example 1 to determine the editing rate at each locus. Because airway epithelial cells are used, AAV incorporation and editing rates in this cultured cell system are as similar as possible to those of in vivo treatment. AAVs of different serotypes are compared to determine which serotype is optimal for delivering the fusion protein/gRNA into airway cells. The editing rates achieved by AAV delivery of such a system were compared to the RNP editing rates observed in Example 4.2.
Because cell lines for the nonsense mutations R553X, E60X, and Q493X are not available, fusion protein/gRNA systems targeting such mutations were evaluated in wild-type 16HBE14o- cells to determine AAV incorporation, base editor expression, and off-target editing rates at the locations of interest. To determine the stop codon correction rate, mutant loci were cloned into GFP for GFP recovery assays as described in Example 4.1.
In parallel with determining the editing rate by NGS, total protein lysates from cells with CFTR mutations edited with the fusion protein/gRNA system were collected and the level of full-length CFTR protein was assessed by western blotting. To test whether a functional CFTR protein is formed, a forskolin activation assay is performed using methods similar to those described by Devor et al. (2000, Am J Physiol Cell Physiol 279, C461-479, incorporated herein by reference) and/or Dousmanis et al. (2002, J Gen Physiol 119, 545-559, incorporated herein by reference). In such experiments, edited CFTR mutant cells were treated with forskolin (an activator of adenylate cyclase) to increase intracellular levels of cAMP. Elevated cAMP levels then activate CFTR, and the influx of Cl⁻ is measured with a genetically encoded yellow fluorescent protein-based Cl⁻ sensor or a small-molecule fluorescent chloride indicator such as MQAE. In this assay, the edited G542X, R1162X, and W1282X cell lines were tested.
To determine off-target mutation rates, bioinformatic measures tailored with information about seed regions and flexible off-target PAM recognition space for each specific nuclease were used. This information has been bioinformatically determined for each protein and used to rank the likelihood of off-target activity for each protein.
To complement the bioinformatic off-target predictions, biochemical off-target detection via a modified SITE-seq protocol (Cameron et al., 2017, Nat Methods 14, 600-606, incorporated herein by reference) was also performed. Briefly, genomic DNA was obtained from human airway epithelial cells. This DNA was then treated with several different concentrations of the RGN of interest. Any DNA double-strand breaks are tagged with adaptor sequences that allow NGS, selectively isolated, and PCR amplified. The sequencing reads were then mapped to the genome, and "pile-ups" of reads were identified at the sites of double-strand breaks, marking putative off-target locations. In a subsequent set of experiments, cells were edited with the RGN or RGN-deaminase fusion protein of interest, and such putative sites were individually sequenced to confirm whether they were true off-targets. Biochemical methods typically overestimate the number of off-targets because chromatin context, DNA accessibility, and other factors can affect the efficiency of a genome editor in living cells. Thus, bioinformatic and biochemical methods together provide complementary approaches to identifying putative off-target sites, but such sites must be verified by amplicon sequencing to obtain an accurate assessment of off-target editing.
Once putative off-target sites are identified, amplicon sequencing of 16HBE airway epithelial cells edited with the same optimized fusion protein and guide ensures that the off-target profile established for such a system matches as closely as possible the profile expected in the patient's lungs.
Careful analysis of the total transcriptome of the cells after editing is necessary to determine whether the fusion proteins described herein induce changes in cellular RNAs. Fortunately, RNA-seq techniques for assessing off-target editing by adenine base editors have become routine (Grünewald et al., 2019, Nature 569, 433-437; Zhou et al., Nature 571, 275-278, both incorporated herein by reference). Briefly, after editing cells with the fusion protein/gRNA system determined in Example 4.2, total cellular mRNA was collected and subjected to RNA-seq. The transcriptome of edited cells was compared to that of cells transfected with the ABE alone and examined for significant differences in RNA sequence.
Example 5: targeted base editing for correction of causal disease mutations
A clinical variant database was obtained from NCBI ClinVar, which is available on the NCBI ClinVar website. Pathogenic single nucleotide polymorphisms (SNPs) were identified from this list. Using genomic locus information, CRISPR targets in the regions overlapping and surrounding each SNP were identified. A selection of the SNPs that can be corrected using base editing in combination with an RGN (e.g., an RGN listed in Table 30 or a variant thereof) to target the causal mutation ("Casl mut.") is listed in Table 34. In Table 34 below, only one alias for each disease is listed. "RS#" corresponds to the RS accession number in the SNP database on the NCBI website. "AlleID" corresponds to a causal allele accession number. The "Name" column contains the locus identifier of the gene, the name of the gene, the location of the mutation in the gene, and the change resulting from the mutation.
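One part of such a selection can be sketched in Python: an A-to-G base editor can revert a pathogenic G>A change directly, or a C>T change by editing the adenine on the opposite strand. The sketch below is an assumption-laden illustration rather than the procedure actually used to build Table 34. The `Snp` record, the function name, and the example variants are all hypothetical, and the requirement for a suitable PAM and editing window near each SNP is not modelled.

```python
from typing import NamedTuple


class Snp(NamedTuple):
    label: str  # hypothetical identifier, not a ClinVar accession
    ref: str    # wild-type base on the reference strand
    alt: str    # pathogenic base on the reference strand


# (ref, alt) pairs whose pathogenic allele can be reverted by A->G editing:
# G>A directly, and C>T by editing the adenine on the opposite strand.
A_TO_G_CORRECTABLE = {("G", "A"), ("C", "T")}


def correctable_by_adenine_editing(snp: Snp) -> bool:
    """Return True if an A->G edit on either strand restores the ref base."""
    return (snp.ref.upper(), snp.alt.upper()) in A_TO_G_CORRECTABLE


# Invented example records for illustration only.
candidates = [
    Snp("variant_1", "G", "A"),
    Snp("variant_2", "C", "T"),
    Snp("variant_3", "A", "G"),
    Snp("variant_4", "G", "T"),
]
selected = [s.label for s in candidates if correctable_by_adenine_editing(s)]
print(selected)
```

A fuller filter would additionally check each retained SNP for a nearby PAM recognized by one of the RGNs, as described above for the CRISPR targets overlapping each SNP.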
Table 34: disease targets for base editing
Example 6: demonstration of Gene editing Activity in plant cells
The base editing activity of the RGN-deaminase fusion proteins of the present invention was demonstrated in plant cells using a protocol adapted from Li et al., 2013 (Nat. Biotech. 31:688-691). Briefly, using PEG-mediated transformation, an expression vector comprising an expression cassette capable of expressing in a plant cell an RGN-deaminase fusion protein operably linked to an SV40 nuclear localization signal (SEQ ID NO: 43), and a second expression cassette encoding a guide RNA targeting one or more sites flanking an appropriate PAM sequence in the plant PDS gene, was introduced into Nicotiana benthamiana mesophyll protoplasts. The transformed protoplasts were cultured in the dark for up to 36 hours. Genomic DNA was isolated from the protoplasts using a DNeasy Plant Mini Kit (Qiagen). Genomic regions flanking the RGN target site were amplified by PCR, the products were purified, and the purified PCR products were analyzed by next-generation sequencing on an Illumina MiSeq. Typically, 100,000 paired-end reads of 250 bp (2×100,000 reads) are generated per amplicon. The reads were analyzed using CRISPResso (Pinello et al., 2016, Nature Biotech 34:695-697) to calculate the editing rate. Output alignments were analyzed for INDEL formation or the introduction of specific adenine mutations.
Example 7: testing mRNA delivery
To determine whether the base editor could be delivered in different formats, mRNA delivery was tested in primary T cells. Purified CD3+ T cells or PBMCs were thawed, activated for 3 days using CD3/CD28 beads (ThermoFisher), and then nucleofected using the Lonza 4D-Nucleofector X unit and Nucleocuvette strips. The P3 Primary Cell kit was used for both mRNA and RNP delivery. For mRNA and RNP delivery, cells were transfected using the EO-115 and EH-115 programs, respectively. After nucleofection, the cells were cultured for 4 days in CTS OpTmizer T cell expansion medium (ThermoFisher) containing IL-2, IL-7, and IL-15 (Miltenyi Biotec) before being harvested using the NucleoSpin Tissue genomic DNA isolation kit (Macherey-Nagel).
Amplicons surrounding the editing sites were generated by PCR using the primers identified in Table 35 and subjected to NGS using 2x250 bp paired-end sequencing on the Illumina Nextera platform. The estimated base editing rate was determined by calculating the overall substitution rate for each sample. The average editing rate and the number of samples tested for each guide are shown below.
Table 35: average rate of editing of LPG 50148-nAPGG433.1 via mRNA delivery
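The summary statistic described for Table 35 (a per-sample substitution rate, then an average and sample count per guide) can be sketched in Python as follows. The function name, the input layout, and the guide identifiers are hypothetical, and the read counts are invented purely to show the arithmetic; they are not data from the application.

```python
from collections import defaultdict


def summarize_by_guide(samples):
    """samples: iterable of (guide_id, substituted_reads, total_reads).
    Returns {guide_id: (mean substitution rate in percent, sample count)}.
    Samples with no reads are ignored."""
    per_guide = defaultdict(list)
    for guide_id, substituted, total in samples:
        if total > 0:
            per_guide[guide_id].append(100.0 * substituted / total)
    return {guide: (sum(rates) / len(rates), len(rates))
            for guide, rates in per_guide.items()}


if __name__ == "__main__":
    # Invented counts for two hypothetical guides.
    example = [("guide_A", 50, 100), ("guide_A", 30, 100), ("guide_B", 10, 200)]
    for guide, (mean_rate, n) in summarize_by_guide(example).items():
        print(f"{guide}: mean {mean_rate:.1f}% over {n} sample(s)")
```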

Claims (211)

1. An isolated polypeptide comprising an amino acid sequence having at least 90% sequence identity to any one of SEQ ID NOs 407, 399, 405, 1-10, 400-404, 406 and 408-441, wherein said polypeptide has deaminase activity.
2. A nucleic acid molecule comprising a polynucleotide encoding a deaminase polypeptide, wherein the deaminase is encoded by a nucleotide sequence that:
(a) Has at least 80% sequence identity to any one of SEQ ID NOS 451, 449, 443, 11-20, 444-448, 450 and 452-485, or
(b) Encoding an amino acid sequence having at least 90% sequence identity to any one of SEQ ID NOs 407, 399, 405, 1-10, 400-404, 406 and 408-441.
3. The nucleic acid molecule of claim 2, wherein the nucleic acid molecule further comprises a heterologous promoter operably linked to the polynucleotide.
4. A pharmaceutical composition comprising a pharmaceutically acceptable carrier and the polypeptide of claim 1 or the nucleic acid molecule of claim 2 or 3.
5. A fusion protein comprising a DNA binding polypeptide and a deaminase having at least 90% sequence identity to any of SEQ ID NOs 407, 399, 405, 1-10, 400-404, 406 and 408-441.
6. The fusion protein of claim 5, wherein the deaminase is adenine deaminase.
7. The fusion protein of claim 5 or 6, wherein the DNA-binding polypeptide is a meganuclease, a zinc finger fusion protein, or a TALEN.
8. The fusion protein of claim 5 or 6, wherein the DNA-binding polypeptide is an RNA-guided DNA-binding polypeptide.
9. The fusion protein of claim 8, wherein the RNA-guided DNA-binding polypeptide is an RNA-guided nuclease (RGN) polypeptide.
10. The fusion protein of claim 9, wherein the RGN is a type II CRISPR-Cas polypeptide.
11. The fusion protein of claim 9, wherein the RGN is a type V CRISPR-Cas polypeptide.
12. The fusion protein of any one of claims 9-11, wherein the RGN is an RGN nickase.
13. The fusion protein of claim 9, wherein the RGN has an amino acid sequence with at least 95% sequence identity to any one of SEQ ID NOs 41, 60, 366 and 368.
14. The fusion protein of claim 12, wherein the RGN nickase is any one of SEQ ID NOs 42, 52-59, 61, 397 and 398.
15. The fusion protein of any one of claims 5-14, wherein the fusion protein further comprises at least one Nuclear Localization Signal (NLS).
16. A nucleic acid molecule comprising a polynucleotide encoding a fusion protein comprising a DNA-binding polypeptide and a deaminase, wherein the deaminase is encoded by a nucleotide sequence that:
(a) Has at least 80% sequence identity to any one of SEQ ID NOS 451, 449, 443, 11-20, 444-448, 450 and 452-485, or
(b) Encoding an amino acid sequence having at least 90% sequence identity to any one of SEQ ID NOs 407, 399, 405, 1-10, 400-404, 406 and 408-441.
17. The nucleic acid molecule of claim 16, wherein the deaminase is adenine deaminase.
18. The nucleic acid molecule of claim 16 or 17, wherein the DNA-binding polypeptide is a meganuclease, a zinc finger fusion protein, or a TALEN.
19. The nucleic acid molecule of claim 16 or 17, wherein the DNA-binding polypeptide is an RNA-guided DNA-binding polypeptide.
20. The nucleic acid molecule of claim 19, wherein the RNA-guided DNA-binding polypeptide is an RNA-guided nuclease (RGN) polypeptide.
21. The nucleic acid molecule of claim 20, wherein the RGN is a type II CRISPR-Cas polypeptide.
22. The nucleic acid molecule of claim 20, wherein the RGN is a type V CRISPR-Cas polypeptide.
23. The nucleic acid molecule of claim 20, wherein the RGN is an RGN nickase.
24. The nucleic acid molecule of claim 20, wherein the RGN has an amino acid sequence with at least 95% sequence identity to any one of SEQ ID NOs 41, 60, 366 and 368.
25. The nucleic acid molecule of claim 23, wherein the RGN nickase is any one of SEQ ID NOs 42, 52-59, 61, 397, and 398.
26. The nucleic acid molecule of any one of claims 16-25, wherein the polynucleotide encoding the fusion protein is operably linked at its 5' end to a heterologous promoter.
27. The nucleic acid molecule of any one of claims 16-26, wherein the polynucleotide encoding the fusion protein is operably linked at its 3' end to a heterologous terminator.
28. The nucleic acid molecule of any one of claims 16-27, wherein the fusion protein comprises one or more nuclear localization signals.
29. The nucleic acid molecule of any one of claims 16-28, wherein the fusion protein is codon optimized for expression in eukaryotic cells.
30. The nucleic acid molecule of any one of claims 16-28, wherein the fusion protein is codon optimized for expression in a prokaryotic cell.
31. A vector comprising the nucleic acid molecule of any one of claims 16-30.
32. The vector of claim 31, further comprising at least one nucleotide sequence encoding a guide RNA (gRNA) capable of hybridizing to a target sequence.
33. The vector of claim 32, wherein the gRNA is a single guide RNA.
34. The vector of claim 32, wherein the gRNA is a dual-guide RNA.
35. A cell comprising the fusion protein of any one of claims 5-15, the nucleic acid molecule of any one of claims 16-30, or the vector of any one of claims 31-34.
36. A cell comprising the fusion protein of any one of claims 5-15, wherein the cell further comprises a guide RNA.
37. A method of making a fusion protein comprising culturing the cell of claim 35 or 36 under conditions in which the fusion protein is expressed.
38. A method of making a fusion protein comprising introducing the nucleic acid molecule of any one of claims 16-30 or the vector of any one of claims 31-34 into a cell, and culturing the cell under conditions in which the fusion protein is expressed.
39. The method of claim 37 or 38, further comprising purifying the fusion protein.
40. A method of making an RGN fusion ribonucleoprotein complex comprising introducing into a cell the nucleic acid molecule of any one of claims 16-30 and a nucleic acid molecule comprising an expression cassette encoding a guide RNA, or the vector of any one of claims 31-34, and culturing the cell under conditions in which the fusion protein and the gRNA are expressed and form an RGN fusion ribonucleoprotein complex.
41. The method of claim 40, further comprising purifying the RGN fusion ribonucleoprotein complex.
42. A system for modifying a target DNA molecule comprising a target DNA sequence, the system comprising:
a) A fusion protein comprising an RNA-guided nuclease polypeptide (RGN) and a deaminase or a nucleotide sequence encoding said fusion protein, wherein the deaminase has an amino acid sequence with at least 90% sequence identity to any of SEQ ID NOs 407, 399, 405, 1-10, 400-404, 406 and 408-441; and
b) One or more guide RNAs (gRNAs) capable of hybridizing to the target DNA sequence, or one or more nucleotide sequences encoding the one or more gRNAs; and
wherein the one or more guide RNAs are capable of forming a complex with the fusion protein to direct the fusion protein to bind to the target DNA sequence and modify the target DNA molecule.
43. The system of claim 42, wherein at least one of the nucleotide sequence encoding one or more guide RNAs and the nucleotide sequence encoding a fusion protein is operably linked to a promoter heterologous to the nucleotide sequence.
44. The system of claim 42 or 43, wherein the target DNA sequence is a eukaryotic target DNA sequence.
45. The system of any one of claims 42-44, wherein the target DNA sequence is located adjacent to a protospacer adjacent motif (PAM) recognized by the RGN.
46. The system of any one of claims 42-45, wherein the target DNA molecule is intracellular.
47. The system of any one of claims 42-46, wherein the RGN of the fusion protein is a type II CRISPR-Cas polypeptide.
48. The system of any one of claims 42-46, wherein the RGN of the fusion protein is a type V CRISPR-Cas polypeptide.
49. The system of any one of claims 42-46, wherein the RGN of the fusion protein has an amino acid sequence with at least 95% sequence identity to SEQ ID No. 41, 60, 366 or 368.
50. The system of any one of claims 42-46, wherein the RGN of the fusion protein is an RGN nickase.
51. The system of claim 50, wherein the RGN nickase is any one of SEQ ID NOs 42, 52-59, 61, 397 and 398.
52. The system of any one of claims 42-51, wherein the fusion protein comprises one or more nuclear localization signals.
53. The system of any one of claims 42-52, wherein the fusion protein is codon optimized for expression in eukaryotic cells.
54. The system of any one of claims 42-53, wherein the nucleotide sequence encoding the one or more guide RNAs and the nucleotide sequence encoding the fusion protein are located on one vector.
55. A pharmaceutical composition comprising a pharmaceutically acceptable carrier and the fusion protein of any one of claims 5-15, the nucleic acid molecule of any one of claims 16-30, the vector of any one of claims 31-34, the cell of claim 35 or 36, or the system of any one of claims 42-54.
56. A method for modifying a target DNA molecule comprising a target DNA sequence, the method comprising delivering the system of any one of claims 42-54 to the target DNA molecule or a cell comprising the target DNA molecule.
57. A method for modifying a target DNA molecule comprising a target sequence, comprising:
a) Assembling the RGN deaminase ribonucleotide complex in vitro under conditions suitable for forming the RGN deaminase ribonucleotide complex by combining:
i) One or more guide RNAs capable of hybridizing to the target DNA sequence; and
ii) a fusion protein comprising an RNA guided nuclease polypeptide (RGN) and at least one deaminase, wherein the deaminase has an amino acid sequence with at least 90% sequence identity to any of SEQ ID NOs 407, 399, 405, 1-10, 400-404, 406 and 408-441; and
b) Contacting the target DNA molecule or a cell comprising the target DNA molecule with an RGN deaminase ribonucleotide complex assembled in vitro;
wherein the one or more guide RNAs hybridize to the target DNA sequence, thereby directing the fusion protein to bind to the target DNA sequence, and modification of the target DNA molecule occurs.
58. The method of claim 56 or 57, wherein said modified target DNA molecule comprises an A > N mutation of at least one nucleotide within the target DNA molecule, wherein N is C, G, or T.
59. The method of claim 58, wherein the modified target DNA molecule comprises an A > G mutation of at least one nucleotide within the target DNA molecule.
60. The method of any one of claims 56-59, wherein the RGN of the fusion protein is a type II CRISPR-Cas polypeptide.
61. The method of any one of claims 56-59, wherein the RGN of the fusion protein is a type V CRISPR-Cas polypeptide.
62. The method of any one of claims 56-59, wherein the RGN of the fusion protein has an amino acid sequence with at least 95% sequence identity to SEQ ID No. 41, 60, 366 or 368.
63. The method of any one of claims 56-59, wherein the RGN of the fusion protein is an RGN nickase.
64. The method of claim 63, wherein the RGN nickase is any one of SEQ ID NOs 42, 52-59, 61, 397 and 398.
65. The method of any one of claims 56-64, wherein the fusion protein comprises one or more nuclear localization signals.
66. The method of any one of claims 56-65, wherein the fusion protein is codon optimized for expression in eukaryotic cells.
67. The method of any one of claims 56-66, wherein the target DNA sequence is a eukaryotic target DNA sequence.
68. The method of any one of claims 56-67, wherein said target DNA sequence is positioned adjacent to a protospacer adjacent motif (PAM).
69. The method of any one of claims 56-68, wherein the target DNA molecule is intracellular.
70. The method of claim 69, further comprising selecting a cell comprising the modified DNA molecule.
71. A cell comprising a modified target DNA sequence according to the method of claim 70.
72. A pharmaceutical composition comprising the cell of claim 71 and a pharmaceutically acceptable carrier.
73. A method of producing a genetically modified cell having a correction in a causal mutation for a genetic disease, the method comprising introducing into the cell:
a) A fusion protein comprising an RNA-guided nuclease polypeptide (RGN) and a deaminase or a polynucleotide encoding said fusion protein, wherein the deaminase has a sequence identical to any one of SEQ ID NOs: 407, 405, 399, 1-10, 400-404, 406, and 408-441, and wherein said polynucleotide encoding the fusion protein is operably linked to a promoter such that the fusion protein is expressed in the cell; and
b) One or more guide RNAs (gRNAs) capable of hybridizing to a DNA sequence of interest, or polynucleotides encoding the gRNAs, wherein the polynucleotides encoding the gRNAs are operably linked to a promoter such that the gRNAs are expressed in the cell;
whereby the fusion protein and the gRNAs target the genomic location of the causal mutation and modify the genomic sequence to remove the causal mutation.
74. The method of claim 73, wherein said RGN of the fusion protein is an RGN nickase.
75. The method of claim 74, wherein the RGN nickase is any one of SEQ ID NOs 42, 52-59, 61, 397 and 398.
76. The method of any one of claims 73-75, wherein the genomic modification comprises introducing an a > G mutation of at least one nucleotide into the target DNA sequence.
77. The method of any one of claims 73-76, wherein the correction of the causal mutation comprises correcting a nonsense mutation.
78. The method of claim 73, wherein the genetic disorder is a disorder listed in Table 34.
79. The method of claim 73, wherein the genetic disorder is cystic fibrosis.
80. A method of treating a disease, the method comprising administering to a subject in need thereof an effective amount of the pharmaceutical composition of claim 55 or 72.
81. The method of claim 80, wherein said disease is associated with a causal mutation, and said effective amount of said pharmaceutical composition corrects said causal mutation.
82. Use of the fusion protein of any one of claims 5-15, the nucleic acid molecule of any one of claims 16-30, the vector of any one of claims 31-34, the cell of any one of claims 35, 36 and 71, or the system of any one of claims 42-54 for treating a disease in a subject.
83. The use of claim 82, wherein the disease is associated with a causal mutation, and the treatment comprises correcting the causal mutation.
84. Use of the fusion protein of any one of claims 5-15, the nucleic acid molecule of any one of claims 16-30, the vector of any one of claims 31-34, the cell of any one of claims 35, 36 and 71, or the system of any one of claims 42-54 in the manufacture of a medicament for treating a disease.
85. The use of claim 84, wherein the disease is associated with a causal mutation, and an effective amount of the medicament corrects the causal mutation.
86. A nucleic acid molecule comprising a polynucleotide encoding an RNA-guided nuclease (RGN) polypeptide, wherein the polynucleotide comprises a nucleotide sequence encoding an RGN polypeptide comprising a sequence identical to SEQ ID NO: 41 or 60 but lacking amino acid residues 590 to 597 of SEQ ID NO: 41 or 60;
wherein the RGN polypeptide is capable of binding to a target DNA sequence in an RNA-guided, sequence-specific manner when bound to a guide RNA (gRNA) capable of hybridizing to the target DNA sequence.
87. The nucleic acid molecule of claim 86, wherein the polynucleotide encoding an RGN polypeptide is operably linked to a promoter heterologous to the polynucleotide.
88. The nucleic acid molecule of claim 86 or 87, wherein the RGN polypeptide is nuclease-inactive or acts as a nicking enzyme.
89. The nucleic acid molecule of any one of claims 86-88, wherein the RGN polypeptide is operably fused to a base editing polypeptide.
90. A vector comprising the nucleic acid molecule of any one of claims 86-89.
91. An isolated polypeptide comprising an amino acid sequence having at least 95% sequence identity to SEQ ID No. 41 or 60, but lacking amino acid residues 590 to 597 of SEQ ID No. 41 or 60, wherein said polypeptide is an RNA guided nuclease.
92. The isolated polypeptide of claim 91, wherein the RGN polypeptide comprises an amino acid sequence having at least 95% sequence identity to SEQ ID No. 366 or 368.
93. The isolated polypeptide of claim 91 or 92, wherein the RGN polypeptide is nuclease-inactive or acts as a nicking enzyme.
94. The isolated polypeptide of any one of claims 91-93, wherein the RGN polypeptide is operably fused to a base editing polypeptide.
95. A cell comprising the nucleic acid molecule of any one of claims 86-89, the vector of claim 90, or the polypeptide of any one of claims 91-94.
96. An isolated polypeptide comprising an amino acid sequence having at least 90% sequence identity to SEQ ID No. 407, wherein said polypeptide has deaminase activity.
97. The isolated polypeptide of claim 96, wherein the polypeptide comprises the amino acid sequence set forth in SEQ ID No. 407.
98. A nucleic acid molecule comprising a polynucleotide encoding a deaminase polypeptide, wherein the deaminase is encoded by a nucleotide sequence that:
a) Has at least 80% sequence identity to SEQ ID NO 451, or
b) Encodes an amino acid sequence having at least 90% sequence identity to SEQ ID NO 407.
99. The nucleic acid molecule of claim 98, wherein the nucleic acid molecule further comprises a heterologous promoter operably linked to the polynucleotide.
100. A pharmaceutical composition comprising a pharmaceutically acceptable carrier and the polypeptide of any one of claims 96-97 or the nucleic acid molecule of any one of claims 98-99.
101. A fusion protein comprising a DNA binding polypeptide and a deaminase having at least 90% sequence identity to SEQ ID No. 407.
102. The fusion protein of claim 101, wherein the DNA-binding polypeptide is an RNA-guided nuclease (RGN) polypeptide.
103. The fusion protein of claim 102, wherein the RGN polypeptide is a type II CRISPR-Cas polypeptide or a type V CRISPR-Cas polypeptide.
104. The fusion protein of any one of claims 101-103, wherein the RGN polypeptide is Cas9, CasX, CasY, Cpf1, C2c1, C2c2, C2c3, GeoCas9, CjCas9, Cas12a, Cas12b, Cas12g, Cas12h, Cas12i, Cas13b, Cas13c, Cas13d, Cas14, Csn2, xCas9, SpCas9-NG, LbCas12a, AsCas12a, Cas9-KKH, circularly permuted Cas9, Argonaute (Ago), SmacCas9, Spy-macCas9 domain, or an RGN polypeptide having the amino acid sequence set forth in any one of SEQ ID NOs 41, 60, 366, or 368.
105. The fusion protein of any one of claims 102-104, wherein the RGN polypeptide is a nicking enzyme.
106. The fusion protein of claim 105, wherein the nicking enzyme has an amino acid sequence that has at least 95% sequence identity to any one of SEQ ID NOs 42, 52-59, 61, 397 and 398.
107. A nucleic acid molecule comprising a polynucleotide encoding a fusion protein comprising a DNA-binding polypeptide and a deaminase, wherein the deaminase is encoded by a nucleotide sequence that:
a) Has at least 80% sequence identity to SEQ ID NO 451, or
b) Encodes an amino acid sequence that has at least 90% sequence identity to SEQ ID NO 407.
108. The nucleic acid molecule of claim 107, wherein the DNA-binding polypeptide is an RGN polypeptide.
109. The nucleic acid molecule of claim 108, wherein the RGN is a type II CRISPR-Cas polypeptide or a type V CRISPR-Cas polypeptide.
110. The nucleic acid molecule of any one of claims 107-109, wherein the RGN polypeptide is Cas9, CasX, CasY, Cpf1, C2c1, C2c2, C2c3, GeoCas9, CjCas9, Cas12a, Cas12b, Cas12g, Cas12h, Cas12i, Cas13b, Cas13c, Cas13d, Cas14, Csn2, xCas9, SpCas9-NG, LbCas12a, AsCas12a, Cas9-KKH, circularly permuted Cas9, Argonaute (Ago), SmacCas9, Spy-macCas9 domain, or an RGN polypeptide having the amino acid sequence set forth in any one of SEQ ID NOs 41, 60, 366, or 368.
111. The nucleic acid molecule of any one of claims 108-110, wherein the RGN polypeptide is a nicking enzyme.
112. The nucleic acid molecule of claim 111, wherein the nicking enzyme has an amino acid sequence that has at least 95% sequence identity to any one of SEQ ID NOs 42, 52-59, 61, 397 and 398.
113. A vector comprising the nucleic acid molecule of any one of claims 107-112.
114. The vector of claim 113, further comprising at least one nucleotide sequence encoding a guide RNA (gRNA) capable of hybridizing to a target sequence.
115. A ribonucleoprotein (RNP) complex comprising the fusion protein of any one of claims 101-106 and a guide RNA bound to the DNA-binding polypeptide of the fusion protein.
116. A cell comprising the fusion protein of any one of claims 101-106, the nucleic acid molecule of any one of claims 107-112, the vector of any one of claims 113-114, or the RNP complex of claim 115.
117. A system for modifying a target DNA molecule comprising a target DNA sequence, the system comprising:
a) A fusion protein comprising an RNA-guided nuclease (RGN) polypeptide and a deaminase, or a nucleotide sequence encoding said fusion protein, wherein the deaminase has an amino acid sequence with at least 90% sequence identity to SEQ ID No. 407; and
b) One or more guide RNAs (gRNAs) capable of hybridizing to the target DNA sequence, or one or more nucleotide sequences encoding the one or more gRNAs; and
wherein the one or more guide RNAs are capable of forming a complex with the fusion protein so as to direct the fusion protein to bind to the target DNA sequence and modify the target DNA molecule.
118. The system of claim 117, wherein at least one of the nucleotide sequence encoding one or more guide RNAs and the nucleotide sequence encoding a fusion protein is operably linked to a promoter heterologous to the nucleotide sequence.
119. The system of any one of claims 117-118, wherein the target DNA sequence is positioned adjacent to a protospacer adjacent motif (PAM) that is recognized by the RGN polypeptide.
120. The system of any one of claims 117-119, wherein the target DNA sequence comprises a nucleic acid sequence selected from the group consisting of SEQ ID NOs 62-97, 116-139, 152-185, 203-234, 251-286, 305-344, 562, and 563, or a complement thereof.
121. The system of any one of claims 117-120, wherein the gRNA sequence comprises a nucleic acid sequence selected from the group consisting of SEQ ID NOs 98-115, 140-151, 186-202, 235-250, 287-304, 345-364, and 564.
122. The system of any one of claims 117-121, wherein the RGN polypeptide of the fusion protein is a type II CRISPR-Cas polypeptide or a type V CRISPR-Cas polypeptide.
123. The system of any one of claims 117-122, wherein the RGN polypeptide is Cas9, CasX, CasY, Cpf1, C2c1, C2c2, C2c3, GeoCas9, CjCas9, Cas12a, Cas12b, Cas12g, Cas12h, Cas12i, Cas13b, Cas13c, Cas13d, Cas14, Csn2, xCas9, SpCas9-NG, LbCas12a, AsCas12a, Cas9-KKH, circularly permuted Cas9, Argonaute (Ago), SmacCas9, Spy-macCas9 domain, or an RGN having the amino acid sequence set forth in any one of SEQ ID NOs 41, 60, 366, or 368.
124. The system of claim 123, wherein the RGN polypeptide is a nicking enzyme.
125. The system of claim 124, wherein the nicking enzyme has an amino acid sequence that has at least 95% sequence identity to any one of SEQ ID NOs 42, 52-59, 61, 397, and 398.
126. A pharmaceutical composition comprising a pharmaceutically acceptable carrier and the fusion protein of any one of claims 101-106, the nucleic acid molecule of any one of claims 107-112, the vector of any one of claims 113-114, the RNP complex of claim 115, the cell of claim 116, or the system of any one of claims 117-125.
127. A method for modifying a target DNA molecule comprising a target sequence, comprising:
a) Assembling the RGN deaminase ribonucleotide complex under conditions suitable for forming the RGN deaminase ribonucleotide complex by combining:
i) One or more guide RNAs capable of hybridizing to a DNA sequence of interest; and
ii) a fusion protein comprising an RNA guided nuclease polypeptide (RGN) and at least one deaminase, wherein the deaminase has an amino acid sequence with at least 90% sequence identity to SEQ ID NO 407; and
b) Contacting the target DNA molecule or a cell comprising the target DNA molecule with an assembled RGN deaminase ribonucleotide complex;
wherein the one or more guide RNAs hybridize to the target DNA sequence, thereby directing the fusion protein to bind to the target DNA sequence, and modification of the target DNA molecule occurs.
128. The method of claim 127, wherein the target DNA sequence comprises a nucleic acid sequence selected from the group consisting of SEQ ID NOs 62-97, 116-139, 152-185, 203-234, 251-286, 305-344, 562 and 563 or a complement thereof.
129. The method of any one of claims 127-128, wherein the gRNA sequence comprises a nucleic acid sequence selected from the group consisting of SEQ ID NOs 98-115, 140-151, 186-202, 235-250, 287-304, 345-364, and 564.
130. The method of any one of claims 127-129, wherein the method is performed in vitro, in vivo, or ex vivo.
131. A method of treating a subject having or at risk of developing a disease, disorder or condition, the method comprising:
administering to the subject the fusion protein of any one of claims 101-106, the nucleic acid molecule of any one of claims 107-112, the vector of any one of claims 113-114, the RNP complex of claim 115, the cell of claim 116, the system of any one of claims 117-125, or the pharmaceutical composition of claim 126.
132. The method of claim 131, further comprising administering a gRNA comprising a nucleic acid sequence selected from the group consisting of SEQ ID NOs 98-115, 140-151, 186-202, 235-250, 287-304, 345-364, and 564.
133. An isolated polypeptide comprising an amino acid sequence having at least 90% sequence identity to SEQ ID No. 405, wherein said polypeptide has deaminase activity.
134. The isolated polypeptide of claim 133, wherein the polypeptide comprises the amino acid sequence set forth in SEQ ID No. 405.
135. A nucleic acid molecule comprising a polynucleotide encoding a deaminase polypeptide, wherein the deaminase is encoded by a nucleotide sequence that:
a) Has at least 80% sequence identity to SEQ ID NO 449, or
b) Encodes an amino acid sequence having at least 90% sequence identity to SEQ ID NO 405.
136. The nucleic acid molecule of claim 135, wherein the nucleic acid molecule further comprises a heterologous promoter operably linked to the polynucleotide.
137. A pharmaceutical composition comprising a pharmaceutically acceptable carrier and the polypeptide of any one of claims 133-134, or the nucleic acid molecule of any one of claims 135-136.
138. A fusion protein comprising a DNA binding polypeptide and a deaminase having at least 90% sequence identity with SEQ ID No. 405.
139. The fusion protein of claim 138, wherein the DNA-binding polypeptide is an RNA-guided nuclease (RGN) polypeptide.
140. The fusion protein of claim 139, wherein the RGN polypeptide is a type II CRISPR-Cas polypeptide or a type V CRISPR-Cas polypeptide.
141. The fusion protein of any one of claims 138-140, wherein the RGN polypeptide is Cas9, CasX, CasY, Cpf1, C2c1, C2c2, C2c3, GeoCas9, CjCas9, Cas12a, Cas12b, Cas12g, Cas12h, Cas12i, Cas13b, Cas13c, Cas13d, Cas14, Csn2, xCas9, SpCas9-NG, LbCas12a, AsCas12a, Cas9-KKH, circularly permuted Cas9, Argonaute (Ago), SmacCas9, Spy-macCas9 domain, or an RGN polypeptide having the amino acid sequence set forth in any one of SEQ ID NOs 41, 60, 366, or 368.
142. The fusion protein of any one of claims 139-141, wherein the RGN polypeptide is a nicking enzyme.
143. The fusion protein of claim 142, wherein the nicking enzyme has an amino acid sequence having at least 95% sequence identity to any one of SEQ ID NOs 42, 52-59, 61, 397 and 398.
144. A nucleic acid molecule comprising a polynucleotide encoding a fusion protein comprising a DNA-binding polypeptide and a deaminase, wherein the deaminase is encoded by a nucleotide sequence that:
a) Has at least 80% sequence identity to SEQ ID NO 449, or
b) Encodes an amino acid sequence having at least 90% sequence identity to SEQ ID NO. 405.
145. The nucleic acid molecule of claim 144, wherein the DNA-binding polypeptide is an RGN polypeptide.
146. The nucleic acid molecule of claim 145, wherein the RGN is a type II CRISPR-Cas polypeptide or a type V CRISPR-Cas polypeptide.
147. The nucleic acid molecule of any one of claims 144-146, wherein the RGN polypeptide is Cas9, CasX, CasY, Cpf1, C2c1, C2c2, C2c3, GeoCas9, CjCas9, Cas12a, Cas12b, Cas12g, Cas12h, Cas12i, Cas13b, Cas13c, Cas13d, Cas14, Csn2, xCas9, SpCas9-NG, LbCas12a, AsCas12a, Cas9-KKH, circularly permuted Cas9, Argonaute (Ago), SmacCas9, Spy-macCas9 domain, or an RGN polypeptide having the amino acid sequence set forth in any one of SEQ ID NOs 41, 60, 366, or 368.
148. The nucleic acid molecule of any one of claims 145-147, wherein the RGN polypeptide is a nicking enzyme.
149. The nucleic acid molecule of claim 148, wherein the nicking enzyme has an amino acid sequence that has at least 95% sequence identity to any one of SEQ ID NOs 42, 52-59, 61, 397 and 398.
150. A vector comprising the nucleic acid molecule of any one of claims 144-149.
151. The vector of claim 150, further comprising at least one nucleotide sequence encoding a guide RNA (gRNA) capable of hybridizing to a target sequence.
152. A ribonucleoprotein (RNP) complex comprising the fusion protein of any one of claims 138-141 and a guide RNA bound to the DNA-binding polypeptide of the fusion protein.
153. A cell comprising the fusion protein of any one of claims 138-143, the nucleic acid molecule of any one of claims 144-149, the vector of any one of claims 150-151, or the RNP complex of claim 152.
154. A system for modifying a target DNA molecule comprising a target DNA sequence, the system comprising:
a) A fusion protein comprising an RNA-guided nuclease (RGN) polypeptide and a deaminase, or a nucleotide sequence encoding said fusion protein, wherein the deaminase has an amino acid sequence with at least 90% sequence identity to SEQ ID No. 405; and
b) One or more guide RNAs (gRNAs) capable of hybridizing to the target DNA sequence, or one or more nucleotide sequences encoding the one or more gRNAs; and
wherein the one or more guide RNAs are capable of forming a complex with the fusion protein to direct the fusion protein to bind to the target DNA sequence and modify the target DNA molecule.
155. The system of claim 154, wherein at least one of the nucleotide sequence encoding one or more guide RNAs and the nucleotide sequence encoding a fusion protein is operably linked to a promoter heterologous to the nucleotide sequence.
156. The system of any one of claims 154-155, wherein the target DNA sequence is positioned adjacent to a protospacer adjacent motif (PAM) that is recognized by the RGN polypeptide.
157. The system of any one of claims 154-156, wherein the target DNA sequence comprises a nucleic acid sequence selected from the group consisting of SEQ ID NOs 62-97, 116-139, 152-185, 203-234, 251-286, 305-344, 562, and 563, or a complement thereof.
158. The system of any one of claims 154-157, wherein the gRNA sequence comprises a nucleic acid sequence selected from the group consisting of SEQ ID NOs 98-115, 140-151, 186-202, 235-250, 287-304, 345-364, and 564.
159. The system of any of claims 154-158, wherein the RGN polypeptide of the fusion protein is a type II CRISPR-Cas polypeptide or a type V CRISPR-Cas polypeptide.
160. The system of any one of claims 154-159, wherein the RGN polypeptide is Cas9, CasX, CasY, Cpf1, C2c1, C2c2, C2c3, GeoCas9, CjCas9, Cas12a, Cas12b, Cas12g, Cas12h, Cas12i, Cas13b, Cas13c, Cas13d, Cas14, Csn2, xCas9, SpCas9-NG, LbCas12a, AsCas12a, Cas9-KKH, circularly permuted Cas9, Argonaute (Ago), SmacCas9, Spy-macCas9 domain, or an RGN having the amino acid sequence set forth in any one of SEQ ID NOs 41, 60, 366, or 368.
161. The system of claim 160, wherein the RGN polypeptide is a nicking enzyme.
162. The system of claim 161, wherein the nicking enzyme has an amino acid sequence that has at least 95% sequence identity to any one of SEQ ID NOs 42, 52-59, 61, 397, and 398.
163. A pharmaceutical composition comprising a pharmaceutically acceptable carrier and the fusion protein of any one of claims 138-143, the nucleic acid molecule of any one of claims 144-149, the vector of any one of claims 150-151, the RNP complex of claim 152, the cell of claim 153, or the system of any one of claims 154-162.
164. A method for modifying a target DNA molecule comprising a target sequence, comprising:
a) Assembling the RGN deaminase ribonucleotide complex under conditions suitable for forming the RGN deaminase ribonucleotide complex by combining:
i) One or more guide RNAs capable of hybridizing to the target DNA sequence; and
ii) a fusion protein comprising an RNA-guided nuclease polypeptide (RGN) and at least one deaminase, wherein the deaminase has an amino acid sequence with at least 90% sequence identity to SEQ ID NO. 405; and
b) Contacting the target DNA molecule or a cell comprising the target DNA molecule with an assembled RGN deaminase ribonucleotide complex;
wherein the one or more guide RNAs hybridize to the target DNA sequence, thereby directing the fusion protein to bind to the target DNA sequence such that modification of the target DNA molecule occurs.
165. The method of claim 164, wherein the target DNA sequence comprises a nucleic acid sequence selected from the group consisting of SEQ ID NOs 62-97, 116-139, 152-185, 203-234, 251-286, 305-344, 562 and 563 or a complement thereof.
166. The method of any one of claims 164-165, wherein the gRNA sequence comprises a nucleic acid sequence selected from the group consisting of SEQ ID NOs 98-115, 140-151, 186-202, 235-250, 287-304, 345-364, and 564.
167. The method of any one of claims 164-166, wherein the method is performed in vitro, in vivo, or ex vivo.
168. A method of treating a subject having or at risk of developing a disease, disorder or condition, the method comprising:
administering to the subject the fusion protein of any one of claims 138-143, the nucleic acid molecule of any one of claims 144-149, the vector of any one of claims 150-151, the RNP complex of claim 152, the cell of claim 153, the system of any one of claims 154-162, or the pharmaceutical composition of claim 163.
169. The method of claim 168, further comprising administering any one of the gRNAs comprising a nucleic acid sequence selected from the group consisting of SEQ ID NOs 98-115, 140-151, 186-202, 235-250, 287-304, 345-364, and 564.
170. An isolated polypeptide comprising an amino acid sequence having at least 90% sequence identity to SEQ ID No. 399, wherein said polypeptide has deaminase activity.
171. The isolated polypeptide of claim 170, wherein the polypeptide comprises the amino acid sequence set forth in SEQ ID No. 399.
172. A nucleic acid molecule comprising a polynucleotide encoding a deaminase polypeptide, wherein the deaminase is encoded by a nucleotide sequence that:
a) Has at least 80% sequence identity to SEQ ID NO 443, or
b) Encodes an amino acid sequence that has at least 90% sequence identity to SEQ ID NO 399.
173. The nucleic acid molecule of claim 172, wherein the nucleic acid molecule further comprises a heterologous promoter operably linked to the polynucleotide.
174. A pharmaceutical composition comprising a pharmaceutically acceptable carrier and the polypeptide of any one of claims 170-171 or the nucleic acid molecule of any one of claims 172-173.
175. A fusion protein comprising a DNA binding polypeptide and a deaminase having at least 90% sequence identity to SEQ ID No. 399.
176. The fusion protein of claim 175, wherein the DNA-binding polypeptide is an RNA-guided nuclease (RGN) polypeptide.
177. The fusion protein of claim 176, wherein the RGN polypeptide is a type II CRISPR-Cas polypeptide or a type V CRISPR-Cas polypeptide.
178. The fusion protein of any one of claims 175-177, wherein the RGN polypeptide is Cas9, CasX, CasY, Cpf1, C2c1, C2c2, C2c3, GeoCas9, CjCas9, Cas12a, Cas12b, Cas12g, Cas12h, Cas12i, Cas13b, Cas13c, Cas13d, Cas14, Csn2, xCas9, SpCas9-NG, LbCas12a, AsCas12a, Cas9-KKH, circularly permuted Cas9, Argonaute (Ago), SmacCas9, Spy-macCas9 domain, or an RGN polypeptide having the amino acid sequence set forth in any one of SEQ ID NOs 41, 60, 366, or 368.
179. The fusion protein of any one of claims 176-178, wherein the RGN polypeptide is a nicking enzyme.
180. The fusion protein of claim 179, wherein the nicking enzyme has an amino acid sequence that has at least 95% sequence identity to any one of SEQ ID NOs 42, 52-59, 61, 397 and 398.
181. A nucleic acid molecule comprising a polynucleotide encoding a fusion protein comprising a DNA-binding polypeptide and a deaminase, wherein the deaminase is encoded by a nucleotide sequence that:
a) Has at least 80% sequence identity to SEQ ID NO 443, or
b) Encodes an amino acid sequence that has at least 90% sequence identity to SEQ ID NO 399.
182. The nucleic acid molecule of claim 181, wherein the DNA-binding polypeptide is an RGN polypeptide.
183. The nucleic acid molecule of claim 182, wherein the RGN is a type II CRISPR-Cas polypeptide or a type V CRISPR-Cas polypeptide.
184. The nucleic acid molecule of any one of claims 181-183, wherein the RGN polypeptide is Cas9, CasX, CasY, Cpf1, C2c1, C2c2, C2c3, GeoCas9, CjCas9, Cas12a, Cas12b, Cas12g, Cas12h, Cas12i, Cas13b, Cas13c, Cas13d, Cas14, Csn2, xCas9, SpCas9-NG, LbCas12a, AsCas12a, Cas9-KKH, circularly permuted Cas9, Argonaute (Ago), SmacCas9, Spy-macCas9 domain, or an RGN polypeptide having the amino acid sequence set forth in any one of SEQ ID NOs 41, 60, 366, or 368.
185. The nucleic acid molecule of any one of claims 182-184, wherein the RGN polypeptide is a nicking enzyme.
186. The nucleic acid molecule of claim 185, wherein the nicking enzyme has an amino acid sequence that has at least 95% sequence identity to any one of SEQ ID NOs 42, 52-59, 61, 397 and 398.
187. A vector comprising the nucleic acid molecule of any one of claims 181-186.
188. The vector of claim 187, further comprising at least one nucleotide sequence encoding a guide RNA (gRNA) capable of hybridizing to a target sequence.
189. A ribonucleoprotein (RNP) complex comprising the fusion protein of any one of claims 175-180 and a guide RNA bound to the DNA-binding polypeptide of the fusion protein.
190. A cell comprising the fusion protein of any one of claims 175-180, the nucleic acid molecule of any one of claims 181-186, the vector of any one of claims 187-188, or the RNP complex of claim 189.
191. A system for modifying a target DNA molecule comprising a target DNA sequence, the system comprising:
a) A fusion protein comprising an RNA-guided nuclease (RGN) polypeptide and a deaminase, or a nucleotide sequence encoding said fusion protein, wherein the deaminase has an amino acid sequence with at least 90% sequence identity to SEQ ID No. 399; and
b) One or more guide RNAs (gRNAs) capable of hybridizing to the target DNA sequence, or one or more nucleotide sequences encoding the one or more gRNAs;
wherein the one or more guide RNAs are capable of forming a complex with the fusion protein so as to direct the fusion protein to bind to the target DNA sequence and modify the target DNA molecule.
192. The system of claim 191, wherein at least one of the nucleotide sequence encoding one or more guide RNAs and the nucleotide sequence encoding a fusion protein is operably linked to a promoter heterologous to the nucleotide sequence.
193. The system of any one of claims 191-192, wherein the target DNA sequence is positioned adjacent to a protospacer adjacent motif (PAM) that is recognized by the RGN polypeptide.
194. The system of any one of claims 191-193, wherein the target DNA sequence comprises a nucleic acid sequence selected from the group consisting of SEQ ID NOs 62-97, 116-139, 152-185, 203-234, 251-286, 305-344, 562, and 563, or a complement thereof.
195. The system of any one of claims 191-194, wherein the gRNA sequence comprises a nucleic acid sequence selected from the group consisting of SEQ ID NOs 98-115, 140-151, 186-202, 235-250, 287-304, 345-364, and 564.
196. The system of any one of claims 191-195, wherein the RGN polypeptide of the fusion protein is a type II CRISPR-Cas polypeptide or a type V CRISPR-Cas polypeptide.
197. The system of any one of claims 191-196, wherein the RGN polypeptide is Cas9, CasX, CasY, Cpf1, C2c1, C2c2, C2c3, GeoCas9, CjCas9, Cas12a, Cas12b, Cas12g, Cas12h, Cas12i, Cas13b, Cas13c, Cas13d, Cas14, Csn2, xCas9, SpCas9-NG, LbCas12a, AsCas12a, Cas9-KKH, circularly permuted Cas9, Argonaute (Ago), SmacCas9, Spy-macCas9 domain, or an RGN having the amino acid sequence set forth in any one of SEQ ID NOs 41, 60, 366, or 368.
198. The system of claim 197, wherein the RGN polypeptide is a nicking enzyme.
199. The system of claim 198, wherein the nicking enzyme has an amino acid sequence that has at least 95% sequence identity to any one of SEQ ID NOs 42, 52-59, 61, 397, and 398.
200. A pharmaceutical composition comprising a pharmaceutically acceptable carrier and the fusion protein of any one of claims 175-180, the nucleic acid molecule of any one of claims 181-186, the vector of any one of claims 187-188, the RNP complex of claim 189, the cell of claim 190, or the system of any one of claims 191-199.
201. A method for modifying a target DNA molecule comprising a target sequence, comprising:
a) Assembling the RGN deaminase ribonucleotide complex under conditions suitable for forming the RGN deaminase ribonucleotide complex by combining:
i) One or more guide RNAs capable of hybridizing to the target DNA sequence; and
ii) a fusion protein comprising an RNA-guided nuclease polypeptide (RGN) and at least one deaminase, wherein the deaminase has an amino acid sequence with at least 90% sequence identity to SEQ ID No. 399; and
b) Contacting the target DNA molecule or a cell comprising the target DNA molecule with an assembled RGN deaminase ribonucleotide complex;
wherein the one or more guide RNAs hybridize to the target DNA sequence, thereby directing the fusion protein to bind to the target DNA sequence such that modification of the target DNA molecule occurs.
202. The method of claim 201, wherein the target DNA sequence comprises a nucleic acid sequence selected from the group consisting of SEQ ID NOs 62-97, 116-139, 152-185, 203-234, 251-286, 305-344, 562 and 563 or a complement thereof.
203. The method of any one of claims 201-202, wherein the gRNA sequence comprises a nucleic acid sequence selected from the group consisting of SEQ ID NOs 98-115, 140-151, 186-202, 235-250, 287-304, 345-364, and 564.
204. The method of any one of claims 201-203, wherein the method is performed in vitro, in vivo, or ex vivo.
205. A method of treating a subject having or at risk of developing a disease, disorder or condition, the method comprising:
administering to the subject the fusion protein of any one of claims 175-180, the nucleic acid molecule of any one of claims 181-186, the vector of any one of claims 187-188, the RNP complex of claim 189, the cell of claim 190, the system of any one of claims 191-199, or the pharmaceutical composition of claim 200.
206. The method of claim 205, further comprising administering any one of the gRNAs comprising a nucleic acid sequence selected from the group consisting of SEQ ID NOs 98-115, 140-151, 186-202, 235-250, 287-304, 345-364, and 564.
207. A method for treating or reducing at least one symptom of cystic fibrosis, the method comprising administering to a subject in need thereof an effective amount of:
a) A fusion protein comprising an RNA-guided nuclease polypeptide (RGN) and a deaminase, or a polynucleotide encoding said fusion protein, wherein the deaminase has a sequence identical to any one of SEQ ID NOs 407, 405, 399, 1-10, 400-404, 406, and 408-441, and wherein said polynucleotide encoding the fusion protein is operably linked to a promoter such that the fusion protein is expressed in the cell; and
b) One or more guide RNAs (gRNAs) capable of hybridizing to a DNA sequence of interest, or polynucleotides encoding the gRNAs, wherein the polynucleotides encoding the gRNAs are operably linked to a promoter such that the gRNAs are expressed in the cell;
whereby the fusion protein and the gRNA target the genomic position of the causal mutation and modify the genomic sequence to remove the causal mutation.
208. The method of claim 207, wherein the gRNA comprises a spacer sequence or a complement thereof that targets any one of SEQ ID NOs 62-97, 116-139, 152-185, 203-234, 251-286, 305-344, 562, and 563.
209. The method of claim 207 or 208, wherein the gRNA comprises any one of SEQ ID NOs 98-115, 140-151, 186-202, 235-250, 287-304, 345-364, and 564.
210. The method of any one of claims 207-209, wherein the RGN has an amino acid sequence with at least 90% sequence identity to any one of SEQ ID NOs 41, 60, 366 and 368.
211. The method of any one of claims 207-209 wherein the RGN has an amino acid sequence with at least 90% sequence identity to any one of SEQ ID NOs 42, 52-59, 61, 397 and 398.
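Many of the claims above are bounded by a percent sequence identity threshold (for example, "at least 90% sequence identity" to a reference SEQ ID NO). As a rough illustration only, the following Python sketch computes percent identity from a simple Needleman-Wunsch global alignment and checks it against a threshold. The scoring values (match +1, mismatch -1, gap -1), the choice of alignment length as the denominator, the function names, and the short peptide strings are all assumptions made for this example; they are not taken from the claims or from any SEQ ID NO, and the patent specification may define sequence identity differently.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1]); out_b.append(b[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions divided by alignment length, times 100."""
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    al_a, al_b = global_align(a, b)
    matches = sum(1 for x, y in zip(al_a, al_b) if x == y and x != "-")
    return 100.0 * matches / len(al_a)


def meets_threshold(candidate: str, reference: str, threshold: float = 90.0) -> bool:
    """True if the candidate reaches the identity threshold against the reference."""
    return percent_identity(candidate, reference) >= threshold


# Hypothetical 10-residue peptides, for illustration only.
reference = "ACDEFGHIKL"
variant = "ACDEFGHIKV"  # one substitution relative to the reference
print(percent_identity(reference, variant), meets_threshold(reference, variant))
```

In practice, established alignment tools with substitution matrices and affine gap penalties would normally be used for amino acid sequences; the linear scoring here is kept deliberately simple so the identity calculation itself is easy to follow.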
CN202180075570.5A 2020-09-11 2021-09-10 DNA modifying enzyme, active fragment and variant thereof and using method Pending CN116635524A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US63/077,089 2020-09-11
US202163146840P 2021-02-08 2021-02-08
US63/146,840 2021-02-08
PCT/US2021/049853 WO2022056254A2 (en) 2020-09-11 2021-09-10 Dna modifying enzymes and active fragments and variants thereof and methods of use

Publications (1)

Publication Number Publication Date
CN116635524A CN116635524A (en) 2023-08-22

Family

ID=87590679

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180075570.5A Pending CN116635524A (en) 2020-09-11 2021-09-10 DNA modifying enzyme, active fragment and variant thereof and using method

Country Status (1)

Country Link
CN (1) CN116635524A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination