WO2021081384A1 - Synthetic nucleases - Google Patents

Publication number
WO2021081384A1
Authority
WO
WIPO (PCT)
Application number
PCT/US2020/057141
Inventor
Shiv B. Tiwari
Arianne Tremblay
Original Assignee
Greenvenus, Llc

Classifications

    • C12N15/102 Mutagenizing nucleic acids
    • C12N15/8213 Targeted insertion of genes into the plant genome by homologous recombination
    • C12N15/907 Stable introduction of foreign DNA into chromosome using homologous recombination in mammalian cells
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C12Y301/00 Hydrolases acting on ester bonds (3.1)
    • C07K2319/09 Fusion polypeptide containing a localisation/targetting motif containing a nuclear localisation signal
    • C07K2319/42 Fusion polypeptide containing a tag for immunodetection, or an epitope for immunisation containing a HA (hemagglutinin)-tag
    • C07K2319/81 Fusion polypeptide containing a DNA binding domain, e.g. LacI or Tet-repressor containing a Zn-finger domain for DNA binding
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]

Definitions

  • CRISPR Technology was a big step forward in gene editing capability.
  • CRISPR (clustered regularly interspaced short palindromic repeats) refers to the system by which an enzyme (Cas) is able to target and modify a genetic sequence in the DNA of interest. Since its discovery in 2012, numerous biotechnology companies have introduced CRISPR technology into their research platform. Some companies have emerged that specialize in improving CRISPR technology and providing services.
  • Zinc-finger nucleases, meganucleases and TALENs entered the market as tools for genome modification. Following this sequential progress in genome editing and the prior art around it, several companies have entered the genome editing market with their own nucleases.
  • The CRISPR/Cas system is still far from perfect. It has low editing efficiency (often <1%) and it produces many off-target effects, which has raised concerns over its use in health industries.
  • many genome editing companies are now racing to improve the system and attempting to build novel genome editing tools which have reduced off-target effects and increased editing efficiency.
  • the invention provides a polynucleotide encoding a synthetic nuclease comprising an amino acid sequence of SEQ ID NO:28 or SEQ ID NO:31.
  • the polynucleotide comprises a nucleic acid sequence of SEQ ID NO:27 or SEQ ID NO:30.
  • the polynucleotide comprises a nucleic acid sequence of SEQ ID NO:30.
  • the polynucleotide further comprises a nucleic acid encoding at least one nuclear localization sequence (NLS), such as but not limited to an NLS comprising the nucleic acid sequence of SEQ ID NO:7 or SEQ ID NO:8.
  • the polynucleotide of the invention also comprises a nucleic acid encoding a tag polypeptide, such as, but not limited to, one that encodes the amino acid sequence of SEQ ID NO:9.
  • the polynucleotide encodes a polypeptide having the amino acid sequence of SEQ ID NO:2 or SEQ ID NO:4.
  • the polynucleotide comprises the nucleic acid sequence of SEQ ID NO: 1 or SEQ ID NO:3.
  • the invention provides a synthetic DNA nuclease comprising a DNA binding domain of Cpf1 and a nuclease domain of MAD7, or a DNA binding domain of MAD7 and a nuclease domain of Cpf1.
  • the synthetic DNA nuclease comprises a DNA binding domain of Cpf1.
  • the DNA binding domain comprises the amino acid sequence of SEQ ID NO:31.
  • the nuclease domain of Cpf1 comprises the amino acid sequence of SEQ ID NO:34.
  • the DNA binding domain of MAD7 comprises the amino acid sequence of SEQ ID NO:33.
  • the nuclease domain of MAD7 comprises the amino acid sequence of SEQ ID NO:32.
  • the nuclease comprises the amino acid sequence of SEQ ID NO:28.
  • the nuclease comprises the amino acid sequence of SEQ ID NO:31.
  • the nuclease comprises the amino acid sequence of SEQ ID NO:2. In other specific embodiments, the nuclease comprises the amino acid sequence of SEQ ID NO: 4.
  • the invention also provides a method of modifying a target locus of interest comprising delivering to the locus a non-naturally occurring composition comprising a synthetic effector protein and one or more nucleic acid components, wherein at least the one or more nucleic acid components is engineered and the effector protein forms a complex with the one or more nucleic acid components and upon binding of the said complex to the target locus of interest, the effector protein induces a modification of the target locus of interest, wherein the synthetic effector protein comprises a DNA binding domain of MAD7 or Cpf1 and a nuclease domain of a heterologous nuclease.
  • the effector protein comprises a DNA binding domain of MAD7 operatively linked to a nuclease domain of Cpf1. In some embodiments, the effector protein comprises a DNA binding domain of Cpf1 operatively linked to a nuclease domain of MAD7. In certain embodiments, the effector protein comprises a MAD7 DNA binding domain comprising the amino acid sequence of SEQ ID NO:34. In certain embodiments, the effector protein comprises a Cpf1 DNA binding domain comprising the amino acid sequence of SEQ ID NO:32.
  • the DNA binding domain further comprises at least one NLS.
  • the NLS may have the nucleic acid sequence of, for example, one or more of SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:36, SEQ ID NO:37, SEQ ID NO:38, or SEQ ID NO:39.
  • the DNA binding domain comprises the amino acid sequence of SEQ ID NO:41. In other embodiments, the DNA binding domain comprises the amino acid sequence of SEQ ID NO:43.
  • the method of the invention may also be practiced with a DNA binding domain that further comprises a molecular tag.
  • the molecular tag is an HA-tag, for example comprising the sequence of SEQ ID NO:53.
  • the method uses a DNA binding domain that comprises the amino acid sequence of SEQ ID NO:47 or SEQ ID NO:51.
  • the nuclease domain comprises an amino acid sequence of SEQ ID NO:32.
  • the nuclease domain comprises an amino acid sequence of SEQ ID NO:33.
  • the nuclease domain further comprises at least one NLS, such as, but not limited to an NLS with the nucleic acid sequence of one or more of SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:36, SEQ ID NO:37, SEQ ID NO:38, or SEQ ID NO:39.
  • the nuclease domain comprises the amino acid sequence of SEQ ID NO:45. In other embodiments, the nuclease domain comprises the amino acid sequence of SEQ ID NO:49.
  • the nuclease domain may further comprise a molecular tag.
  • the molecular tag is an HA-tag, such as, but not limited to an HA-tag comprising the sequence of SEQ ID NO:53.
  • the nuclease domain comprises the amino acid sequence of SEQ ID NO:47.
  • the nuclease domain comprises the amino acid sequence of SEQ ID NO:51.
  • the target locus of interest comprises DNA.
  • the modification of the target locus of interest may comprise a strand break, which may be a single strand break or a double strand break.
  • the target locus of interest comprises a DNA molecule in vitro.
  • the target locus of interest such as a genomic locus, comprises a DNA molecule within a cell, such as, for example, a prokaryotic cell, a eukaryotic cell, or a plant cell.
  • the nucleic acid component(s) may comprise a putative CRISPR RNA (crRNA) sequence and not any putative trans-activating crRNA (tracrRNA) sequences.
  • the effector protein and nucleic acid component(s) are provided via one or more polynucleotide molecules encoding the polypeptides and/or the nucleic acid component(s), and wherein the one or more polynucleotide molecules are operably configured to express the polypeptides and/or the nucleic acid component(s).
  • the one or more polynucleotide molecules comprise one or more regulatory elements operably configured to express the polypeptides and/or the nucleic acid component(s), optionally wherein the one or more regulatory elements comprise inducible promoters.
  • the one or more polynucleotide molecules are comprised within one or more vectors.
  • the polynucleotide may be delivered to the cell using liposome(s), particle(s), exosome(s), microvesicle(s), a gene-gun, a ribonucleoprotein complex, one or more viral vectors or by a serine recombinase delivery method.
  • the invention also provides a method of editing the genome of an organism comprising introducing into a cell of the organism a polynucleotide encoding a serine recombinase operably linked to a promoter that is active in the cell and an exogenous polynucleotide comprising a putative CRISPR RNA (crRNA) sequence and an attP or attB site.
  • the exogenous polynucleotide further comprises an excision cassette comprising an excision gRNA operably linked to an inducible promoter or gene switch and a termination sequence.
  • the invention also provides a method for gene editing in a cell comprising: a. transforming a host cell with a gene expression system comprising: i. a first constitutive promoter operably linked to at least one first polynucleotide encoding at least one gene editing guide RNA (gRNA) to target a gene within a host cell; ii. a second constitutive promoter operably linked to a second polynucleotide encoding a synthetic nuclease of the invention; iii. a third polynucleotide encoding an excision gRNA operably linked to a fourth polynucleotide comprising an inducible gene regulator, wherein the inducible gene regulator is responsive to an activator that activates transcription of the excision gRNA; and iv. PAM sequences flanking the expression system that bind the excision gRNA; wherein the gene expression system comprises at least one att site at one end of the gene expression system; and wherein the excision gRNA is capable of excising the gene expression system by acting with the synthetic nuclease on the PAM sequences; b. allowing the gene editing gRNA and synthetic nuclease to edit at least one target gene in said host cell; c. contacting the host cell with an activator that induces said inducible gene regulator to express the excision gRNA, wherein the excision gRNA acts on the PAM sequences to excise the expression system from the host cell.
  • the activator may be a chemical ligand or an environmental stimulus.
  • the host cell may be a mammalian cell, a plant cell, a non-mammalian vertebrate cell or an invertebrate cell.
  • the gene expression system may be introduced into the cell by expressing a serine recombinase that acts on the att site on the gene expression system and a pseudosite in the host cell genome to insert the gene expression system.
  • the att site may be an attP site or an attB site.
  • the serine recombinase may be a Mycobacterium avium phage Bxb1 serine recombinase, a Streptococcus pyogenes phage 370.1 serine recombinase, a Bacillus subtilis phage SPβc2 serine recombinase, a Listeria monocytogenes phage A118 serine recombinase, or a Streptomyces phage φC31 serine recombinase.
  • the serine recombinase is a Bacillus subtilis phage SPβc2 serine recombinase.
  • the invention also provides a method of gene editing in a cell comprising: a. transforming a host cell with a gene expression system comprising: i. a first constitutive promoter operably linked to at least one first polynucleotide encoding at least one gene editing guide RNA (gRNA) to target a gene within a host cell; ii. a second constitutive promoter operably linked to a second polynucleotide encoding a synthetic nuclease provided herein; iii. at least one att site recognized by a serine recombinase at one end of the gene expression system; iv. a third polynucleotide encoding a recombinase directionality factor (RDF) operably linked to a fourth polynucleotide comprising an inducible gene regulator, wherein said inducible gene regulator is responsive to an activator that activates transcription of the RDF; wherein said transformation of said cell is accomplished by co-introducing said gene expression system and a serine recombinase that recognizes said att site on the gene expression vector and a pseudosite in the genome of the host cell; wherein the RDF is capable of excising the gene expression system by acting with a cognate serine recombinase; b.
  • the activator may be a chemical ligand or an environmental stimulus.
  • the host cell may be a mammalian cell, a plant cell, a non-mammalian vertebrate cell or an invertebrate cell.
  • the att site may be an attP site or an attB site.
  • the serine recombinase may be a Mycobacterium avium phage Bxb1 serine recombinase, a Streptococcus pyogenes phage 370.1 serine recombinase, a Bacillus subtilis phage SPβc2 serine recombinase, a Listeria monocytogenes phage A118 serine recombinase, or a Streptomyces phage φC31 serine recombinase. In specific embodiments, it is a Bacillus subtilis phage SPβc2 serine recombinase.
  • the RDF may be a fusion protein of RDF and its cognate serine recombinase.
  • the serine recombinase is an SPβc2 serine recombinase and the RDF comprises the amino acid sequence of SEQ ID NO:53 or SEQ ID NO:54.
  • FIG. 1 shows a map of the vector ID525 containing the SynNuc1 synthetic nuclease showing the hAsCpf1 DNA binding domain fused to the MAD7 nuclease domain, an NLS on the amino terminal portion of the fusion, an NLS on the carboxy terminal portion of the fusion, and an HA-tag at the carboxy terminal portion of the fusion. Abbreviations: Amp, ampicillin resistance marker; 35S pro, 35S promoter sequence; crRNA, CRISPR RNA sequence; AtU6 pro, Arabidopsis thaliana small nuclear RNA (U6-26 snRNA) promoter sequence; DR, direct repeat; LsPDS-G2, gRNA; rep origin, replication origin for plasmid.
  • FIG. 2 shows a map of the vector ID526 containing the SynNuc2 synthetic nuclease showing the MAD7 DNA binding domain fused to the Cpf1 nuclease domain, an NLS on the amino terminal portion of the fusion, an NLS on the carboxy terminal portion of the fusion, and an HA-tag at the carboxy terminal portion of the fusion. Abbreviations: Amp, ampicillin resistance marker; 35S pro, 35S promoter sequence; crRNA, CRISPR RNA sequence; AtU6 pro, Arabidopsis thaliana small nuclear RNA (U6-26 snRNA) promoter sequence; DR, direct repeat; LsPDS-G2, gRNA; rep origin, replication origin for plasmid.
  • FIG. 3 shows a map of the vector ID524 containing the AsCpf1 wild type nuclease showing the native hAsCpf1 nuclease, an NLS on the amino terminal portion of the fusion, an NLS on the carboxy terminal portion of the fusion, and an HA-tag at the carboxy terminal portion of the fusion. Abbreviations: Amp, ampicillin resistance marker; 35S pro, 35S promoter sequence; crRNA, CRISPR RNA sequence; AtU6 pro, Arabidopsis thaliana small nuclear RNA (U6-26 snRNA) promoter sequence; DR, direct repeat; LsPDS-G2, gRNA; rep origin, replication origin for plasmid.
  • FIG. 4 shows a map of the vector ID536 containing the SynNuc1 synthetic nuclease.
  • FIG. 5 shows representative sequences of the PDS gene edited by the SynNuc1 synthetic nuclease.
  • the black arrow represents the gRNA sequence for the PDS gene and the top sequence is the wild type PDS sequence.
  • FIG. 6 shows the results of gene editing in a PDS gene. Nucleases used were those on vector ID414 and ID525 as shown in boxes for each.
  • FIG. 7 shows a schematic of a construct that may be used for constitutive gene editing with controllable self-removal of the construct from the chromosome. Shown is a gene expression system for genome editing and scarless excision of a transgene that had been integrated into the chromosome via a serine recombinase; Pseudo-ATTR, pseudo-ATTR site from serine recombinase-mediated integration; Pseudo-ATTL, pseudo-ATTL site from serine recombinase-mediated integration; Pro., promoter; NPTII, neomycin phosphotransferase; Ter, Terminator; Nuclease, gene editing endonuclease; gRNAs, gRNAs for constitutive expression and editing; Switch, controllable gene switch or inducible promoter; SR/RDF, serine recombinase and cognate Recombination Directionality Factor that directs excision of gene expression
  • FIG. 8A shows exemplary results of a gene editing experiment in the lettuce PDS gene by SynNuc1 and SynNuc2 as described in Table 3.
  • FIG. 8B shows exemplary results of a gene editing experiment in the lettuce PDS gene by MAD7 and two different constructs of Cpf1, as described in Table 3.
  • FIG. 9A shows exemplary results of a further gene editing experiment in the lettuce PDS gene by SynNuc1 and SynNuc2 as described in Table 3.
  • FIG. 9B shows exemplary results of a further gene editing experiment in the lettuce PDS gene by MAD7 and two different constructs of Cpf1, as described in Table 3.
  • FIG. 10A shows exemplary results of a gene editing experiment in the lettuce PPO-B, PPO-D, PPO-E, PPO-G, and PPO-S genes by SynNuc1 constructs containing either a CaMV 35S promoter, AtUBQ10 promoter, or potato St-LS1 IV2 intron, shown as Constructs 1-3 in Table 4.
  • FIG. 10B shows exemplary results of a further gene editing experiment in the lettuce PPO-B, PPO-D, PPO-E, PPO-G, and PPO-S genes by SynNuc1 constructs containing either a CaMV 35S promoter, AtUBQ10 promoter, or potato St-LS1 IV2 intron, shown as Constructs 1-3 in Table 4.
  • FIG. 10C shows exemplary results of a gene editing experiment in the lettuce PPO-B, PPO-D, PPO-E, PPO-G, and PPO-S genes by SynNuc1 constructs containing either an AtUBQ10 promoter and no intron or an AtUBQ10 promoter and a potato St-LS1 IV2 intron, shown as Constructs 3 and 4 in Table 4.
  • FIG. 11A shows exemplary results of a gene editing experiment in the lettuce PDS gene by Cpf1, with genomic DNA sequenced from calli developed from Cpf1-transfected protoplasts.
  • FIG. 11B shows the amino acid sequence encoded by the edited gene in FIG. 11A, which contains a three-amino acid deletion in exon 3.
  • FIG. 12 shows a summary of the gene editing events and efficiency based on the results in FIGS. 11A, 11B, and 13.
  • FIG. 13 shows exemplary results of a gene editing experiment in the lettuce PDS gene by SynNuc1, with genomic DNA sequenced from calli developed from SynNuc1-transfected protoplasts.
  • the invention relates to synthetic nucleases that may be used for gene editing.
  • the nucleases have DNA binding domains derived from one native nuclease and a nuclease domain derived from a different native nuclease. The two domains are fused together to form a synthetic or chimeric nuclease that has both DNA binding and nuclease functions that are active in plant, animal and bacterial cells.
  • All journal articles or other publications, patents and patent applications referred to herein are expressly incorporated by reference as if each individual journal article, publication, patent, or patent application was specifically and individually indicated to be incorporated by reference. In the event of a conflict between any disclosure in the present application, compared to a disclosure incorporated by reference, the disclosure in the present application controls.
  • substantially free means that a composition comprising “A” (where “A” is a single protein, DNA molecule, vector, recombinant host cell, etc.) is substantially free of “B” (where “B” comprises one or more contaminating proteins, DNA molecules, vectors, etc.) when at least about 75% by weight of the proteins, DNA, vectors (depending on the category of species to which A and B belong) in the composition is “A”.
  • Preferably, “A” comprises at least about 90% by weight of the A+B species in the composition, most preferably at least about 99% by weight. It is also preferred that a composition, which is substantially free of contamination, contain only a single molecular weight species having the activity or characteristic of the species of interest.
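The weight-fraction thresholds in the "substantially free" definition can be illustrated with a short calculation. The following is a minimal sketch, not part of the application: the function names and tier labels are hypothetical, and the "about 75%", "about 90%" and "about 99%" figures are treated as exact cut-offs purely for illustration.

```python
def weight_fraction_of_a(mass_a: float, mass_b: float) -> float:
    """Fraction by weight of species A among the A+B species."""
    if mass_a < 0 or mass_b < 0:
        raise ValueError("masses must be non-negative")
    total = mass_a + mass_b
    if total == 0:
        raise ValueError("composition contains no A or B")
    return mass_a / total


def purity_tier(mass_a: float, mass_b: float) -> str:
    """Classify a composition against the 75% / 90% / 99% figures.

    Assumption: "about" is ignored and each figure is an exact cut-off.
    The tier labels are illustrative only.
    """
    fraction = weight_fraction_of_a(mass_a, mass_b)
    if fraction >= 0.99:
        return "most preferred"
    if fraction >= 0.90:
        return "preferred"
    if fraction >= 0.75:
        return "substantially free"
    return "not substantially free"


if __name__ == "__main__":
    # 80 mg of A with 20 mg of contaminating B is 80% A by weight.
    print(purity_tier(80.0, 20.0))
```

Both masses must be expressed in the same unit, since only their ratio matters.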
  • isolated designates a biological material (e.g., nucleic acid or protein) that has been removed from its original environment (the environment in which it is naturally present).
  • a polynucleotide present in the natural state in a plant or an animal is not isolated.
  • the same polynucleotide is “isolated” if it is separated from the adjacent nucleic acids in which it is naturally present.
  • the term “purified” does not require the material to be present in a form exhibiting absolute purity, exclusive of the presence of other compounds. It is rather a relative definition.
  • a polynucleotide is in the “purified” state after purification of the starting material or of the natural material by at least one order of magnitude, preferably 2 or 3 and preferably 4 or 5 orders of magnitude.
  • nucleic acid or “polynucleotide” is a polymeric compound comprised of covalently linked subunits called nucleotides.
  • Nucleic acid includes polyribonucleic acid (RNA) and polydeoxyribonucleic acid (DNA), both of which may be single-stranded or double-stranded.
  • DNA includes but is not limited to cDNA, genomic DNA, plasmid DNA, synthetic DNA, and semi-synthetic DNA. DNA may be linear, circular, or supercoiled.
  • a “nucleic acid molecule” refers to the phosphate ester polymeric form of ribonucleosides (adenosine, guanosine, uridine or cytidine; “RNA molecules”) or deoxyribonucleosides (deoxyadenosine, deoxyguanosine, deoxythymidine, or deoxycytidine; “DNA molecules”), or any phosphoester analogs thereof, such as phosphorothioates and thioesters, in either single-stranded form, or a double-stranded helix. Double-stranded DNA-DNA, DNA-RNA and RNA-RNA helices are possible.
  • nucleic acid molecule refers only to the primary and secondary structure of the molecule, and does not limit it to any particular tertiary forms.
  • this term includes double-stranded DNA found, inter alia, in linear or circular DNA molecules (e.g., restriction fragments), plasmids, and chromosomes.
  • sequences may be described herein according to the normal convention of giving only the sequence in the 5' to 3' direction along the non-transcribed strand of DNA (i.e., the strand having a sequence homologous to the mRNA).
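The strand convention above can be made concrete with a small example. This is an illustrative sketch only: the function names and the six-base sequence are hypothetical and do not come from the sequence listing. It shows that the non-transcribed strand, written 5' to 3', reads the same as the mRNA except that T is replaced by U, and that the transcribed (template) strand is its reverse complement.

```python
# Complement table for unambiguous DNA bases only (an assumption of this sketch).
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def _validated(seq: str) -> str:
    """Upper-case the sequence and reject anything other than A, C, G, T."""
    seq = seq.upper()
    unexpected = set(seq) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return seq


def coding_strand_to_mrna(coding_5to3: str) -> str:
    """mRNA sequence (5' to 3') corresponding to the non-transcribed strand."""
    return _validated(coding_5to3).replace("T", "U")


def template_strand(coding_5to3: str) -> str:
    """Transcribed strand, written 5' to 3' (reverse complement)."""
    return _validated(coding_5to3).translate(_DNA_COMPLEMENT)[::-1]


if __name__ == "__main__":
    coding = "ATGGCC"  # hypothetical example sequence
    print(coding_strand_to_mrna(coding))  # AUGGCC
    print(template_strand(coding))        # GGCCAT
```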
  • a “recombinant DNA molecule” is a DNA molecule that has undergone a molecular biological manipulation.
  • fragment when referring to a polynucleotide will be understood to mean a nucleotide sequence of reduced length relative to the reference nucleic acid and comprising, over the common portion, a nucleotide sequence identical to the reference nucleic acid.
  • a nucleic acid fragment according to the invention may be, where appropriate, included in a larger polynucleotide of which it is a constituent.
  • Such fragments comprise, or alternatively consist of, oligonucleotides ranging in length from at least 8, 10, 12, 15, 18, 20 to 25, 30, 40, 50, 70, 80, 100, 200, 500, 1000 or 1500 consecutive nucleotides of a nucleic acid according to the invention.
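The fragment definition above amounts to a simple test: the candidate is shorter than the reference and matches it exactly over a run of consecutive nucleotides. A minimal sketch follows, assuming a lower bound of 8 nucleotides (the smallest length listed above); the function name, the `min_length` parameter and the example sequences are hypothetical.

```python
def is_fragment(candidate: str, reference: str, min_length: int = 8) -> bool:
    """Return True if `candidate` is a fragment of `reference`.

    A fragment here is (1) shorter than the reference, (2) at least
    `min_length` nucleotides long, and (3) identical to a run of
    consecutive nucleotides of the reference.
    """
    cand = candidate.upper()
    ref = reference.upper()
    if not (min_length <= len(cand) < len(ref)):
        return False
    return cand in ref  # exact match over consecutive positions


if __name__ == "__main__":
    reference = "ATGGCCATTGTAATGGGCCGC"  # hypothetical 21-nt reference
    print(is_fragment("GCCATTGTAA", reference))  # True: 10 consecutive nt
    print(is_fragment("ATGG", reference))        # False: shorter than 8 nt
```

The check covers one strand only; testing the reverse complement as well would be a separate design choice that the definition does not address.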
  • an “isolated nucleic acid fragment” is a polymer of RNA or DNA that is single- or double-stranded, optionally containing synthetic, non-natural or altered nucleotide bases.
  • An isolated nucleic acid fragment in the form of a polymer of DNA may be comprised of one or more segments of cDNA, genomic DNA or synthetic DNA.
  • a “gene” refers to an assembly of nucleotides that encode a polypeptide, and includes cDNA and genomic DNA nucleic acids. “Gene” also refers to a nucleic acid fragment that expresses a specific protein or polypeptide, optionally including regulatory sequences preceding (5' noncoding sequences) and following (3' non-coding sequences) the coding sequence. “Native gene” refers to a gene as found in nature with its own regulatory sequences. “Chimeric gene” refers to any gene that is not a native gene, comprising regulatory and/or coding sequences that are not found together in nature.
  • a chimeric gene may comprise regulatory sequences and coding sequences that are derived from different sources, or regulatory sequences and coding sequences derived from the same source, but arranged in a manner different than that found in nature.
  • a chimeric gene may comprise coding sequences derived from different sources and/or regulatory sequences derived from different sources.
  • Endogenous gene refers to a native gene in its natural location in the genome of an organism.
  • a “foreign” gene or “heterologous” gene refers to a gene not normally found in the host organism, but that is introduced into the host organism by gene transfer.
  • Foreign genes can comprise native genes inserted into a non-native organism, or chimeric genes.
  • a “transgene” is a gene that has been introduced into the genome by a transformation procedure.
  • Heterologous DNA refers to DNA not naturally located in the cell, or in a chromosomal site of the cell.
  • the heterologous DNA includes a gene or polynucleotides foreign to the cell.
  • Transformation refers to the transfer of a nucleic acid fragment into the genome of a host organism, resulting in genetically stable inheritance. Host organisms containing the transformed nucleic acid fragments are referred to as “transgenic” or “recombinant” or “transformed” organisms.
  • “Inducible gene regulator” means a genetic element that is capable of regulating gene expression of a gene that is operatively linked to it in a controllable manner.
  • inducible gene regulators include, but are not limited to gene switches and inducible promoters.
  • Promoter refers to a DNA sequence capable of controlling the expression of a coding sequence or functional RNA.
  • a coding sequence is located 3' to a promoter sequence. Promoters may be derived in their entirety from a native gene, or be composed of different elements derived from different promoters found in nature, or even comprise synthetic DNA segments. It is understood by those skilled in the art that different promoters may direct the expression of a gene in different tissues or cell types, or at different stages of development, or in response to different environmental or physiological conditions. Promoters that cause a gene to be expressed in most cell types at most times are commonly referred to as “constitutive promoters”.
  • Promoters that cause a gene to be expressed in a specific cell type are commonly referred to as “cell-specific promoters” or “tissue-specific promoters”. Promoters that cause a gene to be expressed at a specific stage of development or cell differentiation are commonly referred to as “developmentally-specific promoters” or “cell differentiation-specific promoters”. Promoters that are induced and cause a gene to be expressed following exposure or treatment of the cell with an agent, biological molecule, chemical, ligand, light, or the like that induces the promoter are commonly referred to as “inducible promoters” or “regulatable promoters”. It is further recognized that since in most cases the exact boundaries of regulatory sequences have not been completely defined, DNA fragments of different lengths may have identical promoter activity.
  • a “promoter sequence” is a DNA regulatory region capable of binding RNA polymerase in a cell and initiating transcription of a downstream (3' direction) coding sequence.
  • the promoter sequence is bounded at its 3' terminus by the transcription initiation site and extends upstream (5' direction) to include the minimum number of bases or elements necessary to initiate transcription at levels detectable above background.
  • a transcription initiation site (conveniently defined, for example, by mapping with nuclease S1), as well as protein binding domains (consensus sequences) responsible for the binding of RNA polymerase or transcription factors.
  • a coding sequence is “under the control” of transcriptional and translational control sequences in a cell when RNA polymerase transcribes the coding sequence into mRNA, which is then RNA spliced (if the coding sequence contains introns) and translated into the protein encoded by the coding sequence.
  • Transcriptional and translational control sequences are DNA regulatory sequences, such as promoters, enhancers, terminators, and the like, that provide for the expression of a coding sequence in a host cell.
  • polyadenylation signals are control sequences.
  • gene switch refers to the combination of a response element that activates transcription upon contact with a chemical ligand or environmental condition and is associated with a promoter, and a switch system (examples of which are described herein) which, in the presence of one or more ligands, modulates the expression of a gene into which the response element and promoter are incorporated.
  • EcR-based gene switch is a chimeric (i.e., three-part heterologous) polypeptide comprised of a transcriptional transactivator domain, a DNA-binding domain and an EcR (ecdysone receptor-derived) ligand binding domain.
  • the ligand binding domain may be split into a bipartite arrangement.
  • protein is a polypeptide that performs a structural or functional role in a living cell.
  • isolated polypeptide or “isolated protein” or “isolated peptide” is a polypeptide or protein that is substantially free of those compounds that are normally associated therewith in its natural state (e.g., other proteins or polypeptides, nucleic acids, carbohydrates, lipids). “Isolated” is not meant to exclude artificial or synthetic mixtures with other compounds, or the presence of impurities which do not interfere with biological activity, and which may be present, for example, due to incomplete purification, addition of stabilizers, or compounding into a pharmaceutically acceptable preparation.
  • reference sequence means a nucleic acid or amino acid used as a comparator for another nucleic acid or amino acid, respectively, when determining sequence identity.
  • percent identity refers to the exactness of a match between a reference sequence and a sequence being compared to it when optimally aligned.
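The definition of percent identity above can be illustrated with a minimal sketch. It assumes the two sequences have already been optimally aligned by some external tool and are passed in as equal-length strings with `-` marking gaps. The function name and the choice of denominator (all aligned columns, excluding columns that are gaps in both sequences) are assumptions made for illustration; other conventions exist and the source does not specify one.

```python
def percent_identity(aligned_ref: str, aligned_query: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned, equal-length sequences."""
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    # Drop columns in which both sequences carry a gap character.
    columns = [
        (r, q)
        for r, q in zip(aligned_ref, aligned_query)
        if not (r == gap and q == gap)
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    # A column counts as a match only when both residues are identical non-gaps.
    matches = sum(1 for r, q in columns if r == q and r != gap)
    return 100.0 * matches / len(columns)


print(percent_identity("KRPA", "KRGA"))  # one mismatch in four columns
```

Under this convention a gap opposite a residue counts against identity, so a shorter sequence aligned to a longer reference scores below 100%.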
  • operatively linked DNA segments describes that one polynucleotide sequence is joined to another so that the polynucleotides are in association for transcriptional and/or translation control and can be expressed in a suitable host cell.
  • NLS nuclear localization sequence
  • fusions of the synthetic nuclease with an NLS because the NLS “tags” the synthetic nuclease for import into the nucleus by nuclear transport.
  • the NLS typically consists of one or more short sequences of positively charged lysines or arginines.
  • a consensus sequence for one family of NLS is K-K/R-X-K/R.
  • NLS examples include but are not limited to MGLDSTAPKK KRKVGIHGVP AA (SEQ ID NO:7), KRPAATKKAG QAKKKK (SEQ ID NO:8), SV40 Large T-Antigen, nucleoplasmin (AVKRPAATKKAGQAKKKKLD (SEQ ID NO:36)), EGL-13 (MSRRRKANPTKLSENAKKLAKEVEN (SEQ ID NO:37)), c-Myc (PAAKRVKLD (SEQ ID NO:38)) and TUS-protein (KLKIKRPVK (SEQ ID NO:39)).
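As a rough illustration of the K-K/R-X-K/R consensus mentioned above, the following sketch scans a protein sequence for that four-residue pattern. The function name is hypothetical, and a match only indicates that the pattern occurs in the sequence; it is not evidence of a functional NLS. Spaces in the input, such as those used to group residues in sequence listings, are ignored.

```python
import re

# K-K/R-X-K/R: lysine, then lysine or arginine, then any residue,
# then lysine or arginine. The lookahead reports overlapping hits.
NLS_CONSENSUS = re.compile(r"(?=(K[KR].[KR]))")


def find_nls_motifs(protein: str) -> list[tuple[int, str]]:
    """Return (0-based position, matched residues) for each consensus hit."""
    seq = protein.replace(" ", "").upper()
    return [(m.start(), m.group(1)) for m in NLS_CONSENSUS.finditer(seq)]


# Sequences quoted in the text as SEQ ID NO:7 and the c-Myc NLS.
print(find_nls_motifs("MGLDSTAPKK KRKVGIHGVP AA"))  # [(8, 'KKKR'), (9, 'KKRK')]
print(find_nls_motifs("PAAKRVKLD"))                 # [(3, 'KRVK')]
```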
  • the synthetic nucleases of the invention contain a DNA binding domain derived from any native nuclease. Examples include, but are not limited to Cas9, Cpf1 and MAD7.
  • the synthetic nuclease comprises a Cpf1 DNA binding domain fused to a nuclease domain of a non-Cpf1 nuclease.
  • the Cpf1 DNA binding domain comprises the amino acid sequence of SEQ ID NO: 32.
  • the amino acid sequence of SEQ ID NO:32 is encoded by a polynucleotide comprising the nucleic acid sequence of SEQ ID NO: 14.
  • the synthetic nuclease comprises a MAD7 DNA binding domain fused to a nuclease domain of a non-MAD7 nuclease.
  • the MAD7 DNA binding domain comprises the amino acid sequence of SEQ ID NO:33.
  • the amino acid sequence of SEQ ID NO:33 is encoded by a polynucleotide comprising the nucleic acid sequence of SEQ ID NO: 12.
  • the DNA binding domain may further comprise an NLS sequence, such as, but not limited to an amino acid sequence of SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, or SEQ ID NO:38.
  • the synthetic nucleases of the invention contain a nuclease domain derived from a native nuclease such as Cpf1 or MAD7.
  • the synthetic nuclease comprises a Cpf1 nuclease domain fused to a DNA binding domain of a non-Cpf1 nuclease.
  • the Cpf1 nuclease domain comprises the amino acid sequence of SEQ ID NO:34.
  • the amino acid sequence of SEQ ID NO:34 is encoded by a polynucleotide comprising the nucleic acid sequence of SEQ ID NO: 13.
  • the synthetic nuclease comprises a MAD7 nuclease domain fused to a DNA binding domain of a non-MAD7 nuclease.
  • the MAD7 nuclease domain comprises the amino acid sequence of SEQ ID NO:32.
  • the amino acid sequence of SEQ ID NO: 32 is encoded by a polynucleotide comprising the nucleic acid sequence of SEQ ID NO: 15.
  • the DNA binding domain may further comprise an NLS sequence, such as, but not limited to an amino acid sequence of SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:36, SEQ ID NO:37, SEQ ID NO:38, or SEQ ID NO:39.
  • the invention provides synthetic nucleases in which a DNA binding domain of a native nuclease is operably linked to a nuclease domain of a heterologous native nuclease.
  • the DNA binding domain of choice may be selected for binding a particular DNA sequence of interest and this may be paired with a heterologous nuclease domain of choice to provide specific double-strand breakage in the DNA of interest, such as a sequence to be edited with the aid of a guide RNA (gRNA).
  • the DNA binding domain may be derived from any native nuclease provided that the domain retains DNA binding function.
  • One of skill in the art would readily be able to determine the amino acid sequence necessary in a given DNA binding domain that would retain DNA binding activity through routine binding assays.
  • the DNA binding domain of a native nuclease may then be operably linked to a nuclease domain of a heterologous native nuclease to provide double stranded DNA cutting.
  • the nuclease domain may be derived from any native nuclease provided that the domain retains nuclease activity.
  • One of skill in the art would readily be able to determine the amino acid sequence necessary in a given nuclease domain that would retain nuclease activity through routine assays to determine DNA cleaving.
  • One of skill in the art would further readily be able to determine whether a particular construct with a DNA binding domain and a nuclease domain, so operatively linked would retain both DNA binding and nuclease activity.
  • the synthetic nuclease of the invention comprises a Cpf1 DNA binding domain fused to a MAD7 nuclease domain (Syn1 or SynNuc1).
  • the fusion of the Cpf1 DNA binding domain and the MAD7 nuclease domain comprises the amino acid sequence of SEQ ID NO:28.
  • the amino acid sequence of SEQ ID NO:28 is encoded by a polynucleotide comprising the nucleic acid sequence of SEQ ID NO:27.
  • the synthetic nuclease of the invention comprises a MAD7 DNA binding domain fused to a Cpf1 nuclease domain (Syn2 or SynNuc2).
  • the fusion of the MAD7 DNA binding domain and the Cpf1 nuclease domain comprises the amino acid sequence of SEQ ID NO:30.
  • the amino acid sequence of SEQ ID NO:30 is encoded by a polynucleotide comprising the nucleic acid sequence of SEQ ID NO: 29.
  • the synthetic nuclease further comprises at least one NLS.
  • the NLS comprises an amino acid sequence of SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, or SEQ ID NO:38.
  • the synthetic nuclease further comprises a molecular tag sequence.
  • the molecular tag sequence comprises the amino acid sequence of SEQ ID NO: 9.
  • the synthetic nuclease comprises both one or more NLS sequences and one or more molecular tags.
  • the synthetic nuclease comprises two NLS sequences and a molecular tag sequence.
  • the synthetic nuclease comprises a Cpf1 DNA binding domain, a MAD7 nuclease domain, 2 NLS sequences and a molecular tag sequence.
  • the synthetic nuclease comprises the amino acid sequence of SEQ ID NO:2.
  • the synthetic nuclease of SEQ ID NO:2 is encoded by a polynucleotide comprising the nucleic acid sequence of SEQ ID NO: 1.
  • the synthetic nuclease comprises a MAD7 DNA binding domain, a Cpf1 nuclease domain, 2 NLS sequences and a molecular tag sequence.
  • the synthetic nuclease comprises the amino acid sequence of SEQ ID NO:4.
  • the synthetic nuclease of SEQ ID NO:4 is encoded by a polynucleotide comprising the nucleic acid sequence of SEQ ID NO:3.
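The two domain-swapped fusions and their sequence identifiers, as stated in the preceding passages, can be summarized in a small lookup structure. This is only an illustrative sketch: the class and field names are hypothetical, and the integers are the SEQ ID numbers quoted in the text, not sequence data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyntheticNuclease:
    name: str
    binding_domain_source: str   # native nuclease supplying the DNA binding domain
    nuclease_domain_source: str  # native nuclease supplying the nuclease domain
    fusion_aa_seq_id: int        # amino acid sequence of the two-domain fusion
    fusion_nt_seq_id: int        # polynucleotide encoding that fusion
    tagged_aa_seq_id: int        # fusion plus two NLS sequences and a molecular tag
    tagged_nt_seq_id: int        # polynucleotide encoding the tagged form

    def is_heterologous(self) -> bool:
        """True when the two domains come from different native nucleases."""
        return self.binding_domain_source != self.nuclease_domain_source


SYN1 = SyntheticNuclease("Syn1", "Cpf1", "MAD7", 28, 27, 2, 1)
SYN2 = SyntheticNuclease("Syn2", "MAD7", "Cpf1", 30, 29, 4, 3)

for syn in (SYN1, SYN2):
    print(syn.name, syn.binding_domain_source, "+", syn.nuclease_domain_source,
          "heterologous:", syn.is_heterologous())
```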
  • the invention also provides polynucleotides encoding synthetic nucleases.
  • the polynucleotides comprise a nucleic acid sequence encoding a DNA binding domain of a nuclease and a nuclease domain of a heterologous nuclease.
  • the DNA binding domain is from a MAD7 nuclease.
  • the polynucleotide encodes a MAD7 amino acid sequence of SEQ ID NO:33.
  • the polynucleotide encoding the MAD7 DNA binding domain comprises the nucleic acid sequence of SEQ ID NO: 12.
  • the DNA binding domain is from a Cpf1 nuclease.
  • the polynucleotide encodes a Cpf1 DNA binding domain amino acid sequence of SEQ ID NO:31.
  • the polynucleotide encoding the Cpf1 DNA binding domain comprises the nucleic acid sequence of SEQ ID NO: 14.
  • the polynucleotide encoding the DNA binding domain also comprises a nucleic acid sequence encoding an NLS.
  • the nucleic acid sequence encodes an NLS comprising an amino acid sequence of SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, or SEQ ID NO:38.
  • the polynucleotide encodes a DNA binding domain of MAD7 and an NLS and comprising the amino acid sequence of SEQ ID NO:40. In some embodiments, this polynucleotide has the nucleic acid sequence of SEQ ID NO: 39. In some embodiments the polynucleotide encodes a DNA binding domain of Cpf1 and an NLS and comprising the amino acid sequence of SEQ ID NO:42. In some embodiments, this polynucleotide has the nucleic acid sequence of SEQ ID NO:41.
  • the polynucleotide encoding the MAD7 DNA binding domain may be operatively linked to a polynucleotide encoding a Cpf1 nuclease domain.
  • the polynucleotide encodes a polypeptide comprising the amino acid sequence of SEQ ID NO: 30.
  • the polynucleotide comprises the nucleic acid sequence of SEQ ID NO:29.
  • the polynucleotide encoding the Cpf1 DNA binding domain may be operatively linked to a polynucleotide encoding a MAD7 nuclease domain.
  • the polynucleotide encodes a polypeptide comprising the amino acid sequence of SEQ ID NO:28.
  • the polynucleotide comprises the nucleic acid sequence of SEQ ID NO:27.
  • the polynucleotide encoding a MAD7 DNA binding domain and a Cpf1 nuclease domain may further comprise nucleic acid sequences encoding at least one NLS and may further include a molecular tag sequence.
  • the tag sequence encodes an HA-tag.
  • the polynucleotide encoding the MAD7 DNA binding domain and Cpf1 nuclease domain further comprises two NLS sequences and an HA-tag.
  • the polynucleotide encodes an amino acid sequence of SEQ ID NO:4.
  • the polynucleotide comprises the nucleic acid sequence of SEQ ID NO:3.
  • the polynucleotide encoding a Cpf1 DNA binding domain and a MAD7 nuclease domain may further comprise nucleic acid sequences encoding at least one NLS and may further include a molecular tag sequence.
  • the tag sequence encodes an HA-tag.
  • the polynucleotide encoding the Cpf1 DNA binding domain and MAD7 nuclease domain further comprises two NLS sequences and an HA-tag.
  • the polynucleotide encodes an amino acid sequence of SEQ ID NO:2.
  • the polynucleotide comprises the nucleic acid sequence of SEQ ID NO:1.
  • the disclosure provides a vector comprising a polynucleotide described herein.
  • the vector comprises a polynucleotide encoding a DNA binding domain (e.g., a Cpf1 DNA binding domain such as SEQ ID NO:31 or SEQ ID NO: 14) and a polynucleotide encoding a nuclease domain (e.g., a MAD7 nuclease domain such as SEQ ID NO:28 or SEQ ID NO:27).
  • the vector comprises an intron sequence positioned between the DNA binding domain and the nuclease domain.
  • a vector comprising the intron sequence provides improved expression of a fusion protein comprising the DNA binding domain and the nuclease domain compared with a vector that does not comprise an intron sequence between the DNA binding domain and the nuclease domain.
  • the intron sequence is a potato ST-LS1 IV2 intron, a COR15A intron, a UBQ10 intron, a COR15a6L intron, or a COR15a11L intron.
  • the intron sequence is a potato ST-LS1 IV2 intron.
  • the vector further comprises an NLS sequence as described herein (e.g., SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, or SEQ ID NO:38).
  • the vector further comprises a molecular tag sequence as described herein (e.g., SEQ ID NO:3).
  • the vector comprises a polynucleotide of SEQ ID NO:1.
  • the vector further comprises a promoter sequence. Exemplary promoters are provided herein.
  • the vector is suitable for transfection into a bacterial cell.
  • the vector is suitable for transfection into a mammalian cell, e.g., an animal cell.
  • the vector is suitable for transfection into a plant cell.
  • the nucleases may be used in place of well-known gene editing nucleases such as, for example but not by way of limitation, meganucleases, zinc finger nucleases (ZFNs), transcription activator-like effector-based nucleases (TALEN), and the clustered regularly interspaced short palindromic repeats (CRISPR) with Cas9 nuclease, and Cpf1.
  • the gene editing protocol would involve gRNA and a nuclease of the invention cutting the host cell DNA as a double-stranded cut and either Non-Homologous End-Joining (NHEJ) or Homology Directed Repair (HDR) mechanisms of repair.
  • the former is a homology-independent pathway involving only a few complementary bases aligning for the re-ligation of two ends and is fairly nonspecific in terms of the alterations introduced as a result.
  • HDR is more specific and allows the user to direct very specific alterations in the genome by altering only a small number of nucleotide changes in an otherwise identical portion of the damaged DNA to be repaired.
  • the nucleases of the invention may be expressed in a polynucleotide construct in which the nuclease of the invention and/or gRNAs of interest are expressed under the control of an inducible promoter or gene switch.
  • “on-demand” gene editing may be effected by inducing the promoter or turning on the gene switch to allow transcription of the nucleic acid encoding the nuclease of the invention and/or the gRNA(s) of interest.
  • the promoter is derived from bacteria, e.g., a bacterial promoter.
  • bacterial promoters include T7 promoter, Sp6 promoter, lac promoter, araBAD promoter, trp promoter, Ptac promoter, and the like.
  • the promoter is derived from a eukaryotic system, e.g., a eukaryotic promoter.
  • the promoter is a mammalian promoter.
  • the promoter is an insect promoter.
  • Non-limiting examples of mammalian promoters include simian virus 40 early promoter (SV40), cytomegalovirus immediate-early promoter (CMV), human Ubiquitin C promoter (UBC), human elongation factor 1α promoter (EF1A), mouse phosphoglycerate kinase 1 promoter (PGK), chicken β-actin promoter coupled with CMV early enhancer (CAGG), and the like.
  • insect promoters include copia transposon promoter (COPIA), actin 5C promoter (ACT5C), and the like.
  • the promoter is a doxycycline-inducible promoter, e.g., reverse tetracycline-controlled transactivator (rtTA) or tetracycline-responsive element promoter (TRE).
  • the promoter is derived from a plant, e.g., a plant promoter.
  • Non-limiting examples of plant promoters include Cauliflower mosaic virus (CaMV) 35S, opine promoters, plant ubiquitin (Ubi), rice actin 1 (Act-1), maize alcohol dehydrogenase 1 (Adh-1), Arabidopsis thaliana small nuclear RNA (U6-26 snRNA) promoter, Arabidopsis thaliana ubiquitin 10 promoter (AtUBQ10), and the like.
  • the promoter is the CaMV 35S promoter, the U6-26 snRNA promoter, or the AtUBQ10 promoter.
  • the nucleases of the invention may be incorporated into a vector for gene editing in which one or more gRNAs are encoded by the vector to target different genes in the host cell.
  • the gRNA encoding sequences may be single and each under the control of a promoter, or in a polycistronic array in which one promoter leads to the expression of multiple gRNAs targeting different genes.
  • the expression of multiple gRNAs allows multiple genes to be edited simultaneously by the same nuclease of the invention. A working example of such construct is provided herein.
  • the invention provides genome editing within a cell.
  • the cells may be prokaryotic or eukaryotic cells.
  • Examples of the cells which may be edited using the method of the invention include, but are not limited to bacterial cells, yeast cells, plant cells (including monocots and dicots), mammalian cells (e.g., human cells, monkey cells, cattle cells, dog cells, cat cells, sheep cells, horse cells, camel cells, llama cells, alpaca cells, goat cells, pig cells and the like), animal cells including vertebrates and invertebrates (e.g., insect cells, fish cells), plants, animals and bacterial cells.
  • a polynucleotide construct comprising a sequence encoding a synthetic nuclease of the invention under the control of a promoter and an attR and/or attL sequence may be introduced into a cell using a serine recombinase.
  • the serine recombinase may direct insertion of the polynucleotide construct into the genome at a pseudo-attP or pseudo-attB site or (when both attB and attP are present on the construct) the construct is introduced randomly or by homologous recombination as determined and designed by the user.
  • the polynucleotide construct may further comprise sequences for one or more gRNAs for editing the genome of the host cell.
  • Constitutive expression of the gRNAs and the nuclease of the invention provides gene editing at loci of interest.
  • the gRNAs and/or nuclease may be under the control of an inducible promoter or gene switch to provide “on-demand” gene editing.
  • the host cell may be an animal cell or plant cell.
  • the polynucleotide constructs can be removed by co-expression of a serine recombinase and a cognate Recombinase Directionality Factor (RDF), either separately or as a fusion protein.
  • serine recombinases include, but are not limited to Mycobacterium avium phage Bxb1 (Accession ID: NP_075302.1); Streptococcus pyogenes phage 370.1 (Accession ID: WP_010922052.1); Bacillus subtilis phage SPβc2 (Accession ID: WP_004399105.1); Listeria monocytogenes phage A118 (Accession ID: WP_015967157.1); and Streptomyces phage φC31 (Accession ID: WP_107426086.1).
  • the invention provides a method of altering expression of a gene or genes in a plant cell comprising introducing into the plant cell a polynucleotide construct comprising an att site and gRNA for altering the expression of a gene(s) under the control of a promoter and a polynucleotide encoding a nuclease of the invention to effect gene editing using the gRNA(s).
  • the nuclease may be under the control of an inducible promoter or a gene switch to regulate expression of the nuclease.
  • both the gRNA(s) and the nuclease are constitutively expressed.
  • the polynucleotide construct is integrated into the plant genome at an att pseudosite using a serine recombinase that is co-introduced into the cell (either as a polynucleotide sequence operably linked to a promoter or as a polypeptide).
  • the serine recombinase effects the integration of the polynucleotide construct comprising the att site at the pseudosite in the genome.
  • the serine recombinase method of transformation can accommodate large pieces of DNA and obviates the need for plant pest sequences such as Agrobacterium sequences.
  • the polynucleotide construct comprises an attB site.
  • the polynucleotide construct comprises an attP site.
  • the serine recombinases that are useful for the invention include, but are not limited to Mycobacterium avium Bxb1, Streptococcus pyogenes phage 370.1, Bacillus subtilis phage SPβc2, Listeria monocytogenes phage A118, and Streptomyces phage φC31.
  • the construct is integrated into the plant cell in a unidirectional manner. Thereafter, the expression of the gRNA(s) and the nuclease of the invention edit the plant cell as desired.
  • the method further includes introduction of a cognate RDF for the serine recombinase used to introduce the polynucleotide construct.
  • the serine recombinase and the cognate RDF are expressed as a fusion protein such as that provided as SEQ ID NO:
  • the polynucleotide construct comprises an attP site and is integrated with a Bacillus subtilis phage SPβc2 serine recombinase.
  • the nuclease used may comprise the sequences of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:28, or SEQ ID NO:30.
  • the RDF may comprise the amino acid sequence of SEQ ID NO:53 or SEQ ID NO:54.
  • the nucleic acid sequence for the RDF is codon-optimized for plants, such as for example, in SEQ ID NO:55.
  • an Editing Cassette comprising a promoter operably linked to a polynucleotide encoding a nuclease of the invention and gRNAs is part of a gene construct to be introduced into a cell.
  • the portion encoding the gRNAs may also be operably linked to a promoter.
  • a promoter is operably linked to the nuclease encoding sequence and a second promoter is operably linked to the portion encoding the gRNAs.
  • the promoters may be the same or different. Termination sequences may be provided operably linked to the portion encoding the nuclease and/or the gRNAs.
  • the promoters are constitutive promoters.
  • the construct would also contain an Excision Cassette in which an inducible promoter or gene switch is operably linked to a polynucleotide encoding a serine recombinase and a cognate RDF.
  • the serine recombinase and RDF may be separately expressed or expressed as a fusion protein, such as, but not limited to the fusion protein shown in SEQ ID NO:56.
  • Such construct could also contain additional termination sequences and selectable markers operably linked to promoters.
  • the constructs would also contain the attP or attB site for the serine recombinase such that when transfected into the cell, the serine recombinase would direct integration of the construct into a pseudosite for the serine recombinase.
  • the pseudosite is a pseudo-attP site. In other embodiments the pseudosite is a pseudo-attB site.
  • the construct comprises an attP site for integration at a pseudo-attB site and the serine recombinase is an SPβc2 serine recombinase.
  • a non-limiting example of such a polynucleotide construct is shown in FIG. 7.
  • a construct containing attB and attP sites could be introduced using Agrobacterium and the insertion takes place generating Ti borders.
  • An on-demand excision can be incorporated by including an Excision Cassette as described to express the serine recombinase and cognate RDF for the incorporated att sites to excise the construct from the cell after gene editing.
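The construct layout described above (an Editing Cassette, an Excision Cassette, and an att site) can be sketched as a simple data model with a few sanity checks. All class names, field names, and placeholder labels below are hypothetical and chosen only for illustration; the checks reflect the elements the text lists and are not a design tool.

```python
from dataclasses import dataclass, field


@dataclass
class EditingCassette:
    nuclease_promoter: str                 # promoter driving the nuclease
    nuclease: str                          # label for the encoded synthetic nuclease
    grna_promoter: str                     # promoter driving the gRNA portion
    grnas: list[str] = field(default_factory=list)


@dataclass
class ExcisionCassette:
    regulator: str                         # inducible promoter or gene switch
    serine_recombinase: str
    rdf: str                               # cognate Recombinase Directionality Factor
    as_fusion: bool = False                # recombinase and RDF as one fusion protein


@dataclass
class Construct:
    att_site: str                          # "attP" or "attB"
    editing: EditingCassette
    excision: ExcisionCassette

    def problems(self) -> list[str]:
        """List missing or inconsistent elements; empty when none are found."""
        found = []
        if self.att_site not in ("attP", "attB"):
            found.append("att_site must be 'attP' or 'attB'")
        if not self.editing.grnas:
            found.append("editing cassette has no gRNAs")
        if not self.excision.rdf:
            found.append("excision cassette has no cognate RDF")
        return found


example = Construct(
    att_site="attP",
    editing=EditingCassette("promoter A", "Syn1", "promoter B", ["gRNA-1", "gRNA-2"]),
    excision=ExcisionCassette("gene switch", "serine recombinase", "cognate RDF", as_fusion=True),
)
print(example.problems())  # []
```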
  • Vectors may be introduced into the desired host cells by methods known in the art, e.g., Agrobacterium-mediated transformation, transfection, electroporation, microinjection, transduction, cell fusion, DEAE dextran, the flower dipping method, use of a gene gun (biolistics), transformation using a serine recombinase and the like.
  • Embodiment 1 A polynucleotide encoding a synthetic nuclease comprising an amino acid sequence of SEQ ID NO:28 or SEQ ID NO: 30.
  • Embodiment 2 The polynucleotide of embodiment 1 comprising the nucleic acid sequence of SEQ ID NO: 27 or SEQ ID NO: 29.
  • Embodiment 3 The polynucleotide of embodiment 1 or 2, wherein said polynucleotide further comprises a nucleic acid encoding at least one nuclear localization sequence (NLS).
  • Embodiment 4 The polynucleotide of embodiment 3 wherein said NLS comprises the amino acid sequence of SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, or SEQ ID NO:38.
  • Embodiment 5 The polynucleotide of any of embodiments 1-4 further comprising a nucleic acid encoding a tag polypeptide.
  • Embodiment 6 The polynucleotide of embodiment 5 wherein said tag polypeptide comprises the amino acid sequence of SEQ ID NO:9.
  • Embodiment 7 The polynucleotide of embodiment 6 wherein said polynucleotide encodes a polypeptide having the amino acid sequence of SEQ ID NO:2 or SEQ ID NO:4.
  • Embodiment 8 The polynucleotide of embodiment 7 wherein said polynucleotide comprises the nucleic acid sequence of SEQ ID NO: 1 or SEQ ID NO:3.
  • Embodiment 9 A synthetic DNA nuclease comprising a DNA binding domain of Cpf1 and a nuclease domain of MAD7, or a DNA binding domain of MAD7 and a nuclease domain of Cpf1.
  • Embodiment 10 The synthetic DNA nuclease of embodiment 9 wherein said DNA binding domain of Cpf1 comprises the amino acid sequence of SEQ ID NO:31.
  • Embodiment 11 The synthetic DNA nuclease of embodiment 9 wherein said nuclease domain of Cpf1 comprises the amino acid sequence of SEQ ID NO:34.
  • Embodiment 12 The synthetic DNA nuclease of embodiment 9 wherein said DNA binding domain of MAD7 comprises the amino acid sequence of SEQ ID NO:33.
  • Embodiment 13 The synthetic DNA nuclease of embodiment 9 wherein said nuclease domain of MAD7 comprises the amino acid sequence of SEQ ID NO:32.
  • Embodiment 14 The synthetic DNA nuclease of embodiment 9 wherein said nuclease comprises the amino acid sequence of SEQ ID NO:28.
  • Embodiment 15 The synthetic DNA nuclease of embodiment 9 wherein said nuclease comprises the amino acid sequence of SEQ ID NO:30.
  • Embodiment 16 The synthetic DNA nuclease of embodiment 9 wherein said nuclease comprises the amino acid sequence of SEQ ID NO:2.
  • Embodiment 17 The synthetic DNA nuclease of embodiment 9 wherein said nuclease comprises the amino acid sequence of SEQ ID NO:4.
  • Embodiment 18 A method of modifying a target locus of interest comprising delivering to said locus a non-naturally occurring composition comprising a synthetic effector protein and one or more nucleic acid components, wherein at least the one or more nucleic acid components is engineered and the effector protein forms a complex with the one or more nucleic acid components and upon binding of the said complex to the target locus of interest, the effector protein induces a modification of the target locus of interest, wherein the synthetic effector protein comprises a DNA binding domain of MAD7 or Cpf1.
  • Embodiment 19 The method of embodiment 18 wherein when said effector protein comprises a DNA binding domain of MAD7, said DNA binding domain is operatively linked to a nuclease domain of Cpf1.
  • Embodiment 20 The method of embodiment 18 wherein when said effector protein comprises a DNA binding domain of Cpf1, said DNA binding domain is operatively linked to a nuclease domain of MAD7.
  • Embodiment 21 The method of embodiment 18 or 19 wherein said effector protein comprises a MAD7 DNA binding domain comprising an amino acid sequence of SEQ ID NO:33.
  • Embodiment 22 The method of embodiment 18 or 20 wherein said effector protein comprises a Cpf1 DNA binding domain comprising an amino acid sequence of SEQ ID NO:31.
  • Embodiment 23 The method of any of embodiments 18 to 22 wherein said DNA binding domain further comprises at least one NLS.
  • Embodiment 24 The method of embodiment 23 wherein said NLS is one or more of SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, or SEQ ID NO:38.
  • Embodiment 25 The method of embodiment 24 wherein said DNA binding domain comprises the amino acid sequence of SEQ ID NO:40.
  • Embodiment 26 The method of embodiment 24 wherein said DNA binding domain comprises the amino acid sequence of SEQ ID NO:42.
  • Embodiment 27 The method of embodiment 25 or 26 wherein said DNA binding domain further comprises a molecular tag.
  • Embodiment 28 The method of embodiment 27 wherein said molecular tag is an HA-tag comprising the sequence of SEQ ID NO:52.
  • Embodiment 29 The method of embodiment 27 wherein said nuclease domain comprises the amino acid sequence of SEQ ID NO:46 or SEQ ID NO:50.
  • Embodiment 30 The method of embodiment 19 or 20 wherein said nuclease domain comprises an amino acid sequence of SEQ ID NO:34.
  • Embodiment 31 The method of embodiment 19 or 20 wherein said nuclease domain comprises an amino acid sequence of SEQ ID NO:32.
  • Embodiment 32 The method of any of embodiments 27 or 28 wherein said nuclease domain further comprises at least one NLS.
  • Embodiment 33 The method of embodiment 23 wherein said NLS is one or more of SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, or SEQ ID NO:38.
  • Embodiment 34 The method of embodiment 32 wherein said nuclease domain comprises the amino acid sequence of SEQ ID NO:44.
  • Embodiment 35 The method of embodiment 32 wherein said nuclease domain comprises the amino acid sequence of SEQ ID NO:48.
  • Embodiment 36 The method of embodiment 34 or 35 wherein said nuclease domain further comprises a molecular tag.
  • Embodiment 37 The method of embodiment 36 wherein said molecular tag is an HA-tag comprising the sequence of SEQ ID NO:52.
  • Embodiment 38 The method of embodiment 36 wherein said nuclease domain comprises the amino acid sequence of SEQ ID NO:46 or SEQ ID NO:50.
  • Embodiment 39 The method of embodiment 18, wherein the target locus of interest comprises DNA.
  • Embodiment 40 The method of embodiment 39, wherein the modification of the target locus of interest comprises a strand break.
  • Embodiment 41 The method of embodiment 39, wherein the target locus of interest is comprised in a DNA molecule in vitro.
  • Embodiment 42 The method of embodiment 39, wherein the target locus of interest is comprised in a DNA within a cell.
  • Embodiment 43 The method of embodiment 42, wherein the cell is a prokaryotic cell.
  • Embodiment 44 The method of embodiment 42, wherein the cell is a eukaryotic cell.
  • Embodiment 45 The method of embodiment 42 wherein the cell is a plant cell.
  • Embodiment 46 The method of embodiment 18, wherein the target locus of interest comprises a genomic locus of interest.
  • Embodiment 47 The method of embodiment 18, wherein when in complex with the effector protein the nucleic acid component effects sequence specific binding of the complex to a target sequence of the target locus of interest.
  • Embodiment 48 The method of embodiment 18, wherein the nucleic acid component(s) comprise a putative CRISPR RNA (crRNA) sequence.
  • Embodiment 49 The method of embodiment 48, wherein the nucleic acid component(s) do not comprise any putative trans-activating crRNA (tracr RNA) sequences.
  • Embodiment 50 The method of embodiment 40, wherein the strand break comprises a single strand break.
  • Embodiment 51 The method of embodiment 40, wherein the strand break comprises a double strand break.
  • Embodiment 52 The method of embodiment 18, wherein the effector protein and nucleic acid component(s) are provided via one or more polynucleotide molecules encoding the polypeptides and/or the nucleic acid component(s), and wherein the one or more polynucleotide molecules are operably configured to express the polypeptides and/or the nucleic acid component(s).
  • Embodiment 53 The method of embodiment 18, wherein the one or more polynucleotide molecules comprise one or more regulatory elements operably configured to express the polypeptides and/or the nucleic acid component(s), optionally wherein the one or more regulatory elements comprise inducible promoters.
  • Embodiment 54 The method of embodiment 52, wherein the one or more polynucleotide molecules are comprised within one or more vectors.
  • Embodiment 55 The method of embodiment 52, wherein the one or more polynucleotide molecules are comprised in a delivery system, or the method of embodiment 55 wherein the one or more vectors are comprised in a delivery system.
  • Embodiment 56 The method of embodiment 18, wherein the non-naturally occurring or engineered composition is delivered via a delivery vehicle comprising liposome(s), particle(s), exosome(s), microvesicle(s), a gene-gun, a ribonucleoprotein complex, one or more viral vectors or by a serine recombinase delivery method.
  • Embodiment 57 A method of editing the genome of an organism comprising introducing into a cell of the organism a polynucleotide encoding a serine recombinase operably linked to a promoter that is active in the cell and an exogenous polynucleotide comprising a putative CRISPR RNA (crRNA) sequence and an attP or attB site.
  • Embodiment 58 The method of embodiment 57 wherein said exogenous polynucleotide further comprises an excision cassette comprising an excision gRNA operably linked to an inducible promoter or gene switch and a termination sequence.
  • Embodiment 59 A method of gene editing in a cell comprising: a. transforming a host cell with a gene expression system comprising: i. a first constitutive promoter operably linked to at least one first polynucleotide encoding at least one gene editing guide RNA (gRNA) to target a gene within a host cell; ii. a second constitutive promoter operably linked to a second polynucleotide encoding a synthetic nuclease of any of embodiments 9 to 17; iii. a third polynucleotide encoding an excision gRNA operably linked to a fourth polynucleotide comprising an inducible gene regulator, wherein said inducible gene regulator is responsive to an activator that activates transcription of the excision gRNA; and iv. PAM sequences flanking the expression system that bind the excision gRNA; wherein said gene expression system comprises at least one att site at one end of the gene expression system; and wherein the excision gRNA is capable of excising the gene expression system by acting with the synthetic nuclease on the PAM sequences; b. allowing said gene editing gRNA and synthetic nuclease to edit at least one target gene in said host cell; c. contacting said host cell with an activator that induces said inducible gene regulator to express said excision gRNA, wherein said excision gRNA acts on said PAM sequences to excise said expression system from the host cell.
  • Embodiment 60 The method of embodiment 59 wherein the activator is a chemical ligand.
  • Embodiment 61 The method of embodiment 59 wherein the activator is an environmental stimulus.
  • Embodiment 62 The method of any of embodiments 59 to 61 wherein said host cell is a mammalian cell, a plant cell, a non-mammalian vertebrate cell or an invertebrate cell.
  • Embodiment 63 The method of embodiment 59 wherein said gene expression system is introduced into the cell by expressing a serine recombinase that acts on the att site on the gene expression system and a pseudosite in the host cell genome to insert the gene expression system.
  • Embodiment 64 The method of embodiment 63 wherein said att site is an attP site.
  • Embodiment 65 The method of embodiment 63 wherein said att site is an attB site.
  • Embodiment 66 The method of any of embodiments 63 to 65 wherein said serine recombinase is a Mycobacterium avium phage Bxb1 serine recombinase, a Streptococcus pyogenes phage 370.1 serine recombinase, a Bacillus subtilis phage SPβc2 serine recombinase, a Listeria monocytogenes phage A118 serine recombinase, or a Streptomyces phage ΦC31 serine recombinase.
  • Embodiment 67 The method of embodiment 66 wherein said serine recombinase is a Bacillus subtilis phage SPβc2 serine recombinase.
  • Embodiment 68 A method of gene editing in a cell comprising: a. transforming a host cell with a gene expression system comprising: i. a first constitutive promoter operably linked to at least one first polynucleotide encoding at least one gene editing guide RNA (gRNA) to target a gene within a host cell; ii. a second constitutive promoter operably linked to a second polynucleotide encoding a synthetic nuclease of any of embodiments 9 to 17; iii. at least one att site recognized by a serine recombinase at one end of the gene expression system; iv. a third polynucleotide encoding a recombinase directionality factor (RDF) operably linked to a fourth polynucleotide comprising an inducible gene regulator, wherein said inducible gene regulator is responsive to an activator that activates transcription of the RDF; wherein said transformation of said cell is accomplished by co-introducing said gene expression system and a serine recombinase that recognizes said att site on the gene expression vector and a pseudosite in the genome of the host cell; wherein the RDF is capable of excising the gene expression system by acting with a cognate serine recombinase; b.
  • Embodiment 69 The method of embodiment 68 wherein the activator is a chemical ligand.
  • Embodiment 70 The method of embodiment 68 wherein the activator is an environmental stimulus.
  • Embodiment 71 The method of any of embodiments 68 to 70 wherein said host cell is a mammalian cell, a plant cell, a non-mammalian vertebrate cell or an invertebrate cell.
  • Embodiment 72 The method of embodiment 68 wherein said att site is an attP site.
  • Embodiment 73 The method of embodiment 68 wherein said att site is an attB site.
  • Embodiment 74 The method of any of embodiments 68 to 73 wherein said serine recombinase is a Mycobacterium avium phage Bxb1 serine recombinase, a Streptococcus pyogenes phage 370.1 serine recombinase, a Bacillus subtilis phage SPβc2 serine recombinase, a Listeria monocytogenes phage A118 serine recombinase, or a Streptomyces phage ΦC31 serine recombinase.
  • Embodiment 75 The method of embodiment 66 wherein said serine recombinase is a Bacillus subtilis phage SPβc2 serine recombinase.
  • Embodiment 76 The method of any of embodiments 68 to 75 wherein said RDF is a fusion protein of RDF and its cognate serine recombinase.
  • Embodiment 77 The method of any of embodiments 68 to 76 wherein the serine recombinase is an SPβc2 serine recombinase and the RDF comprises the amino acid sequence of SEQ ID NO:53 or SEQ ID NO:54.
  • SynNuc1 corresponds to the fusion of sequences from the nuclease domain of MAD7 (SEQ ID NO: 15) and the DNA-binding domain of Cpf1 (SEQ ID NO: 13).
  • The translated SynNuc1 is shown in SEQ ID NO:2.
  • SynNuc2 corresponds to the fusion of sequences from the nuclease domain of Cpf1 (SEQ ID NO: 13) and the DNA-binding domain of MAD7 (SEQ ID NO: 12).
  • The translated SynNuc2 is shown in SEQ ID NO:4.
  • A polynucleotide encoding a nuclear localization signal (NLS) was added both upstream and downstream of the fused sequences.
  • The upstream NLS encodes a polypeptide of SEQ ID NO:7 and the downstream NLS encodes a polypeptide of SEQ ID NO:8.
  • AsCpf1 (SEQ ID NO:5) was also synthesized as a control.
  • The translated AsCpf1 is shown as SEQ ID NO:6.
  • A plant Kozak sequence was added upstream and an HA-tag was added downstream of the whole sequence.
  • The HA-tag sequence translation is shown in SEQ ID NO:9. All sequences were optimized for human embryonic kidney (HEK) cells.
  • ID414 contains a guide-RNA (gRNA) targeting the lettuce phytoene desaturase (PDS) gene for gene editing under an Arabidopsis thaliana U6 promoter.
  • FIGS. 1-3 show the vector maps for these nucleases (ID525: SynNuc1 (FIG. 1); ID526: SynNuc2 (FIG. 2); and ID524: AsCpf1 control (FIG. 3)).
  • Protoplasts were isolated from six-week-old wild-type lettuce plants (about 1 g of leaf tissue) and transfected following Sheen’s protocol (Yoo, S.D. et al. (2007) Nature Protocols 2:1565-1575). Transfected protoplasts were incubated at 25°C in the dark for about 60 hours.
  • Genomic DNA (gDNA) was extracted with 400 µl urea buffer (6.9 M urea, 350 mM NaCl, 50 mM Tris-Cl pH 8.0, 20 mM EDTA pH 8.0, 1% Sarkosyl), followed by a phenol:chloroform:isoamyl alcohol step and a chloroform:isoamyl alcohol step. DNA was precipitated at -80°C for 20 minutes in an equal volume of isopropanol. Finally, the DNA was washed once with 70% ethanol and resuspended in 20 µl of distilled (DI) water. The gDNA concentration was estimated using a NanoDrop 8000 (Thermo Scientific), and the gDNA was then diluted to 30 ng/µl for further analysis.
  • The lettuce PDS and/or other targeted regions were PCR amplified from 60 ng of gDNA (in 2 µl) using Phusion Hot Start II (Thermo Fisher Scientific) and a specific set of primers for each target gene, following the manufacturer's instructions. PCR reactions were run using an Eppendorf MasterCycler EPgradient instrument.
  • The synthetic nucleases successfully generated CRISPR-guided DNA double-stranded breaks in the lettuce PDS gene.
  • Protoplast cell transfections were performed for both the synthetic nucleases and AsCpf1 (positive control) vectors targeting PDS for gene editing.
  • Transfections were also performed with a previously constructed AsCpf1 vector (ID414) containing a different tag (V5-tag) located downstream of the NLS at the 5’ end of the humanized AsCpf1 nuclease sequence (available from Addgene as plasmid #69982), and with a native MAD7 vector (ID440) in which the E. coli codon-optimized MAD7 sequence available from Inscripta was further codon-optimized for Homo sapiens using the codon optimization tool from Integrated DNA Technologies (IDT, Iowa, USA). Both vectors also targeted the lettuce PDS gene.
  • The PDS targeted region was amplified from gDNA extracted from these transfected protoplasts and then submitted to Next Generation Sequencing (NGS).
  • Table 1 shows mutation frequencies observed in lettuce PDS gene at the targeted site for each of the nucleases/vectors tested.
  • Protoplast cell transfections were performed with a vector (ID536) containing SynNuc1 driven by a CaMV 35S promoter along with polycistronic gRNAs targeting eight different lettuce genes driven by the Arabidopsis U6 promoter.
  • Transfections were also performed using a few control vectors, including ID525 and ID414 for PDS gene editing, as well as another vector (ID121) containing Cas9 nuclease and slightly different polycistronic gRNAs targeting the same eight lettuce genes and two others.
  • PPO Genes PPO-A, PPO-B and PPO-C are targeted by the same gRNA while PPO-G is targeted by a different gRNA as the other four genes (PPO-E, PPO-O, PPO-R, and PPO-S).
  • Mutations generated after NHEJ repair of the breaks created by SynNuc1 in four genes range from deletions of 1 to 5 base pairs to insertions of 1 to 37 base pairs. Similar mutations are generated after Cas9 breaks, including longer deletions (up to 24 base pairs).
  • Table 2 does not show the data for the PDS controls performed in the same transfection experiment; however, the results were similar to those previously obtained. SynNuc1 showed 0.02% mutation frequency on PDS while AsCpf1 showed 0.25% mutation frequency.
  • Representative mutations generated in the PDS gene by the nucleases as described in Table 3 are shown in FIGS. 8 and 9.
  • FIG. 8A shows the mutations generated by SynNuc1 and SynNuc2
  • FIG. 8B shows the mutations generated by MAD7, Cpf1 Construct 1, and Cpf1 Construct 2 in Experiment 1.
  • FIG. 9A shows the mutations generated by SynNuc1 and SynNuc2
  • FIG. 9B shows the mutations generated by MAD7, Cpf1 Construct 1, and Cpf1 Construct 2 in Experiment 2.
  • Results are shown in FIGS. 10A-10C.
  • SynNuc1 is referred to as “fMAD7.”
  • Constructs containing the AtUBQ10 promoter for expression of the SynNuc1 nuclease (construct ID 3) or an intron between the DNA binding and nuclease domains of SynNuc1 (construct ID 2) provided up to 9-fold improvement in editing efficiency over constructs containing the CaMV 35S promoter and without the intron (FIGS. 10A and 10B).
  • A construct containing both the AtUBQ10 promoter and the St-LS1 IV2 intron resulted in a further increase in editing efficiency of up to 3.5-fold, compared with the AtUBQ10 promoter with no intron (FIG. 10C).
  • Twenty-three calli developed from hAsCpf1-transfected protoplasts contained mutations (5-12 base pair deletions) at the expected region of the PDS gene.
  • One of the mutations was a homozygous mutation (9 base pair deletion between nucleotides 1449 and 1457), resulting in a three amino acid deletion at the end of exon 3, as shown in FIGS. 11A and 11B.
  • The gene editing efficiency for the specific case of PDS was estimated to be 11%, based on the number of edited calli out of the total number of calli assessed for editing, as shown in FIG. 12.
  • Two calli developed from SynNuc1-transfected protoplasts contained a heterozygous mutation (6 and 9 base pair deletions between nucleotides 1447 and 1455), as shown in FIG. 13.
  • The PDS gene editing efficiency was estimated to be 1%, based on the number of edited calli out of the total number of calli assessed for editing, as shown in FIG. 12.
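As an illustrative aside that is not part of the disclosure, the arithmetic behind the callus results above can be sketched in Python. All function names are hypothetical, and the counts passed to `editing_efficiency` in the usage lines are placeholders rather than data from the experiments; only the deletion coordinates (nucleotides 1449 to 1457) come from the text.

```python
def deletion_length(first_nt: int, last_nt: int) -> int:
    """Number of bases removed when nucleotides first_nt..last_nt (inclusive) are deleted."""
    if last_nt < first_nt:
        raise ValueError("last_nt must not be smaller than first_nt")
    return last_nt - first_nt + 1


def is_in_frame(deleted_bases: int) -> bool:
    """A deletion preserves the reading frame when its length is a multiple of three."""
    return deleted_bases % 3 == 0


def codons_removed(deleted_bases: int) -> int:
    """Net number of codons (amino acids) lost by an in-frame deletion."""
    if not is_in_frame(deleted_bases):
        raise ValueError("deletion is not in frame; it causes a frameshift instead")
    return deleted_bases // 3


def editing_efficiency(edited_calli: int, total_calli: int) -> float:
    """Percentage of assessed calli that carry an edit."""
    if total_calli <= 0:
        raise ValueError("total_calli must be positive")
    return 100.0 * edited_calli / total_calli


# Deletion reported for the homozygous callus: nucleotides 1449 through 1457.
n_bases = deletion_length(1449, 1457)
print(n_bases, is_in_frame(n_bases), codons_removed(n_bases))

# Placeholder counts, used only to show the call signature.
print(editing_efficiency(1, 4))
```

A deletion whose length is a multiple of three removes whole codons on net, which is why the 9 base pair deletion corresponds to a three amino acid deletion rather than a frameshift.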

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Biotechnology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biochemistry (AREA)
  • Microbiology (AREA)
  • Plant Pathology (AREA)
  • Biophysics (AREA)
  • Physics & Mathematics (AREA)
  • Cell Biology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Medicinal Chemistry (AREA)
  • Mycology (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)
  • Enzymes And Modification Thereof (AREA)

Abstract

Synthetic nucleases for gene editing are described comprising combinations of DNA binding domains of one endonuclease with nuclease domains of a different nuclease. Methods of using the nucleases in gene editing applications are also described.

Description

SYNTHETIC NUCLEASES
REFERENCE TO SEQUENCE LISTING
[0001] This application incorporates by reference a "Sequence Listing" (identified below) which is submitted concurrently herewith in text file format via the U.S. Patent Office's Electronic Filing System (EFS). The text file copy of the Sequence Listing submitted herewith is labeled "INX00464US-Vl_ST25.txt", is a file of 182,873 bytes in size, and was created on October 22, 2019; this Sequence Listing is incorporated by reference in its entirety herein.
BACKGROUND OF THE INVENTION
[0002] CRISPR Technology was a big step forward in gene editing capability. CRISPR (clustered regularly interspaced short palindromic repeats) refers to the system by which an enzyme (Cas) is able to target and modify a genetic sequence in the DNA of interest. Since its discovery in 2012, numerous biotechnology companies have introduced CRISPR technology into their research platform. Some companies have emerged that specialize in improving CRISPR technology and providing services.
[0003] Progress in genome editing technology has been a sequential process. It started as a gene therapy tool using simple oligo-directed mutagenesis, which later evolved into the triple-helix forming oligo (TFO) to facilitate homology-dependent recombination (HDR). Fusion of an endonuclease with a TFO/oligo, also called Protein-Nucleic Acid (PNA), was also tried as a tool for genome modification in the pre-CRISPR era. The discovery of the CRISPR/Cas system in prokaryotic defense was described in the early 2000s (Brouns et al., 2008; Carte et al., 2008; Hale et al., 2009; Westra et al., 2012). Prior to the development of CRISPR-Cas based gene editing, zinc-finger nucleases, meganucleases and TALEN nucleases entered the market as tools for genome modification. Due to this sequential progress in the genome editing area and the prior art around it, several companies have entered the genome editing market with their own nucleases.
[0004] The CRISPR/Cas system is still far from perfect. It has low editing efficiency (often < 1%) and it produces many off-target effects, which has raised concerns over its use in health industries. In the current market, many genome editing companies are now racing to improve the system and attempting to build novel genome editing tools which have reduced off-target effects and increased editing efficiency.
[0005] There is a need in the art for nucleases that have good efficiency with reduced off-target effects for use in gene editing applications.
BRIEF SUMMARY OF THE INVENTION
[0006] The invention provides a polynucleotide encoding a synthetic nuclease comprising an amino acid sequence of SEQ ID NO:28 or SEQ ID NO:31. In some embodiments, the polynucleotide comprises a nucleic acid sequence of SEQ ID NO:27 or SEQ ID NO:30. In some embodiments, the polynucleotide comprises a nucleic acid sequence of SEQ ID NO:30.
[0007] In some embodiments, the polynucleotide further comprises a nucleic acid encoding at least one nuclear localization sequence (NLS), such as but not limited to an NLS comprising the nucleic acid sequence of SEQ ID NO:7 or SEQ ID NO:8. In some embodiments, the polynucleotide of the invention also comprises a nucleic acid encoding a tag polypeptide, such as, but not limited to one that encodes the amino acid sequence of SEQ ID NO:9. In some embodiments, the polynucleotide encodes a polypeptide having the amino acid sequence of SEQ ID NO:2 or SEQ ID NO:4. In some embodiments, the polynucleotide comprises the nucleic acid sequence of SEQ ID NO: 1 or SEQ ID NO:3.
[0008] The invention provides a synthetic DNA nuclease comprising a DNA binding domain of Cpf1 and a nuclease domain of MAD7, or a DNA binding domain of MAD7 and a nuclease domain of Cpf1. In some embodiments, the synthetic DNA nuclease comprises a DNA binding domain of Cpf1. In some embodiments, the DNA binding domain comprises the amino acid sequence of SEQ ID NO:31. In some embodiments, the nuclease domain of Cpf1 comprises the amino acid sequence of SEQ ID NO:34. In some embodiments, the DNA binding domain of MAD7 comprises the amino acid sequence of SEQ ID NO:33. In some embodiments, the nuclease domain of MAD7 comprises the amino acid sequence of SEQ ID NO:32. In certain embodiments, the nuclease comprises the amino acid sequence of SEQ ID NO:28. In other embodiments, the nuclease comprises the amino acid sequence of SEQ ID NO:31.
[0009] In specific embodiments, the nuclease comprises the amino acid sequence of SEQ ID NO:2. In other specific embodiments, the nuclease comprises the amino acid sequence of SEQ ID NO: 4.
[00010] The invention also provides a method of modifying a target locus of interest comprising delivering to the locus a non-naturally occurring composition comprising a synthetic effector protein and one or more nucleic acid components, wherein at least the one or more nucleic acid components is engineered and the effector protein forms a complex with the one or more nucleic acid components and upon binding of the said complex to the target locus of interest, the effector protein induces a modification of the target locus of interest, wherein the synthetic effector protein comprises a DNA binding domain of MAD7 or Cpf1 and a nuclease domain of a heterologous nuclease.
[00011] In some embodiments, the effector protein comprises a DNA binding domain of MAD7 operatively linked to a nuclease domain of Cpf1. In some embodiments, the effector protein comprises a DNA binding domain of Cpf1 operatively linked to a nuclease domain of MAD7. In certain embodiments, the effector protein comprises a MAD7 DNA binding domain comprising the amino acid sequence of SEQ ID NO:34. In certain embodiments, the effector protein comprises a Cpf1 DNA binding domain comprising the amino acid sequence of SEQ ID NO:32.
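The domain-swap constraint in paragraphs [00010] and [00011] can be illustrated with a small data model. This is only a sketch under stated assumptions: the class name, field names, and source labels are hypothetical, and no sequence data from the Sequence Listing is represented.

```python
from dataclasses import dataclass

# Sources named in the text for the DNA binding domain (labels are illustrative).
ALLOWED_BINDING_SOURCES = {"MAD7", "Cpf1"}


@dataclass(frozen=True)
class SyntheticEffector:
    """Hypothetical record of which parent nuclease contributes each domain."""
    dna_binding_source: str
    nuclease_source: str

    def __post_init__(self) -> None:
        # The DNA binding domain must come from MAD7 or Cpf1.
        if self.dna_binding_source not in ALLOWED_BINDING_SOURCES:
            raise ValueError("DNA binding domain must come from MAD7 or Cpf1")
        # "Heterologous" is modeled here as: a different parent than the binding domain.
        if self.nuclease_source == self.dna_binding_source:
            raise ValueError("nuclease domain must come from a heterologous nuclease")


# The two pairings described in paragraph [00011].
cpf1_binding_mad7_nuclease = SyntheticEffector("Cpf1", "MAD7")
mad7_binding_cpf1_nuclease = SyntheticEffector("MAD7", "Cpf1")
```

Constructing `SyntheticEffector("MAD7", "MAD7")` raises `ValueError`, which mirrors the requirement that the two domains originate from different nucleases.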
[00012] In the method of the invention, the DNA binding domain further comprises at least one NLS. The NLS may have the nucleic acid sequence of, for example, one or more of SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:36, SEQ ID NO:37, SEQ ID NO:38, or SEQ ID NO:39.
[00013] In some embodiments of the method of the invention, the DNA binding domain comprises the amino acid sequence of SEQ ID NO:41. In other embodiments, the DNA binding domain comprises the amino acid sequence of SEQ ID NO:43.
[00014] The method of the invention may also be practiced in which the DNA binding domain further comprises a molecular tag. In some embodiments, the molecular tag is an HA-tag, for example comprising the sequence of SEQ ID NO:53. In some embodiments of the invention the method uses a DNA binding domain that comprises the amino acid sequence of SEQ ID NO:47 or SEQ ID NO:51. In some embodiments, the nuclease domain comprises an amino acid sequence of SEQ ID NO:32. In other embodiments, the nuclease domain comprises an amino acid sequence of SEQ ID NO:33.
[00015] In the method, the nuclease domain further comprises at least one NLS, such as, but not limited to an NLS with the nucleic acid sequence of one or more of SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:36, SEQ ID NO:37, SEQ ID NO:38, or SEQ ID NO:39. In some embodiments, the nuclease domain comprises the amino acid sequence of SEQ ID NO:45. In other embodiments, the nuclease domain comprises the amino acid sequence of SEQ ID NO:49.
[00016] The nuclease domain may further comprise a molecular tag. In some embodiments, the molecular tag is an HA-tag, such as, but not limited to an HA-tag comprising the sequence of SEQ ID NO:53. In some embodiments, the nuclease domain comprises the amino acid sequence of SEQ ID NO:47. In some embodiments, the nuclease domain comprises the amino acid sequence of SEQ ID NO:51.
[00017] In some embodiments of the method of the invention, the target locus of interest comprises DNA. The modification of the target locus of interest may comprise a strand break, which may be a single strand break or a double strand break. In some embodiments, the target locus of interest comprises a DNA molecule in vitro. In some embodiments, the target locus of interest, such as a genomic locus, comprises a DNA molecule within a cell, such as, for example, a prokaryotic cell, a eukaryotic cell, or a plant cell.
[00018] In the method of the invention, the nucleic acid component(s) may comprise a putative CRISPR RNA (crRNA) sequence and not any putative trans-activating crRNA (tracrRNA) sequences.
[00019] In the method of the invention, the effector protein and nucleic acid component(s) are provided via one or more polynucleotide molecules encoding the polypeptides and/or the nucleic acid component(s), and wherein the one or more polynucleotide molecules are operably configured to express the polypeptides and/or the nucleic acid component(s).
[00020] In some embodiments, the one or more polynucleotide molecules comprise one or more regulatory elements operably configured to express the polypeptides and/or the nucleic acid component(s), optionally wherein the one or more regulatory elements comprise inducible promoters. The one or more polynucleotide molecules are comprised within one or more vectors.
[00021] In the method of the invention, the polynucleotide may be delivered to the cell using liposome(s), particle(s), exosome(s), microvesicle(s), a gene-gun, a ribonucleoprotein complex, one or more viral vectors or by a serine recombinase delivery method.
[00022] The invention also provides a method of editing the genome of an organism comprising introducing into a cell of the organism a polynucleotide encoding a serine recombinase operably linked to a promoter that is active in the cell and an exogenous polynucleotide comprising a putative CRISPR RNA (crRNA) sequence and an attP or attB site.
[00023] In some embodiments, the exogenous polynucleotide further comprises an excision cassette comprising an excision gRNA operably linked to an inducible promoter or gene switch and a termination sequence.
[00024] The invention also provides a method for gene editing in a cell comprising: a. transforming a host cell with a gene expression system comprising: i. a first constitutive promoter operably linked to at least one first polynucleotide encoding at least one gene editing guide RNA (gRNA) to target a gene within a host cell; ii. a second constitutive promoter operably linked to a second polynucleotide encoding a synthetic nuclease of the invention; iii. a third polynucleotide encoding an excision gRNA operably linked to a fourth polynucleotide comprising an inducible gene regulator, wherein the inducible gene regulator is responsive to an activator that activates transcription of the excision gRNA; and iv. PAM sequences flanking the expression system that bind the excision gRNA; wherein the gene expression system comprises at least one att site at one end of the gene expression system; and wherein the excision gRNA is capable of excising the gene expression system by acting with the synthetic nuclease on the PAM sequences; b. allowing the gene editing gRNA and synthetic nuclease to edit at least one target gene in said host cell; c. contacting the host cell with an activator that induces said inducible gene regulator to express the excision gRNA, wherein the excision gRNA acts on the PAM sequences to excise the expression system from the host cell.
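The excision step in paragraph [00024] can be pictured with a toy string model. The sketch below assumes that the integrated expression system is flanked by two copies of a site recognized by the excision gRNA, and that cutting at both copies followed by rejoining removes everything from the first copy through the second. The function name and all sequences are hypothetical placeholders; exact cut positions, PAM orientation, and repair outcomes are not modeled.

```python
def excise_between_sites(genome: str, site: str) -> str:
    """Remove the span from the first to the last occurrence of `site`, inclusive.

    Returns the genome unchanged when fewer than two copies of `site` are present,
    since a single cut cannot release the cassette in this simplified model.
    """
    first = genome.find(site)
    last = genome.rfind(site)
    if first == -1 or first == last:
        return genome
    return genome[:first] + genome[last + len(site):]


# Toy sequences (placeholders, not from the Sequence Listing).
SITE = "TTTGACGT"        # stands in for the flanking site targeted by the excision gRNA
LEFT_FLANK = "AAAC"      # host sequence upstream of the insertion
CASSETTE = "GGGCCC"      # stands in for the gene expression system
RIGHT_FLANK = "CCAA"     # host sequence downstream of the insertion

integrated = LEFT_FLANK + SITE + CASSETTE + SITE + RIGHT_FLANK
after_excision = excise_between_sites(integrated, SITE)
print(after_excision)
```

In this model the result is simply the two host flanks joined together; real excision depends on how the cell repairs the two breaks.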
[00025] In this method, the activator may be a chemical ligand or an environmental stimulus. The host cell may be a mammalian cell, a plant cell, a non-mammalian vertebrate cell or an invertebrate cell.
[00026] The gene expression system may be introduced into the cell by expressing a serine recombinase that acts on the att site on the gene expression system and a pseudosite in the host cell genome to insert the gene expression system. The att site may be an attP site or an attB site. The serine recombinase may be a Mycobacterium avium phage Bxb1 serine recombinase, a Streptococcus pyogenes phage 370.1 serine recombinase, a Bacillus subtilis phage SPβc2 serine recombinase, a Listeria monocytogenes phage A118 serine recombinase, or a Streptomyces phage ΦC31 serine recombinase. In specific embodiments, the serine recombinase is a Bacillus subtilis phage SPβc2 serine recombinase.
[00027] The invention also provides a method of gene editing in a cell comprising: a. transforming a host cell with a gene expression system comprising: i. a first constitutive promoter operably linked to at least one first polynucleotide encoding at least one gene editing guide RNA (gRNA) to target a gene within a host cell; ii. a second constitutive promoter operably linked to a second polynucleotide encoding a synthetic nuclease provided herein; iii. at least one att site recognized by a serine recombinase at one end of the gene expression system; iv. a third polynucleotide encoding a recombinase directionality factor (RDF) operably linked to a fourth polynucleotide comprising an inducible gene regulator, wherein said inducible gene regulator is responsive to an activator that activates transcription of the RDF; wherein said transformation of said cell is accomplished by co-introducing said gene expression system and a serine recombinase that recognizes said att site on the gene expression vector and a pseudosite in the genome of the host cell; wherein the RDF is capable of excising the gene expression system by acting with a cognate serine recombinase; b. allowing said gene editing gRNA and synthetic nuclease to edit at least one target gene in said host cell; c. contacting said host cell with an activator that induces said inducible gene regulator to express said RDF, wherein said RDF acts with said serine recombinase to excise said expression system from said host cell.
[00028] In this method, the activator may be a chemical ligand or an environmental stimulus.
[00029] The host cell may be a mammalian cell, a plant cell, a non-mammalian vertebrate cell or an invertebrate cell. The att site may be an attP site or an attB site. The serine recombinase may be a Mycobacterium avium phage Bxb1 serine recombinase, a Streptococcus pyogenes phage 370.1 serine recombinase, a Bacillus subtilis phage SPβc2 serine recombinase, a Listeria monocytogenes phage A118 serine recombinase, or a Streptomyces phage ΦC31 serine recombinase. In specific embodiments, it is a Bacillus subtilis phage SPβc2 serine recombinase.
[00030] The RDF may be a fusion protein of RDF and its cognate serine recombinase. In some embodiments, the serine recombinase is an SPβc2 serine recombinase and the RDF comprises the amino acid sequence of SEQ ID NO:53 or SEQ ID NO:54.
BRIEF DESCRIPTION OF THE DRAWINGS
[00031] FIG. 1 shows a map of the vector ID525 containing the SynNuc1 synthetic nuclease showing the hAsCpf1 DNA binding domain fused to the MAD7 nuclease domain, an NLS on the amino terminal portion of the fusion, an NLS on the carboxy terminal portion of the fusion, and an HA-tag at the carboxy terminal portion of the fusion. Amp: ampicillin resistance marker; 35S pro: 35S promoter sequence; crRNA: CRISPR RNA sequence; Nos term: nopaline synthase termination sequence; AtU6 pro: Arabidopsis thaliana small nuclear RNA (U6-26 snRNA) promoter sequence; DR: direct repeat; LsPDS-G2: gRNA; rep origin: replication origin for plasmid.
[00032] FIG. 2 shows a map of the vector ID526 containing the SynNuc2 synthetic nuclease showing the MAD7 DNA binding domain fused to the Cpf1 nuclease domain, an NLS on the amino terminal portion of the fusion, an NLS on the carboxy terminal portion of the fusion, and an HA-tag at the carboxy terminal portion of the fusion. Amp: ampicillin resistance marker; 35S pro: 35S promoter sequence; crRNA: CRISPR RNA sequence; Nos term: nopaline synthase termination sequence; AtU6 pro: Arabidopsis thaliana small nuclear RNA (U6-26 snRNA) promoter sequence; DR: direct repeat; LsPDS-G2: gRNA; rep origin: replication origin for plasmid.
[00033] FIG. 3 shows a map of the vector ID524 containing the AsCpf1 wild type nuclease showing the native hAsCpf1 nuclease, an NLS on the amino terminal portion of the fusion, an NLS on the carboxy terminal portion of the fusion, and an HA-tag at the carboxy terminal portion of the fusion. Amp: ampicillin resistance marker; 35S pro: 35S promoter sequence; crRNA: CRISPR RNA sequence; Nos term: nopaline synthase termination sequence; AtU6 pro: Arabidopsis thaliana small nuclear RNA (U6-26 snRNA) promoter sequence; DR: direct repeat; LsPDS-G2: gRNA; rep origin: replication origin for plasmid.
[00034] FIG. 4 shows a map of the vector ID536 containing the SynNuc1 synthetic nuclease.
[00035] FIG. 5 shows representative sequences of the PDS gene edited by the SynNuc1 synthetic nuclease. The black arrow represents the gRNA sequence for the PDS gene and the top sequence is the wild type PDS sequence.
[00036] FIG. 6 shows the results of gene editing in a PDS gene. Nucleases used were those on vector ID414 and ID525 as shown in boxes for each.
[00037] FIG. 7 shows a schematic of a construct that may be used for constitutive gene editing with controllable self-removal of the construct from the chromosome. Shown is a gene expression system for genome editing and scarless excision of a transgene that had been integrated into the chromosome via a serine recombinase; Pseudo-ATTR, pseudo-ATTR site from serine recombinase-mediated integration; Pseudo-ATTL, pseudo-ATTL site from serine recombinase-mediated integration; Pro., promoter; NPTII, neomycin phosphotransferase; Ter, Terminator; Nuclease, gene editing endonuclease; gRNAs, gRNAs for constitutive expression and editing; Switch, controllable gene switch or inducible promoter; SR/RDF, serine recombinase and cognate Recombination Directionality Factor that directs excision of gene expression system from chromosome.
[00038] FIG. 8A shows exemplary results of a gene editing experiment in the lettuce PDS gene by SynNuc1 and SynNuc2 as described in Table 3.
[00039] FIG. 8B shows exemplary results of a gene editing experiment in the lettuce PDS gene by MAD7 and two different constructs of Cpf1, as described in Table 3.
[00040] FIG. 9A shows exemplary results of a further gene editing experiment in the lettuce PDS gene by SynNuc1 and SynNuc2 as described in Table 3.
[00041] FIG. 9B shows exemplary results of a further gene editing experiment in the lettuce PDS gene by MAD7 and two different constructs of Cpf1, as described in Table 3.
[00042] FIG. 10A shows exemplary results of a gene editing experiment in the lettuce PPO-B, PPO-D, PPO-E, PPO-G, and PPO-S genes by SynNuc1 constructs containing either a CaMV 35S promoter, AtUBQ10 promoter, or potato St-LS1 IV2 intron, shown as Constructs 1-3 in Table 4.
[00043] FIG. 10B shows exemplary results of a further gene editing experiment in the lettuce PPO-B, PPO-D, PPO-E, PPO-G, and PPO-S genes by SynNuc1 constructs containing either a CaMV 35S promoter, AtUBQ10 promoter, or potato St-LS1 IV2 intron, shown as Constructs 1-3 in Table 4.
[00044] FIG. 10C shows exemplary results of a gene editing experiment in the lettuce PPO-B, PPO-D, PPO-E, PPO-G, and PPO-S genes by SynNuc1 constructs containing either an AtUBQ10 promoter and no intron or an AtUBQ10 promoter and a potato St-LS1 IV2 intron, shown as Constructs 3 and 4 in Table 4.
[00045] FIG. 11A shows exemplary results of a gene editing experiment in the lettuce PDS gene by Cpf1, with genomic DNA sequenced from calli developed from Cpf1-transfected protoplasts.
[00046] FIG. 11B shows the amino acid sequence encoded by the edited gene in FIG. 11A, which contains a three-amino acid deletion in exon 3.
[00047] FIG. 12 shows a summary of the gene editing events and efficiency based on the results in FIGS. 11A, 11B, and 13.
[00048] FIG. 13 shows exemplary results of a gene editing experiment in the lettuce PDS gene by SynNuc1, with genomic DNA sequenced from calli developed from SynNuc1-transfected protoplasts.
DETAILED DESCRIPTION OF THE INVENTION
[00049] The invention relates to synthetic nucleases that may be used for gene editing. The nucleases have DNA binding domains derived from one native nuclease and a nuclease domain derived from a different native nuclease. The two domains are fused together to form a synthetic or chimeric nuclease that has both DNA binding and nuclease functions that are active in plant, animal and bacterial cells.
[00050] All journal articles or other publications, patents and patent applications referred to herein are expressly incorporated by reference as if each individual journal article, publication, patent, or patent application was specifically and individually indicated to be incorporated by reference. In the event of a conflict between any disclosure in the present application, compared to a disclosure incorporated by reference, the disclosure in the present application controls.
[00051] In this disclosure, a number of terms and abbreviations are used. The following definitions are provided and should be helpful in understanding the scope and practice of the present invention.
[00052] The term “substantially free” means that a composition comprising “A” (where “A” is a single protein, DNA molecule, vector, recombinant host cell, etc.) is substantially free of “B” (where “B” comprises one or more contaminating proteins, DNA molecules, vectors, etc.) when at least about 75% by weight of the proteins, DNA, vectors (depending on the category of species to which A and B belong) in the composition is “A”. Preferably, “A” comprises at least about 90% by weight of the A+B species in the composition, most preferably at least about 99% by weight. It is also preferred that a composition, which is substantially free of contamination, contain only a single molecular weight species having the activity or characteristic of the species of interest.
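The weight thresholds in this definition can be illustrated with a short sketch. The function name, the category labels, and the decision to apply the 75%, 90% and 99% thresholds exactly (ignoring the latitude implied by "about") are assumptions made for illustration only; they are not part of the disclosure.

```python
def purity_category(weight_a: float, weight_b: float) -> str:
    """Classify a composition by the weight percent of species A among A+B.

    Hypothetical helper: thresholds follow the 75 / 90 / 99 percent figures
    in the definition, applied without any "about" tolerance.
    """
    total = weight_a + weight_b
    if total <= 0:
        raise ValueError("total weight of A and B must be positive")
    percent_a = 100.0 * weight_a / total
    if percent_a >= 99.0:
        return "most preferred"
    if percent_a >= 90.0:
        return "preferred"
    if percent_a >= 75.0:
        return "substantially free"
    return "not substantially free"
```

For instance, a preparation with 80 mg of "A" and 20 mg of "B" would fall in the lowest qualifying category under these assumptions.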
[00053] The term “isolated” for the purposes of the present invention designates a biological material (e.g., nucleic acid or protein) that has been removed from its original environment (the environment in which it is naturally present).
[00054] For example, a polynucleotide present in the natural state in a plant or an animal is not isolated. The same polynucleotide is "isolated" if it is separated from the adjacent nucleic acids in which it is naturally present. The term “purified” does not require the material to be present in a form exhibiting absolute purity, exclusive of the presence of other compounds. It is rather a relative definition.
[00055] A polynucleotide is in the “purified” state after purification of the starting material or of the natural material by at least one order of magnitude, preferably 2 or 3 and preferably 4 or 5 orders of magnitude.
[00056] A “nucleic acid” or “polynucleotide” is a polymeric compound comprised of covalently linked subunits called nucleotides. Nucleic acid includes polyribonucleic acid (RNA) and polydeoxyribonucleic acid (DNA), both of which may be single-stranded or double-stranded. DNA includes but is not limited to cDNA, genomic DNA, plasmid DNA, synthetic DNA, and semi-synthetic DNA. DNA may be linear, circular, or supercoiled.
[00057] A “nucleic acid molecule” refers to the phosphate ester polymeric form of ribonucleosides (adenosine, guanosine, uridine or cytidine; “RNA molecules”) or deoxyribonucleosides (deoxyadenosine, deoxyguanosine, deoxythymidine, or deoxycytidine; “DNA molecules”), or any phosphoester analogs thereof, such as phosphorothioates and thioesters, in either single stranded form, or a double-stranded helix. Double stranded DNA-DNA, DNA-RNA and RNA-RNA helices are possible. The term nucleic acid molecule, and in particular DNA or RNA molecule, refers only to the primary and secondary structure of the molecule, and does not limit it to any particular tertiary forms. Thus, this term includes double-stranded DNA found, inter alia, in linear or circular DNA molecules (e.g., restriction fragments), plasmids, and chromosomes. In discussing the structure of particular double-stranded DNA molecules, sequences may be described herein according to the normal convention of giving only the sequence in the 5' to 3' direction along the non-transcribed strand of DNA (i.e., the strand having a sequence homologous to the mRNA). A “recombinant DNA molecule” is a DNA molecule that has undergone a molecular biological manipulation.
[00058] The term “fragment” when referring to a polynucleotide will be understood to mean a nucleotide sequence of reduced length relative to the reference nucleic acid and comprising, over the common portion, a nucleotide sequence identical to the reference nucleic acid. Such a nucleic acid fragment according to the invention may be, where appropriate, included in a larger polynucleotide of which it is a constituent. Such fragments comprise, or alternatively consist of, oligonucleotides ranging in length from at least 8, 10, 12, 15, 18, 20 to 25, 30, 40, 50, 70, 80, 100, 200, 500, 1000 or 1500 consecutive nucleotides of a nucleic acid according to the invention.
[00059] As used herein, an “isolated nucleic acid fragment” is a polymer of RNA or DNA that is single- or double-stranded, optionally containing synthetic, non-natural or altered nucleotide bases. An isolated nucleic acid fragment in the form of a polymer of DNA may be comprised of one or more segments of cDNA, genomic DNA or synthetic DNA.
[00060] A “gene” refers to an assembly of nucleotides that encode a polypeptide, and includes cDNA and genomic DNA nucleic acids. “Gene” also refers to a nucleic acid fragment that expresses a specific protein or polypeptide, optionally including regulatory sequences preceding (5' non-coding sequences) and following (3' non-coding sequences) the coding sequence. “Native gene” refers to a gene as found in nature with its own regulatory sequences. “Chimeric gene” refers to any gene that is not a native gene, comprising regulatory and/or coding sequences that are not found together in nature. Accordingly, a chimeric gene may comprise regulatory sequences and coding sequences that are derived from different sources, or regulatory sequences and coding sequences derived from the same source, but arranged in a manner different than that found in nature. A chimeric gene may comprise coding sequences derived from different sources and/or regulatory sequences derived from different sources. “Endogenous gene” refers to a native gene in its natural location in the genome of an organism. A “foreign” gene or “heterologous” gene refers to a gene not normally found in the host organism, but that is introduced into the host organism by gene transfer. Foreign genes can comprise native genes inserted into a non-native organism, or chimeric genes. A “transgene” is a gene that has been introduced into the genome by a transformation procedure.
[00061] “Heterologous” DNA refers to DNA not naturally located in the cell, or in a chromosomal site of the cell. Preferably, the heterologous DNA includes a gene or polynucleotides foreign to the cell.
[00062] “Transformation” refers to the transfer of a nucleic acid fragment into the genome of a host organism, resulting in genetically stable inheritance. Host organisms containing the transformed nucleic acid fragments are referred to as “transgenic” or “recombinant” or “transformed” organisms.
[00063] “Inducible gene regulator” means a genetic element that is capable of regulating gene expression of a gene that is operatively linked to it in a controllable manner. Examples of inducible gene regulators include, but are not limited to gene switches and inducible promoters.
[00064] “Promoter” refers to a DNA sequence capable of controlling the expression of a coding sequence or functional RNA. In general, a coding sequence is located 3' to a promoter sequence. Promoters may be derived in their entirety from a native gene, or be composed of different elements derived from different promoters found in nature, or even comprise synthetic DNA segments. It is understood by those skilled in the art that different promoters may direct the expression of a gene in different tissues or cell types, or at different stages of development, or in response to different environmental or physiological conditions. Promoters that cause a gene to be expressed in most cell types at most times are commonly referred to as “constitutive promoters”. Promoters that cause a gene to be expressed in a specific cell type are commonly referred to as “cell-specific promoters” or “tissue-specific promoters”. Promoters that cause a gene to be expressed at a specific stage of development or cell differentiation are commonly referred to as “developmentally-specific promoters” or “cell differentiation-specific promoters”. Promoters that are induced and cause a gene to be expressed following exposure or treatment of the cell with an agent, biological molecule, chemical, ligand, light, or the like that induces the promoter are commonly referred to as “inducible promoters” or “regulatable promoters”. It is further recognized that since in most cases the exact boundaries of regulatory sequences have not been completely defined, DNA fragments of different lengths may have identical promoter activity.
[00065] A “promoter sequence” is a DNA regulatory region capable of binding RNA polymerase in a cell and initiating transcription of a downstream (3' direction) coding sequence. For purposes of defining the present invention, the promoter sequence is bounded at its 3' terminus by the transcription initiation site and extends upstream (5' direction) to include the minimum number of bases or elements necessary to initiate transcription at levels detectable above background. Within the promoter sequence will be found a transcription initiation site (conveniently defined, for example, by mapping with nuclease S1), as well as protein binding domains (consensus sequences) responsible for the binding of RNA polymerase or transcription factors.
[00066] A coding sequence is “under the control” of transcriptional and translational control sequences in a cell when RNA polymerase transcribes the coding sequence into mRNA, which is then RNA spliced (if the coding sequence contains introns) and translated into the protein encoded by the coding sequence.
[00067] “Transcriptional and translational control sequences” are DNA regulatory sequences, such as promoters, enhancers, terminators, and the like, that provide for the expression of a coding sequence in a host cell. In eukaryotic cells, polyadenylation signals are control sequences.
[00068] As used herein, “gene switch” refers to the combination of a response element that activates transcription upon contact with a chemical ligand or environmental condition and is associated with a promoter, and a switch system (examples of which are described herein) which, in the presence of one or more ligands, modulates the expression of a gene into which the response element and promoter are incorporated.
[00069] As used herein, “EcR-based gene switch” is a chimeric (i.e., three-part heterologous) polypeptide comprised of a transcriptional transactivator domain, a DNA-binding domain and an EcR (ecdysone receptor-derived) ligand binding domain. The ligand binding domain may be split into a bipartite arrangement.
[00070] As used herein a “protein” is a polypeptide that performs a structural or functional role in a living cell.
[00071] An “isolated polypeptide” or “isolated protein” or “isolated peptide” is a polypeptide or protein that is substantially free of those compounds that are normally associated therewith in its natural state (e.g., other proteins or polypeptides, nucleic acids, carbohydrates, lipids). “Isolated” is not meant to exclude artificial or synthetic mixtures with other compounds, or the presence of impurities which do not interfere with biological activity, and which may be present, for example, due to incomplete purification, addition of stabilizers, or compounding into a pharmaceutically acceptable preparation.
[00072] As used herein “reference sequence” means a nucleic acid or amino acid used as a comparator for another nucleic acid or amino acid, respectively, when determining sequence identity.
[00073] As used herein “percent identity” or “% identical” refers to the exactness of a match between a reference sequence and a sequence being compared to it when optimally aligned.
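As an illustration of this definition, percent identity over an existing alignment can be sketched as follows. This is a minimal sketch under stated assumptions: the two sequences are already optimally aligned and padded to equal length with a gap character, and the denominator is the full alignment length. Other denominators (for example, the reference length) are also in common use, and the disclosure does not specify one. The function name and the "-" gap symbol are hypothetical.

```python
def percent_identity(reference: str, candidate: str, gap: str = "-") -> float:
    """Percent of aligned columns in which both sequences carry the same residue.

    Assumes the inputs are already aligned (equal length, gaps marked with
    `gap`). Columns containing a gap never count as a match.
    """
    if len(reference) != len(candidate):
        raise ValueError("aligned sequences must have equal length")
    if not reference:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1 for r, c in zip(reference, candidate) if r == c and r != gap
    )
    return 100.0 * matches / len(reference)
```

Under these assumptions, an aligned pair that differs at one of four columns has a percent identity of 75.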
[00074] As used herein, “operatively linked” DNA segments, describes that one polynucleotide sequence is joined to another so that the polynucleotides are in association for transcriptional and/or translation control and can be expressed in a suitable host cell.
[00075] The term “about” typically encompasses a range up to 10% of a stated value.
[00076] As used herein, “NLS” or “nuclear localization sequence” refers to a peptide sequence that directs a protein to be translocated to the cell nucleus. Of particular interest are fusions of the synthetic nuclease with an NLS, because the NLS “tags” the synthetic nuclease for import into the nucleus by nuclear transport. The NLS typically consists of one or more short sequences of positively charged lysines or arginines. A consensus sequence for one family of NLS is K-K/R-X-K/R. Examples of NLS include but are not limited to MGLDSTAPKK KRKVGIHGVP AA (SEQ ID NO:7), KRPAATKKAG QAKKKK (SEQ ID NO:8), SV40 Large T-Antigen, nucleoplasmin (AVKRPAATKKAGQAKKKKLD (SEQ ID NO:36)), EGL-13 (MSRRRKANPTKLSENAKKLAKEVEN (SEQ ID NO:37)), c-Myc (PAAKRVKLD (SEQ ID NO:38)) and TUS-protein (KLKIKRPVK (SEQ ID NO:39)).
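The consensus K-K/R-X-K/R can be expressed as a simple pattern search. The sketch below is illustrative only: the function name is hypothetical, X is taken to mean any single residue, and a match indicates only that the four-residue motif is present, not that the peptide functions as an NLS.

```python
import re

# K, then K or R, then any residue, then K or R (the K-K/R-X-K/R consensus).
NLS_CORE = re.compile(r"K[KR].[KR]")


def find_nls_core(protein: str) -> int:
    """Return the 0-based start of the first consensus match, or -1 if none.

    Spaces (as used to group residues in sequence listings) are removed and
    the sequence is upper-cased before searching.
    """
    cleaned = protein.replace(" ", "").upper()
    match = NLS_CORE.search(cleaned)
    return match.start() if match else -1
```

Applied to the sequences listed above, the motif is found in SEQ ID NO:7 and in the c-Myc sequence, whereas the TUS-protein sequence does not contain this particular four-residue pattern, consistent with the consensus describing only one family of NLS.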
DNA Binding Domains
[00077] The synthetic nucleases of the invention contain a DNA binding domain derived from any native nuclease. Examples include, but are not limited to Cas9, Cpf1 and MAD7. In some embodiments, the synthetic nuclease comprises a Cpf1 DNA binding domain fused to a nuclease domain of a non-Cpf1 nuclease. In some embodiments, the Cpf1 DNA binding domain comprises the amino acid sequence of SEQ ID NO:32. In some embodiments, the amino acid sequence of SEQ ID NO:32 is encoded by a polynucleotide comprising the nucleic acid sequence of SEQ ID NO:14. In some embodiments, the synthetic nuclease comprises a MAD7 DNA binding domain fused to a nuclease domain of a non-MAD7 nuclease. In some embodiments, the MAD7 DNA binding domain comprises the amino acid sequence of SEQ ID NO:34. In some embodiments, the amino acid sequence of SEQ ID NO:33 is encoded by a polynucleotide comprising the nucleic acid sequence of SEQ ID NO:12. In some embodiments, the DNA binding domain may further comprise an NLS sequence, such as, but not limited to an amino acid sequence of SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, or SEQ ID NO:38.
Nuclease Domains
[00078] The synthetic nucleases of the invention contain a nuclease domain derived from a native nuclease such as Cpf1 or MAD7. In some embodiments, the synthetic nuclease comprises a Cpf1 nuclease domain fused to a DNA binding domain of a non-Cpf1 nuclease. In some embodiments, the Cpf1 nuclease domain comprises the amino acid sequence of SEQ ID NO:34. In some embodiments, the amino acid sequence of SEQ ID NO:34 is encoded by a polynucleotide comprising the nucleic acid sequence of SEQ ID NO:13. In some embodiments, the synthetic nuclease comprises a MAD7 nuclease domain fused to a DNA binding domain of a non-MAD7 nuclease. In some embodiments, the MAD7 nuclease domain comprises the amino acid sequence of SEQ ID NO:32. In some embodiments, the amino acid sequence of SEQ ID NO:32 is encoded by a polynucleotide comprising the nucleic acid sequence of SEQ ID NO:15. In some embodiments, the DNA binding domain may further comprise an NLS sequence, such as, but not limited to an amino acid sequence of SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:36, SEQ ID NO:37, SEQ ID NO:38, or SEQ ID NO:39.
Synthetic nuclease examples
[00079] The invention provides synthetic nucleases in which a DNA binding domain of a native nuclease is operably linked to a nuclease domain of a heterologous native nuclease. In this way, the DNA binding domain of choice may be selected for binding a particular DNA sequence of interest and this may be paired with a heterologous nuclease domain of choice to provide specific double-strand breakage in the DNA of interest, such as a sequence to be edited with the aid of a guide RNA (gRNA). The DNA binding domain may be derived from any native nuclease provided that the domain retains DNA binding function. One of skill in the art would readily be able to determine the amino acid sequence necessary in a given DNA binding domain that would retain DNA binding activity through routine binding assays.
[00080] The DNA binding domain of a native nuclease may then be operably linked to a nuclease domain of a heterologous native nuclease to provide double stranded DNA cutting. The nuclease domain may be derived from any native nuclease provided that the domain retains nuclease activity. One of skill in the art would readily be able to determine the amino acid sequence necessary in a given nuclease domain that would retain nuclease activity through routine assays to determine DNA cleaving. One of skill in the art would further readily be able to determine whether a particular construct with a DNA binding domain and a nuclease domain, so operatively linked would retain both DNA binding and nuclease activity.
[00081] In some embodiments, the synthetic nuclease of the invention comprises a Cpf1 DNA binding domain fused to a MAD7 nuclease domain (Syn1 or SynNuc1). In some embodiments, the fusion of the Cpf1 DNA binding domain and the MAD7 nuclease domain comprises the amino acid sequence of SEQ ID NO:28. In some embodiments, the amino acid sequence of SEQ ID NO:28 is encoded by a polynucleotide comprising the nucleic acid sequence of SEQ ID NO:27. In some embodiments, the synthetic nuclease of the invention comprises a MAD7 DNA binding domain fused to a Cpf1 nuclease domain (Syn2 or SynNuc2). In some embodiments, the fusion of the MAD7 DNA binding domain and the Cpf1 nuclease domain comprises the amino acid sequence of SEQ ID NO:30. In some embodiments, the amino acid sequence of SEQ ID NO:30 is encoded by a polynucleotide comprising the nucleic acid sequence of SEQ ID NO:29.
[00082] In some embodiments, the synthetic nuclease further comprises at least one NLS. In some embodiments, the NLS comprises an amino acid sequence of SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, or SEQ ID NO:38.
[00083] In some embodiments, the synthetic nuclease further comprises a molecular tag sequence. In some embodiments, the molecular tag sequence comprises the amino acid sequence of SEQ ID NO: 9. In some embodiments, the synthetic nuclease comprises both one or more NLS sequences and one or more molecular tags. In some embodiments, the synthetic nuclease comprises two NLS sequences and a molecular tag sequence.
[00084] In some embodiments, the synthetic nuclease comprises a Cpf1 DNA binding domain, a MAD7 nuclease domain, 2 NLS sequences and a molecular tag sequence. In some embodiments, the synthetic nuclease comprises the amino acid sequence of SEQ ID NO:2. In some embodiments, the synthetic nuclease of SEQ ID NO:2 is encoded by a polynucleotide comprising the nucleic acid sequence of SEQ ID NO:1.
[00085] In some embodiments, the synthetic nuclease comprises a MAD7 DNA binding domain, a Cpf1 nuclease domain, 2 NLS sequences and a molecular tag sequence. In some embodiments, the synthetic nuclease comprises the amino acid sequence of SEQ ID NO:4. In some embodiments, the synthetic nuclease of SEQ ID NO:4 is encoded by a polynucleotide comprising the nucleic acid sequence of SEQ ID NO:3.
Polynucleotides
[00086] The invention also provides polynucleotides encoding synthetic nucleases. The polynucleotides comprise a nucleic acid sequence encoding a DNA binding domain of a nuclease and a nuclease domain of a heterologous nuclease. In some embodiments, the DNA binding domain is from a MAD7 nuclease. In some embodiments, the polynucleotide encodes a MAD7 amino acid sequence of SEQ ID NO:33. In some embodiments the polynucleotide encoding the MAD7 DNA binding domain comprises the nucleic acid sequence of SEQ ID NO:12. In some embodiments, the DNA binding domain is from a Cpf1 nuclease. In some embodiments, the polynucleotide encodes a Cpf1 DNA binding domain amino acid sequence of SEQ ID NO:31. In some embodiments the polynucleotide encoding the Cpf1 DNA binding domain comprises the nucleic acid sequence of SEQ ID NO:14. In some embodiments the polynucleotide encoding the DNA binding domain also comprises a nucleic acid sequence encoding an NLS. In some embodiments, the nucleic acid sequence encodes an NLS comprising an amino acid sequence of SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, or SEQ ID NO:38. In some embodiments the polynucleotide encodes a DNA binding domain of MAD7 and an NLS and comprising the amino acid sequence of SEQ ID NO:40. In some embodiments, this polynucleotide has the nucleic acid sequence of SEQ ID NO:39. In some embodiments the polynucleotide encodes a DNA binding domain of Cpf1 and an NLS and comprising the amino acid sequence of SEQ ID NO:42. In some embodiments, this polynucleotide has the nucleic acid sequence of SEQ ID NO:41.
[00087] The polynucleotide encoding the MAD7 DNA binding domain may be operatively linked to a polynucleotide encoding a Cpf1 nuclease domain. In some embodiments, the polynucleotide encodes a polypeptide comprising the amino acid sequence of SEQ ID NO:30. In some embodiments the polynucleotide comprises the nucleic acid sequence of SEQ ID NO:29. The polynucleotide encoding the Cpf1 DNA binding domain may be operatively linked to a polynucleotide encoding a MAD7 nuclease domain. In some embodiments, the polynucleotide encodes a polypeptide comprising the amino acid sequence of SEQ ID NO:28. In some embodiments the polynucleotide comprises the nucleic acid sequence of SEQ ID NO:27.
[00088] The polynucleotide encoding a MAD7 DNA binding domain and a Cpf1 nuclease domain may further comprise nucleic acid sequences encoding at least one NLS and may further include a molecular tag sequence. In some embodiments, the tag sequence encodes an HA-tag. In some embodiments, the polynucleotide encoding the MAD7 DNA binding domain and Cpf1 nuclease domain further comprises two NLS sequences and an HA-tag. In some embodiments, the polynucleotide encodes an amino acid sequence of SEQ ID NO:4. In some embodiments, the polynucleotide comprises the nucleic acid sequence of SEQ ID NO:3.
[00089] The polynucleotide encoding a Cpf1 DNA binding domain and a MAD7 nuclease domain may further comprise nucleic acid sequences encoding at least one NLS and may further include a molecular tag sequence. In some embodiments, the tag sequence encodes an HA-tag. In some embodiments, the polynucleotide encoding the Cpf1 DNA binding domain and MAD7 nuclease domain further comprises two NLS sequences and an HA-tag. In some embodiments, the polynucleotide encodes an amino acid sequence of SEQ ID NO:2. In some embodiments, the polynucleotide comprises the nucleic acid sequence of SEQ ID NO:1.
[00090] In some embodiments, the disclosure provides a vector comprising a polynucleotide described herein. In some embodiments, the vector comprises a polynucleotide encoding a DNA binding domain (e.g., a Cpf1 DNA binding domain such as SEQ ID NO:31 or SEQ ID NO:14) and a polynucleotide encoding a nuclease domain (e.g., a MAD7 nuclease domain such as SEQ ID NO:28 or SEQ ID NO:27). In some embodiments, the vector comprises an intron sequence positioned between the DNA binding domain and the nuclease domain. In some embodiments, a vector comprising the intron sequence provides improved expression of a fusion protein comprising the DNA binding domain and the nuclease domain compared with a vector that does not comprise an intron sequence between the DNA binding domain and the nuclease domain. In some embodiments, the intron sequence is a potato St-LS1 IV2 intron, a COR15A intron, a UBQ10 intron, a COR15a6L intron, or a COR15allL intron. In some embodiments, the intron sequence is a potato St-LS1 IV2 intron.
[00091] In some embodiments, the vector further comprises an NLS sequence as described herein (e.g., SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, or SEQ ID NO:38). In some embodiments, the vector further comprises a molecular tag sequence as described herein (e.g., SEQ ID NO:3). In some embodiments, the vector comprises a polynucleotide of SEQ ID NO:1. In some embodiments, the vector further comprises a promoter sequence. Exemplary promoters are provided herein.
[00092] In some embodiments, the vector is suitable for transfection into a bacterial cell. In some embodiments, the vector is suitable for transfection into a mammalian cell, e.g., an animal cell. In some embodiments, the vector is suitable for transfection into a plant cell.
Methods of gene editing
[00093] The nucleases may be used in place of well-known gene editing nucleases such as, for example but not by way of limitation, meganucleases, zinc finger nucleases (ZFNs), transcription activator-like effector-based nucleases (TALEN), and the clustered regularly interspaced short palindromic repeats (CRISPR) with Cas9 nuclease, and Cpf1. The nucleases of the invention may be used in any gene editing protocol known in the art in place of the nuclease used for gene editing.
[00094] In some examples, the gene editing protocol would involve gRNA and a nuclease of the invention cutting the host cell DNA as a double-stranded cut and either Non-Homologous End-Joining (NHEJ) or Homology Directed Repair (HDR) mechanisms of repair. The former is a homology-independent pathway involving only a few complementary bases aligning for the re-ligation of two ends and is fairly nonspecific in terms of the alterations introduced as a result. HDR is more specific and allows the user to direct very specific alterations in the genome by altering only a small number of nucleotide changes in an otherwise identical portion of the damaged DNA to be repaired.
Controllable gene editing
[00095] The nucleases of the invention may be expressed in a polynucleotide construct in which the nuclease of the invention and/or gRNAs of interest are expressed under the control of an inducible promoter or gene switch. In this fashion, “on-demand” gene editing may be effected by inducing the promoter or turning on the gene switch to allow transcription of the nucleic acid encoding the nuclease of the invention and/or the gRNA(s) of interest.
[00096] In some embodiments, the promoter is derived from bacteria, e.g., a bacterial promoter. Non-limiting examples of bacterial promoters include T7 promoter, Sp6 promoter, lac promoter, araBAD promoter, trp promoter, Ptac promoter, and the like. In some embodiments, the promoter is derived from a eukaryotic system, e.g., a eukaryotic promoter. In some embodiments, the promoter is a mammalian promoter. In some embodiments, the promoter is an insect promoter. Non-limiting examples of mammalian promoters include simian virus 40 early promoter (SV40), cytomegalovirus immediate-early promoter (CMV), human Ubiquitin C promoter (UBC), human elongation factor 1α promoter (EF1A), mouse phosphoglycerate kinase 1 promoter (PGK), chicken β-actin promoter coupled with CMV early enhancer (CAGG), and the like. Non-limiting examples of insect promoters include copia transposon promoter (COPIA), actin 5C promoter (ACT5C), and the like. In some embodiments, the promoter is a doxycycline-inducible promoter, e.g., reverse tetracycline-controlled transactivator (rtTA) or tetracycline-responsive element promoter (TRE). In some embodiments, the promoter is derived from a plant, e.g., a plant promoter. Non-limiting examples of plant promoters include Cauliflower mosaic virus (CaMV) 35S, opine promoters, plant ubiquitin (Ubi), rice actin 1 (Act-1), maize alcohol dehydrogenase 1 (Adh-1), Arabidopsis thaliana small nuclear RNA (U6-26 snRNA) promoter, Arabidopsis thaliana ubiquitin 10 promoter (AtUBQ10), and the like. In some embodiments, the promoter is the CaMV 35S promoter, the U6-26 snRNA promoter, or the AtUBQ10 promoter.
Polycistronic gene editing
[00097] The nucleases of the invention may be incorporated into a vector for gene editing in which one or more gRNAs are encoded by the vector to target different genes in the host cell. The gRNA encoding sequences may be single and each under the control of a promoter, or in a polycistronic array in which one promoter leads to the expression of multiple gRNAs targeting different genes. The expression of multiple gRNAs allows multiple genes to be edited simultaneously by the same nuclease of the invention. A working example of such construct is provided herein.
[00098] The invention provides genome editing within a cell. The cells may be prokaryotic or eukaryotic cells. Examples of the cells which may be edited using the method of the invention include, but are not limited to, bacterial cells, yeast cells, plant cells (including monocots and dicots), mammalian cells (e.g., human cells, monkey cells, cattle cells, dog cells, cat cells, sheep cells, horse cells, camel cells, llama cells, alpaca cells, goat cells, pig cells and the like), and other animal cells, including vertebrate and invertebrate cells (e.g., insect cells and fish cells).
Transformation of cells using a serine recombinase for gene editing and scarless excision
[00099] A polynucleotide construct comprising a sequence encoding a synthetic nuclease of the invention under the control of a promoter and an attR and/or attL sequence may be introduced into a cell using a serine recombinase. The serine recombinase may direct insertion of the polynucleotide construct into the genome at a pseudo-attP or pseudo-attB site or (when both attB and attP are present on the construct) the construct is introduced randomly or by homologous recombination as determined and designed by the user. The polynucleotide construct may further comprise sequences for one or more gRNAs for editing the genome of the host cell. Constitutive expression of the gRNAs and the nuclease of the invention provides gene editing at loci of interest. Alternatively, the gRNAs and/or nuclease may be under the control of an inducible promoter or gene switch to provide “on-demand” gene editing.
[000100] The host cell may be an animal cell or plant cell.
[000101] In some embodiments, the polynucleotide constructs can be removed by co-expression of a serine recombinase and a cognate Recombinase Directionality Factor (RDF), either separately or as a fusion protein. Examples of serine recombinases include, but are not limited to Mycobacterium avium phage Bxb1 (Accession ID: NP_075302.1); Streptococcus pyogenes phage 370.1 (Accession ID: WP_010922052.1); Bacillus subtilis phage SPβc2 (Accession ID: WP_004399105.1); Listeria monocytogenes phage A118 (Accession ID: WP_015967157.1); and Streptomyces phage ΦC31 (Accession ID: WP_107426086.1).
[000102] In a particular embodiment, the invention provides a method of altering expression of a gene or genes in a plant cell comprising introducing into the plant cell a polynucleotide construct comprising an att site and gRNA for altering the expression of a gene(s) under the control of a promoter and a polynucleotide encoding a nuclease of the invention to effect gene editing using the gRNA(s). The nuclease may be under the control of an inducible promoter or a gene switch to regulate expression of the nuclease. In some embodiments, both the gRNA(s) and the nuclease are constitutively expressed. In the method of the invention, the polynucleotide construct is integrated into the plant genome at an att pseudosite using a serine recombinase that is co-introduced into the cell (either as a polynucleotide sequence operably linked to a promoter or as a polypeptide). The serine recombinase effects the integration of the polynucleotide construct comprising the att site at the pseudosite in the genome. The serine recombinase method of transformation can accommodate large pieces of DNA and obviates the need for plant pest sequences such as Agrobacterium sequences. In some embodiments, the polynucleotide construct comprises an attB site. In other embodiments, the polynucleotide construct comprises an attP site. The serine recombinases that are useful for the invention include, but are not limited to Mycobacterium avium Bxb1, Streptococcus pyogenes phage 370.1, Bacillus subtilis phage SPβc2, Listeria monocytogenes phage A118, and Streptomyces phage ΦC31.
[000103] In the method, the construct is integrated into the plant cell in a unidirectional manner. Thereafter, the expression of the gRNA(s) and the nuclease of the invention edit the plant cell as desired. In some embodiments, the method further includes introduction of a cognate RDF for the serine recombinase used to introduce the polynucleotide construct. This may be done as a polynucleotide for expressing the RDF under the control of a promoter (including a constitutive promoter, inducible promoter or gene switch) and the co-expression of the cognate serine recombinase leads to the excision of the polynucleotide construct which leaves no trace of the polynucleotide construct behind, enabling the scarless genomic editing of a plant cell without any residual plant pest sequences. In some embodiments, the serine recombinase and the cognate RDF are expressed as a fusion protein such as that provided as SEQ ID NO:
[000104] In particular embodiments, the polynucleotide construct comprises an attP site and is integrated with a Bacillus subtilis phage SPβc2 serine recombinase. The nuclease used may comprise the sequences of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:28, or SEQ ID NO:30. The RDF may comprise the amino acid sequence of SEQ ID NO:53 or SEQ ID NO:54. In some embodiments, the nucleic acid sequence for the RDF is codon-optimized for plants, such as for example, in SEQ ID NO:55.
[000105] In a particular example, an Editing Cassette comprising a promoter operably linked to a polynucleotide encoding a nuclease of the invention and gRNAs is part of a gene construct to be introduced into a cell. The portion encoding the gRNAs may also be operably linked to a promoter. Optionally, a promoter is operably linked to the nuclease encoding sequence and a second promoter is operably linked to the portion encoding the gRNAs. The promoters may be the same or different. Termination sequences may be provided operably linked to the portion encoding the nuclease and/or the gRNAs. In some embodiments, the promoters are constitutive promoters.
[000106] The construct would also contain an Excision Cassette in which an inducible promoter or gene switch is operably linked to a polynucleotide encoding a serine recombinase and a cognate RDF. The serine recombinase and RDF may be separately expressed or expressed as a fusion protein, such as, but not limited to the fusion protein shown in SEQ ID NO:56. Such construct could also contain additional termination sequences and selectable markers operably linked to promoters. In some embodiments, the constructs would also contain the attP or attB site for the serine recombinase such that when transfected into the cell, the serine recombinase would direct integration of the construct into a pseudosite for the serine recombinase. In some embodiments, the pseudosite is a pseudo-attP site. In other embodiments the pseudosite is a pseudo-attB site. The serine recombinase may be, for example, but not by way of limitation, an SF370.1 recombinase, an SPβc2 recombinase, a Bxb1 recombinase, a ΦC31 recombinase, or an A118 recombinase. In particular embodiments, the construct comprises an attP site for integration at a pseudo-attB site and the serine recombinase is an SPβc2 serine recombinase. A non-limiting example of such a polynucleotide construct is shown in FIG. 7.
[000107] In another example, a construct containing attB and attP sites could be introduced using Agrobacterium and the insertion takes place generating Ti borders. An on-demand excision can be incorporated by including an Excision Cassette as described to express the serine recombinase and cognate RDF for the incorporated att sites to excise the construct from the cell after gene editing.
[000108] Vectors may be introduced into the desired host cells by methods known in the art, e.g., Agrobacterium-mediated transformation, transfection, electroporation, microinjection, transduction, cell fusion, DEAE dextran, the flower dipping method, use of a gene gun (biolistics), transformation using a serine recombinase and the like.
[000109] One of skill in the art would readily understand that the methods described herein may be modified and optimized for particular embodiments of choice. The following examples are intended to illustrate but not limit the invention.
[000110] All references cited herein, including patents, patent applications, papers, textbooks and the like, and the references cited therein, to the extent that they are not already, are hereby incorporated herein by reference in their entirety.
EMBODIMENTS
[000111] Embodiment 1: A polynucleotide encoding a synthetic nuclease comprising an amino acid sequence of SEQ ID NO:28 or SEQ ID NO: 30.
[000112] Embodiment 2: The polynucleotide of embodiment 1 comprising the nucleic acid sequence of SEQ ID NO: 27 or SEQ ID NO: 29.
[000113] Embodiment 3: The polynucleotide of embodiment 1 or 2, wherein said polynucleotide further comprises a nucleic acid encoding at least one nuclear localization sequence (NLS).
[000114] Embodiment 4: The polynucleotide of embodiment 3 wherein said NLS comprises the amino acid sequence of SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, or SEQ ID NO:38.
[000115] Embodiment 5: The polynucleotide of any of embodiments 1-4 further comprising a nucleic acid encoding a tag polypeptide.
[000116] Embodiment 6: The polynucleotide of embodiment 5 wherein said tag polypeptide comprises the amino acid sequence of SEQ ID NO:9.
[000117] Embodiment 7: The polynucleotide of embodiment 6 wherein said polynucleotide encodes a polypeptide having the amino acid sequence of SEQ ID NO:2 or SEQ ID NO:4.
[000118] Embodiment 8: The polynucleotide of embodiment 7 wherein said polynucleotide comprises the nucleic acid sequence of SEQ ID NO: 1 or SEQ ID NO:3.
[000119] Embodiment 9: A synthetic DNA nuclease comprising a DNA binding domain of Cpf1 and a nuclease domain of MAD7, or a DNA binding domain of MAD7 and a nuclease domain of Cpf1.
[000120] Embodiment 10: The synthetic DNA nuclease of embodiment 9 wherein said DNA binding domain of Cpf1 comprises the amino acid sequence of SEQ ID NO:31.
[000121] Embodiment 11: The synthetic DNA nuclease of embodiment 9 wherein said nuclease domain of Cpf1 comprises the amino acid sequence of SEQ ID NO:34.
[000122] Embodiment 12: The synthetic DNA nuclease of embodiment 9 wherein said DNA binding domain of MAD7 comprises the amino acid sequence of SEQ ID NO:33.
[000123] Embodiment 13: The synthetic DNA nuclease of embodiment 9 wherein said nuclease domain of MAD7 comprises the amino acid sequence of SEQ ID NO:32.
[000124] Embodiment 14: The synthetic DNA nuclease of embodiment 9 wherein said nuclease comprises the amino acid sequence of SEQ ID NO:28.
[000125] Embodiment 15: The synthetic DNA nuclease of embodiment 9 wherein said nuclease comprises the amino acid sequence of SEQ ID NO:30.
[000126] Embodiment 16: The synthetic DNA nuclease of embodiment 9 wherein said nuclease comprises the amino acid sequence of SEQ ID NO:2.
[000127] Embodiment 17: The synthetic DNA nuclease of embodiment 9 wherein said nuclease comprises the amino acid sequence of SEQ ID NO:4.
[000128] Embodiment 18: A method of modifying a target locus of interest comprising delivering to said locus a non-naturally occurring composition comprising a synthetic effector protein and one or more nucleic acid components, wherein at least the one or more nucleic acid components is engineered and the effector protein forms a complex with the one or more nucleic acid components and upon binding of the said complex to the target locus of interest, the effector protein induces a modification of the target locus of interest, wherein the synthetic effector protein comprises a DNA binding domain of MAD7 or Cpf1.
[000129] Embodiment 19: The method of embodiment 18 wherein when said effector protein comprises a DNA binding domain of MAD7, said DNA binding domain is operatively linked to a nuclease domain of Cpf1.
[000130] Embodiment 20: The method of embodiment 18 wherein when said effector protein comprises a DNA binding domain of Cpf1, said DNA binding domain is operatively linked to a nuclease domain of MAD7.
[000131] Embodiment 21: The method of embodiment 18 or 19 wherein said effector protein comprises a MAD7 DNA binding domain comprising an amino acid sequence of SEQ ID NO:33.
[000132] Embodiment 22: The method of embodiment 18 or 20 wherein said effector protein comprises a Cpf1 DNA binding domain comprising an amino acid sequence of SEQ ID NO:31.
[000133] Embodiment 23: The method of any of embodiments 18 to 22 wherein said DNA binding domain further comprises at least one NLS.
[000134] Embodiment 24: The method of embodiment 23 wherein said NLS is one or more of SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, or SEQ ID NO:38.
[000135] Embodiment 25: The method of embodiment 24 wherein said DNA binding domain comprises the amino acid sequence of SEQ ID NO:40.
[000136] Embodiment 26: The method of embodiment 24 wherein said DNA binding domain comprises the amino acid sequence of SEQ ID NO:42.
[000137] Embodiment 27: The method of embodiment 25 or 26 wherein said DNA binding domain further comprises a molecular tag.
[000138] Embodiment 28: The method of embodiment 27 wherein said molecular tag is an HA-tag comprising the sequence of SEQ ID NO:52.
[000139] Embodiment 29: The method of embodiment 27 wherein said nuclease domain comprises the amino acid sequence of SEQ ID NO:46 or SEQ ID NO:50.
[000140] Embodiment 30: The method of embodiment 19 or 20 wherein said nuclease domain comprises an amino acid sequence of SEQ ID NO:34.
[000141] Embodiment 31: The method of embodiment 19 or 20 wherein said nuclease domain comprises an amino acid sequence of SEQ ID NO:32.
[000142] Embodiment 32: The method of any of embodiments 27 or 28 wherein said nuclease domain further comprises at least one NLS.
[000143] Embodiment 33: The method of embodiment 23 wherein said NLS is one or more of SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, or SEQ ID NO:38.
[000144] Embodiment 34: The method of embodiment 32 wherein said nuclease domain comprises the amino acid sequence of SEQ ID NO:44.
[000145] Embodiment 35: The method of embodiment 32 wherein said nuclease domain comprises the amino acid sequence of SEQ ID NO:48.
[000146] Embodiment 36: The method of embodiment 34 or 35 wherein said nuclease domain further comprises a molecular tag.
[000147] Embodiment 37: The method of embodiment 36 wherein said molecular tag is an HA-tag comprising the sequence of SEQ ID NO:52.
[000148] Embodiment 38: The method of embodiment 36 wherein said nuclease domain comprises the amino acid sequence of SEQ ID NO:46 or SEQ ID NO:50.
[000149] Embodiment 39: The method of embodiment 18, wherein the target locus of interest comprises DNA.
[000150] Embodiment 40: The method of embodiment 39, wherein the modification of the target locus of interest comprises a strand break.
[000151] Embodiment 41: The method of embodiment 39, wherein the target locus of interest is comprised in a DNA molecule in vitro.
[000152] Embodiment 42: The method of embodiment 39, wherein the target locus of interest is comprised in a DNA within a cell.
[000153] Embodiment 43: The method of embodiment 42, wherein the cell is a prokaryotic cell.
[000154] Embodiment 44: The method of embodiment 42, wherein the cell is a eukaryotic cell.
[000155] Embodiment 45: The method of embodiment 42 wherein the cell is a plant cell.
[000156] Embodiment 46: The method of embodiment 18, wherein the target locus of interest comprises a genomic locus of interest.
[000157] Embodiment 47: The method of embodiment 18, wherein when in complex with the effector protein the nucleic acid component effects sequence specific binding of the complex to a target sequence of the target locus of interest.
[000158] Embodiment 48: The method of embodiment 18, wherein the nucleic acid component(s) comprise a putative CRISPR RNA (crRNA) sequence.
[000159] Embodiment 49: The method of embodiment 48, wherein the nucleic acid component(s) do not comprise any putative trans-activating crRNA (tracr RNA) sequences.
[000160] Embodiment 50: The method of embodiment 40, wherein the strand break comprises a single strand break.
[000161] Embodiment 51: The method of embodiment 40, wherein the strand break comprises a double strand break.
[000162] Embodiment 52: The method of embodiment 18, wherein the effector protein and nucleic acid component(s) are provided via one or more polynucleotide molecules encoding the polypeptides and/or the nucleic acid component(s), and wherein the one or more polynucleotide molecules are operably configured to express the polypeptides and/or the nucleic acid component(s).
[000163] Embodiment 53: The method of embodiment 18, wherein the one or more polynucleotide molecules comprise one or more regulatory elements operably configured to express the polypeptides and/or the nucleic acid component(s), optionally wherein the one or more regulatory elements comprise inducible promoters.
[000164] Embodiment 54: The method of embodiment 52, wherein the one or more polynucleotide molecules are comprised within one or more vectors.
[000165] Embodiment 55: The method of embodiment 52, wherein the one or more polynucleotide molecules are comprised in a delivery system, or the method of embodiment 55 wherein the one or more vectors are comprised in a delivery system.
[000166] Embodiment 56: The method of embodiment 18, wherein the non-naturally occurring or engineered composition is delivered via a delivery vehicle comprising liposome(s), particle(s), exosome(s), microvesicle(s), a gene-gun, a ribonucleoprotein complex, one or more viral vectors or by a serine recombinase delivery method.
[000167] Embodiment 57: A method of editing the genome of an organism comprising introducing into a cell of the organism a polynucleotide encoding a serine recombinase operably linked to a promoter that is active in the cell and an exogenous polynucleotide comprising a putative CRISPR RNA (crRNA) sequence and an attP or attB site.
[000168] Embodiment 58: The method of embodiment 57 wherein said exogenous polynucleotide further comprises an excision cassette comprising an excision gRNA operably linked to an inducible promoter or gene switch and a termination sequence.
[000169] Embodiment 59: A method of gene editing in a cell comprising: a. transforming a host cell with a gene expression system comprising: i. a first constitutive promoter operably linked to at least one first polynucleotide encoding at least one gene editing guide RNA (gRNA) to target a gene within a host cell; ii. a second constitutive promoter operably linked to a second polynucleotide encoding a synthetic nuclease of any of embodiments 9 to 17; iii. a third polynucleotide encoding an excision gRNA operably linked to a fourth polynucleotide comprising an inducible gene regulator, wherein said inducible gene regulator is responsive to an activator that activates transcription of the excision gRNA; and iv. PAM sequences flanking the expression system that bind the excision gRNA; wherein said gene expression system comprises at least one att site at one end of the gene expression system; and wherein the excision gRNA is capable of excising the gene expression system by acting with the synthetic nuclease on the PAM sequences; b. allowing said gene editing gRNA and synthetic nuclease to edit at least one target gene in said host cell; c. contacting said host cell with an activator that induces said inducible gene regulator to express said excision gRNA, wherein said excision gRNA acts on said PAM sequences to excise said expression system from the host cell.
[000170] Embodiment 60: The method of embodiment 59 wherein the activator is a chemical ligand.
[000171] Embodiment 61: The method of embodiment 59 wherein the activator is an environmental stimulus.
[000172] Embodiment 62: The method of any of embodiments 59 to 61 wherein said host cell is a mammalian cell, a plant cell, a non-mammalian vertebrate cell or an invertebrate cell.
[000173] Embodiment 63: The method of embodiment 59 wherein said gene expression system is introduced into the cell by expressing a serine recombinase that acts on the att site on the gene expression system and a pseudosite in the host cell genome to insert the gene expression system.
[000174] Embodiment 64: The method of embodiment 63 wherein said att site is an attP site.
[000175] Embodiment 65: The method of embodiment 63 wherein said att site is an attB site.
[000176] Embodiment 66: The method of any of embodiments 63 to 65 wherein said serine recombinase is a Mycobacterium avium phage Bxb1 serine recombinase, a Streptococcus pyogenes phage 370.1 serine recombinase, a Bacillus subtilis phage SPβc2 serine recombinase, a Listeria monocytogenes phage A118 serine recombinase, or a Streptomyces phage ΦC31 serine recombinase.
[000177] Embodiment 67: The method of embodiment 66 wherein said serine recombinase is a Bacillus subtilis phage SPβc2 serine recombinase.
[000178] Embodiment 68: A method of gene editing in a cell comprising: a. transforming a host cell with a gene expression system comprising: i. a first constitutive promoter operably linked to at least one first polynucleotide encoding at least one gene editing guide RNA (gRNA) to target a gene within a host cell; ii. a second constitutive promoter operably linked to a second polynucleotide encoding a synthetic nuclease of any of embodiments 9 to 17; iii. at least one att site recognized by a serine recombinase at one end of the gene expression system; iv. a third polynucleotide encoding a recombinase directionality factor (RDF) operably linked to a fourth polynucleotide comprising an inducible gene regulator, wherein said inducible gene regulator is responsive to an activator that activates transcription of the RDF; wherein said transformation of said cell is accomplished by co-introducing said gene expression system and a serine recombinase that recognizes said att site on the gene expression vector and a pseudosite in the genome of the host cell; wherein the RDF is capable of excising the gene expression system by acting with a cognate serine recombinase; b. allowing said gene editing gRNA and synthetic nuclease to edit at least one target gene in said host cell; c. contacting said host cell with an activator that induces said inducible gene regulator to express said RDF, wherein said RDF acts with said serine recombinase to excise said expression system from said host cell.
[000179] Embodiment 69: The method of embodiment 68 wherein the activator is a chemical ligand.
[000180] Embodiment 70: The method of embodiment 68 wherein the activator is an environmental stimulus.
[000181] Embodiment 71: The method of any of embodiments 68 to 70 wherein said host cell is a mammalian cell, a plant cell, a non-mammalian vertebrate cell or an invertebrate cell.
[000182] Embodiment 72: The method of embodiment 68 wherein said att site is an attP site.
[000183] Embodiment 73: The method of embodiment 68 wherein said att site is an attB site.
[000184] Embodiment 74: The method of any of embodiments 68 to 73 wherein said serine recombinase is a Mycobacterium avium phage Bxb1 serine recombinase, a Streptococcus pyogenes phage 370.1 serine recombinase, a Bacillus subtilis phage SPβc2 serine recombinase, a Listeria monocytogenes phage A118 serine recombinase, or a Streptomyces phage ΦC31 serine recombinase.
[000185] Embodiment 75: The method of embodiment 66 wherein said serine recombinase is a Bacillus subtilis phage SPβc2 serine recombinase.
[000186] Embodiment 76: The method of any of embodiments 68 to 75 wherein said RDF is a fusion protein of RDF and its cognate serine recombinase.
[000187] Embodiment 77: The method of any of embodiments 68 to 76 wherein the serine recombinase is an SPβc2 serine recombinase and the RDF comprises the amino acid sequence of SEQ ID NO:53 or SEQ ID NO:54.
EXAMPLES
Example 1. Design and Construction of Nuclease Vectors for Gene Editing
[000188] DNA sequences corresponding to different functional enzymatic domains from Cpf1 and MAD7 nucleases were fused in different combinations to create chimeric/synthetic nucleases. SynNuc1 (SEQ ID NO:1) corresponds to the fusion of sequences from the nuclease domain of MAD7 (SEQ ID NO: 15) and the DNA-binding domain of Cpf1 (SEQ ID NO: 13). The translated SynNuc1 is shown in SEQ ID NO:2. SynNuc2 (SEQ ID NO:3) corresponds to the fusion of sequences from the nuclease domain of Cpf1 (SEQ ID NO: 13) and the DNA-binding domain of MAD7 (SEQ ID NO: 12). The translated SynNuc2 is shown in SEQ ID NO:4. A polynucleotide encoding a nuclear localization signal (NLS) was added both upstream and downstream of the fused sequences. The upstream NLS encodes a polypeptide of SEQ ID NO:7 and the downstream NLS encodes a polypeptide of SEQ ID NO:8. AsCpf1 (SEQ ID NO:5) was also synthesized as a control. The translated AsCpf1 is shown as SEQ ID NO:6. A plant Kozak sequence was added upstream and an HA-tag was added downstream of the whole sequence. The HA-tag sequence translation is shown in SEQ ID NO:9. All sequences were optimized for human embryonic kidney (HEK) cells. These DNA sequences containing nuclease, DNA-binding, NLS, Kozak and tag were synthesized, then cloned into our vector backbone (ID414). ID414 contains a guide-RNA (gRNA) targeting the lettuce phytoene desaturase (PDS) gene for gene editing under an Arabidopsis thaliana U6 promoter. FIGS. 1-3 show the vector maps for these nucleases (ID525: SynNuc1 (FIG. 1); ID526: SynNuc2 (FIG. 2); and ID524: AsCpf1 control (FIG. 3)).
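The element order described in this Example can be summarized as a small data structure. The following Python sketch is illustrative only: the `Part` type and `synthetic_nuclease_layout` function are hypothetical names, no actual sequences are included, and the relative order of the two fused domains inside the fusion is deliberately not modeled because this paragraph does not state it.

```python
from typing import List, NamedTuple


class Part(NamedTuple):
    """One element of the open reading frame, listed 5' to 3'."""
    name: str
    note: str


def synthetic_nuclease_layout(binding_source: str, nuclease_source: str) -> List[Part]:
    """Return the element order described in Example 1 (names only, no sequences)."""
    return [
        Part("Kozak", "plant Kozak sequence, upstream of the whole sequence"),
        Part("NLS (upstream)", "encodes the polypeptide of SEQ ID NO:7"),
        Part(
            f"{binding_source} DNA-binding domain + {nuclease_source} nuclease domain",
            "fused domain sequences; internal order not modeled in this sketch",
        ),
        Part("NLS (downstream)", "encodes the polypeptide of SEQ ID NO:8"),
        Part("HA-tag", "translation shown in SEQ ID NO:9"),
    ]


# SynNuc1 pairs the Cpf1 DNA-binding domain with the MAD7 nuclease domain;
# SynNuc2 is the reciprocal combination.
syn_nuc1 = synthetic_nuclease_layout("Cpf1", "MAD7")
syn_nuc2 = synthetic_nuclease_layout("MAD7", "Cpf1")
print(" | ".join(part.name for part in syn_nuc1))
```

Keeping the layout as an ordered list makes it easy to compare the two chimeras element by element; only the fused-domain entry differs between them.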
Example 2. Construction of SynNuc1 Vector Targeting Eight Lettuce Genes through a Polycistronic gRNA
[000189] The gRNA region targeting the lettuce PDS gene in ID525 (520 bp) was removed by restriction enzyme digestion using PacI and BstZ17I and replaced with polycistronic gRNAs containing six independent gRNAs targeting eight different lettuce genes, creating a new vector, ID536 (FIG. 4).
Example 3. Lettuce Protoplast Isolation and Transfection
[000190] Protoplasts were isolated from six-week-old wild-type lettuce plants (about 1 g of leaf tissue) and transfected following Sheen’s protocol (Yoo, S.D. et al. (2007) Nature Protocols 2:1565-1575). Transfected protoplasts were incubated at 25°C in the dark for about 60 hours.
Example 4. Genomic DNA Extraction from Lettuce Protoplasts
[000191] Sixty hours post-transfection, protoplasts from three independent transfection reactions were pooled together, then genomic DNA (gDNA) was extracted with 400 µl urea buffer (6.9 M urea, 350 mM NaCl, 50 mM Tris-Cl pH 8.0, 20 mM EDTA pH 8.0, 1% Sarkosyl) followed by a phenol:chloroform:isoamyl alcohol step and a chloroform:isoamyl alcohol step. DNA precipitation was done at -80°C for 20 minutes in an equal volume of isopropanol. Finally, DNA was washed once with 70% ethanol and resuspended in 20 µl of distilled (DI) water. gDNA concentration was estimated using a NanoDrop 8000 (Thermo Scientific), then the gDNA was diluted to 30 ng/µl for further analysis.
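The dilution to the 30 ng/µl working concentration follows the usual C1·V1 = C2·V2 relationship. A minimal Python sketch of that arithmetic is given below; the function name, the 150 ng/µl stock concentration, and the 20 µl final volume are hypothetical illustration values, not measurements from this Example.

```python
def dilution_volumes(stock_ng_per_ul: float, target_ng_per_ul: float, final_volume_ul: float):
    """Return (stock_ul, water_ul) needed to reach the target concentration.

    Uses C1 * V1 = C2 * V2, solved for V1 (the volume of stock to take).
    """
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already more dilute than the target")
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul


# Hypothetical stock at 150 ng/ul brought to the 30 ng/ul working concentration.
stock_ul, water_ul = dilution_volumes(150.0, 30.0, 20.0)
print(f"{stock_ul:.1f} ul stock + {water_ul:.1f} ul water")
```

At 30 ng/µl, a 2 µl aliquot carries 60 ng of gDNA, which is the template amount used for PCR in Example 5.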
Example 5. PCR amplification of targeted region for gene editing
[000192] Lettuce PDS and/or other targeted regions were PCR amplified using Phusion Hot Start II (Thermo Fisher Scientific) and a specific set of primers for each target gene on 60 ng (2 µl) of gDNA following the manufacturer's instructions. PCR reactions were run using an Eppendorf MasterCycler EPgradient instrument.
Example 6. Deep Sequencing of PCR Product to Assess Gene Editing Frequency
[000193] Next Generation Sequencing (NGS) was used to deeply sequence the specific PCR products for each target gene and assess gene editing frequency by comparing the number of paired reads showing a mutation (indels) at the target site to the number of paired reads depicting a wild-type pattern. NGS was done by the Center for Computational and Integrative Biology DNA Core Facility at Massachusetts General Hospital, Boston, USA. This method of mutation frequency assessment did not take into consideration the large background of untransfected cells (which do not show mutations); however, the ratio between the number of mutations obtained using one specific nuclease and the number obtained using a second nuclease is valid and meaningful.
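The read-count comparison described above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the frequency is taken here as mutant reads divided by the sum of mutant and wild-type reads (the text does not give the exact formula), the function names are hypothetical, and the read counts are made-up round numbers rather than data from the Examples.

```python
def editing_frequency(mutant_reads: int, wild_type_reads: int) -> float:
    """Percentage of reads carrying an indel at the target site.

    Assumes frequency = mutant / (mutant + wild-type); this is an
    interpretation of the text, not a formula stated in it.
    """
    total = mutant_reads + wild_type_reads
    if total == 0:
        raise ValueError("no reads at this target site")
    return 100.0 * mutant_reads / total


def relative_activity(freq_a: float, freq_b: float) -> float:
    """Ratio of two editing frequencies measured under the same conditions."""
    if freq_b == 0:
        raise ValueError("reference frequency is zero; ratio is undefined")
    return freq_a / freq_b


# Made-up read counts for two hypothetical nuclease vectors.
freq_first = editing_frequency(mutant_reads=50, wild_type_reads=9950)
freq_second = editing_frequency(mutant_reads=10, wild_type_reads=9990)
print(f"first: {freq_first:.2f}%  second: {freq_second:.2f}%")
print(f"relative activity: {relative_activity(freq_first, freq_second):.1f}x")
```

Because untransfected cells contribute wild-type reads to every sample, the absolute percentages are underestimates; the ratio stays informative provided the transfection efficiency is similar across the samples being compared.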
[000194] The synthetic nucleases successfully generated CRISPR-guided DNA double-stranded breaks in the lettuce PDS gene. Protoplast cell transfections were performed for both the synthetic nucleases and AsCpf1 (positive control) vectors targeting PDS for gene editing. In addition to these vectors, transfections were also performed with a previously constructed AsCpf1 vector (ID414) containing a different tag (V5-tag) located downstream of the NLS at the 5’ end of the humanized AsCpf1 nuclease sequence (available from Addgene as plasmid #69982) and a native MAD7 vector (ID440) in which the E. coli codon-optimized MAD7 sequence available from Inscripta has been further codon-optimized for Homo sapiens using the codon optimization tool from Integrated DNA Technologies (IDT, Iowa, USA), both targeting the lettuce PDS gene as well.
[000195] The PDS targeted region was amplified from gDNA extracted from these transfected protoplasts, then submitted to NGS. Table 1 shows mutation frequencies observed in the lettuce PDS gene at the targeted site for each of the nucleases/vectors tested.
[Table 1: mutation frequencies observed in the lettuce PDS gene; table images not reproduced]
[000196] The newest AsCpf1 vector (ID524) used as control generated a lower mutation frequency than our previous AsCpf1 vector (ID414). The MAD7 vector was very poor at generating mutations. In previous experiments, E. coli-optimized MAD7 protein introduced into cells was better at generating mutations. SynNuc1 was not superior in generating mutations, but was consistent. A few examples of mutations generated by the non-homologous end joining (NHEJ) repair mechanism after AsCpf1 and SynNuc1 double-stranded break creation are shown in FIG. 6. Overall, mutations observed ranged from a one base pair deletion to a 16 base pair deletion, or a one base pair insertion.
Example 7. SynNuc1 was successful in generating CRISPR-guided DNA double-stranded breaks in multiple genes at once
[000197] Protoplast cell transfections were performed with a vector (ID536) containing SynNuc1 driven by a CaMV 35S promoter along with polycistronic gRNAs targeting eight different lettuce genes driven by the Arabidopsis U6 promoter. In addition to this vector, transfections were also performed using a few control vectors, including ID525 and ID414 for PDS gene editing, as well as another vector (ID121) containing Cas9 nuclease and slightly different polycistronic gRNAs targeting the same eight lettuce genes and two others.
[000198] All targeted regions were PCR amplified from gDNA extracted from these transfected protoplasts, then submitted to NGS. Half of the genes targeted for editing by the ID121 vector were sent out for NGS. Table 2 shows mutation frequencies observed in all eight lettuce genes at their targeted sites.
[Table 2: mutation frequencies observed in the eight targeted lettuce genes; table images not reproduced]
[000199] Gene mutations were observed in half of the targeted genes. PPO genes PPO-A, PPO-B and PPO-C are targeted by the same gRNA, while PPO-G is targeted by a different gRNA, as are the other four genes (PPO-E, PPO-O, PPO-R, and PPO-S). Mutations generated after NHEJ repair of the breaks created by SynNuc1 in four genes include deletions of one to 5 base pairs as well as insertions of 1 to 37 base pairs. Similar mutations are generated after Cas9 breaks, including longer deletions (up to 24 base pairs). Table 2 does not show the data for the PDS controls performed in the same transfection experiment; however, the results were similar to those previously obtained. SynNuc1 showed 0.02% mutation frequency on PDS while AsCpf1 showed 0.25% mutation frequency.
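The relationship between the gRNAs in the polycistronic array and the eight targeted genes can be written out as a simple mapping. The Python sketch below is an interpretation rather than a disclosed design: the gRNA labels are hypothetical, and it assumes that PPO-G, PPO-E, PPO-O, PPO-R and PPO-S each have their own gRNA, which is consistent with the six gRNAs and eight genes stated in Example 2.

```python
# Hypothetical gRNA labels; gene groupings follow the paragraph above.
GRNA_TARGETS = {
    "gRNA_1": ["PPO-A", "PPO-B", "PPO-C"],  # one gRNA shared by three genes
    "gRNA_2": ["PPO-G"],
    "gRNA_3": ["PPO-E"],
    "gRNA_4": ["PPO-O"],
    "gRNA_5": ["PPO-R"],
    "gRNA_6": ["PPO-S"],
}


def targeted_genes(grna_targets):
    """Return the distinct genes covered by the array, in first-seen order."""
    genes = []
    for targets in grna_targets.values():
        for gene in targets:
            if gene not in genes:
                genes.append(gene)
    return genes


genes = targeted_genes(GRNA_TARGETS)
print(len(GRNA_TARGETS), "gRNAs cover", len(genes), "genes")
```

A mapping of this kind is also a convenient place to attach per-gene read counts when tabulating mutation frequencies for each target.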
Example 8. Comparison of Nuclease Editing Efficiency
[000200] Further experiments were performed in lettuce mesophyll protoplasts with transiently transfected plasmids containing different nuclease constructs according to Table 3. The nuclease constructs were assessed for their gene editing efficiency of the lettuce PDS gene. Transfection and analysis were performed as previously described in Examples 3-6. Briefly, freshly isolated mesophyll protoplasts were transfected with the plasmid of interest and incubated for 72 hours. Genomic DNA was extracted, and PCR was performed to amplify the region near the PDS gene, and next generation sequencing was performed to analyze the amplicon.
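As an illustration of how a mutation frequency could be derived from amplicon sequencing reads such as those described above, the following is a minimal sketch only. It assumes a deliberately simplified rule in which any read whose length differs from the reference amplicon is counted as an insertion or deletion; the function names `classify_read` and `mutation_frequency` and the toy sequences are hypothetical, and the analysis software actually used is not specified in this description.

```python
from collections import Counter


def classify_read(read: str, reference: str) -> str:
    """Label a read by its length difference from the reference amplicon.

    Simplifying assumption: a shorter read is a deletion, a longer read is
    an insertion, and an equal-length read is treated as unchanged
    (substitutions are not detected by this sketch).
    """
    delta = len(read) - len(reference)
    if delta < 0:
        return "deletion"
    if delta > 0:
        return "insertion"
    return "unchanged"


def mutation_frequency(reads, reference):
    """Return (percent of reads carrying an indel, per-class read counts)."""
    counts = Counter(classify_read(r, reference) for r in reads)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads supplied")
    edited = counts["deletion"] + counts["insertion"]
    return 100.0 * edited / total, counts


if __name__ == "__main__":
    # Toy data, not taken from the experiments: 8 unedited reads,
    # one 1 bp deletion and one 1 bp insertion.
    ref = "ACGTACGTAC"
    toy_reads = [ref] * 8 + ["ACGTCGTAC", "ACGTAACGTAC"]
    pct, counts = mutation_frequency(toy_reads, ref)
    print(f"{pct:.2f}% of reads carry an indel: {dict(counts)}")
```

A real analysis would align each read to the reference and restrict indel calls to a window around the expected cut site, so that sequencing errors elsewhere in the amplicon are not counted as edits.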
Table 3. Gene Editing Efficiency of Various Nucleases
[Table 3 (images imgf000034_0002 and imgf000035_0001)]
[000201] Representative mutations generated in the PDS gene by the nucleases described in Table 3 are shown in FIGS. 8 and 9. FIG. 8A shows the mutations generated by SynNuc1 and SynNuc2, and FIG. 8B shows the mutations generated by MAD7, Cpf1 Construct 1, and Cpf1 Construct 2 in Experiment 1. FIG. 9A shows the mutations generated by SynNuc1 and SynNuc2, and FIG. 9B shows the mutations generated by MAD7, Cpf1 Construct 1, and Cpf1 Construct 2 in Experiment 2.
Example 9. Gene Editing Efficiency of SynNuc1 Expression Constructs
[000202] Various expression constructs of SynNuc1, as summarized in Table 4, were tested for gene editing efficiency. Two different promoters were tested, CaMV 35S and AtUBQ10. Constructs were also designed with no intron between the polynucleotides encoding the Cpf1 DNA binding domain and the MAD7 nuclease domain, or with a potato ST-LS1 IV2 intron between the Cpf1 DNA binding domain and the MAD7 nuclease domain. The PPO-B, PPO-D, PPO-E, PPO-G, and PPO-S genes were targeted for editing.
Table 4. SynNuc1 Expression Constructs
[Table 4 (image imgf000036_0001)]
[000203] Lettuce mesophyll protoplasts were transfected with plasmids containing the constructs in Table 4. Twenty-four hours post-transfection, genomic DNA was extracted from the protoplasts, then amplified by PCR at the expected gene editing region and sequenced by NGS as described for the previous Examples.
[000204] Results are shown in FIGS. 10A-10C. In FIGS. 10A-10C, SynNuc1 is referred to as “fMAD7.” Constructs containing the AtUBQ10 promoter for expression of the SynNuc1 nuclease (construct ID 3) or an intron between the DNA binding and nuclease domains of SynNuc1 (construct ID 2) provided up to a 9-fold improvement in editing efficiency over constructs containing the CaMV 35S promoter and no intron (FIGS. 10A and 10B). A construct containing both the AtUBQ10 promoter and the ST-LS1 IV2 intron (construct ID 4) resulted in a further increase of up to 3.5-fold in editing efficiency compared with the AtUBQ10 promoter with no intron (FIG. 10C).
Example 10. Gene Editing by SynNuc1 and Cpf1
[000205] Nine hundred thousand lettuce protoplasts were transfected with either Construct 4 from Table 4 (expressing SynNuc1 under an AtUBQ10 promoter and containing the ST-LS1 IV2 intron), or with a plasmid expressing hAsCpf1, targeting the PDS gene. The transient editing efficiency was evaluated 24 hours post-transfection by NGS and determined to be approximately 0.8% for both nucleases. Between 150 and 350 calli per transfection were developed from the protoplasts.
[000206] Twenty-three calli developed from hAsCpf1-transfected protoplasts contained mutations (5-12 base pair deletions) at the expected region of the PDS gene. One of the mutations was a homozygous mutation (9 base pair deletion between nucleotides 1449 and 1457), resulting in a three amino acid deletion at the end of exon 3, as shown in FIGS. 11A and 11B. The gene editing efficiency for the specific case of PDS was estimated to be 11%, based on the number of edited calli out of the total number of calli assessed for editing, as shown in FIG. 12.
[000207] Two calli developed from SynNuc1-transfected protoplasts contained a heterozygous mutation (6 and 9 base pair deletions between nucleotides 1447 and 1455), as shown in FIG. 13. The PDS gene editing efficiency was estimated to be 1%, based on the number of edited calli out of the total number of calli assessed for editing, as shown in FIG. 12.
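The callus-level efficiencies above are stated as the number of edited calli divided by the total number of calli assessed. The following minimal sketch shows that calculation and, as a consistency check only, the number of assessed calli implied by the reported counts and rounded percentages. The function names are hypothetical, and the implied totals are back-calculated illustrations, not figures reported in this description.

```python
def editing_efficiency(edited_calli: int, total_calli: int) -> float:
    """Percent of assessed calli that carry an edit at the target site."""
    if total_calli <= 0:
        raise ValueError("total_calli must be positive")
    return 100.0 * edited_calli / total_calli


def implied_total(edited_calli: int, reported_percent: float) -> float:
    """Back-calculate the number of calli assessed from a reported percentage.

    The reported percentages are rounded, so the result is approximate.
    """
    if reported_percent <= 0:
        raise ValueError("reported_percent must be positive")
    return edited_calli * 100.0 / reported_percent


if __name__ == "__main__":
    # Counts and percentages are those stated for hAsCpf1 and SynNuc1.
    for name, edited, pct in [("hAsCpf1", 23, 11.0), ("SynNuc1", 2, 1.0)]:
        print(f"{name}: about {implied_total(edited, pct):.0f} calli assessed")
```

Both implied totals (roughly 209 and 200 calli) fall within the stated range of 150 to 350 calli developed per transfection, which is consistent with the reported efficiencies.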

Claims

1. A polynucleotide encoding a synthetic nuclease comprising an amino acid sequence of SEQ ID NO:28.
2. The polynucleotide of claim 1, comprising the nucleic acid sequence of SEQ ID NO:27.
3. The polynucleotide of claim 1 or 2, wherein said polynucleotide further comprises a nucleic acid encoding at least one nuclear localization sequence (NLS).
4. The polynucleotide of claim 3, wherein said NLS comprises the amino acid sequence of SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, or SEQ ID NO:38.
5. The polynucleotide of any of claims 1 to 4, further comprising a nucleic acid encoding a tag polypeptide.
6. The polynucleotide of claim 5, wherein said tag polypeptide comprises the amino acid sequence of SEQ ID NO: 9.
7. The polynucleotide of claim 6, wherein said polynucleotide encodes a polypeptide having the amino acid sequence of SEQ ID NO:2.
8. The polynucleotide of claim 7, wherein said polynucleotide comprises the nucleic acid sequence of SEQ ID NO:1.
9. A synthetic DNA nuclease comprising a DNA binding domain of Cpf1 and a nuclease domain of MAD7.
10. The synthetic DNA nuclease of claim 9, wherein said DNA binding domain of Cpf1 comprises the amino acid sequence of SEQ ID NO:31.
11. The synthetic DNA nuclease of claim 9 or 10, wherein said nuclease domain of MAD7 comprises the amino acid sequence of SEQ ID NO: 32.
12. The synthetic DNA nuclease of any of claims 9 to 11, wherein said nuclease comprises the amino acid sequence of SEQ ID NO:28.
13. The synthetic DNA nuclease of any of claims 9 to 12, wherein said nuclease comprises the amino acid sequence of SEQ ID NO:2.
14. A method of modifying a target locus of interest comprising delivering to said locus a non-naturally occurring composition comprising a synthetic effector protein and one or more nucleic acid components, wherein at least the one or more nucleic acid components is engineered and the effector protein forms a complex with the one or more nucleic acid components and upon binding of the said complex to the target locus of interest, the effector protein induces a modification of the target locus of interest, wherein the synthetic effector protein comprises a DNA binding domain of Cpf1.
15. The method of claim 14, wherein said DNA binding domain is operatively linked to a nuclease domain of MAD7.
16. The method of claim 14 or 15, wherein said effector protein comprises a Cpf1 DNA binding domain comprising an amino acid sequence of SEQ ID NO:31.
17. The method of any of claims 14 to 16, wherein said DNA binding domain further comprises at least one NLS.
18. The method of claim 17, wherein said NLS is one or more of SEQ ID NO:7, SEQ ID NO: 8, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, or SEQ ID NO:38.
19. The method of any of claims 14 to 18, wherein said DNA binding domain comprises the amino acid sequence of SEQ ID NO:42.
20. The method of any of claims 14 to 19, wherein said DNA binding domain further comprises a molecular tag.
21. The method of claim 20, wherein said molecular tag is an HA-tag comprising the sequence of SEQ ID NO: 52.
22. The method of any of claims 15 to 21, wherein said nuclease domain comprises the amino acid sequence of SEQ ID NO:46.
23. The method of claim 15, wherein said nuclease domain comprises an amino acid sequence of SEQ ID NO: 32.
24. The method of any of claims 15 to 23, wherein said nuclease domain further comprises at least one NLS.
25. The method of claim 24, wherein said NLS is one or more of SEQ ID NO: 7, SEQ ID NO:8, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, or SEQ ID NO:38.
26. The method of any of claims 15 to 25, wherein said nuclease domain comprises the amino acid sequence of SEQ ID NO:44.
27. The method of any of claims 15 to 26, wherein said nuclease domain further comprises a molecular tag.
28. The method of claim 27, wherein said molecular tag is an HA-tag comprising the sequence of SEQ ID NO: 52.
29. The method of any of claims 15 to 28, wherein said nuclease domain comprises the amino acid sequence of SEQ ID NO:46.
30. The method of any of claims 14 to 29, wherein the target locus of interest comprises DNA.
31. The method of claim 30, wherein the modification of the target locus of interest comprises a strand break.
32. The method of claim 30, wherein the target locus of interest is comprised in a DNA molecule in vitro.
33. The method of claim 30, wherein the target locus of interest is comprised in a DNA within a cell.
34. The method of claim 33, wherein the cell is a prokaryotic cell.
35. The method of claim 33, wherein the cell is a eukaryotic cell.
36. The method of claim 33, wherein the cell is a plant cell.
37. The method of any of claims 14 to 36, wherein the target locus of interest comprises a genomic locus of interest.
38. The method of any of claims 14 to 37, wherein when in complex with the effector protein the nucleic acid component effects sequence specific binding of the complex to a target sequence of the target locus of interest.
39. The method of any of claims 14 to 38, wherein the nucleic acid component(s) comprise a putative CRISPR RNA (crRNA) sequence.
40. The method of claim 39, wherein the nucleic acid component(s) do not comprise any putative trans-activating crRNA (tracr RNA) sequences.
41. The method of claim 31, wherein the strand break comprises a single strand break.
42. The method of claim 31, wherein the strand break comprises a double strand break.
43. The method of any of claims 14 to 42, wherein the effector protein and nucleic acid component(s) are provided via one or more polynucleotide molecules encoding the polypeptides and/or the nucleic acid component(s), and wherein the one or more polynucleotide molecules are operably configured to express the polypeptides and/or the nucleic acid component(s).
44. The method of claim 43, wherein the one or more polynucleotide molecules comprise one or more regulatory elements operably configured to express the polypeptides and/or the nucleic acid component(s), optionally wherein the one or more regulatory elements comprise inducible promoters.
45. The method of claim 43 or 44, wherein the one or more polynucleotide molecules are comprised within one or more vectors.
46. The method of claim 43 or 44, wherein the one or more polynucleotide molecules are comprised in a delivery system.
47. The method of claim 45, wherein the one or more vectors are comprised in a delivery system.
48. The method of any one of claims 14 to 42, wherein the non-naturally occurring or engineered composition is delivered via a delivery vehicle comprising liposome(s), particle(s), exosome(s), microvesicle(s), a gene-gun, a ribonucleoprotein complex, one or more viral vectors or by a serine recombinase delivery method.
49. A method of editing the genome of an organism comprising introducing into a cell of the organism a polynucleotide encoding a serine recombinase operably linked to a promoter that is active in the cell and an exogenous polynucleotide comprising a putative CRISPR RNA (crRNA) sequence and an attP or attB site.
50. The method of claim 49, wherein said exogenous polynucleotide further comprises an excision cassette comprising an excision gRNA operably linked to an inducible promoter or gene switch and a termination sequence.
PCT/US2020/057141 2019-10-25 2020-10-23 Synthetic nucleases WO2021081384A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201962925818P 2019-10-25 2019-10-25
US62/925,818 2019-10-25

Publications (1)

Publication Number Publication Date
WO2021081384A1 (en) 2021-04-29

Family

ID=73402181

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2020/057141 WO2021081384A1 (en) 2019-10-25 2020-10-23 Synthetic nucleases

Country Status (1)

Country Link
WO (1) WO2021081384A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023092731A1 (en) * 2021-11-29 2023-06-01 科稷达隆(北京)生物技术有限公司 Mad7-nls fusion protein, and nucleic acid construct for site-directed editing of plant genome and application thereof

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3009511A2 (en) * 2015-06-18 2016-04-20 The Broad Institute, Inc. Novel crispr enzymes and systems
WO2017184768A1 (en) * 2016-04-19 2017-10-26 The Broad Institute Inc. Novel crispr enzymes and systems
WO2017189308A1 (en) * 2016-04-19 2017-11-02 The Broad Institute Inc. Novel crispr enzymes and systems
WO2018213708A1 (en) * 2017-05-18 2018-11-22 The Broad Institute, Inc. Systems, methods, and compositions for targeted nucleic acid editing
WO2019126762A2 (en) * 2017-12-22 2019-06-27 The Broad Institute, Inc. Cas12a systems, methods, and compositions for targeted rna base editing

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
BHAKTA M. S. ET AL: "Highly active zinc-finger nucleases by extended modular assembly", GENOME RESEARCH, vol. 23, no. 3, 5 December 2012 (2012-12-05), US, pages 530 - 538, XP055785489, ISSN: 1088-9051, Retrieved from the Internet <URL:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3589541/pdf/530.pdf> DOI: 10.1101/gr.143693.112 *
BIN MOON SU ET AL: "Highly efficient genome editing by CRISPR-Cpf1 using CRISPR RNA with a uridinylate-rich 3'-overhang", NATURE COMMUNICATIONS, vol. 9, no. 1, 7 September 2018 (2018-09-07), XP055785455, Retrieved from the Internet <URL:http://www.nature.com/articles/s41467-018-06129-w> DOI: 10.1038/s41467-018-06129-w *
BRIAN CHAIKIND ET AL: "A programmable Cas9-serine recombinase fusion protein that operates on DNA sequences in mammalian cells", NUCLEIC ACIDS RESEARCH, vol. 44, no. 20, 11 August 2016 (2016-08-11), GB, pages 9758 - 9770, XP055411362, ISSN: 0305-1048, DOI: 10.1093/nar/gkw707 *
GAJ THOMAS ET AL: "Enhancing the Specificity of Recombinase-Mediated Genome Engineering through Dimer Interface Redesign", JOURNAL OF THE AMERICAN CHEMICAL SOCIETY, vol. 136, no. 13, 20 March 2014 (2014-03-20), US, pages 5047 - 5056, XP055786442, ISSN: 0002-7863, Retrieved from the Internet <URL:https://pubs.acs.org/doi/pdf/10.1021/ja4130059> DOI: 10.1021/ja4130059 *
LIU ZHENYI ET AL: "ErCas12a CRISPR-MAD7 for Model Generation in Human Cells, Mice, and Rats", THE CRISPR JOURNAL, vol. 3, no. 2, 1 April 2020 (2020-04-01), pages 97 - 108, XP055780662, ISSN: 2573-1599, Retrieved from the Internet <URL:https://www.liebertpub.com/doi/pdf/10.1089/crispr.2019.0068> DOI: 10.1089/crispr.2019.0068 *
R. M. LIU ET AL: "Synthetic chimeric nucleases function for efficient genome editing", NATURE COMMUNICATIONS, vol. 10, no. 1, 1 December 2019 (2019-12-01), XP055697711, DOI: 10.1038/s41467-019-13500-y *
WIERSON WESLEY A. ET AL: "Expanding the CRISPR Toolbox with ErCas12a in Zebrafish and Human Cells", THE CRISPR JOURNAL, vol. 2, no. 6, 1 December 2019 (2019-12-01), pages 417 - 433, XP055786985, ISSN: 2573-1599, Retrieved from the Internet <URL:https://www.liebertpub.com/doi/pdf/10.1089/crispr.2019.0026> DOI: 10.1089/crispr.2019.0026 *
YAMANO TAKASHI ET AL: "Crystal Structure of Cpf1 in Complex with Guide RNA and Target DNA", CELL, ELSEVIER, AMSTERDAM NL, vol. 165, no. 4, 5 May 2016 (2016-05-05), pages 949 - 962, XP029530759, ISSN: 0092-8674, DOI: 10.1016/J.CELL.2016.04.003 *
YOO, S.D. ET AL., NATURE PROTOCOLS, vol. 2, 2007, pages 1565 - 1575

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20807223

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20807223

Country of ref document: EP

Kind code of ref document: A1